Mill Computing, Inc

Participant

January 9, 2021 at 5:13 am

Post count: 33

About qemu on Mill. Obviously it would be trivial to run qemu or bochs on Mill, and it will probably compile out of the box with zero changes. However, qemu is not designed for emulation speed. It does only JIT (so no AoT), takes a long time startup and warmup, consume memory for both source and translated code, and the generated (JITed) code is of very poor quality (like 5 to 10 times worse than what normal compiler generate for the original code). There is very little optimizations in qemu to make JIT-ed code fast, only some minor things, like patching direct and indirect jumps, removing some condition code checks, but no code motions, no control flow recovery, no advanced register re-allocator, no instruction reordering, etc. The purpose of that qemu emulator code (tcg) is to be only reasonably fast, and VERY portable (tcg virtual machine has I think only 3 registers for example, which means you underutilize a lot of hardware, and loose data flow information, and add a lot of extra moves to memory, sure, it can be improved or recovered back, but again that is slow). So it will run on Mill, just like it runs on 20 other architectures. But don’t expect magic in terms of speed even on Mill.

Valgrind is extremely slow. It purpose is debugging, not speed.

There are other binary translation projects, but most of them don’t focus on speed or cross-emulation, more like changing (including runtime optimization) native binary on the fly for some purposes.

Writing a proper translator (that could be later integrated into qemu) is obviously possible, and there were many hybrid optimizing AoT/JIT in the world that showed that one can achieve very good results. See FX!32, Rosetta 1, Rosetta 2, box86. Microsoft has also pretty decent x86-arm dynrec.

It would be much better to reuse qemu where it makes sense (Linux virtio, chipset, usb, networking, storage, etc), but write specialized JIT module, or optimize a lot out of the tcg in qemu. Some target pairs in qemu do have some extra non-generic optimisations, so that is totally doable, to for example write amd64 to mill specific code.

However, at the end of a day, is it really that important?

If Mill is 10 times more efficient and faster, then for a lot of applications you don’t really need any binary translation, because you can just compile things for optimal performance. And well, you would want that to actually consider Mill anyway, because otherwise you are wasting hardware potential. That is why Mill is targeting generic server workloads, a bit of HPC maybish, and some generic Linux stuff. 99% of interesting workloads are developed in-house, so can be recompiled, or are open source, so also can be recompiled. Will you be able to run Oracle or Microsoft databases on Mill, probably not initially. Once the hardware is out, either important software will be ported (just look at how quickly Apple M1 got adopted by software developers, and already thousands of proprietary programs are ported to use M1 natively – all because, a) the hardware is fast, so there is incentive to do so, b) hardware is accessible to developers easily). Or open source community will do dynrec, or two or three. Mill Computing, Inc. doesn’t have resources of Apple to do it on their own on the product launch. Damn, even IBM doesn’t have resources to do that with their POWER and s390x systems.

Reply To: Is binary Translating i386/x86_64 to mill code practical?