- imbecile, Participant, December 30, 2013 at 7:06 am, Post count: 48
There have been no dedicated talks about it yet, but it has been mentioned several times that there is a module load library that converts generic Mill bit streams (supposedly very similar to LLVM bit streams) into binary code specific to a Mill family member.
I’m speculating here, but this library would probably tie into the exec() system call and could be implemented as a binary load module in Linux.
The result would be making exec() (or the equivalents on other OSs) a lot more expensive and complex. Maybe not as bad as the notoriously long Java and .NET load times, but still.
You could alleviate a lot of this with smart caching and even more added complexity, but the root problem is still there: an operation that supposedly needs to be done only once, at install time, is done on every module load.
I guess it’s clear where I’m going with this: why is it done at load time (or maybe even just-in-time), and not at install time? It’s kind of repeating, on the OS level, the same general historical mistake that the Mill architecture tries to do away with on the machine level: continuously re-computing at run time what is already clear at compile time.
In general I do consider JIT or load time assembly to be crutches to work around historical system interface/ABI limitations. Since they need to be fast at run/load time they will always limit what can be done in code optimizations and still never be as fast as simple mmaps or deflates of binaries into memory.
There could be a need to occasionally re-assemble the true binaries when new profiling data from real usage patterns suggests there are good gains to be made from it. But once you install a piece of software, the actual hardware environment and compilation target it will run on is clear.
And it’s not like you need to implement and carry around all the different code paths for different family member processors in the final binaries, as is done extensively in closed source applications on Windows or Macs. Taking care of that is, among other things, what the binary stream distribution format is for.
To go a little more into the philosophical sphere and probably completely off topic:
Software installation is probably the single most important and central operation for any OS. It’s done rarely, and usually it is expected to disrupt normal work flows. But it is the point where your system actually becomes useful for the things you wanna do, where all the information necessary to run code in a specific system comes together. It is also the point where any security violations, malicious code or incompatibilities enter the system.
Yet software installation is generally a very neglected operation on any operating system. It is usually a collection of ad-hoc scripts, even 3rd party scripts, that copy stuff without looking into it too much.
tl;dr: Don’t redo work thousands or millions of times semi-adequately when it needs to be done right only once.
- Joe Taber, Participant, December 30, 2013 at 12:08 pm, Post count: 25
I don’t think they ever explicitly ruled out caching or pre-converting the binaries. And the module load library sounds like a generic conversion tool wrapped by an easy to use load module, not some inflexible black box.
This is squarely an OS problem. The OS itself will have to make tradeoffs between startup time, install time, disk usage, portability, and cache complexity with regard to member-conversion. And it will use the Mill conversion tool as it chooses between them.
From a “just get something working” developer’s standpoint, wrapping the conversion tool in a load module is the easiest way to plug the Mill into an existing OS without rewriting or authoring new major subsystems.
So, just chill and be patient. 🙂
- imbecile, Participant, December 30, 2013 at 4:35 pm, Post count: 48
All good points. Just writing about something that’s bothering me.
This module load/assembly library is an essential part of the whole architecture. And the only ones who have something like this are IBM in their mainframe systems. And they have full vertical integration and vast resources and a very specialized market.
Getting something like this to work and accepted in a diverse ecosystem with many more or less independent parties will be one of the crucial details.
This is code that will have to be on every system in a very pivotal position. So it had better be fully open source, with a liberal license, and very adaptable. Neither Microsoft and Apple nor the Linux and BSD people would accept anything less, or tolerate a piece of code that they have no control over at the center of their systems.
- Will_Edwards, Moderator, December 31, 2013 at 1:19 am, Post count: 98
I see a couple of really interesting points raised here.
First, install-time vs load-time translation.
In the first talk, on Instruction Encoding, Ivan says that they do the translation at install time (at the 1 hour mark; YouTube’s transcript search feature helped me track it down; hope all the videos get transcripts eventually). He describes it in terms of the IBM mainframes too, as mentioned above.
They do have a lib to make the translation available at run-time too, e.g. for JIT, so I guess OS integration can pick and choose when to do it and whether to persist it.
Secondly, using profiling data to retranslate the binary.
The Mill does actually take a small step on a similar path to this. In the talk on Prediction, it’s described how the branch predictor loads predictions from previous runs, and updates those predictions.
Presumably as the tooling improves, all the already-translated bitstreams can be retranslated with better optimisations via auto-upgrade built into your favourite OS/distro.
This isn’t all the way to annotating apps to gather profiling and then recompiling, but you could imagine this being an option in the overall toolchain. It’s just a software thing, and seems quite doable.
I hope I’ve tracked down the most relevant quotes; I have no more to go on than just what has been said in the published talks.
- igodard, Moderator, January 1, 2014 at 7:30 pm, Post count: 9
You guys got it right. The tool that produces binary is called the Specializer. It takes in serialized compiler internal representation and produces load modules or in-memory function bodies. The basic tool is an API, plus some wrappers to apply in different circumstances.
We expect that the normal case will Specialize at install time. However, the load module can cache several different specializations. If you (for example) upgrade your CPU chip to a different Mill family member, then when you first run the program the loader will discover that there’s no specialization for the current host in the load module, and will do a load-time specialization on the fly; it’s much like load-time dynamic linking. The new specialization will then (assuming suitable permissions) be cached back into the load module, so the next time the program is run the loader finds the desired specialization.
It is also possible to Specialize for an explicit target rather than for the current host. This is used to e.g. create a ROM for a different machine.
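To make the flow concrete, here is a rough sketch of the install-time and load-time paths described above. It is not the real Specializer API: `LoadModule`, `install`, `load`, `fake_specialize` and the member names `member_a` / `member_b` are all made up for illustration, and the "specializer" just tags the bytes with the target name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical signature: (generic code, target member name) -> member binary.
Specializer = Callable[[bytes, str], bytes]


@dataclass
class LoadModule:
    """Illustrative load module: generic code plus cached specializations."""
    generic_code: bytes
    specializations: Dict[str, bytes] = field(default_factory=dict)
    writable: bool = True  # stands in for "suitable permissions"


def install(module: LoadModule, target: str, specialize: Specializer) -> None:
    """Specialize for an explicit target (normally the install host)."""
    module.specializations[target] = specialize(module.generic_code, target)


def load(module: LoadModule, host: str, specialize: Specializer) -> bytes:
    """Return the binary for `host`, specializing at load time on a cache miss."""
    binary = module.specializations.get(host)
    if binary is None:
        # No specialization for this host yet, e.g. after a CPU upgrade.
        binary = specialize(module.generic_code, host)
        if module.writable:
            # Cache it back so the next load finds it directly.
            module.specializations[host] = binary
    return binary


def fake_specialize(generic_code: bytes, target: str) -> bytes:
    """Placeholder specializer: prefixes the code with the target name."""
    return target.encode() + b":" + generic_code


if __name__ == "__main__":
    module = LoadModule(generic_code=b"app")
    install(module, "member_a", fake_specialize)   # install-time path
    load(module, "member_b", fake_specialize)      # load-time fallback
    print(sorted(module.specializations))
```

With a writable module, the second and later loads on `member_b` hit the cache and never call the specializer again; with `writable=False` every load on a new host pays the specialization cost, which is the tradeoff the OS integration would have to decide on.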
In general we do not expect to re-Specialize automatically based on the accumulated branch prediction information or other profile info; that would be an explicit manual step, or be under control of a higher-level framework such as an IDE. It’s not clear that respecializing would buy that much; code selection has already been done by the time that the Specializer gets at it, and operation scheduling (with a few exceptions, such as latency distribution in cascaded loads) does not appear to benefit much from profiles. However, the Specializer is also responsible for the layout of code in memory, so a profile could lead to improved cache behavior.