- anselmschueler (Participant) | June 16, 2023 at 4:49 pm | Post count: 3
Hi, I have a few concerns about the logistics and systems around binary distribution on a Mill.
1. You describe that Mill programs are compiled to an abstract form and then specialized
for the specific Mill family member that they need to be run on.
What format do you think application vendors and redistributors will distribute their packages in?
Do you believe that some will try to distribute pre-specialized binaries, perhaps in a hybrid format?
2. You describe that Mill programs dynamically analyze their branch prediction and are able
to write back the predictions. This, along with specialization, requires either modifying the state
of the system or giving up performance. What about immutable systems or systems that regularly restore
from an immutable source (examples would be containers, NixOS/Guix)? Could this be sped up by
pre-specialized and/or pre-analyzed binaries, perhaps in a hybrid form? Would it be possible to separate
the stateful part of the binary into a separate non-system location and load from/store to there on entry/exit?
3. Could the dynamic branch prediction analysis be communicated back to the vendor using some sort of
telemetry system or aggregated in some cloud store so other users can benefit?
4. You say you want to include all operations that any Mill supports in the abstract assembly language
and substitute emulation. After you have released your first Mill, any subsequent Mills that extend the
operation set will require versioning the specializer. Do you have any specific versioning policy in mind?
Do you anticipate incompatibilities from newer operations being used in shipped code that is specialized
by an older specializer that does not know of the new operations yet? Could this be mitigated by shipping
hybrid or pre-specialized binaries?
I understand that you are not yet at the stage where these questions are particularly relevant, but I would like to know whether you have considered them before and whether you have any ideas pertaining to them.
- Findecanor (Participant) | June 17, 2023 at 12:40 pm | Post count: 31
> 3. Could the dynamic branch prediction analysis be communicated back to the vendor using some sort of telemetry system or aggregated in some cloud store so other users can benefit?
I think such telemetry would meet a lot of resistance, with security given as the reason.
- Ivan Godard (Keymaster) | June 21, 2023 at 6:59 am | Post count: 689
1) App distribution in binary vs. IR:
Certain parts of the software, such as the minimal BIOS and the boot specializer, must be distributed in binary. These will normally reside on a ROM, and the distribution of changes likely involves reflashing the ROM; vendors other than hardware manufacturers will never need to do this. Kernel vendors will distribute in IR form using the minimal canonical IR which is acceptable to the specializer in the ROM. Included in the kernel package will be a more feature-full app-IR specializer in ROM-IR form. That will be translated to native binary, and then run through itself so that the installed app-specializer has all the app-level features for its own work of translating app-IR to binary. Cascaded compiling-the-compiler is routine for cross-compilation, which is really what a Mill install is and does.
Other than updating the boot ROM there is really no reason to distribute pre-specialized code in binary. Even assembler-like code can be represented in Mill IR as intrinsics. Now it is possible to have Mill configs with extra nonstandard instructions, and code that uses those won’t run if the instruction doesn’t exist in the target config at hand. But you’d still use IR for it – and get a specializer error if the requisite intrinsic is not found.
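The cascaded "compiling-the-compiler" install flow described above can be sketched in a few lines. This is an illustrative toy in Python, not Mill tooling; all names (`specialize`, the feature sets, the module dicts) are invented for the example:

```python
# Hypothetical sketch of the cascaded install flow: the minimal ROM
# specializer translates the shipped app-specializer (itself in ROM-level
# IR), and the resulting native app-specializer then handles app IR.
# Modules are modeled as dicts; "specialization" is modeled as tagging.

def specialize(ir_module, features):
    """Translate an IR module to 'native' form for the given feature set."""
    if not features.issuperset(ir_module["requires"]):
        raise ValueError("specializer lacks required IR features")
    return {"native_for": sorted(features), "name": ir_module["name"]}

# Stage 1: the boot specializer in ROM knows only the minimal canonical IR.
ROM_FEATURES = {"core"}
app_specializer_ir = {"name": "app-specializer", "requires": {"core"}}
app_specializer_native = specialize(app_specializer_ir, ROM_FEATURES)

# Stage 2: the now-native app specializer carries the full app-level
# feature set and translates ordinary application IR from then on.
APP_FEATURES = {"core", "float", "quad"}
app_ir = {"name": "my-app", "requires": {"float"}}
app_native = specialize(app_ir, APP_FEATURES)
print(app_native["name"])  # my-app
```

The point of the two stages is that only the first, minimal specializer must exist as pre-built binary in ROM; everything above it can ship as IR.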
2) Dynamic exit prediction update in read-only systems:
Prediction update is just an optimization: it starts the predictor table off with the state from prior runs instead of empty. If the load module cannot be written then the optimization doesn’t happen. Conceivably the vendor could build a table using a mutable load module and then distribute the module as binary. The gain from the optimization is unlikely to justify the nuisance of maintaining multiple binaries for the different members.
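The "optimization only" nature of prediction update can be made concrete with a small sketch. This is illustrative Python under my own assumptions, not Mill code: seed the predictor from a saved table when one exists and the location is writable, otherwise start cold, with identical correctness either way:

```python
# Illustrative sketch: exit-prediction persistence as a pure optimization.
# A warm start just pre-loads the table; a read-only module means a cold
# start and retraining at run time, never incorrect behavior.
import json
import os
import tempfile

def load_predictor(module_path):
    """Return a saved exit-prediction table if one exists, else empty."""
    hist = module_path + ".pred"
    if os.path.exists(hist):
        with open(hist) as f:
            return json.load(f)
    return {}  # cold start: predictions are retrained during execution

def save_predictor(module_path, table):
    """Persist the table only if the location is writable."""
    try:
        with open(module_path + ".pred", "w") as f:
            json.dump(table, f)
        return True
    except OSError:
        return False  # read-only system: the optimization simply doesn't happen

with tempfile.TemporaryDirectory() as d:
    mod = os.path.join(d, "app")
    table = load_predictor(mod)   # first run: empty table
    table["entry0"] = "taken"     # learned during execution
    saved = save_predictor(mod, table)
    warm = load_predictor(mod)    # next run starts warm if the save succeeded
```

On an immutable system `save_predictor` would fail silently and every run would start cold, which matches the answer above: the gain is forgone, nothing breaks.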
3) Sharing predictor history:
Certainly predictor history could be separated from the binary code that uses/updates it. However, the history is tied to a particular config just like the specialized code is, so you’d have the administrative bookkeeping problem of making sure that both addressed the same config.
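The bookkeeping problem amounts to a compatibility check before reuse. A minimal sketch, assuming invented metadata fields (the config names Tin and Silver are real Mill test configs; everything else here is illustrative):

```python
# Sketch of the config-matching check: separately stored predictor history
# may only be reused when it was produced for the same Mill member as the
# specialized binary that will consume it.

def history_matches(binary_meta, history_meta):
    """Reuse saved history only when both target the same config."""
    return binary_meta["config"] == history_meta["config"]

binary = {"config": "Silver", "path": "app.bin"}
good_hist = {"config": "Silver", "table": {"e0": "taken"}}
stale_hist = {"config": "Tin", "table": {"e0": "taken"}}

ok = history_matches(binary, good_hist)      # reuse the warm table
bad = history_matches(binary, stale_hist)    # mismatch: fall back to cold start
```

A mismatch costs only the warm start, so the safe failure mode is to discard the history, just as on a read-only system.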
4) Varying config binary ISAs:
It’s common that configs lack some instructions that others have. For example, some of our test configs lack floating-point or quad (128-bit) data forms. The specializer still recognizes these in the IR, and generates calls on emulation routines, which are often inlined. The substitution is automatic – the install provides signature info and the corresponding routine for everything potentially in the IR.
The sig/emu info is tied to an IR level. If the IR changes to a new release then the installation must be upgraded with info to match. If you present an object module that uses IR12 but only IR9 is installed then you’ll get an error in the app install. It doesn’t matter whether the actual hardware has the instructions: the specializer knows what the hardware can do, so it uses hardware if possible and emu otherwise. The IR install may provide emu routines for instructions that the particular hardware actually has; in that case the hardware will be used.
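The substitution and version check described above can be sketched as follows. This is a hedged illustration in Python; the IR levels, op names, and emu-routine spellings are all invented, not Mill internals:

```python
# Sketch: the specializer emits a hardware op when the target config has
# it, substitutes a call on an emulation routine otherwise, and rejects
# modules built against a newer IR level than the installed sig/emu tables.

INSTALLED_IR_LEVEL = 9
EMU_ROUTINES = {"fmul": "call __emu_fmul", "qadd": "call __emu_qadd"}

def specialize_op(op, hw_ops, module_ir_level):
    if module_ir_level > INSTALLED_IR_LEVEL:
        raise ValueError(f"module needs IR{module_ir_level}, "
                         f"installed tables are IR{INSTALLED_IR_LEVEL}")
    if op in hw_ops:
        return op                # hardware has it: use it directly
    return EMU_ROUTINES[op]      # otherwise substitute the emu call

hw = {"add", "fmul"}             # this config has float but not quad forms
native_f = specialize_op("fmul", hw, 9)   # hardware op
native_q = specialize_op("qadd", hw, 9)   # emu substitution
```

Presenting an IR12 module against IR9 tables fails at app-install time, before any code runs, which is the error path described above.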
- anselmschueler (Participant) | June 22, 2023 at 5:17 am | Post count: 3
Thanks! One question: is it possible to split the dynamic exit prediction state from the load module, so it can be stored in a location that is mutable even if the load module itself isn’t, and then rejoined with the module on execution?