You guys got it right. The tool that produces binary is called the Specializer. It takes in serialized compiler internal representation and produces load modules or in-memory function bodies. The basic tool is an API, plus some wrappers to apply in different circumstances.
We expect that the normal case will be to Specialize at install time. However, the load module can cache several different specializations. If you (for example) upgrade your CPU chip to a different Mill family member, then when you first run the program the loader will discover that there’s no specialization for the current host in the load module, and will do a load-time specialization on the fly; it’s much like load-time dynamic linking. The new specialization will then (assuming suitable permissions) be cached back into the load module, so the next time the program is run the loader finds the desired specialization.
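To make the lookup-then-cache flow concrete, here is a rough sketch in Python. It is only an illustration of the logic described above, not the actual loader or Specializer API: `LoadModule`, `specialize`, `load`, the `writable` flag and the member names used as cache keys are all made-up placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class LoadModule:
    """Hypothetical load module: serialized IR plus cached specializations."""
    ir: bytes
    cache: dict = field(default_factory=dict)  # target member name -> binary
    writable: bool = True                      # stands in for "suitable permissions"


def specialize(ir: bytes, target: str) -> bytes:
    """Placeholder for the Specializer; a real one would emit target binary."""
    return target.encode() + b":" + ir


def load(module: LoadModule, host: str) -> bytes:
    """Return the binary for `host`, specializing on the fly if none is cached."""
    binary = module.cache.get(host)
    if binary is None:
        # No specialization for this host yet: do a load-time specialization.
        binary = specialize(module.ir, host)
        if module.writable:
            # Cache it back so the next run finds it directly.
            module.cache[host] = binary
    return binary
```

In this sketch, calling `specialize(ir, target)` directly with a target other than the current host corresponds to Specializing for an explicit target.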
It is also possible to Specialize for an explicit target rather than for the current host. This is used, for example, to create a ROM for a different machine.
In general we do not expect to re-Specialize automatically based on the accumulated branch prediction information or other profile info; that would be an explicit manual step, or be under control of a higher-level framework such as an IDE. It’s not clear that re-Specializing would buy that much; code selection has already been done by the time that the Specializer gets at it, and operation scheduling (with a few exceptions, such as latency distribution in cascaded loads) does not appear to benefit much from profiles. However, the Specializer is also responsible for the layout of code in memory, so a profile could lead to improved cache behavior.
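As an illustration of how a profile might feed into code layout, here is a minimal sketch of one common heuristic: place the most frequently executed functions next to each other so that hot code shares cache lines. This is not a description of what the Specializer actually does; `layout_by_profile`, the function names and the counts are all invented for the example.

```python
def layout_by_profile(sizes, counts):
    """Assign each function an offset, hottest first.

    sizes:  {function name: code size in bytes}
    counts: {function name: execution count from a profile}; missing means 0
    """
    # Sort by descending execution count; ties keep their original order.
    order = sorted(sizes, key=lambda name: counts.get(name, 0), reverse=True)
    offsets, cursor = {}, 0
    for name in order:
        offsets[name] = cursor
        cursor += sizes[name]
    return offsets


sizes = {"init": 64, "hot_loop": 32, "error_path": 128, "helper": 16}
counts = {"hot_loop": 1000, "helper": 900, "init": 1}
print(layout_by_profile(sizes, counts))
# {'hot_loop': 0, 'helper': 32, 'init': 48, 'error_path': 112}
```

Here the two hot functions end up in the first 48 bytes, and the never-executed `error_path` is pushed to the end.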