– 9) Profile-guided optimization
The tool chain does not yet have PGO support, so this answer is speculative. Many, perhaps nearly all, PGO optimizations are counter-productive on a Mill. For example, unrolling loops prevents software pipelining and forces spill/fill of transients that would otherwise live only on the belt for their whole lifetime; shut unrolling off for better performance. It is also unclear how much function and block reordering will actually pay; our best estimate now is “not much”, because the tool chain collapses so much Mill control flow into much larger blocks that are executed speculatively. Exit prediction also sharply cuts the fetch overhead that reordering is intended to help.
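To make the unrolling point concrete, here is a minimal sketch in plain C (not Mill code; the function names are made up for illustration). The rolled loop keeps one loaded value and one accumulator live per iteration, while the hand-unrolled version keeps four of each in flight, which is the kind of extra transient pressure described above. How a Mill specializer would actually schedule either form is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rolled form: one loaded element and one accumulator are live per iteration. */
int64_t sum_rolled(const int32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* Unrolled by 4 (hypothetical hand transformation): four loads and four
 * partial sums are live at once, so there are more transients to keep around. */
int64_t sum_unrolled4(const int32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++) {
        s0 += a[i];
    }
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same sum; the only difference is how many values are live at once inside the loop body.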
Lastly, SIMDization (these are not really vectors in the Cray sense) can be done in the tool chain as well for the Mill as for any other architecture. Our major advance is the ability to do SIMD with per-element error control. Whether apps will take advantage of that to improve their RAS (reliability, availability, serviceability) is as yet unclear.
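As a rough illustration of what per-element error control means, here is a scalar C model. It is not the Mill's actual mechanism or encoding; the `checked_vec` type, the `LANES` width, and the choice of signed overflow as the error condition are all assumptions made for the sketch. The point is only that a fault in one element is recorded for that element and the other elements still produce usable results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* hypothetical vector width, for illustration only */

/* Hypothetical model: each element carries its own error marker. */
typedef struct {
    int32_t value[LANES];
    bool    error[LANES];
} checked_vec;

/* Element-wise add. An overflow (or an already-marked input) poisons only
 * that element; the remaining elements are computed normally. */
checked_vec checked_add(const checked_vec *x, const checked_vec *y) {
    assert(x != NULL && y != NULL);
    checked_vec r;
    for (size_t i = 0; i < LANES; i++) {
        int64_t wide = (int64_t)x->value[i] + (int64_t)y->value[i];
        bool overflow = wide > INT32_MAX || wide < INT32_MIN;
        r.error[i] = x->error[i] || y->error[i] || overflow;
        r.value[i] = r.error[i] ? 0 : (int32_t)wide;
    }
    return r;
}
```

A caller can then inspect `error[i]` per element and decide what to do with the bad ones, instead of losing the whole vector operation to a single fault.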