– 7) Benchmarking support
Are there plans to support stable benchmarking in the Mill? Benchmarks often have to go through all sort of contortions to get filter out performance noise: warming up caches, running multiple times to smooth outliers out, maybe even recompiling multiple times to test performance with different code layouts, etc.
The Mill has some natural advantages: the Exit Table can be preloaded, and no runtime reordering means better stability. Do you have plans to include features specifically for stable execution? Things like purging caches, avoiding interrupts, pinning cores, etc?
– 8) Profiling support
Similar to the previous question, do you have plans to support fine-grained profiling? Profilers often give out coarse information like “this function was executed roughly X times”, “this function took roughly Y cycles on average”, etc.
You can use emulation tools like valgrind to get more fine-grained info like the number of L2 misses from a specific instruction, at a cost of massive performance loss. Could the Mill provide tools to help get fine-grained data without the massive overhead?
– 9) Profile-guided optimization
How does Mill plan to benefit from PGO?
Traditional PGO is mostly centered on layout: the profiling phase gets coarse-grained info about which functions are called most often in which order, and from that the compiler knows both which functions to optimize for size vs performance (in other words, which loops to unroll and vectorize) and which functions should be laid out together in memory.
It feels like, since the Mill is a lot more structured than traditional ISA, it could get a lot more mileage from PGO. It can already get benefits through the Exit Table. Are there plans to facilitate other optimizations, besides general code layout?
– 10) Specializer hints
Because of the whole Specializer workflow, the genASM format is essentially a compiler IR. As such, have you thought about including hints to help the Specializer produce better local code?
Some hints might include (straight from LLVM): aliasing info (saying that a load and a store will never alias), purity info for function calls (saying a function will never load/store/modify FP flags), generally saying that a loop is safe to unroll, etc.