My guess would be that the Mill is significantly less sensitive to this kind of thing just by accident.
Firstly, laying code out to avoid I$1 conflicts is certainly something that could be implemented, but it would require knowing, at specialization time, which functions tend to be called close together. However, exit prediction prefetch means that a much higher percentage of jumps end up as cache hits anyway, so the cost of not doing that layout work is mitigated.
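To make "I$1 conflicts" concrete, here is a minimal sketch of how code placement decides which cache set a function lands in. The line size and set count are made-up illustrative numbers, not actual Mill parameters, and `cache_set` and `conflicts` are hypothetical helper names.

```python
LINE_SIZE = 64   # bytes per cache line (assumed for illustration)
NUM_SETS = 64    # number of sets in the cache (assumed for illustration)

def cache_set(addr, line_size=LINE_SIZE, num_sets=NUM_SETS):
    """Return the set index that a byte address maps to."""
    return (addr // line_size) % num_sets

def conflicts(addr_a, addr_b):
    """True if both addresses compete for the same set.

    In a set-associative cache they only start evicting each other
    once all ways of that set are in use.
    """
    return cache_set(addr_a) == cache_set(addr_b)

# Two function entry points exactly LINE_SIZE * NUM_SETS bytes apart
# land in the same set; shifting one by a single line moves it away.
print(conflicts(0x1000, 0x2000))
print(conflicts(0x1000, 0x1040))
```

A layout pass would try to keep functions that run together out of the same set, which is exactly the information about call patterns mentioned above.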
Secondly, the Mill will store much less data on the application-visible stack, because register saving is handled by the hardware and the Mill can keep large amounts of data in the Scratchpad. This keeps stack data more compact, which gives better locality.
Thirdly, the Mill is smart about stack frames. A stack frame that is evicted doesn’t necessarily incur a cache miss when it is used again, since the Mill tracks stack frames across function call boundaries. In particular, operations on fresh stack frames should just work without a miss.
Fourthly, as a single-address-space (SAS) processor with a fork() hack, the Mill will only consult the TLB if it’s going to memory anyway. Permission checks have to be done on each load, store, branch, and call, but the PLB is effectively fully associative and only stores one entry per range of bytes with a given permission. That makes it completely insensitive to code and data layout changes that don’t cross protection boundaries.
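A rough software model of that last point, assuming permissions are stored as one entry per byte range. `RangePermissionTable` and its methods are hypothetical names, and the binary search only stands in for what the hardware would do with an associative lookup.

```python
from bisect import bisect_right

class RangePermissionTable:
    """Toy model: one entry per contiguous byte range with a given permission."""

    def __init__(self, regions):
        # regions: non-overlapping (start, end_exclusive, perms) tuples
        self._regions = sorted(regions)
        self._starts = [start for start, _, _ in self._regions]

    def lookup(self, addr):
        """Return the permission string covering addr, or None."""
        i = bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, perms = self._regions[i]
            if start <= addr < end:
                return perms
        return None

    def allows(self, addr, access):
        perms = self.lookup(addr)
        return perms is not None and access in perms

# One entry for code, one for data (addresses are made up).
table = RangePermissionTable([
    (0x1000, 0x5000, "rx"),
    (0x8000, 0x20000, "rw"),
])

# Moving a function anywhere inside the code range hits the same entry.
print(table.allows(0x1000, "x"), table.allows(0x4FFF, "x"))
# Crossing the boundary changes the answer.
print(table.allows(0x5000, "x"))
```

The number of entries depends on how many protection regions exist, not on where things sit inside them, which is why rearranging code or data within a region changes nothing for the check.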