I was careful to say wider belt, meaning more elements in a vector, rather than longer belt, because I imagine a longer belt has diminishing returns, stresses the instruction cache, and so on.
The key thing is that it is straightforward to simulate variations and evaluate them on representative target code. I’m sure that the current configurations haven’t been plucked from thin air, but rather represent what is considered the most advantageous mix for the first cut.
I do want a Platinum Mill on my desktop and to hell with cooling! When we have a monster for gaming rigs, compiler rigs and for the fun of it, then we can dream of an Unobtainium Mill.