Mill Computing, Inc

January 9, 2023 at 4:47 am

> Many, perhaps nearly all PGO optimizations are counter-productive on a Mill. Thus for example unrolling loops prevents software pipelining and forces spill/fill of transients that otherwise would live only on the belt for their lifetime; shut unrolling off for better performance.

Maybe this is a vocabulary problem, but there’s definitely a kind of unrolling that the Mill benefits from. We could call it “horizontal” unrolling, as opposed to the traditional “vertical” unrolling.

Say you have code that looks like this:

`
LOOP:
CON(…),
ADD(…), LOAD(…),
STORE(…);
`

If your Mill is wide enough, you will definitely benefit from turning it into:

`
LOOP:
CON(…), CON(…), CON(…),
ADD(…), LOAD(…), ADD(…), LOAD(…), ADD(…), LOAD(…),
STORE(…), STORE(…), STORE(…);
`

I don’t know exactly what you’d call that operation, but to me it’s “unrolling” of a sort. And it definitely hits that “efficiency / code size” tradeoff that PGO helps optimize.
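To make the distinction concrete at source level, here is a rough Python sketch. The loop body (load an element, add a constant, store the result) is my own assumption, since the operands in the listings above are elided; the function names and the width of 3 are also just for illustration. The point is only that the three operations in each trip are independent of each other, so a wide enough machine could issue them side by side instead of one after another.

```python
WIDTH = 3  # assumed number of iterations packed per loop trip, matching the 3x listing above

def add_const_vertical(src, k):
    """One element per loop trip: one load, one add, one store each time around."""
    dst = [0] * len(src)
    for i in range(len(src)):
        dst[i] = src[i] + k
    return dst

def add_const_horizontal(src, k):
    """WIDTH independent elements per loop trip, plus a cleanup loop for the remainder."""
    n = len(src)
    dst = [0] * n
    main = n - n % WIDTH  # largest multiple of WIDTH that fits in n
    for i in range(0, main, WIDTH):
        # These three adds do not depend on each other, so nothing forces
        # them to be issued in sequence.
        a0 = src[i] + k
        a1 = src[i + 1] + k
        a2 = src[i + 2] + k
        dst[i], dst[i + 1], dst[i + 2] = a0, a1, a2
    for i in range(main, n):
        # Leftover elements when n is not a multiple of WIDTH.
        dst[i] = src[i] + k
    return dst
```

Both functions compute the same result. The second one has a bigger body and needs the cleanup loop, which is the code size side of the tradeoff.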

> Many, perhaps nearly all PGO optimizations are counter-productive on a Mill. […] It is also unclear how much function and block reordering will actually pay;

That doesn’t sound right. As long as your codegen is based on heuristics, you’ll benefit from PGO somewhere.

To give a Mill-specific example, there are load delays. Your current heuristic is “as long as you can get away with”, but a PGO analysis might say “actually, this load almost always hits L1, so we can give it a delay of 3 to start this following load earlier / to better pack instructions”.
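As a sketch of what that decision could look like, assume the profile records the observed latency in cycles of each execution of a given load. None of this is the actual Mill toolchain: the function name, the 95% coverage threshold, and the latency numbers in the usage note are all hypothetical.

```python
from collections import Counter

def pick_load_delay(observed_latencies, max_delay, coverage=0.95):
    """Pick the smallest delay (in cycles) that covers `coverage` of profiled loads.

    `max_delay` is the longest delay the scheduler could get away with, which is
    what the non-PGO heuristic would use. All parameters are hypothetical.
    """
    if not observed_latencies:
        # No profile data for this load: fall back to the existing heuristic.
        return max_delay
    counts = Counter(observed_latencies)
    total = len(observed_latencies)
    covered = 0
    for latency in sorted(counts):
        covered += counts[latency]
        if covered / total >= coverage:
            # Never ask for more delay than the schedule has room for.
            return min(latency, max_delay)
    return max_delay
```

For instance, with a profile of 97 executions at 3 cycles and 3 executions at 40 cycles (made-up numbers) and `max_delay=12`, this picks a delay of 3. A load that misses every time would still get the full 12.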

Reply To: Grab bag of questions