Mill Computing, Inc

Participant

September 3, 2024 at 3:37 pm


something as simple as a[i]->member + b[j]->member has the case where the Mill necessarily stalls twice

I fail to see the first stall. Normally, the first load batch (a[i] and b[j]) should be amenable to being hoisted as far as possible (assuming a, b, i and j are known), so that, when there's not much else to do in the meantime, it would mostly stall only on the second load batch.
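To make the two load batches explicit, here is a minimal C sketch. The names struct item and add_members are made up for illustration (only the field name member comes from the quote). A C compiler, or an OoO core, is free to reorder these statements anyway; the point is only the dependency structure: two independent loads that can be hoisted together, then two loads that depend on them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type; only the field name 'member' is from the quote. */
struct item {
    int member;
};

/* Computes a[i]->member + b[j]->member, written to expose the two batches. */
int add_members(struct item *const *a, size_t i,
                struct item *const *b, size_t j)
{
    assert(a != NULL && b != NULL);

    /* Batch 1: two independent pointer loads. They depend only on a, b, i, j,
     * so they can be issued together, as early as those values are known. */
    struct item *pa = a[i];
    struct item *pb = b[j];

    /* Any work that does not need pa or pb could be scheduled here,
     * hiding the latency of batch 1. */

    /* Batch 2: dependent loads through pa and pb. If batch 1 was hoisted
     * far enough, this is the only place left to stall. */
    return pa->member + pb->member;
}
```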

Also, I don’t see why an OoO core would do any better in this context. In any case, the Mill compiler can look over a “bigger window” than the OoO hardware when trying to find something to do.

From what I remember of the talks, the case of chained memory indirection (in OOP code, but not only there) was explicitly mentioned time and again. It’s a problem everyone faces. Their solution was mentioned too: try really hard to hoist the loads as early as possible.

[a few minutes later]
I now believe you brought up this example in the context of loops (in which case hoisting the first load batch would be very hard). I’d think your best bet is to unroll as much as possible (the limit being the number of retire stations?). I’m not sure it’s that much different on an OoO, though.
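A minimal C sketch of the unrolling idea, assuming a loop that sums a[k]->member + b[k]->member over two arrays of pointers. The names struct node, sum_members and UNROLL are hypothetical, and the factor 4 is arbitrary, not derived from any Mill retire-station count. Within each unrolled step, all first-level pointer loads are issued before any dependent load, so several misses can be outstanding at once instead of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type, for illustration only. */
struct node {
    int member;
};

/* Illustrative unroll factor (an assumption, not a Mill parameter). */
#define UNROLL 4

/* Sums a[k]->member + b[k]->member for k in [0, n). */
long sum_members(struct node *const *a, struct node *const *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    long total = 0;
    size_t k = 0;

    for (; k + UNROLL <= n; k += UNROLL) {
        struct node *pa[UNROLL], *pb[UNROLL];

        /* First level: 2 * UNROLL independent pointer loads. */
        for (size_t u = 0; u < UNROLL; u++) {
            pa[u] = a[k + u];
            pb[u] = b[k + u];
        }

        /* Second level: loads that depend on the pointers above. */
        for (size_t u = 0; u < UNROLL; u++)
            total += pa[u]->member + pb[u]->member;
    }

    /* Leftover iterations when n is not a multiple of UNROLL. */
    for (; k < n; k++)
        total += a[k]->member + b[k]->member;

    return total;
}
```

Unrolling only widens the window of independent first-level loads; the dependent second-level loads still have to wait for them, on a Mill or an OoO core alike.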

I noticed that between the spiller, the metadata, and the exit predictor, you could probably make a very effective hardware scout

Neat. I’d like to read their response. My guess: it all hangs on power consumption.

Reply To: Memory level parallelism and HW scouts