Forum Replies Created
- in reply to: Memory level parallelism and HW scouts #4000
> something as simple as `a[i]->member + b[j]->member` has the case where the Mill necessarily stalls twice

I fail to see the first stall. Normally, the first load batch, `a[i]` and `b[j]`, should be amenable to being hoisted as much as possible (assuming `a`, `b`, `i` and `j` are known), such that, in cases where there's not much to do in the meantime, it would mostly stall on the second load batch only. Also, I don't see why an OoO would do any better in this context. In any case, the Mill compiler can see a "bigger window" in which to try to find something to do.
From what I remember from the talks, the case of chained memory indirection (including OOP but not exclusively) was explicitly mentioned time and again. And it’s a problem everyone faces. Their solution is also mentioned: try really hard to hoist as early as possible.
[a few minutes later]
I now believe you bring up this example in the context of loops (in which case the first hoist would be very hard to achieve). I'd think your best bet is unrolling as much as possible (the limit being the number of retire stations?). Not sure it's that much different on an OoO, though.

> I noticed that between the spiller, the metadata, and the exit predictor, you could probably make a very effective hardware scout
Neat. I’d like to read their response. My guess: it all hangs on power consumption.
- in reply to: Characterizing the Belt #3425
Those characterizations have one thing going for them: there’s no Mill (yet); it doesn’t run any software (yet); OSs haven’t been ported to it (yet).
So, skepticism is healthy.
But I think they go too far in dismissing these ideas.
My favorite: “Lack of object code compatibility between different family members implies a more embedded universe for the product, where the application is known ahead of time and doesn’t change much”. Android proved that’s not the case 10 years ago.
In particular, the point you are quoting puzzles me. I’m not an expert in CPU design and _I can think of at least 2 ways it absolutely rocks_ (maybe I’m a sucker for immutability): instruction coding entropy and reducing the capacitive load (IIRC, there’s no big crossbar for 500+ registers).
The part about SAS comes up time and again. The Mill guys _did_ explicitly describe mechanisms to deal with fork, relocatable code, etc. They simply ignore all that. Maybe they missed it, which proves my point: everyone has such a strong opinion while _at the same time_ admitting _hey, I may be missing something_.
It _is_ risky, of course. Let them fail. It’s not you, after all.
If I remember correctly from one of the talks, every math op had 4 versions:
1. saturating: 250 + 30 = 255 (8 bits);
2. excepting: 250 + 30 = 💣;
3. wrap-around: 250 + 30 = 24;
4. expanding: 250 + 30 = 280 (16×8 bits => 2×8×8 bits).

I believe arbitrary-precision math can be easily implemented using the excepting versions (just handle the special case in an interrupt).
— nachokb
- in reply to: LLVM pointers #2781
Funny: I thought C would be one of the offenders. After all, basic C syntax exposes pointer arithmetic without any checks (I imagine there *is* a lot of C code out there doing nasty stuff to pointers).
- in reply to: Continuous Refinements #2190
Speaking of which, how does the Mill cope with changing the clock frequency? Having different components running asynchronously sounds like bad news for overclockers.
Awesome work!
— nachokb
[EDIT: grammar]
- This reply was modified 8 years, 1 month ago by nachokb.
- in reply to: Continuous Refinements #2195
> a supercollider-management op
Well, now I can’t wait for a supercollider-managing Mill…
> There’s enough flexibility within the Mill architecture to permit a lot of tuning without having to depart incompatibly
Cool! That was exactly what I hoped.