How accurate is the belt abstraction when it comes to the actual HW? That is, on a machine with 32 belt entries and roughly the same number of ALUs, is it really the case that in each cycle every ALU can take its operands from any belt location and push a result to the belt?
I’m not a uArchitect (though I work with some), but it sounds to me as though one of the following must be true:
– There are some restrictions in place to keep the HW cost reasonable, so the belt abstraction isn’t the whole story. That would impact a lot of the described architecture in deep ways, though.
– There is the mother of all crossbars which can, in one cycle, route full operands from any of 32 sources to any of 32 destinations, and take 32 results back into (almost) arbitrary positions – which, I am told by my uArchitect friends, is “impractical” (they’d probably use a stronger word there).
– There is some very clever uArchitecture trickiness going on which wasn’t described in the videos (including the belt video). There are tantalizing hints in the videos about a tagged structure but not enough to explain how this circle is squared.
Perhaps this is “too low level” to be covered in an architectural video/forum, but I’m burning with curiosity here.
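
For a sense of scale, here is a back-of-envelope sketch of the full crossbar from the second option. It just multiplies sources by sinks by operand width to count single-bit crosspoints. The 64-bit operand width and the two source operands per ALU are my own assumptions for illustration, not figures from the videos, and the function name is made up for this post.

```python
# Back-of-envelope size of a full belt-to-ALU crossbar.
# All parameters are illustrative assumptions, not Mill specifications.

def crossbar_crosspoints(sources: int, sinks: int, width_bits: int) -> int:
    """Count single-bit crosspoints in a full sources-to-sinks crossbar."""
    return sources * sinks * width_bits

BELT_ENTRIES = 32      # belt size from the question above
ALUS = 32              # "roughly the same number of ALUs"
OPERANDS_PER_ALU = 2   # assumption: two source operands per operation
WIDTH_BITS = 64        # assumption: 64-bit operands

# Operand side: any belt entry to any ALU input.
operand_side = crossbar_crosspoints(BELT_ENTRIES, ALUS * OPERANDS_PER_ALU, WIDTH_BITS)

# Result side: any ALU output to any belt position.
result_side = crossbar_crosspoints(ALUS, BELT_ENTRIES, WIDTH_BITS)

print("operand-side crosspoints:", operand_side)
print("result-side crosspoints:", result_side)
```

Under those assumptions the operand side alone comes to 131072 crosspoints and the result side to 65536, before any consideration of wire length or timing, which is presumably why a naive single-cycle implementation gets called “impractical”.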