The Belt vs Multi-Accumulator
What is the advantage of the belt abstraction over a a multi-accumulator abstraction? In particular, why not treat each functional unit as an independent accumulator machine.
As noted in some of the talks, only 14% of results are used more than once.
The belt ISA (register) abstraction provides a bit-wise efficient encoding, since it eliminates the destination register. However, a multi-accumulator model requires (typically) even less – just one argument per operation, which can source values from other functional units. Basically the compressed instruction is a bitmap of active functional units, and the 2nd (+?) value for each operation is explicitly provided in the variable-length instruction trailer.
This architectural design seems orthogonal to other aspects – memory management, load/store stations, scratchpad, spill mechanism, etc.
Admittedly the belt abstraction provides a neat fn call abstraction. On the other hand, a callee-save instruction could efficiently push accumulators towards the spiller, and a ‘call’ instruction could efficiently marshal parameters from the accumulators.
A natural and maybe pragmatic extension is to provide multiple accumulators per functional pipeline – somewhere between a full register bank and a true multi-accumulator model.
Per the comment in talk #3 about extending the backless memory model to the heap: this might be critical for JVM middleware – typically the stack only contains pointers into the heap, and thread-local eden space is treated effectively like the traditional stack in C(++).
- This reply was modified 5 years, 5 months ago by rolandpj.