The scratchpad has a three-cycle spill-to-fill latency: if you spill a value, you won’t be able to get it back for three cycles. Because of this, the length of the belt is set so that nearly everything lives for three cycles on the belt. So the length of the belt needs to be three times the number of results that can be produced by functional units in one instruction for that family member.
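To make the arithmetic concrete, here is a minimal sketch of that sizing rule. The function name and the per-instruction result counts in the loop are made up for illustration; they are not the figures for any real family member.

```python
SPILL_TO_FILL_LATENCY = 3  # cycles, the figure quoted above


def belt_length(results_per_instruction, latency=SPILL_TO_FILL_LATENCY):
    """Belt positions needed so that a value dropped now is still on the
    belt `latency` cycles later, assuming every instruction drops the
    maximum number of results."""
    return latency * results_per_instruction


# Hypothetical result counts, purely illustrative.
for results in (2, 4, 8):
    print(f"{results} results/instruction -> belt of {belt_length(results)}")
```

The point is only that the belt has to scale with the width of the machine, since a wider member pushes values off the end faster.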
That makes sense, but I can’t imagine that a Tin can only retire three values a cycle. Then again, maybe I just suck at understanding real hardware.
The belt is quite different from the scratchpad.
I’m not sure I understand specifically what you mean by a ‘slower belt’.
If you think of the belt abstraction: you’ve got this conveyor belt that values go onto, and you pull some off, operate on them, and put the result on the belt. The newest results go on the front of the belt and the oldest results fall off the back of the belt. Now, imagine two of these belts. A spill operation moves a value onto the slower belt. It is the only reason the slower belt moves. The fill operation takes a value off the slow belt and puts it back onto the fast belt. The ALU (etc.) operates off the fast belt. Values cycle on that belt quickly: it is fast. The slow belt only changes when we need to rename something as being slow.
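A toy model of the two-belt picture, in case it helps. This is only a sketch of the analogy, not of the hardware: the class and method names are invented, fill is modelled as a copy (since the slow belt is said to move only on a spill), and the three-cycle spill-to-fill latency is ignored.

```python
from collections import deque


class TwoBelts:
    """Toy model of the analogy: position 0 is the front (newest value)."""

    def __init__(self, fast_len, slow_len):
        # A deque with maxlen discards from the far end when full,
        # which is the "oldest value falls off the back" behaviour.
        self.fast = deque(maxlen=fast_len)
        self.slow = deque(maxlen=slow_len)

    def drop(self, value):
        """A functional unit puts a new result on the front of the fast belt."""
        self.fast.appendleft(value)

    def spill(self, pos):
        """Copy the value at fast-belt position `pos` onto the slow belt.
        This is the only thing that advances the slow belt."""
        self.slow.appendleft(self.fast[pos])

    def fill(self, pos):
        """Put the value at slow-belt position `pos` back on the fast belt.
        Modelled as a copy, so the slow belt itself does not move."""
        self.drop(self.slow[pos])


belts = TwoBelts(fast_len=4, slow_len=2)
for v in (10, 20, 30):
    belts.drop(v)
belts.spill(2)                # 10 is the oldest value, at position 2
for v in (40, 50, 60, 70):
    belts.drop(v)             # enough new results to push 10 off the back
print(10 in belts.fast)       # False: gone from the fast belt
belts.fill(0)                 # bring it back from the slow belt
print(list(belts.fast), list(belts.slow))
```

The four later drops never touch the slow belt, so the spilled value survives there however busy the fast belt gets.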
The only thing I see with this is that people will find pathological algorithms which require an insane amount of working set to run.
The size of the available on-chip memory is the same cost/speed trade-off you make when buying DRAM.
Tin has only 128 bytes of scratchpad, and Gold has 512. Why so small? I realize that the scratchpad isn’t expected to be used frequently. Then again, maybe the Tin should have more Scratchpad to make up for its lack of Belt.