This reply addresses LarryP’s questions on configuration.
1) Size of scratch in the different family members:
Sizes in bytes. All numbers are placeholders awaiting tuning with real code.
Scratch is byte-addressable and packed, so it needs less space than a register file, which must devote a whole register to hold anything. Ignoring vector data, we project that spilled data will average about 3 bytes wide, so a Tin scratch can hold about 40 separate operands. We project the peak non-pathological scratch load in open code at perhaps 10 operands, which leaves plenty of extra space to buffer the spiller. In a piped loop, loop-carried values (LCVs) may demand many more than 10 slots, but we expect execution to stay in such a loop for a while. The spiller therefore won’t be active during the loop and won’t need to buffer, so the loop can use all of scratch without causing spiller stalls.
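The capacity reasoning above is simple arithmetic, sketched here under loudly stated assumptions. The 128-byte Tin scratch size and the 8-byte register width are hypothetical values chosen only for illustration (the post gives no concrete sizes, and all its numbers are placeholders); the 3-byte average spill width and the 10-operand open-code peak come from the text. The function names are likewise hypothetical.

```python
# Back-of-envelope scratch capacity estimate following the reasoning above.
# ASSUMPTION: TIN_SCRATCH_BYTES and REGISTER_WIDTH are hypothetical placeholders,
# not published Mill configuration values.
TIN_SCRATCH_BYTES = 128        # hypothetical Tin scratch size, bytes
AVG_SPILL_WIDTH = 3            # projected average width of spilled (non-vector) data, bytes
PEAK_OPEN_CODE_OPERANDS = 10   # projected non-pathological peak load in open code
REGISTER_WIDTH = 8             # hypothetical: a register file spends a whole register per operand


def operand_capacity(scratch_bytes: int, slot_width: int) -> int:
    """Number of operands that fit when each one occupies slot_width bytes."""
    return scratch_bytes // slot_width


def spiller_headroom(scratch_bytes: int, avg_width: int, peak_operands: int) -> int:
    """Bytes left over to buffer the spiller once the peak open-code load is resident."""
    return scratch_bytes - peak_operands * avg_width


# Packed, byte-addressable scratch: operands take only their own width.
packed = operand_capacity(TIN_SCRATCH_BYTES, AVG_SPILL_WIDTH)
# Register-file style: every operand takes a full register regardless of width.
unpacked = operand_capacity(TIN_SCRATCH_BYTES, REGISTER_WIDTH)
headroom = spiller_headroom(TIN_SCRATCH_BYTES, AVG_SPILL_WIDTH, PEAK_OPEN_CODE_OPERANDS)

print(packed, unpacked, headroom)  # prints "42 16 98"
```

With these assumed numbers, packing roughly triples the operand count over register-style allocation, and the open-code peak uses well under a third of scratch, leaving the rest to buffer the spiller.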