I understand that the Mill has a notion of a value in a latch being live, i.e. “on the belt”. I like to call this “hardware live”. There is also a “software live” notion: the compiler knows, for example, that after a sequence of instructions, belt positions b10, b11, b14 and b15 are dead. I understand that if the next instruction contains an inner, the spiller will save these four and a number of other latch values in its buffers, and that the assumption is that this will not put too much pressure on the spiller’s performance (thanks for the extra explanation about the skid buffers).
There is not much information in the talks about what exactly happens on a retn. I assume that the spiller puts all hardware-live values of the frame that will become current back into its buffers. But since the spiller’s buffer on a Gold CPU has room for only 16 values, it seems that two nested loops could easily make the CPU stall while the spiller fetches values back from L2$. Another question is how many belt positions are hardware live and are therefore saved and restored. Based on the statistic that 80% of belt values are used only once, I speculate that the number of hardware-live values may on average be about twice the number of software-live values, which would mean that restoring only the software-live values could significantly reduce the number of values to restore.
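To make the hardware-live vs software-live distinction concrete, here is a toy model. It is not how the Mill or its spiller actually works: it assumes, hypothetically, that the belt simply holds the `belt_len` most recent drops and that each value has a single known last use. The names `Value`, `belt_at` and `live_counts` are made up for illustration. At a call point it counts how many belt values are hardware live (still on the belt, so they would be spilled) versus software live (still needed later).

```python
from dataclasses import dataclass


@dataclass
class Value:
    drop_time: int  # cycle at which the value was dropped onto the belt
    last_use: int   # cycle of the value's last consumer


def belt_at(values, t, belt_len):
    """Toy belt (assumption): the belt_len most recently dropped values at cycle t."""
    dropped = sorted((v for v in values if v.drop_time <= t),
                     key=lambda v: v.drop_time)
    return dropped[-belt_len:]


def live_counts(values, t, belt_len):
    """Return (hardware_live, software_live) for the toy belt at cycle t."""
    on_belt = belt_at(values, t, belt_len)
    hardware_live = len(on_belt)  # everything still on the belt
    software_live = sum(1 for v in on_belt if v.last_use > t)  # still needed after t
    return hardware_live, software_live


# Hypothetical trace: 16 drops, most values consumed on the next cycle,
# every fifth value kept alive until cycle 100.
trace = [Value(i, 100 if i % 5 == 0 else i + 1) for i in range(16)]
print(live_counts(trace, t=15, belt_len=16))  # (16, 4)
print(live_counts(trace, t=15, belt_len=8))   # (8, 2)
```

In this made-up trace all 16 positions are hardware live at cycle 15, but only 4 of them (values 0, 5, 10 and 15) are software live, which is the kind of gap the speculation above is about.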
Simply stalling on a retn while many values are brought back from L2$ into the spiller buffers seems too naive, so I assume that a piece of the spiller puzzle is missing that would explain why performance is okay here. Ivan, can you explain?
- This reply was modified 6 years, 9 months ago by mhkool. Reason: fix grammar