They do have to be held somewhere; that “somewhere” is called the scratchpad. Each function appears to get its own scratchpad, but there is only one physical scratchpad: an in-core SRAM, not memory, reachable only through the “spill” and “fill” ops. The appearance of one scratchpad per frame is provided by the spiller. Mill operation phasing lets a single instruction contain “fill(), fill(), add(), spill()” and so execute scratchpad-to-scratchpad without stalls, if desired. That instruction takes a single cycle, every time, sustained for as long as you like. The equivalent on a general-register RISC that has run out of registers would be “load; load; add; store”, but because it goes through memory the latency is longer, even when there are no cache misses. Machines like the VAX or x86 that have memory-to-memory (M2M) operations suffer the same latency and miss problems, just encoded differently. The Mill avoids all of that.
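To make the "one physical scratchpad, one apparent scratchpad per frame" idea concrete, here is a toy Python model. It is a hypothetical illustration, not the Mill's actual mechanism: the class and method names are made up, each frame is simply given a window of a single array, and frame-relative offsets are mapped onto that window. It does not model phasing (the ops run one after another here rather than in one cycle) or what the real spiller does when the SRAM fills up.

```python
class Scratchpad:
    """Hypothetical toy model: one physical scratchpad shared by all frames.

    Each frame addresses its slots from offset 0; the model maps those
    offsets onto a window of the single physical array. Overflow handling
    by a real spiller is not modeled.
    """

    def __init__(self, size):
        self.physical = [0] * size  # stands in for the single in-core SRAM
        self.frame_bases = [0]      # physical base offset of each live frame
        self.frame_sizes = [0]      # slot count of each live frame

    def enter(self, frame_size):
        # A new frame's window starts just past the caller's window.
        base = self.frame_bases[-1] + self.frame_sizes[-1]
        if base + frame_size > len(self.physical):
            raise MemoryError("toy model: a real spiller would move older frames out")
        self.frame_bases.append(base)
        self.frame_sizes.append(frame_size)

    def leave(self):
        self.frame_bases.pop()
        self.frame_sizes.pop()

    def _addr(self, offset):
        # Translate a frame-relative offset to a physical index.
        if not 0 <= offset < self.frame_sizes[-1]:
            raise IndexError("offset outside the current frame")
        return self.frame_bases[-1] + offset

    def spill(self, offset, value):
        self.physical[self._addr(offset)] = value

    def fill(self, offset):
        return self.physical[self._addr(offset)]


def fill_fill_add_spill(sp, a, b, dst):
    # On a Mill these four ops could sit in one instruction thanks to
    # phasing; this model just performs them in sequence.
    sp.spill(dst, sp.fill(a) + sp.fill(b))


sp = Scratchpad(16)
sp.enter(4)                    # caller frame: physical slots 0..3
sp.spill(0, 2)
sp.spill(1, 3)
fill_fill_add_spill(sp, 0, 1, 2)
sp.enter(4)                    # callee frame: physical slots 4..7
sp.spill(0, 99)                # callee's slot 0, physically slot 4
sp.leave()                     # back in the caller, whose slots are untouched
```

Both frames use offset 0, yet they touch different physical slots, which is the sense in which each function only *appears* to own a scratchpad.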
Reply To: code density Ivan Godard