The expectation is that for “reasonable” nesting depths the spiller will keep spilled operands in its internal SRAM, so spills will never reach the memory hierarchy. The spiller does use high-water marks to anticipate the need to restore values from the hierarchy, but an abrupt run of many function returns can overwhelm that prefetching and produce a stall, just as an abrupt run of calls can overwhelm the spiller’s bandwidth for flushing operands to the hierarchy.
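To make the stall behavior concrete, here is a toy model of a spiller with water marks. It is only a sketch: the capacity, the marks, the bandwidth, the frame size, the low-water mark, and the one-background-step-per-event timing are all numbers and rules I made up for illustration, not properties of any real spiller.

```python
def simulate(events, capacity=32, high=24, low=8, bandwidth=2, frame=4):
    """Toy spiller model (all parameters are hypothetical).

    events: string of 'c' (call) and 'r' (return), assumed balanced.
    Each call saves `frame` operands; each return restores them.
    Assumes 0 < bandwidth and frame <= capacity.
    Returns the number of stall cycles.
    """
    sram = 0      # operands currently held in spiller SRAM
    spilled = 0   # operands flushed out to the memory hierarchy
    stalls = 0
    for ev in events:
        if ev == 'c':
            # A call needs room for a whole frame; if SRAM is too full,
            # stall while flushing at the available bandwidth.
            while sram + frame > capacity:
                moved = min(bandwidth, sram)
                sram -= moved
                spilled += moved
                stalls += 1
            sram += frame
        else:
            # A return needs its frame in SRAM; if it was flushed and not
            # yet prefetched, stall while restoring it.
            while sram < frame and spilled > 0:
                moved = min(bandwidth, spilled)
                spilled -= moved
                sram += moved
                stalls += 1
            if sram >= frame:
                sram -= frame
        # Background work, one step per event: flush above the high mark,
        # prefetch below the low mark.
        if sram > high:
            moved = min(bandwidth, sram - high)
            sram -= moved
            spilled += moved
        elif sram < low and spilled > 0:
            moved = min(bandwidth, low - sram, spilled)
            spilled -= moved
            sram += moved
    return stalls


shallow = simulate("cccrrr" * 10)          # modest nesting, stays in SRAM
burst = simulate("c" * 20 + "r" * 20)      # abrupt deep calls, then returns
print(shallow, burst)
```

In this model the shallow pattern never leaves SRAM and never stalls, while the burst stalls in both directions, because each event moves more operands than the background bandwidth can cover.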
The software does have the ability to discard values from the belt; every branch transfer can do so. I agree that it is tempting to add that capability to the call operation, but we have not done so because of problems with encoding such an operation. The call can specify a belt-full of arguments, and having it also specify a potential belt-full of preserved operands would overwhelm the space available in the encoding.
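A rough back-of-the-envelope version of the encoding problem, assuming each operand is named by a plain belt index of ceil(log2(belt length)) bits. The belt lengths and the one-index-per-operand list format are illustrative assumptions, not the actual instruction encoding.

```python
from math import ceil, log2


def operand_list_bits(belt_length, count):
    """Bits needed to name `count` belt positions, one index per operand.

    Hypothetical format: a flat list of fixed-width belt indices.
    """
    bits_per_index = ceil(log2(belt_length))
    return count * bits_per_index


for belt in (8, 16, 32):  # illustrative belt lengths
    args_only = operand_list_bits(belt, belt)   # a belt-full of arguments
    with_preserved = 2 * args_only              # plus a belt-full of preserved operands
    print(belt, args_only, with_preserved)
```

Under these assumptions a second full operand list doubles the operand bits the call has to carry.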
There are alternative encodings, but these would significantly complicate the decoders. So we are leaving things as they are until we have enough gate-level simulation of large programs to see whether there really is a problem with spiller bandwidth or with the size of the spiller SRAM.