(Sorry for my late reply, I wasn’t aware that the email notification option didn’t include activity on the whole thread.)
I also did think of one other software situation where the carry flag is important: Emulators, especially when the target architecture is of the same register width as the host.
Anyway, in the Lisp situation, a common case is where a function returns multiple values, but the user of that function only bothers with the first (and idiomatically the most important) return value. Since we can freely pass lambdas around, if we as the caller are only interested in 1 return value, it’s unknown whether the function we’re eventually calling will return more or not.
A common example is the hash table accessor ‘gethash’. It returns 2 values; the value looked up (or NIL if not found), and a boolean explicitly stating whether or not it was found. The 2nd return value is needed for disambiguation if NILs are stored as values in the hash table. This second return value is always generated, but ignored in the majority of cases.
The default calling syntax only passes through the first return value, but you can specifically capture the multiple return values if you’re interested in them.
The returned values can each be anything, primitive (immediate register value) or compound/boxed (tagged pointer). Returning multiple values is not the same as returning a list, which is returning one value.
In the x86-64 compiler implementation, there are 4 registers unsaved between calls. If there is only 1 return value, the first of these registers is used, with carry clear. If there are more return values, carry is set, the first 3 registers are used for return values, with the count in the 4th. If there are more than 3 return values, they are spilled to the stack.
The nice thing is that if the caller only cares about 1 value, they just read that first output register, ignoring carry & the others. The calling convention keeps everything tidy so stack spill storage is not lost or trampled.
If the caller wants 2 or 3 return values, they’re immediately available as registers. The count & carry check can be elided when running with “safety level 0” optimizations, or if the type propagation can guarantee the number of expected return values. It’s uncommon, but safe & supported to have more than 3; it just has to go out to memory.
Regarding the Mill, I agree that it looks like the calling convention there would likely be two return values per call (1st value and count), with the >1 return values stored externally.
Looking back at the belt video again, right, it doesn’t look like the scratchpad can be used to pass data across function boundaries. Since Lisp multiple value returns are effectively extra side-band data to use optionally, it would be a bit unfortunate to have it always manage system memory for writing this data that is often ignored.
However, I’m sure many things could be mitigated, like passing the number of expected return values to a function, or having different function entry points for single- or N- valued return. I’ve only dived deep into the optimized assembly output for 1 architecture (x86-64), so my view on what goes on inside might not encompass all the tricks used on other platforms.
Just like the Mill is a complete rethink of what goes on in a CPU, it is a natural conclusion that optimizing compilers targeting the family would require a complete rethink of strategies to best take advantage of it. I’m sure your C compilers reflect this already, and the optimization opportunities are especially wide open to be solved for languages that do complex things well beyond “portable assembly code”.