Return takes the same belt-argument list that call (and a few other operations line conform and rescue) do. You can’t build it up piecemeal. The arguments don’t have to be adjacent on the belt, and the same belt position can be used for more than argument. There’s no “nil” argument to return (or call etc.); you have to put a nil (whatever that is in the app) on the belt and then pass it as many times as you want.
Thread-local is possible. It is not in general possible for a callee to address the caller’s stack without action on the part of the caller, which would be more trouble than TLS.
Sounds like a null function would be in and out before a d$1 latency, arguing again for putting the loads after the call rather than in-flight over the call.
Varargs calls in the belt are a problem; by the time you have figured out how many arguments you have they may well have fallen off the belt. In C only the fixed arguments pass in the belt and the variable part is passed in the stack, but C VARARGS is rare enough that it doesn’t matter. Sounds like every call in Lisp is facing the issue, which would matter.
I think the in-memory solution, while it might be better than other CPUs, is fundamentally inapt for the Mill function model. Needs thought.