Thank you for your detailed analysis.
My assumption was that needing a loaded value early in a function would be a common case, and that a 10-cycle or even a 3-cycle stall would be kind of a big deal on a machine as wide as the Mill.
But yeah, I don’t have data to really back it up.
The same thing seems to happen on the caller side, since you have to drop everything onto the belt in the right order before the call. There it could be mitigated (at some cost) if the hardware didn’t stall until the callee actually accesses the corresponding belt position.
This is all hand-waving at this point though.
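To make the caller-side idea a bit more concrete, here is a toy cycle-count sketch in Python. It is not a model of real Mill hardware: the function name `stall_cycles`, its parameters, and all the cycle numbers are made up for illustration. It only compares "stall at the call until every argument has arrived" against "stall only when the callee first reads an argument that hasn't arrived yet".

```python
def stall_cycles(ready_at, call_cycle, first_use, lazy):
    """Return the number of cycles spent stalled (toy model, hypothetical numbers).

    ready_at:   absolute cycle at which each argument's load completes
    call_cycle: absolute cycle at which the call issues
    first_use:  callee-relative cycle at which each argument is first read
    lazy:       False = stall at the call until every argument has arrived
                True  = stall only when a not-yet-ready argument is read
    """
    if not lazy:
        # Eager: the call waits for the slowest argument.
        return max(0, max(ready_at) - call_cycle)

    stalled = 0
    # Lazy: visit arguments in the order the callee first reads them.
    for ready, use in sorted(zip(ready_at, first_use), key=lambda p: p[1]):
        now = call_cycle + use + stalled  # absolute cycle of this first read
        if ready > now:
            stalled += ready - now        # wait only for the remaining latency
    return stalled


# Two arguments: one ready at cycle 0, one whose load finishes at cycle 10.
# The call issues at cycle 2; the callee reads the slow argument 6 cycles in.
print(stall_cycles([0, 10], 2, [0, 6], lazy=False))  # prints 8
print(stall_cycles([0, 10], 2, [0, 6], lazy=True))   # prints 2
```

In this made-up case the lazy scheme hides 6 of the 8 stall cycles behind callee work. If the callee needs the slow argument right away, both schemes stall for the same 8 cycles, which is the "loaded value needed early" case I was worried about in the first place.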