2) Well, I waded through the paper you cite. Once the obfuscation is removed, their method appears to be to accumulate the active set of permissions at the top frame, adding to that set with each tail call as necessary. This is different from keeping each added permission on the stack at the level of its grant, and then searching down the stack for a relevant permission when needed. That having a mutable current set at the top removes permission-stack overflow is obvious, but it is necessary to wrap the observation in a few lambdas to achieve publication 🙂
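A toy model may make the distinction concrete. This is a hypothetical sketch, not the paper's formalism and not Mill hardware: permissions are strings, each frame is a Python set, and the class and method names are invented for illustration. In the stack-search model a grant lives in the frame that made it, so a tail call cannot drop the caller's frame without losing grants; in the accumulated-set model the top frame already holds everything, so the frame can simply be reused.

```python
# Hypothetical toy model of two ways to track permissions across calls.
# Permissions are strings; each frame is a set. Names are illustrative only.


class StackSearchModel:
    """Each grant stays in the frame that made it; checks search down the stack."""

    def __init__(self):
        self.frames = [set()]  # one grant set per activation frame

    def call(self):
        self.frames.append(set())

    def ret(self):
        self.frames.pop()

    def grant(self, perm):
        self.frames[-1].add(perm)

    def check(self, perm):
        # Walk from the top frame toward the bottom looking for the grant.
        return any(perm in frame for frame in reversed(self.frames))

    def tail_call(self):
        # The callee must still see the caller's grants, so the caller's frame
        # cannot be dropped: a loop of tail calls grows the stack.
        self.call()


class AccumulatedSetModel:
    """The top frame holds the whole active set; checks look only there."""

    def __init__(self):
        self.frames = [set()]  # each frame holds the full active set

    def call(self):
        # A normal call starts from a copy of the caller's active set.
        self.frames.append(set(self.frames[-1]))

    def ret(self):
        self.frames.pop()  # the caller's set is back on top

    def grant(self, perm):
        self.frames[-1].add(perm)

    def check(self, perm):
        return perm in self.frames[-1]  # no stack search

    def tail_call(self):
        # The top set already contains everything the callee needs,
        # so the frame is reused in place and the stack does not grow.
        pass


search, accumulated = StackSearchModel(), AccumulatedSetModel()
for i in range(1000):
    for model in (search, accumulated):
        model.grant(f"p{i}")
        model.tail_call()
print(len(search.frames), len(accumulated.frames))  # 1001 1
```

After 1,000 granting tail calls the stack-search model is 1,001 frames deep while the accumulated-set model is still one frame, and both still find the first grant.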
The approach is not really relevant to the Mill, because it assumes that permissions are granted piecemeal and locally, as in an access control list (ACL) threaded through the stack. Grants are piecemeal on the Mill too, but they are global and heap-like. No stack search is required; instead, the hardware maintains a persistent table accessed through the PLB (Protection Lookaside Buffer). This table does not itself change as a result of a call, and so does not inhibit TCO.
However, on the Mill, permissions are packaged into named turfs as an implementation optimization, and a program has the rights of a single turf at any one time, in addition to any global grants it may have acquired. The current turf is local to a frame, and the hardware admits changing the current turf as a consequence of a call (restoring the prior turf on return). Because turf is a local right, it potentially interferes with TCO: in practice the hardware (the spiller) keeps information in each spiller frame sufficient to restore the caller's turf context. It is this data structure that would inhibit TCO across portal calls, because a tail call that simply discarded the current frame would also discard the record of which turf to return to.
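The split between heap-like global grants and the per-frame turf can be sketched as follows. Everything here is an assumption for illustration (the `Thread` and `SpillerFrame` classes, the turf names, the `turf_rights` map); it is not how the spiller or PLB are actually organized. A normal call pushes the caller's turf into a spiller record; a tail call across a portal has to reuse the top record, keeping its saved turf while switching the current one.

```python
# Hypothetical model of global grants plus a per-frame turf. The class names,
# turf names and rights map are illustrative, not Mill structures.
from dataclasses import dataclass, field


@dataclass
class SpillerFrame:
    saved_turf: str  # the turf to restore when this frame returns


@dataclass
class Thread:
    global_grants: set = field(default_factory=set)  # heap-like; calls never touch it
    turf: str = "app"                                # current turf, local to the frame
    spiller: list = field(default_factory=list)      # one SpillerFrame per live call

    def call(self, callee_turf=None):
        # Every call saves the caller's turf; a portal call also switches turf.
        self.spiller.append(SpillerFrame(saved_turf=self.turf))
        if callee_turf is not None:
            self.turf = callee_turf

    def ret(self):
        self.turf = self.spiller.pop().saved_turf

    def tail_call(self, callee_turf=None):
        # Reuse the top spiller frame: its saved_turf still names the turf to
        # restore when the tail-called function eventually returns, so only
        # the current turf is overwritten.
        if callee_turf is not None:
            self.turf = callee_turf

    def can_access(self, region, turf_rights):
        # Rights are those of the current turf plus any global grants.
        return region in turf_rights.get(self.turf, set()) or region in self.global_grants


t = Thread(global_grants={"shared_buf"})
t.call("B")       # portal call from turf "app" into turf "B"
t.tail_call("C")  # portal tail call from "B" into "C"; no new spiller frame
t.ret()           # returns straight to the original caller
print(t.turf, len(t.spiller))  # app 0
```

In this model the global grants never change across `call`, `tail_call` or `ret`; only the spiller record carries per-call state, which is why it is the piece that matters for TCO.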
To admit TCO across portals, it is necessary to overwrite the current top spiller frame with new information reflecting the new execution context, without overwriting the restoration information. This is trivial to do in the abstract, but hardware is “interesting” in the concrete. For example, what happens if the spiller has half-overwritten the structure when it gets an ECC error on a memory access? These issues are non-trivial; in software they would be addressed by Read-Copy-Update (RCU) methods, but hardware doesn’t do RCU very well.
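The hazard, and the software-style fix, can be modeled in a few lines. This is a hypothetical sketch: `MemoryFault` stands in for the ECC error, the `Frame` fields are invented, and `fault_at` simply injects a fault before a chosen write. Updating in place can leave a frame with the new turf but the old entry point; the copy-then-publish variant (the core idea of RCU, without the grace-period reclamation of the old copy) fills in a private copy and makes it visible with a single store, so a fault leaves the old frame intact.

```python
# Hypothetical model of updating a spiller frame when a fault can strike
# between writes. MemoryFault stands in for an ECC error; fields are invented.


class MemoryFault(Exception):
    """Stands in for an ECC error hit partway through an update."""


class Frame:
    def __init__(self, turf, return_turf, entry):
        self.turf = turf                # current execution context
        self.return_turf = return_turf  # restoration info: must survive the update
        self.entry = entry              # where the tail-called function starts


def update_in_place(frame, turf, entry, fault_at=None):
    """Overwrite fields one at a time; a fault leaves a half-updated frame."""
    for i, (name, value) in enumerate([("turf", turf), ("entry", entry)]):
        if fault_at == i:
            raise MemoryFault(name)
        setattr(frame, name, value)


def update_copy_publish(slot, turf, entry, fault_at=None):
    """RCU-style: fill in a private copy, then publish it with one store."""
    old = slot[0]
    new = Frame(old.turf, old.return_turf, old.entry)
    for i, (name, value) in enumerate([("turf", turf), ("entry", entry)]):
        if fault_at == i:
            raise MemoryFault(name)  # the published frame is still the old one
        setattr(new, name, value)
    slot[0] = new  # single publishing store


frame = Frame("B", "A", "g")
try:
    update_in_place(frame, "C", "h", fault_at=1)
except MemoryFault:
    pass
print(frame.turf, frame.entry)  # C g  (inconsistent)

slot = [Frame("B", "A", "g")]
try:
    update_copy_publish(slot, "C", "h", fault_at=1)
except MemoryFault:
    pass
print(slot[0].turf, slot[0].entry)  # B g  (old frame intact)
```

The single publishing store is what makes the software version safe; it is also the step that assumes an atomic pointer swap plus somewhere to keep the old copy, which is what a hardware spiller would have to provide.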
So at this point the most I can say is that I think I understand the issue; we can TCO anywhere that Java and .NET can; and we might be able to TCO across portals, but I don’t guarantee it.
3) Belt semantics is more than belt frames. For example, nothing stops a program from starting a MULF and then doing a tail call while the operation is still in flight in the multiplier. Yes, the result will be thrown away, but there’s no way to reach inside the multiplier and stop it from coming out. We don’t want that in-flight value dropping onto the called function’s belt, and the frame it should be dropping to is gone, courtesy of TCO.
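One way to see the hazard is to tag each in-flight operation with the ID of the frame that issued it. This is a hypothetical sketch: the `Core` class, the frame IDs, and the rule of dropping results whose frame no longer exists are assumptions for illustration, not Mill hardware behavior. The point it shows is that a tail call must give the callee a fresh frame ID even though it reuses the caller's place on the stack; otherwise the stale MULF result would land on the callee's belt.

```python
# Hypothetical model: frame IDs and the retire rule below are illustrative,
# not a description of actual Mill hardware.
import itertools


class Core:
    def __init__(self):
        self._ids = itertools.count()
        first = next(self._ids)
        self.frames = [first]       # stack of live frame IDs
        self.belts = {first: []}    # belt contents, keyed by frame ID
        self.in_flight = []         # (issuing frame ID, pending result)

    def issue(self, result):
        # E.g. a MULF started now whose result drops some cycles later.
        self.in_flight.append((self.frames[-1], result))

    def tail_call(self):
        # The callee reuses the caller's stack position but gets a fresh ID;
        # the caller's belt is discarded. Reusing the old ID instead would let
        # the caller's pending results land on the callee's belt.
        old = self.frames.pop()
        del self.belts[old]
        new = next(self._ids)
        self.frames.append(new)
        self.belts[new] = []

    def retire_all(self):
        # Drop each pending result onto its issuing frame's belt, if that
        # frame still exists; otherwise the result is discarded.
        for frame_id, result in self.in_flight:
            if frame_id in self.belts:
                self.belts[frame_id].append(result)
        self.in_flight.clear()


core = Core()
core.issue(6.0)    # MULF in flight...
core.tail_call()   # ...when the function tail-calls
core.retire_all()
print(core.belts[core.frames[-1]])  # []
```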
The problem with the belt and TCO is that we don’t want to complicate the engine in a way that everyone pays for, just to provide something of theoretical interest only (TCO over portals) or lacking a significant customer base (TCO in general). I think the spiller frame-ID problem will fall out of the spiller implementation if we are careful, which will give us regular TCO. I’m less confident about portal TCO.
5a) VARARGs etc. will have to be passed on the heap, or copied in such a way that they don’t overwrite themselves. Either way, it’s the province of the compiler; there’s nothing the hardware can do.
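The self-overwrite problem is the one a compiler faces whenever a tail call reuses the caller's argument area. A hypothetical sketch of the fixed-argument case: `slots` models that area and `sources[i]` names the caller slot that feeds callee argument `i`. If `f(a, b, c)` tail-calls `g(c, a, b)`, copying one argument at a time clobbers values that have not been read yet; reading everything first avoids that.

```python
# Hypothetical model of a tail call that reuses the caller's argument area.
# slots[i] is argument slot i; sources[i] is the caller slot feeding callee arg i.


def naive_tail_args(slots, sources):
    """Copy arguments one at a time in place; later reads can see earlier writes."""
    for i, src in enumerate(sources):
        slots[i] = slots[src]
    return slots


def safe_tail_args(slots, sources):
    """Read every outgoing value before writing any slot."""
    values = [slots[src] for src in sources]
    slots[:len(values)] = values
    return slots


# f(a, b, c) tail-calls g(c, a, b): callee arg 0 comes from caller slot 2, etc.
print(naive_tail_args(["a", "b", "c"], [2, 0, 1]))  # ['c', 'c', 'c']
print(safe_tail_args(["a", "b", "c"], [2, 0, 1]))   # ['c', 'a', 'b']
```

Copying the whole list is the simplest correct choice; a compiler would more typically treat this as a parallel move, ordering the copies and breaking each cycle with a single temporary.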