2) My understanding is that .NET has a similar limitation. This limitation may not be strictly necessary; the paper A tail-recursive machine with stack inspection (non-paywalled link) may be relevant here.
3) It’s not strictly necessary to get rid of the execution context immediately. The important thing is that the total context retained across all tail calls is bounded by a (hopefully small) constant. Perhaps it would be simpler to mark a sequence number as a tail call, and then have the spiller throw away the belt at the point where it would otherwise have been spilled? This might not be an ideal implementation, as it would increase the pressure on the on-chip storage, but it should still be quite usable, and would probably be a more than adequate first implementation.
5a) Scheme supports varargs and mandates proper tail calls. Chez Scheme, Ikarus, Racket, Stalin, and/or Larceny might do what you describe, but I don’t know enough about any of these implementations on that particular count. In any case, it’s probably not important.
5b) Yup, even with a callret instruction, there would still likely be some fairly obvious reasons to perform tail recursion optimization on the Mill. Even some Scheme compilers have TRO, though they all also implement proper tail calls. If TRO were all there really was to proper tail calls, proper tail calls wouldn’t be that interesting, nor would they significantly increase the expressive power of the language. But there’s a lot more to proper tail calls than TRO.
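To make the TRO-versus-proper-tail-calls distinction concrete, here is a rough sketch in Python. Python has neither TRO nor proper tail calls, so this emulates bounded-stack tail calls with a trampoline; the names trampoline, is_even, is_odd, and count_down_loop are made up for illustration and have nothing to do with the Mill or any Scheme implementation. The point is only that a self-recursive tail call can be rewritten as a loop inside one function, while mutually recursive tail calls need a general mechanism that works across function boundaries.

```python
def count_down_loop(n):
    # What TRO effectively does to a self-recursive tail call:
    # the recursion becomes a loop within a single function.
    while n > 0:
        n -= 1
    return n


def trampoline(f, *args):
    # Emulates general tail calls: a function "tail calls" by returning
    # a zero-argument thunk, and this driver loop invokes thunks until a
    # non-callable result appears. Stack depth stays constant.
    result = f(*args)
    while callable(result):
        result = result()
    return result


def is_even(n):
    # Tail call into a *different* function; a loop rewrite confined
    # to is_even alone cannot express this.
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))


if __name__ == "__main__":
    # Far deeper than Python's default recursion limit, yet it does not
    # overflow, because each step returns to the trampoline first.
    print(trampoline(is_even, 100000))
```

With proper tail calls in the language (as Scheme mandates), the trampoline and the thunks disappear: is_even and is_odd simply call each other, and the same goes for state machines and continuation-passing code in general.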