Things like Fibonacci are simple enough that the compilers recognize tail recursion and turn them into loops on any machine – try it and see. What causes trouble is two-way recursion, such as depth-first search in binary trees; tail conversion gets rid of one of the calls, but the second remains. Research compilers replace the recursion with a parallel state stack, but I don’t know of any commercial compilers that do, and it’s not clear that the state stack is any better than the regular stack.
Whether the Mill is better or worse than other architectures depends on the amount of state saved in a recursive call and what the save bandwidth is. On a genReg machine the state will be at least the PC and frame registers plus any live data state. The same is true on a Mill, plus any additional belt positions that are zombie; the Mill situation is equivalent to a callee-saves protocol in a genreg machine.
The number of zombies depends on the amount of work between the calls. The Mill call instruction marks all belt positions not occupied by arguments as invalid, so they won’t zombie in an immediately following call. The same occurs with branches. As other operations execute the invalids get pushed along the belt and eventually fall off, so the belt becomes populated with operands that might be zombie. Hence zombie spill increases with distance from the last control-flow transfer, either branch or the entry of the called function.
While it is easy to invent cases in which the whole belt is zombie, in practice recursive calls usually seem to be quite close to other control flow and the actual zombie count is ignorable.