I found Mr. Godard’s post about the spiller in another thread quite informative.
My impression is that when the spiller’s SRAM is full and a call executes, the spiller must spill to the L2 cache either enough SRAM to save the caller’s entire state (is “frame” the correct term in this context?) or enough of the caller’s frame to make room for the callee’s frame. So the worst-case latency for a call should depend on the largest permitted frame size and the time to write the corresponding number of cache lines to L2. Since the belt contents are a key part of a function’s state, a Mill with a long belt may have a longer worst-case call latency than a model with a shorter belt.
Of course, if the (first cache line of the) callee’s code isn’t already in cache, there will also be a time hit to bring that in. For an interrupt or exception that occurs too rarely for its handler to stay in cache, the latency to bring in the first cache line of the handler may dominate worst-case interrupt latency.
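
To make the estimate concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it (belt entry size, cache line size, cycles per line written to L2, code-miss penalty) is a made-up placeholder, not a Mill specification, and the model assumes the whole frame is spilled serially with no overlap, which may well be pessimistic.

```python
def worst_case_call_latency(max_frame_bytes, line_bytes,
                            cycles_per_line_write, code_miss_cycles=0):
    """Crude upper bound: spill the largest frame to L2, then fetch code.

    All parameters are hypothetical; none come from published Mill figures.
    """
    # Number of cache lines needed to hold the frame, rounded up.
    lines_to_spill = (max_frame_bytes + line_bytes - 1) // line_bytes
    # Assume line writes are fully serialized (no overlap with execution).
    spill_cycles = lines_to_spill * cycles_per_line_write
    # Optional extra cost if the callee's first code line misses in cache.
    return spill_cycles + code_miss_cycles


# Illustrative comparison of a short belt and a long belt, assuming
# 16 bytes per belt position, 64-byte lines, 4 cycles per line write.
BYTES_PER_BELT_ENTRY = 16   # assumption
LINE_BYTES = 64             # assumption
CYCLES_PER_LINE = 4         # assumption

short_belt = worst_case_call_latency(8 * BYTES_PER_BELT_ENTRY,
                                     LINE_BYTES, CYCLES_PER_LINE)
long_belt = worst_case_call_latency(32 * BYTES_PER_BELT_ENTRY,
                                    LINE_BYTES, CYCLES_PER_LINE)
# Same long belt, plus an assumed 100-cycle miss on the handler's code.
cold_handler = worst_case_call_latency(32 * BYTES_PER_BELT_ENTRY,
                                       LINE_BYTES, CYCLES_PER_LINE,
                                       code_miss_cycles=100)

print(short_belt, long_belt, cold_handler)
```

With these placeholder values the spill cost scales with belt length, but a single code miss is larger than either spill, which is the point about cold handlers dominating the worst case.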