The call op is in the flow block, on the flow side of the decoders. That block is parsed in D0, so whether there are any calls is known at the start of the D1 cycle, leaving all of D1 and D2 to get organized. When there are cascaded calls, the hardware connects the return of the first to the entry of the second, and so on; there is no cycle in between. You can think of it as hardware tail-recursion removal. It's not hard on a Mill because there cannot be any hazard between the two calls. On a genreg machine you'd have to check whether the first function did something nasty to the caller's registers or stack, and even without that you would still have inter-call renaming to deal with, something I don't want to think about.
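A toy timing model can make the "no cycle in between" point concrete. It is only a sketch, and its assumptions are hypothetical: the callee lengths and the one-cycle `gap` charged in the non-cascaded case are made-up numbers, not Mill timings, and `callee_start_cycles` is an illustrative name, not anything in the Mill toolchain. All it shows is when each callee would begin if each return feeds straight into the next callee's entry, compared with bouncing back through the caller between calls.

```python
from typing import List


def callee_start_cycles(callee_lengths: List[int], cascaded: bool, gap: int = 1) -> List[int]:
    """Return the cycle at which each callee begins, counting from the first call.

    callee_lengths: cycles each callee runs before returning (hypothetical values).
    cascaded: if True, a callee's return feeds directly into the next callee's entry.
    gap: hypothetical cycles lost between calls when they are NOT cascaded
         (return to the caller, then issue the next call).
    """
    starts = []
    cycle = 0
    for length in callee_lengths:
        starts.append(cycle)
        cycle += length      # callee runs until it returns
        if not cascaded:
            cycle += gap     # bubble spent back in the caller before the next call
    return starts


if __name__ == "__main__":
    lengths = [5, 3, 4]  # made-up callee lengths, for illustration only
    print(callee_start_cycles(lengths, cascaded=True))   # [0, 5, 8]
    print(callee_start_cycles(lengths, cascaded=False))  # [0, 6, 10]
```

With cascading, each callee starts on the very cycle the previous one returns. The model deliberately ignores the hazard question; it simply assumes, as the paragraph says holds on a Mill, that nothing has to be checked between the calls.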
Art claims that he can cascade the trailing call with a branch or return too, so long as the instruction does not have any pick ops. I’m not sure I believe him.