Yes, pick takes no cycle at all, and neither does conform, because both live in the belt-mapping crossbar, which is logically at the cycle boundary. In the hardware, it actually takes a cycle or more to set up a conform. The timing differs depending on whether the belt is CAM (logically) or mapping (physically) addressed. For physical addressing, the hardware tracks issues and drops a cycle ahead of when they happen, so it actually remaps the belt naming during what is the opPhase cycle in order to do a conform. However, that mapping takes effect only for ops in the following cycle, so we can think of it as happening at the cycle boundary, even though the hardware guys think of it as being in the X0 clock.
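To make the "remap now, visible next cycle" timing concrete, here is a toy model of a physically addressed belt. Everything in it is an assumption for illustration: `ToyBelt`, `conform`, and `tick` are hypothetical names, and this is not how Mill hardware is actually built. In the sketch, conform only rewrites the logical-to-physical mapping; the new naming is staged during the cycle and committed at the boundary, so ops in the same cycle still see the old names.

```python
from typing import List, Optional


class ToyBelt:
    """Hypothetical toy belt: logical positions map to physical slots.
    Not a model of real Mill hardware, only of the timing idea."""

    def __init__(self, values: List[int]) -> None:
        self.slots = list(values)                 # physical storage; never moved
        self.mapping = list(range(len(values)))   # logical position -> physical slot
        self.pending: Optional[List[int]] = None  # remap staged during this cycle

    def read(self, pos: int) -> int:
        # Ops read through the mapping committed at the last cycle boundary.
        return self.slots[self.mapping[pos]]

    def conform(self, order: List[int]) -> None:
        # Stage a new naming: new logical position i refers to what was
        # old logical position order[i]. Only the mapping changes.
        self.pending = [self.mapping[p] for p in order]

    def tick(self) -> None:
        # Cycle boundary: the staged remap becomes visible to next-cycle ops.
        if self.pending is not None:
            self.mapping = self.pending
            self.pending = None
```

With `belt = ToyBelt([10, 20, 30])`, calling `belt.conform([2, 0])` leaves `belt.read(0)` at 10 for the rest of the cycle; after `belt.tick()` it returns 30 and `belt.read(1)` returns 10. In this model no values move in `slots`; only the naming changes.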
Transfer phase doesn’t really have a hardware equivalent; it is an artifact of the way the sim is implemented, as it steps through the state machine that does the phasing. The Mill runs at least three cycles ahead down the predicted path, so we have already transferred long before we actually execute the branch, possibly several times in fact. So transfer phase is really the point at which we recognize a mispredict and stall. A modern machine cannot stop on a dime when a signal can take several cycles to cross the core, even at the speed of light. Stall is one of the most difficult parts of hardware design.
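A rough sketch of the run-ahead idea: the front end follows predicted targets so that several instructions are already in flight, and the check against the real target happens only when the branch reaches what the sim calls transfer phase. This is a generic branch-prediction model, not the Mill simulator; `LOOKAHEAD`, `execute`, and the flush-and-restart recovery are assumptions, and the model ignores how many cycles the stall itself costs.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

LOOKAHEAD = 3  # assumed run-ahead depth ("at least three cycles ahead")


def execute(actual: Dict[int, int], predicted: Dict[int, int],
            start: int, count: int) -> Tuple[List[int], int]:
    """Retire `count` instructions. Returns the retired addresses and how many
    run-ahead instructions were squashed at mispredicts. Hypothetical model."""
    in_flight: Deque[int] = deque([start])
    retired: List[int] = []
    squashed = 0
    while len(retired) < count:
        # Front end: keep following predictions until LOOKAHEAD are in flight.
        while len(in_flight) < LOOKAHEAD:
            in_flight.append(predicted[in_flight[-1]])
        addr = in_flight.popleft()
        retired.append(addr)
        # "Transfer phase": the real target is known only now; check the guess.
        if in_flight[0] != actual[addr]:
            squashed += len(in_flight)            # work done down the wrong path
            in_flight = deque([actual[addr]])     # restart at the real target
    return retired, squashed
```

With `actual = {0: 1, 1: 2, 2: 0}` and a predictor that guesses fall-through (`{0: 1, 1: 2, 2: 3, 3: 4}`), `execute(actual, predicted, 0, 4)` retires `[0, 1, 2, 0]` and squashes two run-ahead instructions when the branch at address 2 turns out to go back to 0.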
Call phase doesn’t take a clock if there are no calls; if there are, it in effect preempts the clock that writer phase would have used.
Unused phases just don’t issue anything from the instruction. A sequence of instructions with nothing but compute-phase ops will execute one instruction per cycle, but there will be no reader- or writer-phase ops overlapped with them, because the instructions don’t contain any ops of those kinds. Adjacent instructions run the same way whether or not they have anything to do in a particular phase. There’s no cost to an unused phase, just a lost opportunity.
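A small scheduling sketch can show why an unused phase costs nothing. It assumes, hypothetically, only three phases (reader, compute, writer) and that instruction i issues its phase-k ops in cycle i + k; the text also mentions call, pick, conform, and transfer, which are left out here, and the real cycle offsets are not specified. The op names are placeholders. An instruction with no ops in a phase simply contributes nothing to that cycle, and its neighbors are scheduled exactly as before.

```python
from typing import Dict, List, Tuple

# Assumed, simplified phase order; a phase's index is its cycle offset.
PHASES = ("reader", "compute", "writer")


def schedule(instructions: List[Dict[str, List[str]]]) -> Dict[int, List[Tuple[int, str, str]]]:
    """Map cycle -> list of (instruction index, phase, op) issued in that cycle.
    Instruction i issues its ops for PHASES[k] in cycle i + k (hypothetical)."""
    cycles: Dict[int, List[Tuple[int, str, str]]] = {}
    for i, instr in enumerate(instructions):
        for offset, phase in enumerate(PHASES):
            # An empty or missing phase issues nothing but shifts nothing either.
            for op in instr.get(phase, []):
                cycles.setdefault(i + offset, []).append((i, phase, op))
    return cycles
```

For three instructions that each hold a single compute-phase op, the schedule has one op in each of cycles 1, 2, and 3, with nothing overlapped. Giving each instruction reader- and writer-phase ops as well would fill those same cycles with work from neighboring instructions (in cycle 2, for example, instruction 2’s reader op, instruction 1’s compute op, and instruction 0’s writer op).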