There isn’t any conform op any more; the functionality was moved to the branch ops, so a taken branch can rearrange the belt to match what the destination expects. However, the same question applies, and the answer is the same: the rearranging takes zero cycles and has essentially no cost. Nothing is actually moved; all that happens is that the mapping from belt number (static, as seen by decode) to operand location (dynamic) is shuffled.
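As a rough illustration of why the rearranging is cheap, here is a toy Python sketch. It is not how the hardware is built; the function name `rearrange` and the location strings are made up for the example. The point is only that the new belt is built from names of locations, and no operand data is touched.

```python
def rearrange(mapping, wanted):
    """Return a new belt mapping.

    mapping[i] is the (hypothetical) location currently named b<i>.
    wanted[i] is the old belt number that should become b<i> at the destination.
    Only location names are shuffled; no operand value is copied.
    """
    return [mapping[old] for old in wanted]


# Belt before the branch: b0 lives in alu0's output, b1 in mul0's, and so on.
before = ["alu0.out", "mul0.out", "load1.out", "alu1.out"]

# The destination wants old b2 as b0, old b0 as b1, old b3 as b2.
after = rearrange(before, [2, 0, 3])
```

Here `after` is `["load1.out", "alu0.out", "alu1.out"]`, and `before` is untouched, which mirrors the idea that the operands themselves stay where they were.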
There’s no real conflict/race. The “drop to the belt” doesn’t involve any real movement. The result stays in the producer’s output latches until the space is needed for something else (when a real move will happen). “Drop” just enters the new value’s physical location into the mapping from belt number to location. It tells the rest of the machine “when you want b3 it’s over there”. You can have multiple drops from different sources; each goes into its own output location, and each gets its own mapping. Yes, there are necessarily multiple updates to that mapping happening, but the values being updated are small belt numbers (3 bits on an 8-position belt), not 128-bit data, and the hardware can handle it.
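If it helps, here is a toy model of “drop” in Python. Everything in it (the `ToyBelt` class, the `latches` dictionary, the location names) is invented for illustration and is an assumption, not a description of the real hardware. It only shows the bookkeeping: the value is written once into the producer’s own output location, and the belt just records where it is.

```python
from collections import deque


class ToyBelt:
    """Toy model: the belt is a list of locations, not a store of data."""

    def __init__(self, length=8):
        # mapping[i] is the location currently named b<i>; the oldest
        # entry falls off the far end when the belt is full.
        self.mapping = deque(maxlen=length)
        # Hypothetical output latches: location name -> value.
        self.latches = {}

    def drop(self, location, value):
        # The producer leaves its result in its own output latch ...
        self.latches[location] = value
        # ... and the belt only records where the newest operand lives.
        self.mapping.appendleft(location)

    def read(self, belt_number):
        # "When you want b3, it's over there."
        return self.latches[self.mapping[belt_number]]


belt = ToyBelt(length=4)
belt.drop("alu0.out", 10)
belt.drop("mul0.out", 20)
belt.drop("alu1.out", 30)
newest = belt.read(0)   # 30, the most recent drop
oldest = belt.read(2)   # 10, the first drop

# Two more drops push the first operand off the 4-entry belt.
belt.drop("ld0.out", 40)
belt.drop("ld1.out", 50)
```

After the last two drops the mapping no longer mentions `alu0.out`, yet the value is still sitting in its latch in this toy. The model leaves out the real move that happens when a latch is needed again.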
The belt has no byte size; it’s a logical list of locations. Its size is the number of locations it tracks; current configs exist with 8, 16, and 32 positions. It doesn’t have to be saved, because each operand self-identifies with a timestamp, which is its index (taken modulo a fixed range) in an infinite sequence of drops; the most recent entries of that sequence are the current belt.
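A small sketch of the timestamp idea, again in toy Python. The constants `BELT_LENGTH` and `TAG_SPACE` and both function names are assumptions picked for the example; the real tag width is not stated here. The sketch only shows that, given the tag of the newest drop, the tag behind any belt number can be computed rather than stored.

```python
BELT_LENGTH = 8   # one of the belt sizes mentioned above
TAG_SPACE = 32    # hypothetical modulus; must be at least BELT_LENGTH


def tag_for_drop(drop_count):
    """Tag of the drop_count-th drop (0-based) in the endless drop sequence."""
    return drop_count % TAG_SPACE


def tag_of_belt_position(newest_tag, belt_number):
    """Tag of the operand currently named b<belt_number>."""
    if not 0 <= belt_number < BELT_LENGTH:
        raise ValueError("belt number out of range")
    return (newest_tag - belt_number) % TAG_SPACE


# After 34 drops (numbered 0..33) the newest tag has wrapped around to 1.
newest_tag = tag_for_drop(33)
b0_tag = tag_of_belt_position(newest_tag, 0)   # 1
b3_tag = tag_of_belt_position(newest_tag, 3)   # 30, i.e. drop number 30
```

So b3 resolves to whichever operand carries tag 30, wherever it happens to be sitting, and there is no separate belt structure that would need saving.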
It’s a bit much to get your head around, I know 🙂
We talk about the belt as if it were a shift register, and about drops as if they were copies, but that’s all “as if” metaphor.