Actually, it’s simpler even than Will says; I wish I could have gone into the mechanics in more detail, but the talks have time constraints.
It comes down the the fact that drops are always at the front of the belt, not at some slot index position as if the belt were an array rather than the FIFO that it is. Consequently the hardware must use a canonical ordering of drops so the software can ask for the values in the right place when it wants them later. That ordering is latency order: if a multiply (latency 3) and an add (latency 1) drop in the same cycle, the add will be at a higher position number (say b1) than the multiply (say b0).
The Nones that are dropped by retire are stand-ins for values that are in flight and haven’t dropped yet, and the reason they are in flight is because they are the result of a long-latency operation that hasn’t reached its steady-state yet. Consequently, the latency-order of drop says that the ones that haven’t dropped yet would have been at low belt numbers if they were in steady state. Consequently all the Nones from retire, no matter how many there are, will be in a block at the front of the belt, followed by any real retires, followed by the rest of the pre-drop belt.
Getting that to work without a special op to say where things should go is what makes Dave’s Device so clever IMO.
Will is right that there are two forms of retire. Recall that Mill operands have metadata tags telling the operand’s width and scalarity, which guide the functional units when the operands later become arguments to ops. Nones, including those dropped by retire, need width tags too. There is one form of retire that gives the desired widths explicitly. There is also another form that gives only a drop count, as shown in the talk, and that form drops Nones with a tag with an Undef width.
When an Undef width is later eaten by an op, the width is defaulted to a width that makes sense to that op, following defined rules, and the op’s result (always a None of course) carries the width that would have resulted from that default. However, for some ops the defaulting rules can result in a wrong width, different from what the real non-None would have done, that can screw up the belt for other operations. The scheduler in the specializer knows when that happens, because it knows what the desired width is, and when it detects that the default would be wrong it uses the explicit-width form of retire.
That would have been another ten slides and an explanation of width tagging, and there wasn’t time for it.