Often what makes sense from a software point of view makes no sense from the hardware point of view, and vice versa; the Mill design reflects years of work balancing those forces. Conceptually one could indeed squeeze out unused belt positions here, and thereby cut belt pressure and the need for longer belts and/or more scratchpad traffic.
We have in fact considered self-deleting belt operands, and even more bizarre ideas than that 🙂 Here the problem is hardware: removing the holes requires rather complex circuitry; the complexity increases super-linearly with longer belts; and it has to be done in the D2 logical-to-physical belt mapping stage, which is already clock-critical. The Mill now simply increments all the logical numbers by the number of new drops. The cost of that grows roughly as the log of the length of the belt (fanning out a signal costs more with increasing destinations, even though the increment itself is constant), and we expect that cost will be the major limiter for larger Mill members than Gold.
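To make the contrast concrete, here is a toy software model, not a description of the actual D2 circuitry. The slot names, the dict representation, `BELT_LENGTH`, and both function names are made up for illustration. It shows that a drop applies the same increment to every position, while hole removal needs a different, data-dependent adjustment for each position.

```python
BELT_LENGTH = 8  # illustrative only; not tied to any real Mill member


def drop(logical, new_slots):
    """Uniform renumbering: every live operand moves back by len(new_slots).

    `logical` maps a hypothetical physical slot name to its belt position
    (0 is the front of the belt). Operands pushed past the end fall off.
    Assumption: the first name in `new_slots` lands at position 0.
    """
    n_new = len(new_slots)
    result = {}
    for slot, pos in logical.items():
        if pos + n_new < BELT_LENGTH:
            result[slot] = pos + n_new  # same constant for everyone
    for i, slot in enumerate(new_slots):
        result[slot] = i
    return result


def compress(logical, dead_slots):
    """Hole removal: each survivor moves forward by the number of dead
    operands ahead of it, so the adjustment differs per position."""
    dead_positions = [logical[s] for s in dead_slots]
    result = {}
    for slot, pos in logical.items():
        if slot in dead_slots:
            continue
        holes_ahead = sum(1 for d in dead_positions if d < pos)
        result[slot] = pos - holes_ahead  # per-position prefix count
    return result


belt = {"p0": 0, "p1": 1, "p2": 2, "p3": 3}
after_drop = drop(belt, ["p4", "p5"])
after_compress = compress(after_drop, {"p0", "p2"})
```

In `drop` the adjustment is one number broadcast to every entry. In `compress` each entry needs its own count of the holes ahead of it, which is the part that would have to be worked out in hardware within the mapping stage.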
One might think that the Mill already has logic to reorder the belt (for conform and call), and that logic could be used to remove the holes. However, those ops contain the new mapping to use as a literal, precomputed by the compiler, which can be simply dropped in place in the D2 mapper; the hardware doesn’t have to figure anything out, whereas a hole compressor would have to compute the mapping itself.
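In the same toy spirit, a conform-style reorder might be sketched as below. The representation and the `conform` function are hypothetical and say nothing about the real encoding. The point is only that the new order arrives as a literal, so applying it involves no decisions.

```python
def conform(logical, new_order):
    """Apply a compiler-supplied reordering.

    `logical` maps a hypothetical physical slot name to its belt position.
    `new_order[i]` is the old belt position of the operand that should end
    up at position i; operands not named in `new_order` are discarded.
    """
    by_position = {pos: slot for slot, pos in logical.items()}
    # The mapping is simply installed; nothing is computed from belt state.
    return {by_position[old]: new for new, old in enumerate(new_order)}


belt = {"p0": 0, "p1": 1, "p2": 2, "p3": 3}
reordered = conform(belt, [3, 1])  # the literal comes from the compiler
```

Here the compiler has already decided which operands survive and where they go. A hole compressor would have to derive the equivalent of `new_order` at run time from which operands happen to be dead.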
So your idea is sound from a program-model view, but runs afoul of circuit realities. I’m a software guy, and a great many of my ideas have hit the wastebasket for the same reason. It’s only fair though; I have shot down my share of the ideas the hardware guys have come up with 🙂