If so, is there an advantage to issuing instructions late as opposed to issuing them as soon as their arguments are available? In this particular case one could, at least naïvely, think that a larger part of the logic could have been clock gated on cycles 3 and 2, possibly leading to power savings, if ‘sub’ and ‘mul’ had been issued together.
The TL;DR answer: there is no power difference as long as the number of belt value spills/fills remains the same.
As far as power savings due to clock gating are concerned, there are two major factors:
- The power consumed in performing the operation itself (the add, sub, mul, etc.)
- The power consumed in maintaining operands on the belt
The power consumed by the operation itself is independent of when the operation is performed. It is what it is, the same every time for a particular set of input values. When a particular functional unit is not performing an operation, its clock is gated off, and its power consumption is then only that due to static leakage current.
The power consumed in maintaining an operand on the belt is nearly constant and depends greatly upon the number of new result values arriving each clock times the number of potential destinations for each result.
The conclusion is that the biggest factor in reducing power is the number of belt spills/fills that must be performed. The lower the number of spills/fills, the lower the power.
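
To make the argument concrete, here is a toy energy model in Python. It is only a sketch: the energy constants, the two schedules, and the function name `schedule_energy` are all made up for illustration and are not taken from any Mill documentation or measurement. Belt maintenance and static leakage are treated as fixed background costs and left out, so only the per-operation energy and the spill/fill count vary.

```python
# Toy model; every constant below is a hypothetical, unitless illustration.
OP_ENERGY = {"add": 1.0, "sub": 1.0, "mul": 3.0}  # assumed dynamic energy per operation
SPILL_FILL_ENERGY = 5.0  # assumed cost of one belt spill or fill


def schedule_energy(schedule, spills_fills):
    """Total dynamic energy for a schedule.

    schedule: dict mapping cycle number -> list of op names issued that cycle.
    spills_fills: number of belt spills/fills the schedule requires.
    """
    # Each op costs the same no matter which cycle it issues in;
    # an idle (clock-gated) unit contributes no dynamic energy.
    op_energy = sum(OP_ENERGY[op] for ops in schedule.values() for op in ops)
    return op_energy + SPILL_FILL_ENERGY * spills_fills


# Two hypothetical placements of the same three operations.
early = {1: ["sub"], 2: ["mul"], 3: ["add"]}   # issue as soon as possible
late = {1: [], 2: ["sub", "mul"], 3: ["add"]}  # issue 'sub' and 'mul' together

print(schedule_energy(early, 0))  # same ops, no spills
print(schedule_energy(late, 0))   # same ops, no spills: identical total
print(schedule_energy(late, 1))   # one extra spill/fill raises the total
```

In this model, moving an operation to a different cycle leaves the total unchanged; only a change in the spill/fill count moves it.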