In general all ops are fully pipelined, so each slot can issue one op per cycle. Consecutive ops in a pipeline can be the same or different; it doesn’t matter. Ops do not overtake one another in the pipeline, but they do differ in latency, so a shorter-latency op can retire before a longer-latency op that was issued earlier.
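To make the retire ordering concrete, here is a minimal Python sketch. The op names and latencies are made-up values for illustration only, not taken from any Mill specification, and `retire_order` is a hypothetical helper.

```python
# Hypothetical latencies in cycles; real values depend on the
# member and the operand width.
LATENCY = {"mul": 4, "add": 2}

def retire_order(issued):
    """issued: list of (issue_cycle, op_name) pairs for one pipeline.

    Returns (retire_cycle, issue_cycle, op_name) tuples sorted by
    retire cycle, assuming an op retires at issue_cycle + latency.
    """
    retired = [(cycle + LATENCY[op], cycle, op) for cycle, op in issued]
    return sorted(retired)

# A multiply issued in cycle 0, followed by an add issued in cycle 1.
order = retire_order([(0, "mul"), (1, "add")])
```

Under these assumed numbers the add, issued one cycle later, retires in cycle 3, while the multiply retires in cycle 4.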
There are a few cases of intra-pipeline hazards that code gen must be aware of and statically avoid. An example is the FMA (fused multiply-add, sometimes called MAC for multiply-accumulate). If, for example, a regular multiply takes four cycles and a regular add takes two for some operand width, it is possible to do an FMA in five, not six, and with better precision than with separate operations. Hence, as the intermediate result is coming out of the multiplier in cycle 4, it needs to be fed directly into the adder. However, the code could be issuing an add operation in cycle 4 that is also going to try to use the adder, and they can’t both use it.
As a result, while the FPU is pipelined and you can issue one FMA per cycle, or one add per cycle, with no problem, a particular combination of FMA and add causes a pipeline hazard. The compiler should avoid this, and the hardware will throw a fault if it detects a hazard violation.
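The kind of static check a code generator might run can be sketched in Python as follows. This is only an illustration under stated assumptions, not how the Mill tool chain actually does it: the latencies are the example numbers from this post, cycles are counted from zero, and the FMA is assumed to enter the shared adder `FMA_LATENCY - ADD_LATENCY` cycles after issue (that is, in its fourth cycle). The names `adder_entry_cycle` and `find_hazards` are hypothetical.

```python
# Illustrative numbers only, matching the example in the post.
ADD_LATENCY = 2
FMA_LATENCY = 5
# Assumption: an FMA uses the shared adder for its last ADD_LATENCY
# cycles, so it enters the adder this many cycles after issue.
FMA_ADDER_OFFSET = FMA_LATENCY - ADD_LATENCY  # 3

def adder_entry_cycle(issue_cycle, op):
    """Cycle in which the op claims the first stage of the shared adder."""
    if op == "add":
        return issue_cycle
    if op == "fma":
        return issue_cycle + FMA_ADDER_OFFSET
    raise ValueError(f"unknown op: {op}")

def find_hazards(schedule):
    """schedule: list of (issue_cycle, op) pairs for one pipeline.

    Returns (cycle, ops) pairs for every cycle in which more than
    one op would enter the adder.
    """
    claims = {}
    for issue_cycle, op in schedule:
        cycle = adder_entry_cycle(issue_cycle, op)
        claims.setdefault(cycle, []).append((issue_cycle, op))
    return [(c, ops) for c, ops in sorted(claims.items()) if len(ops) > 1]

# Back-to-back FMAs, or back-to-back adds, never collide.
no_conflict = find_hazards([(0, "fma"), (1, "fma"), (2, "fma")])
# An add issued three cycles after an FMA wants the adder in the same cycle.
conflict = find_hazards([(0, "fma"), (3, "add")])
```

In this model a scheduler would move the add to a different cycle or a different slot whenever `find_hazards` returns a non-empty list. For a member whose FMA has its own adder, the check would simply be skipped.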
Some implementations of FMA have all their own hardware and don’t share the regular adder. These have no hazard, but do have more costly hardware; it’s all in the specification.