From Mill Computing Wiki

This overview page briefly explains several aspects of Mill code execution. Most of these topics were covered in a talk (see the presentations listed at the end of this page), and a few more are added here.

All aspects of execution on the Mill are geared toward improving data flow and control flow behavior in the hardware.

Wide Issue

Almost all of the Mill's methods for getting good performance at low energy cost depend on it being a very wide issue machine. It has many more functional units than conventional architectures, and the design revolves around feeding them with instructions and data. Each instruction can contain 30 or more operations, although only the high-end Mill processors have this extreme width.

How widely a specific Mill processor can issue operations is determined by the number of Slots. Each slot has its own set of operations that it supports.
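As a rough illustration of slots with differing operation sets, the following sketch checks whether a group of operations fits into one instruction. The `Slot` class, the `assign_to_slots` function, and the operation names are hypothetical and chosen only for this example; they do not describe any real Mill member or its encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One issue slot and the operation names it supports (hypothetical model)."""
    name: str
    supported: frozenset


def assign_to_slots(ops, slots, used=frozenset()):
    """Place every operation in a distinct slot that supports it.

    Uses simple backtracking, so a placement is found whenever one exists.
    Returns a dict mapping slot name -> operation, or None if the
    operations cannot all issue in a single instruction.
    """
    if not ops:
        return {}
    first, rest = ops[0], ops[1:]
    for slot in slots:
        if slot.name not in used and first in slot.supported:
            tail = assign_to_slots(rest, slots, used | {slot.name})
            if tail is not None:
                return {slot.name: first, **tail}
    return None


# A made-up two-slot machine: only s0 can load, both slots can add.
slots = [
    Slot("s0", frozenset({"add", "load"})),
    Slot("s1", frozenset({"add"})),
]
placement = assign_to_slots(["add", "load"], slots)
```

Here `placement` puts `load` in `s0` and `add` in `s1`, and a third operation would not fit because the issue width of this toy machine is two.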


Phasing enables data flow connections within one instruction, even across branch boundaries. The decode and execution of the different kinds of operations in an instruction are tiered: each kind belongs to a phase, and the phases are staggered over successive cycles, so the results of an earlier phase can feed a later phase.


All code is statically scheduled to maximize functional unit utilization on each cycle and, as a corollary, to have no stalls or bubbles in the pipeline. This is particularly useful in loops, since the compiler has a lot to work with to unroll them and ultimately execute them in parallel to a large degree, with little or no penalty in code size.
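The idea of static scheduling can be sketched with a toy list scheduler that packs operations into wide instructions at compile time. This is only an illustration under strong assumptions: every operation has a latency of one cycle, any slot can run any operation, and phasing is ignored. The function name `schedule` and the input format are invented for this example and are not the Mill toolchain's algorithm.

```python
def schedule(deps, width):
    """Pack operations into instructions of at most `width` operations.

    deps maps each operation name to the set of operations it depends on.
    An operation may issue once all its dependencies have issued in an
    earlier instruction (unit latency assumed). Returns a list of
    instructions, each a list of operation names.
    """
    remaining = dict(deps)  # insertion order is kept, which makes the result deterministic
    done = set()
    instructions = []
    while remaining:
        ready = [op for op, needs in remaining.items() if needs <= done]
        if not ready:
            raise ValueError("dependency cycle, nothing can issue")
        issued = ready[:width]
        instructions.append(issued)
        done.update(issued)
        for op in issued:
            del remaining[op]
    return instructions


# c needs a and b, d needs c, e is independent.
deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}, "e": set()}
narrow = schedule(deps, width=2)  # [['a', 'b'], ['c', 'e'], ['d']]
wide = schedule(deps, width=4)    # [['a', 'b', 'e'], ['c'], ['d']]
```

Both schedules take three instructions because the chain a/b, c, d limits them; extra width only pays off when the code offers independent operations, which is why loops are such a good target.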


Speculation is another aspect of static scheduling that increases parallel execution, that is, instruction-level parallelism (ILP).


Gangs combine Slots and their inputs to form more complex operations. They enable data flow not only between different phases of one instruction, but also between operations within one phase of one instruction.

Multi-Branch, or the First Winner Rule

One instruction can contain several branches. They all execute at the same time to check many different conditions, but only one of them can be taken. Although the branches execute in parallel, they are evaluated in a defined order: left to right in instruction encoding order, which is issue order. The first conditional branch in that order whose condition is true is the one taken.

Consequently, the first successful conditional branch operation in an instruction, and therefore also the first in an EBB (extended basic block), is taken.

This is called the First Winner Rule.

Another consequence is that an EBB can contain at most one unconditional branch, and it must be the last operation of the last instruction.
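The selection step of the First Winner Rule can be modeled in a few lines. In this sketch the conditions are assumed to be already computed (standing in for the parallel execution of the branches), and only the ordered choice is shown. The function `first_winner`, the tuple layout, and the label names are made up for illustration and are not Mill operations.

```python
def first_winner(branches, fallthrough=None):
    """Return the target of the first branch, in issue order, whose condition holds.

    branches is a list of (condition, target) pairs in instruction
    encoding order. An unconditional branch is modeled as a pair whose
    condition is True. If no branch is taken, `fallthrough` is returned.
    """
    for condition, target in branches:
        if condition:
            return target
    return fallthrough


# Two conditions are true, but the earlier one in issue order wins.
taken = first_winner([(False, "L1"), (True, "L2"), (True, "L3")])  # "L2"

# A trailing unconditional branch is reached only if nothing before it wins.
default = first_winner([(False, "L1"), (False, "L2"), (True, "exit")])  # "exit"
```

Because an unconditional branch always wins once it is reached, any branch placed after it could never be taken, which matches the observation that it must come last.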


Presentation on Execution by Ivan Godard - Slides
Presentation on Pipelining by Ivan Godard - Slides
Presentation on Metadata and Speculation by Ivan Godard - Slides