From Mill Computing Wiki
Revision as of 00:58, 7 January 2015 by Jan

This overview page briefly explains several aspects of Mill code execution. Most of these topics were covered in this talk, but a few go beyond it.

All aspects of execution on the Mill are geared toward improving data flow and control behavior in the hardware.

Wide Issue

Almost every technique the Mill uses to get good performance at low energy cost depends on its being a very wide issue machine. It has many more functional units than conventional architectures, and the design revolves around keeping them fed with instructions and data. A single instruction can contain 30 or more operations, although only the high-end Mill processors reach this extreme width.

How widely a specific Mill processor can issue operations is determined by its number of Slots. Slots are not all created equal: each slot supports its own set of operations.
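
The relationship between slots and operations can be illustrated with a minimal sketch. It is only a toy model, not Mill's actual encoding: the `Slot` class, the `can_issue` helper, the slot names and the operation names are all hypothetical, and it assumes that operation i of an instruction is bound to slot i.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """A hypothetical issue slot with the set of operations it supports."""
    name: str
    supported_ops: frozenset


def can_issue(slots, instruction):
    """Return True if every operation fits the slot at the same position.

    `instruction` is a list of operation names, one per slot position;
    None marks an empty slot. Assumption: operation i goes to slot i.
    """
    if len(instruction) > len(slots):
        return False  # wider than this processor's issue width
    return all(
        op is None or op in slot.supported_ops
        for slot, op in zip(slots, instruction)
    )


# Illustrative three-slot machine; real Mill members have different slot sets.
SLOTS = [
    Slot("slot0", frozenset({"add", "mul"})),
    Slot("slot1", frozenset({"add", "load"})),
    Slot("slot2", frozenset({"branch"})),
]

ok = can_issue(SLOTS, ["mul", "load", "branch"])
wrong_slot = can_issue(SLOTS, ["load", "add", "branch"])
too_wide = can_issue(SLOTS, ["add", "add", "branch", "add"])
```

In this model the same three operations are accepted or rejected depending on which slot each one lands in, which is the sense in which slots are not interchangeable.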


Phasing enables data flow connections within a single instruction, even across branch boundaries. The execution (and decode) of the different kinds of operations within an instruction is tiered and chained, with each phase shifted over successive cycles.
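
A rough way to picture the phase shift is a schedule in which each phase of an instruction runs one cycle after the previous phase. The sketch below is a simplified model under stated assumptions: the phase names, the one-cycle offset per phase and the `phase_schedule` helper are illustrative and not taken from this page.

```python
def phase_schedule(num_instructions, phases):
    """Map each cycle to the (instruction, phase) pairs active in it.

    Assumptions: instruction i issues at cycle i, and each later phase
    of the same instruction runs exactly one cycle after the previous one.
    """
    schedule = {}
    for instr in range(num_instructions):
        for offset, phase in enumerate(phases):
            schedule.setdefault(instr + offset, []).append((instr, phase))
    return schedule


# Hypothetical phase names, used only for illustration.
PHASES = ["reader", "compute", "writer"]
schedule = phase_schedule(2, PHASES)

for cycle in sorted(schedule):
    print(cycle, schedule[cycle])
```

In this model a later phase of one instruction shares a cycle with an earlier phase of the next instruction, so a value produced in an early phase can be consumed by a later phase of the same instruction.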


All code is statically scheduled to maximize functional unit utilization on each cycle and, as a corollary, to leave no stalls or bubbles in the pipeline. This is particularly useful in loops, where the compiler has a lot to work with: it can unroll them and ultimately execute them in parallel to a large degree, with little or no penalty in code size.


Speculation is another aspect of static scheduling that increases parallel execution, that is, instruction-level parallelism (ILP).


Gangs combine Slots and their inputs to form more complex operations. They facilitate data flow not only between different phases of one instruction, but also between operations within one phase of one instruction.

Multi-Branch, or the First Winner Rule

One instruction can contain several branches. They are all executed at the same time, so many different conditions can be checked at once, but only one of them can be the one taken. While branches execute in parallel, they are evaluated in a defined order: left to right in instruction encoding order, which is issue order. The first conditional branch in that order whose condition is true is the one taken.

In other words, the first successful conditional branch operation in an instruction, and consequently also the first in an EBB, is taken.

This is called the First Winner Rule.

Another consequence is that an EBB can only ever contain one unconditional branch, as the last operation in its last instruction.
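
The selection logic can be sketched in a few lines. This is a minimal software model, not the hardware mechanism: the `first_winner` function and the representation of a branch as a `(condition, target)` pair are assumptions made for illustration, with `None` as the condition standing for an unconditional branch.

```python
def first_winner(branches):
    """Pick the branch target according to the First Winner Rule.

    `branches` is a list of (condition, target) pairs in issue order.
    A condition of None marks an unconditional branch (an assumption
    of this sketch). Returns None if no branch is taken.
    """
    # Model parallel execution: every condition is resolved up front ...
    resolved = [(cond is None or bool(cond), target) for cond, target in branches]
    # ... and the winner is the first true one in issue order.
    for is_true, target in resolved:
        if is_true:
            return target
    return None  # no branch taken, fall through


# Two conditions are true, but the earlier one in issue order wins.
winner = first_winner([(False, "a"), (True, "b"), (True, "c")])

# An unconditional branch in last position catches everything else.
fallback = first_winner([(False, "a"), (False, "b"), (None, "exit")])
```

Because an unconditional branch always wins over anything placed after it, any later branch would be unreachable, which matches the rule that it must be the last operation.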


Presentation on Execution by Ivan Godard - Slides
Presentation on Pipelining by Ivan Godard - Slides
Presentation on Metadata and Speculation by Ivan Godard - Slides