Difference between revisions of "Execution"
Revision as of 18:49, 17 August 2014
This overview page briefly explains several aspects of Mill code execution. Most of these topics were covered in a talk (see Media), but a few go beyond it.
All aspects of execution on the Mill are geared toward improving data flow and control behavior in the hardware.
Wide Issue
Almost all of the ways the Mill gets good performance at low energy cost depend on it being a very wide issue machine. It has many more functional units than conventional architectures, and the design revolves around feeding them with instructions and data. Each instruction can contain 30 or more operations, although only the high-end Mill processors have this extreme width.
The issue width of a specific Mill processor is determined by its number of Slots. Slots are not all alike: each slot supports its own set of operations.
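The slot constraint can be pictured with a small toy model. This is only a sketch: the slot contents and the `assign_slots` helper are hypothetical and do not describe any real Mill configuration or tool. It shows that a group of operations fits into one instruction only if each operation can be given its own slot that supports it.

```python
# Toy model: each slot supports its own set of operations.
# The slot contents are hypothetical, not a real Mill configuration.
SLOTS = [
    {"add", "mul"},   # slot 0
    {"add", "load"},  # slot 1
    {"branch"},       # slot 2
]

def assign_slots(ops, slots=SLOTS, used=frozenset()):
    """Return one slot index per operation, or None if the
    operations cannot all be issued in a single instruction."""
    if not ops:
        return []
    first, rest = ops[0], ops[1:]
    for index, supported in enumerate(slots):
        if index not in used and first in supported:
            # Tentatively place `first` here and try to place the rest.
            tail = assign_slots(rest, slots, used | {index})
            if tail is not None:
                return [index] + tail
    return None  # no free slot supports `first`

print(assign_slots(["add", "mul"]))
print(assign_slots(["mul", "mul"]))
```

In this model `["add", "mul"]` fits because the add can move to slot 1 and leave slot 0 free for the multiply, while two multiplies do not fit because only one slot supports `mul`.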
Phasing
Phasing lets data flow between operations within one instruction, even across branch boundaries. The different kinds of operations in an instruction are decoded and executed in tiers, or phases, that are staggered over successive cycles and chained together.
Pipelining
All code is statically scheduled to maximize functional unit utilization on each cycle and, as a corollary, to leave no stalls or bubbles in the pipeline. This is particularly useful in loops, where the compiler has a lot to work with: it can unroll them and ultimately execute them in parallel to a large degree, with little or no penalty in code size.
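The overlap of loop iterations can be illustrated with a generic software-pipelining sketch. It assumes a hypothetical loop body split into three one-cycle stages; the stage names and the `pipeline_schedule` helper are invented for illustration and are not the Mill's actual scheduling mechanism.

```python
def pipeline_schedule(stages, iterations):
    """For each cycle, list the (stage, iteration) pairs that run.

    Stage number s of iteration i is placed at cycle i + s, so
    different iterations overlap once the pipeline has filled.
    """
    cycles = []
    for cycle in range(iterations + len(stages) - 1):
        active = [
            (stage, cycle - s)
            for s, stage in enumerate(stages)
            if 0 <= cycle - s < iterations
        ]
        cycles.append(active)
    return cycles

# Hypothetical three-stage loop body, four iterations.
for cycle, active in enumerate(pipeline_schedule(["load", "mul", "store"], 4)):
    print(cycle, active)
```

In the steady state (cycles 2 and 3 here) all three stages are busy on every cycle, each working on a different iteration, which is the full-utilization effect described above.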
Speculation
Speculation is another aspect of static scheduling that increases parallel execution, that is, instruction-level parallelism (ILP).
Gangs
Gangs combine Slots and their inputs to form more complex operations. They allow data to flow not only between different phases of one instruction, but also between operations within a single phase of one instruction.
Multi-Branch
One instruction can contain several branches. They all execute at the same time, checking many different conditions at once, but only one of them can be taken. Although the branches execute in parallel, they are evaluated in a defined order: left to right in instruction encoding order, which is issue order. The first conditional branch in that order whose condition is true is the one taken.
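The selection rule can be sketched in a few lines. This is a minimal model, assuming each branch is a (condition result, target) pair listed in issue order; the `taken_branch` helper and the fall-through default are hypothetical names introduced for illustration.

```python
def taken_branch(branches, fallthrough="fallthrough"):
    """Pick the target of the first branch whose condition is true.

    `branches` is a list of (condition_result, target) pairs in issue
    order. All conditions are already computed, mirroring parallel
    execution; only the selection is ordered.
    """
    for condition_true, target in branches:
        if condition_true:
            return target
    return fallthrough  # assumption: no branch taken, continue in sequence

# Two conditions are true, but the leftmost true one wins.
print(taken_branch([(False, "A"), (True, "B"), (True, "C")]))
```

Here both the second and third conditions hold, yet only target `B` is chosen because it comes first in issue order.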
Media
Presentation on Execution by Ivan Godard - Slides
Presentation on Pipelining by Ivan Godard - Slides
Presentation on Metadata and Speculation by Ivan Godard - Slides