Latest revision as of 07:12, 12 January 2015
This is an overview page briefly explaining several aspects of Mill code execution. These topics are mainly covered in the Execution talk listed under Media, with a few drawn from other talks.
All aspects of execution on the Mill are geared at improving data flow and control behavior in the hardware.
Wide Issue
The prerequisite for almost all of the Mill's methods for achieving good performance at low energy cost is that it is a very wide-issue machine. It has many functional units compared to conventional architectures, and much of the design revolves around feeding them with instructions and data. Each instruction can contain 30 or more operations, although only the high-end Mill processors have this extreme width.
How widely a specific Mill processor can issue operations is determined by its number of Slots. Each slot supports its own set of operations.
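As an illustrative sketch of the slot idea (the slot layout and operation names here are made up; the real assignment differs per Mill family member), an instruction can be modeled as a bundle of operations, one per slot, where each slot only accepts operations it supports:

```python
# Hypothetical model of a wide-issue instruction: one operation per slot.
# The per-slot capability sets below are invented for illustration only.
SLOT_SUPPORTS = [
    {"add", "mul", "shift"},   # slot 0: arithmetic
    {"add", "sub"},            # slot 1: simple ALU
    {"load", "store"},         # slot 2: memory
    {"branch"},                # slot 3: control flow
]

def issue(instruction):
    """Check that each operation fits its slot, then 'issue' the bundle."""
    if len(instruction) > len(SLOT_SUPPORTS):
        raise ValueError("instruction wider than the machine")
    for slot, op in enumerate(instruction):
        if op is not None and op not in SLOT_SUPPORTS[slot]:
            raise ValueError(f"slot {slot} cannot execute {op!r}")
    return [op for op in instruction if op is not None]

# All four operations issue together in the same cycle:
print(issue(["mul", "sub", "load", "branch"]))
# ['mul', 'sub', 'load', 'branch']
```

A wider Mill member would simply have a longer `SLOT_SUPPORTS` list, letting more operations issue per cycle.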
Phasing
Phasing enables data flow within one instruction, even across branch boundaries. The execution (and decode) of the different kinds of operations within an instruction is tiered and chained, phase-shifted over consecutive cycles.
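A minimal sketch of the idea (the phase names and count here are simplified and hypothetical; the real Mill has more phases than this): operations of one instruction execute phase by phase, so a later phase can consume results produced by an earlier phase of the same instruction:

```python
# Simplified phase model: an instruction's operations are grouped by phase,
# and phases run in a fixed order, so results flow forward within one
# instruction. Phase names are illustrative, not the exact Mill phase set.
PHASE_ORDER = ["reader", "compute", "writer"]

def execute(instruction, operands):
    """instruction: {phase: [function(operands) -> list of new values]}"""
    for phase in PHASE_ORDER:
        produced = []
        for op in instruction.get(phase, []):
            produced.extend(op(operands))
        operands = produced + operands  # earlier-phase results become visible
    return operands

# The reader phase produces a constant; the compute phase of the SAME
# instruction consumes and doubles it one cycle later:
result = execute({
    "reader":  [lambda vals: [21]],
    "compute": [lambda vals: [vals[0] * 2]],
}, [])
print(result[0])  # 42
```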
Pipelining
All code is statically scheduled to maximize functional unit utilization on each cycle and, as a corollary, to have no stalls or bubbles in the pipeline. This is particularly useful in loops, since the compiler has a lot to work with to unroll them and ultimately execute them largely in parallel, with little or no penalty in code size.
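A rough illustration of why static scheduling pays off in loops (a scheduling sketch, not Mill code): with enough functional units, successive iterations can be overlapped so that after a short prologue, one iteration completes every cycle:

```python
# Software-pipelining sketch: a 3-stage loop body (load, compute, store)
# overlapped across iterations. With one unit per stage, the steady state
# retires one iteration per cycle instead of one per three cycles.
STAGES = ["load", "compute", "store"]

def schedule(n_iters):
    """Return, per cycle, the list of (stage, iteration) pairs in flight."""
    cycles = []
    for cycle in range(n_iters + len(STAGES) - 1):
        work = []
        for s, stage in enumerate(STAGES):
            it = cycle - s
            if 0 <= it < n_iters:
                work.append((stage, it))
        cycles.append(work)
    return cycles

for c, work in enumerate(schedule(4)):
    print(c, work)
# From cycle 2 onward (until the epilogue), all three stages are busy
# every cycle: no bubbles.
```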
Speculation
Speculation is another aspect of static scheduling used to increase parallel execution, i.e. instruction-level parallelism (ILP).
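As a sketch of the idea (simplified, and not the Mill's actual speculation machinery), both arms of a conditional can be computed before the condition is known, with the result selected afterwards; the discarded work occupies otherwise-idle units while removing the control dependence from the critical path:

```python
# Speculative-evaluation sketch: compute both candidate results eagerly,
# then select once the condition resolves. On a wide machine the two
# computations can run side by side in the same cycle.
def speculative_select(cond, a, b):
    then_val = a * 2       # executed regardless of cond
    else_val = b + 100     # executed regardless of cond
    return then_val if cond else else_val  # select after both are ready

print(speculative_select(True, 21, 0))    # 42
print(speculative_select(False, 0, -58))  # 42
```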
Gangs
Gangs combine Slots and their inputs to form more complex operations and to facilitate not only data flow between different phases of one instruction, but also data flow between operations within a single phase of one instruction.
Multi-Branch, or the First Winner Rule
One instruction can contain several branches. They are all executed at the same time, checking many different conditions, although at most one of them can be taken. While execution of the branches is parallel, their evaluation follows a defined order: left to right in instruction encoding order, which is issue order. The first conditional branch whose condition is true in that order is the one taken.
The first successful conditional branch operation in an instruction, and consequently also the first within an EBB, is the one taken.
This is called the First Winner Rule.
Another consequence is that there can only ever be one unconditional branch in an EBB, as the last operation in its last instruction.
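The rule above can be sketched as follows (hypothetical target names; only the left-to-right selection order comes from the description above):

```python
# First Winner Rule sketch: the conditions of all branch operations in one
# instruction may be evaluated in parallel, but the taken target is decided
# by encoding (issue) order: the first true condition wins.
def first_winner(branches, fall_through):
    """branches: list of (condition, target) in instruction encoding order."""
    for cond, target in branches:   # left-to-right = issue order
        if cond:
            return target           # first true condition is taken
    return fall_through             # no branch taken

print(first_winner([(False, "case_a"), (True, "case_b"), (True, "case_c")],
                   "next_instruction"))
# case_b  (case_c is also true, but loses by the First Winner Rule)
```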
Media
Presentation on Execution by Ivan Godard - Slides
Presentation on Pipelining by Ivan Godard - Slides
Presentation on Metadata and Speculation by Ivan Godard - Slides