Execution


This is an overview page briefly explaining several aspects of Mill code execution. Most of these topics are covered in the Execution talk listed under Media below; a few go beyond it.

All aspects of execution on the Mill are geared toward improving data flow and control behavior in the hardware.

Wide Issue

The prerequisite for almost all of the Mill's methods for getting good performance at low energy cost is that it is a very wide issue machine. It has many functional units compared to conventional architectures, and everything revolves around feeding them with instructions and data. Each instruction can contain up to 30 or more operations, although only the high-end Mill processors have this extreme width.

How widely a specific Mill processor can issue operations is determined by the number of Slots. Each slot has its own set of operations that it supports.
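
As a rough illustration in plain C (not Mill assembly; the variables and the slot assignments in the comments are invented for this sketch), the four statements below are mutually independent, so a sufficiently wide machine could in principle place each of them in its own slot and issue all of them as one instruction:

 #include <stdio.h>
 
 int main(void) {
     int x = 3, y = 4, p = 5, q = 6;
     int arr[4] = {10, 20, 30, 40};
 
     /* Four independent operations: conceptually, each could occupy its
        own slot of one wide instruction and issue in the same cycle. */
     int a = x + y;      /* addition       */
     int b = p * q;      /* multiplication */
     int c = arr[2];     /* load           */
     int d = (x < y);    /* comparison     */
 
     printf("%d %d %d %d\n", a, b, c, d);
     return 0;
 }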

Phasing

Phasing enables data flow connections within one instruction, even across branch boundaries. The execution (and decoding) of the different kinds of operations within an instruction is tiered, chained in a phase shift across cycles.
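
The toy C sketch below only conveys the chaining idea; the phase labels in the comments are assumptions of this sketch, not the Mill's actual phase definitions. Because the phases of one instruction execute in successive steps, a later phase can consume a value produced by an earlier phase of the same instruction:

 #include <stdio.h>
 
 int main(void) {
     /* One conceptual wide instruction, modeled as three tiered steps. */
     int produced = 42;              /* earlier phase: produce an operand */
     int computed = produced * 2;    /* middle phase: consume and compute */
     printf("%d\n", computed);       /* later phase: hand the result on   */
     return 0;
 }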

Pipelining

All code is statically scheduled to maximize functional unit utilization in each cycle and, as a corollary, to leave no stalls or bubbles in the pipeline. This is particularly useful in loops, since the compiler has a lot to work with to unroll them and ultimately execute them largely in parallel with little or no penalty in code size.
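
As a generic, hand-written illustration of the idea in plain C (not Mill code, and the two-stage split is an assumption of this sketch), a simple loop can be rearranged so that, in the steady state, the load for iteration i+1 overlaps the computation and store for iteration i:

 #include <stdio.h>
 
 #define N 8
 
 int main(void) {
     int a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
     int b[N];
 
     /* Plain loop: load, compute, store, one iteration after another. */
     for (int i = 0; i < N; i++)
         b[i] = a[i] * 2;
 
     /* Software-pipelined form of the same loop: while iteration i is
        being computed and stored, the load for iteration i + 1 already
        runs, so the two stages overlap in the steady state. */
     int loaded = a[0];                  /* prologue: first load         */
     for (int i = 0; i < N - 1; i++) {
         int next = a[i + 1];            /* stage 1 of iteration i + 1   */
         b[i] = loaded * 2;              /* stage 2 of iteration i       */
         loaded = next;
     }
     b[N - 1] = loaded * 2;              /* epilogue: last compute/store */
 
     for (int i = 0; i < N; i++)
         printf("%d ", b[i]);
     printf("\n");
     return 0;
 }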

Speculation

Speculation is another aspect of static scheduling used to increase parallel execution, i.e. instruction level parallelism (ILP).
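
A familiar generic example of the idea, in plain C rather than Mill code (the function names are made up for this sketch): both arms of a branch are computed speculatively, and the condition only selects the result, so the functional units stay busy instead of waiting for control flow to resolve:

 #include <stdio.h>
 
 /* Branchy version: control flow decides which expression runs. */
 static int pick_branchy(int x) {
     if (x > 0)
         return x * 3 + 1;
     else
         return x * x - 1;
 }
 
 /* Speculative version: both expressions are computed unconditionally,
    and the condition only selects between the two finished results. */
 static int pick_speculative(int x) {
     int pos = x * 3 + 1;               /* computed speculatively */
     int neg = x * x - 1;               /* computed speculatively */
     return (x > 0) ? pos : neg;        /* branch-free selection  */
 }
 
 int main(void) {
     printf("%d %d\n", pick_branchy(-4), pick_speculative(-4));
     return 0;
 }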

Gangs

Gangs combine Slots and their inputs to form more complex operations and to facilitate not only data flow between different phases of one instruction, but also data flow between operations within a single phase of one instruction.
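
As an everyday stand-in for the idea, in plain C (the use of fma() here is an analogy chosen for this sketch, not a statement about which operations the Mill gangs): a multiply and an add, which would otherwise be two separate two-input operations, are combined into one three-input operation:

 #include <stdio.h>
 #include <math.h>
 
 int main(void) {
     double a = 2.0, b = 3.0, c = 4.0;
 
     double separate = a * b + c;     /* two operations, chained          */
     double combined = fma(a, b, c);  /* one fused, three-input operation */
 
     printf("%f %f\n", separate, combined);
     return 0;
 }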

Multi-Branch, or the First Winner Rule

One instruction can contain several branches. They all execute at the same time, checking many different conditions, but only one of them can be taken. While the branches execute in parallel, they are evaluated in a defined order: left to right in instruction encoding order, which is issue order. The first conditional branch whose condition is true in that order is the one taken.

The first successful conditional branch operation in an instruction, and consequently the first in an EBB, is the one taken.

This is called the First Winner Rule.

Another consequence is that there can only ever be one unconditional branch in an EBB, as the last operation of the last instruction.
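
A minimal simulation of this rule in plain C (the data structure and names are invented for this sketch): all conditions may already have been evaluated in parallel, but the branch taken is the first one in encoding order whose condition is true, with fall-through if none is:

 #include <stdio.h>
 #include <stdbool.h>
 
 typedef struct {
     bool condition;        /* already-evaluated predicate  */
     const char *target;    /* label of the branch's target */
 } Branch;
 
 /* First Winner Rule: scan the branches of one instruction in encoding
    (issue) order and take the first whose condition is true. */
 static const char *first_winner(const Branch *br, int count,
                                 const char *fall_through) {
     for (int i = 0; i < count; i++)
         if (br[i].condition)
             return br[i].target;
     return fall_through;
 }
 
 int main(void) {
     Branch br[] = {
         { false, "case_a" },
         { true,  "case_b" },  /* first true condition: the winner       */
         { true,  "case_c" },  /* also true, but later in encoding order */
     };
     printf("taken: %s\n", first_winner(br, 3, "next_instruction"));
     return 0;
 }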

Media

Presentation on Execution by Ivan Godard - Slides
Presentation on Pipelining by Ivan Godard - Slides
Presentation on Metadata and Speculation by Ivan Godard - Slides