Speculation preemptively does computing work that you are not yet sure you will need, so that by the time you are sure, it is already done.
== [[Metadata#None_and_NaR|None and NaR]] ==
The problem is that the work you do prematurely can clobber the actual work you are doing, and it gets in the way wherever there is shared state. So the more you avoid shared state, the more you can do in parallel without getting in each other's way.

The Mill already does a lot in this regard by having <abbr title="Static Single Assignment">SSA</abbr> semantics on the [[Belt]]. This works well for proper data values. Conventional architectures, however, tend to keep error and condition codes as global shared state. [[Metadata]] comes to the rescue here, in particular None, <abbr title="Not a Result">NaR</abbr>, and the floating point status flags.
== Speculable and Realizing Operations ==
By far most of the operations in the Mill instruction set can be speculated. This means that if an operand to the operation is None or NaR, all the operation does is make the result None or NaR, and in the case of floating point operations combine the [[Metadata#IEEE_754_Floating_Point_Flags|status flags]] into the result. This is even true for the [[Instruction_Set/load|load]] operation.
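As a rough illustration, here is a toy C model of how a speculable operation treats these tags. The Operand struct and its field names are invented for this sketch — real hardware carries the tags as metadata alongside the operand, not in a software struct — and the precedence between NaR and None shown here is a simplification.

<syntaxhighlight lang="c">
#include <stdint.h>

/* Toy software model of a belt operand: a payload plus a metadata tag.
   The type and field names are invented for this sketch only. */
typedef enum { KIND_VALUE, KIND_NONE, KIND_NAR } Kind;

typedef struct {
    Kind     kind;     /* valid data, None, or NaR             */
    uint64_t payload;  /* the data bits, if kind == KIND_VALUE */
} Operand;

/* A speculable operation such as add never faults: if an input carries a
   None or NaR tag, the tag simply propagates into the result. */
static Operand spec_add(Operand a, Operand b) {
    if (a.kind == KIND_NAR || b.kind == KIND_NAR)
        return (Operand){ KIND_NAR, 0 };   /* NaR given priority here (simplified) */
    if (a.kind == KIND_NONE || b.kind == KIND_NONE)
        return (Operand){ KIND_NONE, 0 };  /* None propagates silently             */
    return (Operand){ KIND_VALUE, a.payload + b.payload };
}
</syntaxhighlight>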
Only when values are put into shared system state does it become relevant whether they are valid or not. This is when those values become realized. There are comparatively few operations that realize values, in particular [[Instruction_Set/load|load]], [[Instruction_Set/store|store]] and branches. They all are in the [[Phasing#Writer_Phase|writer phase]], or rather, in the phases following the [[Phasing#Compute_Phase|compute phase]].
When a realizing operation encounters a None, it does nothing.<br />
A load from a None address produces a None.<br />
A store with a None value or address doesn't write anything.<br />
A branch or call to an address that is a None doesn't happen.
When a realizing operation encounters a NaR, it faults, and eventually a fault handler matching the NaR is called.<br />
The exception is load: a load from a NaR address just produces the NaR, like a speculable operation would. The main reason load isn't classified as speculable, even though loads can be issued speculatively, is that a load can cause stalls and the like, which have to be accounted for.<br />
A store with a NaR in any value or address operand doesn't write anything, but raises the appropriate [[Fault]].<br />
Same for branches or calls to a NaR address.
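Continuing the toy model above, the realization rules for a store could be sketched like this; raise_fault is only a stand-in for the hardware fault machinery, not an actual operation.

<syntaxhighlight lang="c">
/* Continuation of the toy model above: a realizing store.
   raise_fault() is a hypothetical stand-in for the hardware fault
   machinery; on real hardware the NaR payload selects the fault kind. */
extern void raise_fault(Operand nar);

static void realize_store(Operand addr, Operand value) {
    if (addr.kind == KIND_NAR || value.kind == KIND_NAR) {
        /* A NaR reaching a realizing operation turns into a fault. */
        raise_fault(addr.kind == KIND_NAR ? addr : value);
        return;
    }
    if (addr.kind == KIND_NONE || value.kind == KIND_NONE)
        return;  /* A None reaching a realizing operation silently does nothing. */
    *(uint64_t *)(uintptr_t)addr.payload = value.payload;  /* the ordinary store */
}
</syntaxhighlight>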
== [[Instruction_Set/pick|Pick]] ==
The pick operation is a special beast. It has the semantics of the C ?: operator, takes three operands, and has zero latency. All this is only possible, and cheaply possible, because it doesn't actually need a functional unit: it is implemented in the renaming of [[Belt]] locations at the cycle boundary. And it is speculable, in contrast to true branches. With those attributes it can replace a lot of conditional branches, and it tends to be the operation that picks which of all the speculatively computed values are passed on to be realized.
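In source terms, the kind of transformation pick allows looks roughly like this; the ?: in the second version stands in for the pick operation, and the negation is safe to compute on both paths.

<syntaxhighlight lang="c">
/* Branchy source form: the compiler would have to emit a conditional branch. */
int abs_branchy(int x) {
    if (x < 0)
        return -x;
    return x;
}

/* Form suited to the Mill: both candidate values exist, the negation is
   computed speculatively, and a pick (written as ?: here) selects the
   result at the cycle boundary with zero latency. */
int abs_picked(int x) {
    int neg = -x;               /* computed speculatively, always safe    */
    return (x < 0) ? neg : x;   /* pick chooses which value is passed on  */
}
</syntaxhighlight>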
  
 
== Rationale ==
Speculation is one of the few areas where the Mill favours higher energy consumption, because the performance gains are so great. Unneeded computation still costs energy, but since the Mill architecture is very wide issue and has a lot of functional units, it can exploit a lot of instruction level parallelism, which saves time. And on modern fab processes with high-leakage transistors, an idle circuit doesn't use much less energy than a busy one, so computing several branches in parallel, when you have the width, actually saves energy compared to computing them in sequence with lots of idle units.

In general purpose code the problem usually is finding that much <abbr title="Instruction Level Parallelism">ILP</abbr>, because there are so many branches. Most branches exist only to avoid unwanted side effects under certain circumstances.
  
 
The Mill has devised a few ways to avoid the unwanted side effects, which means far fewer of the branches in a program are hard barriers to <abbr title="Instruction Level Parallelism">ILP</abbr>. [[Phasing]] is one of the ways. Software [[Pipelining]] of loops also makes extensive use of the <abbr title="Not a Result">NaR</abbr> and the None [[Metadata]] tags for this purpose.
  
Speculation increases <abbr title="Instruction Level Parallelism">ILP</abbr> across branch boundaries independently of loops. <abbr title="Not a Result">NaR</abbr>, None and [[Instruction_Set/pick|pick]] enable if-conversion on a massive scale on the Mill, removing branches from the code altogether by using (meta)data flow instead of control flow. And even without [[Instruction_Set/pick|pick]], [[Execution#Multi-Branch|parallel branches]] or [[Condition Code|condition codes]] greatly increase the ILP here, too.
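Reusing the toy Operand model from the sketches above, if-conversion of a guarding branch could look roughly like this at the source level. spec_load is a hypothetical speculable load that turns a would-be fault into a NaR instead; the real transformation happens in the compiler on Mill operations, not in C source.

<syntaxhighlight lang="c">
/* spec_load models a speculable load: a bad address yields a NaR-tagged
   operand instead of a fault. Hypothetical, for illustration only. */
static Operand spec_load(const int *p) {
    if (p == 0)
        return (Operand){ KIND_NAR, 0 };
    return (Operand){ KIND_VALUE, (uint64_t)*p };  /* payload cast simplified */
}

/* Original source:  if (p) x = *p; else x = dflt;
   If-converted: the load runs unconditionally and speculatively, and the
   pick (written as ?:) selects the value that gets realized. The NaR from
   a null pointer is simply never realized, so it never faults. */
static int get_or_default(const int *p, int dflt) {
    Operand loaded = spec_load(p);                 /* speculative, never faults */
    return (p != 0) ? (int)loaded.payload : dflt;  /* pick chooses the survivor */
}
</syntaxhighlight>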
=== Speculation vs. [[Prediction]] ===
Some might ask what the difference is between speculation and [[Prediction|prediction]]. The two concepts are only superficially connected: while both try to avoid stalls due to branches in the execution pipelines, they go about it very differently.

Speculation eliminates branches by going down all paths of execution and choosing the right result after the fact. This generally only works for relatively short paths with small differences and somewhat similar code, but on a wide issue machine it can cover many paths at once and avoids all latencies.

Prediction chooses one path and tries to become as good as possible at choosing the correct one. There is no excess computation here, but a wrong guess means a penalty of several cycles of idleness, usually 5 cycles on the Mill, and in rare cases a really wrong guess can cost a full memory latency. Prediction also works for long paths and for very different code down the different paths.
  
 
== Media ==

[http://www.youtube.com/watch?v=DZ8HN9Cnjhc Presentation on Metadata and Speculation by Ivan Godard] - [http://millcomputing.com/blog/wp-content/uploads/2013/12/metadata.021.pptx Slides]
