Difference between revisions of "Instruction Set/load"
Revision as of 17:43, 4 February 2015
Schedule data load from memory address into belt.
The load operation is a central piece of the Mill architecture. Because the Mill is statically scheduled, every operation must have a known, constant latency that the scheduler can rely on. Loads cannot offer this by themselves: by their very nature they depend on the memory hierarchy, and take a different amount of time depending on which level of the cache hierarchy the value has to be pulled from.
When a value has to be pulled all the way from memory, scheduling doesn't matter, since the machine stalls anyway. Making the cache delays predictable, however, removes the last obstacle to making all operations statically schedulable.
This is achieved by adding an additional parameter to all load operations: the delay, which specifies the latency and thereby defines when the load retires and drops the acquired value on the belt. This way loads have no constant latency, but each invocation still has a known, defined one.
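The effect of the delay parameter can be sketched with a small toy model. This is only an illustration, not the Mill's actual implementation: the class `ToyBelt`, its methods, the belt length of 8, and the convention that the delay is counted in cycles from issue are all assumptions made for this example.

```python
from collections import deque


class ToyBelt:
    """Hypothetical toy model: a fixed-length belt plus loads with an explicit delay."""

    def __init__(self, memory, length=8):
        self.memory = memory              # dict: address -> value (assumed flat memory)
        self.belt = deque(maxlen=length)  # index 0 is the newest value
        self.cycle = 0
        self.in_flight = []               # list of (retire_cycle, address)

    def load(self, address, delay):
        # The delay is encoded in the operation, so the retire cycle
        # is already known when the load is issued.
        retire_cycle = self.cycle + delay
        self.in_flight.append((retire_cycle, address))
        return retire_cycle

    def step(self):
        # Advance one cycle and retire every load whose delay has elapsed.
        self.cycle += 1
        still_flying = []
        for retire_cycle, address in self.in_flight:
            if retire_cycle == self.cycle:
                # Dropping on a full belt pushes the oldest value off the far end.
                self.belt.appendleft(self.memory[address])
            else:
                still_flying.append((retire_cycle, address))
        self.in_flight = still_flying
```

In this sketch a scheduler can place the consumer of the value exactly at the cycle returned by `load`, because that cycle does not depend on where in the cache hierarchy the value happens to live.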
A second option for scheduling loads is to use tags. This is useful when the same loaded value may be needed in two different branches but in different cycles, or may not be needed at all in one branch. Instead of the value being dropped on the belt automatically after the delay, it has to be explicitly retrieved from, or refused at, the Retire Station with the pickup or refuse operation.
This enables speculative loads.
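The tagged form can be pictured with the following toy model. It is a sketch under stated assumptions: the class `ToyRetireStations`, the dictionary keyed by tag, and the list used as a belt are hypothetical and only mirror the pickup and refuse behaviour described above.

```python
class ToyRetireStations:
    """Hypothetical toy model of tagged loads that are picked up or refused."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value (assumed flat memory)
        self.stations = {}     # tag -> address of the pending load
        self.belt = []         # index 0 is the newest value

    def load(self, address, tag):
        # Issue a tagged load: nothing reaches the belt automatically.
        self.stations[tag] = address

    def pickup(self, tag):
        # Explicitly retrieve the value and drop it on the belt.
        address = self.stations.pop(tag)
        value = self.memory[address]
        self.belt.insert(0, value)
        return value

    def refuse(self, tag):
        # Discard the pending load without touching the belt.
        self.stations.pop(tag)


def speculative_example(take_branch_a):
    # The load is issued before the branch is resolved (speculatively).
    machine = ToyRetireStations({0x20: 7})
    machine.load(0x20, tag="t0")
    if take_branch_a:
        machine.pickup("t0")   # this branch needs the value
    else:
        machine.refuse("t0")   # this branch never uses it
    return machine.belt
```

Each branch decides for itself when, or whether, the value appears on the belt, which is what allows the load to be hoisted above the branch.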
Loads are aliasing-safe. This means the value a load returns is the value that is at the address at the time the load retires, not at the time the load is issued. Any stores to the address of an in-flight load are tracked and reflected in the result.
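A minimal sketch of this retire-time semantics follows. The function `run_load_with_store` and its parameters are hypothetical, and the sketch assumes that a store becomes visible to the load only if it happens in a cycle strictly before the retire cycle; how the hardware treats a store in the retire cycle itself is not covered here.

```python
def run_load_with_store(memory, address, delay, store_cycle, store_value):
    """Return (value_at_issue, value_at_retire) for one load and one store.

    Hypothetical toy model: the load is issued in cycle 0 and retires in
    cycle `delay`; a single store to the same address happens in `store_cycle`.
    """
    memory = dict(memory)               # work on a copy of the toy memory
    value_at_issue = memory[address]    # what a non-aliasing-safe load would return
    for cycle in range(1, delay):
        if cycle == store_cycle:
            memory[address] = store_value   # store lands while the load is in flight
    value_at_retire = memory[address]   # what the aliasing-safe load returns
    return value_at_issue, value_at_retire
```

With a store landing while the load is in flight, the two values differ, and the second one is the result the text above describes.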
There are several different addressing modes for loads. The general formula for computing addresses is base+offset+(scale*index).
Base can come from a number of special Registers or the belt. Offset is always an inline constant. Those two are always present, although a zero offset doesn't take any space at all.
Scale and index are optional and always appear together. The scale is a compile-time constant; the index always comes from the belt.
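The address formula can be written out as a short function. This is only an illustrative sketch: the name `effective_address` and the use of `None` for the absent scale and index are assumptions, not part of the instruction set.

```python
def effective_address(base, offset=0, scale=None, index=None):
    """Compute base + offset + (scale * index) for a toy load.

    base and offset are always present (offset may be zero);
    scale and index are optional but must be given together.
    """
    if (scale is None) != (index is None):
        raise ValueError("scale and index must appear together")
    address = base + offset
    if scale is not None:
        address += scale * index
    return address
```

For example, `effective_address(0x1000, 8, 4, 3)` adds the offset 8 and the scaled index 4*3 to the base, giving `0x1014`, while `effective_address(0x1000)` is simply the base.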
Another compile-time parameter is the width and scalarity of the loaded value. The minimum is one byte, but if the Core allows it, a single load can directly fetch a vector, for example eight 32-bit elements.
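How width and scalarity determine what a single load fetches can be sketched as follows. The function `load_elements`, the byte-string memory, and the little-endian byte order are assumptions chosen for this example; nothing in the text above specifies them.

```python
def load_elements(memory, address, element_bytes=1, lanes=1):
    """Read `lanes` elements of `element_bytes` bytes each from a toy memory.

    memory is a bytes object; a scalar load has lanes == 1.
    Little-endian byte order is an assumption of this sketch.
    """
    total = element_bytes * lanes
    raw = memory[address:address + total]
    if len(raw) != total:
        raise IndexError("load runs past the end of the toy memory")
    return [
        int.from_bytes(raw[i:i + element_bytes], "little")
        for i in range(0, total, element_bytes)
    ]
```

A one-byte scalar load is `load_elements(memory, address)`, and a vector of eight 32-bit elements is `load_elements(memory, address, element_bytes=4, lanes=8)`, which reads 32 bytes in one operation.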
related operations: store, pickup, refuse, loadf, loadd
load(base b, off o, s i, scale s, width w) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(base b, off o, s i, scale s, width w, lit delay) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(base b, off o, s i, scale s, width w, tag tag) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(base b, off o, width w) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(base b, off o, width w, lit delay) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(base b, off o, width w, tag tag) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(p b, off o, s i, scale s, width w) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(p b, off o, s i, scale s, width w, lit delay) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(p b, off o, s i, scale s, width w, tag tag) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(p b, off o, width w) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(p b, off o, width w, lit delay) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(p b, off o, width w, tag tag) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(p b, width w, memAttr m) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(p b, width w, memAttr m, lit delay) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |
load(p b, width w, memAttr m, tag tag) → op r0
Core | In Slots | Latencies |
---|---|---|
Tin | F0 | 3 |
Copper | F0 F1 | 3 |
Silver | F0 F1 F2 F3 | 3 |
Gold | F0 F1 F2 F3 F4 F5 F6 F7 | 3 |
Decimal8 | F0 F1 F2 F3 | 3 |
Decimal16 | F0 F1 F2 F3 | 3 |