load

From Mill Computing Wiki
Revision as of 17:43, 4 February 2015 by Generator (Talk | contribs)

realizing flow stream flow block compute phase operation in the logical value domain

native on: all

Schedule data load from memory address into belt.

The load operation is a central piece of the Mill architecture. Because the Mill is statically scheduled, every operation must have a known, constant latency that the scheduler can work with. Loads do not naturally fit this model: their latency depends on the memory hierarchy and varies with the cache level from which the value has to be pulled.
When a value is pulled all the way from memory, scheduling doesn't matter, since the machine stalls anyway. Making the cache delays predictable, however, would remove the last obstacle to making all operations statically schedulable.

This is achieved by adding an additional parameter to all load operations: the delay, which specifies the latency and thereby defines when the load retires and drops the acquired value on the belt. This way a load has no constant latency, but it still has a known, defined latency for each invocation.
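The effect of the delay parameter can be illustrated with a small simulation. This is only a toy sketch: the names `ToyCore`, `load`, and `step` are hypothetical, the belt is modeled as a bounded queue, and counting the delay in whole cycles from the issue cycle is an assumption, not a statement about real Mill hardware.

```python
from collections import deque


class ToyCore:
    """Toy model of delayed loads; not a description of real Mill hardware."""

    def __init__(self, memory, belt_length=8):
        self.memory = memory                    # address -> value
        self.belt = deque(maxlen=belt_length)   # index 0 is the newest drop
        self.cycle = 0
        self.in_flight = []                     # (retire_cycle, address) pairs

    def load(self, address, delay):
        """Issue a load that retires `delay` cycles from now (assumed counting)."""
        self.in_flight.append((self.cycle + delay, address))

    def step(self):
        """Advance one cycle and retire every load that is due."""
        self.cycle += 1
        pending = []
        for retire_cycle, address in self.in_flight:
            if retire_cycle <= self.cycle:
                # Retiring drops the value at the front of the belt.
                self.belt.appendleft(self.memory[address])
            else:
                pending.append((retire_cycle, address))
        self.in_flight = pending


core = ToyCore({0x10: 42})
core.load(0x10, delay=3)       # issued in cycle 0, so it retires in cycle 3
core.step()
core.step()
belt_before = list(core.belt)  # still empty after two cycles
core.step()
belt_after = list(core.belt)   # the value has now been dropped
```

In this model the scheduler knows at issue time exactly which cycle will see the value on the belt, even though the latency differs from one load to the next.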

A second option for scheduling loads is to use tags. This is useful when the same loaded value may be needed in two different branches but in a different cycle in each, or may not be needed at all in one branch. Instead of the value simply being dropped on the belt after the delay, it has to be explicitly retrieved from the Retire Station with the pickup operation, or rejected with the refuse operation.
This enables speculative loads.
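A rough sketch of how a tagged load might be used speculatively follows. The class `ToyRetireStations` and its method signatures are hypothetical and chosen only for illustration; the sketch assumes one retire station per tag and ignores timing entirely.

```python
class ToyRetireStations:
    """Toy model of tagged loads; real retire stations are more involved."""

    def __init__(self, memory):
        self.memory = memory   # address -> value
        self.stations = {}     # tag -> address of the pending load
        self.belt = []         # index 0 is the newest drop

    def load(self, address, tag):
        """Issue a load whose result waits in a retire station under `tag`."""
        self.stations[tag] = address

    def pickup(self, tag):
        """Claim the pending load and drop its value on the belt."""
        address = self.stations.pop(tag)
        self.belt.insert(0, self.memory[address])

    def refuse(self, tag):
        """Discard the pending load; nothing reaches the belt."""
        del self.stations[tag]


def run(take_branch_a):
    core = ToyRetireStations({0x20: 7})
    core.load(0x20, tag=0)     # issued before the branch outcome is known
    if take_branch_a:
        core.pickup(tag=0)     # this branch needs the value
    else:
        core.refuse(tag=0)     # this branch does not
    return core.belt, core.stations
```

Calling `run(True)` leaves the loaded value on the toy belt, while `run(False)` leaves the belt empty; in both cases the retire station is freed.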

Loads are aliasing-safe. This means the value a load returns is the value that is at the address at the time the load retires, not at the time the load is issued. Any stores to the address of an in-flight load are tracked and reflected in the result.
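The observable behavior can be sketched as below. The model simply reads memory at retire time, which yields the same result as tracking intervening stores; how the hardware actually tracks them is not modeled. `ToyLoadUnit` and its methods are hypothetical names, and the cycle counting is an assumption.

```python
class ToyLoadUnit:
    """Toy model: a load observes memory as it is when the load retires."""

    def __init__(self, memory):
        self.memory = dict(memory)  # address -> value
        self.cycle = 0
        self.in_flight = []         # (retire_cycle, address) pairs
        self.retired = []           # values in retirement order

    def load(self, address, delay):
        self.in_flight.append((self.cycle + delay, address))

    def store(self, address, value):
        self.memory[address] = value

    def step(self):
        self.cycle += 1
        pending = []
        for retire_cycle, address in self.in_flight:
            if retire_cycle <= self.cycle:
                # The value is taken now, at retire time, not at issue time.
                self.retired.append(self.memory[address])
            else:
                pending.append((retire_cycle, address))
        self.in_flight = pending


unit = ToyLoadUnit({0x30: 1})
unit.load(0x30, delay=3)   # memory holds 1 when the load is issued
unit.step()
unit.store(0x30, 2)        # store to the same address while the load is in flight
unit.step()
unit.step()                # the load retires in this cycle
result = unit.retired      # [2]: the store is reflected in the loaded value
```

This is what allows a load to be issued early, ahead of stores that may or may not alias with it.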

There are several different addressing modes for loads. The general formula for computing addresses is base+offset+(scale*index).
The base can come from one of a number of special registers or from the belt. The offset is always an inline constant. These two are always present, although a zero offset doesn't take any space at all.
Scale and index are optional and always appear together. The scale is a compile-time constant; the index always comes from the belt.
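The address formula can be written out as a short function. This is an illustrative sketch only: the function name and keyword arguments are hypothetical, and the operands are plain integers rather than belt positions or registers.

```python
def effective_address(base, offset=0, index=None, scale=None):
    """Compute base + offset + (scale * index).

    base and offset are always present (offset defaults to zero);
    scale and index are optional but must be given together.
    """
    if (index is None) != (scale is None):
        raise ValueError("scale and index must appear together")
    address = base + offset
    if index is not None:
        address += scale * index
    return address


# Element 3 of an array of 4-byte items that starts 8 bytes past the base.
addr = effective_address(0x1000, offset=8, index=3, scale=4)  # 0x1014
# Base and offset only.
simple = effective_address(0x1000, offset=8)                  # 0x1008
```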

Another compile-time parameter is the width and scalarity of the loaded value. The minimum is one byte, but if the core allows it, a vector of eight 32-bit elements can be loaded directly.

related operations: store, pickup, refuse, loadf, loadd


load(base b, off o, s i, scale s, width w) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(base b, off o, s i, scale s, width w, lit delay) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(base b, off o, s i, scale s, width w, tag tag) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(base b, off o, width w) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(base b, off o, width w, lit delay) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(base b, off o, width w, tag tag) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(p b, off o, s i, scale s, width w) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(p b, off o, s i, scale s, width w, lit delay) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(p b, off o, s i, scale s, width w, tag tag) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(p b, off o, width w) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(p b, off o, width w, lit delay) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(p b, off o, width w, tag tag) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(p b, width w, memAttr m) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(p b, width w, memAttr m, lit delay) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

load(p b, width w, memAttr m, tag tag) → op r0

operands: like IdentityNoSIMD xx:x


Core       In Slots                  Latencies
Tin        F0                        3
Copper     F0 F1                     3
Silver     F0 F1 F2 F3               3
Gold       F0 F1 F2 F3 F4 F5 F6 F7   3
Decimal8   F0 F1 F2 F3               3
Decimal16  F0 F1 F2 F3               3

