load

speculable flow stream flow block writer phase operation in the logical value domain

native on: all

Schedule data load from memory address into belt.

The load operation is a central piece of the Mill architecture. Because the Mill is a statically scheduled instruction set, every operation needs a known, constant latency for the scheduler to work with. Loads cannot offer that on their own: by their very nature they depend on the memory hierarchy and take different amounts of time depending on where in the cache hierarchy the value has to be pulled from.
When a value is pulled all the way from memory, scheduling doesn't matter, since the machine stalls anyway. Making the cache delays predictable, however, would remove the last obstacle to making all operations statically schedulable.

This is achieved by adding a parameter to all load operations: the delay, which specifies the latency and defines when the load retires and drops the acquired value on the belt. This way loads have no constant latency, but still a known, defined latency for each invocation.
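The effect of the delay parameter can be illustrated with a small model. This is only a sketch, assuming that the delay counts whole cycles from the issue cycle; the names `PendingLoad`, `issue_load` and `retiring_at` are hypothetical and not part of any Mill toolchain.

```python
from dataclasses import dataclass


@dataclass
class PendingLoad:
    address: int
    retire_cycle: int  # assumed to be issue cycle + delay


def issue_load(issue_cycle: int, address: int, delay: int) -> PendingLoad:
    """Record a load whose result drops on the belt after `delay` cycles."""
    return PendingLoad(address=address, retire_cycle=issue_cycle + delay)


def retiring_at(loads, cycle: int):
    """Return the loads that retire in exactly this cycle."""
    return [ld for ld in loads if ld.retire_cycle == cycle]


# Two loads issued in the same cycle with different delays retire in
# different, statically known cycles.
loads = [issue_load(10, 0x1000, delay=3), issue_load(10, 0x2000, delay=5)]
```

Because each retire cycle is fixed at issue time, a scheduler can place the consumers of both values without knowing which cache level will serve them.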

A second option for scheduling loads is using tags. This is useful when the same loaded value may be needed in two different branches but in different cycles, or may not be needed at all in one branch. Instead of the value simply dropping on the belt after the delay, it has to be explicitly retrieved from the Retire Station with the pickup operation, or discarded with the refuse operation.
This enables speculative loads.
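A toy model of tagged loads may help here. It is a sketch under assumptions: the class `RetireStations` and its methods are hypothetical names, the loaded value is supplied directly instead of being read from memory, and refusing an unknown tag is assumed to be harmless.

```python
class RetireStations:
    """Holds tagged load results until they are picked up or refused."""

    def __init__(self):
        self._pending = {}

    def load(self, tag, value):
        # A tagged load parks its result instead of dropping it on the belt.
        self._pending[tag] = value

    def pickup(self, tag):
        # Retrieve the value and free the station.
        return self._pending.pop(tag)

    def refuse(self, tag):
        # Discard the value without using it (assumed to be a no-op if absent).
        self._pending.pop(tag, None)

    def in_flight(self):
        return sorted(self._pending)


def speculative(cond: bool):
    """Issue the load before the branch, then consume it on one path only."""
    rs = RetireStations()
    rs.load("t0", 7)           # issued speculatively, before cond is known
    if cond:
        result = rs.pickup("t0")
    else:
        rs.refuse("t0")
        result = None
    return result, rs.in_flight()
```

In this model `speculative(True)` returns the loaded value, `speculative(False)` returns `None`, and in both cases no tagged load is left in flight afterwards.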

Loads are aliasing-safe. This means the value a load returns is the value that is at the address at the time the load retires, not at the time the load is issued. Any stores to the address of an in-flight load are tracked and reflected in the result.
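The retire-time semantics can be shown with a short simulation. This is a sketch, not a description of the hardware: the function `load_result` is hypothetical, and it assumes that only stores issued strictly before the retire cycle are visible to the load.

```python
def load_result(memory, load_addr, issue_cycle, delay, stores):
    """Return the value a load sees under retire-time semantics.

    `stores` is a list of (cycle, address, value) tuples.
    Assumption: a store is visible if its cycle is before the retire cycle.
    """
    retire_cycle = issue_cycle + delay
    mem = dict(memory)  # do not mutate the caller's memory image
    for cycle, addr, value in sorted(stores):
        if cycle < retire_cycle:
            mem[addr] = value
    return mem[load_addr]


memory = {0x100: 1}

# A store to the same address while the load is in flight is reflected.
seen_in_flight = load_result(memory, 0x100, issue_cycle=0, delay=3,
                             stores=[(1, 0x100, 42)])

# A store after the load has retired is not.
seen_late = load_result(memory, 0x100, issue_cycle=0, delay=3,
                        stores=[(5, 0x100, 42)])
```

Here `seen_in_flight` is 42 and `seen_late` is 1, which is why a load can be hoisted above a possibly aliasing store without changing the program's result.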

There are several different addressing modes for loads. The general formula for computing addresses is base+offset+(scale*index).
The base can come from a number of special registers or from the belt. The offset is always an inline constant. These two are always present, although a zero offset doesn't take any space at all.
Scale and index are optional and always appear together. The scale is a compile-time constant; the index always comes from the belt.
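The address formula above can be written out as a small function. This is a minimal sketch; the name `effective_address` and its signature are assumptions made for illustration only.

```python
def effective_address(base, offset=0, scale=None, index=None):
    """Compute base + offset + (scale * index).

    Base and offset are always present (offset may be zero); scale and
    index are optional but must be given together.
    """
    if (scale is None) != (index is None):
        raise ValueError("scale and index must appear together")
    address = base + offset
    if scale is not None:
        address += scale * index
    return address


# Element 3 of an array of 4-byte values that starts 8 bytes past base.
addr = effective_address(0x1000, offset=8, scale=4, index=3)
```

With these inputs `addr` is `0x1014`; omitting scale and index reduces the formula to base+offset.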

Another compile-time parameter is the width and scalarity of the loaded value. The minimum is one byte, but if the Core allows it, a vector of eight 32-bit elements can be loaded directly.

related operations: store, pickup, refuse, loadf, loadd

load(base base0, off off0, width width0, )

operands: like Store px:

Core	In Slots	Latencies
Tin	F0	3
Copper	F0	3
Silver	F0 F1 F2	3
Gold	F0	3

load(base b, off o, width w, tag tag)

operands: like Store px:

Core	In Slots	Latencies
Tin	F0	3
Copper	F0	3
Silver	F0 F1 F2	3
Gold	F0	3

load(op op0, off off0, width width0, )

operands: like Store px:

Core	In Slots	Latencies
Tin	F0	3
Copper	F0	3
Silver	F0 F1 F2	3
Gold	F0	3

load(p b, off o, width w, tag tag)

operands: like Store px:

Core	In Slots	Latencies
Tin	F0	3
Copper	F0	3
Silver	F0 F1 F2	3
Gold	F0	3

load(op op0, width memAttr0, memAttr off0, )

operands: like Store px:

Core	In Slots	Latencies
Tin	F0	3
Copper	F0	3
Silver	F0 F1 F2	3
Gold	F0	3

load(p b, width w, memAttr m, tag tag)

operands: like Store px:

Core	In Slots	Latencies
Tin	F0	3
Copper	F0	3
Silver	F0 F1 F2	3
Gold	F0	3
