Metadata
All operands on the belt carry a few bits of metadata in addition to the actual byte pattern that makes up the value. This metadata informs and augments how the vast majority of operations on those operands work.
This metadata is not restricted to the Belt, but is preserved in the Scratchpad and carries through the result registers in the Slots and so forth.
While the store operation looks into the metadata to know how much to store, it strips the value of it. No metadata exists in the caches and in memory. Loads, on the other hand, initialize the metadata together with the value, since load operations have the basic metadata type tags hard-coded into them.
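The store and load behavior described above can be illustrated with a minimal Python model. This is a sketch and not the Mill's implementation: the `Operand` class, the `store` and `load` functions, the flat `bytearray` memory and the little-endian byte order are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    value: int  # raw bit pattern, held as an unsigned integer
    width: int  # metadata: byte width (1, 2, 4, 8 or 16)


def store(memory: bytearray, addr: int, op: Operand) -> None:
    # The width tag decides how many bytes are written.
    # The tag itself never reaches memory.
    # Little-endian byte order is an assumption of this sketch.
    memory[addr:addr + op.width] = op.value.to_bytes(op.width, "little")


def load(memory: bytearray, addr: int, width: int) -> Operand:
    # The width comes from the load instruction, not from memory,
    # so the metadata is created fresh together with the value.
    raw = bytes(memory[addr:addr + width])
    return Operand(int.from_bytes(raw, "little"), width)


memory = bytearray(8)
store(memory, 0, Operand(0x1234, 2))

narrow = load(memory, 0, 1)  # same bytes, read back as a 1-byte operand
wide = load(memory, 0, 4)    # same bytes, read back as a 4-byte operand
```

Memory holds only bytes, so the same location can be loaded at any width. The tag on the loaded operand reflects the load that produced it, not the store that wrote the data.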
The Metadata Fields
Scalarity
Any operand can be a SIMD vector, or slice for short. A slice usually has 4 elements, but the element count depends on the Specification of the processor and can be higher. The elements can have any of the scalar widths available on the processor.
Most operations are polymorphic over the Width and Scalarity tags, i.e. the same opcode performs 8-bit to 128-bit integer arithmetic on scalars and vectors, depending on the metadata of the input operands.
Narrow and widen work for slices too, although widen produces 2 outputs with doubled width elements to avoid overflows for the maximum widths.
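As an illustration of this polymorphism, the following Python sketch models one `add` and one `widen` that take their behavior from the operand tags. The names are hypothetical. The wrap-around on overflow and the way `widen` splits a slice into a lower and an upper half are assumptions of the sketch, not statements about the hardware.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Operand:
    width: int                 # metadata: element width in bytes
    elements: Tuple[int, ...]  # one element = scalar, several = slice


def add(a: Operand, b: Operand) -> Operand:
    # One "opcode" for every width and for scalars and slices alike.
    if a.width != b.width or len(a.elements) != len(b.elements):
        raise ValueError("operand metadata does not match")
    mask = (1 << (8 * a.width)) - 1  # assumed: results wrap at the width
    summed = tuple((x + y) & mask for x, y in zip(a.elements, b.elements))
    return Operand(a.width, summed)


def widen(op: Operand) -> Tuple[Operand, ...]:
    # Doubling the element width doubles the space a slice needs,
    # so a slice comes back as two outputs; a scalar as one.
    wide = op.width * 2
    if len(op.elements) == 1:
        return (Operand(wide, op.elements),)
    half = len(op.elements) // 2  # assumed split: lower half, upper half
    return (Operand(wide, op.elements[:half]),
            Operand(wide, op.elements[half:]))


scalar_sum = add(Operand(1, (200,)), Operand(1, (100,)))
slice_sum = add(Operand(2, (1, 2, 3, 4)), Operand(2, (10, 20, 30, 40)))
low, high = widen(Operand(2, (1, 2, 3, 4)))
```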
Width
Every operand value is tagged with its byte width, i.e. 1, 2, 4, 8, 16. The width doesn't say anything about the interpretation of the bits. An 8 byte value can serve as input for signed and unsigned integer arithmetic, for double float operations and for pointer arithmetic. How successfully those operations work out depends on the specific bit pattern and the operation semantics. NaN bit patterns can cause Faults with IEEE 754 operations and work perfectly fine as integers.
There are narrow and widen instructions to change the width of an operand.
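The point that width says nothing about interpretation can be shown in Python with the standard `struct` module. The variable names are illustrative only. The example reads the same 8 bytes as an unsigned integer, a signed integer and a double.

```python
import math
import struct

# 8 bytes holding the bit pattern of the double 1.0
raw = struct.pack("<d", 1.0)

as_unsigned = struct.unpack("<Q", raw)[0]  # same bytes as unsigned integer
as_signed = struct.unpack("<q", raw)[0]    # same bytes as signed integer
as_double = struct.unpack("<d", raw)[0]    # same bytes as double float

# A quiet-NaN bit pattern is an ordinary number when read as an integer.
nan_bits = 0x7FF8000000000000
nan_raw = struct.pack("<Q", nan_bits)
nan_as_double = struct.unpack("<d", nan_raw)[0]
nan_as_integer = struct.unpack("<Q", nan_raw)[0]
```

Here `as_unsigned` is `0x3FF0000000000000`, while `nan_as_double` is a NaN and `nan_as_integer` is simply the integer `0x7FF8000000000000`.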
None and NaR
Every operand, and every element in a SIMD slice, has a bit that determines whether the value is valid. When the bit marks the value as invalid and the actual value content of the operand is zero, this is a None value: an invalid operand that just gets ignored and/or propagated by any operation performed on it. In fact, whenever a new belt is created for a new frame, this is the value all belt positions are initialized to by default.
When the content of an invalid operand is anything other than zero, it is a NaR value. It is also propagated by any operation performed on it, but some operations raise a fault when they encounter a NaR.
Nones and NaRs come in very handy for Speculation and Debugging; the details are covered on those pages.
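A small Python model may help to picture the two kinds of invalid operand. Everything in it is hypothetical: the `Operand` class, the `NaRFault` exception and the choice of a store as the operation that skips a None and faults on a NaR are assumptions of the sketch. Which one wins when a None meets a NaR is not covered by this text, so the sketch simply propagates the first invalid input it finds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    payload: int
    invalid: bool = False  # the validity bit from the metadata

    @property
    def is_none(self) -> bool:
        return self.invalid and self.payload == 0

    @property
    def is_nar(self) -> bool:
        return self.invalid and self.payload != 0


NONE = Operand(0, invalid=True)


class NaRFault(Exception):
    """Raised by operations that must not consume a NaR."""


def add(a: Operand, b: Operand) -> Operand:
    # Invalid inputs are not computed on; they flow through to the result.
    for op in (a, b):
        if op.invalid:
            return op
    return Operand(a.payload + b.payload)


def store(memory: dict, addr: int, op: Operand) -> None:
    if op.is_none:
        return  # assumed: storing a None is silently skipped
    if op.is_nar:
        raise NaRFault(f"NaR with payload {op.payload} reached a store")
    memory[addr] = op.payload


memory = {}
nar = Operand(7, invalid=True)
speculated = add(Operand(1), nar)  # no fault yet, the NaR just propagates
store(memory, 0, add(NONE, Operand(3)))  # nothing is written
```

The fault is deferred until the NaR reaches an operation that cannot ignore it. This is what allows the earlier operations to run speculatively.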
IEEE 754 Floating Point Flags
The overflow, underflow and rounding behavior of floating point operations is captured in a number of flags. On conventional processors those tend to be global state flags. On the Mill that wouldn't work, because global state flags introduce unnecessary data dependencies and prevent speculation. For this reason, every operand carries its own complete set of floating point state flags:
- divide by zero
- inexact
- invalid
- underflow
- overflow
Like all metadata, those bits are propagated through the functional units and all internal data flow. When an operation raises one of these conditions, the matching flag is set in its result. Flags that are already set in the input operands are ORed together and propagated further with the result. Only on realization, for example by a store, are they written into global state, where they can trigger any of the possible Interrupts.
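The flag propagation can be sketched in Python with `enum.Flag`. The names `FpFlags`, `FpOperand`, `fdiv`, `fadd` and `Core` are hypothetical. The sketch only detects division by zero and 0/0; it does not detect inexact, overflow or underflow conditions, and it does not model interrupts.

```python
import math
from dataclasses import dataclass
from enum import Flag, auto


class FpFlags(Flag):
    CLEAR = 0
    DIVIDE_BY_ZERO = auto()
    INEXACT = auto()
    INVALID = auto()
    UNDERFLOW = auto()
    OVERFLOW = auto()


@dataclass(frozen=True)
class FpOperand:
    value: float
    flags: FpFlags = FpFlags.CLEAR  # per-operand flags, no global state


def fadd(a: FpOperand, b: FpOperand) -> FpOperand:
    # Flags of both inputs are ORed into the result.
    return FpOperand(a.value + b.value, a.flags | b.flags)


def fdiv(a: FpOperand, b: FpOperand) -> FpOperand:
    flags = a.flags | b.flags
    if math.isnan(a.value) or math.isnan(b.value):
        return FpOperand(math.nan, flags)
    if b.value == 0.0:
        if a.value == 0.0:
            return FpOperand(math.nan, flags | FpFlags.INVALID)
        sign = math.copysign(1.0, a.value) * math.copysign(1.0, b.value)
        return FpOperand(sign * math.inf, flags | FpFlags.DIVIDE_BY_ZERO)
    return FpOperand(a.value / b.value, flags)


class Core:
    def __init__(self) -> None:
        self.global_flags = FpFlags.CLEAR
        self.memory = {}

    def store(self, addr: int, op: FpOperand) -> None:
        # Realization: only now do the flags become global state.
        self.global_flags |= op.flags
        self.memory[addr] = op.value


core = Core()
x = fdiv(FpOperand(1.0), FpOperand(0.0))  # sets DIVIDE_BY_ZERO
y = fdiv(FpOperand(0.0), FpOperand(0.0))  # sets INVALID
z = fadd(x, y)                            # carries both flags
flags_before_store = core.global_flags
core.store(0, z)
```

Until the store, `core.global_flags` stays clear, so the divisions and the addition can be executed speculatively and discarded without side effects.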
Rationale
Overall the effect of metadata can be described as making everything smoother and more regular, and as such easier to reason about. It curbs bloat, meaning unnecessary complexity.
The width and scalarity metadata tags massively reduce the bloat in the instruction set and make it not only denser and more effective, but also a lot more regular and logical. It even helps code reuse by introducing a form of polymorphism on the binary level.
The NaR bit in its two interpretations as an ignored None and an eventual fault eliminates the need for a lot of special and corner case code and opens up untapped reservoirs of instruction level parallelism in doing so. It makes speculative execution simple and straightforward and much more applicable. It enables efficient software Pipelining of most loops. The most limiting factor of ILP tends to be control flow, and both speculation and software pipelining together with Phasing vastly expand the windows for looking for such opportunities, across control flow borders.
Those are the main reasons for spending those few extra bits in the core: the payback is a large reduction in complexity in many other crucial areas. There are quite a few auxiliary benefits too, for example in Debugging.