Decode

From Mill Computing Wiki
Revision as of 19:30, 5 November 2014 by Jan (Talk | contribs)


The decode process turns the binary instruction streams into requests to the functional units.

Decode Chart

Slots and Pipelines and Functional Units

Each instruction is divided into blocks. Within those variable-length blocks, the operations are arranged in arrays, and each position in such an array is called a slot. A slot corresponds directly to a hardware slot, which has a dedicated decoder that can decode only the operations that can occur at this position in the instruction. The slot then sends a request to perform the decoded operation to the proper pipeline or, in some cases such as the pick operation, to the crossbar circuits.

A pipeline is a collection of functional units that share a common data path, the same inputs and outputs. It is in the functional units where the actual work gets done, where the data is manipulated and shoveled around by operations.

There are two separate instruction streams, and each is decoded by its own specialized decoder. Not only does each stream have its own decoder: within each instruction, each block has its own specialized decoder, and each slot within each block has its own specialized binary operation format. This format depends on which functional units are available on the hardware pipeline the slot feeds into. Different hardware slots provide different functionality.

According to this functionality, the slots are grouped into a FlowCore and an ExuCore, each serving its respective instruction stream, but this is a purely conceptual distinction.

There are no discrete modules or cores on the chip. There are only different slots and pipelines, arranged however is best.

Streams and Decoders

The decoders, however, are distinct specialized modules, each with its own data paths, caches and processing units to accommodate its specific workload. The general format of the instruction streams is described under Encoding. Here we go into the details of how the streams and their instruction formats differ.

Morsels

A morsel is the basic unit for encoding values within the instruction stream. A morsel takes as many bits as are needed to address all belt locations on a specific core, i.e. 3-5 bits.
In most instructions that take immediate values, those immediates are also morsel-sized.
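
The relation between belt size and morsel width can be sketched as follows. This is only an illustration: the function name morsel_bits and the belt lengths 8, 16 and 32 are assumptions chosen to match the 3-5 bit range stated above, not values taken from a specific core.

```python
def morsel_bits(belt_length: int) -> int:
    """Bits needed to address every position on a belt of the given length."""
    if belt_length < 1:
        raise ValueError("belt_length must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1
    return (belt_length - 1).bit_length()


# Assumed example belt lengths; each needs one more bit than the previous.
for belt_length in (8, 16, 32):
    print(belt_length, morsel_bits(belt_length))
```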

Exu Stream

Exu-instruction.png

The instruction headers in the exu stream contain a shift count to the next instruction. They also contain the count of the encoded slots in each block, except for the exu block or block 3, since that count can simply be inferred from the other slot counts.
The slots in each block have their own format.

Reader Slots

Exu reader operations have one hardcoded parameter, a source selector, and this source selector defines the whole operation, so no opcode is needed. Reader operations are encoded simply as a sequential id that enumerates all the available reader sources, and any unused id values in the bit width are filled with Popular Constants. There is no internal structure in the operation and no wasted entropy.
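
A minimal sketch of this id scheme, under stated assumptions: the source names, the constant values and the helper names build_reader_table and decode_reader are hypothetical and only illustrate how spare id values could be filled with constants.

```python
def build_reader_table(sources, popular_constants, id_bits):
    """Map every id value of an id_bits-wide field to a reader operation.

    Real sources come first; any ids left over are filled with constants.
    """
    capacity = 1 << id_bits
    if len(sources) > capacity:
        raise ValueError("not enough id values for all reader sources")
    table = [("source", name) for name in sources]
    spare = capacity - len(table)
    # Fill the unused id values so that no encoding goes to waste.
    table += [("constant", value) for value in popular_constants[:spare]]
    return table


def decode_reader(table, op_id):
    """A reader operation is just its id; decoding is a table lookup."""
    return table[op_id]


# Hypothetical core: 3 reader sources encoded in a 2-bit id field.
table = build_reader_table(["src0", "src1", "src2"], [0, 1, -1], id_bits=2)
print(decode_reader(table, 1))
print(decode_reader(table, 3))
```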

Exu Slots

Exu slots encode all the operations with 2 operands. These encodings are structured. They have an opcode, the size of which depends on the operation population in that slot and may differ for every slot in the exu block. Then there are 2 morsels for arguments, each of which may be a belt location or an immediate value, although usually an immediate value, if present, is the 2nd argument. Operations with an immediate argument often get shorter opcodes to gain more bits for the immediate value.
There are also operations that take implicit arguments or condition codes from neighboring slots. In those cases there is an opcode prefix, and the bits otherwise used for the arguments extend the opcode.
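
The structured two-operand format can be illustrated with a small field extractor. The field order (opcode in the high bits, then the two morsels) and the names ExuOp and decode_exu_op are assumptions made for this sketch; the text above does not specify the actual bit layout.

```python
from typing import NamedTuple


class ExuOp(NamedTuple):
    opcode: int
    arg1: int  # belt location or immediate
    arg2: int  # belt location or immediate


def decode_exu_op(word: int, opcode_bits: int, morsel_bits: int) -> ExuOp:
    """Split an operation word into opcode and two morsel-sized arguments.

    Assumed layout, high to low bits: opcode | arg1 | arg2.
    opcode_bits varies per slot; morsel_bits is fixed per core.
    """
    morsel_mask = (1 << morsel_bits) - 1
    arg2 = word & morsel_mask
    arg1 = (word >> morsel_bits) & morsel_mask
    opcode = (word >> (2 * morsel_bits)) & ((1 << opcode_bits) - 1)
    return ExuOp(opcode, arg1, arg2)


# Hypothetical slot with a 3-bit opcode on a core with 4-bit morsels.
word = (0b101 << 8) | (3 << 4) | 9
print(decode_exu_op(word, opcode_bits=3, morsel_bits=4))
```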

Pick Slots

There are only two pick slot and pick phase operations, so there is only one bit of opcode. There are 3 belt operands, so 3 morsels are needed. There is also an even shorter encoding with only two arguments for picks that produce Nones in one path.

Writer Slots

As in the reader slots, there is no real need for opcodes here, since the destination fully describes the operation. A writer operation needs an additional morsel for the belt operand, though.

Flow Stream

Flow-instruction.png

The operations in the flow stream have completely different requirements: not many operations with few small arguments, but few operations with potentially many and large arguments. There is therefore really only one logical flow block, which encodes all the flow operations by combining 3 physical instruction blocks.
There is the normal shift count for the size of the whole instruction, but only one operation count.

Heads

The operation heads are the most complex part of the flow operation encoding. There is one head for each available flow slot on the core. And each head contains:

  • an opcode, the size of which depends on the slot population
  • 2 bits of extension count
  • 2 bits of manifest size
  • 1 bit of manifest complement
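
The head fields listed above could be unpacked as in the following sketch. The bit order (complement in the lowest bit, then manifest size, extension count and opcode) and the names FlowHead and decode_flow_head are assumptions for illustration only; the list gives the field widths but not their positions.

```python
from typing import NamedTuple


class FlowHead(NamedTuple):
    opcode: int
    extension_count: int     # 2 bits: 0-3 extensions
    manifest_size_code: int  # 2 bits: selects one of four manifest sizes
    complement: bool         # 1 bit: invert the manifest bit pattern


def decode_flow_head(bits: int, opcode_bits: int) -> FlowHead:
    """Unpack one flow operation head.

    Assumed layout, high to low bits:
    opcode | extension count (2) | manifest size (2) | complement (1).
    """
    complement = bool(bits & 1)
    manifest_size_code = (bits >> 1) & 0b11
    extension_count = (bits >> 3) & 0b11
    opcode = (bits >> 5) & ((1 << opcode_bits) - 1)
    return FlowHead(opcode, extension_count, manifest_size_code, complement)


# Hypothetical head with a 3-bit opcode.
head_bits = (0b110 << 5) | (0b10 << 3) | (0b11 << 1) | 1
print(decode_flow_head(head_bits, opcode_bits=3))
```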

Extensions

Each operation head has 0-3 morsel-sized extensions. These extensions can serve as extended opcodes, belt operands, register selectors, small immediate values, or whatever else the operation needs.

Manifests

Manifests are 0, 1, 2 or 4 bytes in size. How they are interpreted depends on the operation: they can be addresses, constants, or operand lists. They can even be combined with the extension bits to form larger bit patterns.
A manifest value of 0 takes no additional space, since it is encoded as a zero-sized constant.
If the complement bit is set in the head, the bit pattern is inverted to form the manifest value. For example, a zero-length manifest with the complement bit set becomes -1, and a 1 becomes 0xFFFFFFFE. This results in a very compact encoding for the most commonly used 32-bit address offset bit patterns.
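
The complement rule can be checked with a short sketch. The little-endian byte order, the 32-bit result width and the name decode_manifest are assumptions of this example; only the sizes 0, 1, 2 and 4 and the two inverted values come from the text above.

```python
MANIFEST_BYTES = (0, 1, 2, 4)  # manifest sizes named in the text


def decode_manifest(payload: bytes, complement: bool) -> int:
    """Form a 32-bit manifest value from 0, 1, 2 or 4 payload bytes.

    Assumes a little-endian payload that is zero-extended to 32 bits
    before the optional inversion.
    """
    if len(payload) not in MANIFEST_BYTES:
        raise ValueError("manifest must be 0, 1, 2 or 4 bytes long")
    value = int.from_bytes(payload, "little")  # empty payload gives 0
    if complement:
        value = ~value & 0xFFFFFFFF  # invert within 32 bits
    return value


print(hex(decode_manifest(b"", complement=False)))
print(hex(decode_manifest(b"", complement=True)))
print(hex(decode_manifest(b"\x01", complement=True)))
```

Under these assumptions, the zero-length manifest with the complement bit set yields 0xFFFFFFFF (that is, -1 as a signed 32-bit value) and the one-byte manifest 1 yields 0xFFFFFFFE, matching the examples in the text.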

Skinny Blocks

The slot counts in the instruction heads always have a few values or value combinations that are not valid slot counts. This would all be wasted entropy were it not for the skinny block mechanism. Operations that take no arguments or only implicit arguments can be encoded in those unused value combinations without taking any additional space. The best examples of such operations are NOPs or returns with no return value.
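
As a rough illustration of where such spare encodings come from, the sketch below lists the values of a single count field that exceed every valid slot count. It models only one field, not value combinations across fields, and the field width, slot count, operation names and the function spare_count_codes are all hypothetical.

```python
def spare_count_codes(field_bits: int, max_slots: int) -> list:
    """Values of a slot-count field that are larger than any valid count.

    Valid counts are 0..max_slots; everything above is free for reuse.
    """
    return list(range(max_slots + 1, 1 << field_bits))


# Hypothetical block with 5 slots and a 3-bit count field:
# counts 0-5 are valid, so 6 and 7 are free for argument-less operations.
spare = spare_count_codes(field_bits=3, max_slots=5)
skinny_ops = dict(zip(spare, ["nop", "return"]))
print(skinny_ops)
```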

See Also

Encoding, Phasing, Slot, Instruction Set