Difference between revisions of "Decode"
Revision as of 22:53, 2 November 2014
The decode process turns the binary instruction streams into requests to the functional units.
Slots and Pipelines and Functional Units
Each instruction is divided into blocks. Within those variable-length blocks the operations are arranged in arrays. Each position in those arrays is called a slot. It corresponds directly to a hardware slot, which has a dedicated decoder that can only decode the operations that can occur at this position in the instruction. The slot then sends a request to perform the decoded operation to the proper pipeline or, in some cases such as the pick operation, to the crossbar circuits.
A pipeline is a collection of functional units that share a common data path, the same inputs and outputs. It is in the functional units where the actual work gets done, where the data is manipulated and shoveled around by operations.
There are two separate instruction streams, and each is decoded by its own specialized decoder. Not only does each stream have its own decoder; within each instruction, each block has its own specialized decoder, and each slot within each block has its own specialized binary operation format. This format depends on which functional units are available on the hardware pipeline the slot feeds into. Different hardware slots provide different functionality.
According to this different functionality the slots are grouped into a FlowCore and an ExuCore, each to serve its respective instruction streams, but this is a purely conceptual distinction.
There are no discrete modules or cores on the chip. There are only different Slots and Pipelines, arranged however is best.
Streams and Decoders
The decoders are distinct specialized modules though, each with their own data paths and caches and processing units to accommodate their specific work loads. The general format of the instruction streams is described under Encoding. Here we go into the details and see how those streams and instruction formats are different.
Morsels
A morsel is the basic unit for encoding values within the instruction stream. It takes as many bits as are needed to address all belt locations on a specific core, i.e. 3-5 bits.
In most instructions that take immediate values, those immediates are also morsel-sized.
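As a rough illustration, the morsel width follows directly from the belt length. The sketch below assumes belt lengths of 8, 16 and 32 positions, chosen only because they correspond to the 3-5 bit range given above; the function name morsel_bits is made up for this example.

```python
def morsel_bits(belt_length: int) -> int:
    """Number of bits needed to address every position on a belt."""
    if belt_length < 1:
        raise ValueError("belt_length must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2
    return max(1, (belt_length - 1).bit_length())


# Assumed belt lengths, for illustration only.
for belt_length in (8, 16, 32):
    print(belt_length, "positions ->", morsel_bits(belt_length), "bits")
```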
Exu Stream
The instruction headers in the exu stream contain a shift count to the next instruction, followed by the count of encoded slots in each block. The count for the exu block or block 3 is omitted, since it can be inferred from the other slot counts.
The slots in each block have their own format.
Reader Slots
Exu reader operations have one hardcoded parameter, a source selector, and this source selector defines the whole operation, so no opcode is needed. Reader operations are encoded simply as a sequential id that identifies one of the available reader sources, and any id values left unused in the bit width are filled with Popular Constants. There is no internal structure in the operation and no wasted entropy.
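The id scheme can be sketched as a simple table lookup. Everything concrete here is an assumption made for illustration: the source names, the constant values, the 3-bit id width and the function name decode_reader are not taken from any real Mill member.

```python
# Hypothetical reader sources and constants; real lists are member-specific.
READER_SOURCES = ["src_a", "src_b", "src_c"]
POPULAR_CONSTANTS = [0, 1, -1, 2, 3]


def decode_reader(op_id: int, id_bits: int):
    """Map a reader slot id to either a source read or a constant."""
    if not 0 <= op_id < (1 << id_bits):
        raise ValueError("id does not fit in the id field")
    # The low ids enumerate the reader sources sequentially.
    if op_id < len(READER_SOURCES):
        return ("read", READER_SOURCES[op_id])
    # The ids left over in the bit width are reused for constants.
    const_index = op_id - len(READER_SOURCES)
    if const_index >= len(POPULAR_CONSTANTS):
        raise ValueError("id is not assigned")
    return ("const", POPULAR_CONSTANTS[const_index])


# With 3 id bits there are 8 ids: 3 sources plus 5 constants, none unused.
for op_id in range(8):
    print(op_id, decode_reader(op_id, id_bits=3))
```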
Exu Slots
Exu slots encode all the operations with 2 operands. These operations are structured. They have an opcode whose size depends on the operation population in that slot, so it may be different for every slot in the exu block.
The opcode is followed by 2 morsels for the arguments, each of which may be a belt location or an immediate value. If there is an immediate value, it is usually the 2nd argument. Operations with an immediate argument often get shorter opcodes to gain more bits for the immediate value.
There are also operations that take implicit arguments from neighboring slots, or from condition codes of neighboring slots. In those cases there is an opcode prefix, and the bits otherwise used for the arguments extend the opcode.
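A minimal sketch of splitting such an operation into its fields, under stated assumptions: the field order (opcode in the high bits, then the two morsels), the operation names and the helper names opcode_bits and decode_exu are all hypothetical, and the prefix forms with implicit arguments are not modelled.

```python
def opcode_bits(num_ops: int) -> int:
    """Opcode width needed for a slot that supports num_ops operations."""
    return max(1, (num_ops - 1).bit_length())


def decode_exu(bits: int, slot_ops: list, morsel_bits: int):
    """Split an encoded operation into (operation name, arg1, arg2).

    Assumed layout, from high bits to low bits: opcode | arg1 | arg2.
    """
    mask = (1 << morsel_bits) - 1
    arg2 = bits & mask
    arg1 = (bits >> morsel_bits) & mask
    opcode = bits >> (2 * morsel_bits)
    if opcode >= len(slot_ops):
        raise ValueError("opcode not populated in this slot")
    return slot_ops[opcode], arg1, arg2


# Hypothetical slot population; each slot may support a different set.
slot_ops = ["add", "sub", "mul"]
encoded = (1 << 8) | (3 << 4) | 5  # opcode 1, arguments 3 and 5, 4-bit morsels
print(opcode_bits(len(slot_ops)), decode_exu(encoded, slot_ops, morsel_bits=4))
```

A slot with a smaller operation population gets a narrower opcode field from the same calculation, which is the per-slot variation described above.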
Pick Slots
As with the reader slots, the pick slots do not really need opcodes. An operation consists of just 3 morsels for the 3 belt arguments. There is also a shorter encoding with only two arguments for the picks that produce Nones in one path.
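The two pick encodings can be sketched as follows. The field order, the way the short form is signalled (a plain flag here), the choice of which path yields None, and the names decode_pick and NONE are assumptions for illustration only.

```python
NONE = "None"  # stand-in for the None value of the omitted path


def decode_pick(bits: int, morsel_bits: int, short_form: bool = False):
    """Split a pick operation into (condition, first, second) belt positions.

    Assumed layout, from high bits to low bits: condition | first | second.
    In the assumed short form the second morsel is absent.
    """
    mask = (1 << morsel_bits) - 1
    if short_form:
        first = bits & mask
        cond = (bits >> morsel_bits) & mask
        return cond, first, NONE
    second = bits & mask
    first = (bits >> morsel_bits) & mask
    cond = (bits >> (2 * morsel_bits)) & mask
    return cond, first, second


full = (2 << 6) | (5 << 3) | 1  # three 3-bit morsels
short = (2 << 3) | 5            # two 3-bit morsels
print(decode_pick(full, morsel_bits=3))
print(decode_pick(short, morsel_bits=3, short_form=True))
```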
Writer Slots
Again, as in the reader slots, there is no real need for opcodes here, since the destination fully describes the operation. A writer operation does need an additional morsel for the belt operand, though.