Forum Replies Created
- in reply to: Prediction #691
This is obviously one of those details that had to be glossed over in the talk, but the line and instruction counts in the exit table can only refer to one stream. Are there two pairs of counts, one for each stream? Or have you worked out some trick to get by with just one?
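Just to make the question concrete, here is how I'm picturing an exit-table entry with one pair of counts per stream (a Python sketch; the field names are entirely mine and purely hypothetical, not anything stated in the talk):

```python
from dataclasses import dataclass

# Hypothetical layout only: the talk doesn't say whether the exit table
# actually carries separate counts for the exucode and flowcode streams.
@dataclass
class StreamCounts:
    cache_lines: int    # lines to prefetch for this stream
    instructions: int   # half-instructions (bundles) to decode from this stream

@dataclass
class ExitTableEntry:
    target: int         # predicted transfer target
    exu: StreamCounts   # counts for the exucode stream
    flow: StreamCounts  # counts for the flowcode stream
```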
- in reply to: Site-related issues (problems, suggestions) #677
I seem to have lost the ability to edit my forum posts.
Edit: Or maybe not. Is it time related?
- This reply was modified 10 years, 9 months ago by baking.
- in reply to: Site-related issues (problems, suggestions) #679
Still working on it.
The first talk on decoding instructions explains how instructions are split into two streams (exucode and flowcode), with each half-instruction (bundle) composed of a header and three blocks, for a total of six blocks per instruction. The most recent (sixth) talk, on execution, explains how these blocks are decoded and executed in phases. Together, they create a more complete picture of the Mill instruction set.
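Just to keep the structure straight in my head, here is a minimal sketch of that layout (Python; the names are my own, not official Mill terms):

```python
from dataclasses import dataclass
from typing import List

# One instruction = two half-instructions (bundles), one per stream.
# Each bundle = a header plus three blocks, so six blocks per instruction.
@dataclass
class Bundle:
    header: bytes
    blocks: List[bytes]   # exactly three blocks per bundle

@dataclass
class MillInstruction:
    exu: Bundle           # exucode half-instruction
    flow: Bundle          # flowcode half-instruction
```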
In the exucode stream, the three blocks correlate directly to the functional pipelines. (Numbers correspond to the “Gold” member of the Mill family.)
Block 1x: 8 read operations (fill from scratchpad*)
Block 2x: 4 integer or binary floating-point (including multiply) operations
and 4 integer-only (no multiply) operations
Block 3x: 4 pick** operations
and 5 write operations (spill to scratchpad*)
* The only exucode-side read/write operations that we currently know of are the fill and spill scratchpad ops.
** The pickPhase is described as falling between phase 2 and phase 3, but there are numerous reasons to include pick ops in the third block. For one, the pick logic resides in the crossbar that the writePhase ops use to select their inputs. Also, the maximum number of operations in one block is said to be nine.

The flowcode stream is quite different, with only 8 dedicated pipelines compared to 25 for the exucode stream. The reason is that flowcode operations are much larger, due to their in-line address and constant operands, so the code size (and cache size) of the two streams balances out. Another difference is that the flowcode pipelines include functional hardware for multiple phases:
4 pipes that can do either immediate constant, load/store or control transfer (branch, call) operations
4 pipes that can do either immediate constant or load/store operations

Suppose you have an instruction with 8 stores (writePhase), followed by an instruction with 8 loads (opPhase), followed by an instruction with 8 constants (readPhase). Because of the phasing, all 24 operations would hit the 8 pipelines during the last cycle. Would the separate functional units handle it easily, or are there constraints on the mix of flowcode ops to balance out the phasing?
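To check my own understanding of that scenario, here is a rough Python sketch that tallies how many flowcode ops would want a pipe in each cycle. The phase-to-cycle offsets are my reading of the phasing talk (readPhase in the issue cycle, opPhase one cycle later, writePhase two cycles later), not official numbers:

```python
from collections import Counter

# Assumed phase-to-cycle offsets relative to an instruction's issue cycle.
PHASE_OFFSET = {"readPhase": 0, "opPhase": 1, "writePhase": 2}
FLOW_PIPES = 8  # flowcode pipelines on "Gold"

# (issue cycle, phase, op count) for the three instructions in the question.
instructions = [
    (0, "writePhase", 8),  # 8 stores
    (1, "opPhase", 8),     # 8 loads
    (2, "readPhase", 8),   # 8 immediate constants
]

demand = Counter()
for issue_cycle, phase, count in instructions:
    demand[issue_cycle + PHASE_OFFSET[phase]] += count

for cycle in sorted(demand):
    print(f"cycle {cycle}: {demand[cycle]} flow ops for {FLOW_PIPES} pipes")
# Prints: cycle 2: 24 flow ops for 8 pipes -- three ops per pipe in one cycle,
# hence the question about constraints on the mix.
```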
Found this useful explanation in the split-stream encoding white paper:
“if operations with similar encoding requirements are assigned to the same stream, then the bundle or operation formats can be tailored to each stream, saving instruction space and likely reducing decoding difficulty. For example, all operations with an effective-address parameter (including loads, stores, and branches) can be assigned to one of the streams, so the logic to decode an address is not needed in the other decoder.”

Function arguments are specified the same way as for operations, except you can have up to 16 of them. Functions also return any reasonable number of results, just like operations. See the first talk on instruction decoding.
The function gets a brand-new belt with all of its arguments loaded in order. I think that was described in more detail in the second talk.
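As a rough illustration of that model, here is a toy Python sketch of the call semantics as I understand them (the belt length, ordering, and names are my assumptions, not Mill specifics):

```python
# Toy model: a call gives the callee a brand-new belt holding only its
# arguments, in order, and the results drop back onto the caller's belt.
BELT_LENGTH = 16

def drop(belt, values):
    """Drop new values onto the front of a belt; old values fall off the end."""
    return (list(values) + list(belt))[:BELT_LENGTH]

def call(caller_belt, func, arg_positions):
    args = [caller_belt[p] for p in arg_positions]  # up to 16 arguments
    results = func(drop([], args))                  # callee sees a fresh belt
    return drop(caller_belt, results)               # results drop for the caller

# Example: a callee that adds its first two belt operands.
def add2(belt):
    return (belt[0] + belt[1],)

caller = drop([], [3])               # caller's belt: [3]
caller = drop(caller, [4])           # caller's belt: [4, 3]
caller = call(caller, add2, [0, 1])  # pass belt positions 0 and 1
print(caller)                        # [7, 4, 3] -- the sum is at the front
```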
Edit: I’m not sure if I actually answered your question about the mechanics of passing those arguments to functions in the assembly language. I don’t know if that has been specifically mentioned.
- This reply was modified 10 years, 9 months ago by baking.