Forum Replies Created

Viewing 11 posts - 1 through 11 (of 11 total)
  • Author
    Posts
  • hayesti
    Participant
    Post count: 12
    in reply to: Benchmarks #3503

    Ivan: When can we expect SPEC numbers–simulated or otherwise?

  • hayesti
    Participant
    Post count: 12

    I think I understand better now. One op can address a maximum of three belt operands (via the morsels) and this can be chained indefinitely using the special flowArgs op. So in the case of 10 belt operands being passed to EBB-C, the encoding would require one branch op w/ three morsels followed by three flowArg ops w/ three, three and one morsels?
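
    To make that counting concrete, here is a minimal Python sketch of the chunking as I understand it. The limit of three morsels per op comes from the paragraph above; the function name `chunk_belt_args` and the rule that the branch op takes the first group are my own assumptions for illustration, not something taken from the Mill documentation.

```python
def chunk_belt_args(args, morsels_per_op=3):
    """Split a branch's belt-argument list into per-op morsel groups.

    Assumption (from the post, not from Mill documentation): each op holds
    at most `morsels_per_op` belt references, the branch op takes the first
    group, and every following group goes into one flowArgs op.
    """
    groups = [args[i:i + morsels_per_op]
              for i in range(0, len(args), morsels_per_op)]
    branch_group = groups[0] if groups else []
    flowargs_groups = groups[1:]
    return branch_group, flowargs_groups


# Ten belt operands passed to EBB-C (the positions are only an example).
args = ["b4", "b6", "b2", "b2", "b3", "b8", "b0", "b9", "b3", "b11"]
branch_group, flowargs_groups = chunk_belt_args(args)
print(len(branch_group), [len(g) for g in flowargs_groups])  # prints: 3 [3, 3, 1]
```

    Under those assumptions, ten operands need one branch op plus three flowArgs ops, which matches the count above.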

    Could you give some intuition for what the decoder logic looks like? I presume the branch pipeline’s decoder would require logic to buffer N operands (where N is the length of The Belt)? Also, how does this affect branch latency? A single-cycle branch op with N operands would have to rename N belt entries in parallel, correct? Perhaps you anticipate doing this over several cycles; however, I imagine this might hamper ILP, since it would be complicated to produce values onto The Belt while simultaneously renaming those same entries.

    Any further insights would be greatly appreciated.

  • hayesti
    Participant
    Post count: 12

    What would brs("foo", b4, b6, b2, b2, b3, b8, b0, b9, b3, b11) look like in binary form, though? If I understood your video correctly, one Mill VLIW instruction contains three variable-sized blocks, but inside each block every operation is fixed length. How can the BRS operation be fixed size if it needs to encode anywhere from zero to N belt positions as parameters?

  • hayesti
    Participant
    Post count: 12

    Thanks for the extra information. I’m still a little uncertain of some details (apologies because I’m woefully ignorant about many facets of your ISA).

    brtrs(b0 %434, "printf$17_75", b5 %51) ^0,
    brs("printf$17_75", b2 %399) ^0,

    For these forms of branch instructions, what happens to the last argument, i.e. the belt positions? It seems like the values of b5/b2 are moved/copied to b0 before the branch is retired. Is this correct?

    If so, I’m still a little puzzled by the example in the original diagram. Let’s say that EBB-A produces exactly 10 values and pushes them onto The Belt at positions 0-9, and that EBB-C later consumes all of these values. If EBB-B produces exactly one value and pushes it onto The Belt at position 0, then the 10 values produced in EBB-A will be moved to positions 1-10. The question is: how does EBB-C know where to find the values it needs? Are they in positions 0-9 or 1-10? This seems very complicated without spatial addressing. Could you perhaps elaborate a little on how you would resolve this particular scenario?
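
    The shift I am worried about can be reproduced with a small Python sketch. It models The Belt as a fixed-length queue in which every new result lands at position 0; the length of 16 and the value names are assumptions for illustration only, and the model says nothing about how the real hardware works.

```python
from collections import deque


def drop(belt, value):
    """Push a new value at belt position 0; older values move up by one."""
    # A deque with maxlen discards the entry at the far end when it is full.
    belt.appendleft(value)


belt = deque(maxlen=16)  # assumed belt length

# EBB-A produces ten values; the last one produced ends up at position 0.
for i in range(10):
    drop(belt, f"a{i}")
pos_before = {value: pos for pos, value in enumerate(belt)}

# EBB-B produces exactly one value.
drop(belt, "b_result")
pos_after = {value: pos for pos, value in enumerate(belt)}

# Every EBB-A value now sits one position further along the belt.
for i in range(10):
    assert pos_after[f"a{i}"] == pos_before[f"a{i}"] + 1
```

    So the positions EBB-C must use depend on whether control arrived through EBB-B or not, which is exactly the ambiguity in question.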

  • hayesti
    Participant
    Post count: 12

    Well, in the example above I was thinking that EBB-B could modify a variable that is produced in EBB-A and consumed in EBB-C. I wasn’t aware of the branches with arguments. Could you point me to an example of this in the ISA?

    If we imagine EBB-A (and predecessors) producing many results and EBB-C (and successors) consuming said results, the branch from EBB-A to EBB-B may have to pass tens (possibly hundreds) of arguments in order to keep The Belt coherent. How can The Mill’s ISA encode more than three or four without running out of bits?

  • hayesti
    Participant
    Post count: 12

    Belt remapping is done by the flow and exu sides together, because both sides are evaluating or dropping operands concurrently. This takes place in the third decode cycle, a cycle ahead of use. That is, the remapper tracks what will be happening on the belt in the following cycle. Bulk rename (br/call/retn/rescue) arg lists are actually easier than simple belt eval/drop (add, LEA, etc) because they don’t happen until after still another instruction boundary.

    Could you expand on this a little more? Let’s take your example implementation with sixteen Belt positions. If a branch is issued that passes all sixteen Belt positions, what would the logic look like to rename those positions within a single cycle?

    If I understood your video correctly, The Belt is implemented as a dispersed set of registers, each with an associated tag indicating which Belt position it currently represents. To address a particular Belt position, it’s first necessary to perform an associative lookup to determine which physical register is mapped to the specified logical Belt position. It follows that the microarchitecture must be capable of performing sixteen associative lookups in parallel. It would then be necessary to update all sixteen tags in parallel with ascending identifiers in order to re-arrange The Belt before the branch is completed. Is this correct?
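
    Here is a rough Python model of the mechanism I have in mind, purely to pin down the question. The class `TaggedBelt`, its methods, and the rule that the first passed operand becomes position 0 are all my own assumptions; the sequential loops stand in for what would be parallel comparators and tag writes in hardware, and passing the same position twice is not modeled because one register carries only one tag here.

```python
class TaggedBelt:
    """Hypothetical model: physical registers, each tagged with a belt position."""

    def __init__(self, size=16):
        self.values = [None] * size
        # tags[r] is the logical belt position held by physical register r,
        # or None if the register holds no live belt entry.
        self.tags = [None] * size

    def lookup(self, pos):
        """Associative lookup: find the physical register tagged with `pos`."""
        for reg, tag in enumerate(self.tags):
            if tag == pos:
                return reg
        raise KeyError(pos)

    def bulk_rename(self, passed):
        """Retag the registers named by `passed` as positions 0, 1, 2, ...

        All lookups use the old tags before any tag is rewritten, which is
        what a single-cycle parallel rename would have to guarantee.
        """
        if len(set(passed)) != len(passed):
            raise ValueError("duplicate positions would need a copy; not modeled")
        regs = [self.lookup(pos) for pos in passed]
        self.tags = [None] * len(self.tags)  # entries not passed are dropped
        for new_pos, reg in enumerate(regs):
            self.tags[reg] = new_pos


belt = TaggedBelt(size=4)
belt.values = ["w", "x", "y", "z"]
belt.tags = [2, 0, 3, 1]      # register 1 currently holds belt position 0
belt.bulk_rename([3, 0])      # the branch passes old b3, then old b0
print(belt.values[belt.lookup(0)], belt.values[belt.lookup(1)])  # prints: y x
```

    With sixteen passed operands, this model needs sixteen lookups against the old tags followed by sixteen tag writes, all within the same rename step.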

    Is the idea to do these sixteen lookup/updates in parallel within one cycle? That sounds like a fairly complex crossbar with difficult timing constraints. Have you modelled this in RTL? Have you considered how this would scale beyond sixteen registers?

    Thanks in advance.

    • This reply was modified 6 years, 2 months ago by  hayesti.
  • hayesti
    Participant
    Post count: 12

    What if I require carrying 10 values into EBB-C? How is this encoded?
