Belt remapping is done by the flow and exu sides together, because both sides are evaluating or dropping operands concurrently. This takes place in the third decode cycle, a cycle ahead of use. That is, the remapper tracks what will be happening on the belt in the following cycle. Bulk rename (br/call/retn/rescue) arg lists are actually easier than simple belt eval/drop (add, LEA, etc.) because they don’t happen until after still another instruction boundary.
Could you expand on this a little more? Let’s take your example implementation with sixteen Belt positions. If a branch is issued that passes all sixteen Belt positions as arguments, what would the logic look like to rename those positions within a single cycle?
If I understood your video correctly, the Belt is implemented as a dispersed set of registers, each with an associated tag indicating which Belt position it currently represents. To address a particular Belt position, it’s first necessary to perform an associative lookup to determine which physical register is mapped to the specified logical Belt position. It follows that the microarchitecture must be capable of performing sixteen associative lookups in parallel. It would then be necessary to update all sixteen tags in parallel with ascending identifiers in order to rearrange the Belt before the branch completes. Is this correct?
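To make my mental model concrete, here is a rough Python sketch of the scheme as I understand it. It is only my assumption, not the actual Mill design: the names `TaggedBelt`, `NUM_REGS`, `lookup`, `drop` and `bulk_rename` are all hypothetical, the count of 32 physical registers is made up, and I assume a branch never passes the same Belt position twice. The loops stand in for what would be parallel tag comparators and tag writes in hardware.

```python
BELT_SIZE = 16   # logical Belt positions, as in the sixteen-position example
NUM_REGS = 32    # hypothetical number of physical registers (my assumption)


class TaggedBelt:
    """Toy model: physical registers, each tagged with a Belt position."""

    def __init__(self):
        self.value = [None] * NUM_REGS
        self.tag = [None] * NUM_REGS  # None means the register is free

    def lookup(self, pos):
        # Associative lookup: hardware would compare pos against every tag
        # at once; here we simply scan.
        for reg, tag in enumerate(self.tag):
            if tag == pos:
                return reg
        raise KeyError(f"Belt position {pos} is empty")

    def read(self, pos):
        return self.value[self.lookup(pos)]

    def drop(self, val):
        # Ordinary drop: every live tag ages by one, and the operand that
        # would move past the last position falls off the Belt.
        for reg in range(NUM_REGS):
            if self.tag[reg] is not None:
                self.tag[reg] += 1
                if self.tag[reg] >= BELT_SIZE:
                    self.tag[reg] = None
        free = self.tag.index(None)  # at most BELT_SIZE registers are live
        self.value[free] = val
        self.tag[free] = 0

    def bulk_rename(self, args):
        # Branch-style rename: args[i] is the old position that becomes new
        # position i. All lookups finish before any tag is rewritten.
        if len(set(args)) != len(args):
            raise ValueError("this sketch assumes distinct argument positions")
        regs = [self.lookup(pos) for pos in args]
        self.tag = [None] * NUM_REGS  # operands not passed are discarded
        for new_pos, reg in enumerate(regs):
            self.tag[reg] = new_pos
```

With all sixteen positions passed, `bulk_rename` performs sixteen lookups followed by sixteen tag writes, which is the part I would expect to be hard to fit into one cycle.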
Is the idea to do these sixteen lookup/updates in parallel within one cycle? That sounds like a fairly complex crossbar with difficult timing constraints. Have you modelled this in RTL? Have you considered how this would scale beyond sixteen registers?
Thanks in advance.
- This reply was modified 4 years, 11 months ago by hayesti.