I think I understand better now. One op can address a maximum of three belt operands (via the morsels), and this can be chained indefinitely using the special flowArgs op. So in the case of 10 belt operands being passed to EBB-C, the encoding would require one branch op w/ three morsels followed by three flowArgs ops w/ three, three, and one morsels?
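To check my arithmetic, here is a minimal sketch of how I picture the chunking. It only models my reading of the scheme described above, not the actual encoding: the three-morsel limit is taken from this thread, and the function name `split_into_ops` and the tuple layout are hypothetical names I made up for illustration.

```python
# Assumption (from this thread): each op carries at most three morsels,
# and any overflow spills into follow-on flowArgs ops.
MORSELS_PER_OP = 3


def split_into_ops(operands, per_op=MORSELS_PER_OP):
    """Group belt operands into one branch op plus trailing flowArgs ops.

    Hypothetical helper: returns a list of (op_name, morsels) tuples.
    """
    # Slice the operand list into consecutive groups of at most `per_op`.
    chunks = [operands[i:i + per_op] for i in range(0, len(operands), per_op)]
    if not chunks:
        # A branch that passes no operands still needs the branch op itself.
        return [("branch", [])]
    # The first group rides on the branch op, the rest on flowArgs ops.
    return [("branch", chunks[0])] + [("flowArgs", c) for c in chunks[1:]]


# Ten belt operands (identified here simply by position 0..9).
ops = split_into_ops(list(range(10)))
for name, morsels in ops:
    print(name, morsels)
```

Under those assumptions, ten operands come out as one branch op plus three flowArgs ops carrying three, three, three, and one morsels respectively.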
Could you provide some intuition for what the decoder logic looks like? I presume the branch pipeline’s decoder would require logic to buffer N operands (where N is the length of The Belt)? Also, how does this affect branch latency? A single-cycle branch op with N operands would have to rename N belt entries in parallel, correct? Perhaps you anticipate doing this over several cycles; however, I imagine this might hamper ILP, since it would be complicated to produce values onto The Belt while simultaneously renaming said entries.
Any further insights would be greatly appreciated.