A couple of questions/thoughts.
What are the advantages of using the belt for addressing over direct access to the output registers? Is this purely an instruction density thing?
Why does the split mux tree design prevent you from specifying latency-1 instructions with multiple drops? Couldn’t you just have a FU with multiple output registers feeding into the latency-1 tree? I’m not able to visualize what makes it difficult.
For that matter, how does the second-level mux tree know how to route if the one-cycle mux tree only knows a cycle in advance? It seemed to me like either both mux trees would be able to see enough to route the info, or neither would. Does this have to do with the logical-to-physical belt mapping, because that’s the only thing I can think of that the second-level mux tree would have over the one-cycle tree.