You’re safe in the NYF department, so far.
You are right that the Belt is essentially a code-visible forwarding network and faces the same problems that such networks have on other machines. The Mill partitions the network to get both speed (for some critical results) and volume (overall), in a way that we have filed for.
At the end of the belt talk there are a couple of slides on how the cascaded crossbar works in hardware. Essentially there is a fast path that gets a single one-cycle result from each slot through in time linear in the number of slots (the MIMD width). Everything else first goes through a slow path whose time is (roughly) linear in the number of FU results that can be produced in a cycle. That number can exceed the slot count; remember that a single FU can have ops of different latencies retire together.
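To make the counting concrete, here is a rough sketch of how the two paths scale. It is only a toy model, not the actual hardware: the `Slot` type, the slot names, and the latency lists are all made up for illustration, and it assumes a slot can retire at most one result per distinct latency in a given cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    name: str                   # hypothetical slot name
    latencies: tuple[int, ...]  # assumed latencies of the ops this slot can run


def max_results_per_cycle(slots):
    # Ops of different latencies, issued in different cycles, can retire
    # together, so assume one possible result per distinct latency per slot.
    return sum(len(set(s.latencies)) for s in slots)


def fast_path_inputs(slots):
    # Fast path: at most one one-cycle result from each slot,
    # so this is bounded by the number of slots.
    return sum(1 for s in slots if 1 in s.latencies)


def slow_path_inputs(slots):
    # Everything that is not a one-cycle result takes the slow path first.
    return max_results_per_cycle(slots) - fast_path_inputs(slots)


# An invented four-slot configuration.
slots = [
    Slot("alu0", (1, 3)),
    Slot("alu1", (1, 3)),
    Slot("mul0", (1, 3, 5)),
    Slot("load0", (4,)),
]

print("slots:", len(slots))
print("max results per cycle:", max_results_per_cycle(slots))
print("fast path inputs:", fast_path_inputs(slots))
print("slow path inputs:", slow_path_inputs(slots))
```

In this made-up configuration the fast path never sees more inputs than there are slots, while the slow path absorbs the rest of the results, which is the partition the two paths are meant to exploit.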
Without the cascaded crossbar, the network would have been clock-critical on the larger machines. With cascading, it appears (based on very preliminary hardware work) to be out of the critical path.