We do it for speed.
“Checking all the bits” is not free; you need an OR tree across the full operand width. That can’t be done in the gap between cycles, where the branch is resolved, or at least can’t be done without an unattractive impact on clock rate. However, the NEQ can be in the same instruction as the BRTR but happen in opPhase, so its one-bit result is available for the branch. The NEQ just makes explicit the timing dependencies that your proposed NEQ/BRTR merge would impose implicitly.
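To put a rough number on the OR-tree cost, here is a toy Python model, not a description of the actual hardware. It assumes two-input OR gates (real implementations may use wider gates and so fewer levels), and the function name is made up for illustration. It reduces the operand's bits pairwise and counts the gate levels needed to produce the single "nonzero" bit.

```python
def or_tree_nonzero(value: int, width: int = 64) -> tuple[int, int]:
    """Return (nonzero_bit, gate_levels) for a pairwise OR reduction.

    Assumption: two-input OR gates, so each level halves the signal count.
    """
    # Split the operand into its individual bits, least significant first.
    bits = [(value >> i) & 1 for i in range(width)]
    levels = 0
    while len(bits) > 1:
        if len(bits) % 2:
            bits.append(0)  # pad an odd-sized level with a constant 0
        # One level of OR gates combines adjacent pairs of signals.
        bits = [bits[i] | bits[i + 1] for i in range(0, len(bits), 2)]
        levels += 1
    return bits[0], levels


if __name__ == "__main__":
    for w in (8, 32, 64):
        bit, depth = or_tree_nonzero(1 << (w - 1), w)
        print(f"width {w}: nonzero={bit}, OR levels={depth}")
```

Under that assumption the depth grows with the logarithm of the operand width, which is the delay that has to fit somewhere. A value that is already one bit needs no reduction at all.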
Predicate gangs also use the same one-bit paths. It costs nothing for an ADD (say) to also yield one-bit values showing how the result compares with zero; many conventional machines do this and set the condition codes from them. A predicate gang selects one of the generated signals and drops it as a boolean, a value known to be one bit, that the BRTR can use.
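As a loose illustration of that idea, the following Python sketch models an ADD that yields its result together with one-bit comparison signals, plus a selector that picks one of them for a branch. All names here (`add_with_flags`, `select_predicate`, `branch_if_true`, and the signal names) are hypothetical and chosen for the example; they are not actual opcodes or encodings.

```python
from typing import NamedTuple


class AddResult(NamedTuple):
    value: int         # the sum, wrapped to the operand width
    is_zero: bool      # one-bit signal: result == 0
    is_negative: bool  # one-bit signal: sign bit of the result


def add_with_flags(a: int, b: int, width: int = 32) -> AddResult:
    """Toy ADD that produces the comparison-with-zero signals as a by-product."""
    mask = (1 << width) - 1
    value = (a + b) & mask
    return AddResult(value, value == 0, bool(value >> (width - 1)))


def select_predicate(result: AddResult, which: str) -> bool:
    """Hypothetical stand-in for a predicate gang: pick one generated signal."""
    signals = {
        "zero": result.is_zero,
        "nonzero": not result.is_zero,
        "negative": result.is_negative,
    }
    return signals[which]


def branch_if_true(pred: bool, target: str, fallthrough: str) -> str:
    """The branch consumes only a one-bit value; no width-wide test is needed."""
    return target if pred else fallthrough


if __name__ == "__main__":
    r = add_with_flags(5, -5)
    print(branch_if_true(select_predicate(r, "zero"), "taken", "not taken"))
```

The point of the sketch is only that the branch sees a value already known to be one bit, so nothing has to reduce a full-width operand at branch time.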
Our approach does use an encoding slot and a belt position that a merged test-and-branch would not. However, it’s more general and costs no added time.