I think it is really important to (a) have control over rounding modes and (b) not make them global state. Global state hurts performance, makes calling functions tricky, and forces every context switch to save and restore it.
I favor explicit rounding modes in the instructions. That is, the default is round to nearest (as IEEE 754 specifies), but each individual instruction can carry a few bits that force rounding toward plus infinity, toward minus infinity, or toward zero. Random rounding support is definitely optional. This can be implemented with a fixed 2-bit field if only the standard modes are to be supported, or with 1+2 bits: a first bit of 0 indicates round to nearest, and 1 indicates a non-default rounding, with the two remaining bits selecting which other mode to use. The second scheme would of course make instruction decoding a bit harder and make the instruction length variable.
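To make the two encodings concrete, here is a minimal Python sketch of a decoder for each. The bit assignments, the `Rounding` enum, and the function names are all my own assumptions for illustration; nothing here is a proposed final encoding.

```python
from enum import Enum


class Rounding(Enum):
    NEAREST = 0
    TOWARD_POS_INF = 1
    TOWARD_NEG_INF = 2
    TOWARD_ZERO = 3
    RANDOM = 4  # optional, non-standard mode


def decode_fixed(field: int) -> Rounding:
    """Option A: a fixed 2-bit field covering the four standard modes.

    The value-to-mode mapping is an assumed one (0 = nearest, and so on).
    """
    return Rounding(field & 0b11)


# Option B: modes selectable by the two extra bits (assumed ordering).
EXTENDED = (
    Rounding.TOWARD_POS_INF,
    Rounding.TOWARD_NEG_INF,
    Rounding.TOWARD_ZERO,
    Rounding.RANDOM,
)


def decode_variable(bits):
    """Option B: 1 flag bit, followed by 2 selector bits only if the flag is 1.

    `bits` is a sequence of 0/1 integers. Returns (mode, bits_consumed).
    """
    if bits[0] == 0:
        return Rounding.NEAREST, 1  # common case costs a single bit
    selector = (bits[1] << 1) | bits[2]
    return EXTENDED[selector], 3
```

The second decoder returns how many bits it consumed, which is exactly the variable-length cost mentioned above: the decoder does not know the field width until it has looked at the first bit.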
x87, SSE, SSE2, and AVX all use global control bits (in the x87 control word or MXCSR) to select the rounding mode. AVX-512 can encode the rounding control in the instruction itself. This can make code considerably faster, because there is no global state to manage: if you switch rounding modes frequently (e.g. for interval arithmetic), switching costs nothing at all.
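As a software analogy only (this is not how the hardware works), Python's `decimal` module lets you pass the rounding mode with each operation instead of changing the thread's current context. The sketch below does an interval addition that way; the 4-digit precision and the `interval_add` helper are assumptions chosen for the example.

```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_FLOOR

# One context per directed rounding mode. Neither is installed as the
# thread's current context, so no shared state is modified.
DOWN = Context(prec=4, rounding=ROUND_FLOOR)
UP = Context(prec=4, rounding=ROUND_CEILING)


def interval_add(a, b):
    """Add two intervals (lo, hi), rounding the bounds outward."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    # Lower bound rounds toward minus infinity, upper bound toward plus
    # infinity; the mode travels with each operation.
    return DOWN.add(a_lo, b_lo), UP.add(a_hi, b_hi)


x = (Decimal("1.234"), Decimal("1.235"))
y = (Decimal("0.0001"), Decimal("0.0002"))
lo, hi = interval_add(x, y)
print(lo, hi)  # 1.234 1.236
```

With a global mode, the same function would need to set the mode twice and restore the caller's mode afterwards, which is the overhead that per-instruction rounding avoids.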