FP rounding modes: the mode is part of the operation itself and is indicated by the mnemonic (one of the choices takes the mode from the PCSW instead). This is useful when you have FP ops with different rounding modes in the same instruction, for example in interval arithmetic.
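To illustrate why per-operation rounding matters for interval arithmetic, here is a rough sketch in Python. It is only an analogy: Python's decimal module stands in for the hardware, the two contexts stand in for two differently-rounded ops, and the names DOWN, UP and interval_add are made up for this example. No actual Mill mnemonics are shown.

```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_FLOOR

# Two contexts that differ only in rounding direction (assumption: 4 digits
# of precision, chosen small so the rounding effect is easy to see).
DOWN = Context(prec=4, rounding=ROUND_FLOOR)
UP = Context(prec=4, rounding=ROUND_CEILING)


def interval_add(a, b):
    """Add two intervals given as (lo, hi) pairs.

    The lower bound is rounded toward -infinity and the upper bound toward
    +infinity, so the result always encloses the exact sum.
    """
    lo = DOWN.add(a[0], b[0])
    hi = UP.add(a[1], b[1])
    return lo, hi


x = (Decimal("1.234"), Decimal("1.235"))
y = (Decimal("0.0001234"), Decimal("0.0001235"))
lo, hi = interval_add(x, y)
print(lo, hi)
```

The point is that the two adds need different rounding directions at the same time. With a single global mode you would have to switch the mode between them; with the mode carried by each operation, both can sit side by side.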
vector casts: there is no such operation as the one in your example; all widen and narrow ops are cardinality-preserving (N->N). We could narrow your four words to four shorts, but what value would you expect in the other four shorts of the result vector? There is, however, a vector narrow that takes two vectors and produces one with half-size elements: 2x4xword -> 8xshort (i.e. 8->8). You can widen or narrow Nones and NaRs like any other data. A narrow that overflows gives you the same truncate/except/saturate choice as any other overflow; the fourth choice, a double-width result, doesn’t make sense when narrowing and doesn’t exist.
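The two-into-one narrow can be modelled roughly as follows. This is a sketch under stated assumptions, not the hardware definition: elements are treated as signed, the first input vector is assumed to land in the low positions of the result, None/NaR handling is left out, and the names narrow_element and narrow2 are hypothetical.

```python
SHORT_MIN, SHORT_MAX = -(1 << 15), (1 << 15) - 1


def narrow_element(value, policy):
    """Narrow one signed 32-bit word to a signed 16-bit short."""
    if SHORT_MIN <= value <= SHORT_MAX:
        return value  # fits, no overflow handling needed
    if policy == "saturate":
        return SHORT_MAX if value > SHORT_MAX else SHORT_MIN
    if policy == "truncate":
        low = value & 0xFFFF  # keep the low 16 bits
        return low - 0x10000 if low & 0x8000 else low  # reinterpret as signed
    if policy == "except":
        raise OverflowError(f"{value} does not fit in a short")
    raise ValueError(f"unknown overflow policy: {policy}")


def narrow2(a, b, policy="saturate"):
    """Narrow two N-element vectors into one 2N-element vector.

    Assumption: elements of `a` come first in the result, then those of `b`.
    """
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    return [narrow_element(v, policy) for v in list(a) + list(b)]


words_a = [1, -2, 70000, -70000]
words_b = [5, 6, 7, 8]
print(narrow2(words_a, words_b, "saturate"))
print(narrow2(words_a, words_b, "truncate"))
```

Eight elements go in and eight come out, so the operation stays cardinality-preserving; only the element width and the number of vectors change.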
Belt timings: all picks, including vector pick, are exu-side encoded. After decode there’s really no such thing as a “side” any more; execution itself is a collection of FU pipes with no particular “side”.