You wouldn’t do swapping in the core using order tagging. The way that hardware works, there would be a swap delay on every operation, whether it was needed or not, and that would cost you at least 25% of your clock rate.
Ad hoc swapping is rare and is best done with explicit operations. The only important case is when a file has the wrong endianness and the buffer must be swapped, or more generally when an app is written to presume an endianness different from the core's native (little-endian) order. For that, the Mill has a status bit that lets it act like a big-endian machine, swapping on every memory access. This is the same approach used by PPC, MIPS, and some others; it’s not original with us.
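
To illustrate the explicit-swap case (a buffer read from a file of the wrong endianness), here is a minimal sketch in portable C. This is not Mill code: `swap32` and `swap32_buffer` are hypothetical names, and the 32-bit word size is just an assumption for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word (hypothetical helper). */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Swap every word of a buffer in place, e.g. right after reading a
   file whose endianness differs from the host's (hypothetical helper). */
static void swap32_buffer(uint32_t *buf, size_t count)
{
    for (size_t i = 0; i < count; i++)
        buf[i] = swap32(buf[i]);
}
```

The point is that the swap is paid once, at the point where the foreign data enters the program, rather than on every operation in the core.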