The Mill has separate operations for pointers, distinct from those used for integral operands of the same length. Three masks are under application control. Two are eight bits wide and are used by normal load and store operations: the three-bit GC field in the address indexes a bit in the mask, and the operation traps if that bit is set. For storep (store pointer), the GC bits from the address and the GC bits in the pointer being stored are concatenated to index a bit in a 64-bit mask, again trapping if the bit is set.
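The mask checks can be modeled in software to make the indexing concrete. The following is only a rough sketch, not the Mill's actual encoding: the position of the GC field (assumed here to be the top three bits of a 64-bit address), the order of concatenation (address bits assumed to be the high half of the index), and all names (`GCTrap`, `check_access`, `check_storep`, `make_pointer`) are hypothetical, and a Python exception stands in for the hardware trap.

```python
# Software model of the mask checks described above.
# ASSUMPTION: the 3-bit GC field sits in the top bits of a 64-bit address.
GC_SHIFT = 61
GC_FIELD = 0b111


class GCTrap(Exception):
    """Stands in for the hardware trap (hypothetical name)."""


def make_pointer(gc: int, offset: int) -> int:
    """Build an address whose GC field is `gc` (illustrative helper)."""
    return ((gc & GC_FIELD) << GC_SHIFT) | offset


def gc_bits(pointer: int) -> int:
    """Extract the 3-bit GC field from an address or pointer value."""
    return (pointer >> GC_SHIFT) & GC_FIELD


def check_access(mask8: int, address: int) -> None:
    """Normal load or store: the GC field selects one of 8 mask bits."""
    if (mask8 >> gc_bits(address)) & 1:
        raise GCTrap("load/store mask bit set")


def check_storep(mask64: int, address: int, pointer: int) -> None:
    """storep: 3 address bits + 3 pointer bits select one of 64 mask bits.

    ASSUMPTION: the address bits form the high half of the 6-bit index.
    """
    index = (gc_bits(address) << 3) | gc_bits(pointer)
    if (mask64 >> index) & 1:
        raise GCTrap("storep mask bit set")
```

As a usage illustration, a generational collector might tag old-generation addresses with GC field 1 and young-generation addresses with GC field 0, then set the single bit at index `(1 << 3) | 0` in the 64-bit mask. Under the assumptions above, storing a young pointer into an old object would then trap, while every other combination would proceed unchecked by software.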
Explicitly coded stack barriers are unnecessary on a Mill given this support; the hardware does the barrier checking. We do not have measurements of the resulting gain yet, but expect it to be significant given how frequently GC languages store pointers.