It could be implemented on any arch, yes. The Mill already has turfs for main memory, though, so a key part of the mechanism is already there. IIRC, they also do something different with caching that lets a Mill check whether a process has read/write permissions on cache locations more quickly than x86 can, at least (don’t quote me on that… I need to watch the talk on caching again).
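To make the permission check a bit more concrete, here is a toy Python model of the general idea: each protection domain (turf) is granted rights over address ranges, and an access is allowed only if some grant covers it. This is only a sketch of the concept, not how the Mill hardware actually does it; `Region`, `allowed`, the `READ`/`WRITE` flags, and the addresses are all names and values I made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical permission bits, for illustration only.
READ, WRITE = 1, 2


@dataclass(frozen=True)
class Region:
    turf: int    # hypothetical protection-domain id
    start: int   # first address covered (inclusive)
    end: int     # one past the last address covered (exclusive)
    rights: int  # bitmask of READ / WRITE


def allowed(regions, turf, addr, need):
    """Return True if some region grants `turf` every right in `need` at `addr`."""
    return any(
        r.turf == turf and r.start <= addr < r.end and (r.rights & need) == need
        for r in regions
    )


# Two turfs share the same address range with different rights.
regions = [
    Region(turf=1, start=0x1000, end=0x2000, rights=READ | WRITE),
    Region(turf=2, start=0x1000, end=0x2000, rights=READ),
]
```

With these grants, `allowed(regions, 1, 0x1800, WRITE)` is true and `allowed(regions, 2, 0x1800, WRITE)` is false. Real hardware would do this lookup with dedicated structures rather than a linear scan; the scan just keeps the model short.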
It’s not that this would all have to fit in L1 as opposed to L2 or L3; it’s that physically everything would essentially be L1. I’d imagine that this would make the cache slower (and possibly smaller overall)… With regard to performance, the question is whether the extra control over what stays in cache makes up for those possibly quite significant downsides. I wouldn’t know how to even begin answering that question.