AES, any other crypto, and any block functional unit of any purpose all raise the same issues. Recall that the Mill is a statically-scheduled, fully pipelined machine with exposed timing. Long-latency operations don’t play well on such a machine, and AES takes hundreds to thousands of cycles depending on implementation.
Moreover, on a fully-pipelined machine like the Mill you must be able to issue a new AES every cycle, which means that you need hundreds to thousands of AES engines because the iterative nature of the algorithm doesn’t pipeline.
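To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes each engine is busy for the full latency of one operation and cannot accept another until it finishes; the function name and the sample latencies are illustrative only, not measurements of any real implementation.

```python
import math

def engines_needed(latency_cycles: int, issue_interval_cycles: int = 1) -> int:
    """Non-pipelined engines required to accept one new operation every
    `issue_interval_cycles` cycles when each operation occupies an engine
    for `latency_cycles` cycles (hypothetical helper for illustration)."""
    if latency_cycles < 1 or issue_interval_cycles < 1:
        raise ValueError("latency and issue interval must be at least 1 cycle")
    # While one engine is busy for the whole latency, a new operation arrives
    # every issue interval, so that many operations are in flight at once.
    return math.ceil(latency_cycles / issue_interval_cycles)

if __name__ == "__main__":
    # Assumed latencies spanning the "hundreds to thousands" range.
    for latency in (200, 1000):
        print(latency, "cycles ->", engines_needed(latency), "engines")
```

With an issue every cycle the engine count simply equals the latency in cycles, which is where "hundreds to thousands" comes from.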
Next there are issues with data width. AES works on 128-bit blocks, with 128-, 192-, or 256-bit keys. How would we feed it on Mills that do not support quad width?
There are similar issues with long-latency scheduling too. The compiler will find at most a handful of other operations that it can schedule in parallel with an AES operation, so the rest of the machine will be stalled for the great majority of the time the AES takes. The stall would likely block interrupts as well.
I sympathize with your desire that AES should be supported (and there are quite a few plausible other block functions that you don’t mention). However, I think you are confusing a desire for a primitive, which is a semantic notion, with an operation, which is an implementation notion. AES may make a very good primitive that the market will demand and we should support; it makes a very bad operation. Instead, it should be implemented as an out-of-band block functionality akin to an I/O device in its interface. That way it doesn’t have to fit into the decode/pipeline/belt that the Mill uses, and you only need one of them rather than hundreds.
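As a rough illustration of what "akin to an I/O device" could look like from the software side, here is a toy cycle-level model. Everything in it is hypothetical: the `BlockDevice` class, its `submit`/`tick`/`poll` methods, and the 5-cycle latency are invented for the sketch and do not describe any actual Mill interface. The transform is a placeholder byte flip, not AES.

```python
from collections import deque

class BlockDevice:
    """Hypothetical out-of-band engine: a single unit with fixed latency
    that queues requests instead of occupying a pipeline slot."""

    def __init__(self, latency_cycles: int):
        self.latency = latency_cycles
        self.now = 0
        self.pending = deque()   # requests waiting for the engine
        self.current = None      # (ticket, block) being processed, if any
        self.busy_until = 0
        self.done = {}           # ticket -> finished result
        self.next_ticket = 0

    def submit(self, block: bytes) -> int:
        """Hand a block to the device and return immediately with a ticket."""
        ticket = self.next_ticket
        self.next_ticket += 1
        self.pending.append((ticket, block))
        return ticket

    def tick(self) -> None:
        """Advance the device by one cycle."""
        self.now += 1
        if self.current is not None and self.now >= self.busy_until:
            ticket, block = self.current
            self.done[ticket] = self._transform(block)
            self.current = None
        if self.current is None and self.pending:
            self.current = self.pending.popleft()
            self.busy_until = self.now + self.latency

    def poll(self, ticket: int):
        """Return the result if it is ready, else None. Never blocks."""
        return self.done.pop(ticket, None)

    @staticmethod
    def _transform(block: bytes) -> bytes:
        # Placeholder only: flips every bit. This is NOT AES.
        return bytes(b ^ 0xFF for b in block)

def run_demo():
    dev = BlockDevice(latency_cycles=5)   # assumed latency, for illustration
    ticket = dev.submit(bytes([0x00, 0x01]))
    cycles_of_other_work = 0
    result = None
    while result is None:
        dev.tick()                        # the core keeps issuing other ops here
        cycles_of_other_work += 1
        result = dev.poll(ticket)
    return result, cycles_of_other_work

if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is only the shape of the interface: the core submits a request, carries on with its own statically scheduled work, and picks up the result later, so one engine suffices and nothing in the pipeline has to wait on it.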
It’s easy to think that what appears primitive in software can be primitive in hardware. I wish it were that easy 🙂