Yes, this is good stuff indeed! I also read the paper, and its conclusions about standard CPUs are scary. I also note that current Intel E3, E5, and E7 series server processors include dedicated AES hardware, likely for both performance and security reasons, although I know of no specific Intel claim that their AES hardware reduces vulnerability to timing attacks on the encryption.
I also noted in the paper that the tables could be eliminated by directly performing the operations the tables are meant to replace, which on a Mill could actually be faster than a table lookup (or at least sufficiently fast), and could certainly be fixed latency. I have not looked at the complexity of AES in enough detail to see whether that is indeed the case; it might not be. The AES “performance contest” was held on the CPUs of the day, and the Mill has characteristics that may make different implementations optimal.
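To make the table-free idea concrete, here is a minimal Python sketch of computing the AES S-box directly (inversion in GF(2^8) followed by the affine transform) instead of looking it up. The function names `xtime`, `gmul`, `gf_inverse`, and `sbox` are my own, not from the paper, and Python itself makes no timing guarantees; the point is only that the sequence of operations is the same for every input byte, with no data-dependent branch or memory index. Whether this would beat a table lookup on a Mill is exactly the open question above.

```python
def xtime(a):
    """Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, without branching."""
    # (a >> 7) is 0 or 1; negating it gives an all-zeros or all-ones mask.
    mask = -(a >> 7) & 0x1B
    return ((a << 1) & 0xFF) ^ mask


def gmul(a, b):
    """Multiply two bytes in GF(2^8) with a fixed 8-step shift-and-add loop."""
    result = 0
    for _ in range(8):
        # Add 'a' only if the low bit of 'b' is set, selected by a mask.
        result ^= a & -(b & 1)
        a = xtime(a)
        b >>= 1
    return result


def gf_inverse(a):
    """Compute a^254, which is the multiplicative inverse (and maps 0 to 0)."""
    result = 1
    power = a
    for _ in range(7):
        power = gmul(power, power)      # a^2, a^4, ..., a^128
        result = gmul(result, power)    # exponents sum to 2 + 4 + ... + 128 = 254
    return result


def rotl8(b, n):
    """Rotate a byte left by n bits."""
    return ((b << n) | (b >> (8 - n))) & 0xFF


def sbox(a):
    """AES S-box computed directly: field inverse, then the affine transform."""
    b = gf_inverse(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


print(hex(sbox(0x00)), hex(sbox(0x53)))
```

A straightforward computation like this does far more work per byte than one load, which is why software on conventional CPUs uses tables; the trade-off may look different where the operations are cheap and fixed latency matters more.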
As for a hardware AES box for the Mill, I suspect that dedicated (dynamically configurable) hardware to compute the algorithm may well be a better implementation than using tables in fixed-latency SRAM. At the very least, such an implementation should be investigated, rather than assuming that the implementation chosen for old CPU ISAs will also be optimal for direct hardware.