So I have been taking a class on computer architecture (I am a software guy). The more I learn, the more in awe I am of the beauty of the Mill's instruction encoding and other features.
CISC sucks. It needs millions of transistors in the front end just to decode about 6 instructions per cycle.
RISC is a clear improvement, but the superscalar OoO design is ridiculously complicated. As I learn about the Tomasulo algorithm, wide issue/decode, and speculative execution, I can't help but think "this is insane, there has to be a better way." It feels like the wrong path.
VLIW seems like a more reasonable approach, though I know binary compatibility and stalls have been a challenge for VLIW architectures.
The Mill is just beautiful: it has the sane encoding and simplicity of a VLIW, but phasing and the double instruction stream really take it to the next level.
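To show what I mean by phasing, here's a loose toy model (my own sketch, not actual Mill semantics): operations within one wide instruction execute in successive phases, and results drop onto the front of the belt, so a later-phase op can consume an earlier-phase result from the *same* instruction.

```python
# Toy model of phasing over a belt (my sketch, not real Mill semantics).
# phases: a list of phase-lists; each op is (fn, operand belt positions).
def run_instruction(phases, belt):
    for phase_ops in phases:
        # All ops in a phase read the belt as it stood when the phase began.
        results = [fn(*(belt[i] for i in idxs)) for fn, idxs in phase_ops]
        belt = results + belt   # results drop onto the front of the belt
    return belt

belt = [3, 4]
instr = [
    [(lambda a, b: a + b, (0, 1))],  # compute phase: 3 + 4
    [(lambda a: 2 * a, (0,))],       # later phase: doubles that same result
]
print(run_instruction(instr, belt)[0])  # 14
```

Two dependent ops in one instruction, without any register renaming, which is the part that impressed me.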
The separate load issue and retire is, in hindsight, the obvious way to solve the memory-latency stalls that are so common in VLIW.
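A little cycle-count sketch of why this matters (the latencies and the aliasing scenario are invented for illustration): a classic VLIW compiler can't hoist a load above a store it can't prove doesn't alias, so the full memory latency lands right in front of the consumer; with issue/retire split, the load can be issued early and the hardware checks intervening stores.

```python
LOAD_LATENCY = 10   # assumed miss latency, in cycles
STORE_CYCLE = 5     # a store the compiler can't prove doesn't alias the load

def classic_vliw_use_cycle():
    # The compiler must keep the load after the maybe-aliasing store,
    # so the whole memory latency sits in front of the consuming op.
    load_issue = STORE_CYCLE + 1
    data_ready = load_issue + LOAD_LATENCY
    return data_ready + 1           # consumer executes the next cycle

def deferred_load_use_cycle():
    # Issue/retire split: issue the load at cycle 0 and let it retire
    # LOAD_LATENCY cycles later; hardware snoops intervening stores,
    # so hoisting past the store is safe and other work fills the gap.
    load_issue = 0
    data_ready = load_issue + LOAD_LATENCY
    return max(data_ready, STORE_CYCLE) + 1

print(classic_vliw_use_cycle())   # 17
print(deferred_load_use_cycle())  # 11
```

Same code, same latency, but the deferred version hides most of the miss behind the independent work.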
The branch predictor is so cool too: it can predict several EBBs in advance, even before execution reaches them. Mainstream predictors can't predict a branch until they have fetched it.
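The way I picture it (a hypothetical sketch, with a made-up prediction table): instead of predicting individual branches, you predict one exit per EBB, so the fetcher can just walk the table and prefetch a whole chain of EBBs ahead of execution.

```python
# Hypothetical exit-prediction table: EBB entry address -> predicted
# entry address of the next EBB. Addresses here are invented.
exit_table = {
    0x100: 0x140,   # EBB at 0x100 is predicted to exit to 0x140
    0x140: 0x200,
    0x200: 0x100,   # predicted loop back to the top
}

def predict_chain(entry, depth):
    """Follow predicted exits up to `depth` EBBs ahead, for prefetch."""
    chain = [entry]
    for _ in range(depth):
        nxt = exit_table.get(chain[-1])
        if nxt is None:     # no prediction for this EBB: stop chaining
            break
        chain.append(nxt)
    return chain

print([hex(a) for a in predict_chain(0x100, 3)])
# ['0x100', '0x140', '0x200', '0x100']
```

No branch instruction has even been decoded yet, and the fetcher already knows where to go next, three EBBs out.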
The specializer is a neat solution to binary compatibility: code ships in a target-independent form and gets translated to the concrete family member's encoding at install time.
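A crude sketch of the idea (the packing scheme and member widths are my invention, not how the real specializer works): the same generic op stream gets packed into wide instructions of whatever width the target member actually has.

```python
# Toy "specializer": pack a target-independent op stream into bundles
# sized for a particular member's issue width. Widths are invented.
def specialize(generic_ops, slots):
    """Greedily pack a linear op stream into bundles of `slots` ops."""
    return [generic_ops[i:i + slots] for i in range(0, len(generic_ops), slots)]

ops = ["add", "mul", "sub", "xor", "add", "shl"]
print(specialize(ops, 2))  # a narrow member: three 2-wide bundles
print(specialize(ops, 4))  # a wider member: two bundles, same program
```

One distributed binary, and each member gets code shaped for its own hardware.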
I really hope to see this CPU make it to silicon.