Its possibly ambiguous, but I don’t think he was being skeptical of the Mill memory model. I think he was being skeptical of there being any business for non-x86 chips even if they are better? That was my reading of that part, anyway.
When I read the article the other day (proggit discussion), I cherry picked some of the technical problems he raised and summarised what the Mill was doing in each area:
- TLBs and TLB misses: translation of addresses is after the cache hierarchy; the cache is in virtual addresses and the memory is shared so there’s no translation needed in IPC and context switches.
- locks: we’re 100% hardware transactional memory (HTM) on which you can build conventional locks; but you can jump ahead and write your code to use HTM directly for a big performance gain
- Syscalls and context-switches: there aren’t any context-switches; we’re Single Address Space (SAS). Syscalls are much much faster, and you aren’t restricted to a ring architecture (hello efficient microkernels and sandboxing!)
- SIMD: the Mill is MIMD. Much more of your code is auto-vectorizable and auto-pipelinable on the Mill, and your compiler will be able to find this parallelism
- Branches: we can do much more load hoisting and speculation. We’re not Out-of-Order (OoO), so its swings and roundabouts and we often come out ahead
Free forward with any technical questions related to the article 🙂