Reply To: multi-cpu memory model (Mill Computing, Inc. Forums, The Mill Architecture)

Ivan Godard

Memory consistency models are subtle and confusing. The exact definitions are too detailed and specialized to present here; start with https://en.wikipedia.org/wiki/Consistency_model.

The Mill presents sequential consistency, *not* global consistency. In practice this means that any single thread works as if the machine had only one core. The x86 is almost sequentially consistent, so any algorithm that works on an x86 will work on a Mill, but a few that work on a Mill won’t work reliably on x86.
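To make the difference concrete, here is a minimal sketch of the classic "store buffering" litmus test. It is written in C++ rather than conAsm, and it uses sequentially consistent `std::atomic` operations to stand in for what a sequentially consistent machine gives you with ordinary loads and stores. The names `SbResult`, `run_store_buffering_once` and `count_forbidden` are made up for this illustration. Under sequential consistency the outcome where both threads read 0 cannot happen; with plain loads and stores on x86, whose store buffers let a load complete before an earlier store to a different address becomes visible, it can.

```cpp
#include <atomic>
#include <thread>

// Values observed by the two threads in one run of the litmus test.
struct SbResult {
    int r1;
    int r2;
};

// Runs the store-buffering test once. All atomic operations use the default
// memory_order_seq_cst, so the C++ memory model forbids r1 == 0 && r2 == 0.
SbResult run_store_buffering_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;
    std::thread t1([&] { x.store(1); r1 = y.load(); });
    std::thread t2([&] { y.store(1); r2 = x.load(); });
    t1.join();
    t2.join();
    return {r1, r2};
}

// Counts how often the forbidden outcome shows up over many runs.
int count_forbidden(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        const SbResult r = run_store_buffering_once();
        if (r.r1 == 0 && r.r2 == 0) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

Calling `count_forbidden(1000)` should return 0. Weakening the operations to `std::memory_order_relaxed` removes that guarantee, which is roughly the position an algorithm is in when it relies on sequential consistency but runs on hardware that does not provide it.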

That’s the hardware model, the one you would see when writing in conAsm. The translation from HLL to conAsm is also subject to ordering issues (on any ISA, not just the Mill). The Mill architecture is designed to let the compiler do massive reordering and speculation. Any bog-standard out-of-order architecture does the same at run time; the difference is that the Mill’s static design does it at compile time. For both Mill and OOO, reordering and speculation are not intrinsically harmful; what matters is whether the reordering or speculation is visible to program semantics or to a potential attacker.

Unlike other architectures, the great mass of Mill instructions are idempotent: you can execute them in any order consistent with dataflow, and speculate them with abandon. You will know if a compiler bug violates dataflow because your program won’t work, but otherwise you are good to go. The few non-speculable instructions, which are order-sensitive, require special handling.

The Mill compiler is based on LLVM. Languages like C present a single-thread model. When such code is used where there may be asynchronous access that the language cannot see, the compiler can do things to the code that a naive programmer would not expect; there are well-known examples in the literature, and see Linus Rants(tm). We are subject to the same issues: there are C semantic issues that no ISA can fix, although liberal use of “volatile” will help.
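As an illustration of the language-level side of this, here is a minimal C++ sketch of one thread publishing a value to another. It is not Mill-specific, and the names `sketch_payload`, `sketch_ready`, `producer`, `consumer` and `publish_and_read` are invented for the example. Since C++11 the standard way to tell the compiler that a variable is accessed asynchronously by another thread is `std::atomic`. With a plain `bool` flag the compiler would be entitled to hoist the load out of the loop or reorder the two stores, because under the single-thread model neither change is observable.

```cpp
#include <atomic>
#include <thread>

int sketch_payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> sketch_ready{false};   // flag the compiler must treat as shared

// Writes the data, then publishes it. The release store keeps the
// payload write from being moved after the flag write.
void producer() {
    sketch_payload = 42;
    sketch_ready.store(true, std::memory_order_release);
}

// Waits for the flag, then reads the data. The acquire load is re-executed
// on every iteration and keeps the payload read from moving above it.
int consumer() {
    while (!sketch_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return sketch_payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
int publish_and_read() {
    sketch_payload = 0;
    sketch_ready.store(false);
    int seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Here `publish_and_read()` returns 42 on any conforming implementation, whatever the strength of the underlying hardware model, because the ordering requirement is stated in the source rather than left for the compiler to guess.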

Once we get the genAsm from LLVM, the translation to conAsm must preserve order semantics and be exploit-free (barring bugs, of course). The general rule is that no non-speculable Mill instruction may alter machine state based on a value read out of program order. The details are many, but the crucial one is that the memory request streams on any single thread are always in program order, and any speculated operation is guarded in such a way that the guard is verified before the instruction alters machine state.

Consequently, the translation may move instructions over branches to speculate them, but only by carrying the branch predicate along as a guard. This gets rid of branch overhead and its attendant risk of misprediction costs.
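The shape of that transformation can be sketched in C++. This is only a source-level model, not conAsm and not the specializer's actual output, and the function names `branchy` and `guarded` are hypothetical. The side-effect-free add is hoisted above the branch and executed unconditionally, while the store, which alters state, stays under the predicate.

```cpp
// Source form: both the add and the store sit behind the branch.
void branchy(bool pred, unsigned a, unsigned b, unsigned& out) {
    if (pred) {
        out = a + b;
    }
}

// Speculated form: the add has no side effects, so it runs whether or not
// pred holds. Only the store changes state, and it carries pred as its guard,
// so the guard is checked before anything visible is altered.
void guarded(bool pred, unsigned a, unsigned b, unsigned& out) {
    const unsigned speculated = a + b;  // executed regardless of pred
    if (pred) {
        out = speculated;               // models a predicated store
    }
}
```

The two functions leave `out` in the same state for every input. On hardware with predicated operations the guarded store needs no branch at all, which is where the saving in branch overhead comes from.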

Atomicity: the Mill uses a conventional optimistic model, with no locking. It works essentially the same way as in the IBM Z-series mainframes, and is what Intel tried to do but couldn’t get to work (in fairness to Intel, it’s a lot harder to be optimistic in an OOO machine). We don’t expect to do a video about it, although there will of course be technical documentation whenever we can get to that.
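For readers who have not met the optimistic style, here is a small software analogy in C++. It is not the Mill's hardware mechanism, which is not documented in this post; it only shows the general pattern of doing the work without a lock, trying to commit, and retrying if another thread got there first. The names `optimistic_add` and `run_counters` are made up for the sketch.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Optimistically adds delta to cell: read a snapshot, compute the new value,
// and commit only if nobody changed the cell in the meantime.
void optimistic_add(std::atomic<long>& cell, long delta) {
    long seen = cell.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak stores the current value into `seen`,
    // so each retry works from a fresh snapshot.
    while (!cell.compare_exchange_weak(seen, seen + delta)) {
    }
}

// Runs several threads that each increment a shared counter with no locks
// and returns the final count.
long run_counters(int threads, int per_thread) {
    std::atomic<long> cell{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&cell, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                optimistic_add(cell, 1);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return cell.load();
}
```

With 4 threads doing 10000 increments each, `run_counters(4, 10000)` returns 40000: no update is lost, and no thread ever blocks waiting for a lock. Conflicts only cost a retry.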