multi-cpu memory model (Mill Computing forums, Architecture)
In the wiki, it says that memory is sequentially consistent across cores. Everything happens in order according to the program. Is this in-order according to the genAsm or just the conAsm? Is it guaranteed no reordering happens during conversion to conAsm? It seems like that would prevent some types of optimization. If it doesn’t, are there fence instructions for the genAsm that are “optimized out” in the conAsm?
Also, do you plan to release any details of the atomic model for the Mill in general?
I’d like to know the answer to the opening post, and I’d like to ask a follow-up question about the atomic model. I remember that in a video there seemed to be an instruction to watch a memory address, which you pair with a store. I think the store returns a bool saying whether it succeeded, and it may fail if another core (or another store?) writes to that memory address? Perhaps I remember wrong. It was supposed to allow us to build primitives like compare-and-swap?
However, in the wiki I see enterAtomic/exitAtomic/abortAtomic http://millcomputing.com/wiki/Instruction_Set/enterAtomic I guess the instructions I once saw no longer exist?
- This reply was modified 4 years, 3 months ago by CPUSpeedup.
Those ops support the Mill’s optimistic concurrency model, essentially a bounded hardware transactional memory (HTM). Google for it 🙂 You can use them to implement pessimistic locking primitives like compare-and-swap (CAS), but you will lose the gain available if you used transactions instead.
I think the instructions you sorta remember were a description of the semantics of transactions as implemented in the hardware. The ones in the ISA in the Wiki provide the bounds of the transaction; they work in conjunction with loads and stores that are marked as transaction participants. The sequence is enter -> some loads and stores -> exit, and the changes between enter and exit happen all-or-none atomically.
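To make the enter -> loads and stores -> exit shape concrete, here is a minimal software sketch in Python. Everything in it is hypothetical: `ToyMemory`, `Transaction`, `Aborted` and the version-counter conflict check are illustration only and say nothing about how the Mill hardware actually detects conflicts. It only shows the all-or-none behaviour, plus a CAS layered on top of a transaction.

```python
class Aborted(Exception):
    """Raised when a toy transaction cannot commit."""


class ToyMemory:
    """Hypothetical model: shared memory with a write counter per address."""

    def __init__(self):
        self.data = {}     # address -> value
        self.version = {}  # address -> number of committed writes

    def plain_store(self, addr, value):
        # A non-transactional write, e.g. one arriving from another core.
        self.data[addr] = value
        self.version[addr] = self.version.get(addr, 0) + 1

    def enter(self):
        return Transaction(self)


class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.read_versions = {}  # addr -> version seen at first load
        self.writes = {}         # buffered stores, invisible until exit

    def load(self, addr):
        if addr in self.writes:  # read our own pending store
            return self.writes[addr]
        self.read_versions.setdefault(addr, self.mem.version.get(addr, 0))
        return self.mem.data.get(addr, 0)

    def store(self, addr, value):
        self.writes[addr] = value

    def exit(self):
        # Commit only if nothing we read has been written since we read it.
        for addr, seen in self.read_versions.items():
            if self.mem.version.get(addr, 0) != seen:
                raise Aborted(addr)
        for addr, value in self.writes.items():
            self.mem.plain_store(addr, value)


def compare_and_swap(mem, addr, expected, new):
    """CAS built on the toy transaction; retries if the commit aborts."""
    while True:
        tx = mem.enter()
        try:
            if tx.load(addr) != expected:
                return False  # transaction simply abandoned, nothing written
            tx.store(addr, new)
            tx.exit()
            return True
        except Aborted:
            continue
```

Note that a transaction can update several addresses in one commit, while the CAS wrapper only ever touches one. That is the gain you give up by reducing transactions to CAS.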
Memory consistency models are subtle and confusing. The exact definitions are too detailed and specialized to present here; start with https://en.wikipedia.org/wiki/Consistency_model.
The Mill presents sequential consistency, *not* global consistency. In practice this means that any single thread works as if the machine had only one core. The x86 is almost sequentially consistent, so any algorithm that works on an x86 will work on a Mill, but a few that work on a Mill won’t work reliably on x86.
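One way to see the difference is the classic store-buffering litmus test: each of two threads stores to one variable and then loads the other. The Python sketch below enumerates every execution of that test under two toy models. With `buffered=False`, memory operations take effect in program order (sequential consistency). With `buffered=True`, each thread gets a FIFO store buffer, a much simplified stand-in for x86-style behaviour. The function and the model are assumptions for illustration, not a description of either machine.

```python
# Each op is (kind, address, argument): a store writes `argument`,
# a load reads into the register named by `argument`.
STORE_BUFFERING = (
    (("store", "x", 1), ("load", "y", "r0")),  # thread 0
    (("store", "y", 1), ("load", "x", "r1")),  # thread 1
)


def outcomes(threads, buffered):
    """Return every reachable (r0, r1) pair for the given threads."""
    results = set()

    def step(pcs, mem, bufs, regs):
        progressed = False
        for t, prog in enumerate(threads):
            # Option 1: drain the oldest buffered store of thread t to memory.
            if bufs[t]:
                progressed = True
                addr, val = bufs[t][0]
                new_bufs = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                step(pcs, {**mem, addr: val}, new_bufs, regs)
            # Option 2: execute the next instruction of thread t.
            if pcs[t] < len(prog):
                progressed = True
                op, addr, arg = prog[pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op == "store" and buffered:
                    new_bufs = bufs[:t] + (bufs[t] + ((addr, arg),),) + bufs[t + 1:]
                    step(new_pcs, mem, new_bufs, regs)
                elif op == "store":
                    step(new_pcs, {**mem, addr: arg}, bufs, regs)
                else:
                    # A load sees the thread's own newest pending store first.
                    pending = [v for a, v in bufs[t] if a == addr]
                    val = pending[-1] if pending else mem[addr]
                    step(new_pcs, mem, bufs, {**regs, arg: val})
        if not progressed:  # all threads finished and all buffers drained
            results.add((regs["r0"], regs["r1"]))

    step((0,) * len(threads), {"x": 0, "y": 0}, ((),) * len(threads), {})
    return results


print(sorted(outcomes(STORE_BUFFERING, buffered=False)))
# [(0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(STORE_BUFFERING, buffered=True)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The extra `(0, 0)` result in the buffered model is the kind of outcome an algorithm can rule out on a sequentially consistent machine but has to fence against on one with store buffers.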
That’s the hardware model, which is what you would see when writing in conAsm. The translation from HLL to conAsm is also subject to ordering issues (on any ISA, not just Mill). The Mill architecture is designed to let the compiler do massive reordering and speculation. Any bog-standard out-of-order architecture does the same; the difference is that the Mill’s static design does it at compile time. For both Mill and OOO, reordering and speculation are not intrinsically harmful; what matters is whether the reordering/speculation is visible to program semantics or to a potential attacker.
Unlike other architectures, the great mass of Mill instructions are idempotent: you can execute them in any order consistent with dataflow, and speculate them with abandon. You will know if a compiler bug violates dataflow because your program won’t work, but otherwise you are good to go. The few non-speculable instructions, which are order-sensitive, require special handling.
The Mill compiler is based on LLVM. Languages like C present a single-thread model, and when used in contexts where there is potential language-opaque asynchronous access there are well known examples in the literature where the compiler did something to the code that was not what a naive programmer would expect; see Linus Rants(tm). We are subject to the same issues – there are C semantic issues that no ISA can fix, although liberal use of “volatile” will help.
Once we get the genAsm from LLVM, the translation to conAsm must preserve order semantics and be exploit-free (barring bugs, of course). The general rule is that no non-speculable Mill instruction may alter machine state based on a value read out of program order. The details are many, but the crucial one is that the memory request streams on any single thread are always in program order, and any speculated operation is guarded in such a way that the guard is verified before the instruction alters machine state.
Consequently, the translation may move instructions over branches to speculate them, but only by carrying the branch predicate along as a guard. This gets rid of branch overhead and its attendant risk of misprediction costs.
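As a rough picture of what moving an instruction over a branch with its guard looks like, here is a small Python sketch. The function names are made up for illustration, and the `if guard` inside `guarded_store` merely stands in for a predicated operation (Python has no such thing). The pure computation is hoisted above the branch and may be wasted; the store, which alters state, only happens once the carried predicate has been checked.

```python
def expensive(a, b):
    # Pure arithmetic: no side effects, so it is safe to run early and discard.
    return a * b + 1


def branchy(mem, p, a, b):
    # Original shape: compute and store only inside the branch.
    if p:
        mem["out"] = expensive(a, b)


def guarded_store(mem, addr, value, guard):
    # The guard is verified before any state (here, mem) is altered.
    if guard:
        mem[addr] = value


def if_converted(mem, p, a, b):
    t = expensive(a, b)                    # speculated above the branch
    guarded_store(mem, "out", t, guard=p)  # branch predicate carried as guard


for p in (True, False):
    m1, m2 = {}, {}
    branchy(m1, p, 3, 4)
    if_converted(m2, p, 3, 4)
    assert m1 == m2  # same visible state either way
```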
Atomicity: the Mill uses a conventional optimistic model, with no locking. It works essentially the same as in the IBM Z-series mainframes, and what Intel tried to do but couldn’t get to work (in fairness to Intel, it’s a lot harder to be optimistic in an OOO). We don’t expect to do a video about it, although there will of course be technical documentation whenever we can get to that.