Volatile
The Mill is hardware. While we are retargeting the Clang/LLVM compiler to the Mill, and no doubt will retarget GCC and others in time, we have no control over what the compiler does with your program before our code generator receives it, and even less control over what the languages do or do not define. This note describes our intentions for the hardware and the code generator, for both present and future members of the Mill family.
Mill hardware is inherently sequentially consistent. That is a technical term (see https://en.wikipedia.org/wiki/Sequential_consistency) and does not mean that the hardware imposes any predetermined global ordering. Sequential consistency is a stronger model than those offered by other commercial CPUs. Each active agent (in practice, a core) sees the memory changes made by itself and by other agents in each mutating agent's program order, with the changes of different agents arbitrarily interleaved.
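The classic store-buffering litmus test shows what sequential consistency rules out. In this minimal Python sketch (the program encoding and the function names are ours, for illustration only), two agents each store to one variable and then load the other; enumerating every interleaving that respects each agent's program order yields exactly the outcomes a sequentially consistent machine permits.

```python
# Store-buffering (Dekker-style) litmus test, modeled under sequential consistency:
# every execution is some interleaving of the two agents' program orders.
PROGRAMS = {
    "A": [("store", "x", 1), ("load", "y", "r1")],  # x = 1; r1 = y
    "B": [("store", "y", 1), ("load", "x", "r2")],  # y = 1; r2 = x
}


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one interleaving against a single shared memory; return (r1, r2)."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for op, addr, operand in schedule:
        if op == "store":
            memory[addr] = operand
        else:
            regs[operand] = memory[addr]
    return regs["r1"], regs["r2"]


outcomes = {run(s) for s in interleavings(PROGRAMS["A"], PROGRAMS["B"])}
print(sorted(outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

The outcome (0, 0) never appears: the later of the two loads always follows both stores, so it must read 1. On a CPU with a weaker model, such as x86's total store order, (0, 0) is observable because each store can wait in a store buffer while the following load proceeds.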
The Mill has no memory-op, op-memory, read-modify-write, membar, or bus-locking operations (nor any buses to lock). Instead, it uses cache-based optimistic concurrency control.
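To make cache-based optimistic concurrency control concrete, here is a toy Python model of the general technique, not of actual Mill operations: a transaction records the version of each cache line it reads, buffers its stores, and commits only if none of those lines has changed in the meantime; otherwise it retries. The class and function names, the 64-byte line size, and the per-line version counters are all hypothetical.

```python
# Toy model of line-granularity optimistic concurrency control.
# Hypothetical throughout: this illustrates the general technique, not Mill ops.
LINE_SIZE = 64  # hypothetical cache-line size in bytes


class Memory:
    def __init__(self):
        self.words = {}     # address -> value
        self.versions = {}  # line index -> number of committed writes to that line

    def line(self, addr):
        return addr // LINE_SIZE

    def write(self, addr, value):
        """Ordinary store: bumping the line's version dooms any transaction
        that has read from this line and not yet committed."""
        self.words[addr] = value
        ln = self.line(addr)
        self.versions[ln] = self.versions.get(ln, 0) + 1


class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.read_versions = {}  # line index -> version seen at first read
        self.writes = {}         # buffered stores, invisible until commit

    def load(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        ln = self.mem.line(addr)
        self.read_versions.setdefault(ln, self.mem.versions.get(ln, 0))
        return self.mem.words.get(addr, 0)

    def store(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        """Publish buffered stores only if no line we read has changed."""
        for ln, seen in self.read_versions.items():
            if self.mem.versions.get(ln, 0) != seen:
                return False  # conflict: discard buffered stores
        for addr, value in self.writes.items():
            self.mem.write(addr, value)
        return True


def atomic_increment(mem, addr, interfere=None):
    """Increment mem[addr] without a read-modify-write op; return attempts used."""
    attempts = 0
    while True:
        attempts += 1
        tx = Transaction(mem)
        value = tx.load(addr)
        if interfere is not None:
            interfere(attempts)  # simulate another agent storing mid-transaction
        tx.store(addr, value + 1)
        if tx.commit():
            return attempts


mem = Memory()
mem.write(0x100, 41)


def other_agent(attempt):
    # On the first attempt only, store to a different word in the same line.
    if attempt == 1:
        mem.write(0x104, 7)


tries = atomic_increment(mem, 0x100, interfere=other_agent)
print(mem.words[0x100], tries)  # 42 2
```

Because this model detects conflicts per line rather than per word, the store to 0x104 forces a retry even though it never touches 0x100; such false sharing is a natural consequence of line-granularity detection.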
There are some caveats to this, which are defined in our (incomplete and as yet unpublished) ABI:
"volatile" is an attribute attached to particular memory-reference actions in the intermediate representation (genAsm) that the compiler provides to the code generator. It then becomes a flag bit attached to memory-reference operations in the member-specific binary encoding (conAsm). The flag is in turn attached to the requests that the load/store functional units create and pass down the memory hierarchy. A "volatile access" is such a request with the volatile bit set.
A volatile access that crosses a cache-line boundary will fault. A naturally aligned access never crosses a line boundary.
A volatile access to an object larger than the largest operand size supported by the target Mill member will receive a code-generator diagnostic; an example is a volatile access to a quad (16-byte) object on a target that does not support native quads.
A volatile access using a deferred load (of either kind) is generally uncodeable and will receive a code-generator diagnostic; if it is codeable on a particular member, it will fault.
A volatile access that is part of the participant set of an optimistic-atomic group (sometimes called a transaction) is generally uncodeable and will receive a code-generator diagnostic; if it is codeable on a particular member, it will fault.
A volatile access that is within an optimistic-atomic group but is not a participant executes as a normal volatile access on each retry.
The code generator does not reorder volatile accesses with respect to one another; it honors the compiler-supplied ordering. Volatile accesses may, however, be arbitrarily reordered with respect to other operations, consistent with the dataflow and control flow of the program.
Volatile accesses are not speculated, and are not executed unless the actual program control flow requires them.
Volatile accesses may issue across a misprediction, but without hardware-visible side effects until the prediction is verified.
A volatile access to a virtual address within the target-member-defined MMIO space behaves the same as a non-volatile access to that space: a store is not cached, and a load does not check for a cached value. Individual entities within MMIO space may constrain accesses beyond the usual alignment restrictions.
A volatile access is cached normally unless the access is marked uncacheable or is to MMIO space. Both cacheable and uncacheable volatile accesses behave identically to the corresponding non-volatile accesses. An uncacheable store updates the cache normally and is then followed by a succession of evictions to memory, which removes the line from all caches; an uncacheable load invalidates the requested bytes at each cache level, is then satisfied from memory, and the result is cached normally.
Volatile accesses to different virtual addresses that alias through the page mapping (that is, that map to the same physical location) are ordered as if they used the same virtual address.
A volatile load from backless memory that is not satisfied in cache receives zero, just as a non-volatile load would.
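The fault and diagnostic rules for volatile accesses can be summarized as a small decision function. This is a minimal Python sketch under assumed parameters: the 64-byte line size, the 8-byte maximum operand size, and the `codeable` flag are hypothetical stand-ins for member-specific properties, and the names are ours for illustration, not part of the ABI.

```python
from dataclasses import dataclass

# Hypothetical member parameters; real values are member-specific.
LINE_SIZE = 64        # cache-line size in bytes
MAX_NATIVE_SIZE = 8   # a member without native quad (16-byte) support


@dataclass
class VolatileAccess:
    address: int
    size: int                        # bytes
    deferred_load: bool = False      # uses a deferred load (either kind)
    group_participant: bool = False  # in an optimistic-atomic group's participant set


def crosses_line(address, size, line_size=LINE_SIZE):
    """True if bytes [address, address + size) touch two cache lines."""
    return address // line_size != (address + size - 1) // line_size


def classify(access, codeable=False, max_size=MAX_NATIVE_SIZE):
    """Return 'diagnostic', 'fault', or 'ok' for one volatile access.

    `codeable` is a hypothetical stand-in for whether the target member can
    encode a volatile deferred load or group participant at all."""
    # Code-generator checks.
    if access.size > max_size:
        return "diagnostic"
    if access.deferred_load or access.group_participant:
        return "fault" if codeable else "diagnostic"
    # Hardware check on the actual address.
    if crosses_line(access.address, access.size):
        return "fault"
    # Includes non-participant accesses inside a group, which simply
    # execute as normal volatile accesses on each retry.
    return "ok"


print(classify(VolatileAccess(0x38, 8)))   # ok: bytes 0x38..0x3F are in one line
print(classify(VolatileAccess(0x3C, 8)))   # fault: bytes 0x3C..0x43 span two lines
print(classify(VolatileAccess(0x40, 16)))  # diagnostic: wider than MAX_NATIVE_SIZE

# Natural alignment suffices: aligned power-of-two accesses never cross a line.
assert all(not crosses_line(a, s) for s in (1, 2, 4, 8, 16) for a in range(0, 512, s))
```

The final assertion reflects why natural alignment is enough: when the line size is a multiple of the access size, an aligned access always fits within a single line.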
The effect of these rules is that a Mill volatile access is guaranteed to comprise a single atomic action that is executed in program order, and executed if and only if the actual program control flow requires it. Volatile is not required to access devices safely (that is determined by an address in the MMIO region), nor does volatile alone guarantee that external memory is actually accessed (that is determined by the "uncached" flag).
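The difference between volatile and the uncacheable flag can be seen in a deliberately simplified Python model with one cache level in front of memory. It is a sketch only: the class, its per-word (rather than per-line) granularity, and the single cache level are assumptions for illustration, and MMIO space is not modeled.

```python
class ToyHierarchy:
    """One cache level in front of memory, tracked per word for brevity.

    Hypothetical simplification for illustration; MMIO space is not modeled."""

    def __init__(self):
        self.cache = {}   # address -> value held in cache
        self.memory = {}  # address -> value held in external memory

    def store(self, addr, value, volatile=False, uncacheable=False):
        # `volatile` is deliberately ignored: it governs ordering, atomicity and
        # speculation, not where the data ends up.
        self.cache[addr] = value  # the cache is updated normally
        if uncacheable:
            # Evict to memory, removing the value from the cache.
            self.memory[addr] = self.cache.pop(addr)

    def load(self, addr, volatile=False, uncacheable=False):
        if uncacheable:
            self.cache.pop(addr, None)  # invalidate any cached copy first
        if addr not in self.cache:
            # Satisfy from memory (unwritten words read as zero here), then cache.
            self.cache[addr] = self.memory.get(addr, 0)
        return self.cache[addr]


h = ToyHierarchy()
h.store(0x10, 1, volatile=True)                    # stays in the cache only
h.store(0x20, 2, volatile=True, uncacheable=True)  # evicted to memory
print(sorted(h.cache), sorted(h.memory))  # [16] [32]
```

The `volatile` argument is accepted but has no effect on where data goes; in this model only `uncacheable` determines whether external memory is actually reached.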