Threading has been addressed a little here and there in other topics and presentations, but there hasn’t been a talk explicitly focused on it. There is some NYF material that we do not plan to disclose before product so as to preserve the eventual patent period. Here’s a brief summary of what we can say now, subject to change based on implementation and market experience:
* We will not support simultaneous multi-threading (SMT) initially, and maybe ever. On today’s hardware, adding a whole second core is cheaper than sharing one core between threads, and Mill has a *very* fast process switch if the software doesn’t want to wait.
* Mill uses a shared virtual space model. We will support shared address space on chip but not past the pins. Off chip will use message passing through standard libraries and protocols.
* Atomicity, including multi-core atomicity, is supported using the top-level cache as a pending participant buffer, which appears to the program as a limited-capacity hardware transaction. This isn’t our idea, just an implementation of an approach that IBM has used for years and that is well known in the literature and in IBM documentation. Intel also tried to use the approach but couldn’t get it to work within the quirks of the x86. Given multi-factor atomicity, standard synchronization primitives like semaphores can be provided by a straightforward software library. We expect to provide high-performance unbounded software transactions through a library as well.
* There are no barrier instructions; membars are unnecessary for coherency on a Mill, which is sequentially consistent. Programs that contain true data races must resolve them using the atomicity support.
* Streamers are NYF.
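
To make the atomicity and sequential-consistency bullets above a little more concrete, here is a minimal sketch of how a counting semaphore can be layered on a single atomic read-modify-write. It is not Mill code: the Mill's actual atomicity primitives are not described here, so standard C++ `std::atomic` stands in for them, and the class name `SpinSemaphore` and its methods are hypothetical. The default `std::atomic` memory order is sequentially consistent, so the sketch contains no explicit fences, which is roughly the programming model a sequentially consistent machine would offer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical counting semaphore built on one atomic counter.
// std::atomic is only a stand-in for whatever primitive the hardware provides.
class SpinSemaphore {
public:
    explicit SpinSemaphore(int initial) : count_(initial) {}

    // Take one permit if any is available; never blocks.
    bool try_acquire() {
        int current = count_.load();  // seq_cst by default
        while (current > 0) {
            // On failure, compare_exchange_weak reloads `current`,
            // so the loop re-checks the fresh value.
            if (count_.compare_exchange_weak(current, current - 1)) {
                return true;
            }
        }
        return false;  // no permits left
    }

    // Take one permit, yielding the thread until one is available.
    void acquire() {
        while (!try_acquire()) {
            std::this_thread::yield();
        }
    }

    // Return one permit.
    void release() { count_.fetch_add(1); }

    // Snapshot of the permit count (may be stale under contention).
    int available() const { return count_.load(); }

private:
    std::atomic<int> count_;
};
```

A thread would bracket its critical section with `acquire()` and `release()`; with an initial count of 1 the semaphore behaves as a simple mutex. A production library would block or park waiting threads rather than spin and yield, which this sketch leaves out.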