The Mill general purpose CPU architecture takes new approaches in most major areas of processor architecture. We have public presentation video recordings for most of the topics listed below, with more to come.
NOTE: the slides require genuine Microsoft PowerPoint to view; open-source PowerPoint clones are unable to show the animations, which are essential to the slide content. If you do not have access to PowerPoint, watch the video instead, which shows the slides as intended.
A major portion of the area and power budget of modern high-end CPU cores is devoted to fetching and decoding instructions, to feed the functional units and to figure out what to do next. The instruction encoding techniques of the Mill CPU architecture allow high-end Mill family members to fetch, decode and issue up to 30 opcodes per cycle, sustained, within a three cycle decode pipeline.
The Belt is the data interchange mechanism for the Mill general purpose CPU architecture, replacing the general registers of other architectures. The Mill’s belt is unique both in its programming model and in its implementation at the micro-architecture level. Destination addressing is implicit, yielding more compact instruction encoding. The Belt is integrated with the function call mechanism; it eliminates caller/callee save conventions and callee pre-/postlude instructions, and it supports multi-result calls naturally. The Belt is single-assignment, so rename registers and pipeline phases are unnecessary.
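The implicit-destination, single-assignment behaviour described above can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the Mill's actual implementation: the `Belt` class, its method names, the belt length of 4, and the convention that operands are named by their position counted from the newest result are all chosen here for illustration.

```python
from collections import deque


class Belt:
    """Toy belt: a fixed-length queue of results (length is an assumption)."""

    def __init__(self, length=4):
        # A bounded deque silently discards the oldest entry when full.
        self._slots = deque(maxlen=length)

    def drop(self, value):
        # Results have no named destination: they always enter at position 0.
        self._slots.appendleft(value)

    def get(self, pos):
        # Operands are read by position; existing entries are never overwritten.
        return self._slots[pos]

    def add(self, a, b):
        # An operation names only its sources; the result is dropped in front.
        self.drop(self.get(a) + self.get(b))


belt = Belt(length=4)
belt.drop(3)        # belt positions: [3]
belt.drop(5)        # belt positions: [5, 3]
belt.add(0, 1)      # belt positions: [8, 5, 3]
print(belt.get(0))  # 8
```

Because every result takes a fresh position and old values simply fall off the far end, no operation in this model ever overwrites a value that another operation might still want to read, which is the property that makes register renaming unnecessary.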
The Mill uses a novel load instruction that tolerates load misses as well as hardware out-of-order approaches do, while avoiding the need for expensive load buffers and completely avoiding false aliasing. In addition, store misses are impossible on a Mill, and a large fraction of the memory traffic of a conventional processor can be omitted entirely.
The Mill uses a novel prediction mechanism; it predicts transfers rather than branches. It can do so for all code, including code that has never been executed, running well ahead of execution so as to mask all cache latency and most memory latency. It needs no area- and power-hungry instruction window, using instead a very short decode pipeline and direct in-order issue and execution.
The Mill conveys some of the semantics of execution in the form of metadata attached to the arguments of operations, in addition to that expressed by the operation encodings in the executed code stream. Metadata propagates through execution, following rules specified by the architecture, although it may be altered explicitly by code when needed.
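As a rough illustration of metadata travelling with operands rather than being encoded in the operation, consider the sketch below. The `Operand` type and its `width` and `valid` fields are hypothetical stand-ins invented for this example, and the propagation rules shown are assumptions; the talk describes the Mill's actual metadata and rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    value: int
    width: int = 32     # hypothetical metadata: operand width in bits
    valid: bool = True  # hypothetical metadata: False marks an unusable value


def add(a: Operand, b: Operand) -> Operand:
    """One generic 'add': behaviour is driven by operand metadata, not by the opcode."""
    width = max(a.width, b.width)   # assumed rule: result takes the wider width
    valid = a.valid and b.valid     # assumed rule: invalidity propagates
    mask = (1 << width) - 1
    return Operand((a.value + b.value) & mask, width, valid)


def widen(a: Operand, width: int) -> Operand:
    """Explicitly alter metadata in code, as the text says is possible when needed."""
    return Operand(a.value, width, a.valid)


byte_sum = add(Operand(200, width=8), Operand(100, width=8))
wide_sum = add(widen(Operand(200, width=8), 16), Operand(100, width=8))
print(byte_sum.value, wide_sum.value)  # 44 300
```

The same `add` gives a wrapped 8-bit result or a full 16-bit result depending only on the metadata its arguments carry, so a single operation encoding can serve several operand widths.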
A perennial objection to wide-issue CPU architectures such as VLIWs and the Mill is that there is insufficient instruction level parallelism (ILP) in programs to make effective use of the available functional width. This talk addresses the ILP issue, describing how the Mill is able to achieve much higher IPC even when the nominal ILP is relatively low.
Software bugs have always been a problem, but in recent years bugs have become an even more serious concern as they are exploited to breach system security for privacy violation, theft, and even terrorism or acts of war. The Mill CPU architecture addresses software robustness in three basic ways. This talk describes some of the Mill CPU features that defend against well-known error and exploit patterns.
The Mill CPU architecture defines a generic Mill processor, from which a family of specific processors can be configured. A particular configuration for a Mill CPU family member is defined by a specification, which is processed by Mill configuration software to build a member-specific assembler, simulator, compiler back-ends, Verilog for the hardware implementation, documentation, and other tools and components.
On a conventional machine, pipelining requires lengthy prelude and postlude instruction sequences to get the pipeline started and wound down, frequently destroying the benefit of pipelining the main body. Mill pipelines have neither prelude nor postlude, and early conditional exit has no added cost.
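To make the prelude and postlude concrete, here is a minimal sketch of a conventionally software-pipelined loop with two stages. The function name and the two-stage split are illustrative assumptions; the code shows the conventional structure the paragraph criticises, not Mill code.

```python
def pipelined_sum_of_squares(data):
    """Two-stage software pipeline: stage 1 loads, stage 2 squares and accumulates."""
    if not data:
        return 0
    total = 0

    # Prelude: fill the pipeline by running stage 1 on its own.
    loaded = data[0]

    # Steady state: each iteration overlaps stage 2 of the previous
    # element with stage 1 of the next element.
    for i in range(1, len(data)):
        total += loaded * loaded  # stage 2 for element i - 1
        loaded = data[i]          # stage 1 for element i

    # Postlude: drain the pipeline by running stage 2 on its own.
    total += loaded * loaded
    return total


print(pipelined_sum_of_squares([1, 2, 3]))  # 14
```

Even in this tiny case the prelude and postlude duplicate parts of the loop body; with deeper pipelines and short trip counts that extra code can outweigh the gain from overlapping the stages.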
The Mill is a new general-purpose CPU architecture family that forms a uniquely challenging target for compilation – and also a uniquely easy target. This talk describes the Mill tool chain from language front end to binary executable.
Multi-way branches, known as switches or case clauses in various languages, are a notorious pain for compiler writers and CPU architects. On the critical path in important applications from lexers to byte-code interpreters, switches often predict poorly. This talk shows how an ultra-wide-issue architecture responds to the switch challenge.
The Mill is a new general-purpose architectural family, with an emphasis on secure and inexpensive communication across protection boundaries. The large (page) granularity of protection on conventional architectures makes such communication difficult compared to communication within a protection boundary, such as a function call. As a result, the large granularity has forced communication protocols on conventional architectures into two models: pass-by-sharing (using shared pages), and pass-by-copy (using the OS kernel for files/message passing). Both have drawbacks: sharing requires difficult-to-get-right synchronization, while copy involves kernel transitions as well as the costs of the copy itself.
The Mill is a new general-purpose CPU architectural family, with novel resource allocation and control facilities that are orders of magnitude less expensive than the equivalents on other CPUs. Critical to this gain is the direct Mill hardware support for threading.
The Mill is a new general-purpose CPU architectural family. The talk will present machine-level details of the Mill support for bigger-than-scalar data.