Instruction Encoding

White paper – 2013-08-23

White paper: mill_cpu_split-stream_encoding (.PDF)

The Mill: Split-stream encoding

Real-world programs often thrash in the instruction cache, especially when simultaneous multithreading (SMT) is used. The Mill™ split-stream encoding doubles the effective capacity of the instruction cache with no increase in per-instruction power usage or cache access latency, while also sharply increasing the potential maximal decode rate for instruction sets that use variable-length encoding.

Talk by Ivan Godard – 2013-05-29 at Stanford

NOTE: the slides require genuine Microsoft PowerPoint to view; open-source PowerPoint clones are unable to show the animations, which are essential to the slide content. If you do not have access to PowerPoint, watch the video, which shows the slides as intended.

Slides: PowerPoint (.pptx)

This talk at Stanford EE380 Computer Systems Colloquium

Instruction Encoding

Instructions can be wide, fast to decode and compact

The military maxim “Amateurs study tactics, professionals study logistics” applies to CPU architecture as well as to armies. Less than 10% of the area and power budget of modern high-end cores is devoted to real work by the functional units such as adders; the other 90% marshals instructions and data for those units and figures out what to do next.

A large fraction of this logistic overhead comes from instruction fetch and decode. Instruction encoding has subtle and far-reaching effects on performance and efficiency throughout a core; for example, the intractable encoding used by x86 instructions is why the x86 will never provide the performance/power of other architectures having friendlier encoding.

Some 80% of executed operations are in loops. A software-pipelined loop has instruction-level parallelism (ILP) bounded only by the number of functional units available and the ability to feed them. The limiting factor is often decode; few modern cores can decode more than four instructions per cycle, and none more than 10. The Mill is a new general-purpose CPU architecture that breaks this barrier; high-end Mill family members can fetch, decode, issue and execute over 30 instructions per cycle.
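The decode bound described above can be illustrated with a little arithmetic. The sketch below is a deliberately simplified model, not a description of any real core: it assumes that steady-state throughput of a software-pipelined loop is the smaller of the number of functional units and the decode width, and it ignores data dependencies, memory latency and branch effects. The function names, the 32 functional units and the 30-instruction loop body are hypothetical values chosen only for illustration.

```python
import math


def sustained_instructions_per_cycle(functional_units: int, decode_width: int) -> int:
    """Upper bound on instructions executed per cycle in steady state.

    Simplifying assumption: the loop is limited only by how many
    functional units exist and how many instructions can be decoded.
    """
    return min(functional_units, decode_width)


def cycles_per_iteration(loop_instructions: int, functional_units: int, decode_width: int) -> int:
    """Minimum cycles needed to push one loop iteration through the core."""
    rate = sustained_instructions_per_cycle(functional_units, decode_width)
    return math.ceil(loop_instructions / rate)


if __name__ == "__main__":
    # Hypothetical loop body of 30 instructions on a core with 32 functional units.
    for label, width in [("4-wide decode", 4), ("30-wide decode", 30)]:
        cycles = cycles_per_iteration(30, 32, width)
        print(f"{label}: {cycles} cycle(s) per iteration")
```

Under these assumptions the functional units are idle most of the time on the narrow-decode core, which is the sense in which decode, rather than execution resources, limits the loop.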

This talk explains the fetch and decode parts of the Mill architecture.

Speaker bio

Ivan Godard has designed, implemented or led the teams for 11 compilers for a variety of languages and targets, an operating system, an object-oriented database, and four instruction set architectures. He participated in the revision of Algol68 and is mentioned in its Report, was on the Green team that won the Ada language competition, designed the Mary family of system implementation languages, and was founding editor of the Machine Oriented Languages Bulletin. He is a Member Emeritus of IFIP Working Group 2.4 (Implementation languages) and was a member of the committee that produced the IEEE and ISO floating-point standard 754-2011.

Ivan is currently CTO at Mill Computing, a startup now emerging from stealth mode. Mill Computing has developed the Mill, a clean-sheet rethink of general-purpose CPU architectures. The Mill is the subject of this talk.