How ganged operations work

  • Ivan Godard

    A question from comp.arch:

    > If I understood you correctly, ganging is a mechanism at
    > implementation layer. How do you handle carry-in at ISA level? Do you
    > have 3-input-2-output integer addition/subtraction as a standard part
    > of the ISA?

    We do not have a three-input, two-output integer add/sub in the ISA, but we could, and other ops in the ISA have similar behavior. Here’s some detail about how ganging works.

    There are actually two different ISAs involved: the assembly language that the programmer writes (conAsm), and the binary encoding that conAsm becomes. Most CPUs have slight differences between the written notation and the binary: for example, you may write “br”, but if you look at the encoding it is actually “br, always”, where “always” is one of several conditions encoded in the binary. These differences are usually minor notational conveniences, but are more significant on a Mill.

    For example, the operation FMAF (binary fused multiply-add) comes in three-argument (“fmaf(b1, b2, b3)”) and four-argument (“fmaf(b1, b2, b3, b4)”) forms in conAsm. ConAsm is syntactically C++, so assembly language ops are written in C++ function notation. “bN” is a belt-position reference, akin to a register number on a general register machine. The four-argument form produces the sum and difference of two products, a*b ± c*d; the three-argument form is equivalent to the four-argument “fmaf(b1, b2, b3, 1.0)”. That’s the conAsm.

    In the binary, there are also two fmaf operations, but they don’t have 3 or 4 arguments. As on other CPUs, each Mill arithmetic machine operation can have at most two inputs. The binary FMAF has either one or two inputs, but is ganged with an adjacent “EXUARG” operation that supplies two more. The assembler turns the conAsm “fmaf(b1, b2, b3)” into “FMAF(b1), EXUARG(b2, b3)”, for free.
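
As a rough illustration of that lowering step, here is a minimal Python sketch. It is not the Mill assembler: `MachineOp` and `lower_fmaf` are hypothetical names, and the two-and-two split for the four-argument form is an assumption (the post only spells out the three-argument case).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MachineOp:
    """One operation occupying one encoding slot (hypothetical model)."""
    opcode: str
    inputs: Tuple[str, ...]


def lower_fmaf(*args: str) -> List[MachineOp]:
    """Lower a conAsm-style fmaf call into a two-op gang."""
    if len(args) == 3:
        # From the post: fmaf(b1, b2, b3) becomes FMAF(b1), EXUARG(b2, b3).
        head, extra = args[:1], args[1:]
    elif len(args) == 4:
        # Assumption: the four-argument form splits two and two.
        head, extra = args[:2], args[2:]
    else:
        raise ValueError("fmaf takes three or four arguments")
    return [MachineOp("FMAF", tuple(head)), MachineOp("EXUARG", tuple(extra))]


gang = lower_fmaf("b1", "b2", "b3")
print(", ".join(f"{op.opcode}({', '.join(op.inputs)})" for op in gang))
# prints: FMAF(b1), EXUARG(b2, b3)
```

Either way, no single machine op in the result carries more than two inputs, which is the constraint the gang exists to satisfy.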

    We didn’t invent ganging. For example, the way a SPARC builds 32-bit literals requires two operations, which is effectively a sequential gang. As a wide machine, the Mill uses parallel rather than sequential ganging, but it is conceptually similar. The two parts of the Mill gang are closely coupled: on a SPARC, you can use half the 32-bit literal pair by itself if that is what you need bitwise, but on a Mill the machine FMAF must be adjacent to an EXUARG or the hardware will throw an illegalInstruction fault. To the decoders the gang is really two operations; to the execution hardware it is one operation that has been encoded across two encoding slots.
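
For readers who have not seen the SPARC idiom, the sketch below models it in Python. It assumes the usual split: `sethi` supplies the upper 22 bits and a following `or` supplies the low 10. The function names are illustrative only, not SPARC assembler syntax.

```python
from typing import Tuple

LOW_BITS = 10  # sethi fills the upper 22 bits; a second op supplies the low 10
MASK32 = 0xFFFFFFFF


def split_literal(value: int) -> Tuple[int, int]:
    """Split a 32-bit literal into its upper-22 and lower-10 bit fields."""
    value &= MASK32
    return value >> LOW_BITS, value & ((1 << LOW_BITS) - 1)


def sethi(hi22: int) -> int:
    """First op: place the 22-bit field in the upper bits, low bits zero."""
    return (hi22 << LOW_BITS) & MASK32


def or_imm(reg: int, lo10: int) -> int:
    """Second op: merge the low 10 bits into the register."""
    return reg | lo10


hi, lo = split_literal(0xDEADBEEF)
upper_only = sethi(hi)        # usable on its own, unlike half of a Mill gang
full = or_imm(upper_only, lo)
print(hex(upper_only), hex(full))
```

The intermediate `upper_only` value is a legitimate result in its own right, which is the loose coupling the post contrasts with the Mill's must-be-adjacent gangs.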

    Many gangs work like fmaf: the conAsm opcode corresponds to one machine op, and is ganged with EXUARG for additional operands but no other encoded fields or semantic info. However, in some gangs the second operation is not EXUARG but does convey further semantics. For example, the Mill equivalent of a condition code is implemented as a ganged pair of any ordinary arithmetic operation and a niladic (no arguments) predicate operation that tests the condition of interest. The gang drops two results: the normal arithmetic result, the same as if the predicate op were not present, and a boolean that is the result of the test. You write “subs(b1, b2), lss()”; the difference from the subtract drops to the belt, and also a bool that is true if the difference was negative (i.e. less than zero). Any condition-yielding op can be ganged with any predicate op. An arithmetic op can appear without a predicate if you don’t want to test any condition, but a predicate cannot appear without a ganged arithmetic op or the hardware will fault.
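
The pairing rule can be sketched as a toy interpreter. This is only a model under stated assumptions: the belt is a plain Python list indexed by position, `subs` is treated as an unbounded integer subtract (operand width and overflow are ignored), the predicate is assumed to sit in the slot right after its arithmetic op, and `execute` and `IllegalInstruction` are hypothetical names.

```python
from typing import List, Tuple


class IllegalInstruction(Exception):
    """Stand-in for the hardware fault described in the post."""


# Only the ops named in the post; real hardware has many more.
ARITH_OPS = {"subs": lambda a, b: a - b}
PREDICATE_OPS = {"lss": lambda result: result < 0}


def execute(slots: List[Tuple], belt: List[int]) -> List:
    """Run (opcode, *belt_positions) slots and return the dropped results."""
    drops = []
    i = 0
    while i < len(slots):
        opcode, *refs = slots[i]
        if opcode not in ARITH_OPS:
            # Covers a predicate with no ganged arithmetic op in front of it.
            raise IllegalInstruction(f"{opcode} is not ganged with an arithmetic op")
        result = ARITH_OPS[opcode](*(belt[r] for r in refs))
        drops.append(result)  # the normal arithmetic result always drops
        i += 1
        if i < len(slots) and slots[i][0] in PREDICATE_OPS:
            # Ganged predicate: drop a second, boolean result.
            drops.append(PREDICATE_OPS[slots[i][0]](result))
            i += 1
    return drops


belt = [0, 3, 5]  # belt[1] = 3, belt[2] = 5
print(execute([("subs", 1, 2), ("lss",)], belt))  # prints: [-2, True]
print(execute([("subs", 1, 2)], belt))            # prints: [-2]
```

A lone `("lss",)` slot raises `IllegalInstruction` in this model, mirroring the rule that a predicate may not appear without its arithmetic partner.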

    On the exu side of the machine there are no defined gangs with more than two operations, although there could be if we found something useful. On the flow side of the machine many ops take multiple ganged FLOWARG operations so as to support argument lists. Thus “call1(“foo”, b1, b2, b3, b4, b5, b6, b7, b8, b9)” is a multi-argument function call operation in conAsm, but in the encoding it is a gang that bitwise packs all the belt references into a CALL1 operation and several adjacent FLOWARG operations.
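
A sketch of how such an argument list might be spread over slots follows. The post does not say how many belt references fit in one operation, so `REFS_PER_OP` is a made-up parameter, and the assumption that CALL1 and FLOWARG hold the same number is mine; `lower_call` is a hypothetical helper, not the real encoder.

```python
from typing import List, Tuple

REFS_PER_OP = 4  # hypothetical capacity; the real figure is not given in the post


def lower_call(target: str, *belt_refs: str,
               refs_per_op: int = REFS_PER_OP) -> List[Tuple[str, ...]]:
    """Pack a call's belt references into CALL1 plus adjacent FLOWARG ops."""
    # The head op carries the target and the first few references.
    ops = [("CALL1", target) + belt_refs[:refs_per_op]]
    # Each further slot carries the next chunk of references.
    for start in range(refs_per_op, len(belt_refs), refs_per_op):
        ops.append(("FLOWARG",) + belt_refs[start:start + refs_per_op])
    return ops


args = [f"b{n}" for n in range(1, 10)]  # b1 .. b9, as in the example above
for op in lower_call("foo", *args):
    print(op)
```

With nine arguments and four references per slot, this model emits one CALL1 and two FLOWARG ops; a different capacity changes only the number of FLOWARG slots.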
