ivan
Branch operations can contain an explicit delay, the way loads can; a delay of zero may be omitted, indicating an immediate branch, which takes effect in the following cycle, as described. A delay of one takes effect in the cycle after that, and so on. As a result, there may be several branches in flight at the same time. All of them will occur, each in its respective cycle.
Instructions may contain several different branch operations. For sensible code, these will normally use different predicates and there will normally be at most one unconditional branch. All predicates are evaluated, and any branches with unsatisfied predicates are ignored.
For any given cycle, there will be a (possibly empty) set of branches which are due to retire in that cycle; some with zero delay from the current instruction, and some from prior instructions that are at last timing out. One of these retiring branches wins, according to the First Winner Rule. FWR says that shorter delay beats longer delay, and for equal delay higher slot number beats lower slot number. Operations in asm are packed into slots in reverse textual order, so slot zero is on the right of the instruction as written. Consequently you can read the code and the textually first branch that is satisfied will be the one taken; hence First Winner Rule.
As an example:
if (a == 0) goto A0;
else if (a == 1) goto A1;
else goto Arest;
codes as
eql(<a>, 0), eql(<a>, 1);
brtr(b0, "A0"), brtr(b1, "A1"), br("Arest");
With phasing (due in the 2/5 talk) the code can be reduced to a single instruction rather than the two shown here. I use goto for simplicity here, but the same structure is used for any branches.
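The First Winner Rule as described above can be modeled in a few lines of C++. This is only a sketch for reading the rule, not how the hardware or the Mill tools implement it: the Branch struct, its field names, and the functions retireCycle and firstWinner are all hypothetical, and the model assumes a branch issued in cycle c with delay d retires in cycle c + d and transfers control for the following cycle.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one branch operation; names are illustrative only.
struct Branch {
    int issueCycle;      // cycle in which the containing instruction issued
    int delay;           // explicit delay; 0 means an immediate branch
    int slot;            // slot number; slot 0 is rightmost in the asm text
    bool predicate;      // true if the condition is satisfied (always true for br)
    std::string target;  // label to transfer to
};

// Assumption: a branch retires `delay` cycles after issue and takes
// effect in the cycle after it retires.
int retireCycle(const Branch& b) { return b.issueCycle + b.delay; }

// Returns the winning branch among those retiring in `cycle`,
// or nullptr if no satisfied branch retires in that cycle.
const Branch* firstWinner(const std::vector<Branch>& inFlight, int cycle) {
    const Branch* winner = nullptr;
    for (const Branch& b : inFlight) {
        // Unsatisfied predicates are ignored; so are branches due in other cycles.
        if (!b.predicate || retireCycle(b) != cycle) continue;
        // Shorter delay beats longer delay; for equal delay the higher slot wins.
        if (winner == nullptr || b.delay < winner->delay ||
            (b.delay == winner->delay && b.slot > winner->slot)) {
            winner = &b;
        }
    }
    return winner;
}
```

For the three-way example with a == 1, the branches would be modeled as slot 2 (brtr to "A0", predicate false), slot 1 (brtr to "A1", predicate true) and slot 0 (br to "Arest", always true), and firstWinner picks "A1", the textually first satisfied branch.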
Loops are no different:
while (--i != 0) { a += a; }
can be encoded as:
L("loop");
sub(<i>, 1);
eql(<sub>, 0);
brtr(<eql>, "xit");
add(<a>, <a>), br("loop");
but would be better coded as:
L("loop");
sub(<i>, 1);
eql(<sub>, 0);
add(<a>, <a>), brfl(<eql>, "loop");
//falls through on exit
Yes, the second form does the add an extra time on the last iteration, but the former value is still on the belt and is the correct value for "a" at the end. The belt permits optimizations like this that are not possible if the a+=a was updating a register. Combine this optimization with phasing and a NYF operation and the whole loop is one instruction.
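To see why the extra add is harmless, here is a small C++ sketch that compares the source loop with the second encoding. It is a simplification under stated assumptions: the belt is modeled only as the history of values produced by the add (a real belt has fixed length and also holds the sub and eql results), i is assumed to be at least 1, and the names referenceLoop, beltLoop and BeltResult are made up for the illustration.

```cpp
#include <vector>

// Reference semantics of the source loop.
long referenceLoop(long a, long i) {
    while (--i != 0) { a += a; }
    return a;
}

struct BeltResult {
    long newest;    // result of the final, extra add
    long previous;  // the value of "a" dropped before it
};

// Second encoding: sub; eql; add and brfl in the same instruction.
// Assumes i >= 1 and values small enough not to overflow.
BeltResult beltLoop(long a, long i) {
    std::vector<long> adds{a};  // successive values of "a", oldest first
    for (;;) {
        i = i - 1;                                   // sub(<i>, 1)
        bool done = (i == 0);                        // eql(<sub>, 0)
        adds.push_back(adds.back() + adds.back());   // add(<a>, <a>) always executes
        if (done) break;                             // brfl falls through when eql is true
    }
    return {adds.back(), adds[adds.size() - 2]};
}
```

In this model beltLoop(a, i).previous equals referenceLoop(a, i) for every i >= 1: the extra add only drops one more value, and the earlier one is still available to the code after the loop.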
I'm not sure what a flat-out OOO superscalar would do with this code, but it clearly would not be better than the Mill :-)
ivan
From comp.arch, on how the Mill operation specification machinery works:
On 5/20/2016 8:04 PM, Ivan Godard wrote:
(in the "instruction decoding perfected" thread, to deprecate manual instruction bit-layout):
> In our spec machinery you supply traits for each encodable attribute.
> One of those traits tells the bit-allocator when the decoder will need
> the value. An attribute can be "pinned", in which case it gets its own
> bitfield (which need not be contiguous) in every operation in the slot,
> needed or not; or "direct", in which case it gets its own field, but
> that field is only in the ops that use that attribute and may be in
> different location in different ops; or "merged", in which case it is
> cross-producted with other attribute to make a single larger value-set
> that is encoded as if it were an attribute itself; or "uncoded" in which
> case it doesn't appear in the binary at all but is carried metainfo for
> the specification machinery.
People have occasionally asked me to post Mill details. In case anyone is interested, as a p.s. on how specification-driven layout works, here's a cut from the actual source code that declares the operation attributes for Mills. The enumerations for those attributes that are enums are declared elsewhere. An example is:
enum directionCode {
leftward,
rightward
};
which attribute is used in shifts and a few other operations. As you see below, the value of this attribute is given (in conAsm) by the operation mnemonic rather than by an argument ("byMnemonic"); is encoded as part of a cross-product with other attributes ("merged"); and has the same meaning and potential valueset across all Mills ("universal").
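As an illustration of what "merged" means, the sketch below packs several small attributes into one mixed-radix value and compares the bit cost with giving each attribute its own field. It shows cross-product encoding in general, not the actual Mill bit-allocator, which may for instance enumerate only the valid combinations. The function names and the value counts used in the note after the code are invented for the example.

```cpp
#include <cstddef>
#include <vector>

// Bits needed to distinguish `count` different values (count >= 1).
unsigned bitsFor(unsigned long count) {
    unsigned bits = 0;
    while ((1UL << bits) < count) ++bits;
    return bits;
}

// Pack one value per attribute into a single mixed-radix number.
// radices[k] is the number of values attribute k can take.
unsigned long mergeAttrs(const std::vector<unsigned>& values,
                         const std::vector<unsigned>& radices) {
    unsigned long merged = 0;
    for (std::size_t k = 0; k < values.size(); ++k)
        merged = merged * radices[k] + values[k];
    return merged;
}

// Inverse of mergeAttrs: recover the per-attribute values.
std::vector<unsigned> splitAttrs(unsigned long merged,
                                 const std::vector<unsigned>& radices) {
    std::vector<unsigned> values(radices.size());
    for (std::size_t k = radices.size(); k-- > 0;) {
        values[k] = static_cast<unsigned>(merged % radices[k]);
        merged /= radices[k];
    }
    return values;
}
```

With three hypothetical attributes of 2, 3 and 5 values, separate fields cost bitsFor(2) + bitsFor(3) + bitsFor(5) = 1 + 2 + 3 = 6 bits, while the merged cross-product has 30 values and fits in bitsFor(30) = 5 bits.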
To add a new attribute the architect must declare the enum (if there is one), add a line for the attribute to the declares below, and add a traits specification for the possible values; here's the traits for directionCode:
const
attrTraits
attrTraits::insts[intTraits::count] =
{
attrTraits(leftward, "l",
"toward greater significance"),
attrTraits(rightward, "r",
"toward lesser significance")
};
This gives the text that will be used in the mnemonic in conAsm to indicate the desired value ("l"/"r"), plus a short description that is used in online help and other documentation. For a simple attribute like this, the whole process takes around 15 minutes, including adding an operation that uses the attribute and rebuilding the world to generate the new binary encodings.
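A minimal sketch of how such a traits table could be consulted when a mnemonic is assembled follows. AttrTraitsSketch is a hypothetical stand-in, not the real attrTraits class, and mnemonicFor, describe and the base name "shift" in the note below are assumptions made for the illustration; only the enum values, the "l"/"r" text and the descriptions come from the declarations above.

```cpp
#include <string>

enum directionCode { leftward, rightward };

// Hypothetical stand-in for the real attrTraits class.
struct AttrTraitsSketch {
    directionCode value;       // the enumeration value described
    const char* mnemonicText;  // text contributed to the conAsm mnemonic
    const char* description;   // short text for help and documentation
};

const AttrTraitsSketch directionTraits[] = {
    {leftward, "l", "toward greater significance"},
    {rightward, "r", "toward lesser significance"},
};

// Append the attribute's mnemonic text to a base operation name.
std::string mnemonicFor(const std::string& base, directionCode d) {
    for (const AttrTraitsSketch& t : directionTraits)
        if (t.value == d) return base + t.mnemonicText;
    return base;  // no traits entry: leave the base name unchanged
}

// Look up the documentation string for a value.
std::string describe(directionCode d) {
    for (const AttrTraitsSketch& t : directionTraits)
        if (t.value == d) return t.description;
    return "";
}
```

Under these assumptions, mnemonicFor("shift", leftward) yields "shiftl" and describe(rightward) yields "toward lesser significance".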
Here's the specs for all instruction attributes used in the Mill as specified 2016/05/21.
declareEnumeratedAttrTraits(accessCode, byMnemonic, merged, universal);
declareEnumeratedAttrTraits(awarenessCode, byMnemonic, merged, universal);
declareNumericAttrTraits(base0Code, byParam, uncoded, byMember);
declareEnumeratedAttrTraits(basedCode, byMnemonic, merged, universal);
declareNumericAttrTraits(bit0Code, byParam, direct, byMember);
declareNumericAttrTraits(bit1Code, byParam, direct, byMember);
declareEnumeratedAttrTraits(blockCode, byMnemonic, uncoded, universal);
declareEnumeratedAttrTraits(ccGenCode, byMnemonic, uncoded, universal);
declareEnumeratedAttrTraits(conBytesCode, byDerivation, pinned, universal);
declareNumericAttrTraits(con0Code, byParam, uncoded, universal);
declareEnumeratedAttrTraits(condSenseCode, byMnemonic, merged, universal);
declareEnumeratedAttrTraits(conSet, byMnemonic, uncoded, universal);
declareNumericAttrTraits(count0Code, byParam, direct, byMember);
declareEnumeratedAttrTraits(directionCode, byMnemonic, merged, universal);
declareEnumeratedAttrTraits(domainCode, byMnemonic, merged, universal);
declareNumericAttrTraits(elem0Code, byParam, direct, byMember);
declareNumericAttrTraits(field0Code, byParam, direct, byMember);
declareEnumeratedAttrTraits(exactitudeCode, byMnemonic, merged, universal);
declareEnumeratedAttrTraits(exclusivityCode, byMnemonic, merged, universal);
declareNumericAttrTraits(extCount, byDerivation, pinned, universal);
declareNumericAttrTraits(fault0Code, byParam, merged, universal);
declareNumericAttrTraits(imm0Code, byParam, direct, bySlot);
declareNumericAttrTraits(lit0Code, byParam, pinned, byMember);
declareNumericAttrTraits(lit1Code, byParam, pinned, byMember);
declareNumericAttrTraits(lit2Code, byParam, pinned, byMember);
declareEnumeratedAttrTraits(memCode, byMnemonic, uncoded, universal);
declareNumericAttrTraits(memAttrCode, byParam, merged, universal);
declareEnumeratedAttrTraits(memorySubOpCode, byMnemonic, merged, universal);
declareNumericAttrTraits(NaRcode, byParam, direct, byMember);
declareEnumeratedAttrTraits(off0Code, byParam, direct, universal);
declareNumericAttrTraits(opand0Code, byParam, pinned, byMember);
declareNumericAttrTraits(opand1Code, byParam, pinned, byMember);
declareNumericAttrTraits(opand2Code, byParam, pinned, byMember);
declareNumericAttrTraits(opand3Code, byParam, pinned, byMember);
declareEnumeratedAttrTraits(opCode, byMnemonic, merged, universal);
declareEnumeratedAttrTraits(overflowCode, byMnemonic, merged, universal);
declareNumericAttrTraits(pop0Code, byParam, direct, bySlot);
declareEnumeratedAttrTraits(resCount, byMnemonic, merged, universal);
declareEnumeratedAttrTraits(resultCode, byMnemonic, uncoded, universal);
declareEnumeratedAttrTraits(roundingCode, byMnemonic, merged, universal);
declareEnumeratedAttrTraits(shrinkCode, byMnemonic, merged, universal);
declareEnumeratedAttrTraits(vectorFill, byMnemonic, merged, universal);
declareEnumeratedAttrTraits(scaleCode, byParam, direct, byMember);
declareNumericAttrTraits(scratchCode, byParam, direct, byMember);
declareNumericAttrTraits(specr0Code, byParam, direct, byMember);
declareNumericAttrTraits(specw0Code, byParam, direct, byMember);
declareNumericAttrTraits(streamr0Code, byParam, direct, byMember);
declareNumericAttrTraits(streamw0Code, byParam, direct, byMember);
declareNumericAttrTraits(tag0Code, byParam, direct, byMember);
declareNumericAttrTraits(trap0Code, byParam, merged, universal);
declareNumericAttrTraits(widthCode, byParam, direct, byMember);
Feel free to ask questions.
Obligatory plug: if you'd like to be part of the Mill effort then see http://millcomputing.com/join-us/