Mill Computing, Inc. Forums The Mill Architecture switches Reply To: switches

Ivan Godard
Keymaster
Post count: 627

Sorry for the delay in response; a few posts here got blackholed in my mailer somehow 🙁

I edited your example to provide a declaration for function foo, but it is otherwise unchanged. On Silver with the LLVM default and the specializer at –O1 (multiple ops per instruction but minimal other changes) you get this conAsm:

 F("bar") %0;
    br("bar$0_8"),
    rd(w(1ul)) %1;
        // L3C1F1=/home/ivan/mill/specializer/src/test1.c

L("bar$0_1");
    br("bar$0_2", b0 %2),
    rd(w(4ul)) %2;                                              // L18C1F1

L("bar$0_2") %3/0/1/2/4/5;
    retn(b0 %3/0/1/2/4/5);                                      // L21C1F1

L("bar$0_7");
    br("bar$0_2", b0 %5),
    rd(w(3ul)) %5;                                              // L20C1F1

L("bar$0_8");
    eql(b1 %0, 0) %7;

    brfl(b0 %7, "bar$0_9"),
    br("bar$0_2", b1 %1);

L("bar$0_9");
    lsss(b2 %0, 2) %8;

    brfl(b0 %8, "bar$0_10"),
    br("bar$0_1");

L("bar$0_10");
    eql(b3 %0, 4) %9;

    brfl(b0 %9, "bar$0_11"),
    br("bar$0_1");

L("bar$0_11");
    eql(b4 %0, 5) %10;

    brfl(b0 %10, "bar$0_12"),
    br("bar$0_2", b5 %0);

L("bar$0_12");
    eql(b5 %0, 6) %11;

    brfl(b0 %11, "bar$0_13"),
    br("bar$0_2", b6 %0);

L("bar$0_13");
    eql(b6 %0, 7) %12;

    brfl(b0 %12, "bar$0_14"),
    br("bar$0_2", b7 %0);

L("bar$0_14");
    lsss(b7 %0, 8) %13, nope();

    brfl(b0 %13, "bar$0_15");

    call1("foo", b8 %0) %4,                                     // L14C1F1
    br("bar$0_2", b0 %4);

L("bar$0_15");
    eql(b9 %0, b0 %15) %14,
    con(w(100ul)) %15;

    brtr(b0 %14, "bar$0_7"),
    br("bar$0_1");

This shows how the switch is turned into a cascade of tests for the various cases, nested cascades if there are enough cases. The code is poor because there are so many EBBs (extended basic blocks) and we would expect several mis-predictions as control worked it way to the actual working case. (Mis-predicts are cheap on the Mill, but far from free).

Working from the same genAsm from LLVM but with the specializer at –O3 you get this conAsm:

F("bar") %0;
    eql(b2 %0, 4) %9,
    lsss(b2 %0, 2) %8,
    eql(b2 %0, 0) %7,
    retntr(b0 %7, b3 %1),                                       // L21C1F1
    retntr(b1 %8, b4 %2),                                       // L21C1F1
    retntr(b2 %9, b4 %2),                                       // L21C1F1
    rd(w(4ul)) %2,                                              // L18C1F1
    rd(w(1ul)) %1;
        // L3C1F1=/home/ivan/mill/specializer/src/test1.c

    eql(b5 %0, 7) %12,
    eql(b5 %0, 6) %11,
    eql(b5 %0, 5) %10,
    retntr(b0 %10, b8 %0),                                      // L21C1F1
    retntr(b1 %11, b8 %0),                                      // L21C1F1
    retntr(b2 %12, b8 %0);                                      // L21C1F1

    lsss(b8 %0, 8) %13, nope(),
    calltr1(b0 %13, "foo", b9 %0) %4;                           // L14C1F1

    eql(b14 %0, b0 %15) %14,
    con(w(100ul)) %15,
    retntr(b6 %13, b5 %4),                                      // L21C1F1
    retntr(b0 %14, b2 %5),                                      // L21C1F1
    retnfl(b0 %14, b14 %2),                                     // L21C1F1
    rd(w(4ul)) %17,                                             // L18C1F1
    rd(w(4ul)) %16,                                             // L18C1F1
    rd(w(3ul)) %5;                                              // L20C1F1

All the excess EBBs have gone away, so there will be at most one misprediction in the switch, which is as good as a table-based switch can do, but without the possibility of a data miss on the table. As the whole code is four instructions, executing it will take at most four cycles (plus a miss if any) which is substantially better than the alternative.

Some of what the specializer does in –O3 is new, and if you read the code carefully you can find a couple of places where the result is less than optimal. Have fun finding them 🙂