Mill Computing, Inc. › Forums › The Mill › Architecture › switches › Reply To: switches
Sorry for the delay in response; a few posts here got blackholed in my mailer somehow 🙁
I edited your example to provide a declaration for function foo, but it is otherwise unchanged. On Silver with the LLVM default and the specializer at –O1 (multiple ops per instruction but minimal other changes) you get this conAsm:
F("bar") %0;
br("bar$0_8"),
rd(w(1ul)) %1;
// L3C1F1=/home/ivan/mill/specializer/src/test1.c
L("bar$0_1");
br("bar$0_2", b0 %2),
rd(w(4ul)) %2; // L18C1F1
L("bar$0_2") %3/0/1/2/4/5;
retn(b0 %3/0/1/2/4/5); // L21C1F1
L("bar$0_7");
br("bar$0_2", b0 %5),
rd(w(3ul)) %5; // L20C1F1
L("bar$0_8");
eql(b1 %0, 0) %7;
brfl(b0 %7, "bar$0_9"),
br("bar$0_2", b1 %1);
L("bar$0_9");
lsss(b2 %0, 2) %8;
brfl(b0 %8, "bar$0_10"),
br("bar$0_1");
L("bar$0_10");
eql(b3 %0, 4) %9;
brfl(b0 %9, "bar$0_11"),
br("bar$0_1");
L("bar$0_11");
eql(b4 %0, 5) %10;
brfl(b0 %10, "bar$0_12"),
br("bar$0_2", b5 %0);
L("bar$0_12");
eql(b5 %0, 6) %11;
brfl(b0 %11, "bar$0_13"),
br("bar$0_2", b6 %0);
L("bar$0_13");
eql(b6 %0, 7) %12;
brfl(b0 %12, "bar$0_14"),
br("bar$0_2", b7 %0);
L("bar$0_14");
lsss(b7 %0, 8) %13, nope();
brfl(b0 %13, "bar$0_15");
call1("foo", b8 %0) %4, // L14C1F1
br("bar$0_2", b0 %4);
L("bar$0_15");
eql(b9 %0, b0 %15) %14,
con(w(100ul)) %15;
brtr(b0 %14, "bar$0_7"),
br("bar$0_1");
This shows how the switch is turned into a cascade of tests for the various cases, nested cascades if there are enough cases. The code is poor because there are so many EBBs (extended basic blocks) and we would expect several mis-predictions as control worked it way to the actual working case. (Mis-predicts are cheap on the Mill, but far from free).
Working from the same genAsm from LLVM but with the specializer at –O3 you get this conAsm:
F("bar") %0;
eql(b2 %0, 4) %9,
lsss(b2 %0, 2) %8,
eql(b2 %0, 0) %7,
retntr(b0 %7, b3 %1), // L21C1F1
retntr(b1 %8, b4 %2), // L21C1F1
retntr(b2 %9, b4 %2), // L21C1F1
rd(w(4ul)) %2, // L18C1F1
rd(w(1ul)) %1;
// L3C1F1=/home/ivan/mill/specializer/src/test1.c
eql(b5 %0, 7) %12,
eql(b5 %0, 6) %11,
eql(b5 %0, 5) %10,
retntr(b0 %10, b8 %0), // L21C1F1
retntr(b1 %11, b8 %0), // L21C1F1
retntr(b2 %12, b8 %0); // L21C1F1
lsss(b8 %0, 8) %13, nope(),
calltr1(b0 %13, "foo", b9 %0) %4; // L14C1F1
eql(b14 %0, b0 %15) %14,
con(w(100ul)) %15,
retntr(b6 %13, b5 %4), // L21C1F1
retntr(b0 %14, b2 %5), // L21C1F1
retnfl(b0 %14, b14 %2), // L21C1F1
rd(w(4ul)) %17, // L18C1F1
rd(w(4ul)) %16, // L18C1F1
rd(w(3ul)) %5; // L20C1F1
All the excess EBBs have gone away, so there will be at most one misprediction in the switch, which is as good as a table-based switch can do, but without the possibility of a data miss on the table. As the whole code is four instructions, executing it will take at most four cycles (plus a miss if any) which is substantially better than the alternative.
Some of what the specializer does in –O3 is new, and if you read the code carefully you can find a couple of places where the result is less than optimal. Have fun finding them 🙂
- This reply was modified 7 years, 7 months ago by Ivan Godard.