I’m very impressed that you could write this; reading conAsm gives me hives, and writing it manually is much worse 🙂 Your code looks reasonable to casual inspection, which was all I could give it. Silver has four ALUs and three controlFUs, so you haven’t exceeded the available slot population. Using the pick to avoid another retntr is clever but doesn’t actually buy much, because you have a free control slot in the second instruction and pick encoding is not that much smaller than retn encoding.
You did catch the duplicate rd((w(4)) in the original, and that the second of the retntr/retnfl pair could be a simple retn, saving one argument morsel in the encoding.
Another thing you caught we have LLVM to thank for, not our specializer: several of the result values are the same as the corresponding selector value (cases 5/6/7), and could be collected by CSE into a group picked up by a lss() rather than eql()s, saving several rd/eql/retn sequences in the specializer output and letting you go from four to two instructions. Alas, LLVM doesn’t catch that and the author didn’t write it in the source; CSE/value numbering is not something to do in a mere code generator like the specializer. Still, it’s nice to know that there is optimization headroom still, even with the decent code we are now producing: you have an average ILP of 8 and 16 ops, while the original only got 6 ILP and 24 ops.
My thanks to @goldbug for an interesting test case. It not only shows how switches get encoded, but also demonstrates how phasing and width boosts the ILP.
And @veedrac – please go to millcomputing.com->about->join us and consider what you find there 🙂