It’s taken a while for me to wade through the RISC-V document; sorry for the delay in response. The current Mill bit manipulation operations are a historical grab bag that we had planned to review and cull/extend when we had enough of a code base to get useful numbers. Your post prompted me to start that process; it’s relatively low priority so I don’t know when changes will happen.
I’m pretty sure that we should have a funnel shifter unit for the present shift and rotate group, although that implies more gates for the widest operands (compared to a normal shifter), and there are issues applying funnel shift to vector arguments.
There has been a long-standing desire for Galois Field operations, i.e. bit matrix multiply. A 8×8 BMM is straightforward, but things get out of hand at greater width – the control data for a quad x quad BMM is 4k bytes. That much data sounds like an I/O device rather than an operation.
Another desire is bit-stream parsing, chopping dynamic numbers of bits off the front of a byte-stream. For this the bit-chopping is not hard, but dealing with whether a chop has to advance the source or not is hard in a static machine. This one will likely wait until we add general streamer support.
Lastly, even simple bit-field extract/inject is problematic when the field is dynamic. An inject needs four arguments – source1, source2, position and size – and four-argument slots would be very expensive. We could construct such an op with ganging, but then you use two slot positions, while building the inject out of shifts and masks only needs six ops. The PPC version of funnel includes a mask step the would reduce the cost of an idiom even further. It’s not clear that dynamic inject is common enough to be worth the trouble. A static inject only needs two belt arguments, but it then needs logN encoding bits for the position/size info, which may increase the entropy of the slot supporting the op.
We’ll chew on these and other issues; thank you for shaking our tree.