Fractional byte packing?
Tagged: bits
I’m not sure what you mean. Do you mean bitfield insert/extract instructions, or SIMD instructions for operating on multiple elements of a short vector at a time?
From previously published information, I think the Mill is supposed to have both.
Like, if you were to pack five six-bit values and two boolean flags into a 32-bit word, is there a clean way to extract them without clogging the ALU pipelines with a bunch of intermediary shift-and-adds?
You could do that with one bitfield extract instruction per bitfield you want to extract: in this case six.
Those would all be independent, so max parallelism would be the number of execution units that implement that instruction.
If a bitfield is at the top or the bottom of the word, then a shiftr{s|u} or an andl, respectively, could be used instead, possibly in another execution unit, which frees up a bitfield extract unit.
If you have such a layout with one field at bits 31:26 and one at 0:0, and there is a CPU with two execution units that do bitfield extract and one unit that does shiftru/andl, then full extraction should be possible with only two cycles of latency.
(Speaking very generically of course. Someone from Mill Computing could fill in the details)
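To make the idea concrete, here is a minimal sketch in portable C, not Mill code. The layout is an assumption on my part: the five six-bit values sit at bits 31:26, 25:20, 19:14, 13:8 and 7:2, and the two flags at bits 1 and 0. The names `extract_bits`, `pack_fields`, `unpack_fields` and `unpacked_t` are hypothetical. `extract_bits` is only a software stand-in for what a bitfield extract instruction would do in one operation. Note that every extraction reads only the original word, so none depends on another.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a bitfield extract instruction:
 * returns `width` bits of `word` starting at bit `lsb`.
 * Valid for widths 1..31. */
static inline uint32_t extract_bits(uint32_t word, unsigned lsb, unsigned width)
{
    return (word >> lsb) & ((1u << width) - 1u);
}

/* Assumed layout (not specified in the thread):
 *   value[0] 31:26, value[1] 25:20, value[2] 19:14,
 *   value[3] 13:8,  value[4] 7:2,   flag_a 1:1, flag_b 0:0 */
typedef struct {
    uint8_t value[5];
    uint8_t flag_a;
    uint8_t flag_b;
} unpacked_t;

static uint32_t pack_fields(const unpacked_t *u)
{
    uint32_t word = 0;
    for (int i = 0; i < 5; i++) {
        assert(u->value[i] < 64);                 /* must fit in six bits */
        word |= (uint32_t)u->value[i] << (26 - 6 * i);
    }
    word |= (uint32_t)(u->flag_a & 1u) << 1;
    word |= (uint32_t)(u->flag_b & 1u);
    return word;
}

static unpacked_t unpack_fields(uint32_t word)
{
    unpacked_t u;
    /* Top field: a logical shift right alone is enough. */
    u.value[0] = (uint8_t)(word >> 26);
    /* Middle fields: one independent extract each. */
    u.value[1] = (uint8_t)extract_bits(word, 20, 6);
    u.value[2] = (uint8_t)extract_bits(word, 14, 6);
    u.value[3] = (uint8_t)extract_bits(word, 8, 6);
    u.value[4] = (uint8_t)extract_bits(word, 2, 6);
    u.flag_a   = (uint8_t)extract_bits(word, 1, 1);
    /* Bottom field: a single AND is enough. */
    u.flag_b   = (uint8_t)(word & 1u);
    return u;
}
```

Calling `unpack_fields(pack_fields(&u))` should give back the same field values. How these operations actually map onto execution units depends on the hardware, so treat the comments about shifts and ANDs as illustration only.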
- This reply was modified 5 months ago by Findecanor.