Fractional byte packing?

  • NXTangl
    Participant
    Post count: 21
    #3989 |

    Any thoughts on instructions to quickly load and store bit-packed representations? They are commonly used in graphics programming, such as here and here, and in general should help with memory-bandwidth-bound tasks, though I’ll admit they aren’t very useful for general code.

  • Findecanor
    Participant
    Post count: 34

    I’m not sure what you mean. Do you mean bitfield insert/extract instructions, or SIMD instructions for operating on multiple elements of a short vector at a time?

    From previous info, I think The Mill is supposed to have both.

    • NXTangl
      Participant
      Post count: 21

      Like, if you were to pack five six-bit values and two boolean flags into a 32-bit word, is there a clean way to extract them without clogging the ALU pipelines with a bunch of intermediary shift-and-adds?
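
      To make the question concrete, here is a minimal C sketch of one possible layout. The bit positions are an assumption chosen for illustration (the post does not specify them): flag0 at bit 0, flag1 at bit 1, and the five six-bit values above them, with the last one landing at bits 31:26. The name pack_word is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not given in the thread):
 *   bit 0      flag0
 *   bit 1      flag1
 *   bits 7:2   v[0]
 *   bits 13:8  v[1]
 *   bits 19:14 v[2]
 *   bits 25:20 v[3]
 *   bits 31:26 v[4]
 */
static uint32_t pack_word(const uint8_t v[5], int flag0, int flag1)
{
    /* Normalize the flags to 0 or 1 before placing them. */
    uint32_t w = (uint32_t)(flag0 != 0) | ((uint32_t)(flag1 != 0) << 1);

    for (int i = 0; i < 5; i++) {
        assert(v[i] < 64);                  /* each value must fit in six bits */
        w |= (uint32_t)v[i] << (2 + 6 * i); /* field i starts at bit 2 + 6*i */
    }
    return w;
}
```

      The fields add up to 5 × 6 + 2 = 32 bits, so the word is completely full.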

      • Findecanor
        Participant
        Post count: 34

        You could do that with one bitfield extract instruction per field you want to extract: in this case seven (five six-bit values plus two flags).

        Those would all be independent, so the maximum parallelism would be the number of execution units that implement that instruction.
        If a bitfield is at the top or the bottom of the word, then a shiftr{s|u} or andl respectively could be done instead, possibly in another execution unit, freeing up a bitfield extract unit.

        If you have such a layout with one field at bits 31:26 and one at 0:0, and there is a CPU with two execution units that do bitfield extract and one unit that does shiftru/andl, then the five middle fields need three issue cycles on the two extract units while the third unit handles the top and bottom fields, so full extraction should be possible with only about three cycles of latency, assuming single-cycle operations.
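
        The independence of the extractions can be sketched in portable C. This is only an illustration, not Mill code: the layout (flags at bits 0 and 1, six-bit values above them, the last at bits 31:26) and the names extract_field, unpacked and unpack_word are assumptions. Whether a compiler turns each line into a single bitfield extract, shift or AND depends on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Generic bitfield extract: `width` bits starting at bit `lsb`. */
static inline uint32_t extract_field(uint32_t w, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width < 32 && lsb + width <= 32);
    return (w >> lsb) & ((1u << width) - 1u);
}

struct unpacked {
    uint8_t v[5];
    uint8_t flag0, flag1;
};

/* Every field is computed from `w` alone, so no extraction waits on another. */
static struct unpacked unpack_word(uint32_t w)
{
    struct unpacked u;
    u.flag0 = (uint8_t)(w & 1u);                /* bottom field: a single AND */
    u.flag1 = (uint8_t)extract_field(w, 1, 1);
    u.v[0]  = (uint8_t)extract_field(w, 2, 6);
    u.v[1]  = (uint8_t)extract_field(w, 8, 6);
    u.v[2]  = (uint8_t)extract_field(w, 14, 6);
    u.v[3]  = (uint8_t)extract_field(w, 20, 6);
    u.v[4]  = (uint8_t)(w >> 26);               /* top field: a single logical shift right */
    return u;
}
```

        Only the five fields in the middle of the word need the full shift-and-mask; the top and bottom fields each reduce to one simpler operation.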

        (Speaking very generically of course. Someone from Mill Computing could fill in the details)
