Forum Replies Created

Viewing 4 posts - 1 through 4 (of 4 total)
  • Author
    Posts
  • CarVac
    Participant
    Post count: 4
    in reply to: Pipelining #1232

    I just asked on Reddit because the password recovery email took its sweet time, but how large is the spiller in a typical Mill family member (Gold, Copper, Tin)?

    I am interested in how large a loop can be pipelined for something like a box blur, which has to carry 3 color channels times the blur length worth of values across loop iterations. For blur lengths of ~20 pixels, that comes to 60 saved belt entries. Is this feasible?
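
To make the loop-carried state concrete, here is a minimal sketch of a naive windowed box blur on one channel. It is an illustration under stated assumptions, not the algorithm from the actual program: the names `box_blur_1d` and `carried_values` are hypothetical, and edge handling is simply omitted (only full windows produce output).

```python
def box_blur_1d(pixels, length):
    """Naive 1-D box blur: each output is the mean of `length` consecutive inputs.

    Assumption: no edge padding, so the output has len(pixels) - length + 1 entries.
    """
    # Loop-carried state: the most recent `length` inputs, kept as a ring buffer.
    # These are the values a pipelined loop would have to keep live across iterations.
    window = [0.0] * length
    out = []
    for i, p in enumerate(pixels):
        window[i % length] = p          # overwrite the oldest sample
        if i >= length - 1:             # emit only once the window is full
            out.append(sum(window) / length)
    return out


def carried_values(channels, length):
    """Number of values carried across iterations: one window per channel."""
    return channels * length


print(box_blur_1d([1, 2, 3, 4, 5], 3))
print(carried_values(3, 20))
```

With three channels and a blur length of 20, `carried_values` gives the 60 entries mentioned above; with a single channel it drops to 20.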

    • CarVac
      Participant
      Post count: 4
      in reply to: Pipelining #1240

      I just realized that the blur in my algorithm only happens on a single channel array, so ignore all mentions of three colors. ::sigh::

      The three-color layout was used in another part of the same program, where it was actually faster than separate arrays for each color, at least on x86. It is a weird algorithm indeed.

      My original question still stands, though.

  • CarVac
    Participant
    Post count: 4
    in reply to: Pipelining #1236

    It could be if bandwidth to and from the GPU is unlimited and all of the work can fit there.

  • CarVac
    Participant
    Post count: 4

    What can you change in CPU design to reduce memory bandwidth issues? Would you be able to, say, have 4 or 6 memory channels instead of the 2 typical today?

    Given the memory bandwidth limitations and the fact that Mill can pipeline AND vectorize loops, wouldn’t trading a bit of latency help throughput on these very short loops?
