Forum Replies Created
- CarVac (Participant), July 31, 2014 at 2:42 pm, Post count: 4
I just asked on Reddit because the password recovery email took its sweet time, but how large is the spiller in a typical Mill family member (Gold, Copper, Tin)?
I am interested in how large a loop can be pipelined for something like a box blur, which needs to carry 3 colors times the blur length worth of values across loop iterations. I would be interested in blur lengths of ~20 pixels, which would require 60 saved belt entries. Is this feasible?
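To make the count concrete, here is a minimal Python sketch of a trailing box blur over one channel. The names `box_blur_1d` and `carried_values` are made up for illustration, and the sketch only counts the values a straightforward loop keeps alive between iterations. It says nothing about what the Mill belt or spiller can actually hold.

```python
from collections import deque


def box_blur_1d(pixels, length):
    """Trailing box blur: each output is the mean of the last `length` inputs."""
    window = deque()   # loop-carried state: up to `length` earlier pixel values
    running_sum = 0    # one more carried value if a running sum is used
    out = []
    for p in pixels:
        window.append(p)
        running_sum += p
        if len(window) > length:
            # Drop the pixel that just left the window.
            running_sum -= window.popleft()
        out.append(running_sum / len(window))
    return out


def carried_values(channels, length):
    """Pixel values that must survive from one iteration to the next."""
    return channels * length


print(box_blur_1d([1, 2, 3, 4], 2))
print(carried_values(3, 20), carried_values(1, 20))
```

With three channels and a blur length of 20 the window holds 60 values; with a single channel it holds 20.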
- CarVac (Participant), July 31, 2014 at 3:40 pm, Post count: 4
I just realized that the blur in my algorithm only happens on a single channel array, so ignore all mentions of three colors. ::sigh::
The three-color layout was used elsewhere in the same program, where it was actually faster than separate arrays for each color, at least on x86. It is a weird algorithm indeed.
My original question still stands, though.
- CarVac (Participant), February 8, 2014 at 8:09 am, Post count: 4
What can you change in CPU design to reduce memory bandwidth issues? Would you be able to, say, have 4 or 6 memory channels instead of the 2 typical today?
Given the memory bandwidth limitations and the fact that Mill can pipeline AND vectorize loops, wouldn’t trading a bit of latency help throughput on these very short loops?
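For a rough sense of scale, here is a back-of-the-envelope Python sketch of how theoretical peak bandwidth grows with channel count. It assumes DDR3-1600 with a 64-bit (8-byte) channel purely as an illustrative figure; the function name `peak_bandwidth_gb_s` is made up, and sustained bandwidth in practice is lower than this peak.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s=1600, bus_bytes=8):
    """Theoretical peak in GB/s: channels * transfers per second * bytes per transfer.

    The defaults are an assumption (DDR3-1600, 64-bit channel), not a Mill figure.
    """
    return channels * mega_transfers_per_s * bus_bytes / 1000


for n in (2, 4, 6):
    print(n, "channels:", peak_bandwidth_gb_s(n), "GB/s")
```

Under those assumptions the peak scales linearly: 25.6 GB/s for 2 channels, 51.2 GB/s for 4, and 76.8 GB/s for 6.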