Mill Computing, Inc. › Forums › The Mill › Architecture › Metadata › Reply To: Metadata
I’m not sure that’s exactly what we want. To get maximum performance, we should probably use this algorithm:
1: for d = 1 to log2 n do
2: for all k in parallel do
3: if k >= 2^d then
4: x[k] = x[k – 2^(d-1)] + x[k]
(from NVIDIA'as GPU Gems page)
Alternate would be useful for d = 3, but the other steps require a shuffle.
By the way, I would be personally excited for faster cumulative sum because it is useful for implementing fast gaussian blurs.