Mill Computing, Inc

Moderator

February 9, 2014 at 1:52 pm

Post count: 10

I’m not sure that’s exactly what we want. To get maximum performance, we should probably use this algorithm: Naive algorithm taken from Nvidia GPU Gems

1: for d = 1 to log2 n do
2:     for all k in parallel do
3:          if k >= 2^d  then
4:              x[k] = x[k – 2^(d-1)] + x[k]
(from NVIDIA'as GPU Gems page)

Alternate would be useful for d = 3, but the other steps require a shuffle.

By the way, I would be personally excited for faster cumulative sum because it is useful for implementing fast gaussian blurs.

This reply was modified 10 years, 8 months ago by mermerico.
This reply was modified 10 years, 8 months ago by mermerico.

Reply To: Metadata