Mill Computing, Inc. Forums The Mill Architecture Metadata Reply To: Metadata

mermerico
Moderator
Post count: 10

I’m not sure that’s exactly what we want. To get maximum performance, we should probably use this algorithm: Naive algorithm taken from Nvidia GPU Gems

1: for d = 1 to log2 n do
2:     for all k in parallel do
3:          if k >= 2^d  then
4:              x[k] = x[k – 2^(d-1)] + x[k]
(from NVIDIA'as GPU Gems page)

Alternate would be useful for d = 3, but the other steps require a shuffle.

By the way, I would be personally excited for faster cumulative sum because it is useful for implementing fast gaussian blurs.

  • This reply was modified 10 years, 5 months ago by  mermerico.
  • This reply was modified 10 years, 5 months ago by  mermerico.