Time overall will be determined by the element size (set by the program) and vector size (fixed by the member specification) which will set the value of N for the logN stages. A vector add is one clock, for integers <= 32 bits. A shuffle is one clock too, so you are looking at two clocks per stage. However, I have a sneaking suspicion that it may be more complicated to deal with None/NaR the way you want (a normal reduction would return None/NaR if any element was). I encourage you (and any other interested readers) to play with the code and see what you can do. You may well find that you would really like some kind of specialized shuffle or other specialize operation, and we can and would add such an operation.
Reply To: Metadata Ivan Godard