The general answer is to recognize that this is a pick reduction, which yields to the standard reduction strategy: a chain of logN alternate() ops, with the reduction operator applied at each stage. Here the operator picks the non-zero (or non-None) value at each stage, leaving the chosen value at element index zero at the end, where the extract() op yields it as a scalar.
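To make the shape of that tree concrete, here is a minimal scalar sketch in Python. It is only a model of the idea, not Mill code: `tree_reduce` and `pick` are hypothetical names, the adjacent-lane pairing is an assumption that does not claim to mirror exactly how alternate() arranges its operands, and the vector length is assumed to be a power of two.

```python
def pick(a, b):
    """Reduction operator: keep a unless it is zero or None, else take b."""
    return a if a not in (0, None) else b


def tree_reduce(vec, op):
    """Reduce vec with op in log2(N) stages; return (result, stage_count).

    Each stage combines adjacent pairs of lanes, halving the number of
    live lanes, so the survivor ends up at index zero. This models the
    reduction tree only; it is not the Mill's alternate() op.
    """
    n = len(vec)
    # Assumption for this sketch: non-empty, power-of-two length.
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    lanes = list(vec)
    stages = 0
    while len(lanes) > 1:
        # One stage: lane i of the next level is op(lane 2i, lane 2i+1).
        lanes = [op(lanes[2 * i], lanes[2 * i + 1])
                 for i in range(len(lanes) // 2)]
        stages += 1
    # The final read of lane zero plays the role of extract().
    return lanes[0], stages


value, stages = tree_reduce([0, 0, 0, 7, 0, 0, 0, 0], pick)
```

With eight lanes this takes three stages and leaves 7 in lane zero. If more than one lane is non-zero, this particular pairing keeps the lowest-indexed one; a different pairing order could choose differently. Passing `operator.add` instead of `pick` gives the add reduction over the same tree.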
Be aware that this is likely not the last word on this question, or on reductions in general. We have made sure that vector semantics are correct, but have not paid much attention to vector performance, and won’t until auto-vectorization is working. An add reduction inherently requires logN stages of adds, but there may be better ways to express that than the present alternate tree. Your pick reduction also fits nicely into the shuffle hardware – but it’s not yet clear how to fit it into the ISA.