I’m sorry that LLVM is frustrating your team so much. I’d be happy to try to help with that, on my own time, but the logistics are probably too messy. (I’ve got language designitis, too, with a few languages under my belt.)
I was actually thinking of manual vectorisation in this case, and wondering about the situations in which we can’t afford a load, or in which the loop-carried values are calculated. Though I’m thinking the answer might be the NYF streamers.
Many of my queries seem to boil down to wanting to rearrange vector elements. In this case (and in many others, I think), a shuffle that can use two element sources would do the trick, though two shuffles and a pick can do the same, if they can all run at the same cycle edge.
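To make the equivalence concrete, here is a rough scalar C sketch of what I mean. It models the semantics only and says nothing about timing or encoding. The names `shuffle1`, `shuffle2`, `pick` and `shuffle2_via_pick`, the four-lane width, and the index convention (0..3 selects from the first source, 4..7 from the second) are all my own assumptions for illustration, not actual operations of any real ISA.

```c
#define N 4  /* assumed lane count, purely for illustration */

/* Hypothetical two-source shuffle: idx[i] in [0, 2N).
   Values 0..N-1 select from a, values N..2N-1 select from b. */
static void shuffle2(const int *a, const int *b, const int *idx, int *out) {
    for (int i = 0; i < N; i++)
        out[i] = (idx[i] < N) ? a[idx[i]] : b[idx[i] - N];
}

/* Hypothetical single-source shuffle: only the lane part of the index is used. */
static void shuffle1(const int *src, const int *idx, int *out) {
    for (int i = 0; i < N; i++)
        out[i] = src[idx[i] % N];
}

/* Lane-wise select: take t[i] where mask[i] is nonzero, else f[i]. */
static void pick(const int *mask, const int *t, const int *f, int *out) {
    for (int i = 0; i < N; i++)
        out[i] = mask[i] ? t[i] : f[i];
}

/* The same result built from two single-source shuffles and one pick. */
static void shuffle2_via_pick(const int *a, const int *b, const int *idx, int *out) {
    int from_a[N], from_b[N], mask[N];
    shuffle1(a, idx, from_a);       /* every lane as if it came from a */
    shuffle1(b, idx, from_b);       /* every lane as if it came from b */
    for (int i = 0; i < N; i++)
        mask[i] = idx[i] < N;       /* nonzero where the index names source a */
    pick(mask, from_a, from_b, out);
}
```

With `a = {10, 11, 12, 13}`, `b = {20, 21, 22, 23}` and `idx = {0, 5, 2, 7}`, both `shuffle2` and `shuffle2_via_pick` give `{10, 21, 12, 23}`. The two shuffles are independent of each other, but the pick consumes both of their results, which is why the timing question matters.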