Thanks for the encouragement!
On the first point: if an OoO core can do all 10 loads in parallel, then all 10 must be independent. And if all 10 are independent, then the compiler can statically schedule them in parallel too.
Things actually get interesting if the loads alias stores, as on the Mill the loads snoop/snarf on the stores.
Re patents: it’s best to hint nothing at all! Please don’t go down the route of publicly disowning ideas.
The metadata aspect is interesting. Of course Mill SIMD is very familiar to people used to thinking about SIMD on other platforms, but it’s broader too.
The widening and narrowing ops are very fast, and the range of ops that work on vectors is much wider than on other ISAs.
The tracking of error and None NaRs allows vectorization of so much more (see the auto-vectorization examples in the Metadata and Pipelining talks, and think about how you can use masking in general open code too). It’s swings and roundabouts, but generally a really big win.