There is currently no way to load a partial vector. The interface with the cache could handle it (we send a byte-mask anyway), but it would be difficult to encode: we already need morsels for base, index amd width, and up to the four bytes of manifest for an offset. If (for example) the target is 32 bytes high and it’s a byte vector to load, that’s another 32 bits in the encoding for the mask. That doesn’t fit in one slot.
The “normal” reduction expansion produces None if any element is None and NaR if any is NaR. To get the behavior you suggest (to treat None as the identity element of the reduction operation) then you would: (assume is a byte vector)
rd(bv(0)), // get a vector of zero-bytes isNone(<data>), // test the data for none pick(<isNone>, <rd>, <data>); // replace the Nones with zero <reduce sequence using <pick> > // reduce
Your suggestion that there should be a sqrt op that the specializer replaces with emulation is in fact what happens now; we don’t expect to ever do a sqrt (or friends) in native, but it’s easy to co-opt the specializer’s emulation mechanism for intrinsic routines. I don’t think of them as microcode, because they are visible to the user and get scheduled mixed in with other macrocode, but YMMV.
Hazards are a pain. The specializer can do them (and does for FMA), but they louse up the schedule for the rest of the code. When a hazardful op offers no real advantage over hazard-free emulation then we’ll leave it out.