Where does the specialization of vectorized loops occur (e.g., the strcpy example presented in one of the talks)? It seems that genAsm should already contain concrete instructions (the compiler has to know the iteration offset, right?), but how's that possible when each member can have a different vector width?