Add-reductions keep coming up in my mind when doing 3D (e.g. as game and graphics engines will be doing buckets of). In 3D graphics there are lots of vectors which are 3 or 4 long.
I imagine that, whilst belt vectors are powers-of-2 in length, you can load a non-power-of-2 vector, and the load automatically pads it with Nones? So if you
load(addr,float32,3) you actually get
And you’d want an add reduction to treat Nones as 0 rather than propogate them.
shuffle sounds useful for computing cross-product.
Generally in games/graphics you want sqrt, inverse sqrt and dot product. You also likely want to sum a vector again when you do matrix multiplication.
My thinking would be that in the Mill IR sqrt, inverse sqrt, sum reduction, fork/exec/vfork, memcpy and memmove etc are built-in functions, and the specialiser on each target turns that into single or multiple operations as the target supports. So that’s like microcode (or standard function inlining), but in the specialising compiler rather than in the outer compiler or on-CPU. It would be a hassle for a specialiser to have to unravel some old IR that is coding its own sqrt loop using a lower-level operation if there is ever hardware with better built-in sqrt, for example?
And as for hazards, we all want to avoid them, but pragmatically if its the specialiser that has to know about them and it has to know about one of them, it might as well open the floodgates and have a few more 😉