The Mill's idea for encoding the size and scalarity in metadata is an amazing trick for reducing the number of opcodes. It has been mentioned many times that metadata does not include type information saying whether something is an integer vs a pointer vs a float, etc. I would like to know if this idea was ever given consideration and why it was determined to not be a good tradeoff. Naturally, it would reduce the number of opcodes by a lot, so you wouldn't need e.g.
add vs
addp vs
addf vs
addd, you would just have
add. This would apply to
mul,
sub,
div,
rem,
eql,
ne,
lt,
le,
gt,
ge,
max,
min,
narrow,
widen,
root, and more. There is the downside that you'd need instructions akin to what
widen and
narrow do for size, so you'd need some kind of
reinterpret_cast instruction which can write to the metadata bits to change the type. This would make casting take a cycle, as opposed to being completely free on conventional machines, but I doubt it actually comes up a lot that people cast from float to integer and vice versa. This could also affect the
convert operations if desired, but those are not planned to be implemented in hardware anyway so far as I know (although f64->i32, i32->f64, and a few others are
built-in to Arm, which speeds up JavaScript, 4 cycles for float->int and 6 cycles for int->float on A76, but they might/probably use microcode). This idea would have the same net effect as the size and scalarity metadata tags, which is that more of the bits that would have been in the opcodes would be moved into the metadata, which could be a win. There are a few drawbacks I can think of though. For one, not all instructions can benefit, and some probably shouldn't. E.g. "signed" right shift you'd probably still want to be a separate opcode from "unsigned" right shift, since it is not uncommon to want to do a sign-extending right shift on an "unsigned integer" (scare quotes because it might actually be a bitvector, not a numeric value), and you probably wouldn't want to pay an extra cycle to switch back and forth for ops that. (Some languages have two shift operators for this reason,
>> and
>>>. One nice way to compute the average of two signed integers is to do
(a + b) >> 1, which allows overflowing safely into the sign bit and then does an "unsigned" shift right to divide by two.) You also wouldn't necessarily have to move ALL types into metadata, e.g. it's not clear to me whether unsigned/signed should even be encoded in metadata by such a scheme. Maybe it would make more sense to just leave that encoding in the instructions but encode integer/pointer/float? The other downside is that you'd need more metadata bits although I am not sure how many. If you encoded it enum-style using successive integers you'd probably only need 2 or 3 extra metadata bits, but it would take more if you encoded it bitmap style. I wonder if the floating point metadata bits could be repurposed for non-floats? Another potential downside is that loads and constants would need to be able to encode those extra bits unless you want to
cast each one (could ganging help?).
This idea could also lead to more instruction-level polymorphism in certain cases too, potentially reducing code size, although obviously if floating point adds take several cycles they couldn't be scheduled interchangeably with integer adds that take one cycle, so it would only work in certain cases but things like pointers and integers should be interchangeable. It
could also lead to more instructions being supported for free. E.g. pointer ops are currently limited to
addp and
subp, but with this scheme the Mill could support all the arithmetic and bitwise operators on unsigned 60 bit integers (where the upper 4 are ignored), if that would be desireable. It also would lower the cost of adding more types, like e.g. 52 bit integers which are important for languages like JS and Lua(JIT) which use NaN tagging and allow 52-bit integers internally (double floats can also represent all integers up to 2^53 exactly, after that you can only represent every 2nd integer until the next power of 2, then every 4th integer until the next power of 2, then every 8th, etc. I know Ivan Godard knows this, as he was on the committee fighting against NaN's.) I have no idea how cheap or expensive it is in hardware to actually execute though. You could also go full LLVM and just support any bitsize smaller than the maximum supported unit, although that could mean adding a whole extra byte of metadata, so probably not worth it.
I also wonder if this would ever improve pipelining (across multiple instructions?) if there was ever a situation where you wanted the same
add operation to work regardless of whether the operand is a pointer or an integer, although I doubt that comes up much in a statically-scheduled belt architecture. It could be useful for eliminating branches in some cases though. E.g.
(cond) ? x + 1 : y + 1 could
pick between
x and
y and
add 1 regardless, even if
x is an integer and
y is a pointer. However, considering that the
pick phase comes after the
op phase, that might not work or would be more expensive than just speculatively computing both? Might it come in handy in open-code though to increase ILP?
By the way, this wouldn't
necessarily have to effect Mill assembly as it stands today. One could "just" make the specializer handle it, and infer how load/constant ops should encode the type based on usage and insert the necessary casts in places where a value is used in an op for a different type. As mentioned earlier, this could be "tuned" to an extent by making certain ops more agnostic of type metadata, e.g. like zero-filling right shift and sign-extending right shift could stay separate ops and work regardless of signedness, even if signedness goes into metadata. (And you might just want to keep signedness out of the type metadata in general depending on what the tradeoff looks like.)
Anyway, that's all I could think of. What other tradeoffs would there be on the Mill and what would the cost/benefit roughly look like?