4) Will all operations have the same latency across all Mill chips? Your examples have always shown multiply as a three-cycle operation. Would a Mill ever be delivered with, for instance, a five-cycle multiplier?
It’s been mentioned in a few talks that latencies are dealt with in the specializer, so they’re allowed to vary across family members. I don’t remember offhand whether any hardware instructions actually vary in latency, but binary and decimal floating-point were given as examples of operations where the hardware implementation might not exist at all. In that case the specializer would implement them with a library call or inline instructions, which effectively changes the latency of the operation.
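Here is a minimal Python sketch to make the idea concrete. It is a toy model, not anything from the actual Mill toolchain: the member names `member_a`/`member_b`, the latency numbers, and the `EMULATION` expansion are all hypothetical. It shows only that the same dependent chain of operations gets different static issue cycles depending on which member's latency table the specializer consults. An op with no hardware unit just appears as a longer effective latency.

```python
# Hypothetical per-member latency tables, in cycles. Member names and
# numbers are illustrative assumptions, not published Mill figures.
LATENCY = {
    "member_a": {"add": 1, "mul": 3, "fadd": 4},
    "member_b": {"add": 1, "mul": 5},  # no hardware fadd on this member
}

# Hypothetical inline expansion used when a member lacks a hardware op.
EMULATION = {
    "fadd": ["add", "mul", "add", "add"],
}


def op_latency(member: str, op: str) -> int:
    """Latency the (toy) specializer assumes for `op` on `member`."""
    table = LATENCY[member]
    if op in table:
        return table[op]
    # No hardware unit: treat the expansion as a serial chain, so its
    # effective latency is the sum of its parts' latencies.
    return sum(op_latency(member, sub) for sub in EMULATION[op])


def schedule_chain(member: str, ops: list[str]) -> list[int]:
    """Issue cycle for each op in a chain where each op consumes the
    previous op's result, so op i waits until op i-1 has completed."""
    cycles = []
    ready = 0
    for op in ops:
        cycles.append(ready)
        ready += op_latency(member, op)
    return cycles


if __name__ == "__main__":
    chain = ["mul", "add", "fadd"]
    print(schedule_chain("member_a", chain))  # [0, 3, 4]
    print(schedule_chain("member_b", chain))  # [0, 5, 6]
```

The program is identical for both members; only the table differs. That is why a five-cycle multiplier on one member wouldn't need a different binary format, just a different specialization.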