Implicit splat: We used to have implicit splat; makes the compiler easier. But when the hardware crew started on the implementation it turned out to be a big hit on the clock rate, paid by all operations whether used or not. So it was taken back out.
Endianness: Internally little-endian, your choice for memory.
Bool vector from rotating smear: The bool vector is not a bit vector, it’s a vector of ordinary operand which happen to be either zero or one; the width of each bool element is whatever the width metadata says it is, just like for ints. As a special case, a vector argument to a conditional branch tests the element that corresponds to the highest memory address if the vector is loaded from memory (or computed from something that was loaded). This saves an extract operation in the common case of memory-upwards loops and smeari. For memory-downwards loops (less common) the explicit extract would be necessary to get a testable bool. Instead, smearil (left inclusive smear) produces an already-extracted exit condition bool, the way that smearxr (smear right exclusive) does, as shown in the video.
Smearx requires either a second, scalar, result (as shown), or a rotating smear followed by an extract (at no gain over the existing), or branch variants that test each end of the bool vector (doable but branches are already pretty cluttered). Sort of by definition smear is only useful in loops, and in a pipelined loop latency is irrelevant, so the extra cycle doesn’t bother us