Several questions, with only some answers 🙂 In the future, please put each question in a separate posting.
First, regarding mixed-width access. There are no dynamic-width loads; Mill access widths are static, so a particular load op is bound to a particular width. But the logic that determines which width to use can instead compute predicates that select among the hardware-supported widths, essentially a switch statement where each case is a different width. Any ISA can do the same; what’s different in the Mill is that the (frequently mispredicted) branches that other ISAs need to implement the switch are unnecessary. Instead, the code will fire off all the different-width loads at once, with each guarded by its width predicate so only one actually gets to memory.
How long that takes depends on the provisioning of the Mill member running the code. For a mid-range Mill with two load units it takes two cycles; the predicate generation will overlap with the load instructions for free.
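To make the "switch over widths" idea concrete, here is a rough model in plain C. It is not Mill code and says nothing about the real encoding; the names `guarded_load` and `load_any_width`, the choice of 1/2/4/8-byte widths, and the little-endian byte order are all my assumptions for illustration. Each guarded load stands in for one static-width load op, and a false predicate means that load never touches memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one static-width load guarded by a predicate.
   If pred is false, memory is not read at all and the result is 0.
   Bytes are assembled in little-endian order (an assumption). */
static uint64_t guarded_load(const uint8_t *p, size_t width, int pred) {
    uint64_t v = 0;
    if (pred) {
        for (size_t i = 0; i < width; i++)
            v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* Issue one guarded load per supported width. At most one predicate is
   true, so at most one load "reaches memory"; OR-ing the results picks
   the live one. An unsupported width yields 0 in this model. */
static uint64_t load_any_width(const void *addr, unsigned width) {
    assert(addr != NULL);
    const uint8_t *p = addr;
    return guarded_load(p, 1, width == 1)
         | guarded_load(p, 2, width == 2)
         | guarded_load(p, 4, width == 4)
         | guarded_load(p, 8, width == 8);
}
```

A C compiler is free to turn the `if (pred)` into a branch, so this only shows the logical structure; the point in the text is that on a Mill the guards are predicates on the load ops themselves, with no branches involved.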
For your second question, about async DMA: there are two approaches. One can put explicit device-specific hardware in the configuration that accumulates and buffers data and interrupts a CPU when some desired amount is available (or a timeout happens). Such hardware is common in the embedded world, and would work as well with a Mill as with any other ISA. Alternatively, one can use explicit Mill facilities for that kind of access, but those facilities are NYF (not yet filed). Sorry 🙂
Lastly, timers. Yes, there are timers of several sorts that can be configured into a particular member chip. These are (in general) accessed through MMIO. Like any memory access, code can only use MMIO to addresses (and hence devices) for which it has permissions in the PLB. A permission manager can ensure that different threads don’t stomp on each other’s countdowns.
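As a loose illustration of the permission point, here is a small C simulation. Everything in it is made up for the example: the register layout, the `perm_entry` table, and the function names are hypothetical, and an ordinary array stands in for the device registers. On real hardware the check is done by the protection hardware on every access, not by a software lookup like this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated device memory standing in for timer MMIO registers. */
enum { TIMER_REGS = 4 };
static volatile uint32_t fake_mmio[TIMER_REGS];

/* Hypothetical permission entry: a range of registers one thread may use. */
typedef struct {
    int    thread_id;
    size_t first, count;   /* covered register indices */
    bool   can_write;
} perm_entry;

/* What a permission manager might hand out: disjoint timers per thread. */
static const perm_entry perms[] = {
    { 1, 0, 2, true },     /* thread 1: registers 0..1 */
    { 2, 2, 2, true },     /* thread 2: registers 2..3 */
};

/* True if some entry grants this thread the requested access. */
static bool allowed(int thread_id, size_t reg, bool write) {
    for (size_t i = 0; i < sizeof perms / sizeof perms[0]; i++) {
        const perm_entry *e = &perms[i];
        if (e->thread_id == thread_id &&
            reg >= e->first && reg < e->first + e->count &&
            (!write || e->can_write))
            return true;
    }
    return false;
}

/* Set a countdown; returns false and writes nothing without permission. */
static bool timer_write(int thread_id, size_t reg, uint32_t ticks) {
    if (!allowed(thread_id, reg, true))
        return false;
    fake_mmio[reg] = ticks;
    return true;
}
```

Because the two threads are granted disjoint register ranges, thread 2 cannot overwrite a countdown that thread 1 has set, which is the "no stomping" property mentioned above.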