Another fun thing about a single address space is that calls can be linked directly to the loaded library code instead of going through a PLT. The overhead of dynamic linking would then be a 60-bit address constant in the call, instead of whatever size the offset of a relative call happens to be.
To load libraries lazily, we would need to patch the constant so that it refers to the library once it is loaded. You said that the Mill does not support self-modifying code, which does not bode well for such a feature. Would it be sufficient to stop all threads and flush just the constant from the instruction cache?
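
For comparison, here is a rough C sketch of the conventional fallback that avoids patching code at all: the call target lives in a data slot (much like a GOT entry behind a PLT), starts out pointing at a resolver, and is overwritten on first use. This is not Mill code and says nothing about how the Mill actually handles it; `lib_double`, `resolver_stub`, `call_target` and `call_lib` are names made up for illustration, and the "loading" step is only simulated.

```c
#include <stdatomic.h>

typedef int (*lib_fn)(int);

/* Hypothetical library routine, standing in for code that is mapped lazily. */
int lib_double(int x) { return 2 * x; }

/* Counts how often the slow path ran (not thread-safe; illustration only). */
static int resolve_count = 0;

int resolver_stub(int x);

/* The patchable target: a data slot that initially points at the resolver. */
static _Atomic(lib_fn) call_target = resolver_stub;

/* Slow path, taken on the first call only. */
int resolver_stub(int x) {
    resolve_count++;
    /* A real loader would map the library and look up the symbol here. */
    atomic_store_explicit(&call_target, lib_double, memory_order_release);
    return lib_double(x);
}

/* The call site: one atomic load of the slot, then an indirect call. */
int call_lib(int x) {
    lib_fn f = atomic_load_explicit(&call_target, memory_order_acquire);
    return f(x);
}
```

The first `call_lib` goes through the resolver and every later call goes straight to `lib_double`, with no need to stop threads or touch the instruction cache. The price is the extra load and indirect call on every invocation, which is exactly the overhead that a direct 60-bit constant in the code would avoid.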