(Moved from incorrect thread.)
I would like to mention that I think the Mill’s load semantics are *brilliant*. Absolutely genius work. People have been trying to make software prefetch work for years, and never getting something that works right.
By making it an actual load, just deferring the alias resolution, avoid the whole problem of haiving a second address computation, translation, cache lookup, etc.
Three questions about loads:
1. Your example seemed to have a minimum latency; on page 22 of your slides, couldn’t I ask for 0 delay and avoid the nops? Does it actually simplify things to leave out this feature?
2. Have you played with a “load abandon” instruction for pickup loads? That would let me start a likely-to-miss load speculatively, and if it turns out to be unnecessary, reclaim the retire station without having to wait for the load to complete.
3. I know it resolves same-processor aliasing, but what about multiprocessor? Does the technique let me hoist a load above a lock (e.g load-acquire)? That would definitely be a neat feature.