1) Zero-latency deferred load when there's no work to overlap: it turns out that stalling issue is expensive, because modern cores don't stop on a dime. So it's cheaper to assume a D$1 (L1 data cache) hit and insert the requisite no-ops. And in nearly all cases those no-ops occupy zero bits in the code stream, so they are free; see the Encoding talk.
2) There is an abandon mechanism for pickup loads and other in-flight state. It's NYF, but it will be covered in the Pipelining talk.
3) Locks: the Retire Stations do snoop on the coherency protocol in multicore configurations. Multicore will get its own talk, but not soon.
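A rough way to picture point 1: with nothing to overlap, the scheduler places the consumer exactly "assumed hit latency" cycles after the load and fills the gap with idle cycles. The sketch below is an illustration only, not the Mill's actual encoding (that's for the Encoding talk). The 3-cycle D$1 latency, the `Bundle` type, and its `skip` field (a count of idle cycles after a bundle, assumed to live in a header field that is present anyway) are all hypothetical. Folding the idle cycles into that count is one way the no-ops can cost no extra bundles in the code stream.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical: assumed D$1 hit latency in cycles; the scheduler always plans for a hit.
ASSUMED_D1_HIT_LATENCY = 3


@dataclass
class Bundle:
    ops: List[str]
    skip: int = 0  # hypothetical header field: idle cycles that follow this bundle


def schedule(load_op: str, use_op: str, latency: int = ASSUMED_D1_HIT_LATENCY) -> List[List[str]]:
    """Static schedule with no other work to overlap: load, idle cycles, consumer.

    The load issues in cycle 0 and its result is assumed ready in cycle `latency`,
    so cycles 1 .. latency-1 are empty (the "requisite no-ops").
    """
    cycles = [[load_op]]
    cycles += [[] for _ in range(latency - 1)]
    cycles.append([use_op])
    return cycles


def encode(cycles: List[List[str]]) -> List[Bundle]:
    """Fold each empty cycle into the preceding bundle's skip count."""
    bundles: List[Bundle] = []
    for ops in cycles:
        if ops:
            bundles.append(Bundle(list(ops)))
        elif bundles:
            bundles[-1].skip += 1
        else:
            raise ValueError("stream cannot start with an idle cycle")
    return bundles


def decode(bundles: List[Bundle]) -> List[List[str]]:
    """Expand skip counts back into explicit empty cycles."""
    cycles: List[List[str]] = []
    for b in bundles:
        cycles.append(list(b.ops))
        cycles.extend([] for _ in range(b.skip))
    return cycles


if __name__ == "__main__":
    cycles = schedule("load r1, [a]", "add r2, r1, 1")
    bundles = encode(cycles)
    print(len(cycles), len(bundles), bundles[0].skip)
    assert decode(bundles) == cycles
```

With the assumed 3-cycle latency, the schedule spans 4 cycles but encodes as only 2 bundles, because the two idle cycles ride in the first bundle's skip count instead of occupying bundles of their own.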