A couple of questions (sorry if they’re basic):
1. You mention that when the return operation cuts back the stack, it clears the valid bits on the stack frame’s cache lines. Does that clearing of the valid bits have to cascade to all levels of cache?
2. Unless I’m mistaken, the TLB is a cache of PTEs and might not contain all the PTEs in the system (i.e. it’s a cache over operating system tables, right?). You mention in the talk that during a load miss that reaches the TLB and also misses in the TLB, the TLB directly returns a zero without having to go to main memory. Wouldn’t the TLB have to go to main memory for PTEs at least some of the time, even if it doesn’t have to go to main memory for the actual value to be returned? Are you using a data structure that makes this unlikely (i.e. one that can answer “not found” queries without holding the whole set of PTEs in the TLB), or is it just that you have a large TLB and the “well known region” registers cover a lot of what would otherwise be PTEs, which makes it likely that all PTEs are in the TLB?
Thanks for the answers. I’m a software guy and not a hardware guy, so I’m sorry if the questions betray a lack of understanding.