Thanks goldbug. I see they do admit in the paper that indeed compiler hoisting loads to do speculative loads before checks, can make Mill be vulnerable to Spectre. “In our analysis we found and fixed one such bug in the Mill compiler tool chain.” And the solution was to improve compiler. I am not sure how it figures out where to make these speculative loads and where not, as it is really hard to predict where untrusted data is being used, without annotations or barriers in code. (and without killing performance in some other workloads that benefit from compiler doing speculative loads).
The fact that Meltdown-like load doesn’t pollute the cache is nice, and it puts a NaR as a result. This isn’t that much different than what AMD is doing on their CPUs, where they also do protection/TLB check soon enough, and do not put stuff in cache. Intel was doing this too late in the pipeline, and it was polluting cache. So nothing extremely special about that here.
For the Spectre variant 2, it is interesting that Mill actually does restore entire CPU, including caches, to a correct state even on missprediction in branch predictor. I guess this is doable because the misprediction latency is low, and only few instructions would be fetched and decoded, and it is easy to restore data cache back (because if it was L1 miss, the data would not arrive from L2 anyway in that time). Similarly if the misprediction targeted instructions that are not in instruction cache, (which would make it a bad branch predictor anyway, and is unlikely to be a miss), they will not arrive on time from L2 either, so it is easy to cancel the load and go back into proper path.
There are examples in the paper, exactly discussing the same example I was pointing on.
It appears the solution was to actually not do speculative loads before all checks that would skip the load, are executed, if possible in single instruction, i.e.
lsssb(%3, %2) %5, loadtr(%5, %0, 0, %3, w, 3) %6;
This is nice, and has very little performance penalty. In both cases the load here will produce a result on a belt, but if the condition is false, it will put a NaR on belt. So, rest of the code can still do speculative stuff, or use this value (real or NaR) as input to other stuff (including other loads indexes for example). I really like NaRs.
And people who write more performance oriented stuff, can probably pass a flag to do more aggressive (and Spectre prone) scheduling.
I guess this is not bad.