Regarding context switches, how much of the permissions buffer is cached? Given that it operates in parallel with the L1 cache, I'm thinking the permissions cache would be about the same size. What I'm unclear on is what fraction of the total permission table that represents, and how often a portal call will need to load new permission table entries. Then again, this could be handled along with the call prefetch.
Regarding SIMD, I recently read a report on some benchmarking with a test case resembling
a++; b++; d++;
that ran slower than a test case incrementing all four variables (a through d). The case incrementing all four could use x86-family SIMD. Applied to the Mill, I can see the three-variable case being implemented by loading all of a through d, then applying a null mask to the variable that is not being altered.
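
To make the masked idea concrete, here is a rough sketch in plain C. It is not Mill code and uses no real SIMD intrinsics; the names masked_increment and LANES are made up for illustration, and the per-lane mask array just stands in for whatever masking mechanism the hardware would provide. All four lanes are loaded and written back, but only the lanes with a set mask entry change.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* hypothetical vector width: a, b, c, d */

/* Increment only the lanes whose mask entry is nonzero.
 * Every lane is read and written back, so the loop body is uniform
 * across lanes; masked-off lanes simply have 0 added to them. */
void masked_increment(int32_t v[LANES], const uint8_t mask[LANES])
{
    assert(v != 0 && mask != 0);
    for (int i = 0; i < LANES; ++i) {
        /* Branch-free select: adds 1 where the mask is set, 0 elsewhere. */
        v[i] += (int32_t)(mask[i] != 0);
    }
}
```

For the a++; b++; d++; case, the mask would be {1, 1, 0, 1}, so c passes through untouched while the other three are incremented in one uniform pass.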