From following the link you gave, I can see the general outline of the idea. The challenge in implementing such a device will be making it deterministic for larger numbers of cores and banks, which is where it would potentially have the greatest benefit. I did not see enough detail to judge that aspect of the idea’s feasibility.

That said, the problem being attacked is determining which core gets access to which bank. There are several sources of latency here: arbitration latency, “bank is busy” latency, routing latency and RAM access latency. This idea directly addresses only the arbitration latency. By allowing a larger number of banks, it appears to indirectly help with the “bank is busy” (or “bank access collision”) latency. Unfortunately, a larger number of banks or cores will also increase routing latency. So in the end, routing latency may merely replace arbitration latency as the performance-limiting factor.
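
To make the trade-off concrete, here is a rough toy model of the four latency terms. Everything in it is my own assumption for illustration, not something taken from the linked description: the function name `access_latency`, the cycle costs, uniformly random bank selection, a busy bank costing one full RAM access, and routing depth growing as log2 of the larger of the core and bank counts (as it might in a tree or butterfly style interconnect).

```python
import math


def access_latency(cores, banks, arbitration=0.0, ram_access=4.0, hop_cost=1.0):
    """Toy estimate (in cycles) of one memory access, split by latency source.

    All parameters and formulas are illustrative assumptions, not measurements.
    """
    # Assumption: every other core picks a bank uniformly at random, so this
    # is the chance that at least one of them targets the same bank.
    p_collision = 1.0 - (1.0 - 1.0 / banks) ** (cores - 1)

    # Crude assumption: a collision costs one extra full RAM access of waiting.
    bank_busy = p_collision * ram_access

    # Assumption: interconnect depth grows logarithmically with the larger
    # of the two port counts, at hop_cost cycles per stage.
    routing = hop_cost * math.ceil(math.log2(max(cores, banks, 2)))

    total = arbitration + bank_busy + routing + ram_access
    return {
        "arbitration": arbitration,
        "bank_busy": bank_busy,
        "routing": routing,
        "ram_access": ram_access,
        "total": total,
    }


if __name__ == "__main__":
    # Even with arbitration assumed free, compare few banks against many.
    for banks in (8, 64, 512):
        print(banks, access_latency(cores=8, banks=banks))
```

Under these assumptions, adding banks shrinks the “bank is busy” term while the routing term grows, which is the effect I am describing. A real design could behave quite differently depending on the interconnect and the access pattern.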

