Using real estate for more cores in preference to threading, resulting from the the Mill’s other architectural features, brings to mind a question about on-chip memory architecture that, while of no immediate consequence to the Mill chip, might affect future trade offs in real estate use.
With 14nm and higher density technologies coming on line, there is a point where it makes sense prefer on-chip shared memory to some other uses of real estate. This raises the problem of increasing latency to on-chip memory, not only with size of the memory but with the number of cores sharing it. In particular, it seems that with an increasing number of cores, a critical problem is reducing arbitration time among cores for shared, interleaved, on-chip memory banks. In this case, interleaving isn’t to allow a given core to issue a large number of memory accesses in rapid succession despite high latency; it is to service a large number of cores — all with low latency.
Toward that end I came up with a circuit that does the arbitration in analog which, if it works when scaled down to 14nm and GHz frequencies, might result in a architectural preference for a on-chip cross-bar switch between interleaved low-latency memory banks and a relatively large numbers of cores.
This spans disciplines — a problem well known to folks involved with the Mill architecture which spans disciplines between software and computer architecture (rather than between computer architecture and analog electronics).
I’d appreciate any feedback from folks who understand the difficulty of cross-disciplinary technology and have a better grasp the issues than do I.