Reply to bhurt #464
For graphics-like loads you would configure a Mill member that was narrow (6-8 total slots) but very high (perhaps 64-byte vector size), so you would have 16-element single-precision SIMD in each of possibly two arithmetic slots. That would give you a respectable number of shaders, but the problem is the load on the memory hierarchy. Each one of those vectors is a cache line, so to saturate the function units you are pulling four lines and pushing two every cycle (two input operands and one result per slot). Granted, everything used for the drawing is going to live in a whopping big LLC, but sustaining that much bandwidth at the top of the hierarchy is going to be hard.
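To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes every arithmetic op takes two vector inputs and produces one vector result with no operand reuse, which is the worst case. The 1 GHz clock in the usage note is a placeholder of mine, not a figure for any real Mill member, and all the function and constant names are made up for illustration.

```python
# Back-of-the-envelope bandwidth estimate for the configuration described above.
# Assumptions (not from any Mill spec): two-input, one-output ops and no
# operand reuse between ops, so every operand is a fresh cache line.

VECTOR_BYTES = 64      # vector size, equal to one cache line
ELEMENT_BYTES = 4      # single-precision float
ARITH_SLOTS = 2        # "possibly two arithmetic slots"
INPUTS_PER_OP = 2      # assumed: binary operations
OUTPUTS_PER_OP = 1     # assumed: one result per operation


def lanes(vector_bytes=VECTOR_BYTES, element_bytes=ELEMENT_BYTES):
    """Number of SIMD elements in one vector."""
    return vector_bytes // element_bytes


def lines_per_cycle(slots=ARITH_SLOTS):
    """Cache lines (pulled, pushed) per cycle when every slot is busy."""
    return slots * INPUTS_PER_OP, slots * OUTPUTS_PER_OP


def bytes_per_cycle(slots=ARITH_SLOTS, vector_bytes=VECTOR_BYTES):
    """Total memory traffic per cycle, reads plus writes."""
    pulled, pushed = lines_per_cycle(slots)
    return (pulled + pushed) * vector_bytes


def bandwidth_gb_per_s(clock_ghz):
    """Sustained bandwidth in GB/s at a hypothetical clock rate."""
    return bytes_per_cycle() * clock_ghz


if __name__ == "__main__":
    print("lanes per vector:", lanes())
    print("lines pulled, pushed per cycle:", lines_per_cycle())
    print("bytes per cycle:", bytes_per_cycle())
    print("GB/s at an assumed 1 GHz:", bandwidth_gb_per_s(1.0))
```

Under those assumptions that is six lines, or 384 bytes, per cycle, so 384 GB/s at the assumed 1 GHz. Real code reuses operands, so the actual demand would be lower, but it shows the scale of the problem.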
There are ways to handle this: for example, don’t use cache for the data, but configure NUMA in-core memory instead and push the problem to the software. But the result is pretty special-purpose. A chip with one of those and a handful of regular Mill cores is possible, and we’d do fine for less graphics-intensive work. Nevertheless, for Call of Duty go to Nvidia.