I’m strictly an amateur at GPU design, but I agree that memory bandwidth is a major issue. What I figure is that you want to read in a cache line's worth of pixels, process those pixels against a cached set of polygons, and then write the results out. This minimizes bandwidth to the output framebuffer. Scenes that are not too complex, meaning the polygons fit in the available cache space and the pixel shader doesn't have to be switched, might be rendered by reading and writing back each set of pixels only once. Cache space and bandwidth for texture buffers are still an issue without specialized hardware.
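To make the read-once, write-once idea concrete, here is a rough software sketch in Python. It is only an illustration, not how any real GPU or the Mill does it: `TILE_W`, `Triangle`, `covers`, and `render_tiled` are made-up names, the "cache line" is assumed to hold 8 pixels of one scanline, the polygon list is assumed to fit in cache, and overlapping triangles are resolved by draw order with no depth test, shading, or texturing.

```python
from dataclasses import dataclass

TILE_W = 8  # assumption: one cache line holds 8 pixels of a scanline


@dataclass
class Triangle:
    verts: tuple  # three (x, y) vertices
    color: int    # flat color; stands in for a pixel shader result


def edge(a, b, p):
    """Signed area term: tells which side of edge a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers(tri, p):
    """True if point p lies inside tri (either winding order accepted)."""
    a, b, c = tri.verts
    w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def render_tiled(framebuffer, triangles):
    """Render all triangles one tile at a time.

    Returns (reads, writes): framebuffer traffic counted in pixels.
    """
    reads = writes = 0
    width = len(framebuffer[0])
    for y in range(len(framebuffer)):
        for x0 in range(0, width, TILE_W):
            # Read one cache line's worth of pixels from the framebuffer.
            tile = framebuffer[y][x0:x0 + TILE_W]
            reads += len(tile)
            # Process the tile against every cached polygon before moving on.
            for tri in triangles:
                for i in range(len(tile)):
                    # Sample at the pixel center.
                    if covers(tri, (x0 + i + 0.5, y + 0.5)):
                        tile[i] = tri.color  # later triangles overwrite earlier
            # Write the finished tile back exactly once.
            framebuffer[y][x0:x0 + TILE_W] = tile
            writes += len(tile)
    return reads, writes


# Example: a 16x8 framebuffer and two overlapping triangles.
fb = [[0] * 16 for _ in range(8)]
tris = [
    Triangle(((0, 0), (16, 0), (0, 8)), 1),
    Triangle(((4, 2), (12, 2), (8, 6)), 2),
]
reads, writes = render_tiled(fb, tris)
```

In this sketch the framebuffer traffic is one read and one write per pixel no matter how many triangles are in the list, which is the saving described above. The cost moves to the polygon side: every tile walks the whole triangle list, so the scheme only pays off while that list stays resident in cache.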
Reply To: Many core mill (GPU) PeterH