Forum Replies Created
As I read the problem description, N is in a set of 2^24 records of 12 members each, so the data clearly will not fit in cache. If N is in a linked list with scattered order, cache thrashing could be fierce, with nearly every new N paying the full latency to RAM; I estimate ~5M cycles total at 300 cycles per RAM access. The size of an individual member isn't given; I'm guessing 32 bits. I'm also figuring 156 loads from cache (small register set) to perform the set of 144 comparisons between the members of N.i and V.j in the SISD case, for a total estimate of ~18M cycles. That is still a long way from accounting for the time reported.
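For concreteness, the 144-comparison figure comes from an all-pairs compare of the 12 members of one N record against the 12 members of one V record. A minimal sketch, assuming 32-bit members and the record layout below (both guesses, as noted above):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMBERS 12  /* members per record, per the problem description */

/* Hypothetical record layout: 12 members of 32 bits each (a guess,
 * since the member size isn't given in the problem statement). */
typedef struct { uint32_t m[MEMBERS]; } record;

/* All-pairs comparison between the members of an N record and a V
 * record: 12 x 12 = 144 scalar compares in the SISD case.  Returns the
 * number of comparisons performed, so the estimate can be checked. */
static int compare_records(const record *n, const record *v, int *matches)
{
    int comparisons = 0;
    *matches = 0;
    for (size_t i = 0; i < MEMBERS; i++) {
        for (size_t j = 0; j < MEMBERS; j++) {
            comparisons++;
            if (n->m[i] == v->m[j])
                (*matches)++;
        }
    }
    return comparisons;
}
```

With scattered records, nearly every `n` pointer chased here would be the RAM-latency miss described above; the 144 compares themselves are cheap by comparison.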
Threads, vfork(), page aliasing, and copy-on-write all appear easy enough with the Mill mechanisms already described. fork() has the added complication of getting pointers defined by the parent to work in the child's address range. My best guess is that something along the lines of segments might be used; done right, the segments could be transparent to the application most of the time.
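To make the fork() complication concrete: on conventional hardware, a raw pointer created before fork() stays valid in the child only because the child reuses the same virtual addresses over copy-on-write pages. A minimal POSIX sketch of that invariant, which a single-address-space design would have to reproduce some other way:

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* On per-process address spaces, fork() works because the child inherits
 * an identical virtual address range: a pointer made by the parent
 * dereferences to the same (copy-on-write) value in the child.  On a
 * single-address-space machine the child's data would land at a different
 * range, which is why some translation layer -- segments, perhaps -- is
 * needed to keep parent-made pointers working.  Returns 1 if the child
 * saw the parent's value through the inherited pointer. */
static int fork_preserves_pointers(void)
{
    int *p = malloc(sizeof *p);
    if (!p) return 0;
    *p = 42;                      /* pointer and value created by parent */

    pid_t pid = fork();
    if (pid == 0)                 /* child: same address, COW copy */
        _exit(*p == 42 ? 0 : 1);

    int status = 0;
    waitpid(pid, &status, 0);
    free(p);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```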
A JIT producing generic code and calling a service to specialize strikes me as a policy decision, not exactly forced by the architecture. But I can see doing it that way as strongly advisable in the general case, especially if the code is expected to run on more than a very narrow selection of Mills.
- in reply to: Prediction #848
So while a function call, aided by prediction, will usually take a single cycle, a portal call will need at least as long as two fetches from the L1 code cache: one load for the portal entry itself and a second for the destination code?
- in reply to: Many core mill (GPU) #470
I'm strictly an amateur at GPU design, but I agree that memory bandwidth is a major issue. What I figure is that you want to read in a cache line's worth of pixels, process those pixels against a cached set of polygons, and then write the results out; this minimizes bandwidth to the output framebuffer. Scenes that are not too complex, limited by the cache space for polygons and by not switching pixel shaders, might be rendered reading and writing back each set of pixels only once. Cache space and bandwidth for texture buffers remain an issue without specialized hardware.
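A sketch of that read-once/write-once tiling, with made-up tile and polygon parameters (the tile size, polygon structure, and "covers pixels [lo, hi)" model below are all illustrative assumptions, not how any real rasterizer represents coverage):

```c
#include <stddef.h>
#include <stdint.h>

#define TILE_PIXELS 16   /* one cache line of 32-bit pixels: 64 bytes */

/* Hypothetical polygon: covers pixels [lo, hi) of the tile with a color. */
typedef struct { int lo, hi; uint32_t color; } poly;

/* Tile-at-a-time rendering sketch: read one cache line's worth of pixels,
 * shade them against every cached polygon, then write the tile back once.
 * Returns the number of framebuffer writes, to show that each pixel is
 * written exactly once regardless of how many polygons touch it. */
static size_t render_tile(uint32_t *fb, size_t tile_base,
                          const poly *polys, size_t npolys)
{
    uint32_t tile[TILE_PIXELS];
    size_t writes = 0;

    for (size_t i = 0; i < TILE_PIXELS; i++)    /* one read per pixel */
        tile[i] = fb[tile_base + i];

    for (size_t p = 0; p < npolys; p++)         /* all cached polygons */
        for (int i = polys[p].lo; i < polys[p].hi && i < TILE_PIXELS; i++)
            tile[i] = polys[p].color;           /* shade entirely in-cache */

    for (size_t i = 0; i < TILE_PIXELS; i++) {  /* one write per pixel */
        fb[tile_base + i] = tile[i];
        writes++;
    }
    return writes;
}
```

Framebuffer traffic here is fixed at one read and one write per pixel per pass; the polygon work happens against the in-cache copy, which is the bandwidth saving described above.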