Forum Replies Created

Viewing 7 posts - 31 through 37 (of 37 total)
  • PeterH
    Participant
    Post count: 41
    in reply to: Execution #673

    Is there a summary of opcodes posted somewhere we could reference?

  • PeterH
    Participant
    Post count: 41
    in reply to: Searching #471

    As I read the problem description, N is in a set of 2^24 records of 12 members each, so the data clearly would not fit in cache. If N is in a linked list in scattered order, cache thrashing could be fierce, with nearly every new N hitting the full latency to RAM: ~5M cycles total, using 300 cycles per RAM access. The size of an individual member isn’t given; I’m guessing 32 bits. I’m also figuring 156 loads from cache (small register set) to perform the set of 144 comparisons between members of N.i and V.j in the SISD case: ~18M cycles total. Still a long way from accounting for the time reported.
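
    To make the shape of that estimate explicit, here is a minimal sketch of the arithmetic. The 300-cycle miss latency and 156 loads per record are the figures above; how many records a search actually visits, and the per-load cache latency, depend on the thread’s problem statement, so they are left as inputs rather than asserted.

        #include <stdio.h>
        #include <stdlib.h>

        int main(int argc, char **argv) {
            /* Records the search actually touches: depends on the problem
               in this thread, so it is taken from the command line. */
            double visited  = argc > 1 ? atof(argv[1]) : 0.0;
            /* Cycles per cache load in the compare phase (assumed). */
            double load_lat = argc > 2 ? atof(argv[2]) : 1.0;
            if (visited <= 0.0) {
                fprintf(stderr, "usage: est <records_visited> [cycles_per_load]\n");
                return 1;
            }
            double chase   = visited * 300.0;            /* every visit misses to RAM */
            double compare = visited * 156.0 * load_lat; /* 156 loads per 144-compare check */
            printf("pointer chase ~%.3g cycles, compares ~%.3g cycles\n", chase, compare);
            return 0;
        }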

  • PeterH
    Participant
    Post count: 41
    in reply to: Fork #315

    Threads, vfork(), page aliasing, and copy-on-write all appear easy enough with Mill mechanisms already described. fork() has the added complication of getting pointers defined by the parent to work within the child’s address range. My best guess is that something along the lines of segments might be used. Done right, the segments might be transparent to the application most of the time.
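
    As a minimal illustration of that complication: pointer bit patterns created before fork() must still resolve afterward in the child, even though on a single-address-space machine the child’s copy of the data cannot occupy the parent’s addresses. A conventional C program simply assumes this works, which is exactly what any segment-style translation would have to preserve:

        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/types.h>
        #include <sys/wait.h>
        #include <unistd.h>

        int main(void) {
            int *p = malloc(sizeof *p);   /* pointer value defined by the parent */
            if (!p) return 1;
            *p = 42;

            pid_t pid = fork();
            if (pid == 0) {
                /* The child inherits p's bit pattern unchanged and expects it
                   to reach the child's copy of the data. */
                printf("child reads %d through the inherited pointer\n", *p);
                _exit(0);
            }
            wait(NULL);
            free(p);
            return 0;
        }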

  • PeterH
    Participant
    Post count: 41
    in reply to: Security #875

    A JIT producing generic code and calling a service to specialize it strikes me as a policy decision, not something forced by the architecture. But I can see doing it that way as strongly advisable in the general case, especially if the code is expected to run on anything other than a very narrow selection of Mill family members.
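
    For concreteness, a hypothetical sketch of that split: the JIT stays member-independent and only the trusted specializer service knows the concrete member’s encoding. Every name here (gen_buf, jit_compile, specializer_translate) is invented for illustration and is not a real Mill interface.

        #include <stddef.h>
        #include <stdio.h>

        /* Generic (member-independent) code produced by the JIT. */
        typedef struct { const unsigned char *bytes; size_t len; } gen_buf;

        /* Stub: a real JIT would emit generic code for the source here. */
        static gen_buf jit_compile(const char *source) {
            (void)source;
            static const unsigned char generic[1] = { 0 };
            gen_buf g = { generic, sizeof generic };
            return g;
        }

        /* Stub: only the specializer service knows the target member's
           encoding; the JIT never emits native code itself. */
        static const void *specializer_translate(gen_buf g) {
            return g.bytes; /* placeholder for returned native code */
        }

        int main(void) {
            const void *native = specializer_translate(jit_compile("x + 1"));
            printf("native entry at %p\n", (void *)native);
            return 0;
        }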

  • PeterH
    Participant
    Post count: 41
    in reply to: Prediction #848

    So while a function call, aided by prediction, will usually take a single cycle, a portal call will need at least the time of two fetches from the L1 code cache: one load for the portal entry itself and a second for the destination code?

  • PeterH
    Participant
    Post count: 41
    in reply to: Memory #483

    If I’m figuring this right, specReg-based addresses can be made fork() safe. Data addresses getting their base from the belt, on the other hand, are not so easily rebased, and I’m seeing heap memory using the latter in many cases.
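
    A small C illustration of the rebasing problem: once a heap address is stored into ordinary data, only the program knows which stored words are pointers, so nothing outside the program can rewrite them for a child running in a different address range, whereas a specReg-relative address could be rebased just by changing the register.

        #include <stdio.h>
        #include <stdlib.h>

        struct node {
            int value;
            struct node *next;   /* a raw heap address stored in data */
        };

        int main(void) {
            struct node *b = malloc(sizeof *b);
            struct node *a = malloc(sizeof *a);
            if (!a || !b) return 1;
            a->value = 1; a->next = b;
            b->value = 2; b->next = NULL;

            /* a->next is an absolute address living in ordinary memory; it
               cannot be found and rewritten after fork() without knowing
               the program's data layout. */
            printf("a=%p a->next=%p\n", (void *)a, (void *)a->next);

            free(a); free(b);
            return 0;
        }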

  • PeterH
    Participant
    Post count: 41
    in reply to: Many core mill (GPU) #470

    I’m strictly an amateur at GPU design, but I agree that memory bandwidth is a major issue. What I figure is that you want to read in a cache line’s worth of pixels, process those pixels against a cached set of polygons, and then write the results out; this minimizes bandwidth to the output framebuffer. Scenes that aren’t too complex, limited by cache space for polygons and by not switching pixel shaders, might be rendered reading and writing back each set of pixels only once. Cache space and bandwidth for texture buffers would still be an issue without specialized hardware.
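
    A rough sketch of that loop shape in C, with illustrative types and a placeholder shader (nothing here is taken from a real GPU): each tile of pixels is read once, run against the whole cached polygon set, and written back once.

        #include <stddef.h>
        #include <stdint.h>

        #define TILE_PIXELS 16   /* roughly one cache line of 32-bit pixels */

        typedef struct { float a, b, c; } polygon;   /* placeholder shape */

        /* Placeholder shader: a real one would test coverage and blend. */
        static uint32_t shade(uint32_t pixel, const polygon *poly) {
            (void)poly;
            return pixel + 1;
        }

        void render(uint32_t *framebuffer, size_t npixels,
                    const polygon *polys, size_t npolys) {
            for (size_t t = 0; t < npixels; t += TILE_PIXELS) {
                uint32_t tile[TILE_PIXELS];
                size_t n = npixels - t < TILE_PIXELS ? npixels - t : TILE_PIXELS;

                for (size_t i = 0; i < n; i++)          /* read the tile once */
                    tile[i] = framebuffer[t + i];
                for (size_t p = 0; p < npolys; p++)     /* every cached polygon */
                    for (size_t i = 0; i < n; i++)      /* against resident pixels */
                        tile[i] = shade(tile[i], &polys[p]);
                for (size_t i = 0; i < n; i++)          /* write the tile back once */
                    framebuffer[t + i] = tile[i];
            }
        }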
