• Author
  • Ivan Godard
    Post count: 616

    Now that we have finally gotten the patent filings for this area of the Mill in, I can explain a bit about how Mill fork() actually works. There will be more and better explanation at millcomputing.com as fast as we can get it up.

    The Mill is MAS-in-SAS: multiple address spaces (MAS) embedded in a single address space (SAS). There is a global shared single address space, so if you mmap() with MAP_SHARED the allocation is in that global address space, and all processes that reference the allocation use the same bit pattern in the pointer to reach it. Note that the bit pattern is the same, but the rights to use that bit pattern will typically differ between processes; permissions are a whole other topic.

    In addition, each turf (and in process-oriented systems each process corresponds to a turf) has a local address space that is located non-contiguously in the global shared address space. Pointers have a bit that says whether they refer to the global space or to the local space of the current turf/process. If you mmap() with MAP_PRIVATE (or without MAP_SHARED) then you get a local pointer back.

    The fork() operation creates a logical copy of the local space of the caller, located somewhere else in the global space. I say “logical” because fork uses copy-on-write to optimize the common case in which execve is promptly called before any memory is modified; this is routine for non-Mill systems too.

    Local pointers convert to global pointers by XORing high bits with the ID of the turf, and global pointers convert to local pointers also by XOR with the turf ID; XOR is like that 🙂
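
    To make the XOR conversion concrete, here is a minimal sketch in C. The field position (TURF_SHIFT), the field width (TURF_BITS) and the function names are assumptions chosen purely for illustration; the real Mill pointer layout is not given in this post, and on a Mill the conversion would be done by hardware rather than by library code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: the turf ID is XORed into
   the address starting at bit TURF_SHIFT and is TURF_BITS wide. */
#define TURF_SHIFT 40
#define TURF_BITS  20
#define TURF_MASK  ((UINT64_C(1) << TURF_BITS) - 1)

/* Position the turf ID in the high bits that take part in the XOR. */
static uint64_t turf_field(uint64_t turf_id) {
    assert(turf_id <= TURF_MASK);
    return turf_id << TURF_SHIFT;
}

/* Local to global: XOR the high bits with the turf ID. */
uint64_t local_to_global(uint64_t local_addr, uint64_t turf_id) {
    return local_addr ^ turf_field(turf_id);
}

/* Global to local: the very same XOR, since XOR is its own inverse. */
uint64_t global_to_local(uint64_t global_addr, uint64_t turf_id) {
    return global_addr ^ turf_field(turf_id);
}
```

    With this sketch, global_to_local(local_to_global(p, id), id) gives back p for any p, and the same local bit pattern converted under two different turf IDs gives two different global addresses.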

    After a fork(), all global pointers in the child refer to the same thing that they do in the parent, while all local pointers refer to the location in the child that corresponds to what the bitwise-identical pointer refers to in the parent.
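
    The visible behaviour described here is the ordinary POSIX one, so it can be observed on a conventional system. The sketch below is not Mill-specific: it uses a MAP_SHARED mapping as a stand-in for a global pointer and a stack variable as a stand-in for local data. The function name fork_demo and its out-parameters are hypothetical.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Both variables start at 1 and the child stores 2 into each.
   Reports what the parent sees afterwards. Returns 0 on success, -1 on failure. */
int fork_demo(int *shared_seen, int *private_seen) {
    /* Shared mapping: parent and child reach the same memory through it. */
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 1;
    int private_value = 1;    /* private data: the child gets a logical copy */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        *shared = 2;          /* visible to the parent */
        private_value = 2;    /* changes only the child's copy (COW) */
        _exit(0);
    }

    int status;
    int rc = (waitpid(pid, &status, 0) == pid) ? 0 : -1;
    *shared_seen = *shared;
    *private_seen = private_value;
    munmap(shared, sizeof *shared);
    return rc;
}
```

    After a successful call the parent sees 2 through the shared mapping and still sees 1 in its private variable, even though the child wrote through a bitwise-identical address.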

    Any unmodified cache lines of parent-local data can continue unchanged after the fork, but a child reference will get a cache miss (because the caches use global addresses and the child’s global address is different) and must load its copy of the data line; thereafter the two cache lines are independent and may be modified separately without confusion.

    Any modified (dirty) parent-local lines in the cache must be written back to DRAM as part of the fork(), and the page tables used by the TLB must be set so that both parent and child use the same physical address for the data despite using different virtual addresses, with the page marked copy-on-write. This is the usual COW lazy copy.

    The result of all of this is that fork() on a Mill requires a cache flush of dirty lines from the parent local space that is not required on (some) MAS machines, but has no other added overheads. The gain is that sharing between processes has byte granularity, not page granularity, and is vastly cheaper to use.

  • David
    Post count: 32

    “In addition, each turf (and in process-oriented systems each process corresponds to a turf) has a local address space that is located non-contiguously in the global shared address space.”

    By non-contiguously, do you mean that individual local address spaces are not single, contiguous regions in the global space?

    • Ivan Godard
      Post count: 616

      Both: the local spaces of different turfs (each turf has one) are not contiguous with each other, and the actually-used parts of any one turf’s local space need not be internally contiguous.

  • PeterH
    Post count: 41

    global_address = local_address xor shift(turf_id) sounds like a kind of lightweight segmentation: shades of the old 8086 segmentation, if it had had a virtual memory mapper. Here, though, the global address space maps 1:1 onto every local space, and every local space maps onto every other, although a use for that isn’t obvious. An obvious implication is that when allocating global space to a turf, some strategy must be used to avoid a collision with a forked child turf. In regard to David’s question, the local space of a turf need not be contiguous, though it must not be allocated mindlessly.

    I’m thinking that if you had code like

    mystruct * foo = malloc(…);

    malloc() would return a local pointer, which the process would convert to global when needed. Is this handled by the memory IO operations?

    On the other hand, a service returning a pointer might return a global pointer, such as for a file structure open for both parent and child.

    • Ivan Godard
      Post count: 616

      The local space of a forked child is topologically similar to the local space of the parent, just at different addresses. The fork() function must allocate the new child local space such that none of that topology collides with existing global allocations, including those arising from the global equivalents of local allocations. It is not necessary that the new child have a topology-sized space all to itself, but its local space must fit in the unallocated holes in global space.
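
      The fitting constraint can be sketched as a check on a candidate child turf ID. Everything below is hypothetical: the range_t type, the TURF_SHIFT value, the function names and the idea of keeping the allocations in flat arrays are assumptions for illustration, not a description of how a Mill OS does it. The sketch also assumes every local range lies below 1 << TURF_SHIFT, so that XORing in the turf ID moves a range without breaking it up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TURF_SHIFT 40   /* hypothetical position of the turf ID field */

/* Half-open address range [start, start + len). */
typedef struct { uint64_t start, len; } range_t;

static bool overlaps(range_t a, range_t b) {
    return a.start < b.start + b.len && b.start < a.start + a.len;
}

/* True if every in-use range of the parent's local space, relocated to the
   candidate turf's place in global space, misses every existing global
   allocation. The child only needs the holes, not one contiguous block. */
bool child_fits(const range_t *local, size_t n_local,
                const range_t *allocated, size_t n_allocated,
                uint64_t candidate_turf) {
    for (size_t i = 0; i < n_local; i++) {
        range_t global = { local[i].start ^ (candidate_turf << TURF_SHIFT),
                           local[i].len };
        for (size_t j = 0; j < n_allocated; j++) {
            if (overlaps(global, allocated[j]))
                return false;
        }
    }
    return true;
}
```

      A fork() built on this sketch would try candidate turf IDs until child_fits() returns true. The allocated list would have to include the global equivalents of every other turf's local allocations, the parent's among them.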
