OK, I just watched the whole video. I would love to help support and develop the C API on top of turfs, as I think this is really cool. One question stuck out: you say you have global addresses with the TLB *after* the cache, and then you say you have a local bit. But if the local address space is 60 bits, you would have to put a TLB *before* the cache to support local addresses. Why not instead shrink the local address space to something smaller, like 32 bits (convenient because applications generally have to support 32-bit arches anyway, so you can just compile apps in 32-bit mode), and then implicitly fill in the zeroed upper bits with the thread+turf id? That would preserve the global address space model while still supporting the local addressing abstraction.
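To make the suggestion concrete, here is a rough sketch in C of the composition I have in mind. The bit layout is purely my assumption for illustration (a 60-bit global address with a 32-bit local offset in the low bits and a combined 28-bit thread+turf id above it); `local_to_global` and the field widths are hypothetical names and numbers, not anything from the actual hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, for illustration only:
 *   bits  0..31  local offset within the thread/turf
 *   bits 32..59  combined thread+turf id
 * The real split between thread and turf bits is not specified here. */
#define LOCAL_BITS  32u
#define GLOBAL_BITS 60u
#define ID_BITS     (GLOBAL_BITS - LOCAL_BITS)

/* Compose a global address from a 32-bit local address by filling the
 * otherwise-zero upper bits with the thread+turf id. */
static uint64_t local_to_global(uint32_t local_addr, uint32_t thread_turf_id)
{
    /* The id must fit in the bits left above the local offset. */
    assert(thread_turf_id < (UINT32_C(1) << ID_BITS));
    return ((uint64_t)thread_turf_id << LOCAL_BITS) | (uint64_t)local_addr;
}
```

Since this is just a shift and an OR with no table lookup, the result is an ordinary global address, so nothing like a TLB would be needed in front of the cache for it.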
As you pointed out, the C API is insufficient because it has no concept of a slice (a pointer and a length, with a constant property) with which to transfer a region of address space.
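For what it's worth, this is roughly the shape of slice I am picturing on the C side. It is only a sketch under my own assumptions: the `slice` type, its field names, and `slice_sub` are all hypothetical and do not come from any existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice descriptor: a region of address space plus a flag
 * saying whether the receiver may write to it. */
typedef struct {
    void  *base;      /* start of the region */
    size_t len;       /* length of the region in bytes */
    bool   read_only; /* the "constant" property */
} slice;

/* Narrow a slice to a sub-range before handing it on. Returns an empty
 * slice (base NULL, len 0) if the requested range does not fit. The
 * read_only flag is inherited, so narrowing never adds write access. */
static slice slice_sub(slice s, size_t offset, size_t len)
{
    slice out = { NULL, 0, s.read_only };
    assert(s.base != NULL || s.len == 0);
    if (offset <= s.len && len <= s.len - offset) {
        out.base = (char *)s.base + offset;
        out.len = len;
    }
    return out;
}
```

The point is that the pointer, the bounds, and the constness travel together as one value, which a bare C pointer cannot express.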