Forum Replies Created
- in reply to: How similar is Mill access protection to CHERI? #3745
Thank you for adding more details to my response to OP. Some further details:
1) Any system that relies on distinguishing caps from other data must deal with C unions and their equivalent in even more typeless languages such as asm. If an integer is added to a memory location, is that an integer sum or an address increment? Similar problems arise when pointers are not simple memory addresses, as in XOR-linked lists, for example. Grants avoid this, because in grants a pointer is whatever is given as an address to the LOAD and STORE instructions, regardless of how the address was constructed. The CHERI approach is not wrong given that they chose to use caps, but there will still be programs that violate typing enough to require a rewrite.
2) Yes, Mill decouples paging from protection. All running programs share the paging; each protection domain (a turf in Mill terminology) has its own rights, which can cross page boundaries.
3) Both CHERI and Mill have chosen a unified single address space, and yes, that choice greatly improves performance on general purpose codes.
4) Revoke is a problem in any protection system that allows sharing. Caps systems generally have fine-grained protection at the granularity of the individual malloc allocation, while grants systems usually have coarse-grained protection at the granularity of the mmap call or file open. The fine grain of caps leads to having many more things to keep track of than are present in a grants system, leading to garbage collection, while a grants system can use a single-owner model and the update (and revoke) rules typical of a conventional process model.
However, revoke raises semantic issues as well as implementation troubles: just what does it mean to revoke a right? If I create an object with certain rights, and I pass it on to you with its rights so you can use it too – can you in turn pass it on to a third party? If so, and I then revoke it, is the revoke only for you and not for him, or for both of you? The meanings get messy.
- in reply to: How similar is Mill access protection to CHERI? #3740
We’ve had a fairly close relationship with the CHERI project for quite a while. Both projects are addressing the same issues, just in different ways: CHERI is a capability (caps) system, while Mill is a grant system. Both want to support localized security, so that programs and program parts are limited so they can’t use or misuse data that doesn’t belong to them.
In a capability system, rights are attached to references, i.e. pointers. In a grant system, rights are attached to data, i.e. what the pointers point to. In practice both work well and provide similar security in the common usage; the difference in semantics only comes up in the details. There are also differences in costs, and part of each project has been focused on architectural ideas to bring those costs down.
Because caps carry rights indications, a caps architecture must prevent forging of pointers and their rights; you can’t just take a random integer and call it a pointer, which C programs tend to do often. Grants have no rights in their pointers, so grant pointers can be built in traditional ways.
Caps pointers need space for the rights indicators, typically needing more bits than are free in a usual pointer size. Consequently caps are usually implemented as wide pointers, which causes trouble when the pointers must be stored in externally defined data structures that expect a traditional pointer size. The problem can be somewhat avoided by adding a level of indirection. The indirection cost can be mitigated by careful caching of the indirections; this is the approach CHERI has taken. A grant pointer is the usual size, and so doesn’t have the wide pointer problem.
A pointer-connected graph structure is a good example to show the difference between caps and grant semantics. Say you want to call a function to let it work on the graph, passing a pointer to one of the nodes (the “entry node”) of the graph, and giving it the necessary rights to get the work you need done. In caps like CHERI, you must mark the pointer with rights that let the callee do its job; this may not need any more than just copying a pointer you had that already gave you those rights. This pointer lets the callee access the node it points to.
But that entry node is just one of maybe many nodes in the whole graph. It probably contains pointers to other nodes – and those have pointers to still more, forming the linked structure of the whole graph. If the callee wants to access the next node on from the entry, it gets the pointer to that node out of the entry node and follows it. But that pointer carries rights too, set up when you created the whole graph. If you set it up right then the callee can wander throughout the whole graph, following pointers at will.
In contrast, a grants system attaches the rights to the entry node, or rather to the memory address range occupied by the node. As part of the function call, you (the grantor caller) say where and which rights you are giving to the grantee callee. The granted node contains pointers, just like in the caps case, but in grants those pointers have no attached rights. To let the callee wander over the graph you must grant rights to each node in the graph.
And there is the difference: in caps, it is easy to give rights to a whole structure, but hard to give rights to only part of a structure like just the entry node; you can’t stop the callee from following the internal pointers. In grants, it is easy to give rights to a single node, but hard to give rights to a whole linked structure.
There are workarounds on both sides. In caps, you can build the inner links out of non-traversable pointers that are keyed on something about the grantee, like an unforgeable ID. In grants, you can arrange that all the nodes are allocated from a memory region (an arena) and then grant the whole arena. And there are hybrid systems.
Which is better? It’s a mass of tradeoffs, too much to go into here. Mill chose grants primarily due to the wide-pointer and pointer-forging problems; we wanted to be able to compile existing C programs and have them run correctly and performantly without modification. CHERI thinks they can do that with acceptable performance and not that much rewrite in a caps model. Frankly, I’d prefer caps if it can be done – but it’s not clear caps can do it in general purpose applications.
- in reply to: Project still alive? #3723
Not stalled, just a lot of work done and a deliberate choice to hold off on filing new patents (there are another 20 or so waiting for time and money) and on giving talks, in favor of development, until the next round and the resulting structural changes. COVID has had minimal impact on us: we were already a distributed work-from-home team. We may provide a kick-the-tires system on the cloud for people to play with; not yet decided.
- in reply to: The Mill's Competition: Can it still win? #3766
There’s a Python interpreter in the test suite, but no WASM unless it’s been added since I last looked. I’ve passed your suggestion on to Leon, who’s the one that deals with the suite. We’ll see if we can find an open-source candidate that doesn’t make too many rash assumptions about platform (for example, lots of x86 asm buried in the code). Thanks for the reminder.
There hasn’t been much news because we have no particular reason to put news out, and some reason to work on code instead of publicity.
The past year has mostly been spent on the uncore and tools; the ISA itself is essentially finished. Uncore includes behind-the-scenes parts of the chip like the PLB, TLB, interrupts, and spiller. Tools include the compiler driver, the post-linker and pre-linker, debug support, documentation, and LLVM intrinsics.
Sometimes it seems to us like it’s taking forever too. As a Stranger said: “Patience; waiting is.”
Interesting bedtime reading; thank you.
I had one concern: the compaction phase is static preprocessing, which is fine for a fixed corpus but doesn’t really work when the a priori weights are unknown. Compaction looks to be the bin-packing problem, and you shouldn’t (I surmise) stick an NP step in the processing. I wonder whether true isolation of the kernels is really necessary though – if the kernels are sparse enough, shouldn’t it be possible to just slam them together randomly and let any collisions be “learned around” in the style of a Bloom filter?
Can the Mill serve as the “simple GPs”? How many transistors does one Mill GP take if its architecture is biased toward sparse array matrix multiplies and/or sparse boolean array (with bit sum) operations?
We clearly have a home in the control-processor role, but the actual ML bit-banging sure seems like it needs a dedicated architecture, not a general purpose one. Mill can of course put in the same operations and accelerators as any other CPU. It can do it with a faster manufacturing turn too, because of the specification-based design. I’m pig-ignorant about ML, but my impression is that the problem is not the computation, it’s all about getting data from here to there. Bio is self-modifying and basically analog, which Mill is not, nor is anything else built with the tools and fabs used for CPUs today. We do have some NYF stuff in the pipeline that addresses on-chip distributed memory, but frankly that’s for conventional programs, not ML.
You clearly are deep into the subject – care to tell what you’d like to have?
Mill cores build on the same processes as any other core. If the limit on core count is the mm^2 occupied by cache, pin pads, and other sparable regular structures, then the cores-per-die count will be the same as other ISAs using the same amount of cache etc. If the limit is yield per wafer, then we expect 2X more Mill cores, because the un-sparable space occupied by a Mill core is half the size (or smaller) of a conventional OOO architecture of similar performance, and it’s the part of the chip that you can’t use sparing on that dominates yield.
Had to look up Carpendale 🙂
We continue to feel little time pressure; our advantages (and drawbacks, such as they are) seem to persist. The industry and market have essentially given up on ISA advances; there have been process advances, but they apply to us as much as anyone. The work we can do in house results (eventually) in money to us instead of money to funders.
We know we can’t dawdle forever – patents time out, as do people. But for now, steady as she goes.
- in reply to: Project still alive? #3738
The Forum is driven by posts from outside. We do reply to all posts; no questions means no answers.
There is an internal Wiki as well as the external one. The problem with updating the external one is that most of the new stuff we could update it with is Not Yet Filed, and so by USPTO rules we can’t publish without losing patentability. This is a deliberate choice – we’re holding off on twenty or so filings because we don’t want to start the clock on them. So that’s why you don’t see a new comprehensive instruction list (for example) on the public Wiki – disclosing new instructions potentially loses IP. That’s also why there haven’t been new videos for a while.
- in reply to: Project still alive? #3736
@kwinz: we agree
Thank you for your support and enthusiasm. The company has done fine over the past year; being already a global virtual company, almost nothing changed for us. But it’s been a rough year personally for many of us, especially one of the compiler team who spent a month in hospital in Poland and is still not fully back.
About your recent fund-raising experiences: In your VC contacts, were you face-to-face or video? Had you been vaccinated?
Personally I’m in a high-risk group, and more than a little paranoid about the situation here in the US.
- in reply to: Memory Allocation #3710
Turf and thread names are just another resource, and are expected to be handled the same way memory is; the software that implements the services defines what the policy is, not the hardware.
WKRs are just optimizations of the PLB; they catch common cases and save the power and delay of a PLB/table search. For example, the great majority of branch/call transfers are within the current object module. The code WKR, which describes the address range of that module, is checked first before more expensive things.
- in reply to: Memory Allocation #3708
Turf change is opaque to the running code and may vary in implementation. It occurs only during Portal transit during call or return hardware operations.
In a typical form, permissions are represented in three ways: the set of Well Known Regions, the Permission Lookaside Buffer, and a table in memory. The WKRs are reloaded during Portal transit; the PLB is scrubbed by transit; and each turf has its own table whose base is in a dedicated hardware register. A check is made to a particular WKR based on the kind of reference, then to the PLB, then to the table. Implementations may vary in such things as whether the PLB is single-level or multi-level, and whether table search is done in hardware or software.
Mill does not have the SeL4 problem of needing to know permissions at thread creation. Because we use a grant model rather than a capability model, you can create a turf with no permissions at all and then add permissions dynamically later by subsequent grants of permission.