Protection

From Mill Computing Wiki
Revision as of 14:39, 1 August 2014

All security features on the Mill revolve around permissions on address space. There are no privileged instructions or protection rings. All processes are equal in the sense that they can share the permission they have, but only those they have, with whoever they choose, to the degree they choose.

Using the Mill security primitives in practice is not any different from calling a normal function in most cases, and also not much more expensive.

Protection Lookaside Buffer

In the Architecture and Memory overview charts you can see that right above the highest cache levels sit the address protection units and the attached Protection Lookaside Buffers (PLBs). There are two separate and specialized PLBs, one for data and one for instructions. Both work on virtual addresses: virtual address space is protected independently of the translation to physical addresses.
This makes those buffers small and fast, and in fact protection can be checked in parallel with the cache lookup, faulting when there are no permissions.

The reason address translation and protection were done together is mainly historical: the limited 32-bit address space was too small for all the programs on a system, so each program got its own address space.

Regions and Turfs

So what does a region look like? It's just a contiguous stretch of the address space with a start, an end and a few attached attributes. Regions can overlap. A PLB region entry contains:

  • the lower and upper bounds of the region
  • the rights: read and write for data accesses, execute and portal on the instruction side
  • a turf ID and a thread ID, both of which may be wildcarded; when both are set, both must match

All regions that share the same turf ID together comprise a turf. That's all a turf is, a collection of regions. The turf, identified by its ID, is the protection domain the Mill operates on, in and with.
Nothing prevents several region entries from having the same bounds, as long as the IDs in the entries differ. The IDs are maintained by hardware and cannot be forged.
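The entry fields and the matching rules above can be sketched in C. This is only an illustration: the field names, widths and the wildcard encoding are assumptions, not the actual implementation-defined entry format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ID_WILDCARD 0  /* assumed sentinel value for a wildcarded ID */

/* Rights bits: read/write on the data side, execute/portal on the
   instruction side. */
enum { R_READ = 1, R_WRITE = 2, R_EXEC = 4, R_PORTAL = 8 };

typedef struct {
    uint64_t lower, upper;  /* virtual-address bounds of the region */
    uint8_t  rights;        /* bit mask of the rights above */
    uint32_t turf_id;       /* ID_WILDCARD = matches any turf (assumption) */
    uint32_t thread_id;     /* ID_WILDCARD = matches any thread (assumption) */
} plb_entry;

/* An access hits this entry when the address falls inside the bounds,
   the requested right is granted, and every non-wildcarded ID matches. */
bool plb_match(const plb_entry *e, uint64_t addr, uint8_t right,
               uint32_t turf, uint32_t thread)
{
    if (addr < e->lower || addr >= e->upper) return false;
    if ((e->rights & right) == 0) return false;
    if (e->turf_id   != ID_WILDCARD && e->turf_id   != turf)   return false;
    if (e->thread_id != ID_WILDCARD && e->thread_id != thread) return false;
    return true;
}
```

If no entry in the PLB matches, the access faults; because the IDs are hardware-maintained, software cannot construct a forged match.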

Threads

A thread is mostly quite conventional. It is a contained flow of execution, identifiable via its ID. A thread always executes within its protection domain, i.e. within its turf. It only ever has one turf at any one time, but that turf can change over time by crossing protection barriers. Many threads can share the same turf.

Well Known Regions

As fast as the PLB checks are compared to conventional combined translation/protection, they still take time and energy you would like to avoid if you can. And the vast majority of memory accesses for each program go to the same memory regions: the code, data, const data and bss segments, as well as the stack and the thread-local storage.
For this reason those regions are defined via Registers. Each turf has the code, data and const data well known regions, defined by the cp, ccp and dp registers respectively. The program loader fills those registers, and they are saved and restored by the hardware as needed on context switches.
Each thread has the stack and the thread local storage, defined by the sp and tp registers. The stack region grows and shrinks together with the stack pointer. You can't access the stack beyond the top of the stack.
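The stack rule above can be sketched as a bounds check tied to sp. This is a hypothetical model only; the downward-growing stack and the register struct are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread state; only the two fields needed for the
   stack check are modeled here. */
typedef struct {
    uint64_t sp;          /* stack pointer: current top of stack */
    uint64_t stack_base;  /* highest address of the stack region */
} thread_regs;

/* The stack well-known region grows and shrinks with sp: only
   addresses between the current stack pointer and the base are
   accessible, so an access beyond the top of stack faults without
   ever needing a PLB entry. */
bool stack_access_ok(const thread_regs *t, uint64_t addr)
{
    return addr >= t->sp && addr < t->stack_base;
}
```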


Portals

Protecting Control Stacks

Services

Interrupts

Implementation

Region Table

Granting and Revoking

Stacklets

Rationale

On conventional architectures context switches are incredibly expensive. They can run into hundreds of cycles just to change the processor core state, and on top of that come the cache thrashing and all the memory accesses to switch the working data sets. Context switches used to be used to hide memory latencies on single-core machines. But over time, with improved caching, reduced memory latencies, exploding internal processor state and increasing core counts, context switches have increasingly become a main cause of memory stalls on conventional architectures.

As a result, operating system architectures increasingly revolve around avoiding context switches, even in places where they would actually be needed from a security and stability standpoint. It is therefore still common for buggy device drivers to take down whole systems. And where the security features are absolutely unavoidable, systems often spend a third or more of their cycles on context switches and related management like TLB and cache shuffling.[1]

To avoid all this, the goal should be to just pass the needed parameters back and forth between the protection domains with mutually protected code, but don't switch out all the surrounding context with it each time. This is of course an ideal and not fully attainable. But the closer you get to this ideal, the cheaper and cleaner the protection primitives become. And this is what drove the design.

All the context that needs to be switched to safely cross a protection barrier can be contained in one cache line. And in this clean and simple and cheap definition and execution of a security domain switch lies the basis of a flexible and reliable and efficient security framework. You define the gateways between the protection areas yourself and contain those definitions in small protected packages.

The protection domains, the regions and turfs, can be counted in the thousands or tens of thousands on a system. This is relatively coarse-grained security. Having true capabilities, protecting on the level of single objects and functions, would be even better and cheaper from a security perspective, with even less or no context to switch. But implementing this requires non-commodity memory, and it also conflicts with the memory model of the most common languages like C, as they are used.

Media

Presentation on Security Features by Ivan Godard - Slides

References

  1. Linus Torvalds on the cost of page fault interrupts