ASLR (security)
Making this its own thread as the existing security thread is getting huge.
What impact, if any, does the Mill design (single address space, turfs) have on address randomization? Many exploits rely on data being stored by applications or the kernel at predictable addresses, so modern OSes randomize where they place their structures. This comes at the expense of determinism, though, so if the existing Mill security is sufficient, not needing ASLR would be nice; but the Mill’s protection for the stack doesn’t necessarily prevent tricking apps into exposing, say, secret heap data, so ASLR still seems desirable. If I understand right, all Mill code is PIC, so this seems doable, unless there are hurdles in NYF (fork?).
Address Space Layout Randomization has primarily been a defense against return-oriented programming (ROP) exploits, which are impossible on a Mill. In the absence of any way for an exploit to execute arbitrary code, it’s not clear what data-location randomization would buy you. We’re not claiming that an exploit cannot trick a Mill program (exploiters are clever), but we have no example that does so in a well-structured program.
For example, say the application has secret data, and also has an API that will print out non-secret data from a buffer. A buffer overrun could alter the program’s pointer to the buffer to point to the secret data instead, and a following API request would then print out the secret. One might think that ASLR would help in this case, but it’s actually unnecessary as well as insufficient.
Properly structured, the app would put all secrets in a service in a different turf from the application and the user API. The exploit could then smash whatever it wants in the app, without gaining any access to the secrets.
Consequently we (so far) feel that ASLR is a snare, giving a false sense of protection, and should not be used. This is true both on Mills and on other machines. It is perfectly possible to do the same separation on a conventional machine by using processes to isolate the secrets from the exploiter, and IPC to access the consequences of the protected information but not the protected info itself, just like a service on the Mill.
The reason it’s not often done that way is a matter of performance and coding ease; multiprocess code on a conventional machine is difficult and slow, so ASLR is suggested as a half-hearted kludge alternative. Service code on a Mill is easy and fast, so there’s no need to pile ASLR on top of it.
We’re open to counter-examples 🙂
The advantage of the Mill regarding return-oriented programming is that it increases the security of existing code. I’m not sure if by well-structured you mean changed to take advantage of the Mill security model. Assuming you are interested in protecting users against poorly written code (after all, if you’re not, just tell the user to port to a language with bounds checking), here is a counter-example of sorts:
In a login program I have a heap allocated struct containing a buffer for the user password followed by a function pointer pointing to what should be executed when a login attempt fails. The program fails to check for the user entering extremely long passwords, and the user knows where the login success function is usually located. They enter an extremely long password, overflowing the buffer and writing the address of the login success function into the function pointer that points to the code to execute on login failure. The program decides the user password is invalid, follows the function pointer, and grants the user access. If the location of the login success function is randomized, the attacker is thwarted.
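A minimal sketch of the shape of that bug (names, sizes, and layout are hypothetical, not taken from any real login program):

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical heap-allocated record: a fixed-size password buffer followed
// immediately by a handler pointer for the failure path.
struct LoginAttempt {
    char password[64];        // filled from user input
    void (*on_failure)();     // called when the password check fails
};

void grant_access() { std::puts("access granted"); }  // the "login success" code
void deny_access()  { std::puts("access denied"); }

void handle_login(LoginAttempt* a, const char* user_input) {
    // BUG: no length check. A password longer than the buffer runs past it
    // and overwrites a->on_failure with attacker-chosen bytes.
    std::strcpy(a->password, user_input);

    bool password_ok = false;   // the check fails, as the attacker intends
    if (!password_ok)
        a->on_failure();        // now jumps wherever the overflow pointed it,
                                // e.g. straight at grant_access()
}
```

If the attacker knows where grant_access usually sits in memory, the overflow payload simply ends with that address; per-boot randomization of code placement is exactly what makes that guess unreliable.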
Note that no return-oriented programming or digging through stack rubble was needed. Returns are convenient on conventional architectures because they are a frequently occurring jump that can be influenced by corrupt data, but every C function pointer and every C++ vtable pointer can potentially be used as well. These attacks are harder to pull off because you don’t always have conveniently located buffers next to function pointers, and vtable pointers are usually at the front of objects, but there have been real vulnerabilities where the attacker relies on knowing two objects will be next to each other on the heap and overflows a buffer in the first to overwrite the vtable pointer in the second. A lot of the browser hacking competitions are won by people stringing together long sequences of this sort of gymnastics.
By “well-structured” I mean “a structure that would be secure on a non-Mill”, typically through use of processes where a Mill would use services.
In this context your login example is poorly structured. On a conventional machine, the exploit can be eliminated by putting the password checker/secret keeper in a separate process that records and returns (via IPC) a pass/fail boolean. The (buggy) UI can suffer buffer overwrite and be arbitrarily perverted by the attacker, but the keeper cannot be because it is getting its arguments via IPC rather than via read().
This is well-structured and secure, but expensive and inconvenient on a conventional machine. The same basic strategy can be applied to any app on any machine: put threat-facing code in one protection domain and secret-keeping code in another, and use a trusted interface (IPC on a conventional, portal calls on a Mill) to communicate between them.
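As a concrete sketch of that split (hypothetical names; in the real structure the two sides would live in separate processes or turfs, so the plain function call below merely stands in for the IPC or portal call):

```cpp
#include <cstring>

// Keeper side: in the real structure this lives in its own protection domain
// (a separate process on a conventional machine, a separate turf behind a
// portal on a Mill). It holds the secret and only ever answers pass/fail.
namespace keeper {
    const char SECRET[] = "hunter2";                 // hypothetical secret
    bool check_password(const char* candidate) {
        return std::strcmp(candidate, SECRET) == 0;  // boolean result only
    }
}

// UI side: threat-facing and possibly buggy. Even if a buffer overwrite lets
// an attacker pervert it arbitrarily, in the separated structure it never
// holds the secret; all it can do is submit guesses across the interface.
bool try_login(const char* user_input) {
    return keeper::check_password(user_input);       // stands in for IPC / portal call
}
```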
I’m well aware of the exploits you describe, but don’t feel they are an argument for ASLR. ASLR does not prevent such exploits, merely makes them more difficult; good structure prevents them and makes ASLR pointless. The reason why major software (like browsers) is not well structured is cost, in performance and convenience. That’s the same reason that people write passwords on Post-its. The Mill lowers the cost of good structure; we’re still working on the Post-its.
Say the program works as you describe but the service is poorly written instead. It receives variable-length data over IPC and copies it into a statically sized buffer. The buffer still overflows and potentially overwrites function pointers in that service, causing it to return true instead of false. The exploit is essentially the same, is still possible even for two separate machines communicating over a network, and is still thwarted by ASLR.
I completely agree that ASLR is only a mitigator for underlying problems, but that’s an argument for it having less value rather than none at all. In practice many exploits are thwarted by it.
Your example seems contrived to me. No protection system is immune to error, just like none is immune to pretty spies; you might as well propose an example in which a bug causes the secret holder to post the secret on Slashdot if the password is 123456789 🙂
The question is whether ASLR adds anything useful and worth the nuisance, or whether it merely provides enough hocus-pocus that it can be successfully sold as a protection device. IMO, in a well-structured protection environment it is only hocus-pocus. More, I consider it to be actually dangerous because it invites ignorant use leading to a false sense of security, which does nothing but help the exploiters.
I don’t think it’s evil; if it were already present on a system of mine then I would enable it. But I wouldn’t feel any more secure, and I wouldn’t put off restructuring the code.
Clearly your mileage varies, so we’ll have to leave it at that.
It all comes down to the tradeoffs. The Mill’s way of doing it has many advantages:
- Separation of concerns by splitting the PLB and TLB
- Allowing the TLB to be moved down the cache hierarchy, giving it the opportunity to be smarter, more complex, larger, slower, and cheaper without impacting performance.
- Unifying the entire cache hierarchy with the processor core.
- Freeing the only parallelizable part of the classic-TLB structure, i.e. protection, to be parallelized in the Mill’s PLB.
- Making the PLB small, fast, and no longer a bottleneck.
- Allowing cache access to be fast and deterministic.
- Opening the opportunity to introduce real security primitives based on authorization and least-privilege instead of obfuscation.
(See the memory talk for these points.)
Insisting on ASLR throws all of that away, including the significant performance benefit of removing the TLB chokepoint from the hottest part of the memory highway: between the processor and the L1. All for what boils down to a form of security through obscurity.
Make your secure services small and simple, with a low attack surface, and they will be much easier to keep secure and to keep free of things like buffer overflows.
Joe, I must be missing something: how are the Mill and ASLR at odds? Making sure that they aren’t is the thrust of my question; I was curious whether anything in the Mill prohibits it. My discussion with Ivan on its merits aside, I was still left with the impression that it could be done if it were OS policy. I don’t see how placing each process/service/library/stack (different implementations take the randomization to different lengths) at a random offset means throwing away the PLB/TLB separation. I figured ASLR would be complementary, with the Mill’s resistance to stack exploits preventing most buffer-overflow vulnerabilities and ASLR mopping up what was left.
Further, characterizing ASLR as security by obscurity is like saying that encryption is security by obscurity because you have to keep your private key secret. ASLR is about per-machine, per-bootup randomization of critical locations, not a secret backdoor put in by the Mill’s designers. We can measure the number of bits of security against brute force provided by ASLR precisely, and in fact you can see an example of this analysis on its Wikipedia page.
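For a rough sense of the arithmetic (the exact bit counts are OS- and configuration-dependent, so treat the numbers as illustrative assumptions): if a region’s base gets n bits of per-boot randomization, a blind guess hits with probability 2^-n per attempt, so a brute-force attacker needs about 2^(n-1) attempts on average. At 8 bits that is only 128 tries; at 28 bits it is roughly 134 million, with each failed try typically crashing the target and leaving evidence.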
Security is a belt and braces thing. And ASLR is a cheap thing.
The Mill contains a lot of facilities for doing the right thing in the right way, and we can all strongly recommend they are used, but they don’t preclude classic bandaids.
The Mill is going to run a lot of monolithic OSes with portable apps which also run on hardware without the Mill’s innovations, and we’re going to do everything we can to secure them for their users short of banning them 😉
I can address the compatibility question.
There are some things that are at fixed locations on the Mill.
The physical map is at virtual address zero, so physical and virtual addresses are the same up to the end of physical.
Within physical, the core MMIO map is at address zero, and its layout is fixed and member-dependent. This does not include device MMIO, which may be anywhere.
The top of virtual is reserved for stacklets, and immediately below that are the stacklet info blocks. The sizes of these structures are fixed but member-dependent.
The boot sequence is in the MMIO map, but it transfers to the boot ROM whose location is not fixed in physical but is by convention after the MMIO block.
The boot turf (the All) can reach all of these, but most are used only for power-up or device driver services and would not be passed on to applications.
That’s all I can think of. Of course, the OS can organize things as it sees fit.
Note that because the OS is just another app on the Mill, where it starts up is not under its own control but is set by the boot ROM. Hence the ROM boot code is a reasonable place to randomize at least the OS, if desired. There are any number of ways for a ROM to get a number random enough for this use.
Very interesting. Calling it “cargo cult” is waving the red flag at the bull, though. Or maybe not – I doubt many of the younger generation can identify “cargo cult” without looking it up.
The Mill security model abandons (or rather permits the implementation to abandon) protection of addresses, on the grounds that protecting addresses is impossible in real code written and maintained by real people. It relies instead on the inability to program via return addresses and the impracticality of programming via function pointers. The goal is to let the attacker have a complete dump of memory to craft the exploit, to give him the ability to overwrite all of application dataspace as the threat entry point, and still leave a kernel-busting exploit infeasible.
Yes, that’s a challenge. When we have an OS ported I’d be willing to put some money behind it 🙂
Of course, a technical fix doesn’t stop phishing, Mata Hari, or corruption.
The article is a bit bumbling; it basically says ASLR doesn’t help except for all the cases where it does. There are specific challenges with kernel randomization as opposed to userspace, and he basically ignores its usefulness against userspace attacks entirely. Also, as the first commenter points out, it is effective against remote kernel exploits, where all of the possibilities for infoleaks that he lists don’t apply. My favorite bit is the ridiculous claim that on Linux ASLR won’t matter because people compile custom kernels, as if that provided enough randomization, as if every organization ran a custom kernel rather than what Red Hat ships, and as if it would change the fact that the same addresses would appear across potentially thousands of a company’s machines.
I do agree that ASLR is a half measure, but I don’t know of a better way to fill in the gaps in security caused by C. Ivan is correct to point out that proper design can ameliorate the problem, but I still think there’s always the possibility of exploits akin to my earlier function-pointer examples. If we’re allowed to fantasize, the Mill should only run proof-carrying code that shows it will never violate its permissions; then the PLB could be removed entirely and the Mill could be even more amazingly power efficient. But that’s not happening as long as you want (need) to run existing C 😉
Yes, it is railing against KASLR as a means to defeat known privilege exploits. Well, in that narrow context he has a point.
In the broader scheme of things, I fully expect OSes running on the Mill to use some form of ASLR, especially in user space. There have been internal discussions about it.
Canaries, however, are not needed.
I do hope OSes fully embrace the finer grained security the Mill provides too.
We all seem agreed that the Mill makes ROP impossible. Back when we first decided to put the call metadata in the spiller directly, we also looked at whether the equivalent exploit using function pointers (including vtable entries) was possible, and if so what we could do about it. Your advocacy of ASLR has prompted me to rethink the issues.
To an attacker, ROP offers an easy way to create programs of arbitrary length: identify code fragments with interesting operation sequences ending with return, construct an arbitrarily long sequence of return addresses on the stack where each return address points at the next desired code fragment in the sequence, and let the program’s next return enter the constructed sequence. Straightforward. Returns are ubiquitous, so it’s easy to find desired fragments.
Considering conventional architectures (say x86) first, doing the same thing with function or label pointers is not so easy. Rather than returning, the fragment must end with a branch through a label pointer or a call through a function pointer. Label pointers are quite rare in code, mostly dispatch tables for switch statements, and those are often in read-only memory. However, function pointers are common, especially in OO languages. An exploit could overwrite one in dataspace and, when the function was called, send the program to an arbitrary exploit-chosen location.
But is that sufficient for penetration? No: it gets only one fragment executed, whereas the exploit needs an arbitrary sequence so as to be able to construct a program. In ROP, sequences fall out of the stack structure and the way return works; if you can get the first fragment executed, it’s easy to get the second. But how do you get the second fragment using function pointers?
To chain a call exploit, the fragment must 1) end with another function-pointer call; 2) be able to find and access the FP to call through; and 3) that FP must be changeable by the attacker to redirect the call. As FP calls are common in some languages, I’ll assume that #1 can be satisfied by scanning enough code; the scan may be made more difficult by ASLR or other defenses, but won’t be impossible.
#2 depends on how hardware supports indirect call. On some architectures (x86?) you can call through a memory address as well as calling to one. On other machines (most RISCs) you have to load the FP to a register before executing an indirect call. To get that second fragment, there must be a FP that can be made to point to it (#1, assumed) that the first one indirects through. Consequently, the number of fragments in the exploit program is limited by the number of different FPs used for indirection in the host program. It does the exploit no good if the end call of a candidate fragment goes through a FP that is already used by a different fragment.
This is a rather strong limitation: a fragment must not only end with a FP call, it must end with a FP call to a function not called by any other fragment. As FPs are used by the host program for function entry, this limitation means that the number of fragments can be no more than the number of functions (functions, not calls) in the host that are the target of dispatch. That number is small enough in C and C++ to make a FP exploit infeasible, but a naive Java implementation that dispatches every function would remain exploitable. So let’s say that #2 is open to an attacker, so long as the exploit program is short, on the order of a thousand or so equivalent instructions.
#3 requires that the FP be in writable memory. This seems the greatest constraint to me: most indirect calls are through vtables, and those are in read-only memory (or the defender can put them there). FPs as data are certainly used, but the number of useful fragments that end with a call through a data FP is very small, and it seems unlikely that an exploit could find enough to chain them into a program.
So our conclusion then, and my conclusion now, is that FP-oriented programming is possible on a conventional machine in the abstract, but it seems unlikely that it could be used in practice to construct a usable exploit. Still, unlikely is not never.
Now how does the Mill impact this? The Mill has a unique encoding because it executes two instruction streams in opposite directions from an entry point in the code. Unlike on a conventional, both streams must make sense to the decoder or the hardware will fault. Sensibility is easy if the exploit fragment is a whole EBB, because the entry address has real code on both sides. However, it makes finding a fragment at a point other than the EBB entry near impossible: on one side or the other, bits intended for a different engine must still make sense and do something useful when bit-reversed. The situation is even more extreme than jumping into an array of floating-point numbers and hoping to execute them as code.
Moreover, not only must the code make sense, but its belt behavior must also make sense, on both sides. And note that the Mill does not have a memory-indirection call operation, so the chain call to the next fragment must itself modify the belt. As fragments can only pass data along to the next fragment within the belt, a useful fragment depends not only on valid bit encoding, but also on a jigsaw-puzzle belt fit with adjacent fragments. I may be unimaginative, but I consider finding and composing such fragments to be impossible.
This leaves using existing EBBs as fragments. The nature of Mill code on a wide-issue machine means that there are many short (often one-instruction) EBBs. However, to form a chain the EBB must end with a FP call, and the FP must be in the belt. At the right place. So in practice the EBB must also contain a load of the FP, or a fill from scratch. Scratch cannot be overwritten by a buffer overflow, so getting the modified FP to scratch seems impossible. Mill code hoists loads, including those of FPs, so the fragment must have at least enough instructions to load the FP to the belt and call it, without screwing up the belt position order for the FP data.
Bottom line: ain’t gonna happen on a Mill.
Of course, if the code contains a function that unlocks the front door unconditionally, then an exploit can arrange for that function to be called. Bang, you’re in, on a Mill or any architecture. But this gets back to “well structured”: no program desiring security should have the access control in userland; access control should be a simple generic library run as a service.
Thanks for that in-depth analysis, Ivan; I hadn’t even thought about the dual instruction stream issue. I agree with everything you said. I would only add that even though the vtables are read-only, the vtable pointers within objects are not, so you can still use a buffer overflow to make an object switch types and thus get virtual methods running on unintended bytes. I don’t think that increases the risk greatly given your other points, but it’s one more attack vector to keep in mind.
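A rough sketch of the shape of that attack (hypothetical classes; a real exploit also depends on the allocator actually placing the two objects next to each other):

```cpp
#include <cstring>

struct Logger {                        // first heap object: plain data
    char buf[32];
    void append(const char* s) {
        std::strcpy(buf, s);           // BUG: unchecked copy can run past buf
    }
};

struct Checker {                       // second heap object: virtual, so it carries a hidden vptr
    virtual bool check() { return false; }
    virtual ~Checker() {}
};

// If the allocator happens to place a Checker immediately after a Logger, an
// oversized append() runs off the end of buf and overwrites the Checker's
// vtable pointer. The next virtual call through that object then dispatches
// through attacker-chosen memory, even though the vtables themselves sit in
// read-only storage.
bool attempt(Logger* log, Checker* chk, const char* attacker_input) {
    log->append(attacker_input);
    return chk->check();
}
```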
Still, I am tempted to hunt for such an exploit when the Mill is made available, out of pure stubbornness 😉 There was a real Flash exploit a couple of years ago (I can’t find the link now) that relied on a byte stream being simultaneously valid x86 opcodes, ActionScript, and shell.
ASLR is a clever band-aid (software only) to try to hold together the feet of clay of all the flawed system OSes that are now far too costly or impossible to fix.
I believe that making a secure system requires a complex ballet between the protective hardware and the supporting software, based on simple and clearly understood principles. Either one alone, hardware or software, is not enough.
Security through obscurity is not enough.
You are still doomed even if you are running everything in a very smart interpreter running from ROM, where each byte and address of user code is validity-checked, and even with the 200+ times slowdown penalty.
My system design maxim (original or not) is “A little bit of (security) hardware beats an awful lot of software any day”.
Regards, Len
There was a subtlety in the original question which we may have overlooked:
What impact, if any, does the Mill design (single address space, turfs) have on address randomization?
It’s interesting to reflect on how a naive, non-randomizing, pointer-bumping mmap would provide a side channel to an attacker because of the single address space.
If others can infer whether or not a service has allocated memory, that may leak some of the service’s internal state and the decisions it has made. This would be a bad thing.
Closing implicit side channels is interesting intellectual play but not very real-world IMO. In principle, if you have access to the box and unlimited prepared-text attack ability then you can learn a ton by measuring the power drain at the wall socket. Or you can etch the lid off a chip and do RF sniffing at the nanometer level. And I’m sure there are 3-letter agencies that do exactly that sort of thing. But I doubt that we are looking at customer sales impact from whatever can be extracted from the global pattern of mmaps, even if you had an exact list of all such calls without having to infer anything.
I feel that the automatic, sloppy randomization that will come from the shared address space will in fact help the Mill get and maintain a reputation for solidity. I don’t think it’s anything that we should trumpet or make marketing muchness out of, but it will make an attacker’s job harder even when the user turns off ASLR through oversight or misguided “tuning”, and that has to be a good thing.
YMMV.