Forum Replies Created
- in reply to: Porting JNode #960
There’s a chicken-and-egg problem with new hardware – you need compilers to write the OS, and you need the OS to run the compilers. The usual route is cross-compilation, and we’re at work on that. For now, OS work – any OS – remains conceptual. And the concept work is very low level, down in the guts of a kernel: how does an interrupt work? How does addressing work? How do you talk to an I/O device? That sort of thing, which must be present for any OS.
When the LLVM port is far enough along to get significant code running (still in sim, of course), we will make it available, and those who, like you, really want to try their own favorite projects on the Mill can do so. We’ll announce it here – but don’t hold your breath, it will take a while yet.
- in reply to: MIPS/sqrt(W*$) as a better metric #954
We stopped using the metric due to lack of hard data and because too many people were confused by it; the concept of a design space (from which the metric came), as opposed to benchmark comparisons, turned out to be too much for many of the audience, and the resulting wrangling was pointlessly distracting.
We still don’t have better data, but have no objection to thoughtful posts on the subject, such as yours.
- in reply to: Hard/soft realtime #925
Well, not quite. There is also the possibility of I$ misses. As a general rule you can’t do Hard Real Time if anything is cached.
The standard way to deal with the issue is to pin the HRT on-chip: by pinning lines in the cache; by turning all or part of the cache into scratchpad; or by using NUMA with the HRT in an uncached but on-chip level, typically SRAM or 1T on-chip DRAM. All these approaches work as well on the Mill as on any other architecture. And perhaps somewhat better (especially if the chip is being used for both HRT and regular background apps), due to the low cost of interrupts, task switch, and inter-process security.
- in reply to: Programmer oriented overview #903
Thank you 🙂
I’m impressed that you could fit so much into a blog posting. The only thing I might suggest is to embed a link to the videos (http://millcomputing.com/docs) and forum (http://millcomputing.com/forum) where you mention them.
Ivan
- in reply to: ASLR (security) #893
Address Space Randomization has been primarily a defense against Return-Oriented Programming (ROP) exploits, which are impossible on a Mill. In the absence of any way for an exploit to execute arbitrary code, it’s not clear what data-location randomization would buy you. We’re not claiming that an exploit cannot trick a Mill program (exploiters are clever), but we have no example that does so in a well-structured program.
For example, say the application has secret data, and also has an API that will print out non-secret data from a buffer. A buffer overrun could alter the program’s pointer to the buffer so that it points to the secret data instead, and a following API request would then print out the secret. One might think that ASDR would help in this case, but it’s actually unnecessary as well as insufficient.
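To make the scenario concrete, here is a minimal C sketch of that pattern; the names, layout, and payload are all hypothetical and purely illustrative:

```c
/* Hypothetical sketch: a buffer overrun rewrites an adjacent data
 * pointer, so a later "print the buffer" request discloses the
 * secret instead. */
#include <stdio.h>
#include <string.h>

static const char secret[] = "launch-code-1234"; /* the app's secret */

struct session {
    char buf[16];           /* attacker-controlled input lands here */
    const char *print_ptr;  /* the print API reads from here        */
};

/* BUG: no bounds check; n > 16 overwrites s->print_ptr. */
static void handle_input(struct session *s, const void *in, size_t n) {
    memcpy(s->buf, in, n);
}

static void api_print(const struct session *s) {
    puts(s->print_ptr);     /* prints whatever print_ptr points at  */
}

int main(void) {
    struct session s;
    s.print_ptr = s.buf;

    /* Exploit payload: 16 filler bytes, then a pointer to the secret. */
    unsigned char payload[16 + sizeof(const char *)];
    memset(payload, 'A', 16);
    const char *p = secret;
    memcpy(payload + 16, &p, sizeof p);

    handle_input(&s, payload, sizeof payload);
    api_print(&s);          /* prints "launch-code-1234"            */
    return 0;
}
```

Randomizing where `secret` lives only forces the attacker to guess (or read) its address; the structural fix below removes the pointer path entirely.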
Properly structured, the app would put all secrets in a service in a different turf from the application and the user API. The exploit could then smash whatever it wants in the app, without gaining any access to the secrets.
Consequently we (so far) feel that ASDR is a snare, giving a false sense of protection, and should not be used. This is true both on Mills and on other machines. It is perfectly possible to do the same separation on a conventional by using processes to isolate the secrets from the exploiter, and IPC to expose the consequences of the protected information without exposing the information itself, just like a service on the Mill.
The reason it’s not often done that way is a matter of performance and coding ease; multiprocess code on a conventional is difficult and slow, so ASDR is suggested as a half-hearted kludge alternative. Service code on a Mill is easy and fast, so there’s no need to pile ASDR on top of it.
We’re open to counter-examples 🙂
Thank you; the explanation cleared up a lot for me.
I would be very doubtful about directly exposing a device to applications, the premise of exokernels. Devices (the external thingy itself, not the driver) often have *very* fragile interfaces. Apps in quest of speed are notoriously buggy. There’s no conflict if the device can be assigned to just one process; the process is then both app and driver, and can tinker to its heart’s desire and harm no one but itself. But few devices are as simple and inherently single-user as a card reader these days.
For a shared device, such as a drive, the app may get speed if it does its own driving. However, this exposes all other uses of the device to screw-ups on the part of the app/library/driver. Apps will tend to “just tweak it a bit”, and break everybody else.
There are also issues with global behavior not being the same as local behavior in shared devices. There is a reason for central control of drives: all requests from all apps need to be sorted into seek order instead of request-issue order, or all apps using the drive suffer.
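A minimal sketch of that point (plain C, names assumed, not Mill-specific): the central driver batches pending requests from all apps and serves them in seek order, one elevator sweep at a time, which no single app can do on the others’ behalf:

```c
/* Why drive control must be central: pending requests from all apps
 * are served in seek order, not per-app issue order. An app that
 * bypasses the scheduler re-introduces the seeks for everyone. */
#include <stdlib.h>

struct io_req {
    unsigned long cylinder;  /* target position on the platter */
    int           app_id;    /* originating process            */
};

static int by_cylinder(const void *a, const void *b) {
    const struct io_req *x = a, *y = b;
    return (x->cylinder > y->cylinder) - (x->cylinder < y->cylinder);
}

/* Order one batch of pending requests so the head sweeps
 * monotonically across the platter instead of thrashing. */
void schedule_sweep(struct io_req *pending, size_t n) {
    qsort(pending, n, sizeof *pending, by_cylinder);
}
```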
Now if all the library does is what a monolith driver would do, and the apps are trusted not to tinker with it, then the library is a service in the Mill sense, and on the Mill the app no longer must be trusted.
So again, I’m not really seeing much difference between the micro- and exo-kernel approaches, at least on a Mill. In the micro, a driver process becomes a fast, cheap, secure service on the Mill; in the exo, a trusted library becomes a fast, cheap, secure service. Calling the service a micro or an exo seems mostly a matter of legacy terminology and marketing.
BTW, nothing on the MIT exo site is less than a decade old, so I guess they abandoned the idea. I’d really like to see a paper that tells why.
1) Bottom stacklets are fixed size. Stack overflow (which can happen on the very first frame if it’s big enough) pushes a grant of the occupied part of the overflowing segment, allocates a bigger (policy) stack segment somewhere, and sets the various specRegs to describe it and thereby grant permission. Return unwinds. Unwind is lazy, so you don’t get ping-ponging segments if you are really close to a segment boundary. (A sketch of this grow/unwind policy follows after point 3.)
2) I doubt that the search hardware would be exposed. Too specialized, and it will vary by member, so not portable.
3) I looked at the exokernel work. So far as I can see “exokernel” is just marketese for “microkernel” plus libraries. The libs would be services on a Mill, but otherwise I haven’t seen anything that novel compared to prior work in microkernels and capability architectures. Please point out anything I have missed.
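Returning to point 1, here is a loose, hypothetical C model of the grow-on-overflow / lazy-unwind policy. The names (grant, specRegs, sizing) are stand-ins – the real mechanism is in hardware – but the control flow is the point:

```c
/* Hypothetical model of stacklet overflow handling; one level of
 * segment is modeled for brevity. */
#include <stddef.h>
#include <stdlib.h>

struct seg { char *base; size_t size; size_t used; };

/* Stubs standing in for hardware/OS actions. */
static void grant(char *base, size_t len) { (void)base; (void)len; }
static void set_specregs(struct seg *s)   { (void)s; }

static struct seg *alloc_segment(size_t size) {
    struct seg *s = malloc(sizeof *s);    /* checks omitted */
    s->base = malloc(size);
    s->size = size;
    s->used = 0;
    return s;
}

static struct seg *cur, *prev;

/* Overflow: grant the occupied part of the old segment, switch to
 * a bigger one (how much bigger is policy), and describe it in the
 * specRegs, which is what confers the permission. */
void on_overflow(size_t need) {
    grant(cur->base, cur->used);
    prev = cur;
    cur = alloc_segment(2 * need);
    set_specregs(cur);
}

/* Return back across the boundary: unwind lazily. The big segment
 * is NOT freed; code hovering near a boundary reuses it on the
 * next call instead of ping-ponging allocate/free every time. */
void on_underflow(void) {
    set_specregs(prev);
}
```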
- in reply to: ASLR (security) #946
We seem all agreed that the Mill makes ROP impossible. Back when we first decided to put the call metadata in the spiller directly, we also looked at whether the equivalent exploit using function pointers (including VTABs) was possible, and if so what we could do about it. Your advocacy of ASLR has prompted me to rethink the issues.
To an attacker, ROP offers an easy way to create programs of arbitrary length: identify code fragments with interesting operation sequences ending in return, construct an arbitrarily long sequence of return addresses on the stack where each return address points at the next desired code fragment in sequence, and let the program’s next return enter the constructed sequence. Straightforward. Returns are ubiquitous, so it’s easy to find desired fragments.
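A conceptual sketch of such a chain – every address and gadget below is made up, purely to show the shape of the thing:

```c
/* The attacker overflows a stack buffer so the saved return address
 * and the words above it become a list of gadget entry points; each
 * gadget ends in `ret`, so each return transfers to the next list
 * entry. The smashed stack is, in effect, the exploit's program. */
static const unsigned long fake_stack[] = {
    0x401a3cUL,   /* gadget 1: pop rdi; ret    (load an argument)   */
    0x7fff0000UL, /* datum consumed by gadget 1's pop               */
    0x4022f1UL,   /* gadget 2: mov [rdi], rax; ret  (write memory)  */
    0x401b88UL,   /* gadget 3: syscall; ret    (enter the kernel)   */
};
```

On a Mill the return addresses live in the spiller, outside application-writable memory, so there is no such list to overwrite.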
Considering conventional architectures (say x86) first, doing the same thing with function or label pointers is not so easy. Rather than returning, the fragment must end with a branch through a label pointer or a call through a function pointer. Label pointers are quite rare in code – mostly dispatch tables for switch statements – and those are often in read-only memory. However, function pointers are common, especially in OO languages. An exploit could overwrite one in dataspace, and when the function was called, send the program to an arbitrary exploit-chosen location.
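A compilable sketch of that scenario, with hypothetical names and layout:

```c
/* The overrun rewrites fp, so the next dispatch jumps to an
 * attacker-chosen address. That buys one fragment of execution;
 * chaining a *second* fragment is the hard part, discussed below. */
#include <string.h>

struct handler {
    char buf[32];             /* overrun source                     */
    void (*fp)(const char *); /* dispatch target, overwritable      */
};

static void receive(struct handler *h, const char *in, size_t n) {
    memcpy(h->buf, in, n);    /* BUG: n may exceed 32, clobbers fp  */
}

static void dispatch(struct handler *h) {
    h->fp(h->buf);            /* control goes wherever fp now points */
}
```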
But is that sufficient for penetration? No: it gets only one fragment executed, whereas the exploit needs an arbitrary sequence so as to be able to construct a program. With ROP, sequences fall out of the stack structure and the way return works; if you can get the first fragment executed, it’s easy to get the second. But how do you get the second fragment using function pointers?
To chain a call exploit: 1) the fragment must end with another function-pointer call; 2) it must find and access the FP to call through; and 3) the new FP must be changeable by the attacker to redirect the call. As FP calls are common in some languages, I’ll assume that #1 can be satisfied by scanning enough code; the scan may be made more difficult by ASLR or other defenses, but won’t be impossible.
#2 depends on how hardware supports indirect call. On some architectures (x86?) you can call through a memory address as well as calling to one. On other machines (most RISCs) you have to load the FP to a register before executing an indirect call. To get that second fragment, there must be an FP that can be made to point to it (#1, assumed) that the first one indirects through. Consequently, the number of fragments in the exploit program is limited by the number of different FPs used for indirection in the host program. It does the exploit no good if the end call of a candidate fragment goes through an FP that is already used by a different fragment.
This is a rather strong limitation; a fragment must not only end with an FP call, it must end with an FP call to a function not called by any other fragment. As FPs are used by the host program for function entry, this limitation means that the number of fragments can be no more than the number of functions (functions, not calls) in the host that are the target of dispatch. That number is small enough in C and C++ to make an FP exploit infeasible, but a naive Java implementation that dispatches every function would remain exploitable. So let’s say that #2 is open to an attacker, so long as the exploit program is short, on the order of a thousand or so equivalent instructions.
#3 requires that the FP be in writable memory. This seems the greatest constraint to me: most indirect calls are through vtables, and those are in read-only memory (or the defender can put them there). FPs as data are certainly used, but the useful fragments that end with a call through a data FP are very few, and it seems unlikely that an exploit could find enough to be able to chain them into a program.
So our conclusion then, and my conclusion now, is that FP-oriented programming is possible on a conventional in the abstract, but it seems unlikely that it could be used in practice to construct a usable exploit. Still, unlikely is not never.
Now how does the Mill impact this? The Mill has a unique encoding because it executes two instruction streams in opposite directions from an entry point in the code. Unlike on a conventional, both streams must make sense to the decoder or the hardware will fault. Sensibility is easy if the exploit fragment is a whole EBB, because the entry address has real code on both sides. However, it makes finding a fragment at a point other than an EBB entry near impossible: on one side or the other, bits intended for a different engine must still make sense and do something useful when bit-reversed. The situation is more extreme even than jumping into an array of floating-point numbers and hoping to execute them as code.
Moreover, not only must the code make sense, but its belt behavior must also make sense, on both sides. And note that the Mill does not have a memory-indirection call operation, so the chain call to the next fragment must itself modify the belt. As fragments can only pass data along to the next fragment within the belt, a useful fragment depends not only on valid bit encoding, but also on a jigsaw-puzzle belt fit with adjacent fragments. I may be unimaginative, but I consider finding and composing such fragments to be impossible.
This leaves using existing EBBs as fragments. The nature of Mill code, as a wide-issue machine, means that there are many short (often one instruction) EBBs. However, to form a chain the EBB must end with a FP call, and the FP must be in the belt. At the right place. So in practice, the EBB must contain a load of the FP too, or a fill from scratch. Scratch cannot be overwritten by a buffer overflow, so getting the modified FP to scratch seems impossible. Mill code hoists loads, including those of FPs, so the fragment must have at least enough instructions to load the FP to the belt and call it, without screwing up the belt position order for the FP data.
Bottom line: ain’t gonna happen on a Mill.
Of course, if the code contains a function that unlocks the front door unconditionally, then an exploit can arrange for that function to be called. Bang, you’re in, on a Mill or any architecture. But this gets back to “well structured”: no program desiring security should have the access control in userland; access control should be a simple generic library run as a service.
- in reply to: ASLR (security) #927
Very interesting. Calling it “cargo cult” is waving the red flag at the bull, though. Or maybe not – I doubt many of the younger generation can identify “cargo cult” without looking it up.
The Mill security model abandons (or rather permits the implementation to abandon) protection of addresses, on the grounds that protecting addresses is impossible in real code written and maintained by real people. It relies instead on the inability to program via return addresses and the impracticality of programming via function pointers. The goal is to let the attacker have a complete dump of memory with which to craft the exploit; to give him the ability to overwrite all of application dataspace as the threat entry point; and still leave a kernel-busting exploit infeasible.
Yes, that’s a challenge. When we have an OS ported I’d be willing to put some money behind it 🙂
Of course, a technical fix doesn’t stop phishing, Mata Hari, or corruption.
Minor quibble:
belt
The belt is the Mill operand holding/interchange device. It is a FIFO of fixed length 8, 16, or 32, depending on the specific core family member.
Chips can have several cores, and the cores can be different members.
I think a glossary is a great idea, and I’ve started the steps to create a Wiki on the site for the glossary and other community product.
- in reply to: ASLR (security) #906
I can address the compatibility question.
There are some things that are at fixed locations on the Mill.
The physical map is at virtual address zero, so physical and virtual addresses are the same up to the end of physical.
Within physical, the core MMIO map is at address zero, and its layout is fixed and member-dependent. This does not include device MMIO, which may be anywhere.
The top of virtual is reserved for stacklets, and immediately below that are the stacklet info blocks. The sizes of these structures are fixed but member-dependent.
The boot sequence is in the MMIO map, but it transfers to the boot ROM whose location is not fixed in physical but is by convention after the MMIO block.
The boot turf (the All) can reach all of these, but most are used only for power-up or device driver services and would not be passed on to applications.
That’s all I can think of. Of course, the OS can organize things as it sees fit.
Note that because the OS is just another app on the Mill, where it starts up is not under its own control but is set by the boot ROM. Hence the boot ROM is a reasonable place to randomize at least the OS if desired. There are any number of ways for a ROM to get a number random enough for this use.
- in reply to: ASLR (security) #899
Your example seems contrived to me. No protection system is immune to error, just as none is immune to pretty spies; you might as well propose an example in which a bug causes the secret holder to post the secret on slashdot if the password is 123456789 🙂
The question is whether ASLR adds anything useful and worth the nuisance, or whether it merely provides enough hocus-pocus that it can be successfully sold as a protection device. IMO, in well-structured protection environments it is only hocus-pocus. More, I consider it actually dangerous, because it invites ignorant use leading to a false sense of security, which does nothing but help the exploiters.
I don’t think it’s evil; if it were already present on a system of mine then I would enable it. But I wouldn’t feel any more secure, and I wouldn’t put off restructuring the code.
Clearly your mileage varies, so we’ll have to leave it at that.
- in reply to: ASLR (security) #897
By “well-structured” I mean “a structure that would be secure on a non-Mill”, typically through use of processes where a Mill would use services.
In this context your login example is poorly structured. On a conventional machine, the exploit can be eliminated by putting the password checker/secret keeper in a separate process that records and returns (via IPC) a pass/fail boolean. The (buggy) UI can suffer buffer overwrite and be arbitrarily perverted by the attacker, but the keeper cannot be because it is getting its arguments via IPC rather than via read().
This is well-structured and secure, but expensive and inconvenient on a conventional. The same basic strategy can be applied to any app on any machine: put threat-facing code in one protection domain and secret-keeping code in another, and use a trusted interface (IPC on a conventional, portal calls on a Mill) to communicate between them.
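For concreteness, a minimal POSIX sketch of that structure – the secret, names, and protocol are made up, and error handling is omitted:

```c
/* The threat-facing UI and the secret-keeping checker live in
 * separate processes. The keeper only ever returns a pass/fail
 * byte over the pipe, so smashing the UI gains nothing. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int check_password(const char *guess) {   /* keeper side */
    return strcmp(guess, "correct horse") == 0;  /* secret never
                                                    leaves here  */
}

int main(void) {
    int to_keeper[2], from_keeper[2];
    pipe(to_keeper);
    pipe(from_keeper);

    if (fork() == 0) {                 /* keeper process          */
        char guess[128] = {0};
        read(to_keeper[0], guess, sizeof guess - 1);
        char ok = (char)check_password(guess);
        write(from_keeper[1], &ok, 1); /* pass/fail only          */
        _exit(0);
    }

    /* UI process: buggy, overwritable, but holds no secrets.     */
    char input[128] = {0};
    if (!fgets(input, sizeof input, stdin)) return 1;
    input[strcspn(input, "\n")] = '\0';
    write(to_keeper[1], input, strlen(input) + 1);

    char ok = 0;
    read(from_keeper[0], &ok, 1);
    puts(ok ? "pass" : "fail");
    return 0;
}
```

On a Mill, the keeper would be a service in its own turf, and the write/read pair collapses into a portal call at a fraction of the cost.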
I’m well aware of the exploits you describe, but don’t feel they are an argument for ASLR. ASLR does not prevent such exploits, merely makes them more difficult; good structure prevents them and makes ASLR pointless. The reason why major software (like browsers) is not well structured is cost, in performance and convenience. That’s the same reason that people write passwords on Post-its. The Mill lowers the cost of good structure; we’re still working on the Post-its.
There are issues with AES, any other crypto, and any block functional unit of any purpose. Recall that the Mill is a statically scheduled, fully pipelined machine with exposed timing. Long-latency operations don’t play well, and AES is hundreds to thousands of cycles depending on implementation.
Moreover, on a fully-pipelined machine like the Mill you must be able to issue a new AES every cycle, which means that you need hundreds to thousands of AES engines because the iterative nature of the algorithm doesn’t pipeline.
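The reason the algorithm doesn’t pipeline is visible in its shape. A simplified C sketch (the round body is stubbed, and real AES-128 is an initial key-add plus 10 rounds with the last slightly different):

```c
/* Each round consumes the previous round's output – a strict
 * loop-carried dependency, so the rounds cannot overlap. */
typedef struct { unsigned char b[16]; } block128;

/* Stub: the real SubBytes/ShiftRows/MixColumns/AddRoundKey network
 * is what costs many gate delays per round. */
static block128 aes_round(block128 s, block128 rk) {
    for (int i = 0; i < 16; i++)
        s.b[i] ^= rk.b[i];               /* placeholder only */
    return s;
}

block128 aes_encrypt(block128 state, const block128 rk[11]) {
    for (int r = 0; r < 11; r++)
        state = aes_round(state, rk[r]); /* each iteration must wait
                                            for the one before it  */
    return state;
}
```

With the rounds chained like this, the only way to accept a new AES every cycle is to replicate the whole engine per in-flight operation.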
Next there are issues with data width. AES supports different data widths, 128-bit being typical. How would we feed it on Mills that do not support quad width?
There are similar issues with long-latency scheduling. The compiler will find at most a handful of other operations that it can schedule in parallel with an AES operation, so the rest of the machine will be stalled for the great majority of the AES’s execution time. The stall would likely block interrupts as well.
I sympathize with your desire that AES should be supported (and there are quite a few other plausible block functions that you don’t mention). However, I think you are confusing a desire for a primitive, which is a semantic notion, with an operation, which is an implementation notion. AES may make a very good primitive that the market will demand and we should support; it makes a very bad operation. Instead, it should be implemented as an out-of-band block function akin to an I/O device in its interface. That way it doesn’t have to fit into the decode/pipeline/belt that the Mill uses, and you only need one of them rather than hundreds.
It’s easy to think that what appears primitive in software can be primitive in hardware. I wish it were that easy 🙂