Forum Replies Created

Viewing 13 posts - 1 through 13 (of 13 total)
  • joseph.h.garvin
    Participant
    Post count: 21

    WebAssembly sandboxing between modules on x86 still relies on each module’s memory occupying disjoint pages. The only reason the sandboxing is “cheap” is that 32-bit values can’t express numbers greater than or equal to 2**32, and implementations reserve that much address space for each module. But forcing modules onto disjoint pages limits both how many sandboxes you can have and how fast communication between them can be: more pages means more TLB pressure.
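
    To make that concrete, here is a rough sketch of the trick in C++ (my own illustration, not any particular engine’s code; the names are made up):

        // Each sandbox reserves a full 4 GiB of address space up front, so a
        // 32-bit offset mathematically cannot escape the reservation; no
        // per-access bounds check is needed, and out-of-bounds offsets land
        // on unmapped guard pages and fault.
        #include <cstddef>
        #include <cstdint>
        #include <sys/mman.h>

        constexpr size_t kReservation = 1ULL << 32; // 4 GiB per module instance

        uint8_t* reserve_linear_memory() {
            // Reserve but don't commit; pages are mapped in as the module grows.
            void* base = mmap(nullptr, kReservation, PROT_NONE,
                              MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
            return base == MAP_FAILED ? nullptr : static_cast<uint8_t*>(base);
        }

        // A wasm32 load becomes a single add with no branch.
        inline uint8_t load_u8(uint8_t* base, uint32_t offset) {
            return base[static_cast<uint64_t>(offset)];
        }

    Every instance needs its own disjoint multi-gigabyte reservation, which is exactly where the page and TLB scalability cost comes from.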

    I have not worked through the mental exercise, but my guess is that portal calls and the like on the Mill would let a WebAssembly JIT make sandboxing more performant than on x86 and generalize better to 64-bit. I’d need to refresh myself on how Mill portals interact with paging and permissions to say so with confidence, though, and it would still be speculative until we have hardware.

    I still find it a little odd to be thinking along the lines of “well, if you ignore the Mill’s core economic advantage, will it still be better?” though.

  • joseph.h.garvin
    Participant
    Post count: 21

    WebAssembly is a fantastic development. It’s gaining steam because, compared to JavaScript, it’s much faster; its performance is much more reliable (it’s easy to accidentally break the JavaScript JIT’s ability to optimize a particular chunk of code); there’s no GC, so latency is better; and you can target it with statically typed languages like Rust, so correctness is often better as well. So compared to using JavaScript it’s amazing, and it will continue to get better with WASI (WebAssembly on its own has no syscalls; WASI adds a POSIX-like layer).

    But it doesn’t negate any of the performance/power advantages of the Mill. A WebAssembly compiler targeting the Mill should still result in better performance per watt than a WebAssembly compiler targeting x86.

    This thread is a bit like asking, “Is the Mill still relevant when the JVM lets you write code once and run it everywhere, while preventing common security issues like buffer overflows and out-of-bounds accesses?” Look at how that worked out.

  • joseph.h.garvin
    Participant
    Post count: 21

    Not a Mill spokesperson but figured I’d chime in.

    Last I looked, the sandboxing for WASM relied on you being restricted to a 4GB address space, which means most heavy-duty server applications are out. When 64-bit WASM arrives, it will likely involve slower code that has to perform runtime checks to enforce the sandboxing.
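
    To illustrate (my own sketch, not real engine code), this is roughly the extra work a 64-bit load implies, since a 64-bit offset can reach anywhere and the 4GB reservation trick no longer applies:

        #include <cstdint>
        #include <cstdlib>

        // Every access now pays a compare-and-branch on the hot path.
        inline uint8_t load_u8_checked(const uint8_t* base, uint64_t mem_size,
                                       uint64_t offset) {
            if (offset >= mem_size)
                abort(); // stand-in for a wasm trap
            return base[offset];
        }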

    Also, the power efficiency gains are the whole point of the Mill! This is the arbitrage Ivan mentions in his talks: figuring out how to get the generality of an x86 CPU with DSP-like performance.

    Cerebras is not really comparable, because they throw out the window one of the main advantages of the Mill. The Mill only requires you to recompile your existing C/C++ code. Cerebras and GPUs require rewriting it, and they require that the particular computation your software is doing be the kind that is massively parallel. Most of the day-to-day applications people use for their work, and most of the software running the world’s servers, don’t resemble the kind of workloads that GPUs and Cerebras are good at.

  • joseph.h.garvin
    Participant
    Post count: 21
    in reply to: Benchmarks #3496

    Ivan, could you expand a bit on what is being compared?

    You’re saying Josh compiled the xv68k emulator targeting Mill Silver, compiled it again targeting the 68k, ran the 68k build of the emulator inside the Mill build, and then inside that inner emulator ran hello world?

    And then we are comparing the number of instructions the Mill executed during the whole run against the number, reported by the outer emulator, that a real 68k would have executed emulating itself running hello world?

    Also, is this using the C++-as-Mill-assembly approach detailed in the talks, or is this on FPGA?

    Also, the lack of inlining, pipelining, etc. refers to the Mill, not the emulated 68k, right?

  • joseph.h.garvin
    Participant
    Post count: 21
    in reply to: Glossary #952

    You could add “implicit zero”. I’d write the entry myself, but I can’t remember all of the details ATM.

  • joseph.h.garvin
    Participant
    Post count: 21
    in reply to: ASLR (security) #896

    The advantage of the Mill with regard to return-oriented programming is that it increases the security of existing code. I’m not sure if by “well structured” you mean changed to take advantage of the Mill security model. Assuming you are interested in protecting users against poorly written code (after all, if you’re not, just tell the user to port to a language with bounds checking), here is a counter-example of sorts:

    In a login program, I have a heap-allocated struct containing a buffer for the user password, followed by a function pointer to the code that should be executed when a login attempt fails. The program fails to check for the user entering extremely long passwords, and the user knows where the login success function is usually located. They enter an extremely long password, overflowing the buffer and writing the address of the login success function into the function pointer that is supposed to point to the failure handler. The program decides the password is invalid, follows the function pointer, and grants the user access. If the location of the login success function is randomized, the attacker is thwarted.
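
    In code, the bug looks something like this deliberately broken C++ toy (all names made up):

        #include <cstring>

        void on_login_failure() { /* deny access */ }
        void on_login_success() { /* grant access */ }

        bool check_password(const char*) { return false; } // stub: attempt always fails

        struct LoginState {
            char password[64];     // fixed-size buffer...
            void (*on_failure)();  // ...immediately followed by a code pointer
        };

        void handle_attempt(LoginState* s, const char* user_input) {
            // No length check: a long "password" overflows into on_failure. An
            // attacker writes the address of on_login_success there, so the
            // failure path below grants access. ASLR randomizes that address.
            strcpy(s->password, user_input);
            if (!check_password(s->password))
                s->on_failure();
        }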

    Note that no return-oriented programming or digging through stack rubble was needed. Returns are convenient on conventional architectures because they are a frequently occurring jump that can be influenced by corrupt data, but every C function pointer and every C++ vtable pointer can potentially be used as well. These attacks are harder to pull off because you don’t always have conveniently located buffers next to function pointers, and vtable pointers are usually at the front of objects, but there have been real vulnerabilities where the attacker relies on knowing two objects will be next to each other on the heap, overflowing a buffer in the first to overwrite the vtable pointer in the second. A lot of the browser hacking competitions are won by people stringing together long sequences of this sort of gymnastics.
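
    A toy version of that heap-adjacency variant (made-up names; real exploits involve grooming the allocator so the objects really are adjacent):

        #include <cstring>

        struct Parser {
            char buffer[32]; // attacker-controlled input is copied here
        };

        struct Handler {
            virtual void run() { /* benign behavior */ }
            virtual ~Handler() = default;
        };

        void process(Parser* p, Handler* h, const char* input) {
            // If the allocator placed h directly after p, this unchecked copy
            // overflows p->buffer and overwrites h's vtable pointer, so the
            // virtual call below dispatches through attacker-chosen addresses.
            strcpy(p->buffer, input); // missing bounds check
            h->run();                 // now a hijacked indirect call
        }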

  • joseph.h.garvin
    Participant
    Post count: 21
    in reply to: Security #874

    I really like the Mill’s approach to stack safety and that in particular it prevents Return Oriented Programming.

    Has building in support for random number generation been considered? Besides the stack, the other major bane of embedded security is random number generation. Bad random numbers weaken crypto, and there have been a ton of vulnerabilities from routers picking their random seed at boot, before sufficient entropy has been collected by the OS, leading to predictable random numbers and allowing entry to attackers. Intel has RDRAND, but that doesn’t help most of the embedded world. Obviously there’s no reason in principle why a hardware entropy source couldn’t be integrated into the Mill, but it would be great for it to be part of the minimum configuration (Tin), so that it’s hard to screw up and to help cement a reputation for the Mill as being more secure than other CPUs. This could also be a great opportunity to take advantage of multiple outputs on the belt and/or the Mill vector types; lots of simulations use random matrices/vectors. If you wanted to be really fancy, you could support different distributions (e.g. Zipfian vs. Gaussian).
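
    For contrast, this is roughly what using Intel’s RDRAND looks like today (x86-only; compile with -mrdrnd). Note the retry loop the instruction forces on you:

        #include <immintrin.h>
        #include <cstdint>

        // RDRAND can fail transiently, so callers must loop.
        bool hw_random_u64(uint64_t* out) {
            unsigned long long value;
            for (int tries = 0; tries < 10; ++tries) {
                if (_rdrand64_step(&value)) { // returns 1 on success
                    *out = value;
                    return true;
                }
            }
            return false; // entropy temporarily unavailable
        }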

  • joseph.h.garvin
    Participant
    Post count: 21
    in reply to: ASLR (security) #949

    Thanks for that in-depth analysis, Ivan; I hadn’t even thought about the dual instruction stream issue. I agree with everything you said. I would only add that even though the vtables are read-only, the vtable pointers within objects are not, so you can still use a buffer overflow to cause an object to switch types and thus get virtual methods running on unintended bytes. I don’t think that increases the risk greatly, given your other points, but it’s one more attack vector to keep in mind.

    Still, I am tempted to hunt for such an exploit when the Mill is made available, out of pure stubbornness 😉 There was a real Flash exploit a couple of years ago (I can’t find the link now) that relied on a byte stream being simultaneously valid x86 opcodes, ActionScript, and shell.

  • joseph.h.garvin
    Participant
    Post count: 21
    in reply to: ASLR (security) #935

    The article is a bit bumbling; it basically says ASLR doesn’t help except for all the cases where it does. There are specific challenges with kernel randomization as opposed to userspace, and he basically ignores its usefulness against userspace attacks entirely. Also, as the first commenter points out, it is effective against remote kernel exploits, where all of the possibilities for infoleaks that he lists don’t apply. My favorite bit is when he makes the ridiculous claim that on Linux ASLR won’t matter because people compile custom kernels: as if that provided enough randomization, as if every organization ran a custom kernel rather than what Red Hat ships, and as if that changed the fact that the addresses would still be the same across potentially thousands of a company’s machines.

    I do agree that ASLR is a half-measure, but I don’t know of a better way to fill in the gaps in security caused by C. Ivan is correct to point out that proper design can ameliorate the problem, but I still think there’s always the possibility of exploits akin to my earlier function pointer examples. If we’re allowed to fantasize, the Mill should only run proof-carrying code that demonstrates it will never violate its permissions; then the PLB could be removed entirely and the Mill could be even more amazingly power efficient. But that’s not happening as long as you want (need) to run existing C 😉

  • joseph.h.garvin
    Participant
    Post count: 21
    in reply to: ASLR (security) #904

    Joe, I must be missing something: how are the Mill and ASLR at odds? Making sure that they aren’t is the thrust of my question; I was curious whether anything in the Mill prohibits it. My discussion with Ivan on its merits aside, I was still left with the impression that it could be done if it were OS policy. I don’t see how placing each process/service/library/stack (different implementations take the randomization to different lengths) at a random offset means throwing away the PLB/TLB separation. I figured ASLR would be complementary, with the Mill’s resistance to stack exploits preventing most buffer overflow vulnerabilities and ASLR mopping up what was left.

    Further, characterizing ASLR as security by obscurity is like saying that encryption is security by obscurity because you have to keep your private key secret. ASLR is per-machine, per-boot randomization of critical locations, not a secret backdoor put in by the Mill’s designers. We can measure precisely the number of bits of security against brute force that ASLR provides, and in fact you can see an example of this analysis on its Wikipedia page.
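
    As a back-of-the-envelope version of that analysis: with n bits of randomization, a brute-force attacker needs about 2**(n-1) guesses on average, and each wrong guess typically crashes the target process. The entropy figures below are illustrative only; the real numbers vary by OS and architecture:

        #include <cstdio>
        #include <initializer_list>

        int main() {
            for (int bits : {8, 16, 28}) { // illustrative ASLR entropy values
                unsigned long long expected = 1ULL << (bits - 1);
                printf("%2d bits -> ~%llu guesses on average\n", bits, expected);
            }
            return 0;
        }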

  • joseph.h.garvin
    Participant
    Post count: 21
    in reply to: ASLR (security) #898

    Say the program works as you describe, but the service is poorly written instead. It receives variable-length data over IPC and copies it into a statically sized buffer. The buffer still overflows and potentially overwrites function pointers in that service, causing it to return true instead of false. The exploit is essentially the same, is still possible even for two separate machines communicating over a network, and is still thwarted by ASLR.

    I completely agree that ASLR is only a mitigator for underlying problems, but that’s an argument for it having less value rather than none at all. In practice many exploits are thwarted by it.

  • joseph.h.garvin
    Participant
    Post count: 21
    in reply to: Security #883

    If the Mill doesn’t have some other solution, I think you could roll your own protocol using regions to allow the service to be sure who is calling the portal. Have the portal put a random value, readable only by the desired calling thread, into memory. When the calling thread wants to make the portal call, it passes the number provided by the service that only it knows; the service verifies that the caller’s number is equal to its own, and then sets its number to a new random value in anticipation of the next call. Critically, when the check fails the service needs to sleep for a period, fault, or cause the caller to fault somehow, and pick a new random value for the next challenge; otherwise the caller can brute-force retry. Random numbers need to be used rather than just an incrementing counter, because otherwise a malicious thread could guess the current value from information like how long the system has been running or how many times the service has likely been invoked.
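
    A sketch of the scheme in C++, with stand-ins for the Mill pieces (secure_random_u64 and penalize_or_fault are hypothetical; the real versions would be a hardware entropy source and whatever sleep/fault mechanism the OS provides):

        #include <chrono>
        #include <cstdint>
        #include <random>
        #include <thread>

        // Hypothetical stand-ins for what the Mill/OS would provide:
        uint64_t secure_random_u64() {      // real version: hardware entropy source
            static std::random_device rd;
            return (uint64_t{rd()} << 32) | rd();
        }
        void penalize_or_fault() {          // real version: fault or stall the caller
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }

        struct ServiceState {
            // Lives in a region readable only by the authorized calling thread.
            uint64_t expected_token;
        };

        // Service side of the portal call; the caller passes the token it read.
        bool portal_entry(ServiceState* s, uint64_t caller_token) {
            if (caller_token != s->expected_token) {
                s->expected_token = secure_random_u64(); // rotate before penalizing
                penalize_or_fault();                     // defeat brute-force retries
                return false;
            }
            s->expected_token = secure_random_u64();     // fresh challenge for next call
            return true; // proceed with the real service work
        }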

    Edit: this scheme assumes the calling thread is not ‘in cahoots’ with a malicious thread and thus won’t deliberately share the random value with it. But since the Mill protection model is that threads with a given permission can always grant other threads a subset of their permissions, I think this is OK.

  • joseph.h.garvin
    Participant
    Post count: 21
    in reply to: Security #880

    PeterH, I am not super familiar with the Atari hardware, so I may be looking at the wrong document, but what I found here suggests the Atari used an LFSR, which as I understand it is not ‘first class’ in the context of cryptography, where it’s important that a determined adversary doing statistical analysis not be able to predict the stream. You need a real entropy source for the seed and a cryptographically secure RNG algorithm.
