Forum Replies Created

Viewing 15 posts - 16 through 30 (of 32 total)
  • Author
    Posts
  • Findecanor
    Participant
    Post count: 36

    I’ve been following CHERI too for a while… and I would say: not similar at all.

    I think the article you linked to is all over the place. Let me summarise:
    CHERI has tagged memory and “capabilities” on top of ARM/MIPS/RISC-V’s traditional paging. Each memory location is tagged to tell whether it contains a “capability” (address, upper bound, protection bits) or regular data. Capabilities are used as pointers, but each memory access through one is bounds-checked and checked for privilege (read/write/execute). The tags are stored in separate memory areas rather than as an extra bit in 9-bit bytes (which is what historic capability hardware did). This makes it possible to use off-the-shelf DRAM, and traditional OSes with only small modifications, with one address space per process as usual.

    What this does in practice is add bounds-checking to C/C++ programs, requiring only a recompile instead of a rewrite in a properly type-safe, memory-safe language.
    Temporal safety (protection against dangling pointers) needs a system or kernel service similar to a garbage collector though, and the overhead is not negligible.

    In particular, they mentioned that their capability system makes it safe to run everything in one address space, which makes it possible to get speed-ups by avoiding context switches.

    I think they mean that CHERI reduces the need to break up a program into multiple isolated processes (a type of “compartmentalisation”) to achieve better security, which is how e.g. Chrome (and web browsers based on it) are designed.
    So far, I’ve not seen any paper about using CHERI capabilities instead of the protection offered by traditional paging, but I’m sure that would be possible; it is something that historic systems did.

    What (I got the impression that) The Mill does is put all programs in the same address range, but not in the same protection domain: protection is decoupled from paging. I’d think that a Unix-like system on The Mill would work largely the same as on other hardware, except that the memory layout inside each process would not overlap that of any other process (apart from shared objects).
    The Mill does have fine-grained address ranges and supports revocation, so it would be feasible to temporarily pass access to individual objects in a Portal call instead of sharing full pages as on other hardware. Revocation in the capability model can get complex and expensive, and I think that in CHERI it would also require a service scheme like the one needed for dangling pointers.

    • This reply was modified 3 years, 3 months ago by  Findecanor.
  • Findecanor
    Participant
    Post count: 36

    I believe that the most important property a virtual ISA such as WebAssembly (or the JVM, CLR, …) would have for cloud application developers is not removing the need to recompile, but removing the need to test the code on each type of hardware it is to be deployed on. That determinism is something that has hitherto not been available to C/C++ developers.
    If WebAssembly/WASI develops in the right direction, I think it would not pose a threat but rather help reduce the cost of adopting new types of hardware, which would let new hardware compete on exactly the aspects where The Mill would excel, such as performance per watt.

    Even though WebAssembly has limited performance (it has been designed for fast code generation rather than fast code execution), I think that could be a lesser concern, as a lot of an application’s run time is often spent in imported modules anyway, which could still be native code. But WebAssembly is also evolving.

    That said, I don’t see advantages to using WASM on the server/desktop for anything but C/C++ code, yet the Rust community seems to have adopted it too for some reason… Maybe just hype?

    • This reply was modified 4 years, 5 months ago by  Findecanor.
  • Findecanor
    Participant
    Post count: 36
    in reply to: Multi core machine? #3620

    Earlier posts about some of the things you mention:
    “We don’t do SMT.”
    I think it has also been said, in answer to a question at a Q&A session after a talk (?), that all multi-core Mills will be single-chip designs sharing the same cache.

    Synchronisation primitives (such as those used for producer/consumer buffers) are notoriously difficult to get right. As on any platform, they would be best left to the people writing the standard libraries, I think.
    A little has been said about synchronisation on The Mill though:
    multi-cpu memory model
    Volatile

    How to schedule communicating threads and when to move threads between cores is a question for operating system designers, I think. I would be surprised if that is significantly different on the Mill than on any other architecture.

    • This reply was modified 4 years, 7 months ago by  Findecanor.
  • Findecanor
    Participant
    Post count: 36

    I disagree. While measuring performance is important, I think that too fine-grained performance counters should be unavailable to unprivileged user programs for security reasons.

    There has been a lot of talk about the CPU-level vulnerabilities Spectre and Meltdown this past year. They consist of two components: first, the use of speculation to access secrets; second, the use of a side channel to exfiltrate the secrets to a receiver. The side channels in question use the timing of memory accesses to distinguish cache hits from misses. Now, we all know that The Mill is impervious to Spectre and Meltdown because it does not have speculative execution (except as explicit instructions put there by the compiler), but there are many other types of CPU-level attacks out there that share variations of the second component: side channels that depend on precise timing.
    Among these are various attacks that monitor the CPU time and memory use of other processes to determine what they do: for instance, sniffing password prompts, or monitoring encryption algorithms to reduce the search space for encryption keys.
    Making fine-grained timing privileged does not make all types of side-channel attacks impossible, but it could make some attacks significantly harder to pull off.

  • Findecanor
    Participant
    Post count: 36

    Memory timing is used by the Meltdown and Spectre attacks only as a side-channel to leak the loaded values but there are other types of attacks that use cache timing (or even DRAM timing!) for side-channels or for snooping on other processes.

    x86 is especially susceptible to cache-timing attacks because the instructions for flushing the cache and for fine-grained timing (Flush+Reload) are available to user-mode programs. On ARM, for instance, the corresponding instructions are privileged, making it not impossible but at least harder for an attacker to perform timing-based attacks with high accuracy.
    Therefore, the fact that the Mill architecture does not have any privileged mode worries me a bit.

    One way to make instructions privileged in an operating system (on both the Mill and x86) could be to enclose a manifest in the executable, in which the compiler describes which (groups of) instructions the executable uses. Then permit a program to run only those instructions that the parent process/runtime has explicitly granted it.
    That would require protecting the binary, however: either through strong guarantees in the OS or through a cryptographic signature.
    I have been trying to find similar prior work, where binaries are signed by a toolchain instead of just the person/organisation that published them, but have not found anything. If that reminds you of something, please let me know, because I would like to code something up (and not just for Mill binaries, but as a generic framework).

  • Findecanor
    Participant
    Post count: 36

    Thanks. I think that that behaviour is what most C programmers subconsciously expect out of the “<<” operator.
    You mentioned the cost of “or-trees” in another thread. I suppose that the same or-tree used for saturating shifts is also used for the other shifts.
    Putting an “and” before a “shiftlu” would be fewer instructions than the other way around, for sure.

    BTW, don’t widening instructions always widen? What about vector ops that would drop two elements on the belt?

  • Findecanor
    Participant
    Post count: 36

    I hope you don’t mind if I continue on this thread despite it being a bit old. I thought it was fitting here since it is about C semantics.

    I was looking recently at what C leaves “undefined”. One of these cases is a shift whose bit count is larger than the width of the value being shifted.
    If I interpret the Wiki correctly, a shift instruction’s result would be marked “NaR” (Not a Result) if the bitcount (from a belt operand) was too large — even if the instruction is not one of the excepting instructions.
    Did I get this right?

    Could the NaR flag be discarded (not throw a CPU exception) and code continue using an erroneous result — and what would that be?

    I agree that C functions returning structs just to return multiple results is ugly. Syntactic sugar would help everyone I think. But that’s really a compiler thing and not a chip thing.

    In Apple’s Swift programming language, multiple return values from functions are structs — from the viewpoint of the LLVM-based parts of the compiler.
    However, the calling conventions used by Swift on ARM and x86-64 do specify that small structs be passed as multiple values in registers.
    The Mill’s compiler’s calling convention for C could be similar but for belt positions instead of registers.

  • Findecanor
    Participant
    Post count: 36

    We have waited quite a while for a video of this talk by now…

  • Findecanor
    Participant
    Post count: 36

    There is nothing that is necessarily slow with capability-based access control. I think you might be confusing it with slow message-passing in classic microkernels, for which capabilities are sometimes used for referring to the message ports. But that is just one way in which capabilities can be used, and it is not the use of capabilities that makes them slow.
    Capabilities are first and foremost a way to model access control as primitives.

    WASI got its model from CloudABI’s libc, which is a libc restricted to use only Capsicum capabilities. Capsicum is part of FreeBSD UNIX (implemented in its monolithic kernel) and has been used by several of its standard utilities for years, with negligible performance impact compared to before.

  • Findecanor
    Participant
    Post count: 36

    I don’t see that Fuchsia/Zircon’s way of doing IPC (asynchronous message passing) would map that easily onto the Mill’s Portal calls, though. So the Mill wouldn’t have an advantage over any other architecture on that point … or, put another way: Zircon would have slow IPC anywhere.
    (But if I’m wrong, please tell. 😉 )

    I can imagine that synchronous IPC where the caller always blocks (as in L4 or QNX) would be easier to do, though, even if it is not a perfect model.

    BTW. Another thing with Zircon that irks me is that capabilities are not revocable (except by killing the process/group). The Mill’s temporary grants are precisely what I’ve always wanted.

  • Findecanor
    Participant
    Post count: 36
    in reply to: The Belt #3596

    By the way, what is the maximum number of arguments on the belt in function calls? The whole belt? And is it the same on all members?
    I could not find that info in the Wiki. The Wiki also used to have the bitwise format of instructions but that eludes me too.

  • Findecanor
    Participant
    Post count: 36
    in reply to: The Belt #3594

    There isn’t any conform op any more; the functionality was moved to branch ops so a taken branch can rearrange the belt to match the destination.

    Recently, I have been comparing different ISAs’ calling conventions and how they assign function arguments to registers. Arguments are not necessarily specified in order (or reverse order) of urgency, so on register-based ISAs it will not necessarily be the arguments that most need to be in registers that get assigned to them.
    And in the case of the Mill’s belt, it will likewise not necessarily be the arguments with longer lifetimes that get assigned to the first belt positions, furthest from the precipice.

    That got me thinking… Before removing the conform op, had you considered putting the conform op at the top of a function body to reduce the probability of having to spill/fill arguments further down?
    I’m thinking that something equivalent to that could also be achieved by reordering arguments during the link-time optimisation pass right before specialisation, but of course for module-private/non-referenced functions only.

  • Findecanor
    Participant
    Post count: 36

    I think the forum is broken. User Findecanor shows as having replied in this topic, but there is no reply when you actually open the topic.

    Before there were more posts, my reply used to be visible when logged out but hidden when logged in. Now it is not visible at all. Weird…
    BTW. The bug got triggered when I tried to edit my post. The edit didn’t take.

    What my post was about:
    I would like The Mill to make it possible for an operating system to make access to performance counters privileged, and/or for care to be taken about what exactly performance counters in user mode measure.

    The concern is about security. CPU cycle counters are often used in side-channel attacks to find out what another process does: by measuring your own portion of total CPU usage to infer the target process’s CPU usage (“timing attack”), or by measuring the time of memory accesses to find out which addresses the other process has loaded into cache (“cache attack”). Cache side channels are perhaps best known as a major part of Spectre and Meltdown (the second phase: the “exfiltration” part). While the first phase of Spectre and Meltdown is not possible on the Mill, because the CPU does not execute instructions speculatively, there are many other types of side-channel attacks that don’t rely on speculative execution. Some attacks target password prompts. Others target encryption algorithms, reducing the search space for cracking encryption keys.
    Access to CPU cycle counters is privileged on ARM but available in user mode on x86. Therefore many attacks are easier to conduct on x86, and harder or even impossible on ARM.

    I do realise that the issue is not easy.
    I know first hand from working with video compression that there are many cases where the performance of your code depends on the data cache, and where you therefore really want to be able to measure how changes to the memory layout affect caching performance.

  • Findecanor
    Participant
    Post count: 36

    Please correct me if I’m wrong, but I was under the impression that Mill software is supposed to be distributed as GenAsm and compiled to machine code at install time, by a local compiler which would have to be trusted implicitly anyway. That would be an opportunity for code checking to create a list of which (groups of) instructions are used in an object file.

    There are centralised compiler systems out there already. Apple encourages iOS developers to submit apps as LLVM bitcode for ARM, which Apple optimises differently for each of their CPUs, and the resulting Mach-O binaries are then signed. A Linux distribution could accept packages in source form and build and sign them on trusted build servers; that would not change much, because all binary packages come from that source anyway.

    Anyway, obviously MMIO would be a better design choice for protecting CPU features than some weird software scheme. I think, however, that some kind of enclosed compiler log could become useful as a complement if some bug or vulnerability needed to be mitigated (knock on wood…) or if there were a change in the ABI. For one thing, it could make it easier to find what would need to be recompiled.
    I also had a wider scope in mind (I started before Meltdown was public): not just low-level issues but also safe linking of object files compiled from “type-safe” languages, where the rules of the language are relied upon to enforce security policy.

  • Findecanor
    Participant
    Post count: 36

    Thank you. Not quite what I was looking for, but there were a couple of leads to follow, and now I have downloaded a few more megabytes of papers to read.

    • This reply was modified 7 years, 4 months ago by  Findecanor.