Forum Replies Created
- Findecanor, December 5, 2020 at 8:03 am
I believe that the most important property a virtual ISA such as WebAssembly (or the JVM, CLR, …) offers cloud application developers is not removing the need to recompile, but removing the need to test the code on each type of hardware it is to be deployed on. This determinism is something that has hitherto not been available to C/C++ developers.
If WebAssembly/WASI develops in the right direction, I think it would not pose a threat but rather help reduce the cost of adopting new types of hardware, which would let new designs compete better on exactly those aspects, such as performance per watt, where the Mill would excel.
Even though WebAssembly has limited performance (it has been designed for fast code generation rather than fast code execution), I think that is a lesser concern, as much of an application's run time is often spent in imported modules anyway, which could still be native code. And WebAssembly is also evolving.
I don't see any advantage in using WASM on the server/desktop for anything but C/C++ code, though. The Rust community seems to have adopted it as well for some reason… Maybe just hype?
- Findecanor, October 10, 2020 at 1:23 pm
Earlier posts about some of the things you mention:
“We don’t do SMT.”.
I think it has also been said, in answer to a question at a Q&A session after a talk (?), that all multi-core Mills will be single-chip designs sharing the same cache.
Synchronisation primitives (such as those used in buffers shared between producer and consumer threads) are notoriously difficult to get right. As on any platform, they are best left to the people writing the standard libraries, I think.
A little has been said about synchronisation on The Mill though:
multi-cpu memory model
How to schedule communicating threads and when to move threads between cores is a question for operating system designers, I think. I would be surprised if that is significantly different on the Mill than on any other architecture.
- Findecanor, December 17, 2018 at 5:47 pm
I disagree. While measuring performance is important, I think that overly fine-grained performance counters should be unavailable to unprivileged user programs, for security reasons.
There has been a lot of talk about the CPU-level vulnerabilities Spectre and Meltdown this past year. Those attacks consist of two components: first, the use of speculation to access secrets; second, the use of side-channels to exfiltrate the secrets to a receiver. The side-channels in question use the timing of memory accesses to distinguish cache hits from misses. Now, we all know that the Mill is impervious to Spectre and Meltdown because it does not have speculative execution (except as explicit instructions put there by the compiler), but there are many other types of CPU-level attacks out there that rely on variations of the second component: side-channels that depend on precise timing.
Among these are various attacks that monitor the CPU time and memory use of other processes to determine what they do: for instance for sniffing password prompts and monitoring encryption algorithms to reduce the search space for encryption keys.
Making fine-grained timing privileged would not make every side-channel attack impossible, but it could make some attacks significantly harder to pull off.
- Findecanor, January 7, 2018 at 5:32 am
Memory timing is used by the Meltdown and Spectre attacks only as a side-channel to leak the loaded values, but there are other types of attacks that use cache timing (or even DRAM timing!) as side-channels or for snooping on other processes.
The x86 is especially susceptible to cache-timing attacks because instructions for flushing the cache and for fine-grained timing (enabling Flush+Reload) are available to user-mode programs. On ARM, for instance, the equivalent instructions are privileged, making it not impossible but at least harder for an attacker to perform timing attacks with high accuracy.
Therefore, the fact that the Mill architecture does not have any privileged mode does worry me a bit.
One way to make instructions privileged in an operating system (on both the Mill and x86) could be to enclose a manifest in the executable, in which the compiler describes which (groups of) instructions the executable uses. The OS would then permit a program to run only those instructions that its parent process/runtime has explicitly granted it.
That would require protection of the binary, however: either through strong guarantees in the OS or through a cryptographic signature.
I have been trying to find similar prior work but haven't found anything where binaries are signed by a toolchain instead of just by the person/organisation that published them. If that reminds you of something, please let me know, because I would like to code something up (and not just for Mill binaries, but as a generic framework).
- Findecanor, December 8, 2017 at 6:08 am
Thanks. I think that behaviour is what most C programmers subconsciously expect from the “<<” operator.
You mentioned the cost of “or-trees” in another thread. I suppose that the same or-tree used for saturating shifts is also used for the other shifts.
Putting an “and” before a “shiftlu” would be fewer instructions than the other way around, for sure.
BTW, don’t widening instructions always widen? What about vector ops that would drop two elements on the belt?
- Findecanor, December 6, 2017 at 9:32 am
I hope you don’t mind if I continue on this thread despite it being a bit old. I thought it was fitting here since it is about C semantics.
I was looking recently at what C leaves “undefined”. One such case is when the bit count for a shift is larger than the width of the value being shifted.
If I interpret the Wiki correctly, a shift instruction's result would be marked “NaR” (Not a Result) if the bit count (from a belt operand) is too large, even if the instruction is not one of the excepting instructions.
Did I get this right?
Could the NaR flag be discarded (so no CPU exception is thrown) and the code continue using an erroneous result, and if so, what would that result be?
I agree that C functions returning structs just to return multiple results is ugly. Syntactic sugar would help everyone I think. But that’s really a compiler thing and not a chip thing.
In Apple’s Swift programming language, multiple return values from functions are structs — from the viewpoint of the LLVM-based parts of the compiler.
However, the calling conventions used by Swift on ARM and x86-64 do specify that small structs be passed as multiple values in registers.
The Mill’s compiler’s calling convention for C could be similar but for belt positions instead of registers.
- Findecanor, June 11, 2015 at 3:21 pm
We have been waiting quite a while for a video of this talk by now…
- Findecanor, April 5, 2015 at 5:22 am
I re-watched this talk, and there was one thing that was not answered:
1. What would a “break” (a “leave” instruction?) out of a nested loop look like in Mill code? And:
2. How would a Python-style for-break-else be done in Mill code?
In C/C++, either would be written using a “goto”. I think these two are the most common uses of “goto”, and therefore more often permitted by coding standards than any other use.
Java has labelled loops, where you “break <label>” to exit a nested loop. You could also do (2) in Java with nested labelled blocks, where the outer block has only one iteration.
- Findecanor, December 6, 2020 at 12:19 pm
There is nothing necessarily slow about capability-based access control. I think you might be confusing it with the slow message-passing of classic microkernels, where capabilities are sometimes used to refer to the message ports. But that is just one way in which capabilities can be used, and it is not the capabilities that make those systems slow.
Capabilities are first and foremost a way to model access control as primitives.
WASI got its model from CloudABI's libc, which is a libc restricted to using only Capsicum capabilities. Capsicum is part of FreeBSD (implemented in its monolithic kernel) and has been used by several of its standard utilities for years, with negligible performance impact compared to before.
- Findecanor, September 15, 2020 at 12:34 pm
I don't see that Fuchsia/Zircon's way of doing IPC (asynchronous message passing) would map that easily onto the Mill's Portal calls, though. So the Mill wouldn't have an advantage over any other architecture on that point… or, put another way: Zircon would have slow IPC anywhere.
(But if I’m wrong, please tell. 😉 )
I can imagine that synchronous IPC, where the caller always blocks (as in L4 or QNX), would be easier to map, even though it is not a perfect model.
BTW, another thing about Zircon that irks me is that capabilities are not revocable (except by killing the process/group). The Mill's temporary grants are precisely what I've always wanted.
- Findecanor, September 10, 2020 at 6:25 am
- Findecanor, September 9, 2020 at 12:02 pm
There isn’t any conform op any more; the functionality was moved to branch ops so a taken branch can rearrange the belt to match the destination.
Recently, I have been comparing the calling conventions of various other ISAs and how they assign function arguments to registers. Arguments are not necessarily specified in order (or reverse order) of urgency, so on register-based ISAs it will not necessarily be the arguments that most need to be in registers that get assigned to them.
And in the case of the Mill's belt, it will also not necessarily be the arguments with longer lifetimes that get assigned to the first belt positions, furthest away from the precipice.
That got me thinking… Before removing the conform op, had you considered putting the conform op at the top of a function body to reduce the probability of having to spill/fill arguments further down?
I'm thinking that something equivalent could also be achieved by reordering arguments during the link-time optimisation pass right before specialisation, though of course only for module-private/non-referenced functions.
- Findecanor, December 24, 2018 at 11:02 am
I think the forum is broken. I can see that user Findecanor replied in this topic, but there is no reply when you actually open the topic.
Before there were more posts, my reply used to be visible when logged out but hidden when logged in. Now it is not visible at all. Weird…
BTW, the bug got triggered when I tried to edit my post. The edit didn't take.
What my post was about:
I would like the Mill to make it possible for an operating system to make access to performance counters privileged to the operating system, and/or for care to be taken about what exactly user-mode performance counters measure.
The concern is about security. CPU cycle counters are often used in side-channel attacks to find out what another process is doing: measuring the attacker's own share of total CPU usage to infer the target process's CPU usage (a timing attack), or measuring the time of memory accesses to find out which addresses the other process has loaded into the cache (a cache attack). Cache side-channels are perhaps best known as a major part of Spectre and Meltdown: the second phase, the exfiltration part. While the first phase of Spectre and Meltdown is not possible on the Mill, because the CPU does not execute instructions speculatively, there are many other types of side-channel attacks that don't rely on speculative execution. Some attacks target password prompts; others target encryption algorithms, reducing the search space for cracking encryption keys.
Access to CPU cycle counters is privileged on ARM but available in user mode on x86. Therefore many attacks are easier to conduct on x86 and harder, or even impossible, on ARM.
I do realise that the issue is not easy.
I know first-hand from working with video compression that there are many cases where the performance of your code depends on the data cache, and where you therefore really want to be able to measure how changes to the memory layout affect caching performance.
- Findecanor, January 13, 2018 at 6:23 am
Please correct me if I'm wrong, but I was under the impression that Mill software is supposed to be distributed as GenAsm and compiled to machine code at install time, by a local compiler which would have to be trusted implicitly anyway. That would be an opportunity for code-checking to create a list of which groups of instructions are used in an object file.
There are centralised compiler systems out there already. Apple encourages iOS developers to submit apps as LLVM bitcode for ARM, which Apple optimises differently for each of its CPUs, and the resulting Mach-O binaries are then signed. A Linux distribution could accept packages in source form and build and sign them on trusted build servers; it would not change much, because all binary packages come from that source anyway.
Anyway, MMIO would obviously be a better design choice for protecting CPU features than some weird software scheme. I think, however, that some kind of enclosed compiler log could become useful as a complement if some bug or vulnerability needed to be mitigated (knock on wood…) or if there were a change in the ABI. For one thing, it could make it easier to find what would need to be recompiled.
I also had a wider scope in mind (I started before Meltdown was public): not just low-level issues, but also safe linking of object files compiled from “type-safe” languages, where the rules of the language are relied upon to enforce security policy.
- Findecanor, January 12, 2018 at 1:36 pm