Mill Computing, Inc. › Forums › The Mill › Architecture › The Mill's Competition: Can it still win?
Tagged: advantages, cerebras, competition, competitors, future, WASI, WASM, WebAssembly
Physically zeroing out memory and the other security features of the Mill are great, but aside from some power efficiency gains, can’t I get most of the Mill’s advantages by simply using WebAssembly + WASI?
Secondly, it looks like someone finally put the CPU closer to RAM and also made a 1.2-trillion(!) transistor CPU: https://venturebeat.com/2020/11/17/cerebras-wafer-size-chip-is-10000-times-faster-than-a-gpu/amp/ . Is the Mill better than this new chip?
I love what you guys are doing but I’m worried you’re moving too slowly to beat out these other technologies.
Good luck!
–Steve Phillips / @elimisteve
Not a Mill spokesperson but figured I’d chime in.
Last I looked, the sandboxing for WASM relied on restricting you to a 4 GB address space, which means most heavy-duty server applications are out. When 64-bit WASM arrives, it will likely involve slower code that has to perform runtime checks to enforce the sandboxing.
Also, the power-efficiency gains are the whole point of the Mill! This is the arbitrage Ivan mentions in his talks: figuring out how to get the generality of an x86 CPU but with DSP-like performance.
Cerebras is not really comparable, because they are throwing out the window one of the main advantages of the Mill. The Mill only requires you to recompile your existing C/C++ code. Cerebras and GPUs require rewriting it, and they require that the particular computation your software is doing be the kind that is massively parallel. Most day-to-day applications that people use for their work, and that are running the world's servers, don't resemble the kind of workloads that GPUs and Cerebras are good at.
I hope that energy efficiency is a sufficiently compelling reason for the world to adopt a new CPU architecture.
Re: the Cerebras I didn’t realize it’s more like a GPU than a CPU. Also, this:
> A single Cerebras CS-1 is 26 inches tall
Not exactly mobile-optimized! …but for supercomputers and mobile devices, energy efficiency is really important, so maybe the Mill can shine there first. That said, on mobile much of the energy is spent on the screen, so making the CPU a lot more efficient may not help battery life a ton.
And I’ll look into the WASM sandboxing details, thanks.
If the fabs can build a 1.2T-transistor chip for one company then they can and will for all; the ISA question is whether you can use those transistors. In the CPU space you cannot; you are pin/pad limited on the wafer, and more transistors are useless except for heating office buildings. You can use them in some kinds of AI work, and likely also in rare embedded applications, data reduction in your friendly local SuperCollider for example. They won't run Windows, and the control processors they need (and for which you can see them as peculiar I/O devices) can be Mills.
We still don't see any prospect of a general change upcoming in the CPU ISA space. Except us. Thanks for your support; we will need it especially as we scale up, which was supposed to have happened this year, but 2020 happened instead.
According to Wikipedia WebAssembly may get about 65% of native assembly speed (mileage may vary).
WebAssembly runs on a VM; it is a 32-bit virtual machine running on modern hardware, much as Steve Wozniak's Sweet-16 ran on the Apple II series (and could run on other machines easily).
The biggest "value" of running WebAssembly rather than native code is a portable executable that doesn't need to be recompiled natively (or, not as much), but you lose so much more in the process, including execution speed and energy efficiency, and you need a lot of other things added to make it useful.
It'd be interesting to see how efficiently a Mill processor can execute the VM in practice, because it's clearly something a number of online services will use in web browsers. Outside of web browsers, it makes far more sense to compile a higher-level language down to the local Mill ISA than to use WebAssembly.
WebAssembly is not a hardware ISA. It is similar to Java bytecode or .NET MSIL. The instructions in WebAssembly are meant for a virtual machine.
There is no hardware that can run WebAssembly directly. Instead, there are programs that take WebAssembly code and generate the equivalent x86 or ARM machine code; these are called Just-In-Time (JIT) compilers, and they are core parts of virtual machines. There can be another JIT compiler for Mill machine code.
So basically WASM does not compete with the Mill any more than it competes with x86 or ARM CPUs.
If anything, WASM can help the Mill. Code that is distributed in WASM format can potentially run on any platform, including the Mill. This can reduce the barrier to entry for Mill adoption.
Of course, someone will have to sit down and write a WebAssembly to Mill JIT compiler.
So I didn't mention the JIT, big deal: that doesn't materially change the fact that adding WebAssembly into the mix and targeting it instead of what's already been documented for the specializer is a major waste, and won't give any advantage for running anything not (usually) targeted to run within the confines of a web browser with 32-bit limitations on a 64-bit processor line. Absolutely, it's worth having a web browser for the Mill processors that handles WebAssembly, and for those that desire to run it outside of the browser as well, why not? It doesn't have any real benefits outside that context.
I place a huge multiplier of trust in Ivan and his team's design and implementation decisions over someone second-guessing them, as this nonsense does. It makes far more sense to have an intermediate representation that has knowledge of the nature and low-level capabilities of a processor line, generated from a higher-level, processor-agnostic language, than it does to badly translate some low-level representation of a specific type of virtual hardware and its artificial limitations, which was itself already translated/compiled from the same original higher-level processor-agnostic language and now runs at a fraction of the speed of the native code the specializer would generate. This is the code version of playing the telephone game through more than one dissimilar human language, say Chinese-English-Thai, which loses meaning and efficiency along the way because there is no 1-to-1 mapping of thoughts.
At least Sweet-16 provided a practical advantage for the Apple II: a way to execute more compact code on a virtual processor with more registers than the 6502, albeit slower than native. WebAssembly on the Mill only has value as another way to get very limited applications running at a speed and efficiency disadvantage, compared to sending the same code through the (say, C++) compiler to Mill without a more conceptually limited VM representation in between. WebAssembly has a very limited use-case scenario in the real world right now, and only makes sense to think about implementing as an entirely separate thing after there is a full OS with a web browser already running on actual Mill CPU hardware.
Yup!
No problem.
Well, I'd like to think that between the two of us it has been sufficiently explained how WebAssembly is irrelevant to the Mill processor family and its development and ecosystem.
You miss my point entirely, and your conception of what it means to compete against a technology is too narrow.
A key advantage of the Mill is supposed to be security. But the efficient sandboxing that WebAssembly provides means that the world now has a way to make C and C++ much safer without needing the hardware to zero out memory, among other things. This substantially decreases the appeal of buying Mill hardware for many of us, since we can make a cheap software change rather than an expensive hardware change.
I was hoping to receive a compelling argument for why the Mill would still be worthwhile.
A 35% overhead for WASM only sounds big to C/C++ aficionados who don't realize that a large fraction of all software is written in JavaScript (200% slower than C) or Python (2,500% slower than C); no end user will choose a Mill over others due to trivial differences they can't even detect.
Data centers care more about energy efficiency, but if their customers can't easily get their software to run on the Mill, then it doesn't matter.
The hardware world is moving quickly. I hope you all can release soon so the world doesn’t pass you by; “WebAssembly is not a hardware ISA” doesn’t matter and will not save you. Only a 5-10x advantage in some meaningful dimension will. Does the Mill have one?
Fair enough. Your point is that the security benefits the Mill provides can be done in software, and WASM does it albeit with a ~30% performance hit.
The Mill is supposed to provide a 10x perf/watt improvement over OoO superscalar CPUs, according to Ivan's guesstimate in his videos. We are all waiting for that simulator and compiler to be available to see some real numbers.
If you can get 10x perf/watt and all you have to do is recompile your C code, I think that would make the Mill very attractive.
Another interesting aspect is that microkernels are slow on modern CPUs. A simple call to a driver takes 70-300 cycles, which makes microkernels nonstarters. The Mill does have innovation here with its portal calls, which allow one process to call another at the cost of a simple function call. If successful, the Mill can make microkernels competitive, which can improve security significantly.
For those of you who care a lot about security (and are perhaps most interested in the Mill for security) but who still have the impression that WebAssembly is some in-browser-only toy and haven’t read about WASI and the efficient sandboxing capabilities — which sandbox libraries in your program *from each other*(!) thanks to the new nanoprocess model — I highly recommend reading https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webassembly-system-interface/ and their big update just recently: https://bytecodealliance.org/articles/1-year-update .
Excerpt:
To help app developers build more stable apps [by making their backends scale better], Shopify wants to allow app code to run internally right within Shopify.
But how can you run this third-party code in a fast and safe way? By using WebAssembly.
With their new platform, built on top of Lucet [a WASM compiler, not interpreter], they've been able to run a flash sale with 120,000 WebAssembly modules spinning up and executing in a 60 second flash sale window, while maintaining a runtime performance of under 10 ms per module.
Yes, that’s WebAssembly running _on the server_, in production, at scale, and solving a very real security problem.
According to a source cited on Wikipedia, a 700 MHz Transmeta Crusoe ran programs like a 500 MHz Pentium 3 (mobile-to-mobile, offerings as of Jan 2000).
If a 30% speed hit to run your code in WASM is "not a thing", then what's another 30% speed hit to use nearly a third the electricity (the Crusoe's 6 W vs. the mobile P3's 17 W)? It's not the 5-10x you're looking for, but there's still a margin there.
Security is a big deal, as it is difficult to implement security on insecure hardware, however, one of these days, some government is going to become wise to the amount of electricity we waste in computing….
I believe that the most important property that a virtual ISA such as WebAssembly (or JVM, CLR, …) would have for cloud application developers is not to remove the need to recompile but to remove the need to test the code on each type of hardware that it is supposed to be deployed on. This determinism is something that has hitherto not been available for C/C++ developers.
If WebAssembly/WASI develops in the right direction, I think it would not pose a threat but rather help to reduce the cost of adopting new types of hardware, which would allow such hardware to better compete on exactly the aspects, such as performance/watt, where the Mill would excel. Even though WebAssembly has limited performance (it has been designed for fast code generation rather than fast code execution), I think that could be a lesser concern, as a lot of an application's run time is often spent in imported modules anyway, which could still be native code. But WebAssembly is also evolving.
I don’t see advantages of using WASM on the server/desktop for anything but C/C++ code though, but the Rust community seems to also have adopted it for some reason… Maybe just hype?
I don't see advantages of using WASM on the server/desktop for anything but C/C++ code though, but the Rust community seems to also have adopted it for some reason… Maybe just hype?
The Rust community jumped in early on in WASM’s development in creating tooling and support libraries. From what I understand, some of the prominent tools that also support C/C++ were first developed by Rust teams. It was seen as a growth niche that didn’t really have a dominant language yet, unlike most other areas Rust excels at where there are strong incumbent languages.
From what I've seen, there are some pretty nice browser frameworks in the works for Rust WASM. On the desktop, I would expect the first uses to come from speeding up critical portions of Electron-based applications. For server software in general, WASI seems to be developed with both C and Rust as primary targets to keep in mind.
WebAssembly is a fantastic development. It's gaining steam because, compared to JavaScript, it's much faster; the performance is much more reliable (it's easy to accidentally break the JavaScript JIT's ability to optimize a particular chunk of code); there's no GC, so latency is better; and you can target it with statically typed languages like Rust, so correctness is often better as well. So compared to using JavaScript it's amazing, and it will continue to get better with WASI (WebAssembly on its own has no syscalls; WASI adds a POSIX-like layer).
But it doesn’t negate any of the performance/power advantages of the Mill. A WebAssembly compiler targeting the Mill should still end in better performance per watt than a WebAssembly compiler targeting x86.
This thread is a bit like saying, "Is the Mill still relevant when the JVM lets you write code once and have it run everywhere, as well as preventing all the common security issues like buffer overflows and out-of-bounds accesses?" Look at how that worked out.
but aside from some power efficiency gains, can't I get most of the Mill's advantages by simply using WebAssembly + WASI?
WASI looks like it's trying to be a capability-based design, which is certainly possible on existing hardware; it's just always been slow. The Mill ISA and security model should adapt quite well to WASI, and you might even have a microkernel with paravirtualized drivers that can run the API directly. Great efficiency and scalability.
There is nothing that is necessarily slow about capability-based access control. I think you might be confusing it with slow message passing in classic microkernels, in which capabilities are sometimes used to refer to the message ports. But that is just one way in which capabilities can be used, and it is not the use of capabilities that makes it slow.
Capabilities are first and foremost a way to model access control as primitives. WASI got its model from CloudABI's libc, which is a libc restricted to use only Capsicum capabilities. Capsicum is part of FreeBSD UNIX (implemented in its monolithic kernel) and has been used by several of its standard utilities for years, with a negligible performance impact compared to before.
WebAssembly sandboxing between modules on x86 still relies on each module’s memory occupying disjoint pages. The only reason the sandboxing is “cheap” is because 32-bit values can’t express numbers greater than or equal to 2**32, and implementations reserve that much address space for each module. But forcing them to run on disjoint pages results in scalability limits on how many sandboxes you can have and what the performance of communication between them will be. More pages means more TLB pressure.
I have not worked through the mental exercise, but my guess is that portal calls and the like on the Mill would let a WebAssembly JIT make sandboxing more performant than on x86 and generalize better to 64-bit. I’d need to refresh myself on how portals on the Mill interact with paging and permissions to say with confidence though and it would still be speculative until we have hardware.
I still find it a little odd to be thinking along the lines of “well if you ignore Mill’s core economic advantage will it still be better?” though.
Interesting conversation; thank you all.
There doesn’t seem to be anything in WASM (present and proposals I am aware of) that would be hindered in the Mill. In particular, translating to WASM from C/Rust/etc should work just like on any other machine. Naive JITing from WASM vm to Mill native should be slightly simpler on the Mill, because the stack maps naturally to the hardware belt. Non-naive JITing would be more complex on a Mill because the stack code would have to be first analyzed for inter-operation dependencies so that independent calculations can be scheduled to execute in parallel. That analysis would typically involve a translation to SSA-form, from which our existing schedulers can produce multi-instruction bundles. The benefit would be much better runtime at the cost of the extra analysis time, typical of JITs in general.
Where x86 and other ISAs with page-level protection have to allocate a fixed page-granularity sandbox, on the Mill each WASM job would have a unique turf and would use the Mill native byte-granularity protection, and portal calls for access to libraries. This should economize on address space and TLB thrashing when compared to page-based systems. I would also expect that the Mill approach to shared libraries would make it more difficult for an attacker to leak out of its sandbox into the shared space.
It’s been a while since this thread was active, and WASM is coming along nicely, if not quickly. Here are a few neat developments (picked to sample recent progress, rather than totality of progress): First, just-in-time code generation within webassembly which uses a JIT strategy that would fit right in on a Mill, and then Postgres compiled to WASM running in browser which is surprising.
Ivan's analysis seems spot on that the general WASM execution model maps fairly nicely to the Mill. I don't think it's too far to say that WASM's principles align reasonably well with the Mill in general, and notably better than with existing architectures. Don't be fooled by the presence of the word "web" in its name: its design is closer to an intermediate representation for architecture-agnostic machine code than to something that needs a heavyweight VM runtime like JS/JVM/.NET.
I suspect that WASM would be a great ‘IR’ to quickly port existing programs to work on the Mill without requiring (further) architecture retargeting. Once a WASM interpreter/compiler is built for the Mill — probably a more tractable task than adding the Mill as a new backend architecture in an existing compiler — many hundreds of existing WASM programs will start working right away. This could be a boon to get your hands on a wide array of nontrivial programs for testing and development, and as a fast path to write custom applications for the Mill.
There's a Python interpreter in the test suite, but no WASM, unless it's been added since I last looked. I've passed your suggestion on to Leon, who's the one that deals with the suite. We'll see if we can find an open-source one that doesn't make too many rash assumptions about the platform (for example, lots of x86 ASM buried in the code). Thanks for the reminder.
Maybe Wasmer? It looks like it can use LLVM as its backend. Wasmtime also looks promising, but would need extra work to get the Cranelift compiler working on it. Either one would need some work to get Rust functioning on Mill, though.