Forum Replies Created
- Will_Edwards (Moderator), January 4, 2018 at 10:49 am · Post count: 98
Meltdown is not a fundamental bug in speculation; it's a bug in various Intel implementations of it. AMD, for example, say they are not vulnerable even though they implement the same x86 ISA. I think it's safe to say that Meltdown won't work on future Intel CPUs – speculating past page faults is game over, and the KAISER mitigation is so expensive that Intel will surely disable that behaviour in future designs.
Needless to say, Meltdown doesn’t work on the Mill either.
Spectre is a whole different ballgame. It's named that because it's going to haunt the industry for a long time 🙁
On first and second reading, Spectre as narrowly described doesn't work on the Mill. After a third reading we'll make a more detailed post explaining exactly why – please give us time.
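For readers wondering what "Spectre as narrowly described" looks like, here is the classic variant-1 (bounds-check bypass) gadget from the public research, shown purely for illustration; the `array1`/`array2` names are the hypothetical ones the researchers use, not anything Mill-specific.

```c
#include <stddef.h>
#include <stdint.h>

uint8_t array1[16];
size_t  array1_size = 16;
uint8_t array2[256 * 64];   /* probe array: one cache line per byte value */

uint8_t victim(size_t x) {
    if (x < array1_size)                 /* the branch an OoO CPU may speculate past */
        return array2[array1[x] * 64];   /* the access pattern leaks array1[x] */
    return 0;
}
```

On a vulnerable out-of-order core, calling `victim` with an out-of-bounds `x` after mistraining the branch predictor leaves a cache footprint keyed on secret memory, which a timing probe can then recover.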
- Will_Edwards (Moderator), February 11, 2017 at 1:52 am · Post count: 98
We haven't really been hampered in our efforts so far because the target code we've been focusing on porting and benchmarking is all C and C++, which are memory-unsafe languages. Our event bits, mentioned in the talks, are really helpful for precise GC runtimes. So, so far, we've just ignored it.
But there are precise-GC languages using LLVM, and the non-integral pointer support being added to LLVM lets those languages tell our backend when they are storing a pointer, which in turn lets them make use of our event bits and so on.
- Will_Edwards (Moderator), April 24, 2015 at 1:32 pm · Post count: 98
I agree that C functions returning structs just to return multiple results is ugly. Syntactic sugar would help everyone I think. But that’s really a compiler thing and not a chip thing.
It's useful to consider the two ways we use vectors on a CPU – long and short. This blog post does a good job explaining the distinction. The Mill, of course, supports both.
So at a C level you may find yourself decorating short vectors explicitly (using the extensions compilers already offer) and we’ll work in the compiler back-ends to make sure that works much as it does on other CPUs. But we also make it easy for the compiler to do auto-vectorization and to pipeline a lot more than on other architectures.
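As a concrete example of the kind of explicit short-vector decoration meant above, here is the `vector_size` extension that GCC and Clang already offer today – ordinary compiler-extension C, not Mill-specific syntax:

```c
/* A four-lane 32-bit integer vector via the GCC/Clang vector_size extension. */
typedef int v4i __attribute__((vector_size(16)));   /* 4 x 32-bit int = 16 bytes */

v4i add4(v4i a, v4i b) {
    return a + b;   /* element-wise add; each backend lowers this to its own SIMD */
}
```

The same source works unchanged on x86, ARM and so on, with each backend picking its native vector instructions.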
So to unlock the full power of the Mill you don’t need to rewrite your C programs – normal C maps great to the Mill (which is no accident).
Modern compilers are aligning on how to extend with type, function and variable attributes, so I could imagine we expose saturating and excepting arithmetic, as well as some of the Mill security mechanisms, that way. This won't preclude people using toolchains where the frontend doesn't know that the target is a Mill, though, of course.
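Compilers already expose overflow-aware arithmetic today; as one illustration of the direction meant above, here is a saturating add built on GCC/Clang's existing `__builtin_add_overflow` builtin (this is an existing extension, not a Mill API – on a Mill the saturating form would be a single hardware operation):

```c
#include <limits.h>

/* Saturating 32-bit add: clamps to INT_MAX/INT_MIN instead of wrapping. */
int sat_add(int a, int b) {
    int r;
    if (__builtin_add_overflow(a, b, &r))      /* true if the add overflowed */
        return (a > 0) ? INT_MAX : INT_MIN;    /* clamp toward the overflow side */
    return r;
}
```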
- Will_Edwards (Moderator), March 11, 2015 at 7:51 am · Post count: 98
We haven’t hit any major architectural problems with LLVM. We can take LLVM IR and pretty much find-replace it to our genAsm format.
We are doing some work in LLVM around tracking pointers specially, but this is not on our critical path.
The LLVM backend will be available shortly. It's actually working in a very basic fashion already, but is nowhere near settled-down enough to be made available. It's growing features and fixes continuously. It's only useful in conjunction with our simulator and such.
But we are as eager to share it with the world as the world is eager to get its hands on it 😀
- Will_Edwards (Moderator), April 28, 2015 at 10:55 am · Post count: 98
It's possible that the lock-in of legacy proprietary apps is no longer the barrier to new ISAs that it used to be? The next generation of OSes are being built around running a browser and nothing else (Chrome OS, Firefox OS etc). The other day I tried to run an old program on the latest version of Windows and it wouldn't run, and I had to resort to running it in Wine on Linux (where it was quite happy!).
That aside, business is not my forte so let's keep things technical 🙂 Binary translation, e.g. McSema from x86-64 to LLVM IR, would then allow the code to be optimized and retargeted to the Mill via LLVM IR. For on-the-fly emulation of individual programs running in the host environment you could imagine something more like the hot translation that Valgrind does. But to emulate a whole OS (memory management and all) likely needs a conventional VM approach.
- Will_Edwards (Moderator), March 25, 2015 at 8:06 am · Post count: 98
I feel mean holding back from giving you all a proper status update… but I'm not going to spoil Ivan's keynote at the LLVM conference in London on Tuesday 14th April 🙂
http://llvm.org/devmtg/2015-04/ – not sure of the time slot. The announcement will be going out on the mailing list too, of course 🙂
Sorry I haven’t spilled the beans on the current LLVM status 😀
- Will_Edwards (Moderator), January 17, 2015 at 1:51 pm · Post count: 98
Regarding PLB size:
Consider the size of a high-end conventional L1 TLB; it might contain 64 4KB page entries, 32 2MB page entries and 4 1GB page entries.
The conventional L1 TLB has to do the address translation before the load from L1 cache itself; the translation and lookup are serial.
This is why the L1 TLB is forced to be small to be fast, and hasn't been growing in recent high-end OoO superscalar microarchitectures. Vendors have actually been adding L2 TLBs and so on because of this problem.
A recent article on conventional CPUs actually counts TLB evictions for various real syscalls:
Some of these syscalls cause 40+ TLB evictions! For a chip with a 64-entry d-TLB, that nearly wipes out the TLB. The cache evictions aren’t free, either.
Now consider the situation for the Mill PLB: the entries are arbitrary ranges (rather than some page count), and it has as many cycles as the actual L1 lookup itself to do its protection check… it can be large and slow because its work proceeds in parallel with the lookup.
Now this really emphasises the real and practical advantages of a virtual cache and Single Address Space architecture 🙂
On the second question about SIMD: exactly! 🙂
Excess slots in a vector can be filled with 1.0 or whatever value nullifies those elements for the operations to be performed.
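To make the idea concrete, here is a plain-C sketch of padding the excess lanes with the operation's identity element – 1.0 for a product, 0.0 for a sum – so a full-width vector loop still computes the right answer. The function name and the fixed width of 8 are illustrative, not from any Mill API:

```c
/* Product of the first n floats (n <= 8), padded to a full 8-lane vector.
   Excess lanes get 1.0f, the multiplicative identity, so they don't
   change the result; the full-width loop is then trivially vectorizable. */
float product8(const float *data, int n) {
    float lanes[8];
    for (int i = 0; i < 8; i++)
        lanes[i] = (i < n) ? data[i] : 1.0f;   /* 1.0 nullifies excess lanes */
    float p = 1.0f;
    for (int i = 0; i < 8; i++)
        p *= lanes[i];
    return p;
}
```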
- Will_Edwards (Moderator), January 15, 2015 at 2:42 am · Post count: 98
It's possibly ambiguous, but I don't think he was being skeptical of the Mill memory model. I think he was being skeptical of there being any business for non-x86 chips even if they are better? That was my reading of that part, anyway.
When I read the article the other day (see the proggit discussion), I cherry-picked some of the technical problems he raised and summarised what the Mill does in each area:
- TLBs and TLB misses: translation of addresses is after the cache hierarchy; the cache is in virtual addresses and the memory is shared so there’s no translation needed in IPC and context switches.
- locks: we’re 100% hardware transactional memory (HTM) on which you can build conventional locks; but you can jump ahead and write your code to use HTM directly for a big performance gain
- Syscalls and context-switches: there aren’t any context-switches; we’re Single Address Space (SAS). Syscalls are much much faster, and you aren’t restricted to a ring architecture (hello efficient microkernels and sandboxing!)
- SIMD: the Mill is MIMD. Much more of your code is auto-vectorizable and auto-pipelinable on the Mill, and your compiler will be able to find this parallelism
- Branches: we can do much more load hoisting and speculation. We're not Out-of-Order (OoO), so it's swings and roundabouts, and we often come out ahead
Feel free to follow up with any technical questions related to the article 🙂
- Will_Edwards (Moderator), January 15, 2015 at 2:35 am · Post count: 98
I missed the Mill reference when I read the article the other day.
The paragraph before the one you quote says:
This is long enough without my talking about other architectures so I won’t go into detail, but if you’re wondering why anyone would create a spec that allows that kind of crazy behavior, consider that before rising fab costs crushed DEC, their chips were so fast that they could run industry standard x86 benchmarks of real workloads in emulation faster than x86 chips could run the same benchmarks natively.
He then says, as you quoted:
BTW, this is a major reason I’m skeptical of the Mill architecture. Putting aside arguments about whether or not they’ll live up to their performance claims and that every chip startup I can think of failed to hit their power/performance targets, being technically excellent isn’t, in and of itself, a business model.
So I think he’s saying that being technically excellent isn’t going to sell chips. He says it didn’t sell Alpha chips?
Like you, I am a little unsure of my interpretation 🙂
- Will_Edwards (Moderator), January 2, 2015 at 2:31 pm · Post count: 98
Very nice to see it online; I hadn’t read it.
Has it been behind a paywall previously?
Anton is a name I recognise; he was slightly active in my Corewar-playing youth; small world 🙂
Another great intro to interpreter loop performance was given by Google at I/O 2008:
We often used the GCC computed-goto extension in Corewars interpreters.
We also used a lot of explicit data prefetches.
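For anyone unfamiliar with the technique, here is a minimal token-threaded interpreter using GCC's computed-goto extension (`&&label` and `goto *`). The three-opcode bytecode (PUSH, ADD, HALT) is a made-up example, not from any real Corewars interpreter:

```c
/* Token-threaded dispatch: each handler jumps directly to the next
   opcode's handler, giving the branch predictor one indirect branch
   per opcode site instead of a single shared switch branch. */
int run(const int *code) {
    static void *dispatch[] = { &&op_push, &&op_add, &&op_halt };
    int stack[64], *sp = stack;
    const int *pc = code;

    goto *dispatch[*pc++];          /* jump straight to the first handler */

op_push:
    *sp++ = *pc++;                  /* operand word follows the opcode */
    goto *dispatch[*pc++];
op_add:
    sp -= 1;
    sp[-1] += sp[0];                /* pop two, push the sum */
    goto *dispatch[*pc++];
op_halt:
    return sp[-1];                  /* result is on top of the stack */
}
```

The win over a `switch`-based loop is that each opcode site gets its own indirect branch, which predicts much better on conventional hardware.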
- Will_Edwards (Moderator), December 28, 2014 at 12:31 am · Post count: 98
Spot on Peter 🙂
We go to new lengths to prevent vulnerabilities in your programs from turning into exploits, too. With the hardware-managed call stack and such, it would be very difficult for an attacker to make your program unwittingly set up functions to call others at runtime (e.g. return-oriented programming). More in the Security Talk.
- Will_Edwards (Moderator), December 21, 2014 at 1:11 pm · Post count: 98
1. Yes we make sure that results don’t end up in the wrong frame and that results that are in-flight when a function returns are discarded. How this happens is implementation-specific; there are several approaches that could be used.
2. If the Bad Guy can create 'fake' functions etc. then it's already game over 🙁 The Mill contains novel hardware protection mechanisms to prevent vulnerabilities from being exploitable. More in the Security Talk.
3. I am hopeful we can have real tail-calls on the Mill 🙂