Forum Replies Created
- goldbug | Participant | January 25, 2021 at 12:37 pm | Post count: 52
So I have been taking a class on computer architecture (I am a software guy). The more I learn, the more in awe I am of the beauty of the Mill’s instruction encoding and its other features.
CISC sucks. It needs millions of transistors just to decode about 6 instructions per cycle.
RISC is a clear improvement, but the superscalar out-of-order (OoO) design is ridiculously complicated. As I learn about the Tomasulo algorithm, wide issue/decode and speculative execution, I can’t help but think “this is insane, there has to be a better way”. It feels like the wrong path.
VLIW seems like a more reasonable approach, although I know binary compatibility problems and stalls have been a challenge for VLIW architectures.
The Mill is just beautiful: it has the sane encoding and simplicity of a VLIW, but phasing and the double instruction stream really take it to the next level.
Separating load issue from retire is, in hindsight, the obvious way to solve the stalls due to memory latency that are so common in VLIW.
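A toy model of why separating issue from retire hides latency, under simplified assumptions (not the real Mill timing rules; the function names are hypothetical): the compiler issues the load early, asks for the value to retire after as many cycles as it found independent work, and the core only waits for whatever memory latency is left over.

```c
/* Hypothetical toy model of a deferred load: issue now, retire the value
   `requested_delay` cycles later. Not the actual Mill pipeline rules. */

/* Cycles the core must wait when memory answers after the requested retire. */
int load_stall(int requested_delay, int mem_latency) {
    int shortfall = mem_latency - requested_delay;
    return shortfall > 0 ? shortfall : 0;
}

/* Cycles from issuing the load until its value can be used, when the
   compiler schedules `independent_ops` single-cycle ops (one per cycle)
   between issue and use and sets the retire delay to cover them. */
int cycles_to_use(int independent_ops, int mem_latency) {
    return independent_ops + load_stall(independent_ops, mem_latency);
}
```

With 4 cycles of independent work, a 3-cycle hit costs nothing extra, and a 10-cycle miss costs 6 stall cycles instead of 10.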
The branch predictor is so cool too: you can predict several extended basic blocks (EBBs) in advance, even before you start executing them. Mainstream predictors have to wait until they get to the branch instruction.
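A minimal sketch of chaining exit predictions, assuming a simple direct-mapped table keyed by EBB entry address (the table layout, its size and all names are hypothetical, not the Mill’s actual predictor): because each entry predicts the next EBB rather than the outcome of one branch, the front end can walk several EBBs ahead using the table alone.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64  /* hypothetical predictor size */

/* One prediction: an EBB -> the EBB its predicted exit leads to. */
typedef struct {
    int valid;
    uint32_t ebb;       /* entry address of an EBB */
    uint32_t next_ebb;  /* entry address of the predicted successor */
} exit_entry;

static exit_entry table[TABLE_SIZE];

static size_t slot_of(uint32_t ebb) { return ebb % TABLE_SIZE; }

/* Remember the exit that was actually taken out of `ebb`. */
void train_exit(uint32_t ebb, uint32_t taken_next) {
    exit_entry e = { 1, ebb, taken_next };
    table[slot_of(ebb)] = e;
}

/* Follow predictions up to `depth` EBBs ahead of `start`, writing the
   predicted entry addresses to `out` (e.g. to prefetch their code).
   Only the table is consulted; no instruction is fetched or decoded.
   Returns how many predictions were found before the chain ran out. */
size_t predict_ahead(uint32_t start, uint32_t *out, size_t depth) {
    size_t n = 0;
    uint32_t cur = start;
    while (n < depth) {
        const exit_entry *e = &table[slot_of(cur)];
        if (!e->valid || e->ebb != cur)
            break;                     /* no prediction for this EBB */
        cur = e->next_ebb;
        out[n++] = cur;
    }
    return n;
}
```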
The specializer is a neat solution to binary compatibility.
I really hope to see this CPU make it to silicon.
- goldbug | Participant | December 28, 2020 at 2:34 pm
Technical details are very sparse, but in their presentations they say they are not VLIW.
They sometimes compare their design with Itanium (VLIW), but claim it doesn’t stall as much as Itanium. Supposedly it has dynamic issue but is not out of order. From what I gather, their compiler generates instruction bundles that encode the dependencies between instructions. That sounds an awful lot like an EDGE architecture such as TRIPS or Microsoft’s E2.
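To make the EDGE comparison concrete, here is a minimal sketch of target-form dataflow execution in the spirit of TRIPS/E2 (the instruction format and names are hypothetical simplifications, not either ISA’s real encoding): each instruction names the consumers of its result instead of source registers, and fires as soon as all its operands have arrived, whatever its position in the block.

```c
#include <stdint.h>

#define MAX_INSTS 16
#define MAX_TARGETS 2

typedef enum { OP_CONST, OP_ADD, OP_MUL } opcode;

/* Target-form instruction: it lists which instructions (and which operand
   slot of each) receive its result, rather than naming source registers. */
typedef struct {
    opcode op;
    int64_t imm;                /* only used by OP_CONST */
    int ntargets;
    int target[MAX_TARGETS];    /* index of a consumer in the block */
    int slot[MAX_TARGETS];      /* 0 = left operand, 1 = right operand */
} edge_inst;

/* Executes a block of n <= MAX_INSTS instructions in dataflow order and
   returns the result of instruction `out` (0 if it never fired). */
int64_t run_block(const edge_inst *code, int n, int out) {
    int64_t operand[MAX_INSTS][2];
    int64_t result[MAX_INSTS];
    int arrived[MAX_INSTS] = {0};
    int fired[MAX_INSTS] = {0};
    int progress = 1;

    while (progress) {
        progress = 0;
        for (int i = 0; i < n; i++) {
            int needed = code[i].op == OP_CONST ? 0 : 2;
            if (fired[i] || arrived[i] < needed)
                continue;               /* already done, or still waiting */
            int64_t v;
            switch (code[i].op) {
            case OP_CONST: v = code[i].imm; break;
            case OP_ADD:   v = operand[i][0] + operand[i][1]; break;
            default:       v = operand[i][0] * operand[i][1]; break;
            }
            result[i] = v;
            fired[i] = 1;
            progress = 1;
            for (int t = 0; t < code[i].ntargets; t++) {  /* send result on */
                operand[code[i].target[t]][code[i].slot[t]] = v;
                arrived[code[i].target[t]]++;
            }
        }
    }
    return fired[out] ? result[out] : 0;
}
```

A block computing (2 + 3) * 4 can list the multiply first; it still fires last, once both of its operands have been delivered.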
- goldbug | Participant | December 14, 2018 at 9:06 am
Researchers recently discovered SplitSpectre, a Spectre variant with a much simpler gadget.
With regular Spectre, this was the gadget needed in the victim’s space:
if (x < array1_size) y = array2[array1[x] * 4096];
That pattern is not very common.
With SplitSpectre, this is the gadget needed:
if (x < array1_size) y = array1[x];
That pattern appears practically everywhere.
The access to array2 can then happen in the attacker’s own space, if y is returned to it.
From your talk, I reckon the Mill is still not affected.
- goldbug | Participant | November 27, 2020 at 9:57 am
Fair enough. Your point is that the security benefits the Mill provides can be achieved in software, and WASM does exactly that, albeit with a ~30% performance hit.
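Roughly what “doing it in software” means, as a minimal sketch assuming a WASM-style sandbox where all guest memory is one linear buffer (the type and function names are hypothetical): every guest load carries an explicit bounds check, which is one source of the overhead. Production engines on 64-bit hosts often replace most of these checks with reserved guard pages, so treat this as a simplified picture rather than how any particular runtime works.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sandbox: all guest memory lives in one linear buffer. */
typedef struct {
    uint8_t *base;
    uint32_t size;   /* bytes */
} linear_memory;

/* Loads a 32-bit value at guest address `addr`. On an out-of-bounds access
   it sets *trap and returns 0 instead of reading outside the buffer. This
   compare-and-branch runs on every guest load. */
uint32_t guest_load_u32(const linear_memory *m, uint32_t addr, int *trap) {
    if ((uint64_t)addr + 4 > m->size) {   /* 64-bit math avoids wraparound */
        *trap = 1;
        return 0;
    }
    uint32_t v;
    memcpy(&v, m->base + addr, sizeof v); /* alignment-safe read */
    *trap = 0;
    return v;
}
```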
The Mill is supposed to provide a 10x perf/watt improvement over OoO superscalar CPUs, according to Ivan’s guesstimate in his videos. We are all waiting for the simulator and compiler to become available so we can see some real numbers.
If you can get 10x perf/watt and all you have to do is recompile your C code, I think that would make the Mill very attractive.
Another interesting aspect is that microkernels are slow on modern CPUs. A simple call into a driver takes 70-300 cycles, which makes microkernels nonstarters. The Mill does innovate here with its portal calls, which let one process call another at the cost of a simple function call. If that works out, the Mill could make microkernels competitive, which could improve security significantly.
- goldbug | Participant | November 25, 2020 at 2:39 pm
- goldbug | Participant | November 25, 2020 at 9:58 am
WebAssembly is not a hardware ISA. It is similar to Java bytecode or .NET MSIL: its instructions are meant for a virtual machine.
No mainstream hardware runs WebAssembly directly. Instead, there are programs that take WebAssembly code and generate equivalent x86 or ARM machine code; these are called just-in-time (JIT) compilers, and they are core parts of virtual machines. There could just as well be a JIT compiler that emits Mill machine code.
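WASM is a stack machine, and a tiny interpreter for a handful of its i32 instructions shows what “meant for a virtual machine” looks like. The opcode bytes (0x41 i32.const with a signed LEB128 immediate, 0x6A i32.add, 0x6B i32.sub, 0x6C i32.mul, 0x0B end) are real WebAssembly; handling only a bare expression instead of a full module is a simplification, and the function names are hypothetical. A JIT would translate these opcodes into x86, ARM or Mill instructions once instead of interpreting them on every run.

```c
#include <stddef.h>
#include <stdint.h>

/* A few real WebAssembly opcode bytes; anything else is rejected. */
enum { WASM_END = 0x0B, WASM_I32_CONST = 0x41, WASM_I32_ADD = 0x6A,
       WASM_I32_SUB = 0x6B, WASM_I32_MUL = 0x6C };

/* Decodes a signed LEB128 immediate (at most 5 bytes for an i32). */
static uint32_t read_sleb32(const uint8_t **p) {
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        result |= (uint32_t)(byte & 0x7F) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);
    if (shift < 32 && (byte & 0x40))
        result |= ~0u << shift;          /* sign-extend */
    return result;
}

/* Interprets straight-line i32 code up to `end`. Returns 0 and stores the
   top of the stack in *out, or -1 on an unknown opcode or stack error.
   Arithmetic is done on uint32_t so it wraps like WASM i32 ops. */
int run_wasm_expr(const uint8_t *code, int32_t *out) {
    uint32_t stack[64];
    size_t sp = 0;
    const uint8_t *p = code;
    for (;;) {
        uint8_t op = *p++;
        if (op == WASM_END)
            break;
        if (op == WASM_I32_CONST) {
            if (sp == 64) return -1;
            stack[sp++] = read_sleb32(&p);
            continue;
        }
        if (sp < 2) return -1;           /* binary ops pop two values */
        uint32_t b = stack[--sp];
        uint32_t a = stack[--sp];
        switch (op) {
        case WASM_I32_ADD: stack[sp++] = a + b; break;
        case WASM_I32_SUB: stack[sp++] = a - b; break;
        case WASM_I32_MUL: stack[sp++] = a * b; break;
        default: return -1;
        }
    }
    if (sp == 0) return -1;
    *out = (int32_t)stack[sp - 1];
    return 0;
}
```

For example, the bytes 41 06 41 07 6C 41 02 6B 0B compute 6 * 7 - 2 = 40.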
So basically WASM does not compete with the Mill any more than it competes with x86 or ARM CPUs.
If anything, WASM can help the Mill. Code that is distributed in WASM format can potentially run on any platform, including the Mill. This can lower the barrier to entry for Mill adoption.
Of course, someone will have to sit down and write a WebAssembly to Mill JIT compiler.
- goldbug | Participant | August 10, 2019 at 7:31 am
What about code size, Ivan?
Your instruction format is so alien that it would be interesting to see whether it takes more or less space for comparable code.
I realize inlining and loop unrolling could have a big impact on code size.
- goldbug | Participant | July 31, 2019 at 1:51 pm
These are pretty awesome and encouraging numbers, Ivan.
“I suspect inlining and pipelining would make little difference to the counts when enabled because they improve cycle time and overall program latency”
Wouldn’t inlining help a lot with the instruction count? You eliminate the call operation, and if the inlined function is small, you might even be able to squeeze its operations into existing instructions, making the inlined function essentially free.
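A toy illustration of that argument, using a made-up greedy scheduler rather than the real Mill specializer: ops are single-cycle, have at most one input, and are packed into wide instructions of a given width. Call and return each cost one op, and call/return latency is ignored; all names are hypothetical.

```c
#define MAX_OPS 64

/* Toy list scheduler (an assumption for illustration only). Ops are
   listed in dependency order; dep[i] is the index of the single op that
   op i waits for, or -1. Each wide instruction holds up to `width` ops.
   Returns how many wide instructions the ops need. Requires n <= MAX_OPS. */
int count_instructions(const int *dep, int n, int width) {
    int cycle[MAX_OPS];
    int used[MAX_OPS] = {0};   /* ops already placed in each instruction */
    int last = -1;
    for (int i = 0; i < n; i++) {
        /* Earliest instruction after the op it depends on... */
        int c = dep[i] < 0 ? 0 : cycle[dep[i]] + 1;
        /* ...pushed later while that instruction is already full. */
        while (used[c] == width)
            c++;
        cycle[i] = c;
        used[c]++;
        if (c > last)
            last = c;
    }
    return last + 1;
}
```

With width 4, a caller whose ops are {a0, a1, call(a0), a2 using the result} needs 3 instructions, and a one-op callee plus its return needs 2 more, 5 in total. Inlined, the callee’s op simply takes the slot the call occupied, so the same dependency shape needs only 3 instructions and the callee comes for free.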
- goldbug | Participant | December 27, 2018 at 10:55 am
I asked the same thing a while ago.
They published a paper with the answer; see the section called “Software and compiler speculation”.
Their answer seems to be a loadtr operation that only performs the load if the predicate is met, avoiding a branch.
When I asked, they said they were still trying to decide if there was a better way.
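A minimal C model of the idea, assuming the simplest semantics: the load takes the bounds check as a predicate operand and does not touch memory when it is false. The `valid` flag stands in for whatever “no value” result the real operation produces (an assumption, not the documented behavior), and the names are hypothetical. In C the `if` below is itself a branch that a conventional CPU can speculate past, so this only models the semantics; the point on the Mill is that the predicate is an operand of the load, so there is no branch to mispredict.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of a predicated load; valid == 0 models "no value". */
typedef struct {
    int valid;
    uint8_t value;
} maybe_u8;

/* Hypothetical "load if true": reads base[idx] only when `pred` is true,
   so a false predicate never touches (or caches) the target address. */
maybe_u8 loadtr_u8(int pred, const uint8_t *base, size_t idx) {
    maybe_u8 r = {0, 0};
    if (pred) {
        r.valid = 1;
        r.value = base[idx];
    }
    return r;
}

/* The gadget's bounds-checked read, with the guard passed as the load's
   predicate instead of controlling a separate branch. */
maybe_u8 guarded_read(const uint8_t *array1, size_t array1_size, size_t x) {
    return loadtr_u8(x < array1_size, array1, x);
}
```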
- goldbug | Participant | March 28, 2018 at 11:21 am
If the Mill is half as good as it looks in the presentations, and if/when they manage to get it out the door, I can’t imagine it failing.
It would certainly make an interesting case study in the classroom. It would show that there is still plenty of room for innovation in CPU architecture.
- goldbug | Participant | January 16, 2018 at 8:02 am
Thank you for taking the time to write that paper. It is very enlightening.
I saw that you did have a Spectre-like bug, and that you fixed it by using loadtr prefixed by the guard.
Is that a new operation? I don’t recall seeing loadtr in the wiki before.
By the way, awesome job: such a simple and elegant solution.
- goldbug | Participant | January 9, 2018 at 7:32 am
I’m really curious about how the Mill can defend against Spectre.
As I understand it, there are really two ways to get speculative execution on the Mill:
1) Branch prediction, somewhat similar to a typical OoO core, except that you predict exits.
2) Compiler-scheduled speculation. The compiler flattens two or more branches, executes them all simultaneously and then picks the result of the winning branch.
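The second kind is a form of what compilers call if-conversion. A minimal sketch with hypothetical function names: both arms are computed unconditionally and the winning result is picked at the end. This is only safe because neither arm has side effects, and it is also why speculative work raises the side-channel concern: if the arms contained loads, both arms’ loads would run.

```c
#include <stdint.h>

/* Original control flow: only one arm runs. */
uint32_t branchy(uint32_t x, uint32_t a, uint32_t b) {
    if (x > 10)
        return a * 3 + 1;
    return b - x;
}

/* Flattened: both arms run, then the winner is picked. Unsigned types keep
   the speculative arithmetic free of signed-overflow UB in C. */
uint32_t flattened(uint32_t x, uint32_t a, uint32_t b) {
    uint32_t then_val = a * 3 + 1;   /* computed even when x <= 10 */
    uint32_t else_val = b - x;       /* computed even when x > 10 */
    int take_then = x > 10;          /* the branch condition, as data */
    return take_then ? then_val : else_val;   /* the "pick" */
}
```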
Both mechanisms can leave side-channel traces, for example in the cache or in the spiller. This is especially true of compiler-scheduled speculation, since it can cover a lot of work.
An attacking program could then check the state of the cache lines, or whether one of its scratchpad areas has been spilled, and deduce the victim’s secret data across turfs. Other than checking array bounds (tough or impossible in C), how on earth can you defend against that?
Or maybe there is something I am not aware of?