- nachokb | Participant | November 1, 2016 at 8:41 pm | Post count: 7
Something current CPU vendors* have gotten good at** is periodically releasing newer CPUs which are, relatively speaking***, a little faster, cheaper, and/or more efficient by tweaking their designs.
OoO superscalars have the advantage that they can change basically everything in their guts: as long as they keep their ABI somewhat**** stable, all software will still run.
I’m not sure how many (or what kinds of) tweaks a Mill would be able to tolerate without changes to the genAsm ABI.
* caches, belt size, FU count… all seem to have a sweet spot beyond which there’s not much to gain from advances in fab processes / bigger transistor budgets (e.g. increasing the morsel size would at the very least increase average op size, fitting fewer ops in a cache line, etc.)
* I didn’t hear you talk about such a sweet spot for multicore Mills, but judging by the maximum being a dual core system, I guess there’s not a lot of gain there either (maybe a quad, octo?)
* maybe classic stuff like increases in clock frequency or replacing components with more expensive ones? (like single-cycle multipliers, or a faster and bigger TLB even if it’s not on the critical path)
So I guess my question boils down to: how do you see the Mill improving in the next X years?
* and even GPU vendors, though it’s easier for them as their software has always been distributed in some form of intermediate representation (they managed to shift from VLIW cores to SIMT without most people even noticing)
** for variable values of “good” (AMD is constantly hitting bumps along the road, but…)
*** Intel has managed a ~10% year-over-year improvement for a long time now, and people complain that’s not enough
**** ARM forces ISVs to at least recompile all software every few years ¯\_(ツ)_/¯
- This topic was modified 1 year, 9 months ago by nachokb.
- nachokb | Participant | November 1, 2016 at 9:08 pm | Post count: 7
- Ivan Godard | Keymaster | November 1, 2016 at 9:55 pm | Post count: 506
The implementation uses quite conventional clock distribution, and we expect normal binning and overclocking capability. Mills are not asynchronous; different members have different timing, but within any member latencies are fixed.
- Ivan Godard | Keymaster | November 1, 2016 at 9:48 pm | Post count: 506
Mill is not limited to 2x multicore; you can have as many Mill cores on a chip as will fit if you can power and feed them. Our best guess at the moment is that the constraint in current tech will be pin bandwidth to memory. If on-chip memory and/or direct fiber gets real then we expect the constraint to become cooling, although it might be intra-chip inter-core routing. All are WAGs.
At heart, though, you have identified the fundamental tech problem: CPUs don’t scale. Mills don’t either; it’s just that we have better constants. Details:
* GenAsm is pretty extensible. It makes no assumptions about belt size, FU population, cache size, etc.; all that information is provided by the specializer from the desired target description. The emulation substitution mechanism is quite generic: if the genAsm contains an op invocation that the target doesn’t have, the specializer searches for a function of a related name and signature and substitutes it for the op. So each member carries a bundled specializer for that target and a library of emulation functions for every ISA op that exists on any Mill. That library can later be updated by DLC to handle later Mill versions with new ops. This system breaks down only if a member ISA has an op that cannot be represented as a function in other ISAs. For example, code for a member with a supercollider-management op won’t work on a Mill that doesn’t have a supercollider to manage 🙂
Aside from particular ops, genAsm is fairly high level, a direct SSA representation of the program. The bulk of our problems with LLVM have been because LLVM IR is lower-level than genAsm, and we can’t recover information that clang/LLVM have thrown away. There’s a fair amount of the machine that you can’t reach from C, including most of the NYF.
* We doubt that a 6-bit morsel would pay, and are quite sure that a 7-bit one doesn’t.
* There’s enough flexibility within the Mill architecture to permit a lot of tuning without having to depart incompatibly. A change big enough to be worthwhile would make it a new architecture, and no longer a Mill. For example, a capability machine might have a belt, but would no longer be a Mill.
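The emulation substitution described above (an op the target lacks gets replaced by a call to a function of related name and signature) can be sketched roughly as follows. This is only an illustration of the idea, not the actual specializer: the `Op` type, the `specialize` function, the op names, and the `emulate_mul_i128` library entry are all hypothetical, and the real lookup by "related name and signature" is assumed here to be a plain dictionary match.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical key for an op: (op name, operand types).
Signature = Tuple[str, Tuple[str, ...]]


@dataclass(frozen=True)
class Op:
    """One genAsm-like op invocation (illustrative only)."""
    name: str
    operand_types: Tuple[str, ...]

    @property
    def signature(self) -> Signature:
        return (self.name, self.operand_types)


def specialize(
    program: List[Op],
    native_ops: Set[Signature],
    emulation_library: Dict[Signature, str],
) -> List[Tuple[str, str]]:
    """Lower each op: keep it if the target has it natively, otherwise
    substitute a call to an emulation function with a matching signature."""
    lowered = []
    for op in program:
        if op.signature in native_ops:
            lowered.append(("op", op.name))
        elif op.signature in emulation_library:
            lowered.append(("call", emulation_library[op.signature]))
        else:
            # The failure case from the text: no function can stand in for the op.
            raise LookupError(f"no native op or emulation for {op.signature}")
    return lowered


# Hypothetical small member: native 32-bit add, no native 128-bit multiply.
NATIVE = {("add", ("i32", "i32"))}
EMULATION = {("mul", ("i128", "i128")): "emulate_mul_i128"}

program = [Op("add", ("i32", "i32")), Op("mul", ("i128", "i128"))]
result = specialize(program, NATIVE, EMULATION)
print(result)
```

In this sketch, updating the library for a later Mill version amounts to adding entries to `EMULATION`; an op with no possible entry (the supercollider case) is the one that raises.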
- nachokb | Participant | November 1, 2016 at 10:35 pm | Post count: 7
“a supercollider-management op”
Well, now I can’t wait for a supercollider-managing Mill…
“There’s enough flexibility within the Mill architecture to permit a lot of tuning without having to depart incompatibly”
Cool! That was exactly what I hoped.