That’s where I am looking for reassurance, because I just love the Mill. But loving it doesn’t mean being convinced of the value it can deliver.
Agreed, and I’m not going to tell you that my very top-down, technical view of the market gap offers that reassurance. In the final analysis, it is mainly about keeping shared RAM access on die, given that increases in density have been outstripping increases in clock rate for a decade or so.
But the machine learning world is not only an emerging market for silicon — it is breaking out of its drunken path-dependent stupor about dense models born of cheap GPUs to realize the value of sparse models — not only for better models (see Algorithmic Information Theory and Solomonoff Induction’s formalization of Occam’s Razor), but for a factor of 100 energy savings per inference. The large models everyone is so excited about are not just ridiculously wasteful of silicon, their energy costs dominate.
NVIDIA’s newest ML foray (Grace) at 80e9 transistors claims it supports “sparsity”. This is (to be _very_ kind) marketing puffery. Their “sparsity” is only about a factor of 2. In other words, each neuron is assumed to be connected to half of all the other neurons. All their use of that term tells us is that the market demands sparsity and that NVIDIA can’t deliver it but knows they need to. Actual graph clustering coefficients in neocortical neurons and actual weight distillation metrics indicate you’re probably going to hit the broad side of the market’s barn by simply turning those 80e9 transistors into a crossbar RAM, where a large number of banks of phased-access RAM are on one axis and a large number of simple GPs are on the other axis.
Can the Mill serve as the “simple GPs”? How many transistors does one Mill GP take if its architecture is biased toward sparse array matrix multiplies and/or sparse boolean array (with bit sum) operations?
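To make concrete what I mean by those two kinds of operations, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names `csr_matvec` and `bit_sum_dot` are made up for this post, the matrix is a toy, and nothing here says anything about how a Mill (or any other core) would actually implement it. The first function is a sparse matrix-vector multiply over the common compressed sparse row (CSR) layout, which touches only stored nonzeros; the second is a boolean dot product done as an AND followed by a bit sum (popcount).

```python
def csr_matvec(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    indptr[r]..indptr[r+1] delimits row r's entries in `indices`
    (column numbers) and `data` (nonzero values).
    """
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        # Only the stored nonzeros of this row are visited.
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def bit_sum_dot(a_bits, b_bits):
    """Boolean dot product of two bit-packed vectors: AND, then popcount."""
    return bin(a_bits & b_bits).count("1")


# Toy 3x4 matrix with 4 nonzeros out of 12 entries (illustrative only):
# [[5, 0, 0, 0],
#  [0, 0, 2, 0],
#  [0, 3, 0, 1]]
indptr = [0, 1, 2, 4]
indices = [0, 2, 1, 3]
data = [5.0, 2.0, 3.0, 1.0]
x = [1.0, 2.0, 3.0, 4.0]

print(csr_matvec(indptr, indices, data, x))  # [5.0, 6.0, 10.0]
print(bit_sum_dot(0b1011, 0b0110))           # 1
```

The point of the sketch is that the inner loop does work proportional to the number of nonzeros, with an indexed (gather-style) read of `x` per nonzero, which is why access to shared RAM matters more than raw multiply throughput for this kind of workload.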
As far as the “switch of ISA” is concerned, what do you think CUDA is? What I mean is that there is a lot of work out there to adapt software to special-purpose hardware motivated by machine learning. I don’t see why a pretty substantial amount of that couldn’t be peeled off to make the compilers more intelligent for matching the Mill ISA to the hardware market.