Can the Mill serve as the “simple GPs”? How many transistors does one Mill GP take if its architecture is biased toward sparse-array matrix multiplies and/or sparse boolean-array operations (with bit sum)?
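For readers unfamiliar with the term, here is a minimal sketch of what a boolean-array operation with a bit sum might look like. It assumes the usual reading: AND two bit vectors, then count the set bits (a population count). The names `bit_sum`, `boolean_dot`, and `boolean_matvec` are made up for illustration, and the bit vectors are packed densely into plain integers, so no sparse storage format is modeled here.

```python
def bit_sum(x):
    """Population count: number of set bits in a non-negative integer."""
    return bin(x).count("1")


def boolean_dot(a, b):
    """AND two bit vectors packed into integers, then sum the surviving bits."""
    return bit_sum(a & b)


def boolean_matvec(rows, v):
    """Boolean matrix-vector product: one AND-plus-bit-sum per matrix row."""
    return [boolean_dot(row, v) for row in rows]


# Hypothetical 3x4 boolean matrix (one integer per row) and a 4-bit vector.
rows = [0b1010, 0b0110, 0b0000]
vec = 0b0011
result = boolean_matvec(rows, vec)
```

Each output element costs one AND and one population count, so the arithmetic is cheap; most of the work is fetching the rows.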
We clearly have a home in the control-processor role, but the actual ML bit-banging sure seems like it needs a dedicated architecture, not a general-purpose one. The Mill can, of course, put in the same operations and accelerators as any other CPU, and it can do so with a faster manufacturing turn too, because of the specification-based design. I’m pig-ignorant about ML, but my impression is that the problem is not the computation; it’s all about getting data from here to there. Bio is self-modifying and basically analog, which the Mill is not, nor is anything else built with the tools and fabs used for CPUs today. We do have some NYF stuff in the pipeline that addresses on-chip distributed memory, but frankly that’s for conventional programs, not ML.
You clearly are deep into the subject – care to tell what you’d like to have?