There are tuning issues in the smaller members too, Tin and Copper. There the issue is belt size. Even an 8-position belt is enough for the tests’ transient data, but the codes also have long-lived data, and the working sets of that data don’t fit on the smaller belts; some don’t even fit on the 16-position belt of a Silver. As a result the working sets get spilled to scratch and the core essentially operates out of the scratchpad with tons of fill ops, much like codes on the original 8086 or the DG Nova. This is especially noticeable in FP codes like the numerics library, for which the working set is full of big FP constants. Working out of scratch doesn’t impact small-member performance much, but it has scratch bandwidth consequences, and power consequences too that we can only guess at. We may need to configure the smaller members with bigger belts, but that too has consequences. In other words, the usual tuning tradeoffs.
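To make the belt-size effect concrete, here is a toy model, not the real Mill semantics. It assumes a fixed-length FIFO belt, a fill that re-drops a value at the front of the belt, and a made-up trace in which each op reads one long-lived constant plus the previous result. The names `count_fills` and `make_trace` are hypothetical.

```python
from collections import deque


def make_trace(n_constants, n_ops):
    """Hypothetical workload: op i reads constant c(i mod n) and the previous result."""
    trace = []
    for i in range(n_ops):
        const = f"c{i % n_constants}"
        prev = f"t{i - 1}" if i > 0 else const
        trace.append(([const, prev], f"t{i}"))
    return trace


def count_fills(belt_size, trace):
    """Count how often an operand has to be brought (back) onto the belt."""
    belt = deque(maxlen=belt_size)  # newest value at the left, oldest falls off the right
    fills = 0
    for inputs, output in trace:
        for name in inputs:
            if name not in belt:
                fills += 1
                belt.appendleft(name)  # assumed: a fill drops the value at the belt front
        belt.appendleft(output)  # every op drops its result
    return fills


if __name__ == "__main__":
    trace = make_trace(n_constants=6, n_ops=24)
    for size in (8, 16, 64):
        print(size, count_fills(size, trace))
```

In this toy trace the 8-position belt has to refill a constant on every single op, while a belt large enough to hold everything pays only for the first use of each constant. Real codes and the real spill/fill machinery will behave differently; the sketch only illustrates why a long-lived working set hurts small belts.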
I think you can use a small register set that is a logical extension of the belt, selected by an additional bit in the argument address.
The encoding cost is acceptable, and it solves the problem of “frequently used arguments”.
The encoding can be entropy-optimized by restricting the number of register arguments to one per operation, or by limiting the number of functional units that can use register arguments. Small models can use more bits for the register specifier.
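A minimal sketch of what I mean by the extra bit, assuming an 8-position belt (3 index bits) and an equally sized register set. The field widths and the names `encode_operand` and `decode_operand` are only for illustration and are not taken from any actual Mill encoding.

```python
BELT_BITS = 3                # assumed: 8-position belt, so 3 bits of index
REG_FLAG = 1 << BELT_BITS    # the one additional bit: set means "register", clear means "belt"


def encode_operand(kind, index):
    """Pack an operand specifier into BELT_BITS + 1 bits."""
    if not 0 <= index < (1 << BELT_BITS):
        raise ValueError("index out of range")
    if kind == "belt":
        return index
    if kind == "reg":
        return REG_FLAG | index
    raise ValueError("kind must be 'belt' or 'reg'")


def decode_operand(spec):
    """Unpack a specifier into (kind, index)."""
    kind = "reg" if spec & REG_FLAG else "belt"
    return kind, spec & (REG_FLAG - 1)


if __name__ == "__main__":
    print(decode_operand(encode_operand("belt", 5)))
    print(decode_operand(encode_operand("reg", 5)))
```

Restricting register arguments to one operand slot per operation would mean only that slot carries the flag bit, so the other specifiers stay at 3 bits.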
The C++ library is coming up because we are doing the OS kernel in C++.
I am under the strong impression that you are trying to innovate too much at once.
Your initial goal should be a “software stack accelerator”: a processor that needs minimal OS modifications and is fully compatible with existing applications (Linux/Java/Android).
Forget the single address space: it doesn’t save a lot of power (the TLB uses ~15% IIRC), but it is the biggest blocker to quick adoption. You can easily make it optional.
You can win the market by offering “only” double the performance-to-power and performance-to-cost ratios, as long as you are software compatible/sane. “Datacenters and smartphones” are sensitive enough to a 2-3x power advantage, but they are not able to rewrite their software!
Time is running out: the volume of computation is moving into the visual/AI domain.