I’m not sure I’ve understood your suggestions; if my response here seems inapt to you then please post again.
First, as to “sector”. This seems to be a notion where there are multiple distinct address spaces, with one of them being designated as “current” some how, so one can address within a sector using fewer bits than would be required to address across sectors. If that’s right, then in effect we already have such sectoring: “sectors” are created and destroyed automatically by the call/return mechanism, so each called function has its own private sector. This is a little different from your sectors though, because there is no inter-sector addressing; a function is stuck with its sector and cannot switch to another.
Then, as to the “chain of waterfalls” idea, where each space “falls off the end” into the next space. This won’t really help, because eventually everything ever produced at the top will eventually fall over every waterfall – which means that the path to DRAM (the last) would have to have the same bandwidth as the rate of result production at the top, in the core; the whole machine would run at memory speed, which in core terms is sloooow. The chain is also pointless, because most operands are finished with before they fall off, so there’s no need to preserve them over the subsequent falls.
So there needs to be some way to designate which operands are to be discarded and which passed on to the next level. The Mill belt by default discards everything, with the program explicitly saving any still alive. It could be done the other way, saving everything except what is explicitly discarded, but then one would still need to say where things are to be saved, and yes, an auto-save to a belt-like next level is possible. However, that level in turn would have to be kept congruent over control flow, which costs bits too. Which is the better trade-off depends on the likelihood of control flow over the life of an operand on the second level and the size of that level; looking at the code we are trying to execute, we decided that using spatial addressing on the second (scratchpad) level worked best.
Then, you suggest a “save many” and “restore many” so as to reduce the save/restore entropy, analogous to the register bulk save/restore of legacy ISAs. A “many” operation would be easier on a microcoded machine (Mill doesn’t use microcode) but could certainly be implemented on a non-micro physical architecture. However, to be useful the lifetimes of the operands in the “many” would need to be similar, so they could be saved and restored as a group. While the start of lives in a group would be similar (everything needing saving that was produced since the last save), the end of lives turn out to be very different, almost random, obviating a “restore many”. This suggests that a “save many” and “restore one” design would be possible. It’s not clear to me whether the entropy saving of a “save many” operation would justify its hardware complication; needs thinking about.
Then about sector+offset addressing. The value of this depends on whether multiple references are contiguous enough to pay the cost of switching sectors (and keeping the “current sector” congruent over control flow). Looking into that and doing actual measurement would be a good intern project, but by eyeball based on the code that we have running now I’m inclined to think that the reference patterns are too random to offer enough clustering value. It would also complicate the software, which is why sectoring has dropped out of use in legacy architectures.