Mill Computing, Inc. Forums The Mill Architecture The Belt Reply To: The Belt

rolandpj

My concern was not so much entropy of encoding, as hardware efficiency, but both are interesting.

The belt on a high-end Mill (30+ ops/cycle) operates as a massively multi-ported register file – particularly if all operations are allowed to access all belt positions. This is true (I think) no matter whether it’s really implemented as a CAM, as shadow register files, etc. The talks allude to some combination of the above, but I am reading between a whole lot of lines, no pun intended. From the talks, for example, you do actually intend to have FU-local shift register banks, and the mapping to the belt abstraction is a hardware problem(!).

The belt abstraction is useful, indeed, for steady-state implementation of software pipelining, and you have extended the concept of intrinsically circular rotating buffers into your ‘scratch-pad’ – which can be seen as a local/remote register file concept (or internal/external register). In short, the belt abstraction is awesome for compiler writers, which is a nice reaction to RISC, and incorporates a lot of the advantages of a stack abstraction – encoding entropy in particular.

I don’t really know what I’m talking about, but there are so many interesting aspects of the design, most of which are old ideas, maybe yours (tagged typing of local registers, entropy encoding of intent, virtual memory translation behind the cache, blah blah). I am not aware of another hardware abstraction that is a use-it-or-lose-it register file (the belt), although it’s certainly a standard software device. The other aspect that I don’t think I have seen before (and I say that with little conviction) is single-instruction phasing – i.e. instruction encoding that pragmatically crosses hardware pipeline boundaries. However, I’m not sure how generally useful that is, beyond efficient encoding of short subroutines (which the compiler should inline, no?).
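To make "use-it-or-lose-it" concrete, here is a toy Python model of what I mean. It is only a sketch of the abstraction as I understand it from the talks, not of any real hardware: the `Belt` class, the `add` helper and the belt length of 4 are all made up for illustration.

```python
from collections import deque


class Belt:
    """Toy belt: fixed length, newest value at position 0 (hypothetical model)."""

    def __init__(self, length=8):  # length is an assumed, illustrative size
        self.slots = deque(maxlen=length)

    def drop(self, value):
        # A new result enters at the front; if the belt is full,
        # the oldest value silently falls off the far end.
        self.slots.appendleft(value)

    def read(self, position):
        # Operands are named by position relative to the front (b0, b1, ...),
        # not by a fixed register name.
        return self.slots[position]


def add(belt, a, b):
    # An operation reads two belt positions and drops its result at the front.
    belt.drop(belt.read(a) + belt.read(b))


belt = Belt(length=4)
for v in (1, 2, 3, 4):
    belt.drop(v)          # belt is now [4, 3, 2, 1]

add(belt, 0, 1)           # 4 + 3 = 7 is dropped; the value 1 is lost
print(list(belt.slots))   # [7, 4, 3, 2]
```

The point is that nothing is ever overwritten by name: a value that is not consumed (or saved somewhere like the scratch-pad) before it reaches the end is simply gone.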

Regarding a general-purpose belt vs. local specialisation: most floating-point computations are really logically distinct from integer computations. Most branch predicate results are independent of real integer results (even though they are often flag-setting variants of the same silicon area). Most vector operations are very different from integer operations, particularly when you get ridiculously wide – 512 bits. Why would you carry them all on the same belt (particularly the unusually wide operands)? The answer, I guess, is that the belt is an abstraction, but I think there is entropy opportunity there too.
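A back-of-envelope version of the entropy argument, as a Python sketch. The belt lengths (one shared belt of 32 positions vs. separate per-class belts of 8) are numbers I picked for illustration, and the function names are hypothetical. It also assumes the opcode implies which belt an operand lives on, which ignores operations that cross classes (compares, conversions).

```python
from math import ceil, log2


def operand_bits(belt_length):
    # Bits needed to name one belt position.
    return ceil(log2(belt_length))


def source_bits(belt_length, operands=2):
    # Bits spent on source operands for one operation.
    return operands * operand_bits(belt_length)


unified = source_bits(32)  # one shared belt, 32 positions (assumed)
split = source_bits(8)     # per-class belt, 8 positions (assumed)

print(unified, split)      # 10 6
```

Under those assumptions a two-operand operation saves 4 bits of operand encoding, at the cost of needing explicit cross-belt forms for the mixed cases.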

I am fascinated. When do we see silicon?
