Reply To: The Belt

Ivan Godard

You are right that exposing the belt length by admitting temporal addressing makes the binary target-dependent. But the Mill is an exposed-pipeline machine, so the binaries are target-dependent for many other reasons anyway, and temporal addressing adds no complication that is not already there. Our solution is the specializer, which lets us have target-independent load modules even though the binary is not. As no one but hardware verifiers (and the code of the specializer library) ever looks at the bits in memory, the nominal binary dependence doesn’t matter; you get the same portability you get on any other machine family.
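To make the belt-length dependence concrete, here is a toy Python model of temporal addressing. It is only a sketch under stated assumptions: the `Belt` class, its `drop` and `read` methods, and the `run` helper are hypothetical names invented for illustration, not Mill code or tooling. The same reference to "the operand eight drops old" resolves on a 16-position belt but has already fallen off an 8-position belt.

```python
from collections import deque


class Belt:
    """Toy fixed-length belt (hypothetical model): position 0 is the newest result."""

    def __init__(self, length):
        self.length = length
        # A bounded deque discards the oldest entry when a new one is pushed on a full belt.
        self.slots = deque(maxlen=length)

    def drop(self, value):
        # New results enter at the front; every older operand moves one position back.
        self.slots.appendleft(value)

    def read(self, position):
        # Temporal addressing: "the value that was dropped `position` results ago".
        if position >= len(self.slots):
            raise IndexError(
                f"b{position} is not on a belt of length {self.length}"
            )
        return self.slots[position]


def run(belt_length):
    """Drop ten results (0..9), then read the operand that is eight drops old."""
    belt = Belt(belt_length)
    for value in range(10):
        belt.drop(value)
    return belt.read(8)
```

In this model `run(16)` returns 1, while `run(8)` raises `IndexError` because the operand is gone. That is the sense in which code using temporal addresses is tied to one belt length.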

It is my impression (please correct me) that the ring-buffer-based queue machine forces a uniform and universal opset in each executing position. This is trivially supplied if there is only one execution pipe, but if there is more than one then non-uniformity has advantages. Thus if there are three uniform pipes then the QM can issue three ops if there are arguments available; but if, say, there are two pipes with ALUs and one pipe with a multiplier, then the arguments for the adds and the multiply must be in the right order relative to each other to align with the intended pipes. Alternatively, the adds and the mul could use your positional-addressing extension to get the correct arguments, but the result is no longer a QM.
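A minimal sketch of that ordering constraint, in Python. It assumes a simplified queue machine in which each pipe dequeues its two arguments from the front of the queue in pipe order and results are appended at the rear; `issue` and `SLOT_OPS` are hypothetical names, and a real QM design may well differ.

```python
from collections import deque
import operator

# Hypothetical pipe kinds: two ALU pipes that add, one pipe that multiplies.
SLOT_OPS = {"add": operator.add, "mul": operator.mul}


def issue(queue, slots):
    """Issue one op per pipe, in pipe order.

    Each op takes its two arguments from the front of the queue; all
    results are appended to the rear once the whole group has issued.
    """
    results = []
    for kind in slots:
        lhs = queue.popleft()
        rhs = queue.popleft()
        results.append(SLOT_OPS[kind](lhs, rhs))
    queue.extend(results)
    return queue


pipes = ("add", "add", "mul")

# Arguments arrive in the order the pipes expect: 1+2, 3+4, 5*6.
aligned = issue(deque([1, 2, 3, 4, 5, 6]), pipes)

# Same values, but the pair meant for the multiplier arrives first,
# so it is consumed by an ALU pipe instead: 5+6, 1+2, 3*4.
misaligned = issue(deque([5, 6, 1, 2, 3, 4]), pipes)
```

Here `aligned` ends up holding 3, 7, 30 and `misaligned` holds 11, 3, 12. With uniform pipes the order would not matter in this way; with non-uniform pipes the producer has to enqueue arguments to match the pipe layout.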

The conform op was explained in the talks – Memory if I recall. Conform normalizes the belt, arbitrarily reordering the belt contents. It is used to force all control flow paths to a join point to have the same belt structure, at least w/r/t data that is live across the join. The implementation is just a rename, because actual data movement would be much too expensive. As with the rest of the Mill, the design takes into account the need for a commercially viable product in silicon.
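A rough Python sketch of conform as a rename, under the assumption that the belt is a list of names pointing into separate physical storage. The names `conform`, `read_belt`, and the storage layout are hypothetical, chosen only to show that the operands themselves never move.

```python
def conform(belt_map, new_order):
    """Return a renamed belt: new position i names whatever old position
    new_order[i] named. Only the mapping changes; storage is untouched."""
    return [belt_map[old_pos] for old_pos in new_order]


def read_belt(storage, belt_map):
    """Resolve every belt position to the value it currently names."""
    return [storage[slot] for slot in belt_map]


# Path A reaches the join with y at b0, a dead temporary at b1, and x at b2.
storage_a = ["x", "junk", "y"]
belt_a = [2, 1, 0]

# Path B reaches the same join with x at b0 and y at b1.
storage_b = ["y", "x"]
belt_b = [1, 0]

# The join point expects x at b0 and y at b1 on every incoming path.
joined_a = conform(belt_a, [2, 0])
joined_b = conform(belt_b, [0, 1])
```

After the two conforms, `read_belt(storage_a, joined_a)` and `read_belt(storage_b, joined_b)` both give x then y, and neither storage list has been modified. Arbitrary reordering is possible because `new_order` can be any selection of old positions.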

Rescue is a similar but more compact encoding of conform, in which the belt order is preserved but dead belt operands are squeezed out, whereas conform admits arbitrary reorder. Rescue is used, as the name suggests, to prevent live operands from falling off the end of the belt.
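The difference from conform can be sketched the same way in Python. This is a toy model with hypothetical names (`rescue`, `drop`); it says nothing about the actual encoding, only that rescue keeps the surviving operands in their existing relative order.

```python
def rescue(belt, live_positions):
    """Keep only the operands at live positions, preserving their belt order."""
    return [value for pos, value in enumerate(belt) if pos in live_positions]


def drop(belt, value, length):
    """Drop a new result at the front of a belt with `length` positions;
    anything pushed past the last position is lost."""
    return ([value] + belt)[:length]


# A full 4-position belt, newest first. Only t3 (b0) and t0 (b3) are still live.
belt = ["t3", "t2", "t1", "t0"]

# Without rescue, the next drop pushes the live operand t0 off the end.
lost = drop(belt, "t4", 4)

# With rescue, the dead operands are squeezed out first and t0 survives.
kept = drop(rescue(belt, {0, 3}), "t4", 4)
```

In this model `lost` is t4, t3, t2, t1 (t0 is gone), while `kept` is t4, t3, t0.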

Mill encoding is wide-issue like a VLIW, but otherwise very different; see the Encoding talk. We have to keep not only live belt operands on the belt but also multi-cycle computations in flight in the FUs across control flow, and that forces us to use a very different scheduler than the DAG approach, which is inherently restricted to basic blocks. We use a time-reversed tableau scheduler, similar to what some VLIWs use.