Mill Computing, Inc. Forums The Mill Architecture Could shift instructions be replaced by bit field instructions? Reply To: Could shift instructions be replaced by bit field instructions?

Ivan Godard
Keymaster
Post count: 689

Phasing can be confusing, because it is both a representation of hardware timing and of ordering abstractions within a single hardware time unit. At it’s simplest, it represents “this comes before that”.

In the hardware eacgh clock admits a hardware defined number of gate transits between clock edges. Signals pass across a clock edge from one set of gates to a different one; the signals can be thought of as operand results of the former (the producer or source) and arguments of the latter (the consumer or sink). In conventional circuit design, including ours, there are two kinds of source-to-sink connection of interest. In one, a particular source can only pass signals to a particular sink; this is a pipeline. In the other, there is a many-to-many signal connection between a set of sources and a set of sinks; this is a bypass network or crossbar. Pipeline connections typically take only a few gates to stabilize the signals at the clock boundary; crossbars take many more, roughly growing as to the square of the number of participants, and often with significant wire delay.

You ask about consecutive shifts, and the answer depends on whether the result of the first shift is pipelined to the second, or is routed to the second as one of many possible consumers though a crossbar. From the view of the hardware, time is discrete, quantized and integral, and only happens at the clock boundary. (Yes, there are asynchronous hardware designs in which time is a real number and there is no ticking clock, but those are not mainstream.) Mill phases only happen at clock boundaries in hardware terms; there is no between-the-clocks phasing, because there is no notion of time between the clocks. Consequently the notion of doing two shifts (in the sense of actions with source and destination) in one clock is not semantically meaningful: by definition you can only do one action per clock, because that’s what an action is defined to be.

However, there is another sense in which your question is meaningful: is it possible to take the gates that do one shift action (in one clock) together with the gates that do another shift action (in another clock) and make a bigger pile of gates that together comprise an action whose result is the same as the second of the two shift actions when cascaded together, but without the cascade, and all in one clock? That is, is it possible to replace two consecutive shifts (taking two clocks) with one double-shift (taking one clock)?

That question is meaningful, but has an unfortunate answer: it depends. Hardware shifts are usually implemented as a tree of multiplexers, and the depth of the tree depends on the maximal shift amount, which for us depends on the width of the maximal operands: a Mill with quad (128-bit) data needs deeper muxing that one with only double (64 bit) data. More depth means more gates have to fit between the clocks. The limit to the maximal gate chain length is set by the fab process, the circuit design, and the clock rate, and will vary from chip design to chip design. However, it is likely that two mux trees can be fit into a single clock, at least up to double width. That could be used as a bit-field isolate instruction.

However, for bit fields there is a better way that puts less pressure on the gate chain depth: a shift and a mask. The shift part brings the desired field into the low bits, and a mask AND selects only the bits of the field. An AND does not use a mux tree and its gate depth does not depend on the operand width, so it uses fewer gates than a second full tree would need for the second shift. Mill bitfield instructions use this approach.