Ivan Godard

Like most wide-issue machines the Mill has a notion of “slot”, or scalar lane; one op can issue in each slot in each cycle, and can execute in any of the functional units attached to that slot using independent data. The data may be scalars from one byte up to the maximum supported scalar (64 or 128 bits depending on family member), or vectors of any scalar element size. That is, the architecture is fundamentally MIMD. The data paths feeding the FUs of the slots define a maximum operand width for data in the machine. This width is at least as big as the maximum scalar, but may be bigger so as to support larger vectors.

The con() op drops literal values to the belt. It is completely general and can drop any operand up to the maximum operand size, both scalar and vector. Scalars use the b/h/w/d/q width tags, and vectors use v, as in v16b. If every element of the literal has the same value, you may get better code by con()ing a scalar and extending it to vector with splat().
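As a rough illustration of the con()-versus-splat() choice, here is a minimal Python sketch of how a tool might lower a vector literal. It is not Mill code: the function name lower_literal, the tuple representation of ops, and the way the vector tag is built (read here as element count followed by element width, loosely after the v16b example) are all assumptions made for the example.

```python
from typing import List


def lower_literal(elements: List[int], elem_tag: str) -> List[tuple]:
    """Return a hypothetical op sequence that materializes a vector literal.

    elements: the literal's element values, in lane order.
    elem_tag: scalar width tag of each element (one of b/h/w/d/q).
    """
    if not elements:
        raise ValueError("a vector literal needs at least one element")

    if all(e == elements[0] for e in elements):
        # Uniform literal: drop one scalar, then widen it to a vector.
        return [("con", elem_tag, elements[0]), ("splat", len(elements))]

    # Non-uniform literal: drop the whole vector with a single con().
    # The tag format below is an assumption for illustration only.
    vector_tag = f"v{len(elements)}{elem_tag}"
    return [("con", vector_tag, tuple(elements))]


if __name__ == "__main__":
    print(lower_literal([7, 7, 7, 7], "b"))
    print(lower_literal([1, 2, 3, 4], "h"))
```

Whether the two-op form actually wins depends on the member and on how large the literal is; the sketch only shows where the decision would be made.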

The rd() op is not general; instead it drops one of a member-dependent set of popcons (popular constants). If a particular literal is an available popcon, the specializer uses the rd() op because it is more compact and does not plug up the flow-side slots that con() uses. Popcons may be scalar or vector, and each drops a specific width; the same value at different widths is a different popcon. Some popcons are present on every member (0 and 1 of all scalar widths, for example), and some are always present if the configuration includes hardware for which they would be useful (pi and e of the relevant floating-point widths on members with FP, for example). In addition, there will always be a few bit patterns left over after the configuration software has determined the bit patterns of all other operations configured in the readerBlock of the encoding (where the rd() op encodes). These are used to add more popcons until all the bit patterns are used up.
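The selection rule can be sketched in a few lines of Python. This is only a model of the decision described above: the POPCONS table is invented for an imaginary member (the entry for 2 at width w is purely hypothetical), and select_literal_op is an illustrative name, not part of any Mill tool.

```python
# Hypothetical popcon table for one imaginary member, keyed by (value, width tag).
# 0 and 1 at every scalar width follow the post; (2, "w") is made up for the demo.
POPCONS = {(v, w) for v in (0, 1) for w in "bhwdq"} | {(2, "w")}


def select_literal_op(value, width, popcons=POPCONS):
    """Pick rd() when (value, width) is an available popcon, else fall back to con()."""
    if (value, width) in popcons:
        return ("rd", value, width)
    return ("con", value, width)


if __name__ == "__main__":
    print(select_literal_op(0, "d"))   # a popcon on every member
    print(select_literal_op(2, "w"))   # popcon only on this imaginary member
    print(select_literal_op(2, "d"))   # same value, other width: not a popcon here
```

Keying the table on both value and width mirrors the point that the same value at a different width is a different popcon.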

There are no reduction operations defined other than any() and all(). The alternate() op is a special form of vector swizzling that lets reductions be constructed in log N steps for an N-element vector.
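To show why a swizzle is enough to get a reduction in log N steps, here is a small Python sketch of a generic butterfly reduction. The shuffle used here (lane i reads lane i XOR stride) is a stand-in chosen for the example; it is not a description of what alternate() actually does, and tree_reduce is a hypothetical name.

```python
import operator


def tree_reduce(vec, op=operator.add):
    """Reduce a power-of-two-length vector in log2(N) swizzle-plus-op steps.

    Returns (result, number_of_steps). op must be associative and commutative.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")

    steps = 0
    stride = 1
    while stride < n:
        # Stand-in swizzle (an assumption, not alternate()'s real semantics):
        # every lane fetches the value held by its partner lane.
        partner = [vec[i ^ stride] for i in range(n)]
        # One elementwise vector op combines each lane with its partner.
        vec = [op(a, b) for a, b in zip(vec, partner)]
        stride *= 2
        steps += 1

    # After the last step every lane holds the full reduction.
    return vec[0], steps


if __name__ == "__main__":
    print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8]))
    print(tree_reduce([3, 9, 2, 7], max))
```

Each pass costs one swizzle and one ordinary elementwise op, so an 8-element sum takes three passes rather than seven dependent scalar adds.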