Vector ops

Author
Posts
Joe Taber
Participant
October 18, 2017 at 10:32 am
Post count: 25
#3030 |
I started playing around with writing genAsm for some small things and I ran into a couple roadblocks with vectors, probably because I didn’t know where to find the answer. 🙂
What is the syntax for creating a vector constant? I see examples on the forums/slides for declaring a constant scalar:
```
con    (w(100ul))              %1
rd     (w(1ul))                %2
```
But not a vector. (Honestly I don’t know what the functional difference between con and rd is here either.)
Are there any particular instructions that could be useful for reducing a vector besides any and all, e.g. by add, orl, xorl etc, or would it need to be done “manually” (probably log n of number of vector elements)? I don’t know if such a thing could even exist, honestly I’m not very familiar with this space, please excuse my ignorance. 🙂
Joe Taber
Participant
October 18, 2017 at 10:59 am
Post count: 25
#3035
Another question.
Say I have two vectors: [A B C D], [0 0 1 0] where the second is a logical vector indicating which element of the first vector to extract (probably computed from the vector itself). What is the best way to get a scalar C out?
I know I could do a pick(a,b) to get Nones or zeros like [None None C None], but how do I get the C out into a scalar? I know I could use a series of extract + picks (4 extract, 3 picks in this case), is there a better way?
- Ivan Godard
  Keymaster
  October 18, 2017 at 11:37 am
  Post count: 689
  #3037
  The general answer is to recognize that this is a pick reduction, and yields to the standard reduction strategy of a logN chain of alternate() ops applying the reduction operator at each stage. The operator here would pick the non-zero (or non-None) value at each stage, leaving the chosen value as element index zero at the end where the extract() op would yield it as a scalar.
  Be aware that this is likely not the last word on this question and on reductions in general. We have made sure that vector semantics is correct, but have not paid much attention to vector performance, and won’t until auto-vectorization is working. An add reduction inherently requires logN adds, but there may be better ways to express that than the present alternate tree. Your pick reduction also fits nicely into the shuffle hardware – but it’s not clear how to fit it into the ISA yet.
Ivan Godard
Keymaster
October 18, 2017 at 11:17 am
Post count: 689
#3036
Like most wide-issue machines the Mill has a notion of “slot”, or scalar lane; one op can issue in each slot in each cycle, and can execute in any of the functional units attached to that slot using independent data. The data may be scalars from one byte up to the maximum supported scalar (64 or 128 bits depending on family member), or vectors of any scalar element size. That is, the architecture is fundamentally MIMD. The data paths feeding the FUs of the slots define a maximum operand width for data in the machine. This width is at least as big as the maximum scalar, but may be bigger so as to support larger vectors.
The con() op drops literal values to the belt. It is completely general and can drop any operand up to the maximum operand size, both scalar and vector. Scalars use the b/h/w/d/q width tags, and vectors use v, as in v16b. If every element of the literal has the same value, you may get better code by con()ing a scalar and extending it to vector with splat().
The rd() op is not general, but drops one of a member-dependent set of popCons (popular constants). If a particular literal is an available popcon then the specializer use the rd() op because it is more compact and does not plug up the flow-side slots that con() uses. Popcons may be scalar or vector, and each has a specific width that it drops; the same value but different widths are different popcons. Some popCons are always present on every member (0 and 1 of all scalar widths for example), and some are always present if the configuration includes hardware for which they would be useful (pi and e of relevant floating-point widths for members with FP for example). In addition, there will always be a few bit patterns left over after the configuration software has determined the bit patterns of all other operations configured in the readerBlock of the encoding (where the rd() op encodes). These are used to add additional popcons until all the bits are used up.
There are no reduction operations defined other than any() or all(). The alternate() op is a special form of vector swizzling that lets reductions be constructed in logN steps.

Participant

October 26, 2017 at 12:05 pm

Post count: 53

My shot at this using alternate:

 
// %0 contains [0,0,1,0]
// %1 contains [A,B,C,D]
F("pick_reduce_4") %0, %1;  
   pick      (%0, %1)        %2;     
   
   alternate (%2, %2)        %3 %4,
   recur     (%3, %4, %3)    %5;

   alternate (%5, %5)        %6 %7,
   recur     (%6, %7, %6)    %8;

   extract   (%8, w(0ul))   %9
   ret       (%9);

Some example runs:


   input: %0 = [1,0,0,0] %1 = [a,b,c,d]

   %2 = [a,,,]
   %3 = [a,a,,]   %4=[,,,]
   %5 = [a,a,,]
   %6 = [a,a,a,a] %7=[,,,]
   %8 = [a,a,a,a]
   %9 = a


   input: %0 = [0,1,0,0] %1 = [a,b,c,d]

   %2 = [,b,,]
   %3 = [,,b,b]      %4=[,,,]
   %5 = [,,b,b]
   %6 = [,,,]        %7=[b,b,b,b]
   %8 = [b,b,b,b]
   %9 = b


   input: %0 = [0,0,1,0] %1 = [a,b,c,d]

   %2 = [,,c,]
   %3 = [,,,]      %4=[c,c,,]
   %5 = [c,c,,]
   %6 = [c,c,c,c]  %7=[,,,]
   %8 = [c,c,c,c]
   %9 = c


   input: %0 = [0,0,0,1] %1 = [a,b,c,d]

   %2 = [,,,d]
   %3 = [,,,]      %4=[,,d,d]
   %5 = [,,d,d]
   %6 = [,,,]      %7=[d,d,d,d]
   %8 = [d,d,d,d]
   %9 = d

Shuffle is to reorder a vector? If I understand it correctly:


    shuffle ([a,b,c,d],  [3,2,1,0])  =  [d,c,b,a]
    shuffle ([a,b,c,d],  [1,1,1,1])  =  [b,b,b,b]
    shuffle ([a,b,c,d], [0, None, None, None]) = [a, None, None, None]

Then perhaps we can use the left operation to find the index of the selected element, this should work for vectors of any size and it would be O(1):


F("pick_reduce") %0, %1;  
   left       (%0)           %2;

   // this just builds a vector the same width as the data vector
   // the first element is the index of the element we want
   // the rest do not matter and can be junk
   inject     (%1, %2, 0)    %3;
   
   shuffle    (%1, %3)       %4;

   extract    (%4, 0)        %5,
   ret        (%5);

If I understand correctly, the left operation will return the amount of leading zeroes. So if %0 = [0,0,1,0], then %2 will be 2.
Then I create a vector with that result as the first element in %3 and use that to shuffle the input vector, the first element in the result will be the element at position 2.

Is this correct Ivan?

This reply was modified 6 years, 9 months ago by goldbug.
This reply was modified 6 years, 9 months ago by goldbug.
This reply was modified 6 years, 9 months ago by goldbug.

goldbug
Participant
October 26, 2017 at 12:53 pm
Post count: 53
#3061
Duh, there is no need to shuffle:
```
F("pick_reduce") %0, %1;  
   left       (%0)           %2;

   extract    (%1, %2)        %3,
   ret        (%3);   
```
Simply get the index of the desired element and extract it. Just 2 instructions, that’s more like it.
- This reply was modified 6 years, 9 months ago by goldbug.
Author
Posts

You must be logged in to reply to this topic.