Mill Computing, Inc. Forums The Mill Architecture Support for multidimensional arrays

  • Author
    Posts
  • bakul_retired
    Member
    Post count: 4
    #3102 |

    In some languages an array carries sufficient runtime information that makes it easy to pass a subarray or array of any size (in given dimensions) to generic routines. Or to allocate such arrays at runtime. Any thoughts on how this might be done on the Mill? Thx

  • Ivan Godard
    Keymaster
    Post count: 689

    These are commonly called “dope vectors” and are used in Fortran and Ada among others. Any single dimension is a normal indexing access (effectively a LEA for all dimensions but the last). The multidimensional case is hard to put into the hardware because it needs a way to encode several indices, and most hardware has hard limits on the number of arguments per op (not the Mill though). Another issue is the size of the dope vector itself, needing a stride and offset at each level, which gets out of hand after the second dimension. However, the biggest reason for not directly supporting dope vectors is the embedded stride multiply: you can overlap all the rest of the indexing with the multiplies, so making it one op rather than several doesn’t buy anything.

  • bakul_retired
    Member
    Post count: 4

    What I was wondering about is whether Mill can optimize each “step”. So for example, a dope vector for 3D array x, declared as

    int32_t x[a:b, c:d, e:f]; // I am "extending" C here to allow arbitrary upper/lower bounds, and a:b is half open

    may be

    typedef struct { int base, elemsize, size; } Dope;
    
    Dope dv[] = { {a, s1, s0}, {c, s2, s1}, {e, s3, s2} };
    // s3 = 4, s2 = s3*(f-e), s1 = s2*(d-c), s0 = s1*(b-a)

    Given an expression x[i,j,k] you’d have to do a bunch of multiplies and additions or subtractions and comparisons. I think you should be able to do computation for each dim. in parallel before a final add.

    • Ivan Godard
      Keymaster
      Post count: 689

      Indeed you can overlap them, but there’s no need for any special hardware; normal instruction scheduling in the specializer will minimize the overall latency.

You must be logged in to reply to this topic.