Mill Computing, Inc. Forums. Reply To: Application Walkthrough

Ivan Godard

A good list. We do have to pick a Mill member, but for now assume one big enough (unlimited slots and belt) for anything; the actual slot and belt requirement is an interesting result in its own right.

I’ll take the first: GCD.

/* code based on Rosetta C++ example:
   int gcd(int u, int v) {
     return (v != 0) ? gcd(v, u % v) : u;
   } */
F("gcd");            // u in b0, v in b1
   neqs(b1, 0), rems(b0, b1);
   retnfl(b0, b2);
   nop(4);           // wait for rem
   call1("gcd", b3, b0);
   retn(b0);

This needs a 3-long belt, one flow slot and two exu slots (suitably populated); 8 cycles, excluding the nested call body.
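
As a rough cross-check of the 8-cycle figure, the sketch below runs the Rosetta recursion in plain C and counts invocations. The cycle estimate is a back-of-envelope model, not a Mill measurement. It assumes that every invocation that recurses costs the full 8 cycles, that the final invocation leaves at retnfl after 2 cycles, and that call and return transfers add nothing. The names gcd_stats, gcd_counted and gcd_estimate are made up for this illustration.

```c
/* Hypothetical bookkeeping for this sketch; not part of the original post. */
struct gcd_stats {
    int result;      /* gcd(u, v) as computed by the Rosetta recursion */
    int calls;       /* invocations, including the outermost one */
    int est_cycles;  /* estimate under the assumptions stated above */
};

/* The Rosetta recursion, with an invocation counter threaded through. */
static int gcd_counted(int u, int v, int *calls) {
    ++*calls;
    return (v != 0) ? gcd_counted(v, u % v, calls) : u;
}

struct gcd_stats gcd_estimate(int u, int v) {
    struct gcd_stats s = {0, 0, 0};
    s.result = gcd_counted(u, v, &s.calls);
    /* Every invocation but the last is charged 8 cycles; the last one is
       assumed to leave at retnfl after 2 cycles (an assumption). */
    s.est_cycles = 8 * (s.calls - 1) + 2;
    return s;
}
```

Under these assumptions gcd_estimate(48, 18) makes 4 invocations and reports 8 * 3 + 2 = 26 cycles.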

/* code based on Rosetta C++ example:
   int
   gcd_iter(int u, int v) {
     int t;
     while (v) {
       t = u; 
       u = v; 
       v = t % v;
     }
     return u < 0 ? -u : u; // abs(u)
   } */
F("gcd_iter");       // u in b0, v in b1
   L("loop");
      neqs(b1, 0), rems(b0, b1);
      brfl(b0, "xit");
      nop(4);        // wait for rem
      conform(b3, b0);
      br("loop");
   L("xit");
      lsss(b0, 0), negs(b0);
      pick(b0, b1, b2);
      retn(b0);

This needs a 3-long belt, two exu slots, one flow slot and a pick slot; the loop body is 8 cycles, plus 3 cycles for the wrap-up.
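
To see how the loop and wrap-up figures combine for a given input, here is the iterative routine in plain C with a pass counter. The total is only a sketch of a model: 8 cycles per completed pass and 3 for the wrap-up come from the counts above, while the 2 cycles charged for the final failing test (the neqs/rems instruction plus the taken brfl) and the absence of any branch penalty are assumptions. The names iter_stats and gcd_iter_estimate are hypothetical.

```c
/* Hypothetical bookkeeping for this sketch; not part of the original post. */
struct iter_stats {
    int result;      /* abs(gcd) as returned by gcd_iter */
    int iterations;  /* completed passes through the loop body */
    int est_cycles;  /* estimate under the assumptions stated above */
};

struct iter_stats gcd_iter_estimate(int u, int v) {
    struct iter_stats s = {0, 0, 0};
    while (v) {
        int t = u;
        u = v;
        v = t % v;
        ++s.iterations;
    }
    s.result = u < 0 ? -u : u;  /* abs(u); INT_MIN is not handled */
    /* 8 cycles per pass, an assumed 2 for the exit test, 3 for wrap-up. */
    s.est_cycles = 8 * s.iterations + 2 + 3;
    return s;
}
```

With this model gcd_iter_estimate(48, 18) takes 3 passes, giving 8 * 3 + 2 + 3 = 29 cycles.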

In both I have used speculation to launch the rems operation before it is known to be needed; without speculation each count would be one cycle higher.
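
The same idea can be mimicked in C, as a sketch only. The remainder is computed on every pass before the v != 0 test is consulted, and is thrown away on the exit path. Presumably the Mill can tolerate a speculated rems whose divisor turns out to be zero; C cannot, because u % 0 is undefined behavior, so this version swaps in a dummy divisor of 1. That substitution and the name gcd_speculative are assumptions of this illustration, not something the Mill code does.

```c
/* Iterative gcd with the remainder launched before the test is used. */
int gcd_speculative(int u, int v) {
    for (;;) {
        int nonzero = (v != 0);         /* plays the role of neqs */
        int divisor = nonzero ? v : 1;  /* dummy divisor: assumption, avoids u % 0 in C */
        int rem = u % divisor;          /* plays the role of rems, issued unconditionally */
        if (!nonzero)                   /* plays the role of brfl */
            break;                      /* the speculated rem is simply discarded */
        u = v;                          /* same effect as conform(b3, b0): new u is old v */
        v = rem;                        /* new v is the remainder */
    }
    return u < 0 ? -u : u;              /* abs(u); INT_MIN is not handled */
}
```

The result matches gcd_iter for ordinary inputs; the only difference is that the remainder no longer waits on the comparison, which is the dependency the speculation removes.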

The code does not use phasing (NYF). With phasing the count drops to 7 cycles for the first, while the second gets a 7-cycle loop and a one-cycle wrap-up.

Your turn 🙂
