Reply To: Application Walkthrough (Mill Computing, Inc. Forums)

Ivan Godard

A good list. We do have to pick a Mill member eventually, but for now assume one big enough (unlimited slots and belt) for anything; the actual slot and belt requirement is an interesting result in its own right.

I’ll take the first: GCD.

```
/* code based on Rosetta C++ example:
int gcd(int u, int v) {
    return (v != 0) ? gcd(v, u % v) : u;
} */
F("gcd");            // u in b0, v in b1
neqs(b1, 0), rems(b0, b1);
retnfl(b0, b2);
nop(4);              // wait for rem
call1("gcd", b3, b0);
retn(b0);
```

This needs a 3-long belt, one flow slot and two exu slots (suitably populated), and takes 8 cycles, excluding the nested call body.

```
/* code based on Rosetta C++ example:
int
gcd_iter(int u, int v) {
    int t;
    while (v) {
        t = u;
        u = v;
        v = t % v;
    }
    return u < 0 ? -u : u; // abs(u)
} */
F("gcd_iter");       // u in b0, v in b1
L("loop");
neqs(b1, 0), rems(b0, b1);
brfl(b0, "xit");
nop(4);              // wait for rem
conform(b3, b0);
br("loop");
L("xit");
lsss(b0, 0), negs(b0);
pick(b0, b1, b2);
retn(b0);
```

This needs a 3-long belt, two exu slots, one flow slot and a pick slot; the loop body is 8 cycles, plus 3 cycles for the wrap-up.

In both I have used speculation to launch the rems operation before it is known to be needed; without speculation, each of the 8-cycle figures above would be 9.

The code does not use phasing (NYF: not yet filed). With phasing, the count drops to 7 cycles for the first, while the second gets a 7-cycle loop and a one-cycle wrap-up.