A good list. We do have to pick a Mill member, but for now assume one big enough (unlimited slots and belt) for anything; the actual slot and belt requirements are an interesting result in their own right.
I’ll take the first: GCD.
/* code based on Rosetta C++ example:
int gcd(int u, int v) {
return (v != 0)?gcd(v, u%v):u;
} */
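F("gcd"); // u in b0, v in b1
neqs(b1, 0), rems(b0, b1); // test v != 0; start u%v speculatively
retnfl(b0, b1); // belt now v!=0, u, v: return u if v == 0
nop(4); // wait for rem
call1("gcd", b3, b0); // belt now rem, v!=0, u, v: gcd(v, u%v)
retn(b0); // return the nested call's result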
This needs a 4-long belt (b0 through b3), one flow slot and two exu slots (suitably populated), and takes 8 cycles, excluding the nested call body.
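To check belt positions by hand, it can help to mimic the listing in ordinary code. What follows is a toy C model, not Mill semantics: it assumes a result drops at b0 when its operation completes (so the 1-cycle neqs result lands well before the rems result), that every drop pushes older values one position further along, and that the names `Belt`, `drop` and `belt_gcd` are purely hypothetical. Because C's `%` is undefined for a zero divisor, the model only computes the remainder when `v != 0`, where the Mill simply lets the speculative rems run.

```c
#include <assert.h>

/* Toy belt model (hypothetical): b0 is the newest value; each new result
   pushes the older ones to higher positions and the oldest falls off. */
#define BELT_LEN 4

typedef struct {
    int slot[BELT_LEN]; /* slot[i] holds belt position bi */
} Belt;

/* Drop a new result at b0, shifting every other value down one position. */
void drop(Belt *b, int value) {
    for (int i = BELT_LEN - 1; i > 0; i--)
        b->slot[i] = b->slot[i - 1];
    b->slot[0] = value;
}

/* Follows the gcd listing step by step, reading operands only through
   belt positions. Assumes non-negative arguments. */
int belt_gcd(int u, int v) {
    Belt b = {{0}};
    drop(&b, v);
    drop(&b, u);                              /* entry: b0 = u, b1 = v */

    int ne = b.slot[1] != 0;                  /* neqs(b1, 0) */
    int rem = ne ? b.slot[0] % b.slot[1] : 0; /* rems(b0, b1), guarded for C */
    drop(&b, ne);                             /* belt: v!=0, u, v */

    if (!b.slot[0])                           /* retnfl: return u when v == 0 */
        return b.slot[1];

    drop(&b, rem);                            /* after nop(4): rem, v!=0, u, v */
    return belt_gcd(b.slot[3], b.slot[0]);    /* call1("gcd", b3, b0) */
}
```

For example, `belt_gcd(48, 18)` evaluates to 6. In this model the early return has to read u from b1: at that point only the neqs result has dropped, so u has moved one position, not two.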
/* code based on Rosetta C++ example:
int
gcd_iter(int u, int v) {
int t;
while (v) {
t = u;
u = v;
v = t % v;
}
return u < 0 ? -u : u; // abs(u)
} */
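F("gcd_iter"); // u in b0, v in b1
L("loop");
neqs(b1, 0), rems(b0, b1); // test v != 0; start u%v speculatively
brfl(b0, "xit"); // belt now v!=0, u, v: leave the loop if v == 0
nop(4); // wait for rem
conform(b3, b0); // belt now rem, v!=0, u, v: new u = v, new v = u%v
br("loop");
L("xit");
lsss(b1, 0), negs(b1); // belt on entry v!=0, u, v: test u < 0, form -u
pick(b0, b1, b3); // belt now u<0, -u, v!=0, u: abs(u)
retn(b0);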
This needs a 4-long belt, two exu slots, one flow slot and a pick slot; the loop body is 8 cycles, plus 3 cycles for the wrap-up.
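The same kind of toy C model can trace the loop. It adds two assumptions of its own: `conform(bi, bj)` is modeled as rebuilding the belt so the value at bi becomes the new b0 and the value at bj the new b1, and when two ops in one instruction complete together, the first-listed op's result lands at b0 (the ordering that the wrap-up's `pick(b0, b1, ...)` implies). `Belt`, `drop`, `conform2` and `belt_gcd_iter` are hypothetical names.

```c
#include <assert.h>

/* Toy belt model (hypothetical): b0 is the newest value. */
#define BELT_LEN 4

typedef struct {
    int slot[BELT_LEN]; /* slot[i] holds belt position bi */
} Belt;

/* Drop a new result at b0, shifting every other value down one position. */
void drop(Belt *b, int value) {
    for (int i = BELT_LEN - 1; i > 0; i--)
        b->slot[i] = b->slot[i - 1];
    b->slot[0] = value;
}

/* Model of conform(bi, bj): the value at bi becomes b0, the value at bj
   becomes b1, and everything else is discarded. */
void conform2(Belt *b, int i, int j) {
    Belt fresh = {{0}};
    drop(&fresh, b->slot[j]);
    drop(&fresh, b->slot[i]);
    *b = fresh;
}

/* Follows the gcd_iter listing. Assumes arguments other than INT_MIN. */
int belt_gcd_iter(int u, int v) {
    Belt b = {{0}};
    drop(&b, v);
    drop(&b, u);                                  /* entry: b0 = u, b1 = v */

    for (;;) {                                    /* L("loop") */
        int ne = b.slot[1] != 0;                  /* neqs(b1, 0) */
        int rem = ne ? b.slot[0] % b.slot[1] : 0; /* rems(b0, b1), guarded for C */
        drop(&b, ne);                             /* belt: v!=0, u, v */
        if (!b.slot[0])                           /* brfl(b0, "xit") */
            break;
        drop(&b, rem);                            /* after nop(4): rem, v!=0, u, v */
        conform2(&b, 3, 0);                       /* conform(b3, b0) */
    }

    /* L("xit"): belt is v!=0, u, v */
    int lt = b.slot[1] < 0;                       /* lsss(b1, 0) */
    int neg = -b.slot[1];                         /* negs(b1) */
    drop(&b, neg);
    drop(&b, lt);                                 /* belt: u<0, -u, v!=0, u */
    return b.slot[0] ? b.slot[1] : b.slot[3];     /* pick(b0, b1, b3) */
}
```

Under these assumptions `belt_gcd_iter(-12, 8)` returns 4. The two wrap-up results push u from b1 to b3, which is why the pick's "false" operand sits that far back.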
In both I have used speculation to launch the rems operation before it is known to be needed; without speculation, rems could not issue until the test had resolved, and each count would be one cycle longer (9 rather than 8).
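Speculating rems is harmless on the Mill because, as its NaR ("Not a Result") metadata is described, an operation that would fault, such as a divide by zero, yields a flagged result instead of trapping, and the flag does no damage as long as nothing consumes it. Portable C cannot do that, since `u % 0` is undefined behavior, so the sketch below models the idea with a hypothetical `Spec` value carrying a validity flag next to the result. `Spec`, `spec_rems` and `gcd_spec` are illustrative names, not Mill or library APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a speculative result: valid == false plays
   the role of a NaR and must never be consumed. */
typedef struct {
    int value;
    bool valid;
} Spec;

/* Speculative remainder: never traps; a zero divisor yields an invalid result. */
Spec spec_rems(int u, int v) {
    if (v == 0)
        return (Spec){0, false};
    return (Spec){u % v, true};
}

/* Iterative GCD with the remainder issued before the loop test, as in the
   gcd_iter listing. Assumes arguments other than INT_MIN. */
int gcd_spec(int u, int v) {
    for (;;) {
        Spec r = spec_rems(u, v);  /* launched before v != 0 is known */
        if (v == 0)                /* test resolves: r is dropped unused */
            return u < 0 ? -u : u;
        assert(r.valid);           /* only valid results are ever consumed */
        u = v;
        v = r.value;
    }
}
```

The design point is that the invalid result produced on the final pass is computed but never read, so launching it early costs nothing in correctness.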
The code does not use phasing (NYF). With phasing, the count drops to 7 cycles for the first function, while the second gets a 7-cycle loop and a one-cycle wrap-up.
Your turn 🙂
- This reply was modified 10 years, 8 months ago by staff. Reason: formatting fix