Forum Replies Created
- AuthorPosts
I also tried my hand at this guy for Gold, and got
// ILP of 5⅓, mostly because of the silly load latency :P. F("parseval") %str; load(%str, 0, b) %l0, load(%str, 1, b) %l1, // I'd fill this with loads if the two loops had the load(%str, 2, b) %l2; // same cycle length, but alas they do not. eqlb(%l0, 45) %neg, addp(%str, 1, b) %strplus1, pick(%neg, %strplus1, %str) %start, pick(%neg, %l1, %l0) %L0, pick(%neg, %l2, %l1) %L1; // Now we can finally start doing fun stuff. rd(b(0)) %zero, load(%start, 2, b) %char, // Not loading early means we take a stall in hexmode, eqlb(%L0, 48) %L00, // but loading early means decimalmode gets early loads. :( eqlb(%L1, 120) %L1x, addp(%start, 2) %startplus2, pick(%L00, %L1x, %L00) %hexmode, innertr(%hexmode, "parseval$hexmode", %zero, %startplus2, %neg), inner(%decimalmode, "parseval$decimalmode", %zero, %startplus2, %L1, %L0, %neg); // ILP of 9! Cycle length equals loop-caried dependency. L("parseval$hexmode") %total %str %neg; // a wild %char appears, right on time! load (%str, b, 2) %char, // Ideally this load would be fetching further ahead, gequ (%char, 48) %digitlo, // but we want one byte every 2 cycles, whereas the lequ (%char, 57) %digithi, // other branch want one every 3 cycles, so you can't gequ (%char, 97) %letterlo, // hoist the setup into the header. lequ (%char, 102) %letterhi, sub (%char, 48) %digitval, sub (%char, 87) %letterval, shiftl(%total, 4) %shifttot, addp (%str, 1, b) %newstr, pick (%digitlo, %digithi, %digitlo) %digit, pick (%letterlo, %letterhi, %letterlo) %letter; orl (%digit, %letter) %inrange, add (%shifttot, %digitval) %digittot, add (%shifttot, %letterval) %lettertot, pick (%digit, %digittot, %lettertot) %newtotal, conform(%newtotal, %newstr, %neg, %total, %char), leavefl(%inrange, "parseval$finish", %total, %char, %neg), br("parseval$hexmode"); // ILP of only 4⅓. Cycle length equals loop-caried dependency. // Not much one can do when the multiply is slow... :( L("parseval$decimalmode") %total %str %newchar %char %neg; // A wild %newnewchar appears... six cycles early. // That's pretty lame, but fixing it take a branch. load (%str, b, 3) %newnewchar, gequ (%char, 48) %digitlo, lequ (%char, 57) %digithi, addp (%str, 1, b) %newstr, sub (%char, 48) %digitval, shiftl(%total, 1) %totalx2, shiftl(%total, 3) %totalx8, pick (%digitlo, %digithi, %digitlo) %digit, leavefl(%digit, "parseval$finish", %total, %char, %neg); // This is a very sad instruction. add (%totalx2, %totalx8) %totalx10; add (%totalx10, %digitval) %newtotal, rescue(%newtotal, %newstr, %newnewchar, %newchar, %neg), br("parseval$hexmode"); // ILP of 4½. Meh, it works. L("parseval$finish") %total %char %neg; // a wild %fakechar appears eqlb(%char, 75) %K, eqlb(%char, 77) %M, neg(%total) %negtotal, pick(%neg, %negtotal, %total) %totalsigned; shiftl(%totalsigned, 10) %totalK, shiftl(%totalsigned, 20) %totalM, pick(%K, %totalK, %totalsigned) %totalelse, retntr(%M, %totalM), retn(%totalelse);
Interesting architecture indeed. I’ve not handled the busywork belt numbers, but I’ve checked they’re in a consistent position. I expect one’s better off having a branch before entering either loop where you flood it with correctly-delayed loads ahead of time; that way you don’t risk stalls in the loop itself, but it adds a cycle (and a jump) to every call to parseval. Delayed loads don’t work because the loop count is different, and tagged loads don’t work because they don’t pipeline.
- This reply was modified 7 years, 6 months ago by Veedrac.
I had a play around with optimizing this with what I, probably incorrectly, think I understand about the hardware. Is the following a valid program for Silver?
F('bar') %0; con (w(100ul)) %1, // F3 ↥ rd (w(1ul)) %2, // R0 ↥ rd (w(3ul)) %3, // R0 ↥ rd (w(4ul)) %4, // R1 ↥ eql (b4 %0, b3 %1) %5, // E0 ▸ gtrs (b4 %0, 7) %6, // E1 ▸ gtrs (b4 %0, 3) %7, // E2 ▸ eql (b4 %0, 0) %8, // E3 ▸ pick (b0 %8, b6 %2, b6 %4) %9, // P0 ? retntr (b4 %5, b6 %3), // F0 ↦ retntr (b3 %6, b5 %4), // F1 ↦ retntr (b2 %7, b9 %0); // F2 ↦ gtrs (b9 %0, 1) %10, // E0 ▸ calltr1(b0 %10, 'foo', b10 %0) %11, // F0 ↱ retnfl (b1 %10, b2 %9), // F1 ↦ retn (b1 %11); // F2 ↦
The letter-number pairs are the slots, and the arrows are the phase (reader ↥, compute ▸, call ↱, pick ?, writer ↧, conform =, transfer ↦). The symbol choice is from http://millcomputing.com/instructions.html, with the phase column collapsed.
Though if I had my say it’d be reader ⤒, compute ⏩, call ↺, writer ↧, conform ♺, transfer ⤷.
Nah, man, conAsm is awesome (says they guy that’s written ~20 lines total). I know exactly where everything is and where it will be, and I can see exactly what resources are available to make it happen. Much nicer than doing the same thing to an OoO core.
The wiki says returns with values can only go in slots F0, F1 and F2, and I’ve used those all in the second instruction, so I don’t see how I could avoid using pick. Actually, I’m tempted to say it’s the other way around: I have a pick slot I didn’t use, so I’m wasting resources!
I like how this lets you treat the problem as a resource optimization problem, with the ability to move parts from an overburdened phase to another without playing guessing games with the hardware. Pick was an example, but I can think of a lot, like speculating more when there’s spare capacity. Writing an optimizer for this sounds like a ton of fun, but I’m pretty sure my employer wouldn’t want me helping the competition ;).
All (but one) of the previous years have videos linked, so I imagine we just need to be patient. The Wayback Machine is hinting that it might take a while.
- in reply to: Introduction to the Mill CPU Programming Model #1542
By far the most of the operations in the Mill instruction set can be speculated. What this means is, if an operand to the operation is None or Nar, all the operation does is to make the result None or NaR, or combine the status flags in the case of floating point operations in the result. This is even true for the load operation.
- in reply to: Introduction to the Mill CPU Programming Model #1520
I imagine you would do something like
add b0, b1 pick cond, b0, None add b0, 1
“None” effectively absorbs all computation without error. So you would have:
[12, 11, cond=1, ...] [23, 12, 11, cond=1, ...] (add b0, b1) [23, 23, 12, 11, cond=1, ...] (pick cond, b0, None) [24, 23, 23, 12, 11, cond=1, ...] (add b0, 1)
or
[12, 11, cond=0, ...] [23, 12, 11, cond=0, ...] (add b0, b1) [None, 23, 12, 11, cond=0, ...] (pick cond, b0, None) [None, None, 23, 12, 11, cond=0, ...] (add b0, 1)
It’s not obvious how to turn None into its value, though. You could do
add b0, b1 # store pick cond, b0, None add b0, 1 # store (may do nothing) # load
but that seems like a lot of overhead to do very little.
—
Obviously, though, in this case the best option is just
add b0, b1 add b0, cond
🙂
- AuthorPosts