Page 2 – Mill Computing, Inc

Forum Replies Created

Viewing 8 posts - 16 through 23 (of 23 total)

Previous 1 2

Author
Posts

Veedrac

Participant

May 23, 2017 at 12:53 am

Post count: 25

in reply to: switches #2846

I also tried my hand at this guy for Gold, and got


// ILP of 5⅓, mostly because of the silly load latency :P.
F("parseval") %str;
    load(%str, 0, b) %l0,
    load(%str, 1, b) %l1, // I'd fill this with loads if the two loops had the
    load(%str, 2, b) %l2; // same cycle length, but alas they do not.

    eqlb(%l0, 45) %neg,
    addp(%str, 1, b) %strplus1,
    pick(%neg, %strplus1, %str) %start,
    pick(%neg, %l1, %l0) %L0,
    pick(%neg, %l2, %l1) %L1;

    // Now we can finally start doing fun stuff.
    rd(b(0)) %zero,
    load(%start, 2, b) %char,     // Not loading early means we take a stall in hexmode,
    eqlb(%L0,   48) %L00,         // but loading early means decimalmode gets early loads. :(
    eqlb(%L1,  120) %L1x,
    addp(%start, 2) %startplus2,
    pick(%L00, %L1x, %L00) %hexmode,
    innertr(%hexmode, "parseval$hexmode", %zero, %startplus2, %neg),
    inner(%decimalmode, "parseval$decimalmode", %zero, %startplus2, %L1, %L0, %neg);

// ILP of 9! Cycle length equals loop-caried dependency.
L("parseval$hexmode") %total %str %neg;
    // a wild %char appears, right on time!
    load  (%str, b, 2) %char,     // Ideally this load would be fetching further ahead,
    gequ  (%char,  48) %digitlo,  // but we want one byte every 2 cycles, whereas the
    lequ  (%char,  57) %digithi,  // other branch want one every 3 cycles, so you can't
    gequ  (%char,  97) %letterlo, // hoist the setup into the header.
    lequ  (%char, 102) %letterhi,
    sub   (%char,  48) %digitval,
    sub   (%char,  87) %letterval,
    shiftl(%total,  4) %shifttot,
    addp  (%str, 1, b) %newstr,
    pick  (%digitlo,  %digithi,  %digitlo)  %digit,
    pick  (%letterlo, %letterhi, %letterlo) %letter;

    orl   (%digit,    %letter)    %inrange,
    add   (%shifttot, %digitval)  %digittot,
    add   (%shifttot, %letterval) %lettertot,
    pick  (%digit, %digittot, %lettertot) %newtotal,
    conform(%newtotal, %newstr, %neg, %total, %char),
    leavefl(%inrange, "parseval$finish", %total, %char, %neg),
    br("parseval$hexmode");

// ILP of only 4⅓. Cycle length equals loop-caried dependency.
// Not much one can do when the multiply is slow... :(
L("parseval$decimalmode") %total %str %newchar %char %neg;
    // A wild %newnewchar appears... six cycles early.
    // That's pretty lame, but fixing it take a branch.
    load  (%str, b, 3) %newnewchar,
    gequ  (%char,  48) %digitlo,
    lequ  (%char,  57) %digithi,
    addp  (%str, 1, b) %newstr,
    sub   (%char,  48) %digitval,
    shiftl(%total,  1) %totalx2,
    shiftl(%total,  3) %totalx8,
    pick  (%digitlo, %digithi,  %digitlo) %digit,
    leavefl(%digit, "parseval$finish", %total, %char, %neg);

    // This is a very sad instruction.
    add   (%totalx2,  %totalx8)  %totalx10;

    add   (%totalx10, %digitval) %newtotal,
    rescue(%newtotal, %newstr, %newnewchar, %newchar, %neg),
    br("parseval$hexmode");

// ILP of 4½. Meh, it works.
L("parseval$finish") %total %char %neg;
    // a wild %fakechar appears
    eqlb(%char, 75) %K,
    eqlb(%char, 77) %M,
    neg(%total) %negtotal,
    pick(%neg, %negtotal, %total) %totalsigned;

    shiftl(%totalsigned, 10) %totalK,
    shiftl(%totalsigned, 20) %totalM,
    pick(%K, %totalK, %totalsigned) %totalelse,
    retntr(%M, %totalM),
    retn(%totalelse);

Interesting architecture indeed. I’ve not handled the busywork belt numbers, but I’ve checked they’re in a consistent position. I expect one’s better off having a branch before entering either loop where you flood it with correctly-delayed loads ahead of time; that way you don’t risk stalls in the loop itself, but it adds a cycle (and a jump) to every call to parseval. Delayed loads don’t work because the loop count is different, and tagged loads don’t work because they don’t pipeline.

This reply was modified 7 years, 11 months ago by Veedrac.

Veedrac

Participant

May 16, 2017 at 7:23 pm

Post count: 25

in reply to: switches #2836

I had a play around with optimizing this with what I, probably incorrectly, think I understand about the hardware. Is the following a valid program for Silver?


F('bar') %0;
    con    (w(100ul))              %1,  //          F3 ↥
    rd     (w(1ul))                %2,  // R0          ↥
    rd     (w(3ul))                %3,  // R0          ↥
    rd     (w(4ul))                %4,  // R1          ↥
    eql    (b4 %0,  b3 %1)         %5,  //    E0       ▸
    gtrs   (b4 %0,      7)         %6,  //    E1       ▸
    gtrs   (b4 %0,      3)         %7,  //    E2       ▸
    eql    (b4 %0,      0)         %8,  //    E3       ▸
    pick   (b0 %8,  b6 %2, b6 %4)  %9,  //       P0    ?
    retntr (b4 %5,  b6 %3),             //          F0 ↦
    retntr (b3 %6,  b5 %4),             //          F1 ↦
    retntr (b2 %7,  b9 %0);             //          F2 ↦

    gtrs   (b9 %0,      1)         %10, //    E0       ▸
    calltr1(b0 %10, 'foo', b10 %0) %11, //          F0 ↱
    retnfl (b1 %10, b2 %9),             //          F1 ↦
    retn   (b1 %11);                    //          F2 ↦

Veedrac
Participant
May 23, 2017 at 1:39 am
Post count: 25
in reply to: switches #2848
@goldbug
The letter-number pairs are the slots, and the arrows are the phase (reader ↥, compute ▸, call ↱, pick ?, writer ↧, conform =, transfer ↦). The symbol choice is from http://millcomputing.com/instructions.html, with the phase column collapsed.
Though if I had my say it’d be reader ⤒, compute , call ↺, writer ↧, conform ♺, transfer ⤷.
- This reply was modified 7 years, 11 months ago by Veedrac.
- This reply was modified 7 years, 11 months ago by Veedrac.
Veedrac
Participant
May 17, 2017 at 4:31 pm
Post count: 25
in reply to: switches #2838
Nah, man, conAsm is awesome (says they guy that’s written ~20 lines total). I know exactly where everything is and where it will be, and I can see exactly what resources are available to make it happen. Much nicer than doing the same thing to an OoO core.
The wiki says returns with values can only go in slots F0, F1 and F2, and I’ve used those all in the second instruction, so I don’t see how I could avoid using pick. Actually, I’m tempted to say it’s the other way around: I have a pick slot I didn’t use, so I’m wasting resources!
I like how this lets you treat the problem as a resource optimization problem, with the ability to move parts from an overburdened phase to another without playing guessing games with the hardware. Pick was an example, but I can think of a lot, like speculating more when there’s spare capacity. Writing an optimizer for this sounds like a ton of fun, but I’m pretty sure my employer wouldn’t want me helping the competition ;).
Veedrac
Participant
May 24, 2015 at 12:13 pm
Post count: 25
in reply to: Ivan Godard and Mill at the 2015 European LLVM Conference #1803
All (but one) of the previous years have videos linked, so I imagine we just need to be patient. The Wayback Machine is hinting that it might take a while.
Veedrac
Participant
April 21, 2015 at 6:35 pm
Post count: 25
in reply to: Execution #1751
It seems you should use <pre> tags for code on the Wiki. I’ve edited it in.
Veedrac
Participant
November 17, 2014 at 6:36 pm
Post count: 25
in reply to: Introduction to the Mill CPU Programming Model #1542
By far the most of the operations in the Mill instruction set can be speculated. What this means is, if an operand to the operation is None or Nar, all the operation does is to make the result None or NaR, or combine the status flags in the case of floating point operations in the result. This is even true for the load operation.
From http://millcomputing.com/wiki/Speculation#None_and_NaR
Veedrac
Participant
November 10, 2014 at 8:07 pm
Post count: 25
in reply to: Introduction to the Mill CPU Programming Model #1520
I imagine you would do something like
```
add b0, b1
pick cond, b0, None
add b0, 1
```
“None” effectively absorbs all computation without error. So you would have:
```
[12, 11, cond=1, ...]
[23, 12, 11, cond=1, ...] (add b0, b1)
[23, 23, 12, 11, cond=1, ...] (pick cond, b0, None)
[24, 23, 23, 12, 11, cond=1, ...] (add b0, 1)
```
or
```
[12, 11, cond=0, ...]
[23, 12, 11, cond=0, ...] (add b0, b1)
[None, 23, 12, 11, cond=0, ...] (pick cond, b0, None)
[None, None, 23, 12, 11, cond=0, ...] (add b0, 1)
```
It’s not obvious how to turn None into its value, though. You could do
```
add b0, b1
# store
pick cond, b0, None
add b0, 1
# store (may do nothing)
# load
```
but that seems like a lot of overhead to do very little.
—
Obviously, though, in this case the best option is just
```
add b0, b1
add b0, cond
```
- This reply was modified 10 years, 6 months ago by Veedrac.
- This reply was modified 10 years, 6 months ago by Veedrac.
- This reply was modified 10 years, 6 months ago by Veedrac.
Author
Posts

Viewing 8 posts - 16 through 23 (of 23 total)

Previous 1 2