Forum Replies Created

Viewing 15 posts - 16 through 30 (of 42 total)
  • Author
    Posts
  • goldbug
    Participant
    Post count: 53

    Spectre seems a bit more complicated, as no NaR’s are involved, would it work on the mill?

    https://spectreattack.com/spectre.pdf

  • goldbug
    Participant
    Post count: 53

    To describe the problem:

    An instruction is speculated. The speculated instruction has one side effect: it fills a cache line (with valid data). The specific line that is filled depends on data that is not accessible. After the speculated instruction is discarded, the program can check which cache line got filled and use that to determine what the inaccessible value was.

    The mill can speculate a lot of instructions in advance, and the speculation can fill cache lines no problem.

    I believe the mill would not be vulnerable to this problem because reading the inaccessible data will simply return a NaR which will be passed to the subsequent speculative operations. The actual data never makes it to the cpu correct? on x86, the data makes it to the CPU before the page fault is generated, so a couple instructions can be speculated using that value.

  • goldbug
    Participant
    Post count: 53
    in reply to: Vector ops #3061

    Duh, there is no need to shuffle:

    
    F("pick_reduce") %0, %1;  
       left       (%0)           %2;
    
       extract    (%1, %2)        %3,
       ret        (%3);   
    

    Simply get the index of the desired element and extract it. Just 2 instructions, that’s more like it.

    • This reply was modified 6 years, 5 months ago by  goldbug.
  • goldbug
    Participant
    Post count: 53
    in reply to: Vector ops #3057

    My shot at this using alternate:

     
    // %0 contains [0,0,1,0]
    // %1 contains [A,B,C,D]
    F("pick_reduce_4") %0, %1;  
       pick      (%0, %1)        %2;     
       
       alternate (%2, %2)        %3 %4,
       recur     (%3, %4, %3)    %5;
    
       alternate (%5, %5)        %6 %7,
       recur     (%6, %7, %6)    %8;
    
       extract   (%8, w(0ul))   %9
       ret       (%9);
    

    Some example runs:

    
       input: %0 = [1,0,0,0] %1 = [a,b,c,d]
    
       %2 = [a,,,]
       %3 = [a,a,,]   %4=[,,,]
       %5 = [a,a,,]
       %6 = [a,a,a,a] %7=[,,,]
       %8 = [a,a,a,a]
       %9 = a
    
    
       input: %0 = [0,1,0,0] %1 = [a,b,c,d]
    
       %2 = [,b,,]
       %3 = [,,b,b]      %4=[,,,]
       %5 = [,,b,b]
       %6 = [,,,]        %7=[b,b,b,b]
       %8 = [b,b,b,b]
       %9 = b
    
    
       input: %0 = [0,0,1,0] %1 = [a,b,c,d]
    
       %2 = [,,c,]
       %3 = [,,,]      %4=[c,c,,]
       %5 = [c,c,,]
       %6 = [c,c,c,c]  %7=[,,,]
       %8 = [c,c,c,c]
       %9 = c
    
    
       input: %0 = [0,0,0,1] %1 = [a,b,c,d]
    
       %2 = [,,,d]
       %3 = [,,,]      %4=[,,d,d]
       %5 = [,,d,d]
       %6 = [,,,]      %7=[d,d,d,d]
       %8 = [d,d,d,d]
       %9 = d
    

    Shuffle is to reorder a vector? If I understand it correctly:

    
        shuffle ([a,b,c,d],  [3,2,1,0])  =  [d,c,b,a]
        shuffle ([a,b,c,d],  [1,1,1,1])  =  [b,b,b,b]
        shuffle ([a,b,c,d], [0, None, None, None]) = [a, None, None, None]
    

    Then perhaps we can use the left operation to find the index of the selected element, this should work for vectors of any size and it would be O(1):

    
    F("pick_reduce") %0, %1;  
       left       (%0)           %2;
    
       // this just builds a vector the same width as the data vector
       // the first element is the index of the element we want
       // the rest do not matter and can be junk
       inject     (%1, %2, 0)    %3;
       
       shuffle    (%1, %3)       %4;
    
       extract    (%4, 0)        %5,
       ret        (%5);   
    

    If I understand correctly, the left operation will return the amount of leading zeroes. So if %0 = [0,0,1,0], then %2 will be 2.
    Then I create a vector with that result as the first element in %3 and use that to shuffle the input vector, the first element in the result will be the element at position 2.

    Is this correct Ivan?

    • This reply was modified 6 years, 5 months ago by  goldbug.
    • This reply was modified 6 years, 5 months ago by  goldbug.
    • This reply was modified 6 years, 5 months ago by  goldbug.
  • goldbug
    Participant
    Post count: 53
    in reply to: IPC questions #3042

    Very cool.

    I have never seen an architecture that can do IPC without involving the kernel. Had that been available 20 years ago, the operating systems today would be very different.

    When are we going to have the sim available? I am dying to play with it.

    • This reply was modified 6 years, 5 months ago by  goldbug.
  • goldbug
    Participant
    Post count: 53
    in reply to: IPC questions #3039

    Doesn’t the XOR make it hard to fork?

    It means the OS needs to find a turf id such that when globalizing addresses, they don’t overlap with global addresses that are already in use.

    And what is it that you XOR? you have that register with the current turf, you xor that with a local pointer to globalize it? I guess I am a bit confused why you chose xor as opposed to simply adding, then the register essentially has the starting address of the current turf.

  • goldbug
    Participant
    Post count: 53

    I’m really curious on how the mill can defend against spectre.

    As I understand it, there are really 2 ways to have speculative execution in the mill:

    1) Branch prediction, somewhat similar to your typical OOO, only you predict exits.
    2) Compiler scheduled. The compiler flattens 2 or more branches, executes them all simultaneously and then picks the result of the winning branch.

    Both mechanisms can have side-channel effects, for example in the cache or the spiller. Especially the compiler scheduled speculation since it can be very large.

    An attacking program can then check the status of the cache lines or whether one of his scratch pad areas have been spilled and deduce secret data from the victim across turfs. Other than checking array bounds (tough or impossible in C), how on earth can you defend against that?

    Or maybe there is something I am not aware of?

    • This reply was modified 6 years, 2 months ago by  goldbug.
  • goldbug
    Participant
    Post count: 53

    I see.

    Still, it is certainly possible that array1[x] is withing the process memory space. My understanding is that the mill does not do array bounds check, so array1[-1] would simply read data from another part of memory which could be within the turf address space. If you pass the right value for x you can make it read any byte within the process address space. No NaR would be generated in this case.

    If you pass the wrong value for x, that would point to a memory area that the turf does not have access to, then yeah, you would get a NaR, but an attacker would be careful to pass the right value.

  • goldbug
    Participant
    Post count: 53

    Yes, that is what I figured as far as Meltdown goes.

    But Spectre is a different beast, much more complicated, no NaR’s are involved at all.

    “Well, I can imagine attacks along these lines that would let someone figure out which hidden addresses are in caches”

    This is precisely the problem. With spectre, you can trick a program to load different areas of a cache depending on secret information. The attacker can then check which cache line was loaded and deduce the secret information based on that. The contents of the cache are not important and can remain inaccessible, what is important is which line was loaded.

    • This reply was modified 6 years, 2 months ago by  goldbug.
    • This reply was modified 6 years, 2 months ago by  goldbug.
  • goldbug
    Participant
    Post count: 53
    in reply to: code examples? #3011

    Loops are hard to vectorize, the compiler needs to reason about the structure of the program in order to do that.

    Is pipelining easier or it is also a monster problem?

  • goldbug
    Participant
    Post count: 53
    in reply to: code examples? #2934

    I would gladly take the credit… if I came up with it. Those snippets came straight out of your talks. I was hopping that since you already looked into them you would have something already.

  • goldbug
    Participant
    Post count: 53
    in reply to: switches #2940

    Hmmm, on second thought, maybe picking using a function result as selector is not that uncommon, for example:

    
    if (isEmpty(mylist))
        return 3;
    else
        return 5;
    

    Sorry if I am wasting precious Ivan cycles.

  • goldbug
    Participant
    Post count: 53
    in reply to: switches #2939

    I agree, it probably is not that common.

    Regarding phasing. It would be highly undesireable to speculate calls. Calls will probably be expensive, may have side effects, and might cause your current belt to spill/fill, unlike say an add or a eql that have no side effects.

    Therefore, would it not make more sense to move up the pick phase in front of call phase so that you can use picks to avoid calls? I assume the pick is currently after the calls because calls can return values, but if you think about it, you would never want to execute a call speculatively which greatly diminishes the value of picking call results.

    This is a much more generic and common occurrence than the switch ranges.

  • goldbug
    Participant
    Post count: 53
    in reply to: switches #2937

    I see the point. As it is the instruction set is beautiful for it’s uniformity and that betw operation would break that.

  • goldbug
    Participant
    Post count: 53
    in reply to: switches #2936

    That is cool that you can use the pick results in the calltr. In the documentation it sounds like the pick phase is after the call phase so I was not sure.

    The chain of less than is a great approach if you have a bunch of returns or branches, but it is not great when you have a bunch of calls. Consider this example:

    
    int foo(int x)
    {
       switch(x) {
          case 0:
          case 1:
          case 2:
             return bar();
          case 3:
          case 4:
          case 5:
             return baz();
          default:
             return 0;
       }
    }

    Since calls don’t stop execution, if you try to do a chain of lsss, you will end up calling bar multiple times. You can do this in one instruction if you checked ranges instead. Like this:

    F("foo") %0;
        rd     (w(0ul))     %1,
        geqs   (%0, 0)      %2,
        leqs   (%0, 2)      %3,
        leqs   (%0, 5)      %4,
        pick   (%2, %3, %1) %5,    // range check x>=0 && x <= 2
        pick   (%3, %1, %4) %6,    // range check !(x<=2) && x <= 5   same as  x>=3 && x <=5
        calltr1 (%5, "bar") %7,    // call function only if in range a simple lsss would not be enough
        calltr1 (%6, "baz") %8,    // call function only if in range a simple lsss would not be enough
        retntr (%5, %7),
        retntr (%6, %8),
        retn (%1);
    

    I would think that switching function calls is common, perhaps not with ranges though, so maybe I am overthinking it.

Viewing 15 posts - 16 through 30 (of 42 total)