Forum Replies Created

Viewing 15 posts - 1 through 15 (of 37 total)
  • PeterH
    Participant
    Post count: 41

    Looking at this:
    – traditional architectures respond to an invalid memory access (and some other invalid operations) by faulting.
    – with OOO speculative execution, the fault can’t be issued until the speculation is resolved and we know whether the code needed to run.
    – So speculatively executed code has security deferred and applied retroactively. Which would not be a problem except for side effects, like the cache.

    The Mill, in contrast, uses NaRs to defer faulting as far as possible. We expect to be explicitly executing code with bad input, then tossing the result based on a comparison performed in parallel. The invalid access is marked BEFORE the result can be used for any operation that might produce side effects. So the exfiltration step gets not a speculative value, but a very real NaR.

    I’m thinking that a mostly traditional architecture could avoid Spectre if it also used NaR tagging on invalid memory reads.

    Going back to the example, what if A1[x] was a legal read for the program, but outside the range of the array A1? No fault or NAR would be generated. So the execution would have to be ordered to perform the range check before the lookup in A2 was performed. An easy enough check for the Mill specializer, not such an easy patch for existing OOO hardware.
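    For reference, the pattern being discussed is the classic Spectre v1 bounds-check-bypass gadget. A minimal C sketch, using A1, A2, and A1_size as hypothetical names matching the example:

```c
#include <stddef.h>
#include <stdint.h>

uint8_t A1[16];
uint8_t A2[256 * 64];   /* probe array: one cache line per possible byte value */
size_t  A1_size = 16;

/* On an OOO core, the branch may be predicted "in range" and both
 * dependent loads executed speculatively before the check resolves,
 * leaving A1[x] encoded in the cache state of A2. On a Mill, an
 * out-of-bounds speculative load yields a NaR, which propagates
 * through the A2 index, so no secret-dependent line is touched. */
uint8_t victim(size_t x) {
    if (x < A1_size) {
        return A2[A1[x] * 64];
    }
    return 0;
}
```

    The second paragraph’s scenario is the harder one: if A1[x] is in a mapped, legal page but past the logical end of A1, no NaR arises, and only ordering the range check before the A2 load closes the channel.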

  • PeterH
    Participant
    Post count: 41

    Sometimes a repeatable result is more important than a small statistical improvement in precision. In one example case, 3D modeling, a nondeterministic round-off may require a slightly larger “close” check to tell whether two vertices on polygons should be considered the same point.
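    For instance, a vertex-merge test of the sort described might look like this in C, where eps must be widened to cover the worst-case drift between two computations of what should be the same point (same_point is a hypothetical helper, not from the original post):

```c
#include <math.h>
#include <stdbool.h>

typedef struct { double x, y, z; } Vec3;

/* Two vertices are "the same point" if every coordinate agrees to
 * within eps. With repeatable rounding, eps can track the known
 * round-off bound; with nondeterministic rounding it must be larger. */
static bool same_point(Vec3 a, Vec3 b, double eps) {
    return fabs(a.x - b.x) <= eps &&
           fabs(a.y - b.y) <= eps &&
           fabs(a.z - b.z) <= eps;
}
```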

  • PeterH
    Participant
    Post count: 41
    in reply to: MILL and OSS #3203

    One persistent conspiracy theory concerning existing commodity CPUs is that they contain “features” undisclosed to the consumer that allow Three Letter Agencies to do unfriendly things with your computer. An open-source specializer, operating in place of the closed out-of-order scheduling hardware, would make the Mill attractive to people concerned about secret features in their computers.

  • PeterH
    Participant
    Post count: 41
    in reply to: switches #2858

    At the risk of diverging the topic, I’m wondering about the impact of different optimization settings on code size. With the architectures I’m familiar with, -O3 and above have an unfortunate tendency to produce larger code in the interest of speed. But techniques like unrolling loops don’t appear to be desirable on the Mill.
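    To illustrate the size cost, here is a rolled loop next to the 4-way unrolled form an -O3 compiler often emits (a sketch with hypothetical names; the unrolling is written by hand here for visibility):

```c
/* Rolled: small code, one branch per element. */
long sum_rolled(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}

/* Unrolled by 4: roughly 4x the loop body in code size, fewer
 * branches and more ILP -- a win on an OOO core, but less useful on
 * a Mill, whose wide issue and software pipelining extract the same
 * parallelism without duplicating the body. */
long sum_unrolled(const int *a, int n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]; s1 += a[i+1]; s2 += a[i+2]; s3 += a[i+3];
    }
    for (; i < n; i++) s0 += a[i];   /* remainder loop */
    return s0 + s1 + s2 + s3;
}
```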

  • PeterH
    Participant
    Post count: 41

    Is $1M/year going to be enough funding for the Mill?

  • PeterH
    Participant
    Post count: 41
    in reply to: Execution #1908

    In the Compiler talk it was mentioned that launching coroutines is a user level operation. How is a process prevented from DoSing the chip by endlessly launching coroutines?

  • PeterH
    Participant
    Post count: 41

    Putting a Mill and an x86 compatible core on the same chip doesn’t make much sense to me. Putting the processors in separate chips together on a motherboard might make sense, though code translation removes much of the need for that.

    One issue putting 2 CPU chips on a board, even of the same kind, would have to deal with is coordination of physical memory allocation. My impression is that the mill hardware automates a great deal of physical memory allocation, though provisions would have to be made for telling the hardware what address space is available for the purpose, and addressing specific hardware addresses.

  • PeterH
    Participant
    Post count: 41

    The exact rounding mode sounds like it would fault on most division and many addition operations.

  • PeterH
    Participant
    Post count: 41

    Given expanding arithmetic and the splitting of large words into two smaller words, arbitrary-precision arithmetic can be implemented without too great an efficiency loss. That’s as much as you get on x86, and I believe x64, architectures. With the vector operations of higher-end Mill processors, it gets better. Not sure what else you might want for such a numeric representation.
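    A sketch of the expanding-arithmetic approach in C: multi-word addition over 32-bit limbs using a 64-bit widening add, so no add-with-carry instruction is needed (bignum_add is a hypothetical helper; limbs are assumed little-endian):

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 32-bit limbs; returns the final carry-out.
 * The 64-bit intermediate is the "expanding" add: the low half is
 * the result limb, the high half is the carry into the next limb. */
uint32_t bignum_add(uint32_t *r, const uint32_t *a, const uint32_t *b,
                    size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] + b[i] + carry;  /* widening add */
        r[i]  = (uint32_t)t;                         /* low word     */
        carry = t >> 32;                             /* high word    */
    }
    return (uint32_t)carry;
}
```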

  • PeterH
    Participant
    Post count: 41

    “1) What is NaR + NaR? Whose metadata used?”
    “It will be one of the inputs; which one is implementation dependent.”

    So if it were a case of Null + some NaR that triggers an exception when written to memory, the behavior would depend on the particular Mill core? I was expecting some priority between the various NaRs.

  • PeterH
    Participant
    Post count: 41

    Encrypting spiller frames and PLB tables being written to DRAM sounds like a viable option for higher security situations. Encrypting other traffic would present complications if that same data is used by some other device, such as DMA for a hard drive or network.

  • PeterH
    Participant
    Post count: 41

    Noit, reordering to mix foo() and bar() calls would be a no-brainer in a strictly functional programming language; in that case, reworking the code to
    return bar(foo()) + ...
    would be straightforward. But in C a function may have side effects, and the order in which calls are made may affect the results. So the compiler must issue the calls in the order given in the source code to ensure correctness.
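    A small C illustration of the point, with hypothetical foo() and bar() that share state: sequencing the calls in the two possible orders yields different results, which is exactly why the compiler may not reorder them.

```c
/* Shared state mutated by both calls. */
static int counter = 0;

static int foo(void) { counter += 1; return counter; }  /* side effect */
static int bar(void) { counter *= 2; return counter; }  /* side effect */

/* foo first: counter 0 -> 1 -> 2, so sum is 1 + 2 = 3. */
int foo_then_bar(void) { counter = 0; int a = foo(); int b = bar(); return a + b; }

/* bar first: counter 0 -> 0 -> 1, so sum is 0 + 1 = 1. */
int bar_then_foo(void) { counter = 0; int b = bar(); int a = foo(); return a + b; }
```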

  • PeterH
    Participant
    Post count: 41

    Sounds like a fast deep recursion, like the classic Fibonacci algorithm, would stress the Mill. Then again, I’m not sure classic superscalar OOO processors would do better.
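    For concreteness, the classic doubly recursive form being referred to (call depth n, roughly 1.6^n total calls, each frame short-lived, so the call/return machinery is hammered):

```c
/* Naive Fibonacci: every call immediately makes two more, so frames
 * are created and torn down far faster than useful work accumulates. */
long fib(int n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}
```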

  • PeterH
    Participant
    Post count: 41

    When a subroutine call is made, it shouldn’t be necessary for the spiller to save the entire state of the caller’s belt at once. Given that functional units are tagged with both frame and belt position, I don’t expect the belt would need to be spilled any faster than new values are produced and old values would have dropped off the end of the belt had there not been a subroutine call. If the subroutine is short, or uses a different subset of the available functional units, it’s possible much of the caller’s belt would never need spilling.

  • PeterH
    Participant
    Post count: 41
    in reply to: Prediction #2056

    Considering how the Mill uses metadata and select operations, 10-20-deep prediction may be a match for 20-40-deep prediction on a more conventional out-of-order superscalar. If that’s enough to effectively hide memory latency, it’s likely better to stay with the simple two-bit predictor, leaving more silicon real estate for other mechanisms.
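    The “simple two-bit predictor” mentioned is a saturating counter per branch; a few lines of C capture it (predict and update are my names for the two halves):

```c
#include <stdbool.h>
#include <stdint.h>

/* Two-bit saturating counter: states 0,1 predict not-taken; 2,3
 * predict taken. A single mispredict nudges the counter one step,
 * so a strongly biased branch isn't flipped by one outlier. */
typedef uint8_t Pred2;   /* state in 0..3 */

bool predict(Pred2 p) { return p >= 2; }

Pred2 update(Pred2 p, bool taken) {
    if (taken)  return p < 3 ? p + 1 : 3;   /* saturate at strongly-taken */
    else        return p > 0 ? p - 1 : 0;   /* saturate at strongly-not   */
}
```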
