Forum Replies Created

Viewing 15 posts - 1 through 15 (of 19 total)
  • Joe Taber
    Participant
    Post count: 25

    It’s been a while since this thread was active, and WASM is coming along nicely, if not quickly. Here are a few neat developments (picked to sample recent progress rather than the totality of it): first, just-in-time code generation within WebAssembly, which uses a JIT strategy that would fit right in on a Mill; and then Postgres compiled to WASM running in the browser, which is surprising.

    Ivan’s analysis seems spot on: the general WASM execution model maps fairly nicely to the Mill. I don’t think it’s too far to say that WASM’s principles align reasonably well with the Mill in general, and notably better than with existing architectures. Don’t be fooled by the presence of the word “web” in its name; its design is closer to an intermediate representation for architecture-agnostic machine code than to something that needs a heavyweight VM runtime like JS/JVM/.NET.

    I suspect that WASM would be a great ‘IR’ for quickly porting existing programs to the Mill without requiring (further) architecture retargeting. Once a WASM interpreter/compiler is built for the Mill — probably a more tractable task than adding the Mill as a new backend architecture in an existing compiler — many hundreds of existing WASM programs would start working right away. This could be a boon for getting your hands on a wide array of nontrivial programs for testing and development, and a fast path for writing custom applications for the Mill.

    • This reply was modified 2 years ago by  Joe Taber.
  • Joe Taber
    Participant
    Post count: 25

    I made some conasm stuff; I’d love to be able to test it.

  • Joe Taber
    Participant
    Post count: 25

    The scary thing about this is that it’s an attack, in very general terms, on how speculation interacts with caches and prediction.

    The Mill has a lot of things going for it: returning NaR from loads without permission, not being OOO, and turning traditionally branch-heavy code into linear code with a pick at the end (basically auto-converting tons of code into constant-time algorithms for free… which might turn out to be an important *security* feature now that I think about it).
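
    Roughly what I mean by that last point, as a plain-C sketch rather than Mill asm (the function names are made up, and whether a compiler actually emits a select is of course not guaranteed):

    #include <stdint.h>

    /* Branchy version: the data-dependent branch is what speculation
     * and the predictor get to play with. */
    uint32_t max_branchy(uint32_t a, uint32_t b) {
        if (a > b)
            return a;
        return b;
    }

    /* If-converted version: compute both sides and select at the end,
     * which is the scalar analogue of ending with a pick. */
    uint32_t max_selected(uint32_t a, uint32_t b) {
        uint32_t take_a = (a > b);      /* 0 or 1, no branch needed */
        return take_a ? a : b;          /* typically lowers to a select/cmov */
    }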

    So it looks like some of these attacks are straightforwardly not possible on the Mill. But other attacks seem so general… Spectre is essentially a whole new category of timing attacks. I’d like to hear the Mill team’s initial impressions.

  • Joe Taber
    Participant
    Post count: 25
    in reply to: Vector ops #3035

    Another question.

    Say I have two vectors, [A B C D] and [0 0 1 0], where the second is a logical vector indicating which element of the first to extract (probably computed from the first vector itself). What is the best way to get the scalar C out?

    I know I could do a pick(a,b) to get Nones or zeros, like [None None C None], but how do I get the C out into a scalar? I know I could use a series of extracts and picks (4 extracts and 3 picks in this case); is there a better way?
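
    In case it helps, here is the shape of what I’m after as a plain-C sketch (not Mill asm; the function name and the OR-reduction shortcut are just my own illustration):

    #include <stdint.h>

    /* Given a data vector and a 0/1 mask with exactly one 1, pull the
     * selected lane out as a scalar. The extract+pick chain I described
     * walks the lanes one at a time; a mask-and-reduce like this is the
     * kind of shortcut I’m hoping a vector op can express directly. */
    uint32_t extract_by_mask(const uint32_t v[4], const uint32_t mask[4]) {
        uint32_t result = 0;
        for (int i = 0; i < 4; i++)
            result |= v[i] & (mask[i] ? ~0u : 0u);  /* pick, then OR-reduce */
        return result;
    }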

  • Joe Taber
    Participant
    Post count: 25
    in reply to: Memory #2984

    I have a question about hoisted deferred loads and function boundaries.

    Consider a relatively large function that requires multiple loads. I would expect the start of the function to be a flurry of address calculations and deferred loads, as many as possible, before it gets on with the rest of its functionality, in order to hide cache/DRAM latency as much as possible. I might even call it a ‘deferred load preamble’; not officially, but I could see it being a common enough pattern to recognize.

    So my first question: Does this scenario sound reasonable? Would you expect it to be that common?

    Now let’s extend it. Break the function up into three smaller functions. Let’s assume it’s very simple and you can just group instructions together into their own functions, with outputs flowing to inputs, etc. So instead of one big section at the beginning where all the loads are issued, each smaller function has its own ‘deferred load preamble’. This would mean that, e.g., the last of the three is not able to defer its loads as far and may suffer more from memory latency.
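
    To make the shape concrete, a made-up C sketch (purely illustrative; nothing here is Mill-specific):

    #include <stdint.h>

    /* One big function: all three loads can be hoisted into a single
     * ‘deferred load preamble’ and their latency hidden behind the
     * arithmetic that follows. */
    int64_t big(const int64_t *a, const int64_t *b, const int64_t *c) {
        int64_t x = *a, y = *b, z = *c;   /* issue all loads up front */
        int64_t t = x * 3 + 1;
        t ^= y >> 2;
        return t + z;
    }

    /* The same work split into three functions: part3’s load of *c can
     * only be issued once part3 is entered, so it cannot be deferred
     * nearly as far ahead of its use. */
    int64_t part1(const int64_t *a)            { return *a * 3 + 1; }
    int64_t part2(int64_t t, const int64_t *b) { return t ^ (*b >> 2); }
    int64_t part3(int64_t t, const int64_t *c) { return t + *c; }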

    Does this also sound reasonable? Is it just the compiler’s (or specializer’s) responsibility to inline functions and hoist loads as much as possible, or does Mill hardware offer any mitigation for this issue? It’s not OOO, so I wouldn’t really expect it to “peek ahead” to see those loads, but then again the Mill’s friendliness to speculation would really help such an implementation.

    Thoughts?

  • Joe Taber
    Participant
    Post count: 25
    in reply to: What's New? #1489

    If anyone’s interested, I managed to find the RSS feed URL for the whole forum. It’s not very active at the moment, so it’s easy to keep up with in my RSS reader, NewsBlur. If you want additional options like filtering sub-forums, I believe this uses the built-in WordPress RSS engine, so a quick search for additional query parameters might work, though at the current volume that’s hardly necessary.

    http://millcomputing.com/?feed=rss2&post_type=forum

  • Joe Taber
    Participant
    Post count: 25
    in reply to: ASLR (security) #901

    It all comes down to the tradeoffs. The Mill’s way of doing it has many advantages:

    • Separation of concerns by splitting the PLB and TLB
    • Allowing the TLB to be moved down the cache hierarchy, giving it the opportunity to be smarter, more complex, larger, slower, and cheaper without impacting performance.
    • Unifying the entire cache hierarchy with the processor core.
    • Freeing the only parallelizable part of the classic TLB structure, i.e. protection, to actually run in parallel in the Mill’s PLB.
    • Making the PLB small, fast, and no longer a bottleneck.
    • Allowing cache access to be fast and deterministic.
    • Opening the opportunity to introduce real security primitives based on authorization and least-privilege instead of obfuscation.

    (See the memory talk for these points.)

    Insisting on ASLR throws all of that away, including the significant performance benefit of removing the TLB chokepoint from the hottest part of the memory highway: between the processor and the L1. And all for what boils down to a form of security through obscurity.

    Make your secure services small and simple, with a small attack surface, and they will be much easier to keep secure and to protect against things like buffer overflows.

  • Joe Taber
    Participant
    Post count: 25

    Wow, thank you for the detailed analysis! I’m glad the problems that the paper tried to address wouldn’t even apply on a Mill.

    I hope the security talk is posted soon!

  • Joe Taber
    Participant
    Post count: 25

    Perhaps an appropriate OSS project would be to build a Mill backend for LLVM. Since this would plug into all the existing front-end compilers for free, we’d be able to see how large real-world programs would look in Mill asm.

    Or maybe start implementing a bunch of the interesting RosettaCode examples in Mill asm by hand. I don’t know if RosettaCode would accept code samples in Mill asm, but at the very least they could be listed and discussed here.

    Of course, both of these would require some form of asm spec. We may be able to construct effective Mill asm even without the complete specification, i.e. by omitting everything NFY, but I don’t know whether it would be useful in that state.

  • Joe Taber
    Participant
    Post count: 25

    I’m not familiar with WordPress, but Discourse is next-gen discussion software that is extremely awesome. I’m pretty sure there’s a WordPress plugin as well. It’s rather new, so I’m not sure if you guys want to move to it, but I thought you might want to know.

  • Joe Taber
    Participant
    Post count: 25
    in reply to: Execution #651

    Since shuffle is in the op phase and the pick phase comes after the op phase (iirc), could shuffle+pick on a vector happen in a single instruction, picking against a literal 0 or None? (That is, no explicit 0 or None vector would need to take up a belt position.)

    I guess that would get me what I was asking for, except you’d need a way to generate and park the vector to pick against, unless you can pick against a vector literal, if such a thing exists.
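
    For clarity, the combined operation I’m imagining, as a scalar C sketch (made-up semantics, not Mill asm):

    #include <stdint.h>

    /* Permute the source lanes by an index vector (the shuffle), and
     * wherever the predicate is false substitute a literal 0 instead of
     * reading from a second vector operand (the pick against a literal). */
    void shuffle_pick_zero(const uint32_t src[4], const int idx[4],
                           const int keep[4], uint32_t dst[4]) {
        for (int i = 0; i < 4; i++)
            dst[i] = keep[i] ? src[idx[i]] : 0;  /* 0 is a literal, not a belt operand */
    }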

  • Joe Taber
    Participant
    Post count: 25
    in reply to: Execution #647

    So shuffle can duplicate elements; can it also replace elements with 0?

  • Joe Taber
    Participant
    Post count: 25
    in reply to: Metadata #512

    Well then that idea is out.

    How big are Lisp objects, anyway? If they’re too big you might have to go through memory anyway. E.g. Ruby is moving from 5-word objects to 6- or 8-word objects (because those fit more evenly into cache lines). Yes, you read that correctly: that’s 8 words of 8 bytes each, or 64 bytes per object, which already pushes the total belt size on small Mills, let alone returning half a dozen of them.
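
    A quick back-of-the-envelope check in C (the struct is purely illustrative, not Ruby’s actual object layout):

    #include <stdio.h>
    #include <stdint.h>

    struct obj { uint64_t w[8]; };   /* hypothetical 8-word object */

    int main(void) {
        /* 8 words * 8 bytes = 64 bytes per object, a whole cache line,
         * before you return even one of them. */
        printf("%zu bytes per object\n", sizeof(struct obj));  /* prints 64 */
        return 0;
    }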

    • This reply was modified 10 years, 7 months ago by  Joe Taber.
  • Joe Taber
    Participant
    Post count: 25
    in reply to: Metadata #506

    If the return types are all homogeneous, could the callee always return a vector of the maximum size, with None filled in for omitted values?

  • Joe Taber
    Participant
    Post count: 25
    in reply to: Instruction Encoding #390

    So the transfer op executes in the same cycle as other ops to do something in a loop…

    How do you break out of the loop?

    What happens when two transfer ops are executed in the same cycle?
