Forum Replies Created

Viewing 15 posts - 1 through 15 (of 32 total)
  • David
    Participant
    Post count: 32
    in reply to: fork() #1737

    “In addition, each turf (and in process-oriented systems each process corresponds to a turf) has a local address space that is located non-contiguously in the global shared address space.”

    By non-contiguously, do you mean that each local address space is not a single, contiguous region in the global space?

  • David
    Participant
    Post count: 32

    My misunderstanding. For some reason I confused the “always 64 bit” with the integer size, instead of just the address space. The test “x < [>64-bit constant]” would be always true if x were at most 64 bits wide. But with 128-bit integers, my question is moot.
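    To make the width confusion concrete, here’s a quick sketch (plain Python, just modelling operand widths; the constant is the >64-bit literal from the question in this thread): any value that fits in 64 bits necessarily compares below it, so the test only becomes meaningful once operands can be wider than 64 bits.

```python
# Model the width question: a comparison "x < C" where C needs more than 64 bits.
BIG = 0x12345678901234567          # the >64-bit literal from the thread
MASK64 = (1 << 64) - 1             # largest value representable in 64 bits

# Any x limited to 64 bits is necessarily below the constant:
always_true_64 = all(x < BIG for x in (0, 1, MASK64))
print(always_true_64)              # True: the test degenerates for 64-bit x

# With 128-bit operands the comparison is meaningful again:
x128 = 0x1234567890123456789       # a value wider than 64 bits
print(x128 < BIG)                  # False: x128 exceeds BIG
```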

  • David
    Participant
    Post count: 32

    “a < 0x12345678901234567”, that’s a 68-bit literal. Is that supported?

  • David
    Participant
    Post count: 32
    in reply to: Mill Logo #1629

    I had this logo idea at the time that this thread was made, but didn’t have time to execute it.

    The strokes of the letters evoke a transition between two states of the belt.

    It could naturally use some stylizing, but I like the concept.

  • David
    Participant
    Post count: 32
    in reply to: Simulation #1621

    Will: “What kind of projects has everyone been involved with previously?”

    I’m still involved in 8-bit home/retro computing, where Forth is still a pretty well regarded language, and all the fiddly little implementation decisions have a large impact on performance.

    Well before hearing about the Mill, I created AcheronVM[1], a VM interpreter on the 6502 with a machine-generated instruction set. (The fact that anything beyond hand-allocated opcode values astonishes people is itself astonishing to me.) I was trying to strike a balance between stack-based and accumulator-based designs (both of which can have great code density) and frame-based ones (which bring more expressive power to complex data activity, with less shuffling than the other two). I settled on a fine-grained, non-hiding sliding register window with both direct register access and a variable accumulator (the previously used register can act as an implied accumulator). Within the bounds of 6502 code interpreting it, where none of the parallel effects available in hardware can be had, it’s a pretty good condensing of high-level code representation. Of course, in hardware like the Mill, you can really go out of the box and redo everything. In software, the expense of dispatch is serial and significant.

    From my work in Forth and various experiments like that (which I am still using for Commodore 64 hobby game development, and have matured farther offline), I agree that a straightforward Forth-like ABI/interpreter planted on the Mill belt model might not be the best way to go. A Forth which uses traditional memory-based stacks would be simple to write, but wouldn’t have the performance gain of belt-based data. A Forth compiler which reworks the stack operations into dataflow expressions would be much more involved to write (especially if keeping all the ANS Forth standard operations), and there would likely be an impedance mismatch between what is natural and efficient in traditional Forth and what comes out natural and efficient in the compiled Mill code.

    Beyond that, I’m not sure how much Forth source code reuse is practical. Older Forth code bases would have a strong representation of ANS standard Forth, but there are many other dialects. I’m not certain what more recent users of Forth go for. Beyond the most basic operations, it is a very plastic language. Even Chuck Moore laments the lack of new Forth language ideas post-standardization. The dialects he uses diverge from the ANS standard.

    So if something with a good matchup for the Mill can be designed, I think Forth is a great language for allowing easy, simple bootstrapping of a system, but not necessarily for pulling in large existing Forth-language code bases. Given its age, Forth’s memory and filesystem abstractions are next to non-existent, and things like threading and networking are often completely outside its realm. Forth applications tend to be system-owning, hitting the hardware and memory space directly.

    I’m actually quite interested in being involved in building non-cycle-exact Mill emulators to help bootstrap the user software ecosystem.

    1 = https://acheronvm.github.io/acheronvm/

  • David
    Participant
    Post count: 32
    in reply to: Prediction #1490

    Something that was not addressed in this talk is virtual and indirect calls/jumps. Predicting these has a significant effect not just in dynamic languages but also in languages as low-level as C++.

    One facet would be virtual dispatch where the type of the object being dispatched is generally the same across multiple executions of the same call site. The existing Mill predictor would naturally be able to learn this, and Java and JavaScript implementations tend to use function rewriting to perform this kind of prediction in software.

    But often the type varies for a given call site. Unlike branch prediction, it’s not limited to a binary decision between two potential addresses. Opportunity to prefetch the next EBB can come from hoisting the load of the dispatch address, which can go pretty far in functions that do some processing of their own before the dispatch call. However, the software would need to be able to give this address to the exit chaining mechanism in order to gain value from this prefetch.

    There are also concepts of history tables linked to object identity instead of call site address. With these, multiple call sites could benefit from a single knowledge update, but I don’t think they’re as appropriate for general purpose CPUs as they’re usually specific to object system ABIs.
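    A toy model of that trade-off (pure Python; the predictor, call sites, and workload are all invented for illustration): a last-target predictor keyed by call site thrashes when receiver types alternate at one site, while keying on object identity learns each object once and can share that knowledge across sites.

```python
# Toy last-target predictor, keyed two ways, to illustrate the trade-off.
class LastTargetPredictor:
    def __init__(self):
        self.table = {}                  # key -> last observed target

    def access(self, key, actual_target):
        hit = self.table.get(key) == actual_target
        self.table[key] = actual_target  # update history with the real target
        return hit

# Two "object types" dispatching to different method bodies:
targets = {"circle": "draw_circle", "square": "draw_square"}

# One call site with alternating receiver types (worst case for site keying):
trace = [("siteA", "circle"), ("siteA", "square")] * 50

site_keyed = LastTargetPredictor()
obj_keyed = LastTargetPredictor()

site_hits = sum(site_keyed.access(site, targets[t]) for site, t in trace)
obj_hits = sum(obj_keyed.access(t, targets[t]) for _, t in trace)

print(site_hits, obj_hits)   # 0 98: site keying never hits, object keying
                             # misses only once per object
```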

  • David
    Participant
    Post count: 32
    in reply to: Pipelining #1420

    Seeing how much work needs to go into the compiler to use the hardware well is relatively daunting for somebody looking at distinctly Not-C-family compilers for the Mill. However, I then remember LLVM. Since it has loop IR classes, I hope this means all the pipelining analysis & transformation you’re writing lives in the LLVM middle-end for any front-end to use?

    Maintaining a language myself, I completely understand and empathize with the years of battling unclean semantics and finally having that eureka moment that makes it all so simple. We’re very slow to make language changes even for obviously good ideas for precisely that reason.

    It was nice to see mentioned that my Lisp & JavaScript ABI questions are still points of consideration, even if they’re well outside the initial assumptions of the Mill. In the meantime, a lot of consideration on my end has taken a JVM style approach of tracking common usages and issuing runtime recompiles to optimize specific parameter layouts.

    It would be great to see some “ah hah!” allowing dynamic languages (specifically flexible calling conventions & multiple stacks) to slot into direct Mill support. But if not, it may not matter too much. The last few decades of compiler advances have delivered effectively native speeds for dynamic languages on existing architectures.

    Again, a great talk, and it’s wonderful to see all these ideas be able to coalesce in our minds, filling in all the previously NYF gaps. :-)

  • David
    Participant
    Post count: 32
    in reply to: Simulation #1703

    I tried <b>normal HTML tags</b>, but they were automatically &escaped; out.

  • David
    Participant
    Post count: 32
    in reply to: Simulation #1700

    Ralph: Just looking at those constraints, I think that modifying Forth might be an easier language to bootstrap. Here are some of the basic modifications:

    *** Edit: How do you format text on this board?

    Operations would read existing belt data, and add new values

    Would operations be able to address the belt, or would there be a belt “PICK” operation to put previous items onto the front of the belt?
    - The former would be more Mill-ish, the latter more Forth-ish
    - Even in the Mill-ish form, addressing the stack would be optional, defaulting to the front of the belt
    - Addressing the stack should be able to be done via label, not just depth

        lit 1   \ push a literal number
        lit 3
     x: add     \ Declares 'x' to be the location on the belt holding the (first?) return value of this instruction
        ...
        lit 4
        add x   \ References wherever on the belt 'x' was.  The other parameter defaults to the front of the belt.
        sub y,0 \ Numeric references are a belt depth.  Does this look too much like subtracting the literal number zero?
    

    If ‘lit’ is properly hidden by the parser, then it could look more Forth-like instead of asm-like, but we’d need to delineate the parameter references:

      1 3 x: add
      ...
      4 add (x) sub (y 0)  \ Parens are a little un-forthy
      4 add ^x sub ^y ^0   \ Decorate each reference instead of the beginning/end tokens?
    

    No sharing of stack data between functions, except parameters and return values
    - Function declarations would have to specify the number of params & returns
    - These can’t be variable, unless there are vararg lists in memory, which I wouldn’t bother with
    - This limits a number of clever operations and could bloat up user code

    Dynamic types
    - The compiler would likely need simple type inference to figure out which typed instruction to generate for ‘add’ etc.
    - Could either foist that on the user, or does the specializer do some of this?
    - Forth foists it on the user, so might as well do that here

    I think building on this basic thinking, of applying Mill-isms to Forth instead of vice-versa, would yield a simple usable bootstrapping language. I think it would end up looking a fair amount like GenAsm, though.
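    A minimal executable sketch of the labelled-belt idea (Python standing in for a real implementation; the semantics are my guess at the intent, only ‘lit’ and ‘add’ are modelled, and numeric depth references are omitted for brevity): every op prepends its result to the belt, labels remember where a result landed, and existing labels drift down one position per new result.

```python
# Sketch of a belt with labelled results: invented semantics, not any
# official Mill tool. Programs are tuples of (label-or-None, op, *args).
def run(program):
    belt, labels = [], {}               # front of the belt is index 0

    def ref(name):
        return belt[labels[name]]       # label -> value at its current depth

    for label, op, *args in program:
        if op == "lit":
            result = args[0]
        elif op == "add":
            if args:                    # "add x": labelled operand + belt front
                result = ref(args[0]) + belt[0]
            else:                       # plain "add": front two belt items
                result = belt[0] + belt[1]
        belt.insert(0, result)          # every op prepends its result
        for name in labels:             # older values drift down one slot
            labels[name] += 1
        if label:                       # remember where this result landed
            labels[label] = 0
    return belt

# lit 1 / lit 3 / x: add / lit 4 / add x  ->  x = 1+3, then 4 + x
print(run([(None, "lit", 1), (None, "lit", 3), ("x", "add"),
           (None, "lit", 4), ("y", "add", "x")])[0])   # 8
```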

    • This reply was modified 9 years, 10 months ago by  David.
  • David
    Participant
    Post count: 32
    in reply to: Simulation #1646

    I shudder to think of what call/cc would require on the Mill, if the built-in stack frame architecture is used. Would it require some sort of privilege escalation to introspect and shuffle the stack around? The scratchpad and potentially the spiller would need to be involved, too.

  • David
    Participant
    Post count: 32
    in reply to: Mill Logo #1643

    I don’t have any layered file. I made it in JavaScript, so its semantics can be modified directly.

    http://www.white-flame.com/mill-logo.html

    Save the source and play with it, in particular boxSize and height.

  • David
    Participant
    Post count: 32
    in reply to: Simulation #1626

    Ok, since the EBNF wasn’t completely syntactically correct, I assumed it was just a straight hand-edited wiki page.

    To quote Ivan, “Have you considered direct interpretation of genAsm?” 🙂

    It sounds like the official system would specialize to conAsm, and the emulator would run conAsm with useful cycle tracking and such for a particular Mill family member spec. I think there’d be some value in an externally sourced, straight genAsm interpreter (actually, given that I’m working in Lisp, I’d convert it to Lisp code to get native machine speed). Since it would ignore many CPU specifics, it might run faster and allow quicker iteration on features.

    Basically, it would be for testing your genAsm conceptually, while genAsm->conAsm->CPU-emulator would test how a Mill runs your code.

    Plus, wouldn’t a MillForth emulator be written in genAsm? Need to run that somewhere, and the official chain isn’t out yet. One without all the exacting details nailed down would be a good substrate for getting the basic design decisions implemented (assuming the genAsm VM is correct in what it runs).
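    The “convert it to Lisp code” idea can be sketched in any host language with closures (Python here; the tiny expression IR is invented, not genAsm): instead of a loop that re-dispatches on every node each time through, compile the tree once into nested closures, so repeated execution pays no per-node dispatch.

```python
# Compile a tiny expression IR into nested closures (a stand-in for
# compiling genAsm to Lisp): dispatch happens once, at compile time.
def compile_node(node):
    op, *args = node
    if op == "const":
        val = args[0]
        return lambda env: val                 # constant: capture the value
    if op == "var":
        name = args[0]
        return lambda env: env[name]           # variable: look up at run time
    if op == "add":
        left, right = map(compile_node, args)
        return lambda env: left(env) + right(env)
    if op == "mul":
        left, right = map(compile_node, args)
        return lambda env: left(env) * right(env)
    raise ValueError(f"unknown op: {op}")

# (x + 2) * 3, compiled once, then run cheaply many times:
fn = compile_node(("mul", ("add", ("var", "x"), ("const", 2)), ("const", 3)))
print(fn({"x": 5}))   # 21
```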

  • David
    Participant
    Post count: 32
    in reply to: Simulation #1622

    “Have you considered direct interpretation of genAsm?”

    I’m considering it. I’ve updated the wiki page on genAsm to fix some EBNF syntactic issues, and have the spec itself parsing correctly now.

    Are there any public snippets of genAsm to test against? I know it’s in flux, but that’s why I’d execute from the specification directly, to absorb changes.

  • David
    Participant
    Post count: 32
    in reply to: Prediction #1492

    Beyond object method dispatch, lambda-passing style would be another source of unstable transfer points. Custom lambdas passed into central utilities are often per-use closures. On the upside, the actual function that will be called is hoisted all the way up to an input parameter, which again would be an ideal case for software to inject “upcoming/next EBB” hints into the prefetcher.

    For context, how many cycles ahead of an exit would the exit table need to be notified in order to prefetch code from cache to execute without any stalls? I suspect this would vary between family members. If it’s under 10, I would imagine software hints could eliminate stalls for many truly unstable dispatched exits.

    I agree that DRAM latency isn’t worth considering in these optimization scenarios. However, if the 5-cycle mispredict penalties are a concern, the fact remains that the absolute correct target for fully dynamic dispatch should be available in the software view far enough ahead of the call in enough situations to be beneficial to the hardware mechanism. The problem is communicating it from software into hardware.

    The Mill has software-controlled prefetching of data via speculation, but not software-controlled prefetching of code (that we’ve seen). If the hardware predictor consistently fails for a dispatched usage case, there’s no other way to update or augment its functionality.

    Having a compiler decide between generating preemptive dispatch hints and letting the predictor attempt to compensate would probably best be left to runtime log feedback, and might not be used by all compilers. But not having that option at all seems to me to be missing functionality that manifests as a penalty on dynamic code.

    (Obviously, hopping through multiple layers of dynamic indirection quickly would likely cause stalls no matter the prefetch mechanism, but most dynamic designs boil down to just one indirection per function call.)

  • David
    Participant
    Post count: 32
    in reply to: Specification #1157

    We’re going to be rewriting the AOT and JIT compilers for the next version of our declarative programming server, so user code feeding the specializer is of particular interest to me, even if just for forward compatibility thinking.

    From the position of a compiler backend, yes, converting between graphs and LLVM-style virtual instructions is relatively straightforward. For compiler writers, it sounds like graph generation will be the expectation on the Mill? Thinking about it, that could well be a bit easier than instructions…
