Forum Replies Created
- David (Participant), April 17, 2015 at 2:48 pm (Post count: 32)
“In addition, each turf (and in process-oriented systems each process corresponds to a turf) has a local address space that is located non-contiguously in the global shared address space.”
By non-contiguously, do you mean individual local address spaces are not single, contiguous regions in the global space?
- David (Participant), April 14, 2015 at 1:57 pm
My misunderstanding. For some reason I confused “always 64 bit” with the integer size, rather than just the address space. The test “x < [>64-bit constant]” would be always true if x were at most 64 bits wide. But with 128-bit integers, my question is moot.
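A quick sanity check of that arithmetic, sketched in Python (names are mine, not from any Mill tooling):

```python
# A comparison of a 64-bit value against a 68-bit literal is always true,
# so a compiler could fold the whole test to a constant.
MASK64 = (1 << 64) - 1          # largest 64-bit value is 2**64 - 1
LITERAL = 0x12345678901234567   # the 68-bit literal from the post

# Even the maximum 64-bit value is below the literal.
assert MASK64 < LITERAL

def always_true(x):
    return (x & MASK64) < LITERAL   # truncate x to 64 bits, then compare

print(always_true(0), always_true(MASK64))   # True for both extremes
```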
- David (Participant), April 10, 2015 at 5:21 pm
“a < 0x12345678901234567”, that’s a 68-bit literal. Is that supported?
- David (Participant), January 6, 2015 at 2:13 pm
- David (Participant), January 5, 2015 at 3:18 pm
Will: “What kind of projects has everyone been involved with previously?”
I’m still involved in 8-bit home/retro computing, where Forth is still a pretty well regarded language, and all the fiddly little implementation decisions have a large impact on performance.
Well before hearing about the Mill, I created a VM interpreter on the 6502 with a machine-generated instruction set, AcheronVM. (The fact that anything other than hand-allocated opcode values astonishes people is itself astonishing to me.) I was trying to strike a balance between stack-based and accumulator-based designs (both of which can have great code density) and frame-based ones (which bring a lot of expressive power to more complex data activity, with less shuffling than the other two). I settled on a fine-grained, non-hiding sliding register window with both direct register access and a variable accumulator (the “prior” used register can act as an implied accumulator). Within the bounds of 6502 code interpreting it, where none of the parallel effects of hardware are available, it’s a pretty good condensing of high-level code representation. Of course, in hardware like the Mill, you can really go out of the box and redo everything. In software, the expense of dispatch is serial and significant.
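A rough Python sketch of the sliding-window idea (a simplification of my own, not the actual AcheronVM design):

```python
# Toy sliding register window: the base "slides" to enter a new frame
# without copying registers, and the most recently written register acts
# as an implied accumulator (the "prior" register).
class WindowVM:
    def __init__(self, size=256):
        self.regs = [0] * size
        self.base = 0        # current window base
        self.last = 0        # absolute index of the most recent write

    def _idx(self, n):
        return (self.base + n) % len(self.regs)

    def store(self, n, value):
        i = self._idx(n)
        self.regs[i] = value
        self.last = i        # this register is now the implied accumulator

    def load(self, n):
        return self.regs[self._idx(n)]

    def acc(self):
        return self.regs[self.last]

    def slide(self, n):
        self.base = (self.base + n) % len(self.regs)  # new frame, no copying

vm = WindowVM()
vm.store(0, 5)
vm.store(1, vm.acc() + 2)   # reads register 0 implicitly via the accumulator
vm.slide(2)                 # callee gets a fresh window over the same file
```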
From my work in Forth and various experiments like that (which I am still using for Commodore 64 hobby game development, and which have matured further offline), I agree that a straightforward Forth-like ABI/interpreter planted on the Mill belt model might not be the best way to go. A Forth which uses traditional memory-based stacks would be simple to write, but wouldn’t have the performance gain of belt-based data. A Forth compiler which reworks the stack operations into dataflow expressions would be much more involved to write (especially if keeping all the ANS Forth standard operations), and there would likely be an impedance mismatch between what is natural and efficient in traditional Forth and what comes out natural and efficient in the compiled Mill code.
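A toy Python sketch of that compilation approach (my own simplification): interpret a Forth word abstractly, pushing expression trees instead of values, so stack shuffles disappear from the output entirely.

```python
# Abstractly interpret Forth tokens, building dataflow expressions on a
# simulated stack. DUP and SWAP manipulate the abstract stack only; they
# leave no trace in the emitted expression.
def to_dataflow(tokens):
    stack = []
    for tok in tokens:
        if tok.isdigit():
            stack.append(tok)
        elif tok == "dup":
            stack.append(stack[-1])
        elif tok == "swap":
            stack[-2], stack[-1] = stack[-1], stack[-2]
        elif tok in ("+", "*"):
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {tok} {b})")
        else:
            raise ValueError(f"unknown word: {tok}")
    return stack

# The shuffling words vanish; only the dataflow remains:
print(to_dataflow("2 3 dup * swap +".split()))   # ['((3 * 3) + 2)']
```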
Beyond that, I’m not sure how much Forth source code reuse is practical. Older Forth code bases would have a strong representation of ANS standard Forth, but there are many other dialects. I’m not certain what more recent users of Forth go for. Beyond the most basic operations, it is a very plastic language. Even Chuck Moore laments the lack of new Forth language ideas post-standardization. The dialects he uses diverge from the ANS standard.
So if something with a good matchup for the Mill can be designed, I think Forth is a great language for easy, simple bootstrapping of a system, but not necessarily for pulling in large existing Forth-language code bases. Given its age, Forth’s memory and filesystem abstractions are next to nonexistent, and things like threading and networking are often outside its realm completely. Forth applications tend to be system-owning, hitting the hardware and memory space directly.
I’m actually quite interested in being involved in building non-cycle-exact Mill emulators to help bootstrap the user software ecosystem.
- David (Participant), October 16, 2014 at 2:04 pm
Something that was not addressed in this talk is virtual and indirect calls/jumps. Predicting these has a significant effect not just in dynamic languages, but in languages as low-level as C++.
But often the type varies for a given call site. Unlike branch prediction, it’s not limited to a binary decision between two potential addresses. The opportunity to prefetch the next EBB can come from hoisting the load of the dispatch address, which can go pretty far in functions which do some processing of their own before the dispatch call. However, the software would need to be able to hand this address to the exit chaining mechanism in order to gain any value from the prefetch.
There are also concepts of history tables linked to object identity instead of call site address. With these, multiple call sites could benefit from a single knowledge update, but I don’t think they’re as appropriate for general purpose CPUs as they’re usually specific to object system ABIs.
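A rough Python model of the object-identity table (the real mechanisms live in object-system ABIs, as noted; this is just an illustration): the table is keyed on the receiver’s type rather than the call site, so one resolution benefits every call site.

```python
# History table keyed on object identity (here, type): a single shared
# table maps type -> resolved method, so a "knowledge update" at one call
# site speeds up every other call site dispatching on the same type.
class Dog:
    def speak(self): return "woof"

class Cat:
    def speak(self): return "meow"

type_table = {}   # shared across all call sites

def dispatch(obj):
    method = type_table.get(type(obj))
    if method is None:                 # miss: resolve once and record
        method = type(obj).speak
        type_table[type(obj)] = method
    return method(obj)                 # subsequent calls hit the table

print(dispatch(Dog()), dispatch(Cat()), dispatch(Dog()))
```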
- David (Participant), August 28, 2014 at 5:54 pm
Seeing how much work needs to go into the compiler to use the hardware well is relatively daunting for somebody looking at distinctly Not-C-family compilers for the Mill. However, I then remember LLVM. Since it has loop IR classes, I hope this means all the pipelining analysis & transformation you’re writing lives in the LLVM middle-end for any front-end to use?
Maintaining a language myself, I completely understand and empathize with the years of battling unclean semantics and finally having that eureka moment that makes it all so simple. We’re very slow to make language changes even for obviously good ideas for precisely that reason.
It would be great to see some “ah hah!” allowing dynamic languages (specifically flexible calling conventions & multiple stacks) to slot into direct Mill support. But if not, it may not matter too much. The last few decades have witnessed compiler advances delivering effectively native speeds for dynamic languages on existing architectures.
Again, a great talk, and it’s wonderful to see all these ideas be able to coalesce in our minds, filling in all the previously NYF gaps.
- David (Participant), February 5, 2015 at 6:05 pm
Ralph: Just looking at those constraints, I think that a modified Forth might be an easier language to bootstrap. Here are some of the basic modifications:
*** Edit: How do you format text on this board?
- Operations would read existing belt data, and add new values
  - Would operations be able to address the belt, or would there be a belt “PICK” operation to put previous items onto the front of the belt?
  - The former would be more Mill-ish, the latter more Forth-ish
  - Even in the Mill-ish form, addressing the stack would be optional, defaulting to the front of the belt
  - Addressing the stack should be possible via label, not just depth

    lit 1    \ push a literal number
    lit 3
    x: add   \ Declares 'x' to be the location on the belt holding the (first?) return value of this instruction
    ...
    lit 4
    add x    \ References wherever on the belt 'x' was. The other parameter defaults to the front of the belt.
    sub y,0  \ Numeric references are a belt depth. Does this look too much like subtracting the literal number zero?

If ‘lit’ is properly hidden by the parser, then it could look more Forth-like instead of asm-like, but the parameter references need to be delineated:

    1 3 x: add ...
    4 add (x)  sub (y 0)  \ Parens are a little un-forthy
    4 add ^x   sub ^y ^0  \ Decorate each reference instead of beginning/end tokens?

- No sharing of stack data between functions, except parameters and return values
  - Function declarations would have to specify the number of params & returns
  - These can’t be variable, unless there are vararg lists in memory, which I wouldn’t bother with
  - This limits a number of clever operations and could bloat up user code
- The compiler would likely need simple type inference to figure out which typed instruction to generate for ‘add’ etc.
  - Could either foist that on the user, or does the specializer do some of this?
  - Forth foists it on the user, so might as well do that here
I think building on this basic thinking, of applying Mill-isms to Forth instead of vice-versa, would yield a simple usable bootstrapping language. I think it would end up looking a fair amount like GenAsm, though.
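To check that the labeled-belt idea hangs together, here is a toy evaluator for it in Python (the instruction encoding, the belt length, and the operand-defaulting rules are my guesses, not anything specified):

```python
# Toy belt-Forth evaluator: values drop onto the front of a fixed-length
# belt, operations reference operands by label or depth (missing operands
# default to the front of the belt), and old values fall off the back.
BELT_LEN = 8

def run(program):
    belt = []      # front of the belt is index 0
    labels = {}    # label -> current belt position of the named value

    def push(value, label=None):
        belt.insert(0, value)
        for k in list(labels):          # every older value ages one position
            labels[k] += 1
            if labels[k] >= BELT_LEN:
                del labels[k]           # its value fell off the belt
        if label:
            labels[label] = 0
        del belt[BELT_LEN:]

    def ref(r):
        return belt[labels[r] if isinstance(r, str) else r]

    for ins in program:
        op, args = ins[0], list(ins[1])
        out = ins[2] if len(ins) > 2 else None
        if op == "lit":
            push(args[0], out)
        else:                           # two-operand ops: add, sub
            vals, nxt = [], 0
            for i in range(2):          # unnamed operands default to the
                if i < len(args):       # front of the belt, in order
                    vals.append(ref(args[i]))
                else:
                    vals.append(belt[nxt]); nxt += 1
            a, b = vals
            push(a + b if op == "add" else a - b, out)
    return belt

# The example from the post: 1 and 3 dropped, their sum labeled x, then 4 add x.
result = run([("lit", [1]), ("lit", [3]), ("add", [], "x"),
              ("lit", [4]), ("add", ["x"])])
print(result[0])   # front of the belt holds 8 (= 4 + x)
```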
- David (Participant), January 6, 2015 at 11:39 pm
I shudder to think of what call/cc would require on the Mill, if the built-in stack frame architecture is used. Would it require some sort of privilege escalation to introspect and shuffle the stack around? The scratchpad and potentially the spiller would need to be involved, too.
- David (Participant), January 6, 2015 at 11:53 am
Ok, since the EBNF wasn’t completely syntactically correct, I assumed it was just a straight hand-edited wiki page.
To quote Ivan, “Have you considered direct interpretation of genAsm?” 🙂
It sounds like the official system would specialize to conAsm, and the emulator would run a conAsm with useful cycle tracking and such for a particular Mill family member spec. I think there’d be some value in an externally sourced, straight genAsm interpreter (actually, given that I’m working in Lisp, I’d convert it to Lisp code to get native machine speed). Since it would ignore many CPU specifics, it might run faster and be able to iterate features quicker.
Basically, it would be for testing your genAsm conceptually, while genAsm->conAsm->CPU-emulator would test how a Mill runs your code.
Plus, wouldn’t a MillForth emulator be written in genAsm? Need to run that somewhere, and the official chain isn’t out yet. One without all the exacting details nailed down would be a good substrate for getting the basic design decisions implemented (assuming the genAsm VM is correct in what it runs).
- David (Participant), January 6, 2015 at 11:03 am
“Have you considered direct interpretation of genAsm?”
I’m considering it. I’ve updated the wiki page on genAsm to fix some EBNF syntactic issues, and have the spec itself parsing correctly now.
Are there any public snippets of genAsm to test against? I know it’s in flux, but that’s why I’d execute from the specification directly, to absorb changes.
- David (Participant), October 18, 2014 at 12:59 am
Beyond object method dispatch, lambda-passing style would be another source of unstable transfer points. Custom lambdas passed into central utilities are often per-use closures. On the upside, the actual function that will be called is hoisted all the way up to an input parameter, which again would be an ideal spot for software to inject “upcoming/next EBB” hints into the prefetcher.
For context, how many cycles ahead of an exit would the exit table need to be notified in order to prefetch code from cache to execute without any stalls? I suspect this would vary between family members. If it’s under 10, I would imagine software hints could eliminate stalls for many truly unstable dispatched exits.
I agree that DRAM latency isn’t worth considering in these optimization scenarios. However, if the 5-cycle mispredict penalties are a concern, the fact remains that the absolute correct target for fully dynamic dispatch should be available in the software view far enough ahead of the call in enough situations to be beneficial to the hardware mechanism. The problem is communicating it from software into hardware.
The Mill has software-controlled prefetching of data via speculation, but not software-controlled prefetching of code (that we’ve seen). If the hardware predictor consistently fails for a dispatched use case, there’s no other way to update or augment its functionality.
Having a compiler decide between whether to generate preemptive dispatch hints vs letting the predictor attempt to compensate would probably best be left to runtime log feedback, and might not be used by all compilers. But not having that option at all seems to me to be missing functionality that manifests in penalizing dynamic code.
(Obviously, hopping through multiple layers of dynamic indirection quickly would likely cause stalls no matter the prefetch mechanism, but most dynamic designs boil down to just one indirection per function call.)
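A back-of-envelope model of the trade-off above (the notice figure is an assumption; only the 5-cycle penalty comes from the talk): a software hint pays off whenever the target is hoisted at least as many cycles ahead as the prefetcher needs notice, in the cases where the hardware predictor would have missed.

```python
# Toy stall model: if the dispatch target is known H cycles before the
# call, and the prefetcher needs N cycles of notice, a software hint
# avoids the mispredict penalty whenever H >= N and the predictor misses.
MISPREDICT_PENALTY = 5   # cycles, the figure quoted above

def stall_cycles(hoist_distance, notice_needed, predictor_hits):
    if predictor_hits:
        return 0                   # hardware predicted correctly anyway
    if hoist_distance >= notice_needed:
        return 0                   # hint arrived in time: no stall
    return MISPREDICT_PENALTY      # hint too late; eat the penalty

# A per-use closure hoisted 12 cycles out, prefetcher needing 10 cycles:
print(stall_cycles(12, 10, predictor_hits=False))   # 0
print(stall_cycles(3, 10, predictor_hits=False))    # 5
```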
- David (Participant), June 20, 2014 at 4:53 pm
We’re going to be rewriting the AOT and JIT compilers for the next version of our declarative programming server, so user code feeding the specializer is of particular interest to me, even if just for forward compatibility thinking.
From the standpoint of a compiler backend, yes, it is relatively equivalent to convert between graphs and LLVM-style virtual instructions. For compiler writers, it sounds like graph generation will be the expectation on the Mill? Thinking about it, that could well be a bit easier than instructions…