Forum Replies Created
Changes in the divide instructions — now native on Silver?
Greetings all,
According to the wiki: http://millcomputing.com/wiki/Instruction_Set/divu
Divide operations are apparently no longer emulated on every member via loops around cheaper helper operations. That is an interesting departure from the earlier strategy of keeping divide out of hardware and emulating it by looping around a (less hardware-costly) helper.
It seemed to me that this might happen at some point, for some Mill core(s), but I expected it wasn’t on the critical path, so I’m curious why change it now?
In some ways, this reminds me of what happened with the ARM7 series of chips. The base ARM7 lacked hardware divide, and although there were software emulations, I think ARM licensed far more ARM7s with hardware divide than without. I suspect system designers were worried about worst-case soft-divide performance, should the software folks insist on writing divide-heavy code. (Shame on them!)
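For concreteness, here is the usual shape of such a helper: a rough shift-and-subtract sketch in plain C (my own illustration, not anything taken from the Mill toolchain):

    #include <stdint.h>

    /* Generic restoring (shift-and-subtract) unsigned divide -- the typical
     * shape of a software helper on cores that lack a hardware divider.
     * Returns the quotient; *rem receives the remainder.  Caller must
     * guarantee d != 0. */
    static uint32_t soft_divu(uint32_t n, uint32_t d, uint32_t *rem)
    {
        uint32_t q = 0, r = 0;
        for (int i = 31; i >= 0; i--) {
            r = (r << 1) | ((n >> i) & 1);  /* bring down next dividend bit */
            if (r >= d) {                   /* trial subtraction succeeds?  */
                r -= d;
                q |= (1u << i);
            }
        }
        *rem = r;
        return q;
    }

The bit-per-iteration loop is exactly why worst-case latency is the worry: roughly one iteration per result bit, unless the helper adds early-out checks.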
Thoughts?
- in reply to: Status of LLVM compiler toolchain #1716
Will,
Thanks for the info about a talk just a few weeks away. Such a talk, especially a keynote address at the LLVM conference, strongly implies some good news coming! :-)
I hope you folks can get materials from that talk online ASAP! Personally, I’d be happy to see the slides sooner by themselves, rather than having to wait for an edited video. Will Millcomputing be getting audio/video recording(s) of Ivan’s keynote? (Hope so!)
- in reply to: Status of LLVM compiler toolchain #1714
Will, Ivan, et al:
Could you (if the IP issues permit) give us any more detail on the LLVM-for-Mill port and its status? In at least one of his presentations, Ivan indicated that “major surgery” was needed to make LLVM compile Mill code, because LLVM assumed a register-machine target, which the Mill is not. Will’s statement suggests the port is well in hand, but says nothing about how, or what you had to change, to accomplish that. If that “major surgery” succeeded (or was handled another way), telling the community about that significant success would IMHO boost community confidence, and probably do the same for prospective investors.
Of course, a talk on the toolchain — preferably including a demo of the compiler, specializer and simulator — would be fantastic! — though I know you guys are very busy. IMHO, the only thing better would be an alpha release of the toolchain and simulator! Please, tell us what you safely can, as soon as you can.
Thanks,
Larry
A subtle difference between hardware-native operations and emulation sub-graphs? (Re Mill Boolean predicate gangs)
According to the presentation on execution, exu-side/op-phase operations can pass condition codes to the next higher exu slot — which can then turn those codes into a Boolean result if so programmed.
If some Mill operation (let’s call it foo) is not native on a member, then foo will have to be emulated on that member by a sub-graph (which may be a call to an emulating function or an “inlined” sequence of operations). However, since this emulated foo won’t have an exu slot of its own, I don’t see an obvious way for it to mimic a hardware-native foo’s ability to pass a condition-code set “horizontally” to another functional unit as a predicate gang.
How does the Mill architecture and tool-chain handle this apparent difference between native and emulated operations?
Thanks in advance!
- in reply to: Simulation #1634
Ivan,
Are there any instructions that ask for/demand a pre-fetch of code from a *different* ebb?
Since the Forth data stack will almost certainly be in memory, the interpreter will likely often spend time loading operands from the data stack onto its own belt, before calling functions that implement Forth primitives. So if there’s a way to get code (e.g. for an upcoming primitive) fetched ahead of time, that would help speed up the interpreter loop considerably.
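To make the question concrete, here is a hedged sketch (ordinary C, direct-threaded style; the names and the prefetch placeholder are my own invention, not Mill operations) of the inner loop I have in mind. The point is that the address of the following primitive is already sitting in memory before the current one runs, so that is the natural spot for a code-prefetch hint, if one exists:

    typedef void (*prim_fn)(void);   /* one C function per Forth primitive  */

    static prim_fn *ip;              /* interpreter pointer into threaded   */
                                     /* code; set before entering the loop  */

    /* Direct-threaded inner loop.  Because a prefetch would only be a hint,
     * a branch primitive that rewrites ip merely wastes the hint rather
     * than breaking anything. */
    static void inner_interpreter(void)
    {
        for (;;) {
            prim_fn current = *ip++;
            /* hypothetical_code_prefetch(*ip);  <-- placeholder, not a real op */
            current();               /* may consume cells or rewrite ip     */
        }
    }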
For a first-ever Mill interpreter, I’m more concerned with getting something to work correctly (with understandable code — a challenge in assembler!) than I am with optimal efficiency. That said, fast execution is certainly a plus.
Thanks,
Larry
- in reply to: Simulation #1623
David,
Good to hear of another person interested in Forth on the Mill.
I started three wiki pages about “Mill Forth.” See:
http://millcomputing.com/wiki/Preliminary_Design_for_Mill-Forth
http://millcomputing.com/wiki/How_best_to_map_the_core_components_of_Forth_onto_the_Mill%3F
Please take a look and feel free to add your ideas, perspectives and useful links. Same for anybody else reading. If you’re logged into the forums, you should also be logged into the wiki.
The idea of using the belt and scratchpad as Forth’s data stack is attractive for speed reasons. However, I don’t think that’s the right approach for an initial version, since the belt doesn’t behave like Forth’s LIFO data stack. And while I’d like to let the Mill’s call/return machinery serve as Forth’s return stack, I don’t think that will work well either, because the inner interpreter needs read/write access to that stack for more than just call/return matters, including for the common DO … LOOP constructs. Breaking the interpreter’s ability to manipulate the return stack would IMHO make the result too far from Forth.
Overall, I agree that the only people likely to be porting much existing Forth code to the Mill are those of us interested in Mill Forth. Still, my instinct is to design something that’s recognizably Forth, for the first iteration. A belt-oriented interpreted language (or a type-aware Forth) are interesting notions, but I’d like to separate those from getting a Forth interpreter running on the Mill (initially under sim.)
So, my instinct is to go for a fairly vanilla port of Forth for the first iteration, with both stacks in memory. Doing so doesn’t preclude using the Mill’s stack pointer, though the automatic stack cut-back on a true function return might make that problematic; if the inner interpreter never returns, we might be able to use it that way after all.
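To pin down what “fairly vanilla, both stacks in memory” means to me, here is a minimal sketch in C (all names and sizes are mine, and the LOOP semantics are simplified). It also shows why the interpreter needs ordinary read/write access to the return stack: DO … LOOP keeps its index and limit there.

    #include <stdint.h>

    typedef intptr_t cell;                 /* Forth cell: integer or pointer  */

    #define DSTACK_CELLS 256
    #define RSTACK_CELLS 256

    static cell dstack[DSTACK_CELLS];      /* data stack, plain memory        */
    static cell rstack[RSTACK_CELLS];      /* return stack, also plain memory */
    static cell *sp = dstack;              /* both grow upward for simplicity */
    static cell *rp = rstack;

    static void push(cell x)  { *sp++ = x; }
    static cell pop(void)     { return *--sp; }
    static void rpush(cell x) { *rp++ = x; }
    static cell rpop(void)    { return *--rp; }

    /* DO ... LOOP keeps limit and index on the return stack, so the
     * interpreter must read and write rstack freely -- which is why I
     * don't want to borrow the Mill's call/return machinery for it.    */
    static void prim_do(void)              /* ( limit index -- )             */
    {
        cell index = pop(), limit = pop();
        rpush(limit);
        rpush(index);
    }

    static int prim_loop(void)             /* returns 1 to branch back       */
    {
        cell index = rpop() + 1;
        cell limit = rpop();
        if (index < limit) {               /* simplified LOOP semantics      */
            rpush(limit);
            rpush(index);
            return 1;
        }
        return 0;
    }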
Happy New Year,
Larry
- in reply to: Simulation #1619
Thanks to Will, we now have a place on the Wiki for community ideas. If you’re logged into the forums, you can add your comments and ideas there. Feel free to chime in!
Under this category on the top-level wiki page, I’ve started the http://millcomputing.com/wiki/Software page. (Puzzlingly, the “Software” link under the category “Community” is still appearing in red, despite the fact that it’s not empty. I don’t know why.)
Under that, I’ve started a page about Forth or similar interpreters for the Mill; see
http://millcomputing.com/wiki/Forth_for_the_Mill_and/or_Forth-like_variants,_better_suited_to_the_Mill
From what I can tell, if you’re logged into the forums, you are also logged into the wiki and can edit. I’m confining my edits to the discussion/talk tab for everything except those pages under the Community column on the top-level wiki page.
- in reply to: Simulation #1618
Will,
I don’t know when that Ertl paper was/wasn’t paywalled. It turned up in a Google search I did today; I recognized the author and took a look. I hope it stays available.
GCC computed gotos? Not the infamous “computed come from” statement? :-)
- in reply to: Simulation #1615
Greetings all,
Those of you interested in an interpreter/interpreted language for the Mill may enjoy reading the following paper, available (as of today) free online:
“The Structure and Performance of Efficient Interpreters” by M. Anton Ertl and David Gregg. Here’s the link: http://www.jilp.org/vol5/v5paper12.pdf
One thing this paper focuses on is that on many CPUs, branch prediction for indirect branches (e.g. via function pointers) is often much poorer than branch prediction for direct branches (like simple if/then/else constructs). I suppose I should re-watch the presentation on the Mill’s prediction mechanism, to see what’s revealed about how the Mill handles indirect branches. This issue matters much less for compiled code than for interpreted code, but apparently it still comes into play in run-time method dispatch, e.g. in polymorphic objects.
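To make the kind of indirect branch the paper measures concrete, here is a tiny dispatch loop written both ways (plain C; the opcode set is invented for illustration). The computed-goto form is a GCC/Clang extension; the point is that it gives each primitive its own indirect jump, which helps the predictor on machines whose indirect-branch prediction is keyed by branch address:

    /* Tiny bytecode machine, two dispatch styles.                      */
    enum { OP_INC, OP_DEC, OP_HALT };

    /* 1. switch dispatch: every opcode funnels through one shared
     *    indirect branch at the top of the loop.                       */
    static long run_switch(const unsigned char *pc)
    {
        long acc = 0;
        for (;;) {
            switch (*pc++) {
            case OP_INC:  acc++; break;
            case OP_DEC:  acc--; break;
            case OP_HALT: return acc;
            }
        }
    }

    /* 2. computed-goto dispatch (GCC/Clang extension): each handler
     *    ends with its own goto *..., so each gets its own slot in a
     *    per-branch indirect predictor.                                */
    static long run_goto(const unsigned char *pc)
    {
        static void *labels[] = { &&do_inc, &&do_dec, &&do_halt };
        long acc = 0;
        goto *labels[*pc++];
    do_inc:  acc++; goto *labels[*pc++];
    do_dec:  acc--; goto *labels[*pc++];
    do_halt: return acc;
    }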
I note that Yale Patt’s work is cited several times in this paper and I recall reading other interesting papers by Ertl.
Enjoy!
- in reply to: Introduction to the Mill CPU Programming Model #1612
David,
Endianness: Internally little-endian, your choice for memory.
From one of Ivan’s posts http://millcomputing.com/topic/metadata/#post-318
- in reply to: Simulation #1706
Ivan wrote:
We did decide that we would release, and support, alpha-level tooling on a “use at your own risk” basis. A minority felt that we should wait for commercial-grade for fear that alpha-grade would forever paint us with “broken and unusable”, but the majority felt that we should not try to hide our rough-shod nature.
I’m glad that some Mill software will be released in some fashion before reaching commercial grade. I’m happy to support any NDA or other terms/guidelines to make such an alpha-grade release help — rather than hinder — bringing the Mill architecture to market.
It seems that the genAsm assembler, the specializer and a corresponding Mill simulator would be the first pieces of the toolchain available, so I’m focused on what we can do with those tools. Forth (or an interpreted language in that vein) seems doable with that partial Mill toolchain. I don’t know enough about functional languages, though this discussion has prompted me to learn a little about them.
Renesac’s questions lead me to ask two more:
1. Of the seven phases identified in the Wiki http://millcomputing.com/wiki/Phasing , which ones *require* a clock cycle and which do not? Based on Ivan’s most recent comment and the Wiki entry, my best guess is that pick, conform and transfer do not, of themselves, require another clock cycle. Ivan & Co., please correct me if I’ve misread things.
2. In the execution talk and in the Wiki under phasing, we have diagrams of the reader/op/writer phases happening in steady state — but only for that case. What I’d like is some explanation of what happens when adjacent instructions do not all use the same phases. In such cases, are there essentially bubbles in the pipeline?
IMHO, a diagram would be very good for illustrating this case and hopefully explaining how the Mill executes when adjacent instructions use different phases from one another.
Side note:
The presentation on instruction decoding covers details most relevant to exu-side decoding, but the flow-side encoding is substantially different in terms of how the header info and blocks are used. So the Wiki entry http://millcomputing.com/wiki/Decode (specifically the section titled “Flow Stream”) is good reading for understanding what Ivan wrote about how and why the con operation is encoded as it is.
The linked article mentions the Mill CPU architecture just once, in a sentence so vague that I cannot honestly tell what the author believes to be true about the Mill:
BTW, this is a major reason I’m skeptical of the Mill architecture. Putting aside arguments about whether or not they’ll live up to their performance claims and that every chip startup I can think of failed to hit their power/performance targets, being technically excellent isn’t, in and of itself, a business model.
The first word of the above quote is such a vague reference (the preceding paragraphs were about memory barriers and what is or isn’t guaranteed on multiprocessor systems) that I can’t work out what the author is trying to say about the Mill. Has anyone else read the linked article and better understood what the author was trying to convey?
- in reply to: Simulation #1630
Ivan,
Can genAsm’s go construct do backwards branches (that is, to a label earlier in the same function)?
If so, then that’s sufficient for writing the endless loop of an interpreter (though I’ll welcome the syntactic sugar of better looping constructs.)
From what I can tell, memory operations and statically allocated memory are the key other pieces needed to write a (very) basic Forth interpreter, e.g. one that starts out handling only scalar integers and pointers.
- in reply to: Simulation #1625
David,
You wrote:
I’ve updated the wiki page on genAsm to fix some EBNF syntactic issues, and have the spec itself parsing correctly now.
Please note that most of the wiki pages under Architecture or Infrastructure (except for the talk/discussion tabs) are periodically overwritten from non-wiki sources via the “generator,” so your changes are likely to get clobbered. I suggest you send Ivan your proposed changes and let him update the source upstream of the wiki.
Offhand, I don’t see much benefit in interpreting genAsm, since to get anywhere we’ll already have access to compiled genAsm. (Though feel free to convince me otherwise.) I’d rather write a more user-friendly interpreted language, like Forth.