Forum Replies Created
- in reply to: Tooling update #1857
Nothing is available to the public yet. Internally the specification, conAsm, and sim have been stable for a while, while the rest of the tool chain is struggling as we convert our whole development environment to LLVM-related tools: CVS to git, etc.
The gating event for public release will actually be the patents; we can’t put out a sim that shows how things work until the filings are in for those workings. And the tool chain without the sim is pretty useless. We had hoped to have it all done and ready for the public already – but you know how development goes, and it’s doubly true for a bootstrap.
Please be patient. We are impatient enough ourselves for everyone 🙂
- in reply to: Co-Exist and Merge: Not Supplant. #1850
Co-exist? It’s not clear one would want to 🙂 Binary translation is pretty good these days, and the Mill makes a pretty good ISA interpreter. We expect to ship an x86 translator/interpreter with the Mill. We need it because some I/O devices we will want to support come with on-board x86 code.
Code aside, there’s a lot of levels of “co-existence” between any pair of architectures. For example, what do you do if the cores have different endianness and share address space?
Specific to the Mill: endianness aside, the Mill is reasonably data-compatible with any modern architecture, although the reverse is not necessarily true. Pain points might include the Mill’s support for IEEE decimal floating point (unless the other core were an IBM z-series) and for quad (16-byte) scalars in both integer and floating point. However, if the app doesn’t use those types then alien cores should work OK when exchanging data with a Mill. Going the other way, the Mill won’t support the 36-, 48-, and 80-bit data types seen in various antiquated architectures, at least without software conversion.
If there is any problem, it will be in the memory hierarchy and the protection model when a Mill and another core are on the same chip and share address space. The Mill separates protection from paging and protects at byte granularity. Consequently it would not be possible for a Mill and an ARM (say) to safely share a 17-byte buffer, while two Mills can do so. The Mill also supports backless memory that has no allocated pages, which is impossible on a core where paging is in front of the caches. Consequently any memory shared between Mill and non-Mill would have to have real DRAM allocated for it. All these issues can be handled in software, but would have to be addressed.
Cache coherency issues cannot be handled in software for performance reasons, and the Mill coherency protocol is vastly simpler than and incompatible with the usual MOESI. Mill coherence is NYF, but I think hardware could force the Mill to use MOESI, although the performance hit would be painful; we’d be as slow as an x86.
A similarly painful hardware problem might occur with concurrency control. The Mill uses optimistic concurrency control, and should have no problem working with cores that also do so: PowerPC, M68k, z-series. However, cores that use bus locking for concurrency might have trouble. At a guess, you’d probably have assertion of a bus lock cause a bust of any in-flight Mill transaction. Going the other way, I suppose you could have the bus locked for the duration of an active transaction. Both of these would have a lot of spurious interference; if that were enough of a problem then you’d need more hardware smarts to do the integration.
Now your question addressed several chips on a board whereas my reply mostly addresses several cores on a chip. At the board level the only problem is data compatibility, which endianness aside should not be a problem. The Mill doesn’t share address space off chip, so all the hierarchy and protection issues are obviated.
Reading between the lines of your posting, it seems as if you expect that the only solution to co-existence is to put an actual ARM or x86 core on a Mill chip. That may once have been true, but is no longer. Software translation is within a factor of two of native these days, and with a factor of ten to play with I doubt that we would ever put an alien native core on a Mill chip. Of course, a motivated customer could change my mind 🙂
- in reply to: Granting, Revoking, and Collusion #1847
It is not possible to unilaterally grant a global right; that would permit runaway grants as a DOS attack. Instead the grant is made locally (as a transient grant as part of a portal call), with an attached right that lets the grantee persist the grant for itself.
This (and much more about granting) will be in one of our next two talks; not sure which one at this point.
- in reply to: Virtual Mills and cloud servers #1844
We haven’t put much thought into virtual machines on the Mill, in part because it would appear relatively easy. The Mill has no reserved ops and no supervisor state; all protection is via memory accessibility. Consequently it would seem that VMs arise naturally by replicating the All address space, which is easily done by tacking an ASID onto the caches and some specRegs.
Of course, it’s the things that seem simple that are often the most trouble, and I would not be surprised to find that Mill VM is not that easy. But right now we are focused on getting plain vanilla cores to work; VM is on the roadmap, but not an immediate concern.
- in reply to: C semantics and the Mill #1755
We have thought a bit about how some Mill facilities could be reached using language extensions. We will of course supply an intrinsics library for all the Mill opset, but intrinsics are a poor medium for coding.
The Mill ability to return multiple results can be expressed in C++ using tuples, which may in time be adopted by C in some form. However, I personally find the C++ tuple syntax singularly unsatisfying. An alternative is to admit pass-by-copy for arguments, as exists in Ada and other languages in which function arguments may be marked as in, out, or inout. It would not be difficult for a compiler to implement out and inout arguments using multi-return. The notation should be much more convenient to use than picking tuples apart, while the execution should be more performant (and much safer) than using pointer or reference arguments.
Will mentions the Mill overflow behaviors. These are most naturally brought into C as type qualifiers: “__saturating short a” and the like. However, there would need to be promotion rules and extensions to the type lattice to deal with mixed arithmetic.
There’s one thing you suggest that we cannot do, though. While the Mill operations are polymorphic at issue, they are not so at retire: the latency of (most) operations varies with the actual widths of the operands. Consequently it is not in general possible to write a function that is polymorphic over different types. Well, it can be done, sometimes, if you know exactly what you are doing – but the code will break on the next Mill member with a different timing matrix.
We do intend to extend our own C compiler (and others) to better support Millisms, and will put in proposals to the language committees for the extensions. Thereafter it’s up to the language mavens 🙂
- in reply to: Awesome IT recordings? #1749
Not recorded, but a repeat in the US will be.
- in reply to: Prediction #1855
It can’t – but then it doesn’t have to. If the code is in ROM then the app must be expected to run forever, so there is no reason to retain state for the next time it runs.
- in reply to: Co-Exist and Merge: Not Supplant. #1852
Hardware only gets involved in physical memory allocation when realizing backless memory into one-line pages. No other system has one-line pages, nor backless memory, so any memory to be used by the alien core (or chip) must already have been allocated, and at the alien’s full page size. Such pages are backed, not backless, and so the Mill backless-support hardware would not be invoked.
Consequently, if an x86 core tried to access a Mill backless page the x86 would take a page trap, and the software would define a backing page and back the Mill caches with it rather than use the backless mechanism. When the page fault returned, the two cores could share cache using the coherence mechanism and share DRAM using normal TLB entries in both. If a Mill core tried to access an x86 page then coherence would permit cache sharing up until a Mill load missed in the LLC or a store was evicted from the LLC. At that point the Mill would take a TLB miss, so the TLB entry would have to be marked as backed by the right physical page. But that entry is either in DRAM or in coherent cache, so the only requirement is for an x86 allocation to clear any existing backless TLB entry from the Mill TLB.
Consequently I think the only problem in the two-cores-on-a-chip case is coherency. The two-chips-on-a-board problem is easy because the Mill does not extend address space off chip. DRAM allocation thus becomes yet another transaction between independent agents, and the wire protocol has to support that anyway.
I think 🙂
It was given yesterday, and post-production usually is a couple of weeks.
Essentially all of the LLVM talk, except very explicit LLVM issues, will be covered in our upcoming June 10 talk (see the “Events” list), which will be videoed and posted here as usual. Please be patient 🙂
It was recorded by the LLVM folks, although I don’t know if they have made it available. It was not recorded by us.
Probably not.
Directly simulating a general-register machine requires a way to preserve updateable state that in the target would be in registers. The only updateable state on a Mill is memory, so performance would be abysmal. Then there would be problems providing the x86 memory semantics, which are weaker than a Mill’s.
But more to the point: binary translation has gotten pretty good these days, so there seems little reason to directly interpret any other chip’s native instruction set. We expect to include a (verrry slow) interpreter for use with device ROMs that contain x86 code when the device is needed by the BIOS. Or maybe we can avoid the problem some other way; hard to tell until we get further along.
Hardware div units are expensive and don’t pipeline well. On a microcoded OOO box that doesn’t matter much, but the Mill doesn’t microcode and is in-order. Other machines also compute divide in macrocode by refining a seed value, so we knew it was possible and would fit with the rest of the machine regardless of the algorithm, so we left the details to arm-waving until later. The emulation codes are like that in general: obviously software emulation of e.g. floating point is possible, but it’s a rat’s nest that needs a specialist, so the details were deferred until we had a specialist.
We now have those specialists (Terje and Allen, and now Larry), so the arm-waving is being turned into code. Right now the bottleneck is genAsm. Speaking of which, I better get back to work 🙂
- in reply to: C semantics and the Mill #1757
NYF; sorry 🙁
You have permission, and can conduct your business here or on the Wiki as seems best to you.
In addition, you may propose novel helper ops beyond rdevu if you find the need, but please send the proposal for such an op to me directly first for vetting before putting it in the public space.