Mill Computing, Inc. Forums The Mill Implementation Co-Exist and Merge: Not Supplant.

  • Krinkleneck
    Participant
    Post count: 1
    #1849

    Maybe it’s because I haven’t lurked enough on these forums, but I can’t find anywhere any suggestions that this architecture could co-exist with x86, ARM, or any other platform. From what I have seen and read on this architecture there seems to be an all or none scenario.

    My question is: are there any plans to use this as a co-processor on a board that already uses a more mainstream processor such as x86-64, ARM, or, less likely but still kicking around, IA-64?
    We already do this with graphics architectures, and up until recently we’ve put up with extraneous calculations being pushed off onto a separate chip for engineering- and calculation-intensive programs.

    My main area of study is mobile architecture and maintainability of older architectures. I would love to see a standalone version of this architecture. But outside of programs whose code is maintained to standards that a compiler can be written around, it’s going to be difficult to get any traction without allowing continued use of older architectures. Even if a program stays within the standards of its language, there will be bugs that propagate through.

    x86 alone represents 30+ years of usable code. Since we moved to x86_64 we’ve lost countless troves of usable programs to compatibility issues, hardware bugs, and programmer irresponsibility. Are we going to repeat this problem with the Mill architecture? Intel, however weakly, has already begun to move into ARM’s space more effectively than the reverse. I feel this is something that should be thought about from the get-go. Is the plan to coexist and absorb, or are we just going to try to supplant?

    Can someone give me a bit more insight into what is going on here?

  • Ivan Godard
    Keymaster
    Post count: 473

    Co-exist? It’s not clear one would want to 🙂 Binary translation is pretty good these days, and the Mill makes a pretty good ISA interpreter. We expect to ship an x86 translator/interpreter with the Mill. We need it because some I/O devices we will want to support come with on-board x86 code.

    Code aside, there’s a lot of levels of “co-existence” between any pair of architectures. For example, what do you do if the cores have different endianness and share address space?

    Specific to the Mill, endianness aside the Mill is reasonably data-compatible with any modern architecture, although the reverse is not necessarily true. Pain points might include the Mill support for IEEE decimal floating point (unless the other core was an IBM z-series), and the Mill’s support for quad (16 byte) scalar in both integer and floating point. However, if the app doesn’t use those types then alien cores should work OK when exchanging data with a Mill. Going the other way, the Mill won’t support 36-, 48-, and 80-bit data types as seen in various antiquated architectures, at least without software conversion.
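    To make the decimal floating point pain point concrete: a core with only binary IEEE floating point must emulate decimal arithmetic in software to reproduce the answers a Mill (or an IBM z-series) would get in hardware. A minimal Python sketch, using the standard `decimal` module purely as a software stand-in for hardware decimal support:

```python
from decimal import Decimal

# Binary IEEE 754 floating point (what most alien cores provide in hardware):
# 0.1 and 0.2 have no exact binary representation, so the sum is inexact.
binary_sum = 0.1 + 0.2

# IEEE 754 decimal arithmetic, emulated in software here via Python's
# decimal module: the sum is exactly 0.3.
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(binary_sum == 0.3)              # False
print(decimal_sum == Decimal("0.3"))  # True
```

    A core without hardware decimal support can still exchange such data with a Mill, but only through a software conversion layer like the one sketched above.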

    If there is any problem, it will be in the memory hierarchy and the protection model when a Mill and another core are on the same chip and share address space. The Mill separates protection from paging and protects at byte granularity. Consequently it would not be possible for a Mill and an ARM (say) to safely share a 17-byte buffer, while two Mills can do so. The Mill also supports backless memory that has no allocated pages, which is impossible on a core where paging is in front of the caches. Consequently any memory shared between Mill and non-Mill would have to have real DRAM allocated for it. All these issues can be handled in software, but would have to be addressed.
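    To put a number on the 17-byte-buffer example: with page-granular protection you cannot expose just the buffer, you must expose every page it touches. A back-of-the-envelope Python sketch, assuming a typical 4 KiB page size:

```python
PAGE = 4096  # assumed page size on a page-granular core such as ARM or x86

def exposed_bytes(addr, length, granule):
    """Bytes that must be made accessible to share [addr, addr+length)
    when protection can only be applied in units of `granule`."""
    start = (addr // granule) * granule            # round start down
    end = -(-(addr + length) // granule) * granule  # round end up
    return end - start

# Sharing a 17-byte buffer at an arbitrary address:
print(exposed_bytes(0x1008, 17, PAGE))  # 4096: the whole enclosing page leaks
print(exposed_bytes(0x1008, 17, 1))     # 17: byte granularity, as on the Mill
```

    Everything else on the shared page becomes visible to the alien core; byte-granular protection exposes exactly the 17 bytes intended.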

    Cache coherency issues cannot be handled in software for performance reasons, and the Mill coherency protocol is vastly simpler than and incompatible with the usual MOESI. Mill coherence is NYF, but I think hardware could force the Mill to use MOESI, although the performance hit would be painful; we’d be as slow as an x86.

    A similarly painful hardware problem might occur with concurrency control. The Mill uses optimistic concurrency control, and should have no problem working with cores that also do so: PowerPC, M68k, z-series. However, cores that use bus locking for concurrency might have trouble. At a guess, you’d probably have assertion of a bus lock cause a bust of any in-flight Mill transaction. Going the other way, I suppose you could have the bus locked for the duration of an active transaction. Both of these would have a lot of spurious interference; if that was enough of a problem then you’d need more hardware smarts to do the integration.
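    The optimistic retry pattern being contrasted with bus locking can be sketched in software. The toy `Cell` class below is not Mill hardware, just a stand-in: `cas` models an atomic compare-and-swap, and the increment loop retries on interference rather than locking anything globally:

```python
import threading

class Cell:
    """A toy word of memory with a compare-and-swap primitive, standing in
    for hardware optimistic-concurrency support."""
    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()  # models only the atomicity of CAS itself

    def load(self):
        return self._value

    def cas(self, expected, new):
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False  # another agent interfered: the attempt is "busted"

def optimistic_increment(cell):
    # Retry until our update lands, instead of locking the bus up front.
    while True:
        old = cell.load()
        if cell.cas(old, old + 1):
            return

cell = Cell()
threads = [threading.Thread(
               target=lambda: [optimistic_increment(cell) for _ in range(1000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(cell.load())  # 4000
```

    Under heavy contention the retry loop wastes work (the "spurious interference" mentioned above), but uncontended updates never serialize the whole bus.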

    Now your question addressed several chips on a board whereas my reply mostly addresses several cores on a chip. At the board level the only problem is data compatibility, which endianness aside should not be a problem. The Mill doesn’t share address space off chip, so all the hierarchy and protection issues are obviated.

    Reading between the lines of your posting, it seems as if you expect that the only solution to co-existence is to put an actual ARM or x86 core on a Mill chip. That may once have been true, but is no longer. Software translation is within a factor of two of native these days, and with a factor of ten to play with I doubt that we would ever put an alien native core on a Mill chip. Of course, a motivated customer could change my mind 🙂

    • Paul A. Clayton
      Participant
      Post count: 2

      If there is any problem, it will be in the memory hierarchy and the protection model when a Mill and another core are on the same chip and share address space. The Mill separates protection from paging and protects at byte granularity. Consequently it would not be possible for a Mill and an ARM (say) to safely share a 17-byte buffer, while two Mills can do so. The Mill also supports backless memory that has no allocated pages, which is impossible on a core where paging is in front of the caches. Consequently any memory shared between Mill and non-Mill would have to have real DRAM allocated for it. All these issues can be handled in software, but would have to be addressed.

      While not an ideal solution, an approximation of fine-grained protection could be implemented on the Mill side by only communicating data that is readable by the alien architecture and ignoring/filtering out writes. Given that the alien architecture would not communicate what protection domain (“turf”) is accessing the memory, such protection would have to be system-wide.

      Likewise backless memory is not a problem because the alien architecture’s memory system could be given shadow page addresses which it would treat as physical addresses while the Mill component (and memory controllers) would treat as virtual addresses. (“All problems in computer science can be solved by another level of indirection.”)
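      The extra level of indirection described here can be sketched as a simple lookup table: the alien core hands out what it believes are physical page frames, and those are remapped to Mill virtual pages before reaching memory. The addresses and 4 KiB page size below are made up for illustration:

```python
PAGE = 4096  # assumed page size

# Shadow mapping: page addresses the alien core treats as physical frames,
# which the Mill side (and memory controllers) reinterpret as virtual pages.
# (Names and addresses are illustrative, not from any Mill specification.)
shadow_to_virtual = {
    0x0010_0000: 0x7f00_0000,
    0x0010_1000: 0x7f00_1000,
}

def translate(alien_phys_addr):
    """Remap an alien 'physical' address to the Mill-side virtual address."""
    page = alien_phys_addr & ~(PAGE - 1)
    offset = alien_phys_addr & (PAGE - 1)
    return shadow_to_virtual[page] + offset

print(hex(translate(0x0010_0123)))  # 0x7f000123
```

      Because the alien side only ever sees the shadow addresses, backless Mill pages need no real DRAM until the alien actually touches them.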

      A similarly painful hardware problem might occur with concurrency control. The Mill uses optimistic concurrency control, and should have no problem working with cores that also do so: PowerPC, M68k, z-series. However, cores that use bus locking for concurrency might have trouble. At a guess, you’d probably have assertion of a bus lock cause a bust of any in-flight Mill transaction. Going the other way, I suppose you could have the bus locked for the duration of an active transaction. Both of these would have a lot of spurious interference; if that was enough of a problem then you’d need more hardware smarts to do the integration.

      Implementing a conceptual bus lock as an actual bus lock seems quite suboptimal (substantially limiting parallelism). (“As if” is an important concept.)

      For the Mill, I suspect that alien ISAs (other than different Mill specializations) on the same chip would be limited to more specialized processing, or to uses with extremely low power, performance, and area. In neither case is an SMP communication model likely to be used.

      • Ivan Godard
        Keymaster
        Post count: 473

        There are a great many degrees of design freedom in configuring inter-core and inter-CPU relations; each has advantages, and drawbacks, which may be more or less important depending on the purpose. We already support one you mention, where the alien gets a system-wide window (we follow legacy practice and call it an “aperture”). Each component, active or passive, that has a notion of address space gets an aperture which is a hardware remapping window into the global shared address space. These apertures are mostly used for passive components such as RAMs and ROMs, all of which believe that they address from zero and must be placed in the global space by the aperture. However apertures are also meaningful for active components, possibly including alien CPUs.

        The drawback to apertures is that, being hardware, each component interface only gets one of them, and while they have byte granularity, the space they describe must be contiguous. In contrast the turf system, backed in hardware by the PLB, permits arbitrary numbers of possibly overlapping regions. Of course, the alien device could be given its own turf and could attach in front of the PLB. Then the device would appear to be just another thread to the rest of the system, and cache sharing would fall out naturally. Lots of possibilities 🙂
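        A software model of an aperture is essentially base-plus-bounds remapping: the component addresses from zero, and the aperture places that contiguous window into the global shared space. The field names below are illustrative, not from the Mill specification:

```python
class Aperture:
    """Model of a hardware remapping window: a component that believes it
    addresses from zero is placed into the global shared address space.
    (Field names are illustrative.)"""
    def __init__(self, base, size):
        self.base = base    # where the window sits in the global space
        self.size = size    # window length; the region must be contiguous

    def remap(self, local_addr):
        if not (0 <= local_addr < self.size):
            raise ValueError("access outside aperture")
        return self.base + local_addr

# A 64 KiB ROM that thinks it starts at zero, placed near the top of the space:
rom = Aperture(base=0xFFF0_0000, size=0x1_0000)
print(hex(rom.remap(0x10)))  # 0xfff00010
```

        The single contiguous window is exactly the limitation noted above: one aperture per component interface, versus the arbitrary overlapping regions a turf can describe.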

        I agree that putting a full-blown alien CPU on a Mill chip is unlikely to make much sense; a reasonable Mill should be able to emulate the alien as fast as the alien can run natively. Still, if a paying customer wanted it …

  • PeterH
    Participant
    Post count: 35

    Putting a Mill and an x86-compatible core on the same chip doesn’t make much sense to me. Putting the processors on separate chips together on a motherboard might make sense, though code translation removes much of the need for that.

    One issue that putting 2 CPU chips on a board, even of the same kind, would have to deal with is coordination of physical memory allocation. My impression is that the Mill hardware automates a great deal of physical memory allocation, though provisions would have to be made for telling the hardware what address space is available for the purpose, and for addressing specific hardware addresses.

    • Ivan Godard
      Keymaster
      Post count: 473

      Hardware only gets involved in physical memory allocation when realizing backless memory into one-line pages. No other system has one-line pages, nor backless memory, so any memory to be used by the alien core (or chip) must have been already allocated and of full alien size. Such pages are backed, not backless, and so the Mill backless support hardware would not be invoked.

      Consequently, if an x86 core tried to access a Mill backless page the x86 would take a page trap, and the software would define a backing page and back the Mill caches with it rather than use the backless mechanism. When the page fault returned, the two cores could share cache using the coherence mechanism and share DRAM using normal TLB entries in both. If a Mill core tried to access an x86 page then coherence would permit cache sharing up until a Mill load missed in the LLC or a store was evicted from the LLC. At that point the Mill would take a TLB miss, so the TLB entry would have to be marked as backed by the right physical page. But that entry is either in DRAM or in coherent cache, so the only requirement is for an x86 allocation to clear any existing backless TLB entry from the Mill TLB.

      Consequently I think the only problem in the two-cores-on-a-chip case is coherency. The two-chips-on-a-board problem is easy because the Mill does not extend address space off chip. DRAM allocation thus becomes yet another transaction between independent agents, and the wire protocol has to support that anyway.

      I think 🙂
