Forum Replies Created
- in reply to: Specification and Floating Point Numbers #1293
I am not able to talk about the merits of UNUM, although I did post it to comp.arch to see what they made of it.
The Mill specification is the other way around: all members support the same complete specification, and some Mills emulate some ops, widths and types.
So a new type would be added to the generic specification and on those Mills that didn’t support it in hardware, it would require emulation.
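To make the emulation idea concrete, here is a generic C sketch (not Mill code; the actual emulation sequences are produced by our tools) of how a member lacking hardware 128-bit integer adds could synthesize them from 64-bit ones:

```c
#include <stdint.h>

/* Hypothetical software emulation of a 128-bit add on a member
 * whose hardware only supports 64-bit adds: an add-with-carry
 * sequence built from the narrower core operation. */
typedef struct { uint64_t lo, hi; } u128;

static u128 add128(u128 a, u128 b) {
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry out of the low half */
    return r;
}
```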
- in reply to: Software Pipelining (Talk: July 14 2014) #1194
It was filmed.
Post production is quite an involved process, but is nearly finished.
We’ll post the video soon on the Architecture forum with the other talks, and announce it on the mailing list.
Subscribe to the mailing list or keep checking back manually.
- in reply to: ensuring bounds of arrays #1164
We plan to have hardware-checked “bounded” pointers, yes 🙂
Bounded pointers are pointers with special flags in them so the hardware knows to extract bounds as well as offset from the pointer and enforce them.
However, as these bounded pointers are still 64-bit, and as the address space is 64-bit, it’s not possible to encode every combination of bounds and offset into so few bits. Bounded pointers trade away bounds precision for truly humongous arrays.
(Here is a link to an old post on the Mill bounded pointers in comp.arch. The Mill mechanism has been refined somewhat since that post, but it gives the gist.)
On the Mill, which is wider-issue than other architectures, there are usually enough free slots that a compiler can also pipeline exact bounds checks, which will likely carry no performance penalty in the common case.
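To make the precision trade-off concrete, here is a hypothetical C sketch. The field widths are invented for illustration and this is not the actual Mill encoding; it just shows how squeezing a flag, coarse bounds and an address into one 64-bit word forces bounds to be only as precise as a size class:

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented layout: 1 flag bit | 5-bit size class | 58-bit address. */
#define SIZE_BITS  5
#define ADDR_BITS 58
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

static bool is_bounded(uint64_t p) { return p >> 63; }  /* flag bit set? */

static bool in_bounds(uint64_t p, uint64_t access) {
    uint64_t log2_size = (p >> ADDR_BITS) & ((1u << SIZE_BITS) - 1);
    uint64_t base      = p & ADDR_MASK;
    /* Bounds are rounded up to a power of two: precise for small
     * objects, increasingly coarse for humongous ones. */
    return access - base < (UINT64_C(1) << log2_size);
}
```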
- in reply to: Can we implement it first in FPGA? #1130
Yes, we plan to implement the Mill first on an FPGA.
If the compiler generates a member-independent intermediate form, is it left to the specializer to identify and make use of optimizations that depend on functional unit population and ordering?
Yes, this is how responsibilities are split between compiler and specializer.
The compiler targets an abstract Mill (infinite belt, etc.) and serializes its AST to file for distribution. It’s not actually a single AST; it’s a forest of candidate control-flow graphs, so the compiler can provide alternatives for the specializers to choose between.
The scheduling of operations and scratch and such is performed by the specializer, which knows the parameters of the target Mill.
Whilst non-trivial to think about, the specializer’s scheduling isn’t really comparable to the heavyweight optimisations a compiler makes. The specializer is very fast, fast enough to be used in JIT as well as AOT generation.
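As a rough sketch of that division of labour in data-structure form (all names invented for illustration; the real serialized format differs):

```c
#include <stddef.h>

/* A member-independent function: a forest of candidate
 * control-flow graphs targeting an abstract Mill. */
typedef struct {
    const struct AbstractOp *ops;   /* ops assuming an infinite belt */
    size_t                   n_ops;
} CFG;

typedef struct {
    const CFG *candidates;          /* alternatives for the specializer */
    size_t     n_candidates;
} Function;

/* What the specializer knows and the compiler does not. */
typedef struct {
    unsigned belt_length;           /* finite on any real member */
    unsigned slots_per_fu_kind[8];  /* functional-unit population */
    /* ... per-member latencies, supported widths, etc. ... */
} MemberParams;

/* The specializer picks the cheapest candidate for this member
 * and schedules it onto real slots, belt positions and scratch. */
const CFG *specialize(const Function *f, const MemberParams *m);
```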
I dug up this excellent post by Ivan that covers this in more detail: http://millcomputing.com/topic/introduction-to-the-mill-cpu-programming-model-2/#post-889
- in reply to: Security of Self-Services & Inheritance #1105
This is a central use-case of how we envisage Mill services being used.
There is a subtlety that a (software) private key as a service illustrates:
The private key had to be loaded from somewhere, so it’s important that the turf the attacker exploits cannot itself access the key directly. But how can you create a child service to protect some data if the only way it gets that data is from you handing it over? You have access to the data too.
So the kernel may need to be aware of ‘user-space’ services within a ‘process’ (I use quotes as these are OS concepts and an OS on the Mill could do things differently) in order that a turf can rescind access rights to things.
This is rather like the POSIX (and Symbian) ‘capability’ model (quotes because a lot of people think these are not capability models at all 😉), where you can rescind rights, e.g. as a TCP server rescinds the right to bind low ports after it has started its listener.
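For reference, the classic POSIX shape of that pattern (error handling omitted for brevity; real code must check every return value):

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Acquire a privileged resource, then irrevocably rescind the
 * right that made acquiring it possible. */
int listen_then_drop(uid_t unprivileged_uid) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port   = htons(80),    /* low port: needs privilege */
    };                              /* sin_addr zeroed = INADDR_ANY */
    bind(fd, (struct sockaddr *)&addr, sizeof addr);
    listen(fd, 16);
    /* Rescind: after this the process can never bind a low port
     * again, even if it is later exploited. */
    setuid(unprivileged_uid);
    return fd;
}
```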
So rights become something tracked at a turf level rather than at a ‘process’ level, with a process consisting of a collection of turfs and threads flying in close formation? This leads to thoughts about how operating systems conventionally track processes (or tasks, in L4 terms). A process is a collection of threads that share a user space. However, when you perform a Mill portal call, you are moving between protection domains (Mill turfs) without changing threads…
If the OS only allows you to create private services within a process, a hierarchical structure is maintained, and it’s an evolutionary step for the kernel, which is still the intermediary for IPC, to track turfs as well as threads for a process.
The exciting possibility the Mill also opens up is IPC between peers without a kernel intermediary. However, this challenges how an OS conventionally tracks processes and determines the currently running process, how it does preemption accounting, and even what `top` and `kill` show and do. So one thing the Mill may do for the state of the art is start a whole new chapter in OS research 😉
And yes, a turf can revoke its own protection regions. You can also imagine the OS being involved in turf creation, so that if you want to create a new child service, the OS can hand it unshared memory.
Sorry for the long rambling post. We’re happy to have this picked apart and discussed 🙂
If possible after considering NYF, I’d like to see a bit more detail about the FU and slot population on at least one member, probably Gold.
I am careful not to announce things; I am not up to speed on the intricate details of filings (fortunately for me). But in the bigger scheme of things, and given what follows below, tracking what Gold looks like today will perhaps only hint at tomorrow’s mix.
My hope is that info on this evolution, and the motivation for the changes, will give readers some insight into Mill performance and the sorts of trade-offs that have been made (and can continue to be made) using MillComputing’s specification and simulation tools.
Yes, although what I really look forward to is you being able to run programs on our simulators for real. I hope we can get that going hand in hand with public disclosure of the ISA details etc. It obviously must not be allowed to take time from the critical path, such as producing hardware, but if we can gain some benefit from it, e.g. toolchain testing and defect finding, there may be a good cost-benefit balance to the exercise.
One thing I’d like to see in the Wiki is a description of the instructions and any constraints they impose
We are working on exactly this. It can be auto-generated from our specification system, and we have someone working on documenting it. When it can be published on our new public wiki may be held back by NYF considerations, but we are very keen to get information out sooner rather than later, now that we are able to talk about things at all.
Theoretically, the Mill could have very few operations as its core ISA, with the larger operation set emulated as sequences of the core operations.
Yes! Absolutely.
And the mix of FUs, latencies and dimensions on any particular core can be determined in simulation benchmarking for representative workloads long before HW gets built. We have to be data-driven, not hunch-driven.
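As a trivial illustration of composing a ‘missing’ operation out of core ones (generic C, not actual Mill ops), a rotate can be emulated as a short series of shifts and an OR:

```c
#include <stdint.h>

/* Emulated rotate-left built from core shift and OR operations. */
static uint32_t rotl32_emulated(uint32_t x, unsigned n) {
    n &= 31;                                   /* keep the count in range */
    return (x << n) | (x >> ((32 - n) & 31));  /* n == 0 stays safe */
}
```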
(for the interested, Art described an earlier Gold’s pipes: http://millcomputing.com/topic/introduction-to-the-mill-cpu-programming-model-2/#post-610 )
- in reply to: Pipelining #1261
Is the latency order guaranteed to be the same across different mills, or does it not matter because this scheduling is done during the “specialization” phase anyway?
The latency of ops varies by member, and can vary by operand width too. Some ops may even be emulated on some members.
It doesn’t matter to the compiler, because it’s the specializer that schedules ops.
- in reply to: Pipelining #1246
In practice, the `retire` op has variants with arguments to cope with this and other implementation details.
- in reply to: Inlining functions vs. call/return #1205
Yes, we fully expect carefully weighed inlining. The specializer has all the necessary data and can inline aggressively. The Mill’s metadata allows speculation, so we can even inline many conditional calls.
In many of the talks Ivan mentions in passing that we plan to do these kinds of optimization, but the references may be quite obscure.
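For a flavour of the conditional-call case, here is a generic C-level sketch (not Mill code; both arms here happen to be side-effect-free, and on the Mill, metadata such as NaRs widens what can be speculated safely):

```c
/* Source form:
 *     int call_either(int c, int x) { return c ? f(x) : g(x); }
 * where f(x) = x + 1 and g(x) = x * 2. After speculative inlining,
 * both arms execute and a select picks the result, so the
 * conditional call disappears entirely. */
int call_either(int c, int x) {
    int a = x + 1;   /* inlined, speculated body of f */
    int b = x * 2;   /* inlined, speculated body of g */
    return c ? a : b;
}
```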
- in reply to: Loop pipelining and aliasing #1177
The thing is, the Mill is immune to aliasing because all loads are deferred, just as you describe.
Loads specify their retire time, and the retire stations snoop on stores, so they know when their fetched value has gone stale.
So the compiler never ever has to even consider aliasing!
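For reference, the hazard being side-stepped, in plain C (a standard example, nothing Mill-specific):

```c
/* Without proof that dst and src never alias, a conventional
 * compiler cannot hoist the load of src[i] above earlier stores
 * to dst. With deferred loads the load issues early anyway: the
 * retire station snoops intervening stores and supplies the
 * up-to-date value at retire time. */
void scale(float *dst, float *src, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * 2.0f;   /* what if dst == src + 1 ? */
}
```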
- in reply to: Security of Self-Services & Inheritance #1108
This is a very interesting thing to think through 🙂
When a process thread makes a portal call into a library/service, the thread remains associated with the process for general accounting purposes.
What if a service maintains a work queue internally so it can do someone else’s work when you give it your high-priority slice? Perhaps it can be reasoned that this is a non-issue?
- in reply to: Instruction Encoding #1090
Yes, we can put immediate values with the ops that want them, on both sides.
- in reply to: Microkernels vs Monolithic #1089
We plan to run Linux initially as an L4 guest.
The L4 port is really to provide a base for testing; we anticipate Linux will be one of the very first other OSes to be ported and optimized too, but there is no reason not to port any other mainstream or customer-desired OS.