Forum Replies Created

Viewing 15 posts - 211 through 225 (of 674 total)
  • Ivan Godard
    Keymaster
    Post count: 689
      1) What is NaR + NaR? Whose metadata is used?

    It will be one of the inputs; which one is implementation dependent.
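    As a toy model of that rule (Python, with invented names; this is not the Mill's actual metadata encoding), NaR propagation through an ALU op might look like:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Value:
    """A belt operand: a payload plus NaR metadata (heavily simplified)."""
    payload: int
    is_nar: bool = False
    nar_kind: str = ""   # e.g. the fault kind that produced the NaR

def add(a: Value, b: Value) -> Value:
    """NaR in, NaR out. When both inputs are NaR, the result carries the
    metadata of one of them; which one is implementation dependent.
    This model arbitrarily picks the first operand."""
    if a.is_nar:
        return a
    if b.is_nar:
        return b
    return Value(a.payload + b.payload)
```

    So in this sketch `add(NaR-from-load-fault, NaR-from-div-zero)` just forwards the first NaR unchanged; a different implementation could equally well forward the second.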

      2) A process (thread 8 in turf 5) spawns and dispatches a new thread (thread 22475). That thread (22475) kills itself. What is the OS to do about the original thread (8)? It can’t dispatch it: the process has no threads parked in the kernel turf. Can the OS only kill that process? What happens to the resources used by the thread?

    OS (and RTS) policy. For example, an exception handler within the thread being unwound might dispatch out. There are some bottom turtle issues in thread death, just as there are for permission revoke, and we are not sure we have everything covered; there may be additional helper operations added to the ISA in the future as we (and the OS implementors) gain more experience.

      Stated alternatively: can I create a convoluted process by which a poorly written OS can leak machine resources?

    Of course, FSVO poor.

      3) In units of ALUs, how much hardware are you throwing at making a call a one-cycle operation?

    As compared to a 5-cycle call operation, say, very little. As compared to a call without the spiller, quite a bit, but hard to measure in those units. The spiller has internal SRAM for skid buffering, probably about as much as a top-level cache – how many ALUs is a cache? The rest of the spiller and the belt are roughly the same as the bypasses on a conventional machine. Essentially all the call cost is the spiller.

      4) Will all operations have the same latency across all Mill chips? Your examples have always had that multiply is a three cycle operation. Would a Mill ever be delivered with, for instance, a five cycle multiplier?

    Hardware latency is specified individually on a per-FU basis in our configuration tools. Not only can latency vary across family members, it can also vary per slot within a single member, so a chip could have a fast (expensive) multiplier and also a slow (cheap) one.
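    As a toy illustration of per-slot latency configuration (Python; the names and numbers are invented and bear no relation to the actual Mill specification tools):

```python
# Hypothetical member spec: latency is configured per FU, per slot.
# The same member can carry both a fast multiplier and a cheap slow one.
member_spec = {
    ("slot0", "mul"): 3,   # fast (expensive) multiplier
    ("slot3", "mul"): 5,   # slow (cheap) multiplier
    ("slot0", "add"): 1,
}

def latency(slot: str, op: str) -> int:
    """Look up the configured latency of an op in a given slot."""
    return member_spec[(slot, op)]
```

    The specializer would then schedule against whichever slot's latency applies, rather than assuming one family-wide number.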

      You haven’t done the talk on virtualization yet, so this may be answered later….
      Popek and Goldberg’s virtualization requirements ( https://en.wikipedia.org/wiki/Popek_and_Goldberg_virtualization_requirements ) assert that all sensitive instructions must be privileged for efficient virtualization. You have repeatedly asserted that the Mill does not have privileged instructions. That would imply that you have no sensitive instructions, or the Mill is not virtualizable in the sense of Popek and Goldberg.

    There are no sensitive operations. All control is via the address space and the protection mechanism.

      5) The idea behind virtualization, for many people, is the ability to run an unmodified OS (as from Microsoft or Apple) as an application within another OS. This is so that one can sell a single computer to multiple dupes under a “cloud” computing scam, or allow one to use Microsoft Office natively running on Windows which is magically running on Linux. Will the Mill be virtualizable in this sense?

    Not yet announced because NYF.

  • Ivan Godard
    Keymaster
    Post count: 689
    in reply to: 2017/2018 #3133

    Pretty good progress across the board, given the resource constraints. Things crash all the time of course, but that’s what happens at our stage of development. Some details:
    * post-linking implemented and is now a routine part of tool flow
    * specifications made constExpr, roughly doubling the speed of the tools that dynamically bind a spec.
    * hunks of the central parts of the C runtime ported
    * some work on the OS port but not much working yet (major 2018 target)
    * test suite significantly expanded. Some of the tests even work!
    * three public talks
    * a handful of new patents
    * compiled code quality improved after work throughout the tool chain
    * progress on hardware generators; starting to get feedback from hardware implementation into architecture/tools
    * some new people joined and ramping

  • Ivan Godard
    Keymaster
    Post count: 689

    The spiller holds ongoing program state as of a stall, mispredict, call, or return event. That state includes much more than just the then-current belt. Belt values, in their latches, are moved lazily into the spiller and its internal SRAM and eventually migrate to the spillets in DRAM. However, these are relatively simple to handle. The most difficult part of the spiller deals with in-flights, which do not yet exist at event time, but will be produced eventually and must then be captured for subsequent replay. That requires temporal ordering information that is not an address, but may be thought of as a stream or pipe.

    So there is a part of the spiller that does indeed hold full operands (possibly compressed at hardware option), but this is not addressable in the sense that DRAM or scratchpad is. Instead the operands (not necessarily contiguous) are organized for ordered replay. As the “address” changes continuously during replay and the operands will have random and varying other state intermixed, it does not seem practical to try to use spiller hardware for the functionality that is the present scratchpad.

  • Ivan Godard
    Keymaster
    Post count: 689

    The number of entries is a member config decision; ten would be small.

    Storing by entry number would require a mapping from entry number to position, or (trivial mapping) with all entries being maximal size. We pack them (saving space/increasing capacity) and reference by start byte number. The byte number needs more bits to encode than an entry number would, but the scratch ops are otherwise small and currently we just burn the entropy. The belt uses full sized entries and doesn’t try to pack because the actual belt is a set of latches/regs that must be full width anyway.
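    A minimal sketch of the packed, byte-addressed scheme described above (Python; names and the overflow behavior are invented for illustration, not the actual scratch ops):

```python
class Scratchpad:
    """Byte-addressed, packed scratchpad (toy model). Entries are packed
    back to back and referenced by start byte number, rather than by an
    entry number with every slot padded to maximal size."""
    def __init__(self, size: int):
        self.bytes = bytearray(size)   # configured, bounded capacity

    def spill(self, offset: int, value: bytes) -> None:
        if offset + len(value) > len(self.bytes):
            raise IndexError("scratchpad overflow")  # tool chain's problem
        self.bytes[offset:offset + len(value)] = value

    def fill(self, offset: int, width: int) -> bytes:
        return bytes(self.bytes[offset:offset + width])
```

    Packing a 4-byte entry at byte 0 and a 1-byte entry at byte 4 wastes no space; the price is that references carry a byte number rather than a smaller entry number.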

    The choice for scratch implementation is left to the hardware guys, and might be different in different members due to hardware/cost/power considerations.

  • Ivan Godard
    Keymaster
    Post count: 689

    Our impression is that the whole thing makes our heads hurt.

  • Ivan Godard
    Keymaster
    Post count: 689

    Actually there are problems with the predicated forms of load: with them you can hoist over a branch, as shown in the paper, but you can’t hoist over the computation of the predicate. Spectre has prompted much internal discussion about an alternative that doesn’t require the predicate until retire (easy) and is still safe from Spectre-like attacks (real hard). NYF for now.

  • Ivan Godard
    Keymaster
    Post count: 689

    You are right: we expect distros to be in genAsm and to be specialized to the actual target at install time. The chip will ship with a specializer in ROM with the BIOS. Nothing in the hardware stops the user from writing his own specializer, for whatever benefit or bugs that brings. For that matter, nothing in the hardware stops the user from writing his own genAsm. Subjects such as safe languages are matters of policy, above the architecture, and must be enforced by software. The Mill with its clean protection model offers an excellent platform for such things, but we as a hardware company do not expect to provide them ourselves.

  • Ivan Godard
    Keymaster
    Post count: 689

    We will publish a white paper with details on Monday 1/15/2018.

  • Ivan Godard
    Keymaster
    Post count: 689

    The Burroughs main frames (the A series – the first compiler I ever wrote was for the B6500) used this approach – the OS would only load code produced and signed by a known compiler. In the controlled environment typical of mainframes this worked, but in the free-for-all of PCs it would be too restrictive.

  • Ivan Godard
    Keymaster
    Post count: 689

    A portal causes a turf switch to a turf id contained in the portal structure. There are barriers to the vulnerability you suggest.

    If the attacker gave the victim a code pointer that falsely purports to be a portal and the victim called it then the victim would still be in his original turf, executing the code referenced by the passed pointer. However, the victim must have execute rights for any code, so the substitute code must be executable by the victim’s turf; it can’t be attacker code because the victim does not have execute rights to attacker code. And the attacker cannot blindly give such rights to the victim; there is a check so that a suspicious victim must accept a proposed grant before it takes effect.

    Thus the target address must be a valid entry point in the victim’s own code. Of course, getting the victim to call one of his own functions when he didn’t intend to is problematic too. There is a check, a bit more general than you suggest, that an untrusting program can use for this: it returns, for a given address, what permissions the caller has at that address. That check is necessary in a number of ways, but seems inelegant; we have been exploring alternatives, but with nothing entirely satisfactory yet.
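    The way an untrusting caller might use such a permissions-at-address check can be sketched like this (Python; the permission names, addresses, and API shape are all invented, not the Mill's actual check):

```python
# Toy model of "what rights do I have at this address?" used to vet a
# purported portal before calling it.
EXECUTE, PORTAL = "x", "p"

rights = {
    0x1000: {EXECUTE},           # victim's own ordinary code
    0x2000: {EXECUTE, PORTAL},   # a genuine portal entry
    0x3000: set(),               # attacker code: no rights in this turf
}

def perms_at(addr: int) -> set:
    """Return the caller's permissions at the given address."""
    return rights.get(addr, set())

def safe_portal_call(addr: int) -> None:
    """Refuse to call anything that is not actually a portal for us."""
    if PORTAL not in perms_at(addr):
        raise PermissionError("not a portal for this turf")
    # ... proceed with the real portal call ...
```

    With this check, a code pointer that falsely purports to be a portal (here, 0x1000 or 0x3000) is rejected before any transfer happens.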

    Second, the portal structure itself is set up by trusted code, which always sets the associated turf to that of the thread creating the portal. That is, you can create portals into yourself, but not into anyone else.

  • Ivan Godard
    Keymaster
    Post count: 689

    Any core including Mill is susceptible to Rowhammer because RH hits a fault in the DRAM, not in the core. The question is what exploits if any can be achieved by RH style memory mangling. A rights escalation would require causing a user to load a different (more encompassing) turf while staying in the same code and data. The only point at which the turf changes is at entry to or exit from a portal call. On entry the new turf is in the portal structure, which is not writeable by either the creating or entering turf. On exit the resuming turf is in the spiller stack, which is also not writeable by either the exiting or exited-to turf. So to escalate the turf you have to have write access to where the turf is kept, and to get that access you have to have write access to where the keeper is kept, and so on. It’s not clear that there is a bottom turtle to that regression.

    There is some code in the micro-kernel that has the rights to modify these saved turf ids and does so: the code that initializes a portal, and the code that does exception handling and thread teardown. So there may be an attack vector there if that code can be given a bogus value from DRAM that it then uses to overwrite a turf. But I don’t think I could do it even if I had the source code and the ability to arbitrarily change a value fetched from DRAM.

    An alternative approach would be to try to mess with the PLB tables without changing the turf. Changing the entry as it is created would face the same bottom turtle problem as trying to change the running turf id. However, changing the address range when an entry is loaded to the PLB might be more possible. It is not clear how an attacker would learn where the entry is located; the table is dynamically allocated, he doesn’t have access to the register that holds the base address, and getting access to that register is the bottom turtle again. Still, if the attacker can flush the caches and the PLB, then probe to a valid location not in a WKR, then the line address containing the table entry will be among the next few addresses presented to the pins. But if you have pin-level access to the system you don’t need Rowhammer to change the DRAM values read.

    So I won’t say that a Rowhammer crack of the Mill is impossible, but it does seem that it will be as hard as a pin crack, and those are blockable by encrypting pin traffic to DRAM. Fair warning though: this stuff is hard, and I may well be overlooking something.

  • Ivan Godard
    Keymaster
    Post count: 689

    I’ll try again. The belt uses temporal addressing; the scratchpad uses spatial addressing. There are two addresses involved in a spill/fill: the belt address of the value to be spilled somewhere, and the “somewhere” address needed to choose what value to fill. The present Mill uses temporal for the first, and like any reference the spill must execute before its target drops off the belt. If scratch were part of the spiller then fill would need an (arbitrarily large) address to look into the spiller history to find the value.

    You can’t use temporal addressing for long- or indefinite-lived values because the temporal address range is unbounded. Hardware doesn’t do unbounded. With spatial addressing the address range is bounded by the configured size of the scratchpad. Hardware does that, although the tool chain must deal with running out of the bounds.
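    The contrast can be modeled in a few lines (Python; a deliberately crude sketch with invented names, not how the hardware is built):

```python
from collections import deque

class TemporalBelt:
    """Temporal addressing: b0 is the newest drop, b1 the one before,
    and so on. Only the last `length` drops are reachable; a value's
    temporal distance grows without bound as execution continues."""
    def __init__(self, length: int = 8):
        self.history = deque(maxlen=length)

    def drop(self, v) -> None:
        self.history.appendleft(v)

    def read(self, b: int):
        if b >= len(self.history):
            raise IndexError("fell off the belt")
        return self.history[b]

class SpatialScratch:
    """Spatial addressing: offsets into a configured, bounded store.
    An offset stays valid for however long the value must live."""
    def __init__(self, size: int = 16):
        self.store = [None] * size   # hardware-friendly: bounded

    def spill(self, off: int, v) -> None:
        self.store[off] = v          # tool chain keeps off < size

    def fill(self, off: int):
        return self.store[off]
```

    The belt model fails once a value's temporal distance exceeds the belt length, which is exactly why long-lived values need the bounded, spatial store instead.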

    Perhaps you are thinking of a scheme whereby the spill op would push the value into a side stack and the fill would use a stack offset rather than a temporal reference to address it. That’s possible, but the stack management hardware is more than is needed for a simple regfile-like array of values. And, returning to the first question, one would need either maximal sized entries, or a map from entry number to packed byte offset, or make the stack byte addressable.

    I’m not saying that one couldn’t put the scratchpad in the belt space so that scratch entries could sit in the same latches as belt operands. But the addressing logic to get such a scratch entry back into the space where adds and such could address it is too expensive because it would push up the size of the crossbar. So we keep the address spaces separate.

  • Ivan Godard
    Keymaster
    Post count: 689

    It’s a lifetime issue. The scratchpad is not a simple extension of the belt, it’s a repository for values with long or indeterminate lifetimes. The spill op copies a value from the belt to scratch, and that same value may move into the spiller if there were a call while it’s still live. But a value computed before a loop and used after the loop (and maybe in the loop) has an unknown lifetime, so we need to save it for the duration of the loop. Mill execution makes values with great abandon, and we can’t save them all as if there were an infinite belt. So we need a way for the compiler to tell the hardware that a particular value is of continuing interest, and be able to request it again later. That’s the spill and fill ops.

    In contrast the spiller saves everything in-flight and on the belt, but that’s a tiny population compared to everything that has ever been on the belt, which is the potential population for the scratchpad. Different expected lifetimes, different reference patterns, different latency, complexity, and power constraints -> different mechanisms.
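    The spill-before-loop / fill-after-loop pattern described above can be walked through concretely (Python; a toy belt of length 4 and a dictionary standing in for scratch, all invented for illustration):

```python
BELT = []        # newest first; toy belt holding only the last 4 drops
SCRATCH = {}     # stands in for the scratchpad

def drop(v) -> None:
    BELT.insert(0, v)
    del BELT[4:]              # older operands fall off the belt

# Value computed before the loop, needed after it.
x = 42
drop(x)
SCRATCH["x"] = BELT[0]        # spill: compiler knows x outlives the loop

for i in range(10):           # loop body makes values with great abandon
    drop(i * i)

# After the loop, x has long since fallen off the belt...
assert 42 not in BELT
x_again = SCRATCH["x"]        # ...but fill recovers it from scratch
```

    Without the spill, the value 42 would simply be gone once ten newer drops had pushed it off; the compiler's spill/fill pair is what marks it as being of continuing interest.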

  • Ivan Godard
    Keymaster
    Post count: 689

    We can’t leave scratchpad-usage data in the spiller because the data is both spatially and temporally random access, while the spiller is at heart just a glorified stack. Items can be left in the scratchpad for arbitrarily long times without increasing the latency of later access, whereas items in the spiller eventually migrate to memory and get memory latency.

    Instead we want the scratchpad to have uniform latency and simple random access, without the expensive mux crossbar needed for spiller access even to limited depth. So scratch really acts like, and is mostly implemented like, a register file in a conventional machine. The differences include the metadata, the self-defining data widths, and the packing at byte granularity.

  • Ivan Godard
    Keymaster
    Post count: 689

    Indeed you can overlap them, but there’s no need for any special hardware; normal instruction scheduling in the specializer will minimize the overall latency.
