Forum Replies Created
- AuthorPosts
- in reply to: Specializer bootstrap #1721
Vendors and OEMs will no doubt use all the strategies you suggest, and others no one has thought of. One we have considered for our own products is to use Multi-chip Module technology to put a ROM in the same package with the CPU, containing a pre-specialized in-memory specializer for all the (possibly different) cores in the package. The ROM would also contain the emulation code for operations not present in the hardware, and minimal initial code for things like trap and fault handlers and the power-up sequences, to give the BIOS a canonical platform to work on regardless of which family members are present.
The Mill approach to binary compatibility is not a panacea; we are still sensitive to the genForm distribution format, which must be stable and forward-compatible across the family, just as a binary format must be stable and forward-compatible across other families such as the IBM mainframes. GenForm is not yet stable even for us; we are actively at work on it, but it will be done long before first product ships.
- in reply to: Individual patent licensing? #1687
We’d certainly be interested, but it seems unlikely. No one would take an encoding license unless they were doing a new binary-incompatible ISA anyway, and those don’t happen that often. Most companies are risk-averse too, so they would wait and see if it actually works in silicon. And unfortunately some of those who might want to use our stuff don’t bother to get a license.
Like I said, we haven’t done anything about it yet. As for the correct values, use the C/C++ standard for everything. For divide by zero, return NaR(undefinedValueNaR).
Things are tricky when the domain is signed and either or both arguments are negative; see https://en.wikipedia.org/wiki/Modulo_operation. Currently we do not have a mode argument in the spec for these ops, but we probably should; the natural candidate is the roundingCode attribute that FP ops use. The integral result would then be the value produced by an infinitely precise FP divide operation followed by the FP integerize operation, plus the Euclidean rule for the remainder. As far as I can see the available rounding codes provide all the sign rules demanded by the various languages listed in the Wikipedia article, plus some (nearest-ties-to-even for example) that don’t seem to be used.
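To make the sign rules concrete, here is a small illustration (in Python, not Mill code) of the three common signed div/rem conventions from the Wikipedia article. Each corresponds to a different rounding of the exact quotient, as the post describes; the function names are mine, for illustration only.

```python
# Illustration: three signed div/rem conventions, each derived from a
# different rounding of the exact quotient. In every case q*d + r == n.
import math

def trunc_divmod(n, d):
    # Quotient rounded toward zero (C/C++ semantics): r takes the sign of n.
    q = math.trunc(n / d)
    return q, n - q * d

def floor_divmod(n, d):
    # Quotient rounded toward negative infinity (Python's %): r takes the sign of d.
    q = math.floor(n / d)
    return q, n - q * d

def euclid_divmod(n, d):
    # Euclidean rule: r is always non-negative, 0 <= r < |d|.
    q = n // d if d > 0 else -(n // -d)
    return q, n - q * d
```

For example, with n = -7 and d = 3 the three conventions give (q, r) of (-2, -1), (-3, 2), and (-3, 2) respectively; they differ again when only the divisor is negative.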
I suggest that you see how far you get in unsigned, and when that gets through the specializer (you’ll write in genAsm) then we can revisit signed and decide whether to use roundingCode, a new attribute, or roundingCode with some enumerates faulting as undefined.
It’s OK to post your code here, although I’d suggest starting a new topic. Other readers/posters may chime in on your efforts; please be polite. When you have something that you believe works then either we will have published access to the tool chain for all or (if that’s still not done) we can NDA and let you use our servers.
The local space of a forked child is topologically similar to the local space of the parent, just at different addresses. The fork() function must allocate the new child local space such that none of that topology collides with existing global allocation, including those arising from the global equivalent of local allocations. It is not necessary that the new child have a topology-sized space all to itself, but its local must fit in the unallocated holes in global space.
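The allocation constraint can be sketched abstractly. In this hypothetical Python sketch (names and the brute-force search are mine, purely illustrative), the parent's topology is a set of (offset, length) ranges relative to its local base, and fork() must find a base address for the child at which every shifted range lands in a hole of global space.

```python
# Hypothetical sketch of the fork() allocation constraint described above:
# pick a base address for the child's local space so that the parent's
# topology, translated to that base, avoids all existing global allocations.

def overlaps(a_start, a_len, b_start, b_len):
    # True if the half-open ranges [a_start, a_start+a_len) and
    # [b_start, b_start+b_len) intersect.
    return a_start < b_start + b_len and b_start < a_start + a_len

def find_child_base(topology, occupied, limit, align=4096):
    # topology: [(offset, length)] of the parent's local allocations
    # occupied: [(start, length)] already allocated in global space
    # Note: only the allocated ranges must fit in holes; the holes need
    # not form one contiguous topology-sized region.
    for base in range(0, limit, align):
        if all(not overlaps(base + off, ln, s, l)
               for off, ln in topology for s, l in occupied):
            return base
    return None  # no set of holes fits this topology
```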
Frankly, nobody has written the divide emulation code yet, so we have no numbers; it’s in the pipeline, but the emulation team has mostly been worrying about FP. If you’d like to help then feel free to post candidate code for div, rem, and divRem for the various widths. Remember that either or both inputs can be NaR/None. I do not expect there to be a loop; simply unroll the whole thing.
That said, the emulation substitution occurs at the start of the specializer where we are still working with dataflows, so if there is any code that is part of a different flow than the divide then the code will overlap.
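As a starting point for anyone writing candidate emulation code, here is a sketch (in Python, not genAsm) of the classic shift-subtract algorithm for unsigned 32-bit divRem. A real Mill emulation would have no loop (32 unrolled copies of the body) and would first screen the inputs for NaR/None; both are elided here.

```python
# Sketch of restoring shift-subtract division for unsigned 32-bit divRem.
# Loop form for readability; the genAsm version would be fully unrolled,
# and would handle NaR/None inputs and divide-by-zero per the spec.

def udivrem32(n, d):
    if d == 0:
        raise ZeroDivisionError  # emulation would produce a NaR instead
    q = 0
    r = 0
    for i in range(31, -1, -1):          # one step per bit, MSB first
        r = (r << 1) | ((n >> i) & 1)    # shift next dividend bit into r
        if r >= d:                       # does the divisor fit?
            r -= d
            q |= 1 << i                  # record a 1 in the quotient
    return q, r
```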
32+(3*5) != 57 🙂
The con() op uses the same flow-side bit-bunching that is used by the polyadic ops like call() and conform(). The extra slots encode the flowArg() pseudo-op, so you can consider it a flow-side gang.
The bit-bunching happens for all the flow side ops, starting in D0; all the extension and manifest bits are moved to a long array with 47 (Gold) bits per slot. The move is controlled by two two-bit fields at fixed position in every op in the slot, so we can do the move without parsing the op or even knowing how many real ops there are. In D1 we know how many 47-bit elements are associated with each op, and for con we chop those out and ship them to the belt crossbar in time for the con to drop at the end of D2. The “<” is in X0 a.k.a. D3, so it can use the literal. Ain’t phasing lovely!
Quad constants are supported on a quad member, but the example wasn’t intended to imply any particular length.
You happened to catch work in progress. The Wiki shows a periodically-updated snapshot of the actual specs for the members, and so may be internally inconsistent with reality at times as various things are being worked on. We still have no plans to have hardware div, rem, or divRem operations, but they are part of the abstract Mill and so the tool chain and sim must support them for eventual family members. Of course, to test the software we need a member spec that does define these ops, and Silver seemed to have gotten elected. I expect it will go away again.
- in reply to: Simulation #1702
In theory normal HTML tags should work for editing.
Any stack-model language would not have much semantic kinship with genAsm, which is dataflow.
- in reply to: Simulation #1698
We talked a bit at our Tuesday meeting about how best to support such independent efforts and projects. We concluded that the Forum here was the right venue for discussions; the Wiki is more for canon, official documentation of the Mill and our products, but would be appropriate for such an effort when it moved from being a discussion subject to an actual facility with users of its own. For the source code, when an effort got that far, github seemed right; putting it on our SCS ran risks of mutual IP pollution.
As for tooling and its ETA – I wish I knew, but our resource base is still so small that real scheduling is impossible. We did decide that we would release, and support, alpha-level tooling on a “use at your own risk” basis. A minority felt that we should wait for commercial-grade for fear that alpha-grade would forever paint us with “broken and unusable”, but the majority felt that we should not try to hide our roughshod nature.
Please be patient.
- in reply to: Simulation #1696
Here’s the agenda item for tonight; I’ll post a summary of discussion:
++++++++++++++++++++++++++++++++++++
There has been discussion on the forum about how to implement functional languages on a Mill. Recently a poster put in a long writeup for a new language with features especially tailored for a Mill (http://millcomputing.com/topic/simulation/#post-1688). This is an example of a general category: independent non-product volunteer efforts that are Mill-related in some way. There will with time be a lot of these as an ecosystem develops around us. The question: how should we support such efforts?
+++++++++++++++++++++++++++++++++++++
- in reply to: Simulation #1695
Sorry I wasn’t clear. I was not complaining at all, just looking ahead, assuming wild success of your project, and wondering how best we could support such Mill-related but non-product projects. I agree with your choice that here in the forum seems to be a good starting place, but the sequential-post style of the forum doesn’t seem best for a shared development effort for such projects if they take off. And we wouldn’t move it into our company product development environment, because we won’t be inventing new languages as products. So, assuming success, where should those working on a volunteer and unofficial but Mill related project do their work and communicate?
Now that I’ve thought about it for a while, the Wiki doesn’t seem quite right either – suitable for canon documentation perhaps, if the project reaches public release, but not really a good interchange medium for those working on such projects.
The source code (assuming it got that far) would be a natural fit for github. I’ll bring the question of the rest up at tonight’s general meeting.
Of course, all this is moot until such projects actually happen. But from your posting here, and various questions and comments from academia I’ve heard, it looks like they will happen sooner or later. Better to have some support plans at least thought about in advance.
- in reply to: Simulation #1693
Should this be continued here, or on the Wiki?
At the end of decode0 (D0) all of the exu-side reader block has been decoded. This is important because we can start the scratchpad fill operations in D1, with the data available at the end of D2 (I0, X-1) for use by the adds in X0. Two cycles to read the scratchpad SRAM is plenty.
We also have the flow-side flow block decoded at the end of D0, but that doesn’t help because we need the flow-side extension and manifest blocks (which are really just data arrays) to make sense out of those decodes. The extension and manifest blocks are interleaved into the long (very long) constant in D1 and the selectors set up to extract the bit fields belonging to each op (based on the extCount/conCount fields in the already-decoded flow block) at the end of D1. The selectors extract the constants of any con ops and have them available at the end of D2, ready for the adds in X0 again.
The clock-critical part of all this is the priority-encodes of the extCount/conCount fields that set up the selectors. Priority encode is a linear parse, and with as many as eight flow slots that’s a lot to figure out. The priority encode and the actual selection can blur over the two cycles D1/D2, but this may constrain the clock rate on very wide Mills, forcing us to add another decode cycle. A Gold at 4GHz is not for the faint of heart! It’s all heavily process dependent of course. Best guess is that the width vs clock constraints will pinch in the fastpath belt crossbar before they pinch in the con operation decode, but we don’t know.
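The reason that scan is linear can be seen in a toy model (Python, purely illustrative): each flow slot's starting position in the long constant array is the sum of the element counts of all earlier slots, i.e. an exclusive prefix sum. Written serially below; hardware would use a log-depth parallel-prefix network, but either way the dependency chain grows with the slot count.

```python
# Toy model of the selector setup: an exclusive prefix sum over the per-slot
# element counts (from the decoded extCount/conCount fields) gives each
# slot's starting bit position in the long constant array. Serial here for
# clarity; real hardware would use a parallel-prefix network.

def slot_offsets(counts, elem_bits=47):
    # counts[i]: number of elements (47 bits each on Gold) claimed by slot i
    offsets = []
    total = 0
    for c in counts:
        offsets.append(total * elem_bits)  # starting bit of slot i's data
        total += c                         # serial dependency on all earlier slots
    return offsets
```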
- AuthorPosts