Forum Replies Created
- in reply to: Specialized Address Operations #314
The three event bits in a pointer could be used as tags if the GC isn’t using them for garbage-collection support. Mill pointers are always 64 bits. If it made enough of a difference, I suppose we could be talked into one more bit.
It would be possible to add a few metadata bits, or steal the floating-point metadata bits, for use as tags. However, the tags would be lost when the pointers are stored to memory, because memory has no metadata. Any ideas on what might be done about that?
- in reply to: How can I get involved? #309
The short answer: yes, you can get involved, although perhaps not as quickly as you (or OOTBC) would like.
There are three models that we have considered and believe could work.
In the first, we open-source the Mill software, and stand back.
In the second, we publish a tool set comprising asm, sim, config, and a tool chain (when one becomes usable), and let people configure and write code for Mill targets but not change the tools themselves.
In the third, we invite specific people to join the team, under NDA and with an explicit sweat-equity contract, to work on the Mill software or hardware directly.
Of these three, the first and second require that we finish the patent filings first, though the second approach needs fewer filings than the first.
The third approach requires much more of our very scarce resources to ramp up contributors; not only is the software large and non-trivial and the documentation mostly absent, but the machine itself is still a morphing, moving target. Realistically we would need to get at least half time from a contributor to be worth the overhead.
The first and third approach would require serious C++-fu (not C run through a C++ compiler).
There are also a ton of smaller, or at least more easily isolated, things that need help. For example, the videos are full of topics that need expansion into white papers. If you have skill in explaining tech clearly then we’d really like to talk with you; sweat equity, on accepted delivery, all the help you want.
We also need help outside the tech arena. For example, several people have suggested a Kickstarter campaign, but it’s hard to figure out a rewards ladder that works for something as long-term and capital-intensive as semi. Start a topic, and see if you and others can come up with what you consider to be a viable ladder. For extra credit, write a script for the Kickstarter video.
Ivan
- in reply to: Program Load vs. Program Install #308
You guys got it right. The tool that produces binary is called the Specializer. It takes in serialized compiler internal representation and produces load modules or in-memory function bodies. The basic tool is an API, plus some wrappers to apply in different circumstances.
We expect that the normal case will be to Specialize at install time. However, the load module can cache several different specializations. If you (for example) upgrade your CPU chip to a different Mill family member, then when you first run the program the loader will discover that there’s no specialization for the current host in the load module, and will do a load-time specialization on the fly; it’s much like load-time dynamic linking. The new specialization will then (assuming suitable permissions) be cached back into the load module, so the next time the program is run the loader finds the desired specialization.
It is also possible to Specialize for an explicit target rather than for the current host. This is used to e.g. create a ROM for a different machine.
In general we do not expect to re-Specialize automatically based on the accumulated branch prediction information or other profile info; that would be an explicit manual step, or be under control of a higher-level framework such as an IDE. It’s not clear that respecializing would buy that much; code selection has already been done by the time that the Specializer gets at it, and operation scheduling (with a few exceptions, such as latency distribution in cascaded loads) does not appear to benefit much from profiles. However, the Specializer is also responsible for the layout of code in memory, so a profile could lead to improved cache behavior.
Note: it’s easier to reply if you put the questions in separate postings.
Metadata bit count: implementation dependent. While stealing NaR bits (one per byte) for other use in wider data (such as the FP flags) would save bits, it would complicate the hardware logic; generally bits are cheap.
Four-case pick: None or NaR selectors simply pass through, using the data width rather than the selector width. Implementation defined whether the None/NaR payload passes through or a new payload is created – the Belt crossbar is clock-critical, and pick must be very fast.
Vector-wise pick: passing the data through directly to the consumer is straightforward; just muxing. Creating the additional operand on the belt is more complicated, but Not Filed Yet.
Mixed-width arguments: each operation has a set of width signatures specifying what it will accept (see the next talk, Specification). If it gets something it doesn’t want it used to produce a NaR, but we changed that a year ago so it now faults immediately; a wrong width indicates a compiler bug, not a data problem like overflow.
Excess widen/narrow: same as width-error above.
“only one opcode”: “only one opcode per signature” is more correct, but that might take us too far afield in a talk. For widening there are two ops, one for all scalar widths and one for all vector widths. For narrowing there is only one op but with two signatures, one with one argument (scalar) and one with two (vector). The same is true for other widening operations, such as add with the “widen” overflow attribute.
“broadcast” operation: there isn’t one. There is a “splat” operation that replicates a scalar into a vector, but that’s not the same. Just do “add” and it’s happy with either vector or scalar data, whichever it gets. There’s only one adder; the width just sets breaks in the carry tree.
“smearx”: this does have two results, which impacts the operation latency. The second port exists anyway; the Mill has a lot of two-result operations (such as vector widen). The talk doesn’t address latency because it couldn’t cover pipelining (May? maybe), but a loop written as shown, without pipelining and phasing, would need a no-op after the smearx. With phasing and pipelining the whole loop is only one cycle anyway, but that’s two more talks’ worth of explanation.
“pickx”: not advantageous. It would add another mux to the belt crossbar, slowing the clock rate. In addition, while not used in the example, the bool vector is useful in other loops for more than pick.
rotating smear: Both the smear and the pick0 should be easy, and I don’t think that the pick0 would have the clock cost that pickx does. However, the loop-control bool ends up at the wrong end of the bool vector from where smeari puts it, so an extract operation would be needed to get it out to a scalar before it can be branched on; that costs the same second cycle and belt position that the present definition of smearx uses, so there’s no gain.
aligned loads: we have looked at quite a few possibilities in this area, with a goal of getting good performance on funnel-shifting a data stream. It’s quite hard to do that without branches, and we are not entirely happy with the present approach (Not Filed Yet).
clock rate: While for business and risk-reduction reasons the initial Mills will have a low clock, there is nothing in the architecture that precludes getting the same clock rate as any other chip.
mispredict penalty: While you are right that a high-end Mill mispredict stall can lose as many operation issues as a superscalar with a longer recovery would, on the lower-end Mill family members (with peak issue widths no greater than conventional machines) the Mill advantage is real. We don’t claim that the Mill is always better at everything; we claim that it is always no worse and very frequently much better.
An aside: I’m impressed with how far you have taken the overview given in the talk. The Mill boggles quite a few people.
I hope you will continue to hang around the forum here.
- in reply to: Site-related issues (problems, suggestions) #298
Seems fixed now.
- in reply to: Forum RSS? #273
Having both the “RSS site feed” and the forum RSS button is confusing to me. Also, the RSS site feed doesn’t seem to be showing other site changes, like the new video in docs/metadata.
Ivan