Forum Replies Created

Viewing 15 posts - 496 through 510 (of 674 total)
  • Ivan Godard
    Keymaster
    Post count: 689
    in reply to: Execution #1139

    Excess arguments or returns are passed in memory in essentially the same way that a conventional ABI does. Ditto VARARGS and over-large args/results.

    The memory-pass ABI on a Mill has to cope with the fact that a Mill call can cross protection boundaries. Consequently it is important that the argument pass happen without the caller being able to see the callee’s frame, nor the callee the caller’s frame, yet still get the data across. Yes, yet more magic 🙂

    The full explanation requires pictures, but the short answer is that the caller piles the arguments contiguously up in his own space, and then the call automatically grants access to the pile to the callee, and the access is automatically revoked at the return. Addressing is via a pair of specRegs (unimaginatively called INP and OUTP) that can be used as base registers in load/store/LEA.
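    To make the grant/revoke flow concrete, here is a toy Python sketch of the scheme described above. Everything in it (the ArgPile class and the grant/revoke/load names) is invented for illustration; it models only the visibility rules, not the actual hardware or the INP/OUTP specRegs.

```python
# Hypothetical sketch, not Mill code: the caller piles excess arguments
# contiguously in its own space; the call grants the callee access to
# the pile, and the return revokes it.

class ArgPile:
    """A contiguous region in the caller's space holding excess args."""
    def __init__(self, args):
        self.data = list(args)   # caller piles arguments contiguously
        self.granted = False     # whether the callee may see the pile

    def grant(self):             # done automatically by the call
        self.granted = True

    def revoke(self):            # done automatically at the return
        self.granted = False

    def load(self, inp_offset):  # callee addresses relative to a base
        if not self.granted:
            raise PermissionError("no access outside the call")
        return self.data[inp_offset]

pile = ArgPile([10, 20, 30])
pile.grant()                     # the "call" happens
assert pile.load(1) == 20        # callee reads its argument
pile.revoke()                    # the "return" happens
```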

    Pictures in one of the future talks at some point, don’t hold your breath 🙁

  • Ivan Godard
    Keymaster
    Post count: 689

    In response to the above, Larry Pfeffer sent me a personal email; the relevant parts are reproduced below. I will respond in a reply:
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Ivan,

    In forum post #1128 you wrote:

    In this and all other topics, please do not speculate or suggest ideas in the public forum. … We may already have worked out something in great detail and are just about to file the patents on it, only to have an innocent well-intentioned suggestion give enough of the idea that a patent is no longer available, or could be attacked even if granted.

    I suggest you re-post your message #1128 in such a way that all forum members see it. On the off chance you haven’t thought of these options to better protect your IP, you might want to:

    1. Explicitly change the forums’ terms of use, so that all ideas posted on millcomputing’s forums become millcomputing’s property. If so, I suggest sending a message to everyone already signed up, and changing the forum signup so that agreeing to the same terms is a condition of forum participation. Doing so seals one type of IP-loss hazard, but even under that forum change, posts could still establish new deadlines for patent filings.

    2. Change the forums to be moderated. That’s clearly more work, and it requires moderators to know Millcomputing’s current and foreseen IP. (Admittedly, an ouch.)

    I hope you won’t close the forums, but I’d rather see you keep ownership of your IP and bring it to market ASAP. I’ve done my best to write my forum posts such that they don’t poison any of your wells. I’m first inventor on a recent patent, through which I learned about some of the many things that can screw a patent application’s chances of being granted. And drafting a patent application to get through the USPTO is considerably easier than drafting a patent application and claims that can also stand up to concerted challenges by deep-pocketed opponents, e.g. Intel, IBM, Google….

    • Ivan Godard
      Keymaster
      Post count: 689

      Larry’s remarks are well taken in the present IP climate.

      Unfortunately.

      There are actually two different IP-related questions here. In the first, we risk having the patentability of our work unintentionally poisoned by a poster here. We already know the idea involved, may have worked it out in detail and put a lot of effort into it, but an un-fleshed-out public sketch of the idea could prevent us from getting a patent, or invalidate one later even if we did get one. We lose, the poster gains nothing, and only our competitors benefit. IANAL, but I don’t think that Terms of Service protect us from this problem; publication is publication, and once published the idea is gone for international patents, although US patents give us a year’s grace.

      Larry brings up the second question: what if the poster’s idea is new to us, and has enough merit that we would like to add it? We could put in Terms of Service that claim ownership of all posted ideas, and I suspect that most forum participants would consider them no more than the usual legal boilerplate cruft that grows kudzu all over everything these days. However, there’s a moral question here: if the idea is genuinely new to us, should the poster get some compensation for it? Even if he has waived ownership by virtue of signing in to the site? I have no problem with requiring IP assignment as a condition of employment; after all you are getting paid. But I’m not altogether comfortable with IP assignment as a condition of reading a site.

      On the other hand, there is the big grey area between an idea that we already have in detail, and an idea that we have never seen before. Every company has horror stories about unreasonable, sometimes even psychotic, people who allege that the company has stolen their idea – and sue. In defense, many companies simply refuse to even look at ideas from outside the company. That too is not the kind of company outlook I am comfortable with – the world is too full of the fortresses of fear already.

      It is difficult to find a line between the warm-fuzzy of an open source project (which would never get funded into reality) and the take-no-prisoners behavior of widespread legal practice and company policies. I think many company founders are worried about such issues – remember “Don’t be evil”? But Google is itself an example of how hard it is to retain moral focus through time and massive growth.

      I invite discussion of this topic, and the wider topic of company moral behavior.

      Ivan

  • Ivan Godard
    Keymaster
    Post count: 689

    Specific to the question of ganging:

    All ganged operations are bound; the two (or more) parts must be in adjacent slots, in order. The hardware slot population is defined to ensure this. Thus, for example, the FMA op (which is a gang because it needs three inputs) may be specified so that gang[0] is in slot 1 and gang[1] is in slot 2, or any other adjacent pair, or even all adjacent pairs if the specification is spendthrift of hardware, but it cannot have gang[0] in slot 1 and gang[1] in slot 5. This makes scheduling gangs no more difficult than scheduling non-gang ops.
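    The adjacency rule can be illustrated with a small sketch (the function and its arguments are made up for this example, not real specializer code):

```python
# Illustrative only: a gang of k parts must occupy k adjacent slots, in
# order, starting at one of the slots the hardware specification allows.

def legal_gang_placements(gang_size, allowed_starts, num_slots):
    """Return the slot tuples where a gang may legally be placed."""
    placements = []
    for start in allowed_starts:
        slots = tuple(range(start, start + gang_size))
        if slots[-1] < num_slots:      # gang must fit within the slots
            placements.append(slots)
    return placements

# e.g. a 2-part gang on a member whose spec allows starts at slots 1-3
# of 4 total slots:
print(legal_gang_placements(2, [1, 2, 3], 4))   # [(1, 2), (2, 3)]
```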

    As Will explains, the compiler does not know the FU and slot layout of the target member, and the same compiled code may be targeted at quite different members. Only the specializer knows the specific target, and it knows everything about that target, including such things as latency (you are right that that varies too). All that is dealt with during scheduling, which is done entirely in the specializer. The compiler output is a dataflow dependency forest graph, structured for easy and fast scheduling, but it is not scheduled until the specializer.

    The specializer does three main tasks: substitute emulation graphs for ops that are not native on the target; schedule; and emit binary.
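    Those three stages can be sketched as a toy pipeline; every function below is a hypothetical stand-in, not real Mill tooling:

```python
# Hedged sketch of the three specializer stages: substitute emulations,
# schedule, emit binary. The data shapes are invented for illustration.

def substitute_emulations(graph, target):
    # replace ops the target lacks natively with emulation subgraphs
    return [op if op in target["native"] else ("emulated", op) for op in graph]

def schedule_ops(graph, target):
    # real scheduling uses the target's slot layout and latencies;
    # here we just assign ops to successive issue positions
    return list(enumerate(graph))

def emit_binary(schedule, target):
    # stand-in "binary": one encoded string per scheduled op
    return [f"{pos}:{op}" for pos, op in schedule]

def specialize(dataflow_graph, target):
    g = substitute_emulations(dataflow_graph, target)
    return emit_binary(schedule_ops(g, target), target)

out = specialize(["add", "fma"], {"native": {"add"}})
print(out[0])   # 0:add
```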

  • Ivan Godard
    Keymaster
    Post count: 689
    in reply to: Specification #1103

    The specializer format is similar to a serialized IR, but unextended LLVM IR is not able to carry all the Mill semantics, and unfortunately has also discarded some information that the LLVM compiler had and the specializer needs. We are in the process of porting LLVM, but there’s a lot to do and limited resources.

  • Ivan Godard
    Keymaster
    Post count: 689
    in reply to: Specification #1092

    Complex answer here, please bear with me.

    Of the four reserved bits, one is not needed if the hardware is never going to support the Unix fork() function; any other thread- or process-spawning function will still work, but specifically not fork(). There will be a talk on how we do fork() sometime in the future; it’s non-trivial on a single-address-space machine.

    The remaining three reserved bits are not needed if the hardware is never going to support garbage collection, or only support garbage collectors that cannot (or do not) use the hardware event-trap mechanism that uses those three bits.

    Because few if any embedded systems need either fork() or garbage collection, the four reserved bits could be used for addresses on such a system.

    For the remaining address space bits, the Mill is inherently a single-address-space machine, and while you could use the Belt and other Mill notions on a conventional multi-space architecture, the result would not be a Mill. Consequently, for a Mill the size of pointers is dictated by the size of the total shared address space that is required by the application. If all the memory used by all threads of all processes is less than 256 bytes you could in principle have a Mill with one-byte pointers, although other aspects of the architecture are way too heavy-duty to compete with the likes of Atmel. A 16-bit pointer is only somewhat more plausible, and I think that the Z80 market is safe from us.
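    The arithmetic behind the pointer-size observation is just a logarithm; a quick sketch (the function name and the reserved-bit handling are my own framing, not a Mill formula):

```python
# Pointer width needed for a single shared address space: enough bits
# to address every byte, plus any bits reserved for other purposes.

import math

def pointer_bits(total_address_space_bytes, reserved_bits=0):
    return math.ceil(math.log2(total_address_space_bytes)) + reserved_bits

print(pointer_bits(256))     # 8  -> the one-byte-pointer Mill of the text
print(pointer_bits(2**32))   # 32 -> a 32-bit embedded Mill
```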

    However, embedded systems that currently run on 32-bit chips with no MMU should also be able to run on a Mill with 32-bit pointers and no native 8-byte arithmetic. The designers of such a chip would have to think hard about what else of the basic Mill to leave out: the PLB (losing wild-address protection and portal calls)? The TLB (losing backless memory, and paging of course)? Self-tuning transfer prediction, or prediction at all for that matter? All unclear, and heavily customer-driven.

    So I guess the answer to your final question is: eventually possible and very low priority.

  • Ivan Godard
    Keymaster
    Post count: 689

    Wow, the Nova. I did a compiler for that once, Nova 1200. Language roughly as complex as early C++ in the C-front era, the compiler could compile itself in 64k of (core) memory that you shared with the OS. 🙂

    Diagnostics for the Mill need to be written in C++ because they have to integrate with the specification system which is also C++, and the specification data structures, which are C++ data structures. Yes, other languages could be used, but we don’t want the maintenance headaches that language fan-out causes.

    But Mill diagnostics are not actually written, they are specified, and from the specifications comes the asm code for the actual target test code. We don’t have the resources (human, money, or time) to write diagnostics for a dozen family members and manually track the changes that they incur as they evolve. The same is true for hardware diagnostics, except the output is probe signals or whatever the hardware sims (or the actual hardware in time) talk.

    The C++ problem aside, we welcome volunteers for the team. We don’t accept work-for-free volunteers, but we do accept sweat-equity volunteers, where the team member is compensated by an interest in the company. It is a major cost to us to ramp a new person up on the Mill, our code base, and our work, and the ramping must be intensive and productive or the new person never gets out of ramp mode. In practical terms, this means that any new person must realistically be able to commit 20 hours per week on an ongoing basis.

    For you or any other reader here, if the above sounds like you then contact me directly at ivan@millcomputing.com.

  • Ivan Godard
    Keymaster
    Post count: 689
    in reply to: Execution #1138

    No, it occupies one three-input slot in the same encoding block with writers (writerBlock). It is able to do so because pick is not actually moving any data, the way that copying bits to the input register of an adder does. Pick only controls the routing of the data going someplace else. There’s no real pick FU, in the sense of an adder FU.

    The two-input block (exuBlock; adds and muls and such) does have real data movement and the wires to carry it. Real writers (not the pick that is encoded in the same block, but the pure sink operations that move data to the scratchpad or specRegs) have only one input. ReaderBlock operations (popular constants, specReg and scratchpad reads) have no inputs. And the whole flow side (control flow, memory) has an extremely ad hoc data movement pattern in which different slots can have different operand counts depending on what FUs are connected where.

    It’s as regular as we can make it, but no more than that 🙂

  • Ivan Godard
    Keymaster
    Post count: 689
    in reply to: Specification #1128

    In this and all other topics, please do not speculate or suggest ideas in the public forum. We have put and are putting a great deal of work into the Mill, and would like to see some tangible reward for our work. We may already have worked out something in great detail and are just about to file the patents on it, only to have an innocent well-intentioned suggestion give enough of the idea that a patent is no longer available, or could be attacked even if granted.

    We do ask for help, but that doesn’t mean that we are looking for postings of ideas, it means that we are looking for real human beings, with expertise – or just thought-out ideas – in the subject, who will join our team, sign the necessary NDA and IP agreements to protect our and their work, and put significant time into the effort on a sweat-equity basis. Contact me directly (ivan@millcomputing.com) if that’s you.

    So please, keep the discussion here only on material that has been part of our public disclosures, or material that has already been published by third parties. The latter is especially useful – a cite to an obscure decades-old paper may save us the cost of a pointless patent filing. Or ask questions about how XYZ from a talk works (but don’t suggest alternatives). Or talk about marketing or business matters – Mill Computing has a real technology, not just a business plan and an MBA as so many startups do, so we have nothing we need to protect about the business.

    But please, nothing about anything that might be a patent someday; speculations and ideas can hurt us and don’t help you.

    Ivan

  • Ivan Godard
    Keymaster
    Post count: 689

    The reason you want to kill it may be that it is hung inside the portal-called function.

    As Will said – the facility opens up all sorts of interesting possibilities for OS models. We designed it to ensure that existing OS models could be supported, with better performance. But that doesn’t rule out other models 🙂

  • Ivan Godard
    Keymaster
    Post count: 689

    Some capability systems, Keykos in particular, had capabilities for accounting entities. With these, you can give your credit card to another process.

    A similar mechanism could be created by grants to portals to functions that do the accounting. However, all that kind of machinery is way above the hardware. OS research guys will have fun with it – but then they will find a lot of fun on a Mill anyway 🙂

  • Ivan Godard
    Keymaster
    Post count: 689
    in reply to: Specification #1101

    You are describing what the sim does 🙂

    Of course, the sim is dynamic, while an asm has to be static. And unlike the specializer and sim, asm doesn’t have width information, and so it also does not have latency info, which is needed to track the belt. And then there’s the question of what to do at control-flow joins, especially joins that include the back edges of loops. The problem is roughly equivalent to type inference, i.e. mostly solvable in most code but not in general, and far more work than you’d want to do in an assembler. Good for a PhD if you wanted one though 🙂
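    A toy belt model shows why manual position tracking is so painful for a human: every drop renames every operand. The class and method names below are invented for illustration, not Mill tooling:

```python
# Toy belt: results drop onto the front of a fixed-length belt, and
# operands are named by belt position, so positions shift on every op.

from collections import deque

class Belt:
    def __init__(self, length=8):
        self.slots = deque(maxlen=length)   # oldest values fall off

    def drop(self, value):
        self.slots.appendleft(value)        # new result becomes b0

    def b(self, pos):
        return self.slots[pos]              # b0 is the newest value

belt = Belt()
belt.drop(3)                          # belt: b0=3
belt.drop(4)                          # belt: b0=4, b1=3
belt.drop(belt.b(0) + belt.b(1))      # "add b0, b1" drops 7; everything shifts
assert belt.b(0) == 7
```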

  • Ivan Godard
    Keymaster
    Post count: 689
    in reply to: Specification #1099

    Our assembler is primarily intended for hardware verification; it is not intended for application use. Manual asm on a Mill is extremely inhumane due to the difficulty of keeping track of belt contents manually; it’s not too hard to read but impractical to write.

    Compiler output, which is also specializer input, is a non-textual graph structure. The specializer turns that into target binary, and you can inspect it or get listings in asm text, and even assemble it if you want, but the normal build cycle does not use an asm stage.

  • Ivan Godard
    Keymaster
    Post count: 689
    in reply to: Specification #1098

    Mixed-core-architecture chips are common in embedded and other special applications – think the Sony Cell. Usually the cores have dedicated functions – for example, one drives your friendly local nuke plant, and one drives the UI – so process migration is not an issue. The cores need not even be from the same vendor nor be of related architectures. Crimson is a testbed for such use in our specification system.

    Process migration across dissimilar cores has been the subject of some academic work. It appears to be practical in a JIT environment, although utility, except for reliability reasons, is questionable.

  • Ivan Godard
    Keymaster
    Post count: 689

    Expanding a bit on Will’s answer:

    There are several mechanisms to get an immediate into an operation or onto the belt. The reader-phase flow-side con operation can supply any non-NaR scalar value of any width (including 16-byte on Mill family members supporting native quad), but not vector values. The reader-phase exu-side operation rd can supply any of a hardware-defined preselected set of values known as popCons (popular constants), both scalar and vector. Both con and rd were used in the little demo program. Lastly, many op-phase exu-side operations such as add, sub, and the relational ops, both scalar and vector, have immediate forms in which the second argument is a small literal encoded in the operation.

    Of these, the exu-side ops, both popCons and immediates, are used as entropy-soaks; popCons for entropy in the readerBlock decode block and immediates for the exuBlock decode block. There is also an entropy-soaker for the exu-side writerBlock decode block, but it is NYF. There is no flow-side entropy-soaker, although one can think of the conBlock block and extBlock block as being entropy-soakers for the ops in the flowBlock block. There is nothing to prevent an entropy-soaker operation in flowBlock itself, but we’ve never found an operation that is naturally flow-side that can use a (varying) few bits of immediate.

    The take-away on this is that Mill encoding is not perfect maximal entropy, but it is pretty close and much denser than any other CPU encoding.
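    A back-of-envelope way to think about encoding entropy (the numbers here are made up, not Mill figures): selecting one of N distinct encodables ideally costs log2(N) bits, and efficiency is how close a field's actual width comes to that ideal.

```python
# Illustrative entropy accounting for a fixed-width encoding field.

import math

def encoding_efficiency(num_encodables, field_bits):
    """Ratio of the ideal bit cost (log2 of the choices) to bits spent."""
    ideal = math.log2(num_encodables)
    return ideal / field_bits

# A 6-bit field selecting one of exactly 64 ops is perfectly dense;
# one selecting among only 50 ops wastes a little entropy:
print(round(encoding_efficiency(64, 6), 3))   # 1.0
print(round(encoding_efficiency(50, 6), 3))   # 0.941
```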
