Forum Replies Created

Viewing 15 posts - 16 through 30 (of 94 total)
  • Author
    Posts
  • Will_Edwards
    Moderator
    Post count: 98

    You’re both quite right!

    When there’s an if(/else) the compiler/specializer can encode it as branches or inline it using if-conversion, and can evaluate which is best (which may differ on different Mill members).

    The if(/else) can be solved by the if and the else blocks being their own EBBs with a conditional branch, or by computing both the if and the else blocks in parallel.

    As the Mill is such a wide target (especially on the middle and high-end Mills) with so many pipelines, and because loads can be hoisted right up to the point their address becomes computable without fear of aliasing, it's often the case that many if(/else) blocks can be executed speculatively in parallel, and this is a huge win for performance. Usually you only have to introduce some strategic Nones or 0s to nullify the untaken paths you are executing speculatively, rather than needing lots of conditional ops and picks.

    It's hard to get a feel for this without concrete code, and we will be making a real simulator available in the new year.

    One note: it's important that the belt be consistent at all times, but it's irrelevant what the belt contents are if they are unused. If all instructions after a certain point consume only slots 0 and 3, then the values in the other belt slots don't matter. There are cheap, fast opcodes like rescue and conform for moving items to the front of the belt to do this rearrangement.

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: What's New? #1488

    Why yes, it's been excitingly busy at the Mill recently, and soon we will get to share that and be excitingly busy in public too!

    We are very very very close to publishing ISA specs and there are more talks scheduled. As usual, subscribe to the mailing list or visit the forums regularly to get details.

    Ivan will give a keynote at this year’s Open Source Days in Copenhagen: http://www.opensourcedays.org/2014/ – finally a chance for us Europeans to hear him live.

  • Will_Edwards
    Moderator
    Post count: 98

    Hi Laurent,

    Thanks for the encouragement!

    On the first point, if the OoO can do all 10 loads in parallel then all 10 must be independent. If all 10 are independent then the compiler can statically schedule them in parallel too.

    Things actually get interesting if the loads alias stores, as on the Mill the loads snoop/snarf on the stores.

    Re patents: it’s best to hint nothing at all! Please don’t go this route of disowning ideas publicly.

    The metadata aspect is interesting. Of course Mill SIMD is very familiar to people used to thinking about SIMD on other platforms, but it's broader too.

    The widening and narrowing ops are very fast, and the range of ops that work on vectors is so much wider than on other ISAs.

    The tracking of error and None NaRs allows vectorization of so much more (see the auto-vectorization examples in the Metadata talk and the Pipelining talk, and think how you can use masking in general open code too), so it's swings and roundabouts, but generally a really big win.

  • Will_Edwards
    Moderator
    Post count: 98

    I wasn’t at Hot Chips, although Ivan was, but like many I read about the Nvidia transcoding.

    It reminded me strongly of Transmeta.

    Because the chip must still run untranslated ARM code at a reasonable speed, it must basically be an OoO superscalar chip, with all the inefficiencies that implies. It must still have the full decode logic etc.

    And therefore the microops they cache must be very close to the ARM ops they represent.

    This aside, I expect they execute very well. It's a good halfway house, and it underlines how expensive CISC, and even RISC, to-uop decode is; one imagines x86 chips getting much the same advantage if they stored their uop decode caches too.

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Simulation #1582

    This recent talk by Rob Pike describes a simple APL variant that would be fun to think about on the Mill; an interpreted language that can use the Mill’s vectorization nicely.

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Memory #1564

    Yes, it means exactly that. This part of the address space is still protected by the normal Mill memory protection mechanism so an operating system can ensure that nobody unwittingly has an actual alias though.

  • Will_Edwards
    Moderator
    Post count: 98

    We’re both right 🙂

    The hardware gang are hopeful that we can have NaR low-power paths for things like multiply.

  • Will_Edwards
    Moderator
    Post count: 98

    There are many types of “efficiency” and you got me thinking about power economy, which is a very important aspect to all this:

    There is the opportunity for multi-cycle instructions to fast-path NaRs, and I’m particularly thinking about monsters like multiply… whereas a real multiply takes a lot of power, especially for large operand widths and vectors, a NaR can cause a cheap short circuit, so you can make substantial gains in power efficiency with speculation versus the hardware needed to do prediction.

  • Will_Edwards
    Moderator
    Post count: 98

    From looking at raw assembly code for a conventional processor you get the impression that things happen linearly and, as you say, “a register system would just branch and execute the code or not”. But looking at the code is thinking about the ‘programming model’, and is not actually how things execute under the hood so to speak.

    And unfortunately there are catches; things don’t happen linearly (modern fast CPUs are “Out-of-Order”) and branches aren’t cheap to take. Branching – or rather, mispredicting a branch – is shockingly expensive, and OoO compounds that.

    Modern CPUs have all introduced new op-codes to do “if-conversion”, although with mixed success. cmov and kin are the new darling opcodes.

    The Metadata talk describes speculation and is warmly recommended.

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: What's New? #1536

    Thank you kindly for finding and sharing with us, Larry 🙂

    I’ve sent out polite enquiries to the organisers of the more recent talks and hopefully we’ll track down the other videos soon too.

    If anyone spots them before us please do post here!

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: What's New? #1533

    an upcoming presentation on the abstract Mill, viewed as a virtual machine. If I recall, this was by somebody other than Ivan, possibly not an official Millcomputing person

    I think you might mean Terje Mathisen’s talk “Are virtual machines the future?” at NNUG, Oslo, Norway on the 22nd September? Terje is absolutely an official Millcomputing person 🙂 Fortunately the talk was videoed.

    Terje did another talk at Trondheim on the 21st October. This one was allegedly videoed too, but we haven’t seen the video yet.

    Just last week Ivan gave a talk in Copenhagen, and this too was videoed – I was there and saw the cameras – but again we haven’t got our hands on the tapes yet.

    Of course everyone is subscribed to the mailing list already, right?

    On the 21st November 2014 Terje Mathisen will be giving a Mill CPU talk at http://buildstuff.lt/ in Vilnius, Lithuania. And on the 10th December I will be giving a talk in Tallinn, Estonia. There’s also likely a talk in December at Microsoft in Seattle, provisionally compiler-themed.

    So the year isn’t over yet! 😀

    Re the wiki: glad it's being discovered and explored 🙂 It's very much an ongoing project and will grow.

    PS Millcomputing folks. Please feel free to delete this post if I’ve mentioned things you’d rather not yet have on the forums. If so, my apologies.

    LOL no fear 🙂

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Prediction #1493

    Perceptive. Some concrete Mills may have deferred control flow (e.g. the delay on the br op) and this may need careful simulator profiling for the case of call too.

    the actual function that will be called is hoisted all the way up to an input parameter, which again would be an ideal for software injecting “upcoming|next EBB” hints into the prefetcher.

    Exactly.

    Luckily this hoisting would happen in the specializer (or JIT), which can use deferred calls or other hints if the concrete Mill supports them. It is nothing a conventional compiler generating the flow graph need concern itself with.

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Security #1486

    The IDs are unforgeable not because they are unguessable but because they are managed by hardware.

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Pipelining #1395

    That’s a perfectly reasonable implementation path for C++.

    The compiler will still have to recognise it for what it is in order to know to use the Mill’s saturating opcodes in the emitted binary.

    More generally, many uses of saturating arithmetic are in DSP and graphics programs, which are often written in C.

  • Will_Edwards
    Moderator
    Post count: 98

    Thanks for the link. I haven't read it yet, but do they talk about auto-vectorizing non-vector code? Can they take code that doesn't use NEON or the like and turn it into vectors and such?
