Forum Replies Created

Viewing 15 posts - 76 through 90 (of 94 total)
  • Author
    Posts
  • Will_Edwards
    Moderator
    Post count: 98

    One interesting micro-problem that would help me understand the Mill is computing the Euclidean distance between two 3D points.

    The source code might look like this:

    d = sqrt(sqr(a.x - b.x) + sqr(a.y - b.y) + sqr(a.z - b.z))

    Or it might be:

    d = sqrt(sqr(a[0] - b[0]) + sqr(a[1] - b[1]) + sqr(a[2] - b[2]));

    These are naturally equivalent if the fields of the point struct used in the first form are adjacent.

    The parallelism of the subtractions and squarings is obvious, and is easy to express either as vector operations or as separate parallel operations.

    • If vectorised, can you load a non-power-of-two length vector (perhaps it puts a power of two length vector on the belt, with None in the last slot?)
    • If vectorised, how do you then sum the values in the vector together?
    • And if done as separate operations, do you need two sequential add operations to add them together?
    • This reply was modified 10 years, 11 months ago by  Will_Edwards. Reason: clarifications
  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Memory #484

    Can an EBB see the stack of its caller?

    Can you explain more how inpReg is used?

    And, finally, cheap thread-local-storage? Excellent news! Although I imagine many languages may repurpose it, if you let them use it as just another data segment.

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: The Belt #463

    Is it possible to issue an op even if an instance of that same op is already in flight in that same pipeline?

    (This is hinted at in this talk but I want to confirm my understanding – or misunderstanding – of this)

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: The Belt #395

    A call pushes parameters onto the callee’s belt and creates a new frame id etc, which seems straightforward; but what happens to the belt when you branch rather than call?

    The loop execution has advanced the belt some number of positions, so the passed-in parameters are no longer at the top of the belt. There must be some kind of rewind mechanism? Which then makes me wonder what happens to internal values you don’t want to lose?

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Instruction Encoding #388

    Can you describe how jumps within a frame – e.g. a loop – work, and what kind of latency they have?

    • This reply was modified 10 years, 11 months ago by  Will_Edwards. Reason: clarify
  • Will_Edwards
    Moderator
    Post count: 98

    When I look at the freshness column, it doesn’t seem to show the most recent poster nor time! For example, I posted to the Markets thread but it doesn’t show. And the names of who posted and the times don’t seem to match either.

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Many core mill (GPU) #363

    It will be interesting to see how Intel’s new Knights Landing (72 in-order x86 cores giving 3 TFlops double-precision(!)) is received. I’ve chatted to someone who played with Knights Corner but as I recall they struggled to apply it to their problems. Sadly I’ve forgotten any deep insights they may have mentioned.

    I guess the big challenge when you have a lot of independent cores flying in close formation is meshing them together? And the granularity of the tasks has to be really quite large, I imagine; if you play with, say, Intel’s Threading Building Blocks or OpenMP (where parallelism is in-lined, rather than explicitly crafting a large number of tasks), you’ll be staggered at how many iterations of a loop you need to propose to do before it’s worth spreading them across multiple cores.
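
    As a rough illustration of in-lined parallelism, here is a minimal OpenMP-style sketch in C. The function name and the PAR_THRESHOLD value are hypothetical; the real break-even iteration count depends on the machine and the loop body and has to be measured. Without OpenMP enabled the pragma is ignored and the loop simply runs serially.

```c
/* Hypothetical cut-off: below this many iterations, stay on one core. */
#define PAR_THRESHOLD 100000L

/* Sum of squares with the parallelism stated in-line on the loop. */
static double sum_squares(const double *x, long n) {
    double total = 0.0;
    /* The if() clause keeps small loops serial, since starting and
       joining threads can cost more than the work itself. */
    #pragma omp parallel for reduction(+:total) if(n > PAR_THRESHOLD)
    for (long i = 0; i < n; i++)
        total += x[i] * x[i];
    return total;
}
```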

    Of course the Go goroutines and Erlang lightweight processes for CSP can perhaps use some more cores in a mainstream way, for server workloads.

    The other approach to massively parallel on-chip is GPGPU, which is notoriously non-GP-friendly and hard to apply to many otherwise-parallel problems. I persevered with hybrid CPU (4-core i7) and CUDA (a meaty card, Fermi IIRC; I was borrowing it on a remote machine and forget the spec) when I was doing recmath contest entries, and typically the CUDA would give me nearly 2x the total performance of the 4-core i7, which is not to be sneezed at but hardly unleashing all those flops! And conditional branches really killed it.

    AMD is pushing hard towards the APU and Intel also unified the address space for their integrated GPUs IIRC, so things do come to pass pretty much as John Carmack predicts each QuakeCon. His views on raytracing triangles for games are terribly exciting, and suggest to me a move towards more GP and MIMD GPUs in future too.

    So it’ll be exciting to see how people innovate with the Mill.

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Metadata #552

    Yes, I can see how None is a kind of not a result, but what I think we meant by None taking precedence over NaR was None taking precedence over other kinds of NaR.

    If two different NaR kinds are operands to an arithmetic operation, what is the output NaR type?

    For example if you multiply the two vectors:

    [2, NaR, NaR, None] and
    [None, None, 2, NaR], do you get:
    [None, None, NaR, None]?
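
    The rule being asked about can be sketched in C as below. This models only the understanding stated in this thread (None beats NaR, NaR beats an ordinary value); the elem type and function names are hypothetical, and it does not distinguish between different NaR kinds, which is the open question.

```c
/* Hypothetical element model: a value plus one of three metadata states. */
typedef enum { ELEM_VAL, ELEM_NAR, ELEM_NONE } elem_kind;
typedef struct { elem_kind kind; int value; } elem;

/* Multiply two elements, assuming None takes precedence over NaR. */
static elem elem_mul(elem a, elem b) {
    elem r = { ELEM_VAL, 0 };
    if (a.kind == ELEM_NONE || b.kind == ELEM_NONE)
        r.kind = ELEM_NONE;              /* None wins over everything */
    else if (a.kind == ELEM_NAR || b.kind == ELEM_NAR)
        r.kind = ELEM_NAR;               /* NaR wins over a plain value */
    else
        r.value = a.value * b.value;     /* both operands are ordinary */
    return r;
}

/* Element-wise multiply of two n-element vectors. */
static void vec_mul(const elem *a, const elem *b, elem *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = elem_mul(a[i], b[i]);
}
```

    Under that assumed rule, the two vectors in the question produce None, None, NaR, None.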

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Metadata #550

    "Maybe a = Just a | Nothing; instead it’s Belt a = Just a | Nothing | Error, and binding gives Error precedence over Nothing"

    Small clarification, Ivan to correct me if I’m wrong, but my understanding is that None has precedence over NaR; a NaR and None is None.

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Metadata #545

    Scratch and spill both preserve metadata.

    They are dealing with belt items, not naked bytes, so they just take the extra bits needed to maintain all this item state.

    The belt width is model specific, but completely known to the hardware and any software that interacts with it, obviously, so it’s easy to take care of.

    And yes, IMO these parallels with SSA and monads are appropriate 🙂

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Prediction #547

    The Mill has some mitigations, perhaps? It has an extremely small mispredict penalty (5 cycles) if the taken path is in the instruction cache. It can execute up to 6 dependent instructions in a single cycle. It also makes classic VLIW definitions of Very seem exaggerated 😉

    What is predicted is very novel to the Mill, but the how is normal. There are predictors that try to predict the number of iterations and so on; these are implementation choices and models may differ.

  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: Metadata #503

    If the caller knows whether it is interested in more than the first result, Lisps can have a calling convention where they tell the callee when they call, perhaps?

    So single interesting return values, presumably the common case, are fast pathed.

    And debuggers can patch this as they step through code, or they can just go the route of native debuggers and see what’s happening live even if there is only ever one result.

    • This reply was modified 10 years, 11 months ago by  Will_Edwards.
  • Will_Edwards
    Moderator
    Post count: 98
    in reply to: The Belt #398

    Hi Jonathan,

    I’m beginning to form my own (uninformed) mental model of the belt in loops.

    The loop body must be in an EBB, and the parameters to an EBB are passed at the front of the belt. The call instruction has a list of belt positions and order to make the new belt for the callee.

    If you jump back to the start of the EBB, you have to put the parameters for the next iteration back at the front of the belt.

    My intuition would be that the branch ops can all specify a list of belt positions and order to put these back at the front of the belt for the next iteration, just like a call can.
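
    That intuition can be written down as a toy model in C. Everything here is an assumption for illustration (the belt type, the length of 8, the names belt_drop and belt_restage); it sketches the mental model described above, not how the Mill actually implements branches.

```c
#define BELT_LEN 8   /* hypothetical belt length, chosen for illustration */

/* Toy belt: slot[0] is the front; older values sit at higher indices. */
typedef struct { int slot[BELT_LEN]; } belt;

/* Drop a new value at the front; the oldest value falls off the end. */
static void belt_drop(belt *b, int v) {
    for (int i = BELT_LEN - 1; i > 0; i--)
        b->slot[i] = b->slot[i - 1];
    b->slot[0] = v;
}

/* Imagined branch-with-arguments: copy the listed belt positions to the
   front so that pos[0] ends up at slot[0], pos[1] at slot[1], and so on.
   Requires n <= BELT_LEN. */
static void belt_restage(belt *b, const int *pos, int n) {
    int tmp[BELT_LEN];
    for (int i = 0; i < n; i++)
        tmp[i] = b->slot[pos[i]];        /* read all sources first */
    for (int i = n - 1; i >= 0; i--)
        belt_drop(b, tmp[i]);            /* drop in reverse order */
}
```

    In this model the values that were already on the belt are pushed back rather than lost, until they fall off the end.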

    I might have missed if this has been explained in any of the talks so far.

  • Will_Edwards
    Moderator
    Post count: 98

    The freshness column just doesn’t seem to be right for me 🙁

    This is what I see on the Mill forum right now:

    I don’t think erikvv has commented in the Tools discussion 12 hours ago; that seems to have been mermerico in the Applications sub-forum instead.

    And so on. They all seem wrong. I commented in the Markets sub-forum 9 hours ago, for example.

    It’s not a big deal; I’m installing an RSS feed reader on my phone instead of browsing the forums. What I’d really like is the ‘recent unread topics’ that most forums offer. I don’t think bb does, unfortunately 🙁
