Forum Replies Created
- AuthorPosts
Pacific Standard Time, PST
- in reply to: Application Walkthrough #530
One interesting micro-problem that would help me understand the Mill is computing the Euclidean distance between two 3D points.
The source code might look like this:
d = sqrt(sqr(a.x - b.x) + sqr(a.y - b.y) + sqr(a.z - b.z))
Or it might be:
d = sqrt(sqr(a[0] - b[0]) + sqr(a[1] - b[1]) + sqr(a[2] - b[2]));
The two are naturally equivalent if the fields of the point struct used in the first form are adjacent in memory.
The parallelism of the subtractions and squarings is obvious, and easy to express either as vector operations or as separate parallel operations.
- If vectorised, can you load a non-power-of-two length vector (perhaps it puts a power of two length vector on the belt, with None in the last slot?)
- If vectorised, how do you then sum the values in the vector together?
- And if done as separate operations, do you need two sequential add operations to add them together?
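To make the last two questions concrete, here is a minimal sketch in plain Python (not Mill code) of both shapes of the computation. The function names are made up for illustration, and padding the spare lane with zero is my assumption; the post only speculates about a None in that slot. Either way, summing three squares with two-input adds needs two dependent add steps.

```python
import math

def distance_scalar(a, b):
    """Three independent subtract/square pairs, then a two-step sum."""
    # The three differences do not depend on one another, so they could
    # all be computed in parallel; the same goes for the three squarings.
    dx, dy, dz = a[0] - b[0], a[1] - b[1], a[2] - b[2]
    sx, sy, sz = dx * dx, dy * dy, dz * dz
    # Summing three values with two-input adds takes two dependent adds.
    partial = sx + sy
    total = partial + sz
    return math.sqrt(total)

def distance_padded(a, b):
    """Same computation on 4-lane vectors, spare lane padded with zero (assumption)."""
    va = list(a) + [0.0]
    vb = list(b) + [0.0]
    squares = [(x - y) ** 2 for x, y in zip(va, vb)]
    # Pairwise (tree) reduction: 4 lanes -> 2 -> 1, which is still two add levels.
    while len(squares) > 1:
        squares = [squares[i] + squares[i + 1] for i in range(0, len(squares), 2)]
    return math.sqrt(squares[0])
```

Calling distance_scalar((1, 2, 3), (4, 6, 3)) and distance_padded((1, 2, 3), (4, 6, 3)) should give the same result, since the padded lane contributes nothing to the sum.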
- This reply was modified 10 years, 10 months ago by Will_Edwards. Reason: clarifications
A call pushes parameters onto the callee’s belt and creates a new frame id etc, which seems straightforward; but what happens to the belt when you branch rather than call?
The loop execution has advanced the belt some number of positions, so the passed-in parameters are no longer at the top of the belt. There must be some kind of rewind mechanism? Which then makes me wonder what happens to internal values you don’t want to lose?
- in reply to: Instruction Encoding #388
Can you describe how jumps within a frame – e.g. a loop – work, and what kind of latency they have?
- This reply was modified 10 years, 10 months ago by Will_Edwards. Reason: clarify
- in reply to: Site-related issues (problems, suggestions) #366
When I look at the freshness column, it doesn’t seem to show the most recent poster or time! For example, I posted to the Markets thread but it doesn’t show. And the names of who posted and the times don’t seem to match either.
- in reply to: Many core mill (GPU) #363
It will be interesting to see how Intel’s new Knights Landing (72 in-order x86 cores giving 3 TFlops double-precision(!)) is received. I’ve chatted to someone who played with Knights Corner but as I recall they struggled to apply it to their problems. Sadly I’ve forgotten any deep insights they may have mentioned.
I guess the big challenge when you have a lot of independent cores flying in close formation is meshing them together? And the granularity of the tasks has to be really quite large, I imagine; if you play with, say, Intel’s Threading Building Blocks or OpenMP (where parallelism is in-lined, rather than explicitly crafting a large number of tasks), you’ll be staggered at how many iterations of a loop you need to propose to do before it’s worth spreading them across multiple cores.
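A back-of-envelope way to see the granularity point is a simple linear cost model: running n iterations serially costs n*t, while spreading them over p cores costs a fixed fork/join overhead plus n*t/p. The function name and the numbers below are hypothetical, picked only to illustrate the arithmetic; they are not measurements of TBB or OpenMP.

```python
import math

def break_even_iterations(per_iter_ns, fork_join_overhead_ns, cores):
    """Smallest n with overhead + n*t/cores < n*t, under a simple linear model."""
    if cores < 2:
        raise ValueError("need at least two cores")
    # Time saved per iteration by running in parallel instead of serially.
    saved_per_iter = per_iter_ns * (1 - 1 / cores)
    # Parallel wins once n * saved_per_iter strictly exceeds the fixed overhead.
    return math.floor(fork_join_overhead_ns / saved_per_iter) + 1
```

With a made-up 2 ns loop body, 20 microseconds of fork/join overhead, and 4 cores, break_even_iterations(2, 20000, 4) lands in the low tens of thousands of iterations; a heavier loop body brings the threshold down proportionally.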
Of course the Go goroutines and Erlang lightweight processes for CSP can perhaps use some more cores in a mainstream way, for server workloads.
The other approach to massively parallel on-chip is GPGPU, which is notoriously non-GP-friendly and hard to apply to many otherwise-parallel problems. I persevered with hybrid CPU (4 core i7) and CUDA (meaty card, Fermi IIRC; I was borrowing it on a remote machine and forget the spec) when I was doing recmath contest entries, and typically the CUDA would give me nearly 2x the total performance of the 4xi7, which is not to be sneezed at but hardly unleashing all those flops! And conditional branches really killed it.
AMD is pushing hard towards the APU and Intel also unified the address space for their integrated GPUs IIRC, so things do come to pass pretty much as John Carmack predicts each QuakeCon. His views on raytracing triangles for games are terribly exciting, and suggest to me a move towards more GP and MIMD GPUs in future too.
So it’ll be exciting to see how people innovate with the Mill.
Yes, I can see how None is a kind of not a result, but what I think we meant by None taking precedence over NaR was None taking precedence over other kinds of NaR.
If two different NaR kinds are operands to an arithmetic operation, what is the output NaR type?
For example, if you multiply the two vectors [2, NaR, NaR, None] and [None, None, 2, NaR], do you get [None, None, NaR, None]?
The scratch and spill preserve metadata.
They are dealing with belt items, and not naked bytes, so they just take the extra bits needed to maintain all this item state.
The belt width is model specific, but completely known to the hardware and any software that interacts with it, obviously, so it’s easy to take care of.
And yes, IMO these parallels with SSA and monads are appropriate.
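The None-over-NaR vector question a few paragraphs up can be played with in a toy model. This is only a sketch of the rule as I assumed it in that post (None wins over NaR, NaR wins over ordinary values); it is not confirmed Mill behaviour, and it says nothing about what happens when two different NaR kinds meet. The names Meta, NONE, NAR, mul_lane and mul_vectors are invented for the example.

```python
class Meta:
    """Marker for a lane that carries metadata instead of an ordinary value."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

NONE = Meta("None")  # stands in for the Mill's None, not Python's None
NAR = Meta("NaR")

def mul_lane(x, y):
    """Multiply one lane under the assumed precedence: None, then NaR, then values."""
    if x is NONE or y is NONE:
        return NONE
    if x is NAR or y is NAR:
        return NAR
    return x * y

def mul_vectors(a, b):
    """Lane-wise multiply; each lane is resolved independently of the others."""
    return [mul_lane(x, y) for x, y in zip(a, b)]
```

Under that assumed rule, mul_vectors([2, NAR, NAR, NONE], [NONE, NONE, 2, NAR]) yields the [None, None, NaR, None] guessed in the post.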
- in reply to: Prediction #547
The Mill has some mitigations, perhaps? It has an extremely small mispredict penalty (5 cycles) if the taken path is in the instruction cache. It can execute up to 6 dependent instructions in a single cycle. It also makes classic VLIW definitions of Very seem exaggerated.
What is predicted is very novel to the Mill, but the how is normal. There are predictors that try to predict the number of iterations and so on; these are implementation choices and models may differ.
If the caller knows whether it is interested in more than the first result, Lisps can have a calling convention where they tell the callee when they call, perhaps?
So single interesting return values, presumably the common case, are fast pathed.
And debuggers can patch this as they step through code, or they can just go the route of native debuggers and see what’s happening live even if there is only ever one result.
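As a rough illustration of that convention, here is a sketch in Python rather than Lisp. The `wanted` parameter is a hypothetical stand-in for whatever hint the caller would pass; nothing here reflects an actual Lisp or Mill calling convention.

```python
def floor_divide(n, d, wanted=1):
    """Hypothetical callee: 'wanted' is the caller's hint for how many results it will use."""
    quotient = n // d
    if wanted == 1:
        # Fast path: the caller only wants the first result, so skip the rest.
        return (quotient,)
    # Slow path: the caller asked for the secondary result (the remainder) too.
    return (quotient, n - quotient * d)
```

A caller that only cares about the quotient calls floor_divide(17, 5) and takes the single value; one that wants both passes wanted=2.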
- This reply was modified 10 years, 10 months ago by Will_Edwards.
Hi Jonathan,
I’m beginning to form my own (uninformed) mental model of the belt in loops.
The loop body must be in an EBB, and the parameters to an EBB are passed at the front of the belt. The call instruction has a list of belt positions and order to make the new belt for the callee.
If you jump back to the start of the EBB, you have to put the parameters for the next iteration back at the front of the belt.
My intuition would be that the branch ops can all specify a list of belt positions and order to put these back at the front of the belt for the next iteration, just like a call can.
I might have missed if this has been explained in any of the talks so far.
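Here is that mental model as a toy simulation. Everything in it is my guess: ToyBelt, rearrange and sum_to are invented names, and the idea that a branch carries a list of belt positions is the intuition above, not documented Mill behaviour. The belt is a fixed-length queue where new results drop in at position 0 and the oldest fall off the end.

```python
from collections import deque

class ToyBelt:
    """Fixed-length belt: new results enter at position 0, the oldest fall off."""
    def __init__(self, length=8):
        self.items = deque(maxlen=length)

    def drop(self, value):
        self.items.appendleft(value)

    def get(self, pos):
        return self.items[pos]

    def rearrange(self, positions):
        """Re-drop the selected items so positions[0] ends up at the front."""
        chosen = [self.items[p] for p in positions]
        for value in reversed(chosen):
            self.drop(value)

def sum_to(n, belt_length=8):
    """Sum 1..n with the loop state (i at position 0, acc at position 1) kept on the belt."""
    belt = ToyBelt(belt_length)
    belt.drop(0)  # acc
    belt.drop(1)  # i, now at position 0 with acc at position 1
    while belt.get(0) <= n:
        i, acc = belt.get(0), belt.get(1)
        belt.drop(i + 1)    # next i
        belt.drop(acc + i)  # next acc lands in front of next i
        # Hypothetical "branch with belt list": restore the (i, acc) order
        # that the top of the loop body expects.
        belt.rearrange([1, 0])
    return belt.get(1)
```

The loop body leaves its results in whatever order they happened to be computed, and the rearrange at the back edge puts them where the next iteration expects its parameters, so sum_to(4) works out to 10 even though the belt has advanced on every pass.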
- in reply to: Site-related issues (problems, suggestions) #370
The freshness column just doesn’t seem to be right for me.
This is what I see on the Mill forum right now:
I don’t think erikvv has commented in the Tools discussion 12 hours ago; that seems to have been mermerico in the Applications sub-forum instead.
And so on. They all seem wrong. I commented in the Markets sub-forum 9 hours ago, for example.
It’s not a big deal; I’m installing an RSS feed reader on my phone instead of browsing the forums. What I’d really like is the ‘recent unread topics’ that most forums offer. I don’t think bb does, unfortunately.