Forum Replies Created
- AuthorPosts
- in reply to: Pipelining on a Gold member of the Mill family #3346
Phasing is a separate issue, having nothing to do with piping. It makes a difference only in open code, not loops, where it is redundant to piping. We would still pipeline to the same degree if there were only one phase.
Oh dear, I didn’t realize that at all. In the decode talk you actually briefly mention that there are multiple execute phases (X0-X4+) per instruction/bundle. That would make dependent operations possible in a single instruction because they can execute in different cycles. I guess at this point I’ll have to carefully watch the talks again.
Thanks again. Your help is much appreciated!
- in reply to: Pipelining on a Gold member of the Mill family #3344
Many thanks!
Your explanations give a more realistic feel of what the Mill is capable of and what the relevant factors are. Assuming vector registers one might even do a 4×4-Matrix-Vector multiply in a couple of cycles. That is impressive imho.
About loop carried variables: I do see how conceptually the carried variables can just stay on the belt and therefore the maximal loop distance is proportional to the length of the belt (ignoring spill/fill ops). What I was trying to get at is that for the “i-1” version of the loop you can’t do two iterations in one instruction because of the data-dependency. Whereas the “i+1” version would allow more iterations per instruction due to no dependencies (ignoring other factors/limits).
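To make the distinction concrete, here is a minimal C sketch of the two loop shapes I have in mind. The function names, array names and loop bodies are my own assumptions for illustration; they are not taken from the talks or from earlier posts.

```c
#include <stddef.h>

/* "i-1" version (hypothetical): each iteration reads the value that the
   previous iteration just wrote, so the add for iteration i cannot start
   until the add for iteration i-1 has produced its result. */
void scan_prev(int *a, const int *b, size_t n) {
    for (size_t i = 1; i < n; ++i)
        a[i] = a[i - 1] + b[i];
}

/* "i+1" version (hypothetical): each iteration reads an element that no
   earlier iteration has written, so the adds of several iterations do not
   depend on each other. */
void shift_next(int *a, const int *b, size_t n) {
    for (size_t i = 0; i + 1 < n; ++i)
        a[i] = a[i + 1] + b[i];
}
```

In the first loop the adds form one long dependency chain; in the second they are independent, which is why I would expect more iterations to fit into one instruction, other limits aside.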
So what I really wanted to ask: How can a single instruction contain/execute data-dependent integer or FPU ops (say a+b+c)?
According to the Execution talk (#6), an instruction is phased/pipelined internally. As I understand it, an instruction or bundle is decoded over 3 cycles and executed over 3 cycles, but only one of the execution cycles is an op phase, in which the execution units can be used (is that true?). I assume a+b+c can’t be computed during this one op phase/cycle.