Forum Replies Created
- AuthorPosts
Can someone please draw a diagram showing, cycle by cycle, the state of the Mill pipeline when a typical call happens and returns? That isn’t very clear to me. When does the belt renaming happen, and do different phases see different belt numbers?
From my understanding, the existence of the reader phase is by itself bad for performance, unlike the subsequent ones. It sure is an ingenious way to enable denser instruction encoding, by allowing more complex instruction encoding and also by reducing the number of instructions encoded, avoiding the fixed overhead of an extra instruction. But except perhaps for the interaction with a function call, I don’t see where else there can be a gain.
The abstract Mill instruction encoding format allows a lot of flexibility for the hardware and the physical instruction set beneath to change and still be compatible with the abstract Mill. If the implementation trade-offs change for a certain Mill member, and it makes sense to encode both reader phase and compute phase operations together so that they are decoded and executed in the same cycle, would this still be a Mill? Or is the reader phase executing a cycle earlier a hardwired assumption that would break compatibility if changed?
Neither decode0 nor decode 1 “produce” anything except whole or partly decoded operations.
I was referring exactly to those decoded or partially decoded instructions, of course. So here is the breakdown of what happens to one instruction, cycle by cycle, after it is fetched, as I understand it:
1 – In the decode 0 stage, readerPhase operations are at least partially decoded, right?
2 – Then in decode 1 the rest of the instruction is at least partially decoded, and the readerPhase operations must have finished decoding. Are some long-latency registers already probed at this stage as an advanced readerPhase? Or do all readerPhase operations patiently wait for the next cycle, because the opPhase needs a stage to issue and/or the flow-side con encoding only finishes decoding in this cycle?
3 – Reader phase operations are executed. It is also the issue stage for opPhase (also called decode 2). By now every operation is probably fully decoded already, only waiting for their phase, right?
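To make the three steps above easier to check, here is a toy model of them in Python. It only encodes my reading of the breakdown, not anything confirmed by the Mill team. The assumptions are loud ones: one instruction enters decode 0 per cycle, there are no stalls or mispredictions, and opPhase executes in the cycle right after it issues. The names `STAGES`, `stage_of` and `timeline` are made up for this sketch.

```python
# Toy model of the per-instruction stages listed in steps 1-3 above.
# Assumption (not confirmed by the Mill documentation): one instruction
# enters decode 0 every cycle and nothing ever stalls.
STAGES = [
    "decode 0",
    "decode 1",
    "readerPhase execute / opPhase issue",
    "opPhase execute",  # assumed to follow issue by exactly one cycle
]


def stage_of(instr, cycle):
    """Return the stage that instruction `instr` occupies at `cycle`.

    Instruction `instr` is assumed to enter decode 0 at cycle `instr`.
    Returns None if the instruction is not in any modeled stage yet,
    or has already left the modeled part of the pipeline.
    """
    offset = cycle - instr
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None


def timeline(num_instrs):
    """Build rows[cycle][instr] -> stage name (or None) for all cycles."""
    num_cycles = num_instrs + len(STAGES) - 1
    return [
        [stage_of(i, c) for i in range(num_instrs)]
        for c in range(num_cycles)
    ]


if __name__ == "__main__":
    # Print which stage each of three back-to-back instructions is in.
    for cycle, row in enumerate(timeline(3)):
        cells = [f"I{i}: {s}" for i, s in enumerate(row) if s is not None]
        print(f"cycle {cycle}: " + " | ".join(cells))
```

Under these assumptions, in the cycle where one instruction executes its readerPhase, the next instruction is in decode 1 and the one after that is in decode 0. Calls, returns and the later phases are deliberately left out, since how they overlap is exactly what I am asking about.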
- This reply was modified 9 years, 10 months ago by Renesac.
Right, I was confused by your slide 28. It shows the con being moved to the brtr cycle. Apparently this is wrong. It is the add that should be moved up to the brtr cycle, and the con should be moved even higher, to the cycle before, and neither will appear to have executed if the branch wasn’t taken. And the store is in the same cycle as the following branch (writerPhase), not moved lower.
Now I see that the reader phase is really advantageous in all situations (except sometimes for exit mispredictions, as with any other pipeline lengthening, but that is ok).
decode 0
decode 1
readerPhase execute/opPhase issue
opPhase execute
callPhase/first called instruction
… repeated
writerPhase (which is also opPhase of the next instruction and readerPhase of the one after that)
Let’s see if I understand it right. The first part could well be in the encoder talk topic:
Does decode 0 not produce anything? Are the readerPhase ops already ready, but waiting for the opPhase ones to be decoded so they can issue while the reader phase executes? Or do the reader phase ops on the flow side take one extra cycle to decode compared to the simple encoding on the exu side, and that is why there is this delay?
Decode 1 decodes the opPhase operations (and maybe the pick and writer phase ones too, finishing the effective decode?) so that the next cycle can be an issue stage (the Mill does have an issue stage after all, only masked by phasing) while the reader operations that were already ready execute.
Is there a decode 2 stage?
In parallel with the opPhase execute, is the readerPhase of the called function/loop executing, assuming that the branch/exit predictor did its job? Or is there a bubble there? That is the only place where I see the readerPhase increasing the work done in a cycle when assuming an infinitely powerful decoder.
So, is the “callPhase/first called instruction” already at the opPhase of the called instruction?
And in the cycle the function returns (already at the transfer phase, which occurs at the same time as the writer phase?), is the call site already at opPhase? Or is there a bubble? Again, assuming perfect exit prediction.
- This reply was modified 9 years, 10 months ago by Renesac.
The phase of an operation (or the phase of a sub-action within an operation that is multi-phase, like store) is determined by explicit specification.
Ok, so the abstract Mill accepts a processor that has no separate reader phase (or other phase), if such a thing is advantageous in some future implementation? For example, if you run an n-way interleaved multithreading design (a barrel processor) you have plenty of time between your cycles. Or maybe a Tin, where instructions are already small and simple, the crossbar is likewise tiny, and clock targets may be low. Or something weirder.