Thanks for your detailed explanation of how the compiler and scheduler work together to implement pipelined loops! I found it very helpful in understanding how the specializer schedules code and handles resource allocation/tracking as part of its scheduling. The explanation of blocking ranges (and the impact of non-pipelined functional units) on the number of instructions needed to implement a loop body makes clear what a dedicated, long-latency functional unit (like some implementations of monolithic divide) costs a Mill machine’s overall performance.
From what you wrote above, the answers to my previous questions appear to be:
How will the compiler and specializer jointly handle this case?
A. Quite well indeed.
(To this reader, it sounds like there have already been lots of sim results and tests that have exercised the existing compiler, specializer and simulator on a whole bunch of loop-heavy code. So there’s a body of NYF/NY-disclosable evidence that the tools and software pipelining of loops work well and will continue to do so. No hand waving involved.)
Is this one of the cases where the compiler has to emit a series of alternative sequences for the specializer to choose from? Or can the specializer always implement any software-pipelined loop sequence, though it may need more instructions to do the pipeline-required loads using its limited load hardware?
A. The compiler could issue alternative sequences, but at present there appears to be no need to do so for loops.
A. The specializer can schedule any intermediate-level “code” the compiler gives it, for any Mill target, including all known and foreseen pipelined loops. On lower-end Mills, with fewer slots/functional units (including fewer load units and retire stations), the pipelined loop body may, and probably will, take more instructions, as should be expected.
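To check my own understanding, here is a rough Python sketch of the textbook resource bound for a software-pipelined loop body: for each kind of functional unit, the cycles of work per iteration divided by the number of units, rounded up, with the largest result setting the floor. This is only my assumption about the general idea, not the specializer's actual algorithm; the function name, the unit counts, and the 8-cycle blocking divide are all made up for illustration.

```python
from math import ceil

def min_loop_instructions(op_counts, units, occupancy=None):
    """Lower bound on instructions (cycles) per pipelined loop iteration.

    op_counts: operations of each kind in one iteration of the loop body.
    units:     functional units of each kind on the (hypothetical) target.
    occupancy: cycles an op blocks its unit; 1 for fully pipelined units.
    """
    occupancy = occupancy or {}
    bound = 1
    for kind, count in op_counts.items():
        # Total cycles this kind of unit is busy per iteration.
        busy_cycles = count * occupancy.get(kind, 1)
        # Spread that work over the available units, rounding up.
        bound = max(bound, ceil(busy_cycles / units[kind]))
    return bound

# Hypothetical loop body and two hypothetical targets.
body = {"load": 2, "mul": 1, "add": 1, "div": 1}
wide = {"load": 2, "mul": 2, "add": 4, "div": 1}
narrow = {"load": 1, "mul": 1, "add": 1, "div": 1}

print(min_loop_instructions(body, wide))              # 1
print(min_loop_instructions(body, narrow))            # 2
print(min_loop_instructions(body, wide, {"div": 8}))  # 8
```

With these made-up numbers, the narrow target needs two instructions per iteration because of its single load unit, and a non-pipelined divide that blocks for 8 cycles forces at least 8 instructions per iteration no matter how wide the rest of the machine is.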
Incidentally, loads are uncommon in Mill software-pipelines, for reasons that are NYF.
Especially given the example, which iterates through an array, I’ll be very interested in hearing how and why loads are uncommon in Mill software pipelines! Of course, I’m already eagerly awaiting the pipelining talk, and I hope that the post-production work goes smoothly and you don’t have the hassle of re-shooting any video.
Thanks again for your cogent and detailed responses,