Without history, your example will miss on the paint call every time, on any ISA. History will avoid the misses so long as it doesn’t saturate. But you don’t need history to do the data prediction and prefetch.
The naive code will first do the load of the indexed widget, then the method select, then the call, and lastly the control variable update and back branch. Without piping, the method select has to wait on the load, so any data miss stalls it. With piping, the load can be hoisted to an earlier iteration, extending the deferral. Depending on the width there may be more than one iteration in flight, but even on a narrow member the load can be hoisted over the paint call of the prior iteration. The load then has the full duration of the call to deal with data cache misses.
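To make the ordering concrete, here is a minimal sketch in Python that only models the order of operations; it is not Mill code or specializer output. The function names and the LOAD/SELECT/CALL labels are made up for illustration, and the control variable update and back branch are left out.

```python
def naive_trace(n):
    """Operation order for the unpiped loop over n widgets."""
    trace = []
    for i in range(n):
        trace += [("LOAD", i), ("SELECT", i), ("CALL", i)]
    return trace


def hoisted_trace(n):
    """Same loop with each load hoisted above the prior iteration's call."""
    if n == 0:
        return []
    trace = [("LOAD", 0)]  # prologue: the first load has no call to hide behind
    for i in range(n):
        trace.append(("SELECT", i))
        if i + 1 < n:
            trace.append(("LOAD", i + 1))  # issued before CALL(i)
        trace.append(("CALL", i))
    return trace


def ops_between_load_and_select(trace, i):
    """Operations that overlap the load latency of iteration i."""
    start = trace.index(("LOAD", i))
    end = trace.index(("SELECT", i))
    return trace[start + 1:end]
```

In the naive trace nothing sits between a load and its select, so a data miss is fully exposed. In the hoisted trace the whole call of the prior iteration sits between them.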
The present specializer hoists loads to the furthest dominator. In a loop the furthest dominator may be after the load in textual order but ahead of the load in iteration order. This is safe, even in the presence of mutation of the widgets array and even without “restrict”, because the load is defined to return the value as of retire, not as of issue, and retire is after the paint call.
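The retire-versus-issue point can be shown with a toy model. This is an assumption-laden sketch, not the hardware mechanism: `DeferredLoad` and `run` are hypothetical names, and the assignment in the middle stands in for a paint call that mutates the widgets array.

```python
class DeferredLoad:
    """Toy model of a load whose value is defined as of retire, not issue."""

    def __init__(self, memory, index):
        # Issue: only the address is captured here, not the value.
        self.memory = memory
        self.index = index

    def retire(self):
        # Retire: the value is read now, so earlier stores are visible.
        return self.memory[self.index]


def run(widgets):
    """Hoist the load of widgets[1] above a 'paint' that overwrites it."""
    pending = DeferredLoad(widgets, 1)  # issued before the call
    eager = widgets[1]                  # what an issue-time load would see
    widgets[1] = "replaced"             # stands in for paint() mutating the array
    return eager, pending.retire()
```

Calling `run(["a", "b"])` returns `("b", "replaced")`: the issue-time read sees the stale widget, while the retire-time read sees the mutation. That is why the hoist needs no aliasing guarantee.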
That takes care of data misses. It doesn’t prefetch the code of the callee. To do that, the compiler can insert a prefetch instruction for the address used by the call, placed ahead of the call. That prefetch instruction can be hoisted too, so the entire load/select/prefetch of iteration N is executed before the paint call of iteration N-1. Of course, that squeezes out the data hoisting we just did: the load of iteration N no longer has a call to hide behind, so it needs to be hoisted again, over the call of iteration N-2. The whole sequence is:
LOAD (N) CALL (N-2) SELECT (N) PREFETCH (N) CALL (N-1) CALL (N)
and there are three iterations in flight in the pipe. Disclosure: we don’t do this yet, but the ISA supports it already.
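As a rough check of that ordering, the sketch below generates the full trace for a loop of n widgets and then extracts the view of one iteration. It is only an ordering model under assumed names (`pipelined_trace`, `view_of`), not something the specializer emits.

```python
def pipelined_trace(n):
    """Order of operations with LOAD two calls ahead and SELECT/PREFETCH one ahead."""
    trace = []
    # Steps start at -2 so the prologue fills the pipe before the first call.
    for step in range(-2, n):
        if step + 2 < n:
            trace.append(("LOAD", step + 2))
        if step >= 0:
            trace.append(("CALL", step))
        if step + 2 < n:
            trace += [("SELECT", step + 2), ("PREFETCH", step + 2)]
    return trace


def view_of(trace, i):
    """Ops of iteration i, plus every call between its load and its own call."""
    start = trace.index(("LOAD", i))
    end = trace.index(("CALL", i))
    return [op for op in trace[start:end + 1] if op[1] == i or op[0] == "CALL"]
```

For example, `view_of(pipelined_trace(4), 2)` gives LOAD(2), CALL(0), SELECT(2), PREFETCH(2), CALL(1), CALL(2), which is the sequence above with N = 2. The calls themselves still run in program order.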