An idea for debunking – this seems as good a place as any.
A fundamental issue that branch prediction attempts to solve is that there is no explicit clue as to a branch (or computed jump or call or return) target, until the actual PC-mutating instruction itself (is decoded).
This is directly analogous to the memory load problem – you don’t see the memory address until (the same instruction as when) the load result is supposed to be available.
So… why not solve it with explicit support for preload, rather than prediction.
In more (deliberately vague) detail:
Add a second register/buffer in the decode stage. Include an explicit ISA instruction to fetch the alternative instruction stream. As with ‘load from memory’, you can do this as soon as you know what the next (possible) PC/IP target is, which for most conditional and absolute branches/jumps/calls is known at compile-time. C++/JVM etc. ‘virtual’ call destinations, while dynamic, are also typically available at least several instructions before the actual PC/IP switch point (from linear execution).
You want, of course, to implement this in hardware at the level of i-cache lines, probably 32/64-byte units. At all times, your primary source of instructions is the linear execution path, but you have a prepared line of ‘alternative’ (possibly conditional or virtual or return) address instruction buffer.
The actual branch/jump/call/return instruction then reduces to a ‘switch-instruction-buffer’.
Even better, call/return for leaf call-points is a natural consequence – the return point is immediately available as the alternative instruction buffer after the ‘call’.
I seem to recall some history in this approach, but I don’t have any references at hand.
At a high level, it is entirely analogous to ‘prefetch… load’, which the Mill does explicitly as I recall.
So, is branch prediction actually necessary, or is it a reaction to pipelining on ISA’s which didn’t originally consider that they might be pipelined?