Reply To: Can branch predictors perfectly predict counting loops?

peceed

nihil novi
I had exactly the same idea; my obvious case was a quicksort that switches to a more regular quadratic algorithm for short arrays.
This way I rediscovered Loop Termination Prediction, Count Register, Counted Loop Branch, etc.
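
To make the idea concrete, here is a toy comparison of an ordinary 2-bit saturating counter against a predictor that simply remembers the trip count from the previous visit to the loop. It is only a sketch under stated assumptions: the function names are made up for illustration, the trip count is the same on every visit, and no real predictor (Mill or otherwise) is being described.

```python
def two_bit_mispredicts(trip_count, visits):
    """Count mispredicts of a 2-bit saturating counter on the loop back-branch."""
    counter = 2  # range 0..3; values >= 2 predict "taken" (stay in the loop)
    misses = 0
    for _ in range(visits):
        for i in range(trip_count):
            taken = i < trip_count - 1  # the last iteration falls out of the loop
            if (counter >= 2) != taken:
                misses += 1
            counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


def loop_predictor_mispredicts(trip_count, visits):
    """Count mispredicts of a predictor that remembers the last trip count."""
    learned = None  # trip count observed on the previous visit, if any
    misses = 0
    for _ in range(visits):
        for i in range(trip_count):
            taken = i < trip_count - 1
            if learned is None:
                predicted = True  # no history yet: assume the loop continues
            else:
                predicted = i < learned - 1
            if predicted != taken:
                misses += 1
        learned = trip_count  # assumption: hardware records the count at loop exit
    return misses


print(two_bit_mispredicts(8, 100))         # one miss per visit, at the exit
print(loop_predictor_mispredicts(8, 100))  # only the first visit misses
```

In this model the counter pays for the exit on every visit, while the trip-count predictor pays only once, as long as the count stays constant.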

Prediction offers little value in lengthy loops

Yes, but there is a surprisingly large number of relatively short loops.
And don’t forget that vectorization can make loops up to 32 times shorter!
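
As a quick illustration of the arithmetic, assuming a hypothetical vector unit with 32 lanes (for example 32 one-byte elements per vector; the lane count is my assumption, only the factor of 32 comes from the claim above):

```python
def trip_count(elements, lanes=1):
    """Iterations needed to process `elements` items, `lanes` items at a time."""
    return -(-elements // lanes)  # ceiling division


print(trip_count(1000))      # scalar loop
print(trip_count(1000, 32))  # same work with 32 lanes; the last pass is partial
```

A loop that looks "lengthy" in scalar form (1000 iterations) becomes a short one (32 iterations) once vectorized, so the exit branch matters much more.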

When we execute the branch and discover that the prediction was wrong, or if we take an exit that wasn’t predicted, the hardware has already issued two cycles down the wrong path. Miss recovery has to unwind that, fetch the right target, and decode it, which takes 5 cycles if the target is in cache.

5 cycles on the Mill is like 15 cycles on a conventional processor – quite a big loss!
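
A rough way to see how much a single exit miss costs relative to the loop itself. This is a back-of-the-envelope sketch: the 5-cycle penalty is the figure quoted above, while one cycle per iteration and exactly one miss per loop visit are my assumptions, and the function name is made up.

```python
def exit_miss_share(trip_count, cycles_per_iteration=1, miss_penalty=5):
    """Fraction of total loop time spent recovering from one exit mispredict."""
    useful = trip_count * cycles_per_iteration
    return miss_penalty / (useful + miss_penalty)


for n in (1000, 32, 4):
    print(n, round(exit_miss_share(n), 3))
```

Under these assumptions the penalty is negligible for a 1000-iteration loop, but for a 4-iteration loop more than half of the time goes to recovery.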

In essence you propose a deferred branch whose transfer is detached from the evaluation of the predicate, similar to the way Mill separates load issue from load retire.

IIRC Elbrus E2k uses this technique: it has Prepare Branch and Execute Branch instructions.

One is the time required to reset the fetch. If the target is not in the I0 microcache then reset would take roughly as long as mispredict recovery, i.e. five cycles in our test configs.

It looks like you should split speculative fetch and speculative code prefetch.

How often can we eval a predicate five cycles before the transfer? Not often I’d guess, but I have no numbers.

But you can start resetting the fetch earlier! Prepare Branch and the instructions that follow it, up to Execute Branch, are valid…
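
A toy model of that point, assuming the fetch reset can begin as soon as Prepare Branch issues and takes the five cycles quoted above. The function and its parameters are hypothetical; this is not a description of how Elbrus or the Mill actually behave.

```python
def transfer_stall(prepare_cycle, execute_cycle, reset_latency=5):
    """Cycles the transfer still has to wait after Execute Branch issues."""
    lead = execute_cycle - prepare_cycle  # useful work overlapped with the reset
    if lead < 0:
        raise ValueError("Execute Branch cannot precede Prepare Branch")
    return max(0, reset_latency - lead)


print(transfer_stall(10, 10))  # no lead: the full reset latency is exposed
print(transfer_stall(10, 12))  # two cycles of lead hide part of it
print(transfer_stall(10, 15))  # five or more cycles of lead hide all of it
```

The relevant distance is then from Prepare Branch to Execute Branch, not from predicate evaluation to transfer, which should be easier to stretch to five cycles.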

A semantic issue is how the DB interacts with other branches. Suppose a regular branch is executed between DB issue/eval and transfer? Who wins?

It doesn’t matter as long as it is consistent! We could treat it the same way as delay slots, so the most “powerful” answer is: both of them. Unfortunately, I suppose that won’t work with the Belt.

  • This reply was modified 9 months, 2 weeks ago by  peceed.