Mill Computing, Inc. Forums The Mill Architecture Can branch predictors perfectly predict counting loops?

Ivan Godard
Post count: 689

Prediction offers little value in lengthy loops. Even on architectures with no form of prediction at all, there may be a miss on entry and another on exit, but for all the other iterations the loop body is already in the top-level cache. At most, a predictor lets fetch and decode proceed through the back branch without waiting for that branch to execute and the target to resolve. That wait can be significant in very tight loops, but predictor hardware isn’t free, and in cost-conscious embedded use predictors are an engineering choice.

This also applies to the Mill exit predictor. You are right that ours permits fetch-ahead without decoding the branch, because the next ebb’s address is in the prediction, much as in a BTB entry. The entry also contains a duration, in cycles, which tells the decoder when to stop the current ebb and start on the new one. If the predictor was correct, the target ebb (and likely many more) is already in the TLC, and the correct sequence of bundles passes through decode to issue with no bubbles.
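To make the shape of such an entry concrete, here is a minimal sketch in C. It is not the Mill hardware format; the struct, field names, and the direct-mapped table are all illustrative assumptions. The point it shows is that the entry is keyed by the ebb’s entry address and carries both the predicted successor address and a duration, so fetch and decode can run ahead without ever decoding the branch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of an exit-prediction entry. Unlike a BTB entry,
 * which is keyed by a branch instruction's address, this is keyed by
 * the ebb's entry address and also carries a duration in cycles that
 * tells decode when to stop the current ebb. Names are illustrative. */
typedef struct {
    uint64_t ebb_addr;   /* entry address of the ebb being predicted */
    uint64_t next_addr;  /* predicted address of the successor ebb   */
    uint8_t  duration;   /* cycles to decode before switching ebbs   */
    uint8_t  valid;
} ExitPred;

#define TABLE_SIZE 64
static ExitPred table[TABLE_SIZE];

static unsigned slot(uint64_t addr) { return (addr >> 4) % TABLE_SIZE; }

/* Record a prediction once an ebb's actual exit has been resolved. */
void pred_update(uint64_t ebb, uint64_t next, uint8_t cycles) {
    ExitPred *e = &table[slot(ebb)];
    e->ebb_addr = ebb;
    e->next_addr = next;
    e->duration = cycles;
    e->valid = 1;
}

/* Look up the predicted successor; returns 0 on a table miss. */
int pred_lookup(uint64_t ebb, uint64_t *next, uint8_t *cycles) {
    const ExitPred *e = &table[slot(ebb)];
    if (!e->valid || e->ebb_addr != ebb) return 0;
    *next = e->next_addr;
    *cycles = e->duration;
    return 1;
}
```

Given a hit, fetch can chain lookups (the successor’s address keys the next entry), which is what lets the predictor run arbitrarily far ahead of decode.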

That’s if there is no miss. When we execute the branch and discover that the prediction was wrong, or when we take an exit that wasn’t predicted, the hardware has already issued two cycles down the wrong path. Miss recovery has to unwind that, fetch the right target, and decode it, which takes five cycles if the target is in cache (legacy miss costs are usually around 20 cycles). A naive predictor will always miss at the bottom of a loop on the last iteration. If the loop has a low trip count, that miss can be significant. To avoid it, a more sophisticated predictor uses prior branch history or iteration counts to switch the prediction on the last time through. We’ve tried both approaches and found that which works best depends on the kind of applications being run, so that’s another configuration choice.
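The last-iteration effect is easy to demonstrate with a toy model. The sketch below (my own illustration, not Mill hardware) compares a naive always-loop-back predictor against one that remembers the loop’s trip count from the previous run and flips its prediction on the final iteration; both function names and the counting scheme are assumptions for the example.

```c
#include <assert.h>

/* Toy model: a counted loop runs `trips` iterations; the back branch
 * is taken on every iteration except the last. Return the number of
 * mispredictions for one complete run of the loop. */

/* Naive predictor: always predicts the back branch is taken, so it
 * misses exactly once per run, on the final iteration. */
int naive_misses(int trips) {
    int miss = 0;
    for (int i = 1; i <= trips; i++) {
        int taken = (i < trips);   /* actual outcome */
        int predicted = 1;         /* always predict loop-back */
        if (predicted != taken) miss++;
    }
    return miss;
}

/* Counting predictor: `*learned` holds the trip count observed on the
 * previous run (0 if none); predict the exit when the running count
 * reaches it. Misses once on the first run, then not at all if the
 * trip count repeats. */
int counting_misses(int trips, int *learned) {
    int miss = 0, count = 0;
    for (int i = 1; i <= trips; i++) {
        count++;
        int taken = (i < trips);
        int predicted = (count != *learned);
        if (predicted != taken) miss++;
    }
    *learned = count;              /* remember for the next run */
    return miss;
}
```

On a repeated low-trip-count loop the difference matters: the naive scheme pays one recovery per run, while the counting scheme pays only on the first run, or whenever the trip count changes.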