Forum Replies Created

Viewing 3 posts - 1 through 3 (of 3 total)
  • Author
    Posts
  • jcdutton
    Participant
    Post count: 3
    in reply to: Prediction #3543

    Regarding indirect jumps: I don’t understand how the predictor might predict those, particularly jump tables, where you start with an index variable and do the equivalent of:
    goto jump_table[index];
    Do you know how the predictor predicts indirect jumps?
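
To make the question concrete, here is a toy model of one generic textbook scheme: a table that remembers the last target seen at each jump site. It is only an illustrative sketch and is not a description of how the Mill predictor works; `LastTargetPredictor`, `predict`, `update`, and `count_hits` are hypothetical names.

```python
class LastTargetPredictor:
    """Toy indirect-jump predictor: remembers the last target seen per jump site."""

    def __init__(self):
        self.table = {}  # jump-site address -> last observed target

    def predict(self, site):
        # Returns None when the site has never been seen (no prediction yet).
        return self.table.get(site)

    def update(self, site, actual_target):
        # Record where the jump really went, for use as the next prediction.
        self.table[site] = actual_target


def count_hits(predictor, site, targets):
    """Feed a sequence of actual targets through the predictor; count correct guesses."""
    hits = 0
    for actual in targets:
        if predictor.predict(site) == actual:
            hits += 1
        predictor.update(site, actual)
    return hits
```

In this model a jump that keeps going to the same place is predicted well after the first visit, but a jump table whose index changes on every visit is mispredicted almost every time, which is the case the question is about.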

  • jcdutton
    Participant
    Post count: 3
    in reply to: Prediction #3540

    Turning the question of branch prediction on its head: would it be possible to have a CPU that does not need any branch prediction?
    For example, the EBB has a single entry with multiple exit branches.
    What if the EBB also included the first few instructions that follow each of its branches?
    Then, if a branch was taken, the CPU would already have the post-branch instructions to execute before it needed the next EBB.
    So, say we added 5 instructions after each branch to the EBB. While executing those 5 instructions, one could retrieve the next EBB. Once the next EBB is retrieved, execution could start at an offset of 5 instructions into it.
    Ok, so there is a little duplication for those 5 instructions, but could one drop the whole need for prediction as a result?
    Note “5 instructions” is arbitrary. Probably best to match the pipeline size or something like that.
    This sort of approach is really only possible for a CPU that knows about the concept of an EBB, instead of individual instructions.
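
The trade-off in this idea can be sketched as simple arithmetic: the duplicated instructions only hide as many cycles as they take to execute. The function below is a rough back-of-the-envelope model under stated assumptions (a fixed fetch latency in cycles and a constant issue rate); `stall_cycles` and its parameters are hypothetical names, not anything taken from a real design.

```python
def stall_cycles(fetch_latency, duplicated_instructions, instructions_per_cycle=1):
    """Cycles still spent waiting for the next EBB after the duplicated tail has run.

    Assumes the fetch of the next EBB starts when the branch is taken and that the
    duplicated instructions issue at a constant rate with no other stalls.
    """
    cycles_covered = duplicated_instructions / instructions_per_cycle
    return max(0, fetch_latency - cycles_covered)
```

For example, `stall_cycles(10, 5)` leaves 5 cycles uncovered, and on a wide machine that issues 5 instructions per cycle the same 5 duplicated instructions cover only a single cycle, so the duplication would need to scale with both fetch latency and issue width.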

  • jcdutton
    Participant
    Post count: 3
    in reply to: The Belt #3535

    Hi.

    Some questions about the belt.
    When doing branches, loops, etc., the belt needs to use the “conform” instruction to rearrange its contents.
    How efficient is the “conform” instruction?
    A CPU with normal registers would never need a “conform” instruction, so isn’t the belt causing a processing overhead here?
    I guess the “conform” could be a hardware shuffle of the belt providing very little overhead.
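
As a way of picturing what is being asked, here is a toy software model of a belt as a fixed-length queue where new results drop at the front and the oldest fall off the end. The `conform` method reflects my understanding of the instruction (build a new belt from a list of chosen positions) and should be read as an assumption; the class name `Belt`, the method names, and the default length of 8 are all hypothetical and arbitrary.

```python
from collections import deque


class Belt:
    """Toy belt: position 0 is the most recent result, older results sit behind it."""

    def __init__(self, length=8):
        # A bounded deque discards from the far end when a new item is pushed on a full belt.
        self.slots = deque(maxlen=length)

    def drop(self, value):
        # A new result lands at the front; if the belt is full, the oldest value is lost.
        self.slots.appendleft(value)

    def conform(self, positions):
        # Rebuild the belt so that the value at positions[0] becomes position 0, and so on.
        # Values not named in `positions` are discarded.
        picked = [self.slots[p] for p in positions]
        self.slots = deque(picked, maxlen=self.slots.maxlen)
```

In software this costs a copy per conform, whereas a hardware version could in principle be a renaming of positions with no data movement, which is the distinction the efficiency question turns on.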
    Another potential problem with the belt might be micro parallelism.
    From a single group of instructions to be executed, one might be able to execute some in parallel if they do not conflict. But the belt adds a conflict/race when the result is saved on the belt.
    A possible solution to this might be: Multiple belts. The instructions that work in parallel could be saving to different belts, thus removing the conflict.
    One would then have to implement some stall/synchronisation mechanism whereby an instruction stalls until the earlier parallel instruction it depends on has finished.
    How many bytes is the belt? How much needs to be copied on a context switch?
