less work for the spiller (Mill Computing, Inc. Forums, The Mill Architecture)


  • Author
  • mhkool
    Post count: 7
    #3296

    The spiller has a fair amount of work to do when the belt is saved on function calls and when an inner operation is executed: the whole belt must be saved and then restored on return.
    Operations like conform and rescue reorganize the belt and may invalidate part of it. The invalidated belt slots need not be saved and restored, which helps to reduce the work of the spiller.

    What are your thoughts on adding a single-bit property to all operations that produce a result on the belt, indicating whether the result is used once or multiple times? Alternatively, instead of adding this property to many operations, maybe a special operation could carry a bitmask saying which belt positions will be used only once.
    Since 80% of the belt values are used only once and can be invalidated automagically (when the value is used) with this new property, there is a potential gain where the spiller has even less work to do.
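
    The proposal can be sketched with a toy model. This is only an illustration under stated assumptions, not how Mill hardware works: the `ToyBelt` class, the `one_shot` flag, and the `slots_to_spill` count are all hypothetical names invented for this sketch, and the spiller is reduced to "count the valid slots".

```python
from dataclasses import dataclass


@dataclass
class Slot:
    value: int
    one_shot: bool = False  # hypothetical per-drop bit from the proposal
    valid: bool = True


class ToyBelt:
    """Toy fixed-length belt; the newest drop sits at position 0."""

    def __init__(self, length=8):
        self.length = length
        self.slots = []

    def drop(self, value, one_shot=False):
        self.slots.insert(0, Slot(value, one_shot))
        del self.slots[self.length:]  # the oldest values fall off the end

    def read(self, pos):
        slot = self.slots[pos]
        if not slot.valid:
            raise RuntimeError(f"belt position {pos} is invalid")
        if slot.one_shot:
            slot.valid = False  # consumed, so it never needs saving
        return slot.value

    def slots_to_spill(self):
        # Toy stand-in for spiller work: only valid slots are saved.
        return sum(1 for s in self.slots if s.valid)


belt = ToyBelt()
for v in range(5):
    belt.drop(v, one_shot=(v != 0))  # value 0 is the only multi-use drop

before = belt.slots_to_spill()
belt.read(0)  # consumes the newest one-shot value
belt.read(1)  # consumes the next one
after = belt.slots_to_spill()
print(before, after)
```

    In this model, a call made after the two reads would save three slots instead of five.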

  • Ivan Godard
    Post count: 689

    A “one-shot” drop is an interesting idea we have looked at. It would not actually simplify the spiller, which would still need to be able to handle cases in which all drops were multi-shot. The savings would come in power, and to a lesser extent in spiller bandwidth in its paths to memory. The cost is entropy in the encoding and complexity of the implementation.

    It is clear that bit-per-drop encoding has the same functionality as an op with a bitmask. Roughly 65% of ops (dynamic) have drops, while multidrop ops are rare enough to ignore. Consequently bit-per-drop costs ~0.65 bits per op. If you figure an IPC of 6 for a mid-range Mill, the overall entropy would be ~4 bits per instruction (0.65 × 6 ≈ 3.9). In comparison, an op and a bitmask in complete generality (like rescue) would need a belt’s worth of bits (16 for mid-range), plus the entropy of the op itself, say 9 to 15 bits depending on slot load, for 25 to 31 bits in all. It’s clear that bit-per-drop has the better entropy, even before considerations of slot pressure.
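
    The arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the figures quoted in this post (65% of ops dropping, 6 ops per instruction, a 16-position belt, 9 to 15 opcode bits); the function and parameter names are made up for illustration. How often a mask op would have to be issued is not stated here, so no amortized comparison is attempted.

```python
def bit_per_drop_cost(drop_fraction=0.65, ops_per_instruction=6):
    """Extra encoding bits per instruction with one flag bit per dropping op."""
    return drop_fraction * ops_per_instruction


def mask_op_cost(belt_length=16, opcode_bits=(9, 15)):
    """Bits for one fully general mask op: a belt-wide mask plus the opcode."""
    low, high = opcode_bits
    return belt_length + low, belt_length + high


per_instruction = bit_per_drop_cost()
mask_low, mask_high = mask_op_cost()
print(f"bit-per-drop: about {per_instruction:.1f} bits per instruction")
print(f"mask op: {mask_low} to {mask_high} bits per use")
```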

    The separate op approach is also somewhat redundant with the rescue op. An unrescued belt entry (i.e. one that occupied a position when the rescue executed but wasn’t mentioned by the rescue) doesn’t get spilled; it’s just marked as invalid and a reference to it will fault. The same applies to branches that reconfigure the belt. A oneShot op says which operands will be used once before they are used, while the rescue op says which drops were one-shots after they were used. We don’t currently use rescue just to winnow dead values from the belt, but we could if power measurements showed it would be worthwhile.
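
    The winnowing effect described above can be illustrated with a toy model. It is a sketch under assumptions, not the real rescue semantics: `toy_rescue`, `read`, `spill_count`, and `BeltFault` are hypothetical names, the mask layout (bit i for belt position i) is invented, and moving the kept values to the front in their original order is an assumption of the sketch.

```python
INVALID = None  # marker for an unrescued belt entry in this toy model


class BeltFault(Exception):
    """Raised when an invalid belt position is referenced."""


def toy_rescue(belt, mask):
    """Keep the positions whose mask bit is set; mark the rest invalid."""
    kept = [v for pos, v in enumerate(belt) if mask & (1 << pos)]
    return kept + [INVALID] * (len(belt) - len(kept))


def read(belt, pos):
    if belt[pos] is INVALID:
        raise BeltFault(f"belt position {pos} was not rescued")
    return belt[pos]


def spill_count(belt):
    # Only valid entries would need saving across a call.
    return sum(v is not INVALID for v in belt)


belt = ["a", "b", "c", "d"]
after = toy_rescue(belt, 0b0101)  # mentions positions 0 and 2 only
print(after, spill_count(after))
```

    In this model the two unmentioned values cost the spiller nothing, and a later `read(after, 3)` raises `BeltFault` rather than returning stale data.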
