Compiling predicate gangs

Author
Posts
Ivan Godard
Keymaster
September 2, 2015 at 5:49 pm
Post count: 689
#1966 |
Predicate gangs are a Mill feature. A predicate gang pred bound to a normal op norm has the effect of:
b = pred(x = norm(...))
where norm is some arithmetic operation like addu and pred is something like comparison with zero for all the relationals, plus a few special flags like carry. norm and pred must be adjacent in the encoding of a single instruction, with norm in the lower slot. That is, the op and its predicate must be ganged. The advantage of a predicate gang is that norm and pred execute in parallel and produce both results together at norm’s latency, rather than in sequence at greater latency. The problem at hand is how to obtain the benefit of predicate gangs from C/C++ source without excessive work in the tool chain, especially when norm must be expressed as an intrinsic.
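To make the source-level shape concrete, here is a minimal sketch in plain C++ of the pattern b = pred(x = norm(...)). The names norm_addu, pred_eqz and add_and_test are hypothetical stand-ins, not Mill intrinsics; the sketch only shows the semantics that a gang would compute in one step.

```cpp
#include <cstdint>

// Hypothetical stand-in for an unsigned add such as addu; wraps modulo 2^32.
inline std::uint32_t norm_addu(std::uint32_t a, std::uint32_t b) {
    return a + b;
}

// Hypothetical stand-in for an "equal to zero" predicate.
inline bool pred_eqz(std::uint32_t x) {
    return x == 0;
}

// The source-level shape b = pred(x = norm(...)).
// Written this way the predicate depends on the result of the add,
// so without ganging the two would execute in sequence.
inline bool add_and_test(std::uint32_t a, std::uint32_t b, std::uint32_t& x) {
    bool flag = pred_eqz(x = norm_addu(a, b));
    return flag;
}
```

Nothing in this code tells the tool chain that the compare belongs with the add; that binding has to be discovered somewhere downstream.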
Predicate gangs (aka condition codes for the Mill) are hard to express as single functions. The cross-product of all the ops that supply predicate codes against all the predicates is huge, unnatural to write, and would inject the code into the multiple-result mess. Unfortunately, there seems to be no way to compose arbitrary operations at the intrinsic level and preserve the binding through the compiler. All the LLVM/Clang code assumes that an intrinsic has function-call-like semantics: a single point operation with arguments and results. There are no two-function intrinsics, nor any way to introduce them. Still, predicate gangs are a significant performance gain, so we must make them available from source somehow.
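As an illustration of why the single-function route scales badly, here is a sketch of what two such fused functions might look like in C++. The struct and function names (AdduEqz, addu_eqz, AdduCarry, addu_carry) are invented for this example and are not real Mill or LLVM intrinsics.

```cpp
#include <cstdint>

// Hypothetical two-result shape for "addu ganged with equal-to-zero".
struct AdduEqz {
    std::uint32_t value;
    bool is_zero;
};

inline AdduEqz addu_eqz(std::uint32_t a, std::uint32_t b) {
    std::uint32_t sum = a + b;          // wraps modulo 2^32
    return AdduEqz{sum, sum == 0};
}

// A second pairing needs its own type and function: addu ganged with carry.
struct AdduCarry {
    std::uint32_t value;
    bool carry;
};

inline AdduCarry addu_carry(std::uint32_t a, std::uint32_t b) {
    std::uint32_t sum = a + b;
    return AdduCarry{sum, sum < a};     // unsigned wraparound implies carry out
}
```

Each (op, predicate) pair needs its own name and its own multi-result return, which is the cross-product problem described above.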
The specializer handles predicate gangs internally by treating a bound pair as a single op with two results, placing the pair together in the code tableau. This is simple and works well, but requires that the specializer be informed of the binding. That is, someone must figure out that
bool b = pred(x = norm(...))
is really
{bool, <type>} norm_pred(...)
so it can be correctly represented in the specializer’s data structure. There seem to be two places where that recognition could be done: in an optimization pass in the compiler, or in the specializer. Recognition requires noticing:
```
    \     /  
      norm       0
      \.. \     /
            pred
            \.. \
```
This can be recognized in either place, at some cost. If it is done in the compiler then there is less work for the specializer (good), but we need a notation to express the predicate in genAsm and it would be hard for link-time optimization to tease the gang apart if (for example) it is found that the predicate result is unneeded (bad). If it is done in the specializer then we have a slippery slope of peephole optimizations in the specializer (bad) but might get better code after emulation substitution or prelink (good).
Tentatively it will be done in the specializer.
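A rough sketch of what such a recognition step could look like, wherever it ends up living. The mini-IR below (Kind, Node, fuse_gangs) is entirely hypothetical and is not the specializer's actual data structure; it only handles the compare-with-zero case from the diagram and ignores slot adjacency and special flags like carry.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node kinds for a tiny dataflow graph.
enum class Kind { Input, Zero, Norm, CmpZero, Gang, FlagOf };

struct Node {
    Kind kind;
    int lhs;   // index of first operand, or -1 if none
    int rhs;   // index of second operand, or -1 if none
};

// Look for pred(norm(...), 0) and rewrite it so that norm becomes a
// two-result Gang node and the compare becomes a projection of its flag.
// Returns the number of Norm nodes that were turned into gangs.
inline int fuse_gangs(std::vector<Node>& g) {
    int fused = 0;
    for (std::size_t i = 0; i < g.size(); ++i) {
        Node& p = g[i];
        if (p.kind != Kind::CmpZero || p.lhs < 0 || p.rhs < 0) continue;
        if (g[p.rhs].kind != Kind::Zero) continue;   // must compare against 0
        Node& n = g[p.lhs];
        if (n.kind == Kind::Norm) {
            n.kind = Kind::Gang;                     // now yields value and flag
            ++fused;
        } else if (n.kind != Kind::Gang) {
            continue;                                // operand is not a norm op
        }
        p.kind = Kind::FlagOf;                       // read the gang's flag result
        p.rhs = -1;                                  // the zero operand is no longer used
    }
    return fused;
}
```

Because the value result and the flag result stay separate in this representation, a later pass that finds the flag unused could turn the Gang back into a plain Norm, which is the kind of flexibility argued for above.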
- This topic was modified 8 years, 10 months ago by Ivan Godard.
- This topic was modified 8 years, 10 months ago by Ivan Godard. Reason: formatting