Difference between revisions of "Division: Algorithms and user-submitted genAsm code"
Revision as of 00:05, 22 April 2015
Integer Division on the Mill: User-contributions.
This page is for collecting ideas, pseudo-code and hopefully genAsm emulation sequences.
As in the parent page, please do NOT speculate on the Mill, per se. But alternative algorithms are most welcome. So, please feel free to add links to different ways to implement division.
Assumptions:
The rdiv* operator family (e.g. rdivu [[1]] and rdivs) provides an approximation to the reciprocal of its input argument. Scaling and precision are still TBD.
We can use integer operations (other than division), such as add, subtract, multiply, shift and count leading (and/or trailing) zeros.
??What can we assume about the speeds of integer operations??
If short (e.g. narrower than 32-bit) integer math is no faster than 32-bit math, then it will probably make sense to promote narrower divisions to 32-bit. However, to date I'm not aware of any examples of width-aware genAsm code. (Hint, hint.)
I (LarryP) suspect that 128-bit division will be tricky, since we have no support for 256-bit intermediates.
Links: Wikipedia article on division algorithms: [2]
Wikipedia subsection on fast division algorithms: [3]
Note that the above link seems to make an implicit assumption that floating-point math will be used. That's almost certainly not the case on the Mill, since we need to emulate division on family members that lack native floating point. Similarly, the pre-scaling outlined is optimized for (normalized) floating-point values. So, among other things, the suggested constants may well not be correct (let alone optimal) for integer math to emulate division.
LarryP, 21April2015: I originally thought that doing Newton-Raphson root finding would run fastest if we had both a reciprocal approximation as well as a slope of the reciprocal function. However, due to properties of the "reciprofying" hyperbola, it appears (see the Wikipedia subsection linked above on fast division) that all we need is a "good," suitably-scaled reciprocal approximation, namely rdiv*. How good an approximation to 1/x we need (vs. the speed of the emulated division) is an open question right now, so far as I know.
On the interval 1 <= x <= 2^n - 1, the reciprocal itself, 1/x, is always positive, and its slope is always negative.
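To make the Newton-Raphson idea concrete, here is a minimal Python sketch of unsigned division using only multiply, add, subtract and shift. It is not genAsm and says nothing about the Mill itself. The function recip_seed is a hypothetical stand-in for rdivu: since the scaling and precision of rdiv* are still TBD, the sketch simply assumes a crude power-of-two underestimate of 2^(2*width)/b, derived from the bit length (i.e. a count-leading-zeros). The names udiv_newton and recip_seed and the fixed-point scale 2^(2*width) are assumptions made for illustration only.

```python
def recip_seed(b, width):
    """Hypothetical stand-in for rdivu (its real scaling is TBD).

    Returns a power of two that never exceeds 2**(2*width) / b.
    b.bit_length() plays the role of a count-leading-zeros step.
    """
    return 1 << (2 * width - b.bit_length())


def udiv_newton(a, b, width=32):
    """Unsigned a // b and a % b without using a divide operation."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    if not (0 <= a < (1 << width) and 0 < b < (1 << width)):
        raise ValueError("operands must fit in `width` bits")

    k = 2 * width              # fixed-point scale: r approximates 2**k / b
    one = 1 << k
    r = recip_seed(b, width)   # underestimate, relative error at most 1/2

    # Each pass roughly squares the relative error, so about
    # ceil(log2(width)) passes are needed when starting from ~1 good bit.
    for _ in range((width - 1).bit_length()):
        err = one - b * r      # stays >= 0 because r never overshoots
        r += (r * err) >> k    # r = r * (2 - b*r / 2**k), truncated

    q = (a * r) >> k           # estimate; never larger than the true quotient
    rem = a - q * b
    while rem >= b:            # final correction for truncation error
        q += 1
        rem -= b
    return q, rem


if __name__ == "__main__":
    print(udiv_newton(1000, 7))
```

Because the seed and every truncated pass underestimate the reciprocal, the quotient estimate can only be too small, so a single upward correction loop suffices. Python's arbitrary-precision integers hide the width of the intermediate products; in this formulation r * err can need up to 4*width bits, which is the kind of wide intermediate that makes 128-bit division awkward.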
Some rough pseudocode is now in:
Larry's PseudoCode for Emulating Division
How do we get the !@#$% Wiki to do sane CODE formatting??
Timing and scheduling implications:
Mill operations not implemented as a subroutine retire results after a time known at specializer time, independent of data width(?). Division algorithms depend on a number of passes that varies with data width, which complicates interleaving unrolled loops with other operations. Data size is often not knowable at specializer time.
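As a rough illustration of how the pass count scales with width: if each Newton-Raphson pass roughly doubles the number of correct bits in the reciprocal, the number of passes depends on both the operand width and the precision of the initial approximation. The Python sketch below just counts doublings. The function name passes_needed and the seed precisions used are assumptions, since the actual precision of rdiv* is not yet specified.

```python
def passes_needed(width_bits, seed_bits):
    """Count Newton-Raphson passes, assuming each pass doubles the good bits.

    width_bits: operand width in bits (e.g. 8, 32, 128).
    seed_bits:  assumed number of correct bits in the initial reciprocal
                approximation (hypothetical; rdiv* precision is TBD).
    """
    if seed_bits < 1:
        raise ValueError("seed must have at least one correct bit")
    passes = 0
    good_bits = seed_bits
    while good_bits < width_bits:
        good_bits *= 2
        passes += 1
    return passes


if __name__ == "__main__":
    for width in (8, 16, 32, 64, 128):
        print(width, passes_needed(width, 1), passes_needed(width, 8))
```

Under these assumptions an unrolled sequence for one width has a fixed length, but a single sequence cannot serve every width unless it is sized for the widest case.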