Floating Point Rounding

Dithered rounding seems somewhat uncommon. Ever thought about doing it for the Mill? Ideal dithered rounding would take into account more than just an extra bit of mantissa, unlike the other rounding modes, and would make repeated addition of small floats to a large one give a far more accurate result (though one that’s less repeatable and predictable).
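To make the repeated-addition claim concrete, here is a minimal Python sketch that simulates a binary32 accumulator in software. The helper names (`to_f32`, `stochastic_f32`, `accumulate`) are made up for this illustration, and it only handles positive finite values. It is not how the Mill or any other hardware would implement it.

```python
import random
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def stochastic_f32(x, rng):
    """Round positive finite x to one of its two binary32 neighbours,
    choosing the upper one with probability (x - lo) / (hi - lo)."""
    near = to_f32(x)
    if near == x:
        return near  # already representable, nothing to round
    bits = struct.unpack("<I", struct.pack("<f", near))[0]
    if near < x:
        lo, hi = near, struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    else:
        lo, hi = struct.unpack("<f", struct.pack("<I", bits - 1))[0], near
    p_up = (x - lo) / (hi - lo)
    return hi if rng.random() < p_up else lo


def accumulate(start, step, n, rounder):
    """Add step to start n times, rounding the running sum to binary32 each time."""
    total = to_f32(start)
    for _ in range(n):
        total = rounder(total + step)
    return total


rng = random.Random(0)  # fixed seed only so the demo is repeatable
big, small, n = 2.0 ** 24, 0.25, 4000  # binary32 spacing just above 2**24 is 2.0
nearest = accumulate(big, small, n, to_f32)
dithered = accumulate(big, small, n, lambda v: stochastic_f32(v, rng))
exact = big + small * n
```

With round-to-nearest, every addition is rounded away and the sum never leaves 2**24. With dithering, the sum ends up close to the exact value on average, but a different seed gives a different result, which is exactly the repeatability trade-off.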

Actually, very very early the Mill had a stochastic (dithered) rounding mode. Then I became a member of the IEEE-754 (FP standard) committee and the others convinced me that it was a bad idea. I’m not enough of a numerics guy to explain why to someone else, but we accepted the opinion of the FP mavens on the committee and dropped it.

Sometimes a repeatable result is more important than a statistically small improvement in precision. In one example case, 3D modeling, random round-off may require a slightly larger “close” tolerance when checking whether two vertices on polygons should be considered the same point.

It **may** make sense at **very** low precision (way below IEEE). See https://arxiv.org/pdf/1502.02551v1.pdf

I think it is really important to a) have control over rounding modes, and b) not make them global state (that hurts performance, makes calling functions tricky, and forces context switches to save and restore the state).

I favor explicit rounding modes in the instructions. I.e., the default is round to nearest (as dictated by IEEE 754), but each individual instruction can specify bits to force rounding toward plus infinity, toward minus infinity, or toward zero. Random rounding support would definitely be optional. This can be implemented using a fixed 2 bits if only the standard modes are to be supported, or using 1+2 bits, where a first bit of 0 indicates round to nearest and a first bit of 1 indicates non-default rounding, with the two remaining bits selecting which other mode to use. This would of course make instruction decoding a bit harder and make the instructions variable length.
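A toy decoder for the 1+2-bit variant might look like the sketch below. The assignment of the two extra bits to particular modes is purely hypothetical (the post does not fix one), and the bit string stands in for a real instruction stream.

```python
from enum import Enum


class Rounding(Enum):
    NEAREST = "nearest-even"
    TOWARD_ZERO = "toward zero"
    UP = "toward +infinity"
    DOWN = "toward -infinity"
    STOCHASTIC = "stochastic (optional)"


# Hypothetical meaning of the two extra bits, in order 00, 01, 10, 11.
_NON_DEFAULT = [Rounding.TOWARD_ZERO, Rounding.UP, Rounding.DOWN, Rounding.STOCHASTIC]


def decode_rounding(bits):
    """Decode the rounding field from a string of '0'/'1' characters.

    Returns (mode, bits_consumed): 1 bit for the default, 3 bits otherwise,
    which is what makes the encoding variable length.
    """
    if bits[0] == "0":
        return Rounding.NEAREST, 1
    return _NON_DEFAULT[int(bits[1:3], 2)], 3
```

For example, `decode_rounding("0")` consumes one bit and yields the default mode, while `decode_rounding("110")` consumes three bits and, under the assumed table, yields round toward minus infinity.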

x87, SSE, SSE2, and AVX all use global flags to control rounding modes. AVX-512 uses explicit rounding control in the instruction itself. This actually makes code much faster, because there is no global state to care about, and if you switch rounding modes frequently (e.g. for interval arithmetic), switching costs nothing at all.
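The difference between the two styles can be sketched in software with Python's `decimal` module, which offers both an ambient context and explicit `Context` objects. This is only an analogy (decimal arithmetic at a toy precision of 4 digits, not binary x86 floating point), and the function names are made up for the illustration.

```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_FLOOR, localcontext

ONE, THREE = Decimal(1), Decimal(3)


def interval_div_global(a, b):
    """Global-state style: switch the ambient rounding mode around each operation."""
    with localcontext() as ctx:       # saves the shared context, restores it on exit
        ctx.prec = 4
        ctx.rounding = ROUND_FLOOR
        lo = a / b
        ctx.rounding = ROUND_CEILING  # second mode switch
        hi = a / b
    return lo, hi


# Explicit style: the mode travels with the operation, nothing to save or restore.
DOWN = Context(prec=4, rounding=ROUND_FLOOR)
UP = Context(prec=4, rounding=ROUND_CEILING)


def interval_div_explicit(a, b):
    """Return (lower bound, upper bound) of a / b using per-operation rounding."""
    return DOWN.divide(a, b), UP.divide(a, b)
```

Both functions return the same enclosing interval for 1/3, but only the first has to touch shared state and switch modes twice per interval operation.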

Adding to my previous reply that explicit per-instruction rounding modes are important and good: I was wondering whether it would be possible to extend the idea even further for vector operations. Even in AVX-512, the rounding mode specified in the instruction applies the same way to all vector elements. However, the ability to do 3 different roundings in one vector op (toward minus infinity, toward plus infinity, and to nearest) could make interval arithmetic implementations even nicer and much faster. Two additional bits per element might be a bit too much to fit into an op, but a special “ALL” mode, interpreted as (-INF, +INF) for two-element vectors and as (-INF, NEAR, +INF, ZERO) for four-element vectors, would be awesome. Combined with vector shuffles, permutations, and expands, this would make implementations of interval arithmetic rather straightforward and really fast.
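As a rough software model of the proposed “ALL” mode, the sketch below gives each lane of a four-element vector its own rounding, in the order (-INF, NEAR, +INF, ZERO). It again uses Python's `decimal` module at a toy precision of 4 digits as a stand-in for hardware; `ALL_LANES` and `div_all` are hypothetical names.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN)

# Lane order from the post: (-INF, NEAR, +INF, ZERO).
ALL_LANES = [Context(prec=4, rounding=r)
             for r in (ROUND_FLOOR, ROUND_HALF_EVEN, ROUND_CEILING, ROUND_DOWN)]


def div_all(xs, ys):
    """Element-wise divide where lane i rounds with ALL_LANES[i]."""
    return [ctx.divide(x, y) for ctx, x, y in zip(ALL_LANES, xs, ys)]


# Broadcasting one scalar pair to every lane yields all four roundings at once.
two, three = Decimal(2), Decimal(3)
lanes = div_all([two] * 4, [three] * 4)
lower, nearest, upper, chopped = lanes
```

Lanes 0 and 2 together form the enclosing interval for 2/3, and lane 1 carries the usual nearest result, all from what would be a single vector op.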

A classic case of slowdown due to a global rounding mode is

int=float;

The rules say you chop, and on the 80x87 this forces the rounding mode to change **twice**, which costs a few **dozen** cycles; see http://stereopsis.com/FPU.html. There is a special optimization, https://docs.microsoft.com/en-us/cpp/build/reference/qifist-suppress-ftol?view=vs-2017, to avoid this double change and allow rounding.

- This reply was modified 5 years, 2 months ago by gideony2.

gideony2, /QIfist is not an optimization; it completely breaks the semantics of C. There is a reason why it is deprecated.

However, the example is definitely a good one. The x87 FIST/FISTP instructions, when used to implement C's float-to-integer conversion, can't be used on their own: the rounding mode must be changed to get the correct semantics. And as indicated, that is SLOW.
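The semantic gap is easy to see with a few values. In this small sketch, `math.trunc` models what C requires for a float-to-int cast, and Python's built-in `round` models a conversion done under the default round-to-nearest-even mode, which is what a bare FIST would give. It is an illustration of the two results, not of the instructions themselves.

```python
import math

samples = [2.5, 3.5, 2.7, -2.7]

# What C requires for (int)x: discard the fraction (round toward zero).
c_cast = [math.trunc(x) for x in samples]

# Conversion under round-to-nearest, ties-to-even (the default x87 mode).
# Python's round() on a float uses the same tie rule.
fist_default = [round(x) for x in samples]
```

Here `c_cast` is `[2, 3, 2, -2]` while `fist_default` is `[2, 4, 3, -3]`, so skipping the mode change alters the result for three of the four inputs.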

SSE3 still has global rounding modes, but its new instruction FISTTP ignores the rounding mode and always chops (converting from the x87 80-bit floating-point stack to an integer). Yes, FISTTP is an x87 instruction, yet it was defined in a modern extension.

This is different from the FIST and FISTP instructions, which do use the rounding mode set in the x87 control register and were defined in the 8087.

SSE2's CVTTSD2SI (double-precision float to signed integer, with truncation) does something similar and ignores the rounding mode.

It is also worth knowing that all these extensions (SSE2, SSE3, etc.) use control flags, separate from the x87 control flags, to control rounding modes globally.

So, in general, on a modern x86 CPU you have:

x87 control register bits (2 bits) that control the rounding mode of all x87 FPU ops, including FIST/FISTP.

SSE2 control register bits (2 bits) that control the rounding mode of most SSE2+ instructions (no idea about MMX and SSE).

CVTTSS2SI (SSE; CVTTSD2SI is its SSE2 double-precision counterpart), which ignores the rounding mode in the SSE control register.

FISTTP (SSE3), which ignores the rounding mode in the x87 control register.

As for the latency: yes, changing rounding modes is slow, but the main design problem was that the float->int conversion used the rounding mode in the first place, while C uses a different default for this conversion (round toward zero) than the default for floating-point arithmetic (round to nearest). How Intel made such a big mistake (in 1980) is beyond me. FORTRAN and C were already well-established languages, and both round toward zero for float-to-integer conversions (Fortran 77 also has NINT, which rounds to nearest, but it probably wasn't used that often), so it was obvious that float->int and ordinary float-op-float arithmetic would require different rounding modes by design. Maybe the penalty for changing rounding modes was smaller in the days of shallower pipelines? Or maybe float->int conversion was expected to happen infrequently (which is often the case: in scientific computations you often just crunch numbers all the time without converting anything back into integers). Or maybe it was that Algol's only real way to convert from real to int (long) was the 'round' function, which rounds to nearest. Maybe Ivan or his IEEE 754 friends (William Kahan, maybe?) know more about the history behind this. 🙂

One more thing: IEEE 754-2008 actually defines 5 rounding modes for conversions of single- and double-precision floating-point numbers to integers: TiesToEven (nearest), TowardZero (chop), TowardPositive (to +infinity), TowardNegative (to -infinity), and TiesToAway (nearest, ties away from zero). So that would require 3 bits. And if the same 3 bits were used in other ops (like adds and multiplies), maybe the remaining 3 encodings could be used for something constructive, such as random/stochastic rounding, or up-and-down-and-near for vector operations (which would be useful for interval arithmetic A LOT).
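For reference, the five conversions can be modeled in a few lines. This sketch maps short labels for the five modes onto the rounding constants of Python's `decimal` module; the labels, the `to_int` helper, and the mapping are this example's own choices, not anything defined by the standard or by a particular ISA.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

MODES = {
    "ties_even": ROUND_HALF_EVEN,
    "toward_zero": ROUND_DOWN,
    "toward_positive": ROUND_CEILING,
    "toward_negative": ROUND_FLOOR,
    "ties_away": ROUND_HALF_UP,  # decimal's HALF_UP breaks ties away from zero
}


def to_int(x, mode):
    """Convert a binary64 float to int under the named rounding mode."""
    # Decimal(x) converts the float exactly, so there is no double rounding.
    return int(Decimal(x).to_integral_value(rounding=MODES[mode]))


# Results for +2.5 and -2.5 under each mode.
table = {mode: (to_int(2.5, mode), to_int(-2.5, mode)) for mode in MODES}

spare_encodings = 2 ** 3 - len(MODES)  # a 3-bit field leaves 3 codes unused
```

The tie cases 2.5 and -2.5 are the inputs on which all five modes can be told apart.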

Reading even more into IEEE 754-2008, some operations can also be marked as having a sixth rounding mode, called Exact. They produce results just as under any other rounding mode (e.g. nearest), but if the result is not exact they raise an FP exception. For example, sqrt(2) is not exact (it is rounded to something different from the real result), whereas 2*3 or 1/2 are exact. It is a really cool feature, because it lets you write a fast path without caring too much about rounding, and handle the inexact results in some slow handler. It would be most useful in integer-to-float and float-to-int conversions. E.g. converting 2.5 to int produces an inexact value and throws an exception, but converting 2.0 to int produces an exact result and execution continues.

Yes. That is the purpose. It will raise an exception on most divisions, but not all. As for additions and multiplications, it is hard to say; I would say it depends on the application. E.g. 2*3 or 0.5+1 will not raise, but adding values of vastly different magnitude, or ones with many nonzero significant digits, will.
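The fast-path/slow-handler pattern can be emulated in software by comparing the rounded binary64 result against exact rational arithmetic. This is only a sketch for finite inputs: `InexactResult`, `exact_op`, and `fast_path` are hypothetical names, and real hardware would signal the inexact condition itself rather than recompute anything.

```python
import operator
from fractions import Fraction


class InexactResult(ArithmeticError):
    """Hypothetical stand-in for the FP exception raised in Exact mode."""


def exact_op(op, a, b):
    """Apply op to two floats; raise if the binary64 result had to be rounded."""
    result = op(a, b)
    # Fraction(float) is exact, so this compares against the true result.
    if Fraction(result) != op(Fraction(a), Fraction(b)):
        raise InexactResult(f"{op.__name__}({a}, {b}) is not exact")
    return result


def fast_path(op, a, b, slow_handler):
    """Try the exact fast path; fall back to a slow handler if rounding occurred."""
    try:
        return exact_op(op, a, b)
    except InexactResult:
        return slow_handler(op, a, b)
```

With this, `exact_op(operator.mul, 2.0, 3.0)` and `exact_op(operator.add, 0.5, 1.0)` go through untouched, while `1.0 / 3.0` or `1e16 + 1.0` land in the slow handler.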

It is a useful tool in some applications. I have never used it personally though, even when implementing interval arithmetic.

Microsoft calls /QIfist an optimization, so I call it one.

Kahan: “Optimization is replacing something that works by something that almost works & is cheaper”

There is enough horsepower in /QIfist that MS implements & publishes this “optimization”.

Kahan is wrong 🙂 I hope it was just a joke.

I think the intention is that, if you use /QIfist, you make sure to set the rounding mode manually in the relevant code or in main (to chop, so that FIST conforms to the C semantics of float-to-int conversion), and you are aware that this also changes the rounding of normal floating-point operations (though that matters much less in many cases). This way the code emitted by the compiler doesn't change rounding modes all the time, and you are responsible for making sure the modes are correct instead.

Anyway, fortunately, modern machines have better facilities for dealing with the problem (FISTTP and CVTTSS2SI).
