gideony2, /QIfist is not an optimization; it completely breaks the semantics of C. There is a reason it is deprecated.
However, the example is definitely a good one. The x87 FIST/FISTP instructions can't be used on their own to convert a float to an integer in C: the rounding mode must be changed first (and restored afterwards) to get the correct semantics. And, as indicated, that is SLOW.
SSE3 still has global rounding modes, but its new instruction FISTTP ignores them and always chops (when storing from the x87 80-bit floating-point stack to an integer). Yes, FISTTP is an x87 instruction, yet it was defined in a modern extension.
This is different from the FIST and FISTP instructions, which do use the rounding mode set in the x87 control register, and which were defined in the 8087.
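To make the FIST/FISTP vs. FISTTP difference concrete, here is a minimal sketch using GCC/Clang-style inline assembly. It assumes an x86 target and a CPU with SSE3 (for FISTTP); the helper names (x87_get_cw, x87_set_cw, fistp_with_rc, fisttp) are made up for illustration and are not from any library.

```c
#include <stdint.h>

/* RC (rounding control) field of the x87 control word: bits 10-11. */
#define X87_RC_NEAREST 0x0000u
#define X87_RC_CHOP    0x0C00u   /* round toward zero */
#define X87_RC_MASK    0x0C00u

/* Hypothetical helpers: read and write the x87 control word. */
static uint16_t x87_get_cw(void) {
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m"(cw));
    return cw;
}

static void x87_set_cw(uint16_t cw) {
    __asm__ volatile ("fldcw %0" : : "m"(cw));
}

/* FISTP honors RC, so the mode is switched around the conversion
   and restored afterwards. This is the slow sequence. */
static int32_t fistp_with_rc(double x, unsigned rc) {
    uint16_t saved = x87_get_cw();
    int32_t r;
    x87_set_cw((uint16_t)((saved & ~X87_RC_MASK) | rc));
    __asm__ volatile ("fistpl %0" : "=m"(r) : "t"(x) : "st");
    x87_set_cw(saved);
    return r;
}

/* FISTTP (SSE3) always chops; the control word is left alone. */
static int32_t fisttp(double x) {
    int32_t r;
    __asm__ volatile ("fisttpl %0" : "=m"(r) : "t"(x) : "st");
    return r;
}
```

With these, fistp_with_rc(2.7, X87_RC_NEAREST) gives 3, while fistp_with_rc(2.7, X87_RC_CHOP) and fisttp(2.7) both give 2, which is what (int)2.7 must produce in C.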
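SSE2's CVTTSD2SI (double-precision float to signed integer; the 64-bit result form needs 64-bit mode) does something similar and ignores the rounding mode. Note the extra T: the non-truncating CVTSD2SI does honor the rounding mode.
And it is worth knowing that all these extensions (SSE2, SSE3, etc.) use their own control flags (in the MXCSR register, separate from the x87 control flags) to control rounding modes globally.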
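The same contrast on the SSE side can be sketched with the standard intrinsics from emmintrin.h. This assumes an x86 target with SSE2; the wrapper names (cvtsd2si_with_mode, cvttsd2si_with_mode) are made up for illustration, and the volatile temporaries are only there to discourage the compiler from folding or moving the conversion across the mode change.

```c
#include <emmintrin.h>   /* SSE2 intrinsics; also pulls in xmmintrin.h */

/* CVTSD2SI: rounds according to the MXCSR rounding mode. */
static int cvtsd2si_with_mode(double x, unsigned int mode) {
    volatile double in = x;
    volatile int out;
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    _MM_SET_ROUNDING_MODE(mode);
    out = _mm_cvtsd_si32(_mm_set_sd(in));
    _MM_SET_ROUNDING_MODE(saved);
    return out;
}

/* CVTTSD2SI: always truncates, whatever MXCSR says. */
static int cvttsd2si_with_mode(double x, unsigned int mode) {
    volatile double in = x;
    volatile int out;
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    _MM_SET_ROUNDING_MODE(mode);
    out = _mm_cvttsd_si32(_mm_set_sd(in));
    _MM_SET_ROUNDING_MODE(saved);
    return out;
}
```

For example, cvtsd2si_with_mode(2.5, _MM_ROUND_UP) gives 3 and cvtsd2si_with_mode(2.5, _MM_ROUND_NEAREST) gives 2 (ties go to even), while cvttsd2si_with_mode(2.7, mode) gives 2 for every mode.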
So, in general, on a modern x86 CPU you have:
The x87 control register (2 bits) that controls the rounding mode of all x87 FPU ops, including FIST/FISTP.
The SSE2 control register, MXCSR (2 bits), that controls the rounding mode of most SSE2+ instructions (no idea about MMX and SSE).
CVTTSS2SI (SSE) and CVTTSD2SI (SSE2), which ignore the rounding mode of the SSE FPU.
FISTTP (SSE3), which ignores the rounding mode of the x87 control register.
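That the two rounding-control fields really are separate can be checked with a small sketch like the one below. It assumes an x86 target and GCC/Clang-style inline assembly for the x87 side; x87_rc and sse_rc are hypothetical helper names, while _mm_getcsr comes from xmmintrin.h.

```c
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_getcsr / _mm_setcsr */

/* Rounding-control field of the x87 control word: bits 10-11. */
static unsigned x87_rc(void) {
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m"(cw));
    return (cw >> 10) & 3u;
}

/* Rounding-control field of MXCSR: bits 13-14. */
static unsigned sse_rc(void) {
    return (_mm_getcsr() >> 13) & 3u;
}
```

In both fields 0 means round to nearest and 3 means round toward zero. Changing one with _mm_setcsr (or FLDCW) leaves the other untouched, so code that mixes x87 and SSE arithmetic has two modes to keep track of.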
As for the latency: yes, changing the rounding mode is slow, but the main design problem was that the float->int conversion used the rounding mode in the first place, while C uses a different default for this conversion (round toward zero) than the default for floating-point arithmetic (round to nearest). How Intel made such a big mistake (in 1980) is beyond me, as FORTRAN and C were already well-established languages (and they round toward zero for float-to-integer conversions; Fortran 77 also has NINT, which rounds to nearest, but it probably wasn't used that often), and it was obvious that float->int conversions and normal float-op-float arithmetic would require different rounding modes by design. Maybe the penalty for changing rounding modes was smaller in the days of shallower pipelines? Or maybe float->int conversion was expected to happen infrequently (which is often the case: in scientific computations you often just crunch numbers all the time, without converting anything back into integers). Or maybe it was that Algol's only real way to convert from real to int (long) was the 'round' function, which rounds to nearest. Maybe Ivan or his IEEE 754 friends (William Kahan, maybe?) know better about the history behind this. 🙂