Thebellster: The Mill really requires fixed latency. The load technique could be used, but would be a huge PITA.
Although many FPUs have slow special-case handling of denormals, it is possible to handle them at full speed. The Am29027 FPU did it, as do some recent IBM POWER processors. And the nVidia Fermi GPU, for reasons very similar to the Mill: variable latency would add more complexity than it saves. It just takes a bit more implementation effort.