I wasn’t at Hot Chips (Ivan was), but like many I read about the Nvidia transcoding.
It reminded me strongly of Transmeta.
Because the chip must still run untranslated ARM code at a reasonable speed, it must basically be an OoO superscalar chip, with all the inefficiencies that implies. It must still have the full decode logic, etc.
And therefore the microops they cache must be very close to the ARM ops they represent.
This aside, I expect they execute very well. It’s a good halfway house, and it underlines how expensive CISC-to-uop and even RISC-to-uop decode is; one imagines x86 chips getting much the same advantage if they store their uop decode caches too.
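To make the decode-cost point concrete, here is a toy Python model of a decoded-op cache. It is only a sketch under my own assumptions, not anything Nvidia has described: `UopCache`, `fetch`, `run_loop`, and the string "instructions" are all made up for illustration, and the string parsing merely stands in for real decode logic.

```python
from dataclasses import dataclass, field


@dataclass
class UopCache:
    """Toy decoded-op cache keyed by instruction address (hypothetical model)."""
    decode_count: int = 0
    entries: dict = field(default_factory=dict)

    def decode(self, instruction: str) -> tuple:
        # Stand-in for the expensive decode step: split "add r0, r1"
        # into an opcode and a tuple of operands.
        self.decode_count += 1
        opcode, _, operands = instruction.partition(" ")
        return (opcode, tuple(op.strip() for op in operands.split(",") if op.strip()))

    def fetch(self, pc: int, memory: dict) -> tuple:
        # On a hit, reuse the stored decoded op and skip the decoder entirely.
        if pc not in self.entries:
            self.entries[pc] = self.decode(memory[pc])
        return self.entries[pc]


def run_loop(cache: UopCache, memory: dict, iterations: int) -> int:
    """Fetch every instruction in address order, `iterations` times over."""
    for _ in range(iterations):
        for pc in sorted(memory):
            cache.fetch(pc, memory)
    return cache.decode_count
```

In this model a three-instruction loop body run for 100 iterations goes through `decode` three times rather than 300, which is the whole appeal: the hotter the loop, the more decode work the cache amortizes.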