Thank you 🙂
About UTF-8: in general we do hardware, and hardware doesn’t deal with character sets. The hardware deals with bytes, and has no knowledge of what those bytes hold. We deal with, or don’t deal with, character sets only in our software. There will not be any “native” character type in the architecture itself, UTF-8 or otherwise.
For LLVM you will get whatever LLVM gives us; the same goes for other possible third-party software for the Mill, such as GCC or Linux itself. In the diagnostics and listings of our own in-house software I’m afraid that we have been very lax about localization; we use the standard ASCII that C++ gives us. That will have to change, we know, but we have more pressing matters right now.
As for reading UTF-8, what you need in the architecture is a funnel shifter attached to a streamer rather than a specialized load. We have some ideas in that direction, but all NYF.
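To make the funnel-shifter point a bit more concrete, here is a rough software analogy in C++. It is only a sketch and says nothing about what the Mill will actually do: `ByteFunnel` and `decode_utf8` are made-up names for illustration. The idea it shows is that a decoder wants a small window that is kept topped up from a byte stream, from which it peeks at the lead byte and then shifts out one to four bytes, rather than a load that knows about characters. Validation is deliberately minimal (no checks for overlong forms or surrogates).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical software model of a "streamer + funnel shifter":
// a 64-bit window that is refilled from a byte stream and shifted
// by a variable number of bytes as data is consumed.
struct ByteFunnel {
    const unsigned char* data;
    std::size_t size;
    std::size_t pos = 0;       // next byte to pull from the stream
    std::uint64_t window = 0;  // buffered bytes, oldest in the top byte
    unsigned count = 0;        // number of valid bytes in the window

    ByteFunnel(const unsigned char* d, std::size_t n) : data(d), size(n) {}

    // "Streamer": top the window up to 8 bytes while input remains.
    void refill() {
        while (count < 8 && pos < size) {
            window |= std::uint64_t(data[pos++]) << (8 * (7 - count));
            ++count;
        }
    }

    // Look at the i-th buffered byte (0 = oldest) without consuming it.
    unsigned char peek(unsigned i) const {
        return static_cast<unsigned char>((window >> (8 * (7 - i))) & 0xFF);
    }

    // "Funnel shift": drop n bytes (1..4 here) from the front.
    void consume(unsigned n) {
        window <<= 8 * n;
        count -= n;
    }
};

// Decode UTF-8 into code points. Malformed input yields U+FFFD.
// Simplified: overlong encodings and surrogates are not rejected.
std::vector<char32_t> decode_utf8(const std::string& s) {
    ByteFunnel f(reinterpret_cast<const unsigned char*>(s.data()), s.size());
    std::vector<char32_t> out;
    for (;;) {
        f.refill();
        if (f.count == 0) break;  // stream exhausted

        unsigned char b0 = f.peek(0);
        unsigned len;
        char32_t cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else { out.push_back(0xFFFD); f.consume(1); continue; }

        if (len > f.count) {          // sequence truncated at end of input
            out.push_back(0xFFFD);
            break;
        }

        bool ok = true;
        for (unsigned i = 1; i < len; ++i) {
            unsigned char b = f.peek(i);
            if ((b & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); f.consume(1); continue; }

        out.push_back(cp);
        f.consume(len);               // variable-length shift
    }
    return out;
}
```

Calling `decode_utf8` on the bytes `41 C3 A9 E2 82 AC` should give the code points U+0041, U+00E9 and U+20AC. Note that the memory side only ever sees bytes; all knowledge of UTF-8 lives in the decode loop, which is the software counterpart of "hardware doesn’t deal with character sets".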