Yes, function-scale builtins could be emitted by the compiler and then get member-specific substitution in the specializer. The difficulty is in the compiler and host language: how would a program ask for one of these builtins? And how would it do so in every compiler (including those from third parties) and every language?
Generally we consider the host language to be off limits. If something is needed by a language’s user community then let the standards process add it to the language and we will support it. Some such additions already exist, std::memcpy among others, and so for those we can do what you suggest. There are other possible candidates in libc and POSIX. However, it’s not clear that there would be much gain over simply having the compiler emit a function call and letting the specializer inline that.
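To make the function-call route concrete, here is a minimal sketch, assuming C++ as the host language. The helper name float_bits is hypothetical and only for illustration. The point is that the program asks for the operation through an ordinary standard-library call and needs no new syntax; whether the toolchain emits a real call, inlines it, or substitutes something target-specific is invisible at the source level.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from any toolchain): return the bit pattern of a
// float as a 32-bit unsigned integer, using only a standard-library call.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "size mismatch");
    std::uint32_t bits = 0;
    // To the program this is a plain library call. A toolchain is free to
    // emit an actual call, inline it, or replace it with a target-specific
    // sequence; the source code does not change in any of those cases.
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Because the request goes through a name the language standard already defines, any conforming compiler, third-party or not, accepts the same source.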
Mind, it might be a good idea; the problem is that we can’t know until we measure. There is a lot of measurement and tuning in each member. That tuning won’t yield integer-factor gains the way the basic architecture does, but a few percent here and there adds up too.