Well, Intel tried Larrabee. I would expect the Mill architecture to be much better suited to something like that than x86.
AMD is trying hUMA too. In my ignorant layperson’s opinion, once the memory loads and access patterns can be served by one shared memory, it shouldn’t be too much harder to plug two different sets of Mill cores into it: one set for application code, one set for floating-point and graphics code.