The Microsoft press release is pretty content free, but it appears to be a direct descendant of the Texas TRIPS system, see https://en.wikipedia.org/wiki/TRIPS_architecture. TRIPS was in response to the barrier to parallelism represented by the quadratic complexity of reservation and issue stations in out-of-order CPUs. The basic idea was to statically isolate closely-connected (by dataflow) sets of instructions in which intra-set flows were numerous and extra-set flows were few. These sets could then be scheduled as a unit, sharply reducing the amount of hardware dependency analysis required because only the extra-set flows need to be reserved, whereas a conventional OOO must reserve for every flow of every instruction.
The reservation station barrier was widely recognized at the time, and both TRIPS and EPIC (Itanium) architectures were a response, as in a way is the Mill phasing. There have also been a lot of paper/sim designs that try to recognize micro-threads that can be optimally scheduled statically within the thread and dynamically scheduled outside the thread. The architectural problem is two-fold: how to statically recognize dependencies in large enough code granules to be useful, and how to package and issue the recognized granules in a low-overhead way. The recognition problem immediately bumps into C pointer semantics; despite a ton of academic work, compiler micro-task recognition doesn’t work except for BLAS-like loops with Fortran semantics.
Whereas recognition is a compiler problem, package-and-issue is a hardware problem, known as the “dispatch box problem” in dataflow architectures (https://en.wikipedia.org/wiki/Dataflow_architecture). Academic work has shown that large real programs have ILPs in the hundreds, but only at very long range; ILP at short range is widely believed to be around two, except for “embarrassingly parallel” applications. To the best of my knowledge TRIPS never cracked the recognition problem, but their dispatch hardware was one of the more interesting approaches that did not bound the granule size. EPIC (IA64) abandoned trying to bundle dataflows in favor of temporally horizontal execution (VLIW), though I think Bob Rau would have returned to dataflow had he lived long enough. Mill has both horizontal (wide issue/VLIW) and vertical (phasing/dataflow) scheduling, and might be seen as a conceptual descendant of a marriage TRIPS and EPIC. However, we made the dispatch problem tractable by statically limiting the dataflow granule, to three cycles in the current Mills, avoiding the variable-sized approach of TRIPS.
The comments above are mostly about TRIPS and not about Microsoft. TRIPS I rate as a fascinating but ultimately unsuccessful approach. What Microsoft has done with it I don’t know.