You’re human (darn it, right?).
I’m wondering what the practical differences would be with the Platinum Mill: the one with 64 belt positions.
However, I realize that the cycle limit time for the double crossbar for the belt would definitely lengthen the instruction cycle, OR would require more clock cycles to account for that difference, which would likely make the circuitry not merely twice as much (it wouldn’t be anyway due to the double crossbar) but scale a little closer to squared. Meanwhile, in most cases, I’m not sure how much more parallelism you could extract, and what the cost would be in power for typical workloads.
That’s one major thing I see with Intel with the x86 (32 and 64 bit) architectures: while the process size goes down, it’s easy for them to add more cores on the same area, but most practical workloads (non-server) typically rarely use more than 2 threads: I believe this is a huge reason Apple has gone the direction they have, with 2 higher performance cores in their ARM devices, with one core more likely used for all GUI stuff, and the other core for the background I/O and computations. Besides the parallelism of 2 threads being an easy reach for typical applications (general purpose GUI apps with I/O and background computations) there’s also the memory bandwidth to consider: if a modern superscalar OOO core is waiting 1/3 of its instruction cycles for main memory, adding more cores just means that more cores are idling, waiting for memory to have anything to compute.
Unless I’m mistaken about the nature of the beast, memory I/O also has additional power requirements for storing/retrieving data above regular refresh power, and actively used, RAM and the associated circuitry is one of the more major power consumers: thus, reducing memory I/O for a given amount of useful computation is also a power usage reduction.
- This reply was modified 8 years, 11 months ago by JonathanThompson.