- Symmetry | Participant | March 8, 2014 at 9:07 am | Post count: 28
Vectors on the Mill are very useful things, and easy to apply to regular code compared to the vectors in one of the larger x86s. Intel is talking about having 512-bit-wide vector units on future processors, and I’m worried that the Mill won’t be able to match that in practice for architectural reasons. As explained in the Metadata talk, the Mill allows any input to its functional units to be a vector, so if you want a Mill that is both wide (many operations per cycle) and tall (wide vectors), the amount of silicon devoted to the belt would have to grow as the product of those two numbers. By contrast, an x86 can get away with having only the small fraction of its bypass network that is devoted to vector operations match the width of its vectors, and so it doesn’t suffer from the problem.
This would be a huge power problem, but you can always shut down the portions of your functional units and bypass network that aren’t needed for an operation. Some people I was collaborating with during my thesis were looking at doing just that for conventional machines, and the metadata would make it much easier. But you would still have to pay for the extra wire length that the added area would bring.
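To put rough numbers on the area concern, here is a minimal sketch of a toy cost model. Everything in it is an assumption for illustration: the function names, the slot counts, and the idea that datapath cost is simply "operand paths times bits per path". None of these are actual Mill or x86 figures.

```python
def uniform_datapath_bits(operand_slots: int, vector_bits: int) -> int:
    """Toy model of a belt-like network where any operand may be a vector,
    so every operand path must be as wide as the widest vector."""
    return operand_slots * vector_bits


def split_datapath_bits(scalar_slots: int, scalar_bits: int,
                        vector_slots: int, vector_bits: int) -> int:
    """Toy model of a design where only the vector units' paths are
    vector-width and the remaining paths stay scalar-width."""
    return scalar_slots * scalar_bits + vector_slots * vector_bits


# Hypothetical configuration: 16 operand paths in total.
narrow = uniform_datapath_bits(16, 128)       # 16 * 128
tall = uniform_datapath_bits(16, 512)         # 16 * 512
split = split_datapath_bits(14, 64, 2, 512)   # 14 * 64 + 2 * 512

print(narrow, tall, split)
```

In this toy model, going from 128-bit to 512-bit vectors quadruples the uniform design's datapath bits, while the split design pays the 512-bit cost on only two of its paths.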
And then again, maybe a Mill doesn’t need larger vectors and would be better off just keeping a vector depth of 128 bits and increasing the width. You could look at 512-bit vectors as Intel trying to get around their limited ability to execute multiple operations at once, a limitation the Mill doesn’t suffer from as much. You could say, “I don’t care if your vectors are four times wider, we’ll just use four times as many and we’ll both be limited by memory bandwidth.”
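The "four times as many" argument can be sketched as simple arithmetic. This is only an illustration under stated assumptions: the issue rates and the memory bandwidth figure below are made up, and the function names are hypothetical.

```python
def peak_bytes_per_cycle(vector_ops_per_cycle: int, vector_bits: int) -> int:
    """Bytes of vector data the functional units could consume per cycle."""
    return vector_ops_per_cycle * vector_bits // 8


def sustained_bytes_per_cycle(vector_ops_per_cycle: int, vector_bits: int,
                              memory_bytes_per_cycle: int) -> int:
    """Streaming throughput is capped by whichever is smaller:
    what the units can consume or what memory can deliver."""
    peak = peak_bytes_per_cycle(vector_ops_per_cycle, vector_bits)
    return min(peak, memory_bytes_per_cycle)


# Hypothetical machines: four 128-bit ops per cycle vs. one 512-bit op per cycle,
# both fed by an assumed 32 bytes per cycle of memory bandwidth.
many_narrow = sustained_bytes_per_cycle(4, 128, memory_bytes_per_cycle=32)
one_wide = sustained_bytes_per_cycle(1, 512, memory_bytes_per_cycle=32)

print(many_narrow, one_wide)
```

Both configurations have the same peak of 64 bytes per cycle, so once memory delivers less than that, they end up at the same sustained rate.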
Or maybe you all have thought of some clever means of partitioning your belt network by depth that isn’t obvious to me (and so is probably NYF), or I’m wrong to think that all inputs and outputs have to have the same size?
- This topic was modified 8 years, 2 months ago by Symmetry.
- Will_Edwards | Moderator | March 8, 2014 at 10:57 am | Post count: 98
You answer your own question very well 🙂 I think your reasoning closely approximates that of the ootb team.
> Or maybe you all have thought of some clever means of partitioning your belt network by depth that isn’t obvious to me (and so is probably NYF), or I’m wrong to think that all inputs and outputs have to have the same size?
Well, holistically the Mill is an ABI at the load-module level. But it’s very much grounded in a hardware architecture, of course. Yet how the belt is implemented is always described in the talks as an implementation detail…