I had to ask the hardware guys about this (IANAHG). They say:
For add/subtract, the power increase with element size N bits is roughly proportional to N*log3(N).
For shift/rotate, N*log2(N).
For multiply, N*N (or N*N/4 with a Booth first stage).
Power increases linearly with number of elements.
These are only very rough power terms, accurate to an order of magnitude at best, meant to give an intuitive feel.
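To get a feel for what those terms imply, here is a minimal Python sketch that just evaluates the formulas above. The function names and the choice of comparing element widths are my own for illustration, and the constants of proportionality are unknown, so only the ratios between widths mean anything, not the absolute numbers.

```python
import math

# Rough power-scaling terms quoted above, as functions of element width N (bits).
# Constants of proportionality are unknown, so only ratios are meaningful.

def add_term(n):
    """Add/subtract: roughly N * log3(N)."""
    return n * math.log(n) / math.log(3)

def shift_term(n):
    """Shift/rotate: roughly N * log2(N)."""
    return n * math.log2(n)

def mul_term(n, booth=False):
    """Multiply: roughly N*N, or N*N/4 with a Booth first stage."""
    return n * n / 4 if booth else n * n

def scaling(term, n_from, n_to, elements_from=1, elements_to=1):
    """Relative power going from one width/element count to another.

    Power is assumed linear in the number of elements, per the text.
    """
    return (elements_to * term(n_to)) / (elements_from * term(n_from))

if __name__ == "__main__":
    for name, term in (("add", add_term), ("shift", shift_term), ("mul", mul_term)):
        ratio = scaling(term, 32, 64)
        print(f"{name}: 32 -> 64 bit elements costs about {ratio:.1f}x")
```

By these terms, doubling the element width from 32 to 64 bits costs about 2.4x for add and shift but 4x for multiply, which is why the multipliers dominate. The base of the logarithm drops out of any such ratio, so log3 versus log2 only matters for the unknown constant factor.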
AVX-512 contains not only ALUs, but shifters and multipliers as well. When the multipliers kick in is when the lights dim.
The big power win on the Mill is in getting rid of all the OOO machinery and the huge number of registers.
The configuration machinery separately specifies the number of ALUs, multipliers, and divide/sqrt units so configurations can be tuned to workload and the power/performance trade-off.