Q0 yes and Q1 yes: the spiller takes care of this. It is clever enough not to spill empty belt positions, nor the unused space in slots that hold only a scalar, and so on, so it makes the best job it can of it all.
Q2 is really a no: the spill is automatic and lazy. The compiler emits an intermediate representation (IR) and does not know the dimensions of the various targets, since Mill models differ in e.g. vector height and belt length.
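To get a feel for why skipping empty positions and unused slot space matters, here is a toy Python model. It is only a sketch and not how the hardware spiller is implemented: the belt length, the slot width, and the function names are all made up for the example.

```python
from typing import List, Optional

# Toy model only: the sizes and names here are illustrative assumptions,
# not real Mill parameters.
VECTOR_BYTES = 32  # assumed full width of one belt slot


def naive_spill_bytes(belt: List[Optional[int]]) -> int:
    """Bytes written if every position were spilled at full slot width."""
    return len(belt) * VECTOR_BYTES


def compact_spill_bytes(belt: List[Optional[int]]) -> int:
    """Bytes written when empty positions are skipped and each live
    operand is stored at its own width."""
    return sum(width for width in belt if width is not None)


# Each entry is the operand's width in bytes, or None for an empty position.
belt = [8, None, 32, 4, None, None, 8, None]

print(naive_spill_bytes(belt), compact_spill_bytes(belt))
```

With this made-up belt, the compact count is a small fraction of the naive one, which is the kind of saving the description above is getting at.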
But the good news is that the Mill models are very good at tightly timed loops! DSPs eat software-pipelined loops, and the Mill is very much a DSP when you need that oomph.
A talk explaining pipelining on the Mill is planned; watch the forum or subscribe to the mailing list for further info once the schedule is set!