The spiller has unlimited size, because beneath its buffering lies all of memory. The constraint the spiller does impose is bandwidth: high-end Mills with big belts have a lot of state to save and restore. The top of the spiller, the part that connects directly to the belt and the rest of the core, has enough bandwidth to absorb everything the core can throw at it into buffer registers. The next level down is an SRAM, and there is less bandwidth between the buffers and the SRAM. The final part of the spiller, between the SRAM and the bottom-level cache and thence to DRAM, has the lowest bandwidth of all.
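To make the bandwidth argument concrete, here is a toy model in Python of a chain of spiller levels. It is a sketch under stated assumptions: the link rates (32, 8 and 2 words per cycle) and the 96-word buffer size are invented for illustration and are not figures for any real Mill member, and the function names are hypothetical. It shows two things: a long, sustained spill can go no faster than the narrowest link, while a buffer fed faster than it drains fills in a predictable number of cycles.

```python
import math

# Hypothetical per-link bandwidths in words/cycle, top to bottom:
# core -> buffer registers, buffers -> spiller SRAM, SRAM -> bottom-level cache.
LINK_BW = [32, 8, 2]

def sustained_spill_rate(link_bw):
    """Long-run words/cycle the chain can absorb during a sustained spill with no
    intervening restores: every word must pass every link, so the narrowest wins."""
    return min(link_bw)

def cycles_until_full(capacity, in_rate, out_rate):
    """Cycles an initially empty buffer of `capacity` words can accept `in_rate`
    words/cycle while draining `out_rate` words/cycle before it fills up."""
    if in_rate <= out_rate:
        return math.inf  # the drain keeps up, so the buffer never fills
    return capacity / (in_rate - out_rate)

print(sustained_spill_rate(LINK_BW))  # 2
print(cycles_until_full(96, 32, 8))   # 4.0
print(cycles_until_full(96, 8, 8))    # inf
```

With these made-up numbers, a full-rate burst from the core fills a 96-word buffer in about four cycles, while anything at or below the buffer's drain rate can continue indefinitely.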
Any one of these steps can be overwhelmed if the core really works at it, just as the regular load-store bandwidth can be overwhelmed, on a Mill or on any other machine. Both the spiller and the memory hierarchy can issue-stall the core when they need to catch up. Each Mill family member has its bandwidths sized so that stalls are rare; this is part of the resource balancing that goes into the detail work on each member.
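The stall behavior can be sketched as a cycle-stepped simulation. This is a toy model, not the Mill's actual spiller logic: the three link bandwidths, the 96-word buffer and 256-word SRAM capacities, and the `simulate_spill` function are all hypothetical, and everything below the SRAM is treated as unbounded memory. Each cycle the levels drain bottom-up, then the core hands over as much of its pending spill as the top link and free buffer space allow; any cycle that leaves part of the request unaccepted counts as an issue stall.

```python
# Hypothetical figures, not those of any real Mill member.
LINK_BW = [32, 8, 2]   # words/cycle: core->buffers, buffers->SRAM, SRAM->cache
CAPACITY = [96, 256]   # words held by the buffer registers and the spiller SRAM

def simulate_spill(demand, link_bw=LINK_BW, capacity=CAPACITY):
    """Step a three-link spiller model one cycle at a time.

    demand[i] is how many words the core wants to spill on its i-th issue cycle.
    Everything below the SRAM is treated as unbounded memory.
    Returns (total_cycles, stall_cycles).
    """
    fill = [0, 0]  # words currently held in [buffers, SRAM]
    cycles = stalls = 0
    for request in demand:
        pending = request
        while True:
            cycles += 1
            # Drain bottom-up so space freed this cycle is usable by the level above.
            fill[1] -= min(fill[1], link_bw[2])                       # SRAM -> cache/DRAM
            moved = min(fill[0], link_bw[1], capacity[1] - fill[1])   # buffers -> SRAM
            fill[0] -= moved
            fill[1] += moved
            accepted = min(pending, link_bw[0], capacity[0] - fill[0])  # core -> buffers
            fill[0] += accepted
            pending -= accepted
            if pending == 0:
                break
            stalls += 1  # request not fully accepted: the core issue-stalls a cycle
    return cycles, stalls

print(simulate_spill([32] * 3 + [0] * 20))  # (23, 0): a short burst is absorbed
print(simulate_spill([32] * 10))            # (29, 19): sustained spilling stalls
```

In this model, once the buffers are full the core is throttled to the 8 words/cycle of the buffer-to-SRAM link; if spilling continued long enough to fill the SRAM as well, it would be throttled again, down to the 2 words/cycle of the bottom link. Choosing those rates so that real workloads rarely reach that point is the balancing act described above.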