We can’t leave scratchpad-usage data in the spiller because that data is accessed randomly in both space and time, while the spiller is at heart just a glorified stack. Items can be left in the scratchpad for arbitrarily long times without increasing the latency of later access, whereas items in the spiller eventually migrate to memory and then incur memory latency.
Instead we want the scratchpad to have uniform latency and simple random access, without the expensive mux crossbar that spiller access needs even at limited depth. So really scratch acts like, and is mostly implemented like, the register file of a conventional machine. The differences are that scratch holds metadata along with the data, that data widths are self-defining, and that items are packed at byte granularity.
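To make those three differences concrete, here is a toy software model of a scratchpad-like buffer. It is only a sketch under assumptions: the names `Operand`, `Scratchpad`, `put`, and `get` are made up for illustration, the single `meta` integer stands in for whatever metadata the hardware really keeps, and nothing here claims to match the actual implementation. It just shows byte-granular packing, widths that travel with the item rather than with the access, and access in any order at the same cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    """Hypothetical operand: payload plus the tags that travel with it."""
    value: int     # payload, treated as an unsigned integer
    width: int     # width in bytes; the item defines its own width
    meta: int = 0  # placeholder for metadata bits (assumed, not the real layout)


class Scratchpad:
    """Toy model: a flat byte buffer with random access at uniform cost."""

    def __init__(self, size):
        self._bytes = bytearray(size)
        # Side table keyed by byte offset: (width, meta) of the item stored there.
        # Overlapping stores are not detected in this sketch.
        self._tags = {}

    def put(self, offset, operand):
        end = offset + operand.width
        if offset < 0 or end > len(self._bytes):
            raise IndexError("operand does not fit in the scratchpad")
        self._bytes[offset:end] = operand.value.to_bytes(operand.width, "little")
        self._tags[offset] = (operand.width, operand.meta)

    def get(self, offset):
        # The caller gives only an offset; width and metadata come back with the data.
        width, meta = self._tags[offset]
        raw = self._bytes[offset:offset + width]
        return Operand(int.from_bytes(raw, "little"), width, meta)


pad = Scratchpad(16)
pad.put(0, Operand(0x12, 1))
pad.put(1, Operand(0xBEEF, 2, meta=1))  # packed right after the 1-byte item
pad.put(3, Operand(0xDEADBEEF, 4))      # no alignment padding at offset 3

# Reads can come in any order, unlike a stack.
assert pad.get(3).value == 0xDEADBEEF
assert pad.get(1) == Operand(0xBEEF, 2, 1)
```

A stack-shaped structure like the spiller would only give cheap access near the top; here every offset costs the same lookup however long the item has been sitting there.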