Other constraints limit the practical length of the belt to avoid impact on clock or power. We know 32-long works, and we think 64-long may be practical for high-end throughput-oriented members with slightly lowered clock rates, but 128 seems unlikely.
For your blur, if those 60 are really 20 RGB vectors (as PeterH suggests) then they will fit in the 32-long belt of the mid- and upper-end members. I can imagine a member with, say, 16 FMA units and a 32-long belt, although we are starting to get into GPU-land with that. If there really is a distance of 60, then code even on the high-end general-purpose members with a 64-long belt is going to use the scratchpad and its rotators as a belt extension, and the spills and fills of the part that overflows the belt will wind up needing to be pipelined with the rest of the code. The good news is that the scheduler in the specializer takes care of getting the spill-fill right; you don’t have to do it.
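To make the "does it fit on the belt" question concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not how the specializer actually schedules anything: every operation drops exactly one result at the front of a fixed-length belt, the oldest value falls off the far end, and any operand that is no longer on the belt is reported as one that would have needed a spill to scratchpad and a later fill. The names `missed_operands` and `sum_trace` are made up for this illustration.

```python
from collections import deque


def missed_operands(ops, belt_length):
    """Toy belt model: report operands that have fallen off the belt.

    ops is a list of (result_name, [operand_names]) in issue order.
    Assumption: each op drops exactly one result at the belt front.
    """
    belt = deque(maxlen=belt_length)  # a full deque discards from the far end
    missed = []
    for result, operands in ops:
        for name in operands:
            if name not in belt:
                # In real code this value would need a spill and a fill.
                missed.append(name)
        belt.appendleft(result)
    return missed


def sum_trace(n):
    """Hypothetical trace: load n samples, then add them up in order."""
    ops = [(f"v{k}", []) for k in range(n)]       # n loads, one drop each
    ops.append(("acc1", ["v0", "v1"]))            # first partial sum
    for k in range(2, n):
        ops.append((f"acc{k}", [f"acc{k - 1}", f"v{k}"]))
    return ops


if __name__ == "__main__":
    print(len(missed_operands(sum_trace(20), 32)))  # 20 samples, 32-long belt
    print(len(missed_operands(sum_trace(60), 32)))  # 60 samples, 32-long belt
```

In this toy trace the 20-sample case never reads a value that has left a 32-long belt, while the 60-sample case misses every sample on the same belt. The trace contains only the loads and the running sum; any other results dropped between a producer and its consumer would push values further back and make the overflow worse.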