As the specializer is now implemented, the prioritization and the scheduling are not split into two passes. Instead the prioritization is done on the fly during scheduling.
Scheduling itself is fairly straightforward: dependencies and latency determine a target cycle, and the op is placed there if a supporting slot is free. If not, we search for one, up or down the tableau depending on the op. Prioritization is a more complicated set of heuristics that uses both a priority among different categories of ops without consideration of dataflow, and a priority among ops in the same category based on dataflow and belt-size considerations. For more detail you’d need to work through the code, and the heuristics are likely to change as we gain experience.
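To make the shape of that loop concrete, here is a minimal sketch in Python. It is not the specializer's code. The `Op` record, the `CATEGORY_RANK` table, and the slot kinds `"mem"` and `"alu"` are all made up for illustration, and the tie-break within a category is just the op name rather than any dataflow or belt heuristic. The sketch also only searches toward later cycles when the target cycle is full; the real up-or-down rule per op is not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    kind: str          # slot kind the op needs (hypothetical: "mem" or "alu")
    latency: int = 1   # cycles until the result is available
    deps: tuple = ()   # names of ops whose results this op consumes

# Hypothetical category ranking; a lower rank is scheduled first.
CATEGORY_RANK = {"mem": 0, "alu": 1}

def schedule(ops, slots_per_cycle):
    """Return {op name: cycle}, prioritizing ops on the fly while placing them."""
    by_name = {op.name: op for op in ops}
    placed = {}     # op name -> cycle it was placed in
    occupied = {}   # (cycle, kind) -> number of slots already in use
    pending = list(ops)
    while pending:
        # Prioritization happens here, inside the scheduling loop:
        # consider only ops whose producers are placed, then pick by category.
        ready = [op for op in pending if all(d in placed for d in op.deps)]
        if not ready:
            raise ValueError("dependency cycle or missing producer")
        op = min(ready, key=lambda o: (CATEGORY_RANK[o.kind], o.name))
        pending.remove(op)
        if slots_per_cycle.get(op.kind, 0) < 1:
            raise ValueError(f"no slot supports kind {op.kind!r}")
        # Target cycle: the first cycle in which every input has arrived.
        cycle = max((placed[d] + by_name[d].latency for d in op.deps), default=0)
        # If no supporting slot is free there, search the tableau
        # (this sketch only moves toward later cycles).
        while occupied.get((cycle, op.kind), 0) >= slots_per_cycle[op.kind]:
            cycle += 1
        occupied[(cycle, op.kind)] = occupied.get((cycle, op.kind), 0) + 1
        placed[op.name] = cycle
    return placed

ops = [
    Op("ld", "mem", latency=3),
    Op("a", "alu"),
    Op("b", "alu"),
    Op("sum", "alu", deps=("ld", "a")),
]
print(schedule(ops, {"mem": 1, "alu": 1}))
```

With one slot of each kind, `a` and `b` compete for the same cycle, so `b` is pushed one cycle later, while `sum` waits for the three-cycle load.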
As with other aspects of family member configuration, the number of stations will be set based on extensive tuning and measurement for workloads characteristic of the intended market. As a rough starting point, a station count equal to the belt length seems to work well.
That’s an excellent description of scratchpad behavior – would you be interested in writing a white paper? The only change needed: the OS doesn’t spill the scratchpad on task switch. We just note where in the circular buffer the task switch occurred, so the buffer can be being unloaded into one task’s save space while it is simultaneously being loaded or used by a different task from its own save space. The only thing that gets changed at task switch is the address that the hardware uses for load/unload.
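For readers who want to see why a task switch can be that cheap, here is a toy software model in Python. It is only an illustration of the idea that the switch changes an address and copies nothing. The `Scratchpad` class and its methods are invented for this sketch, and it tracks ownership per slot and moves data lazily on access, which is a simplification of my own rather than the circular-buffer position tracking the hardware uses.

```python
class Scratchpad:
    """Toy model: a task switch only changes which save space is in use."""

    def __init__(self, size):
        self.slots = [0] * size
        self.owner = [None] * size   # task whose data each slot currently holds
        self.save = {}               # task -> that task's save space
        self.current = None

    def switch_to(self, task):
        # Nothing is spilled or filled here; only the "address" changes.
        self.save.setdefault(task, [0] * len(self.slots))
        self.current = task

    def _bring_in(self, i):
        prev = self.owner[i]
        if prev != self.current:
            if prev is not None:
                self.save[prev][i] = self.slots[i]       # unload the old task's value
            self.slots[i] = self.save[self.current][i]   # load the current task's value
            self.owner[i] = self.current

    def read(self, i):
        self._bring_in(i)
        return self.slots[i]

    def write(self, i, value):
        self._bring_in(i)
        self.slots[i] = value


sp = Scratchpad(4)
sp.switch_to("A")
sp.write(0, 11)
sp.switch_to("B")      # no copying happens at the switch itself
sp.write(0, 22)        # A's value is unloaded only now, as B touches the slot
sp.switch_to("A")
print(sp.read(0))      # A sees its own value again
```

In this model the old task's data drains to its save space while the new task is already running, which is the overlap described above.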
Questions and responses:
Yes, you are right: it is very hard to set aside one’s own familiarity and give an answer (or prepare a talk) that addresses the level of knowledge and experience of the audience. Especially so when the audience is quite varied and it’s necessary to neither bore those familiar with the subject nor lose those not so familiar. We do our best.