Mill Computing, Inc. › Forums › The Mill › Architecture › Glossary
I will extend this list over time, but please add posts with more terms that need to be explained if you feel like it. I will update this alphabetical list to include them with links, or if any moderator wants to do that as well, I won’t complain. If I’m wrong somewhere, please yell at me too.
belt – provides the functionality of general purpose registers
belt slot – the read only data source for machine operations
bundle – a collection of instructions that get fetched from memory together
EBB – Extended Basic Block
exit – a point where the instruction stream can leave the EBB
instruction – a collection of operations that get executed together
instruction stream – a sequence of instructions, the Mill has 2 working in parallel
metadata – tags attached to belt slots that describe the data in it
None – undefined data in a slot that is silently ignored by operations
NaR – Not a Result, undefined data that traps when used in certain operations
operation – the most basic semantically defined hardware unit of execution
PLB – Protection Lookaside Buffer
portal – a cross turf call destination
protection region – specified contiguous memory region with attached permissions
SAS – Single Address Space
service – a stateful call interface that can cross protection barriers
spiller – securely manages temporary memory used by certain operations in hardware
stacklet – hardware managed memory line used in fragmented stacks
TLB – Translation Lookaside Buffer
turf – memory protection domain on the Mill, a collection of regions
belt
The belt is the Mill operand holding/interchange device. It is a fifo of a fixed length of 8, 16 or 32 depending on the specific chip. Operations pick any arguments from the belt and drop the results at the front, pushing equally many values off the back in the process. One operation can drop multiple results. While on the belt, values never change, which removes any write/read hardware hazards.
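To make the drop/fall-off behaviour concrete, here is a minimal Python sketch of a belt. It is only a toy model, not how the hardware is built: the class name `Belt`, its methods, and the order in which multiple results land are assumptions made for illustration.

```python
from collections import deque


class Belt:
    """Toy model of a fixed-length belt; position 0 is the front (newest value)."""

    def __init__(self, length=8):
        # The glossary gives 8, 16 or 32 as possible lengths.
        self.slots = deque(maxlen=length)

    def drop(self, *values):
        # Each value enters at the front. Once the belt is full, the deque
        # discards the oldest value from the back automatically.
        # Assumption: multiple results are dropped left to right.
        for value in values:
            self.slots.appendleft(value)

    def read(self, position):
        # Operands are addressed by their current position on the belt.
        return self.slots[position]


belt = Belt(length=8)
belt.drop(3, 4)                         # 4 is now at position 0, 3 at position 1
belt.drop(belt.read(0) + belt.read(1))  # an "add" drops its result at the front
```

Note that nothing is ever overwritten in place: the sum becomes a new value at the front and the older values just move one position further back.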
Minor quibble:
belt
The belt is the Mill operand holding/interchange device. It is a fifo of a fixed length of 8, 16 or 32 depending on the specific core family member.
Chips can have several cores, and the cores can be different members.
I think a glossary is a great idea, and I’ve started the steps to create a Wiki on the site for the glossary and other community product.
This is really appreciated!
It’s early days, and this thread is a good place to prepare this.
Eventually, when things settle, we can incorporate this into the FAQ.
belt slot
Serves as temporary memory to hold operation results and provide operation arguments. Each belt slot has at least 64 bits, but usually 128 bits, plus a few metadata bits. A belt slot can hold data of any width, even vector data, which is indicated by the metadata flags and initialized on load. Belt slots are accessed via the index of their position on the belt. Whenever new values get dropped on the belt, the index of every value already there gets incremented.
bundle
Bundles are collections of instructions that are fetched from memory together. The Mill does not pull a single bundle at a time; it pulls two half-bundles, which together comprise a single very long instruction that can contain over 30 distinct operations. The two half-bundles are decoded and issued in sync, as one very long instruction word would be. Often a whole EBB consists of a single such bundle containing all necessary operations.
Extended Basic Block
A sequence or batch of instructions that logically belong together. An EBB has exactly one entry point, can only be entered there, and has one or more exit points. On the Mill there is no implicit exit from an EBB, so every EBB ends with an explicit control flow operation. The EBB plays a central role in organizing code on the Mill. For one, jump addresses are encoded relative to the start of the current EBB. For another, branch prediction works on EBB exits instead of branches.
exit
An operation that transfers control to a new EBB. This can be a branch, a jump or a call, with the target given as an immediate address or taken from a belt slot. Calls are not necessarily exits, though, even when taken.
Exits are also what is kept in the prediction tables, one for each EBB, instead of one for each branch.
instruction
A collection of operations that get issued together. On the Mill an instruction is divided into two half-bundles, one for each instruction stream, and each half-bundle into a header and 3 blocks of operations, which correspond to the execution phases. One block contains one or more fixed length operations.
instruction stream
The Mill has 2 separate instruction streams that operate in sync. They are divided by functionality, one being the Exu stream for computation and one being the Flow stream for control flow, memory access and address logic. Consequently there are two program counters, XPC and FPC. On control transfers both go to the same address and then diverge, XPC going down the address space, FPC going upward.
The main reason for this arrangement is simpler, cheaper and faster decoding of instructions and more efficient use of caches.
metadata
All data kept in belt slots and the scratchpad is annotated with metadata bits. This metadata is initialized on value creation and also doesn’t change over its lifetime. The most important information kept in there is the scalar data width, whether it is a SIMD vector and whether it is a valid value or not. It also carries the floating point state bits of floating point operations and possibly more. This information is used throughout the machine to augment the operations performed on the data, like inferring operand widths, how to handle overflows, how to propagate values resulting from speculative computations etc.
Not a Result
Another metadata value indicating a real fault in a previous operation that should be raised whenever this data is actually realized to memory. Values with NaR metadata hold information about the nature and location of the fault to aid debugging.
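Here is a rough Python sketch of how None and NaR metadata could behave, going only by the entries in this glossary: None is silently ignored, NaR faults once the value is realized to memory. The names `Tag`, `Slot`, `add`, `store` and `Fault` are made up for this example, and so is the rule that the first invalid operand simply passes through an operation; the glossary does not say which one wins when both appear.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    VALID = "valid"
    NONE = "None"
    NAR = "NaR"


@dataclass(frozen=True)
class Slot:
    tag: Tag
    payload: object = None  # the value, or a fault description for a NaR


class Fault(Exception):
    """Stands in for the trap raised when a NaR is realized."""


def add(a, b):
    # Assumption: an invalid operand makes the result invalid too, so nothing
    # traps while computing. The first invalid operand passes through as is.
    for operand in (a, b):
        if operand.tag is not Tag.VALID:
            return operand
    return Slot(Tag.VALID, a.payload + b.payload)


def store(memory, address, slot):
    if slot.tag is Tag.NAR:
        raise Fault(slot.payload)  # the fault surfaces only at this point
    if slot.tag is Tag.NONE:
        return                     # silently ignored, memory stays untouched
    memory[address] = slot.payload
```

The point of the split is that speculative work can run ahead on bad data without trapping; only a store of a NaR makes the fault visible, and the payload still says what went wrong.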
- This reply was modified 10 years, 7 months ago by imbecile.
operation
Since the Mill is a wide issue architecture, it is necessary to distinguish between instructions and operations. An operation is a semantically distinct piece of computation. An instruction is a collection of one or more operations that get issued together. On mainstream general purpose architectures the two tend to be the same thing.
Protection Lookaside Buffer
On the Mill, memory protection and virtual memory translation are separate systems. Among other things, this allows both protection and translation to be made cheaper and faster by putting the protection structures on top of the caches and the memory translation below the caches. There are actually two Protection Lookaside Buffers, iPLB and dPLB, covering the code and data caches respectively. The iPLB holds execute and portal call permissions, the dPLB read and write permissions.
portal
A portal is a special data structure of cache line size that holds all necessary information to call into service code across protection barriers. This happens without context/thread switches and is fast for that reason. There are a few operations to manage access to the portals themselves and, if necessary, to memory used to pass parameters, either permanently or temporarily for one call.
Single Address Space
All processes and threads on the Mill share the same mappings of virtual addresses to physical addresses. This is made possible by using 64bit addresses, which give an address space large enough for the foreseeable future. Different programs are protected/isolated from each other with permissions in different turfs, not with memory mappings. No memory remaps need to be done on task switches, and often task switches are entirely unnecessary because of this.
service
Services are a kind of library, only that the calls happen across protection boundaries through portals. They can be used from applications or other services, and provide protection for both callers and callees from each other. They are the canonical way to provide “privileged” functionality on the Mill. It is not really privileged though. Services only reside in different turfs with different permissions than the code calling them. There is nothing fundamentally different between different turfs, only the set of permissions to different memory regions.
spiller
A part of the Mill hardware that is largely invisible to the programmer and can’t be directly accessed. What it does is manage temporary memory used by certain operations. It has its own separate caches and is ultimately backed by DRAM. Among other things it takes care of the scratchpad, of the call stacks, of the belts of frames down the call hierarchy, of contexts in task switches etc.
stacklet
A hardware allocated segment of stack residing on the top of the address space, which is used for services. They are identified by the turf of the service and the thread the service is executed in. This prevents fragmentation of the turfs of applications and services.
Translation Lookaside Buffer
Maps virtual memory addresses to physical addresses. Resides below the caches, i.e. in the caches everything is virtual addresses. Virtual addresses are unique and refer to the same physical address in every context. The TLB only needs to be consulted when there is a cache miss and a DRAM access becomes necessary.
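A small Python sketch of what "below the caches" means in practice: the cache is keyed by virtual address, and translation only happens on a miss. Everything here (the class `MemoryHierarchy`, the dict-based page table, the 4096-byte page size) is a hypothetical stand-in for illustration, not a description of the real hardware.

```python
PAGE_SIZE = 4096  # hypothetical page size, chosen only for the example


class MemoryHierarchy:
    def __init__(self, page_table, dram):
        self.cache = {}               # keyed by virtual address
        self.page_table = page_table  # virtual page number -> physical page number
        self.dram = dram              # physical address -> value
        self.translations = 0         # counts how often translation was needed

    def load(self, vaddr):
        if vaddr in self.cache:       # hit: no translation needed at all
            return self.cache[vaddr]
        # Miss: translate on the way down to DRAM.
        page, offset = divmod(vaddr, PAGE_SIZE)
        self.translations += 1
        paddr = self.page_table[page] * PAGE_SIZE + offset
        value = self.dram[paddr]
        self.cache[vaddr] = value
        return value


mem = MemoryHierarchy(page_table={5: 2}, dram={2 * PAGE_SIZE + 16: 99})
first = mem.load(5 * PAGE_SIZE + 16)   # miss, goes through translation
second = mem.load(5 * PAGE_SIZE + 16)  # hit, served from the virtual cache
```

After the two loads `mem.translations` is 1: the second load never touched the translation step.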
turf
Is the collection of protection regions that share the same turf ID. This turf ID is held in a special register and provides the security context of the current thread. It can be changed for the current thread with portal calls. Memory access is granted as soon as the first region with the current turf ID (or thread ID, if the turf ID is wildcarded) and the required permission is found.
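The lookup rule in the last sentence can be sketched in a few lines of Python. This is only an illustration of the rule as worded here; `Region`, `access_allowed` and the use of `None` for a wildcarded turf ID are names and conventions invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    start: int
    end: int                   # exclusive upper bound
    rights: frozenset          # e.g. frozenset({"read", "write"})
    turf_id: Optional[int]     # None stands for a wildcarded turf ID
    thread_id: Optional[int] = None


def access_allowed(regions, turf_id, thread_id, address, right):
    for region in regions:
        if not (region.start <= address < region.end):
            continue
        if right not in region.rights:
            continue
        if region.turf_id is None:
            # Wildcarded turf: fall back to matching on the thread ID.
            owner_matches = region.thread_id == thread_id
        else:
            owner_matches = region.turf_id == turf_id
        if owner_matches:
            return True        # the first matching region grants access
    return False


regions = [
    Region(0x1000, 0x2000, frozenset({"read"}), turf_id=7),
    Region(0x8000, 0x9000, frozenset({"read", "write"}), turf_id=None, thread_id=42),
]
```

With these regions, turf 7 may read at 0x1800 but not write there, and thread 42 may write at 0x8000 no matter which turf it is currently running in.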
You could add implicit zero. I’d give the entry but I can’t remember all of the details ATM.
phase
One instruction on the Mill can contain many operations that are issued together. Those operations can have data dependencies among each other. For that reason operations are divided into distinct categories called phases, which execute in a fixed order over consecutive cycles so that those data dependencies are respected.
The phases are:
1 (Reader) operations that load or create values into the belt with hardcoded arguments
2 (Operation) operations that take belt slots as arguments and produce results
3 (Call) function calls
4 (Pick) pick operation
5 (Writer) stores and branches
implicit zero
All new memory allocations happen in the cache first and are flagged as new for each byte. A load from such a location produces a zero. Only once something is stored in a newly allocated byte is the new flag removed. As a result you get zero initialization for free, and often temporary buffers or stacks don’t even need to go into DRAM and exist entirely in the cache, only as virtual addresses.
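As a rough illustration, a cache line with per-byte new flags could be modelled like this in Python. The class `NewLine`, the 64-byte line size and the 0xAA filler are assumptions picked for the example; only the load/store rule comes from the entry above.

```python
class NewLine:
    """Toy cache line with one 'new' flag per byte."""

    def __init__(self, size=64):
        # Stale garbage stands in for whatever the storage held before.
        self.data = bytearray(b"\xaa" * size)
        self.is_new = [True] * size

    def load(self, offset):
        # A load from a byte still flagged as new yields zero,
        # regardless of what the underlying storage contains.
        return 0 if self.is_new[offset] else self.data[offset]

    def store(self, offset, value):
        self.data[offset] = value
        self.is_new[offset] = False  # the first store clears the flag


line = NewLine()
before = line.load(5)   # still new, reads as zero
line.store(5, 17)
after = line.load(5)    # reads back the stored value
```

Nobody ever has to write zeros anywhere: the flag alone makes the fresh bytes read as zero.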
replay
On all architectures unexpected things can happen which throw the pipeline, the functional units and all internal state into a pickle. Somehow the normal flow is suspended and must be continued later. This can happen through interrupts, branch mispredictions or normal reorders on out of order architectures. The Mill is not out of order, but normal calls there can present small temporary context changes just as interrupts do (and interrupts are just unscheduled calls on the Mill). There are several strategies to return to the state before the interruption, all usually called replay.
Result Replay is used on the Mill. Since all interruptions introduce a new belt context in the form of a new frame, all new operations drop their results in their new context (i.e. belt slots), while the already issued operations all finish and drop their results into their old contexts. This all happens via belt slot tagging/renaming with frame ids. On return to the previous flow, all results are presented as if there never was an interruption. It may have been necessary for the spiller to intervene and temporarily save and restore some results for that to happen.
Execution Replay is used on pretty much all major hardware. When an interruption occurs, all results and transient state are thrown away. The issued instructions and their arguments are remembered, though, and on return they are reissued. This can be quite expensive on longer pipelines and with lots of complicated instructions.
specializer
The specializer is a dedicated Mill-specific library. It can be considered either the last step of compilation or the first step of execution. The Mill is a processor family with many members for different purposes, and thus with different parameters such as belt slot count, number and kind of functional units, cache sizes etc. The Mill also depends heavily on exposing those details of the chip to the compiler for optimal static scheduling. The specializer was introduced to bridge the two: software is distributed and deployed as a universal, member-independent internal byte code, and the specializer translates it into the actual executable binary code for the required chip. This can happen at install time or load time. The specializer also does caching, symbol resolution and similar tasks done by traditional dynamic linkers.