  • imbecile
    Participant
    Post count: 48
    #908

    I will extend this list over time, but please add posts for more terms that need explaining if you feel like it. I will update this alphabetical list to include them with links, or if any moderator wants to do that as well, I won’t complain. If I’m wrong somewhere, please yell at me too.

    belt – provides the functionality of general purpose registers
    belt slot – the read only data source for machine operations
    bundle – a collection of instructions that get fetched from memory together
    EBB – Extended Basic Block
    exit – a point where the instruction stream can leave the EBB
    instruction – a collection of operations that get executed together
    instruction stream – a sequence of instructions; the Mill has two working in parallel
    metadata – tags attached to belt slots that describe the data in it
    None – undefined data in a slot that is silently ignored by operations
    NaR – Not a Result, undefined data that traps when used in certain operations
    operation – the most basic semantically defined hardware unit of execution
    PLB – Protection Lookaside Buffer
    portal – a cross turf call destination
    protection region – a specified contiguous memory region with attached permissions
    SAS – Single Address Space
    service – a stateful call interface that can cross protection barriers
    spiller – securely manages temporary memory used by certain operations in hardware
    stacklet – hardware managed memory line used in fragmented stacks
    TLB – Translation Lookaside Buffer
    turf – memory protection domain on the Mill, a collection of regions

  • imbecile
    Participant
    Post count: 48

    belt

    The belt is the Mill operand holding/interchange device. It is a FIFO of a fixed length of 8, 16 or 32 depending on the specific chip. Operations pick their arguments from anywhere on the belt and drop their results at the front, pushing equally many values off the back in the process. One operation can drop multiple results. While on the belt, values never change, which eliminates read/write hardware hazards.
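
    A minimal Python sketch of these semantics, assuming a hypothetical 8-slot member; the names drop and read are mine, not Mill terminology:

        from collections import deque

        BELT_LENGTH = 8  # 8, 16 or 32 depending on the core

        # The belt as a bounded FIFO: drops push at the front (index 0)
        # and the oldest values fall off the back automatically.
        belt = deque([0] * BELT_LENGTH, maxlen=BELT_LENGTH)

        def drop(*results):
            for r in results:        # one operation can drop several results
                belt.appendleft(r)

        def read(pos):
            return belt[pos]         # operands are picked by belt position

        drop(41)
        drop(read(0) + 1)            # read b0, drop 42 at the front
        assert read(0) == 42 and read(1) == 41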

    • Ivan Godard
      Keymaster
      Post count: 689

      Minor quibble:

      belt

      The belt is the Mill operand holding/interchange device. It is a FIFO of a fixed length of 8, 16 or 32 depending on the specific core family member.

      Chips can have several cores, and the cores can be different members.

      I think a glossary is a great idea, and I’ve started the steps to create a Wiki on the site for the glossary and other community product.

  • Will_Edwards
    Moderator
    Post count: 98

    This is really appreciated!

    It’s early days, and this thread is a good place to prepare this.

    Eventually, when things settle, we can incorporate this into the FAQ.

    • imbecile
      Participant
      Post count: 48

      Well, looks like editing isn’t possible anymore after a while, which means extending and updating the alphabetical list isn’t really possible. Is there anything that could be done about that?

  • imbecile
    Participant
    Post count: 48

    belt slot

    Serves as temporary storage to hold operation results and provide operation arguments. Each belt slot holds at least 64 bits, usually 128, plus a few metadata bits. A belt slot can hold data of any width, even vector data, which is indicated by the metadata flags and initialized on load. Belt slots are accessed via the index of their position on the belt. Whenever new values get dropped on the belt, those indices get incremented.
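
    A sketch of what a slot might carry, in Python; the field names are hypothetical, and real widths and metadata layouts vary by member:

        from dataclasses import dataclass

        @dataclass
        class BeltSlot:
            payload: bytes        # at least 64 bits of data, usually 128
            width: int            # scalar element width in bytes, set on load
            is_vector: bool       # SIMD vector or scalar
            kind: str = "valid"   # "valid", "None" or "NaR"

        slot = BeltSlot(payload=(42).to_bytes(16, "little"),
                        width=4, is_vector=False)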

  • imbecile
    Participant
    Post count: 48

    bundle

    Bundles are collections of instructions that are fetched from memory together. On the Mill it is not a single bundle that is pulled at a time, but two half-bundles, which together comprise a single very long instruction and can contain over 30 distinct operations. The two half-bundles are also decoded and issued in sync and together, just as a very long instruction word would be. Often a whole EBB consists of a single such instruction bundle, containing all necessary operations.

  • imbecile
    Participant
    Post count: 48

    Extended Basic Block

    A sequence or batch of instructions that logically belong together. An EBB has one entry point, can only be entered at that entry point, and has one or more exit points. On the Mill there is no implicit exit from an EBB, so every EBB ends with an explicit control flow operation. The EBB plays a central role in organizing code on the Mill. For one, jump addresses are encoded relative to the start of the current EBB. Branch prediction also works on EBB exits instead of individual branches.

  • imbecile
    Participant
    Post count: 48

    exit
    An operation that transfers control to a new EBB. This can be a branch, a jump or a call, with an immediate address or one taken from a belt slot. Calls don’t necessarily need to be exits though, even when taken, since control returns to the EBB afterward.
    Exits are also what is kept in the prediction tables, one for each EBB, instead of each branch.

  • imbecile
    Participant
    Post count: 48

    instruction

    A collection of operations that get issued together. On the Mill an instruction is divided into two half-bundles, one for each instruction stream, and each half-bundle into a header and 3 blocks of operations, which correspond to the execution phases. Each block contains one or more fixed-length operations.
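
    As a rough data-structure sketch (the names are hypothetical; real headers carry block sizes and other encoding detail not modeled here):

        from dataclasses import dataclass

        @dataclass
        class HalfBundle:
            header: bytes    # describes the three blocks that follow
            blocks: tuple    # 3 blocks of fixed-length operations, by phase

        @dataclass
        class Instruction:
            exu: HalfBundle     # computation half-bundle (Exu stream)
            flow: HalfBundle    # control flow/memory half-bundle (Flow stream)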

  • imbecile
    Participant
    Post count: 48

    instruction stream

    The Mill has 2 separate instruction streams that operate in sync. They are divided by functionality, one being the Exu stream for computation and the other the Flow stream for control flow, memory access and address logic. Consequently there are two program counters, XPC and FPC. On control transfers both go to the same address and then diverge, XPC going down the address space, FPC going upward.
    The main reason for this arrangement is simpler, cheaper and faster decoding of instructions and more efficient use of caches.
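
    A toy model of the two program counters diverging from a transfer target; the half-bundle sizes and exact layout here are made-up assumptions:

        def run_ebb(entry, exu_sizes, flow_sizes):
            xpc = fpc = entry                # both start at the transfer target
            for size in exu_sizes:           # Exu code sits below the entry
                xpc -= size
                print(f"decode Exu half-bundle at {xpc:#x}")
            for size in flow_sizes:          # Flow code sits above it
                print(f"decode Flow half-bundle at {fpc:#x}")
                fpc += size

        run_ebb(0x1000, exu_sizes=[4, 8], flow_sizes=[8, 4])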

  • imbecile
    Participant
    Post count: 48

    metadata

    All data kept in belt slots and the scratchpad is annotated with metadata bits. This metadata is initialized on value creation and doesn’t change over the value’s lifetime. The most important information kept in there is the scalar data width, whether it is a SIMD vector, and whether it is a valid value or not. It also carries the floating point state bits of floating point operations and possibly more. This information is used throughout the machine to augment the operations performed on the data, such as inferring operand widths, deciding how to handle overflows, and propagating values resulting from speculative computations.
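
    Width inference, for instance, can be sketched like this; operands are (value, width) pairs standing in for belt slots, and the wrap-around overflow is just one of the several behaviors the Mill offers, picked to keep the sketch short:

        def add(a, b):
            # The width comes from the operand metadata, not the opcode,
            # so one add opcode serves every width.
            (av, aw), (bv, bw) = a, b
            assert aw == bw
            return ((av + bv) % (1 << (8 * aw)), aw)

        assert add((250, 1), (10, 1)) == (4, 1)    # 8-bit modular wrap
        assert add((250, 2), (10, 2)) == (260, 2)  # same opcode, 16 bits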

  • imbecile
    Participant
    Post count: 48

    None

    Is one of the possible metadata values indicating missing data. Used primarily for speculative execution.

  • imbecile
    Participant
    Post count: 48

    Not a Result

    Another metadata value indicating a real fault in a previous operation that should be raised whenever this data is actually realized to memory. Values with NaR metadata hold information about the nature and location of the fault to aid debugging.
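
    A sketch of how the two markers flow, with operands modeled as (kind, value) pairs: speculable operations pass None and NaR through untouched, and the fault only fires when a NaR reaches a non-speculable operation such as a store. That a store of None is silently skipped follows the None entry above.

        def spec_add(a, b):
            # A speculable op never faults; the markers just flow through.
            for x in (a, b):
                if x[0] != "valid":
                    return x
            return ("valid", a[1] + b[1])

        def store(operand, addr, memory):
            kind, value = operand
            if kind == "NaR":
                raise RuntimeError("fault: NaR realized at store")
            if kind == "None":
                return                   # a store of None is silently skipped
            memory[addr] = value

        mem = {}
        store(spec_add(("valid", 1), ("None", None)), 0x10, mem)
        assert 0x10 not in mem           # the None flowed through and was dropped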

  • imbecile
    Participant
    Post count: 48

    operation

    Since the Mill is a wide issue architecture, it is necessary to distinguish between instructions and operations. An operation is a semantically distinct piece of computation. An instruction is a collection of one or more operations that get issued together. On mainstream general purpose architectures the two tend to be the same.

  • imbecile
    Participant
    Post count: 48

    Protection Lookaside Buffer

    On the Mill, memory protection and virtual memory translation are separate systems. Among other things, this makes both protection and translation cheaper and faster, by putting the protection structures on top of the caches and the memory translation below the caches. There are actually two Protection Lookaside Buffers, the iPLB and the dPLB, covering the code and data caches respectively. The iPLB holds execute and portal call permissions, the dPLB read and write permissions.

  • imbecile
    Participant
    Post count: 48

    portal

    A portal is a special data structure of cache line size that holds all necessary information to call into service code across protection barriers. This happens without context/thread switches and is therefore fast. There are a few operations to manage access to the portals themselves, and to any memory used to pass parameters, whether granted permanently or temporarily for a single call.

  • imbecile
    Participant
    Post count: 48

    protection region

    A contiguous region in the address space with a set of access permissions. It can be attributed to a turf, a thread, or both, via a turf and/or thread ID.

  • imbecile
    Participant
    Post count: 48

    Single Address Space

    All processes and threads on the Mill share the same mapping of virtual addresses to physical addresses. This is made possible by using 64-bit addresses, which provide an address space large enough for the foreseeable future. Different programs are protected/isolated from each other with permissions in different turfs, not with different memory mappings. No memory remaps need to be done on task switches, and often task switches are entirely unnecessary because of this.

  • imbecile
    Participant
    Post count: 48

    service

    Services are a kind of library, except that the calls happen across protection boundaries through portals. They can be used from applications or other services, and protect callers and callees from each other. They are the canonical way to provide “privileged” functionality on the Mill. It is not really privilege though: services merely reside in different turfs with different permissions than the code calling them. There is nothing fundamentally different between turfs, only different sets of permissions to different memory regions.

  • imbecile
    Participant
    Post count: 48

    spiller

    A part of the Mill hardware that is largely invisible to the programmer and can’t be directly accessed. It manages the temporary memory used by certain operations. It has its own separate caches and is ultimately backed by DRAM. Among other things it takes care of the scratchpad, the call stacks, the belts of frames down the call hierarchy, contexts in task switches, etc.

  • imbecile
    Participant
    Post count: 48

    stacklet

    A hardware allocated segment of stack residing at the top of the address space, used for services. Stacklets are identified by the turf of the service and the thread the service is executed in. This prevents fragmentation of the turfs of applications and services.

  • imbecile
    Participant
    Post count: 48

    Translation Lookaside Buffer

    Maps virtual memory addresses to physical addresses. It resides below the caches, i.e. everything in the caches uses virtual addresses. Virtual addresses are unique and refer to the same physical address in every context. They only need to be translated when there is a cache miss and a DRAM access becomes necessary.
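
    A sketch of the lookup order with plain dicts; page granularity, TLB misses and page table walks are all glossed over:

        def load(vaddr, cache, tlb, dram):
            if vaddr in cache:        # the caches work on virtual addresses
                return cache[vaddr]
            paddr = tlb[vaddr]        # translation happens only on a miss
            value = dram[paddr]
            cache[vaddr] = value
            return value

        dram, tlb, cache = {0x9000: 7}, {0x1000: 0x9000}, {}
        assert load(0x1000, cache, tlb, dram) == 7   # miss: translated
        assert load(0x1000, cache, tlb, dram) == 7   # hit: no translation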

  • imbecile
    Participant
    Post count: 48

    turf

    Is the collection of protection regions that share the same turf ID. This turf ID is held in a special register and provides the security context of the current thread. It can be changed for the current thread with portal calls. Memory access is granted as soon as the first region with the current turf ID (or thread ID, if the turf ID is wildcarded) and the required permission is found.
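
    The match rule might be sketched like this; the field names are hypothetical, and the real PLB is associative hardware rather than a linear scan:

        from dataclasses import dataclass
        from typing import Optional

        @dataclass
        class Region:
            lo: int
            hi: int
            perms: set                      # e.g. {"r", "w"} or {"x", "p"}
            turf_id: Optional[int] = None   # None models a wildcarded turf ID
            thread_id: Optional[int] = None

        def access_ok(regions, addr, perm, cur_turf, cur_thread):
            for r in regions:
                if not (r.lo <= addr < r.hi and perm in r.perms):
                    continue
                if r.turf_id == cur_turf:
                    return True             # first matching region grants access
                if r.turf_id is None and r.thread_id == cur_thread:
                    return True
            return False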

  • joseph.h.garvin
    Participant
    Post count: 21

    You could add implicit zero. I’d give the entry but I can’t remember all of the details ATM.

  • imbecile
    Participant
    Post count: 48

    phase

    One instruction on the Mill can contain many operations that are issued together. Those operations can have data dependencies among each other. For that reason operations are divided into distinct categories called phases, which impose an ordering over consecutive cycles in which those operations execute, to account for those data dependencies (a small sketch follows the list).
    The phases are:

    1. Reader – operations that load or create values onto the belt from hardcoded arguments
    2. Operation – operations that take belt slots as arguments and produce results
    3. Call – function calls
    4. Pick – the pick operation
    5. Writer – stores and branches
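
    The promised sketch: a toy issue loop that runs the ops of one instruction over consecutive cycles in phase order (the op encodings are invented for illustration):

        PHASE_ORDER = ["reader", "op", "call", "pick", "writer"]

        def issue(instruction):
            # All ops issue together, but execute in phase order, so a
            # writer-phase store can consume what an op-phase add produced
            # within the same instruction.
            for cycle, phase in enumerate(PHASE_ORDER):
                for name, op_phase in instruction:
                    if op_phase == phase:
                        print(f"cycle {cycle}: {name}")

        issue([("con 42", "reader"),
               ("add b0, b1", "op"),
               ("store b2", "writer")])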

  • imbecile
    Participant
    Post count: 48

    implicit zero

    All new memory allocations happen in the cache first, and each byte is flagged as new. A load from such a location produces a zero. Only once something is stored to a newly allocated byte is the new flag removed. As a result you get zero initialization for free, and often temporary buffers or stacks never need to go to DRAM at all and exist entirely in the cache, only as virtual addresses.
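
    A sketch with per-byte new flags on a single cache line; the flag name and granularity details are assumptions:

        class CacheLine:
            def __init__(self, size=64):
                self.data = bytearray(size)
                self.new = [True] * size   # set for every byte on allocation

            def load(self, offset):
                # Never-written bytes read as zero, at no DRAM cost.
                return 0 if self.new[offset] else self.data[offset]

            def store(self, offset, value):
                self.new[offset] = False   # the first store clears the flag
                self.data[offset] = value

        line = CacheLine()
        assert line.load(3) == 0
        line.store(3, 7)
        assert line.load(3) == 7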

  • imbecile
    Participant
    Post count: 48

    replay

    On all architectures unexpected things can happen which throw the pipeline, the functional units and all internal state into a pickle. Somehow the normal flow is suspended and must be continued later. This can happen through interrupts, branch mispredictions or normal reorders on out of order architectures. The Mill is not out of order, but normal calls there present small temporary context changes just like interrupts do (and interrupts are just unscheduled calls on the Mill). There are several strategies to return to the state before the interruption, all usually called replay.

    Result Replay is what the Mill uses. Since all interruptions introduce a new belt context in the form of a new frame, all new operations drop their results into their new context (i.e. belt slots), while the already issued operations all finish and drop their results into their old contexts. This all happens via belt slot tagging/renaming with frame IDs. On return to the previous flow, all results are presented as if there never was an interruption. The spiller may need to intervene and temporarily save and restore some results for that to happen.

    Execution Replay is used on pretty much all major hardware. When an interruption occurs, all results and temporary transient state are thrown away; the issued instructions and their arguments are remembered though, and on return they are reissued. This can be quite expensive with longer pipelines and lots of complicated instructions.
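
    A toy model of result replay with frame-tagged belts; the data structures are mine, not the hardware’s:

        belts = {0: []}              # frame ID -> belt contents, newest first
        in_flight = [("add", 0)]     # issued ops, tagged with their frame

        def interrupt():
            fid = max(belts) + 1     # an interruption opens a fresh frame/belt
            belts[fid] = []
            return fid

        fid = interrupt()                          # e.g. an unscheduled call
        belts[fid].insert(0, "handler result")     # new ops use the new belt
        op, frame = in_flight.pop()
        belts[frame].insert(0, f"{op} result")     # old ops retire into frame 0
        assert belts[0] == ["add result"]          # frame 0 looks untouched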

  • imbecile
    Participant
    Post count: 48

    specializer

    The specializer is a dedicated Mill specific library. It can be considered either the last step of compilation or the first step of execution. The Mill is a processor family with many members for different purposes, and thus different parameters like belt slot count, number and kind of functional units, cache sizes etc., and it heavily depends on exposing those details of the chip to the compiler for optimal static scheduling. Hence the specializer: software is distributed and deployed in a universal, member-independent internal byte code, which the specializer translates into the actual executable binary code for the target chip. This can happen at install time or load time, and the specializer also does caching, symbol resolution and similar tasks done by traditional dynamic linkers.
