Preliminary Design for Mill-Forth

From Mill Computing Wiki
Revision as of 15:42, 5 January 2015 by LarryP (Talk | contribs)

{Started 04Jan2014 by LarryP} Feel free to add and/or comment, but please don't wholesale remove content unilaterally.

Mill-Forth (MF) Design Context:

  • The partial Mill tool-chain (specifically the genAsm assembler, the specializer and the simulator) is what I expect will be used for the initial effort. So the core of MF must initially be written in Mill genAsm.
  • A simple design is more likely to work -- even if it doesn't do everything everybody wants. And working code is vastly more valuable than fragments of a more capable design that never quite runs.
  • We can -- and probably should -- revisit MF's initial design after we have something working, but IMHO that should be a new phase of this project, assuming we get there.

Mill-Forth Design Assumptions:

  • We must assume that our code can and will start running with permissions sufficient to read and write at least one specified range of memory. Otherwise, I can't see how MF can do anything useful. I assume we'll know the location and size of that region, and that it'll be at least 64*1024 bytes.
  • Initially, MF will run on a simulated Mill CPU.
  • Since we likely won't have an O/S or even a BIOS, we will have to emulate I/O via either simulator outcalls or simple (ring?) buffers in memory. However, I think abstracting MF's input and output behavior will make MF more portable and useful. Note that Forth has been used in a number of non-x86 system designs (IBM Power servers, Sun workstations, PowerPC Macs) as part of their boot firmware. Search for "Open Firmware" or "OpenBoot" for details, or see the Wikipedia entry for "Open Firmware" [1].
  • We will use ASCII-encoded bytes for input and output in the initial version. Other encodings (e.g. UTF-8) will have to wait or not happen. Note that many Forths treat input in a case-insensitive fashion, usually forcing input to upper case before processing it. I'd prefer not to do this, since reading all upper case hurts my eyes, but I'm fine with using all upper case for standard Forth words.
  • If all items on the data stack must be of uniform size (typical of Forths I've seen), I think using 8-byte data items (i.e. "cells" in Forth parlance) is the right way to go. Those can hold arbitrary Mill pointers, which is essential, if I recall correctly.
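
To make the "simple (ring?) buffers in memory" idea above a little more concrete, here is a minimal sketch in Python (not genAsm) of a fixed-size byte ring buffer that could stand in for MF's input or output stream. The class name, field names and helper functions are all hypothetical; nothing here reflects an actual simulator interface.

```python
class RingBuffer:
    """Fixed-size byte FIFO held in one bytearray (hypothetical layout)."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0    # index of the next byte to read
        self.count = 0   # number of unread bytes

    def put(self, byte):
        """Append one byte; return False if the buffer is full."""
        if self.count == self.capacity:
            return False
        tail = (self.head + self.count) % self.capacity
        self.buf[tail] = byte
        self.count += 1
        return True

    def get(self):
        """Remove and return the oldest byte, or None if the buffer is empty."""
        if self.count == 0:
            return None
        byte = self.buf[self.head]
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return byte


def write_text(ring, text):
    """Queue ASCII text byte by byte; return how many bytes fit."""
    written = 0
    for b in text.encode("ascii"):
        if not ring.put(b):
            break
        written += 1
    return written


def read_all(ring):
    """Drain the buffer and return its contents as an ASCII string."""
    out = bytearray()
    while True:
        b = ring.get()
        if b is None:
            return out.decode("ascii")
        out.append(b)
```

The producer and consumer only share `head` and `count`, so the same shape should carry over whether the other end is a simulator outcall or another MF instance.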



Nice but not necessary (probably defer until phase 2+):

  • Having a contiguous chunk of persistent memory would be nice, e.g. to save some state between MF invocations, but adding this will depend on the simulator's capabilities.
  • A design that permits multiple MF interpreters to co-exist (e.g. with separate I/O streams, memory and permissions) and optionally with a means for them to "play nicely with one another" seems desirable.

Key parts of MF:

  • A readable byte stream for input.
  • A write-able byte stream for output.
  • A stack to hold data.

In classic Forth, ALL data is passed between Forth words (essentially functions) via a LIFO stack that is separate from the return stack.
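
As an illustration only, the data stack and a few standard words can be sketched in Python as below. The `DataStack` class and the 64-bit wrap-around are assumptions that follow from the 8-byte cell size proposed earlier; the real thing would be genAsm, and how it maps onto the Mill is still an open question.

```python
MASK64 = (1 << 64) - 1  # assumption: every cell is 8 bytes wide


class DataStack:
    """LIFO stack of 64-bit cells (hypothetical helper, not an MF API)."""

    def __init__(self):
        self.cells = []

    def push(self, value):
        self.cells.append(value & MASK64)  # wrap to one cell

    def pop(self):
        if not self.cells:
            raise IndexError("data stack underflow")
        return self.cells.pop()


def dup(s):
    """DUP ( x -- x x )"""
    x = s.pop()
    s.push(x)
    s.push(x)


def swap(s):
    """SWAP ( a b -- b a )"""
    b = s.pop()
    a = s.pop()
    s.push(b)
    s.push(a)


def add(s):
    """+ ( a b -- a+b ), wrapping at 64 bits"""
    b = s.pop()
    a = s.pop()
    s.push(a + b)
```

Each word takes its arguments from the stack and leaves its results there, so words compose without any named parameters.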

  • A return stack (or functional equivalent.)

In classic Forth, a LIFO return stack -- separate from the data stack -- is used to record where to return to at the end of each Forth word. In normal usage, the only things that go onto this stack are addresses of Forth words (or something trivially convertible into such addresses.)
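
A rough Python sketch of how a return stack records where to resume is shown below. Here a word body is just a list whose items are either primitives (Python functions) or names of other words, and the "address" pushed on the return stack is a (body, index) pair. That representation, and every name in the block, is an assumption made for illustration; a plain Python list stands in for the data stack.

```python
def run(word, words, data):
    """Execute a word using an explicit return stack; return the peak depth."""
    rstack = []                  # return stack: (body, next index) pairs
    body, ip = words[word], 0
    max_depth = 0
    while True:
        if ip == len(body):      # end of word: return to the caller
            if not rstack:
                return max_depth
            body, ip = rstack.pop()
            continue
        item = body[ip]
        ip += 1
        if callable(item):       # primitive: acts on the data stack only
            item(data)
        else:                    # nested word: remember where to resume
            rstack.append((body, ip))
            max_depth = max(max_depth, len(rstack))
            body, ip = words[item], 0


def lit(n):
    """Build a primitive that pushes the literal n."""
    return lambda data: data.append(n)


def dup(data):
    data.append(data[-1])


def mul(data):
    b = data.pop()
    a = data.pop()
    data.append(a * b)


WORDS = {
    "SQUARE": [dup, mul],
    "FOURTH": ["SQUARE", "SQUARE"],
    "MAIN": [lit(3), "FOURTH"],
}
```

Running `run("MAIN", WORDS, data)` with an empty `data` list nests two levels deep (MAIN calls FOURTH, which calls SQUARE), and the data stack never sees the return information.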

  • Working Memory

Forth's memory management is extremely simple; the "dictionary" is a contiguous chunk of memory that Forth uses for both words and "global" variables.
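
The sketch below shows one way such a dictionary could be laid out, in Python rather than genAsm. The header format (one link cell, a length byte, the name, padding to a cell boundary), the little-endian byte order and all method names are assumptions chosen for illustration. The 64 KiB size and 8-byte cells come from the design assumptions earlier on this page.

```python
import struct

CELL = 8                   # bytes per cell
REGION_SIZE = 64 * 1024    # minimum region size assumed earlier


class Dictionary:
    """Contiguous memory with a 'here' pointer and linked word headers."""

    def __init__(self, size=REGION_SIZE):
        self.mem = bytearray(size)
        self.here = CELL   # offset 0 is reserved to mean "no previous word"
        self.latest = 0    # offset of the most recently defined header

    def allot(self, nbytes):
        """Reserve nbytes at 'here' and return their starting offset."""
        if self.here + nbytes > len(self.mem):
            raise MemoryError("dictionary full")
        addr = self.here
        self.here += nbytes
        return addr

    def comma(self, value):
        """Append one cell (little-endian is an assumption)."""
        addr = self.allot(CELL)
        struct.pack_into("<Q", self.mem, addr, value & (2**64 - 1))
        return addr

    def fetch(self, addr):
        return struct.unpack_from("<Q", self.mem, addr)[0]

    def create(self, name):
        """Append a header for name (under 256 bytes); return the body offset."""
        raw = name.encode("ascii")
        header = self.here
        self.comma(self.latest)                   # link to previous word
        n = self.allot(1 + len(raw))
        self.mem[n] = len(raw)                    # length byte
        self.mem[n + 1 : n + 1 + len(raw)] = raw  # name bytes
        self.here = (self.here + CELL - 1) // CELL * CELL  # align
        self.latest = header
        return self.here

    def find(self, name):
        """Walk the links from newest to oldest; return body offset or None."""
        raw = name.encode("ascii")
        header = self.latest
        while header:
            n = header + CELL
            length = self.mem[n]
            if self.mem[n + 1 : n + 1 + length] == raw:
                return (n + 1 + length + CELL - 1) // CELL * CELL
            header = self.fetch(header)
        return None
```

For example, `d.create("X")` followed by `d.comma(42)` defines a "global" variable whose value can later be read with `d.fetch(d.find("X"))`. Nothing is ever freed; the only allocation operation is advancing `here`.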

  • The Forth interpreter(s)
    • The outer interpreter handles interactive text editing and makes entire lines of text available to the inner interpreter. {LarryP opinion} I think this function should be moved outside our MF design and housed within whatever we use to implement the input byte stream. /{LarryP opinion}
    • The inner interpreter parses the input into white-space delimited tokens, processes each token in order and sends any resultant output bytes to the output stream. (Terminology note: most Forth literature calls this token-processing loop the "outer" or "text" interpreter, and reserves "inner interpreter" for the routine that executes compiled words.) All input is handled as:
      • Whitespace (ignored on input except to delimit tokens)
      • Numbers (usually just pushed onto the data stack)
      • Already-defined Forth words (which are normally executed immediately)
      • Unrecognized text (echoed as part of an error message, unless in the process of defining a new word.)
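
The four input cases listed above can be sketched as a short loop. This is Python for illustration only; the function and table names are hypothetical, a plain list stands in for the data stack, and defining new words (compile mode) is deliberately left out. Lookup is case-sensitive, and words are checked before numbers, as most Forths do.

```python
def interpret(line, words, stack, out):
    """Process one line of input; append any output text to out."""
    for token in line.split():        # whitespace only delimits tokens
        if token in words:            # already-defined word: execute it now
            words[token](stack, out)
            continue
        try:
            stack.append(int(token))  # number: push onto the data stack
        except ValueError:            # unrecognized text: report and stop
            out.append(token + " ?")
            return False
    return True


def plus(stack, out):
    """+ ( a b -- a+b )"""
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def dot(stack, out):
    """. ( n -- ) send the top of stack to the output"""
    out.append(str(stack.pop()) + " ")


WORDS = {"+": plus, ".": dot}
```

With this table, the line `2 3 + .` leaves the stack empty and sends `5 ` to the output, while an unknown token such as `frob` is echoed back followed by `?` and the rest of the line is abandoned.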

How best to map the core components of Forth onto the Mill?