Preliminary Design for Mill-Forth

{Started 04Jan2014 by LarryP} Feel free to add and/or comment, but please don't wholesale remove content unilaterally.

Mill-Forth (MF) Design Context:

The partial Mill tool-chain (specifically the genAsm assembler, the specializer and the simulator) is what I expect we will use for the initial effort, so the core of MF must initially be written in Mill genAsm.

A simple design is more likely to work -- even if it doesn't do everything everybody wants. And working code is vastly more valuable than fragments of a more capable design that never quite runs.

We can -- and probably should -- revisit MF's initial design after we have something working, but IMHO that should be a new phase of this project, assuming we get there.

Mill-Forth Design Assumptions:

We must assume that our code can and will start running with permissions sufficient to read and write at least one specified range of memory; otherwise, I can't see how MF can do anything useful. I assume we'll know the location and size of that region, and that it will be at least 64*1024 bytes. Note that the spiller's separate memory map is not part of this initial size estimate, though the spiller will need memory of its own.

Note that while a tiny memory footprint has historically been a design goal for Forths, memory is vastly cheaper now, and I think we want clarity, capability and speed more than we want a tiny footprint. Based on previous experience with Forths, I think 64K should be enough to contain the stacks (even if all of them except the spiller's are in this memory), the interpreter(s), the dictionary and supporting functions (e.g. faked I/O) -- even using big (i.e. 8-byte) cells on the data stack.
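To make the 64K budget concrete, here is one hypothetical way to carve a single 64 KiB region into fixed sub-areas. The area names and sizes are illustrative assumptions, not decisions of this design; the point is only that the stacks and faked I/O buffers fit with most of the region left over for the dictionary, even with 8-byte cells.

```python
# Hypothetical partition of one 64 KiB read/write region for MF.
# All area names and sizes are illustrative assumptions, not design decisions.
REGION_SIZE = 64 * 1024
CELL = 8  # bytes per cell (8-byte cells, as assumed in this design)

AREAS = [  # (name, size in bytes)
    ("data_stack", 256 * CELL),     # 256 cells
    ("return_stack", 256 * CELL),   # 256 cells
    ("input_buffer", 1024),         # faked input stream
    ("output_buffer", 1024),        # faked output stream
]


def layout(region_base, areas, region_size):
    """Assign consecutive addresses to each area; the dictionary gets the rest."""
    placed = {}
    cursor = region_base
    for name, size in areas:
        placed[name] = (cursor, size)
        cursor += size
    used = cursor - region_base
    if used > region_size:
        raise ValueError("areas do not fit in the region")
    placed["dictionary"] = (cursor, region_size - used)
    return placed


if __name__ == "__main__":
    for name, (addr, size) in layout(0, AREAS, REGION_SIZE).items():
        print(f"{name:14s} at {addr:#07x}, {size:6d} bytes")
```

With these assumed sizes the fixed areas take 6 KiB and the dictionary gets the remaining 59,392 bytes. Putting the dictionary last lets it grow toward the end of the region; a real layout would also want bounds checks so a runaway stack cannot silently overwrite its neighbor.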

Initially, MF will run on a simulated Mill CPU.

Since we likely won't have an O/S or even a BIOS, we will have to emulate I/O via either simulator outcalls or simple (ring?) buffers in memory. Either way, I think abstracting MF's input and output behavior will make MF more portable and useful. Note that Forth has been used as part of the boot firmware in a number of non-x86 system designs (IBM Power servers, Sun workstations, PowerPC Macs); search for "Open Firmware" or "OpenBoot" for details. Wikipedia entry for "Open Firmware": [1]
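As one possible shape for the "simple ring buffers in memory" option, here is a minimal byte ring in the usual single-producer/single-consumer style: the writer only advances the tail index and the reader only advances the head index. Modeling it with a Python bytearray is an assumption for illustration; in MF the buffer and its two indices would live in the shared memory region, with the simulator (or some host harness) on the other side.

```python
class ByteRing:
    """Fixed-capacity FIFO of bytes; one slot stays empty to tell full from empty."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity + 1)
        self.head = 0  # next index to read (advanced only by the consumer)
        self.tail = 0  # next index to write (advanced only by the producer)

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        return (self.tail + 1) % len(self.buf) == self.head

    def put(self, byte):
        """Append one byte; return False if the ring is full."""
        if self.is_full():
            return False
        self.buf[self.tail] = byte
        self.tail = (self.tail + 1) % len(self.buf)
        return True

    def get(self):
        """Remove and return the oldest byte, or None if the ring is empty."""
        if self.is_empty():
            return None
        byte = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return byte
```

One ring per direction (input and output) would give MF the two byte streams it needs, and the same interface could later be backed by simulator outcalls instead.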

We will use ASCII-encoded bytes for input and output in the initial version. Other character encodings (e.g. UTF-8) will have to wait, or may not happen at all. Note that many Forths treat input case-insensitively, usually forcing it to upper case before processing it. I'd prefer not to do this, since reading all upper case hurts my eyes, but I'm fine with using all upper case for standard Forth words.

If all items on the data stack must be of uniform size (typical of the Forths I've seen), I think using 8-byte data items ("cells" in Forth parlance) is the right way to go. Those can hold arbitrary Mill pointers, which is essential, if I recall correctly.
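A quick way to see what 8-byte cells buy us: every cell-sized store and fetch moves exactly 64 bits, so a signed number, an unsigned number or a pointer-sized value all round-trip through memory unchanged. The sketch below models cell store and fetch (Forth's ! and @) with Python's struct module; the little-endian byte order and the helper names are assumptions for illustration.

```python
import struct

CELL = 8  # bytes per cell


def store_cell(mem, addr, value, signed=True):
    """Store one 64-bit cell (like Forth's !) at byte offset addr in mem."""
    fmt = "<q" if signed else "<Q"  # assumed little-endian layout
    struct.pack_into(fmt, mem, addr, value)


def fetch_cell(mem, addr, signed=True):
    """Fetch one 64-bit cell (like Forth's @); `signed` only changes interpretation."""
    fmt = "<q" if signed else "<Q"
    return struct.unpack_from(fmt, mem, addr)[0]


mem = bytearray(4 * CELL)
store_cell(mem, 0, -1)
store_cell(mem, CELL, 0xFFFF_FFFF_FFFF_FFF0, signed=False)  # a pointer-sized value
```

The cell itself is just 64 bits; whether it is read as signed, unsigned or an address is up to the word that uses it, which is the usual Forth convention.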

Nice but not necessary (probably defer until phase 2+):

Having a contiguous chunk of persistent memory would be nice, e.g. to save some state between MF invocations, but adding this will depend on the simulator's capabilities.

A design that permits multiple MF interpreters to co-exist (e.g. with separate I/O streams, memory and permissions) and optionally with a means for them to "play nicely with one another" seems desirable.

How should we handle NaRs and Nones in Mill-Forth?

Calculations can only deliver their results onto the belt, and belt results can be NaRs or Nones, either by accident or deliberately. We cannot keep the entire Forth dictionary on the belt/scratchpad, so the dictionary must be in memory. Sometimes we need to store the results of calculations into the dictionary and bring them back, preferably unchanged. But the Mill handles loads and stores of NaRs and Nones specially, so that they normally cannot be written to memory. So a round trip of an operand from the belt to the dictionary (in memory) and back will generally fail the "round-trip fidelity test"! (Or at least it will fail unless the interpreter does some extra gymnastics and reproduces the equivalent of some of that metadata out in memory.)

What are the feasible options for handling NaRs and Nones in Mill-Forth?

Can we ignore them as an edge case, in the initial version?

I'd like to be able to play with NaRs and Nones, since they have interesting properties and are novel to the Mill. So if we can figure out a clean way of representing Nones and NaRs in a Forth-ish manner, that would keep our options open. However, I doubt we can handle NaRs and Nones cleanly and consistently without adding our own kind of memory-resident metadata. Adding metadata to Forth would be a considerable change; probably something to think about for phase 2 or later.
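To make the memory-resident metadata idea concrete, here is a hypothetical "shadow tag" sketch: alongside each dictionary cell, the interpreter keeps a small tag recording whether the belt value was an ordinary value, a None or a NaR. Nothing here models actual Mill load/store semantics; Python objects stand in for belt operands, and the tag names, the BeltValue class, the NaR payload and the one-tag-per-cell layout are all assumptions. It only shows what the "extra gymnastics" would amount to: a store records the operand's kind, and a fetch reconstructs it.

```python
from enum import Enum


class Kind(Enum):
    VALUE = 0
    NONE = 1
    NAR = 2


class BeltValue:
    """Hypothetical stand-in for a belt operand: a kind plus a 64-bit payload."""

    def __init__(self, kind, payload=0):
        self.kind = kind
        self.payload = payload  # for NaRs, a hypothetical diagnostic code


class TaggedCells:
    """Cell memory plus a parallel tag array, so NaR/None survive a round trip."""

    def __init__(self, ncells):
        self.cells = [0] * ncells            # ordinary 64-bit contents
        self.tags = [Kind.VALUE] * ncells    # hypothetical per-cell metadata

    def store(self, index, v):
        # Record the kind in the tag array; a None carries no payload.
        self.tags[index] = v.kind
        self.cells[index] = 0 if v.kind is Kind.NONE else v.payload

    def fetch(self, index):
        # Rebuild the operand from the cell contents plus its tag.
        return BeltValue(self.tags[index], self.cells[index])
```

The cost is an extra tag per cell plus a check on every store and fetch, which is part of why this looks like phase-2 material.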

Key parts of MF:

A readable byte stream for input.

A writable byte stream for output.

A stack to hold data.

In classic Forth, ALL data is passed between Forth words (essentially functions) via a LIFO stack that is separate from the return stack.
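Here is a minimal sketch of a bounded data stack over a fixed array of cells, with explicit overflow and underflow checks and a few classic stack words built on push and pop. The depth, the error type and the method names are assumptions; the last-in, first-out order and the 64-bit wraparound of + are what the standard words rely on.

```python
class StackError(Exception):
    pass


class DataStack:
    """Bounded LIFO stack of 64-bit cells (depth is an illustrative choice)."""

    MASK = (1 << 64) - 1

    def __init__(self, depth=256):
        self.cells = [0] * depth
        self.sp = 0  # number of items currently on the stack

    def push(self, x):
        if self.sp == len(self.cells):
            raise StackError("data stack overflow")
        self.cells[self.sp] = x & self.MASK  # keep values cell-sized
        self.sp += 1

    def pop(self):
        if self.sp == 0:
            raise StackError("data stack underflow")
        self.sp -= 1
        return self.cells[self.sp]

    def dup(self):  # DUP ( x -- x x )
        x = self.pop()
        self.push(x)
        self.push(x)

    def swap(self):  # SWAP ( a b -- b a )
        b = self.pop()
        a = self.pop()
        self.push(b)
        self.push(a)

    def add(self):  # + ( a b -- a+b ), wrapping at 64 bits
        b = self.pop()
        a = self.pop()
        self.push(a + b)
```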

A return stack (or a functional equivalent).

In classic Forth, a LIFO return stack -- separate from the data stack -- is used to record where to resume at the end of each Forth word. Most of what goes onto this stack is addresses within Forth words (i.e. where to continue in the calling word), but the return stack is also used to hold loop parameters (e.g. for the DO and LOOP words) and is readable and writable for other uses, so long as they don't interfere with the interpreter's use of the return stack for maintaining interpreter state.
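The sketch below shows the two kinds of traffic the return stack carries: a return address that stays put while a word runs, and DO ... LOOP parameters (limit and index) kept on top of it for the duration of a loop. The Forth word names DO, LOOP and I are standard, but the Python functions, the stack depth and the plain-integer "return address" are assumptions, and LOOP is simplified to the common count-up-by-one case.

```python
class ReturnStack:
    """Bounded LIFO return stack (depth is an illustrative assumption)."""

    def __init__(self, depth=256):
        self.items = []
        self.depth = depth

    def push(self, x):
        if len(self.items) == self.depth:
            raise OverflowError("return stack overflow")
        self.items.append(x)

    def pop(self):
        if not self.items:
            raise IndexError("return stack underflow")
        return self.items.pop()

    def peek(self, n=0):
        """Read the n-th item from the top without removing it."""
        return self.items[-1 - n]


def do(rs, limit, start):
    """DO ( limit start -- ): move the loop parameters to the return stack."""
    rs.push(limit)
    rs.push(start)


def loop(rs):
    """LOOP, simplified: bump the index; return True if the body should run again."""
    index = rs.pop() + 1
    if index == rs.peek():  # reached the limit: loop finished
        rs.pop()            # discard the limit too
        return False
    rs.push(index)
    return True


def i(rs):
    """I: the current loop index, read from the top of the return stack."""
    return rs.peek()


# Usage: run a loop body for indices 0..4 inside some calling word.
rs = ReturnStack()
rs.push(0x1234)  # hypothetical return address of the enclosing word
do(rs, 5, 0)
seen = []
while True:
    seen.append(i(rs))
    if not loop(rs):
        break
```

Because the loop parameters sit on top of the return address, a word must clean them up before it exits, which is exactly the "don't interfere with the interpreter's use" rule above.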

Working Memory

Forth's memory management is extremely simple; the "dictionary" is a contiguous chunk of memory that Forth uses for both words and "global" variables.
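Here is a minimal model of that contiguous dictionary: a byte array with a HERE pointer that only moves forward, plus the classic linked list of word headers, searched newest-first so a redefinition shadows older ones. The header layout (link cell, 1-byte name length, name, padding to a cell boundary) is an assumption; real Forths vary. The methods are loosely modeled on the standard words ALLOT, `,` (comma), CREATE and FIND, not exact implementations of them.

```python
import struct

CELL = 8


class Dictionary:
    """Contiguous memory with a HERE pointer and linked word headers.

    Hypothetical header layout: [link: 1 cell][name length: 1 byte]
    [name bytes][padding to a cell boundary], then the word's body.
    """

    def __init__(self, size):
        self.mem = bytearray(size)
        self.here = 0      # next free byte; only ever grows
        self.latest = -1   # address of the newest header, -1 if none

    def allot(self, n):
        if self.here + n > len(self.mem):
            raise MemoryError("dictionary full")
        addr = self.here
        self.here += n
        return addr

    def comma(self, value):
        """Like Forth's , : append one cell at HERE."""
        struct.pack_into("<q", self.mem, self.allot(CELL), value)

    def create(self, name):
        """Lay down a header for `name`; return the address of its body."""
        raw = name.encode("ascii")
        if len(raw) > 255:
            raise ValueError("name too long for a 1-byte length field")
        header = self.here
        self.comma(self.latest)                # link to the previous header
        self.mem[self.allot(1)] = len(raw)
        addr = self.allot(len(raw))
        self.mem[addr:addr + len(raw)] = raw
        self.allot(-self.here % CELL)          # pad so the body is cell-aligned
        self.latest = header
        return self.here

    def find(self, name):
        """Walk the link chain, newest first; return the body address or None."""
        raw = name.encode("ascii")
        h = self.latest
        while h != -1:
            n = self.mem[h + CELL]
            start = h + CELL + 1
            if bytes(self.mem[start:start + n]) == raw:
                end = start + n
                return end + (-end % CELL)
            h = struct.unpack_from("<q", self.mem, h)[0]
        return None
```

A "global" variable is then just a header followed by one cell: `body = d.create("COUNTER")` followed by `d.comma(0)`.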

The Forth interpreter(s)

In classic Forth terminology, the outer (text) interpreter reads entire lines of input, including any interactive line editing, while the inner (address) interpreter executes the compiled code of each Forth word. {LarryP opinion} I think the line-editing function should be moved outside our MF design and housed within whatever we use to implement the input byte stream. /{LarryP opinion}

The outer interpreter parses each line into whitespace-delimited tokens, processes them in order and sends any resulting output bytes to the output stream. All input is handled as:

- Whitespace (ignored on input except to delimit tokens)

- Numbers (usually just pushed onto the data stack)

- Already-defined Forth words (which are normally executed immediately)

- Unrecognized text (echoed as part of an error message, unless it is the name being given to a new word, e.g. the token following ":")
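The four token categories above can be sketched as a small interpreter loop. This is interpret-mode only (no colon definitions), uses a Python dict in place of the memory-resident dictionary and a Python list as the data stack, and the handful of built-in words and the "token ?" error format are illustrative assumptions. Lookup is case-sensitive, with standard words spelled in upper case.

```python
def interpret(line, stack, words, out):
    """Process one line of input: skip whitespace, run known words, push numbers.

    stack: list used as the data stack (top at the end)
    words: dict mapping word names to callables taking (stack, out)
    out:   list collecting output strings
    Returns False if an unrecognized token aborted the rest of the line.
    """
    for token in line.split():           # whitespace only delimits tokens
        if token in words:               # already-defined word: execute it
            words[token](stack, out)
            continue
        try:
            stack.append(int(token))     # number: push it (decimal assumed)
        except ValueError:
            out.append(f"{token} ?")     # unrecognized: echo it in an error
            return False                 # many Forths also clear the stacks here
    return True


def _plus(stack, out):   # + ( a b -- a+b )
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def _dot(stack, out):    # . ( n -- ) print the number and a space
    out.append(str(stack.pop()) + " ")


def _emit(stack, out):   # EMIT ( c -- ) print one ASCII character
    out.append(chr(stack.pop()))


BUILTINS = {"+": _plus, ".": _dot, "EMIT": _emit}
```

For example, `interpret("2 3 + .", [], BUILTINS, out)` appends "5 " to `out`, while a line containing "foo" stops at that token and reports "foo ?".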

How best to map the core components of Forth onto the Mill?
