How best to map the core components of Forth onto the Mill?

From Mill Computing Wiki
Revision as of 20:01, 5 January 2015 by LarryP (Talk | contribs)

Jump to: navigation, search

In his post [[1]] forums, Ivan wrote:

The interesting part of doing Forth on a Mill is not the ops, which are straightforward, but in figuring out how to map the Forth dual-stack program model onto the belt

{LarryP opinion, currently rough}: I'd generalize Ivan's statement to:
How best to map the Forth's key components (including the two stacks) onto the Mill architecture?

For the moment, let's put input/output functions and structures aside (including the outer interpreter, which is basically an input line editing/buffering device.) What's left that Forth needs:

  • A data stack
  • A return stack (or functional equivalent)
  • The inner interpreter
  • The dictionary, a contiguous chunk of memory used for both compiled words and static variables.

Let's dispose of the easy ones first:

The inner interpreter and any helper functions it needs must be in memory that has execute permission. This probably means that the interpreter and other directly executable code needs to be in a separate chunk of memory from the dictionary and stacks (though they may be adjacent.)

The dictionary is big (way too big to fit on the belt and/or the spiller) and needs both read and write permissions. Although words in the dictionary will be "executed," that execution is really just interpretation by the inner interpreter and any helper functions it calls.

The data stack is also very likely too big to fit on the belt, and will also need to be in read/write memory. Note that this could conceivably be on the Mill's frame-oriented stack of the interpreter function itself, although we'd have to be careful that the Mill stack's virtual zero and automatic stack cut-back work very differently than stacks on conventional CPUs.


Now the less obvious Forth-to-Mill mappings

Unless we want to do painful, low-level management of call and return addresses in the interpreter itself (and use branches instead of calls... yuck!), I think the best option for the return stack (for called PRIMITIVE functions) is to use the Mill's native call and return operations and the associated spiller stack. Note that using Mill call/return ops will hide the actual CPU/spiller state from us. However, so long as call and return behave; I think that's probably OK. Note that the interpreter will also need somewhere else to keep its own state (where it is in a deep nest of words that invoke words that invoke....) I think we'll need something else for where we keep the interpreter's state, since it needs to be accessible -- and thus cannot go onto the call/return stack maintained by the CPU/spiller.


Note that the Mill's call/return ops impose something -- an a priori known number of results, that Forth doesn't itself require. Relatively few Forth words produce variable numbers of results, because those make data-stack management even trickier than otherwise, but Forth permits one to do tricky/risky things. I think we can use either Nones (maybe) or argc/argv (more likely) for variable numbers of results. Or we forbid variable number of formal return values, at least in the first version. (Translation: if a word wants to do varargs, it has to pass argc and argv as explicit arguments that are included in the formal arg or return counts denoted in the dictionary.)

Side question: does the genAsm language itself generate argc/argv work-arounds for variable number of return values? It seems to me that genAsm must do something like argc/Argv for variable numbers of input arguments -- or for functions that have more input args than will fit on the target member's belt.


Using the Mill's call/return machinery means that to execute a PRIMITIVE word (one not defined in terms of existing words, but as a true Mill function call), the interpreter will have to:

  • record the interpreter state (somewhere, TBD.)
  • Find the dictionary entry for the function that implements the primitive.
  • Extract from that dictionary entry the word's execution address, number of input arguments and number of results.
  • Get the input args onto its own belt
  • Call the function (probably using an indirect call)
  • Once the function returns, copy the results from the interpreter's belt onto the data stack, including handling possible variable number of results.

If we require all words to have a fixed number of input and output parameters (e.g. by requiring the few words that need to use varargs to manage those themselves, and include argc and argv among their formal parameter counts), then the invocation of primitives will be fast and straightforward.

Before calling a primitive word, the interpreter will have to record all those aspects of the interpreter's state somewhere other than the inaccessible Mill call/return state, but I don't see much help for that. I'd like to call this the interpreter-state stack, since it's keeping track of things other than Mill function addresses. Probably yet another stack-ordered chunk of memory, sigh....

Yes, this approach means that the interpreter will spend a good fraction of its time moving data between the data stack and its own belt, but it also means that the interpreter won't have to handle the return stack for true Mill function calls; the Mill hardware will do it for us -- quickly and reliably! I think this is a better initial design than trying to use conform or rescue ops to force FIFO behavior onto the LIFO-natured belt. Since Forth programs are typically deeply factored, the lower-level words get called more -- and the lowest-level ones, namely the primitives (directly coded as helper functions) should run nice and fast.

Side note: There is -- or can be -- yet a third stack, if we have use for it. Although the Mill keeps return addresses out of its program-accessible stack, it does have a frame-oriented stack that we are free to use for data. Since each called function gets its own stack frame and allocation (if it uses the stack at all), functions (including any that directly implement Forth words) could use such stack frames for local (automatic) storage -- even if the inner interpreter is using its stack frame (elsewhere on the same stack in memory) to hold the entire dictionary. This would be a departure from the classic (and very stripped-down) Forth virtual machine, which has no real concept of automatic variables other than the data stack itself. However, the concept of word-local variables exists in some Forth versions, often using the LOCALS| word to introduce them. (Note that making this work transparently in both primitive and non-primitive words is something I haven't looked at yet. So TBD.)

Since the Mill is such a big departure from conventional architectures, departing from the classic Forth virtual machine model may be a fruitful area to consider for "Fifth" -- a notional follow-on interpreted language, designed specifically for the Mill.

/{LarryP opinion}