# Forth-on-SX: stack language on the VM
The smallest serious second language. Forth's stack-based semantics map directly onto the SX bytecode VM (`OP_DUP`, `OP_SWAP` and `OP_DROP` already exist as arithmetic primitives or can be added trivially). Compile mode vs. interpret mode is the one genuinely novel piece, but it is a classic technique and small.
End-state goal: pass John Hayes' ANS-Forth test suite (the canonical Forth conformance harness: small, well-documented, and targeting the Core word set).
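## Scope decisions (defaults, override as needed)

- Standard: ANS-Forth 1994 Core word set + Core Extension. No optional word sets (File Access, Floating Point, Search Order, etc.) in the first run.
- Test suite: John Hayes' "Test Suite for ANS Forth" (~250 tests, public domain, widely used).
- Case-sensitivity: case-insensitive lookup. ANS leaves case sensitivity implementation-defined (programs that rely on it have an environmental dependency), and folding case matches common practice.
- Number base: support the `BASE` variable, defaulting to 10. Hex and binary literals (`$FF`, `%1010`) follow the Forth-2012 number-prefix syntax. ANS-Forth 1994 itself defines no literal prefixes.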
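Here is a minimal Python sketch of the base-aware number parsing these rules imply. It assumes Forth-2012 style prefixes: `#` for decimal, `$` for hex, `%` for binary, and `'c'` for character literals, with an optional `-` after the prefix. `parse_number` is a hypothetical name for illustration, not the `lib/forth/reader.sx` API.

```python
from typing import Optional

# Assumed Forth-2012 style prefixes; a prefix overrides the current BASE.
PREFIX_BASES = {"#": 10, "$": 16, "%": 2}


def parse_number(token: str, base: int = 10) -> Optional[int]:
    """Return the value of a Forth number token, or None if it is not a number.

    Unprefixed tokens are read in `base` (the current BASE).
    """
    # Character literal: exactly 'c'.
    if len(token) == 3 and token[0] == "'" and token[2] == "'":
        return ord(token[1])
    if token and token[0] in PREFIX_BASES:
        base = PREFIX_BASES[token[0]]
        token = token[1:]
    negative = token.startswith("-")
    if negative:
        token = token[1:]
    if not token:
        return None
    value = 0
    for ch in token.lower():
        if "0" <= ch <= "9":
            digit = ord(ch) - ord("0")
        elif "a" <= ch <= "z":
            digit = ord(ch) - ord("a") + 10
        else:
            return None
        if digit >= base:
            return None  # e.g. "ff" is not a number in decimal
        value = value * base + digit
    return -value if negative else value
```

In Forth the interpreter tries the dictionary first and only falls back to number conversion. That is why `FF` is an undefined word in `DECIMAL` but reads as 255 after `HEX`.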
## Ground rules

- Scope: only touch `lib/forth/**` and `plans/forth-on-sx.md`. No edits to `spec/`, `hosts/`, `shared/`, or other language dirs.
- SX files: use `sx-tree` MCP tools only.
- Architecture: reader (not a tokenizer; Forth is whitespace-delimited) → interpreter → dictionary-backed compiler. The compiler emits SX AST (not bytecode directly) so we inherit the VM.
- Commits: one feature per commit. Keep `## Progress log` updated.
## Architecture sketch

```
Forth source text
        │
        ▼
lib/forth/reader.sx        whitespace-split words (that's it; no real tokenizer)
        │
        ▼
lib/forth/interpreter.sx   interpret mode: look up word in dict, execute
        │
        ▼
lib/forth/compiler.sx      compile mode (: opens, ; closes): emit SX AST
        │
        ▼
lib/forth/runtime.sx       stack ops, dictionary, BASE, I/O
        │
        ▼
existing CEK / VM          runs compiled definitions natively
```
Representation:

- Stack = SX list, push = cons, pop = uncons
- Dictionary = dict `word-name → {:kind :immediate? :body}`, where kind is `:primitive` or `:colon-def`
- A colon definition compiles to a thunk `(lambda () <body-as-sx-sequence>)`
- Compile mode is a flag on the interpreter state; `:` sets it, `;` clears it and installs the new word
- IMMEDIATE words run at compile time
## Roadmap

### Phase 1: reader + interpret mode

- `lib/forth/reader.sx`: whitespace-split, number parsing (base-aware)
- `lib/forth/runtime.sx`: stack as SX list, push/pop/peek helpers
- Core stack words: `DUP`, `DROP`, `SWAP`, `OVER`, `ROT`, `-ROT`, `NIP`, `TUCK`, `PICK`, `ROLL`, `?DUP`, `DEPTH`, `2DUP`, `2DROP`, `2SWAP`, `2OVER`
- Arithmetic: `+`, `-`, `*`, `/`, `MOD`, `/MOD`, `NEGATE`, `ABS`, `MIN`, `MAX`, `1+`, `1-`, `2+`, `2-`, `2*`, `2/`
- Comparison: `=`, `<>`, `<`, `>`, `<=`, `>=`, `0=`, `0<>`, `0<`, `0>`
- Logical: `AND`, `OR`, `XOR`, `INVERT` (32-bit two's-complement sim)
- I/O: `.` (print), `.S` (show stack), `EMIT`, `CR`, `SPACE`, `SPACES`, `BL`
- Interpreter loop: read word, look up, execute, repeat
- Unit tests in `lib/forth/tests/test-phase1.sx`: 108/108 pass
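The Python sketch below models a few Phase 1 primitives. It assumes the stack is a list with TOS as the first element, as in `lib/forth/runtime.sx`, and all helper names are hypothetical. It covers two things:

- How `ROT`, `ROLL` and `PICK` index into a TOS-first list.
- How `AND`/`OR`/`XOR`/`INVERT` and symmetric `/`/`MOD` can be simulated on unbounded integers by wrapping to signed 32-bit.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(n: int) -> int:
    """Wrap an unbounded int to a signed 32-bit two's-complement value."""
    n &= MASK32
    return n - (1 << 32) if n & 0x80000000 else n


def f_and(a: int, b: int) -> int:
    return to_signed32(a & b)


def f_or(a: int, b: int) -> int:
    return to_signed32(a | b)


def f_xor(a: int, b: int) -> int:
    return to_signed32(a ^ b)


def f_invert(a: int) -> int:
    return to_signed32(~a)


def f_div(a: int, b: int) -> int:
    """Symmetric division: truncate toward zero (Python's // floors)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def f_mod(a: int, b: int) -> int:
    """Remainder consistent with f_div: a == b * f_div(a, b) + f_mod(a, b)."""
    return a - b * f_div(a, b)


# Stack shuffles on a list with TOS at index 0, mutated in place.
def rot(stack: list) -> None:
    # ( x1 x2 x3 -- x2 x3 x1 ): list [x3, x2, x1] becomes [x1, x3, x2]
    stack[:3] = [stack[2], stack[0], stack[1]]


def pick(stack: list) -> None:
    # ( xu ... x0 u -- xu ... x0 xu ): 0 PICK is DUP
    u = stack.pop(0)
    stack.insert(0, stack[u])


def roll(stack: list) -> None:
    # ( xu xu-1 ... x0 u -- xu-1 ... x0 xu ): 2 ROLL is ROT
    u = stack.pop(0)
    stack.insert(0, stack.pop(u))
```

Keeping `f_mod` defined in terms of `f_div` guarantees the ANS identity between `/` and `MOD` for both signs.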
### Phase 2: colon definitions + compile mode

- `:` opens compile mode and starts a definition; `;` closes it and installs it into the dictionary
- Compile mode: non-IMMEDIATE words are compiled as late-binding call thunks; numbers are compiled as pushers; IMMEDIATE words run immediately
- `VARIABLE`, `CONSTANT`, `VALUE`, `TO`, `RECURSE`, `IMMEDIATE`
- `@` (fetch), `!` (store), `+!`
- Colon-def body is `(fn (s) (for-each op body))`: runs on CEK, inherits TCO
- Tests in `lib/forth/tests/test-phase2.sx`: 26/26 pass
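The compile-mode rules above can be sketched in Python:

- Late-binding call thunks for non-IMMEDIATE words.
- Pushers for numbers.
- IMMEDIATE words run at compile time.
- `:` consuming the next token.

Assumptions: `State`, `next_token`, `late_call` and `pusher` are hypothetical stand-ins for the SX state, `forth-next-token!` and the compiled op shapes. The installed body runs its ops in order, like `(fn (s) (for-each op body))`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Op = Callable[["State"], None]


@dataclass
class Word:
    body: Op
    immediate: bool = False


@dataclass
class State:
    stack: List[int] = field(default_factory=list)   # TOS is stack[0]
    words: Dict[str, Word] = field(default_factory=dict)
    tokens: List[str] = field(default_factory=list)  # remaining input
    compiling: bool = False
    name: str = ""                                   # word being defined
    ops: List[Op] = field(default_factory=list)      # its compiled ops


def next_token(s: State) -> str:
    return s.tokens.pop(0)     # parsing words consume input through this


def as_number(token: str) -> Optional[int]:
    try:
        return int(token)
    except ValueError:
        return None


def pusher(n: int) -> Op:
    return lambda s: s.stack.insert(0, n)


def late_call(name: str) -> Op:
    # Resolve the name when the op runs, not when it is compiled, so
    # forward references and redefinitions are picked up.
    return lambda s: s.words[name].body(s)


def colon(s: State) -> None:
    s.name, s.ops, s.compiling = next_token(s).lower(), [], True


def semicolon(s: State) -> None:
    ops = list(s.ops)

    def body(st: State) -> None:
        for op in ops:
            op(st)

    s.words[s.name] = Word(body)
    s.compiling = False


def interpret(s: State, source: str) -> None:
    s.tokens = source.split()
    while s.tokens:
        token = next_token(s)
        word = s.words.get(token.lower())
        if s.compiling and not (word and word.immediate):
            # Known word -> call thunk; unknown number -> pusher;
            # anything else -> call thunk resolved later (forward reference).
            n = None if word else as_number(token)
            s.ops.append(pusher(n) if n is not None else late_call(token.lower()))
        elif word:
            word.body(s)
        else:
            n = as_number(token)
            if n is None:
                raise KeyError(f"undefined word: {token}")
            s.stack.insert(0, n)


def make_state() -> State:
    def binop(f):
        def op(st: State) -> None:
            b, a = st.stack.pop(0), st.stack.pop(0)
            st.stack.insert(0, f(a, b))
        return op

    s = State()
    s.words.update({
        "dup": Word(lambda st: st.stack.insert(0, st.stack[0])),
        "+": Word(binop(lambda a, b: a + b)),
        "*": Word(binop(lambda a, b: a * b)),
        ":": Word(colon),
        ";": Word(semicolon, immediate=True),
    })
    return s
```

Given `: sq dup * ; : quad sq sq ; 3 quad`, the stack holds 81. Redefining `: sq dup + ;` and running `3 quad` again pushes 12. That is the late binding described in the Phase 2 log: `quad` picks up the new `sq`.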
### Phase 3: control flow + first Hayes tests green

- `IF`, `ELSE`, `THEN`: compile to SX `if` (as built: `bif`/`branch` ops on a PC-driven body runner)
- `BEGIN`, `UNTIL`, `WHILE`, `REPEAT`, `AGAIN`: compile to loops (as built: same PC-driven runner)
- `DO`, `LOOP`, `+LOOP`, `I`, `J`, `LEAVE`: counted loops (need a return stack)
- Return stack: `>R`, `R>`, `R@`, `2>R`, `2R>`, `2R@`
- Vendor John Hayes' test suite to `lib/forth/ans-tests/`
- `lib/forth/conformance.sh` + runner; `scoreboard.json` + `scoreboard.md`
- Baseline: probably 30-50% of Core passing after Phase 3
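The Progress log below describes how these control-flow words were built. The body runs on a PC-driven runner using `bif`/`branch` ops, and their jump targets are patched through a compile-time cstack.

What follows is a Python model of that scheme for a single definition body. Assumptions: `compile_body`, `run`, `PRIMS`, and the one-element lists used as patchable target cells are illustrative, not the `compiler.sx` code.

```python
from typing import Callable, Dict, List, Union

Prim = Callable[[list], None]        # acts on a TOS-first data stack
Op = Union[Prim, dict]


def pusher(n: int) -> Prim:
    return lambda s: s.insert(0, n)


def binop(f) -> Prim:
    def op(s: list) -> None:
        b, a = s.pop(0), s.pop(0)
        s.insert(0, f(a, b))
    return op


def swap(s: list) -> None:
    s[0], s[1] = s[1], s[0]


PRIMS: Dict[str, Prim] = {
    "dup": lambda s: s.insert(0, s[0]),
    "drop": lambda s: s.pop(0),
    "swap": swap,
    "over": lambda s: s.insert(0, s[1]),
    "+": binop(lambda a, b: a + b),
    "-": binop(lambda a, b: a - b),
    "<": binop(lambda a, b: -1 if a < b else 0),   # Forth TRUE is -1
    "0=": lambda s: s.insert(0, -1 if s.pop(0) == 0 else 0),
}


def compile_body(tokens: List[str]) -> List[Op]:
    ops: List[Op] = []
    cstack: list = []   # forward target cells (lists) and BEGIN back-targets (ints)
    for tok in tokens:
        t = tok.lower()
        if t in ("if", "while"):
            cell = [None]                       # forward target, patched later
            ops.append({"kind": "bif", "target": cell})
            cstack.append(cell)
        elif t == "else":
            cell = [None]
            ops.append({"kind": "branch", "target": cell})
            cstack.pop()[0] = len(ops)          # IF's false path starts here
            cstack.append(cell)
        elif t == "then":
            cstack.pop()[0] = len(ops)
        elif t == "begin":
            cstack.append(len(ops))             # numeric back-target
        elif t in ("until", "again"):
            kind = "bif" if t == "until" else "branch"
            ops.append({"kind": kind, "target": [cstack.pop()]})
        elif t == "repeat":
            while_cell, back = cstack.pop(), cstack.pop()
            ops.append({"kind": "branch", "target": [back]})
            while_cell[0] = len(ops)            # WHILE's false flag exits here
        elif t in PRIMS:
            ops.append(PRIMS[t])
        else:
            ops.append(pusher(int(tok)))
    return ops


def run(ops: List[Op], stack: list) -> list:
    pc = 0
    while pc < len(ops):
        op = ops[pc]
        if isinstance(op, dict):
            if op["kind"] == "branch":
                pc = op["target"][0]
            else:  # "bif": jump only when the popped flag is false (zero)
                pc = pc + 1 if stack.pop(0) != 0 else op["target"][0]
        else:
            op(stack)
            pc += 1
    return stack
```

Nesting falls out of the cstack discipline, because each closing word pops exactly what its opener pushed. For example, `0 swap begin dup while swap over + swap 1 - repeat drop` applied to a stack holding 3 leaves 6 (3 + 2 + 1).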
### Phase 4: strings + more Core

- `S"`, `C"`, `."`, `TYPE`, `COUNT`, `CMOVE`, `FILL`, `BLANK`
- `CHAR`, `[CHAR]`, `KEY`, `ACCEPT`
- `BASE` manipulation: `DECIMAL`, `HEX`
- `DEPTH`, `SP@`, `SP!`
- Drive Hayes Core pass-rate up
### Phase 5: Core Extension + optional word sets

- Full Core + Core Extension
- File Access word set (via SX IO)
- String word set (`SLITERAL`, `COMPARE`, `SEARCH`)
- Target: 100% Hayes Core
### Phase 6: speed

- Inline primitive calls during compile (skip dict lookup)
- Tail-call optimise colon-def endings
- JIT cooperation: mark compiled colon-defs as VM-eligible
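Here is a Python sketch of the first Phase 6 item. It assumes dictionary entries are `(kind, body)` pairs, and `compile_ref` is a hypothetical compiler helper. With inlining on, a primitive's body is captured when the definition is compiled, so the run-time dictionary lookup disappears.

The trade-off is binding time. An inlined reference no longer sees a later redefinition, unlike the late-binding thunks from Phase 2. Early binding is also what ANS Forth itself specifies: a compiled definition keeps using whichever definition of a word was current when it was compiled.

```python
from typing import Callable, Dict, Tuple

Body = Callable[[list], None]          # acts on a TOS-first data stack
Words = Dict[str, Tuple[str, Body]]    # name -> (kind, body)


def compile_ref(name: str, words: Words, inline: bool = True) -> Body:
    kind, body = words[name]
    if inline and kind == "primitive":
        return body                    # bound now: no lookup per call
    # Late-binding thunk: look the name up every time the op runs.
    return lambda stack: words[name][1](stack)


def make_words() -> Words:
    return {"dup": ("primitive", lambda s: s.insert(0, s[0]))}


words = make_words()
early = compile_ref("dup", words, inline=True)
late = compile_ref("dup", words, inline=False)
words["dup"] = ("primitive", lambda s: s.insert(0, 0))  # redefine DUP
a, b = [5], [5]
early(a)   # a == [5, 5]: still the original DUP
late(b)    # b == [0, 5]: sees the redefinition
```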
## Progress log

Newest first.

- Phase 3: `DO`/`LOOP`/`+LOOP`/`I`/`J`/`LEAVE` + return stack words (+16). Counted loops compile onto the same PC-driven body runner.
  - `DO` emits an enter op, which pops limit and start from the data stack and pushes them to the rstack. It also pushes a `{:kind "do" :back PC :leaves ()}` marker onto the cstack.
  - `LOOP`/`+LOOP` emit a dict op (`:kind "loop"`/`"+loop"` with target = back-cell). The step handler pops the index, reads the limit and increments. It then either restores the updated index and jumps back, or drops the frame and advances.
  - `LEAVE` walks the cstack for the innermost `DO` marker. It emits a `:kind "leave"` dict op with a fresh target cell and registers that cell on the marker's leaves list. `LOOP` patches all registered leave-targets to the exit PC and drops the marker. The leave op pops two from the rstack (unloop) and branches.
  - `I` peeks rtop; `J` reads rstack index 2 (below the inner frame).
  - Added non-immediate return-stack words `>R`, `R>`, `R@`, `2>R`, `2R>`, `2R@`.
  - Nested `DO`/`LOOP` with `J` tested; `LEAVE` in nested loops exits only the inner loop. 177/177 green.
- Phase 3: `BEGIN`/`UNTIL`/`WHILE`/`REPEAT`/`AGAIN` (+9). Indefinite-loop constructs built on the same PC-driven body runner introduced for `IF`.
  - `BEGIN` records the current body length on `state.cstack` (a plain numeric back-target). `UNTIL`/`AGAIN` pop that back-target and emit a `bif`/`branch` op whose target cell is set to the recorded PC.
  - `WHILE` emits a forward `bif` with a fresh target cell and pushes it on the cstack above the `BEGIN` marker. `REPEAT` pops both (while-target first, then back-pc) and emits an unconditional branch back to `BEGIN`. It then patches the while-target to the current body length, so `WHILE`'s false flag jumps past the `REPEAT`.
  - The mixed compile-time layout (numeric back-targets + dict forward targets on the same cstack) is OK because the immediate words pop them in the order they expect.
  - `AGAIN` works structurally but lacks a test without a usable mid-loop exit; revisit once `EXIT` lands. 161/161 green.
- Phase 3 start: `IF`/`ELSE`/`THEN` (+18). `lib/forth/compiler.sx` plus `tests/test-phase3.sx`.
  - The colon-def body switched from `for-each` to a PC-driven runner so branch ops can jump. Ops now include dict tags `{"kind" "bif"|"branch" "target" cell}` alongside the existing `(fn (s) ...)` shape.
  - `IF` compiles a `bif` with a fresh target cell pushed to `state.cstack`.
  - `ELSE` emits an unconditional `branch`, patches the `IF`'s target to the instruction after this branch, and pushes the new target.
  - `THEN` patches the most recent target to the current body length. Nested `IF`/`ELSE`/`THEN` works via the cstack.
  - Also fixed `EMIT`: `code-char` → `char-from-code` (the spec-correct primitive name), so the Phase 1/2 tests run green on sx_server. 152/152 (Phase 1 + 2 + 3) green.
- Phase 2 complete: colon defs, compile mode, `VARIABLE`/`CONSTANT`/`VALUE`/`TO`, `@`/`!`/`+!` (+26). `lib/forth/compiler.sx` plus `tests/test-phase2.sx`.
  - A colon-def body is a list of ops (one per source token) wrapped in a single lambda.
  - References are late-binding thunks, so forward/recursive references work via `RECURSE`. Redefinitions take effect immediately.
  - `VARIABLE` creates a pusher for a symbolic address stored in `state.vars`. `CONSTANT` compiles to `(fn (s) (forth-push s v))`. `VALUE`/`TO` share the vars dict.
  - The compiler rewrites `forth-interpret` to drive from a token list stored on state. Parsing words (`:`, `VARIABLE`, `TO`, etc.) can then consume the next token with `forth-next-token!`. 134/134 (Phase 1 + 2) green.
- Phase 1 complete: reader + interpret mode + core words (+108). `lib/forth/{runtime,reader,interpreter}.sx` plus `tests/test-phase1.sx`.
  - Stack as SX list (TOS = first). Dict is `{lowercased-name -> {:kind :body :immediate?}}`. Data and return stacks are both mutable. Output is buffered in state (no host IO yet).
  - BASE-aware number parsing with `$`, `%`, `#` prefixes and `'c'` char literals.
  - Bitwise `AND`/`OR`/`XOR`/`INVERT` simulated over 32-bit two's-complement. Integer `/` truncates toward zero (ANS symmetric division), and `MOD` matches.
  - Case-insensitive lookup. 108/108 tests green.
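The counted-loop scheme in the Phase 3 `DO`/`LOOP` log entry above can be modelled in Python. Assumptions:

- The return stack is a TOS-first list holding the index above the limit.
- `LEAVE` targets are one-element lists that `LOOP` patches.
- A minimal `IF`/`THEN` is included so `LEAVE` can be conditional.
- `compile_loops`, `run` and `PRIMS` are hypothetical names.

`+LOOP` exits when the index crosses the boundary between limit-1 and limit, which is the ANS rule. Applying the same test to `LOOP` is a simplification.

```python
from typing import Callable, Dict, List

Prim = Callable[[list, list], None]   # (data stack, return stack), both TOS-first


def pusher(n: int) -> Prim:
    return lambda s, r: s.insert(0, n)


def binop(f) -> Prim:
    def op(s: list, r: list) -> None:
        b, a = s.pop(0), s.pop(0)
        s.insert(0, f(a, b))
    return op


PRIMS: Dict[str, Prim] = {
    "i": lambda s, r: s.insert(0, r[0]),      # innermost index
    "j": lambda s, r: s.insert(0, r[2]),      # outer index, below inner frame
    "+": binop(lambda a, b: a + b),
    "*": binop(lambda a, b: a * b),
    "=": binop(lambda a, b: -1 if a == b else 0),
}


def compile_loops(tokens: List[str]) -> list:
    ops: list = []
    cstack: list = []   # DO markers (dicts) and IF target cells (lists)
    for tok in tokens:
        t = tok.lower()
        if t == "do":
            ops.append({"kind": "enter"})
            cstack.append({"kind": "do", "back": len(ops), "leaves": []})
        elif t in ("loop", "+loop"):
            marker = cstack.pop()
            ops.append({"kind": t, "target": [marker["back"]]})
            for cell in marker["leaves"]:
                cell[0] = len(ops)                 # exit PC: just past LOOP
        elif t == "leave":
            # Innermost DO marker, skipping any IF cells above it.
            marker = next(m for m in reversed(cstack) if isinstance(m, dict))
            cell = [None]
            ops.append({"kind": "leave", "target": cell})
            marker["leaves"].append(cell)
        elif t == "if":
            cell = [None]
            ops.append({"kind": "bif", "target": cell})
            cstack.append(cell)
        elif t == "then":
            cstack.pop()[0] = len(ops)
        elif t in PRIMS:
            ops.append(PRIMS[t])
        else:
            ops.append(pusher(int(tok)))
    return ops


def run(ops: list, stack: list) -> list:
    rstack: list = []
    pc = 0
    while pc < len(ops):
        op = ops[pc]
        if not isinstance(op, dict):
            op(stack, rstack)
            pc += 1
            continue
        kind = op["kind"]
        if kind == "enter":                        # ( limit start -- )
            start, limit = stack.pop(0), stack.pop(0)
            rstack[:0] = [start, limit]            # index on top of limit
            pc += 1
        elif kind == "bif":
            pc = pc + 1 if stack.pop(0) != 0 else op["target"][0]
        elif kind in ("loop", "+loop"):
            step = 1 if kind == "loop" else stack.pop(0)
            index, limit = rstack.pop(0), rstack[0]
            new = index + step
            if ((index - limit) ^ (new - limit)) < 0:  # crossed limit-1 | limit
                rstack.pop(0)                      # drop the limit: frame gone
                pc += 1
            else:
                rstack.insert(0, new)
                pc = op["target"][0]
        elif kind == "leave":
            del rstack[:2]                         # UNLOOP, then branch
            pc = op["target"][0]
    return stack
```

For example, `3 0 do 5 0 do i 2 = if leave then i loop loop` pushes 0 and 1 on each outer pass. `LEAVE` exits only the inner loop, as the log entry describes.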
## Blockers

- (none yet)