Forth-on-SX: stack language on the VM
The smallest serious second language — Forth's stack-based semantics map directly onto the SX bytecode VM (OP_DUP, OP_SWAP, OP_DROP already exist as arithmetic primitives or can be added trivially). Compile-mode / interpret-mode is the one genuinely novel piece, but it's a classic technique and small.
End-state goal: passes John Hayes' ANS-Forth test suite (the canonical Forth conformance harness — small, well-documented, targets the Core word set).
Scope decisions (defaults — override)
- Standard: ANS-Forth 1994 Core word set + Core Extension. No ANS-Forth Optional word sets (File Access, Floating Point, Search Order, etc.) in the first run.
- Test suite: John Hayes' "Test Suite for ANS Forth" (~250 tests, freely redistributable as long as its copyright notice is kept, widely used).
- Case-sensitivity: case-insensitive (ANS default).
- Number base: support the `BASE` variable, defaulting to 10. Hex and binary literals (`$FF`, `%1010`) per standard.
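
The number-base rule above can be illustrated with a small model. This is only a sketch in Python, not the SX code in `lib/forth/reader.sx`; the function name `parse_number` and the choice to return `None` for non-numbers are assumptions made for illustration. It treats `$` as hex, `%` as binary and `#` as decimal, and otherwise uses the current `BASE`.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"
PREFIXES = {"$": 16, "%": 2, "#": 10}  # prefix character -> forced base


def parse_number(token, base=10):
    """Return the integer a token denotes, or None if it is not a number.

    `base` stands in for the current value of BASE (hypothetical parameter).
    """
    text = token
    if text and text[0] in PREFIXES:
        base = PREFIXES[text[0]]  # a prefix overrides BASE for this token only
        text = text[1:]
    negative = text.startswith("-")
    if negative:
        text = text[1:]
    if not text:
        return None
    value = 0
    for ch in text.lower():  # digits are case-insensitive
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:
            return None  # not a valid digit in this base, so not a number
        value = value * base + digit
    return -value if negative else value
```

A token that fails to parse (for example `DUP` in base 10) returns `None`, which a reader could take as the signal to report an unknown word.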
Ground rules
- Scope: only touch `lib/forth/**` and `plans/forth-on-sx.md`. No edits to `spec/`, `hosts/`, `shared/`, or other language dirs.
- SX files: use `sx-tree` MCP tools only.
- Architecture: reader (not tokenizer, since Forth is whitespace-delimited) → interpreter → dictionary-backed compiler. The compiler emits SX AST (not bytecode directly) so we inherit the VM.
- Commits: one feature per commit. Keep `## Progress log` updated.
Architecture sketch
Forth source text
│
▼
lib/forth/reader.sx — whitespace-split words (that's it — no real tokenizer)
│
▼
lib/forth/interpreter.sx — interpret mode: look up word in dict, execute
│
▼
lib/forth/compiler.sx — compile mode (`:` opens, `;` closes): emit SX AST
│
▼
lib/forth/runtime.sx — stack ops, dictionary, BASE, I/O
│
▼
existing CEK / VM — runs compiled definitions natively
Representation:
- Stack = SX list, push = cons, pop = uncons
- Dictionary = dict `word-name → {:kind :immediate? :body}` where kind is `:primitive` or `:colon-def`
- A colon definition compiles to a thunk `(lambda () <body-as-sx-sequence>)`
- Compile-mode is a flag on the interpreter state; `:` sets it, `;` clears and installs the new word
- IMMEDIATE words run at compile time
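
The representation above can be modelled in a few lines. The sketch below is Python, not SX, and every name in it (`State`, `define`, `interpret`, `finish`) is hypothetical. It shows a cons-list stack, dictionary entries carrying `kind`, `immediate` and `body`, and a compile-mode flag, with `;` registered as an IMMEDIATE word so that it runs even while compiling.

```python
def push(stack, value):
    """cons: return a new stack with value on top."""
    return (value, stack)


def pop(stack):
    """uncons: return (top, rest); the empty stack is None."""
    if stack is None:
        raise IndexError("stack underflow")
    return stack


class State:
    def __init__(self):
        self.stack = None       # data stack as a cons list
        self.compiling = False  # the compile-mode flag
        self.name = None        # name of the definition being compiled
        self.body = []          # ops collected for that definition
        self.words = {}         # word-name -> {"kind", "immediate", "body"}


def define(state, name, fn, kind="primitive", immediate=False):
    state.words[name.lower()] = {"kind": kind, "immediate": immediate, "body": fn}


def dup(state):
    top, _ = pop(state.stack)
    state.stack = push(state.stack, top)


def star(state):
    b, rest = pop(state.stack)
    a, rest = pop(rest)
    state.stack = push(rest, a * b)


def literal(n):
    def op(state):
        state.stack = push(state.stack, n)
    return op


def finish(state):
    """Behaviour of `;`: install the collected ops and leave compile mode."""
    ops = state.body
    def run(st):
        for op in ops:
            op(st)
    define(state, state.name, run, kind="colon-def")
    state.compiling = False


def interpret(state, source):
    tokens = iter(source.split())
    for token in tokens:
        if token == ":":
            # `:` consumes the next token as the new word's name
            state.name, state.body, state.compiling = next(tokens), [], True
            continue
        entry = state.words.get(token.lower())
        # unknown tokens are assumed to be decimal numbers in this sketch
        op = entry["body"] if entry else literal(int(token))
        if state.compiling and not (entry and entry["immediate"]):
            state.body.append(op)  # compile mode: record, do not run
        else:
            op(state)              # interpret mode, or an IMMEDIATE word
```

With `dup`, `star` and `finish` registered under `dup`, `*` and `;` (the last with `immediate=True`), interpreting `: SQUARE DUP * ; 7 square` leaves `(49, None)` as the stack.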
Roadmap
Phase 1 — reader + interpret mode
- `lib/forth/reader.sx`: whitespace-split, number parsing (base-aware)
- `lib/forth/runtime.sx`: stack as SX list, push/pop/peek helpers
- Core stack words: `DUP`, `DROP`, `SWAP`, `OVER`, `ROT`, `-ROT`, `NIP`, `TUCK`, `PICK`, `ROLL`, `?DUP`, `DEPTH`, `2DUP`, `2DROP`, `2SWAP`, `2OVER`
- Arithmetic: `+`, `-`, `*`, `/`, `MOD`, `/MOD`, `NEGATE`, `ABS`, `MIN`, `MAX`, `1+`, `1-`, `2+`, `2-`, `2*`, `2/`
- Comparison: `=`, `<>`, `<`, `>`, `<=`, `>=`, `0=`, `0<>`, `0<`, `0>`
- Logical: `AND`, `OR`, `XOR`, `INVERT` (32-bit two's-complement simulation)
- I/O: `.` (print), `.S` (show stack), `EMIT`, `CR`, `SPACE`, `SPACES`, `BL`
- Interpreter loop: read word, look up, execute, repeat
- Unit tests in `lib/forth/tests/test-phase1.sx`: 108/108 pass
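
One way to read "32-bit two's-complement simulation" for the logical words is to mask every result to 32 bits and then reinterpret the top bit as the sign. The following is a Python sketch of that idea, not the SX implementation; the helper names are assumptions.

```python
MASK = 0xFFFFFFFF  # low 32 bits


def to_signed32(n):
    """Wrap an arbitrary integer into the signed 32-bit range."""
    n &= MASK
    return n - 0x100000000 if n & 0x80000000 else n


def forth_and(a, b):
    return to_signed32(a & b)


def forth_or(a, b):
    return to_signed32(a | b)


def forth_xor(a, b):
    return to_signed32(a ^ b)


def forth_invert(a):
    return to_signed32(~a)
```

Under this model `forth_invert(0)` is `-1`, which matches Forth's convention that a true flag has all bits set.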
Phase 2 — colon definitions + compile mode
- `:` opens compile mode and starts a definition
- `;` closes it and installs into the dictionary
- Compile mode: non-IMMEDIATE words are compiled as late-binding call thunks; numbers are compiled as pushers; IMMEDIATE words run immediately
- `VARIABLE`, `CONSTANT`, `VALUE`, `TO`, `RECURSE`, `IMMEDIATE`
- `@` (fetch), `!` (store), `+!`
- Colon-def body is `(fn (s) (for-each op body))`, which runs on CEK and inherits TCO
- Tests in `lib/forth/tests/test-phase2.sx`: 26/26 pass
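
To make "late-binding call thunks" and "pushers" concrete, here is a minimal Python model (not SX). A compiled reference stores only the word's name and looks it up in the dictionary each time it runs. The names `call_thunk`, `pusher` and `colon_def` are hypothetical, and the stack is a plain Python list rather than a cons list to keep the sketch short.

```python
def make_state():
    return {"stack": [], "dict": {}}


def call_thunk(name):
    """Compile a reference that is resolved when it runs, not when compiled."""
    def op(state):
        state["dict"][name](state)  # lookup happens at call time
    return op


def pusher(n):
    """Compile a number as an op that pushes it."""
    def op(state):
        state["stack"].append(n)
    return op


def colon_def(ops):
    """A colon definition: one function that runs its ops in order."""
    def body(state):
        for op in ops:
            op(state)
    return body


state = make_state()
# `show` refers to `answer` before `answer` exists (a forward reference).
state["dict"]["show"] = colon_def([call_thunk("answer")])
state["dict"]["answer"] = colon_def([pusher(1)])
state["dict"]["show"](state)  # pushes 1
# Redefining `answer` changes what `show` does from now on.
state["dict"]["answer"] = colon_def([pusher(2)])
state["dict"]["show"](state)  # pushes 2
```

The trade-off is a dictionary lookup on every call, which is the cost the Phase 6 item about inlining primitive calls would remove.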
Phase 3 — control flow + first Hayes tests green
- `IF`, `ELSE`, `THEN`: compile to SX `if`
- `BEGIN`, `UNTIL`, `WHILE`, `REPEAT`, `AGAIN`: compile to loops
- `DO`, `LOOP`, `+LOOP`, `I`, `J`, `LEAVE`: counted loops (needs a return stack)
- Return stack: `>R`, `R>`, `R@`, `2>R`, `2R>`, `2R@`
- Vendor John Hayes' test suite to `lib/forth/ans-tests/`
- `lib/forth/conformance.sh` + runner; `scoreboard.json` + `scoreboard.md`
- Baseline: probably 30-50% Core passing after phase 3
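
Compiling `IF` ... `ELSE` ... `THEN` into a host-language conditional might look like the Python sketch below, where a Python `if` expression stands in for SX `if`. It is an illustration under assumptions rather than the planned implementation: `compile_seq`, `make_if`, `make_literal` and the `words` table are all hypothetical, and loops and the return stack are left out.

```python
def run(ops, stack):
    for op in ops:
        op(stack)


def make_literal(n):
    def op(stack):
        stack.append(n)
    return op


def make_if(then_ops, else_ops):
    def op(stack):
        flag = stack.pop()  # any non-zero value counts as true
        run(then_ops if flag != 0 else else_ops, stack)
    return op


def compile_seq(tokens, words, stop=()):
    """Compile tokens up to one of `stop`; return (ops, stopping word or None).

    `tokens` must be an iterator so nested calls consume the same stream.
    """
    ops = []
    for token in tokens:
        word = token.upper()
        if word in stop:
            return ops, word
        if word == "IF":
            then_ops, end = compile_seq(tokens, words, ("ELSE", "THEN"))
            else_ops = []
            if end == "ELSE":
                else_ops, end = compile_seq(tokens, words, ("THEN",))
            if end != "THEN":
                raise SyntaxError("IF without THEN")
            ops.append(make_if(then_ops, else_ops))
        elif word in words:
            ops.append(words[word])
        else:
            ops.append(make_literal(int(token)))  # assumed decimal literal
    return ops, None


words = {
    "DUP": lambda st: st.append(st[-1]),
    "NEGATE": lambda st: st.append(-st.pop()),
    "0<": lambda st: st.append(-1 if st.pop() < 0 else 0),
}

# An ABS-like body: DUP 0< IF NEGATE THEN
abs_ops, _ = compile_seq(iter("DUP 0< IF NEGATE THEN".split()), words)
stack = [-5]
run(abs_ops, stack)  # stack is now [5]
```

Because each branch is compiled by a recursive call, nested conditionals come out as nested ops without any explicit branch-address patching.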
Phase 4 — strings + more Core
- `S"`, `C"`, `."`, `TYPE`, `COUNT`, `CMOVE`, `FILL`, `BLANK`
- `CHAR`, `[CHAR]`, `KEY`, `ACCEPT`
- `BASE` manipulation: `DECIMAL`, `HEX`
- `DEPTH`, `SP@`, `SP!`
- Drive Hayes Core pass-rate up
Phase 5 — Core Extension + optional word sets
- Full Core + Core Extension
- File Access word set (via SX IO)
- String word set (`SLITERAL`, `COMPARE`, `SEARCH`)
- Target: 100% Hayes Core
Phase 6 — speed
- Inline primitive calls during compile (skip dict lookup)
- Tail-call optimise colon-def endings
- JIT cooperation: mark compiled colon-defs as VM-eligible
Progress log
Newest first.
- Phase 2 complete: colon defs, compile mode, VARIABLE/CONSTANT/VALUE/TO, @/!/+! (+26). `lib/forth/compiler.sx` plus `tests/test-phase2.sx`. Colon-def body is a list of ops (one per source token) wrapped in a single lambda. References are late-binding thunks so forward/recursive references work via `RECURSE`. Redefinitions take effect immediately. VARIABLE creates a pusher for a symbolic address stored in `state.vars`. CONSTANT compiles to `(fn (s) (forth-push s v))`. VALUE/TO share the vars dict. Compiler rewrites `forth-interpret` to drive from a token list stored on state so parsing words (`:`, `VARIABLE`, `TO` etc.) can consume the next token with `forth-next-token!`. 134/134 (Phase 1 + 2) green.
- Phase 1 complete: reader + interpret mode + core words (+108). `lib/forth/{runtime,reader,interpreter}.sx` plus `tests/test-phase1.sx`. Stack as SX list (TOS = first). Dict is `{lowercased-name -> {:kind :body :immediate?}}`. Data + return stacks both mutable. Output buffered in state (no host IO yet). BASE-aware number parsing with `$`, `%`, `#` prefixes and `'c'` char literals. Bitwise AND/OR/XOR/INVERT simulated over 32-bit two's-complement. Integer `/` truncates toward zero (ANS symmetric division), and MOD matches. Case-insensitive lookup. 108/108 tests green.
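
The Phase 1 note about `/` and MOD is easy to get wrong in a host language whose integer division floors. As a Python illustration (Python's `//` floors, so `-7 // 2` is `-4`), truncation toward zero with a matching remainder can be sketched as follows; the function names are assumptions.

```python
def forth_div(a, b):
    """Quotient truncated toward zero (symmetric division)."""
    q = abs(a) // abs(b)  # raises ZeroDivisionError when b == 0
    return q if (a < 0) == (b < 0) else -q


def forth_mod(a, b):
    """Remainder consistent with forth_div: a == b * quotient + remainder."""
    return a - b * forth_div(a, b)
```

With these definitions `-7 / 2` gives `-3` with remainder `-1`, so the remainder takes the sign of the dividend.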
Blockers
- (none yet)