rose-ash/plans/forth-on-sx.md
giles 99753580b4 Recover agent-loop progress: lua/prolog/forth/erlang/haskell phases 1-2
Salvaged from worktree-agent-* branches killed during sx-tree MCP outage:
- lua: tokenizer + parser + phase-2 transpile (~157 tests)
- prolog: tokenizer + parser + unification (72 tests, plan update lost to WIP)
- forth: phase-1 reader/interpreter + phase-2 colon/VARIABLE (134 tests)
- erlang: tokenizer + parser (114 tests)
- haskell: tokenizer + parse tests (43 tests)

Cherry-picked file contents only, not branch history, to avoid pulling in
unrelated ocaml-vm merge commits that were in those branches' bases.
2026-04-24 16:03:00 +00:00


Forth-on-SX: stack language on the VM

The smallest serious second language: Forth's stack-based semantics map directly onto the SX bytecode VM (OP_DUP, OP_SWAP, OP_DROP either already exist as arithmetic primitives or can be added trivially). The split between compile mode and interpret mode is the one genuinely novel piece, but it is a classic technique and small to implement.

End-state goal: passes John Hayes' ANS-Forth test suite (the canonical Forth conformance harness — small, well-documented, targets the Core word set).

Scope decisions (defaults; override as needed)

  • Standard: ANS-Forth 1994 Core word set + Core Extension. No ANS-Forth Optional word sets (File Access, Floating Point, Search Order, etc.) in the first run.
  • Test suite: John Hayes' "Test Suite for ANS Forth" (~250 tests, public domain, widely used).
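  • Case-sensitivity: case-insensitive lookup. ANS-Forth permits either behaviour and specifies the standard words in upper case, so this choice accepts both DUP and dup.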
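  • Number base: support the BASE variable, defaulting to 10. Hex and binary literals ($FF, %1010) follow the number-prefix convention standardised in Forth 2012; ANS-Forth 1994 itself does not define them.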
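
The number-base rule above can be illustrated with a small sketch. It is written in Python rather than SX, and the function name parse_number is hypothetical, not taken from lib/forth/reader.sx. The # prefix and 'c' character literals are included because the Phase 1 progress-log entry mentions them; putting the sign after the prefix is an assumption borrowed from Forth 2012.

```python
# Hypothetical model of base-aware number parsing; not the SX implementation.
PREFIX_BASE = {"$": 16, "%": 2, "#": 10}
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def parse_number(token, base=10):
    """Return the integer value of token, or None if it is not a number."""
    # 'c' character literal, e.g. 'A' -> 65
    if len(token) == 3 and token[0] == "'" and token[2] == "'":
        return ord(token[1])
    text = token
    # A prefix overrides the current BASE for this literal only.
    if text and text[0] in PREFIX_BASE:
        base = PREFIX_BASE[text[0]]
        text = text[1:]
    negative = text.startswith("-")
    if negative:
        text = text[1:]
    if not text:
        return None
    value = 0
    for ch in text.lower():
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:
            return None          # not a digit in this base: treat as a word
        value = value * base + digit
    return -value if negative else value
```

Returning None instead of raising lets the interpreter fall through to an "unknown word" error only after both the dictionary lookup and the number parse have failed.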

Ground rules

  • Scope: only touch lib/forth/** and plans/forth-on-sx.md. No edits to spec/, hosts/, shared/, or other language dirs.
  • SX files: use sx-tree MCP tools only.
  • Architecture: reader (not tokenizer — Forth is whitespace-delimited) → interpreter → dictionary-backed compiler. The compiler emits SX AST (not bytecode directly) so we inherit the VM.
  • Commits: one feature per commit. Keep ## Progress log updated.

Architecture sketch

Forth source text
    │
    ▼
lib/forth/reader.sx      — whitespace-split words (that's it — no real tokenizer)
    │
    ▼
lib/forth/interpreter.sx — interpret mode: look up word in dict, execute
    │
    ▼
lib/forth/compiler.sx    — compile mode (`:` opens, `;` closes): emit SX AST
    │
    ▼
lib/forth/runtime.sx     — stack ops, dictionary, BASE, I/O
    │
    ▼
existing CEK / VM        — runs compiled definitions natively

Representation:

  • Stack = SX list, push = cons, pop = uncons
  • Dictionary = dict word-name → {:kind :immediate? :body} where kind is :primitive or :colon-def
  • A colon definition compiles to a thunk (lambda () <body-as-sx-sequence>)
  • Compile-mode is a flag on the interpreter state; : sets it, ; clears and installs the new word
  • IMMEDIATE words run at compile time
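
A minimal sketch of this representation, modelled in Python rather than SX. Every name here (State, interpret, end_definition and so on) is hypothetical. For brevity the sketch binds a word's current body directly at compile time, whereas Phase 2 describes late-binding thunks, and it handles `:` inline because it has to consume the next token.

```python
# Hypothetical Python model of the representation; not the SX code.
class State:
    def __init__(self):
        self.stack = None       # cons list: None (empty) or (top, rest)
        self.compiling = False  # the compile-mode flag
        self.name = None        # name of the word being defined
        self.body = []          # ops collected for the current definition
        self.words = {}         # lowercased name -> {"kind", "immediate", "body"}

def push(s, x):
    s.stack = (x, s.stack)              # push = cons

def pop(s):
    if s.stack is None:
        raise IndexError("stack underflow")
    top, s.stack = s.stack              # pop = uncons
    return top

def add_primitive(s, name, fn, immediate=False):
    s.words[name] = {"kind": "primitive", "immediate": immediate, "body": fn}

def end_definition(s):
    """Behaviour of `;`: wrap the collected ops and install the new word."""
    ops = s.body
    def run(st):
        for op in ops:
            op(st)
    s.words[s.name] = {"kind": "colon-def", "immediate": False, "body": run}
    s.compiling = False

def dup(s):
    x = pop(s)
    push(s, x)
    push(s, x)

def mul(s):
    b = pop(s)
    a = pop(s)
    push(s, a * b)

def interpret(s, source):
    tokens = iter(source.split())       # the reader: whitespace-split words
    for tok in tokens:
        key = tok.lower()
        entry = s.words.get(key)
        if key == ":":
            s.name, s.body, s.compiling = next(tokens).lower(), [], True
        elif entry is None:
            n = int(tok)                # raises ValueError on an unknown word
            if s.compiling:
                s.body.append(lambda st, n=n: push(st, n))
            else:
                push(s, n)
        elif s.compiling and not entry["immediate"]:
            s.body.append(entry["body"])
        else:
            entry["body"](s)            # interpret mode, or an IMMEDIATE word

def make_state():
    s = State()
    add_primitive(s, "dup", dup)
    add_primitive(s, "*", mul)
    add_primitive(s, ";", end_definition, immediate=True)
    return s
```

With this model, `interpret(make_state(), ": square dup * ; 7 SQUARE")` leaves a single cell holding 49 on the stack. `;` works only because it is flagged immediate: it runs while the compile flag is still set and is the thing that clears it.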

Roadmap

Phase 1 — reader + interpret mode

  • lib/forth/reader.sx: whitespace-split, number parsing (base-aware)
  • lib/forth/runtime.sx: stack as SX list, push/pop/peek helpers
  • Core stack words: DUP, DROP, SWAP, OVER, ROT, -ROT, NIP, TUCK, PICK, ROLL, ?DUP, DEPTH, 2DUP, 2DROP, 2SWAP, 2OVER
  • Arithmetic: +, -, *, /, MOD, /MOD, NEGATE, ABS, MIN, MAX, 1+, 1-, 2+, 2-, 2*, 2/
  • Comparison: =, <>, <, >, <=, >=, 0=, 0<>, 0<, 0>
  • Logical: AND, OR, XOR, INVERT (32-bit two's-complement sim)
  • I/O: . (print), .S (show stack), EMIT, CR, SPACE, SPACES, BL
  • Interpreter loop: read word, look up, execute, repeat
  • Unit tests in lib/forth/tests/test-phase1.sx — 108/108 pass
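
Two of the Phase 1 semantics are easy to get subtly wrong on a host with unbounded integers: the 32-bit two's-complement simulation and division that truncates toward zero (as recorded in the progress log). The following Python sketch illustrates both; the function names are hypothetical and are not the ones used in lib/forth/runtime.sx.

```python
# Hypothetical helpers modelling the Phase 1 arithmetic rules.
MASK = 0xFFFFFFFF

def to_signed32(n):
    """Wrap an arbitrary integer into the signed 32-bit range."""
    n &= MASK
    return n - 0x100000000 if n & 0x80000000 else n

def forth_invert(a):
    return to_signed32(~a)

def forth_and(a, b):
    return to_signed32(a & b)

def forth_xor(a, b):
    return to_signed32(a ^ b)

def forth_div(a, b):
    """Symmetric division: the quotient is truncated toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def forth_mod(a, b):
    """Remainder matching forth_div, so a == b * q + r always holds."""
    return a - b * forth_div(a, b)

def flag(condition):
    """Forth flags: true is all bits set (-1), false is 0."""
    return -1 if condition else 0
```

For example, `forth_div(-7, 2)` gives -3 and `forth_mod(-7, 2)` gives -1, whereas Python's own `//` and `%` would give -4 and 1 because they floor. Using -1 for true is what lets AND, OR and INVERT double as logical operators on flags.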

Phase 2 — colon definitions + compile mode

  • : opens compile mode and starts a definition
  • ; closes it and installs into the dictionary
  • Compile mode: non-IMMEDIATE words are compiled as late-binding call thunks; numbers are compiled as pushers; IMMEDIATE words run immediately
  • VARIABLE, CONSTANT, VALUE, TO, RECURSE, IMMEDIATE
  • @ (fetch), ! (store), +!
  • Colon-def body is (fn (s) (for-each op body)) — runs on CEK, inherits TCO
  • Tests in lib/forth/tests/test-phase2.sx — 26/26 pass
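
The late-binding call thunks and symbolic variable addresses can be sketched as follows, again in Python with hypothetical names. The stack is a plain Python list with the top at the end, which differs from the SX list representation, and initialising a VARIABLE to 0 is an assumption of this sketch (ANS-Forth leaves the initial value undefined).

```python
# Hypothetical model of Phase 2 compilation; not the SX code.
dictionary = {}   # word name -> callable taking the data stack
variables = {}    # symbolic address -> current value

def pusher(value):
    """Compiled form of a number (or of a VARIABLE's address)."""
    def op(stack):
        stack.append(value)
    return op

def call_thunk(name):
    """Compiled form of a word reference: looked up when run, not when compiled."""
    def op(stack):
        dictionary[name](stack)
    return op

def define(name, ops):
    """Install a colon definition whose body is a list of ops."""
    def body(stack):
        for op in ops:
            op(stack)
    dictionary[name] = body

def variable(name):
    address = ("var", name)          # symbolic address, not a memory offset
    variables[address] = 0           # assumption: start at 0
    dictionary[name] = pusher(address)

def fetch(stack):                    # @  ( addr -- n )
    stack.append(variables[stack.pop()])

def store(stack):                    # !  ( n addr -- )
    address = stack.pop()
    variables[address] = stack.pop()

def plus_store(stack):               # +! ( n addr -- )
    address = stack.pop()
    variables[address] += stack.pop()

dictionary.update({"@": fetch, "!": store, "+!": plus_store})

def run(name):
    stack = []
    dictionary[name](stack)
    return stack

# VARIABLE counter   : bump 1 counter +! ;   : twice bump bump counter @ ;
variable("counter")
define("bump", [pusher(1), call_thunk("counter"), call_thunk("+!")])
define("twice", [call_thunk("bump"), call_thunk("bump"),
                 call_thunk("counter"), call_thunk("@")])
first = run("twice")

# Redefining bump changes what the already-compiled twice does.
define("bump", [pusher(10), call_thunk("counter"), call_thunk("+!")])
second = run("twice")
```

Here `first` is `[2]` and `second` is `[22]`: the redefinition of bump is visible through twice without recompiling it. Traditional Forth systems bind at compile time, so an earlier definition would keep calling the old bump; the late-binding design trades that behaviour for simple forward and recursive references.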

Phase 3 — control flow + first Hayes tests green

  • IF, ELSE, THEN — compile to SX if
  • BEGIN, UNTIL, WHILE, REPEAT, AGAIN — compile to loops
  • DO, LOOP, +LOOP, I, J, LEAVE — counted loops (needs a return stack)
  • Return stack: >R, R>, R@, 2>R, 2R>, 2R@
  • Vendor John Hayes' test suite to lib/forth/ans-tests/
  • lib/forth/conformance.sh + runner; scoreboard.json + scoreboard.md
  • Baseline: probably 30-50% Core passing after phase 3
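
One way to compile IF/ELSE/THEN and BEGIN/UNTIL into structured host-level conditionals and loops is a small recursive compiler. The Python sketch below is a hypothetical model: it compiles a whole input line as if it were a colon-definition body, supports only a handful of primitives, and leaves out the return stack and counted loops.

```python
# Hypothetical model of Phase 3 control-flow compilation; not the SX code.
def flag(condition):
    return -1 if condition else 0          # Forth true is all bits set

PRIMS = {
    "DUP": lambda s: s.append(s[-1]),
    "1-": lambda s: s.append(s.pop() - 1),
    "0=": lambda s: s.append(flag(s.pop() == 0)),
    "+": lambda s: s.append(s.pop() + s.pop()),
}

def run_ops(ops, stack):
    for op in ops:
        op(stack)

def make_if(then_ops, else_ops):
    def op(stack):
        # Any non-zero value counts as true.
        run_ops(then_ops if stack.pop() != 0 else else_ops, stack)
    return op

def make_until(body):
    def op(stack):
        while True:
            run_ops(body, stack)
            if stack.pop() != 0:           # UNTIL exits on a true flag
                break
    return op

def compile_seq(tokens, i=0, stop=frozenset()):
    """Compile tokens[i:] until a token in `stop`; return (ops, index of stop)."""
    ops = []
    while i < len(tokens):
        tok = tokens[i].upper()
        if tok in stop:
            return ops, i
        if tok == "IF":
            then_ops, i = compile_seq(tokens, i + 1, {"ELSE", "THEN"})
            else_ops = []
            if tokens[i].upper() == "ELSE":
                else_ops, i = compile_seq(tokens, i + 1, {"THEN"})
            ops.append(make_if(then_ops, else_ops))
        elif tok == "BEGIN":
            body, i = compile_seq(tokens, i + 1, {"UNTIL"})
            ops.append(make_until(body))
        elif tok in PRIMS:
            ops.append(PRIMS[tok])
        else:
            ops.append(lambda s, n=int(tok): s.append(n))
        i += 1                              # step past the token or the closer
    if stop:
        raise SyntaxError("unterminated control structure")
    return ops, i

def evaluate(source):
    ops, _ = compile_seq(source.split())
    stack = []
    run_ops(ops, stack)
    return stack
```

For instance, `evaluate("0 IF 10 ELSE 20 THEN")` returns `[20]`, and `evaluate("3 BEGIN 1- DUP 0= UNTIL")` counts down and returns `[0]`. Recursion replaces the classic control-flow stack here; a dictionary-driven compiler with IMMEDIATE words would more likely keep an explicit stack of open branches.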

Phase 4 — strings + more Core

  • S", C", .", TYPE, COUNT, CMOVE, FILL, BLANK
  • CHAR, [CHAR], KEY, ACCEPT
  • BASE manipulation: DECIMAL, HEX
  • SP@, SP! (DEPTH is already covered in Phase 1)
  • Drive Hayes Core pass-rate up
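
The string words operate on character addresses, so they need some notion of memory. The Python sketch below assumes a flat byte-addressed buffer, which the plan does not specify (Phase 2 variables use symbolic addresses instead); all names are hypothetical. It shows the counted-string layout behind COUNT and the low-to-high copy order of CMOVE.

```python
# Hypothetical model assuming flat byte-addressed memory; not the SX design.
memory = bytearray(256)

def store_counted(addr, text):
    """Lay out a counted string: one length byte, then the characters."""
    data = text.encode("ascii")
    memory[addr] = len(data)
    memory[addr + 1:addr + 1 + len(data)] = data

def count(stack):                 # COUNT ( c-addr -- c-addr+1 u )
    addr = stack.pop()
    stack.append(addr + 1)
    stack.append(memory[addr])

def type_(stack, out):            # TYPE ( c-addr u -- )
    u = stack.pop()
    addr = stack.pop()
    out.append(memory[addr:addr + u].decode("ascii"))

def fill(stack):                  # FILL ( c-addr u char -- )
    ch = stack.pop()
    u = stack.pop()
    addr = stack.pop()
    for i in range(u):
        memory[addr + i] = ch

def cmove(stack):                 # CMOVE ( c-addr1 c-addr2 u -- )
    u = stack.pop()
    dst = stack.pop()
    src = stack.pop()
    for i in range(u):            # strictly low to high, one char at a time
        memory[dst + i] = memory[src + i]
```

Because CMOVE copies from lower to higher addresses, an overlapping move with the destination one byte above the source propagates the first character: copying 3 bytes from address 40 to 41 over "abcd" yields "aaaa". Output is appended to a list here to mirror the buffered-output approach noted in the Phase 1 log entry.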

Phase 5 — Core Extension + optional word sets

  • Full Core + Core Extension
  • File Access word set (via SX IO)
  • String word set (SLITERAL, COMPARE, SEARCH)
  • Target: 100% Hayes Core

Phase 6 — speed

  • Inline primitive calls during compile (skip dict lookup)
  • Tail-call optimise colon-def endings
  • JIT cooperation: mark compiled colon-defs as VM-eligible

Progress log

Newest first.

  • Phase 2 complete — colon defs, compile mode, VARIABLE/CONSTANT/VALUE/TO, @/!/+! (+26). lib/forth/compiler.sx plus tests/test-phase2.sx. Colon-def body is a list of ops (one per source token) wrapped in a single lambda. References are late-binding thunks so forward/recursive references work via RECURSE. Redefinitions take effect immediately. VARIABLE creates a pusher for a symbolic address stored in state.vars. CONSTANT compiles to (fn (s) (forth-push s v)). VALUE/TO share the vars dict. Compiler rewrites forth-interpret to drive from a token list stored on state so parsing words (:, VARIABLE, TO etc.) can consume the next token with forth-next-token!. 134/134 (Phase 1 + 2) green.

  • Phase 1 complete — reader + interpret mode + core words (+108). lib/forth/{runtime,reader,interpreter}.sx plus tests/test-phase1.sx. Stack as SX list (TOS = first). Dict is {lowercased-name -> {:kind :body :immediate?}}. Data + return stacks both mutable. Output buffered in state (no host IO yet). BASE-aware number parsing with $, %, # prefixes and 'c' char literals. Bitwise AND/OR/XOR/INVERT simulated over 32-bit two's-complement. Integer / is truncated-toward-zero (ANS symmetric), MOD matches. Case-insensitive lookup. 108/108 tests green.

Blockers

  • (none yet)