# Forth-on-SX: stack language on the VM

The smallest serious second language — Forth's stack-based semantics map directly onto the SX bytecode VM (OP_DUP, OP_SWAP, OP_DROP already exist as arithmetic primitives or can be added trivially). Compile-mode / interpret-mode is the one genuinely novel piece, but it's a classic technique and small.

End-state goal: **passes John Hayes' ANS-Forth test suite** (the canonical Forth conformance harness — small, well-documented, targets the Core word set).

## Scope decisions (defaults — override)

- **Standard:** ANS-Forth 1994 Core word set + Core Extension. No ANS-Forth Optional word sets (File Access, Floating Point, Search Order, etc.) in the first run.
- **Test suite:** John Hayes' "Test Suite for ANS Forth" (~250 tests, freely distributable as long as its copyright notice is retained, widely used).
- **Case-sensitivity:** case-insensitive lookup (ANS leaves case sensitivity implementation-defined; standard word names are uppercase).
- **Number base:** support `BASE` variable, defaults to 10. Hex and binary literals (`$FF`, `%1010`) follow the `$` / `%` prefix convention that Forth 2012 later standardized; ANS-Forth 1994 itself defines no literal prefixes.
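
The number-base rule in the last bullet can be sketched as follows. This is a Python model for illustration only, not the code in `lib/forth/reader.sx`; the name `parse_number` and its signature are hypothetical. It assumes that a prefix overrides `BASE` for that one token and that a `-` sign, if present, comes after the prefix.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Assumed prefix table: `$` hex, `#` decimal, `%` binary.
PREFIX_BASES = {"$": 16, "#": 10, "%": 2}


def parse_number(token, base=10):
    """Return the integer value of `token`, or None if it is not a number.

    `base` stands in for the current value of the Forth BASE variable.
    """
    text = token
    if text and text[0] in PREFIX_BASES:
        base = PREFIX_BASES[text[0]]   # a prefix overrides BASE for this token
        text = text[1:]
    negative = text.startswith("-")
    if negative:
        text = text[1:]
    if not text:
        return None                    # a bare prefix or bare sign is not a number
    value = 0
    for ch in text.upper():            # digits are matched case-insensitively
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:
            return None                # not a valid digit in this base
        value = value * base + digit
    return -value if negative else value
```

Returning `None` instead of raising lets an interpreter loop try the dictionary first and fall back to number parsing, reporting an unknown word only when both fail.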

## Ground rules

- **Scope:** only touch `lib/forth/**` and `plans/forth-on-sx.md`. No edits to `spec/`, `hosts/`, `shared/`, or other language dirs.
- **SX files:** use `sx-tree` MCP tools only.
- **Architecture:** reader (not tokenizer — Forth is whitespace-delimited) → interpreter → dictionary-backed compiler. The compiler emits SX AST (not bytecode directly) so we inherit the VM.
- **Commits:** one feature per commit. Keep `## Progress log` updated.

## Architecture sketch

```
Forth source text
    │
    ▼
lib/forth/reader.sx       — whitespace-split words (that's it — no real tokenizer)
    │
    ▼
lib/forth/interpreter.sx  — interpret mode: look up word in dict, execute
    │
    ▼
lib/forth/compiler.sx     — compile mode (`:` opens, `;` closes): emit SX AST
    │
    ▼
lib/forth/runtime.sx      — stack ops, dictionary, BASE, I/O
    │
    ▼
existing CEK / VM         — runs compiled definitions natively
```

Representation:

- **Stack** = SX list, push = cons, pop = uncons
- **Dictionary** = dict `word-name → {:kind :immediate? :body}` where kind is `:primitive` or `:colon-def`
- **A colon definition** compiles to a thunk `(lambda () <body-as-sx-sequence>)`
- **Compile-mode** is a flag on the interpreter state; `:` sets it, `;` clears and installs the new word
- **IMMEDIATE** words run at compile time
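
To make the representation concrete, here is a small Python model of the same ideas: a cons-style stack, a dictionary of `{kind, immediate, body}` entries, and a compile-mode flag toggled by `:` and `;`. It is a sketch under assumptions, not the SX implementation. Every function name is hypothetical, the token list stored on the state is borrowed from the Phase 2 log entry, and only `DUP` and `*` are provided as primitives.

```python
def make_state():
    # The data stack is a cons-style linked list: None is empty, (top, rest) otherwise.
    return {"stack": None, "dict": {}, "tokens": [],
            "compiling": False, "name": None, "body": []}

def push(state, value):
    state["stack"] = (value, state["stack"])           # push = cons

def pop(state):
    if state["stack"] is None:
        raise IndexError("stack underflow")
    value, state["stack"] = state["stack"]             # pop = uncons
    return value

def define(state, name, body, kind="primitive", immediate=False):
    state["dict"][name.lower()] = {"kind": kind, "immediate": immediate, "body": body}

def colon(state):
    # ':' consumes the next token as the new word's name and sets the compile flag.
    state["name"] = state["tokens"].pop(0)
    state["body"] = []
    state["compiling"] = True

def semicolon(state):
    # ';' is IMMEDIATE, so it runs while compiling: install the word, clear the flag.
    ops = state["body"]
    def run(s):
        for op in ops:
            op(s)
    define(state, state["name"], run, kind="colon-def")
    state["compiling"] = False

def dup(state):
    value = pop(state)
    push(state, value)
    push(state, value)

def star(state):
    b = pop(state)
    a = pop(state)
    push(state, a * b)

def interpret(state, source):
    state["tokens"] = source.split()
    while state["tokens"]:
        token = state["tokens"].pop(0)
        entry = state["dict"].get(token.lower())
        if entry is None:
            number = int(token)                         # ValueError for unknown words
            if state["compiling"]:
                state["body"].append(lambda s, n=number: push(s, n))
            else:
                push(state, number)
        elif state["compiling"] and not entry["immediate"]:
            # Compile a late-binding call: the name is looked up when the op runs.
            state["body"].append(
                lambda s, name=token.lower(): s["dict"][name]["body"](s))
        else:
            entry["body"](state)

def new_forth():
    state = make_state()
    define(state, ":", colon)
    define(state, ";", semicolon, immediate=True)
    define(state, "dup", dup)
    define(state, "*", star)
    return state
```

For example, `interpret(new_forth(), ": square dup * ; 7 square")` leaves a single value on the stack. Because `;` is flagged immediate, it executes during compilation instead of being appended to the body, which is the whole mechanism behind IMMEDIATE words.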

## Roadmap

### Phase 1 — reader + interpret mode

- [x] `lib/forth/reader.sx`: whitespace-split, number parsing (base-aware)
- [x] `lib/forth/runtime.sx`: stack as SX list, push/pop/peek helpers
- [x] Core stack words: `DUP`, `DROP`, `SWAP`, `OVER`, `ROT`, `-ROT`, `NIP`, `TUCK`, `PICK`, `ROLL`, `?DUP`, `DEPTH`, `2DUP`, `2DROP`, `2SWAP`, `2OVER`
- [x] Arithmetic: `+`, `-`, `*`, `/`, `MOD`, `/MOD`, `NEGATE`, `ABS`, `MIN`, `MAX`, `1+`, `1-`, `2+`, `2-`, `2*`, `2/`
- [x] Comparison: `=`, `<>`, `<`, `>`, `<=`, `>=`, `0=`, `0<>`, `0<`, `0>`
- [x] Logical: `AND`, `OR`, `XOR`, `INVERT` (32-bit two's-complement sim)
- [x] I/O: `.` (print), `.S` (show stack), `EMIT`, `CR`, `SPACE`, `SPACES`, `BL`
- [x] Interpreter loop: read word, look up, execute, repeat
- [x] Unit tests in `lib/forth/tests/test-phase1.sx` — 108/108 pass
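
Two of the Phase 1 items involve arithmetic that a host with unbounded integers does not give for free: the 32-bit two's-complement simulation behind `AND`/`OR`/`XOR`/`INVERT`, and the truncated-toward-zero `/` and `MOD` noted in the Phase 1 log entry. The Python sketch below shows one way to model both. The helper names are hypothetical and the SX implementation may be structured differently.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def to_signed32(n):
    """Wrap an arbitrary integer into the signed 32-bit range."""
    n &= MASK32
    return n - 0x100000000 if n & SIGN32 else n


def forth_invert(n):
    return to_signed32(~n)


def forth_and(a, b):
    return to_signed32(a & b)


def forth_or(a, b):
    return to_signed32(a | b)


def forth_xor(a, b):
    return to_signed32(a ^ b)


def forth_div(a, b):
    """Symmetric division: the quotient is truncated toward zero."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def forth_mod(a, b):
    """Remainder that matches forth_div, so a == b * (a / b) + (a mod b)."""
    return a - b * forth_div(a, b)
```

With symmetric division the remainder takes the sign of the dividend, which differs from Python's own floored `//` and `%` for mixed-sign operands.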

### Phase 2 — colon definitions + compile mode

- [x] `:` opens compile mode and starts a definition
- [x] `;` closes it and installs into the dictionary
- [x] Compile mode: non-IMMEDIATE words are compiled as late-binding call thunks; numbers are compiled as pushers; IMMEDIATE words run immediately
- [x] `VARIABLE`, `CONSTANT`, `VALUE`, `TO`, `RECURSE`, `IMMEDIATE`
- [x] `@` (fetch), `!` (store), `+!`
- [x] Colon-def body is `(fn (s) (for-each op body))` — runs on CEK, inherits TCO (replaced in Phase 3 by a PC-driven runner so branch ops can jump)
- [x] Tests in `lib/forth/tests/test-phase2.sx` — 26/26 pass
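
The defining words in this phase are parsing words: each one consumes the next token as a name. The Python sketch below models `VARIABLE`, `CONSTANT`, `VALUE` and `TO` with symbolic addresses kept in a `vars` dict, as the Phase 2 log entry describes. It covers interpret mode only, the stack here is a plain Python list with the top at the end, and all function names are hypothetical.

```python
def next_token(state):
    # Parsing words pull the following token off the shared token list.
    return state["tokens"].pop(0)

def word_variable(state):
    name = next_token(state).lower()
    state["vars"][name] = 0
    # The new word pushes its symbolic address (here simply the name).
    state["dict"][name] = lambda s: s["stack"].append(name)

def word_constant(state):
    name = next_token(state).lower()
    value = state["stack"].pop()
    state["dict"][name] = lambda s: s["stack"].append(value)

def word_value(state):
    name = next_token(state).lower()
    state["vars"][name] = state["stack"].pop()
    # A VALUE pushes its current contents, not its address.
    state["dict"][name] = lambda s: s["stack"].append(s["vars"][name])

def word_to(state):
    name = next_token(state).lower()
    state["vars"][name] = state["stack"].pop()

def fetch(state):
    addr = state["stack"].pop()
    state["stack"].append(state["vars"][addr])

def store(state):
    addr = state["stack"].pop()
    state["vars"][addr] = state["stack"].pop()

def plus_store(state):
    addr = state["stack"].pop()
    state["vars"][addr] += state["stack"].pop()

def run(source):
    state = {"stack": [], "tokens": source.split(), "vars": {},
             "dict": {"variable": word_variable, "constant": word_constant,
                      "value": word_value, "to": word_to,
                      "@": fetch, "!": store, "+!": plus_store}}
    while state["tokens"]:
        token = next_token(state)
        word = state["dict"].get(token.lower())
        if word is not None:
            word(state)
        else:
            state["stack"].append(int(token))   # anything else must be a number
    return state["stack"]
```

For instance, `run("VARIABLE x 5 x ! 3 x +! x @")` stores 5, adds 3, and fetches the result. Using `TO` inside a colon definition would additionally need compile-time handling, which this sketch leaves out.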

### Phase 3 — control flow + first Hayes tests green

- [x] `IF`, `ELSE`, `THEN` — compile to SX `if`
- [x] `BEGIN`, `UNTIL`, `WHILE`, `REPEAT`, `AGAIN` — compile to loops
- [x] `DO`, `LOOP`, `+LOOP`, `I`, `J`, `LEAVE` — counted loops (needs a return stack)
- [x] Return stack: `>R`, `R>`, `R@`, `2>R`, `2R>`, `2R@`
- [ ] Vendor John Hayes' test suite to `lib/forth/ans-tests/`
- [ ] `lib/forth/conformance.sh` + runner; `scoreboard.json` + `scoreboard.md`
- [ ] Baseline: probably 30-50% Core passing after phase 3
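
The progress log describes the control-flow words as compiling to `bif` / `branch` ops with patchable target cells, executed by a PC-driven runner. The Python sketch below models that scheme for `IF`/`ELSE`/`THEN` and `BEGIN`/`WHILE`/`REPEAT`/`UNTIL`. It is an illustration under assumptions: the op and cell shapes are simplified, the stack is a Python list with the top at the end, and every function name is hypothetical.

```python
def run_body(body, stack):
    """Run a compiled body: callables are plain ops, dicts are branch ops."""
    pc = 0
    while pc < len(body):
        op = body[pc]
        if callable(op):
            op(stack)
            pc += 1
        elif op["kind"] == "bif":                  # branch if the flag is false (0)
            pc = op["target"]["pc"] if stack.pop() == 0 else pc + 1
        else:                                      # "branch": unconditional jump
            pc = op["target"]["pc"]

def compile_if(body, cstack):
    cell = {"pc": None}                            # forward target, patched later
    body.append({"kind": "bif", "target": cell})
    cstack.append(cell)

def compile_else(body, cstack):
    cell = {"pc": None}
    body.append({"kind": "branch", "target": cell})
    cstack.pop()["pc"] = len(body)                 # IF's false path lands after the branch
    cstack.append(cell)

def compile_then(body, cstack):
    cstack.pop()["pc"] = len(body)

def compile_begin(body, cstack):
    cstack.append(len(body))                       # plain numeric back-target

def compile_while(body, cstack):
    compile_if(body, cstack)                       # forward bif, pushed above the BEGIN marker

def compile_repeat(body, cstack):
    while_cell = cstack.pop()
    back_pc = cstack.pop()
    body.append({"kind": "branch", "target": {"pc": back_pc}})
    while_cell["pc"] = len(body)                   # a false flag jumps past REPEAT

def compile_until(body, cstack):
    body.append({"kind": "bif", "target": {"pc": cstack.pop()}})

def lit(n):
    return lambda stack: stack.append(n)

def dup(stack):
    stack.append(stack[-1])

def one_minus(stack):
    stack.append(stack.pop() - 1)

def build_if_else():
    # Models: IF 10 ELSE 20 THEN
    body, cstack = [], []
    compile_if(body, cstack)
    body.append(lit(10))
    compile_else(body, cstack)
    body.append(lit(20))
    compile_then(body, cstack)
    return body

def build_countdown():
    # Models: BEGIN DUP WHILE 1- REPEAT
    body, cstack = [], []
    compile_begin(body, cstack)
    body.append(dup)
    compile_while(body, cstack)
    body.append(one_minus)
    compile_repeat(body, cstack)
    return body
```

Running `build_if_else()` with a true flag on the stack takes the first arm, and `build_countdown()` decrements the top of the stack until it reaches zero. Nesting works because each construct only pops the compile-stack entries it pushed itself.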

### Phase 4 — strings + more Core

- [ ] `S"`, `C"`, `."`, `TYPE`, `COUNT`, `CMOVE`, `FILL`, `BLANK`
- [ ] `CHAR`, `[CHAR]`, `KEY`, `ACCEPT`
- [ ] `BASE` manipulation: `DECIMAL`, `HEX`
- [ ] `SP@`, `SP!` (`DEPTH` already landed in Phase 1)
- [ ] Drive Hayes Core pass-rate up

### Phase 5 — Core Extension + optional word sets

- [ ] Full Core + Core Extension
- [ ] File Access word set (via SX IO)
- [ ] String word set (`SLITERAL`, `COMPARE`, `SEARCH`)
- [ ] Target: 100% Hayes Core

### Phase 6 — speed

- [ ] Inline primitive calls during compile (skip dict lookup)
- [ ] Tail-call optimise colon-def endings
- [ ] JIT cooperation: mark compiled colon-defs as VM-eligible

## Progress log

_Newest first._

- **Phase 3 — `DO`/`LOOP`/`+LOOP`/`I`/`J`/`LEAVE` + return stack words (+16).**
  Counted loops compile onto the same PC-driven body runner. DO emits an
  enter-op (pops limit+start from data stack, pushes them to rstack) and
  pushes a `{:kind "do" :back PC :leaves ()}` marker onto cstack. LOOP/+LOOP
  emit a dict op (`:kind "loop"`/`"+loop"` with target=back-cell). The step
  handler pops index & reads limit, increments, and either restores the
  updated index + jumps back, or drops the frame and advances. LEAVE walks
  cstack for the innermost DO marker, emits a `:kind "leave"` dict op with
  a fresh target cell, and registers it on the marker's leaves list. LOOP
  patches all registered leave-targets to the exit PC and drops the marker.
  The leave op pops two from rstack (unloop) and branches. `I` peeks rtop;
  `J` reads rstack index 2 (below inner frame). Added non-immediate
  return-stack words `>R`, `R>`, `R@`, `2>R`, `2R>`, `2R@`. Nested
  DO/LOOP with J tested; LEAVE in nested loops exits only the inner.
  177/177 green.

- **Phase 3 — `BEGIN`/`UNTIL`/`WHILE`/`REPEAT`/`AGAIN` (+9).** Indefinite-loop
  constructs built on the same PC-driven body runner introduced for `IF`.
  BEGIN records the current body length on `state.cstack` (a plain numeric
  back-target). UNTIL/AGAIN pop that back-target and emit a `bif`/`branch`
  op whose target cell is set to the recorded PC. WHILE emits a forward
  `bif` with a fresh target cell and pushes it on the cstack *above* the
  BEGIN marker; REPEAT pops both (while-target first, then back-pc), emits
  an unconditional branch back to BEGIN, then patches the while-target to
  the current body length, so WHILE's false flag jumps past the REPEAT.
  Mixed compile-time layout (numeric back-targets + dict forward targets
  on the same cstack) is OK because the immediate words pop them in the
  order they expect. AGAIN works structurally but has no test yet, because
  nothing can exit the loop mid-body; revisit once `EXIT` lands.
  161/161 green.

- **Phase 3 start — `IF`/`ELSE`/`THEN` (+18).**
  `lib/forth/compiler.sx` + `tests/test-phase3.sx`. Colon-def body switched
  from `for-each` to a PC-driven runner so branch ops can jump: ops now
  include dict tags `{"kind" "bif"|"branch" "target" cell}` alongside the
  existing `(fn (s) ...)` shape. IF compiles a `bif` with a fresh target cell
  pushed to `state.cstack`; ELSE emits an unconditional `branch`,
  patches the IF's target to the instruction after this branch, and
  pushes the new target; THEN patches the most recent target to the
  current body length. Nested IF/ELSE/THEN works via the cstack.
  Also fixed `EMIT`: `code-char` → `char-from-code` (spec-correct
  primitive name) so Phase 1/2 tests run green on sx_server.
  152/152 (Phase 1 + 2 + 3) green.

- **Phase 2 complete — colon defs, compile mode, VARIABLE/CONSTANT/VALUE/TO, @/!/+! (+26).**
  `lib/forth/compiler.sx` plus `tests/test-phase2.sx`.
  Colon-def body is a list of ops (one per source token) wrapped in a single
  lambda. References are late-binding thunks so forward/recursive references
  work via `RECURSE`. Redefinitions take effect immediately.
  VARIABLE creates a pusher for a symbolic address stored in `state.vars`.
  CONSTANT compiles to `(fn (s) (forth-push s v))`. VALUE/TO share the vars dict.
  Compiler rewrites `forth-interpret` to drive from a token list stored on
  state so parsing words (`:`, `VARIABLE`, `TO` etc.) can consume the next
  token with `forth-next-token!`. 134/134 (Phase 1 + 2) green.

- **Phase 1 complete — reader + interpret mode + core words (+108).**
  `lib/forth/{runtime,reader,interpreter}.sx` plus `tests/test-phase1.sx`.
  Stack as SX list (TOS = first). Dict is `{lowercased-name -> {:kind :body :immediate?}}`.
  Data + return stacks both mutable. Output buffered in state (no host IO yet).
  BASE-aware number parsing with `$`, `%`, `#` prefixes and `'c'` char literals.
  Bitwise AND/OR/XOR/INVERT simulated over 32-bit two's-complement.
  Integer `/` is truncated-toward-zero (ANS symmetric), MOD matches.
  Case-insensitive lookup. 108/108 tests green.
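
As an illustration of the counted-loop entry at the top of this log, the Python sketch below models `DO`, `LOOP`, `I`, `J` and `LEAVE` on a PC-driven runner with a separate return stack. It is a simplified model, not the SX code: `+LOOP` is omitted, both stacks are Python lists with the top at the end (so `J` reads `rstack[-3]` where the log says index 2), and all function names are hypothetical.

```python
def run_body(body, stack, rstack):
    pc = 0
    while pc < len(body):
        op = body[pc]
        if callable(op):
            op(stack, rstack)
            pc += 1
        elif op["kind"] == "loop":
            index = rstack.pop() + 1               # pop the index, bump it
            limit = rstack[-1]                     # read (do not pop) the limit
            if index == limit:
                rstack.pop()                       # drop the frame and fall through
                pc += 1
            else:
                rstack.append(index)               # restore the updated index
                pc = op["target"]["pc"]            # jump back to the loop body
        else:                                      # "leave": unloop, then branch out
            rstack.pop()
            rstack.pop()
            pc = op["target"]["pc"]

def op_do(stack, rstack):
    start = stack.pop()
    limit = stack.pop()
    rstack.append(limit)
    rstack.append(start)

def op_i(stack, rstack):
    stack.append(rstack[-1])                       # innermost index

def op_j(stack, rstack):
    stack.append(rstack[-3])                       # index of the enclosing loop

def op_plus(stack, rstack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def lit(n):
    return lambda stack, rstack: stack.append(n)

def compile_do(body, cstack):
    body.append(op_do)
    cstack.append({"kind": "do", "back": len(body), "leaves": []})

def compile_leave(body, cstack):
    # Walk the compile stack from the top for the innermost DO marker.
    marker = next(m for m in reversed(cstack)
                  if isinstance(m, dict) and m.get("kind") == "do")
    cell = {"pc": None}
    body.append({"kind": "leave", "target": cell})
    marker["leaves"].append(cell)

def compile_loop(body, cstack):
    marker = cstack.pop()
    body.append({"kind": "loop", "target": {"pc": marker["back"]}})
    for cell in marker["leaves"]:
        cell["pc"] = len(body)                     # every LEAVE exits just past LOOP

def build_sum_loop():
    # Models: DO I + LOOP   (run with: accumulator limit start on the stack)
    body, cstack = [], []
    compile_do(body, cstack)
    body.append(op_i)
    body.append(op_plus)
    compile_loop(body, cstack)
    return body

def build_nested_leave():
    # Models: DO 10 0 DO J LEAVE LOOP LOOP   (run with: limit start on the stack)
    body, cstack = [], []
    compile_do(body, cstack)
    body.append(lit(10))
    body.append(lit(0))
    compile_do(body, cstack)
    body.append(op_j)
    compile_leave(body, cstack)
    compile_loop(body, cstack)
    compile_loop(body, cstack)
    return body
```

In `build_nested_leave()` the inner loop pushes the outer index once and leaves immediately, so only the inner frame is dropped and the outer loop keeps counting.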

## Blockers

- _(none yet)_