lib/guest — shared toolkit for SX-hosted languages
Extract the duplicated plumbing across lib/{haskell,common-lisp,erlang,prolog,js,lua,smalltalk,tcl,forth,ruby,apl,hyperscript} into a small, composable kit so language N+1 costs ~200 lines instead of ~2000, without regressing any existing conformance scoreboard.
Branch: architecture. SX files via sx-tree MCP only. Never edit generated files.
Thesis
The substrate (CEK, hygienic macros, records, delimited continuations, IO suspension, reactivity) was chosen with multi-paradigm hosting in mind, but each guest currently re-rolls its own tokeniser, recursive-descent loop, conformance harness, and primitive-rename layer. Extracting these shared layers removes plumbing only; it does not reduce the bug-finding pressure of the conformance suites, so it is a pure win.
Canaries: Lua (small, conventional expression-grammar — exercises lex/Pratt/AST) and Prolog (paradigm-different — exercises pattern-match/unification). The two-canary rule prevents Lua-shaped abstractions.
Two-language rule: no extraction is merged until two guests consume it.
Current baseline
The loop fills these in on its first iteration by running every */conformance.sh and */test.sh and copying each scoreboard.json to lib/guest/baseline/<lang>.json. Until then:
| Guest | Suite | Baseline |
|---|---|---|
| lua | bash lib/lua/test.sh | 185 / 185 |
| prolog | bash lib/prolog/conformance.sh | 590 / 590 |
| haskell | bash lib/haskell/conformance.sh | 156 / 156 (was reported 0/18 by the buggy old script) |
| common-lisp | bash lib/common-lisp/conformance.sh | 309 / 309 |
| erlang | bash lib/erlang/conformance.sh | 0 / 0 (suite all-zero) |
| js | bash lib/js/conformance.sh | 94 / 148 (test262-slice) |
| smalltalk | bash lib/smalltalk/conformance.sh | 625 / 629 |
| tcl | bash lib/tcl/conformance.sh | 3 / 4 (programs) |
| forth | bash lib/forth/test.sh | 64 / 64 |
| ruby | bash lib/ruby/test.sh | 76 / 76 |
| apl | bash lib/apl/test.sh | 73 / 73 |
The baseline only needs to be re-snapshotted when the substrate (spec/**, hosts/**) changes underneath this loop.
Phase 0 — Baseline snapshot (one-shot)
Step 0: Snapshot every guest's scoreboard
Create lib/guest/baseline/. Run every guest's conformance/test runner. Copy each scoreboard.json (or extract pass/fail counts from test.sh output for guests without a scoreboard) into lib/guest/baseline/<lang>.json. Fill in the table above.
Verify: ls lib/guest/baseline/*.json shows one per guest. Plan table populated.
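The "equal baseline" check that every later step relies on can be sketched as a small comparison. This is an illustrative Python sketch, not part of the SX toolkit: the function names and the `passed` / `total` JSON keys are assumptions, since the plan does not specify the scoreboard.json schema.

```python
import json
from pathlib import Path


def load_counts(path):
    """Read (passed, total) from a scoreboard file.

    The "passed" and "total" key names are assumed for illustration.
    """
    data = json.loads(Path(path).read_text())
    return data["passed"], data["total"]


def regressions(baseline, current):
    """Return the guests whose pass count dropped or that went missing.

    Both arguments map a guest name to a (passed, total) pair.
    """
    bad = []
    for lang, (base_passed, _) in sorted(baseline.items()):
        if lang not in current or current[lang][0] < base_passed:
            bad.append(lang)
    return bad
```

An empty result from `regressions` corresponds to the plan's bar: no guest lost a single test relative to its snapshot.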
Phase 1 — Cheap, zero-semantic-risk extractions
Step 1: lib/guest/conformance.sx — config-driven test runner
Replace the 6+ near-identical */conformance.sh scripts with one driver that takes a config dict:
{:lang "prolog"
:loads ("lib/prolog/tokenizer.sx" "lib/prolog/parser.sx" ...)
:suites (("parse" "lib/prolog/tests/parse.sx" "pl-parse-tests-run!") ...)}
The driver locates sx_server.exe, runs the epoch protocol, collects pass/fail per suite, and writes scoreboard.{json,md}. The per-language conformance.sh becomes a 3-line stub that points at its config.
Port to: lib/prolog/conformance.sh and lib/haskell/conformance.sh. Two consumers required for merge.
Verify: both bash lib/prolog/conformance.sh and bash lib/haskell/conformance.sh produce scoreboard JSONs equal to baseline.
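The aggregation half of such a driver might look roughly like the following. This is a Python sketch rather than SX, and it deliberately leaves out locating sx_server.exe and the epoch protocol: `run_suite` is a hypothetical callback standing in for that step, and the output field names are assumptions.

```python
def build_scoreboard(config, run_suite):
    """Run each configured suite and aggregate pass/fail counts.

    `run_suite(loads, path, entry)` is a stand-in for the real step that
    talks to the server; it must return a (passed, failed) pair.
    """
    suites = []
    for name, path, entry in config["suites"]:
        passed, failed = run_suite(config["loads"], path, entry)
        suites.append({"name": name, "passed": passed, "failed": failed})
    return {
        "lang": config["lang"],
        "suites": suites,
        "passed": sum(s["passed"] for s in suites),
        "total": sum(s["passed"] + s["failed"] for s in suites),
    }


def to_markdown(board):
    """Render the scoreboard as a markdown table with a total row."""
    lines = ["| Suite | Passed | Failed |", "|---|---|---|"]
    for s in board["suites"]:
        lines.append(f"| {s['name']} | {s['passed']} | {s['failed']} |")
    failed = board["total"] - board["passed"]
    lines.append(f"| total | {board['passed']} | {failed} |")
    return "\n".join(lines)
```

Injecting `run_suite` keeps the per-language part down to data (the config), which is the property that lets each conformance.sh shrink to a stub.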
Step 2: lib/guest/prefix.sx — prefix-rename macro
One macro that takes a prefix and a list of SX symbols and binds prefixed aliases:
(prefix-rename "cl-" '(null? pair? even? odd? zero? ...))
Replaces hundreds of hand-written (define (cl-null? x) (= x nil))-style wrappers in common-lisp/runtime.sx, lua/runtime.sx, erlang/runtime.sx.
Port to: common-lisp/runtime.sx (largest user) and lua/runtime.sx. Two consumers.
Verify: common-lisp + lua scoreboards equal baseline.
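The effect of the Step 2 macro can be modelled in a few lines. The sketch below is Python, with the environment represented as a plain dict; the real `prefix-rename` is an SX macro, so this only illustrates the binding behaviour, and the collision check is an assumption rather than a stated requirement.

```python
def prefix_rename(env, prefix, names):
    """Bind prefix+name to the same object as name, for each name.

    Raises KeyError if a source name is unbound, and ValueError if the
    alias would silently overwrite an existing binding.
    """
    for name in names:
        alias = prefix + name
        if alias in env:
            raise ValueError(f"{alias} is already bound")
        env[alias] = env[name]
    return env
```

Because each alias points at the original object rather than wrapping it, there is no extra call layer and no chance for a wrapper to drift from the primitive it renames.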
Phase 2 — Lex / parse kit
Step 3: lib/guest/lex.sx — character-class + tokeniser primitives
- Source-position tracking (line/col/offset).
- Character-class predicates (whitespace?, digit?, alpha?, ident-start?, ident-rest?).
- Number recognisers (decimal, hex, float, scientific).
- String recognisers (quoted, escapes, raw).
- Comment recognisers (line, block, nestable).
- Token record {:type :value :pos :end :line}.
Port to: lua/tokenizer.sx and tcl/tokenizer.sx. Two consumers.
Verify: lua + tcl scoreboards equal baseline.
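A minimal sketch of how the Step 3 primitives fit together, written in Python for illustration. The token fields mirror the record above; everything else (the function names, the `--` line-comment default, the catch-all `punct` type) is an assumption, and string and block-comment recognisers are omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: object
    pos: int   # offset of the first character
    end: int   # offset one past the last character
    line: int  # 1-based line of the first character


# Hex first, then decimal with optional fraction and exponent.
NUMBER = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_digit(c):
    return "0" <= c <= "9"


def ident_start(c):
    return c.isalpha() or c == "_"


def ident_rest(c):
    return c.isalnum() or c == "_"


def tokenize(src, line_comment="--"):
    tokens, i, line = [], 0, 1
    while i < len(src):
        c = src[i]
        if c == "\n":
            line += 1
            i += 1
        elif c.isspace():
            i += 1
        elif src.startswith(line_comment, i):
            while i < len(src) and src[i] != "\n":
                i += 1  # skip to end of line, leave the newline itself
        elif is_digit(c):
            m = NUMBER.match(src, i)
            text = m.group()
            if text[:2].lower() == "0x":
                value = int(text, 16)
            elif any(ch in text for ch in ".eE"):
                value = float(text)
            else:
                value = int(text)
            tokens.append(Token("number", value, i, m.end(), line))
            i = m.end()
        elif ident_start(c):
            j = i + 1
            while j < len(src) and ident_rest(src[j]):
                j += 1
            tokens.append(Token("ident", src[i:j], i, j, line))
            i = j
        else:
            tokens.append(Token("punct", c, i, i + 1, line))
            i += 1
    return tokens
```

The comment marker is a parameter so that the same loop can serve guests with different comment syntax, which is the kind of variation the two-consumer rule is meant to flush out.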
Step 4: lib/guest/pratt.sx — Pratt / operator-precedence parser
Prefix / infix / postfix tables, left/right associativity, precedence climbing. Grammar is a dict, not hardcoded cond.
Port to: Lua expression parser (lua/parser.sx) and Prolog operator table (prolog/parser.sx — Prolog ops are the stress test). Two consumers.
Verify: lua + prolog scoreboards equal baseline.
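To make "grammar is a dict" concrete, here is a small precedence-climbing parser in Python. It is a sketch under assumptions: the operator table, the tuple-shaped output, and the pre-tokenised input are all illustrative, and a Prolog-style table would also need postfix operators and non-associative entries.

```python
# Hypothetical operator table; precedences and operators are illustrative.
GRAMMAR = {
    "prefix": {"-": 7},
    "infix": {
        "+": (5, "left"),
        "-": (5, "left"),
        "*": (6, "left"),
        "^": (8, "right"),
    },
}


def parse(tokens, grammar=GRAMMAR):
    """Parse a flat token list into nested tuples (op, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def expr(min_prec):
        tok = advance()
        if tok == "(":
            left = expr(0)
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok in grammar["prefix"]:
            left = (tok, expr(grammar["prefix"][tok]))
        else:
            left = tok  # atom: number or identifier
        while True:
            op = peek()
            if op not in grammar["infix"]:
                break
            prec, assoc = grammar["infix"][op]
            if prec < min_prec:
                break
            advance()
            # Left-assoc operators require a strictly tighter right operand.
            right = expr(prec + 1 if assoc == "left" else prec)
            left = (op, left, right)
        return left

    tree = expr(0)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree
```

Adding or changing an operator is a data edit to the table, with no change to the parsing loop; that is the property a guest with user-definable operators needs.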
Step 5: lib/guest/ast.sx — canonical AST node shapes
Standard constructors and predicates for: literal, var, app, lambda, let, letrec, if, match-clause, module, import. Optional — guests may keep their own AST — but using the canonical shape lets cross-language tooling (formatters, highlighters, debuggers) work without per-language adapters.
Port to: lua + prolog AST emitters. Two consumers.
Verify: lua + prolog scoreboards equal baseline.
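The payoff of a canonical shape is that one tool can walk any guest's tree. The Python sketch below assumes a dict-tagged node layout with a `kind` field; the field names and the `free_vars` tool are hypothetical, chosen only to show a language-independent pass over four of the node kinds listed above.

```python
def node(kind, **fields):
    """Build a tagged node; the "kind" key is an assumed convention."""
    return {"kind": kind, **fields}


def literal(value):
    return node("literal", value=value)


def var(name):
    return node("var", name=name)


def app(fn, args):
    return node("app", fn=fn, args=list(args))


def lam(params, body):
    return node("lambda", params=list(params), body=body)


def is_kind(n, kind):
    return isinstance(n, dict) and n.get("kind") == kind


def free_vars(n):
    """Names used but not bound, computed without knowing the guest."""
    if is_kind(n, "var"):
        return {n["name"]}
    if is_kind(n, "app"):
        found = free_vars(n["fn"])
        for arg in n["args"]:
            found |= free_vars(arg)
        return found
    if is_kind(n, "lambda"):
        return free_vars(n["body"]) - set(n["params"])
    return set()  # literals and unknown kinds contribute nothing
```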
Phase 3 — Semantic extractions (highest leverage, highest risk)
Step 6: lib/guest/match.sx — pattern-match + unification engine
Single engine for:
- Literal patterns (numbers, strings, symbols, nil, booleans).
- Wildcard _.
- Constructor patterns (ADT-shaped; depends on Phase 3 of sx-improvements.md if available, otherwise dict-tagged).
- Variable binding.
- Unification (Prolog flavour): symmetric, occurs-check toggle, substitution returned.
- Match (Haskell flavour): asymmetric pattern→value, bindings returned.
Port to: haskell/match.sx and prolog/query.sx unification core. Two consumers.
Verify: haskell + prolog scoreboards equal baseline. Highest-risk extraction — if either regresses by 1 test, revert and redesign.
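The difference between the two flavours is easiest to see side by side. The following Python sketch is an illustration, not the extracted engine: terms are tuples, variables are a hypothetical `Var` class, and the representation of substitutions as dicts is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


WILDCARD = Var("_")


def walk(t, subst):
    """Follow variable bindings until reaching a non-bound term."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t)


def unify(a, b, subst=None, occurs_check=True):
    """Symmetric: variables may appear on either side. None on failure."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, Var):
        if occurs_check and occurs(a, b, subst):
            return None
        return {**subst, a: b}
    if isinstance(b, Var):
        return unify(b, a, subst, occurs_check)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst, occurs_check)
            if subst is None:
                return None
        return subst
    return None


def match(pattern, value, bindings=None):
    """Asymmetric: only the pattern may contain variables."""
    bindings = {} if bindings is None else bindings
    if pattern == WILDCARD:
        return bindings
    if isinstance(pattern, Var):
        return {**bindings, pattern: value}
    if isinstance(pattern, tuple):
        if not isinstance(value, tuple) or len(value) != len(pattern):
            return None
        for p, v in zip(pattern, value):
            bindings = match(p, v, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == value else None
```

For example, `unify(("f", X, "b"), ("f", "a", Y))` binds variables from both sides, whereas `match` never inspects its second argument for variables. Both share the structural walk over tuples, which is the part a single engine would factor out.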
Step 7: lib/guest/layout.sx — significant-whitespace / off-side rule
Generalised layout-sensitive lexer. Configurable: which keywords open layout blocks, whether semicolons are inserted, brace insertion rules.
Port to: haskell/layout.sx (existing). Second consumer: write a synthetic test fixture that exercises a Python-ish layout to prove the kit is not Haskell-shaped. Two consumers.
Verify: haskell scoreboard equal baseline; synthetic layout fixture passes.
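A simplified version of the off-side rule, in Python, shows what the configuration knobs control. This sketch is much narrower than Haskell's layout algorithm and rests on stated assumptions: a block opens only when a line ends with an opener keyword, indentation is counted in spaces, tokens are whitespace-separated, and there is no implicit top-level block.

```python
def layout(src, openers=("do", "of", "where"), semicolons=True):
    """Turn indentation into explicit "{", ";" and "}" tokens."""
    out, stack, pending = [], [], False
    for raw in src.splitlines():
        words = raw.split()
        if not words:
            continue  # blank lines do not affect layout
        indent = len(raw) - len(raw.lstrip(" "))
        opened = False
        if pending:
            out.append("{")
            if indent > (stack[-1] if stack else -1):
                stack.append(indent)  # this line sets the block column
                opened = True
            else:
                out.append("}")  # opener followed by no deeper line
        if not opened:
            while stack and indent < stack[-1]:
                stack.pop()
                out.append("}")
            if semicolons and stack and indent == stack[-1]:
                out.append(";")
        out.extend(words)
        pending = words[-1] in openers
    if pending:
        out.extend(["{", "}"])
    out.extend("}" for _ in stack)
    return out
```

Passing a different `openers` tuple is how a synthetic, non-Haskell fixture could drive the same code, which is the point of the second consumer in this step.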
Step 8: lib/guest/hm.sx — Hindley-Milner type inference
Extract from haskell/infer.sx. Algorithm W or J, generalisation, instantiation, occurs-check, principal types.
Sequencing: this step is paired with plans/ocaml-on-sx.md Phase 5. The natural order is lib-guest Steps 0–7 → OCaml-on-SX Phases 1–5 → lib-guest Step 8. With OCaml-on-SX Phase 5 done, the two-language rule is satisfied for real (Haskell + OCaml). Without it, accept "second user TBD" — the alternative is letting the inference stay locked inside Haskell forever.
Port to: haskell/infer.sx and (preferred) lib/ocaml/types.sx.
Verify: haskell scoreboard equal baseline; if OCaml-on-SX Phase 5 has shipped, OCaml type-inference tests equal baseline too.
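The pieces named in Step 8 (generalisation, instantiation, occurs-check) fit in a compact sketch. The Python below is an Algorithm J-style variant with a mutable substitution, written for illustration only and not derived from haskell/infer.sx; the expression and type encodings (tuples, integer type variables) are assumptions.

```python
import itertools


class Infer:
    """Types: int = type variable; tuple = constructor, e.g. ("->", a, b)."""

    def __init__(self):
        self.subst = {}
        self.ids = itertools.count()

    def fresh(self):
        return next(self.ids)

    def resolve(self, t):
        """Apply the current substitution all the way down."""
        if isinstance(t, int):
            return self.resolve(self.subst[t]) if t in self.subst else t
        return (t[0],) + tuple(self.resolve(a) for a in t[1:])

    def free_vars(self, t):
        t = self.resolve(t)
        if isinstance(t, int):
            return {t}
        return set().union(*(self.free_vars(a) for a in t[1:]))

    def unify(self, a, b):
        a, b = self.resolve(a), self.resolve(b)
        if a == b:
            return
        if isinstance(a, int):
            if a in self.free_vars(b):  # occurs-check
                raise TypeError("infinite type")
            self.subst[a] = b
        elif isinstance(b, int):
            self.unify(b, a)
        elif a[0] == b[0] and len(a) == len(b):
            for x, y in zip(a[1:], b[1:]):
                self.unify(x, y)
        else:
            raise TypeError(f"cannot unify {a} with {b}")

    def generalise(self, env, t):
        """Quantify the variables of t that are not free in env."""
        env_vars = set()
        for quantified, body in env.values():
            env_vars |= self.free_vars(body) - set(quantified)
        return (sorted(self.free_vars(t) - env_vars), self.resolve(t))

    def instantiate(self, scheme):
        quantified, t = scheme
        mapping = {v: self.fresh() for v in quantified}

        def go(t):
            if isinstance(t, int):
                return mapping.get(t, t)
            return (t[0],) + tuple(go(a) for a in t[1:])

        return go(self.resolve(t))

    def infer(self, env, e):
        kind = e[0]
        if kind == "lit":
            return ("bool",) if isinstance(e[1], bool) else ("int",)
        if kind == "var":
            return self.instantiate(env[e[1]])
        if kind == "lam":
            arg = self.fresh()
            body = self.infer({**env, e[1]: ([], arg)}, e[2])
            return ("->", arg, body)
        if kind == "app":
            fn = self.infer(env, e[1])
            arg = self.infer(env, e[2])
            result = self.fresh()
            self.unify(fn, ("->", arg, result))
            return result
        if kind == "let":
            scheme = self.generalise(env, self.infer(env, e[2]))
            return self.infer({**env, e[1]: scheme}, e[3])
        raise ValueError(f"unknown expression kind: {kind}")


def type_of(e, env=None):
    inf = Infer()
    return inf.resolve(inf.infer(env or {}, e))
```

Nothing in this core mentions a particular guest's syntax: a guest supplies the translation into the small expression form and the initial environment, which is what would let a second consumer reuse it.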
Progress log
| Step | Status | Commit | Delta |
|---|---|---|---|
| 0 — baseline snapshot | [done] | 2f7f8189 | 11 guests captured: lua 185/185, forth 64/64, ruby 76/76, apl 73/73, prolog 590/590, common-lisp 309/309, smalltalk 625/629, tcl 3/4, haskell 0/18 programs, js 94/148 (slice), erlang 0/0 |
| 1 — conformance.sx (prolog + haskell) | [done] | 58dcff26 | Prolog 590/590 (matches baseline). Haskell 156/156: old script was broken (0/18 was an artefact of a never-matching grep), driver reveals true counts; baseline updated. |
| 2 — prefix.sx (common-lisp + lua) | [in-progress] | — | — |
| 3 — lex.sx (lua + tcl) | [ ] | — | — |
| 4 — pratt.sx (lua + prolog) | [ ] | — | — |
| 5 — ast.sx (lua + prolog) | [ ] | — | — |
| 6 — match.sx (haskell + prolog) | [ ] | — | — |
| 7 — layout.sx (haskell + synthetic) | [ ] | — | — |
| 8 — hm.sx (haskell + TBD) | [ ] | — | — |
Rules
- Branch: architecture. Commit locally. Never push. Never touch main.
- Scope: ONLY lib/guest/**, lib/{lua,prolog,haskell,common-lisp,tcl}/** (canaries + extraction targets), plans/lib-guest.md, plans/agent-briefings/lib-guest-loop.md. No spec/, hosts/, web/, shared/.
- SX files: sx-tree MCP tools only. sx_validate after every edit.
- No raw dune. Use sx_build target="ocaml" MCP tool.
- Two-language rule: never merge an extraction until two guests consume it (Step 8 excepted with explicit note).
- Conformance baseline is the bar. Any port whose scoreboard regresses by ≥1 test → revert, mark blocked, move on.
- Substrate change → re-snapshot. If spec/ or hosts/ changes underneath this loop, re-run Step 0 before continuing.
- One step per code commit. Plan updates as a separate commit. Short message with delta.
- No alias chains to paper over drift between extraction and consumer (feedback_no_alias_bloat).
- Partial extraction is OK if the canary works and a pending consumer is identified; mark [partial — pending <consumer>].
- Hard timeout: if stuck >45 min on a step, mark blocked (<reason>) and move on.