rose-ash/plans/lib-guest.md
giles d441807c8e GUEST-plan: claim step 3 — lex.sx
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:00:44 +00:00


lib/guest — shared toolkit for SX-hosted languages

Extract the duplicated plumbing across lib/{haskell,common-lisp,erlang,prolog,js,lua,smalltalk,tcl,forth,ruby,apl,hyperscript} into a small, composable kit so language N+1 costs ~200 lines instead of ~2000, without regressing any existing conformance scoreboard.

Branch: architecture. SX files via sx-tree MCP only. Never edit generated files.

Thesis

The substrate (CEK, hygienic macros, records, delimited continuations, IO suspension, reactivity) was chosen with multi-paradigm hosting in mind, but each guest currently re-rolls its own tokeniser, recursive-descent loop, conformance harness, and primitive-rename layer. Extracting these shared layers removes plumbing only; it does not reduce the bug-finding pressure that the conformance suites apply, so it is a pure win.

Canaries: Lua (small, conventional expression-grammar — exercises lex/Pratt/AST) and Prolog (paradigm-different — exercises pattern-match/unification). The two-canary rule prevents Lua-shaped abstractions.

Two-language rule: no extraction is merged until two guests consume it.

Current baseline

The loop fills these in on its first iteration by running every */conformance.sh and */test.sh and copying each scoreboard.json to lib/guest/baseline/<lang>.json. Until then:

| Guest | Suite | Baseline |
| --- | --- | --- |
| lua | bash lib/lua/test.sh | 185 / 185 |
| prolog | bash lib/prolog/conformance.sh | 590 / 590 |
| haskell | bash lib/haskell/conformance.sh | 156 / 156 (was reported 0/18 by the buggy old script) |
| common-lisp | bash lib/common-lisp/conformance.sh | 518 / 518 (Phase 2 +182 and Phase 6 +27 were previously under-counted) |
| erlang | bash lib/erlang/conformance.sh | 0 / 0 (suite all-zero) |
| js | bash lib/js/conformance.sh | 94 / 148 (test262-slice) |
| smalltalk | bash lib/smalltalk/conformance.sh | 625 / 629 |
| tcl | bash lib/tcl/conformance.sh | 3 / 4 (programs) |
| forth | bash lib/forth/test.sh | 64 / 64 |
| ruby | bash lib/ruby/test.sh | 76 / 76 |
| apl | bash lib/apl/test.sh | 73 / 73 |

The baseline only needs to be re-snapshotted when the substrate (spec/**, hosts/**) changes underneath this loop.


Phase 0 — Baseline snapshot (one-shot)

Step 0: Snapshot every guest's scoreboard

Create lib/guest/baseline/. Run every guest's conformance/test runner. Copy each scoreboard.json (or extract pass/fail counts from test.sh output for guests without a scoreboard) into lib/guest/baseline/<lang>.json. Fill in the table above.

Verify: ls lib/guest/baseline/*.json shows one per guest. Plan table populated.
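The "equal baseline" check that later steps rely on can be sketched as follows. This is an illustration in Python, not part of the SX toolchain; the JSON field names `passed` and `total` are assumptions, since the plan does not spell out the scoreboard schema, and a drop in the pass count is taken here as the definition of a regression.

```python
import json


def counts(scoreboard_json):
    """Extract (passed, total) from scoreboard JSON text.

    The field names "passed" and "total" are assumptions.
    """
    data = json.loads(scoreboard_json)
    return data["passed"], data["total"]


def check(lang, baseline_json, current_json):
    """Return a one-line verdict comparing a fresh run with the baseline."""
    base, cur = counts(baseline_json), counts(current_json)
    if cur[0] < base[0]:
        return f"{lang}: REGRESSED {cur[0]}/{cur[1]} (baseline {base[0]}/{base[1]})"
    return f"{lang}: ok {cur[0]}/{cur[1]}"


baseline = json.dumps({"passed": 185, "total": 185})
print(check("lua", baseline, json.dumps({"passed": 185, "total": 185})))
print(check("lua", baseline, json.dumps({"passed": 184, "total": 185})))
```

An improvement over the baseline is reported as ok under this definition; only a lower pass count trips the check.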


Phase 1 — Cheap, zero-semantic-risk extractions

Step 1: lib/guest/conformance.sx — config-driven test runner

Replace the 6+ near-identical */conformance.sh scripts with one driver that takes a config dict:

{:lang "prolog"
 :loads ("lib/prolog/tokenizer.sx" "lib/prolog/parser.sx" ...)
 :suites (("parse" "lib/prolog/tests/parse.sx" "pl-parse-tests-run!") ...)}

The driver locates sx_server.exe, runs the epoch protocol, collects pass/fail per suite, and writes scoreboard.{json,md}. The per-language conformance.sh becomes a 3-line stub that points at its config.
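To make the driver's shape concrete, here is a rough sketch in Python rather than SX. It mirrors the config dict above and shows only the aggregation and scoreboard rendering. The `run_suite` callback is hypothetical and stands in for the sx_server.exe epoch protocol, which is not modelled; the second suite and all counts are made up.

```python
import json


def run_all(config, run_suite):
    """Run every suite in the config and aggregate pass/fail counts.

    `run_suite(path, entry)` is a hypothetical callback returning
    (passed, failed); a real driver would talk to sx_server.exe instead.
    """
    suites = []
    for name, path, entry in config["suites"]:
        passed, failed = run_suite(path, entry)
        suites.append({"name": name, "passed": passed, "failed": failed})
    return {
        "lang": config["lang"],
        "passed": sum(s["passed"] for s in suites),
        "total": sum(s["passed"] + s["failed"] for s in suites),
        "suites": suites,
    }


def to_markdown(board):
    """Render the scoreboard as a small markdown table."""
    lines = [
        f"# {board['lang']}: {board['passed']} / {board['total']}",
        "",
        "| Suite | Passed | Failed |",
        "| --- | --- | --- |",
    ]
    for s in board["suites"]:
        lines.append(f"| {s['name']} | {s['passed']} | {s['failed']} |")
    return "\n".join(lines)


# Same shape as the SX config; the "demo" suite is invented for the example.
config = {
    "lang": "prolog",
    "suites": [
        ("parse", "lib/prolog/tests/parse.sx", "pl-parse-tests-run!"),
        ("demo", "lib/prolog/tests/demo.sx", "pl-demo-tests-run!"),
    ],
}

# Fake runner so the sketch executes without an SX server.
fake_results = {"pl-parse-tests-run!": (10, 0), "pl-demo-tests-run!": (4, 1)}
board = run_all(config, lambda path, entry: fake_results[entry])
print(json.dumps(board, indent=2))
print(to_markdown(board))
```

Keeping the runner behind a callback is one way to let the aggregation logic be tested without a server; whether the real driver is split this way is an open design choice.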

Port to: lib/prolog/conformance.sh and lib/haskell/conformance.sh. Two consumers required for merge.

Verify: both bash lib/prolog/conformance.sh and bash lib/haskell/conformance.sh produce scoreboard JSONs equal to baseline.

Step 2: lib/guest/prefix.sx — prefix-rename macro

One macro that takes a prefix and a list of SX symbols and binds prefixed aliases:

(prefix-rename "cl-" '(null? pair? even? odd? zero? ...))

Replaces hundreds of hand-written (define (cl-null? x) (= x nil))-style wrappers in common-lisp/runtime.sx, lua/runtime.sx, erlang/runtime.sx.
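The effect of the macro can be approximated in Python, with a dict standing in for the SX global scope. This is only an analogy: the real thing is a hygienic SX macro, and the toy `env` below with its three predicates is made up for illustration.

```python
def prefix_rename(namespace, prefix, names):
    """Bind `prefix + name` to the same object as `name`, for each name."""
    for name in names:
        namespace[prefix + name] = namespace[name]


# A toy environment standing in for the SX global scope (assumption).
env = {
    "null?": lambda x: x is None,
    "even?": lambda n: n % 2 == 0,
    "zero?": lambda n: n == 0,
}

prefix_rename(env, "cl-", ["null?", "even?", "zero?"])
print(sorted(env))
print(env["cl-even?"](4))
```

Each alias is the same object as the original, so no wrapper call is added; this only works for pure same-name aliases, not for definitions that wrap custom logic.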

Port to: common-lisp/runtime.sx (largest user) and lua/runtime.sx. Two consumers.

Verify: common-lisp + lua scoreboards equal baseline.


Phase 2 — Lex / parse kit

Step 3: lib/guest/lex.sx — character-class + tokeniser primitives

  • Source-position tracking (line/col/offset).
  • Character-class predicates (whitespace?, digit?, alpha?, ident-start?, ident-rest?).
  • Number recognisers (decimal, hex, float, scientific).
  • String recognisers (quoted, escapes, raw).
  • Comment recognisers (line, block, nestable).
  • Token record {:type :value :pos :end :line}.
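A minimal sketch of how these pieces fit together, in Python rather than SX. It covers only position tracking, ASCII character-class predicates, decimal integers, and a token record with the five fields listed above; strings, comments, and the other number forms are left out. Function names are hypothetical renderings of the SX predicates.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str
    pos: int    # offset of the first character
    end: int    # offset one past the last character
    line: int   # 1-based line of the first character


def is_digit(c):
    return "0" <= c <= "9"


def is_ident_start(c):
    return (c.isascii() and c.isalpha()) or c == "_"


def is_ident_rest(c):
    return is_ident_start(c) or is_digit(c)


def scan_while(src, i, pred):
    """Return the offset just past the run of characters satisfying pred."""
    while i < len(src) and pred(src[i]):
        i += 1
    return i


def tokenize(src):
    """Split src into ident, number and punct tokens, skipping whitespace."""
    tokens, i, line = [], 0, 1
    while i < len(src):
        c = src[i]
        if c == "\n":
            line += 1
            i += 1
        elif c.isspace():
            i += 1
        elif is_ident_start(c):
            j = scan_while(src, i + 1, is_ident_rest)
            tokens.append(Token("ident", src[i:j], i, j, line))
            i = j
        elif is_digit(c):
            j = scan_while(src, i + 1, is_digit)
            tokens.append(Token("number", src[i:j], i, j, line))
            i = j
        else:
            tokens.append(Token("punct", c, i, i + 1, line))
            i += 1
    return tokens


for tok in tokenize("x1 = 42\ny"):
    print(tok)
```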

Port to: lua/tokenizer.sx and tcl/tokenizer.sx. Two consumers.

Verify: lua + tcl scoreboards equal baseline.

Step 4: lib/guest/pratt.sx — Pratt / operator-precedence parser

Prefix / infix / postfix tables, left/right associativity, precedence climbing. The grammar is a dict, not a hardcoded cond.
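As a sketch of what "grammar is a dict" could look like, here is a small precedence-climbing parser in Python. The operator set and binding powers are illustrative assumptions, postfix operators are omitted, and tokens are plain strings rather than token records.

```python
# Grammar as data: (binding power, associativity) per infix operator,
# plus binding powers for prefix operators. Values are illustrative only.
GRAMMAR = {
    "infix": {
        "+": (10, "left"),
        "-": (10, "left"),
        "*": (20, "left"),
        "^": (30, "right"),
    },
    "prefix": {"-": 25},
}


def parse(tokens, grammar=GRAMMAR):
    """Parse a list of string tokens into nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def expr(min_bp):
        tok = advance()
        if tok == "(":
            left = expr(0)
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok in grammar["prefix"]:
            left = (tok, expr(grammar["prefix"][tok]))
        else:
            left = tok  # atom: number or identifier
        while peek() in grammar["infix"]:
            op = peek()
            bp, assoc = grammar["infix"][op]
            if bp < min_bp:
                break
            advance()
            # Left-associative: the right operand must bind strictly tighter.
            right = expr(bp + 1 if assoc == "left" else bp)
            left = (op, left, right)
        return left

    tree = expr(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(["1", "+", "2", "*", "3"]))
print(parse(["2", "^", "3", "^", "2"]))
```

Because the tables are ordinary data, a guest such as Prolog that defines operators at runtime could in principle extend the dict instead of editing the parser.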

Port to: Lua expression parser (lua/parser.sx) and Prolog operator table (prolog/parser.sx — Prolog ops are the stress test). Two consumers.

Verify: lua + prolog scoreboards equal baseline.

Step 5: lib/guest/ast.sx — canonical AST node shapes

Standard constructors and predicates for: literal, var, app, lambda, let, letrec, if, match-clause, module, import. Optional — guests may keep their own AST — but using the canonical shape lets cross-language tooling (formatters, highlighters, debuggers) work without per-language adapters.
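One possible shape for such nodes, sketched in Python with dict-tagged records. The `kind` tag and the field names are assumptions, only three of the listed node types are shown, and `var_names` is a hypothetical example of a tool that works on any guest emitting this shape.

```python
def node(kind, **fields):
    """Build a dict-tagged AST node; the "kind" key is the tag."""
    return {"kind": kind, **fields}


def literal(value):
    return node("literal", value=value)


def var(name):
    return node("var", name=name)


def app(fn, args):
    return node("app", fn=fn, args=list(args))


def is_kind(n, kind):
    """Predicate shared by all node types."""
    return isinstance(n, dict) and n.get("kind") == kind


def var_names(n):
    """Collect every variable name mentioned in a tree."""
    if is_kind(n, "var"):
        return {n["name"]}
    if is_kind(n, "app"):
        names = var_names(n["fn"])
        for arg in n["args"]:
            names |= var_names(arg)
        return names
    return set()


tree = app(var("f"), [var("x"), literal(1)])
print(tree)
print(sorted(var_names(tree)))
```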

Port to: lua + prolog AST emitters. Two consumers.

Verify: lua + prolog scoreboards equal baseline.


Phase 3 — Semantic extractions (highest leverage, highest risk)

Step 6: lib/guest/match.sx — pattern-match + unification engine

Single engine for:

  • Literal patterns (numbers, strings, symbols, nil, booleans).
  • Wildcard _.
  • Constructor patterns (ADT-shaped — depends on Phase 3 of sx-improvements.md if available, otherwise dict-tagged).
  • Variable binding.
  • Unification (Prolog flavour): symmetric, occurs-check toggle, substitution returned.
  • Match (Haskell flavour): asymmetric pattern→value, bindings returned.

Port to: haskell/match.sx and prolog/query.sx unification core. Two consumers.

Verify: haskell + prolog scoreboards equal baseline. Highest-risk extraction — if either regresses by 1 test, revert and redesign.

Step 7: lib/guest/layout.sx — significant-whitespace / off-side rule

Generalised layout-sensitive lexer. Configurable: which keywords open layout blocks, whether semicolons are inserted, brace insertion rules.
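A deliberately reduced version of the off-side rule, sketched in Python. It keys on indentation alone (spaces only), works line by line, and exposes a single `insert_semicolons` option; keyword-triggered blocks and the full Haskell brace-insertion rules are not modelled.

```python
def layout(src, insert_semicolons=True):
    """Turn indentation into explicit "{", "}" and ";" tokens.

    Simplified off-side rule: a deeper line opens a block, a shallower
    line closes blocks until its indentation matches an enclosing level.
    """
    out = []
    stack = [0]   # indentation of each open block, outermost first
    first = True
    for raw in src.splitlines():
        if not raw.strip():
            continue  # blank lines never affect layout
        indent = len(raw) - len(raw.lstrip(" "))
        if indent > stack[-1]:
            stack.append(indent)
            out.append("{")
        else:
            while indent < stack[-1]:
                stack.pop()
                out.append("}")
            if indent != stack[-1]:
                raise SyntaxError(f"inconsistent dedent: {raw!r}")
            if insert_semicolons and not first:
                out.append(";")
        out.append(raw.strip())
        first = False
    while len(stack) > 1:  # close blocks still open at end of input
        stack.pop()
        out.append("}")
    return out


print(layout("if x:\n  a\n  b\nc"))
```

A Python-ish fixture like the one in the last line is the kind of input the synthetic second consumer could use.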

Port to: haskell/layout.sx (existing). Second consumer: write a synthetic test fixture that exercises a Python-ish layout to prove the kit is not Haskell-shaped. Two consumers.

Verify: haskell scoreboard equal baseline; synthetic layout fixture passes.

Step 8: lib/guest/hm.sx — Hindley-Milner type inference

Extract from haskell/infer.sx. Algorithm W or J, generalisation, instantiation, occurs-check, principal types.
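For orientation, the listed ingredients fit in a small Hindley-Milner sketch. This is Python, not the code in haskell/infer.sx; it threads one substitution through the traversal, handles only literals, variables, lambda, application and let, and uses made-up tuple encodings for expressions and types.

```python
import itertools

INT, BOOL = ("con", "Int"), ("con", "Bool")
_ids = itertools.count()

class InferError(Exception):
    pass

def fresh():
    return ("var", next(_ids))

def fn(a, b):
    return ("fn", a, b)

def apply(s, t):
    """Resolve type t under substitution s (dict: var id -> type)."""
    if t[0] == "var":
        return apply(s, s[t[1]]) if t[1] in s else t
    if t[0] == "fn":
        return fn(apply(s, t[1]), apply(s, t[2]))
    return t

def ftv(t):
    """Free type variables of an already-resolved type."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "fn":
        return ftv(t[1]) | ftv(t[2])
    return set()

def unify(a, b, s):
    a, b = apply(s, a), apply(s, b)
    if a == b:
        return s
    if a[0] == "var":
        if a[1] in ftv(b):
            raise InferError("occurs check failed")
        return {**s, a[1]: b}
    if b[0] == "var":
        return unify(b, a, s)
    if a[0] == "fn" and b[0] == "fn":
        return unify(a[2], b[2], unify(a[1], b[1], s))
    raise InferError(f"cannot unify {a} with {b}")

def instantiate(scheme):
    """Replace each quantified variable with a fresh one."""
    quantified, t = scheme
    return apply({q: fresh() for q in quantified}, t)

def generalize(env, t, s):
    """Quantify the variables of t that are not free in the environment."""
    t = apply(s, t)
    env_ftv = set()
    for quantified, et in env.values():
        env_ftv |= ftv(apply(s, et)) - set(quantified)
    return (sorted(ftv(t) - env_ftv), t)

def infer(env, e, s):
    """Return (type, substitution) for expression e."""
    kind = e[0]
    if kind == "int":
        return INT, s
    if kind == "bool":
        return BOOL, s
    if kind == "var":
        if e[1] not in env:
            raise InferError(f"unbound variable {e[1]}")
        return instantiate(env[e[1]]), s
    if kind == "lam":   # ("lam", x, body)
        tv = fresh()
        tb, s = infer({**env, e[1]: ([], tv)}, e[2], s)
        return fn(tv, tb), s
    if kind == "app":   # ("app", f, arg)
        tf, s = infer(env, e[1], s)
        ta, s = infer(env, e[2], s)
        tr = fresh()
        return tr, unify(tf, fn(ta, tr), s)
    if kind == "let":   # ("let", x, e1, e2)
        t1, s = infer(env, e[2], s)
        return infer({**env, e[1]: generalize(env, t1, s)}, e[3], s)
    raise ValueError(f"unknown expression {e!r}")

def type_of(expr):
    t, s = infer({}, expr, {})
    return apply(s, t)

identity = ("lam", "x", ("var", "x"))
print(type_of(("let", "id", identity, ("app", ("var", "id"), ("int", 1)))))
```

The let case is where generalisation happens: `id` gets a quantified scheme, so each use is instantiated afresh, whereas a lambda-bound variable stays monomorphic.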

Sequencing: this step is paired with plans/ocaml-on-sx.md Phase 5. The natural order is lib-guest Steps 0-7 → OCaml-on-SX Phases 1-5 → lib-guest Step 8. With OCaml-on-SX Phase 5 done, the two-language rule is satisfied for real (Haskell + OCaml). Without it, accept "second user TBD"; the alternative is letting the inference stay locked inside Haskell forever.

Port to: haskell/infer.sx and (preferred) lib/ocaml/types.sx.

Verify: haskell scoreboard equal baseline; if OCaml-on-SX Phase 5 has shipped, OCaml type-inference tests equal baseline too.


Progress log

| Step | Status | Commit | Delta |
| --- | --- | --- | --- |
| 0 — baseline snapshot | [done] | 2f7f8189 | 11 guests captured: lua 185/185, forth 64/64, ruby 76/76, apl 73/73, prolog 590/590, common-lisp 309/309, smalltalk 625/629, tcl 3/4, haskell 0/18 programs, js 94/148 (slice), erlang 0/0 |
| 1 — conformance.sx (prolog + haskell) | [done] | 58dcff26 | Prolog 590/590 (matches baseline). Haskell 156/156: the old script was broken (0/18 was an artefact of a never-matching grep), the driver reveals the true counts; baseline updated. |
| 2 — prefix.sx (common-lisp + lua) | [partial — pending lua] | 2ef773a3 | common-lisp/runtime.sx ported (47 aliases collapsed into 13 prefix-rename calls); 518/518 vs 309/309 baseline (improvement, no regression). lua/runtime.sx has no pure same-name aliases; every lua- definition wraps custom logic, so the second consumer is pending. |
| 3 — lex.sx (lua + tcl) | [in-progress] | | |
| 4 — pratt.sx (lua + prolog) | [ ] | | |
| 5 — ast.sx (lua + prolog) | [ ] | | |
| 6 — match.sx (haskell + prolog) | [ ] | | |
| 7 — layout.sx (haskell + synthetic) | [ ] | | |
| 8 — hm.sx (haskell + TBD) | [ ] | | |

Rules

  • Branch: architecture. Commit locally. Never push. Never touch main.
  • Scope: ONLY lib/guest/**, lib/{lua,prolog,haskell,common-lisp,tcl}/** (canaries + extraction targets), plans/lib-guest.md, plans/agent-briefings/lib-guest-loop.md. No spec/, hosts/, web/, shared/.
  • SX files: sx-tree MCP tools only. sx_validate after every edit.
  • No raw dune. Use sx_build target="ocaml" MCP tool.
  • Two-language rule: never merge an extraction until two guests consume it (Step 8 excepted with explicit note).
  • Conformance baseline is the bar. Any port whose scoreboard regresses by ≥1 test → revert, mark blocked, move on.
  • Substrate change → re-snapshot. If spec/ or hosts/ changes underneath this loop, re-run Step 0 before continuing.
  • One step per code commit. Plan updates as a separate commit. Short message with delta.
  • No alias chains to paper over drift between extraction and consumer (feedback_no_alias_bloat).
  • Partial extraction is OK if the canary works and a pending consumer is identified — mark [partial — pending <consumer>].
  • Hard timeout: if stuck >45 min on a step, mark blocked (<reason>) and move on.