SX Language Improvements — roadmap
Language-building improvements to the SX evaluator, compiler, and standard library. Ordered by impact and prerequisite chain. Each step is one loop commit.
Branch: architecture. SX files via sx-tree MCP only. Never edit generated files.
Current baseline (2026-05-06)
- SX core spec: 2571 passing (595 non-HS pre-existing failures — bytecode-serialize, defcomp-render, etc.)
- HyperScript behavioral: 1478/1496 (run via node tests/hs-kernel-eval.js)
- Active bugs: JIT combinator bug (11 HS failures), letrec+resume (browser-only)
- E38 sourceInfo: 2/4 tests passing (tokenizer missing :end/:line, some spans incomplete)
Phase 1 — Bug fixes
Step 1: Fix JIT closures-returning-closures
What: parse-bind, many, seq, and other parser combinators that return closures
miscompile under JIT. The compiled closure drops intermediate stack values when the
callee itself returns a closure. 11 HyperScript tests fail under JIT, pass under CEK.
Root cause in hosts/ocaml/lib/sx_vm.ml: When a JIT-compiled closure returns
another closure (i.e. the callee is VmClosure), the frame restoration after the
call incorrectly reuses the parent frame's locals slot, overwriting saved intermediate
values. The call_closure_reuse path must snapshot sp before the inner call and
restore it after, or bail to the non-reuse path for closures-returning-closures.
Verify: node tests/hs-kernel-eval.js 2>&1 | tail -3 — should go from 3116/3127 to 3127/3127.
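The snapshot-and-restore rule can be illustrated with a toy stack machine. This is only a sketch in JavaScript, not the OCaml in sx_vm.ml: ToyVm, callReuse, callReuseSafe, makeParser, and demo are hypothetical names, and the frame clobbering is simulated by resetting sp inside the callee.

```javascript
// Toy model only; none of these names exist in sx_vm.ml.
class ToyVm {
  constructor() {
    this.stack = [];
    this.sp = 0; // index of the next free slot
  }
  push(v) { this.stack[this.sp++] = v; }
  pop() { return this.stack[--this.sp]; }
}

// Buggy variant: the callee may move sp and nothing puts it back.
function callReuse(vm, callee) {
  return callee(vm);
}

// Fixed variant: snapshot sp before the inner call, restore it after.
function callReuseSafe(vm, callee) {
  const savedSp = vm.sp;
  const result = callee(vm);
  vm.sp = savedSp;
  return result;
}

// A callee that returns a closure and leaves sp lower than it found it,
// standing in for the reuse of the parent frame's locals slot.
function makeParser(vm) {
  vm.sp = 0;
  return (input) => input.length;
}

// The caller keeps an intermediate value on the stack across the call.
function demo(call) {
  const vm = new ToyVm();
  vm.push("intermediate");
  const parser = call(vm, makeParser);
  vm.push(parser);
  vm.pop();        // discard the parser
  return vm.pop(); // the intermediate value, if it survived
}
```

With callReuse the returned closure is pushed over the saved value, so demo no longer finds "intermediate"; with callReuseSafe it does.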
Step 2: Fix letrec + perform resume (browser)
What: In browser JIT mode, letrec sibling bindings are nil after a perform/resume
cycle. call_closure_reuse in sx_browser.ml intentionally ignores _saved_sp, which
strips the frame locals that sf_letrec was waiting on.
Fix: In sx_browser.ml, the VmSuspension resume path must restore frame locals
from the suspension snapshot before calling the continuation. Mirror what sx_vm.ml
does in the non-browser case.
Verify: Write a test in spec/tests/ that runs (letrec ((f (fn () (perform :io nil)))) (f)) with a resume and checks that the bindings survive. Runs under OCaml: dune exec -- bin/run_tests.exe.
Phase 2 — Source info (E38 completion)
Design: plans/designs/e38-sourceinfo.md. Target: 4/4 sourceInfo tests.
The API (hs-parse-ast, hs-source-for, hs-line-for, hs-node-get, hs-src,
hs-src-at, hs-line-at) and parser span wrapping (hs-ast-wrap, hs-span-mode)
are already in the codebase. Two tests are passing; two fail because:
- Tokenizer tokens lack :end and :line (only :pos today).
- Some statement-level spans and :next field navigation are incomplete.
Step 3: Tokenizer — add :end and :line to tokens
lib/hyperscript/tokenizer.sx: extend hs-make-token to {:pos :end :value :type :line}.
Track a current-line counter (1-based, increments after \n). Update all ~20 emission
sites. Mirror to shared/static/wasm/sx/hs-tokenizer.sx after edits.
Verify: (hs-make-token "NUMBER" "1" 0) returns a dict with :end and :line keys.
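As a rough illustration of the token shape, here is a JavaScript sketch of a tokenizer that records pos, end, and line. The real code is SX in lib/hyperscript/tokenizer.sx; makeToken and tokenize are hypothetical names, the token classes are simplified to NUMBER/IDENT/OP, and treating end as exclusive (pos plus value length) is an assumption.

```javascript
// Hypothetical sketch; mirrors the {:pos :end :value :type :line} dict shape.
function makeToken(type, value, pos, line) {
  // Assumption: `end` is exclusive, i.e. one past the last character.
  return { pos, end: pos + value.length, value, type, line };
}

function tokenize(src) {
  const tokens = [];
  let pos = 0;
  let line = 1; // 1-based, incremented after each "\n"
  while (pos < src.length) {
    const ch = src[pos];
    if (ch === "\n") { line += 1; pos += 1; continue; }
    if (ch === " " || ch === "\t" || ch === "\r") { pos += 1; continue; }
    const m = /^(\d+|[A-Za-z_][A-Za-z0-9_]*)/.exec(src.slice(pos));
    if (m) {
      const type = /^\d/.test(m[0]) ? "NUMBER" : "IDENT";
      tokens.push(makeToken(type, m[0], pos, line));
      pos += m[0].length;
    } else {
      // Anything else becomes a single-character operator token.
      tokens.push(makeToken("OP", ch, pos, line));
      pos += 1;
    }
  }
  return tokens;
}

const sample = tokenize("set x\nto 42");
```

Keeping the line counter next to pos means every emission site only has to pass the two current values through to the token constructor.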
Step 4: Complete parser spans + :next field
lib/hyperscript/parser.sx: ensure hs-ast-wrap populates :next on every command
in a CommandList (i.e. the following sibling command). Check that statement-level
productions (if, for) correctly populate :true-branch. Trace through the two failing
tests (get source works for expressions, get line works for statements) to find the
exact missing fields or off-by-one positions.
Mirror to shared/static/wasm/sx/hs-parser.sx.
Verify: All 4 hs-upstream-core/sourceInfo tests pass.
Outcome: Subsumed by Step 3. Once tokens carried :end and :line, the existing
parser plumbing (link-next-cmds for :next, :true-branch extraction in parse-cmd)
worked end-to-end. All 4 hs-upstream-core/sourceInfo tests pass with no parser changes.
Phase 3 — Native ADTs (define-type / match)
Design: plans/designs/sx-adt.md. No existing implementation.
Impact: every language implementation (Haskell, Prolog, Lua, Common Lisp, Erlang)
currently fakes sum types with {:tag "..." :field ...} dicts. Native ADTs remove
that everywhere.
Step 5: OCaml — AdtValue type + define-type + basic match
hosts/ocaml/lib/sx_types.ml:
type adt_value = { av_type: string; av_ctor: string; av_fields: value array }
(* new case added to the existing value variant: *)
| AdtValue of adt_value
hosts/ocaml/lib/sx_runtime.ml (or evaluator):
- step-sf-define-type: parse (Name (Ctor1 f1 f2) (Ctor2) ...), register constructor NativeFns, predicates (Ctor1?, Name?), field accessors (Ctor1-f1) via env-bind!.
- step-sf-match + MatchFrame: linear scan of clauses; flat patterns only for 6a; bind pattern variables in child env; else clause; raise on no match.
- type-of returns the type name (e.g. "Maybe").
Write tests in spec/tests/test-adt.sx: basic constructor, predicate, accessor, match,
else, no-match raise.
Verify: dune exec -- bin/run_tests.exe — new test file all green.
Step 6: JS — AdtValue + define-type + match
hosts/javascript/platform.py: add AdtValue as { _adt: true, _type, _ctor, _fields }.
Mirror define-type and match special forms in the JS evaluator.
Retranspile: python3 hosts/javascript/cli.py --output shared/static/scripts/sx-browser.js
Verify: node hosts/javascript/run_tests.js — adt tests pass on JS too.
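What define-type registers can be sketched on the JS side as follows. Only the value shape { _adt, _type, _ctor, _fields } comes from this plan; defineType, the plain-object env, and representing nullary constructors as zero-argument functions are assumptions made for illustration.

```javascript
// Hypothetical sketch of the bindings define-type would create.
function defineType(env, typeName, ctors) {
  // Type-level predicate, e.g. Maybe?
  env[typeName + "?"] = (v) => Boolean(v && v._adt && v._type === typeName);

  for (const [ctorName, fieldNames] of ctors) {
    // Constructor, e.g. (Just 1)
    env[ctorName] = (...args) => {
      if (args.length !== fieldNames.length) {
        throw new Error(`${ctorName}: expected ${fieldNames.length} args, got ${args.length}`);
      }
      return { _adt: true, _type: typeName, _ctor: ctorName, _fields: args };
    };

    // Constructor predicate, e.g. Just?
    env[ctorName + "?"] = (v) =>
      Boolean(v && v._adt && v._type === typeName && v._ctor === ctorName);

    // Field accessors, e.g. Just-value
    fieldNames.forEach((field, i) => {
      env[`${ctorName}-${field}`] = (v) => {
        if (!env[ctorName + "?"](v)) {
          throw new Error(`${ctorName}-${field}: not a ${ctorName}`);
        }
        return v._fields[i];
      };
    });
  }
}

// Roughly (define-type Maybe (Just value) (Nothing))
const adtEnv = {};
defineType(adtEnv, "Maybe", [["Just", ["value"]], ["Nothing", []]]);
```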
Step 7: Nested patterns (Phase 6b)
Both OCaml and JS MatchFrame: replace linear binding with recursive
matchPattern(pattern, value, env) that:
- Recurses into constructor sub-patterns.
- Returns {matched: bool, bindings: map}.
- Handles wildcard _, literals (42, "str", true, nil).
Extend spec/tests/test-adt.sx with nested pattern tests.
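A minimal sketch of the recursive matcher described above, in JavaScript. The pattern encoding ({kind: "wild" | "var" | "lit" | "ctor"}) is a hypothetical stand-in for however the evaluator represents patterns, and this version threads a bindings Map instead of taking an env argument.

```javascript
// Hypothetical pattern encoding; only the {matched, bindings} result shape
// and the _adt/_ctor/_fields value shape come from the plan.
function matchPattern(pattern, value, bindings = new Map()) {
  switch (pattern.kind) {
    case "wild": // _
      return { matched: true, bindings };
    case "var": // binds the whole value
      bindings.set(pattern.name, value);
      return { matched: true, bindings };
    case "lit": // 42, "str", true, nil (null here)
      return { matched: pattern.value === value, bindings };
    case "ctor": {
      const ok =
        Boolean(value && value._adt) &&
        value._ctor === pattern.name &&
        value._fields.length === pattern.args.length;
      if (!ok) return { matched: false, bindings };
      // Recurse into each constructor sub-pattern.
      for (let i = 0; i < pattern.args.length; i++) {
        if (!matchPattern(pattern.args[i], value._fields[i], bindings).matched) {
          return { matched: false, bindings };
        }
      }
      return { matched: true, bindings };
    }
    default:
      throw new Error(`unknown pattern kind: ${pattern.kind}`);
  }
}

const just = (v) => ({ _adt: true, _type: "Maybe", _ctor: "Just", _fields: [v] });
// Roughly the pattern (Just (Just x))
const nested = {
  kind: "ctor", name: "Just",
  args: [{ kind: "ctor", name: "Just", args: [{ kind: "var", name: "x" }] }],
};
const nestedResult = matchPattern(nested, just(just(42)));
```

The bindings map is only meaningful when matched is true; a caller falling through to the next clause should start again with a fresh map.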
Outcome: No host-side changes needed. The spec-level match-pattern function
in spec/evaluator.sx (≈line 2835) already recurses through constructor
sub-patterns via the dict-shape shim ((get value :_adt|:_ctor|:_fields)),
handles _ wildcards, literals, and variable bindings. Step 7 added 8 new
deftests to spec/tests/test-adt.sx covering: nested constructor sanity,
nested constructor with field binding, nested wildcard, nested literal
equality, nested literal-vs-var clause fall-through, deeply nested constructors,
mixed bind+wildcard, and nested ctor fail-through. Both hosts: +8 tests pass,
zero regressions (OCaml 4532→4540, JS 2578→2586).
Step 8: Exhaustiveness warnings (Phase 6c)
_adt_registry: type_name → [ctor_names] global populated by define-type.
On first non-exhaustive match evaluation: console.warn("[sx] match: non-exhaustive …").
No error — warning only.
Outcome: host-warn primitive added on both hosts (OCaml prerr_endline,
JS console.warn). Spec-level helpers match-clause-is-else?,
match-clause-ctor-name, match-warn-non-exhaustive,
match-check-exhaustiveness added in spec/evaluator.sx and
called from step-sf-match. *adt-warned* env-bound dict used to
dedupe warnings per (type, missing-set). The OCaml step_sf_match
in hosts/ocaml/lib/sx_ref.ml was hand-patched (not retranspiled)
because sx_ref.ml retranspilation drops several preamble fixes;
the spec changes still flow to JS via sx_build target="js". Both
hosts emit identical warnings (e.g. [sx] match: non-exhaustive — Maybe: missing Nothing).
5 new tests added. OCaml: 4540 → 4545. JS: 2586 → 2591. Zero regressions.
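The registry lookup and per-(type, missing-set) dedupe can be sketched like this. It is a JavaScript approximation, not the spec code in spec/evaluator.sx: adtRegistry, warnedKeys, registerType, and checkExhaustive are hypothetical names, clauses are reduced to a list of constructor names plus an optional "else", and the ", " separator for multiple missing constructors is an assumption.

```javascript
const adtRegistry = new Map(); // type name -> array of constructor names
const warnedKeys = new Set();  // dedupe key: "Type|Missing1,Missing2"

function registerType(typeName, ctorNames) {
  adtRegistry.set(typeName, ctorNames);
}

// clauseCtors: constructor names covered by the match, or "else".
// Returns the list of missing constructors; warns at most once per key.
function checkExhaustive(typeName, clauseCtors, warn = console.warn) {
  if (clauseCtors.includes("else")) return [];
  const all = adtRegistry.get(typeName) || [];
  const missing = all.filter((c) => !clauseCtors.includes(c));
  if (missing.length === 0) return missing;

  const key = `${typeName}|${missing.join(",")}`;
  if (!warnedKeys.has(key)) {
    warnedKeys.add(key);
    // Warning only; evaluation continues.
    warn(`[sx] match: non-exhaustive — ${typeName}: missing ${missing.join(", ")}`);
  }
  return missing;
}

registerType("Maybe", ["Just", "Nothing"]);
```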
Phase 4 — Plugin / extension system
Design: plans/designs/hs-plugin-system.md.
Step 9: Parser feature registry
lib/hyperscript/parser.sx: replace parse-feat hardcoded cond with a dict lookup.
(hs-register-feature! name parse-fn) adds to the registry.
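The change from a hardcoded cond to a dict lookup looks roughly like the following in JavaScript. The real registry is SX in lib/hyperscript/parser.sx; registerFeature and parseFeat here are hypothetical analogues, and the token arrays and parse results are invented for the example.

```javascript
// Hypothetical sketch: feature name -> parse function.
const featureRegistry = new Map();

function registerFeature(name, parseFn) {
  featureRegistry.set(name, parseFn);
}

// Dispatch on the leading keyword instead of a fixed chain of conditions.
function parseFeat(tokens) {
  const head = tokens[0];
  const parseFn = featureRegistry.get(head);
  if (!parseFn) throw new Error(`unknown feature: ${head}`);
  return parseFn(tokens.slice(1));
}

// Built-in features register themselves the same way plugins would.
registerFeature("on", (rest) => ({ type: "on", event: rest[0] }));
registerFeature("init", (rest) => ({ type: "init", body: rest }));
```

New features then need no edit to parseFeat itself, which is the property the plugin steps rely on.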
Step 10: Compiler command registry + as converter registry
lib/hyperscript/compiler.sx: replace hs-to-sx hardcoded dispatch with dict.
(hs-register-command! name compile-fn) and (hs-register-converter! name convert-fn).
Step 11: Migrate hs-prolog-hook + Worker plugin
lib/hyperscript/runtime.sx: remove hs-prolog-hook/hs-set-prolog-hook! ad-hoc
slots. Create lib/hyperscript/plugins/prolog.sx that calls hs-register-feature!
and hs-register-command!. Create lib/hyperscript/plugins/worker.sx replacing the
E39 stub.
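One way to picture the migration: instead of the runtime exposing a dedicated hook slot, a plugin is a function that receives the registration API and installs itself. The sketch below is JavaScript with hypothetical names (createHost, prologPlugin) and invented parse/compile results; it says nothing about what the real prolog.sx produces.

```javascript
// Hypothetical host exposing the two registries from Steps 9 and 10.
function createHost() {
  const features = new Map();
  const commands = new Map();
  return {
    features,
    commands,
    registerFeature: (name, parseFn) => features.set(name, parseFn),
    registerCommand: (name, compileFn) => commands.set(name, compileFn),
  };
}

// A plugin only talks to the registration API; the host has no
// prolog-specific slot to set or clear.
function prologPlugin(host) {
  host.registerFeature("prolog", (tokens) => ({ type: "prolog", source: tokens.join(" ") }));
  host.registerCommand("prolog", (node) => ["prolog-query", node.source]);
}

const pluginHost = createHost();
prologPlugin(pluginHost);
```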
Phase 5 — Performance
These are incremental and can interleave with other phases.
Step 12: Frame records (CEK)
hosts/ocaml/lib/sx_runtime.ml: represent CEK frames as OCaml records instead of
tagged variant lists. Eliminates allocation pressure from list construction per frame.
Profile before/after on a tight-loop benchmark.
Outcome: Frames were already records (cek_frame in sx_types.ml) — the actual
hot-path bottleneck was prim_call "=" [...] in step_continue/step_eval dispatch:
each step did a Hashtbl lookup + 2x list cons + pattern match per comparison. Added a
fast path in prim_call (sx_runtime.ml) for =, <, >, <=, >=, empty?,
first, rest, len that skips the table lookup entirely. Also inlined _fast_eq
for the common scalar-equality cases that dominate frame-type dispatch. Median
improvements (bench_cek.exe, 7 runs):
| Benchmark | Before | After | Change |
|---|---|---|---|
| fib(18) | 2789ms | 941ms | -66% |
| loop(5000) | 2018ms | 620ms | -69% |
| map sq(1000) | 108ms | 48ms | -56% |
| reduce + (2000) | 72ms | 10ms | -86% |
| let-heavy(2000) | 491ms | 271ms | -45% |
Tests: 4545 passing (unchanged baseline), 1339 failing (unchanged baseline).
Benchmark binary: bin/bench_cek.exe.
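The fast path amounts to handling the hottest primitive names inline before falling back to the table. A JavaScript sketch of that shape follows; the actual change is OCaml in sx_runtime.ml, and primTable, primCall, and the tableLookups counter are hypothetical names used only to make the skipped lookup observable.

```javascript
// Slow path: generic primitive table.
const primTable = new Map([
  ["=", (a, b) => a === b],
  ["<", (a, b) => a < b],
  ["len", (xs) => xs.length],
  ["str", (...xs) => xs.join("")],
]);
let tableLookups = 0; // instrumentation for the example only

function primCall(name, args) {
  // Fast path: common comparisons and list ops never touch the table.
  switch (name) {
    case "=": return args[0] === args[1];
    case "<": return args[0] < args[1];
    case "len": return args[0].length;
  }
  // Everything else goes through the table as before.
  tableLookups += 1;
  const fn = primTable.get(name);
  if (!fn) throw new Error(`unknown primitive: ${name}`);
  return fn(...args);
}
```

The JS numbers would not match the table above; the sketch only shows where the lookup is avoided.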
Step 13: Buffer primitive for string building
Add make-buffer, buffer-append!, buffer->string primitives. Eliminates the
(str a b c d ...) quadratic allocation pattern in serializers and renderers.
Wire into sx_primitives.ml and the JS platform.
Outcome: Short aliases make-buffer/buffer?/buffer-append!/buffer->string/
buffer-length added on both hosts, sharing the existing StringBuffer value type.
buffer-append! accepts any value (auto-coerces non-strings via inspect), unlike
string-buffer-append! which is strict. The hot path converted was the OCaml
host-internal inspect function in sx_types.ml: rewrote from (... ^ String.concat " " (List.map inspect items) ^ ...) (which allocates O(n) intermediate strings per
recursion level) to a single shared Buffer.t accumulator (inspect_into buf v
walks the value tree appending into one buffer). inspect is called by
sx-serialize on both spec and host paths, plus error-path formatting.
Timing improvements (bin/bench_inspect.exe, best of 3 runs of 9-run min):
| Benchmark | Baseline (best min) | Buffer (best min) | Change |
|---|---|---|---|
| tree-d8 (75KB) | 5.31ms | 1.30ms | -76% |
| tree-d10 (679KB) | 81.89ms | 16.02ms | -80% |
| dict-1000 | 0.80ms | 0.31ms | -61% |
| list-2000 | 0.74ms | 0.33ms | -55% |
5 new tests in spec/tests/test-string-buffer.sx covering the new aliases (including
non-string coercion and interop with the existing string-buffer-* API).
OCaml: 4545 → 4550. JS: 2591 → 2596. Zero regressions.
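The shape of the inspect rewrite, sketched in JavaScript: the first function builds a fresh string at every recursion level, the second threads one shared accumulator through the walk. The function names are hypothetical, values are reduced to arrays, strings, and scalars, and a JS array joined once at the end stands in for OCaml's Buffer.t; JS engines optimise string concatenation differently, so this shows structure rather than the measured speedup.

```javascript
// Before: every level allocates intermediate strings and concatenates them.
function inspectConcat(v) {
  if (Array.isArray(v)) return "(" + v.map(inspectConcat).join(" ") + ")";
  return typeof v === "string" ? JSON.stringify(v) : String(v);
}

// After: one accumulator is passed down; each node only appends to it.
function inspectInto(buf, v) {
  if (Array.isArray(v)) {
    buf.push("(");
    v.forEach((item, i) => {
      if (i > 0) buf.push(" ");
      inspectInto(buf, item);
    });
    buf.push(")");
  } else {
    buf.push(typeof v === "string" ? JSON.stringify(v) : String(v));
  }
}

function inspectBuffered(v) {
  const buf = [];
  inspectInto(buf, v);
  return buf.join(""); // single materialisation at the end
}
```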
Step 14: Inline common primitives in JIT
hosts/ocaml/lib/sx_vm.ml: add OP_ADD, OP_SUB, OP_EQ, OP_APPEND specialised
opcodes that skip the primitive table lookup for the most common calls. Compiler emits
these when operands are known numbers/lists.
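A toy bytecode loop shows the intended difference between a generic primitive call and a specialised opcode. This is a JavaScript sketch, not sx_vm.ml: the opcode numbering, compileAdd, runBytecode, and the operand-is-a-number-literal check are all assumptions made for the example.

```javascript
// Hypothetical opcode numbering.
const OP_CONST = 0, OP_CALL_PRIM = 1, OP_ADD = 2, OP_RETURN = 3;
const prims = new Map([["+", (a, b) => a + b]]);

// Emit OP_ADD only when both operands are known numbers at compile time;
// otherwise fall back to the generic table-driven call.
function compileAdd(a, b) {
  const known = typeof a === "number" && typeof b === "number";
  const call = known ? [OP_ADD] : [OP_CALL_PRIM, "+", 2];
  return [OP_CONST, a, OP_CONST, b, ...call, OP_RETURN];
}

function runBytecode(code) {
  const stack = [];
  let pc = 0;
  for (;;) {
    const op = code[pc++];
    switch (op) {
      case OP_CONST:
        stack.push(code[pc++]);
        break;
      case OP_ADD: { // specialised: no table lookup, no args array
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a + b);
        break;
      }
      case OP_CALL_PRIM: { // generic: name lookup plus argument collection
        const name = code[pc++];
        const argc = code[pc++];
        const args = stack.splice(stack.length - argc, argc);
        stack.push(prims.get(name)(...args));
        break;
      }
      case OP_RETURN:
        return stack.pop();
      default:
        throw new Error(`bad opcode ${op}`);
    }
  }
}
```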
Progress log
| Step | Status | Commit |
|---|---|---|
| 1 — JIT combinator bug | [x] | 882a4b76 |
| 2 — letrec+resume | [x] | e80e655b |
| 3 — tokenizer :end/:line | [x] | 023bc2d8 |
| 4 — parser spans complete | [x] | b7ad5152 (subsumed by 023bc2d8) |
| 5 — OCaml AdtValue + define-type + match | [x] | 1f49242a |
| 6 — JS AdtValue + define-type + match | [x] | fc8a3916 |
| 7 — nested patterns | [x] | 0679edf5 |
| 8 — exhaustiveness warnings | [x] | 6d391119 |
| 9 — parser feature registry | [x] | 986d6411 |
| 10 — compiler + as converter registry | [x] | d22361e4 |
| 11 — plugin migration + worker | [x] | 6328b810 |
| 12 — frame records | [x] | a66c0f66 (fib -66%, loop -69%, reduce -86% via prim_call fast path) |
| 13 — buffer primitive | [x] | 0e022ab6 (inspect rewrite: tree-d10 -80%, tree-d8 -76%, dict-1000 -61%, list-2000 -55%) |
| 14 — inline primitives JIT | [ ] | — |
Rules
- Branch: architecture. Never push to main.
- SX files: sx-tree MCP tools only. sx_validate after every edit.
- After every .sx edit to lib/hyperscript/, mirror to shared/static/wasm/sx/hs-<file>.sx.
- OCaml build: sx_build target="ocaml" MCP tool (never raw dune).
- JS build: sx_build target="js" MCP tool.
- One step per commit. Update progress log in this file.
- No new planning docs. No comments in SX unless non-obvious.
- Unicode in SX: raw UTF-8 only, never \uXXXX.