# SX Language Improvements: roadmap

Language-building improvements to the SX evaluator, compiler, and standard library.
Ordered by impact and prerequisite chain. Each step is one loop commit.
## Roadmap complete (2026-05-07)

All 14 steps shipped in 14 commits on the `architecture` branch:

- Phase 1 (bug fixes): JIT closures, letrec + resume.
- Phase 2 (E38 source info): subsumed by the tokenizer fix.
- Phase 3 (native ADTs): `AdtValue`, `define-type`, `match`, exhaustiveness checks on both hosts.
- Phase 4: parser/compiler plugin registry + worker.
- Phase 5 (performance): frame records via a `prim_call` fast path, buffer-based serializer, JIT inline opcodes.

Cumulative performance wins on hot benchmarks:

- Step 12 (CEK): fib -66%, loop -69%, reduce -86%.
- Step 13 (inspect): tree-d10 -80%, dict-1000 -61%.
- Step 14 (VM JIT): fib -69%, loop -62%, sum -50%, count-lt -38%, count-eq -58%.

Test suite: 4550/4550 OCaml.

Branch: `architecture`. Edit SX files via the `sx-tree` MCP only. Never edit generated files.
## Current baseline (2026-05-06)

- SX core spec: 2571 passing (595 non-HS pre-existing failures: bytecode-serialize, defcomp-render, etc.)
- HyperScript behavioral: 1478/1496 (run via `node tests/hs-kernel-eval.js`)
- Active bugs: JIT combinator bug (11 HS failures), letrec + resume (browser-only)
- E38 sourceInfo: 2/4 tests passing (tokenizer missing `:end`/`:line`, some spans incomplete)

---
## Phase 1: Bug fixes

### Step 1: Fix JIT closures-returning-closures

**What:** `parse-bind`, `many`, `seq`, and other parser combinators that return closures
miscompile under JIT. The compiled closure drops intermediate stack values when the
callee itself returns a closure. 11 HyperScript tests fail under JIT but pass under CEK.

**Root cause in `hosts/ocaml/lib/sx_vm.ml`:** When a JIT-compiled closure returns
another closure (i.e. the callee is a `VmClosure`), the frame restoration after the
call incorrectly reuses the parent frame's locals slot, overwriting saved intermediate
values. The `call_closure_reuse` path must snapshot `sp` before the inner call and
restore it afterwards, or bail out to the non-reuse path for closures-returning-closures.

**Verify:** `node tests/hs-kernel-eval.js 2>&1 | tail -3` should go from 3116/3127 to 3127/3127.
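
The `sp` snapshot/restore described above can be illustrated with a toy operand stack. This is a minimal sketch, not the real `sx_vm.ml` code: the `vm` record, `push`, `pop`, and `with_saved_sp` are hypothetical names, and the "inner call" is just a function that leaves scratch values on the stack.

```ocaml
(* Toy operand stack; hypothetical, not the sx_vm.ml representation. *)
type vm = { stack : int array; mutable sp : int }

let push vm v =
  vm.stack.(vm.sp) <- v;
  vm.sp <- vm.sp + 1

let pop vm =
  vm.sp <- vm.sp - 1;
  vm.stack.(vm.sp)

(* Snapshot [sp], run the inner call, then restore [sp] so anything the
   callee left behind is discarded and the caller's values are on top again. *)
let with_saved_sp vm inner =
  let saved = vm.sp in
  let result = inner vm in
  vm.sp <- saved;
  result

let () =
  let vm = { stack = Array.make 8 0; sp = 0 } in
  push vm 10; (* intermediate values the caller still needs *)
  push vm 20;
  (* the "inner call" pushes scratch values and returns 99 *)
  let r = with_saved_sp vm (fun vm -> push vm 1; push vm 2; 99) in
  push vm r;
  let a = pop vm in
  let b = pop vm in
  let c = pop vm in
  Printf.printf "%d %d %d\n" a b c (* prints "99 20 10" *)
```

Without the restore, the same three pops would yield 99, 2, 1: the callee's scratch values instead of the caller's 20 and 10.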
### Step 2: Fix letrec + perform resume (browser)

**What:** In browser JIT mode, `letrec` sibling bindings are nil after a `perform`/resume
cycle. `call_closure_reuse` in `sx_browser.ml` intentionally ignores `_saved_sp`, which
strips the frame locals that `sf_letrec` was waiting on.

**Fix:** In `sx_browser.ml`, the `VmSuspension` resume path must restore frame locals
from the suspension snapshot before calling the continuation, mirroring what `sx_vm.ml`
does in the non-browser case.

**Verify:** Write a test in `spec/tests/` that runs `(letrec ((f (fn () (perform :io nil)))) (f))`
with a resume and checks that the bindings survive. Run it under OCaml with
`dune exec -- bin/run_tests.exe`.
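
A minimal sketch of the resume ordering in Step 2, assuming a frame is just an array of locals and a suspension carries a copy of them plus a continuation. `frame`, `suspension`, `suspend`, and `resume` are hypothetical names (not the `VmSuspension` types in `sx_browser.ml`), and `string option` stands in for SX values, with `None` playing nil.

```ocaml
(* Hypothetical model: SX values are [string option], with [None] as nil. *)
type frame = { locals : string option array }

type suspension = {
  saved_locals : string option array; (* copy taken when [perform] suspends *)
  resume_k : frame -> string option;  (* continuation run on resume *)
}

let suspend frame k = { saved_locals = Array.copy frame.locals; resume_k = k }

(* Restore the frame's locals from the snapshot BEFORE running the continuation. *)
let resume frame s =
  Array.blit s.saved_locals 0 frame.locals 0 (Array.length s.saved_locals);
  s.resume_k frame

let () =
  (* letrec siblings f and g are bound in slots 0 and 1 *)
  let fr = { locals = [| Some "f"; Some "g" |] } in
  let s = suspend fr (fun fr -> fr.locals.(1)) in
  Array.fill fr.locals 0 2 None; (* simulate a reuse path stripping the locals *)
  match resume fr s with
  | Some v -> print_endline ("g is bound to " ^ v)
  | None -> print_endline "g is nil"
```

The ordering is the point: if `s.resume_k` ran before the `Array.blit`, it would observe the stripped (nil) locals, which is the symptom described above.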

---

## Phase 2: Source info (E38 completion)

Design: `plans/designs/e38-sourceinfo.md`. Target: 4/4 sourceInfo tests.

The API (`hs-parse-ast`, `hs-source-for`, `hs-line-for`, `hs-node-get`, `hs-src`,
`hs-src-at`, `hs-line-at`) and parser span wrapping (`hs-ast-wrap`, `hs-span-mode`)
are already in the codebase. Two tests pass; the other two fail because:

- Tokenizer tokens lack `:end` and `:line` (only `:pos` today).
- Some statement-level spans and `:next` field navigation are incomplete.
### Step 3: Add `:end` and `:line` to tokenizer tokens

`lib/hyperscript/tokenizer.sx`: extend `hs-make-token` to `{:pos :end :value :type :line}`.
Track a `current-line` counter (1-based, incremented after each `\n`). Update all ~20 emission
sites. Mirror to `shared/static/wasm/sx/hs-tokenizer.sx` after edits.

**Verify:** `(hs-make-token "NUMBER" "1" 0)` returns a dict with `:end` and `:line` keys.
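
To make the target token shape concrete, here is a toy whitespace tokenizer that records the same five fields and a 1-based line counter advancing after each `\n`. It is only a sketch: the field names (`end_pos` and `kind`, since `end` and `type` are OCaml keywords), the exclusive end offset, the `NUMBER`/`IDENT` classification, and whitespace splitting are all assumptions, not the HyperScript tokenizer's rules.

```ocaml
(* Toy token mirroring {:pos :end :value :type :line}. *)
type token = {
  pos : int;     (* offset of the first character *)
  end_pos : int; (* assumed exclusive: offset one past the last character *)
  value : string;
  kind : string; (* "NUMBER" or "IDENT" in this sketch *)
  line : int;    (* 1-based *)
}

let is_space c = c = ' ' || c = '\t' || c = '\n' || c = '\r'
let is_digit c = c >= '0' && c <= '9'

let all_digits s =
  let rec loop k = k >= String.length s || (is_digit s.[k] && loop (k + 1)) in
  s <> "" && loop 0

(* Split on whitespace, tracking a line counter that advances after each '\n'. *)
let tokenize (src : string) : token list =
  let n = String.length src in
  let rec go i line acc =
    if i >= n then List.rev acc
    else if src.[i] = '\n' then go (i + 1) (line + 1) acc
    else if is_space src.[i] then go (i + 1) line acc
    else begin
      let j = ref i in
      while !j < n && not (is_space src.[!j]) do incr j done;
      let value = String.sub src i (!j - i) in
      let kind = if all_digits value then "NUMBER" else "IDENT" in
      go !j line ({ pos = i; end_pos = !j; value; kind; line } :: acc)
    end
  in
  go 0 1 []

let () =
  List.iter
    (fun t ->
      Printf.printf "%s %s pos=%d end=%d line=%d\n" t.kind t.value t.pos t.end_pos t.line)
    (tokenize "add 1\nto x")
```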
### Step 4: Complete parser spans + `:next` field

`lib/hyperscript/parser.sx`: ensure `hs-ast-wrap` populates `:next` on every command
in a `CommandList` (i.e. the following sibling command). Check that statement-level
productions (`if`, `for`) correctly populate `:true-branch`. Trace through the two failing
tests (`get source works for expressions`, `get line works for statements`) to find the
exact missing fields or off-by-one positions.

Mirror to `shared/static/wasm/sx/hs-parser.sx`.

**Verify:** All 4 `hs-upstream-core/sourceInfo` tests pass.

**Outcome:** Subsumed by Step 3. Once tokens carried `:end` and `:line`, the existing
parser plumbing (`link-next-cmds` for `:next`, `:true-branch` extraction in `parse-cmd`)
worked end-to-end. All 4 `hs-upstream-core/sourceInfo` tests pass with no parser changes.

---
## Phase 3: Native ADTs (`define-type` / `match`)

Design: `plans/designs/sx-adt.md`. No existing implementation.

Impact: every language implementation (Haskell, Prolog, Lua, Common Lisp, Erlang)
currently fakes sum types with `{:tag "..." :field ...}` dicts. Native ADTs remove
that workaround everywhere.
### Step 5: OCaml `AdtValue` type + `define-type` + basic `match`

`hosts/ocaml/lib/sx_types.ml`: add an `AdtValue` case to the recursive `value` type
(`adt_value` refers back to `value`, so the two are defined together):

```ocaml
type value =
  (* ... existing constructors ... *)
  | AdtValue of adt_value
and adt_value = { av_type: string; av_ctor: string; av_fields: value array }
```

`hosts/ocaml/lib/sx_runtime.ml` (or evaluator):

- `step-sf-define-type`: parse `(Name (Ctor1 f1 f2) (Ctor2) ...)` and register constructor
  NativeFns, predicates (`Ctor1?`, `Name?`), and field accessors (`Ctor1-f1`) via `env-bind!`.
- `step-sf-match` + `MatchFrame`: linear scan of clauses; flat patterns only for 6a;
  bind pattern variables in a child env; `else` clause; raise on no match.
- `type-of` returns the type name (e.g. `"Maybe"`).

Write tests in `spec/tests/test-adt.sx`: basic constructor, predicate, accessor, match,
else, no-match raise.

**Verify:** `dune exec -- bin/run_tests.exe`; the new test file is all green.
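
What `step-sf-define-type` registers can be sketched with a simplified value type and a `Hashtbl` standing in for the environment. The names `define_type`, `bind`, `call`, `type_of`, and the `value` constructors here are hypothetical; the real implementation binds NativeFns via `env-bind!` and builds `AdtValue` records.

```ocaml
(* Simplified value type; hypothetical, not sx_types.ml. *)
type value =
  | Int of int
  | Bool of bool
  | Adt of { ty : string; ctor : string; fields : value array }
  | Fn of (value list -> value)

(* A Hashtbl standing in for the environment; [bind] plays env-bind!. *)
let env : (string, value) Hashtbl.t = Hashtbl.create 16
let bind name v = Hashtbl.replace env name v

let call name args =
  match Hashtbl.find env name with
  | Fn f -> f args
  | _ -> failwith (name ^ " is not callable")

(* (define-type Name (Ctor f1 f2) ...) registers a constructor, a predicate
   and one accessor per field for each Ctor, plus the type predicate Name?. *)
let define_type (ty : string) (ctors : (string * string list) list) =
  List.iter
    (fun (ctor, field_names) ->
      let arity = List.length field_names in
      bind ctor
        (Fn (fun args ->
             if List.length args <> arity then failwith (ctor ^ ": wrong arity");
             Adt { ty; ctor; fields = Array.of_list args }));
      bind (ctor ^ "?")
        (Fn (function [ Adt a ] -> Bool (a.ty = ty && a.ctor = ctor) | _ -> Bool false));
      List.iteri
        (fun i f ->
          bind (ctor ^ "-" ^ f)
            (Fn (function
              | [ Adt a ] when a.ctor = ctor -> a.fields.(i)
              | _ -> failwith (ctor ^ "-" ^ f ^ ": wrong argument"))))
        field_names)
    ctors;
  bind (ty ^ "?") (Fn (function [ Adt a ] -> Bool (a.ty = ty) | _ -> Bool false))

(* type-of: ADT values report their type name; other cases elided in this sketch. *)
let type_of = function Adt a -> a.ty | _ -> "other"

let () =
  define_type "Maybe" [ ("Just", [ "value" ]); ("Nothing", []) ];
  let j = call "Just" [ Int 42 ] in
  print_endline (type_of j) (* prints "Maybe" *)
```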
### Step 6: JS `AdtValue` + `define-type` + `match`

`hosts/javascript/platform.py`: add `AdtValue` as `{ _adt: true, _type, _ctor, _fields }`.
Mirror the `define-type` and `match` special forms in the JS evaluator.
Retranspile: `python3 hosts/javascript/cli.py --output shared/static/scripts/sx-browser.js`

**Verify:** `node hosts/javascript/run_tests.js`; the ADT tests pass on JS too.
### Step 7: Nested patterns (Phase 6b)

In both the OCaml and JS `MatchFrame`, replace linear binding with a recursive
`matchPattern(pattern, value, env)` that:

- Recurses into constructor sub-patterns.
- Returns `{matched: bool, bindings: map}`.
- Handles the wildcard `_` and literals (`42`, `"str"`, `true`, `nil`).

Extend `spec/tests/test-adt.sx` with nested pattern tests.

**Outcome:** No host-side changes needed. The spec-level `match-pattern` function
in `spec/evaluator.sx` (≈line 2835) already recurses through constructor
sub-patterns via the dict-shape shim (`(get value :_adt|:_ctor|:_fields)`) and
handles `_` wildcards, literals, and variable bindings. Step 7 added 8 new
deftests to `spec/tests/test-adt.sx` covering: nested constructor sanity,
nested constructor with field binding, nested wildcard, nested literal
equality, nested literal-vs-var clause fall-through, deeply nested constructors,
mixed bind + wildcard, and nested constructor fall-through. Both hosts: +8 tests pass,
zero regressions (OCaml 4532→4540, JS 2578→2586).
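
A recursive matcher along the lines Step 7 describes, sketched over a hypothetical `value`/`pattern` pair of types. It returns `Some bindings` on success and `None` on failure (an `option` in place of the `{matched, bindings}` record), and `first_match` shows the linear clause scan that raises on no match. This is an illustration, not the `match-pattern` function in `spec/evaluator.sx`.

```ocaml
(* Hypothetical value and pattern types for illustration. *)
type value =
  | Int of int
  | Str of string
  | Bool of bool
  | Nil
  | Adt of string * value list (* constructor name, fields *)

type pattern =
  | PWild                          (* _ *)
  | PVar of string                 (* binds the value to a name *)
  | PLit of value                  (* 42, "str", true, nil *)
  | PCtor of string * pattern list (* (Ctor p1 p2 ...) *)

(* Some bindings = matched; None = not matched. Recurses into sub-patterns. *)
let rec match_pattern p v bindings =
  match p, v with
  | PWild, _ -> Some bindings
  | PVar name, _ -> Some ((name, v) :: bindings)
  | PLit lit, _ -> if lit = v then Some bindings else None
  | PCtor (c, ps), Adt (c', vs) when c = c' && List.length ps = List.length vs ->
      List.fold_left2
        (fun acc p v -> match acc with None -> None | Some b -> match_pattern p v b)
        (Some bindings) ps vs
  | PCtor _, _ -> None

(* Linear scan over clauses; raise when nothing matches. *)
let rec first_match v = function
  | [] -> failwith "match: no clause matched"
  | (p, body) :: rest -> (
      match match_pattern p v [] with
      | Some b -> body b
      | None -> first_match v rest)

let () =
  let v = Adt ("Just", [ Adt ("Pair", [ Int 1; Str "a" ]) ]) in
  let result =
    first_match v
      [ (PCtor ("Just", [ PCtor ("Pair", [ PLit (Int 2); PVar "x" ]) ]), fun _ -> "two");
        ( PCtor ("Just", [ PCtor ("Pair", [ PWild; PVar "x" ]) ]),
          fun b -> match List.assoc "x" b with Str s -> s | _ -> "?" );
        (PWild, fun _ -> "else") ]
  in
  print_endline result (* prints "a": the literal 2 fails, the wildcard clause binds x *)
```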
### Step 8: Exhaustiveness warnings (Phase 6c)

Add a global `_adt_registry: type_name → [ctor_names]`, populated by `define-type`.
On the first non-exhaustive `match` evaluation: `console.warn("[sx] match: non-exhaustive …")`.
No error; warning only.

**Outcome:** A `host-warn` primitive was added on both hosts (OCaml `prerr_endline`,
JS `console.warn`). Spec-level helpers `match-clause-is-else?`,
`match-clause-ctor-name`, `match-warn-non-exhaustive`, and
`match-check-exhaustiveness` were added in `spec/evaluator.sx` and are
called from `step-sf-match`. An env-bound `*adt-warned*` dict
dedupes warnings per (type, missing-set). The OCaml `step_sf_match`
in `hosts/ocaml/lib/sx_ref.ml` was hand-patched (not retranspiled)
because retranspiling `sx_ref.ml` drops several preamble fixes;
the spec changes still flow to JS via `sx_build target="js"`. Both
hosts emit identical warnings (e.g. `[sx] match: non-exhaustive — Maybe: missing Nothing`).
5 new tests added. OCaml: 4540 → 4545. JS: 2586 → 2591. Zero regressions.
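
The registry lookup and per-(type, missing-set) dedupe can be sketched as follows. `adt_registry`, `warned`, and `check_exhaustiveness` are hypothetical OCaml stand-ins for `_adt_registry`, `*adt-warned*`, and `match-check-exhaustiveness`; joining several missing constructors with `, ` is an assumption, since only the single-constructor message is shown above.

```ocaml
(* type name -> constructor names, filled in by define-type (hypothetical stand-in). *)
let adt_registry : (string, string list) Hashtbl.t = Hashtbl.create 16

(* (type, sorted missing set) pairs already warned about. *)
let warned : (string * string list, unit) Hashtbl.t = Hashtbl.create 16

(* Returns the warning text the first time a given (type, missing-set) is seen;
   None if there is an else clause, nothing is missing, or it was already reported. *)
let check_exhaustiveness ty ~clause_ctors ~has_else =
  if has_else then None
  else
    match Hashtbl.find_opt adt_registry ty with
    | None -> None
    | Some ctors ->
        let missing = List.filter (fun c -> not (List.mem c clause_ctors)) ctors in
        let key = (ty, List.sort compare missing) in
        if missing = [] || Hashtbl.mem warned key then None
        else begin
          Hashtbl.replace warned key ();
          Some
            (Printf.sprintf "[sx] match: non-exhaustive — %s: missing %s" ty
               (String.concat ", " missing))
        end

let () =
  Hashtbl.replace adt_registry "Maybe" [ "Just"; "Nothing" ];
  (* a match with only a Just clause, evaluated twice: warns once *)
  for _i = 1 to 2 do
    match check_exhaustiveness "Maybe" ~clause_ctors:[ "Just" ] ~has_else:false with
    | Some msg -> prerr_endline msg (* warning only, no error *)
    | None -> ()
  done
```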

---

## Phase 4: Plugin / extension system

Design: `plans/designs/hs-plugin-system.md`.
### Step 9: Parser feature registry

`lib/hyperscript/parser.sx`: replace the hardcoded `cond` in `parse-feat` with a dict lookup.
`(hs-register-feature! name parse-fn)` adds an entry to the registry.
### Step 10: Compiler command registry + `as` converter registry

`lib/hyperscript/compiler.sx`: replace the hardcoded dispatch in `hs-to-sx` with a dict.
Register entries via `(hs-register-command! name compile-fn)` and `(hs-register-converter! name convert-fn)`.
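
Steps 9 and 10 share one pattern: a name-to-handler table that plugins can extend, in place of a hardcoded `cond`. A minimal OCaml sketch, assuming string-based handlers for brevity; `registry`, `register`, `lookup`, `compile_command`, `convert`, and the `log`/`Int` entries are hypothetical, not the HyperScript compiler's API.

```ocaml
(* Generic name -> handler table; hypothetical, not the HyperScript API. *)
type 'h registry = (string, 'h) Hashtbl.t

let create () : 'h registry = Hashtbl.create 32
let register (r : 'h registry) name (h : 'h) = Hashtbl.replace r name h
let lookup (r : 'h registry) name = Hashtbl.find_opt r name

(* Handlers are plain string functions to keep the sketch small. *)
let commands : (string list -> string) registry = create ()
let converters : (string -> string) registry = create ()

(* Dispatch is a table lookup rather than a hardcoded cond/match. *)
let compile_command name args =
  match lookup commands name with
  | Some compile -> compile args
  | None -> failwith ("unknown command: " ^ name)

let convert name v =
  match lookup converters name with
  | Some conv -> conv v
  | None -> failwith ("no converter for: as " ^ name)

(* A "plugin" only has to register its handlers at load time. *)
let () =
  register commands "log" (fun args -> "(log " ^ String.concat " " args ^ ")");
  register converters "Int" (fun s -> string_of_int (int_of_string (String.trim s)))

let () =
  print_endline (compile_command "log" [ "x"; "y" ]); (* prints "(log x y)" *)
  print_endline (convert "Int" " 42 ")                (* prints "42" *)
```

With this shape, adding a feature, command, or converter touches only the registering module; the dispatcher itself stays unchanged.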
### Step 11: Migrate hs-prolog-hook + Worker plugin

`lib/hyperscript/runtime.sx`: remove the ad-hoc `hs-prolog-hook`/`hs-set-prolog-hook!`
slots. Create `lib/hyperscript/plugins/prolog.sx`, which calls `hs-register-feature!`
and `hs-register-command!`. Create `lib/hyperscript/plugins/worker.sx` to replace the
E39 stub.

---

## Phase 5: Performance

These steps are incremental and can interleave with other phases.
### Step 12: Frame records (CEK)

`hosts/ocaml/lib/sx_runtime.ml`: represent CEK frames as OCaml records instead of
tagged variant lists, eliminating the allocation pressure of per-frame list construction.
Profile before/after on a tight-loop benchmark.

**Outcome:** Frames were already records (`cek_frame` in `sx_types.ml`); the actual
hot-path bottleneck was `prim_call "=" [...]` in the `step_continue`/`step_eval` dispatch:
each step did a Hashtbl lookup, two list conses, and a pattern match per comparison. Added a
fast path in `prim_call` (`sx_runtime.ml`) for `=`, `<`, `>`, `<=`, `>=`, `empty?`,
`first`, `rest`, and `len` that skips the table lookup entirely. Also inlined `_fast_eq`
for the common scalar-equality cases that dominate frame-type dispatch. Median
improvements (`bench_cek.exe`, 7 runs):

| Benchmark | Before | After | Change |
|-----------|--------|-------|--------|
| fib(18) | 2789ms | 941ms | -66% |
| loop(5000) | 2018ms | 620ms | -69% |
| map sq(1000) | 108ms | 48ms | -56% |
| reduce + (2000) | 72ms | 10ms | -86% |
| let-heavy(2000) | 491ms | 271ms | -45% |

Tests: 4545 passing (unchanged baseline), 1339 failing (unchanged baseline).
Benchmark binary: `bin/bench_cek.exe`.
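
The fast path amounts to matching a few hot primitive names before falling back to the table. A hedged sketch with a toy `value` type; `table`, `slow_calls` (a counter that exists only to show which path ran), and this `prim_call` are illustrations, and the real `prim_call` in `sx_runtime.ml` also covers `<=`, `>=`, `empty?`, and more numeric cases.

```ocaml
(* Toy value type; hypothetical, not sx_types.ml. *)
type value = Int of int | Bool of bool | List of value list

(* The general path: name -> implementation table. *)
let table : (string, value list -> value) Hashtbl.t = Hashtbl.create 64
let slow_calls = ref 0 (* counts table lookups, for illustration only *)

(* Hot primitives are matched directly; everything else falls back to the table. *)
let prim_call name args =
  match name, args with
  | "=", [ a; b ] -> Bool (a = b)
  | "<", [ Int a; Int b ] -> Bool (a < b)
  | ">", [ Int a; Int b ] -> Bool (a > b)
  | "len", [ List xs ] -> Int (List.length xs)
  | "first", [ List (x :: _) ] -> x
  | "rest", [ List (_ :: xs) ] -> List xs
  | _ ->
      incr slow_calls;
      (Hashtbl.find table name) args

let () =
  Hashtbl.replace table "+" (function
    | [ Int a; Int b ] -> Int (a + b)
    | _ -> failwith "+: expected two ints");
  ignore (prim_call "<" [ Int 1; Int 2 ]);
  ignore (prim_call "+" [ Int 1; Int 2 ]);
  Printf.printf "table lookups: %d\n" !slow_calls (* prints "table lookups: 1" *)
```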
### Step 13: Buffer primitive for string building

Add `make-buffer`, `buffer-append!`, and `buffer->string` primitives to eliminate the
quadratic allocation pattern of `(str a b c d ...)` in serializers and renderers.
Wire them into `sx_primitives.ml` and the JS platform.

**Outcome:** Short aliases `make-buffer`/`buffer?`/`buffer-append!`/`buffer->string`/
`buffer-length` were added on both hosts, sharing the existing `StringBuffer` value type.
`buffer-append!` accepts any value (non-strings are auto-coerced via `inspect`), unlike
`string-buffer-append!`, which is strict. The hot path converted was the OCaml
host-internal `inspect` function in `sx_types.ml`. It was rewritten from
`(... ^ String.concat " " (List.map inspect items) ^ ...)`, which allocates O(n)
intermediate strings per recursion level, to a single shared `Buffer.t` accumulator
(`inspect_into buf v` walks the value tree, appending into one buffer). `inspect` is called by
`sx-serialize` on both the spec and host paths, plus in error-path formatting.

Improvements (`bin/bench_inspect.exe`, best of 3 repetitions, each the minimum over 9 runs):

| Benchmark | Baseline (best min) | Buffer (best min) | Change |
|-------------------|--------------------:|------------------:|-------:|
| tree-d8 (75KB) | 5.31ms | 1.30ms | -76% |
| tree-d10 (679KB) | 81.89ms | 16.02ms | -80% |
| dict-1000 | 0.80ms | 0.31ms | -61% |
| list-2000 | 0.74ms | 0.33ms | -55% |

5 new tests in `spec/tests/test-string-buffer.sx` cover the new aliases (including
non-string coercion and interop with the existing `string-buffer-*` API).
OCaml: 4545 → 4550. JS: 2591 → 2596. Zero regressions.
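
The two `inspect` strategies side by side, on a hypothetical three-case `value` type with an SX-like printed form. `inspect_concat` builds and concatenates a fresh string at every nesting level; `inspect_into` threads a single `Buffer.t` (OCaml standard library) through the walk. Both produce the same text; only the allocation pattern differs. The type and printed format are illustrative, not the real `sx_types.ml` `inspect`.

```ocaml
(* Hypothetical three-case value type with an SX-like printed form. *)
type value = Int of int | Str of string | List of value list

(* Concatenation version: each nesting level builds a new string. *)
let rec inspect_concat = function
  | Int n -> string_of_int n
  | Str s -> "\"" ^ String.escaped s ^ "\""
  | List items -> "(" ^ String.concat " " (List.map inspect_concat items) ^ ")"

(* Buffer version: one accumulator threaded through the whole walk. *)
let rec inspect_into buf = function
  | Int n -> Buffer.add_string buf (string_of_int n)
  | Str s ->
      Buffer.add_char buf '"';
      Buffer.add_string buf (String.escaped s);
      Buffer.add_char buf '"'
  | List items ->
      Buffer.add_char buf '(';
      List.iteri
        (fun i v ->
          if i > 0 then Buffer.add_char buf ' ';
          inspect_into buf v)
        items;
      Buffer.add_char buf ')'

let inspect v =
  let buf = Buffer.create 256 in
  inspect_into buf v;
  Buffer.contents buf

let () =
  let v = List [ Int 1; Str "a"; List [ Int 2; Int 3 ] ] in
  print_endline (inspect v); (* prints (1 "a" (2 3)) *)
  assert (inspect v = inspect_concat v)
```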
### Step 14: Inline common primitives in JIT

`hosts/ocaml/lib/sx_vm.ml`: add `OP_ADD`, `OP_SUB`, `OP_EQ`, `OP_APPEND` specialized
opcodes that skip the primitive table lookup for the most common calls. The compiler emits
these when operands are known numbers/lists.

**Outcome:** The opcodes (`OP_ADD`=160, `OP_SUB`=161, `OP_MUL`=162, `OP_DIV`=163,
`OP_EQ`=164, `OP_LT`=165, `OP_GT`=166, `OP_NOT`=167, `OP_LEN`=168, `OP_FIRST`=169,
`OP_REST`=170, `OP_CONS`=172) already existed in `sx_vm.ml`, but the compiler never
emitted them; every primitive call went through `OP_CALL_PRIM` (52) with a Hashtbl
lookup. Two changes:

1. **`lib/compiler.sx` `compile-call`**: when the primitive name + arity matches a
   specialized opcode, emit the 1-byte opcode (no name index, no argc operand)
   instead of the 4-byte `CALL_PRIM`. Bytecode for `fib` shrank from 50 to 38 bytes.
2. **`hosts/ocaml/lib/sx_vm.ml` opcode bodies**: extended `OP_ADD/SUB/MUL/DIV` to
   handle `Integer + Integer` (previously `Number + Number` only, so the common integer
   case fell back to the Hashtbl). Inlined `OP_EQ` to call `Sx_runtime._fast_eq`
   directly. Inlined `OP_LT/GT` integer and mixed-numeric comparisons.

Improvements (`bin/bench_vm.exe`, best of 3 repetitions, each the minimum over 9 runs):

| Benchmark | Baseline (best min) | After (best min) | Change |
|------------------|---------------------|------------------|-------:|
| fib(22) | 107.87ms | 33.13ms | -69% |
| loop(200000) | 429.64ms | 161.16ms | -62% |
| sum-to(50000) | 72.85ms | 36.74ms | -50% |
| count-lt(20000) | 28.44ms | 17.58ms | -38% |
| count-eq(20000) | 37.23ms | 15.46ms | -58% |

Tests: 4550/4550 passing (unchanged baseline). Zero regressions. Benchmark binary:
`bin/bench_vm.exe` (loads `lib/compiler.sx` via CEK, JIT-compiles each test fn, and
measures `Sx_vm.call_closure` time on the compiled `vm_closure`).
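
A sketch of the emission choice in `compile-call`, written in OCaml rather than the compiler's actual SX. The opcode numbers are the ones listed above; the mapping from SX primitive names (`+`, `-`, `=`, ...) to opcodes and the `CALL_PRIM` operand layout (16-bit little-endian name index plus 8-bit argc, 4 bytes in total) are assumptions made for illustration. `Buffer.add_uint8` and `Buffer.add_uint16_le` require OCaml 4.08 or later.

```ocaml
(* Opcode numbers from the Step 14 outcome; the name/arity mapping is assumed. *)
let op_call_prim = 52

let specialized name argc =
  match name, argc with
  | "+", 2 -> Some 160       (* OP_ADD *)
  | "-", 2 -> Some 161       (* OP_SUB *)
  | "*", 2 -> Some 162       (* OP_MUL *)
  | "/", 2 -> Some 163       (* OP_DIV *)
  | "=", 2 -> Some 164       (* OP_EQ *)
  | "<", 2 -> Some 165       (* OP_LT *)
  | ">", 2 -> Some 166       (* OP_GT *)
  | "not", 1 -> Some 167     (* OP_NOT *)
  | "len", 1 -> Some 168     (* OP_LEN *)
  | "first", 1 -> Some 169   (* OP_FIRST *)
  | "rest", 1 -> Some 170    (* OP_REST *)
  | "cons", 2 -> Some 172    (* OP_CONS *)
  | _ -> None

(* Emit a 1-byte opcode when name + arity match; otherwise a generic
   CALL_PRIM (assumed layout: opcode, u16 LE name index, u8 argc). *)
let emit_call code ~name_index name argc =
  match specialized name argc with
  | Some op -> Buffer.add_uint8 code op
  | None ->
      Buffer.add_uint8 code op_call_prim;
      Buffer.add_uint16_le code name_index;
      Buffer.add_uint8 code argc

let () =
  let code = Buffer.create 16 in
  emit_call code ~name_index:0 "+" 2;   (* 1 byte *)
  emit_call code ~name_index:7 "str" 3; (* 4 bytes *)
  Printf.printf "%d bytes\n" (Buffer.length code) (* prints "5 bytes" *)
```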

---

## Progress log

| Step | Status | Commit |
|------|--------|--------|
| 1: JIT combinator bug | [x] | 882a4b76 |
| 2: letrec + resume | [x] | e80e655b |
| 3: tokenizer :end/:line | [x] | 023bc2d8 |
| 4: parser spans complete | [x] | b7ad5152 (subsumed by 023bc2d8) |
| 5: OCaml AdtValue + define-type + match | [x] | 1f49242a |
| 6: JS AdtValue + define-type + match | [x] | fc8a3916 |
| 7: nested patterns | [x] | 0679edf5 |
| 8: exhaustiveness warnings | [x] | 6d391119 |
| 9: parser feature registry | [x] | 986d6411 |
| 10: compiler + as converter registry | [x] | d22361e4 |
| 11: plugin migration + worker | [x] | 6328b810 |
| 12: frame records | [x] | a66c0f66 (fib -66%, loop -69%, reduce -86% via prim_call fast path) |
| 13: buffer primitive | [x] | 0e022ab6 (inspect rewrite: tree-d10 -80%, tree-d8 -76%, dict-1000 -61%, list-2000 -55%) |
| 14: inline primitives JIT | [x] | 6c171d49 (fib -69%, loop -62%, sum -50%, count-lt -38%, count-eq -58% via specialized opcode emission) |

---
## Rules

- Branch: `architecture`. Never push to `main`.
- SX files: `sx-tree` MCP tools only. Run `sx_validate` after every edit.
- After every `.sx` edit under `lib/hyperscript/`, mirror to `shared/static/wasm/sx/hs-<file>.sx`.
- OCaml build: `sx_build target="ocaml"` MCP tool (never raw `dune`).
- JS build: `sx_build target="js"` MCP tool.
- One step per commit. Update the progress log in this file.
- No new planning docs. No comments in SX unless non-obvious.
- Unicode in SX: raw UTF-8 only, never `\uXXXX`.