plans: SX review master remediation plan + evidence

Consolidates the three-lane review (core K01-K110, hosts J*/C*/JS*/P*/S*,
conformance F1-F15) into plans/sx-review/:
- PLAN.md — 15 workstreams, phased execution, full per-finding coverage
  ledger (all ~213 finding-instances mapped to a workstream + status)
- RULINGS.md — 40 draft normative rulings (Phase-0 gate)
- core.md / hosts.md / conformance.md — the lane evidence files

The dc7aa709 quick-wins batch is marked DONE in the ledger; K01 (guard re-raise
hang), S1 (live HTTP crash), K03 (shift-k), and W14 (test gate) are flagged as
the highest-value open work.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 21:28:41 +00:00
parent 72a3989fed
commit 4f766ea4f1
6 changed files with 2768 additions and 0 deletions

plans/sx-review/PLAN.md (new file, 343 lines)
# SX Review — Master Remediation Plan
Consolidates every finding from the three parallel review sessions (2026-07-03):
- `core.md` — language core / spec semantics (K01-K110)
- `hosts.md` — per-host implementations + FFI (J*, C*, JS*, P*, S*, PY)
- `conformance.md` — cross-host agreement + test adequacy (F1-F15, conf-S1-S5)
- `RULINGS.md` — 40 draft normative rulings (R1-R40) that gate the ambiguity fixes
**How to read this.** Findings are grouped into workstreams (W1-W15). Each workstream lists the
finding IDs it resolves, the approach, what ratified ruling(s) it needs, and status. The full
per-ID coverage ledger is at the bottom — every finding maps to a workstream + status, so nothing
is silently dropped. `[DONE]` = landed in commit dc7aa709 (quick-wins batch). `[GATE]` = blocked on
a Phase-0 decision. `[dup→Kxx]` = same defect found by another lane, fixed once.
**Prime directive from the review:** the verification infrastructure currently cannot tell you
whether a fix works (runner envs diverge from production, the WASM kernel never runs the corpus,
the JS gate is structurally red, one test passed *because of* the bug it tested). So Phase 1 Track A
(gate repair) comes before the bulk of the semantic work — otherwise fixes land blind.
---
## Phase 0 — Decisions (BLOCKING; maintainer; no code)
Nothing in Phases 2+ that changes observable semantics should merge before the relevant ruling is
ratified. These three decisions unblock ~40 findings.
### D1. Host lineup
Evidence: the JS-transpiled bundle is hollow (C0a: define-library files → 0 bytes) and its gate is
red (C0b: 2490/5086 fail); nothing serves it. The standalone Python host cannot load (C30/PY).
Production = OCaml native + WASM kernel (one OCaml library) + the load-bearing Python parser/bridge
in `shared/sx/`.
**Recommendation:** declare the kernel family the only evaluator targets; retire `hosts/javascript`
+ `hosts/python` standalone; shrink `shared/sx/parser.py` to a wire-subset with a parity suite.
→ Ratifying this **closes W13 entirely** (C0a/C0b/JS1-JS8 become "delete") and simplifies W6/W7.
### D2. Ratify RULINGS.md (R1-R40)
Each ruling is one normative answer + one mechanical fix. Ratify in a pass; four need a
pre-ratification usage sweep because they're high-churn: **R17** (arity: kill nil-fill), **R9**
(cond flat-only), **R31** (append! errors on derived lists), **R15a** (HO swap only when
unambiguous). See RULINGS.md for the per-ruling recommendation.
### D3. Define the merge gate
Recommendation: (a) native `run_tests` green with hs-upstream skip-listed; (b) same corpus on the
WASM kernel; (c) cross-kernel differential battery output-identical; (d) CEK-vs-forced-JIT
differential when JIT is on; (e) `sx_ref.ml` regen + diff. This is W14's definition of done.
---
## Phase 1 — Trustworthy verification + stop the bleeding
### W14. Test gate & conformance infrastructure *(do FIRST — everything else verifies against it)*
Findings: C0b, C9, C21, C22, C23, C3, C4, C5, C6, C7, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11,
F12, K19 (harness/runtime primitive drift, partial from batch), K104 (harness log-before-mock).
Approach:
1. **Unify runner env with production env** — delete or productionize every runner-only binding:
`values`/`call-with-values` (F7, K42), the JS runner's fake sha3/equal?/apply/env-set! shims
(JS5, F7). Rule: if the spec needs it, it's a kernel primitive; if not, the test can't have it.
2. **WASM corpus runner** in CI (F2) — promote conformance's `run_wasm.js` prototype.
3. **MCP harness honesty** (K19): `mcp_tree.ml` drops its parallel primitive table and links real
`sx_primitives` (batch aligned 8 entries as a stopgap); make `sx_harness_eval` fresh per call.
4. **Harness fixes**: log IO before invoking the mock (C22/K104); real perform/suspend mode (C21);
adapter-dom render-output tests (C23).
5. **Epoch-loop protocol fuzz suite** (C3/C4/C5/C6/C7) + skip-list hs-upstream (F10) + empty suite
label (C9).
6. **Test-debt ledger**: pin every confirmed finding with a failing test FIRST — the three lane
files are a ready-made corpus of minimal repros. **Batch gap to close: dc7aa709's fixes have no
pinning tests** (except crit-2, now non-vacuous). Add tests for K09, K11, K18, K20, K39, K49,
C1/C1b, S4 before further evaluator work.
Gate: none (this IS the gate). Status: OPEN — highest priority.
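
A minimal sketch of the core of the differential battery (F8, D3 item c). It runs every corpus case on two evaluators, normalizes each outcome to a comparable string, and reports every disagreement. It assumes each kernel can be driven from Python as a callable that takes SX source text and returns printed output or raises. That wrapper is hypothetical, and so is using the exception class name as the error signature. Once R7 lands, comparing the condition's `:type` would be the more natural signature.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Assumed driver shape (hypothetical): SX source text in, printed result out.
Evaluator = Callable[[str], str]

@dataclass(frozen=True)
class Mismatch:
    case: str
    left: str
    right: str

def outcome(evaluator: Evaluator, source: str) -> str:
    """Normalize one run to 'ok:<printed value>' or 'err:<error class>'."""
    try:
        return "ok:" + evaluator(source)
    except Exception as exc:  # errors must agree across kernels too
        return "err:" + type(exc).__name__

def differential(cases: Iterable[str], left: Evaluator, right: Evaluator) -> List[Mismatch]:
    """Run every case on both evaluators; an empty result means the battery is green."""
    mismatches = []
    for case in cases:
        a, b = outcome(left, case), outcome(right, case)
        if a != b:
            mismatches.append(Mismatch(case, a, b))
    return mismatches
```

In CI the two callables would wrap the native and WASM kernels (or CEK and forced-JIT for item d). The job fails on any non-empty result.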
### W1. Condition system & delimited continuations *(the kernel criticals)*
Findings: K01 (guard/handler re-raise hang — CRITICAL), K03 (shift-k nested cek-run double-exec —
CRITICAL), K10 (dynamic-wind re-entry + sibling winder corruption), K12 (`->` non-HO steps in
nested CEK), K36 (guard multi-expr clause body — inherits cond fix W5), K41 (host errors uncatchable
by guard), K57 (strict errors uncatchable), K106 (SUSPECTED: expand-macro/let-values/qq nested-eval
boundaries), S10 (VM inline-IO in HO callbacks can't suspend). K02 [DONE].
Root cause (shared by K03/K12/K106/S10): evaluation crosses a **nested `cek-run`/`trampoline
(eval-expr)` boundary** the outer continuation can't see. One architectural fix — invoke
continuations and evaluate these sub-expressions via CEK frames, not nested runs — resolves the
cluster. K01 is separate: run handlers with the OUTER handler set (unwound kont must EXCLUDE the
matched frame); make guard clause bodies evaluate after the escape (the no-match auto-reraise path
already does this — make it the only path). K10: common-ancestor before/after algorithm + winders
stored per-continuation, not one global length-keyed stack. K41/K57: raise host/primitive/strict
errors as structured catchable conditions (needs R7).
Gate: R6 (handler installation), R7 (what's catchable), R8 (raise-continuable). Status: OPEN —
**K01 is the single highest-value fix left** (DoS-able hang, server + browser).
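
K10's fix is the standard Scheme winder algorithm. Each continuation captures its own winder chain. Invoking a continuation runs `after` thunks from the current chain up to the common ancestor, then runs `before` thunks from that ancestor down to the target chain. Sibling chains share only their common prefix, so jumping from one sibling to another never re-runs the shared outer winders. The sketch below is a Python model; `Winder`, `common_ancestor`, and `rewind` are illustrative names, not kernel code.

```python
from typing import Callable, List, Optional

class Winder:
    """One dynamic-wind frame; `parent` points toward the outermost frame."""
    def __init__(self, before: Callable[[], None], after: Callable[[], None],
                 parent: Optional["Winder"] = None):
        self.before, self.after, self.parent = before, after, parent
        self.depth = 0 if parent is None else parent.depth + 1

def _depth(w: Optional[Winder]) -> int:
    return -1 if w is None else w.depth

def common_ancestor(a: Optional[Winder], b: Optional[Winder]) -> Optional[Winder]:
    # Bring both chains to the same depth, then climb in lockstep.
    while _depth(a) > _depth(b):
        a = a.parent
    while _depth(b) > _depth(a):
        b = b.parent
    while a is not b:
        a, b = a.parent, b.parent
    return a

def rewind(current: Optional[Winder], target: Optional[Winder]) -> Optional[Winder]:
    """Switch the active winder chain from `current` to `target`; return `target`."""
    ancestor = common_ancestor(current, target)
    w = current
    while w is not ancestor:          # unwind: innermost `after` first
        w.after()
        w = w.parent
    path: List[Winder] = []
    w = target
    while w is not ancestor:          # collect the target-side frames...
        path.append(w)
        w = w.parent
    for w in reversed(path):          # ...and rewind outermost `before` first
        w.before()
    return target
```

The sibling-corruption fix comes from storing the chain per continuation rather than in one global, length-keyed stack. Two continuations captured in sibling extents then hold different `Winder` objects, even when their chains have the same length.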
### W3. HTTP-mode concurrency & serving safety *(production robustness — lib/host is LIVE)*
Findings: S1 (multi-Domain render race — LIVE CRASH), S2 (per-request globals read by queued
workers), S3 (`expand-components?` bind/remove on shared env), S5 (cache key ignores cookies/query),
S11 (URL evaluated as SX — any env binding invokable), S12 (island hydration reuses no SSR DOM),
S13 (SSR/client purity, no dev-mode check), K30 (emit!/emitted cross-request — shared with W2). S4
[DONE], J1/J2/J3 mitigated by the batch JIT gate.
Approach: serialize or isolate rendering (S1: lock `_stream_mutex` or per-Domain env/cache);
per-request state carried with the request not process-global (S2/S3/K30); include query in cache
key + cookie policy (S5); whitelist URL-routable bindings to a `page:` prefix (S11); hydration cursor
+ dev-mode purity check (S12/S13). Pairs with W2 (per-flow scope stacks).
Gate: none for S1/S4/S5 (safety). Status: OPEN — S1 is a live crash.
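
For S5, a render-cache key that ignores the query string or cookies can serve one user's page to another. The sketch below builds a key from the path, the normalized query, and an explicit allow-list of cookies. The cookie policy is an assumption, not the current design: cookies outside the allow-list are taken not to affect the render, and any page that reads another cookie would have to be marked uncacheable.

```python
import hashlib
import json
from typing import Dict, Sequence
from urllib.parse import parse_qsl, urlencode, urlsplit

def cache_key(url: str, cookies: Dict[str, str], vary_cookies: Sequence[str]) -> str:
    """Render-cache key from path + normalized query + allow-listed cookies.

    Assumed policy (hypothetical): only cookies named in `vary_cookies`
    may influence a cacheable render.
    """
    parts = urlsplit(url)
    # Sort pairs so ?a=1&b=2 and ?b=2&a=1 share an entry (assumes the app does
    # not depend on the order of repeated parameters); keep blank values.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # An absent cookie and an empty cookie produce different keys.
    varied = urlencode([(n, cookies[n]) for n in sorted(vary_cookies) if n in cookies])
    raw = json.dumps([parts.path, query, varied])  # unambiguous field separation
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```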
---
## Phase 2 — Correctness families (each = ruling + fixes + conformance rows)
### W2. Environment & scope integrity
Findings: K04 (caller frame leaks into interpreted lambda + JIT disagrees), K05 (letrec injects into
foreign closures — global contamination), K06 (named-let leaks loop name), K07 (~60 unshadowable
names; = J8 VM-honors-CEK-doesn't), K30 (emit! cross-request — shared W3), K31 (provide leak on
raise/shift), K32 (provide! ambient global), K33 (set! unbound creates + JIT/interp split brain),
K40 (scope :value dead + dead frame type), K107 (SUSPECTED env_merge depth-100 flip).
Approach: fresh frames for letrec/named-let (K05/K06); drop the top-frame copy in env_merge
(K04/K107); reserved-words error for dispatch names, aligning VM+CEK (K07/J8); unwind-safe +
invocation-scoped dynamic state — one mechanism for provide/emit!/batch (K30/K31/K32); set!-unbound
per R1 + kill the JIT/interp global split (K33); remove dead scope :value/frame (K40).
Gate: R1 (set!), R2 (reserved names). Status: OPEN.
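
The single mechanism for provide/emit!/batch (K30/K31/K32) is dynamic binding with two properties. Its restore step runs on every exit path, and its state belongs to the invocation rather than the process. The sketch below models this with Python's `contextvars`: each asyncio task or copied context sees its own value, so concurrent requests cannot observe each other. `provide` and `current` are hypothetical names that mirror the SX forms; they are not their implementation. Python's `finally` models only the raise path. In the kernel, the same restore step would need to be a continuation frame so that shift/reset unwinding also sees it.

```python
import contextvars
from contextlib import contextmanager

# One context variable holding an immutable-by-convention dict of dynamic
# bindings; it is replaced, never mutated, so contexts never share edits.
_provided = contextvars.ContextVar("provided", default={})

@contextmanager
def provide(name, value):
    """Bind `name` for the dynamic extent of the block; restored on ANY exit."""
    token = _provided.set({**_provided.get(), name: value})
    try:
        yield
    finally:
        _provided.reset(token)  # runs on normal exit and when an error escapes

def current(name, default=None):
    """Read the innermost binding of `name` visible in this context."""
    return _provided.get().get(name, default)
```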
### W4. Higher-order forms & threading
Findings: K13 (2-arg reduce returns coll), K14 (reduce init-swap), K15 (data-first drops extra
args), K43 (O(n²) map/filter), K44 (HO names not first-class), K45 (cryptic uncatchable HO errors),
K46 (multi-coll rejects strings/vectors), K47 (thread lambda literal), K78 (component in HO → zeros),
K79 (dead `|>`), K80 (keyword getters in HO/->), K81 (zero-arg HO silent ()), J7 (VM data-first
deopt — shared W11).
Approach: implement R15 sub-rulings (swap only when one arg callable + error otherwise; reduce
arities; drop-extra→error; multi-coll seq-to-list parity; HO first-class; zero-arg→error); fix O(n²)
via reversed-cons accumulation; delete dead `|>`.
Gate: R13 (threading), R15 (HO forms). Status: OPEN.
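
R15's first sub-ruling (swap only when unambiguous) reduces to a small normalization step that runs before any two-argument HO form. The sketch assumes the two accepted orders are `(map f coll)` and data-first `(map coll f)`. `normalize_ho_args` is a hypothetical helper, and Python's `callable` stands in for the SX callable check. The both-callable case is real in SX once keywords or dicts can act as getters (K80).

```python
def normalize_ho_args(form: str, a, b):
    """Return (fn, coll) for a two-argument HO call, or raise on ambiguity."""
    a_fn, b_fn = callable(a), callable(b)
    if a_fn and not b_fn:
        return a, b            # canonical order: (map f coll)
    if b_fn and not a_fn:
        return b, a            # data-first: (map coll f), swapped exactly once
    if a_fn and b_fn:
        raise TypeError(f"{form}: both arguments are callable; order is ambiguous")
    raise TypeError(f"{form}: expected a function argument, got none")
```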
### W5. Special forms & macros
Findings: K08 (cond dual-grammar — silent side-effect drops), K34 (qq depth), K35 (qq dict
traversal), K37 (&key misbind on fn/defmacro), K38 (splice non-list/malformed), K70 (case else any
position), K71 (case dialect + punning), K72 (letrec parallel + ref-before-init), K76 (defmacro
unhygienic vs "hygiene" test name), K77 (match guard clauses silently structural), K42 (values —
special forms now registered [DONE-partial]; `values` primitive still runner-only). K09/K11/K39 [DONE].
Approach: cond flat-only + explicit begin (R9); qq depth tracking + dict traversal + splice arity
errors (R12); &key one binding path for fn/defmacro/component (R5/K37); case final-else + evaluated-
datum doc + clause-syntax error (R10); letrec* + ref-before-init error (R4); match guards implemented
or error (R14); make `values` a real kernel primitive (finish K42).
Gate: R4, R5, R9, R10, R12, R14. Status: OPEN (K09/K11/K39 done).
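
Depth tracking (R12b) follows the R7RS rule. A nested `quasiquote` raises the level, each `unquote` or `splice-unquote` lowers it, and only forms reached at level 1 (the outermost template) are evaluated. The sketch below models this over Python lists standing for SX lists, with symbols as strings. It also covers dict traversal (R12c) and the arity and non-list splice errors (R12d). `expand_qq` and the `evaluate` callback are hypothetical names, and only `splice-unquote` is recognized; the R12a `unquote-splicing` alias is left out.

```python
QQ, UNQ, SPLICE = "quasiquote", "unquote", "splice-unquote"

def expand_qq(form, depth, evaluate):
    """Instantiate quasiquote template `form` at nesting `depth` (1 = outermost)."""
    if isinstance(form, dict):                       # R12c: traverse dict values
        return {k: expand_qq(v, depth, evaluate) for k, v in form.items()}
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if head in (QQ, UNQ, SPLICE) and len(form) != 2:  # R12d: arity error
        raise SyntaxError(f"{head}: expected exactly one argument")
    if head == QQ:
        return [QQ, expand_qq(form[1], depth + 1, evaluate)]
    if head == UNQ:
        if depth == 1:
            return evaluate(form[1])
        return [UNQ, expand_qq(form[1], depth - 1, evaluate)]
    if head == SPLICE:
        if depth == 1:
            raise SyntaxError("splice-unquote: not inside a list")
        return [SPLICE, expand_qq(form[1], depth - 1, evaluate)]
    out = []
    for item in form:
        if isinstance(item, list) and len(item) == 2 and item[0] == SPLICE and depth == 1:
            value = evaluate(item[1])
            if not isinstance(value, list):         # R12d: splicing a non-list
                raise TypeError("splice-unquote: value is not a list")
            out.extend(value)
        else:
            out.append(expand_qq(item, depth, evaluate))
    return out
```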
### W6. Parser, serializer, canonical form & CIDs
Findings: K21 (canonical.sx runner-only helpers), K22 (serializer dict-key escaping + CID fixed-
point), K23 (four divergent ident/number classifiers), K24 (`1e`→nil), K25 (guest rationals throw),
K63 (`#;` before `)`), K64 (`=` no Char arm — shared W7), K65 (`#\a` mcp crash), K66 (multibyte char
literals), K67 (`\uXXXX` validation), K68 (unknown-escape divergence), K69 (`#name` reader macro
unimpl on OCaml), K100 (parse error locations), K101 (dict literal edges), K102 (`#|` raw string),
K103 (`:`/`::` keyword edges), K108 (SUSPECTED cross-host CID nondeterminism), C25 (Py↔OCaml escape
corruption), C26 (Py unicode symbols), C27 (Py dict order — shared W7/P9).
Approach: ONE normative ident/number classifier bound by every surface (R32); \u validation +
unknown-escape error + datum-comment fix (R27/R33); native `#name` reader macro registry;
canonical path = native CBOR/CID normative, spec/canonical.sx tested mirror or deleted, property-
test `parse(serialize(x))=x` and canonical fixed-point cross-kernel (R34/R35). CID determinism
(K108/K35-in-canonical) is sx-pub-critical.
Gate: R27, R32, R33, R34, R35. Status: OPEN.
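
For the escape findings (K67/K68 under R27/R33), the reader should reject an unknown escape or a malformed `\u` escape rather than guess. Below is a sketch of a string-body decoder. Its escape set is an assumption pending the rulings: `\n \t \r \\ \"`, plus `\u` followed by exactly four hex digits. It also rejects surrogate halves outright rather than joining pairs, which is likewise an assumption.

```python
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"'}
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")

def decode_string_body(body: str) -> str:
    """Decode escapes in a string literal's body; raise on anything not listed."""
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise SyntaxError("string: dangling backslash")
        esc = body[i + 1]
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == "u":
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or not set(digits) <= HEX_DIGITS:
                raise SyntaxError(f"string: malformed \\u escape at offset {i}")
            code = int(digits, 16)
            if 0xD800 <= code <= 0xDFFF:  # assumption: no surrogate-pair joining
                raise SyntaxError(f"string: surrogate \\u{digits} is not a character")
            out.append(chr(code))
            i += 6
        else:
            raise SyntaxError(f"string: unknown escape \\{esc}")
    return "".join(out)
```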
### W7. Numbers, equality, strings, collection primitives
Findings: K17 (append! silent no-op), K52 (byte-based strings), K53 (spec/runtime primitive drift),
K54 (div-by-zero inconsistency), K55 (`/` doc), K56 (sort no comparator), K64 (char equality), K85
(binary `=`, exactness conflation), K86 (rounding/inexact->exact/sqrt), K87 (float/nil rendering),
K88 (nil/empty tolerance), K89 (keys reverse order — GATED, see R29 note: breaks render tests), K90
(keyword-name on evaluated kw), K91 (string->number), K92 (apply doesn't spread), P1 (lossy float
wire), P2 (sort mixed int/float), P3 (into needs bridge), P4 (int63 vs float64), P5 (= not deep on
JS dicts, missing eq?/eqv?), P6 (string units), P7 (JS coercion cluster — GATE D1), P8 (nil/list
strictness), P10 (NaN/Inf wire tokens), P11 (upcase/round), P12 (zip-pairs). K18/K20 [DONE].
Approach: append! errors on non-mutable lists + deprecate (R31); codepoint string semantics (R25);
implement eq?/eqv?, add `=` Char arm, n-ary comparisons (R19); exact-±2^53 + overflow-promote (R21);
shortest-round-trip float printing + inf/nan wire tokens (R23); div-by-zero catchable (R22); apply
spreads (R16); sort comparator + numeric compare (R30/P2); into native, contains?-on-dict [done],
merge-skip-nil, zip-pairs sliding window (R30/P8/P12); reconcile spec/runtime primitive lists (K53).
Gate: R16, R19, R21, R22, R23, R25, R29, R30, R31. Status: OPEN (K18/K20 done).
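
R23 (shortest-round-trip float printing plus inf/nan wire tokens, P1/P10) is cheap to specify, because Python's `repr(float)` has produced the shortest string that reads back to the identical float64 since Python 3.1. A sketch follows. The R7RS-style tokens `+inf.0`, `-inf.0`, and `+nan.0` are an assumed choice, not a ratified one.

```python
import math

# Assumed wire tokens for the non-finite values (R7RS spelling).
SPECIALS = {"+nan.0": math.nan, "+inf.0": math.inf, "-inf.0": -math.inf}

def format_float(x: float) -> str:
    """Shortest text that reads back to the identical float64."""
    if math.isnan(x):
        return "+nan.0"
    if math.isinf(x):
        return "+inf.0" if x > 0 else "-inf.0"
    return repr(x)  # shortest round-trip representation

def parse_float(text: str) -> float:
    return SPECIALS[text] if text in SPECIALS else float(text)
```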
### W8. Render pipeline
Findings: K16 (infinite recursion no depth guard), K48 (attr-name injection — XSS class), K50 (aser
list kwargs), K51 (dom/html attr parity = C19), K82 (bool-attr truthiness footgun), K83 (dead
is-render-expr? / html: tags), K84 (script/style escaping), K87 (float render — shared W7), C19 (=K51),
C20 (CSRF cross-origin), S14 (deep nested-list flatten html vs aser), S9 (SPA boosted-nav fragility).
K49 [DONE]. Approach: depth limit + cycle guard (K16); attr-name validation (R36/K48); quote aser
list kwargs (R38/K50); align 4 adapters on bool-attr contract (R36/C19/K51); script/style raw-text
error-on-breakout (R37/K84); wire or delete is-render-expr? (R37/K83); depth-2 aser/html parity test
(S14). CSRF cross-origin (C20) + SPA manifest staleness (S9, overlaps W11 stale-bundle).
Gate: R36, R37, R38. Status: OPEN (K49 done).
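
For K48, attribute-name validation can follow the HTML Living Standard's definition. An attribute name is one or more characters other than controls, space, `"`, `'`, `>`, `/`, `=`, and noncharacters. The sketch below raises instead of emitting, so a name like `x onmouseover=alert(1)` cannot smuggle in a second attribute. `render_attr` is a hypothetical helper, and it assumes the value has already been escaped.

```python
FORBIDDEN = frozenset(' "\'>/=')

def _is_control(cp: int) -> bool:
    # C0 controls plus U+007F..U+009F, per the HTML definition of "control".
    return cp <= 0x1F or 0x7F <= cp <= 0x9F

def _is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF and every code point ending in FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def valid_attr_name(name: str) -> bool:
    """True if `name` is a syntactically valid HTML attribute name."""
    return bool(name) and all(
        ch not in FORBIDDEN and not _is_control(ord(ch)) and not _is_noncharacter(ord(ch))
        for ch in name
    )

def render_attr(name: str, escaped_value: str) -> str:
    # Error rather than emit: an invalid name could inject extra attributes.
    if not valid_attr_name(name):
        raise ValueError(f"invalid attribute name: {name!r}")
    return f' {name}="{escaped_value}"'
```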
### W9. Strict typing
Findings: K26 (HO callbacks bypass), K27 (apply bypasses), K58 (unknown type names match all), K59
(keyword type dead / components untypeable), K60 (component &key misalign), K93 (name-keyed, evaded),
K94 (set-prim-param-types! no validation), K95 (too-few args skip checks), K96 (`(:as type)` unenforced),
K97 (paper cuts). Approach (R20): move checks to continue-with-call/vm_call chokepoints (covers HO,
apply, components, => receivers); validate type names at declaration; real "component" branch, remove
dead "keyword" (R18); `(:as type)` as the declaration channel; merge+validate set-prim-param-types!;
strict errors catchable (R7, shared W1). Return types explicitly out of scope.
Gate: R7, R18, R20. Status: OPEN.
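
R20's chokepoint idea in miniature: route every invocation through one `call`, so that HO callbacks (K26) and `apply` (K27) cannot bypass the checks. Type names are validated when declared (K58), and too-few arguments are an error rather than a reason to skip checks (K95). This is a model sketch. The type table and the `declare`/`call`/`sx_map`/`sx_apply` names are hypothetical.

```python
# Hypothetical type table; a real one would come from the spec.
TYPES = {"number": (int, float), "string": (str,), "list": (list,), "any": (object,)}

class TypeDeclError(Exception):
    pass

class StrictTypeError(Exception):
    pass

def declare(fn, *param_types):
    """Attach parameter types, rejecting unknown type names up front (K58)."""
    for t in param_types:
        if t not in TYPES:
            raise TypeDeclError(f"unknown type name: {t}")
    fn.sx_param_types = param_types
    return fn

def call(fn, *args):
    """The ONE place every invocation passes through: direct, HO, and apply."""
    types = getattr(fn, "sx_param_types", None)
    if types is not None:
        if len(args) != len(types):  # K95: wrong arity must not skip the checks
            raise StrictTypeError(f"expected {len(types)} args, got {len(args)}")
        for i, (t, v) in enumerate(zip(types, args)):
            # bool is an int subclass in Python; exclude it from "number".
            if not isinstance(v, TYPES[t]) or (t == "number" and isinstance(v, bool)):
                raise StrictTypeError(f"arg {i}: expected {t}, got {type(v).__name__}")
    return fn(*args)

def sx_map(fn, coll):
    return [call(fn, x) for x in coll]   # K26: callbacks go through `call`

def sx_apply(fn, args):
    return call(fn, *args)               # K27: apply goes through `call`
```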
### W10. Signals & coroutines
Findings: K28 (dispose-computed no-op), K29 (batch wedge on exception), K61 (identity-not-equality
change detection), K62 (diamond glitch), K98 (batch unusable on server / coroutines inert), K99
(effect cleanup double-invoke), K109 (SUSPECTED coroutine non-yield wedge), K110 (SUSPECTED VM no
strict — shared W9/W11). Approach (R39): `=`-based change detection (needs W7 R19); unwind-safe batch
(shared W2 mechanism); two-phase/topological notify for glitch-freedom; fix dispose-computed + effect
cleanup; make batch/coroutines work outside run_tests (bind batch-begin!/end! + cek hooks in real
envs, or fold into kernel). Zero test coverage today — add suites.
Gate: R39. Status: OPEN.
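
K62's diamond glitch goes away with ordered propagation. Each computed gets a height (1 + the maximum height of its inputs), and dirty nodes are recomputed lowest height first. A node therefore only runs after all of its inputs are current, and it runs at most once per update. Propagation also stops wherever the new value is `=` to the old one (K61). The sketch below is a single-threaded model; `Signal`, `Computed`, and `propagate` are illustrative names, not the SX runtime API, and batching and effects are out of scope.

```python
import heapq
import itertools
from typing import List, Tuple

_tiebreak = itertools.count()  # keeps heap entries from ever comparing nodes

class Node:
    def __init__(self) -> None:
        self.subscribers: List["Computed"] = []
        self.height = 0

class Signal(Node):
    def __init__(self, value) -> None:
        super().__init__()
        self.value = value

    def set(self, value) -> None:
        if value == self.value:   # equality, not identity (K61)
            return
        self.value = value
        propagate(self)

class Computed(Node):
    def __init__(self, fn, *deps: Node) -> None:
        super().__init__()
        self.fn, self.deps, self.runs = fn, deps, 0
        self.height = 1 + max(d.height for d in deps)
        for d in deps:
            d.subscribers.append(self)
        self.value = self._compute()

    def _compute(self):
        self.runs += 1
        return self.fn(*(d.value for d in self.deps))

def propagate(source: Node) -> None:
    """Recompute dependents in height order; each node runs at most once."""
    heap: List[Tuple[int, int, Node]] = []
    queued = set()

    def enqueue(node: Node) -> None:
        for sub in node.subscribers:
            if sub not in queued:
                queued.add(sub)
                heapq.heappush(heap, (sub.height, next(_tiebreak), sub))

    enqueue(source)
    while heap:
        _, _, node = heapq.heappop(heap)
        new = node._compute()
        if new != node.value:     # unchanged value: stop propagation here
            node.value = new
            enqueue(node)
```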
### W12. Python bridge & boundary *(load-bearing in production)*
Findings: C24 (boundary validation dead — [DONE-partial]: now warns; full revival needs tier-1
declarations recreated + zero-violation proof since SX_BOUNDARY_STRICT=1 is live), C28 (two SxExpr
classes double-quote), C29 (reader-macro auto-resolve broken), C30 (standalone Python host dead —
GATE D1: delete), C31 (14/33 test files broken + 5 live failures), S-bridge (coroutine-cancel
desync, no timeouts, dead _restart), S-bridge2 (numeric-result-as-epoch ambiguity), K42 (values —
shared W5). C25/C26/C27 live in W6 (parser). Approach: finish C24 (recreate declarations, prove
clean, re-enable); single SxExpr class (C28); fix OcamlSync.start→_ensure (C29); bridge timeouts +
working _restart (S-bridge); robust (ok N V) parse (S-bridge2); fix/retire broken tests (C31).
Gate: D1 (C30). Status: OPEN (C24 partial).
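
For S-bridge (no timeouts, coroutine-cancel desync, dead `_restart`), one option is to put every round-trip under a deadline. A round-trip that times out or is cancelled marks the bridge as desynced, and the next call restarts the worker before reusing the channel, so a late reply can never be read as the answer to a later request. The sketch below uses `asyncio.wait_for`. The `Bridge` class, its `send` callable, and this `_restart` are hypothetical stand-ins, not the real bridge in `shared/sx/`.

```python
import asyncio
from typing import Awaitable, Callable

class BridgeTimeout(Exception):
    pass

class Bridge:
    """Toy bridge; `send` stands in for one request/response round-trip."""
    def __init__(self, send: Callable[[str], Awaitable[str]], timeout: float = 5.0):
        self._send = send
        self.timeout = timeout
        self.restarts = 0
        self._desynced = False

    async def _restart(self) -> None:
        # A real bridge would kill the child process and spawn a fresh one,
        # dropping any half-read reply so the next request starts in sync.
        self.restarts += 1
        self._desynced = False

    async def call(self, expr: str) -> str:
        if self._desynced:
            await self._restart()
        try:
            return await asyncio.wait_for(self._send(expr), self.timeout)
        except asyncio.TimeoutError:
            self._desynced = True   # a late reply may still arrive: never reuse
            raise BridgeTimeout(f"bridge call exceeded {self.timeout}s") from None
        except asyncio.CancelledError:
            self._desynced = True   # caller went away mid-round-trip
            raise
```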
### W11. JIT correctness (serving-JIT re-enable preconditions)
Findings: J1 (`->` miscompile), J2 (fallback re-runs whole call — double side effects), J3 (macro
args eager), J4 (VM component kwargs misparse), J5 (specialized opcodes freeze redefs), J6 (compiler-
used prim redef poisons), J7 (data-first deopt — shared W4), J10 (stale Sx_compiler stub), J11 (JIT
debug paths diverge), K33 (set! split brain — shared W2), K19 (harness drift — shared W14), C10
(browser compiler one fix behind), C11 (stale module-manifest.sx), C12 (dead SOURCE_MAP paths), C14
(stale dist/ bundle). J12 = positive (perform/resume fixed). Currently MITIGATED (JIT gated OFF in
both epoch and — post-batch — HTTP mode). Approach: fix compile-thread-step (J1); fallback-before-
side-effects or compile-time reject of fallback-prone forms (J2); macro-aware compile (J3); keyword
tagging in constant pool (J4); redefinition invalidation (J5/J6); one browser-compiler sync pipeline
+ single bundle dir (C10/C11/C12/C14). Do NOT re-enable serving-JIT until the CEK-vs-JIT differential
(W14) is green.
Gate: W14 differential. Status: DEFERRED (mitigated; only unblock if serving-JIT is wanted).
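
For J5/J6, compiled code that specialized on a global (by inlining a primitive or emitting a specialized opcode) has to notice when that global is redefined. One common scheme gives each binding a version counter, and compiled code checks the versions it assumed on entry, before any side effect runs, which also sidesteps J2's double execution. The sketch below is a model; `Globals` and `CompiledFn` are hypothetical names, not the VM's design.

```python
from typing import Callable, Dict, Tuple

class Globals:
    """Global table in which every (re)definition bumps that name's version."""
    def __init__(self) -> None:
        self.values: Dict[str, object] = {}
        self.versions: Dict[str, int] = {}

    def define(self, name: str, value: object) -> None:
        self.values[name] = value
        self.versions[name] = self.versions.get(name, 0) + 1

class CompiledFn:
    """Stand-in for JIT output that baked in the globals it read at compile time."""
    def __init__(self, env: Globals, body: Callable, used: Tuple[str, ...]):
        self.env, self.body, self.used = env, body, used
        self.compile()

    def compile(self) -> None:
        # "Inline" the current values and record which versions they came from.
        self.snapshot = {n: self.env.values[n] for n in self.used}
        self.assumed = {n: self.env.versions[n] for n in self.used}

    def __call__(self, *args):
        # Validate assumptions on entry, before the body can cause side effects.
        if any(self.env.versions[n] != v for n, v in self.assumed.items()):
            self.compile()
        return self.body(self.snapshot, *args)
```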
### W13. JS host *(GATE D1 — likely "delete")*
Findings: C0a (hollow bundle), C0b (2490 fail gate), JS1 (define-record-type/makeRtd), JS2 (host-
callback type tag), JS3 (arithmetic drops args), JS4 (`.` symbol), JS5 (runner shims), JS6 (str nil),
JS7 (no qq emission), JS8 (stale metadata). If D1 retires the JS bundle: delete `hosts/javascript`,
remove from `sx-build-all.sh`/CI, keep only the WASM kernel path. If kept: this is a ~2500-test
revival project. Gate: D1. Status: BLOCKED on D1.
---
## Phase 3 — Hygiene & docs
### W15. Hygiene & documentation
Findings: C8 (triplicated hosts/ocaml/hosts/ tree), C13 (test_platform.js stale path), C15 (tracked
stale wasm blob), C16 (orphaned hosts/native), C17 (sx-platform-2.js + 23 dead .sxbc.json), C18
(spa-debug.js + root clutter), C2 (r7rs string->number radix shadow), F14 (doc drift — batch fixed
canonical-ref + island rules; suite counts + case-syntax + primitives-header still stale), F15
(sha3 stub / test.sx dead filename), F13 (regen reproducibility — [DONE] as batch side effect).
K105/K73 [DONE]. Approach: delete dead trees/blobs/files; fix r7rs shadow (C2); finish CLAUDE.md
(suite counts, case syntax); regen-diff CI check (F13 → make it a gate in W14).
Gate: D1 (some deletions). Status: OPEN (K105/K73/F13 done).
---
## Suggested execution shape (maps to the loop workflow)
Four loops, mostly independent after Phase 0:
1. **loops/sx-gate** (W14 + W15 hygiene) — the enabler. Start FIRST. Pins tests for the dc7aa709
batch, builds the WASM corpus runner + differential battery, unifies runner env, cleans dead code.
2. **loops/sx-kernel** (W1 + W2 + W5) — condition system, scope integrity, special forms. Single
owner (touches evaluator.sx + regen). TDD off W14's pinned tests. K01 first.
3. **loops/sx-runtime** (W3 HTTP safety + W12 Python bridge) — production robustness; can run
parallel to kernel since it's mostly host OCaml + Python, not spec.
4. **loops/sx-families** (W4, W6, W7, W8, W9, W10) — one family at a time, each gated by its rulings
+ the new batteries. W6/W7 pay the sx-pub CID debt.
W11 (JIT) and W13 (JS) are decision-gated and sit out until D1 + a green differential exist.
**Sequencing rule:** no semantic fix merges before (a) its pinning test exists, (b) the relevant
ruling is ratified, (c) native + WASM both run it. D1/D2/D3 are the only hard blockers.
---
## Coverage ledger — every finding accounted for
Status key: DONE (dc7aa709) · OPEN · PARTIAL · DEFERRED · GATE(Dn) · dup→(primary). Workstream in [].
### Core (K01-K110)
- K01 [W1] OPEN — guard/handler re-raise hang (CRITICAL, highest value)
- K02 [W1] DONE — signal-return frame key
- K03 [W1] OPEN — shift-k nested cek-run (CRITICAL)
- K04 [W2] OPEN · K05 [W2] OPEN · K06 [W2] OPEN · K07 [W2] OPEN (=J8)
- K08 [W5] OPEN — cond dual grammar
- K09 [W5] DONE · K10 [W1] OPEN · K11 [W5] DONE
- K12 [W1] OPEN (=W4 threading) · K13 [W4] OPEN · K14 [W4] OPEN · K15 [W4] OPEN
- K16 [W8] OPEN · K17 [W7] OPEN — append! · K18 [W7] DONE · K19 [W14] PARTIAL · K20 [W7] DONE
- K21 [W6] OPEN · K22 [W6] OPEN · K23 [W6] OPEN · K24 [W6] OPEN · K25 [W6] OPEN
- K26 [W9] OPEN · K27 [W9] OPEN · K28 [W10] OPEN · K29 [W10] OPEN
- K30 [W2/W3] OPEN — emit! cross-request (=S2 dir)
- K31 [W2] OPEN · K32 [W2] OPEN · K33 [W2/W11] OPEN — set! split brain
- K34 [W5] OPEN · K35 [W5/W6] OPEN · K36 [W1/W5] OPEN · K37 [W5] OPEN · K38 [W5] OPEN
- K39 [W5] DONE · K40 [W2] OPEN · K41 [W1] OPEN · K42 [W5/W12] PARTIAL (forms registered; `values` prim runner-only)
- K43 [W4] OPEN · K44 [W4] OPEN · K45 [W4] OPEN · K46 [W4] OPEN · K47 [W4] OPEN
- K48 [W8] OPEN · K49 [W8] DONE · K50 [W8] OPEN · K51 [W8] OPEN (=C19)
- K52 [W7] OPEN · K53 [W7] OPEN · K54 [W7] OPEN · K55 [W7] OPEN · K56 [W7] OPEN
- K57 [W1/W9] OPEN · K58 [W9] OPEN · K59 [W9] OPEN · K60 [W9] OPEN
- K61 [W10] OPEN · K62 [W10] OPEN · K63 [W6] OPEN · K64 [W6/W7] OPEN — char `=`
- K65 [W6] OPEN · K66 [W6] OPEN · K67 [W6] OPEN · K68 [W6] OPEN · K69 [W6] OPEN
- K70 [W5] OPEN · K71 [W5] OPEN · K72 [W5] OPEN · K73 [W15] DONE
- K74 [W2] OPEN (component &key false→nil; R5) · K75 [W2] OPEN (trailing kw; R5)
- K76 [W5] OPEN · K77 [W5] OPEN · K78 [W4] OPEN · K79 [W4] OPEN · K80 [W4] OPEN · K81 [W4] OPEN
- K82 [W8] OPEN · K83 [W8] OPEN · K84 [W8] OPEN · K85 [W7] OPEN · K86 [W7] OPEN · K87 [W7/W8] OPEN
- K88 [W7] OPEN · K89 [W7] OPEN — keys order, GATED R29 (breaks render tests, see RULINGS note)
- K90 [W7] OPEN · K91 [W7] OPEN · K92 [W7] OPEN — apply spread
- K93 [W9] OPEN · K94 [W9] OPEN · K95 [W9] OPEN · K96 [W9] OPEN · K97 [W9] OPEN
- K98 [W10] OPEN · K99 [W10] OPEN · K100 [W6] OPEN · K101 [W6] OPEN · K102 [W6] OPEN · K103 [W6] OPEN
- K104 [W14] OPEN · K105 [W15] DONE
- K106 [W1] OPEN (SUSPECTED nested-eval boundaries) · K107 [W2] OPEN (SUSPECTED)
- K108 [W6] OPEN (SUSPECTED CID nondeterminism) · K109 [W10] OPEN (SUSPECTED) · K110 [W9/W11] OPEN (SUSPECTED)
### Hosts — JIT (J1-J12)
- J1 [W11] DEFERRED (mitigated: JIT gated off) · J2 [W11] DEFERRED · J3 [W11] DEFERRED
- J4 [W11] DEFERRED · J5 [W11] DEFERRED · J6 [W11] DEFERRED · J7 [W11/W4] DEFERRED
- J8 [W2] OPEN dup→K07 · J9 [W11/W14] DEFERRED · J10 [W11] DEFERRED · J11 [W11] DEFERRED
- J12 POSITIVE (no action — perform/resume verified fixed)
### Hosts — kernel/protocol/build (C*)
- C0a [W13] GATE(D1) · C0b [W13/W14] GATE(D1) · C1 [W3] DONE · C1b [W3] DONE
- C2 [W15] OPEN · C3 [W14] OPEN · C4 [W14] OPEN · C5 [W14] OPEN · C6 [W14] OPEN · C7 [W14] OPEN
- C8 [W15] OPEN · C9 [W14] OPEN · C10 [W11] DEFERRED · C11 [W11] DEFERRED · C12 [W11/W15] OPEN
- C13 [W15] OPEN · C14 [W11/W15] OPEN · C15 [W15] OPEN · C16 [W15] OPEN · C17 [W15] OPEN · C18 [W15] OPEN
- C19 [W8] OPEN dup→K51 · C20 [W8] OPEN · C21 [W14] OPEN · C22 [W14] OPEN · C23 [W14] OPEN
- C24 [W12] PARTIAL · C25 [W6] OPEN · C26 [W6] OPEN · C27 [W6/W7] OPEN dup→P9
- C28 [W12] OPEN · C29 [W12] OPEN · C30 [W12] GATE(D1) · C31 [W12] OPEN
### Hosts — JS host (JS1-JS8)
- JS1-JS8 [W13] all GATE(D1) — delete if JS retired, else ~2500-test revival
### Hosts — cross-host parity (P1-P12, PY)
- P1 [W7] OPEN · P2 [W7] OPEN · P3 [W7] OPEN · P4 [W7] OPEN · P5 [W7] OPEN · P6 [W7] OPEN
- P7 [W7] GATE(D1) · P8 [W7] OPEN · P9 [W6/W7] OPEN (=C27) · P10 [W7] OPEN · P11 [W7] OPEN · P12 [W7] OPEN
- PY [W13] GATE(D1) dup→C30
### Hosts — HTTP/suspected (S1-S14, S-bridge*)
- S1 [W3] OPEN (LIVE CRASH) · S2 [W3/W2] OPEN · S3 [W3] OPEN · S4 [W3] DONE · S5 [W3] OPEN
- S6 [W14] OPEN · S7 [W14/W1] OPEN (unify eval/IO paths) · S8 [W13/W8] OPEN (browser env prims)
- S9 [W8/W11] OPEN · S10 [W1] OPEN · S11 [W3] OPEN · S12 [W3] OPEN · S13 [W3] OPEN · S14 [W8] OPEN
- S-bridge [W12] OPEN · S-bridge2 [W12] OPEN
### Conformance (F1-F15, conf-S1-S5)
- F1 [W7] OPEN dup→K18/P4 (WASM int wrap) · F2 [W14] OPEN · F3 [W7/W6] OPEN (apply + dict order) · F4 [W13/W14] GATE(D1)
- F5 [W14] OPEN (host-neutral corpus) · F6 [W14] OPEN (directories one-host-gated) · F7 [W14] OPEN dup→K42
- F8 [W14] OPEN (differential battery) · F9 [W7/W14] OPEN (primitive parity) dup→K53 · F10 [W14] OPEN (skip hs)
- F11 [W12] OPEN dup→C24 · F12 [W6] OPEN dup→C25/26/27 · F13 [W15] DONE · F14 [W15] PARTIAL · F15 [W15] OPEN
- conf-S1 [W14] OPEN (native-vs-WASM web-stack diff) · conf-S2 [W14] OPEN (hyperscript unverifiable)
- conf-S3 [W11] OPEN (import path browser vs test) · conf-S4 [W14] OPEN (float golden precision) · conf-S5 [W11] OPEN (JS build-flag ADT divergence)
### Tally
~213 finding-instances. DONE: 13 (dc7aa709). PARTIAL: 4 (K19, K42, C24, F14). DEFERRED: 12 (W11 JIT).
GATE(D1): ~16 (JS host + Python standalone). OPEN: the rest, distributed across W1-W12/W14/W15.

plans/sx-review/README.md (new file, 21 lines)
# SX Review — 2026-07-03
Findings from three parallel review sessions of the SX language/runtime, plus the master
remediation plan.
| File | What |
|------|------|
| **PLAN.md** | Master remediation plan: 15 workstreams (W1-W15), execution order, and a full per-finding coverage ledger. Start here. |
| **RULINGS.md** | 40 draft normative rulings (R1-R40). Phase-0 gate — ratify before the semantics fixes. |
| core.md | Language core / spec semantics lane (K01-K110). |
| hosts.md | Per-host implementations + FFI lane (J*, C*, JS*, P*, S*, PY). |
| conformance.md | Cross-host agreement + test adequacy lane (F1-F15, S1-S5). |
**Status:** the quick-wins batch (commit dc7aa709) landed 13 fixes + 4 partials; the suite is at
baseline, 5762 passing / 274 failing, with the fail set byte-identical. Everything else is
OPEN/GATE/DEFERRED per PLAN.md's ledger.
**Highest-value open items:** K01 (guard/handler re-raise hang — DoS-able, server+browser),
S1 (live HTTP crash under load), K03 (shift-k double-execution), and W14 (test gate — the enabler
that makes all other fixes verifiable).
**Blocking decisions (maintainer):** D1 host lineup, D2 ratify rulings, D3 gate definition.

plans/sx-review/RULINGS.md (new file, 396 lines)
# SX RULINGS — normative decisions on every ambiguity surfaced by the 2026-07-03 review
DRAFT for ratification. Each ruling: STATUS `PROPOSED` → flip to `RATIFIED` / `REJECTED` /
`AMENDED: <text>`. Once ratified, this file moves to `spec/RULINGS.md` and becomes the
authority the conformance batteries pin against. Evidence citations: core.md finding names,
hosts.md J/C/JS/P/S codes, conformance.md F codes.
**Default posture used for recommendations** (override per-ruling as you see fit):
1. Prefer an ERROR over any silent behavior (silent drop/no-op/misparse caused the worst findings).
2. Prefer R7RS/standard semantics where churn is low; prefer current-behavior-plus-documentation
where churn is high and behavior is defensible.
3. Every ruling lands with conformance rows that run on BOTH production kernels (native + WASM).
**Companion decisions (not language rulings, restated for context):**
- D1 host lineup — recommended: kernel family (native OCaml + WASM) are the only evaluator
targets; hosts/javascript and hosts/python standalone retired; shared/sx/parser.py shrunk to a
wire-subset with a parity suite. Rulings below marked [D1] simplify to kernel-only if ratified.
- D3 gate — recommended: native corpus green (hs-upstream skip-listed) + same corpus on WASM +
cross-kernel differential battery + CEK-vs-JIT differential (when JIT on) + sx_ref.ml regen diff.
---
## A. Bindings & scope
### R1. `set!` on an unbound name
- Current: silently creates a root binding (tested intent, test-scope.sx:196) — but BOTH spec docs
say error (eval-rules.sx:112, special-forms.sx:141), and under JIT it writes a different global
table than the interpreter (split brain).
- RECOMMENDATION: **ERROR** ("set!: <name> is not bound — use define"). Typo'd set! is a bug-hider;
the docs already promise this. Flip test-scope.sx:196; sweep the corpus for reliance (expected
small — the idiom is define-then-set!). Either way the JIT/interpreter split MUST die.
- Churn: low-medium. Findings: core set!-unbound; hosts J-globals split. STATUS: PROPOSED
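
R1's rule, that `set!` never creates a binding, in a minimal environment-chain model. `Env` is illustrative, and the error text is adapted from the recommended message.

```python
class Env:
    """One lexical frame; `parent` is the enclosing frame (None at the root)."""
    def __init__(self, parent=None):
        self.vars, self.parent = {}, parent

    def define(self, name, value):
        self.vars[name] = value          # define always binds in THIS frame

    def set(self, name, value):
        env = self
        while env is not None:
            if name in env.vars:
                env.vars[name] = value   # mutate the nearest existing binding
                return
            env = env.parent
        # R1: never fall through to creating a root binding.
        raise NameError(f"set!: {name} is not bound; use define")
```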
### R2. The ~60 special-form/HO names (`map`, `filter`, `bind`, `match`, `do`, `case`, `->`, …)
- Current: `define`/`let`/`defmacro` of these names is silently accepted but ignored in call
position (CEK); the VM honors them (J8) — worst of both worlds.
- RECOMMENDATION: **reserved words**: `define`/`let`/`set!`/`defmacro` of any dispatch-table name
is a load-time ERROR. Publish the list in spec. Align the VM. (Full lexical honoring is more
Schemely but taxes every list-head dispatch and rescues little real code.)
- Churn: low (error surfaces existing dead definitions). Findings: core unshadowable-names; J8. STATUS: PROPOSED
### R3. `let` semantics
- Current: sequential (`let*`), body = implicit begin, on BOTH engines (tested intent). CLAUDE.md
island rules claim the opposite (describes a dead evaluator).
- RECOMMENDATION: **ratify current behavior**: `let` behaves as `let*`; the body is an implicit begin. Fix CLAUDE.md.
Document (or forbid) the observed letrec-ish quirk that binding-init lambdas capture the shared
frame (`(let ((f (fn () a)) (a 5)) (f))` → 5).
- Churn: zero (docs only). Findings: core let-docs; hosts handoff let-sequential. STATUS: PROPOSED
### R4. `letrec`
- Current: parallel (all inits evaluated, then bound); read-before-init yields nil silently; PLUS
two outright bugs (names injected into foreign lambdas' closures = global contamination;
named-let loop name leaks into and clobbers the enclosing frame).
- RECOMMENDATION: **letrec\* semantics** (sequential init) with ERROR on read-before-init
(pre-bind to an "uninitialized" sentinel that faults on read). Named-let binds its loop name in
a fresh frame, invisible after the form. The closure-injection and frame-leak are bugs to fix
regardless of ruling.
- Churn: low. Findings: core letrec-parallel/-injection/named-let. STATUS: PROPOSED
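
R4's read-before-init error can be modeled with a sentinel that every lookup checks. The sketch implements letrec\* over one fresh dict frame, which also avoids injecting names into foreign closures. Inits are Python callables that read through `lookup`. `UNINIT`, `lookup`, and `letrec_star` are hypothetical names.

```python
UNINIT = object()  # marks a letrec* name that is bound but not yet initialized

def lookup(frame: dict, name: str):
    value = frame[name]
    if value is UNINIT:
        raise NameError(f"letrec: {name} read before initialization")
    return value

def letrec_star(bindings, body):
    """bindings: [(name, init_fn)], each init_fn(frame) run left to right."""
    frame = {name: UNINIT for name, _ in bindings}   # one fresh frame per form
    for name, init in bindings:
        frame[name] = init(frame)                    # sequential, like let*
    return body(frame)
```

Mutually recursive lambdas still work, because their bodies read the frame only when called, after every init has run.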
### R5. Component `&key` conventions
- Current: `:flag false` is coerced to nil (indistinguishable from omitted); trailing keyword with
no value silently binds nil; `&key` on plain fn/defmacro silently misbinds.
- RECOMMENDATION: `false` is a legal &key value (bind via has-key, not `(or …)`); trailing keyword
without a value = ERROR; `&key` in fn/defmacro either implemented identically to components or
ERROR at definition (recommend: implement — one binding path for all three).
- Churn: low. Findings: core &key-false / trailing-kw / defmacro-&key. STATUS: PROPOSED
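
R5's binding rule, sketched. Keywords are matched by presence, so an explicit `false` survives. A trailing keyword with no value is an error, and an omitted key binds nil. Keywords are modeled as strings starting with `:`, `None` stands for nil, and `bind_keys` is a hypothetical helper.

```python
from typing import Dict, List, Sequence

def bind_keys(params: Sequence[str], args: List[object]) -> Dict[str, object]:
    """Bind `&key` params from a flat [:k1 v1 :k2 v2 ...] argument list."""
    supplied: Dict[str, object] = {}
    i = 0
    while i < len(args):
        key = args[i]
        if not (isinstance(key, str) and key.startswith(":")):
            raise TypeError(f"&key: expected a keyword, got {key!r}")
        if i + 1 >= len(args):
            raise TypeError(f"&key: keyword {key} has no value")  # no silent nil
        supplied[key[1:]] = args[i + 1]
        i += 2
    # Presence-based lookup: an explicit False stays False. The buggy pattern is
    # truthiness-based, e.g. `supplied.get(p) or None`, which turns False into None.
    return {p: supplied.get(p) for p in params}
```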
## B. Errors & conditions
### R6. Handler installation semantics
- Current: a handler runs with ITSELF still installed → any raise/error inside a guard clause or
handler-bind handler loops forever (crit 1; WASM-verified).
- RECOMMENDATION: **R7RS/CL semantics** — handlers run with the OUTER handler set; guard clause
bodies evaluate after the escape (the no-match auto-reraise path already does this correctly —
make it the only path).
- Churn: zero for correct code (only un-hangs broken cases). Findings: crit 1, guard family. STATUS: PROPOSED
### R7. What is catchable
- Current: only guest `(raise …)` reaches guard; host primitive errors, undefined-symbol, arity
errors, and strict type errors all blow through every handler.
- RECOMMENDATION: **everything is a condition.** Host/primitive/strict/undefined-symbol errors are
raised as structured condition dicts ({:type :message :op …}) through the same channel guard
sees. Reserve a non-catchable class only for kernel panics.
- Churn: low-medium (code that "relied" on uncatchability is unlikely). Findings: core
host-errors-uncatchable, strict-uncatchable; enables sane server error pages. STATUS: PROPOSED
### R8. `raise-continuable` / `signal-condition`
- RECOMMENDATION: ratify R7RS: handler's value returns to the signal site (the current
whole-program-result behavior is crit 2's frame-key bug, not a semantic choice). STATUS: PROPOSED
## C. Special forms
### R9. `cond` grammar — kill the dual-mode heuristic
- Current: flat pairs documented; undocumented Scheme clause mode auto-detected iff every arg is a
2-element list → silent side-effect drops, mode flips, wrong values (core cond-ambiguity).
- RECOMMENDATION: **flat pairs only**: `(cond t1 r1 t2 r2 … :else d)`. Multi-expression results
use explicit `(do …)`. Support arrow as a flat triple `t => receiver`. A clause-shaped arg list
as a test position is just evaluated — no mode detection ever. Migrate the cond-arrow suite
(test-r7rs.sx:135-145) and any clause-mode usage (sweep needed).
- Churn: medium (sweep + migrate clause-mode call sites). Findings: core cond-ambiguity,
guard-multi-expr (inherits). STATUS: PROPOSED
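A sketch of the flat-pairs `cond` from R9. The `ELSE`/`ARROW` markers and the `ev` callback are hypothetical stand-ins for the reader's keyword/symbol values and the evaluator; it also assumes nil and false are the only falsy values. The point is the absence of mode detection: a list-shaped test is simply evaluated.

```python
ELSE = ":else"  # hypothetical stand-in for the unevaluated :else keyword
ARROW = "=>"    # hypothetical stand-in for the => symbol


def truthy(v):
    # Assumption: only nil (None) and false are falsy in SX.
    return v is not None and v is not False


def eval_cond(args, ev):
    """Evaluate (cond t1 r1 t2 r2 ... :else d) in flat-pairs form (R9).

    args: the unevaluated argument forms; ev: evaluates a single form.
    `t => receiver` is a flat triple: receiver is called with t's value.
    """
    i, n = 0, len(args)
    while i < n:
        test = args[i]
        if test == ELSE:
            if i + 2 != n:
                raise SyntaxError("cond: :else must be the final pair")
            return ev(args[i + 1])
        if i + 1 >= n:
            raise SyntaxError("cond: test without a result")
        if args[i + 1] == ARROW:
            if i + 2 >= n:
                raise SyntaxError("cond: => needs a receiver")
            value = ev(test)
            if truthy(value):
                return ev(args[i + 2])(value)
            i += 3
        else:
            if truthy(ev(test)):
                return ev(args[i + 1])
            i += 2
    return None  # no test matched: nil
```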
### R10. `case`
- RECOMMENDATION: ratify the flat evaluated-datums form and document it (vals ARE evaluated,
first-match, structural `=`); `:else`/`else` legal ONLY in final position (else ERROR); Scheme
datum-list clause syntax → clear parse-time ERROR ("use flat pairs"). Keyword/string punning
follows R21 and gets documented.
- Churn: low. Findings: core case-else-position / case-dialect. STATUS: PROPOSED
### R11. `do`
- Current: `do` is a begin-alias EXCEPT when its first form's head is a list — then it's a Scheme
do-loop → IIFE misparse.
- RECOMMENDATION: **`do` = begin alias, always.** Scheme do-loop moves to a distinct name
(`do-loop`) or is dropped (named let covers it). Kills the heuristic.
- Churn: low (sweep for real do-loop usage; expected rare). Findings: core do-IIFE. STATUS: PROPOSED
### R12. Quasiquote
- RECOMMENDATION, four sub-rulings:
a. `unquote-splicing` becomes an alias of `splice-unquote` (one-line; kills the silent
zero-splice trap; rename the misleadingly-named tests).
b. Implement standard **depth tracking** (nested quasiquote raises quote depth; `,,x` works).
Hosts agree current shallow behavior is consistent-but-nonstandard — fix at spec level.
c. Quasiquote **traverses dict literals** (`{:k ,v}` works).
d. Splicing a non-list and malformed splice arity → ERROR.
- Churn: low (b is the only subtle one). Findings: core qq-longhand/-depth/-dicts/-splice-nonlist. STATUS: PROPOSED
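A sketch of R12's quasiquote with standard depth tracking, assuming forms are nested Python lists whose head is the symbol name as a `str`. It treats `unquote-splicing` as an alias (a), tracks depth so `,,x` works (b), traverses dict literal values (c), and errors on splicing a non-list or malformed arity (d).

```python
SPLICE_HEADS = ("splice-unquote", "unquote-splicing")  # R12a: longhand is an alias


def qq(form, ev, depth=1):
    """Evaluate a quasiquote template with depth tracking (R12 sketch).

    ev evaluates a single form; it is only called at depth 1.
    """
    if isinstance(form, dict):
        # R12c: dict literal values are traversed, so {:k ,v} works
        return {k: qq(v, ev, depth) for k, v in form.items()}
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if head in ("quasiquote", "unquote") or head in SPLICE_HEADS:
        if len(form) != 2:
            raise SyntaxError(f"{head}: expected exactly one argument")
    if head == "quasiquote":
        return ["quasiquote", qq(form[1], ev, depth + 1)]  # nesting raises depth
    if head == "unquote":
        if depth == 1:
            return ev(form[1])
        return ["unquote", qq(form[1], ev, depth - 1)]     # peel one level
    out = []
    for item in form:
        if isinstance(item, list) and item and item[0] in SPLICE_HEADS:
            if len(item) != 2:
                raise SyntaxError(f"{item[0]}: expected exactly one argument")
            if depth > 1:
                out.append([item[0], qq(item[1], ev, depth - 1)])
                continue
            value = ev(item[1])
            if not isinstance(value, list):
                raise TypeError("splice-unquote: value is not a list")  # R12d
            out.extend(value)
        else:
            out.append(qq(item, ev, depth))
    return out
```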
### R13. Threading `->` / `->>`
- RECOMMENDATION: (a) steps evaluate in CEK frames (bug: guard/IO broken through threading);
(b) a lambda literal as a step = expand-time ERROR; (c) keyword step sugar: `(-> x :k :j)` ≡
`(-> x (get :k) (get :j))`: cheap, expected, and kills the `Not callable: nil` trap; (d) remove the
dead `|>` dispatch branch (parser rejects `|` anyway); (e) fix reduce-seeding via R15.
- Churn: low. Findings: core threading-nested-CEK/-lambda-literal/|>-dead/keywords-as-getters. STATUS: PROPOSED
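A sketch of the `->` expansion under R13 (b) and (c). It assumes forms are nested lists, symbols are `str`, keywords are `str` starting with `:`, and that the lambda form is spelled `fn` or `lambda` (an assumption; adjust to the real special-form names).

```python
LAMBDA_HEADS = ("fn", "lambda")  # assumed names of the lambda special form


def expand_thread_first(x, steps):
    """Expand (-> x step ...) into nested call forms (R13 sketch)."""
    acc = x
    for step in steps:
        if isinstance(step, str) and step.startswith(":"):
            acc = ["get", acc, step]                 # R13c: :k means (get :k)
        elif isinstance(step, list):
            if not step:
                raise SyntaxError("->: empty step")
            if step[0] in LAMBDA_HEADS:
                raise SyntaxError("->: lambda literal as a step")  # R13b
            acc = [step[0], acc, *step[1:]]          # thread as first argument
        else:
            acc = [step, acc]                        # bare function: (f acc)
    return acc
```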
### R14. `match`
- RECOMMENDATION: `(pattern (when cond))` guard clauses either implemented or ERROR — never
silently read as a structural pattern (current). Recommend: implement (small, high value).
Document let-match as dict-destructuring-only with a clear error for list patterns.
- Churn: low. Findings: core match-guards. STATUS: PROPOSED
## D. Calling convention
### R15. Higher-order forms
- RECOMMENDATION, six sub-rulings:
a. Arg-order swap happens ONLY when exactly one argument is callable (components count as
callable); both-callable or neither → ERROR "map: cannot determine function/collection".
b. `(reduce f coll)` (2-arg) = Clojure-style fold (first element as init; empty coll → ERROR,
kept deliberately simple rather than looking up an identity value for f); `(reduce init f coll)` and threaded
`(-> init (reduce f coll))` work via the one-callable rule in (a).
c. Data-first with extra args = ERROR (today silently dropped).
d. Multi-collection map coerces every collection with seq-to-list (strings/vectors), zips to
shortest (already); map over a dict iterates `(k v)` pairs.
e. HO names are first-class: `map` etc. in value position resolve to real closures so
`(define f map)` / `(apply map …)` work.
f. Zero/one-arg HO calls = arity ERROR (today silently `()`).
Also fix the O(n²) accumulation (implementation, not semantics).
- Churn: medium — (a) changes behavior for ambiguous calls, sweep needed. Findings: core reduce-2arg /
reduce-swap / swap-drops-args / HO-not-first-class / ho-cryptic-errors / multi-coll / zero-arg;
J7 (VM parity). STATUS: PROPOSED
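A sketch of R15 (a), (c), (d) and (f) for `map`, using Python's `callable` as a stand-in for the kernel's callable test (components would also count). Function names are hypothetical.

```python
def normalize_map_args(name, args):
    """Split HO call args into (fn, collections) under the R15a one-callable rule."""
    if len(args) < 2:
        raise TypeError(f"{name}: expected a function and a collection")  # R15f
    where = [i for i, a in enumerate(args) if callable(a)]
    if len(where) != 1:
        raise TypeError(f"{name}: cannot determine function/collection")  # R15a
    if where[0] == 0:
        return args[0], list(args[1:])      # fn-first: (map f c1 c2 ...)
    if where[0] == 1 and len(args) == 2:
        return args[1], [args[0]]           # data-first: (map coll f)
    # R15c: data-first with extra args (or fn in a later slot) is an error
    raise TypeError(f"{name}: function must come first, or second of exactly two args")


def as_seq(coll):
    # R15d: dicts iterate as (k v) pairs; strings and sequences as elements
    return list(coll.items()) if isinstance(coll, dict) else list(coll)


def sx_map(*args):
    fn, colls = normalize_map_args("map", args)
    return [fn(*items) for items in zip(*(as_seq(c) for c in colls))]  # zip to shortest
```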
### R16. `apply`
- Current: native never spreads; WASM spreads 2-arg; test runner has a third behavior (three-way
divergence, F-3 + core corrected finding).
- RECOMMENDATION: **R7RS**: `(apply f a b … rest-list)` spreads, leading args prepended. All
surfaces align; strict checks fire through apply (R20).
- Churn: low (today it mostly errors). STATUS: PROPOSED
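The R7RS `apply` rule from R16, as a Python sketch (`sx_apply` is a hypothetical name): leading arguments are prepended and the final list is spread.

```python
def sx_apply(f, *args):
    """(apply f a b ... rest-list) calls f with a, b, ... followed by rest-list's items."""
    if not args:
        raise TypeError("apply: expected a function and a list")
    *leading, rest = args
    if not isinstance(rest, list):
        raise TypeError("apply: last argument must be a list")
    return f(*leading, *rest)
```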
### R17. Arity checking (too-few args)
- Current: missing params silently nil-fill (this is load-bearing: 1-arg `(assert x)` works only
via nil-fill); too-many errors.
- RECOMMENDATION: **ERROR on too-few** as well, with `&optional`/`&key`/`&rest` as the explicit
mechanisms. Sweep required (harness `assert`, any nil-fill reliance). If the sweep turns up
heavy reliance, fallback position: keep nil-fill but document it loudly and make strict mode
error. Primary recommendation stands: error.
- Churn: **high** — flagged as the riskiest ruling; do the sweep before ratifying. Findings: core
strict-too-few / harness-assert nit. STATUS: PROPOSED
## E. Keywords, equality, types
### R18. Keywords (cited as "R21" in R10 above)
- RECOMMENDATION: ratify current model — keywords self-evaluate to their string name; keyword-ness
exists only in unevaluated AST. Consequences made explicit: `(keyword-name :k)` needs a quote;
`"keyword"` is REMOVED from the strict type system; case/dict punning documented. NOT callable
(R13c covers the getter idiom).
- Churn: zero (docs + removing a dead type branch). STATUS: PROPOSED
### R19. Equality
- RECOMMENDATION (low-churn variant, chosen deliberately over full R7RS split):
a. `=` stays deep structural equality (alias equal?) — ubiquitous in the corpus; add the missing
**Char arm** (today `(= #\a #\a)` → false) and any other missing type arms; document that
`(= 1 1.0)` → true (numeric value equality inside =).
b. Add real `eqv?` (identity + exact numeric/char equality) and `eq?` (alias identical?) as
kernel primitives — they are spec-declared today but implemented NOWHERE.
c. Comparisons `< > <= >=` become n-ary chained (R7RS); `=` stays 2+-ary deep.
d. If content-addressing ever needs exactness-distinguishing equality, that's `eqv?`, not `=`.
- Churn: low. Findings: core eq?/eqv?-missing, =-binary, char-equality; P5. STATUS: PROPOSED
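A sketch of R19c's chained comparisons in Python (helper names are hypothetical). Python's `==` is a close but imperfect model of R19a's `=`: it is deep on lists/dicts and has `1 == 1.0`, but it also has `True == 1`, which SX should not copy.

```python
import operator


def chained(op, name):
    """Build an n-ary chained comparison: (< a b c) == (and (< a b) (< b c))."""
    def compare(*xs):
        if len(xs) < 2:
            raise TypeError(f"{name}: expected at least 2 arguments")
        return all(op(x, y) for x, y in zip(xs, xs[1:]))
    return compare


sx_lt = chained(operator.lt, "<")
sx_le = chained(operator.le, "<=")
sx_gt = chained(operator.gt, ">")
sx_ge = chained(operator.ge, ">=")
sx_eq = chained(operator.eq, "=")  # deep and numeric-valued; see caveat above
```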
### R20. Strict typing
- RECOMMENDATION: (a) checks move to the continue-with-call/vm_call chokepoints → HO callbacks,
apply, components, => receivers all covered; (b) unknown type name at declaration = ERROR;
(c) `"component"` becomes a real type branch; `"keyword"` removed (R18); (d) `(:as type)` param
annotations become the declaration channel (deprecate the name-keyed global dict, which is
trivially evaded and inherited by shadowers); (e) strict errors are catchable conditions (R7);
(f) set-prim-param-types! merges and validates; (g) return types: explicitly out of scope now.
- Churn: low-medium. Findings: core strict-* family (8 findings). STATUS: PROPOSED
## F. Numbers
### R21. Integer model & overflow
- Current: native = int63 with overflow-promote-to-float on + and * but silent WRAP on expt;
WASM = 32-bit silent wrap (F-1 — production browsers!); JS bundle = float64.
- RECOMMENDATION: spec defines SX integers as **exact within ±2^53** (the portable range);
arithmetic that exceeds the host's exact range **promotes to float** (never wraps) — `expt`
included. WASM must be fixed to match (js_of_ocaml int64/boxed or explicit overflow checks) —
hosts lane feasibility-checks the mechanism; silent 32-bit wrap is a bug under any ruling.
Values beyond 2^53 must not be trusted exact across the wire.
- Churn: low at spec level; WASM fix is real hosts work. Findings: F-1, P4, core expt. STATUS: PROPOSED
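A sketch of R21's promote-never-wrap rule. `host_bits` models the host's signed integer width (63 for OCaml native ints); since Python ints never overflow, the exact result is computed and then range-checked, whereas a real host would use explicit overflow checks. The helper names are hypothetical.

```python
import operator

PORTABLE_EXACT = 2 ** 53  # R21: the range every host can represent exactly


def checked(op, a, b, host_bits=63):
    """Integer op that promotes to float instead of wrapping (R21 sketch)."""
    lo, hi = -(1 << (host_bits - 1)), (1 << (host_bits - 1)) - 1
    exact = op(a, b)
    return exact if lo <= exact <= hi else float(exact)


def wire_exact(n):
    """True if n may be trusted as exact once it crosses the wire."""
    return isinstance(n, int) and -PORTABLE_EXACT <= n <= PORTABLE_EXACT
```

For example, `checked(pow, 2, 62)` overflows a 63-bit int (max 2^62 - 1) and so returns a float rather than the negative wrapped value native prints today.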
### R22. Division & zero
- RECOMMENDATION: integer `/`, `mod`, `quotient`, `remainder` by zero = catchable SX condition
(today: raw OCaml Division_by_zero for mod/quotient, silent `inf` for /); float ops keep IEEE
(inf/nan). `/` doc fixed: returns int when exact, float otherwise (current behavior ratified).
- Churn: low. Findings: core div-by-zero, /-doc. STATUS: PROPOSED
### R23. Float text & wire
- RECOMMENDATION: **shortest-round-trip printing everywhere** (native `%g` 6-sig-digit printing is
a wire-corruption bug — P1); `inf`/`-inf`/`nan` are THE wire tokens on all hosts (P10); `round`
stays half-away-from-zero, documented (R7RS banker's rounding rejected: churn without benefit);
`inexact->exact` rounding behavior kept + documented; `str 1.0` → keep `"1"` but canonical/wire
serializers must preserve the float/int distinction (`1.0` serializes as `1.0`).
- Churn: low. Findings: P1, P10, core round/float-rendering; canonical CID determinism. STATUS: PROPOSED
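A Python sketch of the R23 printing and rounding rules. Python's `repr` of a float already yields the shortest text that reads back to the same value, which makes the contrast with `%g` easy to see; the helper names are hypothetical.

```python
import math


def print_float(x):
    """Shortest text that reads back to the same float, with R23 wire tokens."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf" if x > 0 else "-inf"
    return repr(x)  # keeps 1.0 as "1.0", so the float/int distinction survives


def round_half_away(x):
    """Round to the nearest integer, halves away from zero (R23)."""
    frac, whole = math.modf(abs(x))  # modf avoids the x + 0.5 rounding trap
    rounded = whole + (1.0 if frac >= 0.5 else 0.0)
    return int(math.copysign(rounded, x))
```

`"%g" % (0.1 + 0.2)` gives `"0.3"`, which reads back as a different float; `print_float` gives `"0.30000000000000004"`. Note that Python's built-in `round` is banker's rounding, the mode R23 rejects.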
### R24. Rationals
- RECOMMENDATION: `string->number` parses `"1/2"`; `(/ 1 3)` stays float (rationals remain opt-in
via make-rational) — documented; radix arg restored by fixing the r7rs.sx shadow (C2).
- Churn: low. STATUS: PROPOSED
## G. Strings
### R25. Unit semantics
- Current: native counts UTF-8 bytes (substring can split codepoints → invalid UTF-8); JS counts
UTF-16 units; constructors are codepoint-aware. Project style mandates UTF-8 text everywhere.
- RECOMMENDATION: **codepoint semantics** for length/substring/index/ref at the spec level; kernel
implements UTF-8-aware ops. Accept the perf cost (or add byte-* variants for hot paths later).
- Churn: medium (kernel work + any code relying on byte counts). Findings: core UTF-8 family, P6. STATUS: PROPOSED
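The three unit models in play can be compared directly in Python, whose `str` length already counts codepoints. This sketch reproduces the counts in the conformance lane's `(len "héllo")` / `(len "👍")` probe and shows why byte-offset substring can produce invalid UTF-8.

```python
def unit_counts(s):
    """Length of s in UTF-8 bytes, UTF-16 code units, and codepoints."""
    utf8 = len(s.encode("utf-8"))            # native kernel today
    utf16 = len(s.encode("utf-16-le")) // 2  # JS bundle today
    return utf8, utf16, len(s)               # R25: codepoints


def byte_slice_is_valid(s, start, end):
    """True if slicing s by UTF-8 byte offsets still yields valid UTF-8."""
    try:
        s.encode("utf-8")[start:end].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```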
### R26. Case mapping
- RECOMMENDATION: kernel `upcase`/`downcase`/`upper`/`lower` are **ASCII-only, documented** (full
Unicode case tables deferred; JS's full-Unicode behavior dies with D1). Aliases exist on all
surfaces (P11).
- Churn: zero. STATUS: PROPOSED
### R27. `split` and escapes
- RECOMMENDATION: `split` = literal substring separator, keeps empties, empty separator → chars
(ratifies native; pin with the multi-char test that history shows is needed). String escape
table is normative: `\n \t \r \\ \" \uXXXX` (`\uXXXX` must be exactly 4 hex digits naming a Unicode scalar value, else ERROR);
**unknown escape = parse ERROR** (kills the native-keeps-backslash vs guest-drops-it silent
divergence, C25 direction fight).
- Churn: low. Findings: core split note, \u family, unknown-escape divergence; C25. STATUS: PROPOSED
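A Python sketch of R27's escape table and `split` contract (function names are hypothetical). With exactly 4 hex digits the only non-scalar values are the surrogates, so those are the ones rejected; any other escape is a parse error.

```python
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"'}
HEX = set("0123456789abcdefABCDEF")


def decode_escapes(body):
    """Decode the body of an SX string literal using the R27 escape table."""
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise SyntaxError("string ends with a lone backslash")
        esc = body[i + 1]
        if esc in ESCAPES:
            out.append(ESCAPES[esc])
            i += 2
        elif esc == "u":
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or not set(digits) <= HEX:
                raise SyntaxError("\\u needs exactly 4 hex digits")
            cp = int(digits, 16)
            if 0xD800 <= cp <= 0xDFFF:
                raise SyntaxError("\\u escape is a surrogate, not a scalar value")
            out.append(chr(cp))
            i += 6
        else:
            raise SyntaxError(f"unknown escape \\{esc}")  # never kept or dropped silently
    return "".join(out)


def sx_split(s, sep):
    """Literal substring separator, empties kept, empty separator -> characters."""
    return list(s) if sep == "" else s.split(sep)
```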
## H. Collections, nil, dicts
### R28. nil vs empty list
- Current: distinct values in the reader/serializer; `(cons 1 nil)` → `(1)` on native (nil-as-
empty in constructors); read ops inconsistent (`first nil` → nil but `reverse nil` → error).
- RECOMMENDATION: nil and `()` remain **distinct values**; collection READ ops uniformly
**nil-pun** (treat nil as empty: first/rest/nth/last/reverse/len/empty? all accept nil);
constructors keep nil-as-empty seeding (cons/append onto nil). `nil?` ≠ `empty?` is preserved (`()` is empty but not nil).
- Churn: low (only un-errors cases). Findings: core nil-tolerance; P7/P8 arms. STATUS: PROPOSED
### R29. Dict ordering
- RECOMMENDATION: **insertion order preserved** — iteration, keys/vals, and serialization (OCaml
Hashtbl replaced with an insertion-indexed structure; keys-reversed bug dies). CANONICAL form
always sorts keys independently (already true in the CBOR/CID layer). Duplicate literal keys:
last-wins, documented.
- EMPIRICAL NOTE (quick-wins batch, 2026-07-03): an interim sorted-keys change broke 4 render
tests — attr emission order flows through dict_keys and the tests PIN source-order attributes
(`width` before `height` etc.). So the current reverse-ish order is load-bearing for render;
any change here must land together with the render-attr ordering contract. Reverted; do not
change keys order except via this ruling.
- Churn: medium (kernel dict rework) but pays across wire/golden/cache findings C27/P9/core-keys. STATUS: PROPOSED
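A Python sketch of the R29 split between iteration order and canonical order (helper names are hypothetical). Python dicts already preserve insertion order, so `assoc` only needs copy-then-set; the canonical printer sorts keys regardless of history. The printer uses `str()` for values, which is only correct for ints; a real one would follow R23/R35.

```python
def assoc(d, k, v):
    """Persistent assoc preserving insertion order (R29): new keys go last."""
    out = dict(d)
    out[k] = v  # assumption: re-assoc of an existing key keeps its original slot
    return out


def canonical_text(d):
    """Canonical text ignores insertion order: keys are always sorted."""
    return "{" + " ".join(f":{k} {d[k]}" for k in sorted(d)) + "}"
```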
### R30. Small-primitive contract fixes (spec already says; hosts violate)
- RECOMMENDATION: ratify the spec text and fix: `contains?` on dicts = key check; `merge` skips
nil; `into` native on the kernel; `sort` takes an optional comparator, compares int/float
numerically, stable; `get` returns a STORED nil (default only when key absent); `zip-pairs` =
sliding window per spec (kernel currently chunks); `(max)`/`(min)` zero-arg = ERROR.
- Churn: low each. Findings: core contains?/sort/keys; P2/P3/P8/P12; JS `get` arm. STATUS: PROPOSED
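Three of the R30 contracts as Python sketches, with hypothetical names. `get_buggy` shows the conflation R30 rejects: treating a stored nil as if the key were absent.

```python
def get_buggy(d, k, default=None):
    # Conflates a stored nil with an absent key (the behavior R30 rejects).
    v = d.get(k)
    return default if v is None else v


def sx_get(d, k, default=None):
    """A stored nil comes back as nil; the default is used only when k is absent."""
    return d[k] if k in d else default


def zip_pairs(xs):
    """Sliding window: (1 2 3 4) -> ((1 2) (2 3) (3 4)), not chunks."""
    return [[a, b] for a, b in zip(xs, xs[1:])]


def sx_max(*xs):
    if not xs:
        raise TypeError("max: expected at least 1 argument")  # (max) is an error
    return max(xs)
```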
### R31. `append!` and mutation
- Current: silently no-ops on ANY derived list (map/filter/rest/reverse output) — worst silent-
data-loss finding in the primitives sweep.
- RECOMMENDATION: `append!` **ERRORS on non-mutable lists** immediately (honest), and is
deprecated in favor of persistent `append` + a real mutable vector/buffer for accumulator
idioms. Sweep the corpus (it's a known accumulator idiom in loops).
- Churn: medium (idiom sweep). Findings: core append!. STATUS: PROPOSED
## I. Parser & wire
### R32. One token grammar
- RECOMMENDATION: publish the normative ident/number classifier in spec/parser.sx and make every
surface bind THE SAME table (today: four divergent tables → same source, different ASTs).
Specific token rulings: maximal-munch then classify (`1+`, `a,b` are symbols — ratifies native);
hex/binary/octal `#x/#o/#b`-style and `0x10` accepted, documented; `inf`/`nan`/`-inf` are number
literals (reserved, not idents); `1e` and other malformed numbers = parse ERROR (never nil);
unicode identifiers **allowed** (UTF-8 letters — the docs mandate UTF-8 text; native reader
extends its charset); `$`/`|` NOT ident chars; `.` IS a valid symbol (ratifies native; JS4 dies
with D1); `#t`/`#f` = boolean literals on all surfaces.
- Churn: medium (native reader charset + guest table sync). Findings: core parser-divergence
family; C1b (unicode symbol kills server — fixed by charset + C1 try-wrap); JS4. STATUS: PROPOSED
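A sketch of "maximal-munch then classify" from R32, applied to one already-munched token. The regexes are illustrative, not the normative table; in particular, this sketch treats only a dangling exponent (`1e`, `2.5e+`) as a malformed number, and everything else that fails to parse as a number (such as `1+`) falls through to a symbol.

```python
import re

INT_RE = re.compile(r"[+-]?\d+\Z")
HEX_RE = re.compile(r"[+-]?0x[0-9a-fA-F]+\Z")
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?\Z")
DANGLING_EXP_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)[eE][+-]?\Z")
RADIX = {"#x": 16, "#o": 8, "#b": 2}
RESERVED = {"inf": float("inf"), "-inf": float("-inf"), "nan": float("nan"),
            "#t": True, "#f": False}


def classify(token):
    """Classify one maximal-munch token (R32 sketch). Returns (kind, value)."""
    if token in RESERVED:
        return ("literal", RESERVED[token])
    if any(c in "$|" for c in token):
        raise SyntaxError(f"illegal identifier character in {token!r}")
    try:
        if token[:2] in RADIX:
            return ("int", int(token[2:], RADIX[token[:2]]))
        if HEX_RE.match(token):
            return ("int", int(token, 16))
    except ValueError:
        raise SyntaxError(f"malformed radix number: {token!r}") from None
    if INT_RE.match(token):
        return ("int", int(token))
    if FLOAT_RE.match(token):
        return ("float", float(token))
    if DANGLING_EXP_RE.match(token):
        raise SyntaxError(f"malformed number: {token!r}")  # an ERROR, never nil
    return ("symbol", token)  # 1+, a,b, . and UTF-8 names are symbols
```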
### R33. Reader extensibility & comments
- RECOMMENDATION: implement the `#name` reader-macro registry on the kernel (spec documents it;
only JS has it today) — small, and sx-pub extensibility wants it. `#;` datum comment valid
before `)` and at EOF (standard). `#|…|` stays a RAW STRING (documented loudly as not-a-block-
comment); no block comments.
- Churn: low. Findings: core reader-macro/datum-comment/raw-string. STATUS: PROPOSED
### R34. Dict literals & serializer round-trip
- RECOMMENDATION: dict literal keys must be keyword/string/symbol — anything else is a parse ERROR
on every parser (guest currently stringifies `{1 2}` silently); odd form count gets a "dict
needs key-value pairs" error. Serializer: dict keys escaped/round-trippable (today unparseable
output for non-ident keys — also a CID hazard); chars serialize by codepoint (`#\é` readable
back once R25 lands); PROPERTY TEST: `parse(serialize(x)) = x` for the full value lattice, run
on both kernels.
- Churn: low. Findings: core serializer-dict-keys / multibyte-chars / dict-edges. STATUS: PROPOSED
### R35. Canonical form & CIDs (sx-pub-critical)
- RECOMMENDATION: the **native CBOR/CID path is normative** (key-sorted, verified native==WASM,
F-3). The canonical TEXT form is defined as: sorted keys, shortest-round-trip floats with
preserved int/float distinction, fully-escaped strings, and is a fixed point
(canonical(parse(canonical(x))) = canonical(x)) — property-tested cross-kernel. spec/canonical.sx
either becomes a tested mirror of the native path (fix its runner-only helpers) or is deleted;
two silently-diverging implementations is the one unacceptable state.
- Churn: low-medium. Findings: core canonical family; F-3; P9/C27 (via R29). STATUS: PROPOSED
### R40. Primitive naming & small-default unification (answers the hosts handoff list)
- RECOMMENDATION: one canonical name registry in spec/primitives.sx; per-host aliases die (with
D1 most of these resolve to "make it native on the kernel"): `json-encode`/`json-parse` are
KERNEL primitives (not IO-bridge helpers — today unavailable sandboxed); `regex-*` is the
canonical family name; `parse`/`sx-parse` → `sx-parse` canonical, `parse` alias documented;
1-arg `(range n)` = 0..n-1 (ratifies native); `parse-int`/`string->number` on failure → nil
(ratifies native, never 0); `format` and the stdlib move for real (the primitives.sx header
claims a stdlib migration that never happened — make the header true or revert it) and
spec/stdlib.sx loads in production (today `format` is unresolved on the server).
- Churn: low. Findings: F-9 naming splits, P7 arms, core spec-drift / stdlib-header. STATUS: PROPOSED
## J. Render contracts
### R36. Attribute contract (all four adapters)
- RECOMMENDATION: one contract, HTML-mode's as base: boolean-registry attrs — false/nil omit,
anything else emits bare name (SX truthiness, documented footgun stands); non-boolean attrs —
value stringified INCLUDING `"true"`/`"false"` (DOM adapter aligns — C19/core, found by both
lanes); attribute NAMES validated `[A-Za-z_:][A-Za-z0-9_:.-]*` else ERROR (kills spread-dict
injection); nil attr value omits the attribute.
- Churn: low. Findings: core attr-name-injection / bool-footguns / dom-html-parity; C19. STATUS: PROPOSED
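A Python sketch of the R36 attribute contract for a single attribute. `BOOLEAN_ATTRS` is an illustrative subset, not the real registry, and the function name is hypothetical; the name pattern is the one R36 specifies.

```python
import re
from html import escape

ATTR_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.-]*\Z")
BOOLEAN_ATTRS = {"disabled", "checked", "hidden", "selected", "readonly"}  # illustrative


def render_attr(name, value):
    """Render one attribute under R36; returns '' when it is omitted."""
    if not ATTR_NAME.match(name):
        raise ValueError(f"invalid attribute name: {name!r}")  # blocks spread-dict injection
    if value is None:
        return ""                                # nil omits the attribute
    if name in BOOLEAN_ATTRS:
        return "" if value is False else f" {name}"  # false omits; anything else: bare name
    if value is True or value is False:
        value = "true" if value else "false"     # non-boolean attrs stringify booleans
    return f' {name}="{escape(str(value), quote=True)}"'
```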
### R37. Raw-text elements & voids
- RECOMMENDATION: `<script>`/`<style>` children are NOT entity-escaped; instead the renderer
ERRORS if content contains `</script`/`</style` (raw! unchanged as the explicit bypass) — HTML-
correct and injection-safe, and stops corrupting legitimate inline JS/CSS. Void-element registry
completed (area/base/embed/param/track added to HTML_TAGS); children passed to a void element =
ERROR (today silently dropped). Component render gets a depth limit (default 512) with a clear
error. `is-render-expr?` either wired in (html:/custom elements) or deleted.
- Churn: low. Findings: core script-style / void-elements / recursive-component / is-render-expr. STATUS: PROPOSED
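A sketch of R37's raw-text and void-element rules, simplified so that children are plain text strings (nested elements are out of scope). Lower-casing before the `</script` check is an assumption, based on HTML end tags being case-insensitive.

```python
from html import escape

RAW_TEXT = {"script", "style"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link",
        "meta", "param", "source", "track", "wbr"}


def render_element(tag, attrs_html, children):
    """Render an element whose children are text strings (R37 sketch)."""
    if tag in VOID:
        if children:
            raise ValueError(f"<{tag}> is a void element; children are not allowed")
        return f"<{tag}{attrs_html}>"
    text = "".join(children)
    if tag in RAW_TEXT:
        # Raw text is emitted verbatim; refuse content that would close the element early.
        if f"</{tag}" in text.lower():
            raise ValueError(f"<{tag}> content contains </{tag}")
        body = text
    else:
        body = escape(text, quote=False)
    return f"<{tag}{attrs_html}>{body}</{tag}>"
```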
### R38. aser wire format
- RECOMMENDATION: list-valued keyword args serialize quoted (`:items (quote (…))`) so the wire
form re-evaluates to the same value; contract documented: components unexpanded, control flow
evaluated, all VALUES must round-trip through parse+eval. Property-test aser output re-evaluation.
- Churn: low. Findings: core aser-list-kwargs; S14 (deep-nesting parity — add the depth-2 test). STATUS: PROPOSED
## K. Signals (spec-level semantics)
### R39. Reactive semantics
- RECOMMENDATION: (a) change detection by `=` (deep equality — needs R19's Char arm etc.), not
physical identity — kills the every-reset-notifies behavior; (b) `batch` is unwind-safe
(depth decremented on any exit — today one throw wedges all reactivity forever); (c) notify is
glitch-free: two-phase (mark dirty → recompute in topological order) so diamonds recompute once;
(d) dispose-computed actually unsubscribes (bug); (e) effect cleanup cleared after invocation
(bug). Ratify as the documented reactive contract with tests (today: zero coverage of these).
- Churn: low-medium (topological notify is the real work). Findings: core signals family (5). STATUS: PROPOSED
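A minimal Python sketch of R39 (a) and (b) only, with hypothetical `Signal`/`Reactive` classes: change detection by equality rather than identity, and a `batch` whose depth counter is restored on any exit. Topological glitch-free notification (c) is not modeled, and flushing queued notifications when the batch body raises is this sketch's choice, not something R39 fixes.

```python
class Signal:
    def __init__(self, value):
        self.value = value
        self.subscribers = []


class Reactive:
    def __init__(self):
        self.batch_depth = 0
        self.pending = []

    def set(self, sig, new):
        if new == sig.value:  # R39a: deep equality, so re-setting an equal value is silent
            return
        sig.value = new
        for fn in sig.subscribers:
            if self.batch_depth:
                if fn not in self.pending:
                    self.pending.append(fn)
            else:
                fn()

    def batch(self, thunk):
        self.batch_depth += 1
        try:
            return thunk()
        finally:
            self.batch_depth -= 1  # R39b: restored even when thunk raises
            if self.batch_depth == 0:
                pending, self.pending = self.pending, []
                for fn in pending:
                    fn()
```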
---
## Ratification checklist
1. Flip each STATUS; AMENDED rulings get their text edited in place.
2. High-churn rulings needing a pre-ratification sweep: **R17 (arity)**, R9 (cond clause-mode
usage), R31 (append! idiom), R15a (ambiguous HO calls).
3. On ratification: move to spec/RULINGS.md; every ruling becomes (a) a conformance battery row
(native + WASM), (b) a fix ticket if behavior changes, (c) a docs line in CLAUDE.md's rewrite.
4. Rulings deliberately NOT made here (need your call, no strong recommendation):
- Whether rationals should ever be the default result of exact `/` (R24 keeps float).
- Whether to pursue full-Unicode case mapping (R26 defers).
- Whether `do-loop` (R11) is worth keeping at all vs deleting Scheme do entirely.
- JIT re-enable timeline (J1-J8 are preconditions, not rulings).

# SX Conformance Review — cross-host agreement + test adequacy
Axis: CONFORMANCE (do the hosts agree, and would the suite catch it if not?). Sibling lanes: core semantics, host implementations.
Date: 2026-07-03. All test runs and probes executed this session (bounded with `timeout`; no shared sx_server touched — probes used freshly-spawned bounded `sx_server.exe` instances).
CONFIRMED = reproduced here; SUSPECTED = static reasoning, probe proposed. Severity S1 critical → S4 minor.
Probe artifacts: `/tmp/claude-0/-root-rose-ash/9a04ba52-7bf4-476d-99ea-04f84bff1359/scratchpad/{probes,prims}/`; raw suite logs `/tmp/sx-review/*.log`.
---
## 0. Ground truth: what the hosts actually are (the briefed picture is stale)
- **OCaml native kernel** (`hosts/ocaml`, sx_server.exe) — canonical evaluator, essentially green on the corpus.
- **OCaml WASM kernel** (`shared/static/wasm/sx_browser.bc.wasm.js`, js_of_ocaml build of the same kernel) — **what production browsers actually run** (served at lib/host/blog.sx:1910). Served artifact verified identical (md5) to freshly built one — no artifact staleness.
- **JS-transpiled bundle** (`hosts/javascript` → `shared/static/scripts/sx-browser.js`): legacy bootstrapped evaluator; still built and gated by `scripts/sx-build-all.sh`; catastrophically red (below). No served page found references it.
- **Python host** (`hosts/python`) — vestigial: test runner deleted in d735e28b; `SX_USE_OCAML=1` in every compose file; bootstrap output consumed by nothing. NOT a live evaluator target. BUT `shared/sx/parser.py` (an independent Python SX reader/serializer) **is live in production plumbing** — see F-8.
- **hosts/native** — Cairo pixel-renderer (separate host concept), single smoke test, not a corpus runner.
## Suite results (this session)
| Runner | Briefed | Actual |
|---|---|---|
| JS standard (`node hosts/javascript/run_tests.js`) | ~747 green | **2596 pass / 2490 FAIL** (5086) |
| JS full (`--full`) | ~870 green | **2453 pass / 3203 FAIL** (5656) |
| Python | ~744 green | **runner deleted** (d735e28b) — zero tests |
| OCaml (`run_tests.exe`; no `--full` flag exists) | ~1080 green | **5762 pass / 274 FAIL** |
| OCaml WASM kernel | — | **corpus never runs on it** (F-2) |
Rebuilding the JS bundle fresh from today's spec reproduces the JS failures exactly (2596/2490) → red JS is real, not a stale artifact.
---
# CONFIRMED findings (most severe first)
## F-1 [S1, confidence high] Server and browser disagree on integer arithmetic — silently
The WASM kernel (the artifact every production browser loads) has 32-bit int semantics (js_of_ocaml); native is 63-bit. Same expression, same "canonical kernel", different answers, no error:
| Expr | native sx_server | WASM SxKernel (shipped) | legacy JS bundle |
|---|---|---|---|
| `(* 99999999 99999999)` | 9999999800000001 | **1674919425** (32-bit wrap) | 9999999800000000 (float64) |
| `(+ 9007199254740992 1)` | 9007199254740993 | **0** (literal truncated mod 2^32) | 9007199254740992 |
| `(expt 2 62)` | -4611686018427387904 (int63 wrap) | **0** | 4611686018427388000 (float) |
| `(* 999999999999 999999999999)` | 2003762205206896641 (silent int63 wrap) | 9.99999999998e+23 (goes float) | 9.99999999998e+23 |
Three hosts, three different answers on the same input. Repro: `scratchpad/probes/run_wasm.js probes.txt` vs `sx_server.exe (eval ...)`. Any SSR-computed value re-derived client-side (pagination totals, ids, hashes-by-arithmetic, seat counts) can differ. Note native has two hazards of its own: silent int63 wraparound and reduced float print precision (`(+ 0.1 0.2)` prints `0.3`).
Fix ownership: hosts lane (int64/boxed ints or explicit overflow policy in the WASM build); THIS lane's ask: a numeric-tower differential suite that runs on every shipped artifact.
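The wrapped values in the F-1 table are exactly what two's-complement truncation predicts. A small Python model (hypothetical `wrap` helper; it models the arithmetic, not the hosts' code) reproduces the WASM 32-bit column, the native int63 column, and the JS float64 result, which is a useful shape for the requested differential suite's expected values.

```python
def wrap(n, bits):
    """Reduce an exact integer to a signed two's-complement value of the given width."""
    m = n % (1 << bits)
    return m - (1 << bits) if m >= 1 << (bits - 1) else m
```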
## F-2 [S1, high] The shipped browser kernel never runs the corpus
No runner feeds spec/tests through `sx_browser.bc.js`/`.wasm.js`. Its entire test surface: `test_boot.sh` (require() smoke), `test_wasm_native.js` (test-framework.sx + web/tests/test-wasm-browser.sx only), `tests/node/run-sx-tests.js` (one deftest file). Both "conformance" runners test kernels that are NOT the shipped browser artifact. F-1 and F-3 existed undetected precisely because of this. The probe harness at `scratchpad/probes/run_wasm.js` shows a corpus runner on the WASM kernel is trivially buildable (SxKernel.eval works headless in node with the test_boot.sh stub block).
## F-3 [S1, high] Native and WASM disagree on `apply` and dict key order (same kernel family!)
- `(apply + (list 1 2 3))` → native: **error** "Expected number, got list" (apply does not spread); WASM: **6** (spreads, 2-arg form only; `(apply + 1 (list 2 3))` errors "apply: function and list"). Legacy JS bundle spreads fully (both forms → 6). Three different apply semantics.
- `(assoc {:a 1} :b 2)` → native `{:b 2 :a 1}` (new keys prepend); WASM `{:a 1 :b 2}` (insertion order); JS bundle insertion order. `merge` same. Dict iteration/serialization order differs server-vs-browser — anything ordering-dependent (rendered attr order, keys/vals iteration, golden-file comparisons) diverges.
- Defused non-finding: `cid-from-sx` canonicalizes — CIDs of `{:a 1 :b 2}` and `(assoc {:a 1} :b 2)` are identical on native AND WASM (probe verified). Content addressing is order-safe.
## F-4 [S1, high] JS host fails ~half the shared corpus; the documented build gate is red
spec/tests discovery is byte-identical in both runners (82 files both), so results are directly comparable: OCaml ≈ green, JS 2490 FAIL / 5086. `scripts/sx-build-all.sh` (`set -euo pipefail`) runs `node hosts/javascript/run_tests.js --full` as a gate → **the documented full pipeline currently exits FAIL** (run_tests.js exits 1). Either the JS host is dead (remove from gate/docs) or alive (≈2500 tests behind). Today no automated gate enforces cross-host agreement at all.
## F-5 [S1, high] The corpus is not host-neutral: OCaml runner preloads libraries + services `import`; JS runner does neither
- OCaml `make_test_env` (run_tests.ml:3700-3806) unconditionally preloads r7rs/render/canonical/adapters, forms/engine/router/orchestration, bytecode/compiler/vm, stdlib, signals, freeze, content, **parser-combinators, graphql, graphql-exec**, dom/browser, and the full **hyperscript stack**.
- OCaml services `(import ...)` IO-suspensions at runtime (run_tests.ml:2148-2169, 2178-2272). The JS runner calls `Sx.eval` directly — **no suspension/resume loop**, so `import` can never complete on JS.
- Result: hundreds of "Undefined symbol: gql-tokenize / pc-* / hs-*" JS failures (js-standard.log) that are runner-environment artifacts layered on top of real gaps. Tests exercising import/suspension (test-import-bind, test-io-suspension, test-coroutines, test-modules) do not test the same thing per host.
## F-6 [S1, high] Whole test directories are gated on only one host — or none
From run_tests.js:300-448 and run_tests.ml:3959-4001:
- **lib/tests (11 files)** — continuations, continuations-advanced, freeze, signals-advanced, stdlib, stepper, tree-tools, types, vm, vm-closures, vm-primitives — auto-run **only on JS `--full`** (OCaml runs them only if named explicitly). Their only automated gate is the runner that's red: e.g. test-continuations.sx on JS: 475 pass / 54 FAIL; algebraic-data-types: 53 FAIL. **Effectively ungated.**
- **web/tests (13 runnable)** — adapter-html, aser, deps, engine, examples, forms, orchestration, page-helpers, relate-picker, router, signals, swap-integration, tw-layout — auto-run **only on OCaml**.
- **6 web/tests run on NO standard runner**: test-adapter-dom, test-boot-helpers, test-cek-reactive, test-handlers (module-loaded, not run as suite), test-layout, test-wasm-browser (only via test_wasm_native.js).
- **OCaml foundation tests** (run_tests.ml:1178+) — native-only unit tests, no cross-host equivalent (fine, but they're invisible to other hosts by design).
- `spec/tests/test.sx` matches neither filter (`test-*.sx`) → **runs nowhere**.
- Guest-language suites (lib/scheme/tests/records.sx, lib/haskell/tests/records.sx, …) sit outside all discovery — each loop self-tests ad hoc, no aggregate gate.
## F-7 [S1, high] Features pass their tests only inside the test runner's private environment — both directions
- **OCaml side**: `values`, `promise?`, `make-promise`, `force` are bound **only in run_tests.ml** (lines 1131-1167). Probes: `(force (delay (+ 1 2)))`, `(values ...)`, `(let-values ...)` → "Undefined symbol" on BOTH production surfaces (native sx_server epoch env AND WASM SxKernel). ~271 promises/values assertions pass against an environment that exists nowhere in production.
- **JS side**: run_tests.js injects `equal?`, `apply`, `env-*`, `render-html`, `make-continuation`, upcase/downcase, and a **fake sha3-256 stub** (lines 80-160) into the test env. The real browser bundle lacks these (probe: `equal?` → Undefined symbol on the bundle). Coverage numbers overstate both shipped artifacts.
## F-8 [S1→S3 itemized, high] Differential probes, OCaml native vs JS bundle: 130 exprs, 98 identical, real divergences
(harness: `scratchpad/probes/`; numeric rows already in F-1)
Core data semantics:
| Expr | OCaml native | JS bundle |
|---|---|---|
| `(cons 1 nil)` | `(1)` (nil ≡ empty list) | `(1 nil)` (nil is an element) — **foundational** |
| `(range 3)` | `(0 1 2)` | `()` — single-arg range returns empty on JS |
| `(round -2.5)` | -3 (half away from zero) | -2 (JS half-up) |
| `(str (list 1 2 3))` | `"(1 2 3)"` | `"1,2,3"` (Array.toString leak) |
| `(str {:a 1})` | `"{:a 1}"` | `"[object Object]"` |
| `(len "héllo")` / `(len "👍")` | 6 / 4 (bytes) | 5 / 2 (UTF-16 units) — string indexing math differs on all non-ASCII; neither counts codepoints |
| `(upper "straße")` | "STRAßE" (ASCII-only) | "STRASSE" (Unicode) |
| `(reverse "abc")` | error (lists only) | `"cba"` |
| `(sort lst cmp-fn)` | error "sort: 1 list" | sorts with comparator |
| `(max)` | error | `-Infinity` |
| `(/ 1 0)` | `inf` | `Infinity` (print) |
| `(+ 0.1 0.2)` | prints `0.3` | prints `0.30000000000000004` |
Agreements worth recording (hosts agree, docs/spec don't): `case` uses flat pairs `(case x 1 "one" 2 "two" :else ...)` — the CLAUDE.md-documented clause-list syntax errors identically on all three kernels; `(join sep coll)` is sep-first on both; `int?` undefined on both (it's `integer?`); `string->number` radix arg unsupported on all hosts (and the corpus asserts it — a live OCaml FAIL, see F-10).
## F-9 [S2, high] Primitive-set parity is broken at scale (full lists in `scratchpad/prims/report.txt`)
- **194 OCaml-core names missing from the JS bundle**, incl. language-level: `take drop zip unique init char-at substr upcase downcase equal? identical? dict-get dict-has? dict-delete! parse parse-safe parse-float escape-string compile compile-module` + records constructors + math extras. Dynamically verified samples throw "Undefined symbol" on JS.
- **Naming splits where both have the capability**: OCaml `regex-*` vs JS `regexp-*`; OCaml `parse` vs JS `sx-parse`; OCaml `json-encode` vs JS `json-stringify`.
- **24 user-facing JS names missing on the OCaml kernel**: `format values force promise? make-promise call-with-values json-parse json-stringify format-date parse-datetime pluralize escape strip-tags error-message env-parent char-code-at now-ms satisfies? …` (deps-check verified). Note the epoch-mode server doesn't load `spec/stdlib.sx`, so even `format` is unresolved in production server env.
- **Declared in spec/primitives.sx but implemented NOWHERE: `eq?`, `eqv?`** (202 spec-declared total; 9 missing from OCaml, 6 from JS).
- JS bundle oddities: 6 statically-assigned PRIMITIVES keys absent at runtime (`loop raw-loop reactive-text read-char-name-loop read-map-loop scan`).
## F-10 [S2, high] The OCaml suite is not green, and a permanent red band is normalized
274 FAIL on the canonical host: 272 hs-upstream-* (fetch/socket/runtimeErrors/asExpression/...). If browser-only, they should be skip-listed like the 6 web tests — a permanently red FAIL column trains everyone to ignore failures. The 2 core failures are live: `can-map-an-array` ("map with block") and `string->number` radix (corpus asserts a feature no host implements). Also note hs-upstream pass/fail sets differ wildly between OCaml (272 fail) and JS (~1900 fail) — the suites' shared-corpus value is currently nil for hyperscript. **UPDATE (S-2 probe, F-18): ≥118 of the 272 red hs tests PASS on the shipped WASM kernel + happy-dom — the red band is mostly mock-DOM environment deficiency, not engine failure.**
## F-11 [S2, high] Boundary validation silently disabled in production; stale imports broke the Python-side test files
- `shared/sx/boundary.py:34` imports `.ref.boundary_parser` — moved to hosts/python in 7036621b → ImportError swallowed (boundary.py:44-49) → **boundary type validation is a silent no-op**.
- `shared/sx/tests/test_bootstrapper.py:149`, `test_parity.py:733` import deleted `shared.sx.ref.*` — those tests cannot even load.
- `shared/sx/async_eval.py` (fallback evaluator) is macro-crippled by design (raises "sx_ref.py has been removed"); `resolver.py` fully stubbed. Fallback path SX_USE_OCAML=0 is non-functional — fine if intentional, but it's still importable wiring.
(Hand-off: hosts lane for the fix; kept here because it *is* the boundary-conformance enforcement.)
## F-12 [S2, high] The live Python SX reader diverges from the OCaml reader
`shared/sx/parser.py` runs on every production request (serialize/parse plumbing under SX_USE_OCAML=1: handlers.py:191,211; helpers.py:449). Probes vs OCaml reader:
- Python CANNOT read: `#t`/`#f` (reader-macro error), `#\a` chars, dotted pairs `(a . b)`, `.5` — all valid on OCaml. If any OCaml-serialized value containing a char (`#\a`) crosses the Python plumbing, Python errors.
- Dict serialization order differs (Python insertion `{:a 1 :b "two"}` vs OCaml reversed `{:b "two" :a 1}`) — hazard for any wire-text comparison/caching (CIDs themselves are safe, F-3).
- Agreements: escapes (\n \t \" \\), floats/exponents, keywords, `[..]`-as-list, quote/quasiquote sugar.
## F-13 [S3, med] The checked-in generated kernel is not reproducible from today's spec+bootstrap
`python3 hosts/ocaml/bootstrap.py --output <scratch>` vs checked-in `hosts/ocaml/lib/sx_ref.ml`: 360 diff lines. Function inventory is identical (the delta is `let` vs `and` restructuring + block moves — an older generator produced the checked-in file); no semantic difference detected, and tree==\_build. But CI (.gitea/Dockerfile.test steps 3-4) re-bootstraps sx_ref.ml from spec and recompiles — so CI's binary is built from different generated source than the dev tree's. Reproducibility should be a checked invariant (regen + diff in CI).
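A minimal sketch of the regen-and-diff invariant. It assumes CI can read both the checked-in `hosts/ocaml/lib/sx_ref.ml` and a scratch copy written by `bootstrap.py --output` (whether `--output` takes a file or a directory is an assumption). The helper name `generated_drift` is hypothetical. It uses only the standard-library `difflib`, and an empty result means the checked-in file is reproducible.

```python
import difflib


def generated_drift(checked_in: str, regenerated: str,
                    path: str = "hosts/ocaml/lib/sx_ref.ml") -> list[str]:
    """Return unified-diff lines between the checked-in and regenerated kernel.

    An empty list means the generated source is reproducible; CI should fail
    the job whenever the list is non-empty.
    """
    return list(difflib.unified_diff(
        checked_in.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile=f"a/{path} (checked in)",
        tofile=f"b/{path} (regenerated)",
    ))
```

In CI the step would regenerate into a temporary file, pass both texts to `generated_drift`, print whatever it returns, and exit non-zero on any drift. Because the current 360-line delta is `let`/`and` restructuring only, the first run would fail until the checked-in file is regenerated once.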
## F-14 [S3, high] Doc drift — CLAUDE.md describes a system that no longer exists
- ~~"Canonical SX semantics in `shared/sx/ref/*.sx`"~~ — **FIXED 2026-07-03**: CLAUDE.md now points at `spec/` and documents the bootstrap chain + corrected island rules (sequential `let`, implicit begin). Remaining items below still stand.
- Documents a Python host + ~744 tests — deleted.
- Documents `case` clause-list syntax — all hosts reject it (flat-pair syntax is real, spec/tests/test-cek-advanced.sx:549).
- Briefed suite counts (747/870/744/1080) are months stale; corpus is 5-6k assertions.
- `spec/primitives.sx` header claims stdlib functions "moved to stdlib" but spec/stdlib.sx defines only `format` (and production doesn't load it).
## F-15 [S4, med] Housekeeping
- Stray recursive tree `hosts/ocaml/hosts/ocaml/hosts/ocaml/bin/sx_server.ml` (accidental copy) — confuses greps and could get compiled/edited by mistake.
- `spec/tests/test.sx` dead filename (see F-6).
- JS runner's fake `sha3-256` stub returns non-SHA3 values — any hash-shape-only assertion passes; any real-value assertion would mysteriously fail JS-only.
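A known-answer check is the usual way to stop a stub from passing as a real hash. The Python sketch below uses `hashlib.sha3_256` and the standard SHA3-256 digest of the empty input. `passes_known_answer` is a hypothetical helper that would be pointed at whatever function the JS runner exposes. `fake` stands in for a stub: it has the right shape (64 hex characters) but is not SHA3.

```python
import hashlib

# SHA3-256 digest of the empty byte string (standard published value).
EMPTY_SHA3_256 = "a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a"


def passes_known_answer(digest_fn) -> bool:
    """True iff digest_fn(b"") returns the real SHA3-256 hex digest of b""."""
    return digest_fn(b"") == EMPTY_SHA3_256


def real(data: bytes) -> str:
    return hashlib.sha3_256(data).hexdigest()


def fake(data: bytes) -> str:
    # Shape-correct (64 hex chars) but SHA-256, not SHA3-256: a shape-only
    # assertion accepts this, a known-answer assertion rejects it.
    return hashlib.sha256(data).hexdigest()
```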
---
# Hyperscript on the shipped WASM kernel (S-2 probe, executed 2026-07-03)
Method: new probe harness `scratchpad/hsprobe/run_hs_wasm.js` — loads the SHIPPED kernel (`shared/static/wasm/sx_browser.bc.js`) + shipped platform in node/happy-dom via `tests/node/sx-harness.js`, preloads the same files as the OCaml runner (test-framework, spec/harness, web/lib/dom, the lib/hyperscript stack), and runs the hs spec corpus with streamed per-test output. Per-test diff vs the native run in `scratchpad/hsprobe/compare2.py`; raw logs `/tmp/sx-review/wasm-hs-*.log`, comparisons `/tmp/sx-review/wasm-hs-compare.txt`, `wasm-hs-pure2-compare.txt`. Coverage: 1,553 tests measured (tokenizer/parser/compiler/runtime 206; behavioral 1,250 of 1,514 — one shard timed out; conformance 97 of 222 — timed out; conformance-dev/sandbox/diag/integration/htmx unmeasured). Bottom line: **the kernel itself conforms; the packaging and the test environments are where the defects are.**
## F-16 [S1, high, CONFIRMED] Shipped browser hyperscript cannot call functions: `host-call-fn` is referenced but defined nowhere in the shipped stack
- The shipped stack ships a full hs engine (`shared/static/wasm/sx/hs-{tokenizer,parser,compiler,runtime,integration,htmx,worker,prolog}.sx(+.sxbc)`), but `hs-runtime.sx(bc)`/`hs-integration.sx(bc)` reference `host-call-fn`/`host-call-fn-raising`, which only `run_tests.ml:3564` defines (the test runner's mock host bridge). The shipped platform (`sx-platform.js`) registers `host-call` / `host-new` / `host-get` etc. — NOT `host-call-fn`.
- Reproduced: running the behavioral corpus on the shipped stack as-is → **536 pass / 978 fail, 906 of them "Undefined symbol: host-call-fn"** (wasm-hs-behavioral.log). Any hyperscript that invokes a function or method would fail the same way in a real browser.
- The OCaml runner's own comment calls this binding "the single biggest gap: ~900 behavioral tests failed" — the fix was made in the test runner only, never in the shipped platform. Textbook case of F-7 (feature exists only in the runner's private environment).
## F-17 [S1, high, CONFIRMED] Shipped `hs-runtime.sx` is missing the `(jit-exclude! "hs-*")` guard
- `lib/hyperscript/runtime.sx` ends with `(jit-exclude! "hs-*")` + a comment explaining the hs recursion web **miscompiles under the bytecode JIT** (the parser-combinator JIT bug) and must stay CEK-interpreted.
- `shared/static/wasm/sx/hs-runtime.sx` is byte-identical EXCEPT this guard is absent (10-line diff; the other 5 hs modules are identical copies). The browser is precisely the sxbc/JIT-heavy environment. So the tested configuration (JIT-excluded) is not the shipped configuration (JIT-eligible). The lib→wasm/sx sync that copied these files dropped exactly the safety-critical line.
## F-18 [S2, high, CONFIRMED] The kernel conforms; the native runner's mock DOM is the outlier — and the "272 red" band mostly indicts the test env, not the code
Per-test diff, shipped-kernel+happy-dom (with the host bridge mirrored) vs native mock-DOM run:
- Pure pipeline (tokenizer/parser/compiler/runtime, 206 tests): **0 WASM-only failures**; all 51 failures shared with native; 2 native-only.
- Behavioral (1,250 matched): **9 WASM-only failures** vs **118 native-only failures** (WASM+happy-dom passes where the mock DOM fails: fetch 22, toggle 10, on 10, make 8, repeat 6, append 6, ...). Conformance (97 matched): 0 WASM-only, 4 native-only.
- Consequence for F-10: the permanently-red 272 hs failures on the canonical host are largely **mock-DOM environment artifacts** — the engine passes those tests on the shipped kernel with a more realistic DOM. The red band hides real information in both directions.
- The 9 WASM-only failures (listed in wasm-hs-compare.txt): behavior scoping ×4 ("Expected 10/20, got <empty>"), as-fragment conversions ×3 (a `{:__host_handle N}` leaks where a list is expected — happy-dom NodeList boundary), init/where sequencing ×2. Candidates for bisection; none look like core-semantics bugs.
- Also measured: **the js_of_ocaml kernel is ~1-2 orders of magnitude slower** on this corpus than native (conformance ≈24s/test vs seconds for the whole file natively) — worth knowing before making WASM corpus runs a gate.
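The per-test buckets above come down to set arithmetic over two outcome maps. The sketch below is not `compare2.py` (its source is not reproduced here). The input shape, `{test_name: passed}` per kernel, is an assumption. Tests missing from either side, such as those in timed-out shards, are excluded, which matches the "matched" counts used above.

```python
def classify(native: dict[str, bool], wasm: dict[str, bool]) -> dict[str, set[str]]:
    """Bucket tests that ran on both kernels by where they fail."""
    matched = native.keys() & wasm.keys()  # tests with an outcome on both sides
    return {
        "wasm_only": {t for t in matched if native[t] and not wasm[t]},
        "native_only": {t for t in matched if wasm[t] and not native[t]},
        "shared": {t for t in matched if not native[t] and not wasm[t]},
    }
```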
## F-19 [S3, med, CONFIRMED] hs corpus drift + inverted assert labels
- The shared-failure bucket (~50 behavioral + parser/tokenizer suites) is corpus drift: generated tests (generate-sx-tests.py, "DO NOT EDIT") still expect the old parser AST — `(me)` implicit target and `(. obj k)` — while the current parser emits `(beingTold)` / `poss` (probed identically on native sx_server AND WASM: `(hs-compile "add .foo")` → `(add-class "foo" (beingTold))` on both). Tests were not regenerated after the parser change.
- `assert=` (spec/harness.sx:31) is `(actual expected msg)` but the generated corpus calls it as `(assert= expected actual)` — every failure message prints Expected/got **swapped**, which materially misled diagnosis during this probe (it makes current-parser output look like the "expected" value).
- `hypersx.sx` in the shipped boot list is NOT the hyperscript engine (it's an sx→hypersx-notation pretty-printer); the actual hs engine modules are shipped but absent from the boot module list (`loadWebStackFallback`, sx-platform.js:670) — load path for hs in production is unclear/on-demand only.
---
# SUSPECTED findings / gaps (probe proposed, not yet reproduced)
## S-1 [S1, med] Everything F-8 lists between native and the JS bundle likely also splits native vs WASM in web-stack .sx code paths not covered by my 130 probes (strings near render, regex at the sxbc layer, signals timing). Probe: run the full probe corpus + web adapter smoke through `run_wasm.js` and diff (harness exists; only 130 exprs run so far).
## S-2 — RESOLVED, promoted to F-16/F-17/F-18 below (hs corpus WAS run against the shipped WASM kernel; see "Hyperscript on the shipped kernel" section).
## S-3 [S2, med] `import`-dependent components behave differently in production browser vs test: OCaml runner resolves imports synchronously from disk; browser resolves via fetch + sxbc bundles (compile-modules.js). No test asserts the browser-side import path resolves the same module set. Probe: compare `sx_build_manifest`/dist sxbc contents vs run_tests preload list.
## S-4 [S3, med] Float printing (`0.3` vs `0.30000000000000004`) means any golden-string test containing floats validates different precision per host — some current "passes" may be precision-coincidences. Probe: grep corpus asserts for float literals with >6 significant digits.
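A minimal sketch of the suggested probe. It is written in Python rather than as a plain grep so that it counts significant digits instead of characters. The regex reflects an assumption about float-literal shape in SX source (optional sign, a decimal point with optional leading digits, optional exponent). It does not try to skip literals inside strings or comments.

```python
import re

# Assumed float-literal shape; the lookarounds stop matches inside identifiers
# or longer numbers.
FLOAT_RE = re.compile(r"(?<![\w.])[-+]?(?:\d+\.\d*|\.\d+)(?:[eE][-+]?\d+)?(?![\w.])")


def significant_digits(literal: str) -> int:
    """Count significant digits in a float literal, ignoring any exponent."""
    mantissa = re.split(r"[eE]", literal.lstrip("+-"))[0]
    return len(mantissa.replace(".", "").lstrip("0"))


def suspicious_floats(source: str, threshold: int = 6) -> list[str]:
    """Float literals with more than `threshold` significant digits."""
    return [m.group(0) for m in FLOAT_RE.finditer(source)
            if significant_digits(m.group(0)) > threshold]
```

For example, `0.30000000000000004` (the shortest round-trip form of `0.1 + 0.2` in IEEE-754 doubles) has 17 significant digits and is flagged, while `0.1` is not.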
## S-5 [S3, med — partially confirmed] The two JS kernel BUILDS disagree with each other: `spec/tests/test-adt.sx` (algebraic-data-types) passes on the standard bundle but fails 53 assertions on the `--full` build (`--extensions continuations --spec-modules types`). Confirmed from js-standard.log (0 FAIL, PASSes present) vs js-full.log (53 FAIL). So build flags change language behavior within one host — the types/continuations extension modules interfere with ADT/match. Suspected mechanism unverified; probe: bisect by building with each flag alone and running test-adt.sx.
---
# Suite asymmetry — why the counts differ (summary answer to the brief)
- Corpus exploded (hyperscript upstream ports, GQL, parser combinators, regex, records, chars, bytevectors, numeric tower, values/promises...) to ~5-6k assertions; briefed counts are months stale.
- JS standard (5086) = spec/tests only; JS full (5656) = + lib/tests + bigger kernel build.
- OCaml (6036) = spec/tests + web/tests + native foundation tests, but NOT lib/tests.
- Python = 0 (deleted).
- WASM (shipped browser artifact) = ~0 (boot smoke + 2 files).
- **No two hosts run the same file set; only spec/tests overlaps fully — and even there the runner environments differ (F-5, F-7), so "same test passing on two hosts" does not mean "same behavior".**
# Recommended conformance gate (one paragraph for the maintainer)
Run the spec/tests corpus on: native run_tests.exe, the WASM kernel via node (harness pattern: `scratchpad/probes/run_wasm.js`), and (if kept) the JS bundle — from the SAME preload manifest, with runner shims deleted or mirrored into all three; add lib/tests to the OCaml runner; skip-list the browser-only hs suites instead of letting them fail; add a numeric/string/dict-order differential probe file (seed: `scratchpad/probes/probes.txt`, 130 exprs, 32 divergences today) that must be output-identical across kernels; regen sx_ref.ml in CI and diff against checked-in.
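The "output-identical across kernels" rule reduces to comparing each probe expression's printed result across hosts. The sketch below assumes each host's run has already been collected as a list of printed outputs aligned line-for-line with `scratchpad/probes/probes.txt`. Host names are illustrative, and `divergences` is a hypothetical helper.

```python
def divergences(probes: list[str],
                outputs: dict[str, list[str]]) -> list[tuple[str, dict[str, str]]]:
    """Return (expression, {host: output}) for every probe where hosts disagree."""
    for host, outs in outputs.items():
        if len(outs) != len(probes):
            # A short list usually means a host crashed mid-run; fail loudly.
            raise ValueError(f"{host}: {len(outs)} outputs for {len(probes)} probes")
    result = []
    for i, expr in enumerate(probes):
        per_host = {host: outs[i] for host, outs in outputs.items()}
        if len(set(per_host.values())) > 1:
            result.append((expr, per_host))
    return result
```

As a gate, the job would fail whenever `divergences` returns a non-empty list. Known, accepted divergences would then have to be removed from the probe file explicitly rather than tolerated silently.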

717
plans/sx-review/core.md Normal file

@@ -0,0 +1,717 @@
# SX Language Core Review — spec/ semantics
Reviewer axis: LANGUAGE CORE (spec/evaluator.sx CEK machine, parser.sx, primitives.sx,
render.sx, special-forms.sx, eval-rules.sx, stdlib.sx, signals.sx, coroutines.sx, canonical.sx).
Note: brief mentions spec/types.sx — it no longer exists; strict-typing machinery lives in evaluator.sx.
Status: COMPLETE — all 8 dimension sweeps merged (CEK core, env/scope, HO forms,
special forms/macros, parser/serializer/canonical, primitives/stdlib, render modes,
strict typing + signals/coroutines/harness).
TOTALS: 104 CONFIRMED findings (3 critical, 26 high, 40 medium, 2 low-medium, 33 low)
+ 5 SUSPECTED. Every CONFIRMED item has a runtime repro (fresh sx_server.exe unless noted).
All 3 criticals additionally re-verified on the shipped WASM browser kernel (see CROSS-LANE CHECK).
THEMES a ranker should know:
1. **Nested cek-run instead of CEK frames** is one root cause behind ≥4 findings (shift-k
double-execution, threading guard/IO break, + suspected macro/let-values/qq boundaries).
2. **Handler still installed while handler runs** explains the guard/handler-bind hang family.
3. **Name-before-env dispatch** explains the ~60 unshadowable names + HO-not-first-class family.
4. **Global mutable stacks popped only on normal exit** explains provide/winder/batch leak family
(scope stack, *winders*, *batch-depth* — none unwind-safe).
5. **Test-runner-only bindings** make whole suites (values, canonical floats, batch, coroutines)
green for features the shipped runtime doesn't have; one test passes *vacuously because of the
very bug it tests* (signal-return).
6. **Per-host re-bound platform primitives** (parse-number, char-code, escape-string, split, get…)
are the drift engine behind parser AST divergence + harness/runtime divergence.
CROSS-LANE CHECK (vs /tmp/sx-review/hosts.md and conformance.md, done 2026-07-03):
- **All 3 criticals re-verified on the shipped WASM browser kernel** (the js_of_ocaml build that browsers
actually load; probe harness from the conformance lane): guard re-raise HANGS (node killed at
25s), signal-condition → `42` (same kont drop), shift repro → identical double-execution trace
`(99 ("r=escaped" "after-k" "r=99"))`. Kernel-family bugs: native server AND production browser.
Conformance F-2 (corpus never runs on WASM) explains why nothing caught them there.
- **Not masked by the JIT**: hosts J2/J9 confirm guard-installing lambdas are interpret-only and
any raise/call-cc in JIT'd code falls back to the CEK — my criticals are the live path.
- **Three independent double-side-effect mechanisms now on record**: my shift-k nested cek-run
(critical #3), hosts J1 (`->` miscompiled under serving-JIT, steps re-run), hosts J2
(JIT-fallback re-runs whole call). Same user-visible symptom, three distinct fixes.
- **One of my findings corrected** (apply — see the finding below); expt int63 wrap corroborated
by conformance F-1 (WASM is worse: `(expt 2 62)` → 0); unshadowable-HO finding extended by hosts
J8 (the VM DOES honor local bindings — CEK/VM divergence within one host); render dom/html attr
parity independently confirmed as hosts C19; values/eq?/eqv? runner-only bindings corroborated
by conformance F-7/F-9.
- **No contradiction on canonical/CIDs**: conformance's working `cid-from-sx` is a native kernel
primitive (verified: works with spec/canonical.sx not loaded). My canonical.sx finding concerns
the spec guest implementation — production CIDs bypass it. Two parallel CID implementations,
only the native one exercised; spec-vs-native canonical-form agreement is untested (conformance
F-3 checked native-vs-WASM only).
Verification recipe: `sx_harness_eval` (MCP) cross-checked against fresh real-runtime processes
`printf '(epoch 1)\n(eval "...")\n' | timeout 30 hosts/ocaml/_build/default/bin/sx_server.exe`.
TOOLING CAVEATS found during review (also listed as handoffs): (1) the MCP harness primitive
table diverges from the real runtime; (2) `sx_harness_eval` is NOT a fresh sandbox — state
persists and cross-contaminates calls; (3) sx_read_subtree ignores `path`, sx_read_tree ignores
`max_lines`. All critical/divergent probes were re-verified on fresh `sx_server.exe` processes.
---
## CONFIRMED findings (most severe first)
### [critical] [CONFIRMED×2] Any raise/error inside a guard clause body or handler-bind handler loops forever — handler runs with its own handler frame still installed
- Location: spec/evaluator.sx, `raise-eval` case of step-continue (~4547-4573) + `kont-unwind-to-handler` (236-259); inherited by `step-sf-guard` (~1693)
- What: `kont-unwind-to-handler` returns `{:handler match :kont kont}` where `kont` still contains the matched handler frame; the handler is invoked with that kont. A `raise` inside the handler re-matches the same handler → infinite loop. Not just explicit re-raise: ANY error while a handler/clause body runs (`(error ...)`, a raised different value) hangs instead of propagating. CL/R7RS: handlers run with the enclosing (outer) handler set. `guard` desugars clause bodies to run INSIDE the handler-bind extent (`(__guard-k (cond ...))` — clauses evaluate before the escape), so the memory'd gotcha "`(raise e)` in a guard clause hangs" is exactly this. Contrast: the no-matching-clause auto-reraise is R7RS-correct (`(guard (outer (true outer)) (guard (e ((= e 1) "one")) (raise 2)))` → outer catches 2) because the sentinel re-raise happens after the call/cc return, OUTSIDE handler-bind — which is exactly how clause bodies should also run.
- Repro (bounded CLI, all timeout exit 124): `(guard (e (true (raise e))) (raise 42))`; `(handler-bind (((fn (c) true) (fn (c) (raise c)))) (raise 1))`; same with `(raise "different")` and `(error "again")`.
- Cross-check: reproduced on the shipped WASM browser kernel (hangs, killed at 25s) — affects production browsers, not just the server. Hosts lane J2/J9: guard-installing lambdas are interpret-only, so the JIT never masks this.
- Coverage: test-r7rs.sx guard suite + test-conditions.sx cover happy paths only; no test raises from within a handler. test-cek-try-seq.sx "error in error handler propagates" passes because `cek-try` is a different mechanism.
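A Python toy model (not evaluator.sx) of the handler-selection rule the finding asks for: while a handler runs, only the handlers *outside* it are installed. `HandlerStack`, `bind`, and `signal` are hypothetical names. The model covers only which handlers are visible during a handler, not escapes or the CEK kont. In the buggy behaviour, the slice `saved[:depth]` would effectively be the full `saved`, so the inner re-raise would select the same handler again.

```python
class Unhandled(Exception):
    """Raised when no installed handler accepts a condition."""


class HandlerStack:
    """Toy model of handler-bind handler selection (innermost frame last)."""

    def __init__(self):
        self.frames = []  # list of (predicate, handler) pairs

    def bind(self, pred, handler, body):
        """Run body() with (pred, handler) installed as the innermost handler."""
        self.frames.append((pred, handler))
        try:
            return body()
        finally:
            self.frames.pop()

    def signal(self, condition):
        """Invoke the innermost matching handler and return its value.

        While the handler runs, only frames OUTSIDE it are installed, so a raise
        from inside the handler propagates outward instead of re-matching the
        same handler forever.
        """
        for depth in range(len(self.frames) - 1, -1, -1):
            pred, handler = self.frames[depth]
            if pred(condition):
                saved = self.frames
                self.frames = saved[:depth]  # outer handlers only
                try:
                    return handler(condition)
                finally:
                    self.frames = saved
        raise Unhandled(condition)
```

With this rule, `(guard (e (true (raise e))) (raise 42))` corresponds to a single handler whose re-raise finds no frames and becomes an ordinary unhandled error, not a hang.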
### [critical] [CONFIRMED] signal-return frame key mismatch drops the caller's continuation — continuable signal/raise-continuable returns the handler value as the WHOLE program's result; the covering test passes vacuously
- Location: spec/evaluator.sx, `make-signal-return-frame` (line 182, stores saved kont under `:f`) vs `signal-return` case of step-continue (~4509-4512, reads `(get frame "saved-kont")`); mirrored in hosts/ocaml/lib/sx_runtime.ml:210-231 (CekFrame get has no `"saved-kont"` mapping → Nil)
- What: the resume kont is always nil, so after the handler returns, its value becomes the terminal value of the entire CEK run — every frame outside the signal site (arithmetic, enclosing lists, asserts) is silently discarded.
- Repro: `(list "outer" (handler-bind (((fn (c) true) (fn (c) 42))) (+ 1 (signal-condition 5))) "end")` → `42`; expected `("outer" 43 "end")`. Same with `raise-continuable` → `42`. The shipped test expr `(handler-bind (((fn (c) true) (fn (c) (* c 10)))) (+ 1 (signal-condition 5)))` → `50` on both CLI and harness, yet the test asserting `51` PASSES under run_tests.exe — the dropped continuation includes the `assert-equal` frame itself, so the assertion never executes (vacuous pass).
- Cross-check: reproduced byte-identically (`42`) on the shipped WASM browser kernel.
- Coverage: test-conditions.sx "signal returns handler value to call site" — passing vacuously; the bug defeats its own test.
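A Python toy showing the failure class and the obvious guard against it. The writer and reader share one key constant, and strict indexing turns a missing key into an error instead of `None`. The continuation is modelled as a list of one-argument callables (innermost first) and the frame as a dict. All names are hypothetical; only the two key spellings (`f` vs `"saved-kont"`) come from the finding.

```python
SAVED_KONT = "saved-kont"  # single key shared by the constructor and the reader


def make_signal_return_frame(saved_kont):
    return {"type": "signal-return", SAVED_KONT: saved_kont}


def resume_after_handler(frame, value):
    """Feed the handler's value through the saved continuation frames.

    Strict indexing raises KeyError if the key is missing, instead of quietly
    producing None (which behaves like an empty continuation).
    """
    for k in frame[SAVED_KONT]:  # innermost continuation frame first
        value = k(value)
    return value


def buggy_resume(frame, value):
    """The mismatch as described: the frame was written under "f"."""
    kont = frame.get("saved-kont")  # None, because the constructor used "f"
    if kont is None:
        return value  # the handler value becomes the whole program's result
    for k in kont:
        value = k(value)
    return value
```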
### [critical] [CONFIRMED] Invoking a shift-captured continuation uses a nested cek-run — escaping across that boundary re-executes the outer program tail (double execution, duplicated side effects); raising inside a resumed k can't reach outer handlers
- Location: spec/evaluator.sx, `continue-with-call`, `continuation?` branch (~4708-4716): `(let ((result (cek-run (make-cek-value arg env captured)))) (make-cek-value result env kont))`
- What: the nested run's kont ends at the captured frames; (a) a call/cc escape invoked inside the resumed extent rewrites the kont *inside the nested run*, which then runs the rest of the program to completion, returns that as the value of `(k arg)`, and the outer run executes the program tail again; (b) handler frames in the outer kont are invisible to `kont-unwind-to-handler` inside the nested run.
- Repro (a): `(do (define log (list)) (define r (call/cc (fn (esc) (reset (do (shift k (do (k 1) (set! log (append log (list "after-k"))) 99)) (esc "escaped") "unreached"))))) (set! log (append log (list (str "r=" r)))) (list r log))` → actual `(99 ("r=escaped" "after-k" "r=99"))` (tail executed twice); expected `("escaped" ("r=escaped"))`.
- Repro (b): `(guard (e (true (list "caught" e))) (reset (do (shift k (k 1)) (raise "boom"))))` → `Unhandled exception: "boom"`; expected `("caught" "boom")`.
- Cross-check: repro (a) reproduced with the identical wrong trace `(99 ("r=escaped" "after-k" "r=99"))` on the shipped WASM browser kernel. Note: hosts J1/J2 are two FURTHER independent double-side-effect mechanisms (JIT `->` miscompile; JIT-fallback re-run) — three distinct fixes needed for "side effects ran twice" reports.
- Coverage: not covered (test-cek-advanced.sx shift/reset tests never cross the boundary with call/cc or raise).
### [high] [CONFIRMED] Caller's immediate frame leaks into interpreted lambda calls — partial dynamic scoping, and the JIT disagrees
- Location: spec/evaluator.sx, `continue-with-call` (`(local (env-merge (lambda-closure f) env))` ~4739; same in `call-lambda` ~896) + `env_merge` in hosts/ocaml/lib/sx_types.ml:390
- What: when the call-site env is NOT a descendant of the lambda's closure env, `env_merge` copies the caller's **top frame** bindings into the lambda's local env. Free variables in the body resolve to the caller's locals — a lexical-scoping violation. Depth-1 only (a binding one frame deeper does not leak). The JIT path disagrees: a VM-compiled body raises "VM undefined" for the same program — behavior flips depending on whether the body got JIT-compiled.
- Repro: `(do (define mg (fn () (fn () (guard (e (true e)) leakedz)))) (define gz (mg)) (let ((leakedz 66)) (gz)))`**66** (guard forces interpretation); expected undefined-symbol error. Without guard → `"VM undefined: leaked"` (JIT). Depth-2 variant → Undefined symbol.
- Coverage: not covered — test-scope.sx "environment-isolation" tests only the lambda→caller direction.
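A Python toy contrasting lexical call frames with the merge described above. The names are hypothetical. `merged_call_env` approximates `env_merge` by copying only the caller's top frame, and it ignores the descendant check for simplicity. The model reproduces the "depth-1 only" observation: a binding one frame deeper in the caller is not leaked.

```python
class Env:
    """A scope frame with a parent pointer (None for the outermost frame)."""

    def __init__(self, bindings=None, parent=None):
        self.bindings = dict(bindings or {})
        self.parent = parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.bindings:
                return env.bindings[name]
            env = env.parent
        raise NameError(f"Undefined symbol: {name}")


def lexical_call_env(closure_env, params):
    """Lexical scoping: the call frame's only parent is the closure env."""
    return Env(params, parent=closure_env)


def merged_call_env(closure_env, caller_env, params):
    """Approximation of the reported behaviour: the caller's TOP frame is copied
    into the call frame, so its locals become visible to the lambda body."""
    frame = dict(caller_env.bindings)
    frame.update(params)
    return Env(frame, parent=closure_env)
```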
### [high] [CONFIRMED] letrec injects its bindings into foreign lambdas' closure envs — permanent global contamination
- Location: spec/evaluator.sx, `sf-letrec` (~1370: `(env-bind! (lambda-closure val) n (env-get local n))`)
- What: after evaluating inits, letrec binds ALL letrec names into the closure env of every lambda **value**. `make_lambda` stores the defining env directly, so a letrec whose value is a pre-existing (e.g. top-level) lambda writes the letrec names into that lambda's closure — the **global env** — permanently.
- Repro: `(do (define idf (fn (x) x)) (letrec ((zzq idf) (zzn 55)) nil) zzn)`**55**; expected "Undefined symbol: zzn". (This leak also polluted the shared MCP harness image across calls during verification.)
- Coverage: not covered — test-scope.sx "letrec-edge" only binds lambdas created inside the letrec (extra binds are no-ops there).
### [high] [CONFIRMED] Named let leaks its loop name into the enclosing env frame and clobbers same-name bindings
- Location: spec/evaluator.sx, `sf-named-let` (~1035: `(env-bind! (lambda-closure loop-fn) loop-name loop-fn)`)
- What: `lambda-closure loop-fn` IS the enclosing env (no fresh frame), so the loop name is bound into the surrounding scope: visible after the form, and it clobbers (not shadows) an existing binding of the same name.
- Repro: `(do (let lp ((i 0)) i) (lambda? lp))`**true** (expected unbound). `(let ((lp2 5)) (let lp2 ((i 0)) i) lp2)` → loop lambda in interpreter, **nil** under JIT — never the expected 5.
- Coverage: not covered — test-named-let-sx locks set!-accumulator patterns only.
### [high] [CONFIRMED×3] ~60 special-form/HO names are silently unshadowable — define/let/defmacro accepted, call-position dispatch ignores them
- Location: spec/evaluator.sx `step-eval-list` (~1801-1958) — head-name `match` runs before any env lookup; only the `_` fallthrough (custom special forms, ~1959) checks `(not (env-has? env name))`
- What: list-head dispatch checks built-in special/HO forms BEFORE env lookup. `(define bind (fn (a b) "mine"))` succeeds (`type-of` says lambda) but `(bind 1 2)` → `1` (special form runs). `(define map ...)`, `(let ((map ...)) ...)`, `(defmacro if ...)`, `(defmacro map ...)` — all silently ignored in call position while honored in value position. Regular primitives ARE properly shadowable (`(define get ...)`, `(define inc ...)` → user def wins) — only this name set is hijacked, and custom special forms DO respect user bindings, making built-ins doubly inconsistent.
- Unshadowable names (extracted from dispatch): `if when cond case and or let let* lambda fn define defcomp defisland defmacro defio define-foreign io begin do guard quote quasiquote -> ->> |> as-> set! letrec reset shift deref scope provide peek provide! context bind emit! emitted handler-bind restart-case signal-condition invoke-restart match let-match dynamic-wind map map-indexed filter reduce some every? for-each raise raise-continuable call/cc call-with-current-continuation perform define-library import define-record-type define-protocol implement parameterize syntax-rules define-syntax`. Collision-prone short ones: `map filter reduce some bind match peek context deref guard io do case`.
- Repro: `(do (define map (fn (f xs) "mine")) (map (fn (x) (* x 10)) (list 1 2)))` → `(10 20)`; `(let ((map (fn (a b) 42))) (map 1 2))` → `Error: rest: 1 list arg`; `(let ((-> (fn (a b) 99))) (-> 1 2))` → `Not callable: nil`; `(do (defmacro if (a b c) 99) (if true 1 2))` → 1.
- Coverage: not covered anywhere. Memory gotcha "bind/conj/disj shadowed" confirmed for `bind`; `conj`/`disj` aren't core primitives (guest-worktree lore).
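A Python sketch of environment-first dispatch: the same `(not (env-has? env name))` check that custom special forms already apply, applied to built-ins. The environment is flattened to a dict, and `BUILTIN_FORMS` holds only a few of the names listed above. Both are simplifications, and the sketch assumes built-in forms are not themselves bound as ordinary env values (otherwise the built-in path would never be reached).

```python
BUILTIN_FORMS = {"if", "map", "bind", "->"}  # a few of the ~60 names listed above


def classify_call(head: str, env: dict):
    """Decide how to evaluate a list whose head is the symbol `head`.

    Checking the environment first lets a user binding shadow a built-in form
    in call position, matching how it already behaves in value position.
    """
    if head in env:
        return ("call", env[head])
    if head in BUILTIN_FORMS:
        return ("special-form", head)
    raise NameError(f"Undefined symbol: {head}")
```

The cost of this ordering is one env lookup per call-position symbol before reaching a built-in. That cost is the likely reason dispatch currently matches names first.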
### [high] [CONFIRMED] cond grammar is ambiguous — an all-clauses-len-2 heuristic silently switches modes; multi-expr clause bodies are dropped or crash; flat-intent code can silently return the wrong value
- Location: spec/evaluator.sx, `step-sf-cond` (a `scheme?` detection binding selects clause-mode vs flat-pair mode)
- What: `cond` supports flat pairs `(cond t1 r1 ... :else d)` (the only documented syntax — eval-rules.sx:64) plus an undocumented Scheme clause mode auto-detected iff **every** arg is a 2-element list (or `(test => proc)`). All verified consequences:
- Single clause with multi-expr body: `(cond ((= 1 1) (set! a 1) (set! b 2)))` → nil, **neither set! runs** — silent total drop of side effects.
  - Multi-expr body + other clauses: `(cond ((= 1 1) "a" "b") (:else "no"))` → `Not callable: true` — one len≠2 clause anywhere flips the WHOLE cond to flat mode.
  - Silent misinterpretation: `(do (define x false) (define y true) (cond (not x) (list 1) (not y) (list 2)))` → `false` (clause-mode reads `(not x)` as test=`not`, result=`x`); flat reading gives `(1)`. Wrong answer, no error.
  - Test-only clause `(cond (5))` → nil (Scheme: 5); poisons detection: `(cond (true "t") (5))` → `Not callable: nil`.
- Trailing odd flat arg silently ignored, never evaluated: `(cond (set! a 99))` leaves a unchanged.
- Coverage: flat tested (test-eval.sx:306-312); clause mode only via cond-arrow suite (test-r7rs.sx:135-145). Ambiguity/multi-expr/test-only uncovered; clause mode entirely undocumented.
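The detection rule can be stated in one line. The Python sketch below encodes SX forms as nested lists with symbols as strings, and it omits the `(test => proc)` shape. It shows why the three inputs above are treated as they are. The flat-intent form happens to consist only of 2-element lists, so it is read as clauses. The single multi-expression clause has length 3, so the whole `cond` falls back to flat mode, where it is a lone odd argument and is ignored.

```python
def scheme_clause_mode(args) -> bool:
    """Mode detection as described: clause mode iff EVERY cond argument is a
    2-element list (the `(test => proc)` shape is omitted here)."""
    return all(isinstance(a, list) and len(a) == 2 for a in args)


# (cond (not x) (list 1) (not y) (list 2)): flat intent, but every arg has length 2
FLAT_INTENT = [["not", "x"], ["list", 1], ["not", "y"], ["list", 2]]
# (cond ((= 1 1) "a" "b") (:else "no")): one length-3 clause forces flat mode
MIXED = [[["=", 1, 1], "a", "b"], [":else", "no"]]
# (cond ((= 1 1) (set! a 1) (set! b 2))): flat mode with one odd (ignored) arg
MULTI_BODY = [[["=", 1, 1], ["set!", "a", 1], ["set!", "b", 2]]]
```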
### [high] [CONFIRMED] `(unquote-splicing x)` longhand silently no-splices; only `splice-unquote` is recognized
- Location: spec/evaluator.sx, `qq-expand` (checks `(symbol-name (first item)) = "splice-unquote"` only)
- What: `,@` sugar parses to `splice-unquote` and works; the R7RS-standard longhand `unquote-splicing` fails dispatch, is recursed into as an ordinary list, and is emitted literally — silent zero-splice. `(unquote x)` longhand works.
- Repro: `(quasiquote (a (unquote-splicing xs)))` → `(a (unquote-splicing xs))`; `` `(a ,@xs) `` and `(splice-unquote xs)` → `(a 1 2)`.
- Coverage: not covered — worse, test-macros.sx tests are NAMED "unquote-splicing …" (lines 43-63) while all using `,@` sugar, actively reinforcing the trap. (Confirms memory gotcha; root cause now located.)
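A Python toy of the one-line fix: let the splice check accept both spellings. Forms are nested lists with symbols as strings. `(unquote x)` here only looks `x` up in a dict rather than evaluating an expression, and nested quasiquote levels are ignored. `qq_expand` in this sketch is illustrative and is not the spec function's signature.

```python
SPLICE_HEADS = {"splice-unquote", "unquote-splicing"}  # ,@ sugar and R7RS longhand


def qq_expand(template, env):
    """Toy quasiquote expander; symbols are strings, env maps names to values."""
    if not isinstance(template, list):
        return template
    if template and template[0] == "unquote":
        return env[template[1]]
    out = []
    for item in template:
        if isinstance(item, list) and item and item[0] in SPLICE_HEADS:
            out.extend(env[item[1]])  # splice the list's elements in place
        else:
            out.append(qq_expand(item, env))
    return out
```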
### [high] [CONFIRMED] dynamic-wind before-thunks never re-run on continuation re-entry; global length-based winder stack corrupts across sibling wind contexts (afters skipped/duplicated)
- Location: spec/evaluator.sx, `continue-with-call` callcc-continuation branch (~4702-4707: `(do (wind-escape-to w-len) ...)`), `wind-escape-to` (261-271)
- What: invoking a captured continuation only pops after-thunks down to the captured *length* of the global `*winders*` stack. No common-ancestor computation, no before-thunks on entry (R7RS requires before/after along the path between extents). Lengths from unrelated wind contexts collide: resuming a k captured inside wind A while inside wind B (equal depth) unwinds nothing, then A's `wind-after` frame pops B's winder.
- Repro 1 (re-entry): capture k inside wind, escape, re-invoke → `(2 ("b" "a" "a"))`; expected `(2 ("b" "a" "b" "a"))` (before not re-run; after ran twice).
- Repro 2 (sibling): capture in wind A, re-invoke from inside wind B → `(2 ("A-in" "A-out" "B-in" "A-out"))`; expected B-out + A-in before final A-out. B's after silently never runs (resource-leak class), A's runs twice.
- Coverage: test-dynamic-wind.sx (8 tests): normal return, raise, one-shot escape only.
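The R7RS behaviour the finding refers to can be sketched as a common-ancestor walk over winder lists. The sketch compares winders by identity rather than by stack length, runs the afters of the extents being left (innermost first), then the befores of the extents being entered (outermost first). `Winder` and `rewind` are hypothetical Python names. The sketch also omits one R7RS detail: setting the current winder list while each thunk runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(eq=False)  # identity comparison: each dynamic-wind call is its own extent
class Winder:
    before: Callable[[], None]
    after: Callable[[], None]


def rewind(current: list[Winder], target: list[Winder]) -> None:
    """Move control from the `current` winder list to `target` (outermost first)."""
    common = 0
    while (common < len(current) and common < len(target)
           and current[common] is target[common]):
        common += 1
    for w in reversed(current[common:]):  # leave innermost first
        w.after()
    for w in target[common:]:  # enter outermost first
        w.before()
```

For Repro 2, moving from `[B]` to `[A]` runs B's after and then A's before. Re-invoking a k captured inside wind W from outside every wind runs W's before, which is the step Repro 1 shows missing.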
### [high] [CONFIRMED×2] guard re-raise sentinel is forgeable — a body/clause legitimately returning `(list '__guard-reraise__ X)` is misinterpreted as a re-raise of X
- Location: spec/evaluator.sx, `step-sf-guard` (~1693-1767): sentinel `(make-symbol "__guard-reraise__")`, detected by structural `=` on any 2-element list escaping the guard
- Repro: `(guard (e (true (list (quote __guard-reraise__) 42))) (raise 1))` → `Unhandled exception: 42`; `(guard (e (true "handled")) (list (quote __guard-reraise__) 7))` → `Unhandled exception: 7` — the guard *body's* return value is converted into a raise. The sentinel should be an unforgeable/gensym'd token. (Severity judged high by one reviewer, low by another — data-dependent conversion of values into raises; rank accordingly.)
- Coverage: not covered.
### [high] [CONFIRMED] `->`/`->>` non-HO steps run in a nested CEK with empty kont — guard and IO suspension broken through threading
- Location: spec/evaluator.sx, `thread-insert-arg`/`thread-insert-arg-last` (7289) call `eval-expr` (4828: `cek-run` with kont `(list)`); "thread" frame handler (~4074) stays CEK-native only for `ho-form-name?` heads
- What: a threaded non-HO step evaluates in a fresh machine that can't see outer guard frames and can't suspend. (a) `raise` inside a threaded call escapes an enclosing `guard`; (b) IO/effects inside a threaded step hard-crash instead of suspending. The HO path is CEK-native and correct — same expression works or fails depending on the step's head symbol. Same root pattern as the shift-k critical.
- Repro: `(define boom (fn (x) (raise "T"))) (guard (e (else "caught")) (-> 1 boom))` → `Unhandled exception: "T"` (map version → caught). `(-> {:op "noop"} (perform))` → `Error: Sx_vm.VmSuspended(_,_)` (map version suspends/resumes fine).
- Coverage: not covered
### [high] [CONFIRMED] 2-arg `(reduce f coll)` silently returns the collection unchanged
- Location: spec/evaluator.sx, `ho-setup-dispatch` "reduce" branch (~3671) + `ho-swap-args` (~3557)
- What: fn-first 2-arg reduce makes `init` the collection and `coll` nil → returns init. Expected: fold with first element as init (Scheme/Clojure) or arity error. Asymmetrically, data-first `(reduce coll f)` DOES fold — with nil init (works only via nil-coercion in `+`/`str`).
- Repro: `(reduce + (list 1 2 3))` → `(1 2 3)` (expected 6 or error); `(reduce (list 1 2 3) +)` → `6`.
- Coverage: not covered (tests only use 3-arg forms)
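One of the two behaviours the finding names as acceptable (seed with the first element, and treat an empty collection with no seed as an error) matches Python's `functools.reduce`. The sketch below wraps it with explicit arity dispatch in the fn-first order `(reduce f coll)` / `(reduce f init coll)`. `sx_reduce` is a hypothetical name, and the data-first orderings handled by `ho-swap-args` are deliberately left out.

```python
import functools


def sx_reduce(f, *args):
    """Arity-checked reduce: (reduce f coll) seeds with the first element,
    (reduce f init coll) folds from init; any other arity is an error."""
    if len(args) == 1:
        items = list(args[0])
        if not items:
            raise TypeError("reduce: empty collection and no initial value")
        return functools.reduce(f, items[1:], items[0])
    if len(args) == 2:
        init, coll = args
        return functools.reduce(f, coll, init)
    raise TypeError(f"reduce: expected 2 or 3 arguments, got {len(args) + 1}")
```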
### [high] [CONFIRMED] ho-swap-args misreads `(reduce init f coll)` — breaks `(-> init (reduce f coll))`
- Location: spec/evaluator.sx, `ho-swap-args` reduce branch: `(list b (nth evaled 2) a)`
- What: with non-callable arg0, `(reduce init f coll)` treats arg0 as coll and arg2 as init — the threaded scalar seed becomes the "collection" → cryptic host error. The thread handler inserts the threaded value FIRST for HO forms, so any `->` reduce with a scalar seed hits this.
- Repro: `(-> 0 (reduce + (list 1 2 3)))` → `Error: rest: 1 list arg` (expected 6); same for `(reduce 0 + (list 1 2 3))`.
- Coverage: not covered — thread-ho suite only tests `(-> coll (reduce + 0))` (test-cek-advanced.sx:673)
### [high] [CONFIRMED] Data-first ho-swap-args silently drops all args beyond the second
- Location: spec/evaluator.sx, `ho-swap-args` non-reduce branch: `(list b a)`
- What: when arg0 is data and arg1 callable, everything after arg1 is discarded — a data-first multi-collection map silently maps over only the first collection; with no lambda-arity enforcement, garbage results, not errors.
- Repro: `(map (list 1 2) (fn (x) (* x 10)) (list 3 4))` → `(10 20)`; `(map (list 1 2) (fn (x y) (+ x y)) (list 30 40))` → `(1 2)` (y → nil, `(+ 1 nil)` = 1).
- Coverage: not covered
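
Here is a Python sketch of argument normalisation for the map family that keeps every collection after the callable. `normalize_ho_args` and `sx_map` are hypothetical names. `reduce`'s seed handling (the previous finding) is left out because it needs its own ruling. A wrong number of collections now surfaces as the callable's arity error instead of a silent drop.

```python
def normalize_ho_args(args):
    """Return (f, colls) for a call written fn-first or data-first.

    Fn-first:   (map f c1 c2 ...)
    Data-first: (map c1 f c2 ...), keeping every collection after f.
    """
    if callable(args[0]):
        return args[0], list(args[1:])
    if len(args) >= 2 and callable(args[1]):
        return args[1], [args[0], *args[2:]]
    raise TypeError("map: expected a callable in position 1 or 2")


def sx_map(*args):
    if not args:
        raise TypeError("map: no arguments")
    f, colls = normalize_ho_args(args)
    if not colls:
        raise TypeError("map: no collection")
    # zip stops at the shortest collection, like R7RS map over lists.
    return [f(*items) for items in zip(*colls)]
```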
### [high] [CONFIRMED] Infinite recursive component hangs the renderer — no depth guard
- Location: web/adapter-html.sx `render-html-component`/`render-list-to-html`; spec/render.sx has no recursion bound
- What: a self-referencing component with no base case (or data-driven cycle) recurses forever — one render pins the server thread indefinitely. No depth limit or cycle detection.
- Repro: `(do (defcomp ~loop () (div (~loop))) (render-to-html '(~loop) (current-env)))` → never returns (killed at 20s). Bounded `(~nest :n 3)` renders fine.
- Coverage: not covered (needs a depth limit + test)
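
This is a Python model of the missing guard: the renderer threads a depth counter and fails with a clear error once it passes a fixed bound. The node shapes, `render`, and `MAX_RENDER_DEPTH = 256` are illustrative assumptions; the real limit is a policy choice.

```python
MAX_RENDER_DEPTH = 256  # assumed bound; the real limit is a policy decision


class RenderDepthError(RuntimeError):
    pass


def render(node, components, depth=0):
    """Render str leaves and (head, children) nodes; "~name" heads are components."""
    if depth > MAX_RENDER_DEPTH:
        raise RenderDepthError(f"render depth exceeded {MAX_RENDER_DEPTH}")
    if isinstance(node, str):
        return node
    head, children = node
    if head.startswith("~"):
        # Expand the component body one level deeper.
        return render(components[head](children), components, depth + 1)
    inner = "".join(render(child, components, depth + 1) for child in children)
    return f"<{head}>{inner}</{head}>"
```

A bounded component renders normally. A self-referencing one such as `~loop` now raises `RenderDepthError` instead of pinning the thread.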
### [high] [CONFIRMED] append! silently no-ops on all derived lists
- Location: spec/primitives.sx `append!` (+ OCaml impl)
- What: `append!` mutates only literal `(list ...)` cells. Lists produced by `map`, `filter`, `rest`, `reverse`, `append` are silently unappendable — no error, mutation lost. `append!` returns the appended *value*, masking the failure.
- Repro: `(let ((xs (map (fn (x) x) (list 1 2)))) (append! xs 3) xs)` → `(1 2)`; with a literal list → `(1 2 3)`.
- Coverage: test-primitives.sx:339 uses `append!` only on a literal-list accumulator.
### [high] [CONFIRMED] expt silently wraps at 63-bit int; inconsistent with +/* which promote to float
- Location: spec/primitives.sx `expt`
- Repro: `(expt 2 62)` → `-4611686018427387904`; `(expt 2 100)` → `0`; but `(* 4611686018427387904 4)` → float and `(+ 9223372036854775807 1)` → float. `(expt 2.0 100)` is correct.
- Coverage: test-math.sx:66-71 — overflow not covered.
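
This Python model shows the consistent rule. It assumes `expt` should promote to float exactly when the exact integer result leaves the 63-bit native-int range, which is the behaviour the finding reports for `+` and `*`. `sx_expt` is a hypothetical name. Python's unbounded ints stand in for an overflow check.

```python
import math

INT63_MIN = -(2**62)      # OCaml native int range on 64-bit hosts
INT63_MAX = 2**62 - 1


def sx_expt(base, exp):
    """Exact integer power while it fits in 63 bits, float otherwise."""
    if isinstance(base, int) and isinstance(exp, int) and exp >= 0:
        exact = base**exp                  # Python ints never wrap
        if INT63_MIN <= exact <= INT63_MAX:
            return exact
        try:
            return float(exact)            # promote instead of wrapping
        except OverflowError:              # too large even for a double
            return math.inf if exact > 0 else -math.inf
    return float(base) ** exp
```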
### [high] [CONFIRMED] MCP harness primitive table diverges from real runtime — invalidates harness-based verification
- Location: hosts/ocaml/bin/mcp_tree.ml (own primitive table, e.g. `bind "contains?"` L484, `bind "split"` L563) vs hosts/ocaml/lib/sx_primitives.ml (sx_server)
- What: sx_harness_eval runs a *parallel implementation* of many primitives. Divergences (harness → runtime): `(empty? "")`/`(empty? {})` false → **true** (test-primitives.sx:89 asserts true — harness contradicts a passing test); `(get {:a 1} :a 99)` **nil even for present key** → 1; `(get {:a 1} :zz 99)` nil → 99; `(get (list 10 20) 1)` nil → 20; `(split "a--b" "--")` char-class → substring; `(split "abc" "")` crash → `("a" "b" "c")`; `equal?` undefined → defined; `(contains? {:a 1} :a)` true → **error**; `(keyword-name :kw)` `""` → error. CLAUDE.md mandates harness verification, so this drift silently produces false findings/passes.
- Coverage: nothing tests harness/runtime parity. (Cross-lane: host tooling — see handoffs — but it's the spec-mandated verification path.)
### [high] [CONFIRMED] contains? does not support dicts in the real runtime, contradicting its spec doc
- Location: spec/primitives.sx `contains?` (":doc … Dicts: key check"); sx_primitives.ml
- Repro: `(contains? {:a 1} :a)` → `Unhandled exception: "contains?: 2 args"` (a misleading arity error); lists and strings work.
- Coverage: list membership only (run_tests.ml:1255); no dict case.
### [high] [CONFIRMED] canonical.sx depends on test-runner-only helpers — content addressing fails on ANY number outside run_tests
- Location: spec/canonical.sx, `canonical-number` (46-59) calls `contains-char?` (defined only in run_tests.ml:728 / run_tests.js:85) and `trim-right` (run_tests.js:87 only — not even OCaml run_tests). Neither exists in sx_primitives.ml, sx_server.ml, or mcp_tree.ml.
- What: `canonical-serialize`/`content-id` on the production server errors on any number. In the OCaml test runner the trim-right branch (floats with trailing zeros) is unreachable-but-passing because tests only canonicalise integers.
- Repro: fresh sx_server: `(load "spec/canonical.sx")` `(canonical-serialize 42)` → `Undefined symbol: contains-char?`; with a shim, `(canonical-serialize 0.1)` → `Undefined symbol: trim-right`.
- Coverage: test-canonical.sx covers ints/dict-sorting/CIDs — never a non-`.0` float; failure mode invisible to all suites.
### [high] [CONFIRMED] Serializer emits dict keys unescaped — non-identifier keys produce unparseable/wrong output; canonical form not a fixed point (CID hazard)
- Location: spec/parser.sx `sx-serialize-dict` (emits `(str ":" key)`); spec/canonical.sx `canonical-dict` (~79, same pattern)
- What: dict keys are strings; both serializers print `:` + raw key. Keys with spaces/parens/non-ident chars produce output that reparses differently or errors. Since `canonical-serialize` feeds sha3-256, CIDs exist for values whose canonical form violates `canonical(parse(canonical(x))) = canonical(x)`. The native reader accepts string keys `{"a b" 1}`, so such dicts are creatable from plain source.
- Repro: a dict with key `"hello world"` serializes as `"{:hello world 2 :k 1}"`, which errors on reparse; `{(+ 1 2) 5}` → key `"(+ 1 2)"` → serializes as `{:(+ 1 2) 5}`, which is garbage.
- Coverage: "serialize dict round-trips" uses keyword-shaped keys only.
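
One possible fix, sketched in Python: emit `:key` only when the key has a conservative keyword shape, and otherwise emit a quoted string key, which the finding notes the native reader already accepts. The keyword pattern, the escaping set and the sort-by-raw-key order are assumptions, not the canonical.sx rules. The point is that every key gets an unambiguous spelling, which the fixed-point property requires.

```python
import re

# Assumed keyword-safe key shape; the real identifier tables differ per host.
_KEYWORD_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def serialize_key(key: str) -> str:
    if _KEYWORD_KEY.fullmatch(key):
        return ":" + key
    # Everything else becomes a string key, escaped like a string literal.
    escaped = key.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def canonical_dict(d: dict, serialize_value) -> str:
    # Sorting by raw key is an assumed canonical order for this sketch.
    parts = [f"{serialize_key(k)} {serialize_value(v)}" for k, v in sorted(d.items())]
    return "{" + " ".join(parts) + "}"
```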
### [high] [CONFIRMED] Same source parses to different ASTs across the four ident/number classifier variants
- Location: hosts/ocaml/lib/sx_parser.ml:36-46 (native), hosts/ocaml/bin/sx_server.ml:1330-1348, hosts/ocaml/bin/mcp_tree.ml:391-410, hosts/javascript/platform.py:2622-2626 — four different ident-start/ident-char tables feeding the one spec grammar
- What (all verified live): `(a,b)` → single symbol on native/mcp/JS but `(a (unquote b))` on sx_server guest; unicode idents accepted by mcp guest only (forbidden by the production reader); `$x`/`|y|` symbols on sx_server guest only; `0x10`/`0b101`/`1_000` → numbers 16/5/1000 on native (undocumented C-style acceptance) vs number 0 + symbol `x10` on guest/JS (silent token split); `inf`/`nan`/`-inf` are float literals on native (can't be variable names!) vs symbols on guest/JS; `1+`/`1abc` single symbols native vs silent `1` + symbol split guest/JS (`(1+ 2)` → 3-element list); `#t`/`#f` booleans native vs `Undefined symbol: reader-macro-get` on OCaml guest vs "Unknown reader macro" on JS; `{1 2}` rejected native vs silently stringified key `"1"` guest/JS.
- Coverage: none of these tokens appear in any test file — suites exercise only the intersection.
### [high] [CONFIRMED] `1e` bare-exponent numbers silently parse to nil in the guest parser
- Location: spec/parser.sx `read-number` → `parse-number` fallthrough (nil emitted as a value, no error)
- Repro: `(sx-parse "1e")` → `(nil)`; JS `parseAll('1e')` → `[null]`; the native reader yields `Symbol "1e"`, a third behavior. `(foo 1e)` silently becomes `(foo nil)`.
- Coverage: only valid exponent forms tested.
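
This Python sketch shows the stricter behaviour. It assumes the ruling is to reject `1e` rather than read it as a symbol, which is the native reader's choice. Under that rule, an exponent marker must be followed by at least one digit. A token that does not fully match the number grammar raises instead of producing a placeholder. The grammar and `parse_number` are illustrative; rationals and the other host variants are out of scope.

```python
import re

# Assumed number grammar: sign, digits with optional fraction, and an optional
# exponent that must carry at least one digit.
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")
_INTEGER = re.compile(r"[+-]?\d+")


def parse_number(token: str):
    """Parse a numeric token, raising on malformed input instead of returning None."""
    if not _NUMBER.fullmatch(token):
        raise ValueError(f"malformed number: {token!r}")
    if _INTEGER.fullmatch(token):
        return int(token)
    return float(token)
```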
### [high] [CONFIRMED] Guest parser cannot produce rationals on server/tooling hosts — `1/2` throws
- Location: spec/parser.sx `read-number` (215-231); sx_server.ml:1325 and mcp_tree.ml:384 override `parse-number` to always return float, shadowing the Integer-aware sx_primitives version; `make-rational` rejects (Number,Number)
- Repro: fresh sx_server: `(load "spec/parser.sx")` `(sx-parse "1/2")` → `make-rational: expected 2 integers`. Works only in the run_tests env. The native reader parses `1/2` fine.
- Coverage: test-rationals.sx (62 tests) never uses `sx-parse`; test-parser.sx has zero rational tests.
### [high] [CONFIRMED] Strict mode: HO-form callbacks bypass type checks entirely
- Location: spec/evaluator.sx `step-continue` — map/filter/reduce/for-each/some/every/multi-map frames call `continue-with-call` directly; only the "arg" frame runs `strict-check-args` (enforcement site 4152-4194). Same in sx_ref.ml:1009.
- What: with strict on and types declared for `f`, `(f "a")` errors but `(map f coll)`/`(filter f coll)`/`(reduce f init coll)`/`(for-each f coll)`/`(every? f coll)`/`(some f coll)` silently pass mistyped elements. Also unchecked: cond `=>` arrow calls, call/cc continuation invocation, exception-handler invocation, signal-subscriber cek-calls.
- Repro: `hh` typed `(x number)`: `(hh "abc")` → type error; `(map hh (list "a" "b"))` → `("a" "b")` silently.
- Coverage: test-strict.sx checks direct calls only.
### [high] [CONFIRMED] Strict mode: `apply` bypasses type checks on the target function
- Location: hosts/ocaml/lib/sx_primitives.ml:1534 / sx_server.ml:1240 — native prim spreads args and calls directly
- Repro: `(apply hh (list "a"))` → `"a"` (no error); direct `(hh "a")` errors.
- Coverage: not covered.
### [high] [CONFIRMED] `dispose-computed` is a no-op — computed signals leak subscriptions after disposal
- Location: spec/signals.sx, `dispose-computed` calls `(signal-remove-sub! dep nil)`, passing **nil** as the subscriber; the actual `recompute` closure is trapped in `computed`'s letrec and unreachable. The island-scope disposer registered by `computed` is therefore broken (contrast `effect`, whose dispose-fn works).
- Repro: computed on `a2` (1 run); `(dispose-computed c2)`; `(reset! a2 5)` → runs=2, value updated. Expected: runs=1, unchanged. Subscriber leak in island teardown.
- Coverage: no dispose-computed test exists.
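
This Python model shows the fix: the computed value keeps a reference to the exact callback it subscribed, and disposal removes that callback. Explicit `deps` replace SX's automatic dependency tracking here. `Signal` and `Computed` are hypothetical classes.

```python
class Signal:
    def __init__(self, value):
        self.value = value
        self.subs = []

    def reset(self, value):
        self.value = value
        for sub in list(self.subs):
            sub()


class Computed:
    """A derived value that remembers the callback it subscribed."""

    def __init__(self, fn, deps):
        self.fn, self.deps, self.runs = fn, list(deps), 0
        self._recompute = self._run  # keep the exact subscriber object
        for dep in self.deps:
            dep.subs.append(self._recompute)
        self._run()

    def _run(self):
        self.runs += 1
        self.value = self.fn()

    def dispose(self):
        for dep in self.deps:
            # Remove the callback that was actually subscribed, never None.
            if self._recompute in dep.subs:
                dep.subs.remove(self._recompute)
```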
### [high] [CONFIRMED] Exception inside `batch` permanently wedges the reactive system
- Location: spec/signals.sx, `batch` — increments `*batch-depth*`, runs thunk with no unwind protection; decrement skipped on throw
- What: after any error escapes a batch thunk (even if caught outside), `*batch-depth*` stays >0 — every future `notify-subscribers` queues forever and never flushes; all reactivity dead. Related: `(import (sx signals))` copies value bindings rather than aliasing, so the top-level `*batch-depth*` reads 0 while the library-internal one is 1 (exported mutable state vars are misleading).
- Repro: effect on `a3` (fired=1); `(guard (e (true "caught")) (batch (fn () (error "boom"))))` → caught; `(reset! a3 2)` → fired stays 1. Control test without error flushes correctly.
- Coverage: not covered.
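
This Python sketch shows unwind-safe batching: the depth decrement sits in a `finally`, so it runs whether the thunk returns or raises. Whether the queue should flush or be discarded after a failed batch is a policy choice. This sketch flushes, because the mutations made before the error have already happened. `Reactive` is a hypothetical stand-in for the module-level state.

```python
class Reactive:
    """Hypothetical stand-in for the module-level batch state."""

    def __init__(self):
        self.batch_depth = 0
        self.pending = []   # notifications deferred by an open batch
        self.flushed = []   # notifications delivered, in order

    def notify(self, sub):
        if self.batch_depth > 0:
            self.pending.append(sub)
        else:
            self.flushed.append(sub)

    def batch(self, thunk):
        self.batch_depth += 1
        try:
            return thunk()
        finally:
            # Runs on normal return AND when thunk raises, so depth cannot leak.
            self.batch_depth -= 1
            if self.batch_depth == 0:
                queued, self.pending = self.pending, []
                self.flushed.extend(queued)
```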
### [high] [CONFIRMED — surfaced by hosts lane, verified here] emit!/emitted state accumulates across evaluator invocations — cross-request contamination on the server
- Location: spec/evaluator.sx scope/emit frame handlers + the process-global scope stacks (hosts: sx_primitives.ml `_scope_stacks`)
- What: `(scope (emit! :k 1) (emit! :k 2) (len (emitted :k)))` returns 2, then 4, then 6 on successive epoch-server evals — the emit accumulator for a normally-exited scope persists in process-global state and each new scope sees prior invocations' values. On the HTTP server this means one request's emitted values are visible to the next (correctness + information-leak class). Complements the provide/raise leak finding: the scope facility's global stacks are neither unwind-safe NOR invocation-scoped. (My in-eval probe showed no leak *within* one evaluation — the leak is across evaluator entries.)
- Repro: three identical `(eval "(scope (emit! :k 1) (emit! :k 2) (len (emitted :k)))")` epochs on one fresh sx_server → `2`, `4`, `6`. JIT disabled, so not a VM bug.
- Coverage: scope/emit!/emitted have zero tests (noted previously); cross-invocation behavior untested anywhere.
### [medium] [CONFIRMED] provide's dynamic value permanently leaks on non-local exit (raise, shift)
- Location: spec/evaluator.sx, `step-sf-provide` (:3344 `scope-push!`) + "provide" frame handler (:4293, `scope-pop!` only on normal completion); no pop during raise/guard/shift unwinding
- What: `provide` pushes onto a global per-name stack, popped only on normal frame completion. Any non-local exit through the body skips the pop — the value stays on the global stack **forever**, and `context` prefers `scope-peek`, so all later code sees the stale value.
- Repro: `(do (guard (e (true "caught")) (provide "kk" 42 (raise "boom"))) (context "kk"))` → **42** (expected nil). `(do (reset (provide "esc" 9 (shift k 77))) (context "esc"))` → **9**.
- Coverage: test-unified-reactive.sx covers provide/context nesting for normal exits only.
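
Here is the shape of the fix as a Python model: pop the dynamic binding on every exit path, not just on normal completion. In the CEK machine, `raise` and `shift` are not host exceptions, so the real change is to pop when unwinding past the provide frame; `try`/`finally` only models that. `provide`, `context` and `_scope_stacks` mirror the SX names, but the code is hypothetical.

```python
from collections import defaultdict

_scope_stacks = defaultdict(list)  # models the process-global per-name stacks


def provide(name, value, body):
    _scope_stacks[name].append(value)
    try:
        return body()
    finally:
        # Popped on normal completion and on non-local exits alike.
        _scope_stacks[name].pop()


def context(name, default=None):
    stack = _scope_stacks.get(name)
    return stack[-1] if stack else default
```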
### [medium] [CONFIRMED] provide! outside any enclosing provide installs a permanent ambient global
- Location: spec/evaluator.sx, "provide-set" frame handler (:4334-4346: pop-then-push); host `scope-pop!` on empty stack is a no-op (sx_primitives.ml:1998)
- Repro: `(do (provide! "pk" 7) nil)` then, in a later top-level eval, `(context "pk")` → **7**.
- Coverage: provide! tests all run inside provide scopes; bare case uncovered.
### [medium] [CONFIRMED×2] set! on unbound name silently creates a binding — contradicting both spec docs — and JIT vs interpreter write different global tables (split brain)
- Location: spec/evaluator.sx `step-sf-set!` + hosts/ocaml/lib/sx_types.ml `env_set_id` (:378 root-create fallback) vs sx_vm.ml OP_GLOBAL_SET (:606 writes `vm.globals`); contradicted docs: spec/eval-rules.sx:112 ("Error if name is not bound"), spec/special-forms.sx:141 ("must already be bound")
- What: (a) interpreted `set!` on unbound silently creates a root binding — typo'd set! hides bugs, and directly contradicts both spec documents (test-scope.sx:196 locks the create behavior, so impl-vs-doc conflict must be resolved one way or the other). (b) inside a JIT-compiled lambda the same `set!` writes the VM's separate `vm.globals` table — visible to VM code, **invisible to interpreted code**.
- Repro: `(set! never-defined-var 5)` → 5 (readable afterwards). Split brain: `(do (define setter (fn () (set! q5 42))) (define reader (fn () q5)) (setter) (reader))` → **"Undefined symbol: q5"** (yet q5 reads as 42 inside setter).
- Coverage: test-scope.sx:196 asserts creation only; visibility split uncovered.
### [medium] [CONFIRMED] Quasiquote has no depth tracking — nested quasiquote evaluates inner unquotes early; `,,x` errors
- Location: spec/evaluator.sx, `qq-expand` (no level parameter)
- Repro: `(let ((x 7)) (quasiquote (a (quasiquote (b (unquote x))))))` → `(a (quasiquote (b 7)))` (Scheme preserves the inner unquote); `` `(a `(b ,,x)) `` → `Undefined symbol: unquote`.
- Coverage: test-cek-advanced.sx:486 "nested unquote" is single-level despite its name.
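
This Python sketch shows depth-tracked expansion in the R7RS style. `quasiquote` increments the level, `unquote` and `splice-unquote` decrement it, and only level-1 unquotes are evaluated. Lists stand for forms, strings stand for symbols, and evaluation is modeled as an environment lookup (an assumption). The sketch also rejects splicing a non-list value, which R7RS makes an error. `qq` is a hypothetical name.

```python
def qq(form, env, depth=1):
    """Expand a quasiquote template, tracking nesting depth."""
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if head in ("unquote", "quasiquote", "splice-unquote") and len(form) != 2:
        raise SyntaxError(f"{head} takes exactly one argument")
    if head == "unquote":
        if depth == 1:
            return env[form[1]]  # "evaluate": look the symbol up
        return ["unquote", qq(form[1], env, depth - 1)]
    if head == "quasiquote":
        return ["quasiquote", qq(form[1], env, depth + 1)]
    if head == "splice-unquote":
        if depth == 1:
            raise SyntaxError("splice-unquote outside a list")
        return ["splice-unquote", qq(form[1], env, depth - 1)]
    out = []
    for item in form:
        if isinstance(item, list) and item and item[0] == "splice-unquote" and depth == 1:
            if len(item) != 2:
                raise SyntaxError("splice-unquote takes exactly one argument")
            value = env[item[1]]
            if not isinstance(value, list):
                raise TypeError(f"splice-unquote: expected list, got {value!r}")
            out.extend(value)
        else:
            out.append(qq(item, env, depth))
    return out
```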
### [medium] [CONFIRMED] Quasiquote does not traverse dict literals — `,v` inside `{...}` stays literal
- Location: spec/evaluator.sx, `qq-expand` (non-list templates returned as-is)
- Repro: `(let ((v 3)) (quasiquote {:k (unquote v)}))` → `{:k (unquote v)}`. Inconsistent with the dict eval rule ("values are evaluated", eval-rules.sx:40).
- Coverage: not covered.
### [medium] [CONFIRMED] guard clause bodies: multi-expr → crash; multi-expr `else` → "Undefined symbol: else"
- Location: spec/evaluator.sx, `step-sf-guard` — clauses spliced verbatim into a generated `cond`, inheriting the cond dual-mode defect
- Repro: `(guard (e (true 1 2)) (raise 9))` → `Not callable: nil`; `(guard (e (else 1 2 3)) (raise 9))` → `Undefined symbol: else`. R7RS requires body sequencing. The `=>` receiver works.
- Coverage: only single-expr clause bodies tested.
### [medium] [CONFIRMED] defmacro/fn `&key` params silently misbind — keyword names ignored, off-by-one positional binding
- Location: spec/evaluator.sx, macro/lambda param binding (&key pairing implemented only for components)
- Repro: `(defmacro mk2 (&key a b) ...)`: `(mk2 :a 10 :b 20)` → a=10, b=`:b` (the keyword itself); `(mk2 :b 20 :a 10)` → a=20 despite the `:b` label. Plain `(fn (&key a b) ...)` treats `&key` as a positional param name → "expects 3 args, got 4". Accepted without error, misbehaves.
- Coverage: not covered.
### [medium] [CONFIRMED] Splicing a non-list silently wraps it; malformed splice forms pass through literally
- Location: spec/evaluator.sx, `qq-expand`
- Repro: `(quasiquote (a (splice-unquote 5)))` → `(a 5)` (Scheme: error); `(splice-unquote xs ys)` (a 3-element form) stays literal; `(unquote a b)` silently drops b.
- Coverage: not covered.
### [medium] [CONFIRMED] `do` misparses a first form whose head is a list (IIFE) as a Scheme do-loop
- Location: spec/evaluator.sx, step-eval-list "do" branch (~1843): dispatches to do-loop when `(list? (first (first args)))`
- Repro: `(do ((fn (x) x) 5) 99)` → error `"first: expected list, got 5"`; expected 99.
- Coverage: not covered.
### [medium] [CONFIRMED] scope's `:value` parameter is parsed but unreadable — dead feature + dead frame type
- Location: spec/evaluator.sx, `step-sf-scope` (:3318) / `make-scope-acc-frame` (:120); `context`/`peek` never consult scope-acc frames. Pre-CEK `sf-scope` (:1495) did `scope-push!`; the CEK rewrite dropped it. Frame type "scope" (make-scope-frame :111, handler :4279) is never pushed by any live path.
- Repro: `(scope "v" :value 10 (list (context "v") (peek "v")))` → `(nil nil)`.
- Coverage: scope/emit!/emitted have ZERO tests in spec/tests (doc example only, eval-rules.sx:200).
### [medium] [CONFIRMED] Host-level errors are uncatchable by guard (only SX-level raise is)
- Location: spec/evaluator.sx raise/handler machinery vs host primitive errors
- What: errors from host primitives (`rest: 1 list arg`, `Undefined symbol`, arity errors) escape enclosing `guard` entirely; only guest `(raise ...)` unwinds to handlers. Guest code cannot write defensive wrappers around primitive misuse.
- Repro: `(guard (e (true "caught")) (undefined-symbol-xyz))` → propagates, guard never fires.
- Coverage: test-errors.sx/test-conditions.sx exercise guest raise only.
### [medium] [CONFIRMED] `values`/`call-with-values` bound only inside the test runner — Undefined symbol on every real runtime surface; `let-values`/`define-values` unusable
- Location: spec/evaluator.sx `values` (2093), `call-with-values` (1392), `sf-let-values` (1403), `sf-define-values` (1437); hosts/ocaml/bin/run_tests.ml:1131 (`bind "values"` — test env only)
- Repro: `(call-with-values (fn () (values 1 2)) +)` on CLI → `Undefined symbol: call-with-values`; same expr under run_tests → PASS. test-values.sx (22 tests) overstates the shipped runtime.
- Coverage: green only in the runner environment.
### [medium] [CONFIRMED] map/filter/map-indexed are O(n²)
- Location: spec/evaluator.sx, "map"/"filter" continue handlers (~4364, ~4397): `(append results (list value))` per element; map-indexed also recomputes `(len new-results)` each step
- Repro: fresh sx_server: 10k → 0.58s, 20k → 2.56s, 40k → 13.6s (≈×4.7 per doubling); a 100k map did not finish within 120s, while `(reduce + 0 (in-range 100000))` takes 0.32s. Stack-safe; the cost is purely time.
- Coverage: not covered (no perf tests)
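
This Python model shows why the per-element `append` is quadratic and what the usual fix looks like. Appending to an immutable list copies the whole prefix each time. Consing onto the front is O(1), and a single reversal at the end restores the order. Tuples stand in for SX lists, and the function names are hypothetical.

```python
def map_append_each(f, xs):
    results = ()
    for x in xs:
        results = results + (f(x),)  # copies the prefix every step: O(n^2) total
    return list(results)


def map_cons_then_reverse(f, xs):
    acc = None  # cons list of (head, tail) pairs, newest first
    for x in xs:
        acc = (f(x), acc)  # O(1) prepend
    out = []
    while acc is not None:  # walk once: values come out newest first
        head, acc = acc
        out.append(head)
    out.reverse()  # one O(n) pass restores input order
    return out
```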
### [medium] [CONFIRMED] HO form names are not first-class — value position yields nil with a misleading type
- Location: spec/evaluator.sx, symbol lookup (~1650) vs special-cased call dispatch
- Repro: `(define f2 map) (f2 (fn (x) x) (list 1 2))` → `Not callable: nil`; yet `(type-of map)` → `"function"`.
- Coverage: not covered
### [medium] [CONFIRMED] Cryptic uncatchable errors for bad HO data: dicts, both-args-callable
- Location: spec/evaluator.sx, `seq-to-list` `(else x)` passthrough (~3573) + `ho-setup-dispatch`
- Repro: `(map (fn (kv) kv) {:a 1 :b 2})` → `rest: 1 list arg`; `(map (fn (x) 1) (fn (y) 2))` → the same. Expected: iterate dict entries, or a clear "map: cannot iterate X".
- Coverage: not covered
### [medium] [CONFIRMED] Multi-collection map rejects strings/vectors that single-collection map accepts
- Location: spec/evaluator.sx, `ho-setup-dispatch` "map" N-coll branch skips `seq-to-list`
- Repro: `(map + (vector 1 2) (vector 10 20))` → `first: expected list, got #(1 2)`; single-collection vector/string map works.
- Coverage: list multi-map covered (test-r7rs.sx:110-124); strings/vectors not
### [medium] [CONFIRMED] Threading a lambda literal returns a silently malformed lambda
- Location: spec/evaluator.sx, `thread-insert-arg` — splices the value into the params position of `(fn ...)`
- Repro: `((-> 5 (fn (y) (+ y 1))) 7)` → `Undefined symbol: y`. Should error at thread time.
- Coverage: not covered
### [medium] [CONFIRMED] Attribute names are never escaped/validated — spreading an untrusted-keyed dict injects attributes (XSS class)
- Location: spec/render.sx, `render-attrs` (emits key raw) + `merge-spread-attrs` (copies spread-dict keys verbatim)
- What: attribute *values* are escaped; attribute *names* are concatenated raw. Keys reach render-attrs via the spread operator, so spreading a dict built from user data yields event-handler injection.
- Repro: `(render-attrs {"x onload=alert(1) y" "1"})` → ` x onload=alert(1) y="1"`. Values confirmed safe.
- Coverage: not covered
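
This Python sketch validates attribute names before rendering. The allowlist regex is a deliberately conservative assumption: it admits `data-*`, `aria-*` and namespaced names but rejects spaces, quotes, `=`, `>` and `/`. Raising on a bad key, rather than silently dropping it, is a policy choice. Values keep the escaping they already have.

```python
import html
import re

# Conservative allowlist (an assumption): a leading letter, '_' or ':', then
# letters, digits, '_', ':', '.', '-'.
_ATTR_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def render_attrs(attrs: dict) -> str:
    out = []
    for name, value in attrs.items():
        if not _ATTR_NAME.fullmatch(name):
            raise ValueError(f"invalid attribute name: {name!r}")
        out.append(f' {name}="{html.escape(str(value), quote=True)}"')
    return "".join(out)
```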
### [medium] [CONFIRMED] Five void elements unrenderable — in VOID_ELEMENTS but missing from HTML_TAGS
- Location: spec/render.sx, `VOID_ELEMENTS` vs `HTML_TAGS`
- Repro: `area base embed param track` fall through to function-call dispatch: `(render-to-html '(base :href "x") ...)` → `Undefined symbol: base`.
- Coverage: void suite tests br/hr/img/input/meta/link/source/col/wbr only
### [medium] [CONFIRMED] aser serialises list-valued keyword args as bare unquoted lists → breaks on client re-evaluation
- Location: web/adapter-sx.sx `aser-call`
- Repro: `(aser '(~tags :items (list "a" "b")) env)` → `(~tags :items ("a" "b"))`; re-evaluating the wire form → `Not callable: nil`. Dicts round-trip fine; only lists break. Should emit `(quote (...))` or `(list ...)`.
- Coverage: test-aser covers lists as children, not as kwarg values
### [medium] [CONFIRMED-html / SUSPECTED-dom — independently double-confirmed] render-to-dom disagrees with render-to-html on non-boolean attrs valued true/false (hydration mismatch)
- Location: web/adapter-dom.sx (attr cond ~357) vs spec/render.sx `render-attrs`
- What: for attrs NOT in BOOLEAN_ATTRS, HTML mode stringifies (`data-flag="true"`, `data-off="false"`), DOM mode omits `false` and emits `true` as an empty attr. SSR HTML and hydrated DOM differ. HTML side executed; DOM side code-read (dom adapter not loadable in harness). Cross-check: hosts lane C19 found the same defect independently (same conclusion, same confidence split) — treat as confirmed pending a browser-side execution.
- Repro: `(render-to-html '(div :data-flag true :data-off false) ...)` → `<div data-off="false" data-flag="true">`.
- Coverage: not covered
### [medium] [CONFIRMED] String primitives are byte-based; substring can produce invalid UTF-8
- Location: `string-length`, `substring`, `upcase`/`downcase`
- Repro: `(string-length "é")` → 2, `"👍"` → 4; `(substring "é" 0 1)` → a one-byte string holding `\xC3` (the lead byte of é), which is invalid UTF-8; `(upcase "héllo")` → `"HéLLO"`. Constructors are codepoint-aware (`char-from-code 233` → `"é"`) while measurement is byte-based. The project rule "use UTF-8 chars" makes this a live hazard.
- Coverage: no codepoint-semantics tests.
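
This sketch shows codepoint-aware measurement over a UTF-8 byte buffer, roughly what an implementation backed by OCaml `string` would need. It counts non-continuation bytes, and it maps codepoint indices to byte offsets before slicing, so a cut never lands inside a multibyte sequence. It is written in Python over `bytes`, and the function names are hypothetical. Case mapping (the `upcase` issue) is not addressed.

```python
def utf8_length(data: bytes) -> int:
    """Count code points: every byte except continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_offset(data: bytes, index: int) -> int:
    """Byte offset where code point `index` starts (len(data) at the end)."""
    count = 0
    for pos, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # a lead byte starts a new code point
            if count == index:
                return pos
            count += 1
    if count == index:
        return len(data)
    raise IndexError("code point index out of range")


def utf8_substring(data: bytes, start: int, end: int) -> bytes:
    """Slice on code point boundaries, so the result is always valid UTF-8."""
    return data[utf8_offset(data, start):utf8_offset(data, end)]
```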
### [medium] [CONFIRMED] Spec declares primitives that don't exist; runtime has primitives the spec omits
- Location: spec/primitives.sx
- What: `eq?` (L285), `eqv?` (L292) declared, undefined in both harness and sx_server; `into` (L722) declared — IO-bridge-only in server; `json-encode` declared plain but IO-bridge-only; `sort` exists in runtime but NOT in spec; header (L27-35) claims ~40 functions "moved to stdlib.sx" but stdlib.sx contains only `format`.
- Repro: `(eq? 1 1)` → `Undefined symbol: eq?`; `(sort (list 3 1 2))` → `(1 2 3)`.
- Coverage: drift untested.
### [medium] [CONFIRMED] Division-by-zero inconsistency: / returns inf silently, mod/quotient leak raw OCaml exception
- Repro: `(/ 1 0)` → `inf`; `(mod 7 0)`/`(quotient 7 0)` → unstructured host `Division_by_zero`.
- Coverage: not covered.
### [medium] [CONFIRMED] `/` doc contradicts behavior: ":returns float" but exact results snap to int
- Repro: `(integer? (/ 6 3))` → true. `(/ 1 3)` → float, never rational despite `make-rational`.
- Coverage: behavior covered green — the doc is wrong.
### [medium] [CONFIRMED] sort takes no comparator
- Repro: `(sort (list 3 1 2) (fn (a b) (> a b)))` → `Unhandled exception: "sort: 1 list"`. Only natural ascending order is available, and only on numbers and strings.
- Coverage: not covered.
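
This Python sketch adds an optional comparator. It assumes the SX form takes a two-argument "less than" predicate, as in the finding's repro. `cmp_to_key` adapts that predicate to a three-way comparison. `sx_sort` is a hypothetical name.

```python
from functools import cmp_to_key


def sx_sort(items, less=None):
    """Sort ascending by default, or by a two-argument "less than" predicate."""
    if less is None:
        return sorted(items)

    def three_way(a, b):
        if less(a, b):
            return -1
        if less(b, a):
            return 1
        return 0

    return sorted(items, key=cmp_to_key(three_way))
```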
### [medium] [CONFIRMED] Strict type errors are uncatchable by guard (host/spec error-channel divergence)
- Location: sx_ref.ml `strict_check_args` (:516, raises Eval_error outside the CEK raise-eval machinery); the spec expresses it as `(error ...)` which would use the ordinary condition channel
- What: `(guard (e (true ...)) (typed-call bad-arg))` does not catch — the type error escapes to top level, while user `(error "boom")` IS caught by the same guard. Programs cannot recover from type errors. Same channel problem as the general host-errors-uncatchable finding, but here spec and host disagree about which channel it should be.
- Repro: `(guard (e (true (str "CAUGHT: " e))) (s1 "bad"))` → protocol-level type error; `(guard (e (true ...)) (error "boom"))` → caught.
- Coverage: test-strict.sx asserts at the runner level; the guard channel untested.
### [medium] [CONFIRMED] Strict mode: unknown type names silently match everything
- Location: spec/evaluator.sx `value-matches-type?`: the `_` fallback returns true for any unknown type string that does not end in `?`; `set-prim-param-types!` does no validation
- Repro: `gg` typed `(x "integer")`: `(gg "abc")` → `"abc"` (the typo silently disables checking); `"frobnicate?"` matches all values.
- Coverage: not covered.
### [medium] [CONFIRMED] Strict mode: `"keyword"` type is dead; components are untypeable
- Location: `value-matches-type?` vs eval-rules.sx keyword rule (keywords evaluate to strings)
- What: (a) evaluated keyword args arrive as strings, so a `"keyword"`-typed param always fails on `(f :foo)` and passes plain strings via `"string"`; (b) `type-of` a component is `"component"`, which fails `"lambda"`, and `"component"` isn't a match branch — falls to the catch-all and **accepts everything**. No way to require a component.
- Repro: `(pk :foo)` → "expected keyword … got string (foo)"; `c7` typed `"component"`: `(c7 42)` passes.
- Coverage: no keyword/lambda/component type tests.
### [medium] [CONFIRMED] Strict mode: component `&key` calls misalign with positional type specs
- Location: strict-check-args positional indexing vs component keyword calling convention
- Repro: `~tc` typed `(a number)`: `(~tc :n 5)` → "expected number for param a, got string (n)" — the keyword marker itself is checked as arg 0. Typing components via this machinery is impossible.
- Coverage: not covered.
### [medium] [CONFIRMED] Signals: reset!/computed change-detection is dead for numbers and strings
- Location: spec/signals.sx `reset!`, `swap!` and `computed`, which all guard with `(when (not (identical? old value)) ...)`; `identical?` is physical equality: `(identical? 5 5)` → false
- What: setting a signal to its current value still notifies; computeds recomputing to an equal number/string still cascade — spurious re-runs throughout the reactive graph.
- Repro: effect on `(signal 5)` (runs=1); `(reset! a7 5)` → runs=2. Expected 1.
- Coverage: not covered.
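
This Python model shows the change-detection fix: compare with structural equality (SX `=`) rather than physical identity, so writing an equal number or string is a no-op. Structural equality on large lists or dicts costs time proportional to their size, which a ruling may want to bound. `Signal` here is a hypothetical class.

```python
class Signal:
    def __init__(self, value):
        self.value = value
        self.subscribers = []

    def reset(self, new_value):
        # Structural equality, standing in for SX `=`, instead of identity.
        if new_value == self.value:
            return
        self.value = new_value
        for sub in list(self.subscribers):
            sub()
```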
### [medium] [CONFIRMED] Signals: diamond dependency glitch — no glitch-freedom
- Location: spec/signals.sx — notify/flush propagate depth-first synchronously; batch dedups only direct subscribers of directly-mutated signals and decrements depth before cascades
- What: a → b,c → d: one change to `a` recomputes `d` twice; the first recompute observes new-b with stale-c (inconsistent intermediate state).
- Repro: initial d runs=1; `(reset! a 2)` → d runs=3, final value correct.
- Coverage: not covered.
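
A common remedy is height-ordered propagation, sketched here in Python. Each node's height is one more than its highest dependency. After a source changes, dependents are recomputed from a min-heap keyed by height, each at most once, so a node runs only after everything it reads has settled. The `Node` class and `set_source` are hypothetical. Dependencies are explicit rather than auto-tracked, and there is no equality cut-off.

```python
import heapq
import itertools


class Node:
    _ids = itertools.count()

    def __init__(self, fn=None, deps=(), value=None):
        self.id = next(Node._ids)  # unique tie-breaker for the heap
        self.fn, self.deps, self.value = fn, list(deps), value
        self.height = 1 + max((d.height for d in self.deps), default=-1)
        self.subs, self.runs = [], 0
        for d in self.deps:
            d.subs.append(self)
        if fn is not None:
            self.recompute()

    def recompute(self):
        self.runs += 1
        self.value = self.fn(*(d.value for d in self.deps))


def set_source(node, value):
    """Update a source, then recompute dependents lowest height first."""
    node.value = value
    heap, queued = [], set()

    def schedule(n):
        if n.id not in queued:  # each dependent runs at most once
            queued.add(n.id)
            heapq.heappush(heap, (n.height, n.id, n))

    for s in node.subs:
        schedule(s)
    while heap:
        _, _, n = heapq.heappop(heap)
        n.recompute()
        for s in n.subs:
            schedule(s)
```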
### [medium] [CONFIRMED] Datum comment `#;` cannot precede `)` or end input — all three parsers
- Location: spec/parser.sx read-expr `#;` branch (discard-then-read-next); sx_parser.ml:167-171 same structure
- Repro: `(sx-parse "(a #;b)")` → `Unexpected character: )`; `(sx-parse "1 2 #;3")` → `Unexpected end of input`. Standard Lisp: `(a #;b)` = `(a)`.
- Coverage: three datum-comment tests, all mid-list.
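
This Python sketch shows the reader change. Inside the list loop and at top level, `#;` is treated as "discard the next datum, then re-check" rather than "discard, then read the next datum as the result". The tokenizer is a toy that handles only parens, `#;` and bare atoms. The error strings just echo the ones in the repro.

```python
def tokenize(src: str):
    """Toy tokenizer: parens, the `#;` datum comment, and bare atoms."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif src.startswith("#;", i):
            tokens.append("#;")
            i += 2
        else:
            j = i
            while j < len(src) and not src[j].isspace() and src[j] not in "()":
                j += 1
            tokens.append(src[i:j])
            i = j
    return tokens


def read_all(src: str):
    tokens, pos = tokenize(src), 0

    def read_expr():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("Unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise SyntaxError("Unexpected character: )")
        if tok == "#;":  # datum comment in value position: drop one datum
            read_expr()
            return read_expr()
        if tok != "(":
            return tok
        items = []
        while True:
            if pos >= len(tokens):
                raise SyntaxError("Unexpected end of input")
            if tokens[pos] == ")":
                pos += 1
                return items
            if tokens[pos] == "#;":
                pos += 1
                read_expr()  # discard, then re-check for ")"
                continue
            items.append(read_expr())

    out = []
    while pos < len(tokens):
        if tokens[pos] == "#;":
            pos += 1
            read_expr()  # a trailing datum comment simply ends the input
            continue
        out.append(read_expr())
    return out
```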
### [medium] [CONFIRMED] Char values never compare equal — `=` lacks a Char case
- Location: hosts/ocaml/lib/sx_primitives.ml `safe_eq` (749-804): no Char,Char arm → falls to `_ -> false`
- Repro: `(= (make-char 32) (make-char 32))` → `false`. parse(serialize(char)) ≠ char for every char, so char-keyed memoization silently fails.
- Coverage: test-chars.sx compares via char->integer/predicates; no `=`-on-chars test.
### [medium] [CONFIRMED] `#\a` char literals crash the guest parser on the mcp-tree host (Int/Float primitive drift)
- Location: mcp_tree.ml:378 (`char-code` returns float) vs sx_primitives.ml:2811 (`make-char` requires Integer)
- Repro: mcp harness `(sx-parse "#\a")` → `make-char: expected integer codepoint`; sx_server OK. Same shadowing family as parse-number.
- Coverage: no char-literal-via-sx-parse tests.
### [medium] [CONFIRMED] Multibyte character literals broken everywhere; serialized chars ≥128 don't reparse; unknown char names silently truncate
- Location: sx_parser.ml:153-159 (byte-level Char.code); spec/parser.sx `read-char-literal` (byte-level); serializer emits `#\` + raw char
- Repro: native `'#\é` → `Parse_error "Unexpected char: \169"`; `(sx-serialize (make-char 233))` → `"#\é"`, which no parser reads back; `#\spade` → `#\s` silently (both implementations).
- Coverage: no non-ASCII char literals tested.
### [medium] [CONFIRMED] `\uXXXX` escape: invalid input crashes raw (OCaml) or silently corrupts (JS); no astral/surrogate-pair support
- Location: spec/parser.sx read-string `\u` branch (no hex validation, no bounds check, -1 from failed digit lookup); sx_parser.ml:70-77
- What: valid BMP works on all three parsers (the "never use \uXXXX" project rule is style, not brokenness). Invalid hex: guest → raw `Invalid_argument` (negative codepoint); native → uncaught `Failure("int_of_string")`; JS → silently yields garbage string. Surrogates: OCaml raises, JS produces lone surrogate. Truncated `"\u41"` → guest reads past the closing quote (`Expected string, got nil`). Astral unrepresentable.
- Coverage: zero \u tests in any suite.
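
This Python sketch is a validating `\uXXXX` decoder. It requires exactly four hex digits. A high surrogate must be followed by an escaped low surrogate, and the pair is combined into one astral code point. Lone surrogates are rejected. `read_u_escape` and its index convention (the index points at the `u`) are hypothetical.

```python
_HEX = set("0123456789abcdefABCDEF")


def read_u_escape(src: str, i: int):
    """Decode the escape whose 'u' is at src[i]; return (text, next_index)."""
    digits = src[i + 1:i + 5]
    if len(digits) != 4 or not set(digits) <= _HEX:
        raise SyntaxError(f"bad \\u escape at offset {i}: {digits!r}")
    cp = int(digits, 16)
    nxt = i + 5
    if 0xD800 <= cp <= 0xDBFF:  # high surrogate: needs an escaped low half
        low = src[nxt:nxt + 6]
        if len(low) == 6 and low[:2] == "\\u" and set(low[2:]) <= _HEX:
            lo = int(low[2:], 16)
            if 0xDC00 <= lo <= 0xDFFF:
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00)
                return chr(cp), nxt + 6
        raise SyntaxError(f"unpaired high surrogate at offset {i}")
    if 0xDC00 <= cp <= 0xDFFF:
        raise SyntaxError(f"unpaired low surrogate at offset {i}")
    return chr(cp), nxt
```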
### [medium] [CONFIRMED] Unknown string escapes diverge: native keeps the backslash, guest/JS drop it
- Location: sx_parser.ml:79 (`_ -> add '\\'; add esc`) vs spec/parser.sx read-string `:else esc`
- Repro: `"a\qb"` is 4 chars through the native reader, 3 chars through guest/JS — same source file, different data depending on which parser read it. `\b`/`\f` unsupported both (silent literal); native additionally accepts undocumented `\/` and `` \` ``.
- Coverage: only \n \t \" tested.
### [medium] [CONFIRMED] `#name` extensible reader-macro dispatch is unimplemented on OCaml hosts
- Location: spec/parser.sx:459 (`reader-macro-get`); registry exists only in hosts/javascript/platform.py:2639-2640
- Repro: mcp harness `(sx-parse "#t")` → `Undefined symbol: reader-macro-get` (instead of the intended "Unknown reader macro" error). The sole production evaluator cannot register reader macros at all.
- Coverage: reader-macro suite tests only `#;` `#|` `#'`.
### [low-medium] [CONFIRMED] case: `:else`/`else` matches in ANY position, shadowing later valid clauses
- Location: spec/evaluator.sx, `step-sf-case` / `is-else-clause?`
- Repro: `(case 1 :else "e" 1 "one")` → "e".
- Coverage: not covered.
### [low-medium] [CONFIRMED] case: evaluated datums, keyword/string punning, Scheme clause syntax crashes misleadingly
- Location: spec/evaluator.sx, `step-sf-case`; documented flat in eval-rules.sx:70 (but rule text doesn't say vals are evaluated)
- What (verified): clause values are evaluated sequentially until one matches (so only pre-match values have side effects); the scrutinee is evaluated once; comparison is structural `=` (lists match); duplicates are first-wins; no match and no else → nil. Keywords evaluate to strings, so `(case "k" :k "kw")` matches. Scheme datum-list clauses crash: `(case "a" (("a") 1) (else 2))` → `Not callable: ("a")`. The flat form is intended (test-cek.sx:130-138); the issues are the unstated evaluation semantics and the hostile diagnostic.
- Coverage: happy paths only.
### [low] [CONFIRMED×2] letrec is parallel (not letrec*) and reference-before-init silently yields nil
- Location: spec/evaluator.sx, `sf-letrec` (~1366-1469: all inits evaluated before any name bound; names pre-bound nil)
- Repro: `(letrec ((a b) (b 1)) a)``nil` (R7RS: error); `(letrec ((a 1) (b (+ a 1))) b)`**1** (nil-coerced by +; letrec* would give 2). Masks initialization-order bugs.
- Coverage: only well-formed lambda recursion tested.
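
Below is a Python sketch of the letrec* alternative. Inits run left to right, and every name starts as an uninitialized sentinel rather than nil. Under this model, `(letrec ((a b) (b 1)) a)` errors and `(letrec ((a 1) (b (+ a 1))) b)` gives 2. Modelling inits as Python callables over a `lookup` function is an assumption of the sketch, not how `sf-letrec` is structured. All names are hypothetical.

```python
UNINIT = object()  # sentinel instead of pre-binding names to nil


class SxUnboundError(NameError):
    pass


def letrec_star(bindings, body):
    """Sequential letrec*: bind all names, then run inits left to right.

    `bindings` is a list of (name, init), where init receives a lookup
    function. Reading a name before its init has run raises instead of
    yielding nil; closures created by inits may still refer to later names.
    """
    env = {name: UNINIT for name, _ in bindings}

    def lookup(name):
        v = env[name]
        if v is UNINIT:
            raise SxUnboundError(f"letrec: {name} referenced before initialization")
        return v

    for name, init in bindings:
        env[name] = init(lookup)
    return body(lookup)
```

Mutual recursion between lambdas still works, because the closures only call `lookup` after every init has run.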
### [low] [CONFIRMED] Documentation contradicts implementation: let IS sequential and multi-expression bodies ARE implicit begin
- Location: spec/evaluator.sx `step-sf-let` (:3133 — `let` and `let*` dispatch identically, shared local frame) vs CLAUDE.md "SX Island Authoring Rules" (claims parallel let, last-expr-only bodies, "reactive text needs deref computed", "effects go in inner let")
- What: `(let ((a 1) (b a)) b)` → 1; `(let ((x 5) (x (* x 2))) x)` → 10; let/when/fn multi-expr bodies evaluate every form (side effects verified). Sequential let is explicitly tested intent (test-scope.sx:45). The CLAUDE.md gotchas describe a different evaluator (likely the OCaml SSR island path) — doc drift that misleads every SX author. Also `(let ((f (fn () a2)) (a2 5)) (f))` → 5: binding-init lambdas capture the let frame itself (letrec-like — beyond even letrec* semantics; worth documenting).
- Coverage: sequential let tested; the doc is what's wrong.
### [low] [CONFIRMED] Component &key argument `false` is coerced to nil
- Location: spec/evaluator.sx, component branch: `(env-bind! local p (or (dict-get kwargs p) nil))`
- Repro: `(do (defcomp ~t1 (&key flag) (if (nil? flag) "NIL" "VAL")) (~t1 :flag false))``"NIL"`. Components can't distinguish `:flag false` from omitted.
- Coverage: invisible to test-defcomp.sx (only used in conditionals).
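
Below, the current binding and a presence-based one are modelled side by side in Python. `None` stands for nil, and `sx_truthy` encodes SX's rule that only false and nil are falsy. The function names are hypothetical. The point is that `(or v nil)` can only ever turn a supplied `false` into nil.

```python
def sx_truthy(v):
    # SX truthiness: only false and nil are falsy (0 and "" are truthy).
    return v is not False and v is not None


def bind_kwargs_current(params, kwargs):
    # Mirrors `(or (dict-get kwargs p) nil)`: a supplied `false` collapses to nil.
    out = {}
    for p in params:
        v = kwargs.get(p)
        out[p] = v if sx_truthy(v) else None
    return out


def bind_kwargs(params, kwargs):
    # Proposed: keep whatever was supplied; only absent keys become nil.
    return {p: kwargs.get(p) for p in params}
```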
### [low] [CONFIRMED] Trailing keyword argument without a value silently accepted
- Location: spec/evaluator.sx, `parse-keyword-args` (:935)
- Repro: `(do (defcomp ~c4 (&key a) (list a)) (~c4 :a))``(nil)`; expected kwarg error.
- Coverage: not covered.
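
Below is a Python sketch of keyword-argument parsing that rejects a dangling keyword. `Keyword`, `parse_keyword_args` and `SxArgError` are hypothetical stand-ins. The real `parse-keyword-args` at evaluator.sx:935 may split arguments differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    name: str


class SxArgError(TypeError):
    pass


def parse_keyword_args(args):
    """Split a component call's args into (kwargs, positional children).

    A keyword in the final position with no value raises instead of
    silently binding nil.
    """
    kwargs, children = {}, []
    i = 0
    while i < len(args):
        a = args[i]
        if isinstance(a, Keyword):
            if i + 1 >= len(args):
                raise SxArgError(f"keyword :{a.name} has no value")
            kwargs[a.name] = args[i + 1]
            i += 2
        else:
            children.append(a)
            i += 1
    return kwargs, children
```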
### [low] [CONFIRMED] defmacro is unhygienic (classic capture) while the test suite is named "macro-hygiene"
- Repro: ``(defmacro my-or2 (a b) `(let ((t ,a)) (if t t ,b)))``; `(let ((t 5)) (my-or2 false t))` → `false`. This is CL-style defmacro, judged intended (gensym exists, produces unique names, and is tested). However, the test-macros.sx "macro-hygiene" suite (line 208) tests only the leak-out direction, which overstates the guarantee.
### [low] [CONFIRMED] match has no guard clauses — Racket-style `(pattern (when cond))` silently read as a structural pattern
- Repro: `(match 9 ((x (when (> x 5))) "big") (_ "small"))` → "small" (silent structural fail → fall through). Supported features work; non-match raises properly. `let-match` is dict-destructuring only; list patterns give a confusing "no clause matched".
- Coverage: supported features covered; guard-clause rejection not.
### [low] [CONFIRMED] Components not recognized by `ho-fn?`; map-with-component yields silent zeros
- Location: spec/evaluator.sx, `ho-fn?` (3554) — no component check
- Repro: `(defcomp ~c2 (x) (* x 2))`; `(map ~c2 (list 1 2 3))``(0 0 0)`; `(map (list 1 2 3) ~c2)``rest: 1 list arg`.
- Coverage: not covered
### [low] [CONFIRMED] `|>` alias is dead code — parser rejects `|`
- Location: spec/evaluator.sx step-eval-list `("|>" ...)` (1906); tokenizer
- Repro: `(|> (list 1 2 3) ...)``Parse_error("Unexpected char: |")`. Branch unreachable.
### [low] [CONFIRMED] Keywords-as-getters unsupported in HO fn position and `->` chains, with misleading errors
- Repro: `(map :name (list {:name 1}))``Not callable: "name"`; `(-> {:a {:b 42}} :a :b)``Not callable: nil`.
- Coverage: not covered
### [low] [CONFIRMED] Zero/one-arg HO calls return empty results silently
- Repro: `(map)``()`; `(map (fn (x) x))``()`; `(reduce +)``nil`; `(-> (list 1 2 3) map)``()` (plausible typo silently discards data).
- Coverage: not covered
### [low] [CONFIRMED] Boolean-attr truthiness footguns: string "false" and 0 emit the bare attribute
- Location: spec/render.sx, `render-attrs` (SX truthiness)
- Repro: `(input :disabled "false")``<input disabled />`; `(input :disabled 0)` same. Aligns with SX truthiness but surprising when values come from data.
- Coverage: true/false booleans tested; string/number values not
### [low] [CONFIRMED] `is-render-expr?` exported but dead; `html:` tags and hyphenated custom elements error despite being "recognised"
- Location: spec/render.sx, `is-render-expr?` — zero callers
- Repro: `(render-to-html '(html:my-tag :foo "bar") ...)``Undefined symbol: html:my-tag`; `(aser '(custom-widget :foo "bar" "child") ...)``Undefined symbol: custom-widget`.
- Coverage: not covered
### [low] [CONFIRMED] `<script>`/`<style>` content is HTML-escaped like text — corrupts legitimate inline JS/CSS
- Location: web/adapter-html.sx `render-html-element`
- Repro: `(script "if (a < b && c) { x=\"y\"; }")` → entities inside script (broken JS); `(style ".a > .b {}")``.a &gt; .b {}`. Blocks `</script>` breakout (good) but breaks real inline code; `raw!` is the workaround.
- Coverage: only script attrs tested, never content
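
Below is a Python sketch of the raw-text alternative. `html.escape` stands in for SX's text escaper. Rejecting content that contains `</script` or `</style` is one possible policy, shown here only as an assumption. `render_element_text` is a hypothetical name.

```python
import html
import re

RAW_TEXT_TAGS = {"script", "style"}


def render_element_text(tag: str, text: str) -> str:
    """Render a text child of `tag`.

    Ordinary elements get entity escaping. script/style are raw-text
    elements where entities are not decoded, so escaping corrupts the
    code; instead, refuse content that could close the element early.
    """
    if tag not in RAW_TEXT_TAGS:
        return html.escape(text, quote=False)
    if re.search(rf"</{tag}", text, re.IGNORECASE):
        raise ValueError(f"<{tag}> content contains a closing </{tag} sequence")
    return text
```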
### [low] [CONFIRMED] Comparison/equality strictly binary; `=` is deep structural equality that conflates exactness
- Repro: `(< 1 2 3)`/`(= 1)` → unstructured arity error (matches spec, deviates from Scheme); `(= {:a 1} {:a 1})` → true; `(= 1 1.0)` → true (dedup-key hazard).
### [low] [CONFIRMED] Rounding half-away-from-zero, not banker's; inexact->exact rounds; (sqrt -1) → nan
- Repro: `(round 2.5)` → 3 (R7RS: 2); `(inexact->exact 1.5)` → 2 (locked by test-numeric-tower.sx:115 — intended but R7RS-divergent); `(sqrt -1)` → nan silently.
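
Below, the two tie-breaking rules are shown side by side in Python, whose built-in `round` already rounds half to even. SX currently behaves like `round_half_away`, while R7RS specifies `round_half_even`. The floor-and-compare form avoids the error that `floor(x + 0.5)` makes on 0.49999999999999994.

```python
import math


def round_half_away(x: float) -> float:
    """Ties go away from zero (current SX behaviour)."""
    a = abs(x)
    f = math.floor(a)
    # a - f is the exact fractional part, so the tie test has no rounding error.
    r = f + 1 if a - f >= 0.5 else f
    return math.copysign(r, x)


def round_half_even(x: float) -> int:
    """Ties go to the even neighbour (R7RS; Python's round() does this)."""
    return round(x)
```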
### [low] [CONFIRMED] Float/nil rendering inconsistencies across str/format/render
- Repro: `(str 1.0)``"1"` (float/int distinction lost — also `(div 1.0)` renders `1`); `(str nil)``""` but `(format "~a" nil)``"()"`; `(format "~d" 3.7)``"3"` (silent truncation).
### [low] [CONFIRMED] Inconsistent nil/empty tolerance across list ops
- Repro: `(first nil)` → nil, `(rest nil)``()`, `(nth (list 1 2) 5)` → nil silently — but `(last nil)`, `(reverse nil)`, `(nth nil 0)` all raise.
### [low] [CONFIRMED] keys returns strings in reverse insertion order
- Repro: `(keys {:a 1 :b 2 :c 3})``("c" "b" "a")`. Determinism footgun for serialization/content-addressing.
### [low] [CONFIRMED] keyword-name unusable on evaluated keywords
- Repro: `(keyword-name :kw)` → error (`:kw` self-evaluates to `"kw"`); only `(keyword-name ':kw)` works.
### [low] [CONFIRMED] string->number: no rational/whitespace parsing
- Repro: `"1/2"` → nil (despite make-rational), `" 5 "` → nil, `"1e3"` → 1000, garbage → nil (good).
### [medium] [CONFIRMED — CORRECTED after cross-lane check] `apply` does not spread AT ALL on the native production surface
- Location: continue-with-call native-call path / apply primitive
- What: originally reported as "leading-args form missing, two-arg form works" — WRONG. Re-verified on fresh sx_server: `(apply + (list 1 2))``Unhandled exception: "Expected number, got list: "`. The list is passed as a single argument, never spread — `(apply str (list 1 2 3))``"(1 2 3)"` (str of the list itself). The earlier "works" observation came from a test-runner/harness environment with its own apply. Conformance lane F-3 independently found this AND that the WASM kernel spreads the 2-arg form (→ 6) while native errors — the same kernel family disagrees with itself on apply.
- Repro: `(apply + (list 1 2))` → error; `(apply + (list 1 2 3))` → error; `(apply str (list 1 2 3))``"(1 2 3)"` (fresh sx_server, verified 2026-07-03).
- Coverage: not covered on the production surface (runner env has a different apply — see the values/call-with-values finding for the same pattern).
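
Below, the intended spreading semantics are sketched in Python with hypothetical names. Every argument after the function except the last is passed through, and the last argument (which must be a list) is spread. Under this rule, `(apply + (list 1 2 3))` is 6 and `(apply str (list 1 2 3))` is equivalent to `(str 1 2 3)`.

```python
def sx_apply(f, *args):
    """R7RS-style apply: (apply f a b ... lst) calls f(a, b, ..., *lst).

    The final argument is spread into individual arguments rather than
    passed through as a single list value.
    """
    if not args:
        raise TypeError("apply: expected a function and a list")
    *leading, last = args
    if not isinstance(last, list):
        raise TypeError(f"apply: last argument must be a list, got {type(last).__name__}")
    return f(*leading, *last)
```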
### [low] [CONFIRMED] Strict checks are name-keyed at the call site — trivially evaded, and shadowers inherit checks
- Repro: `(let ((zz hh)) (zz "a"))` → unchecked; computed heads `((mk) "bad")` → unchecked; conversely a user fn shadowing a typed name gets the declared checks applied to it. First-class function flow is entirely unchecked.
- Coverage: not covered.
### [low] [CONFIRMED] set-prim-param-types! replaces wholesale; no validation; malformed specs fail cryptically and uncatchably
- What: second call wipes all earlier declarations (no merge); nonexistent prim names accepted silently; `{"positional" "oops"}` errors at call time with "Expected list, got string" (uncatchable, doesn't name the spec as culprit); `{"name" "not-a-dict"}` silently checks nothing; declaring types for HO-form names never fires (HO dispatch intercepts before the arg frame).
- Coverage: only the nil-reset path tested.
### [low] [CONFIRMED] Too-few args never error and their declared types are silently skipped
- What: user lambdas nil-fill missing params (`(f2 1)``(1 nil)` with b typed number, no error); strict-check-args guards `idx < len(args)` so unsupplied params skip checking. Too-many args DO error. `foreign-check-args` has the mirror asymmetry (extra args unchecked; code-level).
- Coverage: not covered.
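
Below is a Python sketch of argument checking that does not skip unsupplied parameters. Missing arguments are treated as nil (matching the evaluator's nil-fill) and then checked, so a non-nullable type fails and a `T?` type passes. `matches` is a deliberately tiny stand-in for `value-matches-type?`. `&rest` handling is omitted, and all names are hypothetical.

```python
class SxTypeError(TypeError):
    pass


def matches(value, tname):
    """Tiny stand-in for value-matches-type?: number/string plus "T?" nullability."""
    if tname.endswith("?"):
        return value is None or matches(value, tname[:-1])
    if tname == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if tname == "string":
        return isinstance(value, str)
    return True  # other types accepted, for the purposes of this sketch


def strict_check_args(fname, param_types, args):
    """Check args against [(param_name, type_name), ...], including missing ones."""
    if len(args) > len(param_types):
        raise SxTypeError(f"{fname}: expected {len(param_types)} args, got {len(args)}")
    for idx, (pname, tname) in enumerate(param_types):
        value = args[idx] if idx < len(args) else None  # nil-filled, then checked
        if not matches(value, tname):
            raise SxTypeError(f"{fname}: param {pname} expected {tname}, got {value!r}")
```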
### [low] [CONFIRMED] `(:as type)` parameter annotations are never enforced — even in strict mode
- Location: eval-rules.sx documents `(:as type)` in the lambda rule; spec/signals.sx uses them pervasively (`(s :as signal)`)
- Repro: `(define tf (fn ((x :as number)) x))` `(tf "not-a-number")` → returns the string, strict on or off. The natural per-param channel is decorative; strict mode reads only the global name-keyed dict.
- Coverage: not covered.
### [low] [CONFIRMED] Strict-machinery paper cuts
- Return types unsupported anywhere (params only). Rest-arg errors index from 0 within the rest section ("rest arg 0" is overall arg 2). `set-strict!` is one global OCaml ref — not per-env, not captured by continuations; toggling mid-program retroactively affects existing lambdas. Dead shadowed duplicates `_strict_ref`/`_prim_param_types_ref` at sx_ref.ml:18-19 (transpiler cruft, no desync). Host surface inconsistency: sx_server binds set-strict!/set-prim-param-types! but not value-matches-type?; the harness binds none. Positive: error message quality is good (names function, param, expected, actual, value).
### [low] [CONFIRMED] `batch` unusable on the server host; coroutines module inert outside the test runner
- What: `batch` calls `(batch-begin!)` on non-client hosts; `batch-begin!`/`batch-end!` are bound only in run_tests.ml:564 — on sx_server `(batch ...)``Undefined symbol: batch-begin!` (which, per the wedge finding, also leaves `*batch-depth*` stuck). Separately, spec/coroutines.sx lacks the trailing `(import (sx coroutines))` re-export that signals.sx/harness.sx have — loading it binds nothing globally; tests work only via explicit import + run_tests-only cek-* hooks.
- Coverage: not covered.
### [low] [CONFIRMED] `effect` stale cleanup double-invocation
- Location: spec/signals.sx `effect`/`run-effect` — cleanup-fn invoked at each re-run start but never cleared; only overwritten when a run returns a new callable
- Repro: an effect that returns a cleanup only when v=0; after two resets, cleanup-calls = 2 (expected 1).
- Coverage: not covered.
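
Below is a minimal Python model of the run/cleanup cycle with the fix applied. The stored cleanup is taken (cleared) before it is called, so a later run that returns no cleanup cannot re-invoke the old one. `Effect` is hypothetical and ignores dependency tracking entirely. Replaying the repro against it (cleanup returned only when v=0, then two resets) gives exactly one cleanup call.

```python
class Effect:
    """Minimal model of an effect's run/cleanup cycle (not the SX API).

    body() may return a cleanup callable or None.
    """

    def __init__(self, body):
        self.body = body
        self.cleanup = None
        self.run()

    def run(self):
        if self.cleanup is not None:
            cleanup, self.cleanup = self.cleanup, None  # take it: at most one call
            cleanup()
        result = self.body()
        self.cleanup = result if callable(result) else None
```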
### [low] [CONFIRMED] Guest parse errors carry no source locations; native has line/col on only 2 of ~8 error types
- Location: spec/parser.sx (all error sites location-free); sx_parser.ml (locations only for "Unexpected end of input"/"Unexpected char"; unterminated string/list/dict etc. location-free)
- Repro: `(sx-parse "(a (b)")` → just `"Unterminated list"`. Also test-source-locations.sx tests a parser-combinator library, NOT spec/parser.sx, and its cols are 0-based vs native 1-based.
- Coverage: no reader-location tests exist.
### [low] [CONFIRMED] Dict literal edges: odd form count → misleading error; duplicate keys silently last-win
- Repro: `{:a}``Unexpected character: }` (no mention of pairing); `{:a 1 :a 3}``{:a 3}` silently (both parsers).
- Coverage: not covered.
### [low] [CONFIRMED] `#|...|` is a raw string to the first `|`, not a block comment; `#|a|#` leaves a dangling `#`
- Repro: `(sx-parse "#|hello world|")``("hello world")`. Documented, but a Scheme-expectation trap with no test for the `|#` suffix case.
### [low] [CONFIRMED] Keyword edge tokens: `:` parses as keyword with empty name; `::a` is a keyword named ":a"
- Coverage: numeric-suffix/consecutive keywords tested; `:`/`::` not.
### [low] [CONFIRMED] Harness contract nits
- A throwing mock leaves no IO-log entry (append happens after the mock returns) — failed calls invisible to assert-io-called. `(assert cond)` one-arg form works only via the evaluator-wide nil-fill of missing params.
### [low] [CONFIRMED] CLAUDE.md points at a deleted canonical spec (`shared/sx/ref/*.sx`)
- What: CLAUDE.md instructs reading `shared/sx/ref/eval.sx`/`parser.sx`/`primitives.sx`/`render.sx` as "authoritative SX semantics"; the directory contains only `BOUNDARY.md` + Python cache. Live spec is `spec/*.sx`. Together with the island-authoring-rules drift (let/body semantics above), the project docs actively mislead on core semantics.
---
## SUSPECTED findings (reasoning only, not reproduced)
### [medium] [SUSPECTED] More nested-eval boundaries: `expand-macro`, `sf-let-values`, `sf-define-values`, `qq-expand` unquotes all evaluate via `(trampoline (eval-expr ...))` instead of CEK frames
- Location: spec/evaluator.sx, expand-macro (1548-1580), sf-let-values (1411-1417), sf-define-values (1443-1445), qq-expand unquote eval
- Reasoning: same structural pattern as the three CONFIRMED nested-run bugs (shift-k invoke, threading, signal) — continuation capture, `perform`/IO suspension, or raise-to-outer-handler inside a macro body, let-values initializer, or unquote crosses a nested trampoline the outer kont cannot see. let-values untestable at runtime (`values` missing — see medium finding); macro-expansion capture is expansion-time and rare.
- Coverage: not covered.
### [low] [SUSPECTED] env_merge is_descendant depth cap (>100) silently flips scoping semantics
- Location: hosts/ocaml/lib/sx_types.ml:394 (`if depth > 100 then false`)
- Reasoning: call-site env chains deeper than 100 frames false-negative the descendant check, activating the caller-frame-copy branch (the dynamic-scoping leak above) in code that was previously purely lexical. Rare (needs ~100 nested closure/let layers), silent flip. Code-read only.
- Coverage: not covered.
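
Below is a Python sketch of an uncapped descendant check. It assumes env parent chains are acyclic; if guarding against cycles is why the cap exists, a visited set would be the safer fix. `Env` and `is_descendant` are hypothetical stand-ins for the sx_types.ml structures. Whether the OCaml check counts an env as its own descendant is also an assumption here.

```python
from typing import Optional


class Env:
    def __init__(self, parent: Optional["Env"] = None):
        self.parent = parent


def is_descendant(env: Env, ancestor: Env) -> bool:
    """True if `ancestor` is env itself or on env's parent chain.

    Iterative walk with no depth cap, so deep chains cannot flip the
    answer to False; cost is linear in chain length.
    """
    cur: Optional[Env] = env
    while cur is not None:
        if cur is ancestor:
            return True
        cur = cur.parent
    return False
```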
### [medium] [SUSPECTED] Canonical serialization is not cross-host deterministic — CIDs can differ between OCaml and JS
- Location: spec/canonical.sx (`canonical-number` uses host `str`; string case uses host `escape-string`)
- Reasoning + partial confirmation: OCaml `(canonical-serialize 1e-7)``"1e-07"` (verified live) while JS `String(1e-7)``"1e-7"` (code-read) — different canonical text → different sha3 CIDs for the same value. Also: sx_server escapes `\r` (sx_server.ml:1275), JS platform does not (platform.py:2628); integers beyond 2^53 exact on OCaml, unrepresentable in JS. Full cross-host CID comparison not run.
- Coverage: test-canonical.sx never canonicalises exponent-form floats, CR strings, or big ints. (Dict-key sorting IS implemented and idempotence holds for tested classes.)
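
Below, one piece of a fix is sketched in Python: normalizing exponent spelling so `1e-07` and `1e-7` cannot both be canonical. This does not settle the whole problem. The thresholds at which hosts switch to exponent notation also differ (Python's `repr` at 1e16, JavaScript's `String` at 1e21), and choosing those is a ruling. `canonical_float` is a hypothetical name.

```python
import re


def _norm_exp(m: re.Match) -> str:
    # Drop a '+' sign and any leading zeros from the exponent.
    sign = "-" if m.group(1) == "-" else ""
    return "e" + sign + m.group(2)


def canonical_float(x: float) -> str:
    """Shortest round-trip repr with a normalized exponent.

    1e-07 -> 1e-7, 1.5e+300 -> 1.5e300; non-exponent forms unchanged.
    """
    return re.sub(r"e([+-]?)0*(\d)", _norm_exp, repr(x))
```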
### [medium] [SUSPECTED] Coroutine performing a non-yield effect is permanently wedged
- Location: spec/coroutines.sx, `coroutine-handle-result` — for a suspension with op ≠ "coroutine-yield" it does `(perform request)`: forwards outward but **discards both the answer and the coroutine's suspension**; state stays "running" and `coroutine-resume` has no "running" branch → "unexpected state: running"
- Reasoning: code-level; not reproducible outside run_tests (needs cek-step-loop/cek-resume hooks bound only in run_tests.ml:951-955). Correct forwarding would cek-resume the suspension with the outer answer in a loop.
- Coverage: test-coroutines.sx (27 tests) has zero `perform` usage.
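
Below is a Python generator model of the forwarding loop `coroutine-handle-result` could use. A non-yield effect is performed by the outer handler and the coroutine is resumed with the answer, repeating until the coroutine yields or finishes. Generators stand in for CEK suspensions, and `resume` and `perform` are hypothetical names.

```python
def resume(coro, send_value, perform):
    """Advance `coro` until it yields a value or finishes.

    `coro` is a generator whose yields are (op, payload) pairs. For
    op == "coroutine-yield" the payload goes back to the resumer. Any
    other op is forwarded to `perform` (the outer handler), and the
    coroutine is resumed with its answer instead of being abandoned
    in a "running" state.
    """
    try:
        op, payload = coro.send(send_value)
        while op != "coroutine-yield":
            answer = perform(op, payload)
            op, payload = coro.send(answer)
        return ("yielded", payload)
    except StopIteration as stop:
        return ("done", stop.value)
```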
### [low] [SUSPECTED] VM/JIT execution path has no strict checking
- Location: sx_vm.ml — zero callers of `strict_check_args` (repo-wide grep: only sx_ref.ml)
- Reasoning: any call executed as compiled bytecode bypasses checks. Could not confirm live — lazy JIT never engaged in CLI probes (bytecode-inspect after 300 calls: "no compiled bytecode").
- Coverage: not covered.
---
## Checked, NOT reproducible (negative results correcting project memory)
- **"Short helper names (name/dyad) hang the runtime"**: does NOT reproduce — `(define name …)`/`(define dyad …)` work. The `guard` case is the unshadowable-name finding (error, not hang).
- **"split is char-class not substring"**: harness/guest-worktree only. Real sx_server `(split "a--b" "--")``("a" "b")` substring, keeps empties. Multi-char delimiter untested in spec/tests — worth a pinning test.
- **"let is parallel / bodies evaluate only last expr / effects need inner let"** (CLAUDE.md island rules): all false for the spec evaluator — let is sequential, bodies are implicit begin (tested intent). Likely describes the separate OCaml SSR island path → doc fix + cross-lane check.
## Clean areas verified
**CEK core**: TCO through all special forms (named-let 200k, mutual 100k, non-tail 100k heap-safe);
call/cc escape/multi-shot/independence; shift/reset delimiting + multi-shot composable k; shift
without reset → clean error; escape from HO callbacks; multi-shot resume INTO map frames (no
accumulator leakage); raise through dynamic-wind one-shot (after exactly once, 50k-frame unwind);
`(and)`/`(or)`/`(begin)`/`(cond)`/if-no-else edge values; cond `=>`; head-position exprs;
parameterize; restart-case/invoke-restart.
**Env/scope**: closure sharing + isolation both directions; define local-in-lambda vs top-level
redefine; set! write-through 1-2 levels; `(let ((x x)) x)` → outer; letrec mutual recursion
(lambda case); emit!/emitted ordering/extent/nesting/TCO-survival/no-leak (correct but ZERO test
coverage — gap worth closing given the scope/provide bugs); provide/context/peek normal-flow
nesting (well covered); component &rest/kwarg interleaving; component set! does not write back
to caller; primitive shadowing works for genuine primitives.
**HO forms**: HoSetupFrame stages both args exactly once, left-to-right, both orders; map over
list-of-functions picks sane reading; guest raise mid-map caught cleanly; some/every?/filter/
for-each/map-indexed semantics sane (0/"" truthy — internally consistent); no double-eval in
threading (quoted-value splice protects data both paths); as->; ->> normalizes via swap; nested
map-in-map; reduce 100k in 0.3s; multi-map zips to shortest (covered).
**Special forms**: when/begin/do sequencing; and/or/if falsiness fully consistent (only false/nil
falsy); short-circuit verified; defmacro recursive expansion, &rest + `,@` templates, ~name heads
in qq; guard happy paths incl. R7RS auto-reraise; ->/set! interplay; eval-rules.sx accurate except
set!-error claim, cond clause mode, case evaluated-vals; `unless` intentionally userland.
**Render**: text + attr-value escaping correct; raw!/SxExpr single-escape guarantee (no double-
escape); registered void elements self-close, drop children silently; boolean-attr registry (23)
correct for true/false/nil; numbers/booleans/nil as children; aser wire semantics (components
unexpanded, control flow evaluated, string/dict args round-trip incl. quotes/unicode); recursive-
with-base-case components; fragment/nil/string/number component returns; &rest spliced flat.
**Primitives**: quotient/remainder/modulo signs R7RS-correct; substring clamping; replace;
trim/index-of/starts-with?/ends-with?; assoc/dissoc/merge/has-key?; range/flatten/chunk-every;
rationals (normalization, contagion, zero-denominator errors); vectors/sets/ports/chars/string-
buffers basics; dict-set! vs assoc; truthiness consistent; format directives; max/min zero-arg
errors clean. Not probed (dedicated green suites): zip-pairs, bitwise, bytevectors, regexp.
**Parser/serializer**: basic escapes correct + exact round-trips (quotes/backslashes/newlines/
multibyte strings); quote sugar nesting incl. before `)`/EOF; 10k-deep nesting + 10k-char tokens
parse fine (heap frames, no hangs on any adversarial input — every failure errors rather than
loops); serializer round-trips for number/keyword/symbol/list/nested-dict(ident keys)/bool/nil;
nil vs () vs {} distinct; canonical-dict key sorting + idempotence (tested classes); -0.0 → "0";
negative numbers vs `-` symbol; `5.`/`1e10`/`-1.5e-3`; comments at EOF; dotted pairs cleanly
rejected on all hosts; keyword AST round-trip.
**Strict typing**: value-matches-type? core semantics correct (number/string/boolean/nil/list/
dict; empty list not a dict; nullability exclusively via "type?" suffix — consistent; floats+ints
both "number"; quoted symbols; lambdas). ->/->> threading IS strict-checked (re-dispatches a real
call form). Recovery after a strict error works. Error messages high quality.
**Signals**: effect does not re-run on unrelated signals; effect's dispose-fn unsubscribes
correctly; batch dedups multiple resets of one signal (when it works — see wedge finding).
**Harness (spec/harness.sx)**: interceptors log args/result/op correctly; arity fan-out 0-3 +
apply; custom-platform merge over defaults; assertion messages descriptive.
---
## Handoffs to other lanes
- **HOSTS**: hosts/ocaml/bin/mcp_tree.ml maintains its own primitive table, drifted from
sx_primitives.ml (empty?/get/split/contains?/equal?/keyword-name differ — details in the
harness-divergence finding). Also: sx_harness_eval is a shared persistent image, not a fresh
sandbox; sx_read_subtree ignores `path`; sx_read_tree ignores `max_lines`.
- **HOSTS/CONFORMANCE — JIT vs interpreter divergence**: three confirmed behavior flips between
VM-compiled and interpreted paths: (1) set!-unbound writes vm.globals vs root env (split brain);
(2) env_merge caller-frame leak exists only interpreted ("VM undefined" under JIT); (3) named-let
leaked loop name reads as lambda interpreted / nil under JIT. Parity suite has no coverage.
- **HOSTS (Python shell)**: aser output embedded into `<script>` via `json.dumps` in
shared/sx/helpers.py `sx_streaming_resolve_script``json.dumps` doesn't escape `/` or `<`;
check whether serialized SX can ever contain `</script>` (aser HTML-escapes text children,
but attr/raw paths unverified).
- **CONFORMANCE**: run_tests.ml injects bindings absent from the real runtime — `values`/
`call-with-values` (test-values.sx), `contains-char?`/`trim-right` (canonical.sx),
`batch-begin!`/`batch-end!` (signals), cek-step-loop/cek-resume (coroutines). Whole suites are
green only in-runner; test-env vs runtime-env parity needs a systematic sweep.
- **CONFORMANCE — parser fleet**: three parser implementations (native OCaml reader, spec guest
parser over per-host primitive bindings, JS transpiled spec) with four ident/number classifier
tables that were never reconciled (details in the AST-divergence finding). Guest-parser platform
primitives (`parse-number`, `char-code`, `contains-char?`, `trim-right`, `reader-macro-get`,
`escape-string`) drift per host because each host re-binds them ad hoc. Suites only exercise
the intersection — that's why everything stays 1080/1080 green.
- **HOSTS (JS)**: JS parser silently corrupts invalid `\uXXXX` escapes (garbage string, no error)
where OCaml raises; JS `reader-macro-get` registry exists but OCaml's doesn't.
- **DOCS**: CLAUDE.md island-authoring rules describe non-spec semantics (parallel let, last-expr
bodies); CLAUDE.md canonical-reference section points at deleted files.
- **TOOLING incident log**: mid-review another session polluted the shared MCP image (`inc`
redefined to a constant, breaking guest parsing with spurious "Unterminated" errors); the parser
agent restored it. Underlines the harness-not-fresh finding — harness state is shared across
concurrent sessions.
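
For the Python-shell handoff, the hazard and a common mitigation can be shown with the standard library alone. `json.dumps("</script>")` returns `"</script>"` unchanged. `json_for_script` is a hypothetical helper, not something in shared/sx/helpers.py.

```python
import json


def json_for_script(value) -> str:
    """Serialize `value` for embedding inside an HTML <script> element.

    json.dumps leaves '<', '>', '&' and '/' unescaped, so a string holding
    "</script>" would end the element early. Those characters only occur
    inside JSON strings, where \\u escapes are equivalent, so rewriting
    them keeps the JSON value identical.
    """
    text = json.dumps(value)
    return (text.replace("<", "\\u003c")
                .replace(">", "\\u003e")
                .replace("&", "\\u0026"))
```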
