The OCaml suite's permanent ~273-failure band (in-progress hs-* + the
r7rs radix shadow) has come to be treated as normal, so real regressions
hide in the red noise (conformance.md F-10). A runner skip-list would
rewrite the hs loops' scoreboards mid-flight, so pin the band instead:
scripts/test-suite-baseline.sh runs the full suite and diffs its FAIL set
against spec/tests/known-failures.txt (273 entries, identity =
"suite > name", error text stripped). Red on a NEW failure (regression)
AND red on a vanished failure (fix landed — delete it from the baseline,
locking in the win). The band still prints as FAIL lines for the teams
working through it; nothing in the runner changes.
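The two-way diff described above can be sketched in Python as follows. This is a minimal illustration, not the contents of scripts/test-suite-baseline.sh: the `FAIL: suite > name: error` line shape and the helper names `fail_identity` and `diff_against_baseline` are assumptions made for the example.

```python
def fail_identity(line):
    """Return 'suite > name' for a FAIL line, or None for any other line.

    Assumed (hypothetical) line shape: 'FAIL: <suite> > <name>: <error text>'.
    """
    if not line.startswith("FAIL: "):
        return None
    body = line[len("FAIL: "):]
    # Split on the first ' > ' only, so names such as 'string->number' survive.
    suite, sep, rest = body.partition(" > ")
    if not sep:
        return None
    name = rest.split(": ", 1)[0]  # strip the error text
    return f"{suite.strip()} > {name.strip()}"


def diff_against_baseline(output_lines, baseline):
    """Compare the current FAIL set with the pinned baseline set.

    Returns (new, vanished): both lists must be empty for a green run.
    """
    current = {i for i in map(fail_identity, output_lines) if i is not None}
    new = sorted(current - baseline)        # regressions
    vanished = sorted(baseline - current)   # fixes to delete from the baseline
    return new, vanished
```

Because the identity drops the error text, a failure whose message changes still counts as the same pinned entry.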
Bonus capture: 2 of the 273 have EMPTY suite labels (can-map-an-array,
string->number) — live evidence for C9, the next checklist item.
Validated end-to-end: GREEN on current tree (5800p/273f — 38 net passes
above dc7aa709's 5762 from this loop's added pins). Runtime ~12 min.
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
All five protocol quirks are OPEN server-side, so the suite pins CURRENT
behavior (verified live) as a bidirectional ledger in
scripts/test-protocol-gate.sh:
- C3: stray (io-response ...) answered as Unknown command (dead guard)
- C4: malformed (epoch) errors and leaves the epoch stale (envelope
changed since the finding: the dc7aa709 guard answers rather than kills)
- C5: decreasing epoch accepted silently (no monotonic enforcement)
- C6: two commands on one line -> one error, neither executed
- C7: vm-trace without compiler -> opaque "Not callable: nil"
Plus the fuzz property that matters: 60 deterministically-seeded hostile
lines (unbalanced parens, control chars, unicode, 2KB lines, stray
io-responses, epoch mutations) followed by a well-formed command — the
server must still answer and exit cleanly. protocol-gate: 11/11.
When a server-side fix lands, the matching ledger pin fails loudly and the
ledger is updated to assert the corrected behavior.
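A rough Python sketch of the deterministic fuzz input described above. The generator categories follow the list in this message; the exact payloads, the function names `hostile_lines` and `fuzz_script`, and the idea of passing the well-formed probe in as a parameter are assumptions for illustration, not what scripts/test-protocol-gate.sh actually emits.

```python
import random

# Control characters other than CR/LF, so every generated item stays one line.
CONTROL = [chr(c) for c in range(1, 32) if c not in (10, 13)]


def hostile_lines(seed, count=60):
    """Return `count` hostile lines; the same seed always yields the same list."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    makers = [
        lambda: "(" * rng.randint(1, 8) + "eval 1",                    # unbalanced parens
        lambda: "".join(rng.choices(CONTROL, k=rng.randint(1, 16))),   # control chars
        lambda: "".join(rng.choices("λ日本🙂é", k=rng.randint(1, 16))),  # unicode
        lambda: "x" * 2048,                                            # 2KB line
        lambda: f"(io-response {rng.randint(0, 99)})",                 # stray io-response
        lambda: rng.choice(["(epoch)", "(epoch -1)", "(epoch abc)"]),  # epoch mutations
    ]
    return [rng.choice(makers)() for _ in range(count)]


def fuzz_script(seed, well_formed):
    """Hostile lines followed by one well-formed command the server must answer."""
    return hostile_lines(seed) + [well_formed]
```

Seeding a private `random.Random` keeps a failing run reproducible from the seed alone.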
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
conformance.md F-2: no runner fed spec/tests through the shipped
sx_browser.bc.wasm.js — the F-1/F-3 native/WASM divergences existed
undetected because of exactly this gap.
Add hosts/ocaml/browser/run_wasm_corpus.js: boots the shipped kernel
headless in Node (stub block + module preload mirroring
test_wasm_native.js, the blessed boot path), registers the test-framework
hooks, runs ONE test file per process and emits a parseable CORPUS-RESULT
line — process isolation means a hanging file is killed by the driver's
per-file timeout without ending the sweep.
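The one-file-per-process design can be illustrated with a small Python driver. This is a sketch under assumptions: the real driver is a shell script, and the helper names `run_one` and `sweep`, plus everything after the `CORPUS-RESULT` prefix, are hypothetical.

```python
import subprocess
import sys


def run_one(cmd, timeout_s):
    """Run one test file in its own process and return its result line.

    subprocess.run kills the child when the timeout expires, so a hanging
    file costs at most timeout_s and cannot stall the rest of the sweep.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "TIMEOUT"
    for line in proc.stdout.splitlines():
        if line.startswith("CORPUS-RESULT"):
            return line
    return "NO-RESULT"  # process ended without reporting


def sweep(files, make_cmd, timeout_s):
    """Map each file to its outcome; make_cmd(file) builds the argv list."""
    return {f: run_one(make_cmd(f), timeout_s) for f in files}
```

For example, `sweep(files, lambda f: ["node", "run_wasm_corpus.js", f], 120)` would give every file its own Node process (the argv shape is an assumption).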
Add scripts/test-wasm-corpus.sh: sweeps spec/tests, applies a SKIP /
KNOWN_FAIL ledger (green-flip on a KNOWN_FAIL fails the run so the ledger
cannot rot), gates on everything else.
Empirical baseline (2026-07-04): 83 files, 80 fully green, 5192 passes,
zero test failures on the shipped kernel — including test-gate-pins
(29/29). KNOWN_FAIL: test-hash-table/test-r7rs/test-sets hit an opaque
jsoo load-error mid-file (22/87/30 tests pass first). Full sweep ~13 min;
sx-build-all.sh wiring deferred to the D3 gate-definition decision.
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
mcp_tree.ml's parallel primitive table drifted from sx_primitives.ml —
the spec-mandated harness verification path silently produced false
findings ((get {:a 1} :a 99) -> nil vs 1, char-class vs substring split,
etc.). dc7aa709 aligned 8 entries as a stopgap; the real fix (linking
sx_primitives) is hosts-lane.
Add scripts/test-harness-parity.sh: drives mcp_tree.exe sx_eval via raw
JSON-RPC and a fresh sx_server.exe via the epoch protocol, runs the
finding's 12-probe battery through both, fails on any divergence (errors
compared by inner message). 12/12 parity today — the stopgap holds and
can no longer rot silently.
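The parity check can be pictured as the Python sketch below. It is illustrative only: the two evaluators are stand-in callables returning `("ok", text)` or `("error", text)`, and the envelope prefixes stripped by `inner_message` are hypothetical, since the real wrappers are not shown in this message.

```python
# Hypothetical envelope prefixes that each host might wrap around an error.
ENVELOPE_PREFIXES = ("Error: ", "Eval error: ")


def inner_message(message):
    """Strip known envelope prefixes so only the inner error text is compared."""
    stripped = True
    while stripped:
        stripped = False
        for prefix in ENVELOPE_PREFIXES:
            if message.startswith(prefix):
                message = message[len(prefix):]
                stripped = True
    return message


def normalize(result):
    kind, text = result
    return (kind, inner_message(text)) if kind == "error" else (kind, text)


def divergences(probes, eval_a, eval_b):
    """Run every probe through both evaluators and collect mismatches."""
    found = []
    for probe in probes:
        a, b = normalize(eval_a(probe)), normalize(eval_b(probe))
        if a != b:
            found.append((probe, a, b))
    return found
```

An empty result corresponds to the 12/12 parity reported above; any non-empty list should fail the run.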
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Section-B audit, all verified live over the epoch protocol. Runner-only
bindings absent from production: values, call-with-values (run_tests.ml
:1131/:1140), contains-char? (rt.ml:728 + rt.js:85), trim-right (JS runner
ONLY — absent even from the OCaml runner), sha3-256 (rt.ml:745 + rt.js:88;
production's real primitive is crypto-sha3-256).
Consequences pinned: (canonical-serialize 42) on a fresh server errors
"Undefined symbol: contains-char?" — content addressing broken for ANY
number outside the runners. And BOTH runners' sha3-256 are FAKE stubs
(OCaml: Hashtbl.hash), so every test-computed CID differs from production.
scripts/test-env-parity.sh is a bidirectional ledger: MUST_HAVE bindings
going missing fail; a KNOWN_DRIFT binding APPEARING also fails with
instructions to move it to MUST_HAVE and flip the consequence pin — the
ledger cannot rot silently in either direction. 7/7 green.
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pre-fix, a routing-failure page was stored in the HTTP response cache as
200 and served byte-identically to every later visitor until restart
(cold 2s -> warm 0.0005s). dc7aa709 made http_render_page return
(html, is_error) and gated cache insertion on `not is_err`.
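The gating rule is small enough to show in a Python analogy. The real change lives in the OCaml server; the class name `PageCache` and the `renders` counter are assumptions added so the behavior is observable.

```python
class PageCache:
    """Response cache that refuses to store error pages."""

    def __init__(self, render):
        self._render = render  # render(path) -> (html, is_error)
        self._store = {}
        self.renders = 0       # how many times we actually rendered

    def get(self, path):
        if path in self._store:
            return self._store[path]
        html, is_error = self._render(path)
        self.renders += 1
        if not is_error:       # the gate: only successful pages are cached
            self._store[path] = html
        return html
```

Requesting the same failing path twice renders twice, which is the property the HTTP-mode case below asserts.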
Extend scripts/test-protocol-gate.sh with an HTTP-mode case: fresh
sx_server.exe --http on a random port (timeout-bounded, own child killed),
GET the same nonexistent path twice, assert both requests re-render (two
[sx-http] render lines) and the "[cache] ... error page, not cached" gate
line appears. Standalone-worktree caveat (all docs pages render as soft
error pages, so no positive cache control) documented in the script.
5/5 protocol-gate green; 267/0 sx gate pins. All seven section-A test-debt
pins now landed (K18, K20, K09/K11/K39, K49, crit-2, C1/C1b, S4).
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pre-fix, one malformed or non-ASCII line on sx_server's top-level command
channel raised an uncaught Parse_error and killed the whole shared process
(bridges + conformance runners). dc7aa709 guards the parse; the server now
answers (error N "Malformed command line: ...") and keeps serving.
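The shape of the guard, as a Python analogy. The toy `parse_command` only checks ASCII and paren balance, the `(ok N ...)` reply and the use of a line counter for N are assumptions; only the `Malformed command line:` error text comes from the message above.

```python
class ParseError(Exception):
    pass


def parse_command(line):
    """Toy parser: reject non-ASCII input and unbalanced parentheses."""
    if not line.isascii():
        raise ParseError("non-ASCII byte")
    depth = 0
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ParseError("unexpected ')'")
    if depth != 0:
        raise ParseError("unterminated list")
    return line.strip()


def serve(lines):
    """Answer every line; a parse failure yields an error reply, not a crash."""
    responses = []
    for n, line in enumerate(lines, 1):
        try:
            command = parse_command(line)
        except ParseError as exc:
            responses.append(f'(error {n} "Malformed command line: {exc}")')
            continue  # keep serving the next line
        responses.append(f"(ok {n} {command})")
    return responses
```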
Add scripts/test-protocol-gate.sh: per case, spawn a fresh timeout-bounded
sx_server.exe (never touches a shared process) and assert the error
response, the follow-up epoch still evaluating, and a clean exit. Cases:
C1 unterminated list + garbage line, C1b non-ASCII byte (exact review
repros from plans/sx-review/hosts.md), plus a well-formed control. 4/4
green. Structured to grow into W14 section E's protocol fuzz suite (C3-C7).
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Replaces the watchdog-bump approach with an automated check. The next 5× (or
worse) substrate regression will trip the alarm at build time instead of
hiding behind a deadline bump and only being noticed weeks later.
Components:
* lib/perf-smoke.sx — four micro-benchmarks chosen for distinct substrate
failure modes: function-call dispatch (fib), env construction (let-chain),
HO-form dispatch + lambda creation (map-sq), TCO + primitive dispatch
(tail-loop). Warm-up pass populates JIT cache before the timed pass so we
measure the steady state.
* scripts/perf-smoke.sh — pipes lib/perf-smoke.sx to sx_server.exe, parses
per-bench wall-time, asserts each is within FACTOR× of the recorded
reference (default 5×). `--update` rewrites the reference in-place.
* scripts/sx-build-all.sh — perf-smoke wired in as a post-step after JS
tests. Hard fail if any benchmark regressed beyond budget.
Reference numbers: minimum across 6 back-to-back runs on this dev machine
under typical concurrent-loop contention (load ~9, 2 vCPU, 7.6 GiB RAM,
OCaml 5.2.0, architecture @ 92f6f187). Documented in
plans/jit-perf-regression.md including how to update them.
The 5× factor is chosen so contention noise (~1–2× variance) doesn't trigger
false alarms but a real ≥5× substrate regression — the kind that motivated
this whole investigation — fails the build immediately.
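A minimal Python sketch of the budget check, assuming per-benchmark times are already parsed into dictionaries. The function names and the millisecond figures in the usage note are hypothetical; the benchmark names and the default factor of 5 come from this message.

```python
def reference_from_runs(runs):
    """Per-benchmark minimum across several runs.

    The minimum is the sample least inflated by machine contention.
    """
    names = runs[0].keys()
    return {name: min(run[name] for run in runs) for name in names}


def over_budget(measured, reference, factor=5.0):
    """Names of benchmarks slower than factor x their reference time."""
    return sorted(
        name for name, t in measured.items() if t > factor * reference[name]
    )
```

With a reference of `{"fib": 100, "tail-loop": 40}`, a measurement of `{"fib": 180, "tail-loop": 260}` flags only `tail-loop`: 1.8× stays inside the budget, 6.5× does not.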
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/extract-upstream-tests.py — new walker that scrapes
/tmp/hs-upstream/test/**/*.js for test('name', ...) patterns. Uses
brace-counting that handles strings, regex, comments, and template
literals. Two modes:
- merge (default): preserves existing test bodies, only adds new tests
- --replace: discards old bodies, fully re-extracts (use when bodies
drift due to upstream cleanup)
Merge mode is what we want for an incremental sync — the old snapshot
had bodies that had been hand-tuned for our auto-translator; raw
re-extraction loses those tweaks and regresses ~250 working tests
back to SKIP (untranslated).
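The difference between the two modes, sketched in Python. This assumes the snapshot can be viewed as a mapping from test name to body; the actual JSON layout and the function name `sync_snapshot` are assumptions, not taken from scripts/extract-upstream-tests.py.

```python
def sync_snapshot(existing, extracted, replace=False):
    """Combine the old snapshot with a fresh extraction.

    merge (default): keep every existing body, add only unseen tests.
    replace: discard old bodies and take the extraction as-is.
    """
    if replace:
        return dict(extracted)
    merged = dict(existing)
    for name, body in extracted.items():
        merged.setdefault(name, body)  # never overwrite a hand-tuned body
    return merged
```

Merge mode never overwrites an existing entry, which is what protects the hand-tuned bodies during an incremental sync.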
Snapshot updated: spec/tests/hyperscript-upstream-tests.json grows
from 1496 → 1514 tests. All 18 new tests are documented as either
manual bodies (2) or skips (16):
Manual bodies (2):
- on resize from window: dispatches via host-global "window"
- toggle between followed by for-in loop works: direct test
Skips for architectural reasons (16):
- 13× core/tokenizer — upstream exposes a streaming token API
(matchToken, peekToken, consumeUntil, pushFollow…) that our
tokenizer doesn't surface. Implementing it = a token-stream
wrapper primitive over hs-tokenize output.
- 2× ext/component — template-based components via
<script type="text/hyperscript-template">. We use defcomp directly;
no template-bootstrap path.
- 1× toggle does not consume a following for-in loop — parser
ambiguity in 'toggle .foo for <X>'. Parser must distinguish
'for <duration>ms' from 'for <ident> in <expr>'. The 'toggle
between' variant works (different parse path).
Net per-suite status: every individual suite passes 100% of its counted
tests (skips excluded). 1496 runnable of 1514 total; all that run pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Plans + briefings for four new language loops, each with a delcc/JIT
showcase that the runtime already supports natively:
- common-lisp — conditions + restarts on delimited continuations
- apl — rank-polymorphic primitives + 6 operators on the JIT
- ruby — fibers as delcc, blocks/yield as escape continuations
- tcl — uplevel/upvar via first-class env chain, the Dodekalogue
Launcher scripts now spawn 12 windows (was 8).
Previous version ran all 7 claude sessions in the main working tree on
branch 'architecture'. That would race on git operations and cross-
contaminate commits between languages even though their file scopes
don't overlap. Now each session runs in /root/rose-ash-loops/<lang> on
branch loops/<lang>, created from the current architecture HEAD.
sx-loops-down.sh gains --clean to remove the worktrees; loops/<lang>
branches stay unless explicitly deleted.
Also: send a second Enter keystroke after the /loop command, since Claude's
input box sometimes interprets the first newline as a soft break.
sx-loops-up.sh spawns a tmux session 'sx-loops' with 7 windows (lua,
prolog, forth, erlang, haskell, js, hs). Each window runs 'claude'
and then /loop against its briefing at plans/agent-briefings/<x>-loop.md.
Optional arg is the interval (e.g. 15m); omit for model-self-paced.
Each loop does ONE iteration per fire: pick the first unchecked [ ] item,
implement, test, commit, tick, log — then stop. Commits push to
origin/loops/<lang> (safe; not main).
sx-loops-down.sh sends /exit to each window and kills the session.
Attach with: tmux a -t sx-loops
- scripts/loop-guard.sh — atomic claim with 30-min staleness overtake,
appends NDJSON event to .loop-logs/<lang>.ndjson. Exit 0 = go ahead,
exit 1 = another run is live, skip.
- scripts/loop-release.sh — clear lock, log release with exit status.
Intended for 7 per-language /schedule routines firing every 15 minutes.
Lock detects overlap so tight cadences are safe; stale lock (>30 min)
overtaken automatically if an agent dies mid-run.
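The claim-or-skip logic can be sketched in Python as below. This is an assumption-laden illustration, not the shell implementation: `try_claim` is a hypothetical name, staleness is judged by the lock file's mtime, and the overtake path here is simplified and not race-free between two simultaneous overtakers.

```python
import os
import time

STALE_AFTER_S = 30 * 60  # a lock older than 30 minutes may be overtaken


def try_claim(lock_path, now=None):
    """Return True if this run owns the lock, False if another run is live."""
    now = time.time() if now is None else now
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one caller succeeds.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        age = now - os.path.getmtime(lock_path)
        if age <= STALE_AFTER_S:
            return False                   # another run is live, skip
        os.utime(lock_path, (now, now))    # overtake the stale lock
        return True
    os.close(fd)
    os.utime(lock_path, (now, now))        # stamp the claim time
    return True
```

A caller would map True to exit 0 and False to exit 1, and remove the file on release.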
Python + shell tooling used to split grouped index.sx files into
one-directory-per-page layout (see the hyperscript gallery migration).
name-mapping.json records the rename table; strip_names.py is a helper
for extracting component names from .sx sources.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 Step 1 of the architecture roadmap. The old cssx.sx
(cssx-resolve, cssx-process-token, cssx-template, old tw function)
is superseded by the ~tw component system in tw.sx.
- Delete shared/sx/templates/cssx.sx
- Remove cssx.sx from all load lists (sx_server.ml, run_tests.ml,
mcp_tree.ml, compile-modules.js, bundle.sh, sx-build-all.sh)
- Replace (tw "tokens") inline style calls with (~tw :tokens "tokens")
in layouts.sx and not-found.sx
- Remove _css-hash / init-css-tracking / SX-Css header plumbing
(dead code — ~tw/flush + flush-collected-styles handle CSS now)
- Remove sx-css-classes param and meta tag from shell template
- Update stale data-cssx references to data-sx-css in tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
signal-add-sub! called (append! subscribers f), which returns a new list
when subscribers is an immutable List, and discarded the result. Once
signal-remove-sub! had replaced the subscribers list via dict-set!,
re-adding subscribers silently failed. Counter island only worked once
(0→1 then stuck).
Fix: use (dict-set! s "subscribers" (append ...)) to explicitly update
the dict field, matching signal-remove-sub!'s pattern.
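A Python analogy of the bug and the fix, with a tuple standing in for the immutable List. The function names are hypothetical and the signal is modeled as a plain dict; the real code is SX.

```python
def add_sub_buggy(signal, f):
    # Builds a new tuple and throws it away: the dict field never changes.
    signal["subscribers"] + (f,)


def add_sub_fixed(signal, f):
    # Store the new tuple back into the dict, as the remove path already does.
    signal["subscribers"] = signal["subscribers"] + (f,)


def remove_sub(signal, f):
    signal["subscribers"] = tuple(s for s in signal["subscribers"] if s is not f)
```

With the buggy version, a subscriber re-added after `remove_sub` is never notified again, which matches the counter sticking after its first update.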
Build pipeline fixes:
- sx-build-all.sh now bundles spec→dist and recompiles .sxbc bytecode
- compile-modules.js syncs .sx source files alongside .sxbc to wasm/sx/
- Per-file cache busting: wasm, platform JS, and sxbc each get own hash
- bundle.sh adds cssx.sx to dist
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The deployed sx_browser.bc.wasm.js was actually the js_of_ocaml output
(pure JS), not the wasm_of_ocaml loader. Nothing synced the correct
build output from _build/ to shared/static/wasm/.
- sx_build target=ocaml now auto-syncs WASM kernel + JS fallback + assets
- sx-build-all.sh syncs after dune build
- Correct 68KB WASM loader replaces 3.6MB JS imposter
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>