conformance.md F-2: no runner fed spec/tests through the shipped
sx_browser.bc.wasm.js — the F-1/F-3 native/WASM divergences existed
undetected because of exactly this gap.
Add hosts/ocaml/browser/run_wasm_corpus.js: boots the shipped kernel
headless in Node (stub block + module preload mirroring
test_wasm_native.js, the blessed boot path), registers the test-framework
hooks, runs ONE test file per process and emits a parseable CORPUS-RESULT
line — process isolation means a hanging file is killed by the driver's
per-file timeout without ending the sweep.
Add scripts/test-wasm-corpus.sh: sweeps spec/tests, applies a SKIP /
KNOWN_FAIL ledger (green-flip on a KNOWN_FAIL fails the run so the ledger
cannot rot), gates on everything else.
Empirical baseline (2026-07-04): 83 files, 80 fully green, 5192 passes,
zero test failures on the shipped kernel — including test-gate-pins
(29/29). KNOWN_FAIL: test-hash-table/test-r7rs/test-sets hit an opaque
jsoo load-error mid-file (22/87/30 tests pass first). Full sweep ~13 min;
sx-build-all.sh wiring deferred to the D3 gate-definition decision.
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The DOM adapter's only test file asserts membership predicates ("if is a
render form") — zero tests inspect what render-to-dom actually builds
(hosts.md C23), and the file is excluded from the runner as browser-only.
Discovery: the browser-only assumption is false for render output —
(import (web adapter-dom)) disk-resolves in the OCaml runner and
render-to-dom works against its mock DOM. Add
web/tests/test-adapter-dom-render.sx (8 tests) pinning the adapter's real
output contract (probed first): text renders as a nodeType-3 child text
node; when/map wrap output in a FRAGMENT child; if inlines the chosen
branch; attrs/class/id land on the element; voids have no children.
Auto-included in default runs — first render-output coverage of the
1512-line adapter in the standard gate. Remaining depth (boolean attrs,
on-*/bind/ref/key, reactive attrs, hydration cursor) tracked on the
checklist. 254/0 standalone.
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The synchronous harness binds mocks as plain NativeFns, so no harness test
could exercise the real CEK perform/suspend/resume path — the HO+perform
element-drop class (S10) was structurally invisible (hosts.md C21).
Add harness-run-perform to spec/harness.sx: drives make-cek-state/
cek-step-loop, services each (perform {:op X :args L}) suspension from the
session's platform mocks (entry logged before invocation, C22-consistent),
cek-resumes with the mock value, loops to terminal; clear error on an
unmocked op. Shared arity dispatch extracted as harness-invoke-mock.
Pins (gate-C21-perform-mode-harness): single suspension, arithmetic-frame
resume, sequential performs, unmocked-op error, and the S10 probe — map
over a perform-suspending lambda keeps ALL 3 elements through 3
suspensions on the CEK path (localizing the drop class to serving-JIT).
290/0 under OCaml run_tests; harness self-suite green.
Caveat (documented): requires the runner's cek-* driver bindings — absent
on bare sx_server/MCP, the same runner-only-binding theme as section B.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The interceptor appended the IO-log entry only after the mock returned, so
a throwing mock left no entry and error-path tests falsely reported "never
invoked" through assert-io-called/count (hosts.md C22, core.md K104).
spec/harness.sx make-interceptor now appends {:args :result nil :op}
BEFORE invoking the mock and updates :result in place via dict-set! on
return. This is W14-owned test infrastructure (PLAN.md W14 approach item
4), not a semantics edit.
Pins: suite gate-C22-throwing-mock-logged (throwing mock leaves an entry
with pending result; happy path updates the result; mixed throwing +
successful sequence counts all calls). Harness self-suite (15 tests) and
test-relate-picker (the only other harness consumer) verified green;
285/0 on the pins run.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
mcp_tree.ml's parallel primitive table drifted from sx_primitives.ml —
the spec-mandated harness verification path silently produced false
findings ((get {:a 1} :a 99) -> nil vs 1, char-class vs substring split,
etc.). dc7aa709 aligned 8 entries as a stopgap; the real fix (linking
sx_primitives) is hosts-lane.
Add scripts/test-harness-parity.sh: drives mcp_tree.exe sx_eval via raw
JSON-RPC and a fresh sx_server.exe via the epoch protocol, runs the
finding's 12-probe battery through both, fails on any divergence (errors
compared by inner message). 12/12 parity today — the stopgap holds and
can no longer rot silently.
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Section-B audit, all verified live over the epoch protocol. Runner-only
bindings absent from production: values, call-with-values (run_tests.ml
:1131/:1140), contains-char? (rt.ml:728 + rt.js:85), trim-right (JS runner
ONLY — absent even from the OCaml runner), sha3-256 (rt.ml:745 + rt.js:88;
production's real primitive is crypto-sha3-256).
Consequences pinned: (canonical-serialize 42) on a fresh server errors
"Undefined symbol: contains-char?" — content addressing broken for ANY
number outside the runners. And BOTH runners' sha3-256 are FAKE stubs
(OCaml: Hashtbl.hash), so every test-computed CID differs from production.
scripts/test-env-parity.sh is a bidirectional ledger: MUST_HAVE bindings
going missing fail; a KNOWN_DRIFT binding APPEARING also fails with
instructions to move it to MUST_HAVE and flip the consequence pin — the
ledger cannot rot silently in either direction. 7/7 green.
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pre-fix, a routing-failure page was stored in the HTTP response cache as
200 and served byte-identically to every later visitor until restart
(cold 2s -> warm 0.0005s). dc7aa709 made http_render_page return
(html, is_error) and gated cache insertion on `not is_err`.
Extend scripts/test-protocol-gate.sh with an HTTP-mode case: fresh
sx_server.exe --http on a random port (timeout-bounded, own child killed),
GET the same nonexistent path twice, assert both requests re-render (two
[sx-http] render lines) and the "[cache] ... error page, not cached" gate
line appears. Standalone-worktree caveat (all docs pages render as soft
error pages, so no positive cache control) documented in the script.
5/5 protocol-gate green; 267/0 sx gate pins. All seven section-A test-debt
pins now landed (K18, K20, K09/K11/K39, K49, crit-2, C1/C1b, S4).
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pre-fix, one malformed or non-ASCII line on sx_server's top-level command
channel raised an uncaught Parse_error and killed the whole shared process
(bridges + conformance runners). dc7aa709 guards the parse; the server now
answers (error N "Malformed command line: ...") and keeps serving.
Add scripts/test-protocol-gate.sh: per case, spawn a fresh timeout-bounded
sx_server.exe (never touches a shared process) and assert the error
response, the follow-up epoch still evaluating, and a clean exit. Cases:
C1 unterminated list + garbage line, C1b non-ASCII byte (exact review
repros from plans/sx-review/hosts.md), plus a well-formed control. 4/4
green. Structured to grow into W14 section E's protocol fuzz suite (C3-C7).
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
crit-2's failure mode discards every frame outside the signal site —
including the covering test's own assert — which is why the shipped test
"signal returns handler value to call site" passed vacuously pre-fix. A
plain assert pin would inherit that vacuity on regression.
Add suite gate-crit2-signal-return-kont with a side-effect sentinel: test 1
runs the core.md repros ((list "outer" (handler-bind ... (+ 1
(signal-condition 5))) "end") -> ("outer" 43 "end"); raise-continuable ->
143) then set!s a top-level flag; test 2 independently asserts the flag, so
a dropped continuation fails loudly even though test 1 would "pass". Third
test pins the shipped-test expression (51). 267 passed / 0 failed under
OCaml run_tests.
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
K49: area/base/embed/param/track were in VOID_ELEMENTS but missing from
HTML_TAGS — render fell through to "Undefined symbol: base". dc7aa709 fixed
spec/render.sx; add suite gate-K49-void-elements-renderable (3 tests): the
spec registry contains all five, and render-to-html renders each as a
self-closing void. 264 passed / 0 failed under OCaml run_tests.
DISCOVERY (recorded in the briefing's Blocked section): the generated
hosts/ocaml/lib/sx_render.ml was never regenerated after the spec fix — its
stale html_tags_list still lacks the five tags, so the runner's native
render-html path STILL errors. Fix is a bootstrap_render.py regen (hosts
lane, out of scope for this test-only loop). Live evidence for F13
(regen-diff CI gate). Pin covers the spec side only for now.
Also corrects the checklist label: K49 = void elements; the depth/cycle
guard is K16 (OPEN, W8).
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three dc7aa709 fixes shipped without pinning tests:
- K09: R7RS longhand (unquote-splicing X) now splices (was silent zero-splice)
- K11: guard re-raise sentinel gensym'd — a user value shaped like
(list '__guard-reraise__ X) is data, not a forged re-raise
- K39: (do ((fn (x) x) 5) 99) -> 99, not a misparsed Scheme do-loop
Add suites gate-K09-longhand-unquote-splicing, gate-K11-guard-reraise-forgeable,
gate-K39-do-iife-head to spec/tests/test-gate-pins.sx with exact reprs from
plans/sx-review/core.md. 261 passed / 0 failed under OCaml run_tests.
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
contains? did not support dict key membership in the real runtime —
(contains? {:a 1} :a) threw "contains?: 2 args", contradicting its own :doc.
The fix landed (primitives.sx + sx_primitives.ml) but had no pinning test.
Add suite gate-K20-contains-dict to spec/tests/test-gate-pins.sx (4 tests,
repro from plans/sx-review/core.md): present key true, missing key false,
list membership + string substring unchanged. 8/8 green under OCaml run_tests.
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The dc7aa709 quick-wins batch fixed `expt`'s silent 63-bit int wrap (now
promotes to float like +/*) but shipped no pinning test — a regression would
pass silently. Add spec/tests/test-gate-pins.sx suite gate-K18-expt-overflow
(4 tests, minimal reprs from plans/sx-review/core.md): small exponents exact,
2^62 and 2^100 do not wrap, 2^100 is a float. 4/4 green under OCaml run_tests.
Also bootstraps plans/agent-briefings/sx-gate-loop.md (the loop's own briefing,
absent until now) with the W14 checklist derived from PLAN.md §W14.
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Companion to plans/fed-sx-host-types.md. Build sheet for the deferred
lib/host adapter slice (fed_sx_outbox / fed_sx_inbox): projects the
host's existing type-post metamodel (blog.sx: :cid, :schema, subtype-of
graph) onto the fed-sx DefineType/SubtypeOf verbs, ingests peers' types
into peer_types, validates inbound typed objects via
pipeline:apply_object_schema/2, and serves GET /types/<cid>.
Surfaces the two gating decisions for loops/host: the SX-host <->
Erlang-on-SX runtime boundary (recommends an HTTP boundary to dodge the
er-scheduler gen_server:call deadlock) and the type-CID identity choice.
Scope is the inverse of this loop: lib/host/** only, no next/ edits.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pairs with Blockers #4 in plans/fed-sx-milestone-2.md. The
http-listen handler holds the SX runtime mutex; any gen_server:call
from inside a route deadlocks because the gen_server reply
scheduler needs the runtime the caller is sitting on. m2's Step 12
two-instance smoke test gates on this.
Briefing pre-loads the fix-loop agent with:
- Verified reproducer (deterministic curl-hang against
http_server:start(P, [{kernel, nx_kernel}]))
- Two fix-pattern candidates (release mutex around sx_call vs
spawn handler in fresh er-process)
- Acceptance criteria: http_server_tcp.sh 5/5 + a NEW kernel-
aware request passes without hanging
- Scope guardrails: only hosts/ocaml/bin/sx_server.ml +
adjacent lib/sx_runtime.ml; m2's next/** and lib/erlang/** are
OFF LIMITS
Worktree at /root/rose-ash-loops/fed-prims, branch loops/fed-prims
already exists (Phases A-J landed). This is a follow-up fix loop,
not a continuation of the original phase plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tcl conformance.sh walks foreign lib/tcl/tests/programs/*.tcl files, reads each
first line's '# expected: VALUE' annotation, uses python3 to escape the Tcl
source into an SX helper, evaluates via (tcl-eval-string ...), and string-compares
got vs expected in bash. No SX test suites and no SX counter/dict scoreboard, so
the shared driver can't drive it (same category as lua/js/forth). Left
conformance.sh untouched; recorded the exclusion.
This completes the A1 worklist: 4 migrated onto the shared driver (common-lisp,
erlang, feed, go) and 5 excluded as foreign runners (forth, js, ocaml,
smalltalk, tcl).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
smalltalk conformance.sh catalogs foreign lib/smalltalk/tests/programs/*.st
programs, runs 'bash lib/smalltalk/test.sh -v', and scrapes its output (the
'OK 403/403' summary plus per-file pass counts via awk). It loads no SX test
suites directly and emits no SX counter/dict scoreboard. This is the briefing's
own classification example ('smalltalk runs *.st via test.sh') and the same
'scrapes a test.sh' exclusion as ocaml/lua. Left conformance.sh untouched;
recorded the exclusion.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ocaml conformance.sh runs 'bash lib/ocaml/test.sh -v', scrapes its
human-readable ok/FAIL lines, and re-classifies each test into suites via bash
description-matching heuristics; it also scrapes lib/ocaml/baseline/run.sh
(foreign .ml programs). The underlying test.sh is a per-assertion epoch runner
(hundreds of individual (ocaml-test-...) evals, one epoch each) with no
suite-level counter variables or dict runners, so the driver's
counter/dict-scoreboard model has nothing to point at without rewriting the test
harness. 'Scrapes a test.sh' is the briefing's named exclusion criterion (test.sh
even notes it mirrors lib/lua/test.sh, the canonical excluded case). Left
conformance.sh untouched; recorded the exclusion.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
js conformance.sh walks lib/js/test262-slice/**/*.js (foreign test262
fixtures), escapes each with python3, evals via (js-eval), and compares output
to a sibling .expected file by substring match — counting pass/fail in bash
against a >=50% target. It loads no SX test suites and emits no SX counter/dict
scoreboard (no scoreboard.json). The shared driver only epoch-loads SX preloads
and evals SX test suites emitting a scoreboard — it cannot drive a
foreign-fixture-vs-expected comparison harness (same category as
lua/forth/smalltalk). Left conformance.sh untouched; recorded the exclusion.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Go has the same structure as erlang: suites load into one session and each
exposes a pass counter plus a *count* (total) counter rather than a fail
counter. MODE=dict fits — each suite's runner is a dict literal
{:passed P :failed (- count P) :total count}. No driver change; conformance.conf
+ 3-line shim, historical scoreboard schema preserved.
Parity verified 609/609 (0 fail), every suite matching baseline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
forth's conformance.sh reads a foreign Forth test corpus (Hayes Core core.fr),
preprocesses it with awk + an external python3 chunk-splitter that generates a
chunks.sx of raw source strings, then runs them through the interpreter via
(hayes-run-all). The shared driver only epoch-loads SX preloads and evals SX
test suites emitting a counter/dict scoreboard — it cannot reproduce the
external preprocessing pipeline over a foreign .fr corpus (same category as
lua/smalltalk). No SX tests/*.sx suites exist to migrate. Left conformance.sh
untouched; recorded the exclusion.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Feed is the canonical MODE=counters shape: each suite runs in a fresh session
with shared preloads and a single feed-test-pass/feed-test-fail pair. Lifted the
old script's inline epoch-2 counter + feed-test helper defs into
lib/feed/test-harness.sx (preloaded last) so the driver can load them before
each suite. conformance.conf + 3-line shim; historical scoreboard schema
preserved. No driver change needed.
Parity verified 189/189 (0 fail), every suite matching baseline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Erlang's suites load into one session and each exposes a pass counter plus a
*count* (total) counter rather than a fail counter, so MODE=dict fits directly:
each suite's runner is a dict literal {:passed P :failed (- count P) :total count}.
No driver change needed (dict mode already supports arbitrary runner expressions).
conformance.conf + 3-line shim; historical scoreboard schema preserved.
Parity verified 761/761 (0 fail), every suite matching baseline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extend the shared driver's MODE=counters with a backward-compatible SUITES
format: name:file[:pass-var:fail-var[:extra-preload ...]]. Optional per-suite
counter symbols (override the global COUNTERS_PASS/COUNTERS_FAIL) and per-suite
preload chains (loaded after the global PRELOADS). Plain name:file entries are
unchanged — verified against haskell (fib/sieve/quicksort 2/2/5, matches
committed scoreboard).
common-lisp has 8 distinct per-suite counter pairs and a different preload
chain per suite, so it could not fit the single-counter/fixed-preload model;
the extended format expresses it directly. conformance.conf keeps the historical
scoreboard schema; conformance.sh becomes the 3-line shim.
Result 487/487 (0 fail) vs the old 305/0 baseline — higher and explained: the
old per-suite 'timeout 30' was too tight for the slow eval suite (~15-25s under
contention), silently recording it as 0; the driver's 180s budget recovers its
true 182. geometry/mop-trace stay 0/0 (pre-existing refl-class-chain-depth-with
load error; counter vars defined as 0 -> clean gc-result, no fail-fallback).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Classified migratable-in-kind (SX suites over epoch, not a foreign runner)
but blocked on driver feature gaps: 8 distinct per-suite counter variable
name pairs and per-suite preload chains, neither supported by MODE=counters
(single global counter + fixed preloads) nor MODE=dict (load-time counter
collisions across suites). Baseline 305/0 across 12 suites. Did not migrate;
conformance.sh left untouched. Driver unchanged (out of per-iteration scope).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Briefing for the loop that builds the host-side servicer for persist/* IO ops,
making lib/persist's durable backend actually durable. Points at the Blocker
spec in plans/persist-on-sx.md as the authoritative contract; hard rules on
build isolation (worktree _build only, never clobber the shared binary) and not
pkilling the shared sx_server.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make explicit that the loop may lean on Prolog backtracking (pl-query-all) and cut,
preferring clause-order precedence via pl-query-one. Default to sx_write_file over
path/pattern edits; flag that sx_insert_near drops all but the first form. Document
the loaded-env primitive restriction (includes?/chars/etc. undefined after prolog
preloads; use the tokenizer's surviving set) and that negation is the not(Goal)
functor, not the prefix \+ operator.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Plans for acl-on-sx (Datalog), flow-on-sx (Scheme), feed-on-sx (APL),
mod-on-sx (Prolog), search-on-sx (Haskell). Each is a 4-phase queue
sitting on its respective guest language, targeting rose-ash needs:
access control, durable workflows, activity feeds, moderation, search.
Federation extension in Phase 4 of each (plugs into fed-sx).
Briefings for the three loops we're kicking off now: acl-loop,
flow-loop, feed-loop. mod-sx and search-sx briefings will follow
once the first three have surfaced any shared infrastructure
worth extracting to lib/guest/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Acceptance bar hit (40 runtime, 497 total). Tests: timer ready,
select-with-timeout, fan-in (3 producers), worker queue, pipeline,
fan-out-then-fan-in, select source-order, fallback case, default,
producer-consumer, two-stage pipeline, channel-counter, after+default,
tick-collector.
Shape chiselled: timer collapses "after duration" into
"channel ready immediately" — select needs only ready? from each
case. Real time is when the flip happens, not what the protocol is.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure-OCaml crypto/CBOR/CID/Ed25519/RSA + native HTTP server in
hosts/ocaml/, the host-primitive surface Erlang Phase 8 BIFs and
fed-sx Milestone 1 are blocked on. WASM-safe lib boundary enforced.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
plans/sx-vm-opcode-extension.md ports over from loops/erlang (f6a68656)
with the opcode partition adjusted to match real VM usage: 1-199 core
(current ceiling 175 = OP_DEC), 200-247 extensions, 248-255 reserved.
plans/agent-briefings/sx-vm-extensions-loop.md captures the per-fire
workflow and ground rules.