The interceptor appended the IO-log entry only after the mock returned, so
a throwing mock left no entry and error-path tests falsely reported "never
invoked" through assert-io-called/count (hosts.md C22, core.md K104).
spec/harness.sx make-interceptor now appends {:args :result nil :op}
BEFORE invoking the mock and updates :result in place via dict-set! on
return. This is W14-owned test infrastructure (PLAN.md W14 approach item
4), not a semantics edit.
Pins: suite gate-C22-throwing-mock-logged (throwing mock leaves an entry
with pending result; happy path updates the result; mixed throwing +
successful sequence counts all calls). Harness self-suite (15 tests) and
test-relate-picker (the only other harness consumer) verified green;
285/0 on the pins run.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
W14 — Test gate & conformance infrastructure loop
Forge agent ws-W14. Role: build out W14 from the SX review remediation plan
(plans/sx-review/PLAN.md, §"W14. Test gate & conformance infrastructure") —
the enabler that makes every other fix verifiable. One checklist item per fire.
You are on branch loops/sx-ws-w14, worktree /root/rose-ash-loops/sx-ws-w14.
Hard guardrails (read every fire)
- TEST-ONLY. No semantics edits. Do NOT touch `spec/evaluator.sx`, `spec/primitives.sx`, `spec/parser.sx`, `spec/render.sx`, the OCaml kernel, or any host runtime. W14 pins behavior with tests and productionizes the test/runner surface; the actual fixes are other workstreams (W1–W12). A pin that fails means the finding regressed: do NOT relax the assertion, record it as a blocker.
- NO PUSH. Commit locally on `loops/sx-ws-w14` only. Never push; never touch `main` or `architecture`.
- `.sx` files: use `sx-tree` MCP tools only (a hook blocks Read/Write/Edit on `.sx`). `sx_write_file` takes params `file` and `source` (NOT `content`; a wrong key yields a `yojson … got null` error and no write). `.md`/`.sh`/`.ml` files: normal tools are fine.
- Never `pkill`/`kill` `sx_server`: sibling loops share the binary. Bound every run with `timeout` (e.g. `timeout 300 …`); if it hangs, let the timeout end it.
- One item per fire, then stop. No batching.
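The timeout guardrail above can be illustrated with a small Python driver. This is a minimal sketch under assumptions: `run_bounded` is a hypothetical helper name, and the child commands are stand-in Python one-liners rather than the real `sx_server` or `run_tests.exe`. It only ever terminates the process it spawned itself, which is the point of the "never `pkill`" rule.

```python
import subprocess
import sys

def run_bounded(cmd, seconds):
    """Run cmd with a hard time limit; return (returncode, stdout).

    On timeout, subprocess.run kills only the child it started, so no
    shared process is touched. A timed-out run is reported as (None, "").
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return None, ""

# Stand-in commands (hypothetical): one finishes quickly, one hangs.
quick = run_bounded([sys.executable, "-c", "print('ok')"], 10)
hung = run_bounded([sys.executable, "-c", "import time; time.sleep(30)"], 1)
print(quick, hung)
```

A hung run is left for the timeout to end, and the caller sees a distinct "no exit code" result instead of blocking the fire.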
Per-iteration procedure
- Pick the first unchecked `[ ]` in the checklist.
- Implement (test file or runner/harness change), lifting minimal repros from the review lane files (`plans/sx-review/{core,hosts,conformance}.md`); they are a ready-made corpus of confirmed repros.
- Build + run the affected tests: `sx_build` (target ocaml) then `timeout 300 ./hosts/ocaml/_build/default/bin/run_tests.exe <test-name>` to run a single file. New `spec/tests/test-*.sx` files are auto-discovered.
- Confirm green (a pin must PASS on current HEAD, since the fix already landed).
- Commit locally: `git add -A && git commit` with a `W14:` prefix.
- Tick the box, prepend one dated line to the Progress log, stop.
Checklist
A. Test-debt pins — dc7aa709's landed fixes shipped without regression tests
Pin each confirmed-and-fixed finding with a minimal repro. Add suites to
spec/tests/test-gate-pins.sx (one defsuite per finding).
- K18 [W7]: `expt` overflow now float-promotes (no 63-bit wrap)
- K20 [W7]: `contains?` now supports dict key membership
- K09/K11/K39 [W5]: longhand `unquote-splicing`, guard sentinel gensym, `do` IIFE-head
- K49 [W8]: five void elements (area/base/embed/param/track) renderable (spec side; native regen drift → see Blocked). NB: the depth/cycle guard is K16 [W8], still OPEN; not a W14 pin target until its fix lands
- crit-2 [W1]: signal-return kont pinned NON-VACUOUSLY (side-effect sentinel across two tests; a plain assert would inherit the vacuity)
- C1/C1b [W3]: command-channel crash guards pinned (`scripts/test-protocol-gate.sh`, seed for section E's fuzz suite)
- S4 [hosts]: soft error pages not cached (HTTP-mode pin in `scripts/test-protocol-gate.sh`; NB S4 lives in hosts.md, not conformance; "housekeeping" was a mislabel from F-15's tag)
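The non-vacuous pinning idea behind the crit-2 item can be modeled in Python. This is a sketch under stated assumptions, not the SX suite itself: the `Escape` exception stands in for the original bug (control leaves the test and skips its assert), and `run_test`, `make_suite`, `healthy`, and `regressed` are hypothetical names. The second test checks a flag that only a fully completed first test can set.

```python
class Escape(Exception):
    """Models the bug: control jumps out, discarding every outer frame."""

def run_test(test):
    """Toy runner: a test that escapes is (wrongly) counted as a pass."""
    try:
        test()
        return "pass"
    except AssertionError:
        return "fail"
    except Escape:
        return "pass"  # the skipped assert is never seen: a vacuous pass

def make_suite(signal_return):
    state = {"reached_end": False}

    def test_repro():
        assert signal_return() == 43
        state["reached_end"] = True  # side-effect sentinel, set only at the end

    def test_sentinel():
        assert state["reached_end"]  # fails loudly if test_repro was cut short

    return [test_repro, test_sentinel]

def healthy():
    return 43

def regressed():
    raise Escape()

healthy_results = [run_test(t) for t in make_suite(healthy)]
regressed_results = [run_test(t) for t in make_suite(regressed)]
print(healthy_results, regressed_results)
```

With the regression in place, the first test still "passes" while the sentinel test fails, which is what makes the pin meaningful.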
B. Runner/production env unification
- Audit runner-only bindings: inventory + bidirectional ledger in `scripts/test-env-parity.sh` (KNOWN_DRIFT: values, call-with-values, contains-char?, trim-right, sha3-256; consequence pin: canonical-serialize broken on server; BOTH runners' sha3-256 are FAKE stubs → test CIDs ≠ production CIDs)
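The bidirectional-ledger idea can be sketched in Python as follows. This is illustrative only: `check_parity` is a hypothetical function, the MUST_HAVE set shown is an assumed example, and the real `scripts/test-env-parity.sh` probes a live server rather than a hard-coded set. The KNOWN_DRIFT names are the ones listed in the item above.

```python
def check_parity(bound, must_have, known_drift):
    """Return ledger failures for a set of bound symbol names.

    Direction 1: a MUST_HAVE name that is missing is a regression.
    Direction 2: a KNOWN_DRIFT name that has appeared means a fix landed,
    so the ledger (and its consequence pin) must be updated.
    """
    failures = []
    for name in sorted(must_have - bound):
        failures.append(f"MUST_HAVE missing: {name}")
    for name in sorted(known_drift & bound):
        failures.append(f"KNOWN_DRIFT now bound, update ledger: {name}")
    return failures

KNOWN_DRIFT = {"values", "call-with-values", "contains-char?", "trim-right", "sha3-256"}
MUST_HAVE = {"canonical-serialize", "crypto-sha3-256"}  # assumed example set

server_today = {"canonical-serialize", "crypto-sha3-256"}
after_a_fix = server_today | {"trim-right"}

print(check_parity(server_today, MUST_HAVE, KNOWN_DRIFT))
print(check_parity(after_a_fix, MUST_HAVE, KNOWN_DRIFT))
```

Failing in both directions keeps the ledger from silently going stale once another workstream closes a drift entry.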
C. Harness honesty
- K19: harness/runtime parity pinned (`scripts/test-harness-parity.sh`: drives mcp_tree sx_eval over JSON-RPC vs fresh sx_server over epoch, 12-probe battery from the finding, errors compared by message)
- C22/K104: FIXED harness (spec/harness.sx make-interceptor: log entry appended before the mock runs, :result updated via dict-set!) + 3 pins
- C21: real perform/suspend mode in harness
- C23: adapter-dom render-output tests
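The C22/K104 ordering fix (log first, then invoke) can be modeled in Python. This is a sketch of the pattern, not the SX code in `spec/harness.sx`: `make_interceptor`, `count_calls`, and `flaky_fetch` are hypothetical Python names, and a plain dict stands in for the `{:args :result :op}` entry.

```python
def make_interceptor(op, mock, io_log):
    """Wrap mock so every call is logged BEFORE the mock runs."""
    def interceptor(*args):
        entry = {"op": op, "args": args, "result": None}  # result pending
        io_log.append(entry)            # recorded even if the mock throws
        entry["result"] = mock(*args)   # updated in place on normal return
        return entry["result"]
    return interceptor

def count_calls(io_log, op):
    return sum(1 for entry in io_log if entry["op"] == op)

def flaky_fetch(url):
    if url == "bad":
        raise RuntimeError("boom")
    return "ok:" + url

io_log = []
fetch = make_interceptor("fetch", flaky_fetch, io_log)

fetch("a")
try:
    fetch("bad")          # throwing mock: entry stays with a pending result
except RuntimeError:
    pass
fetch("b")

print(count_calls(io_log, "fetch"))
print([entry["result"] for entry in io_log])
```

Appending after the mock returned would have left only two entries here, which is the "never invoked" misreport the fix removes.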
D. WASM corpus runner
- F2: promote conformance's `run_wasm.js` prototype into CI
E. Epoch-loop protocol fuzz + skip-list
- C3/C4/C5/C6/C7 — epoch protocol fuzz suite
- F10 — hs-upstream skip-list so browser-only FAILs mean something
- C9 — empty suite label
F. Differential battery
- F8 — cross-host differential battery (same source, all hosts agree)
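A differential battery of this kind (same source, every host must agree) might be structured as in the Python sketch below. The three "hosts" are toy stand-in functions and all names are hypothetical; a real battery would drive the actual host runtimes. Errors are reduced to their message text so that identical failures raised through different wrappers still compare equal.

```python
def outcome(evaluate, probe):
    """Normalize one evaluation to a comparable (kind, text) pair."""
    try:
        return ("ok", repr(evaluate(probe)))
    except Exception as exc:
        return ("error", str(exc))  # compare by message, not exception type

def divergences(hosts, probes):
    """Return (probe, per-host outcomes) for every probe where hosts disagree."""
    diffs = []
    for probe in probes:
        results = {name: outcome(fn, probe) for name, fn in hosts.items()}
        if len(set(results.values())) > 1:
            diffs.append((probe, results))
    return diffs

def host_a(src):
    if not src.isdigit():
        raise ValueError(f"bad number: {src}")
    return int(src)

def host_b(src):
    if not src.isdigit():
        raise RuntimeError(f"bad number: {src}")  # different wrapper, same message
    return int(src)

def host_c(src):
    return len(src)  # deliberately divergent toy host

agree = divergences({"a": host_a, "b": host_b}, ["1", "x"])
differ = divergences({"a": host_a, "c": host_c}, ["1", "22"])
print(agree, differ)
```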
Progress log (newest first)
- 2026-07-04: C22/K104 throwing-mock fix + pins (item C.2). First actual FIX of the loop, in scope because spec/harness.sx is W14-owned test infrastructure (PLAN approach item 4 assigns "log IO before invoking the mock" to W14). TDD: reproduced pre-fix (caught error, 0 log entries), then restructured `make-interceptor` to append the entry BEFORE the mock runs (`:result nil` while pending, `dict-set!` in place on return). Verified: throwing mock leaves entry, happy path updates result, mixed sequence counts all 3. Added suite `gate-C22-throwing-mock-logged` (3 tests). Harness self-suite (15) + test-relate-picker (only other harness consumer) green; 285/0 pins run. Tooling notes: replace/insert tools take `new_source` (not `replacement`); find_all paths still disagree with read_subtree/replace_node on define-library files, so sx_write_file remains the reliable route. Test-infra-only.
- 2026-07-04: K19 harness-parity pin (item C.1). Authored
`scripts/test-harness-parity.sh`: drives `mcp_tree.exe` `sx_eval` with raw JSON-RPC over stdio and a fresh `sx_server.exe` over the epoch protocol, running the finding's exact 12-probe battery (empty?/get/split/equal?/contains?/keyword-name/char-code/parse-number) through both and failing on ANY divergence. Errors normalized to their inner message so identical failures compare equal (`keyword-name :kw` errors the same way on both, since keywords evaluate to strings before the call). Result: 12/12 parity. dc7aa709's 8-entry stopgap alignment holds; this pin keeps it honest until the real fix (mcp_tree links sx_primitives) lands in the hosts lane. Test-only.
- 2026-07-04: Section B env-parity audit + ledger. Probed a fresh
`sx_server` over the epoch protocol (`deps-check` + live eval). Confirmed runner-only drift: `values`/`call-with-values` (run_tests.ml:1131/1140), `contains-char?` (rt.ml:728 + rt.js:85), `trim-right` (JS runner only, absent even from the OCaml runner), `sha3-256` (rt.ml:745 + rt.js:88). Consequence verified live: `(canonical-serialize 42)` on the server → `Undefined symbol: contains-char?` (content addressing broken for ANY number outside the runners). Worse than the finding: BOTH runners' `sha3-256` are FAKE stubs (OCaml uses `Hashtbl.hash`!) while production has real `crypto-sha3-256`, so every CID computed in tests differs from production CIDs. Authored `scripts/test-env-parity.sh` as a bidirectional ledger: MUST_HAVE regressions fail; a KNOWN_DRIFT binding appearing also fails (forces ledger + consequence-pin update when W5/W7/W12 land fixes). 7/7 green. Test-only.
- 2026-07-04: S4 error-page-cache pin (item A.7); section A COMPLETE.
Extended `scripts/test-protocol-gate.sh` with an HTTP-mode case: fresh `sx_server.exe --http <random-port>` (timeout-bounded, own PID killed at end), GET the same nonexistent path twice, assert BOTH requests re-render (2 `[sx-http]` lines; pre-fix the 2nd was cache-served at 0.0005s) and the `[cache] … error page, not cached` is_err gate line appears. Findings from prototyping: standalone worktree renders ALL docs pages as soft error pages (no content), so a positive "real page IS cached" control is not assertable here (documented in the script); startup takes ~12-15s (poll loop, 40s budget). 5/5 protocol-gate green + 267/0 sx pins. Test-only.
- 2026-07-04: C1/C1b command-channel pins (item A.6). These are
protocol-level, not .sx-suite pins: authored `scripts/test-protocol-gate.sh`. Each case spawns its OWN timeout-bounded `sx_server.exe` (no shared process touched) and asserts three things: an `(error N "Malformed command line: ...")` response is emitted, the follow-up epoch still evaluates (process survived), and no `Fatal error` escapes / exit is clean. Cases: C1 unterminated list (exact review repro), C1 plain-garbage line, C1b non-ASCII byte (café), plus a well-formed control session. 4/4 green. The script is deliberately structured to grow into section E's fuzz suite (C3–C7). Test-only.
- 2026-07-04: crit-2 non-vacuous pin (item A.5). The original bug's
signature (handler value becomes the WHOLE program result, discarding every outer frame including the covering test's own assert) means a plain `(assert= repro expected)` pin would pass vacuously on regression. Added suite `gate-crit2-signal-return-kont` with a side-effect sentinel: test 1 runs both repros (`("outer" 43 "end")` list shape + `raise-continuable` → 143) then `set!`s a top-level flag; test 2 independently asserts the flag. If the continuation is ever dropped again, test 1 "passes" but test 2 fails loudly. Third test pins the exact shipped-test expr (51). Verified both repro shapes live via sx_eval first. 267 passed / 0 failed. Test-only.
- 2026-07-03: K49 void-elements pin (item A.4) + regen-drift DISCOVERY.
Corrected the checklist label first: K49 is "five void elements unrenderable" (core.md:335), not the depth guard (that's K16, OPEN). Added suite `gate-K49-void-elements-renderable` (3 tests): spec `HTML_TAGS` contains all five; `(render-to-html '(base :href "x") (make-env))` → `<base href="x" />`; all five render self-closing. Runner-env gotchas: `current-env`/`symbol` are not bound in run_tests, so use `(make-env)` and literal quoted forms. Discovery: the first draft pinned via the runner's native `render-html` and FAILED. `hosts/ocaml/lib/sx_render.ml` (generated) was never regenerated after dc7aa709's spec fix, so the native render path still errors on the five tags. Recorded under Blocked; live evidence for F13 (regen-diff gate). 264 passed / 0 failed. Test-only.
- 2026-07-03: K09/K11/K39 W5 special-form pins (item A.3). Three suites
added to `spec/tests/test-gate-pins.sx`: `gate-K09-longhand-unquote-splicing` (R7RS longhand `(unquote-splicing X)` now splices, incl. empty-list case; shorthand still works), `gate-K11-guard-reraise-forgeable` (a body/clause value shaped like `(list '__guard-reraise__ X)` is returned as data, not misread as a re-raise; sentinel is now gensym'd), `gate-K39-do-iife-head` (`(do ((fn (x) x) 5) 99)` → 99, not a misparsed do-loop; exact core.md repro). Gotchas hit and fixed: quasiquoted bare idents are symbols not strings, and `assert=` compares with `=` (not `equal?`, which returns false on these spliced lists). 261 passed / 0 failed under OCaml run_tests. Test-only.
- 2026-07-03: K20 contains?-dict pin (item A.2). Mapped K-codes by
core.md severity order (K17 append!, K18 expt, K19 harness-drift, K20 contains?-dict). Added suite `gate-K20-contains-dict` to `spec/tests/test-gate-pins.sx` (4 tests): present dict key → true, missing key → false, list membership unchanged, string substring unchanged. Repro from core.md ("(contains? {:a 1} :a) threw `contains?: 2 args`"). 8/8 green across both suites under OCaml run_tests. Test-only.
- 2026-07-03: K18 expt-overflow pin (item A.1). Bootstrapped this briefing
from PLAN.md §W14 (the referenced file did not exist yet). Added `spec/tests/test-gate-pins.sx` with suite `gate-K18-expt-overflow` (4 tests): small exponents stay exact (`2^0=1`, `2^10=1024`), `2^62 > 0` (no negative 63-bit wrap), `2^100 > 0` (no wrap-to-zero), `2^100` is a number (float promotion). Verified 4/4 green under the OCaml run_tests kernel. Test-only.
Blocked
- K49 native path: sx_render.ml regen drift (found 2026-07-03 while pinning A.4): `dc7aa709` fixed HTML_TAGS in `spec/render.sx` but never re-ran `hosts/ocaml/bootstrap_render.py`, so the generated `hosts/ocaml/lib/sx_render.ml` still carries a stale `html_tags_list` without area/base/embed/param/track. The runner's native `render-html` convenience (and any native fast-path render) therefore STILL throws `Undefined symbol: base`; dc7aa709's "verified on the native binary" claim did not cover this path. Fix = regen (hosts lane, semantics-adjacent, out of scope for this test-only loop). This is a live instance of F13 (regen-diff CI gate, section-B/D territory): a regen-diff check would have caught it at commit time. The K49 pin covers the spec side only; when the regen lands, extend the suite with `render-html`-path assertions.
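A regen-diff check of the kind F13 describes could look roughly like the Python sketch below. It is a toy model under assumptions: `generate` is a hypothetical stand-in for the real bootstrap step (it is not `bootstrap_render.py`), and the tag sets are abbreviated. The idea is only that a gate regenerates the artifact from the current spec and fails when the committed copy differs.

```python
import difflib

def generate(tags):
    """Toy generator: renders a sorted tag list as generated source text."""
    body = "".join(f'  "{tag}";\n' for tag in sorted(tags))
    return "let html_tags_list = [\n" + body + "]\n"

def regen_drift(committed, regenerated):
    """Return unified-diff lines; an empty list means the artifact is fresh."""
    return list(difflib.unified_diff(
        committed.splitlines(), regenerated.splitlines(),
        fromfile="committed", tofile="regenerated", lineterm=""))

spec_before = {"a", "br", "img"}  # abbreviated, illustrative tag set
spec_after = spec_before | {"area", "base", "embed", "param", "track"}

committed = generate(spec_before)  # artifact produced before the spec fix
drift = regen_drift(committed, generate(spec_after))
print("\n".join(drift))
stale = bool(drift)  # a CI gate would reject the commit when this is True
```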