rose-ash/plans/agent-briefings/sx-gate-loop.md
giles cbdde5fe63 W14: pin crit-2 signal-return kont non-vacuously (test-only)
crit-2's failure mode discards every frame outside the signal site —
including the covering test's own assert — which is why the shipped test
"signal returns handler value to call site" passed vacuously pre-fix. A
plain assert pin would inherit that vacuity on regression.

Add suite gate-crit2-signal-return-kont with a side-effect sentinel: test 1
runs the core.md repros ((list "outer" (handler-bind ... (+ 1
(signal-condition 5))) "end") -> ("outer" 43 "end"); raise-continuable ->
143) then set!s a top-level flag; test 2 independently asserts the flag, so
a dropped continuation fails loudly even though test 1 would "pass". Third
test pins the shipped-test expression (51). 267 passed / 0 failed under
OCaml run_tests.

Test-only: no semantics edits, no push.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 00:06:46 +00:00


W14 — Test gate & conformance infrastructure loop

Forge agent ws-W14. Role: build out W14 from the SX review remediation plan (plans/sx-review/PLAN.md, §"W14. Test gate & conformance infrastructure") — the enabler that makes every other fix verifiable. One checklist item per fire.

You are on branch loops/sx-ws-w14, worktree /root/rose-ash-loops/sx-ws-w14.

Hard guardrails (read every fire)

  • TEST-ONLY. No semantics edits. Do NOT touch spec/evaluator.sx, spec/primitives.sx, spec/parser.sx, spec/render.sx, the OCaml kernel, or any host runtime. W14 pins behavior with tests and productionizes the test/runner surface; the actual fixes are other workstreams (W1–W12). A pin that fails means the finding regressed: do NOT relax the assertion; record it as a blocker.
  • NO PUSH. Commit locally on loops/sx-ws-w14 only. Never push; never touch main or architecture.
  • .sx files: use sx-tree MCP tools only (a hook blocks Read/Write/Edit on .sx). sx_write_file takes params file and source (NOT content — a wrong key yields a yojson … got null error and no write). .md/.sh/.ml files: normal tools are fine.
  • Never pkill/kill sx_server — sibling loops share the binary. Bound every run with timeout (e.g. timeout 300 …); if it hangs, let the timeout end it.
  • One item per fire, then stop. No batching.

Per-iteration procedure

  1. Pick the first unchecked [ ] in the checklist.
  2. Implement (test file or runner/harness change), lifting minimal repros from the review lane files (plans/sx-review/{core,hosts,conformance}.md), which are a ready-made corpus of confirmed repros.
  3. Build + run the affected tests: sx_build (target ocaml) then timeout 300 ./hosts/ocaml/_build/default/bin/run_tests.exe <test-name> to run a single file. New spec/tests/test-*.sx files are auto-discovered.
  4. Confirm green (a pin must PASS on current HEAD — the fix already landed).
  5. Commit locally: git add -A && git commit with a W14: prefix.
  6. Tick the box, prepend one dated line to the Progress log, stop.
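
Step 3's bounded run can also be driven from a script rather than typed at a shell. The following is a minimal Python sketch of that idea, not part of the repo: `run_bounded` is a hypothetical helper, and it relies only on the standard `subprocess.run` timeout, which terminates the child it started and nothing else.

```python
import subprocess
import sys


def run_bounded(cmd, seconds):
    """Run cmd; return its exit code, or None if the time limit expired.

    Hypothetical helper for illustration. On timeout, subprocess.run kills
    and reaps only the child it spawned, so shared processes are untouched.
    """
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # A quick command finishes inside the bound and reports its exit code.
    print(run_bounded([sys.executable, "-c", "pass"], 30))
    # A hanging command is ended by the bound instead of by hand.
    print(run_bounded([sys.executable, "-c", "import time; time.sleep(30)"], 0.5))
```

A `None` result corresponds to the "let the timeout end it" case in the guardrails; the caller records it and moves on rather than killing anything manually.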

Checklist

A. Test-debt pins — dc7aa709's landed fixes shipped without regression tests

Pin each confirmed-and-fixed finding with a minimal repro. Add suites to spec/tests/test-gate-pins.sx (one defsuite per finding).

  • K18 [W7] — expt overflow now float-promotes (no 63-bit wrap)
  • K20 [W7] — contains? now supports dict key membership
  • K09/K11/K39 [W5] — longhand unquote-splicing, guard sentinel gensym, do IIFE-head
  • K49 [W8] — five void elements (area/base/embed/param/track) renderable (spec side; native regen drift → see Blocked). NB: the depth/cycle guard is K16 [W8], still OPEN — not a W14 pin target until its fix lands
  • crit-2 [W1] — signal-return kont pinned NON-VACUOUSLY (side-effect sentinel across two tests; a plain assert would inherit the vacuity)
  • C1/C1b [W3] — HTTP-mode concurrency fixes, pin
  • S4 [conformance] — housekeeping repro, pin
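
Why the crit-2 pin needs a sentinel can be modelled outside SX. The sketch below is a Python model, not SX and not the shipped suite: `signal_buggy`, `run_test`, and the handler returning 42 are hypothetical stand-ins, and the dropped continuation is imitated with an exception that unwinds past the assert.

```python
class DroppedContinuation(Exception):
    """Models the crit-2 failure: the handler value escapes as the whole result."""


def signal_fixed(handler, payload):
    # Correct behaviour: the handler's value returns to the call site.
    return handler(payload)


def signal_buggy(handler, payload):
    # Regressed behaviour: every frame outside the signal site is discarded.
    raise DroppedContinuation(handler(payload))


def run_test(test):
    """Hypothetical runner: a test fails only when an assertion fails."""
    try:
        test()
    except AssertionError:
        return "fail"
    except DroppedContinuation:
        # The assert was unwound with everything else, so nothing failed.
        return "pass"
    return "pass"


def make_suite(signal):
    state = {"reached_end": False}

    def handler(_payload):
        return 42  # assumed handler value, chosen so that 1 + 42 == 43

    def plain_pin():
        assert ["outer", 1 + signal(handler, 5), "end"] == ["outer", 43, "end"]

    def sentinel_setter():
        assert ["outer", 1 + signal(handler, 5), "end"] == ["outer", 43, "end"]
        state["reached_end"] = True  # only reached if the continuation survived

    def sentinel_checker():
        assert state["reached_end"]

    return [plain_pin, sentinel_setter, sentinel_checker]


def run_suite(signal):
    return [run_test(test) for test in make_suite(signal)]


if __name__ == "__main__":
    print(run_suite(signal_fixed))  # ['pass', 'pass', 'pass']
    print(run_suite(signal_buggy))  # ['pass', 'pass', 'fail']
```

Under the buggy signal the plain pin and the setter both "pass" because their asserts never run; only the independent checker notices that the flag was never set.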

B. Runner/production env unification

  • Audit runner-only bindings (values/call-with-values F7/K42; JS fake sha3/equal?/apply/env-set! shims JS5): inventory, plus a failing pin showing that a fresh sx_server reproduces the drift

C. Harness honesty

  • K19 — MCP mcp_tree.ml harness primitive table drift vs sx_primitives (parity test)
  • C22/K104 — harness logs IO before invoking the mock (throwing-mock pin)
  • C21 — real perform/suspend mode in harness
  • C23 — adapter-dom render-output tests
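
The K19 parity test reduces to comparing two sets of primitive names. A minimal Python sketch of that comparison follows; `table_drift` and the sample name lists are illustrative assumptions, and how the names are actually extracted from mcp_tree.ml and sx_primitives is left open.

```python
def table_drift(harness_names, kernel_names):
    """Report primitive names present on one side only (hypothetical helper)."""
    harness, kernel = set(harness_names), set(kernel_names)
    return {
        "missing_in_harness": sorted(kernel - harness),
        "extra_in_harness": sorted(harness - kernel),
    }


if __name__ == "__main__":
    # Sample inputs are made up for the demonstration.
    drift = table_drift(["expt", "append!"], ["expt", "append!", "contains?"])
    print(drift)
```

A parity pin would assert that both lists are empty, so any primitive added to one table without the other fails the gate.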

D. WASM corpus runner

  • F2 — promote conformance's run_wasm.js prototype into CI

E. Epoch-loop protocol fuzz + skip-list

  • C3/C4/C5/C6/C7 — epoch protocol fuzz suite
  • F10 — hs-upstream skip-list so browser-only FAILs mean something
  • C9 — empty suite label

F. Differential battery

  • F8 — cross-host differential battery (same source, all hosts agree)
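
One possible shape for the F8 battery, sketched in Python under stated assumptions: each host is represented by a hypothetical callable that takes source text and returns a value, and results are compared by `repr`. How real hosts would be invoked (subprocess, server, WASM) is not specified here.

```python
def differential(sources, hosts):
    """Return (source, per-host results) for every source the hosts disagree on."""
    disagreements = []
    for src in sources:
        results = {}
        for name, evaluate in hosts.items():
            try:
                results[name] = ("ok", repr(evaluate(src)))
            except Exception as exc:  # an error is a result too, and must match
                results[name] = ("error", type(exc).__name__)
        if len(set(results.values())) > 1:
            disagreements.append((src, results))
    return disagreements


if __name__ == "__main__":
    # Toy hosts backed by lookup tables; the second one imitates a wrapping expt.
    exact = {"(+ 1 2)": 3, "(expt 2 62)": 2**62}
    wrapping = {"(+ 1 2)": 3, "(expt 2 62)": -(2**62)}
    hosts = {"host-a": exact.__getitem__, "host-b": wrapping.__getitem__}
    for src, results in differential(list(exact), hosts):
        print(src, results)
```

Treating raised errors as comparable results means a host that throws where the others return a value also counts as a disagreement.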

Progress log (newest first)

  • 2026-07-04 — crit-2 non-vacuous pin (item A.5). The original bug's signature — handler value becomes the WHOLE program result, discarding every outer frame including the covering test's own assert — means a plain (assert= repro expected) pin would pass vacuously on regression. Added suite gate-crit2-signal-return-kont with a side-effect sentinel: test 1 runs both repros (("outer" 43 "end") list shape + raise-continuable → 143) then set!s a top-level flag; test 2 independently asserts the flag — if the continuation is ever dropped again, test 1 "passes" but test 2 fails loudly. Third test pins the exact shipped-test expr (51). Verified both repro shapes live via sx_eval first. 267 passed / 0 failed. Test-only.
  • 2026-07-03 — K49 void-elements pin (item A.4) + regen-drift DISCOVERY. Corrected the checklist label first: K49 is "five void elements unrenderable" (core.md:335), not the depth guard (that's K16, OPEN). Added suite gate-K49-void-elements-renderable (3 tests): spec HTML_TAGS contains all five; (render-to-html '(base :href "x") (make-env)) → <base href="x" />; all five render self-closing. Runner-env gotchas: current-env/symbol are not bound in run_tests; use (make-env) and literal quoted forms. Discovery: the first draft pinned via the runner's native render-html and FAILED: hosts/ocaml/lib/sx_render.ml (generated) was never regenerated after dc7aa709's spec fix, so the native render path still errors on the five tags. Recorded under Blocked; live evidence for F13 (regen-diff gate). 264 passed / 0 failed. Test-only.
  • 2026-07-03 — K09/K11/K39 W5 special-form pins (item A.3). Three suites added to spec/tests/test-gate-pins.sx: gate-K09-longhand-unquote-splicing (R7RS longhand (unquote-splicing X) now splices, incl. empty-list case; shorthand still works), gate-K11-guard-reraise-forgeable (a body/clause value shaped like (list '__guard-reraise__ X) is returned as data, not misread as a re-raise — sentinel is now gensym'd), gate-K39-do-iife-head ((do ((fn (x) x) 5) 99) → 99, not a misparsed do-loop — exact core.md repro). Gotchas hit and fixed: quasiquoted bare idents are symbols not strings, and assert= compares with = (not equal?, which returns false on these spliced lists). 261 passed / 0 failed under OCaml run_tests. Test-only.
  • 2026-07-03 — K20 contains?-dict pin (item A.2). Mapped K-codes by core.md severity order (K17 append!, K18 expt, K19 harness-drift, K20 contains?-dict). Added suite gate-K20-contains-dict to spec/tests/test-gate-pins.sx (4 tests): present dict key → true, missing key → false, list membership unchanged, string substring unchanged. Repro from core.md ("(contains? {:a 1} :a) threw contains?: 2 args"). 8/8 green across both suites under OCaml run_tests. Test-only.
  • 2026-07-03 — K18 expt-overflow pin (item A.1). Bootstrapped this briefing from PLAN.md §W14 (the referenced file did not exist yet). Added spec/tests/test-gate-pins.sx with suite gate-K18-expt-overflow (4 tests): small exponents stay exact (2^0=1, 2^10=1024), 2^62 > 0 (no negative 63-bit wrap), 2^100 > 0 (no wrap-to-zero), 2^100 is a number (float promotion). Verified 4/4 green under the OCaml run_tests kernel. Test-only.
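
The K18 assertions on 2^62 and 2^100 each target a distinct wrap symptom. The Python model below shows both, assuming the pre-fix arithmetic behaved like two's-complement 63-bit integers (the width the entry names). `wrap63`, `expt_wrapping`, and `expt_promoting` are illustrative names, and the promotion rule is a guess at the shape of the fix, not the kernel's code.

```python
def wrap63(n):
    """Reduce n to a signed 63-bit integer, two's-complement style."""
    n &= (1 << 63) - 1      # keep the low 63 bits
    if n >= 1 << 62:        # top bit set means the value reads as negative
        n -= 1 << 63
    return n


def expt_wrapping(base, exp):
    """Integer power with silent 63-bit wraparound (models the old symptom)."""
    result = 1
    for _ in range(exp):
        result = wrap63(result * base)
    return result


def expt_promoting(base, exp):
    """Integer power that switches to float once the exact value no longer fits."""
    result = 1
    for _ in range(exp):
        product = result * base
        if wrap63(product) != product:
            return float(base) ** exp
        result = product
    return result


if __name__ == "__main__":
    print(expt_wrapping(2, 62) < 0)     # True: wraps negative
    print(expt_wrapping(2, 100) == 0)   # True: wraps to zero
    print(expt_promoting(2, 10))        # 1024: small exponents stay exact
    print(expt_promoting(2, 100) > 0)   # True: promoted to a positive float
```

This is why the suite checks both `2^62 > 0` and `2^100 > 0`: the first catches the sign flip, the second catches the case where all 63 low bits are zero.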

Blocked

  • K49 native path — sx_render.ml regen drift (found 2026-07-03 while pinning A.4): dc7aa709 fixed HTML_TAGS in spec/render.sx but never re-ran hosts/ocaml/bootstrap_render.py, so the generated hosts/ocaml/lib/sx_render.ml still carries a stale html_tags_list without area/base/embed/param/track. The runner's native render-html convenience (and any native fast-path render) therefore STILL throws Undefined symbol: base — dc7aa709's "verified on the native binary" claim did not cover this path. Fix = regen (hosts lane, semantics-adjacent — out of scope for this test-only loop). This is a live instance of F13 (regen-diff CI gate, section-B/D territory): a regen-diff check would have caught it at commit time. The K49 pin covers the spec side only; when the regen lands, extend the suite with render-html-path assertions.
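
A regen-diff check of the kind F13 describes could be as small as the following Python sketch. It is an illustration only: `regen_drift` is a hypothetical helper that compares two texts, and obtaining the regenerated text (for example by running the bootstrap script into a temporary file) is assumed to happen elsewhere.

```python
import difflib


def regen_drift(committed, regenerated, name="generated file"):
    """Return unified-diff lines between the two texts; empty means no drift."""
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            regenerated.splitlines(),
            fromfile=f"{name} (committed)",
            tofile=f"{name} (regenerated)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Made-up file contents standing in for a stale and a fresh tag list.
    stale = 'let tags = [\n  "a";\n  "br";\n]'
    fresh = 'let tags = [\n  "a";\n  "base";\n  "br";\n]'
    for line in regen_drift(stale, fresh, "sx_render.ml"):
        print(line)
```

A gate built on this would fail the commit whenever the returned list is non-empty, which is the point at which the stale tag list would have surfaced.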