rose-ash/plans/agent-briefings/sx-gate-loop.md
giles f09368e1c2 W14: pin K18 expt-overflow float-promotion (test-only) + bootstrap gate briefing
The dc7aa709 quick-wins batch fixed `expt`'s silent 63-bit int wrap (now
promotes to float like +/*) but shipped no pinning test — a regression would
pass silently. Add spec/tests/test-gate-pins.sx suite gate-K18-expt-overflow
(4 tests, minimal repros from plans/sx-review/core.md): small exponents exact,
2^62 and 2^100 do not wrap, 2^100 is a float. 4/4 green under OCaml run_tests.

Also bootstraps plans/agent-briefings/sx-gate-loop.md (the loop's own briefing,
absent until now) with the W14 checklist derived from PLAN.md §W14.

Test-only: no semantics edits, no push.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-03 22:33:22 +00:00


W14 — Test gate & conformance infrastructure loop

Forge agent ws-W14. Role: build out W14 from the SX review remediation plan (plans/sx-review/PLAN.md, §"W14. Test gate & conformance infrastructure") — the enabler that makes every other fix verifiable. One checklist item per fire.

You are on branch loops/sx-ws-w14, worktree /root/rose-ash-loops/sx-ws-w14.

Hard guardrails (read every fire)

  • TEST-ONLY. No semantics edits. Do NOT touch spec/evaluator.sx, spec/primitives.sx, spec/parser.sx, spec/render.sx, the OCaml kernel, or any host runtime. W14 pins behavior with tests and productionizes the test/runner surface; the actual fixes belong to other workstreams (W1–W12). A pin that fails means the finding regressed: do NOT relax the assertion; record it as a blocker.
  • NO PUSH. Commit locally on loops/sx-ws-w14 only. Never push; never touch main or architecture.
  • .sx files: use sx-tree MCP tools only (a hook blocks Read/Write/Edit on .sx). sx_write_file takes params file and source (NOT content — a wrong key yields a yojson … got null error and no write). .md/.sh/.ml files: normal tools are fine.
  • Never pkill/kill sx_server — sibling loops share the binary. Bound every run with timeout (e.g. timeout 300 …); if it hangs, let the timeout end it.
  • One item per fire, then stop. No batching.
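The timeout guardrail above can also be applied from a script. The following is a minimal sketch, assuming a Python wrapper is acceptable in this worktree; `run_bounded` is a hypothetical helper name, not something the repository provides. It relies on `subprocess.run` stopping only the child process it started, so a shared sx_server is never targeted.

```python
import subprocess
import sys


def run_bounded(cmd, seconds=300):
    """Run cmd and give up after `seconds`.

    Returns (exit_code, timed_out). Hypothetical helper for illustration.
    """
    try:
        done = subprocess.run(cmd, timeout=seconds)
        return done.returncode, False
    except subprocess.TimeoutExpired:
        # subprocess.run has already stopped the child it spawned;
        # no other process is signalled.
        return None, True


if __name__ == "__main__":
    # A stand-in command that would hang for 5 s, bounded to 0.5 s.
    sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_bounded(sleeper, seconds=0.5))
```

Only the direct child is stopped on expiry; anything that child spawned itself is outside the scope of this sketch.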

Per-iteration procedure

  1. Pick the first unchecked [ ] in the checklist.
  2. Implement (test file or runner/harness change), lifting minimal repros from the review lane files (plans/sx-review/{core,hosts,conformance}.md), which are a ready-made corpus of confirmed repros.
  3. Build + run the affected tests: sx_build (target ocaml) then timeout 300 ./hosts/ocaml/_build/default/bin/run_tests.exe <test-name> to run a single file. New spec/tests/test-*.sx files are auto-discovered.
  4. Confirm green (a pin must PASS on current HEAD — the fix already landed).
  5. Commit locally: git add -A && git commit with a W14: prefix.
  6. Tick the box, prepend one dated line to the Progress log, stop.
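Steps 1 and 6 amount to a small text transformation on the checklist. The sketch below assumes the checklist uses markdown task-list syntax (`- [ ]` and `- [x]`); the function names are hypothetical and the sample lines are placeholders, not the real file contents.

```python
import re

# Assumed task-list shape: bullet, "[ ]", then the item text.
UNCHECKED = re.compile(r"^(\s*[-*]\s*)\[ \](\s+.*)$")


def first_unchecked(lines):
    """Return the index of the first unchecked item, or None if all are ticked."""
    for i, line in enumerate(lines):
        if UNCHECKED.match(line):
            return i
    return None


def tick(lines, index):
    """Return a copy of lines with the item at `index` marked done."""
    out = list(lines)
    out[index] = UNCHECKED.sub(r"\1[x]\2", out[index], count=1)
    return out


if __name__ == "__main__":
    sample = ["- [x] first item", "- [ ] second item", "- [ ] third item"]
    i = first_unchecked(sample)
    print(i, tick(sample, i)[i])
```

Working on exactly one index per call mirrors the one-item-per-fire rule: nothing after the first unchecked box is touched.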

Checklist

A. Test-debt pins — dc7aa709's landed fixes shipped without regression tests

Pin each confirmed-and-fixed finding with a minimal repro. Add suites to spec/tests/test-gate-pins.sx (one defsuite per finding).

  • K18 [W7] — expt overflow now float-promotes (no 63-bit wrap)
  • K20 [W7] — identify the landed W7 fix and pin it
  • K09/K11/K39 [W5] — landed special-form fixes, pin each
  • K49 [W8] — render depth/cycle guard (infinite recursive component)
  • crit-2 [W1] — signal-return frame key (verify the pin is non-vacuous)
  • C1/C1b [W3] — HTTP-mode concurrency fixes, pin
  • S4 [conformance] — housekeeping repro, pin
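To make the K18 item concrete, here is an illustrative Python model of the two behaviours it distinguishes. This is not the SX kernel's code: `wrap63`, `expt_wrapping`, and `expt_promoting` are hypothetical names, and the model assumes the wrap is ordinary signed 63-bit two's-complement arithmetic with non-negative integer exponents.

```python
INT_BITS = 63  # assumption: signed 63-bit native integers


def wrap63(n):
    """Reduce n to a signed 63-bit two's-complement value."""
    n &= (1 << INT_BITS) - 1
    return n - (1 << INT_BITS) if n >> (INT_BITS - 1) else n


def expt_wrapping(base, exp):
    """Model of the pre-fix behaviour: the result silently wraps."""
    return wrap63(base ** exp)


def expt_promoting(base, exp):
    """Model of the post-fix behaviour: exact while it fits, float otherwise."""
    exact = base ** exp
    return exact if wrap63(exact) == exact else float(exact)


if __name__ == "__main__":
    print(expt_wrapping(2, 62), expt_wrapping(2, 100))
    print(expt_promoting(2, 10), expt_promoting(2, 100))
```

Under this model 2^62 wraps to a negative value and 2^100 wraps to zero, which is why a pin needs both a positivity check at 2^62 and a non-zero check at 2^100, while small exponents such as 2^10 must stay exact integers.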

B. Runner/production env unification

  • Audit runner-only bindings (values/call-with-values F7/K42; the JS fake sha3/equal?/apply/env-set! shims JS5): produce an inventory plus a failing pin demonstrating that a fresh sx_server reproduces the drift

C. Harness honesty

  • K19 — MCP mcp_tree.ml harness primitive table drift vs sx_primitives (parity test)
  • C22/K104 — harness logs IO before invoking the mock (throwing-mock pin)
  • C21 — real perform/suspend mode in harness
  • C23 — adapter-dom render-output tests
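A parity test of the kind K19 asks for reduces to comparing two sets of primitive names and reporting both directions of drift. The sketch below assumes the two tables can each be extracted as a list of names; how to obtain them from mcp_tree.ml and sx_primitives is not covered here, and `table_drift` plus the sample names are hypothetical.

```python
def table_drift(harness, reference):
    """Report names present in one primitive table but not the other."""
    harness, reference = set(harness), set(reference)
    return {
        "missing_in_harness": sorted(reference - harness),
        "extra_in_harness": sorted(harness - reference),
    }


def in_parity(harness, reference):
    """True when both tables expose exactly the same names."""
    drift = table_drift(harness, reference)
    return not drift["missing_in_harness"] and not drift["extra_in_harness"]


if __name__ == "__main__":
    # Placeholder names, not the real primitive tables.
    print(table_drift(["car", "cdr", "legacy-op"], ["car", "cdr", "expt"]))
```

Reporting both directions matters: a name missing from the harness hides real behaviour from tests, while an extra name lets tests pass against something production does not have.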

D. WASM corpus runner

  • F2 — promote conformance's run_wasm.js prototype into CI

E. Epoch-loop protocol fuzz + skip-list

  • C3/C4/C5/C6/C7 — epoch protocol fuzz suite
  • F10 — hs-upstream skip-list so browser-only FAILs mean something
  • C9 — empty suite label

F. Differential battery

  • F8 — cross-host differential battery (same source, all hosts agree)
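The core of a differential battery is small: evaluate one source on every host and flag any disagreement. The following sketch assumes each host can be wrapped as a callable taking source text and returning a value; the host names and lambdas are stand-ins, not real runners. Results are compared by `repr` so that an integer and an equal-valued float count as a disagreement, which is the kind of drift a float-promotion fix could introduce on one host only.

```python
def differential(source, hosts):
    """Run source on every host.

    hosts maps a host name to a callable(source) -> result.
    Returns {} when all hosts agree, else the full per-host results.
    """
    results = {name: run(source) for name, run in hosts.items()}
    distinct = {repr(value) for value in results.values()}
    return {} if len(distinct) <= 1 else results


if __name__ == "__main__":
    # Stand-in hosts: one returns an int, the other an equal-valued float.
    hosts = {
        "host-a": lambda src: 1024,
        "host-b": lambda src: 1024.0,
    }
    print(differential("(expt 2 10)", hosts))
```

Returning the full result map on disagreement, rather than a boolean, keeps the failing host identifiable in the test output.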

Progress log (newest first)

  • 2026-07-03 — K18 expt-overflow pin (item A.1). Bootstrapped this briefing from PLAN.md §W14 (the referenced file did not exist yet). Added spec/tests/test-gate-pins.sx with suite gate-K18-expt-overflow (4 tests): small exponents stay exact (2^0=1, 2^10=1024), 2^62 > 0 (no negative 63-bit wrap), 2^100 > 0 (no wrap-to-zero), 2^100 is a number (float promotion). Verified 4/4 green under the OCaml run_tests kernel. Test-only.

Blocked

  • (none)