diff --git a/plans/agent-briefings/sx-gate-loop.md b/plans/agent-briefings/sx-gate-loop.md
new file mode 100644
index 00000000..5a0d0797
--- /dev/null
+++ b/plans/agent-briefings/sx-gate-loop.md
@@ -0,0 +1,88 @@
+# W14 — Test gate & conformance infrastructure loop
+
+Forge agent **ws-W14**. Role: build out **W14** from the SX review remediation plan
+(`plans/sx-review/PLAN.md`, §"W14. Test gate & conformance infrastructure") —
+*the enabler that makes every other fix verifiable*. One checklist item per fire.
+
+You are on branch `loops/sx-ws-w14`, worktree `/root/rose-ash-loops/sx-ws-w14`.
+
+## Hard guardrails (read every fire)
+
+- **TEST-ONLY.** No semantics edits. Do NOT touch `spec/evaluator.sx`,
+  `spec/primitives.sx`, `spec/parser.sx`, `spec/render.sx`, the OCaml kernel,
+  or any host runtime. W14 pins behavior with tests and productionizes the
+  *test/runner* surface; the actual fixes are other workstreams (W1–W12).
+  A pin that *fails* means the finding regressed — do NOT relax the assertion,
+  record it as a blocker.
+- **NO PUSH.** Commit locally on `loops/sx-ws-w14` only. Never push; never touch
+  `main` or `architecture`.
+- **`.sx` files: use `sx-tree` MCP tools only** (a hook blocks Read/Write/Edit
+  on `.sx`). `sx_write_file` takes params **`file`** and **`source`** (NOT
+  `content` — a wrong key yields a `yojson … got null` error and no write).
+  `.md`/`.sh`/`.ml` files: normal tools are fine.
+- **Never `pkill`/`kill` `sx_server`** — sibling loops share the binary. Bound
+  every run with `timeout` (e.g. `timeout 300 …`); if it hangs, let the timeout end it.
+- **One item per fire, then stop.** No batching.
+
+## Per-iteration procedure
+
+1. Pick the first unchecked `[ ]` in the checklist.
+2. Implement (test file or runner/harness change), lifting minimal repros from
+   the review lane files (`plans/sx-review/{core,hosts,conformance}.md`) — they
+   are a ready-made corpus of confirmed repros.
+3. Build + run the affected tests:
+   `sx_build` (target ocaml) then
+   `timeout 300 ./hosts/ocaml/_build/default/bin/run_tests.exe `
+   to run a single file. New `spec/tests/test-*.sx` files are auto-discovered.
+4. Confirm green (a pin must PASS on current HEAD — the fix already landed).
+5. Commit locally: `git add -A && git commit` with a `W14:` prefix.
+6. Tick the box, prepend one dated line to the Progress log, stop.
+
+## Checklist
+
+### A. Test-debt pins — dc7aa709's landed fixes shipped without regression tests
+Pin each confirmed-and-fixed finding with a minimal repro. Add suites to
+`spec/tests/test-gate-pins.sx` (one `defsuite` per finding).
+
+- [x] K18 [W7] — `expt` overflow now float-promotes (no 63-bit wrap)
+- [ ] K20 [W7] — identify the landed W7 fix and pin it
+- [ ] K09/K11/K39 [W5] — landed special-form fixes, pin each
+- [ ] K49 [W8] — render depth/cycle guard (infinite recursive component)
+- [ ] crit-2 [W1] — signal-return frame key (verify the pin is non-vacuous)
+- [ ] C1/C1b [W3] — HTTP-mode concurrency fixes, pin
+- [ ] S4 [conformance] — housekeeping repro, pin
+
+### B. Runner/production env unification
+- [ ] Audit runner-only bindings (`values`/`call-with-values` F7/K42, JS
+  fake sha3/equal?/apply/env-set! shims JS5) — inventory + failing pin
+  that a fresh `sx_server` reproduces the drift
+
+### C. Harness honesty
+- [ ] K19 — MCP `mcp_tree.ml` harness primitive table drift vs `sx_primitives`
+  (parity test)
+- [ ] C22/K104 — harness logs IO *before* invoking the mock (throwing-mock pin)
+- [ ] C21 — real perform/suspend mode in harness
+- [ ] C23 — adapter-dom render-output tests
+
+### D. WASM corpus runner
+- [ ] F2 — promote conformance's `run_wasm.js` prototype into CI
+
+### E. Epoch-loop protocol fuzz + skip-list
+- [ ] C3/C4/C5/C6/C7 — epoch protocol fuzz suite
+- [ ] F10 — hs-upstream skip-list so browser-only FAILs mean something
+- [ ] C9 — empty suite label
+
+### F. Differential battery
+- [ ] F8 — cross-host differential battery (same source, all hosts agree)
+
+## Progress log (newest first)
+
+- 2026-07-03 — **K18 expt-overflow pin (item A.1)**. Bootstrapped this briefing
+  from PLAN.md §W14 (the referenced file did not exist yet). Added
+  `spec/tests/test-gate-pins.sx` with suite `gate-K18-expt-overflow` (4 tests):
+  small exponents stay exact (`2^0=1`, `2^10=1024`), `2^62 > 0` (no negative
+  63-bit wrap), `2^100 > 0` (no wrap-to-zero), `2^100` is a number (float
+  promotion). Verified 4/4 green under the OCaml run_tests kernel. Test-only.
+
+## Blocked
+- (none)
diff --git a/spec/tests/test-gate-pins.sx b/spec/tests/test-gate-pins.sx
new file mode 100644
index 00000000..a2de6ede
--- /dev/null
+++ b/spec/tests/test-gate-pins.sx
@@ -0,0 +1,33 @@
+;; ==========================================================================
+;; test-gate-pins.sx — W14 regression pins for dc7aa709's landed fixes
+;;
+;; The quick-wins batch (commit dc7aa709) landed real semantics fixes but
+;; shipped WITHOUT pinning tests, so a regression would pass silently. This
+;; file pins each confirmed-and-fixed finding with a minimal repro lifted
+;; from the review lane files (plans/sx-review/*.md). One suite per finding.
+;;
+;; TEST-ONLY: no semantics edits. If a pin fails, the fix regressed — do NOT
+;; relax the assertion; investigate the evaluator/primitive change.
+;; ==========================================================================
+
+;; --------------------------------------------------------------------------
+;; K18 [W7, high] expt silently wrapped at 63-bit int — now promotes to float
+;; like +/*. Repro (core.md): (expt 2 62) -> -4611686018427387904 (wrapped);
+;; (expt 2 100) -> 0. Fixed: both are positive floats.
+;; --------------------------------------------------------------------------
+(defsuite
+  "gate-K18-expt-overflow"
+  (deftest
+    "small integer exponents stay exact"
+    (do
+      (assert= (expt 2 0) 1)
+      (assert= (expt 2 10) 1024)))
+  (deftest
+    "expt 2^62 does not wrap to a negative int"
+    (assert (> (expt 2 62) 0)))
+  (deftest
+    "expt 2^100 does not wrap to zero"
+    (assert (> (expt 2 100) 0)))
+  (deftest
+    "expt 2^100 promotes to float"
+    (assert (number? (expt 2 100)))))
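
Review note on the K18 pin: the two pre-fix values quoted in the suite comment, `-4611686018427387904` for `(expt 2 62)` and `0` for `(expt 2 100)`, are what a 63-bit two's-complement wrap would produce. The Python sketch below is only an illustration of that arithmetic, not the kernel's code. `wrap63` and `wrapping_expt` are hypothetical helper names, and the 63-bit width is taken from the "63-bit int" wording in the diff.

```python
# Illustration only: simulate signed 63-bit wraparound in plain Python.
# Assumption: the pre-fix expt multiplied in a 63-bit two's-complement int.
BITS = 63


def wrap63(n: int) -> int:
    """Reduce n modulo 2**63 and reinterpret it as a signed 63-bit value."""
    n &= (1 << BITS) - 1
    if n >= 1 << (BITS - 1):
        n -= 1 << BITS
    return n


def wrapping_expt(base: int, exp: int) -> int:
    """Integer power by repeated multiplication, wrapping after each step."""
    result = 1
    for _ in range(exp):
        result = wrap63(result * base)
    return result


print(wrapping_expt(2, 10))   # small exponents are unaffected by the wrap
print(wrapping_expt(2, 62))   # lands on the most negative 63-bit value
print(wrapping_expt(2, 100))  # every set bit is shifted out, leaving zero
```

Under that assumption, the `> 0` assertions in the suite reject both wrapped results, while the final `number?` test would also accept an integer, so it is the weaker of the four pins.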