The dc7aa709 quick-wins batch fixed `expt`'s silent 63-bit int wrap (it now
promotes to float, like + and *) but shipped no pinning test, so a regression
would pass silently. Add spec/tests/test-gate-pins.sx suite gate-K18-expt-overflow
(4 tests, minimal repros from plans/sx-review/core.md): small exponents stay
exact, 2^62 and 2^100 do not wrap, and 2^100 is a float. 4/4 green under OCaml
run_tests.

Also bootstrap plans/agent-briefings/sx-gate-loop.md (the loop's own briefing,
absent until now) with the W14 checklist derived from PLAN.md §W14.
Test-only: no semantics edits, no push.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# W14: Test gate & conformance infrastructure loop

Forge agent **ws-W14**. Role: build out **W14** from the SX review remediation plan
(`plans/sx-review/PLAN.md`, §"W14. Test gate & conformance infrastructure"),
*the enabler that makes every other fix verifiable*. One checklist item per fire.

You are on branch `loops/sx-ws-w14`, worktree `/root/rose-ash-loops/sx-ws-w14`.

## Hard guardrails (read every fire)

- **TEST-ONLY.** No semantics edits. Do NOT touch `spec/evaluator.sx`,
  `spec/primitives.sx`, `spec/parser.sx`, `spec/render.sx`, the OCaml kernel,
  or any host runtime. W14 pins behavior with tests and productionizes the
  *test/runner* surface; the actual fixes belong to other workstreams (W1–W12).
  A pin that *fails* means the finding regressed: do NOT relax the assertion;
  record it as a blocker.
- **NO PUSH.** Commit locally on `loops/sx-ws-w14` only. Never push; never touch
  `main` or `architecture`.
- **`.sx` files: use `sx-tree` MCP tools only** (a hook blocks Read/Write/Edit
  on `.sx`). `sx_write_file` takes the params **`file`** and **`source`** (NOT
  `content`; a wrong key yields a `yojson … got null` error and no write).
  `.md`/`.sh`/`.ml` files: normal tools are fine.
- **Never `pkill`/`kill` `sx_server`**: sibling loops share the binary. Bound
  every run with `timeout` (e.g. `timeout 300 …`); if it hangs, let the timeout end it.
- **One item per fire, then stop.** No batching.

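The `sx_write_file` key mix-up fails only at write time, so a pre-flight check can help. Below is a minimal Python sketch. Only the `file`/`source` key names come from the guardrail above. The helpers `sx_write_file_args` and `check_sx_write_file_args` are hypothetical and are not part of the `sx-tree` MCP server.

```python
# Hypothetical pre-flight check for sx_write_file arguments. Only the key names
# "file" and "source" come from the briefing; these helpers are illustrative.
REQUIRED_KEYS = {"file", "source"}


def sx_write_file_args(file, source):
    """Build the argument object with the keys sx_write_file expects."""
    return {"file": file, "source": source}


def check_sx_write_file_args(args):
    """Return a list of problems; an empty list means the keys look right."""
    keys = set(args)
    problems = []
    for key in sorted(REQUIRED_KEYS - keys):
        problems.append(f"missing required key {key!r}")
    for key in sorted(keys - REQUIRED_KEYS):
        # "content" is the common mistake that yields the yojson null error.
        hint = " (use 'source')" if key == "content" else ""
        problems.append(f"unexpected key {key!r}{hint}")
    return problems
```

For example, `check_sx_write_file_args({"file": "spec/tests/test-gate-pins.sx", "content": "..."})` reports both the missing `source` and the stray `content`.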
## Per-iteration procedure

1. Pick the first unchecked `[ ]` in the checklist.
2. Implement (test file or runner/harness change), lifting minimal repros from
   the review lane files (`plans/sx-review/{core,hosts,conformance}.md`); they
   are a ready-made corpus of confirmed repros.
3. Build and run the affected tests: `sx_build` (target ocaml), then
   `timeout 300 ./hosts/ocaml/_build/default/bin/run_tests.exe <test-name>`
   to run a single file. New `spec/tests/test-*.sx` files are auto-discovered.
4. Confirm green (a pin must PASS on current HEAD, because the fix already landed).
5. Commit locally: `git add -A && git commit` with a `W14:` prefix.
6. Tick the box, prepend one dated line to the Progress log, and stop.

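Steps 1 and 6 are mechanical bookkeeping on this file, sketched below in Python. The sketch assumes two things: unchecked items look exactly like `- [ ] …`, and the log heading is `## Progress log (newest first)`. The function names are hypothetical, and the sketch writes entries as `- YYYY-MM-DD: summary`.

```python
import datetime
import re

# Matches a top-level checklist item of the form "- [ ] <text>".
UNCHECKED = re.compile(r"^- \[ \] (.*)$")
LOG_HEADING = "## Progress log (newest first)"


def first_unchecked(lines):
    """Step 1: return (index, text) of the first unchecked item, or None."""
    for i, line in enumerate(lines):
        match = UNCHECKED.match(line)
        if match:
            return i, match.group(1)
    return None


def tick(lines, index):
    """Step 6a: return a copy of `lines` with the item at `index` checked."""
    updated = list(lines)
    updated[index] = updated[index].replace("- [ ]", "- [x]", 1)
    return updated


def prepend_log(lines, summary, today=None):
    """Step 6b: return a copy with a dated entry as the first Progress log item."""
    today = today or datetime.date.today()
    pos = lines.index(LOG_HEADING) + 1
    # Skip blank lines under the heading so the entry lands above the newest one.
    while pos < len(lines) and not lines[pos].strip():
        pos += 1
    entry = f"- {today.isoformat()}: {summary}"
    return lines[:pos] + [entry] + lines[pos:]
```

The helpers return new lists rather than editing in place. That keeps one fire's changes easy to inspect before the file is rewritten.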
## Checklist

### A. Test-debt pins: dc7aa709's landed fixes shipped without regression tests

Pin each confirmed-and-fixed finding with a minimal repro. Add suites to
`spec/tests/test-gate-pins.sx` (one `defsuite` per finding).

- [x] K18 [W7]: `expt` overflow now float-promotes (no 63-bit wrap)
- [ ] K20 [W7]: identify the landed W7 fix and pin it
- [ ] K09/K11/K39 [W5]: landed special-form fixes, pin each
- [ ] K49 [W8]: render depth/cycle guard (infinite recursive component)
- [ ] crit-2 [W1]: signal-return frame key (verify the pin is non-vacuous)
- [ ] C1/C1b [W3]: HTTP-mode concurrency fixes, pin
- [ ] S4 [conformance]: housekeeping repro, pin

### B. Runner/production env unification

- [ ] Audit runner-only bindings (`values`/`call-with-values`, F7/K42; JS fake
  `sha3`/`equal?`/`apply`/`env-set!` shims, JS5): an inventory, plus a failing
  pin showing that a fresh `sx_server` reproduces the drift

### C. Harness honesty

- [ ] K19: MCP `mcp_tree.ml` harness primitive table drift vs `sx_primitives`
  (parity test)
- [ ] C22/K104: harness logs IO *before* invoking the mock (throwing-mock pin)
- [ ] C21: real perform/suspend mode in harness
- [ ] C23: adapter-dom render-output tests

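The core of a K19 parity test is a two-way set difference over primitive names. This Python sketch assumes both tables can be exported as plain lists of names. It does not cover how to extract them from `mcp_tree.ml` and `sx_primitives`. `primitive_parity` and `assert_parity` are hypothetical names.

```python
def primitive_parity(harness_names, kernel_names):
    """Compare two primitive tables by name; drift in either direction counts."""
    harness = set(harness_names)
    kernel = set(kernel_names)
    return {
        "missing_from_harness": sorted(kernel - harness),
        "harness_only": sorted(harness - kernel),
    }


def assert_parity(harness_names, kernel_names):
    """Raise with the full diff so a failing pin says exactly what drifted."""
    drift = primitive_parity(harness_names, kernel_names)
    if drift["missing_from_harness"] or drift["harness_only"]:
        raise AssertionError(f"primitive table drift: {drift}")
```

Reporting both directions matters. A harness-only primitive can make an MCP-driven test pass on behavior the real kernel does not have.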
### D. WASM corpus runner

- [ ] F2: promote conformance's `run_wasm.js` prototype into CI

### E. Epoch-loop protocol fuzz + skip-list

- [ ] C3/C4/C5/C6/C7: epoch protocol fuzz suite
- [ ] F10: hs-upstream skip-list for browser-only tests, so the remaining FAILs
  mean something
- [ ] C9: empty suite label

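One way to make the F10 skip-list meaningful is sketched below in Python. It assumes results arrive as a test name → `"PASS"`/`"FAIL"` mapping. FAILs on skip-listed (browser-only) tests are reported separately, and only the remaining FAILs should gate. Flagging skip-listed tests that now pass is an extra design choice, not something the checklist asks for. The function names are hypothetical.

```python
def classify_failures(results, skip_list):
    """Split FAILs into skip-listed (browser-only) and unexpected ones.

    results:   mapping of test name -> "PASS" or "FAIL"
    skip_list: names of tests known to need a real browser
    """
    skip = set(skip_list)
    expected, unexpected = [], []
    for name in sorted(results):
        if results[name] != "FAIL":
            continue
        if name in skip:
            expected.append(name)
        else:
            unexpected.append(name)
    # Skip-listed tests that now pass are stale entries worth pruning.
    stale = sorted(name for name in skip if results.get(name) == "PASS")
    return {"expected": expected, "unexpected": unexpected, "stale": stale}


def gate_ok(results, skip_list):
    """The gate fails only on FAILs that the skip-list does not explain."""
    return not classify_failures(results, skip_list)["unexpected"]
```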
### F. Differential battery

- [ ] F8: cross-host differential battery (same source, all hosts agree)

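A minimal shape for the F8 battery is sketched below in Python. Every source goes through every host, and any case where the hosts' results differ is reported. The host names and callables are placeholders; real hosts would shell out to their own runners. Results must be hashable, e.g. printed output strings.

```python
def differential(sources, hosts):
    """Run every source on every host; return the cases where hosts disagree.

    sources: mapping of case name -> SX source text
    hosts:   mapping of host name -> callable(source) -> hashable result
    """
    disagreements = {}
    for case, src in sources.items():
        outputs = {}
        for host, run in hosts.items():
            try:
                outputs[host] = run(src)
            except Exception as exc:  # a crash is an observable result too
                outputs[host] = f"error: {type(exc).__name__}"
        if len(set(outputs.values())) > 1:
            disagreements[case] = outputs
    return disagreements
```

Recording crashes as results, rather than letting them abort the run, keeps one broken host from hiding disagreements on the remaining cases.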
## Progress log (newest first)

- 2026-07-03: **K18 expt-overflow pin (item A.1)**. Bootstrapped this briefing
  from PLAN.md §W14 (the referenced file did not exist yet). Added
  `spec/tests/test-gate-pins.sx` with suite `gate-K18-expt-overflow` (4 tests):
  small exponents stay exact (`2^0=1`, `2^10=1024`), `2^62 > 0` (no negative
  63-bit wrap), `2^100 > 0` (no wrap-to-zero), `2^100` is a number (float
  promotion). Verified 4/4 green under the OCaml run_tests kernel. Test-only.

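Here is why the K18 pins catch the old bug. On 64-bit platforms, OCaml's native `int` is 63 bits, so exact results are reduced modulo 2^63 into [-2^62, 2^62 - 1]. As a result, 2^62 wraps to `min_int` (-2^62) and 2^100 wraps to 0, which is exactly what the `> 0` pins reject.

The Python model below does not reproduce the kernel's code, and its names are hypothetical. Its wrapping `expt` fails the last three pins, while a float-promoting `expt` passes all four. The model follows the commit message in treating the last pin as a float check.

```python
INT63_MIN = -(2 ** 62)
INT63_MAX = 2 ** 62 - 1


def wrap63(n):
    """Reduce an exact integer into OCaml's 63-bit native int range."""
    return (n - INT63_MIN) % 2 ** 63 + INT63_MIN


def expt_wrapping(base, exp):
    """Model of the pre-fix bug: each multiplication silently wraps to 63 bits."""
    acc = 1
    for _ in range(exp):
        acc = wrap63(acc * base)
    return acc


def expt_promoting(base, exp):
    """Model of the post-fix behavior: stay exact in range, else return a float."""
    exact = base ** exp
    if INT63_MIN <= exact <= INT63_MAX:
        return exact
    return float(base) ** exp


# The four K18 pins, restated as predicates over an expt implementation.
PINS = [
    ("small exponents stay exact",
     lambda f: f(2, 0) == 1 and f(2, 10) == 1024 and isinstance(f(2, 10), int)),
    ("2^62 does not wrap negative", lambda f: f(2, 62) > 0),
    ("2^100 does not wrap to zero", lambda f: f(2, 100) > 0),
    ("2^100 is a float", lambda f: isinstance(f(2, 100), float)),
]
```

Because the wrapping model still passes the small-exponent pin, the overflow pins are what make the suite non-vacuous against the old bug.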
## Blocked

- (none)