Pre-fix, a routing-failure page was stored in the HTTP response cache as
200 and served byte-identically to every later visitor until restart
(cold 2s -> warm 0.0005s). dc7aa709 made http_render_page return
(html, is_error) and gated cache insertion on `not is_err`.
Extend scripts/test-protocol-gate.sh with an HTTP-mode case: fresh
sx_server.exe --http on a random port (timeout-bounded, own child killed),
GET the same nonexistent path twice, assert both requests re-render (two
[sx-http] render lines) and the "[cache] ... error page, not cached" gate
line appears. Standalone-worktree caveat (all docs pages render as soft
error pages, so no positive cache control) documented in the script.
5/5 protocol-gate green; 267/0 sx gate pins. All seven section-A test-debt
pins now landed (K18, K20, K09/K11/K39, K49, crit-2, C1/C1b, S4).
Test-only: no semantics edits, no push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
# W14: Test gate & conformance infrastructure loop

Forge agent **ws-W14**. Role: build out **W14**, *the enabler that makes every other fix verifiable*,
from the SX review remediation plan (`plans/sx-review/PLAN.md`, §"W14. Test gate & conformance
infrastructure"). One checklist item per fire.

You are on branch `loops/sx-ws-w14`, worktree `/root/rose-ash-loops/sx-ws-w14`.

## Hard guardrails (read every fire)

- **TEST-ONLY.** No semantics edits. Do NOT touch `spec/evaluator.sx`,
  `spec/primitives.sx`, `spec/parser.sx`, `spec/render.sx`, the OCaml kernel,
  or any host runtime. W14 pins behavior with tests and productionizes the
  *test/runner* surface; the actual fixes belong to other workstreams (W1–W12).
  A pin that *fails* means the finding regressed: do NOT relax the assertion;
  record it as a blocker.
- **NO PUSH.** Commit locally on `loops/sx-ws-w14` only. Never push; never touch
  `main` or `architecture`.
- **`.sx` files: use `sx-tree` MCP tools only** (a hook blocks Read/Write/Edit
  on `.sx`). `sx_write_file` takes params **`file`** and **`source`** (NOT
  `content`; a wrong key yields a `yojson … got null` error and no write).
  `.md`/`.sh`/`.ml` files: normal tools are fine.
- **Never `pkill`/`kill` `sx_server`**: sibling loops share the binary. Bound
  every run with `timeout` (e.g. `timeout 300 …`); if it hangs, let the timeout end it.
- **One item per fire, then stop.** No batching.
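
The `timeout` and own-PID rules can be sketched as two bash helpers. This is a minimal illustration, not code from this repo. `start_bounded`, `stop_own_child`, and the `CHILD_PID` variable are hypothetical names. The sketch assumes GNU coreutils `timeout`, which forwards a SIGTERM it receives to the command it is running.

```bash
# start_bounded SECONDS LOGFILE CMD [ARGS...]
# Starts CMD under `timeout SECONDS` with stdout+stderr sent to LOGFILE and
# records that child's PID in the global CHILD_PID. No subshell is used, so
# the calling shell owns the child and can `wait` on it.
start_bounded() {
  local secs=$1 log=$2
  shift 2
  timeout "$secs" "$@" >"$log" 2>&1 &
  CHILD_PID=$!
}

# stop_own_child PID
# Signals only the PID this script started (the timeout wrapper, which passes
# SIGTERM on to its command) and reaps it. Never match by name with pkill:
# sibling loops run sx_server from the same binary.
stop_own_child() {
  local pid=$1
  kill "$pid" 2>/dev/null
  wait "$pid" 2>/dev/null
  return 0
}
```

Typical use would be `start_bounded 60 server.log <sx_server command>`, with `stop_own_child "$CHILD_PID"` in an `EXIT` trap. The PID is stored in a variable rather than echoed for a reason. A background child started inside `$(…)` would belong to the subshell and would hold its output pipe open until it exited.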

## Per-iteration procedure

1. Pick the first unchecked `[ ]` in the checklist.
2. Implement (test file or runner/harness change), lifting minimal repros from
   the review lane files (`plans/sx-review/{core,hosts,conformance}.md`); they
   are a ready-made corpus of confirmed repros.
3. Build and run the affected tests:
   `sx_build` (target ocaml), then
   `timeout 300 ./hosts/ocaml/_build/default/bin/run_tests.exe <test-name>`
   to run a single file. New `spec/tests/test-*.sx` files are auto-discovered.
4. Confirm green (a pin must PASS on current HEAD, because the fix has already landed).
5. Commit locally: `git add -A && git commit` with a `W14:` prefix.
6. Tick the box, prepend one dated line to the Progress log, and stop.
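
Steps 1 and 3 map onto two small shell helpers. This is a sketch only. `next_item` and `run_one` are hypothetical names, and `next_item` takes the path of this briefing file, whatever it is called in the worktree.

```bash
# next_item BRIEFING
# Step 1: print the first unchecked "- [ ]" checklist line of the briefing.
# Exit status 1 (and no output) means every box is already ticked.
next_item() {
  grep -m1 -E '^[[:space:]]*- \[ \] ' "$1"
}

# run_one TEST_NAME
# Step 3's single-file run, bounded by timeout as the guardrails require.
# Assumes it is invoked from the worktree root (the runner path is relative).
run_one() {
  timeout 300 ./hosts/ocaml/_build/default/bin/run_tests.exe "$1"
}
```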

## Checklist

### A. Test-debt pins: dc7aa709's landed fixes shipped without regression tests

Pin each confirmed-and-fixed finding with a minimal repro. Add suites to
`spec/tests/test-gate-pins.sx` (one `defsuite` per finding).

- [x] K18 [W7]: `expt` overflow now float-promotes (no 63-bit wrap)
- [x] K20 [W7]: `contains?` now supports dict key membership
- [x] K09/K11/K39 [W5]: longhand `unquote-splicing`, guard sentinel gensym, `do` IIFE-head
- [x] K49 [W8]: five void elements (area/base/embed/param/track) renderable
  (spec side; native regen drift → see Blocked). NB: the depth/cycle guard
  is K16 [W8], still OPEN, and is not a W14 pin target until its fix lands.
- [x] crit-2 [W1]: signal-return kont pinned NON-VACUOUSLY (side-effect
  sentinel across two tests; a plain assert would inherit the vacuity)
- [x] C1/C1b [W3]: command-channel crash guards pinned
  (`scripts/test-protocol-gate.sh`, seed for section E's fuzz suite)
- [x] S4 [hosts]: soft error pages not cached (HTTP-mode pin in
  `scripts/test-protocol-gate.sh`; NB S4 lives in hosts.md, not
  conformance; "housekeeping" was a mislabel from F-15's tag)
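
The C1/C1b and S4 pins both reduce to assertions over a captured server log. The sketch below is hypothetical. The function names are invented, and the log formats are assumed from the strings quoted in this briefing: `[sx-http]` render lines, the `[cache] … error page, not cached` gate line, `(error N "Malformed command line: ...")`, and `Fatal error`. It also assumes that each render logs exactly one `[sx-http]` line and that nothing else does. C1's third check (the follow-up epoch still evaluates) depends on the epoch reply format, so it is left out.

```bash
# assert_s4_not_cached LOG
# S4: two GETs of the same missing path must BOTH re-render, and the is_err
# cache gate must have logged its refusal.
assert_s4_not_cached() {
  local log=$1 renders
  renders=$(grep -c '\[sx-http\]' "$log")
  if [ "$renders" -ne 2 ]; then
    echo "FAIL S4: expected 2 [sx-http] render lines, got $renders" >&2
    return 1
  fi
  if ! grep -q '\[cache\].*error page, not cached' "$log"; then
    echo "FAIL S4: cache gate line missing" >&2
    return 1
  fi
}

# assert_c1_survived OUT
# C1/C1b: malformed input yields an (error N "Malformed command line: ...")
# reply, and no uncaught OCaml exception ("Fatal error") escapes.
assert_c1_survived() {
  local out=$1
  if ! grep -Eq '^\(error [0-9]+ "Malformed command line:' "$out"; then
    echo "FAIL C1: no Malformed-command-line error reply" >&2
    return 1
  fi
  if grep -q 'Fatal error' "$out"; then
    echo "FAIL C1: Fatal error escaped" >&2
    return 1
  fi
}
```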

### B. Runner/production env unification

- [ ] Audit runner-only bindings (`values`/`call-with-values` F7/K42, JS
  fake `sha3`/`equal?`/`apply`/`env-set!` shims JS5): inventory, plus a failing
  pin showing that a fresh `sx_server` reproduces the drift

### C. Harness honesty

- [ ] K19: MCP `mcp_tree.ml` harness primitive table drift vs `sx_primitives`
  (parity test)
- [ ] C22/K104: harness logs IO *before* invoking the mock (throwing-mock pin)
- [ ] C21: real perform/suspend mode in harness
- [ ] C23: adapter-dom render-output tests
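
K19's parity test reduces to a set difference between two name lists. This is a minimal sketch. It assumes both primitive tables have already been dumped one name per line; how to extract them from `mcp_tree.ml` and `sx_primitives` is left open. `primitive_drift` is a hypothetical helper.

```bash
# primitive_drift HARNESS_NAMES KERNEL_NAMES
# Prints names present in only one table, tagged with the side that has them.
# Returns 1 if the tables drift, 0 if they agree.
primitive_drift() {
  local drift
  # comm -3: column 1 = only in file 1, column 2 (tab-prefixed) = only in file 2.
  drift=$(comm -3 <(sort -u "$1") <(sort -u "$2") |
    awk -F'\t' '$1 != "" { print "harness-only: " $1 }
                $2 != "" { print "kernel-only:  " $2 }')
  if [ -n "$drift" ]; then
    printf '%s\n' "$drift"
    return 1
  fi
}
```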

### D. WASM corpus runner

- [ ] F2: promote conformance's `run_wasm.js` prototype into CI

### E. Epoch-loop protocol fuzz + skip-list

- [ ] C3/C4/C5/C6/C7: epoch protocol fuzz suite
- [ ] F10: hs-upstream skip-list for browser-only tests, so the remaining FAILs mean something
- [ ] C9: empty suite label
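
For F10, the intent is that a browser-only FAIL is ignored only when it is listed explicitly, so every other FAIL still breaks the run. This sketch assumes the failing test IDs and the skip-list are plain files with one ID per line. `unexpected_failures` is a hypothetical name.

```bash
# unexpected_failures FAILS SKIPLIST
# Prints failing IDs that are NOT on the skip-list (exact whole-line match;
# '#' comments and blank lines in the skip-list are ignored).
# Returns 1 if any unexpected failure remains.
unexpected_failures() {
  local left
  left=$(grep -vxFf <(grep -v -e '^#' -e '^$' "$2") "$1")
  if [ -n "$left" ]; then
    printf '%s\n' "$left"
    return 1
  fi
}
```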

### F. Differential battery

- [ ] F8: cross-host differential battery (same source, all hosts agree)
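
A differential battery compares hosts against each other rather than against a golden file. The loop below is a sketch under stated assumptions. Each host is driven by a command string that takes the source file as its last argument and prints its result on stdout. The host command strings are placeholders, because the actual per-host entry points are not named here. `differential` is a hypothetical helper.

```bash
# differential SRC CMD1 CMD2 [CMD...]
# Runs every host command on the same source file (each bounded by timeout)
# and reports every host whose output differs from the first host's.
# Each CMD is one string; it is deliberately word-split, with SRC appended.
differential() {
  local src=$1; shift
  local ref_cmd=$1 ref out cmd status=0
  ref=$(timeout 60 $ref_cmd "$src" 2>&1)
  for cmd in "${@:2}"; do
    out=$(timeout 60 $cmd "$src" 2>&1)
    if [ "$out" != "$ref" ]; then
      printf 'DIFF: %s disagrees with %s on %s\n' "$cmd" "$ref_cmd" "$src"
      status=1
    fi
  done
  return "$status"
}
```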

## Progress log (newest first)

- 2026-07-04: **S4 error-page-cache pin (item A.7); section A COMPLETE**.
  Extended `scripts/test-protocol-gate.sh` with an HTTP-mode case: start a fresh
  `sx_server.exe --http <random-port>` (timeout-bounded, own PID killed at
  end), GET the same nonexistent path twice, and assert that BOTH requests re-render
  (2 `[sx-http]` lines; pre-fix, the 2nd was cache-served at 0.0005s) and that
  the `[cache] … error page, not cached` is_err gate line appears. Findings
  from prototyping: a standalone worktree renders ALL docs pages as soft error
  pages (no content), so a positive "real page IS cached" control is not
  assertable here (documented in the script); startup takes ~12-15s (poll
  loop, 40s budget). 5/5 protocol-gate green + 267/0 sx pins. Test-only.
- 2026-07-04: **C1/C1b command-channel pins (item A.6)**. These are
  protocol-level pins, not .sx-suite pins: authored
  `scripts/test-protocol-gate.sh`, where each case spawns its OWN timeout-bounded
  `sx_server.exe` (no shared process touched) and asserts three things: an
  `(error N "Malformed command line: ...")` response is emitted, the
  follow-up epoch still evaluates (process survived), and no `Fatal error`
  escapes / exit is clean. Cases: C1 unterminated list (exact review repro),
  C1 plain-garbage line, C1b non-ASCII byte (`café`), plus a well-formed
  control session. 4/4 green. The script is deliberately structured to grow
  into section E's fuzz suite (C3–C7). Test-only.
- 2026-07-04: **crit-2 non-vacuous pin (item A.5)**. The original bug's
  signature (the handler value becomes the WHOLE program result, discarding
  every outer frame *including the covering test's own assert*) means a
  plain `(assert= repro expected)` pin would pass vacuously on regression.
  Added suite `gate-crit2-signal-return-kont` with a **side-effect sentinel**:
  test 1 runs both repros (`("outer" 43 "end")` list shape + `raise-continuable`
  → 143), then `set!`s a top-level flag; test 2 independently asserts the flag.
  If the continuation is ever dropped again, test 1 "passes" but test 2
  fails loudly. A third test pins the exact shipped-test expr (51). Verified
  both repro shapes live via sx_eval first. 267 passed / 0 failed. Test-only.
- 2026-07-03: **K49 void-elements pin (item A.4) + regen-drift DISCOVERY**.
  Corrected the checklist label first: K49 is "five void elements
  unrenderable" (core.md:335), not the depth guard (that's K16, OPEN). Added
  suite `gate-K49-void-elements-renderable` (3 tests): spec `HTML_TAGS`
  contains all five; `(render-to-html '(base :href "x") (make-env))` →
  `<base href="x" />`; all five render self-closing. Runner-env gotchas:
  `current-env`/`symbol` are not bound in run_tests, so use `(make-env)` and
  literal quoted forms. **Discovery:** the first draft pinned via the
  runner's native `render-html` and FAILED: `hosts/ocaml/lib/sx_render.ml`
  (generated) was never regenerated after dc7aa709's spec fix, so the native
  render path still errors on the five tags. Recorded under Blocked; live
  evidence for F13 (regen-diff gate). 264 passed / 0 failed. Test-only.
- 2026-07-03: **K09/K11/K39 W5 special-form pins (item A.3)**. Three suites
  added to `spec/tests/test-gate-pins.sx`: `gate-K09-longhand-unquote-splicing`
  (R7RS longhand `(unquote-splicing X)` now splices, including the empty-list case;
  shorthand still works), `gate-K11-guard-reraise-forgeable` (a body/clause
  value shaped like `(list '__guard-reraise__ X)` is returned as data, not
  misread as a re-raise, because the sentinel is now gensym'd), and `gate-K39-do-iife-head`
  (`(do ((fn (x) x) 5) 99)` → 99, not a misparsed do-loop; exact core.md
  repro). Gotchas hit and fixed: quasiquoted bare idents are *symbols*, not
  strings, and `assert=` compares with `=` (not `equal?`, which returns false
  on these spliced lists). 261 passed / 0 failed under OCaml run_tests. Test-only.
- 2026-07-03: **K20 contains?-dict pin (item A.2)**. Mapped K-codes by
  core.md severity order (K17 append!, K18 expt, K19 harness-drift, K20
  contains?-dict). Added suite `gate-K20-contains-dict` to
  `spec/tests/test-gate-pins.sx` (4 tests): present dict key → true, missing
  key → false, list membership unchanged, string substring unchanged. Repro
  from core.md ("(contains? {:a 1} :a) threw `contains?: 2 args`"). 8/8 green
  across both suites under OCaml run_tests. Test-only.
- 2026-07-03: **K18 expt-overflow pin (item A.1)**. Bootstrapped this briefing
  from PLAN.md §W14 (the referenced file did not exist yet). Added
  `spec/tests/test-gate-pins.sx` with suite `gate-K18-expt-overflow` (4 tests):
  small exponents stay exact (`2^0=1`, `2^10=1024`), `2^62 > 0` (no negative
  63-bit wrap), `2^100 > 0` (no wrap-to-zero), and `2^100` is a number (float
  promotion). Verified 4/4 green under the OCaml run_tests kernel. Test-only.
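
The two positivity checks in the K18 pin correspond to the two ways a 63-bit wrap can fail. Assume the pre-fix `expt` multiplied in OCaml's native `int`, which is 63-bit two's complement with arithmetic modulo $2^{63}$. Results outside the representable range then map back into it as follows:

$$
\begin{aligned}
\texttt{int} &\in [-2^{62},\ 2^{62}-1] \\
2^{62} &\equiv -2^{62} \pmod{2^{63}} \quad (\texttt{min\_int}) \\
2^{k} &\equiv 0 \pmod{2^{63}} \quad \text{for } k \ge 63,\ \text{e.g. } 2^{100}
\end{aligned}
$$

An unpromoted `2^62` therefore comes back as the negative `min_int`, and `2^100` comes back as exactly 0. A single `> 0` assertion at each exponent catches both modes, while a float-promoted result is positive in both cases.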

## Blocked

- **K49 native path: sx_render.ml regen drift** (found 2026-07-03 while
  pinning A.4): dc7aa709 fixed HTML_TAGS in `spec/render.sx` but never re-ran
  `hosts/ocaml/bootstrap_render.py`, so the generated
  `hosts/ocaml/lib/sx_render.ml` still carries a stale `html_tags_list`
  without area/base/embed/param/track. The runner's native `render-html`
  convenience (and any native fast-path render) therefore STILL throws
  `Undefined symbol: base`; dc7aa709's "verified on the native binary" claim
  did not cover this path. Fix = regen (hosts lane, semantics-adjacent, so out
  of scope for this test-only loop). This is a live instance of **F13**
  (regen-diff CI gate, section-B/D territory): a regen-diff check would have
  caught it at commit time. The K49 pin covers the spec side only; when the
  regen lands, extend the suite with `render-html`-path assertions.
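
A regen-diff gate (F13) reruns the generator and fails if the committed output differs. This is a minimal sketch. It assumes the generator can be made to print the generated module to stdout. The real command-line interface of `bootstrap_render.py` is not documented here, so the generator command is passed in as a parameter. `regen_diff_gate` is a hypothetical helper.

```bash
# regen_diff_gate COMMITTED_FILE GENERATOR_CMD [ARGS...]
# Regenerates into a temp file and diffs it against the committed copy.
# Returns 0 if in sync, 1 on drift (unified diff printed), 2 if the generator failed.
regen_diff_gate() {
  local committed=$1; shift
  local fresh
  fresh=$(mktemp) || return 2
  if ! timeout 300 "$@" >"$fresh"; then
    echo "regen failed: $*" >&2
    rm -f "$fresh"
    return 2
  fi
  if diff -u "$committed" "$fresh"; then
    rm -f "$fresh"
    return 0
  fi
  rm -f "$fresh"
  echo "DRIFT: $committed is stale; rerun the generator" >&2
  return 1
}
```

A check like this, run against `hosts/ocaml/lib/sx_render.ml`, would have failed as soon as `spec/render.sx` changed without a regen, which is the drift described above.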