Files
rose-ash/plans/agent-briefings/sx-gate-loop.md
giles bf5c238218 W14: C23 adapter-dom render-output tests (test-only) — section C complete
The DOM adapter's only test file asserts membership predicates ("if is a
render form") — zero tests inspect what render-to-dom actually builds
(hosts.md C23), and the file is excluded from the runner as browser-only.

Discovery: the browser-only assumption is false for render output —
(import (web adapter-dom)) disk-resolves in the OCaml runner and
render-to-dom works against its mock DOM. Add
web/tests/test-adapter-dom-render.sx (8 tests) pinning the adapter's real
output contract (probed first): text renders as a nodeType-3 child text
node; when/map wrap output in a FRAGMENT child; if inlines the chosen
branch; attrs/class/id land on the element; voids have no children.

Auto-included in default runs — first render-output coverage of the
1512-line adapter in the standard gate. Remaining depth (boolean attrs,
on-*/bind/ref/key, reactive attrs, hydration cursor) tracked on the
checklist. 254/0 standalone.

Test-only: no semantics edits, no push.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 02:55:26 +00:00

242 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# W14 — Test gate & conformance infrastructure loop
Forge agent **ws-W14**. Role: build out **W14** from the SX review remediation plan
(`plans/sx-review/PLAN.md`, §"W14. Test gate & conformance infrastructure") —
*the enabler that makes every other fix verifiable*. One checklist item per fire.
You are on branch `loops/sx-ws-w14`, worktree `/root/rose-ash-loops/sx-ws-w14`.
## Hard guardrails (read every fire)
- **TEST-ONLY.** No semantics edits. Do NOT touch `spec/evaluator.sx`,
`spec/primitives.sx`, `spec/parser.sx`, `spec/render.sx`, the OCaml kernel,
or any host runtime. W14 pins behavior with tests and productionizes the
*test/runner* surface; the actual fixes are other workstreams (W1W12).
A pin that *fails* means the finding regressed — do NOT relax the assertion,
record it as a blocker.
- **NO PUSH.** Commit locally on `loops/sx-ws-w14` only. Never push; never touch
`main` or `architecture`.
- **`.sx` files: use `sx-tree` MCP tools only** (a hook blocks Read/Write/Edit
on `.sx`). `sx_write_file` takes params **`file`** and **`source`** (NOT
`content` — a wrong key yields a `yojson … got null` error and no write).
`.md`/`.sh`/`.ml` files: normal tools are fine.
- **Never `pkill`/`kill` `sx_server`** — sibling loops share the binary. Bound
every run with `timeout` (e.g. `timeout 300 …`); if it hangs, let the timeout end it.
- **One item per fire, then stop.** No batching.
## Per-iteration procedure
1. Pick the first unchecked `[ ]` in the checklist.
2. Implement (test file or runner/harness change), lifting minimal repros from
the review lane files (`plans/sx-review/{core,hosts,conformance}.md`) — they
are a ready-made corpus of confirmed reprs.
3. Build + run the affected tests:
`sx_build` (target ocaml) then
`timeout 300 ./hosts/ocaml/_build/default/bin/run_tests.exe <test-name>`
to run a single file. New `spec/tests/test-*.sx` files are auto-discovered.
4. Confirm green (a pin must PASS on current HEAD — the fix already landed).
5. Commit locally: `git add -A && git commit` with a `W14:` prefix.
6. Tick the box, prepend one dated line to the Progress log, stop.
## Checklist
### A. Test-debt pins — dc7aa709's landed fixes shipped without regression tests
Pin each confirmed-and-fixed finding with a minimal repro. Add suites to
`spec/tests/test-gate-pins.sx` (one `defsuite` per finding).
- [x] K18 [W7] — `expt` overflow now float-promotes (no 63-bit wrap)
- [x] K20 [W7] — `contains?` now supports dict key membership
- [x] K09/K11/K39 [W5] — longhand `unquote-splicing`, guard sentinel gensym, `do` IIFE-head
- [x] K49 [W8] — five void elements (area/base/embed/param/track) renderable
(spec side; native regen drift → see Blocked). NB: the depth/cycle guard
is K16 [W8], still OPEN — not a W14 pin target until its fix lands
- [x] crit-2 [W1] — signal-return kont pinned NON-VACUOUSLY (side-effect
sentinel across two tests; a plain assert would inherit the vacuity)
- [x] C1/C1b [W3] — command-channel crash guards pinned
(`scripts/test-protocol-gate.sh`, seed for section E's fuzz suite)
- [x] S4 [hosts] — soft error pages not cached (HTTP-mode pin in
`scripts/test-protocol-gate.sh`; NB S4 lives in hosts.md, not
conformance — "housekeeping" was a mislabel from F-15's tag)
### B. Runner/production env unification
- [x] Audit runner-only bindings — inventory + bidirectional ledger in
`scripts/test-env-parity.sh` (KNOWN_DRIFT: values, call-with-values,
contains-char?, trim-right, sha3-256; consequence pin:
canonical-serialize broken on server; BOTH runners' sha3-256 are FAKE
stubs → test CIDs ≠ production CIDs)
### C. Harness honesty
- [x] K19 — harness/runtime parity pinned (`scripts/test-harness-parity.sh`:
drives mcp_tree sx_eval over JSON-RPC vs fresh sx_server over epoch,
12-probe battery from the finding, errors compared by message)
- [x] C22/K104 — FIXED harness (spec/harness.sx make-interceptor: log entry
appended before the mock runs, :result updated via dict-set!) + 3 pins
- [x] C21 — BUILT `harness-run-perform` (spec/harness.sx): drives real CEK
suspend/resume, services performs from session mocks, C22-style
logging; 5 pins incl. the S10 map-over-perform probe (CEK keeps all
elements — the drop class is serving-JIT-side). Runner-only (needs
cek-* driver bindings)
- [x] C23 — adapter-dom render-output tests
(`web/tests/test-adapter-dom-render.sx`, 8 tests vs runner mock DOM;
follow-up depth still open: boolean attrs, on-*/bind/ref/key,
reactive attrs, hydration cursor)
### D. WASM corpus runner
- [ ] F2 — promote conformance's `run_wasm.js` prototype into CI
### E. Epoch-loop protocol fuzz + skip-list
- [ ] C3/C4/C5/C6/C7 — epoch protocol fuzz suite
- [ ] F10 — hs-upstream skip-list so browser-only FAILs mean something
- [ ] C9 — empty suite label
### F. Differential battery
- [ ] F8 — cross-host differential battery (same source, all hosts agree)
## Progress log (newest first)
- 2026-07-04 — **C23 adapter-dom render-output tests (item C.4) — section C
COMPLETE**. Key discovery: the "browser-only" exclusion of adapter-dom
testing is FALSE for render output — `(import (web adapter-dom))`
disk-resolves in the OCaml runner and `render-to-dom` works against its
mock DOM (dom-* → host-* → mock elements). New
`web/tests/test-adapter-dom-render.sx` (8 tests): tag/text-child-node,
class+id, ordered children, void element, when-false empty FRAGMENT,
when-true branch-in-fragment, map N-children-in-fragment, if inlines
branch. Probed the adapter's output contract first (text = nodeType-3
child; control flow = FRAGMENT wrapper; if inlines). Auto-included in
default runs (not on the exclusion list) — first render-output coverage
of the 1512-line adapter in the standard gate. Follow-up depth (boolean
attrs, on-*/bind/ref/key, reactive, hydration) noted on the checklist.
254/0 standalone. Test-only.
- 2026-07-04 — **C21 perform-mode harness (item C.3)**. Added
`harness-run-perform` to spec/harness.sx (exported): drives
`make-cek-state`/`cek-step-loop`, services each
`(perform {:op X :args L})` suspension from the session's platform mocks
(entry logged before invocation, C22-consistent), `cek-resume`s with the
mock value, loops to terminal. Self-recursion via the `(self self …)`
pattern (avoids letrec-injection K06 territory). Extracted the arity
dispatch into shared `harness-invoke-mock`. 5 pins in
`gate-C21-perform-mode-harness` — notably the **S10 probe**: `(map (fn (u)
(perform …)) '("a" "b" "c"))` keeps ALL elements through 3 suspensions on
the CEK path, confirming the element-drop class is serving-JIT-side, not
CEK. Caveat noted in the docstring: needs the runner's cek-* driver
bindings (absent on bare sx_server/MCP — the env-parity theme again).
290/0. Test-infra-only.
- 2026-07-04 — **C22/K104 throwing-mock fix + pins (item C.2)**. First
actual FIX of the loop — in scope because spec/harness.sx is W14-owned
test infrastructure (PLAN approach item 4 assigns "log IO before invoking
the mock" to W14). TDD: reproduced pre-fix (caught error, 0 log entries),
then restructured `make-interceptor` to append the entry BEFORE the mock
runs (`:result nil` while pending, `dict-set!` in place on return).
Verified: throwing mock leaves entry, happy path updates result, mixed
sequence counts all 3. Added suite `gate-C22-throwing-mock-logged`
(3 tests). Harness self-suite (15) + test-relate-picker (only other
harness consumer) green; 285/0 pins run. Tooling notes: replace/insert
tools take `new_source` (not `replacement`); find_all paths still
disagree with read_subtree/replace_node on define-library files —
sx_write_file remains the reliable route. Test-infra-only.
- 2026-07-04 — **K19 harness-parity pin (item C.1)**. Authored
`scripts/test-harness-parity.sh`: drives `mcp_tree.exe` `sx_eval` with
raw JSON-RPC over stdio and a fresh `sx_server.exe` over the epoch
protocol, running the finding's exact 12-probe battery (empty?/get/
split/equal?/contains?/keyword-name/char-code/parse-number) through both
and failing on ANY divergence. Errors normalized to their inner message
so identical failures compare equal (`keyword-name :kw` errors the same
way on both — keywords evaluate to strings before the call). Result:
12/12 parity — dc7aa709's 8-entry stopgap alignment holds; this pin keeps
it honest until the real fix (mcp_tree links sx_primitives) lands in the
hosts lane. Test-only.
- 2026-07-04 — **Section B: env-parity audit + ledger**. Probed a fresh
`sx_server` over the epoch protocol (`deps-check` + live eval). Confirmed
runner-only drift: `values`/`call-with-values` (run_tests.ml:1131/1140),
`contains-char?` (rt.ml:728 + rt.js:85), `trim-right` (**JS runner only**
— absent even from the OCaml runner), `sha3-256` (rt.ml:745 + rt.js:88).
Consequence verified live: `(canonical-serialize 42)` on the server →
`Undefined symbol: contains-char?` (content addressing broken for ANY
number outside the runners). **Worse than the finding**: BOTH runners'
`sha3-256` are FAKE stubs (OCaml uses `Hashtbl.hash`!) while production
has real `crypto-sha3-256` — every CID computed in tests differs from
production CIDs. Authored `scripts/test-env-parity.sh` as a bidirectional
ledger: MUST_HAVE regressions fail; a KNOWN_DRIFT binding *appearing*
also fails (forces ledger + consequence-pin update when W5/W7/W12 land
fixes). 7/7 green. Test-only.
- 2026-07-04 — **S4 error-page-cache pin (item A.7) — section A COMPLETE**.
Extended `scripts/test-protocol-gate.sh` with an HTTP-mode case: fresh
`sx_server.exe --http <random-port>` (timeout-bounded, own PID killed at
end), GET the same nonexistent path twice, assert BOTH requests re-render
(2 `[sx-http]` lines — pre-fix the 2nd was cache-served at 0.0005s) and
the `[cache] … error page, not cached` is_err gate line appears. Findings
from prototyping: standalone worktree renders ALL docs pages as soft error
pages (no content), so a positive "real page IS cached" control is not
assertable here — documented in the script; startup takes ~12-15s (poll
loop, 40s budget). 5/5 protocol-gate green + 267/0 sx pins. Test-only.
- 2026-07-04 — **C1/C1b command-channel pins (item A.6)**. These are
protocol-level, not .sx-suite pins: authored
`scripts/test-protocol-gate.sh` — each case spawns its OWN timeout-bounded
`sx_server.exe` (no shared process touched) and asserts three things: an
`(error N "Malformed command line: ...")` response is emitted, the
follow-up epoch still evaluates (process survived), and no `Fatal error`
escapes / exit is clean. Cases: C1 unterminated list (exact review repro),
C1 plain-garbage line, C1b non-ASCII byte (`café`), plus a well-formed
control session. 4/4 green. The script is deliberately structured to grow
into section E's fuzz suite (C3C7). Test-only.
- 2026-07-04 — **crit-2 non-vacuous pin (item A.5)**. The original bug's
signature — handler value becomes the WHOLE program result, discarding
every outer frame *including the covering test's own assert* — means a
plain `(assert= repro expected)` pin would pass vacuously on regression.
Added suite `gate-crit2-signal-return-kont` with a **side-effect sentinel**:
test 1 runs both repros (`("outer" 43 "end")` list shape + `raise-continuable`
→ 143) then `set!`s a top-level flag; test 2 independently asserts the flag
— if the continuation is ever dropped again, test 1 "passes" but test 2
fails loudly. Third test pins the exact shipped-test expr (51). Verified
both repro shapes live via sx_eval first. 267 passed / 0 failed. Test-only.
- 2026-07-03 — **K49 void-elements pin (item A.4) + regen-drift DISCOVERY**.
Corrected the checklist label first: K49 is "five void elements
unrenderable" (core.md:335), not the depth guard (that's K16, OPEN). Added
suite `gate-K49-void-elements-renderable` (3 tests): spec `HTML_TAGS`
contains all five; `(render-to-html '(base :href "x") (make-env))`
`<base href="x" />`; all five render self-closing. Runner-env gotchas:
`current-env`/`symbol` are not bound in run_tests — use `(make-env)` and
literal quoted forms. **Discovery:** the first draft pinned via the
runner's native `render-html` and FAILED — `hosts/ocaml/lib/sx_render.ml`
(generated) was never regenerated after dc7aa709's spec fix, so the native
render path still errors on the five tags. Recorded under Blocked; live
evidence for F13 (regen-diff gate). 264 passed / 0 failed. Test-only.
- 2026-07-03 — **K09/K11/K39 W5 special-form pins (item A.3)**. Three suites
added to `spec/tests/test-gate-pins.sx`: `gate-K09-longhand-unquote-splicing`
(R7RS longhand `(unquote-splicing X)` now splices, incl. empty-list case;
shorthand still works), `gate-K11-guard-reraise-forgeable` (a body/clause
value shaped like `(list '__guard-reraise__ X)` is returned as data, not
misread as a re-raise — sentinel is now gensym'd), `gate-K39-do-iife-head`
(`(do ((fn (x) x) 5) 99)` → 99, not a misparsed do-loop — exact core.md
repro). Gotchas hit and fixed: quasiquoted bare idents are *symbols* not
strings, and `assert=` compares with `=` (not `equal?`, which returns false
on these spliced lists). 261 passed / 0 failed under OCaml run_tests. Test-only.
- 2026-07-03 — **K20 contains?-dict pin (item A.2)**. Mapped K-codes by
core.md severity order (K17 append!, K18 expt, K19 harness-drift, K20
contains?-dict). Added suite `gate-K20-contains-dict` to
`spec/tests/test-gate-pins.sx` (4 tests): present dict key → true, missing
key → false, list membership unchanged, string substring unchanged. Repro
from core.md ("(contains? {:a 1} :a) threw `contains?: 2 args`"). 8/8 green
across both suites under OCaml run_tests. Test-only.
- 2026-07-03 — **K18 expt-overflow pin (item A.1)**. Bootstrapped this briefing
from PLAN.md §W14 (the referenced file did not exist yet). Added
`spec/tests/test-gate-pins.sx` with suite `gate-K18-expt-overflow` (4 tests):
small exponents stay exact (`2^0=1`, `2^10=1024`), `2^62 > 0` (no negative
63-bit wrap), `2^100 > 0` (no wrap-to-zero), `2^100` is a number (float
promotion). Verified 4/4 green under the OCaml run_tests kernel. Test-only.
## Blocked
- **K49 native path — sx_render.ml regen drift** (found 2026-07-03 while
pinning A.4): dc7aa709 fixed HTML_TAGS in `spec/render.sx` but never re-ran
`hosts/ocaml/bootstrap_render.py`, so the generated
`hosts/ocaml/lib/sx_render.ml` still carries a stale `html_tags_list`
without area/base/embed/param/track. The runner's native `render-html`
convenience (and any native fast-path render) therefore STILL throws
`Undefined symbol: base` — dc7aa709's "verified on the native binary" claim
did not cover this path. Fix = regen (hosts lane, semantics-adjacent — out
of scope for this test-only loop). This is a live instance of **F13**
(regen-diff CI gate, section-B/D territory): a regen-diff check would have
caught it at commit time. The K49 pin covers the spec side only; when the
regen lands, extend the suite with `render-html`-path assertions.