rose-ash/plans/agent-briefings/sx-gate-loop.md
giles 8181421cf1 scripts: forge-driven fix-loop launcher (git→gitea→agentic→tmux)
First live test of the sx-forge technology driving a real work session:
- sx-fix-up.sh <forge-agent> <briefing.md>: reads the agent's briefing FROM
  the rose-ash/sx-review forge (agentic-sx branch), materialises a git
  worktree + branch (loops/sx-<slug>), and spins up a tmux+claude session
  briefed from the forge. Commits are LOCAL by default (no push).
- sx-fix-down.sh [--clean]: stop the sx-fix session; --clean removes worktrees.
- plans/agent-briefings/sx-gate-loop.md: W14 (test gate) briefing — the safe
  first payload (test-only, cannot regress the 5762p/274f baseline), scoped
  commit-no-push with hard guardrails.

Verified live: launcher read the W14 briefing from the forge, created worktree
/root/rose-ash-loops/sx-ws-w14 on loops/sx-ws-w14, booted claude, and the agent
picked up the briefing. Watch: tmux a -t sx-fix. Note: MCP servers need /mcp
auth in a fresh worktree (agent works via Bash meanwhile).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 22:27:47 +00:00


sx-gate loop — W14 test gate (first live test of git→gitea→agentic→tmux)

Forge agent: agents/ws-W14 in the rose-ash/sx-review forge (git-sx/gitea-sx/agentic-sx). Goal (from the forge briefing): make the verification infrastructure trustworthy — runner env == production env, a WASM corpus runner, harness honesty, and pinning tests for the fixes already landed. This is W14 in plans/sx-review/PLAN.md (read that section — it lists the findings). Findings: C0b C9 C21 C22 C23 C3 C4 C5 C6 C7 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 K19 K104.

Why this workstream first

The review's prime directive: no semantic fix should merge before its pinning test + a working gate exist, because the verification infra currently can't tell you whether a fix works. W14 produces that infra. It changes no language semantics, so it cannot regress the 5762p/274f baseline — the ideal first payload while we test the agentic launch technology.

Hard guardrails (this is a monitored test loop)

  • Commit locally, do NOT push. No git push at all. (This is a test; the maintainer reviews before anything reaches origin.)
  • Stay in W14 scope — tests, runners, harness, gate tooling. Do NOT edit spec/*.sx, hosts/ocaml/lib/*.ml, or any language semantics. If a task tempts you toward semantics, skip it and note why in the Progress log.
  • Never pkill sx_server (shared binary). Bound every sx_server/build/test with timeout.
  • You are on branch loops/sx-gate in worktree /root/rose-ash-loops/sx-gate. Build/test here only.
  • If the OCaml build or full suite is involved, compare against the recorded baseline 5762 passed / 274 failed (fail set is the 273 hs-* + 1 r7rs radix; see PLAN W14/F10).
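The timeout and baseline guardrails above can be sketched as a small wrapper. This is a minimal Python sketch, not something the repo is known to contain: `run_bounded`, `check_baseline`, and the 600-second default are hypothetical names and values. Only the baseline counts (5762 passed / 274 failed) come from this briefing.

```python
import subprocess

# Baseline counts taken from this briefing (5762 passed / 274 failed).
BASELINE_PASSED = 5762
BASELINE_FAILED = 274


def run_bounded(cmd, timeout_s=600):
    """Run cmd with a hard wall-clock bound; return (exit_code, stdout).

    On timeout, subprocess.run kills the child it started (and only that
    child), so no pkill is needed. Exit code 124 mirrors coreutils `timeout`.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return 124, ""
    return proc.returncode, proc.stdout


def check_baseline(passed, failed):
    """Return a list of human-readable regressions against the baseline."""
    problems = []
    if passed < BASELINE_PASSED:
        problems.append(f"passed dropped: {passed} < {BASELINE_PASSED}")
    if failed > BASELINE_FAILED:
        problems.append(f"failed rose: {failed} > {BASELINE_FAILED}")
    return problems
```

An empty list from `check_baseline` means the run is no worse than the recorded baseline. Comparing counts alone does not prove the fail set is unchanged, so a stricter gate would compare the failing test names as well.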

One iteration per fire: pick the first unchecked [ ], implement, test, commit (no push), tick the box, prepend one dated line to the Progress log, then stop.

  • Pin the dc7aa709 quick-wins batch. Add regression tests (spec/tests/ or a new suite) that lock in the fixes that currently have none: K09 unquote-splicing longhand splices; K11 guard re-raise sentinel is unforgeable ((guard (e (true (list 'quoted x))) ...) returns the list); K18 (expt 2 100) is a float not 0; K20 (contains? {:a 1} :a) is true; K39 (do ((fn (x) x) 5) 99) → 99; K49 the five void elements render. (K02 is already non-vacuously covered.) Confirm they pass on the current binary.
  • Pin C1/C1b/S4 at the host level (a small OCaml or shell test): a malformed command line returns an error response and the process survives; an error page is not cached.
  • WASM corpus runner (F2). Stand up a Node harness that runs a curated spec/tests subset against the shipped WASM kernel (seed: the conformance lane's run_wasm.js pattern, referenced in PLAN). Curated subset, not the full 6k (js_of_ocaml is ~24s/test — see F18). Wire it as a script.
  • Harness honesty (C22/K104): make spec/harness.sx log the IO call before invoking the mock so a throwing mock is recorded. Add a test that a throwing mock leaves a log entry.
  • Runner-vs-prod env audit (F7/K42): list every binding that exists in run_tests.ml but not in the production kernel env (values/call-with-values are the known ones). Write the audit to plans/sx-review/runner-env-gap.md. (Fixing them is later; the audit is the W14 task.)
  • Protocol fuzz suite (C3/C4/C5/C6): a bounded test that feeds the epoch loop malformed lines ((epoch), (epoch foo), stray (io-response …), two-exprs-per-line) and asserts the process never dies and responses stay correctly tagged.
  • hs-upstream skip-list (F10/F18): make the native runner's 273 hs-* failures a skip-list so a red FAIL column means something. Record the count moved.
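The harness-honesty item (C22/K104) turns on ordering: the log entry has to be written before the mock runs, otherwise a throwing mock leaves no trace. The real change belongs in spec/harness.sx. What follows is only a Python analogue of the pattern, and `IOHarness`, `call`, `throwing_mock`, and the `fetch` operation name are all hypothetical.

```python
class IOHarness:
    """Records every IO call, including calls whose mock raises."""

    def __init__(self, mocks):
        self.mocks = mocks  # maps operation name -> callable
        self.log = []       # (operation, args) tuples in call order

    def call(self, op, *args):
        # Log first: if the mock raises, the attempt is already recorded.
        self.log.append((op, args))
        return self.mocks[op](*args)


def throwing_mock(*_args):
    raise RuntimeError("mock failure")


harness = IOHarness({"fetch": throwing_mock})
try:
    harness.call("fetch", "/some/path")
except RuntimeError:
    pass

# The throwing mock still left a log entry.
assert harness.log == [("fetch", ("/some/path",))]
```

Swapping the two statements in `call` reproduces the bug the item describes: the exception propagates before the append, and the final assertion fails.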

Progress log (newest first)

  • (none yet — first fire will add the first entry)
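The per-fire bookkeeping (tick the first unchecked box, prepend one dated line to the Progress log) can be sketched as below. It assumes the briefing source stores tasks as Markdown `- [ ]` items and has a heading line that starts with "Progress log". Both that layout and the `tick_and_log` helper are assumptions for illustration, not something the launcher is known to provide.

```python
import datetime


def tick_and_log(text, note, today=None):
    """Tick the first '- [ ]' item and prepend a dated Progress log line."""
    today = today or datetime.date.today().isoformat()
    lines = text.split("\n")

    # Tick only the first unchecked box.
    for i, line in enumerate(lines):
        if line.lstrip().startswith("- [ ]"):
            lines[i] = line.replace("- [ ]", "- [x]", 1)
            break
    else:
        raise ValueError("no unchecked item left")

    # Newest first: insert directly below the Progress log heading.
    # Matching on the line start avoids prose that merely mentions the log.
    for i, line in enumerate(lines):
        if line.lstrip("# ").startswith("Progress log"):
            lines.insert(i + 1, f"- {today}: {note}")
            break
    else:
        raise ValueError("no Progress log heading found")

    return "\n".join(lines)
```

Raising when no unchecked item remains gives the loop a clear stop condition instead of silently logging a fire that did no work.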

Recording back to the forge

After each commit, note the sha here; the maintainer (or a later step) records it as a test-kind commit on agents/ws-W14 in the forge so the program stays the system of record.