First live test of the sx-forge technology driving a real work session:

- sx-fix-up.sh <forge-agent> <briefing.md>: reads the agent's briefing FROM the rose-ash/sx-review forge (agentic-sx branch), materialises a git worktree + branch (loops/sx-<slug>), and spins up a tmux+claude session briefed from the forge. Commits are LOCAL by default (no push).
- sx-fix-down.sh [--clean]: stop the sx-fix session; --clean removes worktrees.
- plans/agent-briefings/sx-gate-loop.md: W14 (test gate) briefing, the safe first payload (test-only, cannot regress the 5762p/274f baseline), scoped commit-no-push with hard guardrails.

Verified live: launcher read the W14 briefing from the forge, created worktree /root/rose-ash-loops/sx-ws-w14 on loops/sx-ws-w14, booted claude, and the agent picked up the briefing.

Watch: tmux a -t sx-fix.

Note: MCP servers need /mcp auth in a fresh worktree (agent works via Bash meanwhile).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
sx-gate loop — W14 test gate (first live test of git→gitea→agentic→tmux)
Forge agent: agents/ws-W14 in the rose-ash/sx-review forge (git-sx/gitea-sx/agentic-sx).
Goal (from the forge briefing): make the verification infrastructure trustworthy — runner env
== production env, a WASM corpus runner, harness honesty, and pinning tests for the fixes already
landed. This is W14 in plans/sx-review/PLAN.md (read that section — it lists the findings).
Findings: C0b C9 C21 C22 C23 C3 C4 C5 C6 C7 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 K19 K104.
Why this workstream first
The review's prime directive: no semantic fix should merge before its pinning test + a working gate exist, because the verification infra currently can't tell you whether a fix works. W14 produces that infra. It changes no language semantics, so it cannot regress the 5762p/274f baseline — the ideal first payload while we test the agentic launch technology.
Hard guardrails (this is a monitored test loop)
- Commit locally, do NOT push. No `git push` at all. (This is a test; the maintainer reviews before anything reaches origin.)
- Stay in W14 scope: tests, runners, harness, gate tooling. Do NOT edit `spec/*.sx`, `hosts/ocaml/lib/*.ml`, or any language semantics. If a task tempts you toward semantics, skip it and note why in the Progress log.
- Never `pkill sx_server` (shared binary). Bound every `sx_server`/build/test with `timeout`.
- You are on branch `loops/sx-gate` in worktree `/root/rose-ash-loops/sx-gate`. Build/test here only.
- If the OCaml build or full suite is involved, compare against the recorded baseline 5762 passed / 274 failed (fail set is the 273 hs-* + 1 r7rs radix; see PLAN W14/F10).
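
The two mechanical guardrails above (bound every command with `timeout`, compare results against the recorded baseline) could be wrapped roughly as follows. This is a minimal sketch, not the project's tooling: the helper names `bounded` and `check_baseline` are hypothetical, and it assumes GNU coreutils `timeout` is on the PATH. Only the 5762/274 figures come from this briefing.

```bash
# Hypothetical helpers; only the 5762p/274f baseline comes from the briefing.
BASELINE_PASSED=5762
BASELINE_FAILED=274

# Run a command under a wall-clock limit.
# Usage: bounded <seconds> <command> [args...]
# GNU `timeout` exits with status 124 when the limit is hit.
bounded() {
  limit=$1
  shift
  timeout "$limit" "$@"
}

# Compare a run's counts with the recorded baseline.
# Usage: check_baseline <passed> <failed>
# Succeeds when the run has no fewer passes and no more failures
# than the baseline (new pinning tests may raise the pass count).
check_baseline() {
  if [ "$1" -lt "$BASELINE_PASSED" ] || [ "$2" -gt "$BASELINE_FAILED" ]; then
    echo "REGRESSION: ${1}p/${2}f vs baseline ${BASELINE_PASSED}p/${BASELINE_FAILED}f"
    return 1
  fi
  echo "OK: ${1}p/${2}f vs baseline ${BASELINE_PASSED}p/${BASELINE_FAILED}f"
}
```

A caller would wrap its own build or test command, for example `bounded 600 <command>`, then feed the reported counts to `check_baseline`. The 600-second limit is an arbitrary illustration, not a value from the plan.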
One iteration per fire — pick the first unchecked [ ], implement, test, commit (no push),
tick the box, prepend one dated line to the Progress log, then stop.
- [ ] Pin the `dc7aa709` quick-wins batch. Add regression tests (spec/tests/ or a new suite) that lock in the fixes that currently have none: K09 `unquote-splicing` longhand splices; K11 guard re-raise sentinel is unforgeable (`(guard (e (true (list 'quoted x))) ...)` returns the list); K18 `(expt 2 100)` is a float, not 0; K20 `(contains? {:a 1} :a)` is true; K39 `(do ((fn (x) x) 5) 99)` → 99; K49 the five void elements render. (K02 is already non-vacuously covered.) Confirm they pass on the current binary.
- [ ] Pin C1/C1b/S4 at the host level (a small OCaml or shell test): a malformed command line returns an error response and the process survives; an error page is not cached.
- [ ] WASM corpus runner (F2). Stand up a Node harness that runs a curated spec/tests subset against the shipped WASM kernel (seed: the conformance lane's `run_wasm.js` pattern, referenced in PLAN). Curated subset, not the full 6k (js_of_ocaml is ~24s/test; see F18). Wire it as a script.
- [ ] Harness honesty (C22/K104): make `spec/harness.sx` log the IO call before invoking the mock so a throwing mock is recorded. Add a test that a throwing mock leaves a log entry.
- [ ] Runner-vs-prod env audit (F7/K42): list every binding that exists in `run_tests.ml` but not in the production kernel env (`values`/`call-with-values` are the known ones). Write the audit to `plans/sx-review/runner-env-gap.md`. (Fixing them is later; the audit is the W14 task.)
- [ ] Protocol fuzz suite (C3/C4/C5/C6): a bounded test that feeds the epoch loop malformed lines (`(epoch)`, `(epoch foo)`, stray `(io-response …)`, two-exprs-per-line) and asserts the process never dies and responses stay correctly tagged.
- [ ] hs-upstream skip-list (F10/F18): make the native runner's 272 hs-* failures a skip-list so a red FAIL column means something. Record the count moved.
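
The runner-vs-prod audit in the list above is at heart a set difference over binding names. The sketch below assumes the two environments have already been dumped to text files, one name per line. How to extract those lists from `run_tests.ml` and the production kernel is not specified here. The function name `only_in_runner` and the input files are hypothetical.

```bash
# List names present in the first file but absent from the second.
# Usage: only_in_runner <runner-bindings.txt> <prod-bindings.txt>
# Each input holds one binding name per line, in any order.
only_in_runner() {
  runner_sorted=$(mktemp)
  prod_sorted=$(mktemp)
  # comm requires sorted input; LC_ALL=C keeps the ordering byte-wise
  # and consistent between sort and comm.
  LC_ALL=C sort -u "$1" > "$runner_sorted"
  LC_ALL=C sort -u "$2" > "$prod_sorted"
  # -23 suppresses column 2 (only in file 2) and column 3 (in both),
  # leaving the lines unique to file 1.
  LC_ALL=C comm -23 "$runner_sorted" "$prod_sorted"
  rm -f "$runner_sorted" "$prod_sorted"
}
```

The same difference would serve the skip-list item: subtracting a skip-list file from the list of failing test names leaves only the unexpected failures.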
Progress log (newest first)
- (none yet — first fire will add the first entry)
Recording back to the forge
After each commit, note the sha here; the maintainer (or a later step) records it as a
test-kind commit on agents/ws-W14 in the forge so the program stays the system of record.
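
The bookkeeping at the end of each iteration (prepend a dated line to the Progress log, note the commit sha) could be scripted along these lines. This is a sketch under assumptions: the entry layout `- <date> <sha> <summary>` and the function name `prepend_progress` are illustrative, not a format the plan prescribes.

```bash
# Insert "- <date> <sha> <summary>" directly under the Progress log heading,
# so the newest entry comes first.
# Usage: prepend_progress <briefing-file> <sha> <summary>
prepend_progress() {
  tmp=$(mktemp)
  entry="- $(date +%Y-%m-%d) $2 $3"
  # Pass the entry through the environment so awk does not interpret
  # backslashes in the summary text.
  ENTRY=$entry awk '
    { print }
    !done && /Progress log \(newest first\)/ { print ENVIRON["ENTRY"]; done = 1 }
  ' "$1" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back in place so the briefing keeps its original permissions.
  cat "$tmp" > "$1"
  rm -f "$tmp"
}
```

From inside the worktree, a call might look like `prepend_progress plans/agent-briefings/sx-gate-loop.md "$(git rev-parse --short HEAD)" "pinned K18"`, run after the commit so the sha exists.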