First live test of the sx-forge technology driving a real work session:

- sx-fix-up.sh <forge-agent> <briefing.md>: reads the agent's briefing FROM the rose-ash/sx-review forge (agentic-sx branch), materialises a git worktree + branch (loops/sx-<slug>), and spins up a tmux+claude session briefed from the forge. Commits are LOCAL by default (no push).
- sx-fix-down.sh [--clean]: stops the sx-fix session; --clean removes worktrees.
- plans/agent-briefings/sx-gate-loop.md: W14 (test gate) briefing. This is the safe first payload (test-only, cannot regress the 5762p/274f baseline), scoped commit-no-push with hard guardrails.

Verified live: the launcher read the W14 briefing from the forge, created worktree /root/rose-ash-loops/sx-ws-w14 on loops/sx-ws-w14, booted claude, and the agent picked up the briefing.

Watch: tmux a -t sx-fix.
Note: MCP servers need /mcp auth in a fresh worktree (the agent works via Bash meanwhile).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
58 lines
4.1 KiB
Markdown
# sx-gate loop — W14 test gate (first live test of git→gitea→agentic→tmux)
**Forge agent:** `agents/ws-W14` in the `rose-ash/sx-review` forge (git-sx/gitea-sx/agentic-sx).
**Goal (from the forge briefing):** make the verification infrastructure trustworthy — runner env
== production env, a WASM corpus runner, harness honesty, and pinning tests for the fixes already
landed. This is **W14** in `plans/sx-review/PLAN.md` (read that section — it lists the findings).
**Findings:** C0b C9 C21 C22 C23 C3 C4 C5 C6 C7 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 K19 K104.
## Why this workstream first
The review's prime directive: no semantic fix should merge before its pinning test + a working
gate exist, because the verification infra currently can't tell you whether a fix works. W14
produces that infra. It changes **no language semantics**, so it cannot regress the 5762p/274f
baseline — the ideal first payload while we test the agentic launch technology.
## Hard guardrails (this is a monitored test loop)
- **Commit locally, do NOT push.** No `git push` at all. (This is a test; the maintainer reviews
before anything reaches origin.)
- **Stay in W14 scope** — tests, runners, harness, gate tooling. Do NOT edit `spec/*.sx`,
`hosts/ocaml/lib/*.ml`, or any language semantics. If a task tempts you toward semantics, skip it
and note why in the Progress log.
- **Never `pkill sx_server`** (shared binary). Bound every `sx_server`/build/test with `timeout`.
- You are on branch `loops/sx-gate` in worktree `/root/rose-ash-loops/sx-gate`. Build/test here only.
- If the OCaml build or full suite is involved, compare against the recorded baseline
**5762 passed / 274 failed** (fail set is the 273 hs-* + 1 r7rs radix; see PLAN W14/F10).
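
The `timeout` and baseline guardrails above can be sketched as two small shell helpers. This is only a sketch under stated assumptions: the function names `bounded` and `check_baseline` are hypothetical, and it assumes the pass/fail counts have already been extracted from the suite output by some other step. Only the 5762/274 figures come from this briefing.

```sh
#!/bin/sh
# Hypothetical helpers; the function names are assumptions, not part of the repo.

# Run a command under coreutils `timeout`.
# Usage: bounded <limit> <command> [args...]
# Exit status 124 means the limit was reached and the command was killed.
bounded() {
  limit=$1
  shift
  timeout "$limit" "$@"
}

# Compare a run's counts with the recorded baseline (5762 passed / 274 failed).
# Usage: check_baseline <passed> <failed>
check_baseline() {
  passed=$1
  failed=$2
  if [ "$passed" -eq 5762 ] && [ "$failed" -eq 274 ]; then
    echo "baseline matched"
    return 0
  fi
  echo "baseline drift: $passed passed / $failed failed (expected 5762 / 274)"
  return 1
}
```

A call such as `bounded 600 <build-or-test-command>` keeps a hung build from blocking the loop; the 600-second limit is an arbitrary illustration, not a recommended value.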
## One iteration per fire

Pick the first unchecked `[ ]`, implement, test, commit (no push), tick the box, prepend one dated
line to the Progress log, then stop.
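
The "pick the first unchecked box, then tick it" step can be sketched in shell as follows. The helper names `first_unchecked` and `tick_line` are hypothetical, and the sketch assumes task lines start with `- [ ] ` and that a ticked box is written `- [x] ` (the usual task-list convention, which this briefing does not spell out).

```sh
#!/bin/sh
# Hypothetical helpers for a briefing file whose tasks start with "- [ ] ".

# Print the line number of the first unchecked item in file $1.
# Returns non-zero (and prints nothing) when every box is ticked.
first_unchecked() {
  n=$(grep -n '^- \[ \] ' "$1" | head -n 1 | cut -d: -f1)
  [ -n "$n" ] && echo "$n"
}

# Tick the item on line $2 of file $1 by rewriting "- [ ] " to "- [x] ".
# Writes to a temporary file first so a failed sed never truncates the briefing.
tick_line() {
  tmp=$(mktemp)
  sed "$2s/^- \[ \] /- [x] /" "$1" >"$tmp" && mv "$tmp" "$1"
}
```

Keeping the lookup and the tick separate means the box is only ticked after the implementation, test run, and local commit have succeeded.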
- [ ] **Pin the dc7aa709 quick-wins batch.** Add regression tests (spec/tests/ or a new suite) that
lock in the fixes that currently have none: K09 `unquote-splicing` longhand splices; K11 guard
re-raise sentinel is unforgeable (`(guard (e (true (list 'quoted x))) ...)` returns the list);
K18 `(expt 2 100)` is a float not 0; K20 `(contains? {:a 1} :a)` is true; K39 `(do ((fn (x) x) 5) 99)`
→ 99; K49 the five void elements render. (K02 is already non-vacuously covered.) Confirm they pass
on the current binary.
- [ ] **Pin C1/C1b/S4 at the host level** (a small OCaml or shell test): a malformed command line
returns an error response and the process survives; an error page is not cached.
- [ ] **WASM corpus runner (F2).** Stand up a Node harness that runs a curated spec/tests subset
against the shipped WASM kernel (seed: the conformance lane's `run_wasm.js` pattern, referenced in
PLAN). Curated subset, not the full 6k (js_of_ocaml is ~24s/test — see F18). Wire it as a script.
- [ ] **Harness honesty (C22/K104):** make `spec/harness.sx` log the IO call *before* invoking the
mock so a throwing mock is recorded. Add a test that a throwing mock leaves a log entry.
- [ ] **Runner-vs-prod env audit (F7/K42):** list every binding that exists only in `run_tests.ml`
but not the production kernel env (`values`/`call-with-values` are the known ones). Write the audit
to `plans/sx-review/runner-env-gap.md`. (Fixing them is later; the audit is the W14 task.)
- [ ] **Protocol fuzz suite (C3/C4/C5/C6):** a bounded test that feeds the epoch loop malformed
lines (`(epoch)`, `(epoch foo)`, stray `(io-response …)`, two-exprs-per-line) and asserts the
process never dies and responses stay correctly tagged.
- [ ] **hs-upstream skip-list (F10/F18):** make the native runner's 273 hs-* failures a skip-list so
a red FAIL column means something. Record how many failures moved to the skip-list.
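
Two of the tasks above (the runner-vs-prod env audit and the hs-upstream skip-list) reduce to set differences over lists of names, which can be sketched in shell. This is a sketch only: the helper names are hypothetical, and it assumes each input is a plain text file with one binding or test name per line. How those lists are extracted from `run_tests.ml`, the production kernel env, or the runner output is not covered here.

```sh
#!/bin/sh
# Hypothetical helpers; every input file holds one name per line.

# Print names that appear in the runner list ($1) but not in the
# production list ($2). comm needs sorted input, so both are sorted first.
runner_only_bindings() {
  a=$(mktemp)
  b=$(mktemp)
  sort -u "$1" >"$a"
  sort -u "$2" >"$b"
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Print failing test names ($1) that are NOT on the skip-list ($2).
# -x matches whole lines, -F treats skip-list entries as fixed strings.
# grep exits 1 when nothing is printed; that case is mapped to success
# because "no unexpected failures" is the good outcome.
unexpected_failures() {
  grep -v -x -F -f "$2" "$1" || [ "$?" -eq 1 ]
}
```

With a skip-list in place, an empty result from `unexpected_failures` corresponds to a FAIL column that contains only known hs-* failures.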
## Progress log (newest first)
<!-- prepend: `- YYYY-MM-DD <what landed, test result, commit sha>` -->
- (none yet — first fire will add the first entry)
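
Prepending a dated entry so the log stays newest-first can be sketched as below. The helper name `prepend_progress` is hypothetical; the sketch assumes the `<!-- prepend:` comment line stays in the file and is used as the insertion marker, and that the summary text contains no backslashes (awk's `-v` assignment would interpret them as escapes).

```sh
#!/bin/sh
# Hypothetical helper: insert "- YYYY-MM-DD <summary>" directly after the
# first line that starts with "<!-- prepend:" in file $1.
# Usage: prepend_progress <briefing-file> <summary>
prepend_progress() {
  file=$1
  entry="- $(date +%Y-%m-%d) $2"
  tmp=$(mktemp)
  awk -v entry="$entry" '
    { print }                                   # copy every line through
    /^<!-- prepend:/ && !done { print entry; done = 1 }  # add entry once
  ' "$file" >"$tmp" && mv "$tmp" "$file"
}
```

A typical call might pass the short sha from `git rev-parse --short HEAD` as part of the summary, for example `prepend_progress plans/agent-briefings/sx-gate-loop.md "pinned K18, suite at baseline, $(git rev-parse --short HEAD)"`.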
## Recording back to the forge
After each commit, note the sha here; the maintainer (or a later step) records it as a
`test`-kind commit on `agents/ws-W14` in the forge so the program stays the system of record.