rose-ash/plans/agent-briefings/conformance-loop.md
giles bfdd0fe65a
conformance: record common-lisp blocker (per-suite counters + preloads)
Classified migratable-in-kind (SX suites over epoch, not a foreign runner)
but blocked on driver feature gaps: 8 distinct per-suite counter variable
name pairs and per-suite preload chains, neither supported by MODE=counters
(single global counter + fixed preloads) nor MODE=dict (load-time counter
collisions across suites). Baseline 305/0 across 12 suites. Did not migrate;
conformance.sh left untouched. Driver unchanged (out of per-iteration scope).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 09:22:39 +00:00

A1 conformance-driver migration loop

Role: migrate every remaining subsystem that hand-rolls its own conformance.sh onto the shared conformance driver (lib/guest/conformance.sh + lib/guest/conformance.sx), one subsystem per iteration, verifying test-count parity before every commit. This executes item A1 from the radar backlog (plans/abstractions.md, read-only context). You are an implementer, not a scout.

You are on branch loops/conformance, worktree /root/rose-ash-loops/conformance.

Hard safety rails (read every time)

  • NEVER push to main or architecture. Push only to origin/loops/conformance.
  • NEVER pkill/kill sx_server or any shared process — sibling loops share the binary. Bound every test run with timeout (e.g. timeout 600 bash …). If a run hangs, let the timeout end it; never kill globally.
  • One subsystem per iteration, then stop. No batching.
  • Never commit a regression. If post-migration test counts don't match the baseline (or an error appears), REVERT (git checkout -- lib/<x>/conformance.sh and rm -f lib/<x>/conformance.conf) and record the blocker — do not commit.
  • .sx files: use the sx-tree MCP tools, never Read/Write/Edit. .sh/.conf/.md files: normal tools are fine.
  • Preserve the bash lib/<x>/conformance.sh entry point (the shim keeps it working) so no other loop is disrupted.
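The timeout, parity, and revert rails above can be sketched as three small bash helpers. This is a minimal sketch, not part of the shared driver: the function names run_bounded, parity_ok and revert_migration are hypothetical, and the "pass/total" string is assumed to be whatever you read out of scoreboard.json by hand.

```bash
# Sketch only: run_bounded, parity_ok and revert_migration are hypothetical
# helper names, not part of the shared driver.

# Run one subsystem's conformance script under a hard time limit.
# GNU timeout exits with status 124 when the limit expires, so a hang ends
# on its own and no shared process is ever killed.
run_bounded() {
  local subsystem="$1" limit="${2:-600}" status=0
  timeout "$limit" bash "lib/${subsystem}/conformance.sh" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${limit}s: ${subsystem}" >&2
  fi
  return "$status"
}

# Succeed only when the post-migration "pass/total" string equals the baseline.
parity_ok() {
  local baseline="$1" current="$2"
  [ "$baseline" = "$current" ]
}

# Undo an uncommitted migration attempt, using the commands from the rails.
revert_migration() {
  local subsystem="$1"
  git checkout -- "lib/${subsystem}/conformance.sh"
  rm -f "lib/${subsystem}/conformance.conf"
}
```

A typical use would be run_bounded <x>, then parity_ok "<baseline>" "<current>" || revert_migration <x>. The strict equality in parity_ok is deliberate: a higher count that you can explain still needs a human decision, so it should not pass silently.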

The candidate worklist

Remaining hand-rolled conformance.sh (from radar A1): common-lisp, erlang, feed, forth, go, js, ocaml, smalltalk, tcl. Already migrated (do not touch): acl, apl, datalog, haskell, mod, prolog. Already excluded (different harness): lua.

Work them roughly simplest-first. Track status in the checklist at the bottom.

What "fits the driver" means — classify FIRST

The shared driver works for subsystems whose tests are SX test-suites loaded over the epoch protocol and run by an expression that emits a counter/dict scoreboard. It does NOT fit subsystems that run foreign source programs through a separate runner (e.g. lua walks *.lua via Python; smalltalk runs *.st via test.sh).

Per candidate, before migrating, decide:

  • Migratable — its conformance.sh epoch-loads SX preloads and evals SX test suites → proceed to migrate.
  • Excluded — it shells out to a foreign program runner / scrapes a test.sh → DO NOT migrate. Record the exclusion (one line in the checklist + a git-free note in this briefing's Progress log) with the reason, and move on. Excluding is a valid, honest result — a forced migration that loses coverage is worse than none.
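As a first-pass hint for the classification above, a grep over the old script can flag the two foreign-runner patterns named in this briefing (a Python walker and a test.sh scrape). This is only a rough sketch: classify_hint is a hypothetical helper, the patterns are assumptions drawn from the lua and smalltalk examples, and it does not replace reading the script in full.

```bash
# Sketch only: classify_hint is a hypothetical helper. It prints a hint,
# never a verdict; the final classification still comes from reading the script.
classify_hint() {
  local script="$1"
  if grep -Eq 'python[0-9.]*|test\.sh' "$script"; then
    # Mentions a foreign runner like the lua/smalltalk examples.
    echo "likely-excluded"
  elif grep -q '\.sx' "$script"; then
    # References .sx files, consistent with epoch-loaded SX suites.
    echo "likely-migratable"
  else
    echo "unknown"
  fi
}
```

The foreign-runner check comes first on purpose: a script that touches both .sx files and an external runner should be examined as a possible exclusion before anything is migrated.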

Per-iteration procedure

  1. Pick the next [ ] candidate in the checklist.
  2. Read its lib/<x>/conformance.sh in full. Read the two recipe templates — lib/haskell/conformance.conf (MODE=counters) and lib/prolog/conformance.conf (MODE=dict) — and skim lib/guest/conformance.sh + lib/guest/conformance.sx.
  3. Classify (above). If Excluded → record reason, tick as excluded, stop.
  4. Baseline: timeout 600 bash lib/<x>/conformance.sh, then read lib/<x>/scoreboard.json and record the pass/total. This is the parity target.
  5. Author lib/<x>/conformance.conf:
    • LANG_NAME=<x>
    • MODE=dict or MODE=counters (match how the old script counted)
    • PRELOADS=( … ) — the lib files in load order, lifted from the old script
    • SUITES=( "name:lib/<x>/tests/<file>:(<run-expr>)" … ) — one per suite, with the exact run expression the old script used
    • If counters mode needs counter definitions, add a small test-harness.sx preload (author it with sx_write_file).
  6. Replace lib/<x>/conformance.sh with the 3-line shim:
    #!/usr/bin/env bash
    # Thin wrapper — see lib/guest/conformance.sh and lib/<x>/conformance.conf.
    exec bash "$(dirname "$0")/../guest/conformance.sh" "$(dirname "$0")/conformance.conf" "$@"
    
  7. Verify parity: run timeout 600 bash lib/<x>/conformance.sh again and read scoreboard.json. The pass/total MUST equal the baseline. A higher count is acceptable only if you can explain it (e.g. the old extractor under-counted, as happened with apl's pipeline); document the explanation in the commit. On any mismatch or error, revert as described in the safety rails and record the blocker.
  8. Commit on loops/conformance with the message conformance: migrate <x> onto shared driver (<mode>, <pass>/<total> parity), then git push origin loops/conformance.
  9. Update this file: tick the checklist box and add one dated line to the Progress log (newest first). Then stop.
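The fields listed in step 5 could be laid out as follows. This is a sketch for a made-up subsystem called demo: every path and run expression is a placeholder, and it assumes the driver reads the .conf as bash (suggested by the PRELOADS=( … ) array syntax, but not confirmed here). The list_suites helper is hypothetical and only sanity-checks the name:file:(run-expr) shape before a real run.

```bash
# Hypothetical conformance.conf for a made-up subsystem "demo". Every path
# and expression below is a placeholder, not a file from the repository.
LANG_NAME=demo
MODE=dict
PRELOADS=(
  'lib/demo/runtime.sx'
  'lib/demo/eval.sx'
)
SUITES=(
  'core:lib/demo/tests/core.sx:(demo-core-tests-run!)'
  'strings:lib/demo/tests/strings.sx:(demo-strings-tests-run!)'
)

# Hypothetical checker: split each entry on its first two colons and print
# name, file and run expression as tab-separated columns.
list_suites() {
  local entry name file expr
  for entry in "${SUITES[@]}"; do
    IFS=: read -r name file expr <<< "$entry"
    printf '%s\t%s\t%s\n' "$name" "$file" "$expr"
  done
}
```

Because read assigns the remainder of the line to the last variable, a run expression that itself contains a colon stays intact in expr.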

If a candidate is genuinely blocked (driver lacks a needed mode/feature), record it under Blocked with specifics and move to the next candidate next iteration.

Checklist

  • [!] common-lisp — blocked: needs per-suite counter names + per-suite preloads (see Blocked)
  • [ ] erlang
  • [ ] feed
  • [ ] forth
  • [ ] go
  • [ ] js
  • [ ] ocaml
  • [ ] smalltalk
  • [ ] tcl

(Mark [x] <x> — migrated N/N or [~] <x> — excluded: <reason> or [!] <x> — blocked: <reason>.)

Progress log (newest first)

  • 2026-06-07 — common-lisp: classified migratable-in-kind (SX suites over epoch) but BLOCKED on driver feature gaps. Baseline bash lib/common-lisp/conformance.sh = 305 passed / 0 failed across 12 suites (3 — evaluator/geometry/mop-trace — already emit 0/0, a pre-existing extraction quirk). Not a foreign runner, so not Excluded. Did NOT migrate (parity unachievable under current modes); left conformance.sh untouched. See Blocked. Driver left unchanged (out of strict per-iteration scope).

Blocked

  • common-lisp — the shared driver's two modes can't reproduce its 305/0 breakdown:
    1. Per-suite counter variable names. Old script reads 8 distinct pairs across its 12 suites: cl-test-pass/cl-test-fail (read, lambda, eval), passed/failed (conditions, clos), demo-passed/demo-failed, parse-passed/parse-failed, debugger-passed/debugger-failed, geo-passed/geo-failed, mop-passed/mop-failed, macro-passed/macro-failed, stdlib-passed/stdlib-failed. MODE=counters supports only one global COUNTERS_PASS/COUNTERS_FAIL.
    2. Per-suite preload chains. Each suite loads a different file set (e.g. read: reader.sx; clos: runtime.sx clos.sx; macros: reader parser eval loop). MODE=counters loads one fixed PRELOADS set before every suite.

    MODE=dict also fails: these tests run at load time, mutating globals (there are no -tests-run! runner fns), and dict mode loads all suites into one session, so the shared passed/failed counters used by both conditions and clos would collide.

    Unblock path (driver enhancement, deferred as out of this loop's per-iteration scope): extend MODE=counters with optional per-suite counter names and per-suite preloads in the SUITES entry format (backward-compatible with the name:file shape haskell uses). The same gap likely affects other remaining candidates, so a one-time driver change is worth making before resuming migrations.
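One possible shape for the unblock path, sketched under loud assumptions: the extended entry format name:file[:pass-var:fail-var[:preload,preload,...]] is invented here for illustration and is not the driver's actual format, parse_suite is a hypothetical helper, and the paths are placeholders. The point is only to show how a short name:file entry could keep falling back to the global COUNTERS_PASS/COUNTERS_FAIL and PRELOADS.

```bash
# Global defaults in the style of today's MODE=counters (placeholder values).
COUNTERS_PASS='cl-test-pass'
COUNTERS_FAIL='cl-test-fail'
PRELOADS=('lib/common-lisp/reader.sx')

# Hypothetical extended entry:
#   name:file[:pass-var:fail-var[:preload,preload,...]]
# Missing optional fields fall back to the globals, so a plain name:file
# entry behaves exactly as before.
parse_suite() {
  local entry="$1" name file pass fail preloads
  IFS=: read -r name file pass fail preloads <<< "$entry"
  pass="${pass:-$COUNTERS_PASS}"
  fail="${fail:-$COUNTERS_FAIL}"
  if [ -z "$preloads" ]; then
    # Join the global preload array with commas inside a subshell,
    # so the caller's IFS is left untouched.
    preloads="$(IFS=,; echo "${PRELOADS[*]}")"
  fi
  printf '%s|%s|%s|%s|%s\n' "$name" "$file" "$pass" "$fail" "$preloads"
}
```

With these defaults, parse_suite 'read:tests/read.sx' resolves to the global counters and preloads, while parse_suite 'clos:tests/clos.sx:passed:failed:runtime.sx,clos.sx' carries its own. A real driver change would also need each suite to run in a fresh session so that a counter pair shared by two suites, such as passed/failed, cannot collide.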