Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m6s
Classified migratable-in-kind (SX suites over epoch, not a foreign runner) but blocked on driver feature gaps: 8 distinct per-suite counter variable name pairs and per-suite preload chains, neither supported by MODE=counters (single global counter + fixed preloads) nor MODE=dict (load-time counter collisions across suites). Baseline 305/0 across 12 suites. Did not migrate; conformance.sh left untouched. Driver unchanged (out of per-iteration scope). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
130 lines
7.0 KiB
Markdown
130 lines
7.0 KiB
Markdown
# A1 conformance-driver migration loop
|
|
|
|
Role: migrate every remaining subsystem that hand-rolls its own `conformance.sh`
|
|
onto the **shared conformance driver** (`lib/guest/conformance.sh` + `lib/guest/conformance.sx`),
|
|
one subsystem per iteration, **verifying test-count parity before every commit**.
|
|
This executes item **A1** from the radar backlog (`plans/abstractions.md`, read-only
|
|
context). You are an implementer, not a scout.
|
|
|
|
You are on branch `loops/conformance`, worktree `/root/rose-ash-loops/conformance`.
|
|
|
|
## Hard safety rails (read every time)
|
|
|
|
- **NEVER push to `main` or `architecture`.** Push only to `origin/loops/conformance`.
|
|
- **NEVER `pkill`/`kill` `sx_server` or any shared process** — sibling loops share the
|
|
binary. Bound every test run with `timeout` (e.g. `timeout 600 bash …`). If a run
|
|
hangs, let the timeout end it; never kill globally.
|
|
- **One subsystem per iteration, then stop.** No batching.
|
|
- **Never commit a regression.** If post-migration test counts don't match the baseline
|
|
(or an error appears), REVERT (`git checkout -- lib/<x>/conformance.sh` and
|
|
`rm -f lib/<x>/conformance.conf`) and record the blocker — do not commit.
|
|
- `.sx` files: use the `sx-tree` MCP tools, never Read/Write/Edit. `.sh`/`.conf`/`.md`
|
|
files: normal tools are fine.
|
|
- Preserve the `bash lib/<x>/conformance.sh` entry point (the shim keeps it working) so
|
|
no other loop is disrupted.
|
|
|
|
## The candidate worklist
|
|
|
|
Remaining hand-rolled `conformance.sh` (from radar A1): **common-lisp, erlang, feed,
|
|
forth, go, js, ocaml, smalltalk, tcl**. Already migrated (do not touch): acl, apl,
|
|
datalog, haskell, mod, prolog. Already excluded (different harness): lua.
|
|
|
|
Work them roughly simplest-first. Track status in the checklist at the bottom.
|
|
|
|
## What "fits the driver" means — classify FIRST
|
|
|
|
The shared driver works for subsystems whose tests are **SX test-suites loaded over the
|
|
epoch protocol** and run by an expression that emits a counter/dict scoreboard. It does
|
|
NOT fit subsystems that run **foreign source programs** through a separate runner
|
|
(e.g. lua walks `*.lua` via Python; smalltalk runs `*.st` via `test.sh`).
|
|
|
|
Per candidate, before migrating, decide:
|
|
- **Migratable** — its `conformance.sh` epoch-loads SX preloads and evals SX test suites
|
|
→ proceed to migrate.
|
|
- **Excluded** — it shells out to a foreign program runner / scrapes a `test.sh` →
|
|
DO NOT migrate. Record the exclusion (one line in the checklist + a `git`-free note in
|
|
this briefing's Progress log) with the reason, and move on. Excluding is a valid,
|
|
honest result — a forced migration that loses coverage is worse than none.
|
|
|
|
## Per-iteration procedure
|
|
|
|
1. **Pick** the next `[ ]` candidate in the checklist.
|
|
2. **Read** its `lib/<x>/conformance.sh` in full. Read the two recipe templates —
|
|
`lib/haskell/conformance.conf` (MODE=counters) and `lib/prolog/conformance.conf`
|
|
(MODE=dict) — and skim `lib/guest/conformance.sh` + `lib/guest/conformance.sx`.
|
|
3. **Classify** (above). If Excluded → record reason, tick as excluded, stop.
|
|
4. **Baseline:** `timeout 600 bash lib/<x>/conformance.sh`, then read
|
|
`lib/<x>/scoreboard.json` and record the pass/total. This is the parity target.
|
|
5. **Author `lib/<x>/conformance.conf`:**
|
|
- `LANG_NAME=<x>`
|
|
- `MODE=dict` or `MODE=counters` (match how the old script counted)
|
|
- `PRELOADS=( … )` — the lib files in load order, lifted from the old script
|
|
- `SUITES=( "name:lib/<x>/tests/<file>:(<run-expr>)" … )` — one per suite, with the
|
|
exact run expression the old script used
|
|
- If counters mode needs counter definitions, add a small `test-harness.sx` preload
|
|
(author it with `sx_write_file`).
|
|
6. **Replace `lib/<x>/conformance.sh`** with the 3-line shim:
|
|
```bash
|
|
#!/usr/bin/env bash
|
|
# Thin wrapper — see lib/guest/conformance.sh and lib/<x>/conformance.conf.
|
|
exec bash "$(dirname "$0")/../guest/conformance.sh" "$(dirname "$0")/conformance.conf" "$@"
|
|
```
|
|
7. **Verify parity:** `timeout 600 bash lib/<x>/conformance.sh` again. Read
|
|
`scoreboard.json`. The pass/total MUST equal the baseline (a *higher* count is only
|
|
acceptable if you can explain it — e.g. the old extractor under-counted, as happened
|
|
with apl's `pipeline`; document it in the commit). Any mismatch/error → **revert**
|
|
(step: rails) and record the blocker.
|
|
8. **Commit** on `loops/conformance`:
|
|
`conformance: migrate <x> onto shared driver (<mode>, <pass>/<total> parity)`
|
|
then `git push origin loops/conformance`.
|
|
9. **Update** this file: tick the checklist box and add one dated line to the Progress
|
|
log (newest first). Then stop.
|
|
|
|
If a candidate is genuinely blocked (driver lacks a needed mode/feature), record it under
|
|
Blocked with specifics and move to the next candidate next iteration.
|
|
|
|
## Checklist
|
|
|
|
- [!] common-lisp — blocked: needs per-suite counter names + per-suite preloads (see Blocked)
|
|
- [ ] erlang
|
|
- [ ] feed
|
|
- [ ] forth
|
|
- [ ] go
|
|
- [ ] js
|
|
- [ ] ocaml
|
|
- [ ] smalltalk
|
|
- [ ] tcl
|
|
|
|
(Mark `[x] <x> — migrated N/N` or `[~] <x> — excluded: <reason>` or
|
|
`[!] <x> — blocked: <reason>`.)
|
|
|
|
## Progress log (newest first)
|
|
|
|
- 2026-06-07 — common-lisp: classified migratable-in-kind (SX suites over epoch) but
|
|
BLOCKED on driver feature gaps. Baseline `bash lib/common-lisp/conformance.sh` =
|
|
305 passed / 0 failed across 12 suites (3 — evaluator/geometry/mop-trace — already
|
|
emit 0/0, a pre-existing extraction quirk). Not a foreign runner, so not Excluded.
|
|
Did NOT migrate (parity unachievable under current modes); left conformance.sh
|
|
untouched. See Blocked. Driver left unchanged (out of strict per-iteration scope).
|
|
|
|
## Blocked
|
|
|
|
- **common-lisp** — the shared driver's two modes can't reproduce its 305/0 breakdown:
|
|
1. **Per-suite counter variable names.** Old script reads 8 distinct pairs across
|
|
its 12 suites: `cl-test-pass`/`cl-test-fail` (read, lambda, eval), `passed`/`failed`
|
|
(conditions, clos), `demo-passed`/`demo-failed`, `parse-passed`/`parse-failed`,
|
|
`debugger-passed`/`debugger-failed`, `geo-passed`/`geo-failed`,
|
|
`mop-passed`/`mop-failed`, `macro-passed`/`macro-failed`, `stdlib-passed`/`stdlib-failed`.
|
|
`MODE=counters` supports only one global `COUNTERS_PASS`/`COUNTERS_FAIL`.
|
|
2. **Per-suite preload chains.** Each suite loads a different file set
|
|
(e.g. read: `reader.sx`; clos: `runtime.sx clos.sx`; macros: `reader parser eval loop`).
|
|
`MODE=counters` loads one fixed `PRELOADS` set before every suite.
|
|
`MODE=dict` also fails: these tests run at *load* time mutating globals (no
|
|
`-tests-run!` runner fns), and dict mode loads all suites into one session — the
|
|
shared `passed`/`failed` counters used by both conditions and clos would collide.
|
|
**Unblock path (driver enhancement, deferred — out of this loop's per-iteration scope):**
|
|
extend `MODE=counters` with optional per-suite counter names and per-suite preloads
|
|
in the `SUITES` entry format (backward-compatible with the `name:file` shape haskell
|
|
uses). Same gap likely affects other remaining candidates; worth a one-time driver
|
|
change before resuming migrations.
|