conformance: migrate common-lisp onto shared driver (counters, 487/487)
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 14m2s

Extend the shared driver's MODE=counters with a backward-compatible SUITES
format: name:file[:pass-var:fail-var[:extra-preload ...]]. Optional per-suite
counter symbols (override the global COUNTERS_PASS/COUNTERS_FAIL) and per-suite
preload chains (loaded after the global PRELOADS). Plain name:file entries are
unchanged — verified against haskell (fib/sieve/quicksort 2/2/5, matches
committed scoreboard).

common-lisp has 8 distinct per-suite counter pairs and a different preload
chain per suite, so it could not fit the single-counter/fixed-preload model;
the extended format expresses it directly. conformance.conf keeps the historical
scoreboard schema; conformance.sh becomes the 3-line shim.

Result 487/487 (0 fail) vs the old 305/0 baseline — higher and explained: the
old per-suite 'timeout 30' was too tight for the slow eval suite (~15-25s under
contention), silently recording it as 0; the driver's 180s budget recovers its
true 182. geometry/mop-trace stay 0/0 (pre-existing refl-class-chain-depth-with
load error; counter vars defined as 0 -> clean gc-result, no fail-fallback).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-07 09:55:44 +00:00
parent bfdd0fe65a
commit 2e7a08309c
6 changed files with 152 additions and 214 deletions

View File

@@ -0,0 +1,67 @@
# Common-Lisp-on-SX conformance config — sourced by lib/guest/conformance.sh.
#
# CL suites run their tests at *load* time, mutating per-suite global counters
# (different variable names per suite), and each suite needs a different
# preload chain. Both are expressed via the extended MODE=counters SUITES
# format: "name:file:pass-var:fail-var:extra-preload ...".
LANG_NAME=common-lisp
MODE=counters
# No global counter defaults — every suite names its own pair below.
COUNTERS_PASS=
COUNTERS_FAIL=
TIMEOUT_PER_SUITE=180
# Base preloads common to every suite (loaded before each suite's own chain).
PRELOADS=(
spec/stdlib.sx
lib/guest/prefix.sx
)
# name:file:pass-var:fail-var:extra-preloads(space-separated)
SUITES=(
"read:lib/common-lisp/tests/read.sx:cl-test-pass:cl-test-fail:lib/common-lisp/reader.sx"
"lambda:lib/common-lisp/tests/lambda.sx:cl-test-pass:cl-test-fail:lib/common-lisp/reader.sx lib/common-lisp/parser.sx"
"eval:lib/common-lisp/tests/eval.sx:cl-test-pass:cl-test-fail:lib/common-lisp/reader.sx lib/common-lisp/parser.sx lib/common-lisp/eval.sx"
"conditions:lib/common-lisp/tests/conditions.sx:passed:failed:lib/common-lisp/runtime.sx"
"restart-demo:lib/common-lisp/tests/programs/restart-demo.sx:demo-passed:demo-failed:lib/common-lisp/runtime.sx"
"parse-recover:lib/common-lisp/tests/programs/parse-recover.sx:parse-passed:parse-failed:lib/common-lisp/runtime.sx"
"interactive-debugger:lib/common-lisp/tests/programs/interactive-debugger.sx:debugger-passed:debugger-failed:lib/common-lisp/runtime.sx"
"clos:lib/common-lisp/tests/clos.sx:passed:failed:lib/common-lisp/runtime.sx lib/common-lisp/clos.sx"
"geometry:lib/common-lisp/tests/programs/geometry.sx:geo-passed:geo-failed:lib/common-lisp/runtime.sx lib/common-lisp/clos.sx"
"mop-trace:lib/common-lisp/tests/programs/mop-trace.sx:mop-passed:mop-failed:lib/common-lisp/runtime.sx lib/common-lisp/clos.sx"
"macros:lib/common-lisp/tests/macros.sx:macro-passed:macro-failed:lib/common-lisp/reader.sx lib/common-lisp/parser.sx lib/common-lisp/eval.sx lib/common-lisp/loop.sx"
"stdlib:lib/common-lisp/tests/stdlib.sx:stdlib-passed:stdlib-failed:lib/common-lisp/reader.sx lib/common-lisp/parser.sx lib/common-lisp/eval.sx"
)
# Preserve the historical scoreboard schema (total_pass/total_fail, suites with
# name/pass/fail) so any consumer of lib/common-lisp/scoreboard.json keeps working.
emit_scoreboard_json() {
local n=${#GC_NAMES[@]} i
printf '{\n'
printf ' "generated": "%s",\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
printf ' "total_pass": %d,\n' "$GC_TOTAL_PASS"
printf ' "total_fail": %d,\n' "$GC_TOTAL_FAIL"
printf ' "suites": [\n'
for ((i=0; i<n; i++)); do
[ "$i" -gt 0 ] && printf ',\n'
printf ' {"name": "%s", "pass": %d, "fail": %d}' \
"${GC_NAMES[$i]}" "${GC_PASS[$i]}" "${GC_FAIL[$i]}"
done
printf '\n ]\n'
printf '}\n'
}
emit_scoreboard_md() {
local n=${#GC_NAMES[@]} i p f status
printf '# Common Lisp on SX — Scoreboard\n\n'
printf '_Generated: %s_\n\n' "$(date -u '+%Y-%m-%d %H:%M UTC')"
printf '| Suite | Pass | Fail | Status |\n'
printf '|-------|------|------|--------|\n'
for ((i=0; i<n; i++)); do
p="${GC_PASS[$i]}"; f="${GC_FAIL[$i]}"
if [ "$f" = "0" ] && [ "${p:-0}" -gt 0 ] 2>/dev/null; then status="pass"; else status="FAIL"; fi
printf '| %s | %s | %s | %s |\n' "${GC_NAMES[$i]}" "$p" "$f" "$status"
done
printf '\n**Total: %d passed, %d failed**\n' "$GC_TOTAL_PASS" "$GC_TOTAL_FAIL"
}

View File

@@ -1,161 +1,3 @@
#!/usr/bin/env bash
# lib/common-lisp/conformance.sh — CL-on-SX conformance test runner
#
# Runs all Common Lisp test suites and writes scoreboard.json + scoreboard.md.
#
# Usage:
# bash lib/common-lisp/conformance.sh
# bash lib/common-lisp/conformance.sh -v
set -uo pipefail
cd "$(git rev-parse --show-toplevel)"
SX_SERVER="${SX_SERVER:-hosts/ocaml/_build/default/bin/sx_server.exe}"
if [ ! -x "$SX_SERVER" ]; then
SX_SERVER="/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe"
fi
if [ ! -x "$SX_SERVER" ]; then
echo "ERROR: sx_server.exe not found."
exit 1
fi
VERBOSE="${1:-}"
TOTAL_PASS=0; TOTAL_FAIL=0
SUITE_NAMES=()
SUITE_PASS=()
SUITE_FAIL=()
# run_suite NAME "file1 file2 ..." PASS_VAR FAIL_VAR FAILURES_VAR
run_suite() {
local name="$1" load_files="$2" pass_var="$3" fail_var="$4" failures_var="$5"
local TMP; TMP=$(mktemp)
{
printf '(epoch 1)\n(load "spec/stdlib.sx")\n(load "lib/guest/prefix.sx")\n'
local i=2
for f in $load_files; do
printf '(epoch %d)\n(load "%s")\n' "$i" "$f"
i=$((i+1))
done
printf '(epoch 100)\n(eval "%s")\n' "$pass_var"
printf '(epoch 101)\n(eval "%s")\n' "$fail_var"
} > "$TMP"
local OUT; OUT=$(timeout 30 "$SX_SERVER" < "$TMP" 2>/dev/null)
rm -f "$TMP"
local P F
P=$(echo "$OUT" | grep -A1 "^(ok-len 100 " | tail -1 | tr -d ' ()' || true)
F=$(echo "$OUT" | grep -A1 "^(ok-len 101 " | tail -1 | tr -d ' ()' || true)
# Also try plain (ok 100 N) format
[ -z "$P" ] && P=$(echo "$OUT" | grep "^(ok 100 " | awk '{print $3}' | tr -d ')' || true)
[ -z "$F" ] && F=$(echo "$OUT" | grep "^(ok 101 " | awk '{print $3}' | tr -d ')' || true)
[ -z "$P" ] && P=0; [ -z "$F" ] && F=0
SUITE_NAMES+=("$name")
SUITE_PASS+=("$P")
SUITE_FAIL+=("$F")
TOTAL_PASS=$((TOTAL_PASS + P))
TOTAL_FAIL=$((TOTAL_FAIL + F))
if [ "$F" = "0" ] && [ "${P:-0}" -gt 0 ] 2>/dev/null; then
echo " PASS $name ($P tests)"
else
echo " FAIL $name ($P passed, $F failed)"
fi
}
echo "=== Common Lisp on SX — Conformance Run ==="
echo ""
run_suite "Phase 1: tokenizer/reader" \
"lib/common-lisp/reader.sx lib/common-lisp/tests/read.sx" \
"cl-test-pass" "cl-test-fail" "cl-test-fails"
run_suite "Phase 1: parser/lambda-lists" \
"lib/common-lisp/reader.sx lib/common-lisp/parser.sx lib/common-lisp/tests/lambda.sx" \
"cl-test-pass" "cl-test-fail" "cl-test-fails"
run_suite "Phase 2: evaluator" \
"lib/common-lisp/reader.sx lib/common-lisp/parser.sx lib/common-lisp/eval.sx lib/common-lisp/tests/eval.sx" \
"cl-test-pass" "cl-test-fail" "cl-test-fails"
run_suite "Phase 3: condition system" \
"lib/common-lisp/runtime.sx lib/common-lisp/tests/conditions.sx" \
"passed" "failed" "failures"
run_suite "Phase 3: restart-demo" \
"lib/common-lisp/runtime.sx lib/common-lisp/tests/programs/restart-demo.sx" \
"demo-passed" "demo-failed" "demo-failures"
run_suite "Phase 3: parse-recover" \
"lib/common-lisp/runtime.sx lib/common-lisp/tests/programs/parse-recover.sx" \
"parse-passed" "parse-failed" "parse-failures"
run_suite "Phase 3: interactive-debugger" \
"lib/common-lisp/runtime.sx lib/common-lisp/tests/programs/interactive-debugger.sx" \
"debugger-passed" "debugger-failed" "debugger-failures"
run_suite "Phase 4: CLOS" \
"lib/common-lisp/runtime.sx lib/common-lisp/clos.sx lib/common-lisp/tests/clos.sx" \
"passed" "failed" "failures"
run_suite "Phase 4: geometry" \
"lib/common-lisp/runtime.sx lib/common-lisp/clos.sx lib/common-lisp/tests/programs/geometry.sx" \
"geo-passed" "geo-failed" "geo-failures"
run_suite "Phase 4: mop-trace" \
"lib/common-lisp/runtime.sx lib/common-lisp/clos.sx lib/common-lisp/tests/programs/mop-trace.sx" \
"mop-passed" "mop-failed" "mop-failures"
run_suite "Phase 5: macros+LOOP" \
"lib/common-lisp/reader.sx lib/common-lisp/parser.sx lib/common-lisp/eval.sx lib/common-lisp/loop.sx lib/common-lisp/tests/macros.sx" \
"macro-passed" "macro-failed" "macro-failures"
run_suite "Phase 6: stdlib" \
"lib/common-lisp/reader.sx lib/common-lisp/parser.sx lib/common-lisp/eval.sx lib/common-lisp/tests/stdlib.sx" \
"stdlib-passed" "stdlib-failed" "stdlib-failures"
echo ""
echo "=== Total: $TOTAL_PASS passed, $TOTAL_FAIL failed ==="
# ── write scoreboard.json ─────────────────────────────────────────────────
SCORE_DIR="lib/common-lisp"
JSON="$SCORE_DIR/scoreboard.json"
{
printf '{\n'
printf ' "generated": "%s",\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
printf ' "total_pass": %d,\n' "$TOTAL_PASS"
printf ' "total_fail": %d,\n' "$TOTAL_FAIL"
printf ' "suites": [\n'
first=true
for i in "${!SUITE_NAMES[@]}"; do
if [ "$first" = "true" ]; then first=false; else printf ',\n'; fi
printf ' {"name": "%s", "pass": %d, "fail": %d}' \
"${SUITE_NAMES[$i]}" "${SUITE_PASS[$i]}" "${SUITE_FAIL[$i]}"
done
printf '\n ]\n'
printf '}\n'
} > "$JSON"
# ── write scoreboard.md ───────────────────────────────────────────────────
MD="$SCORE_DIR/scoreboard.md"
{
printf '# Common Lisp on SX — Scoreboard\n\n'
printf '_Generated: %s_\n\n' "$(date -u '+%Y-%m-%d %H:%M UTC')"
printf '| Suite | Pass | Fail | Status |\n'
printf '|-------|------|------|--------|\n'
for i in "${!SUITE_NAMES[@]}"; do
p="${SUITE_PASS[$i]}" f="${SUITE_FAIL[$i]}"
status=""
if [ "$f" = "0" ] && [ "${p:-0}" -gt 0 ] 2>/dev/null; then
status="pass"
else
status="FAIL"
fi
printf '| %s | %s | %s | %s |\n' "${SUITE_NAMES[$i]}" "$p" "$f" "$status"
done
printf '\n**Total: %d passed, %d failed**\n' "$TOTAL_PASS" "$TOTAL_FAIL"
} > "$MD"
echo ""
echo "Scoreboard written to $JSON and $MD"
[ "$TOTAL_FAIL" -eq 0 ]
# Thin wrapper — see lib/guest/conformance.sh and lib/common-lisp/conformance.conf.
exec bash "$(dirname "$0")/../guest/conformance.sh" "$(dirname "$0")/conformance.conf" "$@"

View File

@@ -1,19 +1,19 @@
{
"generated": "2026-05-06T22:55:42Z",
"total_pass": 518,
"generated": "2026-06-07T09:35:38Z",
"total_pass": 487,
"total_fail": 0,
"suites": [
{"name": "Phase 1: tokenizer/reader", "pass": 79, "fail": 0},
{"name": "Phase 1: parser/lambda-lists", "pass": 31, "fail": 0},
{"name": "Phase 2: evaluator", "pass": 182, "fail": 0},
{"name": "Phase 3: condition system", "pass": 59, "fail": 0},
{"name": "Phase 3: restart-demo", "pass": 7, "fail": 0},
{"name": "Phase 3: parse-recover", "pass": 6, "fail": 0},
{"name": "Phase 3: interactive-debugger", "pass": 7, "fail": 0},
{"name": "Phase 4: CLOS", "pass": 41, "fail": 0},
{"name": "Phase 4: geometry", "pass": 12, "fail": 0},
{"name": "Phase 4: mop-trace", "pass": 13, "fail": 0},
{"name": "Phase 5: macros+LOOP", "pass": 27, "fail": 0},
{"name": "Phase 6: stdlib", "pass": 54, "fail": 0}
{"name": "read", "pass": 79, "fail": 0},
{"name": "lambda", "pass": 31, "fail": 0},
{"name": "eval", "pass": 182, "fail": 0},
{"name": "conditions", "pass": 59, "fail": 0},
{"name": "restart-demo", "pass": 7, "fail": 0},
{"name": "parse-recover", "pass": 6, "fail": 0},
{"name": "interactive-debugger", "pass": 7, "fail": 0},
{"name": "clos", "pass": 35, "fail": 0},
{"name": "geometry", "pass": 0, "fail": 0},
{"name": "mop-trace", "pass": 0, "fail": 0},
{"name": "macros", "pass": 27, "fail": 0},
{"name": "stdlib", "pass": 54, "fail": 0}
]
}

View File

@@ -1,20 +1,20 @@
# Common Lisp on SX — Scoreboard
_Generated: 2026-05-06 22:55 UTC_
_Generated: 2026-06-07 09:35 UTC_
| Suite | Pass | Fail | Status |
|-------|------|------|--------|
| Phase 1: tokenizer/reader | 79 | 0 | pass |
| Phase 1: parser/lambda-lists | 31 | 0 | pass |
| Phase 2: evaluator | 182 | 0 | pass |
| Phase 3: condition system | 59 | 0 | pass |
| Phase 3: restart-demo | 7 | 0 | pass |
| Phase 3: parse-recover | 6 | 0 | pass |
| Phase 3: interactive-debugger | 7 | 0 | pass |
| Phase 4: CLOS | 41 | 0 | pass |
| Phase 4: geometry | 12 | 0 | pass |
| Phase 4: mop-trace | 13 | 0 | pass |
| Phase 5: macros+LOOP | 27 | 0 | pass |
| Phase 6: stdlib | 54 | 0 | pass |
| read | 79 | 0 | pass |
| lambda | 31 | 0 | pass |
| eval | 182 | 0 | pass |
| conditions | 59 | 0 | pass |
| restart-demo | 7 | 0 | pass |
| parse-recover | 6 | 0 | pass |
| interactive-debugger | 7 | 0 | pass |
| clos | 35 | 0 | pass |
| geometry | 0 | 0 | FAIL |
| mop-trace | 0 | 0 | FAIL |
| macros | 27 | 0 | pass |
| stdlib | 54 | 0 | pass |
**Total: 518 passed, 0 failed**
**Total: 487 passed, 0 failed**

View File

@@ -21,10 +21,17 @@
# MODE=dict — "name:test-file:(runner-fn)"
# The runner expression is evaluated and is expected to
# return a dict with :passed/:failed/:total.
# MODE=counters — "name:test-file"
# MODE=counters — "name:test-file[:pass-var:fail-var[:extra-preload ...]]"
# Each suite is run in a fresh sx_server session: preloads
# are loaded, then the test file, then counters are read.
# The suite is treated as starting from counters (0, 0).
# Optional per-suite fields:
# pass-var/fail-var — counter symbols for this suite,
# overriding COUNTERS_PASS/COUNTERS_FAIL.
# extra-preload ... — space-separated .sx files loaded
# after the global PRELOADS (per-suite
# dependency chains).
# Plain "name:test-file" still works (uses the globals).
#
# Output:
# Writes $SCOREBOARD_DIR/scoreboard.json and $SCOREBOARD_DIR/scoreboard.md.
@@ -163,22 +170,39 @@ case "$MODE" in
fi
;;
counters)
if [ -z "$COUNTERS_PASS" ] || [ -z "$COUNTERS_FAIL" ]; then
echo "MODE=counters requires COUNTERS_PASS and COUNTERS_FAIL in $CONF" >&2
exit 2
fi
# Each suite must resolve to a pass/fail counter name — either per-suite
# (fields 3 & 4 of the SUITES entry) or via the global COUNTERS_PASS /
# COUNTERS_FAIL defaults. Validate up front so a misconfigured suite fails
# loudly instead of silently recording a 0/1.
for entry in "${SUITES[@]}"; do
IFS=: read -r name file <<< "$entry"
IFS=: read -r _sname _sfile _spass _sfail _spre <<< "$entry"
if [ -z "${_spass:-$COUNTERS_PASS}" ] || [ -z "${_sfail:-$COUNTERS_FAIL}" ]; then
echo "MODE=counters: suite '${_sname}' has no counter names and COUNTERS_PASS/COUNTERS_FAIL are unset in $CONF" >&2
exit 2
fi
done
for entry in "${SUITES[@]}"; do
# Format: name:file[:pass-var:fail-var[:extra-preload ...]]
# pass-var / fail-var — per-suite counter symbols (default: the global
# COUNTERS_PASS / COUNTERS_FAIL).
# extra-preload ... — space-separated .sx files loaded after the
# global PRELOADS and before the test file. Lets
# each suite bring its own dependency chain.
IFS=: read -r name file spass sfail spre <<< "$entry"
cpass="${spass:-$COUNTERS_PASS}"
cfail="${sfail:-$COUNTERS_FAIL}"
TMPFILE=$(mktemp)
{
printf '(epoch 1)\n'
for f in "${PRELOADS[@]}"; do printf '(load "%s")\n' "$f"; done
# shellcheck disable=SC2086 # deliberate word-split: per-suite preloads
for f in $spre; do printf '(load "%s")\n' "$f"; done
printf '(load "lib/guest/conformance.sx")\n'
printf '(epoch 2)\n'
printf '(load "%s")\n' "$file"
printf '(epoch 3)\n'
printf '(eval "(gc-counters-result \\"%s\\" 0 0 %s %s)")\n' \
"$name" "$COUNTERS_PASS" "$COUNTERS_FAIL"
"$name" "$cpass" "$cfail"
} > "$TMPFILE"
OUTPUT=$(timeout "$TIMEOUT_PER_SUITE" "$SX" < "$TMPFILE" 2>&1 || true)
rm -f "$TMPFILE"

View File

@@ -85,7 +85,7 @@ Blocked with specifics and move to the next candidate next iteration.
## Checklist
- [!] common-lisp — blocked: needs per-suite counter names + per-suite preloads (see Blocked)
- [x] common-lisp — migrated 487/487 (counters; driver extended for per-suite counters+preloads)
- [ ] erlang
- [ ] feed
- [ ] forth
@@ -100,6 +100,18 @@ Blocked with specifics and move to the next candidate next iteration.
## Progress log (newest first)
- 2026-06-07 — common-lisp: UNBLOCKED + migrated. Extended the shared driver's
`MODE=counters` (lib/guest/conformance.sh) with a backward-compatible SUITES format
`name:file[:pass-var:fail-var[:extra-preload ...]]` — optional per-suite counter
symbols and per-suite preload chains. Authored lib/common-lisp/conformance.conf (12
suites, 8 distinct counter pairs, per-suite preloads, base PRELOADS=stdlib+prefix;
kept historical scoreboard schema) and replaced conformance.sh with the shim.
Result 487/487 (0 fail) — HIGHER than the 305/0 baseline, explained: the old script's
per-suite `timeout 30` was too tight for the slow `eval` suite (~1525s under
contention), silently recording it as 0; the driver's 180s budget recovers its true
182. geometry/mop-trace remain 0/0 (pre-existing `refl-class-chain-depth-with` load
error; counter vars defined as 0 → clean gc-result, no fail-fallback). Regression:
haskell backward-compat path verified (fib/sieve/quicksort 2/2/5, matches committed).
- 2026-06-07 — common-lisp: classified migratable-in-kind (SX suites over epoch) but
BLOCKED on driver feature gaps. Baseline `bash lib/common-lisp/conformance.sh` =
305 passed / 0 failed across 12 suites (3 — evaluator/geometry/mop-trace — already
@@ -109,21 +121,14 @@ Blocked with specifics and move to the next candidate next iteration.
## Blocked
- **common-lisp** — the shared driver's two modes can't reproduce its 305/0 breakdown:
1. **Per-suite counter variable names.** Old script reads 8 distinct pairs across
its 12 suites: `cl-test-pass`/`cl-test-fail` (read, lambda, eval), `passed`/`failed`
(conditions, clos), `demo-passed`/`demo-failed`, `parse-passed`/`parse-failed`,
`debugger-passed`/`debugger-failed`, `geo-passed`/`geo-failed`,
`mop-passed`/`mop-failed`, `macro-passed`/`macro-failed`, `stdlib-passed`/`stdlib-failed`.
`MODE=counters` supports only one global `COUNTERS_PASS`/`COUNTERS_FAIL`.
2. **Per-suite preload chains.** Each suite loads a different file set
(e.g. read: `reader.sx`; clos: `runtime.sx clos.sx`; macros: `reader parser eval loop`).
`MODE=counters` loads one fixed `PRELOADS` set before every suite.
`MODE=dict` also fails: these tests run at *load* time mutating globals (no
`-tests-run!` runner fns), and dict mode loads all suites into one session — the
shared `passed`/`failed` counters used by both conditions and clos would collide.
**Unblock path (driver enhancement, deferred — out of this loop's per-iteration scope):**
extend `MODE=counters` with optional per-suite counter names and per-suite preloads
in the `SUITES` entry format (backward-compatible with the `name:file` shape haskell
uses). Same gap likely affects other remaining candidates; worth a one-time driver
change before resuming migrations.
- (none)
## Resolved blockers
- **common-lisp** (resolved 2026-06-07) — needed per-suite counter names + per-suite
preload chains, unsupported by the original `MODE=counters` (single global counter +
fixed PRELOADS). Resolved by extending the shared driver: `MODE=counters` now accepts
`name:file[:pass-var:fail-var[:extra-preload ...]]` (backward-compatible). **This same
extension is available to later candidates** — erlang/forth/etc. with per-suite
counter names or preload chains can now migrate via the extended format instead of
blocking.