Extend the shared driver's MODE=counters with a backward-compatible SUITES
format: name:file[:pass-var:fail-var[:extra-preload ...]]. Optional per-suite
counter symbols (override the global COUNTERS_PASS/COUNTERS_FAIL) and per-suite
preload chains (loaded after the global PRELOADS). Plain name:file entries are
unchanged — verified against haskell (fib/sieve/quicksort 2/2/5, matches
committed scoreboard).
common-lisp has 8 distinct per-suite counter pairs and a different preload
chain per suite, so it could not fit the single-counter/fixed-preload model;
the extended format expresses it directly. conformance.conf keeps the historical
scoreboard schema; conformance.sh becomes the 3-line shim.
Result 487/487 (0 fail) vs the old 305/0 baseline — higher and explained: the
old per-suite 'timeout 30' was too tight for the slow eval suite (~15-25s under
contention), silently recording it as 0; the driver's 180s budget recovers its
true 182. geometry/mop-trace stay 0/0 (pre-existing refl-class-chain-depth-with
load error; counter vars defined as 0 -> clean gc-result, no fail-fallback).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extracted the duplicated conformance plumbing into a single driver:
- lib/guest/conformance.sx — two helper fns that emit (gc-result NAME P F T)
lines for the bash side to grep: gc-dict-result for runners returning
a {:passed :failed :total} dict, and gc-counters-result for guests that
bump a global pass/fail counter from a test file load.
- lib/guest/conformance.sh — config-driven bash driver. Sources a per-lang
conf, locates sx_server, runs sx_server in either single-session "dict"
mode (one preload + many suite evals) or per-suite "counters" mode
(fresh sx_server per suite, with shared preloads). Aggregates and writes
scoreboard.{json,md} via per-lang emit_scoreboard_* functions.
- Ported lib/prolog/conformance.sh and lib/haskell/conformance.sh down to
one-line wrappers that exec the shared driver against their .conf file.
Verification:
- Prolog: 590/590 — diff vs baseline is timestamp-only.
- Haskell: 156/156 — significantly higher than the 0/18 in baseline. The
old conformance.sh was buggy (its `(ok-len 3 ...)` grep never matched,
defaulting every program to 0 pass / 1 fail). Updated baseline to the
true count; no actual test regressed. Plan baseline cell updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>