GUEST: step 1 — lib/guest/conformance.{sx,sh} config-driven driver

Extracted the duplicated conformance plumbing into a single driver:

- lib/guest/conformance.sx: two helper fns that emit (gc-result NAME P F T) lines for the bash side to grep:
  - gc-dict-result, for runners that return a {:passed :failed :total} dict;
  - gc-counters-result, for guests that bump a global pass/fail counter while a test file loads.
- lib/guest/conformance.sh: config-driven bash driver. It sources a per-lang conf, locates sx_server, and runs it in one of two modes:
  - "dict" mode: a single session (one preload, then many suite evals);
  - "counters" mode: a fresh sx_server per suite, with shared preloads.
  It aggregates the results and writes scoreboard.{json,md} via per-lang emit_scoreboard_* functions.
- Ported lib/prolog/conformance.sh and lib/haskell/conformance.sh down to one-line wrappers that exec the shared driver against their .conf file.

Verification:

- Prolog: 590/590; diff vs baseline is timestamp-only.
- Haskell: 156/156, up from 0/18 in the baseline (which counted each of the 18 programs as a single failure). The old conformance.sh was buggy: its `(ok-len 3 ...)` grep never matched, so every program defaulted to 0 pass / 1 fail. Updated the baseline to the true count; no test actually regressed. Plan baseline cell updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:46:48 +00:00
parent 0eced4c34c
commit 58dcff2639
11 changed files with 522 additions and 367 deletions
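The (gc-result NAME P F T) line format is what keeps the bash side language-agnostic: whatever a guest's runner does internally, the driver only has to pull one fixed shape out of sx_server's output. Below is a minimal sketch of that grep-and-sum step. It assumes the head symbol is literally `gc-result`, that NAME contains no spaces or parentheses, and that P, F and T are non-negative integers; `parse_gc_results` is a hypothetical name, not the function used in lib/guest/conformance.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the real driver code.
# Reads sx_server output on stdin, keeps only (gc-result NAME P F T) lines,
# prints "NAME P F T" per suite, then a "TOTAL P F T" line with summed counts.
parse_gc_results() {
  # grep -o extracts just the matching form; any other output (logs, prompts) is dropped.
  grep -oE '\(gc-result [^ ()]+ [0-9]+ [0-9]+ [0-9]+\)' \
    | tr -d '()' \
    | awk '
        # Fields after tr: $1 = "gc-result", $2 = NAME, $3 = P, $4 = F, $5 = T
        { print $2, $3, $4, $5; p += $3; f += $4; t += $5 }
        # Uninitialized awk variables print as 0 under %d, so no input gives TOTAL 0 0 0.
        END { printf "TOTAL %d %d %d\n", p, f, t }'
}
```

Because unmatched output is simply dropped, a pattern that never matches yields `TOTAL 0 0 0` instead of an error. A driver built this way may want to fail loudly when no gc-result lines appear at all, since that is the same class of silent mismatch as the old Haskell `(ok-len 3 ...)` grep.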
--- a/lib/haskell/scoreboard.json
+++ b/lib/haskell/scoreboard.json
@@ -1,25 +1,25 @@
 {
  "date": "2026-05-06",
-  "total_pass": 0,
-  "total_fail": 18,
+  "total_pass": 156,
+  "total_fail": 0,
  "programs": {
-    "fib": {"pass": 0, "fail": 1},
-    "sieve": {"pass": 0, "fail": 1},
-    "quicksort": {"pass": 0, "fail": 1},
-    "nqueens": {"pass": 0, "fail": 1},
-    "calculator": {"pass": 0, "fail": 1},
-    "collatz": {"pass": 0, "fail": 1},
-    "palindrome": {"pass": 0, "fail": 1},
-    "maybe": {"pass": 0, "fail": 1},
-    "fizzbuzz": {"pass": 0, "fail": 1},
-    "anagram": {"pass": 0, "fail": 1},
-    "roman": {"pass": 0, "fail": 1},
-    "binary": {"pass": 0, "fail": 1},
-    "either": {"pass": 0, "fail": 1},
-    "primes": {"pass": 0, "fail": 1},
-    "zipwith": {"pass": 0, "fail": 1},
-    "matrix": {"pass": 0, "fail": 1},
-    "wordcount": {"pass": 0, "fail": 1},
-    "powers": {"pass": 0, "fail": 1}
+    "fib": {"pass": 2, "fail": 0},
+    "sieve": {"pass": 2, "fail": 0},
+    "quicksort": {"pass": 5, "fail": 0},
+    "nqueens": {"pass": 2, "fail": 0},
+    "calculator": {"pass": 5, "fail": 0},
+    "collatz": {"pass": 11, "fail": 0},
+    "palindrome": {"pass": 8, "fail": 0},
+    "maybe": {"pass": 12, "fail": 0},
+    "fizzbuzz": {"pass": 12, "fail": 0},
+    "anagram": {"pass": 9, "fail": 0},
+    "roman": {"pass": 14, "fail": 0},
+    "binary": {"pass": 12, "fail": 0},
+    "either": {"pass": 12, "fail": 0},
+    "primes": {"pass": 12, "fail": 0},
+    "zipwith": {"pass": 9, "fail": 0},
+    "matrix": {"pass": 8, "fail": 0},
+    "wordcount": {"pass": 7, "fail": 0},
+    "powers": {"pass": 14, "fail": 0}
  }
 }
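The new scoreboard.json has a small, fixed shape: a date, the summed total_pass and total_fail, and one {"pass", "fail"} object per program in run order (here the per-program passes add up to the 156 total). A sketch of emitting exactly that layout from "NAME PASS FAIL" lines with awk follows. It assumes program names need no JSON escaping and takes the date as an argument; `emit_scoreboard_json_sketch` is hypothetical and is not one of the real per-lang emit_scoreboard_* functions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: $1 is the date string (e.g. 2026-05-06);
# stdin carries one "NAME PASS FAIL" line per program, in output order.
# Assumes NAME needs no JSON escaping (plain identifiers like fib, sieve).
emit_scoreboard_json_sketch() {
  awk -v date="$1" '
    # Skip blank or short lines; keep our own counter so they do not shift indices.
    NF >= 3 { n++; name[n] = $1; pass[n] = $2; fail[n] = $3; tp += $2; tf += $3 }
    END {
      print "{"
      printf "  \"date\": \"%s\",\n", date
      printf "  \"total_pass\": %d,\n", tp
      printf "  \"total_fail\": %d,\n", tf
      print "  \"programs\": {"
      # Every entry except the last gets a trailing comma.
      for (i = 1; i <= n; i++)
        printf "    \"%s\": {\"pass\": %d, \"fail\": %d}%s\n", name[i], pass[i], fail[i], (i < n ? "," : "")
      print "  }"
      print "}"
    }'
}
```

For example, `printf 'fib 2 0\nsieve 2 0\n' | emit_scoreboard_json_sketch 2026-05-06` produces the same layout as the first two entries above, with totals of 4 and 0. Summing in the emitter, rather than trusting a separate total, keeps total_pass and total_fail consistent with the per-program rows by construction.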