ocaml conformance.sh runs 'bash lib/ocaml/test.sh -v', scrapes its
human-readable ok/FAIL lines, and re-classifies each test into suites via bash
description-matching heuristics; it also scrapes lib/ocaml/baseline/run.sh
(foreign .ml programs). The underlying test.sh is a per-assertion epoch runner
(hundreds of individual (ocaml-test-...) evals, one epoch each) with no
suite-level counter variables or dict runners, so the driver's
counter/dict-scoreboard model has nothing to point at without rewriting the test
harness. 'Scrapes a test.sh' is the briefing's named exclusion criterion (test.sh
even notes it mirrors lib/lua/test.sh, the canonical excluded case). Left
conformance.sh untouched; recorded the exclusion.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
js conformance.sh walks lib/js/test262-slice/**/*.js (foreign test262
fixtures), escapes each with python3, evals via (js-eval), and compares output
to a sibling .expected file by substring match — counting pass/fail in bash
against a >=50% target. It loads no SX test suites and emits no SX counter/dict
scoreboard (no scoreboard.json). The shared driver only epoch-loads SX preloads
and evals SX test suites emitting a scoreboard — it cannot drive a
foreign-fixture-vs-expected comparison harness (same category as
lua/forth/smalltalk). Left conformance.sh untouched; recorded the exclusion.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Go has the same structure as erlang: suites load into one session and each
exposes a pass counter plus a *count* (total) counter rather than a fail
counter. MODE=dict fits — each suite's runner is a dict literal
{:passed P :failed (- count P) :total count}. No driver change; conformance.conf
+ 3-line shim, historical scoreboard schema preserved.
Parity verified 609/609 (0 fail), every suite matching baseline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
forth's conformance.sh reads a foreign Forth test corpus (Hayes Core core.fr),
preprocesses it with awk + an external python3 chunk-splitter that generates a
chunks.sx of raw source strings, then runs them through the interpreter via
(hayes-run-all). The shared driver only epoch-loads SX preloads and evals SX
test suites emitting a counter/dict scoreboard — it cannot reproduce the
external preprocessing pipeline over a foreign .fr corpus (same category as
lua/smalltalk). No SX tests/*.sx suites exist to migrate. Left conformance.sh
untouched; recorded the exclusion.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Feed is the canonical MODE=counters shape: each suite runs in a fresh session
with shared preloads and a single feed-test-pass/feed-test-fail pair. Lifted the
old script's inline epoch-2 counter + feed-test helper defs into
lib/feed/test-harness.sx (preloaded last) so the driver can load them before
each suite. conformance.conf + 3-line shim; historical scoreboard schema
preserved. No driver change needed.
Parity verified 189/189 (0 fail), every suite matching baseline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Erlang's suites load into one session and each exposes a pass counter plus a
*count* (total) counter rather than a fail counter, so MODE=dict fits directly:
each suite's runner is a dict literal {:passed P :failed (- count P) :total count}.
No driver change needed (dict mode already supports arbitrary runner expressions).
conformance.conf + 3-line shim; historical scoreboard schema preserved.
Parity verified 761/761 (0 fail), every suite matching baseline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extend the shared driver's MODE=counters with a backward-compatible SUITES
format: name:file[:pass-var:fail-var[:extra-preload ...]]. Optional per-suite
counter symbols (override the global COUNTERS_PASS/COUNTERS_FAIL) and per-suite
preload chains (loaded after the global PRELOADS). Plain name:file entries are
unchanged — verified against haskell (fib/sieve/quicksort 2/2/5, matches
committed scoreboard).
common-lisp has 8 distinct per-suite counter pairs and a different preload
chain per suite, so it could not fit the single-counter/fixed-preload model;
the extended format expresses it directly. conformance.conf keeps the historical
scoreboard schema; conformance.sh becomes the 3-line shim.
Result 487/487 (0 fail) vs the old 305/0 baseline — higher and explained: the
old per-suite 'timeout 30' was too tight for the slow eval suite (~15-25s under
contention), silently recording it as 0; the driver's 180s budget recovers its
true 182. geometry/mop-trace stay 0/0 (pre-existing refl-class-chain-depth-with
load error; counter vars defined as 0 -> clean gc-result, no fail-fallback).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Classified migratable-in-kind (SX suites over epoch, not a foreign runner)
but blocked on driver feature gaps: 8 distinct per-suite counter variable
name pairs and per-suite preload chains, neither supported by MODE=counters
(single global counter + fixed preloads) nor MODE=dict (load-time counter
collisions across suites). Baseline 305/0 across 12 suites. Did not migrate;
conformance.sh left untouched. Driver unchanged (out of per-iteration scope).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reading lib/mod (Prolog) and lib/acl (Datalog) side by side shows the convergence
is in module names only. Federation: opposite trust models (SX registry + decision
sharing vs in-engine Datalog trust facts + fact replication), zero shared code.
Audit: only a ~5-fn core overlaps and it diverges (entry shapes, seq base 0 vs 1,
op sets, mutation idiom) — not worth a shared module under two restricted envs.
Outcome: keep them parallel; revisit only on a third same-model consumer.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>