HS: batched conformance runner + JIT cache architecture plan

tests/hs-run-batched.js — fresh-kernel-per-batch conformance runner.
Solves the WASM kernel JIT-cache-saturation problem (compiled VmClosures
accumulate over a single process and slow tests at the tail of the run)
by spawning a child Node process per batch. Each batch starts with an
empty cache, so tests at index 1400 perform identically to tests at
index 100. Configurable batch size (HS_BATCH_SIZE, default 150) and
parallelism (HS_PARALLEL, default 1).

This is option 2 from the cache-architecture plan — the lowest-risk fix:
zero kernel changes, deterministic results, runs in the same time as the
single-process version when parallelism matches CPU count.

plans/jit-cache-architecture.md — sketches the SX-wide architectural
fix in three phases:

  1. Tiered compilation — call counter on lambdas; only JIT after K
     invocations. Filters out one-shot lambdas (test harness, dynamic
     eval, REPLs) at the source.
  2. LRU eviction — central cache with fixed budget. Predictable memory
     ceiling regardless of input pattern.
  3. Reset API — jit-reset!, jit-clear-cold!, jit-stats, jit-pin!
     primitives for app-driven cache management.

Layer split: cache datastructure + LRU in hosts/ocaml/lib/sx_jit_cache.ml
(new), VM integration in sx_vm.ml, primitives registered in
sx_primitives.ml, declarative spec in spec/primitives.sx, and SX-level
ergonomics (with-jit-threshold, with-fresh-jit, jit-report) in lib/jit.sx.
This is host-specific to the OCaml WASM kernel but the SX API surface is
shared across all hosted languages (HS, Common Lisp, Erlang, etc.).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 18:41:06 +00:00
parent 0b4b7c9dbc
commit 21e6351657
2 changed files with 374 additions and 0 deletions

plans/jit-cache-architecture.md New file

@@ -0,0 +1,223 @@
# JIT Cache Architecture — Tiered + LRU + Reset API
## Problem statement
The OCaml WASM kernel JIT-compiles every lambda body on first call and caches
the resulting `vm_closure` in a mutable slot on the lambda itself
(`Lambda.l_compiled`, `Component.c_compiled`, `Island.i_compiled`). Cache
growth is unbounded — there is no eviction, no threshold, no reset.

**Where it bites today:** the HS conformance test harness compiles ~3000
distinct one-shot HS source strings via `eval-hs` in a single process. Each
compilation creates a fresh lambda → fresh `vm_closure`. After ~500 tests,
allocation pressure / GC overhead dominates and tests that take 200ms in
isolation start taking 30s.

**Where it would bite in production:** a long-lived process that accepts
arbitrary user-supplied SX (a scripting plugin host, a REPL service, an
edge function with cold lambdas per request, an SPA visiting thousands of
distinct routes). Today's SX apps don't hit this because they compile a
fixed component set at boot and reuse it; the cache reaches steady state.
## Architecture
Three coordinated mechanisms, deployed in order:
### 1. Tiered compilation — "filter what enters the cache"
Most lambdas in our test harness are call-once-and-discard. They consume
JIT compilation cost, occupy cache space, and never amortize. Solution:
don't JIT until a lambda has been called K times.
**OCaml changes:**
```ocaml
(* sx_types.ml *)
type lambda = {
  ...                                        (* existing fields *)
  mutable l_compiled   : vm_closure option;  (* unchanged *)
  mutable l_call_count : int;                (* NEW *)
}
```
```ocaml
(* sx_vm.ml — in cek_call_or_suspend *)
let jit_threshold = ref 4

let maybe_jit lam =
  match lam.l_compiled with
  | Some _ -> ()  (* already compiled *)
  | None ->
    lam.l_call_count <- lam.l_call_count + 1;
    if lam.l_call_count >= !jit_threshold then
      lam.l_compiled <- !jit_compile_ref lam globals
```
**Tunable via primitive:** `(jit-set-threshold! N)` (default 4; 1 = the old
compile-on-first-call behavior; a sufficiently large N effectively disables
the JIT).

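
The gate itself is kernel code, but the policy is easy to illustrate
host-agnostically. A minimal JavaScript sketch — `makeTieredCompiler` and the
`compiled`/`callCount` fields are illustrative names, not the kernel's:

```javascript
// Tiered-compilation gate: interpret until a lambda has been called
// `threshold` times, then compile once and reuse the compiled form.
function makeTieredCompiler(compile, threshold = 4) {
  return function maybeJit(lam) {
    if (lam.compiled) return lam.compiled;   // already JIT-ted
    lam.callCount = (lam.callCount || 0) + 1;
    if (lam.callCount >= threshold) {
      lam.compiled = compile(lam);           // pay JIT cost exactly once
      return lam.compiled;
    }
    return null;                             // caller falls back to the interpreter
  };
}
```

A one-shot lambda never reaches the threshold, so it never pays compilation
cost or occupies cache space; a hot lambda compiles on its fourth call and
stays compiled.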
**Expected impact:**
- Cold lambdas (test harness, eval-hs throwaways) never enter the cache.
- Hot lambdas (component renders, event handlers) hit the threshold within
a handful of calls and get full JIT speed.
- Eliminates the test-harness pathology entirely without touching cache size.
### 2. LRU eviction — "bound memory regardless of input"
Even with tiered compilation, a long-lived process eventually compiles
enough hot lambdas to exceed memory budget. Pure LRU eviction with a
fixed budget gives a predictable ceiling.
**OCaml changes:**
```ocaml
(* sx_jit_cache.ml — NEW module *)
type cache_entry = {
  closure           : vm_closure;
  mutable last_used : int;   (* generation counter *)
  mutable pinned    : bool;  (* hot-path opt-out *)
}

let cache : (int, cache_entry) Hashtbl.t = Hashtbl.create 256
let cache_budget = ref 5000  (* lambdas, not bytes — easy to reason about *)
let generation = ref 0

let lookup lambda_id = ...

let insert lambda_id closure =
  incr generation;
  Hashtbl.replace cache lambda_id
    { closure; last_used = !generation; pinned = false };
  if Hashtbl.length cache > !cache_budget then evict_oldest ()

let pin lambda_id = ...
```
**Migration:** `Lambda.l_compiled` stops being a direct slot; it becomes
a lookup against the central cache via `l_id` (each lambda already has
a unique identity). Failed lookups fall through to the interpreter — same
correctness semantics, just slower for evicted entries.

**Tunable:** `(jit-set-budget! N)` (default 5000; 0 = disable cache).

**Pinning:** `(jit-pin! 'fn-name)` keeps a function from ever being evicted.
Use for stdlib helpers, hot rendering paths.
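
The bookkeeping above can be sketched host-agnostically. A JavaScript
illustration of the same policy — the class and method names are hypothetical,
not the kernel API:

```javascript
// LRU cache with a fixed entry budget, a generation counter for recency,
// and pinning to exempt hot entries from eviction.
class JitCacheSketch {
  constructor(budget) { this.budget = budget; this.gen = 0; this.map = new Map(); }
  lookup(id) {
    const e = this.map.get(id);
    if (!e) return null;           // miss → interpreter fallback
    e.lastUsed = ++this.gen;       // touch on every hit
    return e.closure;
  }
  insert(id, closure) {
    this.map.set(id, { closure, lastUsed: ++this.gen, pinned: false });
    if (this.map.size > this.budget) this.evictOldest();
  }
  pin(id) { const e = this.map.get(id); if (e) e.pinned = true; }
  evictOldest() {
    let victim = null, oldest = Infinity;
    for (const [id, e] of this.map) {
      if (!e.pinned && e.lastUsed < oldest) { oldest = e.lastUsed; victim = id; }
    }
    if (victim !== null) this.map.delete(victim); // all pinned → stay over budget
  }
}
```

Note the design choice: if every entry is pinned, the cache exceeds its budget
rather than evicting a pinned entry — pinning is an explicit promise.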
### 3. Manual reset API — "escape hatch for app checkpoints"
Some app patterns know exactly when their cache should be flushed:
- A web server between request batches
- An SPA on logout / navigation
- A test runner between batches (yes, even with #1 + #2)
- A REPL on `:reset`

**Primitives:**
| Primitive | Behavior |
|-----------|----------|
| `(jit-reset!)` | Drop all cache entries. Hot paths re-JIT on next call. |
| `(jit-clear-cold!)` | Drop only entries that haven't been used in N generations. |
| `(jit-stats)` | Returns dict: `{:size N :budget M :threshold T :hits H :misses I :evictions E}`. |
| `(jit-set-threshold! N)` | Raise/lower compilation threshold at runtime. |
| `(jit-set-budget! N)` | Raise/lower cache size budget. |
| `(jit-pin! sym)` | Pin a named function against eviction. |
| `(jit-unpin! sym)` | Unpin. |
All zero-cost when not called — just a few atomic counter increments.
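
Of these, `jit-clear-cold!` has the least obvious semantics: a generational
sweep rather than a full flush. A sketch of that sweep in JavaScript — the
entry shape and `clearCold` name are assumptions for illustration:

```javascript
// jit-clear-cold! sweep: drop unpinned entries whose last use is more than
// `n` generations behind the current generation counter.
function clearCold(cacheMap, currentGen, n) {
  let dropped = 0;
  for (const [id, entry] of cacheMap) {
    if (!entry.pinned && currentGen - entry.lastUsed > n) {
      cacheMap.delete(id);   // deleting during iteration is safe on a Map
      dropped += 1;
    }
  }
  return dropped;            // useful for jit-stats-style reporting
}
```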
## Where it lives
The JIT is host-specific (OCaml WASM kernel). The plan splits across
three layers:
```
hosts/ocaml/lib/sx_jit_cache.ml NEW — cache datastructure + LRU
hosts/ocaml/lib/sx_vm.ml Modified — call counter, lookup integration
hosts/ocaml/lib/sx_types.ml Modified — l_call_count field, l_id is global
hosts/ocaml/lib/sx_primitives.ml Modified — register jit-* primitives
spec/primitives.sx Modified — declarative spec for jit-* primitives
lib/jit.sx NEW — SX-level helpers + macros
```
**lib/jit.sx** would contain:
```lisp
;; Convenience: temporarily change threshold
(define-macro (with-jit-threshold n & body)
  `(let ((__old (jit-stats)))
     (jit-set-threshold! ,n)
     (let ((__r (do ,@body)))
       (jit-set-threshold! (get __old :threshold))
       __r)))

;; Convenience: drop cache before/after a block
(define-macro (with-fresh-jit & body)
  `(let ((__r (do (jit-reset!) ,@body)))
     (jit-reset!)
     __r))

;; Monitoring helper for dev mode
(define jit-report
  (fn ()
    (let ((s (jit-stats)))
      (str "jit: " (get s :size) "/" (get s :budget) " entries, "
           (get s :hits) " hits / " (get s :misses) " misses ("
           (* 100 (/ (get s :hits)
                     (max 1 (+ (get s :hits) (get s :misses)))))
           "%)"))))
```
This is shared SX — every host language (HS, Common Lisp, Erlang, etc.)
gets the same API for free.
## Rollout
**Phase 1: Tiered compilation (1-2 days)**
- Add `l_call_count` to lambda type
- Wire counter increment in `cek_call_or_suspend`
- Add `jit-set-threshold!` primitive
- Default threshold = 1 (no change in behavior)
- Bump default to 4 once test suite confirms stability
- Verify: HS conformance full-suite run completes without JIT saturation

**Phase 2: LRU cache (3-5 days)**
- Extract `Lambda.l_compiled` into central `sx_jit_cache.ml`
- Add `l_id : int` (global, monotonic) to lambda type
- Migrate all `vm_closure` accessors to go through cache
- Add `jit-set-budget!`, `jit-pin!`, `jit-unpin!` primitives
- Verify: same full-suite run with budget=100 — cache hit/miss ratio reasonable

**Phase 3: Reset API + monitoring (1 day)**
- Add `jit-reset!`, `jit-clear-cold!`, `jit-stats` primitives
- Add `lib/jit.sx` SX-level wrappers
- Integrate into HS test runner: call `jit-reset!` between batches as belt-and-suspenders
- Document in CLAUDE.md / migration notes

**Phase 4: Production hardening (incremental)**
- Memory pressure hooks (browser `performance.measureUserAgentSpecificMemory`)
- Bytecode interning (dedupe identical `vm_closure` bodies across lambdas)
- Generational sweep on idle (browser `requestIdleCallback`)
- These are nice-to-have, not required for correctness.
## Testing
Each phase ships with:
- Unit tests in `spec/tests/test-jit-cache.sx` (new file)
- Conformance must remain 100% per-suite
- Wall-clock benchmark: full HS suite single-process before/after

Phase 1 acceptance criterion: HS conformance suite completes in single
process under 10 minutes wall time.

Phase 2 acceptance: same as 1 but with budget=500. Cache size stays
bounded throughout the run; hit rate >90% on hot paths.

Phase 3 acceptance: `jit-reset!` between batches reduces test-harness
wall time by >50% vs no reset (because hot stdlib stays cached, but
test-specific lambdas don't accumulate).
## Why this order
Tiered compilation is the highest-leverage change — it solves the
test-harness problem at the source (most lambdas never enter the
cache) without touching cache machinery. LRU is the safety net
(unbounded growth still possible if every lambda is hot, e.g., huge
dynamic component graph). Reset is the escape hatch for situations
neither mechanism can handle (logout, hard memory pressure, app
restart without process restart).
Doing them in reverse would invert the value — reset alone fixes
nothing without app-level integration, and LRU without tiered
compilation churns the cache constantly on cold lambdas.

tests/hs-run-batched.js Executable file

@@ -0,0 +1,151 @@
#!/usr/bin/env node
/**
 * Batched HS conformance runner — option 2 (per-process kernel isolation).
 *
 * Each batch spawns a fresh Node process running tests/hs-run-filtered.js
 * with HS_START/HS_END set, so the WASM kernel's JIT cache starts empty.
 * Avoids the cumulative slowdown that hits the 1-process runner around
 * test 500-700 (compiled lambdas accumulate, allocation stalls).
 *
 * Usage:
 *   node tests/hs-run-batched.js
 *   HS_BATCH_SIZE=100 node tests/hs-run-batched.js
 *   HS_PARALLEL=4 node tests/hs-run-batched.js
 */
const { spawnSync, spawn } = require('child_process');
const path = require('path');
const fs = require('fs');
const FILTERED = path.join(__dirname, 'hs-run-filtered.js');
const TOTAL = parseInt(process.env.HS_TOTAL || '1496');
const FROM = parseInt(process.env.HS_FROM || '0');
const BATCH_SIZE = parseInt(process.env.HS_BATCH_SIZE || '150');
const PARALLEL = parseInt(process.env.HS_PARALLEL || '1');
const VERBOSE = !!process.env.HS_VERBOSE;
function makeBatches() {
  const batches = [];
  for (let i = FROM; i < TOTAL; i += BATCH_SIZE) {
    batches.push({ start: i, end: Math.min(i + BATCH_SIZE, TOTAL) });
  }
  return batches;
}
function runBatch({ start, end }) {
  const t0 = Date.now();
  const r = spawnSync('node', [FILTERED], {
    env: { ...process.env, HS_START: String(start), HS_END: String(end) },
    encoding: 'utf8',
    timeout: 1800_000, // 30 min per batch hard cap
  });
  const out = (r.stdout || '') + (r.stderr || '');
  const elapsed = Date.now() - t0;
  return { start, end, elapsed, out, code: r.status };
}
function parseBatch(out) {
  const result = { pass: 0, fail: 0, failures: [], slow: [], timeouts: [] };
  const m = out.match(/Results:\s+(\d+)\/(\d+)/);
  if (m) {
    result.pass = parseInt(m[1]);
    const total = parseInt(m[2]);
    result.fail = total - result.pass;
  }
  // Capture each "[suite] name: error" failure line
  const failSection = out.split('All failures:')[1] || '';
  for (const line of failSection.split('\n')) {
    const fm = line.match(/^\s*\[([^\]]+)\]\s+(.+?):\s*(.*)$/);
    if (fm) result.failures.push({ suite: fm[1], name: fm[2], err: fm[3] || '(empty)' });
  }
  for (const line of out.split('\n')) {
    const sm = line.match(/SLOW: test (\d+) took (\d+)ms \[([^\]]+)\] (.+)$/);
    if (sm) result.slow.push({ idx: +sm[1], ms: +sm[2], suite: sm[3], name: sm[4] });
    const tm = line.match(/TIMEOUT: test (\d+) \[([^\]]+)\] (.+)$/);
    if (tm) result.timeouts.push({ idx: +tm[1], suite: tm[2], name: tm[3] });
  }
  return result;
}
function fmtTime(ms) {
  if (ms < 1000) return `${ms}ms`;
  if (ms < 60_000) return `${(ms / 1000).toFixed(1)}s`;
  return `${Math.floor(ms / 60_000)}m${Math.round((ms % 60_000) / 1000)}s`;
}
async function runParallel(batches, concurrency) {
  const results = new Array(batches.length);
  let cursor = 0;
  async function worker() {
    while (cursor < batches.length) {
      const i = cursor++;
      results[i] = await new Promise((resolve) => {
        const t0 = Date.now();
        let out = '';
        const child = spawn('node', [FILTERED], {
          env: { ...process.env, HS_START: String(batches[i].start), HS_END: String(batches[i].end) },
        });
        child.stdout.on('data', d => out += d);
        child.stderr.on('data', d => out += d);
        child.on('exit', (code) => resolve({ ...batches[i], elapsed: Date.now() - t0, out, code }));
      });
      const r = parseBatch(results[i].out);
      process.stderr.write(` batch ${batches[i].start}-${batches[i].end}: ${r.pass}/${r.pass + r.fail} (${fmtTime(results[i].elapsed)})\n`);
    }
  }
  await Promise.all(Array.from({ length: concurrency }, worker));
  return results;
}
(async () => {
  const batches = makeBatches();
  const t0 = Date.now();
  process.stderr.write(`Running ${TOTAL} tests in ${batches.length} batches of ${BATCH_SIZE} (parallelism=${PARALLEL})\n`);
  let results;
  if (PARALLEL > 1) {
    results = await runParallel(batches, PARALLEL);
  } else {
    results = [];
    for (const b of batches) {
      const r = runBatch(b);
      results.push(r);
      const p = parseBatch(r.out);
      process.stderr.write(` batch ${b.start}-${b.end}: ${p.pass}/${p.pass + p.fail} (${fmtTime(r.elapsed)})\n`);
    }
  }
  let totalPass = 0, totalFail = 0;
  const allFailures = [];
  const allTimeouts = [];
  const slowest = [];
  for (const r of results) {
    const p = parseBatch(r.out);
    totalPass += p.pass;
    totalFail += p.fail;
    allFailures.push(...p.failures);
    allTimeouts.push(...p.timeouts);
    slowest.push(...p.slow);
    if (VERBOSE) process.stdout.write(r.out);
  }
  const totalElapsed = Date.now() - t0;
  process.stdout.write(`\n=== Conformance ===\n`);
  process.stdout.write(`Total: ${totalPass}/${totalPass + totalFail} (${(100 * totalPass / (totalPass + totalFail)).toFixed(2)}%)\n`);
  process.stdout.write(`Wall: ${fmtTime(totalElapsed)} across ${batches.length} batches\n`);
  if (allFailures.length) {
    process.stdout.write(`\nFailures (${allFailures.length}):\n`);
    for (const f of allFailures) process.stdout.write(` [${f.suite}] ${f.name}: ${f.err}\n`);
  }
  if (allTimeouts.length && allTimeouts.length !== allFailures.length) {
    process.stdout.write(`\nTimeouts (${allTimeouts.length}):\n`);
    for (const t of allTimeouts) process.stdout.write(` [${t.suite}] ${t.name}\n`);
  }
  slowest.sort((a, b) => b.ms - a.ms);
  if (slowest.length) {
    process.stdout.write(`\nSlowest 10 tests:\n`);
    for (const s of slowest.slice(0, 10)) process.stdout.write(` ${s.ms}ms [${s.suite}] ${s.name}\n`);
  }
  process.exit(totalFail > 0 ? 1 : 0);
})();