coop/rose-ash

giles 30a7dd2108

JIT: mark Phase 1 done in architecture plan; document WASM ABI rollout caveat

2026-05-08 23:57:53 +00:00

9.1 KiB

JIT Cache Architecture — Tiered + LRU + Reset API

Problem statement

The OCaml WASM kernel JIT-compiles every lambda body on first call and caches the resulting vm_closure in a mutable slot on the lambda itself (Lambda.l_compiled, Component.c_compiled, Island.i_compiled). Cache growth is unbounded — there is no eviction, no threshold, no reset.

Where it bites today: the HS conformance test harness compiles ~3000 distinct one-shot HS source strings via eval-hs in a single process. Each compilation creates a fresh lambda and therefore a fresh vm_closure. After ~500 tests, allocation pressure and GC overhead dominate, and tests that take 200ms in isolation start taking 30s.

Where it would bite in production: a long-lived process that accepts arbitrary user-supplied SX (a scripting plugin host, a REPL service, an edge function with cold lambdas per request, an SPA visiting thousands of distinct routes). Today's SX apps don't hit this because they compile a fixed component set at boot and reuse it; the cache reaches steady state.

Architecture

Three coordinated mechanisms, deployed in order:

1. Tiered compilation — "filter what enters the cache"

Most lambdas in our test harness are call-once-and-discard. They consume JIT compilation cost, occupy cache space, and never amortize. Solution: don't JIT until a lambda has been called K times.

OCaml changes:

(* sx_types.ml *)
type lambda = {
  ...
  mutable l_compiled  : vm_closure option;  (* unchanged *)
  mutable l_call_count: int;                (* NEW *)
}

(* sx_vm.ml — in cek_call_or_suspend *)
let jit_threshold = ref 4

let maybe_jit lam =
  match lam.l_compiled with
  | Some _ -> ()  (* already compiled *)
  | None ->
    lam.l_call_count <- lam.l_call_count + 1;
    if lam.l_call_count >= !jit_threshold then
      lam.l_compiled <- !jit_compile_ref lam globals

Tunable via primitive: (jit-set-threshold! N) (default 4; 1 = old behavior; ∞ = disable JIT).

Expected impact:

Cold lambdas (test harness, eval-hs throwaways) never enter the cache.
Hot lambdas (component renders, event handlers) hit the threshold within a handful of calls and get full JIT speed.
Eliminates the test-harness pathology entirely without touching cache size.
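To make the expected impact concrete, the sketch below models maybe_jit and counts compilations for a workload shaped like the test harness: 3000 one-shot lambdas plus 10 hot lambdas called 100 times each. Only the l_compiled / l_call_count fields and the threshold logic come from the plan. The vm_closure type, compile, make and run are hypothetical stand-ins, and max_int plays the role of "∞".

```ocaml
(* Self-contained model of threshold-gated JIT. [vm_closure], [lambda]
   and [compile] are illustrative stand-ins, not the kernel's types. *)
type vm_closure = Closure of string

type lambda = {
  name : string;
  mutable l_compiled : vm_closure option;
  mutable l_call_count : int;
}

let jit_threshold = ref 4
let compiled_count = ref 0  (* how many times the stand-in JIT ran *)

(* Stand-in for [!jit_compile_ref]; always succeeds here. *)
let compile lam =
  incr compiled_count;
  Some (Closure ("bytecode:" ^ lam.name))

(* Same control flow as maybe_jit above. *)
let maybe_jit lam =
  match lam.l_compiled with
  | Some _ -> ()
  | None ->
    lam.l_call_count <- lam.l_call_count + 1;
    if lam.l_call_count >= !jit_threshold then
      lam.l_compiled <- compile lam

let make name = { name; l_compiled = None; l_call_count = 0 }

(* [cold] one-shot lambdas plus [hot] lambdas called [calls] times each;
   returns how many lambdas were compiled. *)
let run ~threshold ~cold ~hot ~calls =
  jit_threshold := threshold;
  compiled_count := 0;
  for i = 1 to cold do
    maybe_jit (make (Printf.sprintf "cold%d" i))
  done;
  let hots = List.init hot (fun i -> make (Printf.sprintf "hot%d" i)) in
  List.iter (fun lam -> for _i = 1 to calls do maybe_jit lam done) hots;
  !compiled_count

let () =
  List.iter
    (fun t ->
       Printf.printf "threshold=%d: %d compiled\n" t
         (run ~threshold:t ~cold:3000 ~hot:10 ~calls:100))
    [ 1; 4; max_int ]
```

Threshold 1 (the old behavior) compiles all 3010 lambdas. The default of 4 compiles only the 10 hot ones, and max_int compiles none.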

2. LRU eviction — "bound memory regardless of input"

Even with tiered compilation, a long-lived process eventually compiles enough hot lambdas to exceed its memory budget. Pure LRU eviction with a fixed budget gives a predictable ceiling.

OCaml changes:

(* sx_jit_cache.ml — NEW module *)
type cache_entry = {
  closure : vm_closure;
  mutable last_used : int;  (* generation counter *)
  mutable pinned : bool;    (* hot-path opt-out *)
}

let cache : (int, cache_entry) Hashtbl.t = Hashtbl.create 256
let cache_budget = ref 5000   (* lambdas, not bytes: easy to reason about *)
let generation = ref 0

let evict_oldest () = ...     (* drop the least recently used unpinned entry *)
let lookup lambda_id = ...
let insert lambda_id closure =
  incr generation;
  (* replace, not add: Hashtbl.add would shadow an existing binding *)
  Hashtbl.replace cache lambda_id { closure; last_used = !generation; pinned = false };
  if Hashtbl.length cache > !cache_budget then evict_oldest ()
let pin lambda_id = ...

Migration: Lambda.l_compiled stops being a direct slot and becomes a lookup against the central cache, keyed by l_id. l_id is a global, monotonic integer identity that Phase 2 adds to the lambda type. Failed lookups fall through to the interpreter. Correctness semantics are unchanged; evicted entries just run slower.
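Below is a minimal sketch of the migrated call path. It replaces the real cache with a plain Hashtbl keyed by l_id, and run_vm / interpret are hypothetical stand-ins for the VM and the interpreter. A miss only changes which path runs, not the result.

```ocaml
(* Hypothetical stand-ins: a lambda's "body" is an OCaml function, and a
   compiled closure just wraps the same function. *)
type lambda = { l_id : int; body : int -> int }
type vm_closure = { fn : int -> int }

(* Central cache keyed by l_id (simplified: no LRU bookkeeping). *)
let jit_cache : (int, vm_closure) Hashtbl.t = Hashtbl.create 16
let hits = ref 0
let misses = ref 0

let run_vm c arg = c.fn arg            (* fast path: compiled code *)
let interpret lam arg = lam.body arg   (* slow path: same semantics *)

let call lam arg =
  match Hashtbl.find_opt jit_cache lam.l_id with
  | Some c -> incr hits; run_vm c arg
  | None -> incr misses; interpret lam arg  (* never compiled, or evicted *)

let () =
  let double = { l_id = 1; body = (fun x -> x * 2) } in
  let before = call double 21 in                      (* miss *)
  Hashtbl.replace jit_cache double.l_id { fn = double.body };
  let cached = call double 21 in                      (* hit *)
  Hashtbl.remove jit_cache double.l_id;               (* simulate eviction *)
  let after = call double 21 in                       (* miss again *)
  Printf.printf "%d %d %d (hits=%d misses=%d)\n" before cached after !hits !misses
```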

Tunable: (jit-set-budget! N) (default 5000; 0 = disable cache).

Pinning: (jit-pin! 'fn-name) keeps a function from ever being evicted. Use it for stdlib helpers and hot rendering paths.
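The elided bodies above can be filled in several ways. Here is one minimal sketch, built on these assumptions:

- closures are modelled as strings;
- every lookup refreshes recency;
- eviction is a linear scan, so each eviction costs O(n), and a real implementation might use a linked list instead;
- pinning only affects entries already in the cache.

The evictions counter and the set_budget helper are hypothetical names.

```ocaml
(* Hypothetical LRU sketch; closures are strings for illustration. *)
type cache_entry = {
  closure : string;         (* stand-in for vm_closure *)
  mutable last_used : int;  (* generation of the latest insert or lookup *)
  mutable pinned : bool;    (* pinned entries are never evicted *)
}

let cache : (int, cache_entry) Hashtbl.t = Hashtbl.create 256
let cache_budget = ref 5000
let generation = ref 0
let evictions = ref 0

let tick () = incr generation; !generation

(* Drop the unpinned entry with the smallest [last_used]; no-op if all are pinned. *)
let evict_oldest () =
  let victim =
    Hashtbl.fold
      (fun id e acc ->
         if e.pinned then acc
         else
           match acc with
           | Some (_, best) when best <= e.last_used -> acc
           | _ -> Some (id, e.last_used))
      cache None
  in
  match victim with
  | Some (id, _) -> Hashtbl.remove cache id; incr evictions
  | None -> ()

(* Evict until within budget or until only pinned entries remain. *)
let rec enforce_budget () =
  let size = Hashtbl.length cache in
  if size > !cache_budget then begin
    evict_oldest ();
    if Hashtbl.length cache < size then enforce_budget ()
  end

let lookup id =
  match Hashtbl.find_opt cache id with
  | Some e -> e.last_used <- tick (); Some e.closure
  | None -> None  (* caller falls back to the interpreter *)

let insert id closure =
  (* Re-inserting an id keeps its pinned flag. *)
  let pinned =
    match Hashtbl.find_opt cache id with Some e -> e.pinned | None -> false
  in
  Hashtbl.replace cache id { closure; last_used = tick (); pinned };
  enforce_budget ()

let pin id =
  match Hashtbl.find_opt cache id with
  | Some e -> e.pinned <- true
  | None -> ()

(* (jit-set-budget! n): shrinking the budget evicts immediately. *)
let set_budget n = cache_budget := n; enforce_budget ()
```

For example, take a budget of 2: insert ids 1 and 2, look up 1, then insert 3. This evicts id 2, the least recently used entry. With a budget of 0, every unpinned entry is dropped, which matches "0 = disable cache".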

3. Manual reset API — "escape hatch for app checkpoints"

Some app patterns know exactly when their cache should be flushed:

A web server between request batches
An SPA on logout / navigation
A test runner between batches (yes, even with #1 + #2)
A REPL on :reset

Primitives:

| Primitive | Behavior |
|---|---|
| `(jit-reset!)` | Drop all cache entries. Hot paths re-JIT on next call. |
| `(jit-clear-cold!)` | Drop only entries that haven't been used in N generations. |
| `(jit-stats)` | Returns dict: `{:size N :budget M :threshold T :hits H :misses I :evictions E}`. |
| `(jit-set-threshold! N)` | Raise/lower compilation threshold at runtime. |
| `(jit-set-budget! N)` | Raise/lower cache size budget. |
| `(jit-pin! sym)` | Pin a named function against eviction. |
| `(jit-unpin! sym)` | Unpin. |

The primitives cost nothing unless called. The only always-on overhead is a few atomic counter increments on the call path.
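Here is a sketch of how the reset and monitoring primitives could sit on top of such a table. Its assumptions are not settled design:

- entries record only their last-used generation;
- every simulated call bumps the generation;
- the age cutoff for jit-clear-cold! is passed explicitly as ~max_age;
- cold-clearing counts as eviction;
- jit-reset! leaves the counters alone, since Phase 1 already has a separate jit-reset-counters!.

```ocaml
(* Hypothetical sketch of jit-reset!, jit-clear-cold! and jit-stats. *)
type entry = { mutable last_used : int }

type stats = { size : int; budget : int; hits : int; misses : int; evictions : int }

let cache : (int, entry) Hashtbl.t = Hashtbl.create 64
let budget = ref 5000
let generation = ref 0
let hits = ref 0
let misses = ref 0
let evictions = ref 0

(* Simulated call: a hit refreshes recency, a miss "compiles" and inserts. *)
let call id =
  incr generation;
  match Hashtbl.find_opt cache id with
  | Some e -> incr hits; e.last_used <- !generation
  | None -> incr misses; Hashtbl.replace cache id { last_used = !generation }

(* (jit-reset!): drop every entry; counters are left alone. *)
let reset () = Hashtbl.reset cache

(* (jit-clear-cold!): drop entries idle for more than [max_age] generations. *)
let clear_cold ~max_age =
  let cold =
    Hashtbl.fold
      (fun id e acc -> if !generation - e.last_used > max_age then id :: acc else acc)
      cache []
  in
  List.iter (Hashtbl.remove cache) cold;  (* collect first, then mutate *)
  evictions := !evictions + List.length cold;
  List.length cold

(* (jit-stats) *)
let stats () =
  { size = Hashtbl.length cache; budget = !budget;
    hits = !hits; misses = !misses; evictions = !evictions }
```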

Where it lives

The JIT is host-specific (OCaml WASM kernel). The plan splits across three layers:

hosts/ocaml/lib/sx_jit_cache.ml      NEW: cache data structure + LRU
hosts/ocaml/lib/sx_vm.ml             Modified: call counter, lookup integration
hosts/ocaml/lib/sx_types.ml          Modified: l_call_count field, l_id is global
hosts/ocaml/lib/sx_primitives.ml     Modified: register jit-* primitives
spec/primitives.sx                   Modified: declarative spec for jit-* primitives
lib/jit.sx                           NEW: SX-level helpers + macros

lib/jit.sx would contain:

;; Convenience: temporarily change threshold
(define-macro (with-jit-threshold n & body)
  `(let ((__old (jit-stats)))
     (jit-set-threshold! ,n)
     (let ((__r (do ,@body))) (jit-set-threshold! (get __old :threshold)) __r)))

;; Convenience: drop cache before/after a block
(define-macro (with-fresh-jit & body)
  `(let ((__r (do (jit-reset!) ,@body))) (jit-reset!) __r))

;; Monitoring helper for dev mode
(define jit-report
  (fn ()
    (let ((s (jit-stats)))
      (str "jit: " (get s :size) "/" (get s :budget) " entries, "
           (get s :hits) " hits / " (get s :misses) " misses ("
           (* 100 (/ (get s :hits) (max 1 (+ (get s :hits) (get s :misses)))))
           "%)"))))

This is shared SX — every host language (HS, Common Lisp, Erlang, etc.) gets the same API for free.
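On the OCaml side, the save/set/restore pattern of with-jit-threshold could use Fun.protect from the standard library (OCaml 4.08+), which runs ~finally on both normal return and exceptions. As written, the SX macro above has no unwind protection, so it restores the threshold only when the body returns normally. The ref and both functions below are illustrative stand-ins. hit_percent mirrors jit-report's (max 1 ...) guard against division by zero.

```ocaml
(* Stand-in for the kernel's threshold ref. *)
let jit_threshold = ref 4

(* Run [f] with a temporary threshold, restoring the old value even if [f] raises. *)
let with_jit_threshold n f =
  let old = !jit_threshold in
  jit_threshold := n;
  Fun.protect ~finally:(fun () -> jit_threshold := old) f

(* Hit percentage as in jit-report; [max 1 ...] avoids dividing by zero. *)
let hit_percent ~hits ~misses =
  100. *. float_of_int hits /. float_of_int (max 1 (hits + misses))

let () =
  let inside = with_jit_threshold 1 (fun () -> !jit_threshold) in
  Printf.printf "inside=%d after=%d hit%%=%.1f\n"
    inside !jit_threshold (hit_percent ~hits:9 ~misses:1)
```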

Rollout

Phase 1: Tiered compilation — IMPLEMENTED (commit b9d63112)

✅ l_call_count : int field on lambda type (sx_types.ml)
✅ Counter increment + threshold check in cek_call_or_suspend Lambda case (sx_vm.ml)
✅ Module-level refs in sx_types: jit_threshold (default 4), jit_compiled_count, jit_skipped_count, jit_threshold_skipped_count. Refs live in sx_types so sx_primitives can read them without creating an import cycle.
✅ Primitives: jit-stats, jit-set-threshold!, jit-reset-counters! (sx_primitives.ml)
Verified: 4771/1111 OCaml run_tests, identical to baseline — no regressions.
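The dependency direction described above can be sketched with nested modules standing in for the separate compilation units. Sx_vm and Sx_primitives both depend on Sx_types, and neither depends on the other, so no cycle arises. Only jit_threshold and jit_compiled_count come from the text. The association-list shape of the stats and the function names are hypothetical.

```ocaml
(* Base module: owns the mutable JIT state; depends on nothing. *)
module Sx_types = struct
  let jit_threshold = ref 4
  let jit_compiled_count = ref 0
end

(* The VM writes the counters. *)
module Sx_vm = struct
  let record_compile () = incr Sx_types.jit_compiled_count
end

(* Primitives read and reset them without referring to Sx_vm. *)
module Sx_primitives = struct
  let jit_stats () =
    [ ("threshold", !Sx_types.jit_threshold);
      ("compiled", !Sx_types.jit_compiled_count) ]
  let jit_set_threshold n = Sx_types.jit_threshold := n
  let jit_reset_counters () = Sx_types.jit_compiled_count := 0
end

let () =
  Sx_vm.record_compile ();
  Sx_primitives.jit_set_threshold 8;
  List.iter (fun (k, v) -> Printf.printf "%s=%d\n" k v) (Sx_primitives.jit_stats ())
```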

WASM rollout note: The native binary has Phase 1 active. The browser WASM (shared/static/wasm/sx_browser.bc.js) needs to be rebuilt, but the new build uses a different value-wrapping ABI ({_type, __sx_handle} for numbers) incompatible with the current test runner (tests/hs-run-filtered.js). For now the test tree pins the pre-rewrite WASM. Resolving the ABI gap is a separate task — either update the test runner to unwrap, or expose a value-marshalling helper from the kernel.

Phase 2: LRU cache (3-5 days)

Extract Lambda.l_compiled into central sx_jit_cache.ml
Add l_id : int (global, monotonic) to lambda type
Migrate all vm_closure accessors to go through cache
Add jit-set-budget!, jit-pin!, jit-unpin! primitives
Verify: same full-suite run with budget=100 — cache hit/miss ratio reasonable

Phase 3: Reset API + monitoring (1 day)

Add jit-reset! and jit-clear-cold! primitives. Extend the Phase 1 jit-stats with the cache fields (:size, :budget, :hits, :misses, :evictions).
Add lib/jit.sx SX-level wrappers
Integrate into HS test runner: call jit-reset! between batches as belt-and-suspenders
Document in CLAUDE.md / migration notes

Phase 4: Production hardening (incremental)

Memory pressure hooks (browser performance.measureUserAgentSpecificMemory)
Bytecode interning (dedupe identical vm_closure bodies across lambdas)
Generational sweep on idle (browser requestIdleCallback)
These are nice-to-have, not required for correctness.
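For bytecode interning, OCaml's standard Weak.Make functor already provides hash-consing. Its merge function returns an existing structurally equal element if one is present, and otherwise adds its argument and returns it. Entries stay collectable once nothing else references them. This sketch assumes, hypothetically, that a compiled body can be compared structurally and is modelled as an int array. Per-lambda state such as captured environments would have to stay outside the interned part.

```ocaml
(* Compiled bodies modelled as int arrays (a hypothetical stand-in). *)
module Body = struct
  type t = int array
  let equal = ( = )          (* structural equality on the bytecode *)
  let hash = Hashtbl.hash
end

(* Weak hash set: interned bodies can still be garbage-collected. *)
module Interned = Weak.Make (Body)

let table = Interned.create 64

(* Return the shared copy of [code], registering it if it is new. *)
let intern code = Interned.merge table code

let () =
  let a = intern [| 1; 2; 3 |] in
  let b = intern [| 1; 2; 3 |] in
  Printf.printf "shared: %b\n" (a == b)
```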

Testing

Each phase ships with:

Unit tests in spec/tests/test-jit-cache.sx (new file)
Conformance must remain 100% per-suite
Wall-clock benchmark: full HS suite single-process before/after

Phase 1 acceptance criterion: the HS conformance suite completes in a single process in under 10 minutes of wall time.

Phase 2 acceptance: same as 1 but with budget=500. Cache size stays bounded throughout the run; hit rate >90% on hot paths.

Phase 3 acceptance: calling jit-reset! between batches reduces test-harness wall time by >50% vs no reset. The reason is that test-specific lambdas no longer accumulate, while hot stdlib paths re-JIT on their next calls after each reset.

Why this order

Tiered compilation is the highest-leverage change — it solves the test-harness problem at the source (most lambdas never enter the cache) without touching cache machinery. LRU is the safety net (unbounded growth still possible if every lambda is hot, e.g., huge dynamic component graph). Reset is the escape hatch for situations neither mechanism can handle (logout, hard memory pressure, app restart without process restart).

Doing them in reverse would invert the value — reset alone fixes nothing without app-level integration, and LRU without tiered compilation churns the cache constantly on cold lambdas.
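The churn argument can be checked with a toy simulation. The workload, the budget and all names are hypothetical; only the two policies come from the plan. The budget is 8 entries. Each round calls 4 long-lived lambdas, then 10 fresh one-shot lambdas. As in maybe_jit, call counts only increase on misses, so an evicted lambda that already passed the threshold recompiles on its next call.

```ocaml
(* Toy simulation: budgeted LRU, with and without a call-count threshold. *)
type sim = {
  threshold : int;
  budget : int;
  cache : (int, int) Hashtbl.t;   (* lambda id -> last_used generation *)
  counts : (int, int) Hashtbl.t;  (* lambda id -> calls seen while uncompiled *)
  mutable gen : int;
  mutable compiles : int;
  mutable evictions : int;
}

let create ~threshold ~budget =
  { threshold; budget; cache = Hashtbl.create 64; counts = Hashtbl.create 64;
    gen = 0; compiles = 0; evictions = 0 }

(* Remove the entry with the smallest last_used generation. *)
let evict_lru s =
  let victim =
    Hashtbl.fold
      (fun id lu acc ->
         match acc with
         | Some (_, best) when best <= lu -> acc
         | _ -> Some (id, lu))
      s.cache None
  in
  match victim with
  | Some (id, _) -> Hashtbl.remove s.cache id; s.evictions <- s.evictions + 1
  | None -> ()

let call s id =
  s.gen <- s.gen + 1;
  if Hashtbl.mem s.cache id then Hashtbl.replace s.cache id s.gen  (* hit *)
  else begin
    let n = 1 + Option.value ~default:0 (Hashtbl.find_opt s.counts id) in
    Hashtbl.replace s.counts id n;
    if n >= s.threshold then begin
      s.compiles <- s.compiles + 1;
      Hashtbl.replace s.cache id s.gen;
      if Hashtbl.length s.cache > s.budget then evict_lru s
    end
  end

(* Each round: 4 hot lambdas (ids 0-3), then 10 fresh cold ones. *)
let run ~threshold ~rounds =
  let s = create ~threshold ~budget:8 in
  let next_cold = ref 1000 in
  for _r = 1 to rounds do
    for h = 0 to 3 do call s h done;
    for _c = 1 to 10 do call s !next_cold; incr next_cold done
  done;
  s

let () =
  List.iter
    (fun threshold ->
       let s = run ~threshold ~rounds:50 in
       Printf.printf "threshold=%d: %d compiles, %d evictions\n"
         threshold s.compiles s.evictions)
    [ 1; 4 ]
```

Over 50 rounds, threshold 1 performs 700 compilations and 692 evictions, and the 4 hot lambdas are pushed out and recompiled every round. Threshold 4 compiles only the 4 hot lambdas and never evicts.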
