rose-ash/plans/HANDOFF-enable-serving-jit.md
giles e6a1180d50 docs: serving-JIT handoff (from sx-vm-extensions) + host-loop correction
Carry the sx-vm-extensions loop's serving-JIT handoff notes, and add a
correction: the post-page slowness was the durable read count (fixed in
0a2f1a61), not the (long-gone) Smalltalk render path — so SX_SERVING_JIT is an
optional general speedup, not the perf blocker.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 18:53:25 +00:00


Hand-off: enable serving-mode JIT for ~3-4× request CPU

From the sx-vm-extensions loop (2026-06-28). The serving-mode JIT is merged to architecture and is the host's real perf win — it just needs switching on. No further engine work is required from your side.

TL;DR

Run the host server on the merged architecture binary with SX_SERVING_JIT=1 in its environment. Expected: ~3-4× lower per-request CPU (measured ~9 ms → ~2.7 ms on the /feed pipeline). Already verified correct: full host conformance is 181/181 under SX_SERVING_JIT=1.

What changed (already merged to architecture)

The bytecode JIT now works in the persistent/epoch serving mode, opt-in via the SX_SERVING_JIT env var (default OFF). Default-off means zero change until you opt in — nothing regressed for any loop. Merge commit on architecture: 089ed88f (rebuild the shared binary from architecture to pick it up).

The JIT is safe for the host's request pipeline because:

  • The pipeline (dream router + feed/relations/blog handlers + JSON + render-to-html) is pure SX with no call/cc; the only continuation-style code is guard (Dream's dream-catch-with / wrap-errors), which the JIT auto-detects and runs interpreted (recursive PUSH_HANDLER scan). So error handling stays correct; everything else JITs.
  • Proven end-to-end: combined host+JIT binary, full conformance under SX_SERVING_JIT=1 = 181/181, all 10 suites green (handler 14, middleware 9, sxtp 39, router 6, feed 14, relations 22, blog 27, page 8, server 13, ledger 29).
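The auto-detection described in the first bullet can be pictured as a recursive scan over a function's bytecode. The sketch below is illustrative only: the `Code` structure, the opcode names other than PUSH_HANDLER, and the `choose_tier` helper are assumptions made for this example, not the engine's actual representation.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical bytecode model: a code object holds opcode names plus a
# constant pool that may contain nested code objects (inner lambdas).
# This is NOT the real SX bytecode layout.
@dataclass
class Code:
    ops: List[str]
    consts: List[Union["Code", int, str]] = field(default_factory=list)


def uses_handler(code: Code) -> bool:
    """True if this code object, or any nested one, contains PUSH_HANDLER."""
    if "PUSH_HANDLER" in code.ops:
        return True
    return any(isinstance(c, Code) and uses_handler(c) for c in code.consts)


def choose_tier(code: Code) -> str:
    """Guard-bearing code stays interpreted; everything else may be JIT'd."""
    return "interpret" if uses_handler(code) else "jit"


plain = Code(ops=["LOAD", "CALL", "RETURN"])
guarded_inner = Code(ops=["PUSH_HANDLER", "CALL", "POP_HANDLER", "RETURN"])
wrapper = Code(ops=["CLOSURE", "CALL", "RETURN"], consts=[guarded_inner])

print(choose_tier(plain))    # jit
print(choose_tier(wrapper))  # interpret
```

The recursion matters because a guard can sit inside a nested closure: `wrapper` has no PUSH_HANDLER of its own but is still kept on the interpreter.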

How to enable

  1. Rebuild the shared binary from architecture (it carries the merge): cd hosts/ocaml && dune build bin/sx_server.exe
  2. Launch the host server process with SX_SERVING_JIT=1 set in its environment (whatever wrapper/serve path you use — lib/host/serve.sx / the http-listen entry). Default-off means you must set it explicitly.
  3. One-time cost: the JIT compiles hot functions on first call, adding roughly 1 s across startup and the first requests. For a long-lived server this is amortized almost immediately.
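If the host is started from a script, step 2 amounts to adding one variable to the child's environment. A minimal sketch, assuming a Python launcher (the actual wrapper is whatever you already use; `run_with_serving_jit` and the probe command are hypothetical stand-ins, and the real server command line is not shown here):

```python
import os
import subprocess
import sys
from typing import List


def run_with_serving_jit(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run cmd with SX_SERVING_JIT=1 added to a copy of the current environment."""
    env = dict(os.environ)          # copy, so the parent process is unaffected
    env["SX_SERVING_JIT"] = "1"     # default is OFF, so it must be set explicitly
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in child process that only reports the flag it sees.
probe = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('SX_SERVING_JIT', 'unset'))",
]
result = run_with_serving_jit(probe)
print(result.stdout.strip())  # 1
```

For a long-lived server you would more likely use `subprocess.Popen` (or simply export the variable in the service definition) rather than `subprocess.run`, which waits for the child to exit.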

Measurements (this is the evidence)

In-process, full request pipeline (host/native-handler (host/make-app …)/feed, 2000 requests, in-memory persist backend):

| Mode | per-request CPU | total, 2000 reqs |
|---|---|---|
| CEK (default, no JIT) | ~9 ms | ~15-20 s |
| JIT (SX_SERVING_JIT=1) | ~2.7 ms | ~5-6 s |

JIT is also markedly less variable run-to-run. The cost is the pipeline (routing + feed normalize/stream + handler + JSON), not rendering — render-to-html alone is only ~50 µs/render and is already fast.
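As a sanity check on the figures, the per-request numbers and the request count determine both the speedup and the batch totals. The snippet uses only the measured values quoted above; the variable names are illustrative.

```python
REQUESTS = 2000
CEK_MS = 9.0   # per-request CPU without JIT (measured, approximate)
JIT_MS = 2.7   # per-request CPU with SX_SERVING_JIT=1 (measured, approximate)

speedup = CEK_MS / JIT_MS
cek_total_s = REQUESTS * CEK_MS / 1000
jit_total_s = REQUESTS * JIT_MS / 1000

print(f"speedup:   {speedup:.2f}x")        # speedup:   3.33x
print(f"CEK total: {cek_total_s:.1f} s")   # CEK total: 18.0 s
print(f"JIT total: {jit_total_s:.1f} s")   # JIT total: 5.4 s
```

So 2000 requests at ~9 ms come to about 18 s, and at ~2.7 ms to about 5.4 s, a ratio of roughly 3.3.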

What was ruled out (don't chase these)

The original kickoff framed the slowness as "interpreted Smalltalk (content/html) in ~2 s". The host does not load lib/smalltalk or lib/content — that was a different subsystem. We measured and confirmed:

  • The host's render path is render-to-html (SX markup → HTML), already fast.
  • The proposed big engine projects — VM continuation-escape and a compile-to-closures Smalltalk interpreter — would not help the host (wrong subsystem) and are not needed. (Scoping kept in the vm-extensions loop under plans/vm-continuation-escape.md / plans/smalltalk-dispatch-perf.md if a Smalltalk-backed workload ever needs them.)

Caveat — this is CPU only

The ~3-4× is the in-process CPU path (which JIT controls). It does not touch network/IO latency. If your production TTFB is dominated by a non-in-memory persist backend, cross-service fetches, TLS/connection setup, or the known homepage SSR-stepper issue, profile those separately; JIT won't move them. To find your real split, break a live TTFB into: request parse → route → handler (+ persist read) → render → serialize → network. The in-memory measurement above says the code path is ~2.7 ms under JIT; anything beyond that in production is infrastructure, not the SX engine.
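One way to get that split is to wrap each stage in a timer and compare shares. The sketch below is a generic pattern, not host code: the `stage` helper and `handle_request` are hypothetical, and the sleep stands in for a slow persist read. The network leg cannot be timed in-process and has to come from client-side or proxy measurements.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

timings: Dict[str, float] = {}


@contextmanager
def stage(name: str) -> Iterator[None]:
    """Accumulate wall-clock time spent inside the with-block under name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


def handle_request() -> None:
    # Placeholder bodies; in a real server each block wraps the actual work.
    with stage("parse"):
        pass
    with stage("route"):
        pass
    with stage("handler+persist"):
        time.sleep(0.02)  # stand-in for a slow durable read
    with stage("render"):
        pass
    with stage("serialize"):
        pass


handle_request()
total = sum(timings.values())
for name, secs in timings.items():
    print(f"{name:16s} {secs * 1000:8.3f} ms  {secs / total:6.1%}")
```

If one stage dominates the way `handler+persist` does here, that stage is where to look first, whatever the JIT setting.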

One known residual (not host-affecting, for awareness)

The serving hook re-runs a JIT'd function on the CEK if it fails mid-execution (correct result, but could duplicate side effects for an impure function that fails mid-run). The host conformance is clean (181/181), so nothing triggers it on your paths today. The clean general fix (propagate-don't-rerun) is deferred in the vm-extensions loop.
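To make the residual concrete, here is a toy model of rerun-on-failure. Everything in it is hypothetical (`JitBailout`, `impure_step`, `rerun_on_failure`, the audit log); it only illustrates why restarting an impure function from the top can repeat a side effect while still returning the right value.

```python
from typing import Callable, List

audit_log: List[str] = []


class JitBailout(Exception):
    """Stand-in for a compiled function failing partway through."""


def impure_step(fail_midway: bool) -> str:
    audit_log.append("wrote record")  # the side effect happens first
    if fail_midway:
        raise JitBailout("compiled path gave up after the write")
    return "ok"


def rerun_on_failure(fast: Callable[[], str], slow: Callable[[], str]) -> str:
    """Try the fast path; on bailout, restart the whole call on the slow path."""
    try:
        return fast()
    except JitBailout:
        return slow()  # restarts from the top, so earlier effects repeat


result = rerun_on_failure(lambda: impure_step(True), lambda: impure_step(False))
print(result)     # ok
print(audit_log)  # ['wrote record', 'wrote record']
```

The result is correct, but the write happened twice. A pure function has no such problem, which is consistent with the pure-SX request pipeline not being affected.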

Correction (host loop, 2026-06-28)

The premise above ("~2 s interpreted-Smalltalk render") is STALE: the blog moved off content-on-sx Smalltalk to render-to-html long ago (render-page ~2 ms). The actual post-page unresponsiveness was NOT CPU/render; it was the DURABLE READ COUNT. host/blog--relation-blocks issued ~7 kv-keys performs per page (each host/blog-out/in re-scanned the KV). Collapsing these into one shared kv-keys read fixed it (~1 s → ~0.02 s; commit 0a2f1a61). So serving-JIT was NOT the fix here.
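The shape of that fix, reading the key list once and sharing it across all relation lookups, can be sketched as follows. This is not the code from 0a2f1a61: the `CountingKV` class, the relation names, and the key format are all assumptions chosen so the scan count is visible.

```python
from typing import Dict, List


class CountingKV:
    """Toy KV store that counts full key scans (stand-in for a kv-keys perform)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data
        self.scans = 0

    def keys(self) -> List[str]:
        self.scans += 1
        return list(self._data)


# Hypothetical relation names; the source only says there were about 7 scans.
RELATIONS = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]


def blocks_rescanning(kv: CountingKV, post: str) -> Dict[str, List[str]]:
    """Before: every relation lookup triggers its own key scan."""
    return {
        rel: [k for k in kv.keys() if k.startswith(f"{rel}:{post}:")]
        for rel in RELATIONS
    }


def blocks_shared(kv: CountingKV, post: str) -> Dict[str, List[str]]:
    """After: scan once, then filter the shared list per relation."""
    keys = kv.keys()
    return {
        rel: [k for k in keys if k.startswith(f"{rel}:{post}:")]
        for rel in RELATIONS
    }


data = {"r1:p1:a": "x", "r2:p1:b": "y", "r1:p2:c": "z"}
naive_kv, shared_kv = CountingKV(data), CountingKV(data)
naive = blocks_rescanning(naive_kv, "p1")
shared = blocks_shared(shared_kv, "p1")
print(naive_kv.scans, shared_kv.scans, naive == shared)  # 7 1 True
```

Both versions return the same blocks; only the number of durable reads changes, which is why this showed up as latency rather than CPU.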

Serving-JIT may still be a worthwhile general speedup (see the ~3-4× CPU claim above; the Datalog instances-of on /tags is also CPU-bound), but it requires running the host on the merged architecture binary: this worktree's binary has no SX_SERVING_JIT gate. Treat it as an optional future win, not the perf blocker.