rose-ash/plans/HANDOFF-enable-serving-jit.md
giles e6a1180d50 docs: serving-JIT handoff (from sx-vm-extensions) + host-loop correction
Carry the sx-vm-extensions loop's serving-JIT handoff notes, and add a
correction: the post-page slowness was the durable read count (fixed in
0a2f1a61), not the (long-gone) Smalltalk render path — so SX_SERVING_JIT is an
optional general speedup, not the perf blocker.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 18:53:25 +00:00

# Hand-off: enable serving-mode JIT for ~3-4× lower request CPU
> From the **sx-vm-extensions** loop (2026-06-28). The serving-mode JIT is merged
> to `architecture` and is the host's real perf win — it just needs switching on.
> No further engine work is required from your side.
## TL;DR
Run the host server on the merged `architecture` binary with **`SX_SERVING_JIT=1`**
in its environment. Expected: **~3-4× lower per-request CPU** (measured ~9 ms →
~2.7 ms on the `/feed` pipeline). Already verified correct: full host conformance
is **181/181 under `SX_SERVING_JIT=1`**.
## What changed (already merged to architecture)
The bytecode JIT now works in the persistent/epoch serving mode, **opt-in via the
`SX_SERVING_JIT` env var (default OFF)**. Default-off means zero change until you
opt in — nothing regressed for any loop. Merge commit on `architecture`:
`089ed88f` (rebuild the shared binary from architecture to pick it up).
The JIT is safe for the host's request pipeline because:
- The pipeline (dream router + feed/relations/blog handlers + JSON + render-to-html)
is pure SX with **no `call/cc`**; the only continuation-style code is `guard`
(Dream's `dream-catch-with` / `wrap-errors`), which the JIT **auto-detects and
runs interpreted** (recursive `PUSH_HANDLER` scan). So error handling stays
correct; everything else JITs.
- Proven end-to-end: combined host+JIT binary, full conformance under
`SX_SERVING_JIT=1` = **181/181, all 10 suites green** (handler 14, middleware 9,
sxtp 39, router 6, feed 14, relations 22, blog 27, page 8, server 13, ledger 29).
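
The "run `guard` code interpreted, JIT everything else" rule can be pictured with a toy model. This is a sketch, not the engine's implementation: the `Code` record, its `ops`/`children` fields, the `choose_tier` helper and the opcode names other than `PUSH_HANDLER` are hypothetical. Only the `PUSH_HANDLER` name and the idea of a recursive scan come from the notes above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Code:
    """Toy compiled function: an opcode list plus nested closures (hypothetical shape)."""
    ops: List[str]
    children: List["Code"] = field(default_factory=list)


def uses_handler(code: Code) -> bool:
    """Return True if this function or any nested closure installs a handler."""
    if "PUSH_HANDLER" in code.ops:
        return True
    return any(uses_handler(child) for child in code.children)


def choose_tier(code: Code) -> str:
    """Keep handler-installing code on the interpreter; JIT everything else."""
    return "interpreted" if uses_handler(code) else "jit"


# Opcode names other than PUSH_HANDLER are made up for the example.
plain = Code(ops=["LOAD", "CALL", "RETURN"])
guarded = Code(ops=["LOAD", "RETURN"], children=[Code(ops=["PUSH_HANDLER", "CALL"])])

print(choose_tier(plain))    # jit
print(choose_tier(guarded))  # interpreted
```

The scan is recursive so that a handler installed inside a nested closure still keeps the enclosing function off the JIT.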
## How to enable
1. Rebuild the shared binary from `architecture` (it carries the merge):
`cd hosts/ocaml && dune build bin/sx_server.exe`
2. Launch the host server process with `SX_SERVING_JIT=1` set in its environment
(whatever wrapper/serve path you use — `lib/host/serve.sx` / the http-listen
entry). Default-off means you must set it explicitly.
3. One-time cost: JIT compiles hot functions on first call (~+1 s at startup /
first requests). Amortized immediately for a long-lived server.
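
If the host is started from a script, step 2 amounts to adding one variable to the child environment. A minimal sketch, assuming a Python launcher: the helper names `serving_jit_env` and `run_with_serving_jit` are hypothetical, and the probe command is a stand-in for your real server command line, which these notes do not specify.

```python
import os
import subprocess
import sys


def serving_jit_env(base=None):
    """Copy an environment mapping and switch the serving-mode JIT on."""
    env = dict(os.environ if base is None else base)
    env["SX_SERVING_JIT"] = "1"  # default is OFF, so it must be set explicitly
    return env


def run_with_serving_jit(cmd):
    """Run cmd (a list of strings) with SX_SERVING_JIT=1 and capture its output."""
    return subprocess.run(cmd, env=serving_jit_env(), check=True,
                          capture_output=True, text=True)


# Stand-in child process; replace with your real server command line.
probe = [sys.executable, "-c", "import os; print(os.environ['SX_SERVING_JIT'])"]
print(run_with_serving_jit(probe).stdout.strip())  # 1
```

For a long-lived server you would more likely pass `env=serving_jit_env()` to `subprocess.Popen` (or export the variable in the service definition) than capture output with `subprocess.run`.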
## Measurements (this is the evidence)
In-process, full request pipeline (`host/native-handler (host/make-app …)`
`/feed`, 2000 requests, in-memory persist backend):

| Mode | Per-request CPU | Total (2000 requests) |
|---|---|---|
| CEK (default, no JIT) | ~9 ms | ~15-20 s |
| **JIT (`SX_SERVING_JIT=1`)** | **~2.7 ms** | **~5-6 s** |

JIT is also markedly *less* variable run-to-run. The cost is the pipeline
(routing + feed normalize/stream + handler + JSON), not rendering —
`render-to-html` alone is only ~50 µs/render and is already fast.
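
As a quick consistency check, multiplying out the measured per-request figures gives the implied speedup and the totals for the 2000-request run. The constants are the measurements quoted in this section; the variable names are only for illustration.

```python
CEK_MS_PER_REQ = 9.0   # measured, default CEK path
JIT_MS_PER_REQ = 2.7   # measured, SX_SERVING_JIT=1
REQUESTS = 2000

speedup = CEK_MS_PER_REQ / JIT_MS_PER_REQ
cek_total_s = CEK_MS_PER_REQ * REQUESTS / 1000
jit_total_s = JIT_MS_PER_REQ * REQUESTS / 1000

print(f"speedup:   ~{speedup:.1f}x")       # ~3.3x
print(f"CEK total: ~{cek_total_s:.1f} s")  # ~18.0 s
print(f"JIT total: ~{jit_total_s:.1f} s")  # ~5.4 s
```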
## What was ruled out (don't chase these)
The original kickoff framed the slowness as "interpreted Smalltalk (`content/html`)
in ~2 s". **The host does not load `lib/smalltalk` or `lib/content`** — that was a
different subsystem. We measured and confirmed:
- The host's render path is `render-to-html` (SX markup → HTML), already fast.
- The proposed big engine projects — **VM continuation-escape** and a
**compile-to-closures Smalltalk interpreter** — would *not* help the host
(wrong subsystem) and are **not needed**. (Scoping kept in the vm-extensions
loop under `plans/vm-continuation-escape.md` / `plans/smalltalk-dispatch-perf.md`
if a Smalltalk-backed workload ever needs them.)
## Caveat — this is CPU only
The ~3-4× figure covers only the in-process CPU path (the part the JIT controls). It does **not** touch
network/IO latency. If your production TTFB is dominated by a non-in-memory
`persist` backend, cross-service fetches, TLS/connection setup, or the known
homepage SSR-stepper issue, profile those separately — JIT won't move them. To
find your real split, break a live TTFB into: request parse → route → handler
(+ persist read) → render → serialize → network. The in-memory measurement above
says the *code path* is ~2.7 ms under JIT; anything beyond that in production is
infrastructure, not the SX engine.
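
The stage-by-stage breakdown suggested above can be collected with a small accumulator of wall-clock timings. This is an illustration of the method in Python, not host code: the `StageTimer` class is hypothetical, the `time.sleep` calls are stand-ins for the real stages, and the network leg has to be measured from the client side.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage of a request."""

    def __init__(self):
        self.stages = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.stages[name] = self.stages.get(name, 0.0) + elapsed

    def shares(self):
        """Return each stage's fraction of the total measured time."""
        total = sum(self.stages.values())
        if not total:
            return {}
        return {name: secs / total for name, secs in self.stages.items()}


timer = StageTimer()
for name, fake_work_s in [("parse", 0.001), ("route", 0.001),
                          ("handler+persist", 0.005), ("render", 0.001),
                          ("serialize", 0.001)]:
    with timer.stage(name):
        time.sleep(fake_work_s)  # stand-in for the real stage

for name, share in timer.shares().items():
    print(f"{name:16s} {timer.stages[name] * 1000:7.2f} ms  {share:5.1%}")
```

If one stage such as `handler+persist` dominates in production, that points at the persist backend or a cross-service fetch rather than at the SX engine.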
## One known residual (not host-affecting, for awareness)
The serving hook re-runs a JIT'd function on the CEK if it fails mid-execution
(correct result, but could duplicate side effects for an impure function that
fails mid-run). The host conformance is clean (181/181), so nothing triggers it
on your paths today. The clean general fix (propagate-don't-rerun) is deferred in
the vm-extensions loop.
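
A toy model shows why re-running after a mid-execution failure is only a concern for impure functions. This is not the serving hook's code: both function names are hypothetical, and the `fail_midway` flag merely simulates the compiled path failing after a side effect has already happened.

```python
side_effects = []


def append_then_double(x, fail_midway):
    """Impure function: records a side effect, then may fail before returning."""
    side_effects.append(x)
    if fail_midway:
        raise RuntimeError("compiled path failed mid-execution")
    return x * 2


def call_with_rerun_fallback(x):
    """Try the fast path; on failure, re-run from the start on the fallback path."""
    try:
        return append_then_double(x, fail_midway=True)   # stand-in for the JIT'd run
    except RuntimeError:
        return append_then_double(x, fail_midway=False)  # stand-in for the CEK re-run


result = call_with_rerun_fallback(21)
print(result)        # 42: the returned value is correct
print(side_effects)  # [21, 21]: the side effect ran twice
```

A pure function would show no difference, which is consistent with the clean conformance run.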
## Correction (host loop, 2026-06-28)
The premise above ("~2s interpreted-Smalltalk render") is STALE: the blog moved
off content-on-sx Smalltalk to `render-to-html` long ago (render-page ~2ms). The
actual post-page unresponsiveness was NOT CPU/render. It was the DURABLE READ
COUNT: `host/blog--relation-blocks` did ~7 `kv-keys` performs per page (each
host/blog-out/in re-scanned the KV). Collapsing to one shared `kv-keys` read fixed
it (~1 s → ~0.02 s; commit `0a2f1a61`). So serving-JIT was NOT the fix here.
Serving-JIT may still be a worthwhile general speedup (the ~3-4× CPU claim, and
the Datalog `instances-of` on `/tags` is CPU-bound), but it requires running the
host on the merged `architecture` binary; this worktree's binary has no
`SX_SERVING_JIT` gate. Treat it as an optional future win, not the perf blocker.
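
The read-count fix described in this correction follows a general pattern: scan the keyspace once and share the result across lookups. The sketch below illustrates that pattern under assumptions and is not the code from commit `0a2f1a61`: `CountingKV`, the two `blocks_*` functions and the key prefixes are all hypothetical.

```python
class CountingKV:
    """In-memory stand-in for a durable KV store that counts full key scans."""

    def __init__(self, data):
        self._data = dict(data)
        self.scans = 0

    def keys(self):
        self.scans += 1  # each call models one durable read of the keyspace
        return list(self._data)


def blocks_rescanning(kv, prefixes):
    """Before: every lookup performs its own key scan."""
    return {p: [k for k in kv.keys() if k.startswith(p)] for p in prefixes}


def blocks_shared_scan(kv, prefixes):
    """After: scan once, then filter the same key list for every lookup."""
    keys = kv.keys()
    return {p: [k for k in keys if k.startswith(p)] for p in prefixes}


data = {"out:post1:a": 1, "in:post1:b": 2, "tag:post1:c": 3}
prefixes = [f"rel{i}:" for i in range(4)] + ["out:", "in:", "tag:"]  # 7 lookups

before, after = CountingKV(data), CountingKV(data)
assert blocks_rescanning(before, prefixes) == blocks_shared_scan(after, prefixes)
print(before.scans, after.scans)  # 7 1
```

Both versions return the same blocks; only the number of durable reads changes, which is why a JIT would not have helped with this particular slowdown.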