Carry the sx-vm-extensions loop's serving-JIT handoff notes, and add a
correction: the post-page slowness was the durable read count (fixed in
0a2f1a61), not the (long-gone) Smalltalk render path — so SX_SERVING_JIT is an
optional general speedup, not the perf blocker.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
99 lines
5.1 KiB
Markdown
# Hand-off: enable serving-mode JIT for ~3–4× request CPU

> From the **sx-vm-extensions** loop (2026-06-28). The serving-mode JIT is merged
> to `architecture` and is the host's real perf win; it just needs switching on.
> No further engine work is required from your side.

## TL;DR

Run the host server on the merged `architecture` binary with **`SX_SERVING_JIT=1`**
in its environment. Expected: **~3–4× lower per-request CPU** (measured ~9 ms →
~2.7 ms on the `/feed` pipeline). Already verified correct: full host conformance
is **181/181 under `SX_SERVING_JIT=1`**.

## What changed (already merged to `architecture`)

The bytecode JIT now works in the persistent/epoch serving mode, **opt-in via the
`SX_SERVING_JIT` env var (default OFF)**. Default-off means zero change until you
opt in, so nothing regressed for any loop. Merge commit on `architecture`:
`089ed88f` (rebuild the shared binary from `architecture` to pick it up).

The JIT is safe for the host's request pipeline because:

- The pipeline (dream router + feed/relations/blog handlers + JSON + render-to-html)
  is pure SX with **no `call/cc`**; the only continuation-style code is `guard`
  (Dream's `dream-catch-with` / `wrap-errors`), which the JIT **auto-detects and
  runs interpreted** (recursive `PUSH_HANDLER` scan). So error handling stays
  correct; everything else JITs.
- Proven end-to-end: combined host+JIT binary, full conformance under
  `SX_SERVING_JIT=1` = **181/181, all 10 suites green** (handler 14, middleware 9,
  sxtp 39, router 6, feed 14, relations 22, blog 27, page 8, server 13, ledger 29).

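To make the "recursive `PUSH_HANDLER` scan" idea concrete, here is a minimal sketch in Python. It is not the engine's code: the `Code` record, the constant-pool layout, and every opcode name other than `PUSH_HANDLER` are assumptions made for illustration. It only shows the shape of the decision: a function is kept on the interpreter if it, or any code object nested inside it, installs a handler.

```python
from dataclasses import dataclass, field


@dataclass
class Code:
    """Hypothetical compiled unit: opcode names plus a constant pool.

    Nested lambdas are assumed to live in the constant pool as Code values.
    """
    ops: list
    consts: list = field(default_factory=list)


def contains_push_handler(code: Code) -> bool:
    """Return True if this unit or any nested unit uses PUSH_HANDLER."""
    if "PUSH_HANDLER" in code.ops:
        return True
    return any(
        isinstance(const, Code) and contains_push_handler(const)
        for const in code.consts
    )


def choose_engine(code: Code) -> str:
    """Keep guard-style code interpreted; everything else may be JIT-compiled."""
    return "interpreter" if contains_push_handler(code) else "jit"


# Illustrative inputs (opcode names other than PUSH_HANDLER are made up).
plain = Code(ops=["LOAD", "CALL", "RETURN"])
guarded_inner = Code(ops=["PUSH_HANDLER", "CALL", "POP_HANDLER", "RETURN"])
wraps_guard = Code(ops=["CLOSURE", "CALL", "RETURN"], consts=[1, guarded_inner])
```

With these inputs, `choose_engine(plain)` selects the JIT, while `choose_engine(wraps_guard)` falls back to the interpreter because the handler sits in a nested unit.
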
## How to enable

1. Rebuild the shared binary from `architecture` (it carries the merge):
   `cd hosts/ocaml && dune build bin/sx_server.exe`
2. Launch the host server process with `SX_SERVING_JIT=1` set in its environment
   (whatever wrapper/serve path you use: `lib/host/serve.sx` / the http-listen
   entry). Default-off means you must set it explicitly.
3. One-time cost: JIT compiles hot functions on first call (~+1 s at startup /
   first requests). Amortized immediately for a long-lived server.

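If the server is started from a supervisor script, step 2 amounts to adding one variable to the child's environment. The sketch below shows that in Python. The helper names are hypothetical, and the command passed to `launch` is whatever your deployment already uses; nothing here assumes particular server arguments.

```python
import os
import subprocess


def serving_env(base=None):
    """Return a copy of the environment with the serving-mode JIT switched on."""
    env = dict(os.environ if base is None else base)
    env["SX_SERVING_JIT"] = "1"  # default is OFF, so it must be set explicitly
    return env


def launch(cmd):
    """Start the host server (cmd is your existing launch command) with the JIT enabled."""
    return subprocess.Popen(cmd, env=serving_env())
```

Copying the environment rather than mutating `os.environ` keeps the opt-in scoped to the server process, so other tools started from the same script stay on the default (off) path.
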
## Measurements (this is the evidence)

In-process, full request pipeline (`host/native-handler (host/make-app …)` →
`/feed`, 2000 requests, in-memory persist backend):

| Mode | Per-request CPU | Total for 2000 requests |
|---|---|---|
| CEK (default, no JIT) | ~9 ms | ~15–20 s |
| **JIT (`SX_SERVING_JIT=1`)** | **~2.7 ms** | **~5–6 s** |

JIT is also markedly *less* variable run-to-run. The cost is the pipeline
(routing + feed normalize/stream + handler + JSON), not rendering:
`render-to-html` alone is only ~50 µs/render and is already fast.

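As a quick consistency check on the figures above, the per-request numbers and the 2000-request totals can be related with a few lines of arithmetic. The inputs are the approximate measurements quoted in this section, so the results are rough too.

```python
REQUESTS = 2000
CEK_MS = 9.0   # approximate per-request CPU without JIT
JIT_MS = 2.7   # approximate per-request CPU with SX_SERVING_JIT=1

speedup = CEK_MS / JIT_MS
cek_total_s = REQUESTS * CEK_MS / 1000
jit_total_s = REQUESTS * JIT_MS / 1000

print(f"speedup   ~ {speedup:.2f}x")      # speedup   ~ 3.33x
print(f"CEK total ~ {cek_total_s:.1f} s")  # CEK total ~ 18.0 s
print(f"JIT total ~ {jit_total_s:.1f} s")  # JIT total ~ 5.4 s
```

Both totals land inside the measured ranges (~15–20 s and ~5–6 s), and the ratio sits inside the quoted ~3–4× band.
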
## What was ruled out (don't chase these)

The original kickoff framed the slowness as "interpreted Smalltalk (`content/html`)
in ~2 s". **The host does not load `lib/smalltalk` or `lib/content`**; that was a
different subsystem. We measured and confirmed:

- The host's render path is `render-to-html` (SX markup → HTML), already fast.
- The proposed big engine projects (**VM continuation-escape** and a
  **compile-to-closures Smalltalk interpreter**) would *not* help the host
  (wrong subsystem) and are **not needed**. (Scoping kept in the vm-extensions
  loop under `plans/vm-continuation-escape.md` / `plans/smalltalk-dispatch-perf.md`
  if a Smalltalk-backed workload ever needs them.)

## Caveat: this is CPU only

The ~3–4× is the in-process CPU path (which JIT controls). It does **not** touch
network/IO latency. If your production TTFB is dominated by a non-in-memory
`persist` backend, cross-service fetches, TLS/connection setup, or the known
homepage SSR-stepper issue, profile those separately; JIT won't move them. To
find your real split, break a live TTFB into: request parse → route → handler
(+ persist read) → render → serialize → network. The in-memory measurement above
says the *code path* is ~2.7 ms under JIT; anything beyond that in production is
infrastructure, not the SX engine.

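One way to get that split is to wrap each phase in a timer and compare shares. The sketch below is a generic Python illustration, not host code: `PhaseTimer` is a hypothetical helper, and the phase names are simply the ones listed above. The clock is injectable so the helper can be exercised without real waits.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase of a request."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.phases = {}

    @contextmanager
    def phase(self, name):
        start = self.clock()
        try:
            yield
        finally:
            elapsed = self.clock() - start
            self.phases[name] = self.phases.get(name, 0.0) + elapsed

    def shares(self):
        """Fraction of the total measured time spent in each phase."""
        total = sum(self.phases.values())
        return {n: (t / total if total else 0.0) for n, t in self.phases.items()}
```

Usage would look like `with timer.phase("handler"): ...` around each stage, followed by `timer.shares()`. If "persist read" or "network" dominates the shares, the JIT is not the lever to pull.
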
## One known residual (not host-affecting, for awareness)

The serving hook re-runs a JIT'd function on the CEK if it fails mid-execution
(correct result, but could duplicate side effects for an impure function that
fails mid-run). The host conformance is clean (181/181), so nothing triggers it
on your paths today. The clean general fix (propagate-don't-rerun) is deferred in
the vm-extensions loop.

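The hazard is easier to see in a toy model. The Python sketch below is a simplified stand-in for a "re-run on failure" fallback, not the actual serving hook; `JitFailure`, `run_with_rerun_fallback`, and the two `record_*` functions are all hypothetical names.

```python
class JitFailure(Exception):
    """Stand-in for a compiled function bailing out mid-execution."""


def run_with_rerun_fallback(jit_fn, interp_fn, *args):
    """Try the compiled path; on failure, re-run from the start interpreted."""
    try:
        return jit_fn(*args)
    except JitFailure:
        return interp_fn(*args)


def demo():
    log = []  # represents an external side effect, e.g. a write

    def record_interpreted(x):
        log.append(x)
        return x.upper()

    def record_jit(x):
        log.append(x)                      # side effect already happened...
        raise JitFailure("bail mid-run")   # ...before the compiled path fails

    result = run_with_rerun_fallback(record_jit, record_interpreted, "a")
    return result, log
```

Calling `demo()` returns the correct result (`"A"`), but the log holds the entry twice. A pure function would show no difference, which is consistent with the clean conformance run.
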
## Correction (host loop, 2026-06-28)

The premise above ("~2 s interpreted-Smalltalk render") is STALE: the blog moved
off content-on-sx Smalltalk to `render-to-html` long ago (render-page ~2 ms). The
actual post-page unresponsiveness was NOT CPU/render; it was the DURABLE READ
COUNT: `host/blog--relation-blocks` did ~7 `kv-keys` performs per page (each
`host/blog-out/in` re-scanned the KV). Collapsing to one shared `kv-keys` read fixed
it (~1 s → ~0.02 s; commit `0a2f1a61`). So serving-JIT was NOT the fix here.

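The shape of that fix, scanning the key space once and sharing the result, can be sketched as follows. This is an illustrative Python model, not the code in `0a2f1a61`: `CountingKV`, both `relation_blocks_*` functions, and the prefix-based key layout are assumptions chosen only to make the read count visible.

```python
class CountingKV:
    """Toy key-value store that counts full key scans (the expensive durable read)."""

    def __init__(self, data):
        self._data = dict(data)
        self.key_scans = 0

    def keys(self):
        self.key_scans += 1
        return list(self._data)


def relation_blocks_rescan(kv, prefixes):
    """Before: every relation lookup re-scans the whole key space."""
    return {p: [k for k in kv.keys() if k.startswith(p)] for p in prefixes}


def relation_blocks_shared(kv, prefixes):
    """After: one scan, shared by all relation lookups on the page."""
    keys = kv.keys()
    return {p: [k for k in keys if k.startswith(p)] for p in prefixes}
```

For a page that needs seven relation lookups, the first version performs seven scans and the second performs one, while both return the same blocks. The saving is in durable reads, which is why the JIT had no bearing on it.
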
Serving-JIT may still be a worthwhile general speedup (the ~3–4× CPU claim, and
the Datalog `instances-of` on `/tags` is CPU-bound), but it requires running the
host on the merged `architecture` binary; this worktree's binary has no
`SX_SERVING_JIT` gate. Treat it as an optional future win, not the perf blocker.