From e6a1180d50b49d8cbf17296e8284e2aa56ef53b0 Mon Sep 17 00:00:00 2001
From: giles
Date: Sun, 28 Jun 2026 18:53:25 +0000
Subject: [PATCH] docs: serving-JIT handoff (from sx-vm-extensions) + host-loop correction
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Carry the sx-vm-extensions loop's serving-JIT handoff notes, and add a
correction: the post-page slowness was the durable read count (fixed in
0a2f1a61), not the (long-gone) Smalltalk render path — so SX_SERVING_JIT
is an optional general speedup, not the perf blocker.

Co-Authored-By: Claude Opus 4.8
---
 plans/HANDOFF-enable-serving-jit.md | 98 +++++++++++++++++++++++++++++
 plans/host-on-sx.md                 | 28 +++++++++
 2 files changed, 126 insertions(+)
 create mode 100644 plans/HANDOFF-enable-serving-jit.md

diff --git a/plans/HANDOFF-enable-serving-jit.md b/plans/HANDOFF-enable-serving-jit.md
new file mode 100644
index 00000000..a7848e1d
--- /dev/null
+++ b/plans/HANDOFF-enable-serving-jit.md
@@ -0,0 +1,98 @@
+# Hand-off: enable serving-mode JIT for ~3–4× request CPU
+
+> From the **sx-vm-extensions** loop (2026-06-28). The serving-mode JIT is merged
+> to `architecture` and is the host's real perf win — it just needs switching on.
+> No further engine work is required from your side.
+
+## TL;DR
+
+Run the host server on the merged `architecture` binary with **`SX_SERVING_JIT=1`**
+in its environment. Expected: **~3–4× lower per-request CPU** (measured ~9 ms →
+~2.7 ms on the `/feed` pipeline). Already verified correct: full host conformance
+is **181/181 under `SX_SERVING_JIT=1`**.
+
+## What changed (already merged to architecture)
+
+The bytecode JIT now works in the persistent/epoch serving mode, **opt-in via the
+`SX_SERVING_JIT` env var (default OFF)**. Default-off means zero change until you
+opt in — nothing regressed for any loop. Merge commit on `architecture`:
+`089ed88f` (rebuild the shared binary from architecture to pick it up).
+
+The JIT is safe for the host's request pipeline because:
+- The pipeline (dream router + feed/relations/blog handlers + JSON + render-to-html)
+  is pure SX with **no `call/cc`**; the only continuation-style code is `guard`
+  (Dream's `dream-catch-with` / `wrap-errors`), which the JIT **auto-detects and
+  runs interpreted** (recursive `PUSH_HANDLER` scan). So error handling stays
+  correct; everything else JITs.
+- Proven end-to-end: combined host+JIT binary, full conformance under
+  `SX_SERVING_JIT=1` = **181/181, all 10 suites green** (handler 14, middleware 9,
+  sxtp 39, router 6, feed 14, relations 22, blog 27, page 8, server 13, ledger 29).
+
+## How to enable
+
+1. Rebuild the shared binary from `architecture` (it carries the merge):
+   `cd hosts/ocaml && dune build bin/sx_server.exe`
+2. Launch the host server process with `SX_SERVING_JIT=1` set in its environment
+   (whatever wrapper/serve path you use — `lib/host/serve.sx` / the http-listen
+   entry). Default-off means you must set it explicitly.
+3. One-time cost: JIT compiles hot functions on first call (~+1 s at startup /
+   first requests). Amortized immediately for a long-lived server.
+
+## Measurements (this is the evidence)
+
+In-process, full request pipeline (`host/native-handler (host/make-app …)` →
+`/feed`, 2000 requests, in-memory persist backend):
+
+| | per-request CPU | total 2000 reqs |
+|---|---|---|
+| CEK (default, no JIT) | ~9 ms | ~15–20 s |
+| **JIT (`SX_SERVING_JIT=1`)** | **~2.7 ms** | **~5–6 s** |
+
+JIT is also markedly *less* variable run-to-run. The cost is the pipeline
+(routing + feed normalize/stream + handler + JSON), not rendering —
+`render-to-html` alone is only ~50 µs/render and is already fast.
+
+## What was ruled out (don't chase these)
+
+The original kickoff framed the slowness as "interpreted Smalltalk (`content/html`)
+in ~2 s". **The host does not load `lib/smalltalk` or `lib/content`** — that was a
+different subsystem. We measured and confirmed:
+- The host's render path is `render-to-html` (SX markup → HTML), already fast.
+- The proposed big engine projects — **VM continuation-escape** and a
+  **compile-to-closures Smalltalk interpreter** — would *not* help the host
+  (wrong subsystem) and are **not needed**. (Scoping kept in the vm-extensions
+  loop under `plans/vm-continuation-escape.md` / `plans/smalltalk-dispatch-perf.md`
+  if a Smalltalk-backed workload ever needs them.)
+
+## Caveat — this is CPU only
+
+The ~3–4× is the in-process CPU path (which JIT controls). It does **not** touch
+network/IO latency. If your production TTFB is dominated by a non-in-memory
+`persist` backend, cross-service fetches, TLS/connection setup, or the known
+homepage SSR-stepper issue, profile those separately — JIT won't move them. To
+find your real split, break a live TTFB into: request parse → route → handler
+(+ persist read) → render → serialize → network. The in-memory measurement above
+says the *code path* is ~2.7 ms under JIT; anything beyond that in production is
+infrastructure, not the SX engine.
+
+## One known residual (not host-affecting, for awareness)
+
+The serving hook re-runs a JIT'd function on the CEK if it fails mid-execution
+(correct result, but could duplicate side effects for an impure function that
+fails mid-run). The host conformance is clean (181/181), so nothing triggers it
+on your paths today. The clean general fix (propagate-don't-rerun) is deferred in
+the vm-extensions loop.
+
+## Correction (host loop, 2026-06-28)
+
+The premise above ("~2s interpreted-Smalltalk render") is STALE: the blog moved
+off content-on-sx Smalltalk to `render-to-html` long ago (render-page ~2ms). The
+actual post-page unresponsiveness was NOT CPU/render — it was the DURABLE READ
+COUNT: host/blog--relation-blocks did ~7 `kv-keys` performs per page (each
+host/blog-out/in re-scanned the KV). Collapsing to one shared kv-keys read fixed
+it (~1s -> ~0.02s; commit 0a2f1a61). So serving-JIT was NOT the fix here.
+
+Serving-JIT may still be a worthwhile general speedup (the ~3-4× CPU claim, and
+the Datalog `instances-of` on /tags is CPU-bound), but it requires running the
+host on the merged `architecture` binary — this worktree's binary has no
+SX_SERVING_JIT gate. Treat it as an optional future win, not the perf blocker.
diff --git a/plans/host-on-sx.md b/plans/host-on-sx.md
index d21e8f58..6d698e2a 100644
--- a/plans/host-on-sx.md
+++ b/plans/host-on-sx.md
@@ -425,3 +425,31 @@ Swap `host/blog-open-create-routes` → `host/blog-write-routes ` to g
 here even `sx_write_file` fails. Read-side sx-tree tools work. New `.sx` files
 were created with the `Write` tool (the .sx hook is inactive in this worktree) and
 each validated afterwards with `sx_validate` to keep the parse guarantee.
+
+## Action item — serving-JIT speedup is NOT a code merge; it's a one-line flag flip
+
+The ~2s interpreted-Smalltalk render (`/welcome/`, blog post pages) is being fixed
+by the **`sx-vm-extensions`** loop — the JIT-bytecode-correctness handoff we kicked
+off on 2026-06-19. **Do not wait for a code merge into `lib/host/**`** — the fix
+lives entirely in the shared kernel (`hosts/ocaml/**`: `sx_server.ml`, `sx_vm.ml`,
+extension modules) + shared guest runtimes (`lib/smalltalk/eval.sx`,
+`lib/compiler.sx`, `lib/*/runtime.sx`). None of it is host code. The speedup is a
+property of the shared `sx_server.exe` binary every loop already runs.
+
+The serving-mode JIT is **gated behind `SX_SERVING_JIT`** (vm-ext commit
+`bf298684`), and host's `serve.sh` / `conformance.sh` currently do **not** set it.
+So host's entire adoption step is:
+
+1. Wait for `sx-vm-extensions` → `architecture` (kernel + guest-runtime merge) and
+   the rebuilt shared binary. Watch its scoreboard: serving-JIT must be green across
+   ALL guest suites (Smalltalk, Datalog, Scheme, Haskell, Erlang, Prolog, APL, js)
+   with `SX_SERVING_JIT=1` — already done as of vm-ext `fed58b28` (js 148/148).
+2. Gate locally: run `SX_SERVING_JIT=1 bash lib/host/conformance.sh` against the
+   rebuilt binary. Must stay green — this is the exact suite that first exposed the
+   miscompile (`router 3/6, feed 4/11, relations 9/16, blog 4/11` with the old JIT
+   on). If green, the residual exclusions in vm-ext covered host's workload.
+3. Flip it on live: add `export SX_SERVING_JIT=1` to `lib/host/serve.sh` (the one
+   in-scope `lib/host/**` change). Commit as a feature. Live render should drop from
+   ~2s to tens of ms — highest-leverage perf win on the platform.
+
+Until step 1's binary is in, this is a no-op — leave `serve.sh` as is.
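
Steps 2 and 3 of the action item (gate on conformance under the flag, then flip it in `serve.sh`) could be scripted roughly as follows. This is a minimal sketch, not part of the patch: `gate_and_flip` is a hypothetical helper. It assumes that `conformance.sh` exits non-zero when any suite fails and that `serve.sh` begins with a shebang line, so the export can safely go on line 2. Neither assumption is confirmed by the notes above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gate on conformance with the serving JIT on, then
# add the flag to the serve script. Paths are passed in, not hard-coded.
set -u

gate_and_flip() {
  local conformance="$1"   # e.g. lib/host/conformance.sh
  local serve="$2"         # e.g. lib/host/serve.sh
  local flag_line='export SX_SERVING_JIT=1'

  # Step 2: the suite must stay green with the flag set for this run only.
  # Assumption: the script signals failure through its exit status.
  if ! SX_SERVING_JIT=1 bash "$conformance"; then
    echo "conformance not green under SX_SERVING_JIT=1; leaving $serve as is" >&2
    return 1
  fi

  # Step 3: add the export once; do nothing if the exact line already exists.
  if grep -qxF "$flag_line" "$serve"; then
    echo "$serve already sets SX_SERVING_JIT=1"
    return 0
  fi

  # Insert after line 1 (assumed shebang) so the export precedes the launch.
  local tmp
  tmp="$(mktemp)" || return 1
  {
    head -n 1 "$serve"
    printf '%s\n' "$flag_line"
    tail -n +2 "$serve"
  } > "$tmp"
  cat "$tmp" > "$serve"    # overwrite in place, keeping the file's mode bits
  rm -f "$tmp"
  echo "added '$flag_line' to $serve"
}
```

From the repository root this would be called as `gate_and_flip lib/host/conformance.sh lib/host/serve.sh`, and only once the rebuilt `architecture` binary is in place. Against a binary without the `SX_SERVING_JIT` gate the conformance run proves nothing about the JIT, so the flip should wait.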