docs: hand off serving-JIT host miscompile to sx-vm-extensions
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
plans/HANDOFF-jit-miscompile.md
@@ -0,0 +1,64 @@
# Hand-off: serving-mode JIT miscompiles host handlers (to sx-vm-extensions)

> From the **host-on-sx** loop, 2026-06-28. We enabled `SX_SERVING_JIT=1` on the
> live host (blog.rose-ash.com). The Datalog/relations saturation JITs cleanly
> and is the real win (host conformance 271/271 under JIT, 5.4× faster; live
> `/tags` 2.5s → 0.76s). However, host app handlers miscompile in the serving
> path, so we had to add `(jit-exclude! "host/*" "dream-*" "dr/*")` to serve.sh
> as a band-aid. Please fix the underlying bug so the exclude can be dropped.

## Symptom

Under `SX_SERVING_JIT=1`, the first request to most pages returns 500, then
self-heals (the retry returns 200). stderr shows these two lines, paired:

```
[jit] host/blog--edges-block first-call fallback to CEK: Sx_types.Eval_error("map: expected (fn list) (in CALL_PRIM \"map\" with 2 args)")
[http-listen] handler error: Sx_types.Eval_error("map: expected (fn list) (in CALL_PRIM \"map\" with 2 args)")
```
Also seen: `Sx_types.Eval_error("rest: 1 list arg")`.

## Two distinct bugs

**(A) codegen / VM-state.** A JIT'd function's bytecode runs `CALL_PRIM "map"`
(and `rest`) with args the primitive rejects (`expected (fn list)`: 2 args are
pushed, but they are the wrong ones). KEY CLUE: **host conformance under
`SX_SERVING_JIT=1` is 271/271**. The SAME functions (`host/blog--edges-block`
etc.) JIT fine when driven via the epoch `(eval ...)` path; they ONLY miscompile
in the **http-listen + cek_run_with_io** serving path. So it is not pure
codegen: it is triggered by the serving/IO context. Strong hypothesis: a
`perform`/`VmSuspended` earlier in the request (the handler does durable kv
reads) resumes the VM with a misaligned stack, so the NEXT `CALL_PRIM` (often a
`map`) gets wrong args. On this reading, the `map`/`rest` are just the first
prim call after a resume. Worth a `vm-trace` of a handler that suspends then
maps.

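The hypothesis in (A) can be pictured with a toy stack machine. This is only an illustrative sketch: the `value` type, `call_prim_map`, `resume`, and the "stale extra slot" form of misalignment are all assumptions made for the example, not the real SX VM or its actual failure mode.

```ocaml
(* Toy model only: every name here is hypothetical, not the real SX VM. *)
type value =
  | Fn of string          (* a callable, identified by name *)
  | List of value list
  | Int of int

exception Eval_error of string

(* "map" accepts exactly (fn list); the head of the stack is the last arg. *)
let call_prim_map stack =
  match stack with
  | List _ :: Fn _ :: rest -> List [] :: rest   (* result elided in this sketch *)
  | _ -> raise (Eval_error "map: expected (fn list)")

(* Resume after a suspended IO perform by pushing the IO result.
   [buggy] models one possible misalignment: a stale slot left on top. *)
let resume ~buggy stack io_result =
  if buggy then Int 0 :: io_result :: stack
  else io_result :: stack

(* A handler that pushes a fn, suspends for a kv read, then maps. *)
let run ~buggy =
  let stack = [ Fn "render-edge" ] in
  let stack = resume ~buggy stack (List [ Int 1; Int 2 ]) in
  match call_prim_map stack with
  | _ -> Ok ()
  | exception Eval_error msg -> Error msg
```

In this model the same bytecode succeeds when no suspension disturbs the stack and fails at the first primitive call after a bad resume, which matches the pattern of conformance passing while the serving path fails.
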
**(B) fallback doesn't recover the failed call.** `register_jit_hook`
(`hosts/ocaml/bin/sx_server.ml` ~L1607-1623): on first-call error it warns, sets
`l.l_compiled <- jit_failed_sentinel`, and returns `None`, which is intended to
fall through to CEK. But the error still escapes to the http-listen handler
(→ 500) instead of the call being re-run on CEK and returning a value. So even
granting (A), the request shouldn't 500: the fallback should recover THIS call,
not just mark the fn for next time. (Your own notes flagged this as the deferred
"propagate-don't-rerun" shared-CEK change; this is the same thing biting live.)

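The difference between the two fallback behaviours in (B) can be sketched as follows. This is a minimal model, not the code in `sx_server.ml`: the `compiled` and `lambda` types, the `interpret` field (standing in for a CEK run), and both `call_*` functions are hypothetical names introduced for illustration.

```ocaml
(* Hypothetical sketch: none of these names are the real sx_server.ml API. *)
type compiled =
  | Not_compiled
  | Compiled of (int list -> int)
  | Jit_failed                      (* plays the role of a failure sentinel *)

type lambda = {
  name : string;
  mutable compiled : compiled;
  interpret : int list -> int;      (* stands in for running the body on CEK *)
}

let warn l = Printf.eprintf "[jit] %s first-call fallback to CEK\n" l.name

(* Behaviour described in (B): mark the function, but let the error escape. *)
let call_propagate l args =
  match l.compiled with
  | Compiled f ->
    (try f args with e -> warn l; l.compiled <- Jit_failed; raise e)
  | Not_compiled | Jit_failed -> l.interpret args

(* Requested behaviour: mark the function AND re-run this call on CEK. *)
let call_rerun l args =
  match l.compiled with
  | Compiled f ->
    (try f args with _ -> warn l; l.compiled <- Jit_failed; l.interpret args)
  | Not_compiled | Jit_failed -> l.interpret args
```

With `call_propagate`, the first caller sees the exception and only later calls succeed; with `call_rerun`, the first caller already gets a value. Note that re-running assumes the failed JIT attempt had no observable side effects before it raised, which this sketch does not check.
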
Fixing EITHER (A) or (B) unblocks the host: (A) removes the miscompile; (B) makes
any miscompile self-heal on the first hit instead of 500ing.

## Repro

1. Build the merged binary (loops/host now carries sx-vm-extensions; the gate +
   render-page coexist in sx_server.ml's persistent serving branch).
2. Run `SX_SERVING_JIT=1 bash lib/host/serve.sh` on a port (durable backend), but
   FIRST remove the `(jit-exclude! "host/*" ...)` line from serve.sh so host code
   JITs.
3. `curl http://127.0.0.1:PORT/welcome/` → first hit 500 (`map: expected (fn list)`),
   retry 200. `curl /` (home, uses map+rest) likewise.

Tooling: `(vm-trace "<sx>")`, `(bytecode-inspect "host/blog--edges-block")`,
`(prim-check "host/blog--edges-block")` (see CLAUDE.md, "VM/Bytecode Debugging").

## Current mitigation (host side, to remove once fixed)

`lib/host/serve.sh`: when `SX_SERVING_JIT=1`, it runs `(jit-exclude! "host/*"
"dream-*" "dr/*")`. Host app + Dream framework run on CEK (they're IO-bound, so
no perf loss); Datalog (`dl-*`/`relations-*`) keeps JITting (the win). Drop this
once (A) or (B) lands.

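To make the scope of the exclude list concrete, here is a small sketch of which function names it would cover. It assumes each pattern is either a literal name or a literal prefix followed by a single trailing `*`; the real matcher behind `jit-exclude!` may behave differently, and `matches`, `excluded`, and `host_excludes` are names invented for this example.

```ocaml
(* Assumption: a pattern is a literal name, or a literal prefix plus one
   trailing '*'. The real matcher behind jit-exclude! may differ. *)
let matches pattern name =
  let n = String.length pattern in
  if n > 0 && pattern.[n - 1] = '*' then
    let prefix = String.sub pattern 0 (n - 1) in
    String.length name >= n - 1 && String.sub name 0 (n - 1) = prefix
  else pattern = name

(* A name is excluded from JIT if any pattern matches it. *)
let excluded patterns name = List.exists (fun p -> matches p name) patterns

(* The patterns quoted in the mitigation above. *)
let host_excludes = [ "host/*"; "dream-*"; "dr/*" ]
```

Under that assumption, `excluded host_excludes "host/blog--edges-block"` is true, while any name starting with `dl-` or `relations-` is left alone and keeps JITting.
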