register_jit_hook is now installed in the persistent (epoch) serving-mode branch of sx_server.ml, not just --http/cli/site. Smalltalk-on-SX conformance under JIT is 847/847, identical to the no-JIT baseline; Datalog is 356/356. run_tests results under --jit/no-jit are byte-identical before/after (no regression).

Five distinct root causes fixed (not one "miscompile"):

1. Serving mode never loaded lib/compiler.sx, so the JIT used the native Sx_compiler.compile stub (arity-0 bytecode, params as GLOBAL_GET → "VM undefined: <param>"). The server-mode branch now loads compiler.sx before registering the hook, matching http/cli/site.
2. compile-cond / compile-case-clauses / compile-guard-clauses only treated keyword :else and true as the catch-all, not the bare symbol `else` that the CEK's is-else-clause? accepts → GLOBAL_GET "else". (lib/compiler.sx)
3. OP_DIV produced a float for non-divisible Integer/Integer (1/2 → 0.5) instead of the exact Rational the "/" primitive returns. It now delegates to the primitive, matching CEK. (sx_vm.ml)
4. OP_EQ / _fast_eq lacked the Rational/ListRef cases that the "=" primitive's safe_eq has → (= 1/2 1/2) was false under JIT. OP_EQ now delegates non-scalars to the "=" primitive; _fast_eq gained rational + ListRef. (sx_vm.ml, sx_runtime.ml)
5. Continuation-based control flow (Smalltalk ^expr non-local return, block escape, exceptions via call/cc) can't run in the stack VM. New data-driven exclusion set Sx_types.jit_excluded + `jit-exclude!` primitive, consulted in jit_compile_lambda (covers both the CEK hook and vm_call's tiered path). lib/smalltalk/eval.sx self-declares its continuation dispatch core interpret-only; pure helpers still JIT. The SUnit suite-runner test helper pharo-test-class miscompiles mid-loop and is excluded in tests/tokenize.sx.

Also adds SX_JIT_DENY / SX_JIT_ONLY env-var bisection filters to the serving hook.

Known residual documented in plans/jit-bytecode-correctness.md: the hook re-runs a failed VM execution via CEK (correct result, possible duplicate side effects). Adopting run_tests' propagate-don't-rerun semantics is deferred to avoid changing shared VM/CEK behavior under this loop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# JIT bytecode correctness: enable the JIT in serving mode

> Kickoff handed over from the **host-on-sx** loop (2026-06-19). This is the
> highest-leverage perf win on the platform.

## Why this matters

Every SX-on-SX subsystem runs **interpreted on the tree-walking CEK**: the
Smalltalk runtime (→ content-on-sx rendering) and the guest languages
(Datalog, Prolog, APL, Scheme, Haskell, Erlang, Maude). The lazy JIT
(`register_jit_hook` → bytecode VM) would speed all of them up ~10–60×. It is
currently **only installed in `--http` page-server mode**, not in the epoch /
`http-listen` serving mode, because it **miscompiles** these workloads.

Concrete impact: the host serves a blog post (`content/html`, interpreted
Smalltalk) in **~2 seconds per request**. With a correct JIT it should take tens
of ms. The same slowdown applies to every guest-language-backed service.

## Concrete repro (from the host loop)

In `hosts/ocaml/bin/sx_server.ml`, the persistent server mode (`make_server_env`,
~line 4871) does **not** call `register_jit_hook env`; only the `--http` mode
(~line 4034) does. To reproduce the miscompile:

1. Add `register_jit_hook env;` right after `let env = make_server_env () in` in
   the persistent server-mode branch (~4871).
2. Rebuild: `eval $(opam env --switch=5.2.0); dune build bin/sx_server.exe`.
3. Run a Smalltalk/content-heavy suite, e.g. the host-on-sx conformance
   (`bash /root/rose-ash-loops/host/lib/host/conformance.sh`, or any
   content-on-sx suite). **With the hook ON, tests FAIL**: host-on-sx dropped to
   `router 3/6, feed 4/11, relations 9/16, blog 4/11`. With the hook OFF, all green.

So the JIT produces **wrong results** (the known "compiled compiler helpers loop
on complex nested ASTs" bug; see memory `project_jit_bytecode_bug`).

## Goal

Make the JIT compile the Smalltalk-on-SX evaluator and the guest-language evaluators
**correctly**, so `register_jit_hook` can be enabled in serving mode with
conformance **fully green**. Then enable it there.

## Suggested approach

- Minimal repro to bisect: render a `lib/content` doc via `content/html` with JIT
  ON vs OFF, diff the output, and find the first divergence.
- Localize with the VM debugging tools (see CLAUDE.md): `(vm-trace ...)`,
  `(bytecode-inspect ...)`, `(prim-check ...)`, `(deps-check ...)`.
- Likely suspects: nested closures / TCO, dict construction, `st-send` dispatch
  patterns, recursion through the Smalltalk method interpreter.
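The bisect step in the first bullet is easy to mechanize once both renders are captured as text. Below is a minimal OCaml sketch. `first_divergence` is a hypothetical helper, not an existing tool in the repo. It assumes a plain line-by-line comparison is enough, and it reports the 1-based line number and the two differing lines, using `<eof>` for a missing line.

```ocaml
(* Hypothetical bisect helper (not an existing repo tool): find the first line
   where the JIT-ON and JIT-OFF renders of the same document differ. *)
let first_divergence (jit_on : string) (jit_off : string) :
    (int * string * string) option =
  let a = Array.of_list (String.split_on_char '\n' jit_on) in
  let b = Array.of_list (String.split_on_char '\n' jit_off) in
  let n = max (Array.length a) (Array.length b) in
  (* A line past the end of one output is reported as "<eof>". *)
  let line arr i = if i < Array.length arr then arr.(i) else "<eof>" in
  let rec scan i =
    if i >= n then None
    else
      let x = line a i and y = line b i in
      if String.equal x y then scan (i + 1) else Some (i + 1, x, y)
  in
  scan 0

let () =
  (* Usage: in practice the two strings would be the captured renders. *)
  match first_divergence "<h1>Post</h1>\n<p>x</p>" "<h1>Post</h1>\n<p>y</p>" with
  | None -> print_endline "renders identical"
  | Some (lineno, on, off) ->
      Printf.printf "first divergence at line %d: JIT ON %S vs JIT OFF %S\n"
        lineno on off
```

Stopping at the first differing line is deliberate. Later differences are usually knock-on effects of the first wrong value.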
## Pointers

- `register_jit_hook`: `sx_server.ml` ~1493; JIT VM-suspend/resolve path ~1497–1514.
- `hosts/ocaml/lib/sx_vm.ml`: the bytecode VM + compiler.
- `plans/jit-cache-architecture.md`, `plans/jit-perf-regression.md`, `restore-jit-perf.sh`.
- Memory: `project_jit_bytecode_bug.md` (plan ref `plans/reflective-rolling-treehouse.md`).
- The shared `sx_server.exe` binary is used by ALL loops; coordinate before
  changing VM semantics that could affect sibling conformance runs.

---

## Resolution (2026-06-19, loop loops/sx-vm-extensions)

JIT is now enabled in the persistent (epoch) serving mode (`register_jit_hook`
in `sx_server.ml`'s server-mode branch). Smalltalk conformance is **847/847,
identical to the no-JIT baseline** (no failures, no double-counted rows).
Datalog conformance (a non-continuation guest) is **356/356** under JIT.

Five distinct root causes were found and fixed (not one "miscompile"):

1. **Serving mode never loaded `lib/compiler.sx`.** The JIT therefore fell back to the
   native `Sx_compiler.compile` stub, which emits arity-0 bytecode with every
   parameter compiled as `GLOBAL_GET` → "VM undefined: <param>" on the first
   call of essentially every function. `http`/`cli`/`site` modes already load
   `compiler.sx`; the epoch serving branch now does too (before the hook).
   *Fix: the `sx_server.ml` server-mode branch loads `lib/compiler.sx`.*

2. **`compile-cond`/`compile-case-clauses`/`compile-guard-clauses` only treated
   the keyword `:else` and `true` as the catch-all**, not the bare symbol
   `else` that the CEK's `is-else-clause?` accepts. They emitted
   `GLOBAL_GET "else"` → runtime "VM undefined: else".
   *Fix (`lib/compiler.sx`): add the symbol-`else` case to all three.*

3. **`OP_DIV` produced a float for non-divisible Integer/Integer** (`1/2` → 0.5)
   instead of the exact `Rational` the `/` primitive returns. It therefore diverged
   from CEK and broke equality against rational results.
   *Fix (`sx_vm.ml`): delegate non-divisible int/int to the `/` primitive.*

4. **`OP_EQ` / `_fast_eq` lacked the `Rational`/`ListRef` cases** that the real `=`
   primitive's `safe_eq` has, so `(= 1/2 1/2)` was false under JIT.
   *Fix (`sx_vm.ml`, `sx_runtime.ml`): `OP_EQ` delegates non-trivial types to the
   `=` primitive; `_fast_eq` (also used by `prim_call "="`) gained rational +
   ListRef cases.*

5. **Continuation-based control flow can't run in the stack VM.** Smalltalk's
   non-local return (`^expr`), block escape, and exception unwinding use
   `call/cc`. A JIT-compiled frame between a `call/cc` capture and its `(k v)`
   invocation cannot transfer control, and (via the hook's re-run-on-failure)
   it double-executes side effects.
   *Fix: a general, data-driven exclusion set, `Sx_types.jit_excluded`,
   populated from SX via the new `jit-exclude!` primitive and consulted in
   `jit_compile_lambda`, so it covers BOTH JIT entry points (CEK hook + in-VM
   tiered path). `lib/smalltalk/eval.sx` self-declares its continuation-using
   dispatch core interpret-only; pure helpers (parsing, lookup, formatting,
   arithmetic) still JIT.* One SUnit suite-runner test helper
   (`pharo-test-class`) miscompiles under JIT on a specific iteration and is
   excluded in the test prelude (`tests/tokenize.sx`).
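Root cause 1 can be reproduced in miniature. The sketch below is a toy model, not the real `Sx_compiler` or `sx_vm.ml` code; `instr`, `compile_full`, `compile_stub`, and `run` are hypothetical names. It assumes that a compiler which knows a lambda's parameter list turns a parameter reference into an argument-slot read. A stub that ignores the parameters (arity 0) can only emit a global lookup, which fails at call time with the same "VM undefined" message.

```ocaml
(* Toy model of root cause 1: how an arity-0 stub turns params into globals. *)
type instr =
  | Local_get of int      (* read argument slot i *)
  | Global_get of string  (* look a name up in the global table *)

(* Parameter-aware compiler: a reference to a parameter becomes a slot read. *)
let compile_full (params : string list) (var : string) : instr =
  let rec index i = function
    | [] -> Global_get var
    | p :: rest -> if String.equal p var then Local_get i else index (i + 1) rest
  in
  index 0 params

(* Stub: ignores the parameter list, so every name is treated as a global. *)
let compile_stub (_params : string list) (var : string) : instr = Global_get var

exception Vm_undefined of string

let run (globals : (string, int) Hashtbl.t) (args : int array) : instr -> int =
  function
  | Local_get i -> args.(i)
  | Global_get name -> (
      match Hashtbl.find_opt globals name with
      | Some v -> v
      | None -> raise (Vm_undefined name))

let () =
  let globals = Hashtbl.create 8 in
  (* Model of (fn (x) x) applied to 5, compiled both ways. *)
  Printf.printf "full: %d\n" (run globals [| 5 |] (compile_full [ "x" ] "x"));
  match run globals [| 5 |] (compile_stub [ "x" ] "x") with
  | v -> Printf.printf "stub: %d\n" v
  | exception Vm_undefined n -> Printf.printf "stub: VM undefined: %s\n" n
```

Running it prints `full: 5` followed by `stub: VM undefined: x`. That matches the failure seen on the first call of nearly every function.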
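Root cause 2 reduces to one predicate that the compiler and the CEK must agree on. The sketch below uses a hypothetical, simplified value type. `Keyword`, `Symbol`, `Bool`, `is_else_clause`, and `describe_head` are illustrative only; they are not the real `Sx_types` constructors or the `is-else-clause?` source. The compile step is simplified to ignore locals, so any symbol head that is not a catch-all becomes a global read. That is how `GLOBAL_GET "else"` arose.

```ocaml
(* Hypothetical simplified value type; the real Sx_types differs. *)
type value =
  | Keyword of string  (* :else *)
  | Symbol of string   (* else, x, ... *)
  | Bool of bool

(* The three catch-all forms the CEK's is-else-clause? is described as accepting. *)
let is_else_clause = function
  | Keyword "else" -> true
  | Bool true -> true
  | Symbol "else" -> true  (* the case compile-cond & co. were missing *)
  | _ -> false

(* Simplified compile of a clause head (locals ignored): a symbol that is not a
   catch-all becomes a global read. *)
let describe_head v =
  if is_else_clause v then "catch-all (no test emitted)"
  else match v with Symbol s -> "GLOBAL_GET " ^ s | _ -> "evaluate test"

let () =
  List.iter
    (fun v -> print_endline (describe_head v))
    [ Keyword "else"; Bool true; Symbol "else"; Symbol "x" ]
```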
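Root causes 3 and 4 both come from VM fast paths that disagree with the primitives. The OCaml sketch below shows the intended behavior. It assumes a hypothetical `num` type whose `Rational` is always stored in lowest terms with a positive denominator. `old_div`, `div`, and `fast_eq` are illustrative stand-ins for the old and fixed `OP_DIV` and for `_fast_eq`, not their actual code. `ListRef` is omitted.

```ocaml
(* Hypothetical numeric values. Rational is kept in lowest terms with a
   denominator > 1, so an exact quotient is always an Int. *)
type num = Int of int | Rational of int * int | Float of float

let rec gcd a b = if b = 0 then abs a else gcd b (a mod b)

(* Normalize n/d; collapses to Int when the division is exact. *)
let make_rational n d =
  if d = 0 then raise Division_by_zero;
  let g = gcd n d in
  let n, d = (n / g, d / g) in
  let n, d = if d < 0 then (-n, -d) else (n, d) in
  if d = 1 then Int n else Rational (n, d)

(* Old OP_DIV behavior as described: non-divisible int/int became a float. *)
let old_div x y =
  if x mod y = 0 then Int (x / y) else Float (float_of_int x /. float_of_int y)

(* Fixed behavior: int/int stays exact, like the "/" primitive. *)
let div a b =
  match (a, b) with
  | Int x, Int y -> make_rational x y
  | _ -> invalid_arg "div: this sketch only handles Int/Int"

(* Equality including the Rational case; ListRef and cross-type numeric
   comparisons are left out of this sketch. *)
let fast_eq a b =
  match (a, b) with
  | Int x, Int y -> x = y
  | Rational (n1, d1), Rational (n2, d2) -> n1 = n2 && d1 = d2
  | Float x, Float y -> x = y
  | _ -> false

let () =
  (* 1/2 vs 2/4 (both exact), then old float 0.5 vs exact 1/2. *)
  Printf.printf "%b %b\n"
    (fast_eq (div (Int 1) (Int 2)) (div (Int 2) (Int 4)))
    (fast_eq (old_div 1 2) (div (Int 1) (Int 2)))
  (* prints: true false *)
```

Normalizing at construction time is what lets `fast_eq` compare rationals field by field. Without it, `2/4` and `1/2` would need cross-multiplication.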
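The exclusion mechanism in root cause 5 is essentially a name set checked at a single compile entry point. Below is a minimal OCaml sketch, assuming lambdas are identified by name. `jit_excluded` mirrors the role of `Sx_types.jit_excluded`. `jit_exclude`, the `lambda` record, `cek_hook`, and `vm_tiered_call` are hypothetical simplifications whose real signatures will differ. Because the check lives inside `jit_compile_lambda` rather than in either caller, both entry points respect it.

```ocaml
(* Simplified stand-in for Sx_types.jit_excluded: lambda names that must stay
   on the CEK interpreter. *)
let jit_excluded : (string, unit) Hashtbl.t = Hashtbl.create 64

(* Roughly what a jit-exclude! primitive does when SX code calls it. *)
let jit_exclude (names : string list) : unit =
  List.iter (fun n -> Hashtbl.replace jit_excluded n ()) names

type lambda = { name : string; mutable compiled : bool }

(* The single choke point. Returns true if the lambda was compiled. *)
let jit_compile_lambda (l : lambda) : bool =
  if Hashtbl.mem jit_excluded l.name then false
  else begin
    l.compiled <- true;  (* real code would emit bytecode here *)
    true
  end

(* Both hypothetical entry points defer to the same check. *)
let cek_hook (l : lambda) = jit_compile_lambda l
let vm_tiered_call (l : lambda) = jit_compile_lambda l

let () =
  (* Hypothetical names: a continuation-using dispatcher and a pure helper. *)
  jit_exclude [ "dispatch-core" ];
  let core = { name = "dispatch-core"; compiled = false } in
  let helper = { name = "format-helper"; compiled = false } in
  Printf.printf "core via hook: %b, core via VM: %b, helper: %b\n"
    (cek_hook core) (vm_tiered_call core) (vm_tiered_call helper)
```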
### Known residual / follow-up

- The hook still **re-runs a failed VM execution via CEK**. This always yields the
  correct result, but it can duplicate side effects if a JIT'd function fails
  mid-run after a side effect. `run_tests`'s hook instead propagates non-IO /
  non-"VM undefined" exceptions. Adopting the same propagate-don't-rerun semantics
  in the serving hook would remove the double-execution class entirely. However, it
  surfaces genuine mid-run miscompiles as errors, so it must land together
  with fixing/excluding any function that miscompiles mid-run (e.g.
  `pharo-test-class`). This is deferred to avoid changing shared VM/CEK semantics
  under this loop.
- Other continuation-heavy guests (Scheme and Erlang use `call/cc`) will need
  their own `jit-exclude!` declarations for their dispatch cores; the mechanism
  is in place. Non-continuation guests (Datalog/Prolog/Haskell/APL) JIT as-is.
- A debug aid was added to the serving hook: `SX_JIT_DENY=name,...` /
  `SX_JIT_ONLY=name,...` env vars to bisect which named lambda the VM
  mishandles (hook-path only).
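The two failure policies in the first bullet differ only in which exceptions trigger a CEK re-run. The sketch below uses a counter to stand in for a side effect. `run_vm`/`run_cek` are hypothetical callbacks for the two execution paths, and the exceptions are illustrative. It assumes the `run_tests`-style hook falls back to the CEK for IO and "VM undefined" errors and propagates everything else.

```ocaml
exception Vm_undefined of string  (* e.g. an unbound name in compiled code *)
exception Io_error of string
exception Miscompile of string    (* stands in for any other mid-run failure *)

(* Serving-hook policy as described: any VM failure is re-run on the CEK. *)
let rerun_on_failure ~run_vm ~run_cek = try run_vm () with _ -> run_cek ()

(* run_tests-style policy (assumed shape): only IO and "VM undefined" fall back
   to the CEK; anything else propagates so the miscompile is visible. *)
let propagate_dont_rerun ~run_vm ~run_cek =
  try run_vm () with Vm_undefined _ | Io_error _ -> run_cek ()

let () =
  let effects = ref 0 in
  (* The VM path performs a side effect, then fails mid-run. *)
  let run_vm () = incr effects; raise (Miscompile "failed after side effect") in
  let run_cek () = incr effects; 42 in
  let r = rerun_on_failure ~run_vm ~run_cek in
  (* Result 42, but the side effect ran twice. *)
  Printf.printf "re-run: result %d, side effects %d\n" r !effects;
  effects := 0;
  match propagate_dont_rerun ~run_vm ~run_cek with
  | r -> Printf.printf "propagate: result %d\n" r
  | exception Miscompile msg ->
      (* The side effect ran once and the error surfaces. *)
      Printf.printf "propagate: raised %S after %d side effect(s)\n" msg !effects
```

With re-run, the side effect happens twice but the result is correct. With propagate, it happens once and the mid-run miscompile surfaces as an error. That is why the switch has to land together with fixing or excluding such functions.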
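The `SX_JIT_DENY` / `SX_JIT_ONLY` debug aid in the last bullet reduces to a pure predicate over a lambda's name. Below is a minimal OCaml sketch. Its precedence rules are assumptions, not a description of the actual hook code: an empty or unset `SX_JIT_ONLY` means no restriction, and `SX_JIT_DENY` wins over `SX_JIT_ONLY`. The helper names are likewise hypothetical.

```ocaml
(* Parse "a, b,,c" into ["a"; "b"; "c"]: trim spaces, drop empty entries. *)
let parse_names (s : string) : string list =
  String.split_on_char ',' s
  |> List.map String.trim
  |> List.filter (fun n -> n <> "")

(* Assumed semantics: ONLY (if set and non-empty) is an allow-list, DENY is a
   deny-list, and DENY wins when a name appears in both. *)
let should_jit ~(deny : string option) ~(only : string option) (name : string) :
    bool =
  let denied =
    match deny with Some s -> List.mem name (parse_names s) | None -> false
  in
  let allowed =
    match only with
    | None -> true
    | Some s -> (
        match parse_names s with [] -> true | names -> List.mem name names)
  in
  allowed && not denied

(* In a hook, the variables would be read once and the predicate reused. *)
let filter_from_env () : string -> bool =
  let deny = Sys.getenv_opt "SX_JIT_DENY" in
  let only = Sys.getenv_opt "SX_JIT_ONLY" in
  fun name -> should_jit ~deny ~only name
```

Bisecting then means halving the `SX_JIT_ONLY` list until a single name still reproduces the failure. Moving that name to `SX_JIT_DENY` should then turn the run green, which confirms the culprit.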