The OP_DIV/numeric-tower work on this branch made the OCaml `/` primitive
return an exact Rational for (/ int int) (e.g. (/ 5 2)=5/2), diverging from
the canonical spec ("/ always returns inexact float"), the test-rationals.sx
header ("in the JS host, (/ int int) returns float — backward-compatible"),
and the JS host itself. That leaked rationals into arithmetic results and
rendered CSS (tw-opacity emitted `opacity:1/20` instead of `0.05`).
Decision (with the user): keep exact rationals as an explicit opt-in
(literals 1/3, make-rational) but bring `/` back into spec/host parity —
the isomorphic SSR↔hydration invariant requires both hosts to agree, and
JS has no native rational type.
- sx_primitives.ml `/`: (/ int int) → integer when exactly divisible, else
inexact float; a Rational operand still yields an exact rational (matches
test-numeric-tower: (/ 6 2)=3, (/ 1 4)=0.25, (/ 5 2)=2.5, (/ 1/2 2)=1/4).
- sx_primitives.ml number? / exact?: recognise the Rational type (real bugs —
test-rationals asserts (number? 1/3) and (exact? 1/3); inexact?/float?
already returned false for Rational, correct).
- sx_vm.ml OP_DIV: comment updated (it delegates to the now-float `/`).
- test-rationals.sx: fix typo in "rational * float = float" — used int 2,
should be 2.0 (1/2 * 2 = 1 exact, not a float; name + siblings use floats).
OCaml conformance 4834→4863 (+29 fixed, zero new failures); rationals,
numeric-tower, arithmetic, tw-opacity suites all 0 failures. Remaining run_tests
failures are the pre-existing environmental hyperscript (host-call-fn) set.
JS host already handles number?/exact? on rationals and float `/`; its
remaining float?/contagion failures are a separate pre-existing limitation
(JS has no distinct float type), out of scope here.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause (found via bin/repro_jit_resume.ml, 9 surgical cases): when a
`perform` (durable kv read) fires inside a native HO-primitive callback
(map/filter/reduce/for-each/some/every?), the VmSuspended unwound through
the primitive's native OCaml loop (List.map etc.), destroying the loop's
iteration state. The remaining elements were dropped and the stack left
misaligned, so the NEXT CALL_PRIM (map/rest/drop) read wrong args —
"map: expected (fn list)", "rest: 1 list arg", "drop: list and number".
Only triggers in the http-listen + cek_run_with_io serving path (epoch
eval has no synchronous resolver, so conformance was 271/271).
(A) lib/sx_vm.ml call_closure_reuse: when a callback suspends AND a
synchronous IO resolver is installed (serving mode), resolve the
callback's IO inline and run it to completion right there, returning its
value to the native loop — so the loop is never unwound. Scoped to the
resolver-set path; the CEK-driven path (flow/reactive/async tests) keeps
its existing reuse_stack behaviour, so nothing else changes. reuse_stack
is isolated across the nested resume.
(A') lib/sx_vm.ml resume_vm: re-assert _active_vm := Some vm for the
duration of the resumed run (mirrors call_closure). call_closure restored
_active_vm to the caller when VmSuspended unwound, so HO callbacks during
a resume could land on the wrong VM. Latent-bug fix.
(B) bin/sx_server.ml register_jit_hook: the resolve_loop runs inside the
VmSuspended handler, so a non-VmSuspended exception from resume_vm escaped
to the http handler (→ 500). Catch it and fall back to CEK for THIS call
(mark jit_failed, return None → interpreter re-runs it). Self-heals on the
first hit, not a retry. Defense-in-depth; with (A) it shouldn't trigger.
Verification: repro 9/9 (incl. host shape: map[cb→interpreted-helper
perform]→drop = (7 8); reduce; nested map). Standard + --full OCaml
conformance unchanged at 4834/1110 (baseline identical — the 1110 are
pre-existing environmental: host-call-fn/browser-platform symbols,
rational display, tw/regex). Host loop to re-verify 271/271 serving and
drop its (jit-exclude! "host/*" "dream-*" "dr/*") band-aid.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 6 common-lisp opt-in-JIT failures were all condition-system continuation
escape: cl-restart-case/cl-handler-case/cl-handler-bind wrap their body in
call/cc (restarts + non-local handler exit). When an SX function that drives
the condition system (the parse-recover / interactive-debugger fixtures, e.g.
parse-numbers, make-policy-debugger) is JIT-compiled, the call/cc form runs in
a NESTED cek-run where invoking the captured continuation
runs-to-completion-and-returns instead of escaping — so a restart fails to
abort and the body falls through. Observed as result accumulation
(got (1 3 0 3) vs (1 3)) and no-abort (restart returns the 999 sentinel).
These callers are arbitrary user/fixture code, not a fixed namespace, so they
can't be prefix-excluded. New data-driven mechanism:
- jit-exclude-callers-of! registers call/cc-establishing form names in
Sx_types.jit_excluded_caller_names.
- jit_compile_lambda skips any function whose constant pool (recursively,
incl. nested closures) references a registered name — code_refs_escaping_caller.
Guarded by Hashtbl.length > 0 so it's a no-op for every guest that doesn't
register (zero effect outside CL).
- lib/common-lisp/runtime.sx registers the establish side (cl-restart-case,
cl-handler-case, cl-handler-bind) and the invoke side (cl-invoke-restart,
cl-invoke-debugger, cl-signal, cl-error-with-debugger).
Result: CL conformance under SX_SERVING_JIT=1 = 487/0, EXACTLY matching the CEK
baseline (was 484/6 with a +3 double-execution over-count). parse-recover
3/4 -> 6/0, interactive-debugger 7/2 -> 7/0.
Note: the geometry/mop-trace suites report 0/0 on BOTH CEK and JIT — they error
"Undefined symbol: refl-class-chain-depth-with" (the CLOS suites don't preload
lib/guest/reflective/class-chain.sx). Pre-existing conformance-harness gap, not
a JIT issue; left as-is.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The host combined-binary integration test exposed a new JIT-unsafe class:
Dream's error middleware (host/wrap-errors -> dream-catch-with) failed to catch
a thrown error under JIT — it escaped as "Unhandled exception" and truncated the
host middleware suite (7/9 vs 9/9 on CEK).
Root cause: the VM's OP_PUSH_HANDLER (the compiled form of `guard`) only
intercepts a VM-level RAISE (opcode 37); it does NOT catch the OCaml Eval_error
that the `error` primitive throws from a CALL/CALL_PRIM in a callee frame. So a
JIT-compiled `guard` silently fails to catch. dream-catch-with is curried
((fn (on-error) (fn (next) (fn (req) (guard ...))))), so the guard lives in a
NESTED closure — JIT-compiling the outer function mints that inner guard as a
VmClosure with the broken VM handler.
Fix (central, not per-callsite): scan a JIT candidate's bytecode RECURSIVELY —
including nested closure code in the constant pool — for OP_PUSH_HANDLER, and
skip JIT for any handler-installing function. It then runs on the CEK, whose
guard catches correctly. Covers dream-catch-with, host wrap-errors/blog-render,
and every other guard / handler-bind user automatically.
Verified: minimal direct guard and curried cross-frame guard both return the
caught value under JIT (were "Unhandled exception"); the host run's "kaboom"
escapes went 2 -> 0. (Remaining host blog/page failures are "Undefined symbol:
render-page" — the host's native render fn, absent from the standalone
sx_server.exe; identical on CEK, i.e. an environment artifact, not a JIT
regression. The combined host binary has render-page.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Enabling the epoch serving-mode JIT globally regressed continuation-based guest
interpreters (the epoch mode is the shared command channel every loop's
conformance runner uses). Two-part fix:
1. SAFE DEFAULT GATE. register_jit_hook in the persistent server branch is now
opt-in via SX_SERVING_JIT=1 (default OFF). Default behaviour is unchanged
(no JIT in epoch serving) → zero regression for sibling loops. The
content/Smalltalk page server opts in.
2. GENERAL FIXES + per-guest interpret-only declarations:
- callable? (sx_server/run_tests/integration_tests/mcp_tree) now accepts
VmClosure. A JIT-compiled higher-order function returns its inner closure
as a VmClosure; callable? previously rejected it, so scheme-apply's
(callable? proc) guard failed with "not a procedure: <vm:anon>".
- jit-exclude! gains a trailing-"*" namespace-prefix form
(Sx_types.jit_excluded_prefixes), the robust way to mark a whole guest
interpreter interpret-only (a name-list misses functions in extra files —
it left erlang's vm/dispatcher JIT'd and 13 tests short).
- Per-guest exclusions in each guest's runtime.sx:
scheme "scheme-*" "scm-*" erlang "er-*" "erlang-*"
prolog "pl-*" common-lisp "cl-*" "clos-*"
js "js-*" haskell "hk-*"
Verified under opt-in JIT (== CEK, no hang): smalltalk 847/847, scheme/flow
166/166, erlang 530/530, prolog 590/590, apl 152/152, js 147/148. Residual
(documented, protected by the default gate): common-lisp 6 fails in advanced
suites (parser-recovery/debugger/CLOS/MOP). lua (0/16) and tcl (3/4) fail
identically on CEK — pre-existing, not JIT. run_tests --jit/no-jit unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
register_jit_hook is now installed in the persistent (epoch) serving-mode
branch of sx_server.ml, not just --http/cli/site. Smalltalk-on-SX conformance
under JIT is 847/847 — identical to the no-JIT baseline; Datalog 356/356.
run_tests --jit/no-jit are byte-identical before/after (no regression).
Five distinct root causes fixed (not one "miscompile"):
1. Serving mode never loaded lib/compiler.sx, so JIT used the native
Sx_compiler.compile stub (arity-0 bytecode, params as GLOBAL_GET →
"VM undefined: <param>"). Server-mode branch now loads compiler.sx
before registering the hook, matching http/cli/site.
2. compile-cond / compile-case-clauses / compile-guard-clauses only treated
keyword :else and true as the catch-all, not the bare symbol `else` that
the CEK's is-else-clause? accepts → GLOBAL_GET "else". (lib/compiler.sx)
3. OP_DIV produced a float for non-divisible Integer/Integer (1/2 → 0.5)
instead of the exact Rational the "/" primitive returns. Now delegates to
the primitive, matching CEK. (sx_vm.ml)
4. OP_EQ / _fast_eq lacked Rational/ListRef cases that the "=" primitive's
safe_eq has → (= 1/2 1/2) false under JIT. OP_EQ now delegates non-scalars
to the "=" primitive; _fast_eq gained rational + ListRef. (sx_vm.ml,
sx_runtime.ml)
5. Continuation-based control flow (Smalltalk ^expr non-local return, block
escape, exceptions via call/cc) can't run in the stack VM. New data-driven
exclusion set Sx_types.jit_excluded + `jit-exclude!` primitive, consulted in
jit_compile_lambda (covers both the CEK hook and vm_call's tiered path).
lib/smalltalk/eval.sx self-declares its continuation dispatch core
interpret-only; pure helpers still JIT. The SUnit suite-runner test helper
pharo-test-class miscompiles mid-loop and is excluded in tests/tokenize.sx.
Also adds SX_JIT_DENY / SX_JIT_ONLY env-var bisection filters to the serving
hook. Known residual documented in plans/jit-bytecode-correctness.md: the hook
re-runs a failed VM execution via CEK (correct result, possible duplicate side
effects); adopting run_tests' propagate-don't-rerun semantics is deferred to
avoid changing shared VM/CEK behavior under this loop.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds Sx_vm.bytecode_uses_extension_opcodes — an operand-aware
bytecode scanner that walks past CONST u16, CALL_PRIM u16+u8, and
CLOSURE u16+dynamic upvalue descriptors so operand bytes that happen
to be ≥200 don't false-positive as extension opcodes.
jit_compile_lambda calls the scanner on the inner closure's bytecode.
On hit it returns None — the lambda then runs through CEK
interpretation. The VM's dispatch fallthrough still routes the
extension opcodes themselves through the registry; this change just
prevents the JIT from claiming code it has no plan for.
Tests: 7 new foundation cases — pure core eligible, head/middle/
post-CLOSURE detection, CONST + CALL_PRIM + CLOSURE-descriptor false-
positive avoidance. +7 pass vs Phase D baseline, no regressions
across 11 conformance suites.
Loop complete: acceptance criteria 1-4 met. Hand-off to the Erlang
loop — lib/erlang/vm/dispatcher.sx's Phase 9b stub can now be
replaced with a real hosts/ocaml/lib/extensions/erlang.ml consumer.
lib/extensions/ becomes the new home for VM extensions, wired in via
(include_subdirs unqualified). README documents the registration
pattern, opcode-ID range conventions (200-209 guest_vm, 210-219
inline test, 220-229 test_ext, 230-247 ports), and naming rules.
extensions/test_ext.ml is the canonical worked example — two
operand-less opcodes (220 push 42, 221 double TOS) carrying a per-
extension state slot (TestExtState invocation counter). Test_ext.register
called from run_tests.ml at the start of the Phase D suite, on top of
the inline test_reg from earlier suites (disjoint opcode IDs).
Sx_vm.opcode_name now consults extension_opcode_name_ref (forward ref
in the same style as extension_dispatch_ref), so disassemble shows
extension opcodes by name instead of UNKNOWN_n. Registry maintains
name_of_id_table and installs the lookup at module init.
Tests: 5 new foundation cases — primitive resolves test_ext name,
end-to-end bytecode (push + double + return → 84), disassemble shows
"test_ext.OP_TEST_PUSH_42" / "test_ext.OP_TEST_DOUBLE_TOS",
unregistered ext opcodes still fall back to UNKNOWN_n, invocation
counter records the two dispatches. +5 pass vs Phase C baseline, no
regressions across 11 conformance suites.
Adds Invalid_opcode of int exception and extension_dispatch_ref forward
ref (default raises Invalid_opcode op), plus the |op when op >= 200 arm
before the catch-all in the bytecode dispatch loop. Partition comment
documents 1-199 core / 200-247 extensions / 248-255 reserved.
Phase B will install the real registry's dispatch into the ref at module
init, replacing this stub.
Tests: 4 new foundation cases (Invalid_opcode for 200/224/247, Eval_error
for 199 to pin the threshold). +4 pass vs baseline, no regressions.
sx_types.ml:
- Add l_uid field on lambda (unique identity for cache tracking)
- Add lambda_uid_counter + next_lambda_uid () minted on construction
- Add jit_budget (default 5000) and jit_evicted_count counter
- Add jit_cache_queue : (int * value) Queue.t — FIFO of compiled lambdas
- jit_cache_size () helper for stats
sx_vm.ml:
- On successful JIT compile, push (uid, Lambda l) onto jit_cache_queue
- While queue length exceeds jit_budget, pop head (oldest entry) and
clear that lambda's l_compiled slot — evicted entries fall through
to cek_call_or_suspend on next call (correct, just slower)
- Guard JIT trigger by !jit_budget > 0 (budget=0 disables JIT entirely)
sx_primitives.ml:
Phase 2:
- jit-set-budget! N — change cache budget at runtime
- jit-stats includes budget, cache-size, evicted
Phase 3:
- jit-reset-cache! — clear all compiled VmClosures (hot paths re-JIT
on next threshold crossing)
- jit-reset-counters! also resets evicted counter
run_tests.ml:
- Update test-fixture lambda construction to include l_uid
Effect: cache size bounded regardless of input pattern. The HS test harness
compiles ~3000 distinct one-shot lambdas, but tiered compilation (Phase 1)
keeps most below threshold so they never enter the cache. Steady-state count
stays in single digits for typical workloads. When a misbehaving caller
saturates the cache (eval-hs in a tight loop, REPL-style host), LRU
eviction caps memory at jit_budget compiled closures × ~1KB each.
Verification: 4771 passed, 1111 failed in run_tests — identical to
pre-Phase-2 baseline. No regressions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OCaml kernel changes:
sx_types.ml:
- Add l_call_count : int field to lambda type — counts how many times
a named lambda has been invoked through the VM dispatch path.
- Add module-level refs jit_threshold (default 4), jit_compiled_count,
jit_skipped_count, jit_threshold_skipped_count for stats.
Refs live here (not sx_vm) so sx_primitives can read them without
creating a sx_primitives → sx_vm dependency cycle.
sx_vm.ml:
- In the Lambda case of cek_call_or_suspend, before triggering the JIT,
increment l.l_call_count. Only call jit_compile_ref if count >= the
runtime-tunable threshold. Below threshold, fall through to the
existing cek_call_or_suspend path (interpreter-style).
sx_primitives.ml:
- Register jit-stats — returns dict {threshold, compiled, compile-failed,
below-threshold}.
- Register jit-set-threshold! N — change threshold at runtime.
- Register jit-reset-counters! — zero the stats counters.
bin/run_tests.ml:
- Add l_call_count = 0 to the test-fixture lambda construction.
Effect: lambdas only get JIT-compiled after the 4th invocation. One-shot
lambdas (test harness wrappers, eval-hs throwaways, REPL inputs) never enter
the JIT cache, eliminating the cumulative slowdown that the batched runner
currently works around. Hot paths (component renders, event handlers) cross
the threshold within a handful of calls and get the full JIT speed.
Phase 2 (LRU eviction) and Phase 3 (jit-reset! / jit-clear-cold!) follow.
Verified: 4771 passed, 1111 failed in OCaml run_tests.exe — identical to
baseline before this change. No regressions; tiered logic is correct.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After the Integer/Number numeric tower split (c70bbdeb), the bytecode
compiler emits :upvalue-count as Integer, but the VM and SXBC loader
only matched Number. The fallback `_ -> 0` made the VM skip reading
upvalue descriptors entirely, so the IP advanced into raw upvalue
bytes which were then misread as opcodes.
Symptom: JIT runs of nested closures (curried functions, Y combinator,
component bodies that close over outer let-bindings) produced "VM:
CONST index N out of bounds (pool size M)" with N values like 256,
4096, 5120, 12800, 13056 — all of the form `byte | (opcode << 8)`,
i.e. an upvalue descriptor (lo) followed by the next instruction's
opcode (hi) being read as a u16 operand.
Fix all five sites that decode upvalue-count to also accept Integer:
- hosts/ocaml/lib/sx_vm.ml: OP_CLOSURE handler, trace_run, disassemble
- hosts/ocaml/lib/sx_vm_ref.ml + hosts/ocaml/sx_vm_ref.ml + bootstrap_vm.py:
vm_create_closure preamble (the bootstrap source-of-truth and both
generated copies)
- hosts/ocaml/browser/sx_browser.ml: SXBC loader's parse_kv
Test impact: JIT 3848 -> 4538 passing (+690). No-JIT unchanged at 4550.
The previously-failing curried/Y/higher-order tests in
spec/tests/test-cek-advanced.sx now pass under --jit and serve as
regression coverage.
This fixes a real current bug. The 28-day-old memory file describing
parser-combinator JIT bugs predates the numeric tower split and
described a different problem; with this fix the parser-combinator
broken-name list (`_jit_is_broken_name` in sx_vm.ml) is no longer
strictly required for correctness, but keeping it avoids a TIMEOUT
regression in one hyperscript test, so it remains in place.
The bytecode compiler emitted OP_CALL_PRIM (52) for every primitive call, even
for arithmetic and comparison hot-paths. The VM had specialized opcodes
(OP_ADD, OP_SUB, OP_EQ, etc.) defined but unused.
- lib/compiler.sx (compile-call): emit specialized 1-byte opcode when the
primitive name + arity matches one of {+, -, *, /, =, <, >, cons, not, len,
first, rest}. Falls back to CALL_PRIM otherwise. fib bytecode: 50 → 38 bytes.
- hosts/ocaml/lib/sx_compiler.ml: mirror change in the auto-generated OCaml
compiler so SXBC export from mcp_tree uses the same emission.
- hosts/ocaml/lib/sx_vm.ml: extend OP_ADD/SUB/MUL/DIV to handle Integer+Integer
(not just Number+Number). Inline OP_EQ via Sx_runtime._fast_eq. Inline
OP_LT/GT mixed-numeric comparisons. Avoids Hashtbl lookup on the fallback
path for the common integer cases that dominate tight loops.
- hosts/ocaml/bin/bench_vm.ml: VM-only benchmark — loads compiler.sx via CEK,
JIT-compiles each fn, measures Sx_vm.call_closure throughput.
Median improvements (best of 3 runs of 9-min, bench_vm.exe):
fib(22) 107.87ms → 33.13ms -69%
loop(200000) 429.64ms → 161.16ms -62%
sum-to(50000) 72.85ms → 36.74ms -50%
count-lt(20000) 28.44ms → 17.58ms -38%
count-eq(20000) 37.23ms → 15.46ms -58%
Tests: 4550/4550 OCaml passing (unchanged). Zero regressions.
Last step in the sx-improvements roadmap — all 14 steps complete.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In `resume_vm`'s `restore_reuse`, the saved sp captured by
`call_closure_reuse` was ignored when restoring the caller frame after the
async callback finished. The suspended callee's locals/temps stayed on the
value stack above saved_sp, so subsequent LOCAL_GET/SET in the caller
frame (e.g. letrec sibling bindings waiting on the suspending call) read
stale callee data instead of their own slots. Sibling bindings appeared
nil after a perform/resume cycle on the JIT path used by the WASM
browser kernel.
Fix: after popping the callback result and restoring saved_frames, reset
`vm.sp <- saved_sp` (when sp is above), then push the callback result.
Mirrors the OP_RETURN+sp-reset discipline that sync `call_closure_reuse`
already follows.
New tests in `spec/tests/test-letrec-resume.sx` cover single binding,
sibling bindings, mutual recursion siblings, and nested letrec —
all four pass. Full OCaml run_tests: 4529/5868 (was 4525/5864), zero
regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In `call_closure_reuse`, the success path used a bare `pop vm` that relied on
OP_RETURN having left the stack at exactly `saved_sp + 1`. When the callee
returns a closure (or hits the bytecode-exhausted fallback path), `vm.sp` can
end up inconsistent with the parent frame's expected layout, corrupting
intermediate values such as parser combinator state in `parse-bind`/`many`/
`seq`.
Fix: read the result at the expected slot, then explicitly reset
`vm.sp <- saved_sp` before returning so the parent frame sees a clean stack
regardless of what the callee left behind.
OCaml run_tests baseline: 4525/5864 unchanged. WASM kernel tests: 24/29
unchanged. No regressions.
JIT-vs-CEK test parity: both now pass 3938/534 (identical failures).
Three fixes in sx_vm.ml + run_tests.ml:
1. OP_CALL_PRIM: fallback to Sx_primitives.get_primitive when vm.globals
misses. Primitives registered after JIT setup (host-global, host-get,
etc. bound inside run_spec_tests) become resolvable at call time.
2. jit_compile_lambda: early-exit for anonymous lambdas, nested lambdas
(closure has parent — recreated per outer call), and a known-broken
name list: parser combinators, hyperscript parse/compile orchestrators,
test helpers, compile-timeout functions, and hs loop runtime (which
uses guard/raise for break/continue). Lives inside jit_compile_lambda
so both the CEK _jit_try_call_fn hook and VM OP_CALL Lambda path
honor the skip list.
3. run_tests.ml _jit_try_call_fn: catch TIMEOUT during jit_compile_lambda.
Sentinel is set before compile, so subsequent calls skip JIT; this
ensures the first call of a suite also falls back to CEK cleanly when
compile exceeds the 5s test budget.
Also includes run_tests.ml 'reset' form helpers refactor (form-element
reset command) that was pending in the working tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an error occurs during resumed VM execution (after perform/hs-wait),
resume_vm now checks the VM's handler_stack. If a handler exists (from a
compiled guard form's OP_PUSH_HANDLER), it unwinds frames and jumps to
the catch block — exactly like OP_RAISE. This enables try/catch across
async perform/resume boundaries.
The guard form compiles to OP_PUSH_HANDLER which lives on the vm struct
and survives across setTimeout-based async resume. Previously, errors
during resume escaped to the JS console as unhandled exceptions.
Also restored guard in the test runner (was cek-try which doesn't survive
async) and restored error-throwing assertions in run-action.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The guard form (call/cc + handler-bind expansion) doesn't survive async
IO suspension — the CEK continuation from guard's call/cc captures frames
that become invalid after the VM resumes from hs-wait. Replacing guard
with cek-try (which compiles to VM-native OP_PUSH_HANDLER/OP_POP_HANDLER)
avoids the CEK boundary crossing.
The test runner now executes: suspends on hs-wait, resumes, runs test
actions, and test assertions fire correctly. The "Not callable: nil"
error is eliminated. Remaining: test assertion errors from iframe content
not loading fast enough (timing issue, not a framework bug).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Not callable: nil error happens on a stub VM (frames=[], sp=0) during
cek_resume with 12 CEK kont frames. The error is from a reactive signal
subscriber (reset! current ...) that triggers during run vm after resume.
The subscriber callback goes through CEK via cek_call_or_suspend and the
CEK continuation tries to call nil.
This is a reactive subscriber notification issue, not a perform/resume
frame management issue. The VM frames are correctly restored — the error
happens during a synchronous reset! call within the resumed VM execution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: when perform fires inside a VM closure chain (call_closure_reuse),
the caller frames are saved to reuse_stack on the ACTIVE VM. But the
_cek_io_suspend_hook and _cek_eval_lambda_ref create a NEW stub VM for the
VmSuspended exception. On resume, resume_vm runs on the STUB VM which has
an empty reuse_stack — the caller frames are orphaned on the original VM.
Fix: transfer reuse_stack from _active_vm to the stub VM before raising
VmSuspended. This ensures resume_vm -> restore_reuse can find and restore
the caller's frames after async resume via _driveAsync/setTimeout.
Also restore step_limit/step_count refs dropped by bootstrap.py regeneration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The _cek_io_suspend_hook creates a stub VM to carry the suspended CEK
state. Previously used empty globals, which caused "Not callable: nil"
when the CEK resume needed platform functions. Now uses _default_vm_globals
(set to _vm_globals by sx_browser.ml) so all platform functions and
definitions are available during resume.
Remaining issue: still getting "resume: Not callable: nil" — the CEK
continuation env may not include letrec bindings from the island body.
The suspension point is inside reload-frame → hs-wait, and the resume
needs to call wait-boot (a letrec binding).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: cek_run_iterative (used by eval_expr/trampoline) raised
"IO suspension in non-IO context" when the CEK hit a perform. This
blocked IO suspension from propagating through nested eval_expr calls
(event handler → trampoline → eval_expr → for-each callback → hs-wait).
Fix: added _cek_io_suspend_hook (Sx_types) that converts CEK suspension
to VmSuspended, set by sx_vm.ml at init. cek_run_iterative now calls the
hook instead of erroring. The VmSuspended propagates to the value_to_js
wrapper which has _driveAsync handling.
+42 test passes (3924→3966), zero regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VM frame merging bug: call_closure_reuse now saves caller continuations
on a reuse_stack instead of merging frames. resume_vm restores them in
innermost-first order. Fixes frame count corruption when nested closures
suspend via OP_PERFORM. Zero test regressions (3924/3924).
Island hydration: hydrate-island now looks up components from (global-env)
instead of render-env, triggering the symbol resolve hook. Added JS-level
preload-island-defs that scans DOM for data-sx-island and loads definitions
from the content-addressed manifest BEFORE hydration — avoids K.load
reentrancy when the resolve hook fires inside env_get.
loadDefinitionByHash: fixed isMultiDefine check — defcomp/defisland bodies
containing nested (define ...) forms no longer suppress name insertion.
Added K.load return value checking for silent error string returns.
sx_browser.ml: resolve hook falls back to global_env.bindings when
_vm_globals miss (sync gap). Snapshot reuse_stack alongside pending_cek.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- sx_vm.ml: VM timeout now compares vm_insn_count > step_limit instead of
unconditionally throwing after 65536 instructions when limit > 0
- sx_browser.ml: Expose setStepLimit/resetStepCount APIs on SxKernel;
callFn now returns {__sx_error, message} on Eval_error instead of null
- compiler.sx: emit-set handles array-index targets (host-set! instead of
nth) and 'of' property chains (dom-set-prop with chain navigation)
- hs-run-fast.js: New Node.js test runner with step-limit timeouts,
SX-level guard for error detection, insertAdjacentHTML mock,
range selection (HS_START/HS_END), wall-clock timeout in driveAsync
- hs-debug-test.js: Single-test debugger with DOM state inspection
- hs-verify.js: Assertion verification (proves pass/fail detection works)
Test results: 415/831 (50%), up from 408/831 (49%) baseline.
Fixes: set my style["color"], set X of Y, put at end of (insertAdjacentHTML).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The infinite loops in the HS parser are in transpiled native OCaml code,
not in the VM or CEK step loop. Neither step counters (in cek_step_loop,
cek_step, trampoline) nor VM instruction checks caught them because
the loops are in direct OCaml recursion.
Fix: SIGALRM handler raises Eval_error to break out of native loops.
Also sets step_limit flag to catch VM loops. Combined approach handles
both native OCaml recursion (alarm+raise) and VM bytecode (step check).
The alarm+raise can become unreliable after ~13 timeouts in a single
process, but handles the common case well. Reverts the fork-based
approach which lost inter-test state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- orchestration.sx: add hs-boot-subtree! call to process-elements
- integration.sx: remove load-library! calls (browser loads via manifest)
- sx_vm.ml: add __resolve-symbol hook to OP_GLOBAL_GET for lazy loading
- compile-modules.js: add HS modules as lazy_deps in manifest
HS compilation works in browser (tokenize→parse→compile verified).
Activation pipeline partially working — hs-activate! needs debugging
(dom-get-data/dom-set-data interaction with WASM host-get on functions).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When cek_call_or_suspend runs a CEK machine for a non-bytecoded Lambda
(e.g. a thunk), _active_vm still pointed to the caller's VM. VmClosure
calls inside the CEK (e.g. hs-wait) would merge their frames with the
caller's VM via call_closure_reuse, causing the VM to skip the CEK's
remaining continuation on resume — producing wrong DOM mutation order
(+active, +active, -active instead of +active, -active, +active).
Fix: swap _active_vm with an empty isolation VM before running the CEK,
restore after. This keeps VmClosure calls on their own frame stack while
preserving js_of_ocaml exception identity (Some path, not None).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: nested cek_call_or_suspend calls on the same VM (from
synchronous callbacks like dom-listen firing handler immediately)
overwrote pending_cek before the first resume ran.
Fix: _vm_suspension_to_dict snapshots pending_cek at capture time
and restores it in the resume closure before calling resume_vm.
This ensures each suspension's CEK state is preserved regardless
of nested overwrite.
test_bytecode_repeat.js: 4/4 pass (was 3/4).
Source: 6 suspensions ✓ Bytecode: 6 suspensions ✓
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause identified: nested cek_call_or_suspend calls on same VM
overwrite pending_cek. First call suspends (thunk's hs-wait), second
call from synchronous dom-listen callback overwrites before resume.
sandbox host-callback: removed _driveAsync call to prevent duplicate
resume chains. Still 3/6 in Node.js test — issue is in OCaml call
stack nesting, not JS async.
Next: prevent pending_cek overwrite in nested CEK→VM→CEK→VM chains.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lambda calls in sx_call now go through the CEK machine instead of
returning a Thunk for the tree-walker trampoline. This lets perform/
IO suspension work everywhere — including hyperscript wait/bounce.
Key changes:
- sx_runtime: Lambda case calls _cek_eval_lambda_ref (forward ref)
- sx_vm: initializes ref with cek_step_loop + stub VM for suspension
- sx_apply_cek: VmSuspended → __vm_suspended marker dict (not exception)
- continue_with_call callable path: handles __vm_suspended with
vm-resume-frame, matching the existing JIT Lambda pattern
- sx_render: let VmSuspended propagate through try_catch
- Remove invalid io-contract test (perform now suspends, not errors)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- jit_compile_lambda: merge closure bindings into effective_globals so
GLOBAL_GET resolves variables from let/define blocks (emit-on, etc.)
- code_from_value: scan bytecode for max LOCAL_GET/SET slot to compute
vc_locals (fixes LOCAL_GET overflow in large functions like hs-parse)
3127/3127 no-JIT, 3116/3127 JIT (11 hyperscript on-event: specific
bytecode correctness issue in recursive parser — wrong branch taken
strips on/event-name from result).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- jit_compile_lambda: call compile directly via VM when it has bytecode
(100-400x faster JIT compilation, server pre-warm 1.6s vs hung)
- code_from_value: scan bytecode for highest LOCAL_GET/SET slot to
compute vc_locals correctly (fixes hyperscript LOCAL_GET overflow)
- code_from_value: accept both compiler keys (bytecode) and SX VM
keys (vc-bytecode) for interop
- jit_compile_lambda: skip &key/:as params (compiler can't emit them)
- Test runner: seed VM globals with primitives + env bindings,
native vm-execute-module with suspension fallback to SX version,
_jit_refresh_globals syncs globals after module loading,
VmSuspended + "VM undefined" caught and sentineled
3127/3127 without JIT, 3116/3127 with JIT (11 hyperscript on-event
parsing — specific closure/scope issue, not infrastructure).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Break up the 1735-line handle_tool match into 45 individual handler functions
with hashtable-based dispatch. Add mtime-based file parse caching (AST + CST),
consolidated run_command helper replacing 9 bare open_process_in patterns,
require_file/require_dir input validation, and pagination (limit/offset) for
sx_find_across, sx_comp_list, sx_comp_usage. Also includes pending VM changes:
rest-arity support, hyperscript parser, compiler/transpiler updates.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three interacting JIT bugs caused infinite loops and server hangs:
1. _jit_compiling cascade: the re-entrancy flag was local to each
binary's hook. When vm_call triggered JIT compilation internally,
compiler functions got JIT-compiled during compilation, creating
infinite cascades. Fix: shared _jit_compiling flag in sx_vm.ml,
set in jit_compile_lambda itself.
2. call_closure always created new VMs: every HO primitive callback
(for-each, map, filter) allocated a fresh VM. With 43K+ calls
during compilation, this was the direct cause of hangs. Fix:
call_closure_reuse reuses the active VM by isolating frames and
running re-entrantly. VmSuspended is handled by merging frames
for proper IO resumption.
3. vm_call for compiled Lambdas: OP_CALL dispatching to a Lambda
with cached bytecode created a new VM instead of pushing a frame
on the current one. Fix: push_closure_frame directly.
Additional MCP server fixes:
- Hot-reload: auto-execv when binary on disk is newer (no restart needed)
- Robust JSON: to_int_safe/to_int_or handle null, string, int params
- sx_summarise depth now optional (default 2)
- Per-request error handling (malformed JSON doesn't crash server)
- sx_test uses pre-built binary (skips dune rebuild overhead)
- Timed module loading for startup diagnostics
sx_server.ml fixes:
- Uses shared _jit_compiling flag
- Marks lambdas as jit_failed_sentinel on compile failure (no retry spam)
- call_closure_reuse with VmSuspended frame merging for IO support
Compiled compiler bytecode bug: deeply nested cond/case/let forms
(e.g. tw-resolve-style) cause the compiled compiler to loop.
Workaround: _jit_compiling guard prevents compiled function execution
during compilation. Compilation uses CEK (slower but correct).
Test suite: 3127/3127 passed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous commit fixed lib/vm.sx (SX spec) but the server uses
sx_vm.ml (hand-maintained native OCaml) and sx_vm_ref.ml (transpiled).
Both had the same globals-first lookup bug. Now all three implementations
check closure env before vm.globals, matching vm-global-set.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tier 1 — Component keyword dispatch on VM:
- Components/islands JIT-compile bodies via jit_compile_comp
- parse_keyword_args matches keyword names against component params
- Added i_compiled field to island type for JIT cache
- Component calls no longer fall back to CEK
Tier 2 — OP_SWAP (opcode 7):
- New stack swap operation for future HO loop compilation
- HO forms already efficient via NativeFn + VmClosure callbacks
Tier 3 — Exception handler stack:
- OP_PUSH_HANDLER (35), OP_POP_HANDLER (36), OP_RAISE (37)
- VM gains handler_stack with frame depth tracking
- Compiler handles guard and raise as bytecode
- Functions with exception handling no longer cause JIT failure
Tier 4 — Scope forms as bytecode:
- Compiler handles provide, context, peek, scope, provide!,
bind, emit!, emitted via CALL_PRIM sequences
- Functions using reactive scope no longer trigger JIT failure
4 new opcodes (SWAP, PUSH_HANDLER, POP_HANDLER, RAISE) → 37 total.
2776/2776 tests pass, zero regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bytecode compiler now emits OP_PERFORM for (import ...) and compiles
(define-library ...) bodies. The VM stores the import request in
globals["__io_request"] and stops the run loop — no exceptions needed.
vm-execute-module returns a suspension dict, vm-resume-module continues.
Browser: sx_browser.ml detects suspension dicts from execute_module and
returns JS {suspended, op, request, resume} objects. The sx-platform.js
while loop handles cascading suspensions via handleImportSuspension.
13 modules load via .sxbc bytecode in 226ms (manifest-driven), both
islands hydrate, all handlers wired. 2650/2650 tests pass including
6 new vm-import-suspension tests.
Also: consolidated sx-platform-2.js → sx-platform.js, fixed
vm-execute-module missing code-from-value call, fixed bootstrap.py
protocol registry transpiler issues.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Seed all primitives into vm_globals as NativeFn values at init.
CALL_PRIM now looks up vm.globals only (not the separate primitives
table). This means OP_DEFINE and registerNative naturally override
primitives — browser.sx's (define set-cookie ...) now takes effect.
The primitives Hashtbl remains for the compiler's primitive? predicate
but has no runtime dispatch role.
Tests: 2435 pass / 64 fail (pre-existing), vs 1718/771 baseline.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: sx_primitives.ml registered "effect" as a native no-op (for SSR).
The bytecode compiler's (primitive? "effect") returned true, so it emitted
OP_CALL_PRIM instead of OP_GLOBAL_GET + OP_CALL. The VM's CALL_PRIM handler
found the native Nil-returning stub and never called the real effect function
from core-signals.sx.
Fix: Remove effect and register-in-scope from the primitives table. The server
overrides them via env_bind in sx_server.ml (after compilation), which doesn't
affect primitive? checks.
Also: VM CALL_PRIM now falls back to cek_call for non-NativeFn values (safety
net for any other functions that get misclassified).
15/15 source mode, 15/15 bytecode mode.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JIT compiler:
- Fix jit_compile_lambda: resolve `compile` via symbol lookup in env
instead of embedding VmClosure in AST (CEK dispatches differently)
- Register eval-defcomp/eval-defisland/eval-defmacro runtime helpers
in browser kernel for bytecoded defcomp forms
- Disable broken .sxbc.json path (missing arity in nested code blocks),
use .sxbc text format only
- Mark JIT-failed closures as sentinel to stop retrying
CSSX in browser:
- Add cssx.sx symlink + cssx.sxbc to browser web stack
- Add flush-cssx! to orchestration.sx post-swap for SPA nav
- Add cssx.sx to compile-modules.js and mcp_tree.ml bytecode lists
SPA navigation:
- Fix double-fetch: check e.defaultPrevented in click delegation
(bind-event already handled the click)
- Fix layout destruction: change nav links from outerHTML to innerHTML
swap (outerHTML destroyed #main-panel when response lacked it)
- Guard JS popstate handler when SX engine is booted
- Rename sx-platform.js → sx-platform-2.js to bust immutable cache
Playwright tests:
- Add trackErrors() helper to all test specs
- Add SPA DOM comparison test (SPA nav vs fresh load)
- Add single-fetch + no-duplicate-elements test
- Improve MCP tool output: show failure details and error messages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: _env_bind_hook mirrored ALL env_bind calls (including
lambda parameter bindings) to the shared VM globals table. Factory
functions like make-page-fn that return closures capturing different
values for the same param names (default-name, prefix, suffix) would
have the last call's values overwrite all previous closures' captured
state in globals. OP_GLOBAL_GET reads globals first, so all closures
returned the last factory call's values.
Fix: only sync root-env bindings (parent=None) to VM globals. Lambda
parameter bindings stay in their local env, found via vm_closure_env
fallback in OP_GLOBAL_GET.
Also in this commit:
- OP_CLOSURE propagates parent vm_closure_env to child closures
- Remove JIT globals injection (closure vars found via env chain)
- sx_server.ml: SX-Request header → returns text/sx (aser only)
- sx_server.ml: diagnostic endpoint GET /sx/_debug/{env,eval,route}
- sx_server.ml: page helper stubs for deep page rendering
- sx_server.ml: skip client-libs/ dir (browser-only definitions)
- adapter-html.sx: unknown components → HTML comment (not error)
- sx-platform.js: .sxbc fallback loader for bytecode modules
- Delete sx_http.ml (standalone HTTP server, unused)
- Delete stale .sxbc.json files (arity=0 bug, replaced by .sxbc)
- 7 new closure isolation tests in test-closure-isolation.sx
- mcp_tree.ml: emit arity + upvalue-count in .sxbc.json output
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Browser kernel:
- Add `parse` native fn (matches server: unwrap single, list for multiple)
- Restore env==global_env guard on _env_bind_hook (let bindings must not
leak to _vm_globals — caused JIT CSSX "Not callable: nil" errors)
- Add _env_bind_hook call in env_set_id so set! mutations sync to VM globals
- Fire _vm_global_set_hook from OP_DEFINE so VM defines sync back to CEK env
CEK evaluator:
- Replace recursive cek_run with iterative while loop using sx_truthy
(previous attempt used strict Bool true matching, broke in wasm_of_ocaml)
- Remove dead cek_run_iterative function
Web modules:
- Remove find-matching-route and parse-route-pattern stubs from
boot-helpers.sx that shadowed real implementations from router.sx
- Sync boot-helpers.sx to dist/static dirs for bytecode compilation
Platform (sx-platform.js):
- Set data-sx-ready attribute after boot completes (was only in boot-init
which sx-platform.js doesn't call — it steps through boot manually)
- Add document-level click delegation for a[sx-get] links as workaround
for bytecoded bind-event not attaching per-element listeners (VM closure
issue under investigation — bind-event runs but dom-add-listener calls
don't result in addEventListener)
Tests:
- New test_kernel.js: 24 tests covering env sync, parse, route matching,
host FFI/preventDefault, deep recursion
- New navigation test: "sx-get link fetches SX not HTML and preserves layout"
(currently catches layout breakage after SPA swap — known issue)
Known remaining issues:
- JIT CSSX failures: closure-captured variables resolve to nil in VM bytecode
- SPA content swap via execute-request breaks page layout
- Bytecoded bind-event doesn't attach per-element addEventListener (root
cause unknown — when listen-target guard appears to block despite element
being valid)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The optimization to call the compiler through VM directly (instead
of CEK) when it's JIT-compiled was producing incorrect bytecode for
all subsequently compiled functions, causing "Expected number, got
symbol" errors across render-to-html, parse-loop, etc.
Revert to always using CEK for compilation. The compiler runs via
CEK which is slower but produces correct bytecode. JIT-compiled
USER functions still run at VM speed.
1166 passed, 0 failed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>