When a `perform` fired inside a tree-walked eval_expr path — sf_letrec init
exprs / non-last body exprs, expand_macro body, qq_expand unquote,
sf_dynamic_wind / sf_scope / sf_provide bodies — cek_run raised
"IO suspension in non-IO context" and swallowed the suspension. The hook
that converts the CEK suspended state to VmSuspended (so the outer driver
sees it as a resumable suspension object) was defined in sx_vm.ml but
never invoked from cek_run.
Repro in Node.js (hosts/ocaml/browser/test_letrec_resume.js):
(letrec ((x (perform {:op "io"}))) "ok") ;; threw the error
(letrec ((x 1)) (perform {:op "io"}) "after") ;; threw the error
The originally reported browser symptom — "[sx] resume: Not callable: nil"
after hs-wait resumes inside a letrec — was the same root cause showing
through the JIT/VM resume path instead of as a top-level error.
Fix: cek_run and cek_run_iterative now check !_cek_io_suspend_hook and
invoke it when the loop terminates in a suspended state. The hook (set by
sx_vm.ml in the browser, by run_tests.ml in the test runner) converts the
suspension to VmSuspended / resolves IO synchronously. When the hook is
unset (pure-CEK harness), the legacy Eval_error is raised so misuse stays
visible.
Also patches:
- hosts/ocaml/bootstrap.py — regex-patches the transpiled cek_run on regen
so the fix survives a fresh `python3 hosts/ocaml/bootstrap.py` cycle.
- hosts/ocaml/browser/sx_browser.ml — api_eval / api_eval_vm / api_eval_expr
now catch VmSuspended and surface a clean error string (K.eval has no
driver to resume; callers who want resumption use callFn).
Tests:
- spec/tests/test-letrec-resume-treewalk.sx — 7 CEK-level regression tests
covering letrec init / non-last body, scope/provide bodies, sibling
fn-after-perform. All 7 fail in baseline ("IO suspension in non-IO
context"), all 7 pass with the fix.
- hosts/ocaml/browser/test_letrec_resume.js — 13 WASM kernel tests via
callFn driveSync, including the wait-boot pattern from the briefing.
All 13 pass.
Suite results: 4557 pass / 1338 fail (was 4550 / 1339); +7 new passes,
-1 flaky timeout (hs-upstream-if sieve), no regressions.
After the Integer/Number numeric tower split (c70bbdeb), the bytecode
compiler emits :upvalue-count as Integer, but the VM and SXBC loader
only matched Number. The fallback `_ -> 0` made the VM skip reading
upvalue descriptors entirely, so the IP advanced into raw upvalue
bytes which were then misread as opcodes.
Symptom: JIT runs of nested closures (curried functions, Y combinator,
component bodies that close over outer let-bindings) produced "VM:
CONST index N out of bounds (pool size M)" with N values like 256,
4096, 5120, 12800, 13056 — all of the form `byte | (opcode << 8)`,
i.e. an upvalue descriptor (lo) followed by the next instruction's
opcode (hi) being read as a u16 operand.
Fix all five sites that decode upvalue-count to also accept Integer:
- hosts/ocaml/lib/sx_vm.ml: OP_CLOSURE handler, trace_run, disassemble
- hosts/ocaml/lib/sx_vm_ref.ml + hosts/ocaml/sx_vm_ref.ml + bootstrap_vm.py:
vm_create_closure preamble (the bootstrap source-of-truth and both
generated copies)
- hosts/ocaml/browser/sx_browser.ml: SXBC loader's parse_kv
Test impact: JIT 3848 -> 4538 passing (+690). No-JIT unchanged at 4550.
The previously-failing curried/Y/higher-order tests in
spec/tests/test-cek-advanced.sx now pass under --jit and serve as
regression coverage.
This fixes a real current bug. The 28-day-old memory file describing
parser-combinator JIT bugs predates the numeric tower split and
described a different problem; with this fix the parser-combinator
broken-name list (`_jit_is_broken_name` in sx_vm.ml) is no longer
strictly required for correctness, but keeping it avoids a TIMEOUT
regression in one hyperscript test, so it remains in place.
11 new SX primitives in sx_primitives.ml wrapping Unix.openfile/read/write/
lseek/set_nonblock: channel-open/close/read/read-line/write/flush/seek/tell/
eof?/blocking?/set-blocking!.
Tcl runtime now uses real channel ops:
- open ?-mode? returns "fileN" handle (modes r/w/a/r+/w+/a+)
- close/read/gets/puts/seek/tell/eof/flush wired through
- new fconfigure command supports -blocking 0|1
- puts dispatches to channel-write when first arg starts with "file"
- gets command registration fixed (was pointing to old stub)
eof-returns-1 coro test updated to match real Tcl semantics (eof flips
only after a read hits EOF).
Test runner timeout bumped 180s→1200s (post-merge JIT is slow).
+7 idiom tests covering write+read, gets-loop, seek/tell, eof-after-read,
append mode, seek-to-end, fconfigure-blocking. 349/349 green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause investigation of WASM kernel timeout for tests 200, 207, 211:
verified the kernel's __hs_deadline check IS firing correctly with the
JS-side _testDeadline value. The tests were genuinely taking 60s+ because
the (raise msg) inside hs-null-error! propagated up through the JIT
continuation chain and triggered the slow host_error path (~34s per
comment in the test runner override).
The companion helpers hs-null-raise! and hs-empty-raise! already wrap
their raise in (guard (_e (true nil)) (raise msg)) so the exception
is swallowed before escaping. hs-null-error! was missing this guard —
it just did (raise (str ...)).
Fix: hs-null-error! now sets window._hs_null_error and uses the same
self-contained guard pattern. The error message is still recoverable
through the side channel, matching how the eval-hs-error override in
the test harness expects to find it.
Bumped hypertrace deadlines 8s→30s (modules-loaded JIT state has grown
since the original 8s budget was set).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The bytecode compiler emitted OP_CALL_PRIM (52) for every primitive call, even
for arithmetic and comparison hot-paths. The VM had specialized opcodes
(OP_ADD, OP_SUB, OP_EQ, etc.) defined but unused.
- lib/compiler.sx (compile-call): emit specialized 1-byte opcode when the
primitive name + arity matches one of {+, -, *, /, =, <, >, cons, not, len,
first, rest}. Falls back to CALL_PRIM otherwise. fib bytecode: 50 → 38 bytes.
- hosts/ocaml/lib/sx_compiler.ml: mirror change in the auto-generated OCaml
compiler so SXBC export from mcp_tree uses the same emission.
- hosts/ocaml/lib/sx_vm.ml: extend OP_ADD/SUB/MUL/DIV to handle Integer+Integer
(not just Number+Number). Inline OP_EQ via Sx_runtime._fast_eq. Inline
OP_LT/GT mixed-numeric comparisons. Avoids Hashtbl lookup on the fallback
path for the common integer cases that dominate tight loops.
- hosts/ocaml/bin/bench_vm.ml: VM-only benchmark — loads compiler.sx via CEK,
JIT-compiles each fn, measures Sx_vm.call_closure throughput.
Median improvements (best of 3 runs of 9-min, bench_vm.exe):
fib(22) 107.87ms → 33.13ms -69%
loop(200000) 429.64ms → 161.16ms -62%
sum-to(50000) 72.85ms → 36.74ms -50%
count-lt(20000) 28.44ms → 17.58ms -38%
count-eq(20000) 37.23ms → 15.46ms -58%
Tests: 4550/4550 OCaml passing (unchanged). Zero regressions.
Last step in the sx-improvements roadmap — all 14 steps complete.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Added short aliases make-buffer / buffer? / buffer-append! / buffer->string /
buffer-length on both OCaml and JS hosts, sharing the existing StringBuffer
value type. buffer-append! auto-coerces non-strings via inspect.
Rewrote the OCaml host inspect function to walk a single shared Buffer.t
instead of allocating O(n) intermediate strings via String.concat at every
recursion level. inspect underlies sx-serialize and error-path formatting,
so this benefits the tightest serialization paths.
Median improvements (bin/bench_inspect.exe, best-of-3 of 9-run min):
tree-d8 (75KB): 5.31ms -> 1.30ms (-76%)
tree-d10 (679KB): 81.89ms -> 16.02ms (-80%)
dict-1000: 0.80ms -> 0.31ms (-61%)
list-2000: 0.74ms -> 0.33ms (-55%)
Tests: OCaml 4545 -> 4550. JS 2591 -> 2596. Zero regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>