sx: step 14 — inline JIT primitives (-69% fib, -62% loop, -50% sum on bench_vm)

The bytecode compiler emitted OP_CALL_PRIM (52) for every primitive call, even for arithmetic and comparison hot-paths. The VM had specialized opcodes (OP_ADD, OP_SUB, OP_EQ, etc.) defined but unused. - lib/compiler.sx (compile-call): emit specialized 1-byte opcode when the primitive name + arity matches one of {+, -, *, /, =, <, >, cons, not, len, first, rest}. Falls back to CALL_PRIM otherwise. fib bytecode: 50 → 38 bytes. - hosts/ocaml/lib/sx_compiler.ml: mirror change in the auto-generated OCaml compiler so SXBC export from mcp_tree uses the same emission. - hosts/ocaml/lib/sx_vm.ml: extend OP_ADD/SUB/MUL/DIV to handle Integer+Integer (not just Number+Number). Inline OP_EQ via Sx_runtime._fast_eq. Inline OP_LT/GT mixed-numeric comparisons. Avoids Hashtbl lookup on the fallback path for the common integer cases that dominate tight loops. - hosts/ocaml/bin/bench_vm.ml: VM-only benchmark — loads compiler.sx via CEK, JIT-compiles each fn, measures Sx_vm.call_closure throughput. Median improvements (best of 3 runs of 9-min, bench_vm.exe): fib(22) 107.87ms → 33.13ms -69% loop(200000) 429.64ms → 161.16ms -62% sum-to(50000) 72.85ms → 36.74ms -50% count-lt(20000) 28.44ms → 17.58ms -38% count-eq(20000) 37.23ms → 15.46ms -58% Tests: 4550/4550 OCaml passing (unchanged). Zero regressions. Last step in the sx-improvements roadmap — all 14 steps complete. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 02:38:47 +00:00
parent 4cb5302232
commit 6c171d4906
6 changed files with 262 additions and 10 deletions
--- a/plans/sx-improvements.md
+++ b/plans/sx-improvements.md
@@ -3,6 +3,17 @@
 Language-building improvements to the SX evaluator, compiler, and standard library.
 Ordered by impact and prerequisite chain. Each step is one loop commit.

+## Roadmap complete (2026-05-07)
+
+All 14 steps shipped in 14 commits on the `architecture` branch. Phase 1 (bug fixes:
+JIT closures, letrec+resume), Phase 2 (E38 source info — subsumed by tokenizer fix),
+Phase 3 (native ADTs: AdtValue, define-type, match, exhaustiveness on both hosts),
+Phase 4 (parser/compiler plugin registry + worker), Phase 5 (perf: frame-records via
+prim_call fast path, buffer-based serializer, JIT inline opcodes). Cumulative
+performance wins on hot benchmarks: CEK fib -66% / loop -69% / reduce -86% (Step 12);
+inspect tree-d10 -80% / dict-1000 -61% (Step 13); VM JIT fib -69% / loop -62% / sum
+-50% / count-lt -38% / count-eq -58% (Step 14). Test suite: 4550/4550 OCaml.
+
 Branch: `architecture`. SX files via `sx-tree` MCP only. Never edit generated files.

 ## Current baseline (2026-05-06)
@@ -243,6 +254,34 @@ OCaml: 4545 → 4550. JS: 2591 → 2596. Zero regressions.
 opcodes that skip the primitive table lookup for the most common calls. Compiler emits
 these when operands are known numbers/lists.

+**Outcome:** The opcodes (`OP_ADD`=160, `OP_SUB`=161, `OP_MUL`=162, `OP_DIV`=163,
+`OP_EQ`=164, `OP_LT`=165, `OP_GT`=166, `OP_NOT`=167, `OP_LEN`=168, `OP_FIRST`=169,
+`OP_REST`=170, `OP_CONS`=172) already existed in `sx_vm.ml` but the compiler never
+emitted them — every primitive call went through `OP_CALL_PRIM` (52) with a Hashtbl
+lookup. Two changes:
+
+1. **`lib/compiler.sx` `compile-call`**: when the primitive name + arity matches a
+   specialized opcode, emit the 1-byte opcode (no name index, no argc operand)
+   instead of the 4-byte CALL_PRIM. Bytecode for `fib` shrank from 50→38 bytes.
+2. **`hosts/ocaml/lib/sx_vm.ml` opcode bodies**: extended `OP_ADD/SUB/MUL/DIV` to
+   handle `Integer + Integer` (was `Number + Number` only — defaulted to Hashtbl
+   for the common integer case). Inlined `OP_EQ` to call `Sx_runtime._fast_eq`
+   directly. Inlined `OP_LT/GT` integer + mixed-numeric comparisons.
+
+Median improvements (`bin/bench_vm.exe`, best of 3 runs of 9-min):
+
+| Benchmark        | Baseline (best min) | After (best min) | Change |
+|------------------|---------------------|------------------|-------:|
+| fib(22)          | 107.87ms            |  33.13ms         |  -69%  |
+| loop(200000)     | 429.64ms            | 161.16ms         |  -62%  |
+| sum-to(50000)    |  72.85ms            |  36.74ms         |  -50%  |
+| count-lt(20000)  |  28.44ms            |  17.58ms         |  -38%  |
+| count-eq(20000)  |  37.23ms            |  15.46ms         |  -58%  |
+
+Tests: 4550/4550 passing (unchanged baseline). Zero regressions. Benchmark binary:
+`bin/bench_vm.exe` (loads `lib/compiler.sx` via CEK, JIT-compiles each test fn,
+measures `Sx_vm.call_closure` time on the compiled `vm_closure`).
+
 ---

 ## Progress log
@@ -262,7 +301,7 @@ these when operands are known numbers/lists.
 | 11 — plugin migration + worker | [x] | 6328b810 |
 | 12 — frame records | [x] | a66c0f66 (fib -66%, loop -69%, reduce -86% via prim_call fast path) |
 | 13 — buffer primitive | [x] | 0e022ab6 (inspect rewrite: tree-d10 -80%, tree-d8 -76%, dict-1000 -61%, list-2000 -55%) |
-| 14 — inline primitives JIT | [ ] | — |
+| 14 — inline primitives JIT | [x] | (pending) (fib -69%, loop -62%, sum -50%, count-lt -38%, count-eq -58% via specialized opcode emission) |

 ---