sx: step 14 — inline JIT primitives (-69% fib, -62% loop, -50% sum on bench_vm)

The bytecode compiler emitted OP_CALL_PRIM (52) for every primitive call, even
for arithmetic and comparison hot-paths. The VM had specialized opcodes
(OP_ADD, OP_SUB, OP_EQ, etc.) defined but unused.

- lib/compiler.sx (compile-call): emit specialized 1-byte opcode when the
  primitive name + arity matches one of {+, -, *, /, =, <, >, cons, not, len,
  first, rest}. Falls back to CALL_PRIM otherwise. fib bytecode: 50 → 38 bytes.
- hosts/ocaml/lib/sx_compiler.ml: mirror change in the auto-generated OCaml
  compiler so SXBC export from mcp_tree uses the same emission.
- hosts/ocaml/lib/sx_vm.ml: extend OP_ADD/SUB/MUL/DIV to handle Integer+Integer
  (not just Number+Number). Inline OP_EQ via Sx_runtime._fast_eq. Inline
  OP_LT/GT integer and mixed-numeric comparisons. Together these keep the
  integer cases that dominate tight loops off the Hashtbl-lookup fallback path.
- hosts/ocaml/bin/bench_vm.ml: VM-only benchmark — loads compiler.sx via CEK,
  JIT-compiles each fn, measures Sx_vm.call_closure throughput.
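
A minimal OCaml sketch of the emission rule above, not the actual `compile-call` in lib/compiler.sx: `specialized_opcode` maps a primitive name and arity to the opcode numbers listed for sx_vm.ml, and anything else falls back to OP_CALL_PRIM (52). The helper names, the arities, and the CALL_PRIM operand layout (a little-endian u16 name index plus a u8 argc, chosen only so the total is the quoted 4 bytes) are assumptions for illustration.

```ocaml
(* Hypothetical sketch of name+arity opcode selection; not the real
   compile-call from lib/compiler.sx. Opcode numbers are those listed
   for sx_vm.ml; the arities are assumptions. *)
let op_call_prim = 52

(* Some opcode when (name, argc) has a 1-byte specialized form. *)
let specialized_opcode name argc =
  match (name, argc) with
  | "+", 2 -> Some 160
  | "-", 2 -> Some 161
  | "*", 2 -> Some 162
  | "/", 2 -> Some 163
  | "=", 2 -> Some 164
  | "<", 2 -> Some 165
  | ">", 2 -> Some 166
  | "not", 1 -> Some 167
  | "len", 1 -> Some 168
  | "first", 1 -> Some 169
  | "rest", 1 -> Some 170
  | "cons", 2 -> Some 172
  | _ -> None

(* Generic 4-byte form. The u16 name index + u8 argc layout is an
   assumption that merely matches the quoted size. *)
let emit_call_prim buf ~name_index ~argc =
  Buffer.add_uint8 buf op_call_prim;
  Buffer.add_uint16_le buf name_index;
  Buffer.add_uint8 buf argc

(* Prefer the 1-byte opcode; otherwise fall back to CALL_PRIM. *)
let emit_prim_call buf ~name ~name_index ~argc =
  match specialized_opcode name argc with
  | Some op -> Buffer.add_uint8 buf op
  | None -> emit_call_prim buf ~name_index ~argc

(* Primitive calls in a typical fib body (assumed):
   (< n 2), (- n 1), (- n 2) and the outer (+ ...). *)
let fib_prim_calls = [ ("<", 2); ("-", 2); ("-", 2); ("+", 2) ]

(* Total bytes needed to encode fib's primitive calls with [emit]. *)
let encoded_size emit =
  let buf = Buffer.create 16 in
  List.iter (fun (name, argc) -> emit buf name argc) fib_prim_calls;
  Buffer.length buf

let generic =
  encoded_size (fun buf _name argc -> emit_call_prim buf ~name_index:0 ~argc)

let specialized =
  encoded_size (fun buf name argc -> emit_prim_call buf ~name ~name_index:0 ~argc)

let () =
  (* prints: generic=16 specialized=4 saved=12 *)
  Printf.printf "generic=%d specialized=%d saved=%d\n" generic specialized
    (generic - specialized)
```

Assuming the usual `fib` with four primitive calls per body, shrinking each from 4 bytes to 1 saves 4 × 3 = 12 bytes, which is consistent with the 50 → 38 byte figure.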

Median improvements (best of 3 runs, minimum of 9 iterations per run; bench_vm.exe):
  fib(22)         107.87ms →  33.13ms   -69%
  loop(200000)    429.64ms → 161.16ms   -62%
  sum-to(50000)    72.85ms →  36.74ms   -50%
  count-lt(20000)  28.44ms →  17.58ms   -38%
  count-eq(20000)  37.23ms →  15.46ms   -58%

Tests: 4550/4550 OCaml passing (unchanged). Zero regressions.

Last step in the sx-improvements roadmap — all 14 steps complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 02:38:47 +00:00
parent 4cb5302232
commit 6c171d4906
6 changed files with 262 additions and 10 deletions

@@ -3,6 +3,17 @@
Language-building improvements to the SX evaluator, compiler, and standard library.
Ordered by impact and prerequisite chain. Each step is one loop commit.
## Roadmap complete (2026-05-07)
All 14 steps shipped in 14 commits on the `architecture` branch. Phase 1 (bug fixes:
JIT closures, letrec+resume), Phase 2 (E38 source info — subsumed by tokenizer fix),
Phase 3 (native ADTs: AdtValue, define-type, match, exhaustiveness on both hosts),
Phase 4 (parser/compiler plugin registry + worker), Phase 5 (perf: frame-records via
prim_call fast path, buffer-based serializer, JIT inline opcodes). Cumulative
performance wins on hot benchmarks: CEK fib -66% / loop -69% / reduce -86% (Step 12);
inspect tree-d10 -80% / dict-1000 -61% (Step 13); VM JIT fib -69% / loop -62% / sum
-50% / count-lt -38% / count-eq -58% (Step 14). Test suite: 4550/4550 OCaml.
Branch: `architecture`. SX files via `sx-tree` MCP only. Never edit generated files.
## Current baseline (2026-05-06)
@@ -243,6 +254,34 @@ OCaml: 4545 → 4550. JS: 2591 → 2596. Zero regressions.
opcodes that skip the primitive table lookup for the most common calls. Compiler emits
these when operands are known numbers/lists.
**Outcome:** The opcodes (`OP_ADD`=160, `OP_SUB`=161, `OP_MUL`=162, `OP_DIV`=163,
`OP_EQ`=164, `OP_LT`=165, `OP_GT`=166, `OP_NOT`=167, `OP_LEN`=168, `OP_FIRST`=169,
`OP_REST`=170, `OP_CONS`=172) already existed in `sx_vm.ml` but the compiler never
emitted them — every primitive call went through `OP_CALL_PRIM` (52) with a Hashtbl
lookup. Two changes:
1. **`lib/compiler.sx` `compile-call`**: when the primitive name + arity matches a
specialized opcode, emit the 1-byte opcode (no name index, no argc operand)
instead of the 4-byte CALL_PRIM. Bytecode for `fib` shrank from 50→38 bytes.
2. **`hosts/ocaml/lib/sx_vm.ml` opcode bodies**: extended `OP_ADD/SUB/MUL/DIV` to
handle `Integer + Integer` (previously only `Number + Number`, so the common integer
case fell back to the Hashtbl lookup). Inlined `OP_EQ` to call `Sx_runtime._fast_eq`
directly. Inlined integer and mixed-numeric comparisons for `OP_LT/GT`.
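
The opcode-body change can be sketched as follows. The `value` type, the `prims` table, and the function names are hypothetical stand-ins for the real `sx_vm.ml` code, and the `OP_EQ` path through `Sx_runtime._fast_eq` is not modeled. The point is the match order: `Integer, Integer` is handled inline, and only operand combinations without an inline case reach the Hashtbl lookup that `OP_CALL_PRIM` always paid.

```ocaml
(* Hypothetical stand-in for the VM value type; the real one has many
   more constructors. *)
type value = Integer of int | Number of float | Bool of bool

exception Sx_error of string

(* Coerce either numeric representation to a float, if possible. *)
let to_float = function
  | Integer x -> Some (float_of_int x)
  | Number x -> Some x
  | Bool _ -> None

(* Slow path: a primitive table keyed by name, as OP_CALL_PRIM uses. *)
let prims : (string, value list -> value) Hashtbl.t = Hashtbl.create 16

let call_prim name args =
  match Hashtbl.find_opt prims name with
  | Some f -> f args
  | None -> raise (Sx_error ("unknown primitive: " ^ name))

(* Generic "+" in the table: promotes mixed operands to float. *)
let () =
  Hashtbl.replace prims "+" (function
    | [ a; b ] -> (
        match (to_float a, to_float b) with
        | Some x, Some y -> Number (x +. y)
        | _ -> raise (Sx_error "+: expected numbers"))
    | _ -> raise (Sx_error "+: expected 2 arguments"))

(* OP_ADD body: inline cases first, table lookup only as a fallback. *)
let op_add a b =
  match (a, b) with
  | Integer x, Integer y -> Integer (x + y) (* newly inlined case *)
  | Number x, Number y -> Number (x +. y) (* previously inlined case *)
  | _ -> call_prim "+" [ a; b ]

(* OP_LT body: integer and mixed-numeric comparisons skip the table. *)
let op_lt a b =
  match (a, b) with
  | Integer x, Integer y -> Bool (x < y)
  | _ -> (
      match (to_float a, to_float b) with
      | Some x, Some y -> Bool (x < y)
      | _ -> call_prim "<" [ a; b ])
```

In this sketch a mixed `Integer`/`Number` addition still takes the table path, while the comparison handles it inline, mirroring the split described in the list above.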
Median improvements (`bin/bench_vm.exe`, best of 3 runs, minimum of 9 iterations per run):
| Benchmark | Baseline (best min) | After (best min) | Change |
|------------------|---------------------|------------------|-------:|
| fib(22) | 107.87ms | 33.13ms | -69% |
| loop(200000) | 429.64ms | 161.16ms | -62% |
| sum-to(50000) | 72.85ms | 36.74ms | -50% |
| count-lt(20000) | 28.44ms | 17.58ms | -38% |
| count-eq(20000) | 37.23ms | 15.46ms | -58% |
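
The Change column is the relative change in best-min time, (after - before) / before, rounded to a whole percent. A small OCaml check that recomputes it from the two timing columns:

```ocaml
(* Relative change in time, in whole percent: (after - before) / before. *)
let pct_change ~before ~after =
  Float.round ((after -. before) /. before *. 100.0)

(* Best-min timings in ms, copied from the table. *)
let results =
  [
    ("fib(22)", 107.87, 33.13);
    ("loop(200000)", 429.64, 161.16);
    ("sum-to(50000)", 72.85, 36.74);
    ("count-lt(20000)", 28.44, 17.58);
    ("count-eq(20000)", 37.23, 15.46);
  ]

let () =
  List.iter
    (fun (name, before, after) ->
      Printf.printf "%-16s %+.0f%%  %.2fx faster\n" name
        (pct_change ~before ~after) (before /. after))
    results
```

A -69% change in time corresponds to roughly a 3.3x speedup (107.87 / 33.13 ≈ 3.26).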
Tests: 4550/4550 passing (unchanged baseline). Zero regressions. Benchmark binary:
`bin/bench_vm.exe` (loads `lib/compiler.sx` via CEK, JIT-compiles each test fn,
measures `Sx_vm.call_closure` time on the compiled `vm_closure`).
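
The aggregation can be sketched as a small timing harness, assuming each run keeps the minimum of 9 timed calls and the best of the 3 per-run minimums is reported. The function names are hypothetical, `Sys.time` (processor time) stands in for whatever clock `bench_vm.ml` actually uses, and the native `fib` is only a placeholder workload, not a call through `Sx_vm.call_closure`.

```ocaml
(* Time one call in milliseconds. Sys.time is processor time, an assumed
   stand-in for the clock bench_vm.ml really uses. *)
let time_once f =
  let t0 = Sys.time () in
  ignore (Sys.opaque_identity (f ()));
  (Sys.time () -. t0) *. 1000.0

(* One run: the fastest of [iters] timed calls. *)
let min_of ~iters f =
  let best = ref infinity in
  for _i = 1 to iters do
    best := Float.min !best (time_once f)
  done;
  !best

(* Report the best per-run minimum across [runs] runs. *)
let best_of_runs ~runs ~iters f =
  let best = ref infinity in
  for _i = 1 to runs do
    best := Float.min !best (min_of ~iters f)
  done;
  !best

(* Placeholder workload: native OCaml fib, not a VM closure call. *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

let () =
  Printf.printf "fib(22): %.2f ms\n"
    (best_of_runs ~runs:3 ~iters:9 (fun () -> fib 22))
```

Under this reading the best of the per-run minimums equals the minimum over all 27 samples; taking minimums filters out scheduler and GC noise, which tends to inflate timings rather than shrink them.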
---
## Progress log
@@ -262,7 +301,7 @@ these when operands are known numbers/lists.
| 11 — plugin migration + worker | [x] | 6328b810 |
| 12 — frame records | [x] | a66c0f66 (fib -66%, loop -69%, reduce -86% via prim_call fast path) |
| 13 — buffer primitive | [x] | 0e022ab6 (inspect rewrite: tree-d10 -80%, tree-d8 -76%, dict-1000 -61%, list-2000 -55%) |
| 14 — inline primitives JIT | [ ] | |
| 14 — inline primitives JIT | [x] | (pending) (fib -69%, loop -62%, sum -50%, count-lt -38%, count-eq -58% via specialized opcode emission) |
---