sx: step 14 — inline JIT primitives (-69% fib, -62% loop, -50% sum on bench_vm)
The bytecode compiler emitted OP_CALL_PRIM (52) for every primitive call, even
for arithmetic and comparison hot-paths. The VM had specialized opcodes
(OP_ADD, OP_SUB, OP_EQ, etc.) defined but unused.
- lib/compiler.sx (compile-call): emit specialized 1-byte opcode when the
primitive name + arity matches one of {+, -, *, /, =, <, >, cons, not, len,
first, rest}. Falls back to CALL_PRIM otherwise. fib bytecode: 50 → 38 bytes.
- hosts/ocaml/lib/sx_compiler.ml: mirror change in the auto-generated OCaml
compiler so SXBC export from mcp_tree uses the same emission.
- hosts/ocaml/lib/sx_vm.ml: extend OP_ADD/SUB/MUL/DIV to handle Integer+Integer
(not just Number+Number). Inline OP_EQ via Sx_runtime._fast_eq. Inline
OP_LT/GT mixed-numeric comparisons. Avoids Hashtbl lookup on the fallback
path for the common integer cases that dominate tight loops.
- hosts/ocaml/bin/bench_vm.ml: VM-only benchmark — loads compiler.sx via CEK,
JIT-compiles each fn, measures Sx_vm.call_closure throughput.
Median improvements (best of 3 runs of 9-min, bench_vm.exe):
  fib(22)          107.87ms → 33.13ms   -69%
  loop(200000)     429.64ms → 161.16ms  -62%
  sum-to(50000)     72.85ms → 36.74ms   -50%
  count-lt(20000)   28.44ms → 17.58ms   -38%
  count-eq(20000)   37.23ms → 15.46ms   -58%
Tests: 4550/4550 OCaml passing (unchanged). Zero regressions.
Last step in the sx-improvements roadmap — all 14 steps complete.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -3,6 +3,17 @@
Language-building improvements to the SX evaluator, compiler, and standard library.
Ordered by impact and prerequisite chain. Each step is one loop commit.

## Roadmap complete (2026-05-07)

All 14 steps shipped in 14 commits on the `architecture` branch. Phase 1 (bug fixes:
JIT closures, letrec+resume), Phase 2 (E38 source info — subsumed by tokenizer fix),
Phase 3 (native ADTs: AdtValue, define-type, match, exhaustiveness on both hosts),
Phase 4 (parser/compiler plugin registry + worker), Phase 5 (perf: frame-records via
prim_call fast path, buffer-based serializer, JIT inline opcodes). Cumulative
performance wins on hot benchmarks: CEK fib -66% / loop -69% / reduce -86% (Step 12);
inspect tree-d10 -80% / dict-1000 -61% (Step 13); VM JIT fib -69% / loop -62% / sum
-50% / count-lt -38% / count-eq -58% (Step 14). Test suite: 4550/4550 OCaml.

Branch: `architecture`. SX files via `sx-tree` MCP only. Never edit generated files.

## Current baseline (2026-05-06)
@@ -243,6 +254,34 @@ OCaml: 4545 → 4550. JS: 2591 → 2596. Zero regressions.
opcodes that skip the primitive table lookup for the most common calls. Compiler emits
these when operands are known numbers/lists.

**Outcome:** The opcodes (`OP_ADD`=160, `OP_SUB`=161, `OP_MUL`=162, `OP_DIV`=163,
`OP_EQ`=164, `OP_LT`=165, `OP_GT`=166, `OP_NOT`=167, `OP_LEN`=168, `OP_FIRST`=169,
`OP_REST`=170, `OP_CONS`=172) already existed in `sx_vm.ml` but the compiler never
emitted them — every primitive call went through `OP_CALL_PRIM` (52) with a Hashtbl
lookup. Two changes:

1. **`lib/compiler.sx` `compile-call`**: when the primitive name + arity matches a
specialized opcode, emit the 1-byte opcode (no name index, no argc operand)
instead of the 4-byte CALL_PRIM. Bytecode for `fib` shrank from 50→38 bytes.
2. **`hosts/ocaml/lib/sx_vm.ml` opcode bodies**: extended `OP_ADD/SUB/MUL/DIV` to
handle `Integer + Integer` (was `Number + Number` only — defaulted to Hashtbl
for the common integer case). Inlined `OP_EQ` to call `Sx_runtime._fast_eq`
directly. Inlined `OP_LT/GT` integer + mixed-numeric comparisons.
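
The two changes above can be illustrated with a small, self-contained OCaml sketch. It is not the project's code: the opcode numbers come from the outcome notes, but the arity table, the `value` type, and the names `specialized_opcode`, `call_size`, and `op_add` are hypothetical. The byte count assumes `fib` has the usual shape with four primitive calls (`<`, `-`, `-`, `+`), which is consistent with the reported 50→38 saving (4 calls × 3 bytes).

```ocaml
(* Hypothetical sketch; not the actual compiler.sx or sx_vm.ml code. *)

(* Map (primitive name, arity) to a specialized opcode number.
   Opcode numbers are taken from the notes above; the arities are assumed. *)
let specialized_opcode (name : string) (argc : int) : int option =
  match name, argc with
  | "+", 2 -> Some 160
  | "-", 2 -> Some 161
  | "*", 2 -> Some 162
  | "/", 2 -> Some 163
  | "=", 2 -> Some 164
  | "<", 2 -> Some 165
  | ">", 2 -> Some 166
  | "not", 1 -> Some 167
  | "len", 1 -> Some 168
  | "first", 1 -> Some 169
  | "rest", 1 -> Some 170
  | "cons", 2 -> Some 172
  | _ -> None

(* Bytes emitted for one primitive call: 1 for a specialized opcode,
   4 for the generic CALL_PRIM encoding. *)
let call_size (name : string) (argc : int) : int =
  match specialized_opcode name argc with
  | Some _ -> 1
  | None -> 4

(* Assumed primitive calls in a conventional recursive fib. *)
let fib_calls = [ ("<", 2); ("-", 2); ("-", 2); ("+", 2) ]

let bytes_saved =
  List.fold_left (fun acc (name, argc) -> acc + (4 - call_size name argc)) 0 fib_calls

(* Toy stand-in for the VM's tagged values. *)
type value = Integer of int | Number of float

(* Fast path for addition: same-type operands are handled inline;
   None means "defer to the generic primitive-table lookup". *)
let op_add (a : value) (b : value) : value option =
  match a, b with
  | Integer x, Integer y -> Some (Integer (x + y))
  | Number x, Number y -> Some (Number (x +. y))
  | _ -> None

let () = Printf.printf "fib saves %d bytes\n" bytes_saved
```

A call with an unmatched name or arity, such as `specialized_opcode "+" 3`, returns `None`, which corresponds to the CALL_PRIM fallback described in item 1.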

Median improvements (`bin/bench_vm.exe`, best of 3 runs of 9-min):

| Benchmark | Baseline (best min) | After (best min) | Change |
|------------------|---------------------|------------------|-------:|
| fib(22) | 107.87ms | 33.13ms | -69% |
| loop(200000) | 429.64ms | 161.16ms | -62% |
| sum-to(50000) | 72.85ms | 36.74ms | -50% |
| count-lt(20000) | 28.44ms | 17.58ms | -38% |
| count-eq(20000) | 37.23ms | 15.46ms | -58% |
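
The Change column can be recomputed from the two timing columns as (after − before) / before, rounded to a whole percent. The snippet below is a small check of that arithmetic; `pct_change` and `results` are names introduced here for illustration.

```ocaml
(* Relative change in percent, rounded to the nearest integer. *)
let pct_change ~before ~after =
  int_of_float (Float.round ((after -. before) /. before *. 100.0))

(* Timings in milliseconds, copied from the table above. *)
let results = [
  ("fib(22)", 107.87, 33.13);
  ("loop(200000)", 429.64, 161.16);
  ("sum-to(50000)", 72.85, 36.74);
  ("count-lt(20000)", 28.44, 17.58);
  ("count-eq(20000)", 37.23, 15.46);
]

let () =
  List.iter
    (fun (name, before, after) ->
      Printf.printf "%-16s %4d%%\n" name (pct_change ~before ~after))
    results
```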

Tests: 4550/4550 passing (unchanged baseline). Zero regressions. Benchmark binary:
`bin/bench_vm.exe` (loads `lib/compiler.sx` via CEK, JIT-compiles each test fn,
measures `Sx_vm.call_closure` time on the compiled `vm_closure`).
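
One plausible reading of "best of 3 runs of 9-min" is that each run records the minimum over 9 timed calls and the reported figure is the lowest of 3 such runs. The sketch below follows that reading; it is an assumption, not the actual `bench_vm.ml` code. `time_ms`, `min_over`, and `best_of` are hypothetical helpers, and a native `fib` stands in for the `Sx_vm.call_closure` call being measured.

```ocaml
(* Time one call of [f] in milliseconds, using stdlib CPU time. *)
let time_ms (f : unit -> 'a) : float =
  let t0 = Sys.time () in
  ignore (f ());
  (Sys.time () -. t0) *. 1000.0

(* Smallest of [n] samples, starting from [best]. *)
let rec min_over (n : int) (sample : unit -> float) (best : float) : float =
  if n = 0 then best else min_over (n - 1) sample (Float.min best (sample ()))

(* Lowest of [runs] runs, each run being the minimum of [iters] timed calls. *)
let best_of ~runs ~iters (f : unit -> 'a) : float =
  min_over runs (fun () -> min_over iters (fun () -> time_ms f) infinity) infinity

(* Stand-in workload; the real benchmark would call the compiled closure. *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

let () =
  let ms = best_of ~runs:3 ~iters:9 (fun () -> fib 22) in
  Printf.printf "fib(22) best: %.2fms\n" ms
```

Taking the minimum rather than the mean filters out scheduling and GC noise, which is a common choice for micro-benchmarks of tight loops.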

---

## Progress log
@@ -262,7 +301,7 @@ these when operands are known numbers/lists.
| 11 — plugin migration + worker | [x] | 6328b810 |
| 12 — frame records | [x] | a66c0f66 (fib -66%, loop -69%, reduce -86% via prim_call fast path) |
| 13 — buffer primitive | [x] | 0e022ab6 (inspect rewrite: tree-d10 -80%, tree-d8 -76%, dict-1000 -61%, list-2000 -55%) |
| 14 — inline primitives JIT | [ ] | — |
| 14 — inline primitives JIT | [x] | (pending) (fib -69%, loop -62%, sum -50%, count-lt -38%, count-eq -58% via specialized opcode emission) |
---