sx: step 12 — prim_call fast path (-66% fib, -86% reduce)

CEK frames were already records (cek_frame in sx_types.ml), so the actual
hot-path bottleneck was prim_call "=" [...] in step_continue/step_eval
dispatch: each step did a Hashtbl lookup + 2x list cons + pattern match
just to compare frame-type strings.

Added a short-circuit fast path in prim_call (sx_runtime.ml) for the
hot operators: =, <, >, <=, >=, empty?, first, rest, len. These bypass
the primitives Hashtbl entirely and dispatch directly on value shape.
Inlined _fast_eq for scalar/string equality, which dominates frame-type
dispatch comparisons.

Added bin/bench_cek.exe with five tight-loop benchmarks (fib, loop,
map, reduce, let-heavy). Median of 7 runs:

  fib(18)            2789ms -> 941ms   (-66%)
  loop(5000)         2018ms -> 620ms   (-69%)
  map sq xs(1000)    108ms  -> 48ms    (-56%)
  reduce + ys(2000)  72ms   -> 10ms    (-86%)
  let-heavy(2000)    491ms  -> 271ms   (-45%)

Tests: 4545/4545 passing baseline preserved (1339 pre-existing failures
unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-07 01:46:23 +00:00
parent 1fbfdfe4ae
commit a66c0f66f0
4 changed files with 159 additions and 6 deletions

View File

@@ -189,6 +189,25 @@ These are incremental and can interleave with other phases.
tagged variant lists. Eliminates allocation pressure from list construction per frame.
Profile before/after on a tight-loop benchmark.
**Outcome:** Frames were already records (`cek_frame` in `sx_types.ml`) — the actual
hot-path bottleneck was `prim_call "=" [...]` in `step_continue`/`step_eval` dispatch:
each step did a Hashtbl lookup + 2x list cons + pattern match per comparison. Added a
fast path in `prim_call` (sx_runtime.ml) for `=`, `<`, `>`, `<=`, `>=`, `empty?`,
`first`, `rest`, `len` that skips the table lookup entirely. Also inlined `_fast_eq`
for the common scalar-equality cases that dominate frame-type dispatch. Median
improvements (bench_cek.exe, 7 runs):
| Benchmark | Before | After | Change |
|-----------|--------|-------|--------|
| fib(18) | 2789ms | 941ms | -66% |
| loop(5000) | 2018ms | 620ms | -69% |
| map sq(1000) | 108ms | 48ms | -56% |
| reduce + (2000) | 72ms | 10ms | -86% |
| let-heavy(2000) | 491ms | 271ms | -45% |
Tests: 4545 passing (unchanged baseline), 1339 failing (unchanged baseline).
Benchmark binary: `bin/bench_cek.exe`.
### Step 13: Buffer primitive for string building
Add `make-buffer`, `buffer-append!`, `buffer->string` primitives. Eliminates the
@@ -218,7 +237,7 @@ these when operands are known numbers/lists.
| 9 — parser feature registry | [x] | 986d6411 |
| 10 — compiler + as converter registry | [x] | d22361e4 |
| 11 — plugin migration + worker | [x] | 6328b810 |
| 12 — frame records | [ ] | — |
| 12 — frame records | [x] | — (fib -66%, loop -69%, reduce -86% via prim_call fast path) |
| 13 — buffer primitive | [ ] | — |
| 14 — inline primitives JIT | [ ] | — |