perf: Phase 6 — substrate perf-regression alarm (perf-smoke)
Replaces the watchdog-bump approach with an automated check. The next 5× (or
worse) substrate regression will trip the alarm at build time instead of
hiding behind a deadline bump and only being noticed weeks later.
Components:
* lib/perf-smoke.sx — four micro-benchmarks chosen for distinct substrate
failure modes: function-call dispatch (fib), env construction (let-chain),
HO-form dispatch + lambda creation (map-sq), TCO + primitive dispatch
(tail-loop). Warm-up pass populates JIT cache before the timed pass so we
measure the steady state.
* scripts/perf-smoke.sh — pipes lib/perf-smoke.sx to sx_server.exe, parses
per-bench wall-time, asserts each is within FACTOR× of the recorded
reference (default 5×). `--update` rewrites the reference in-place.
* scripts/sx-build-all.sh — perf-smoke wired in as a post-step after JS
tests. Hard fail if any benchmark regressed beyond budget.
Reference numbers: minimum across 6 back-to-back runs on this dev machine
under typical concurrent-loop contention (load ~9, 2 vCPU, 7.6 GiB RAM,
OCaml 5.2.0, architecture @ 92f6f187). Documented in
plans/jit-perf-regression.md including how to update them.
The 5× factor is chosen so contention noise (~1–2× variance) doesn't trigger
false alarms but a real ≥5× substrate regression — the kind that motivated
this whole investigation — fails the build immediately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -87,9 +87,30 @@ The fix depends on the diagnosed cause; this section is filled in once Phase 3 l
|
||||
|
||||
So the next quadratic blow-up doesn't hide behind a watchdog bump:
|
||||
|
||||
- [ ] Add a lightweight perf benchmark (`spec/tests/perf-smoke.sx` or similar): a small workload with a known wall-clock target (e.g. 100ms ± 50ms on a reference machine).
|
||||
- [ ] Wire it into the `sx_test` MCP tool or the `sx_build` post-step. Failing fast means a 5× slowdown trips an alarm before merge, not a 30× one after.
|
||||
- [ ] Document the reference numbers and the machine they were measured on. They will drift; that is fine. The signal is *change*, not absolute number.
|
||||
- [x] Add a lightweight perf benchmark — `lib/perf-smoke.sx`. Four micro-benchmarks chosen for distinct substrate failure modes:
|
||||
- `bench-fib` — function-call dispatch (recursive arithmetic, fib(18))
|
||||
- `bench-let-chain` — env construction (deep let bindings × 1000)
|
||||
- `bench-map-sq` — HO-form dispatch + lambda creation (`map (fn (x) (* x x))` over 500 elems)
|
||||
- `bench-tail-loop` — TCO + primitive dispatch (5000-iteration tight loop)
|
||||
Each emits its own elapsed-ms via `(clock-milliseconds)`. A warm-up pass populates JIT cache before the timed pass.
|
||||
- [x] Wire it into `scripts/sx-build-all.sh` as a post-step after the JS test suite. Failing the perf budget fails the whole build (hard fail, not log-line).
|
||||
- [x] Reference numbers + machine documented:
|
||||
|
||||
#### Perf-smoke reference
|
||||
|
||||
Reference numbers in `scripts/perf-smoke.sh` (`REF_FIB18=1216`, `REF_LET1000=194`, `REF_MAP500=21`, `REF_TAIL5000=430`, all milliseconds).
|
||||
|
||||
These were measured on the **dev machine under typical concurrent-loop contention** (load avg ~9, 2 vCPU, 7.6 GiB RAM, OCaml 5.2.0, architecture HEAD `92f6f187`). They are the **minimum across 6 back-to-back runs**, i.e. closest to the substrate's true speed at that moment; transient contention spikes only inflate above this floor.
|
||||
|
||||
The default budget multiplier is **5×** (`FACTOR=5`). Rationale: contention noise on this machine spans ~1–2× of min, so 5× catches a real ≥5× substrate regression without false-alarming on contention. Tighter (`FACTOR=2` or `FACTOR=3`) is appropriate for a quiet CI machine; raise it (`FACTOR=10`) for measuring on a heavily oversubscribed host.
|
||||
|
||||
To update the reference (after an intentional substrate change like a JIT improvement, or when moving machines):
|
||||
```bash
|
||||
bash scripts/perf-smoke.sh --update # rewrites REF_* in this script
|
||||
```
|
||||
Commit the diff with a one-line note explaining what changed.
|
||||
|
||||
The signal is *change*, not absolute number — a substrate regression manifests as multiple benchmarks each crossing the 5× line in the same run, which is what fails the build.
|
||||
|
||||
## Ground rules
|
||||
|
||||
|
||||
Reference in New Issue
Block a user