perf: Phase 6 — substrate perf-regression alarm (perf-smoke)

Replaces the watchdog-bump approach with an automated check. The next 5× (or
worse) substrate regression will trip the alarm at build time instead of
hiding behind a deadline bump and only being noticed weeks later.

Components:

* lib/perf-smoke.sx — four micro-benchmarks chosen for distinct substrate
  failure modes: function-call dispatch (fib), env construction (let-chain),
  HO-form dispatch + lambda creation (map-sq), TCO + primitive dispatch
  (tail-loop). Warm-up pass populates JIT cache before the timed pass so we
  measure the steady state.

* scripts/perf-smoke.sh — pipes lib/perf-smoke.sx to sx_server.exe, parses
  per-bench wall-time, asserts each is within FACTOR× of the recorded
  reference (default 5×). `--update` rewrites the reference in-place.

* scripts/sx-build-all.sh — perf-smoke wired in as a post-step after JS
  tests. Hard fail if any benchmark regressed beyond budget.
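
The parse step described for scripts/perf-smoke.sh can be sketched in
isolation. This is a minimal illustration, not the committed code: the
sample line and its values are made up, and `parse_field` and `is_uint` are
hypothetical names. The digit guard is an addition that the committed script
does not have; it shows one way to reject a missing field before it reaches
an integer comparison or an `--update` rewrite.

```shell
#!/usr/bin/env bash
# Sample result line in the shape the script's parser expects.
# The values are illustrative, not measurements.
LINE='perf-smoke fib18=1300 let1000=200 map500=25 tail5000=440'

# Print the integer after "<key>=", or nothing if the key is absent.
parse_field() {
  printf '%s\n' "$LINE" | grep -oE "$1=[0-9]+" | head -1 | cut -d= -f2
}

# Hypothetical guard: succeed only for a non-empty string of digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# "missing" is deliberately absent from LINE to exercise the guard.
for key in fib18 let1000 map500 tail5000 missing; do
  value="$(parse_field "$key")"
  if is_uint "$value"; then
    echo "$key=$value"
  else
    echo "$key: no numeric value in result line" >&2
  fi
done
```

With a guard of this kind, a truncated result line would surface as a
harness error rather than as an empty operand to `[ ... -le ... ]`.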

Reference numbers: minimum across 6 back-to-back runs on this dev machine
under typical concurrent-loop contention (load ~9, 2 vCPU, 7.6 GiB RAM,
OCaml 5.2.0, architecture @ 92f6f187). Documented in
plans/jit-perf-regression.md including how to update them.
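
As a rough sketch of the "minimum across runs" reduction: `min_of` below is
a hypothetical helper, not something this commit adds, and the timings are
placeholder values rather than measurements. Taking the minimum rather than
the mean keeps a single contended run from inflating the reference.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the smallest of the integer arguments
# (for example, per-run wall times in ms for one benchmark).
min_of() {
  local min="$1" v
  shift
  for v in "$@"; do
    if [ "$v" -lt "$min" ]; then
      min="$v"
    fi
  done
  echo "$min"
}

# Six placeholder timings for one benchmark; made-up numbers.
min_of 130 121 145 128 123 139
```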

The 5× factor is chosen so contention noise (~1–2× variance) doesn't trigger
false alarms but a real ≥5× substrate regression — the kind that motivated
this whole investigation — fails the build immediately.
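
To make the threshold concrete, the per-benchmark budgets implied by the
recorded references can be computed as below. The reference values are the
ones in scripts/perf-smoke.sh; `budget_ms` is a hypothetical name used only
for this illustration. At the default factor the budgets come out to 6080,
970, 105 and 2150 ms.

```shell
#!/usr/bin/env bash
# Same default as the script: 5x unless FACTOR is set in the environment.
FACTOR="${FACTOR:-5}"

# Hypothetical helper: budget in ms for a given reference time in ms.
budget_ms() {
  echo $(( $1 * FACTOR ))
}

# name=reference pairs taken from the REF_* values in scripts/perf-smoke.sh.
for pair in fib18=1216 let1000=194 map500=21 tail5000=430; do
  name="${pair%%=*}"   # text before the first "="
  ref="${pair#*=}"     # text after the first "="
  printf '%-9s ref %4d ms  budget %4d ms\n' "$name" "$ref" "$(budget_ms "$ref")"
done
```

Note that the multiplicative budget leaves the shortest benchmark (map500,
21 ms) only 84 ms of absolute headroom, so it is likely the first to react
to scheduling noise.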

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:23:45 +00:00
parent 92f6f187b7
commit 59bec68dcc
4 changed files with 201 additions and 3 deletions

scripts/perf-smoke.sh Executable file

@@ -0,0 +1,119 @@
#!/usr/bin/env bash
# perf-smoke.sh — substrate perf-regression alarm.
#
# Runs lib/perf-smoke.sx via sx_server.exe and asserts each micro-benchmark's
# wall-clock time is within REGRESSION_FACTOR× of the reference number. Exits
# 0 if all are within budget, 1 if any has regressed.
#
# Reference numbers: measured on the dev machine (Linux, 2 vCPU, 7.6 GiB
# RAM, OCaml 5.2.0); measurement conditions are recorded in
# plans/jit-perf-regression.md. Document the machine there when updating.
#
# Usage:
#   bash scripts/perf-smoke.sh            # check (default factor 5×)
#   FACTOR=3 bash scripts/perf-smoke.sh   # tighter threshold
#   bash scripts/perf-smoke.sh --update   # rewrite the reference numbers in
#                                         # this script with current run's
#                                         # numbers (use only on a quiet
#                                         # reference machine; commit the diff)
#
# The signal is *change* relative to the reference, not the absolute number.
# Drift is fine; reset the reference when the substrate changes intentionally
# (e.g. after a JIT improvement).
set -uo pipefail
cd "$(git rev-parse --show-toplevel)"
# ── Reference numbers (minimum across 6 back-to-back runs; see plans/jit-perf-regression.md) ──
# Update these via `bash scripts/perf-smoke.sh --update` on a quiet machine.
REF_FIB18=1216
REF_LET1000=194
REF_MAP500=21
REF_TAIL5000=430
# ── End reference numbers ──────────────────────────────────────────────────
FACTOR="${FACTOR:-5}"
SX_SERVER="${SX_SERVER:-hosts/ocaml/_build/default/bin/sx_server.exe}"
if [ ! -x "$SX_SERVER" ]; then
  SX_SERVER="/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe"
fi
if [ ! -x "$SX_SERVER" ]; then
  echo "ERROR: sx_server.exe not found. Run: cd hosts/ocaml && dune build" >&2
  exit 2
fi
TMPFILE=$(mktemp)
trap 'rm -f "$TMPFILE"' EXIT
cat > "$TMPFILE" <<'EPOCHS'
(epoch 1)
(load "lib/perf-smoke.sx")
(epoch 2)
(eval "(perf-smoke)")
EPOCHS
OUTPUT=$(timeout 60 "$SX_SERVER" < "$TMPFILE" 2>&1)
LINE=$(echo "$OUTPUT" | grep -E '^"perf-smoke ' | head -1 | tr -d '"')
if [ -z "$LINE" ]; then
  echo "ERROR: no perf-smoke result line; sx_server output:" >&2
  echo "$OUTPUT" | tail -20 >&2
  exit 2
fi
# Parse: perf-smoke fib18=N let1000=N map500=N tail5000=N
get() { echo "$LINE" | grep -oE "$1=[0-9]+" | cut -d= -f2; }
FIB18=$(get fib18)
LET1000=$(get let1000)
MAP500=$(get map500)
TAIL5000=$(get tail5000)
if [ "${1:-}" = "--update" ]; then
  echo "Measured (this run): fib18=$FIB18 let1000=$LET1000 map500=$MAP500 tail5000=$TAIL5000"
  echo "Rewriting reference numbers in $0"
  sed -i \
    -e "s/^REF_FIB18=.*/REF_FIB18=$FIB18/" \
    -e "s/^REF_LET1000=.*/REF_LET1000=$LET1000/" \
    -e "s/^REF_MAP500=.*/REF_MAP500=$MAP500/" \
    -e "s/^REF_TAIL5000=.*/REF_TAIL5000=$TAIL5000/" \
    "$0"
  echo "Done. Commit the diff."
  exit 0
fi
if [ "$REF_FIB18" -eq 0 ] || [ "$REF_LET1000" -eq 0 ] || \
   [ "$REF_MAP500" -eq 0 ] || [ "$REF_TAIL5000" -eq 0 ]; then
  echo "WARN: one or more reference numbers are not yet set (zero)." >&2
  echo "Run \`bash scripts/perf-smoke.sh --update\` on a quiet reference machine first." >&2
  echo "Measured (this run): fib18=$FIB18 let1000=$LET1000 map500=$MAP500 tail5000=$TAIL5000"
  exit 0
fi
verdict() {
  local name="$1" got="$2" ref="$3"
  local budget=$((ref * FACTOR))
  if [ "$got" -le "$budget" ]; then
    printf ' ok %-12s %5d ms (ref %d, %d×)\n' "$name" "$got" "$ref" "$FACTOR"
    return 0
  else
    printf ' FAIL %-12s %5d ms (ref %d, budget %d×=%d ms)\n' \
      "$name" "$got" "$ref" "$FACTOR" "$budget"
    return 1
  fi
}
FAIL=0
echo "perf-smoke (factor ${FACTOR}× of reference):"
verdict fib18 "$FIB18" "$REF_FIB18" || FAIL=1
verdict let1000 "$LET1000" "$REF_LET1000" || FAIL=1
verdict map500 "$MAP500" "$REF_MAP500" || FAIL=1
verdict tail5000 "$TAIL5000" "$REF_TAIL5000" || FAIL=1
if [ "$FAIL" -eq 0 ]; then
  echo "ok perf-smoke within ${FACTOR}× of reference."
  exit 0
else
  echo "FAIL one or more benchmarks regressed. Investigate before merging."
  exit 1
fi

scripts/sx-build-all.sh

@@ -43,4 +43,6 @@ echo "=== JS test build ==="
python3 hosts/javascript/cli.py --extensions continuations --spec-modules types --output shared/static/scripts/sx-full-test.js || { echo "FAIL: test build"; exit 1; }
echo "=== JS tests ==="
node hosts/javascript/run_tests.js --full 2>&1 | tail -3 || { echo "FAIL: JS tests"; exit 1; }
echo "=== perf-smoke ==="
bash scripts/perf-smoke.sh || { echo "FAIL: perf-smoke (benchmark over budget or harness error, see scripts/perf-smoke.sh)"; exit 1; }
echo "=== All OK ==="