rose-ash/scripts/perf-smoke.sh
giles 59bec68dcc perf: Phase 6 — substrate perf-regression alarm (perf-smoke)
Replaces the watchdog-bump approach with an automated check. The next 5× (or
worse) substrate regression will trip the alarm at build time instead of
hiding behind a deadline bump and only being noticed weeks later.

Components:

* lib/perf-smoke.sx — four micro-benchmarks chosen for distinct substrate
  failure modes: function-call dispatch (fib), env construction (let-chain),
  HO-form dispatch + lambda creation (map-sq), TCO + primitive dispatch
  (tail-loop). A warm-up pass populates the JIT cache before the timed pass,
  so we measure the steady state.

* scripts/perf-smoke.sh — pipes lib/perf-smoke.sx to sx_server.exe, parses
  per-bench wall-time, asserts each is within FACTOR× of the recorded
  reference (default 5×). `--update` rewrites the reference in-place.

* scripts/sx-build-all.sh — perf-smoke wired in as a post-step after JS
  tests. Hard fail if any benchmark regressed beyond budget.
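The warm-up-then-measure idea from the first bullet can be illustrated with a small Bash analogue. This is not the SX code in lib/perf-smoke.sx. `bench_ms` is a hypothetical helper, and the sketch assumes GNU `date` (for the `%N` nanosecond field), as on the Linux reference machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): a shell analogue of the
# warm-up-then-measure pattern used by lib/perf-smoke.sx.
# Assumes GNU date, which supports %N (nanoseconds).
bench_ms() {
  local start end
  "$@" >/dev/null              # warm-up pass: fill caches, timing discarded
  start=$(date +%s%N)
  "$@" >/dev/null              # timed pass: measures the warmed-up run
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))   # elapsed wall-clock time in ms
}

bench_ms sleep 0.1             # prints a number >= 100 (milliseconds)
```

Only the second call is timed, so one-off costs such as a cold cache fall into the discarded first pass.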

Reference numbers: minimum across 6 back-to-back runs on this dev machine
under typical concurrent-loop contention (load ~9, 2 vCPU, 7.6 GiB RAM,
OCaml 5.2.0, architecture @ 92f6f187). Documented in
plans/jit-perf-regression.md including how to update them.
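Taking the minimum across several runs can be sketched as follows. Contention only ever adds time, so the fastest run is the least noisy estimate. The sketch assumes you have captured the raw `perf-smoke fib18=N let1000=N map500=N tail5000=N` result lines from sx_server (quotes stripped, as perf-smoke.sh does), one line per run. `min_refs` is a hypothetical helper, and the input values below are illustrative, not measurements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one "perf-smoke name=N ..." line per run on stdin
# and print a single line holding the per-benchmark minimum, keeping the
# benchmark order of the first line seen.
min_refs() {
  awk '
    /^perf-smoke / {
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (!(kv[1] in seen)) {
          seen[kv[1]] = 1; order[++n] = kv[1]; best[kv[1]] = kv[2] + 0
        } else if (kv[2] + 0 < best[kv[1]]) {
          best[kv[1]] = kv[2] + 0
        }
      }
    }
    END {
      out = "perf-smoke"
      for (j = 1; j <= n; j++) out = out " " order[j] "=" best[order[j]]
      print out
    }'
}

# Illustrative values only (two runs):
printf '%s\n' \
  'perf-smoke fib18=1300 let1000=200 map500=25 tail5000=450' \
  'perf-smoke fib18=1250 let1000=210 map500=22 tail5000=440' |
  min_refs
# perf-smoke fib18=1250 let1000=200 map500=22 tail5000=440
```

With six captured lines saved in a file, `min_refs < runs.txt` (file name hypothetical) prints the candidate reference values to copy into the script.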

The 5× factor is chosen so contention noise (~1–2× variance) doesn't trigger
false alarms, while a real substrate regression of more than 5× (the kind that
motivated this whole investigation) fails the build immediately.
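The budget arithmetic can be sketched by mirroring the integer comparison in the script's `verdict()` function (`got <= ref * FACTOR`) with the reference values recorded in the script (fib18=1216, let1000=194, map500=21, tail5000=430 ms). `within_budget` and `print_budgets` are hypothetical helper names. Because the comparison is `<=`, a result of exactly 5× the reference still passes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same test as verdict() in perf-smoke.sh.
# Returns 0 when got <= ref * factor (integer arithmetic only).
within_budget() {
  local got="$1" ref="$2" factor="$3"
  [ "$got" -le $((ref * factor)) ]
}

# Hypothetical helper: print the millisecond budget implied by each recorded
# reference for a given factor.
print_budgets() {
  local factor="$1" pair name ref
  for pair in fib18:1216 let1000:194 map500:21 tail5000:430; do
    name="${pair%%:*}"
    ref="${pair#*:}"
    echo "$name budget=$((ref * factor)) ms"
  done
}

print_budgets 5
# fib18 budget=6080 ms
# let1000 budget=970 ms
# map500 budget=105 ms
# tail5000 budget=2150 ms
```

For example, a 2× contention slowdown of fib18 (2432 ms) stays well inside its 6080 ms budget, while a 6× regression (7296 ms) fails.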

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:23:45 +00:00

#!/usr/bin/env bash
# perf-smoke.sh — substrate perf-regression alarm.
#
# Runs lib/perf-smoke.sx via sx_server.exe and asserts each micro-benchmark's
# wall-clock time is within FACTOR× of its reference number. Exits 0 if all
# are within budget, 1 if any has regressed, 2 if the harness itself fails
# (sx_server.exe missing or no result line).
#
# Reference numbers: minimum of 6 back-to-back runs on the reference dev
# machine (Linux, 2 vCPU, 7.6 GiB RAM, OCaml 5.2.0, under concurrent-loop
# contention, load ~9). Document the machine in plans/jit-perf-regression.md
# when updating.
#
# Usage:
#   bash scripts/perf-smoke.sh             # check (default factor 5×)
#   FACTOR=3 bash scripts/perf-smoke.sh    # tighter threshold (integers only)
#   bash scripts/perf-smoke.sh --update    # rewrite the reference numbers in
#                                          # this script with the current run's
#                                          # numbers (use only on a quiet
#                                          # reference machine; commit the diff)
#
# The signal is *change* relative to the reference, not the absolute numbers.
# Drift is fine; reset the reference when the substrate changes intentionally
# (e.g. after a JIT improvement).
set -uo pipefail
cd "$(git rev-parse --show-toplevel)"
# ── Reference numbers (minimum of 6 runs on the reference machine) ─────────
# `--update` records a single run; see plans/jit-perf-regression.md for the
# full update procedure. Run it on a quiet machine.
REF_FIB18=1216
REF_LET1000=194
REF_MAP500=21
REF_TAIL5000=430
# ── End reference numbers ──────────────────────────────────────────────────
FACTOR="${FACTOR:-5}"
SX_SERVER="${SX_SERVER:-hosts/ocaml/_build/default/bin/sx_server.exe}"
if [ ! -x "$SX_SERVER" ]; then
  SX_SERVER="/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe"
fi
if [ ! -x "$SX_SERVER" ]; then
  echo "ERROR: sx_server.exe not found. Run: cd hosts/ocaml && dune build" >&2
  exit 2
fi
TMPFILE=$(mktemp)
# Single quotes defer expansion until the trap runs; inner quotes keep the path whole.
trap 'rm -f "$TMPFILE"' EXIT
cat > "$TMPFILE" <<'EPOCHS'
(epoch 1)
(load "lib/perf-smoke.sx")
(epoch 2)
(eval "(perf-smoke)")
EPOCHS
OUTPUT=$(timeout 60 "$SX_SERVER" < "$TMPFILE" 2>&1)
LINE=$(echo "$OUTPUT" | grep -E '^"perf-smoke ' | head -1 | tr -d '"')
if [ -z "$LINE" ]; then
  echo "ERROR: no perf-smoke result line; sx_server output:" >&2
  echo "$OUTPUT" | tail -20 >&2
  exit 2
fi
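# Parse: perf-smoke fib18=N let1000=N map500=N tail5000=N
get() { echo "$LINE" | grep -oE "$1=[0-9]+" | cut -d= -f2; }
FIB18=$(get fib18)
LET1000=$(get let1000)
MAP500=$(get map500)
TAIL5000=$(get tail5000)
# Never compare (or, worse, --update) with a missing value.
if [ -z "$FIB18" ] || [ -z "$LET1000" ] || [ -z "$MAP500" ] || [ -z "$TAIL5000" ]; then
  echo "ERROR: malformed perf-smoke result line: $LINE" >&2
  exit 2
fi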
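if [ "${1:-}" = "--update" ]; then
  # $0 may be relative to the caller's directory, which the cd above changed
  # (bash keeps it in $OLDPWD); resolve it so sed edits the right file.
  SELF="$0"
  case "$SELF" in
    /*) ;;
    *) SELF="${OLDPWD:-$PWD}/$SELF" ;;
  esac
  echo "Measured (this run): fib18=$FIB18 let1000=$LET1000 map500=$MAP500 tail5000=$TAIL5000"
  echo "Rewriting reference numbers in $SELF"
  sed -i \
    -e "s/^REF_FIB18=.*/REF_FIB18=$FIB18/" \
    -e "s/^REF_LET1000=.*/REF_LET1000=$LET1000/" \
    -e "s/^REF_MAP500=.*/REF_MAP500=$MAP500/" \
    -e "s/^REF_TAIL5000=.*/REF_TAIL5000=$TAIL5000/" \
    "$SELF"
  echo "Done. Commit the diff."
  exit 0
fi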
if [ "$REF_FIB18" -eq 0 ] || [ "$REF_LET1000" -eq 0 ] || \
   [ "$REF_MAP500" -eq 0 ] || [ "$REF_TAIL5000" -eq 0 ]; then
  echo "WARN: reference numbers not yet set (at least one is zero)." >&2
  echo "Run \`bash scripts/perf-smoke.sh --update\` on a quiet reference machine first." >&2
  echo "Measured (this run): fib18=$FIB18 let1000=$LET1000 map500=$MAP500 tail5000=$TAIL5000"
  exit 0
fi
verdict() {
  local name="$1" got="$2" ref="$3"
  local budget=$((ref * FACTOR))
  if [ "$got" -le "$budget" ]; then
    printf ' ok %-12s %5d ms (ref %d, %d×)\n' "$name" "$got" "$ref" "$FACTOR"
    return 0
  else
    printf ' FAIL %-12s %5d ms (ref %d, budget %d×=%d ms)\n' \
      "$name" "$got" "$ref" "$FACTOR" "$budget"
    return 1
  fi
}
FAIL=0
echo "perf-smoke (factor ${FACTOR}× of reference):"
verdict fib18 "$FIB18" "$REF_FIB18" || FAIL=1
verdict let1000 "$LET1000" "$REF_LET1000" || FAIL=1
verdict map500 "$MAP500" "$REF_MAP500" || FAIL=1
verdict tail5000 "$TAIL5000" "$REF_TAIL5000" || FAIL=1
if [ "$FAIL" -eq 0 ]; then
  echo "ok perf-smoke within ${FACTOR}× of reference."
  exit 0
else
  echo "FAIL one or more benchmarks regressed. Investigate before merging."
  exit 1
fi
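The budget check in `verdict` uses `$((ref * FACTOR))`, and Bash arithmetic is integer-only, so a fractional threshold such as `FACTOR=2.5` cannot be expressed with the stock script. A minimal sketch of a fractional-capable comparison delegates the test to awk, which uses floating point. `within_budget_frac` is a hypothetical name and is not part of perf-smoke.sh.

```shell
#!/usr/bin/env bash
# Hypothetical variant of verdict()'s test that accepts a fractional factor.
# Returns 0 when got <= ref * factor, computed in awk's floating point.
within_budget_frac() {
  local got="$1" ref="$2" factor="$3"
  awk -v g="$got" -v r="$ref" -v f="$factor" \
    'BEGIN { exit !(g + 0 <= r * f) }'   # awk exit 0 means "within budget"
}

if within_budget_frac 240 100 2.5; then echo "within budget"; fi
```

Swapping this into `verdict` would also mean computing the printed budget with awk, since `$((ref * FACTOR))` is still evaluated there.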