rose-ash

Author SHA1 Message Date

Author	SHA1	Message	Date
giles	59bec68dcc	perf: Phase 6 — substrate perf-regression alarm (perf-smoke) Replaces the watchdog-bump approach with an automated check. The next 5× (or worse) substrate regression will trip the alarm at build time instead of hiding behind a deadline bump and only being noticed weeks later. Components: * lib/perf-smoke.sx — four micro-benchmarks chosen for distinct substrate failure modes: function-call dispatch (fib), env construction (let-chain), HO-form dispatch + lambda creation (map-sq), TCO + primitive dispatch (tail-loop). Warm-up pass populates JIT cache before the timed pass so we measure the steady state. * scripts/perf-smoke.sh — pipes lib/perf-smoke.sx to sx_server.exe, parses per-bench wall-time, asserts each is within FACTOR× of the recorded reference (default 5×). `--update` rewrites the reference in-place. * scripts/sx-build-all.sh — perf-smoke wired in as a post-step after JS tests. Hard fail if any benchmark regressed beyond budget. Reference numbers: minimum across 6 back-to-back runs on this dev machine under typical concurrent-loop contention (load ~9, 2 vCPU, 7.6 GiB RAM, OCaml 5.2.0, architecture @ `92f6f187`). Documented in plans/jit-perf-regression.md including how to update them. The 5× factor is chosen so contention noise (~1–2× variance) doesn't trigger false alarms but a real ≥5× substrate regression — the kind that motivated this whole investigation — fails the build immediately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 14:23:45 +00:00
giles	c361946974	perf: deadline tweaks (tcl 2400→300s, erlang 120→600s); plan + Phase 1 findings Phase 1 of the jit-perf-regression plan reproduced and quantified the alleged 30× substrate slowdown across 5 guests (tcl, lua, erlang, prolog, haskell). On a quiet machine all five suites pass cleanly: tcl test.sh 57.8s wall, 16.3s user, 376/376 ✓ lua test.sh 27.3s wall, 4.2s user, 185/185 ✓ erlang conformance 3m25s wall, 36.8s user, 530/530 ✓ (needs ≥600s budget) prolog conformance 3m54s wall, 1m08s user, 590/590 ✓ haskell conformance 6m59s wall, 2m37s user, 156/156 ✓ Per-test user-time at architecture HEAD vs pre-substrate-merge baseline (`83dbb595`) is essentially flat (tcl 0.83×, lua 1.4×, prolog 0.82×). The symptoms reported in the plan (test timeouts, OOMs, 30-min hangs) were heavy CPU contention from concurrent loops + one undersized internal `timeout 120` in erlang's conformance script. There is no substrate regression to bisect. Changes: * lib/tcl/test.sh: `timeout 2400` → `timeout 300`. The original 180s deadline is comfortable on a quiet machine (3.1× headroom); 300s gives some safety margin for moderate contention without masking real regressions. * lib/erlang/conformance.sh: `timeout 120` → `timeout 600`. The 120s budget was actually too tight for the full 9-suite chain even before this work. * lib/erlang/scoreboard.{json,md}: 0/0 → 530/530 — populated by a successful conformance run with the new deadline. The previous 0/0 was a stale artefact of the run timing out before parsing any markers. * plans/jit-perf-regression.md: full Phase 1 progress log including per-guest perf table, quiet-machine re-measurement, and conclusion. Phases 2–4 (bisect, diagnose, fix) skipped — there is no substrate regression to find. Phase 6 (perf-regression alarm) still planned to catch the next quadratic blow-up early instead of via watchdog bumps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 14:05:29 +00:00

giles

59bec68dcc

perf: Phase 6 — substrate perf-regression alarm (perf-smoke)

Replaces the watchdog-bump approach with an automated check. The next 5× (or
worse) substrate regression will trip the alarm at build time instead of
hiding behind a deadline bump and only being noticed weeks later.

Components:

* lib/perf-smoke.sx — four micro-benchmarks chosen for distinct substrate
  failure modes: function-call dispatch (fib), env construction (let-chain),
  HO-form dispatch + lambda creation (map-sq), TCO + primitive dispatch
  (tail-loop). Warm-up pass populates JIT cache before the timed pass so we
  measure the steady state.

* scripts/perf-smoke.sh — pipes lib/perf-smoke.sx to sx_server.exe, parses
  per-bench wall-time, asserts each is within FACTOR× of the recorded
  reference (default 5×). `--update` rewrites the reference in-place.

* scripts/sx-build-all.sh — perf-smoke wired in as a post-step after JS
  tests. Hard fail if any benchmark regressed beyond budget.

Reference numbers: minimum across 6 back-to-back runs on this dev machine
under typical concurrent-loop contention (load ~9, 2 vCPU, 7.6 GiB RAM,
OCaml 5.2.0, architecture @ 92f6f187). Documented in
plans/jit-perf-regression.md including how to update them.

The 5× factor is chosen so contention noise (~1–2× variance) doesn't trigger
false alarms but a real ≥5× substrate regression — the kind that motivated
this whole investigation — fails the build immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-08 14:23:45 +00:00

giles

c361946974

perf: deadline tweaks (tcl 2400→300s, erlang 120→600s); plan + Phase 1 findings

Phase 1 of the jit-perf-regression plan reproduced and quantified the alleged
30× substrate slowdown across 5 guests (tcl, lua, erlang, prolog, haskell). On
a quiet machine all five suites pass cleanly:

  tcl test.sh         57.8s wall, 16.3s user, 376/376 ✓
  lua test.sh         27.3s wall,  4.2s user, 185/185 ✓
  erlang conformance  3m25s wall, 36.8s user, 530/530 ✓ (needs ≥600s budget)
  prolog conformance  3m54s wall, 1m08s user, 590/590 ✓
  haskell conformance 6m59s wall, 2m37s user, 156/156 ✓

Per-test user-time at architecture HEAD vs pre-substrate-merge baseline
(83dbb595) is essentially flat (tcl 0.83×, lua 1.4×, prolog 0.82×). The
symptoms reported in the plan (test timeouts, OOMs, 30-min hangs) were heavy
CPU contention from concurrent loops + one undersized internal `timeout 120`
in erlang's conformance script. There is no substrate regression to bisect.

Changes:

* lib/tcl/test.sh: `timeout 2400` → `timeout 300`. The original 180s deadline
  is comfortable on a quiet machine (3.1× headroom); 300s gives some safety
  margin for moderate contention without masking real regressions.
* lib/erlang/conformance.sh: `timeout 120` → `timeout 600`. The 120s budget
  was actually too tight for the full 9-suite chain even before this work.
* lib/erlang/scoreboard.{json,md}: 0/0 → 530/530 — populated by a successful
  conformance run with the new deadline. The previous 0/0 was a stale
  artefact of the run timing out before parsing any markers.
* plans/jit-perf-regression.md: full Phase 1 progress log including
  per-guest perf table, quiet-machine re-measurement, and conclusion.

Phases 2–4 (bisect, diagnose, fix) skipped — there is no substrate regression
to find. Phase 6 (perf-regression alarm) still planned to catch the next
quadratic blow-up early instead of via watchdog bumps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-08 14:05:29 +00:00

2 Commits