Replaces the watchdog-bump approach with an automated check. The next 5× (or
worse) substrate regression will trip the alarm at build time instead of
hiding behind a deadline bump and only being noticed weeks later.
Components:
* lib/perf-smoke.sx — four micro-benchmarks chosen for distinct substrate
failure modes: function-call dispatch (fib), env construction (let-chain),
HO-form dispatch + lambda creation (map-sq), TCO + primitive dispatch
(tail-loop). Warm-up pass populates JIT cache before the timed pass so we
measure the steady state.
* scripts/perf-smoke.sh — pipes lib/perf-smoke.sx to sx_server.exe, parses
per-bench wall-time, asserts each is within FACTOR× of the recorded
reference (default 5×). `--update` rewrites the reference in-place.
* scripts/sx-build-all.sh — perf-smoke wired in as a post-step after JS
tests. Hard fail if any benchmark regressed beyond budget.
Reference numbers: minimum across 6 back-to-back runs on this dev machine
under typical concurrent-loop contention (load ~9, 2 vCPU, 7.6 GiB RAM,
OCaml 5.2.0, architecture @ 92f6f187). Documented in
plans/jit-perf-regression.md including how to update them.
The 5× factor is chosen so contention noise (~1–2× variance) doesn't trigger
false alarms but a real ≥5× substrate regression — the kind that motivated
this whole investigation — fails the build immediately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Conflict in lib/tcl/test.sh: architecture had bumped `timeout 2400 → 7200`,
this branch had restored it to `timeout 300` based on the Phase 1
quiet-machine measurement (376/376 in 57.8s wall, 16.3s user). Resolved by
keeping `timeout 300` — the 7200s bump was preemptive against contention,
not against an actual substrate regression. Phase 1 confirms the original
180s deadline is comfortable; 300s gives 5× headroom for moderate noise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 of the jit-perf-regression plan reproduced and quantified the alleged
30× substrate slowdown across 5 guests (tcl, lua, erlang, prolog, haskell). On
a quiet machine all five suites pass cleanly:
tcl test.sh 57.8s wall, 16.3s user, 376/376 ✓
lua test.sh 27.3s wall, 4.2s user, 185/185 ✓
erlang conformance 3m25s wall, 36.8s user, 530/530 ✓ (needs ≥600s budget)
prolog conformance 3m54s wall, 1m08s user, 590/590 ✓
haskell conformance 6m59s wall, 2m37s user, 156/156 ✓
Per-test user-time at architecture HEAD vs pre-substrate-merge baseline
(83dbb595) is essentially flat (tcl 0.83×, lua 1.4×, prolog 0.82×). The
symptoms reported in the plan (test timeouts, OOMs, 30-min hangs) were heavy
CPU contention from concurrent loops + one undersized internal `timeout 120`
in erlang's conformance script. There is no substrate regression to bisect.
Changes:
* lib/tcl/test.sh: `timeout 2400` → `timeout 300`. The original 180s deadline
is comfortable on a quiet machine (3.1× headroom); 300s gives some safety
margin for moderate contention without masking real regressions.
* lib/erlang/conformance.sh: `timeout 120` → `timeout 600`. The 120s budget
was actually too tight for the full 9-suite chain even before this work.
* lib/erlang/scoreboard.{json,md}: 0/0 → 530/530 — populated by a successful
conformance run with the new deadline. The previous 0/0 was a stale
artefact of the run timing out before parsing any markers.
* plans/jit-perf-regression.md: full Phase 1 progress log including
per-guest perf table, quiet-machine re-measurement, and conclusion.
Phases 2–4 (bisect, diagnose, fix) skipped — there is no substrate regression
to find. Phase 6 (perf-regression alarm) still planned to catch the next
quadratic blow-up early instead of via watchdog bumps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This session cleared 15 of the 18 documented skips:
- Toggle parser ambiguity (1) — 2-token lookahead in parse-toggle
- Throttled-at modifier (1) — parser + emit-on wrap + runtime hs-throttle!/hs-debounce!
- Tokenizer-stream API (13) — hs-stream wrapper + 15 stream primitives
Plus a perf fix in compiler.sx (hoisted throttle/debounce helpers to
module level so they don't get JIT-recompiled per emit-on call). Wall
time for full batched suite: 28m45s, was 26m17s before sync (so net
+18 tests cost only +2m even though 3x more work).
Remaining skips (3):
- Template-component scope tests (2) — needs <script type="text/
hyperscript-template"> custom-element bootstrap registrar.
- Async event dispatch (1) — repeat until event needs the OCaml
kernel to release the JS event loop between iterations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parser.sx parse-toggle-cmd: when seeing 'toggle .foo for', peek the
following two tokens. If they are '<ident> in', it is a for-in loop
and toggle does NOT consume 'for' as a duration clause. Restores the
trailing for-in to the command list.
parser.sx parse-on (handler modifiers): recognize 'throttled at <ms>'
and 'debounced at <ms>' as handler modifiers. Captured as :throttle /
:debounce kwargs in the on-form parts list.
compiler.sx emit-on: pre-extract :throttle / :debounce from parts via
new _strip-throttle-debounce helper before scan-on, then wrap the built
handler with (hs-throttle! handler ms) or (hs-debounce! handler ms).
runtime.sx: hs-throttle! — closure with __hs-last-fire timestamp,
fires immediately and drops events arriving within ms of the last fire.
hs-debounce! — closure with __hs-timer, clears any pending timer and
schedules a new setTimeout(handler, ms) so only the last burst event
fires.
Both formerly-architectural skips now pass:
- "toggle does not consume a following for-in loop"
- "throttled at <time> drops events within the window"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Brings the architecture branch (559 commits ahead — R7RS step 4-6, JIT
expansion, host_error wrapping, bytecode compiler, etc.) into the
loops/haskell line of work. Conflict in lib/haskell/conformance.sh:
architecture replaced the inline driver with a thin wrapper delegating
to lib/guest/conformance.sh + a config file. Resolved by taking the
wrapper and extending lib/haskell/conformance.conf with all programs
added under loops/haskell (caesar, runlength-str, showadt, showio,
partial, statistics, newton, wordfreq, mapgraph, uniquewords, setops,
shapes, person, config, counter, accumulate, safediv, trycatch) plus
adding map.sx and set.sx to PRELOADS.
plans/haskell-completeness.md gains three new follow-up phases:
- Phase 17 — parser polish (`(x :: Int)` annotations, mid-file imports)
- Phase 18 — one ambitious conformance program (lambda-calc / Dijkstra /
JSON parser candidate list)
- Phase 19 — conformance speed (batch all suites in one sx_server
process to compress the 25-min run to single-digit minutes)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hk-bind-exceptions! in eval.sx registers throwIO, throw, evaluate, catch,
try, handle, displayException. SomeException constructor pre-registered
in runtime.sx (arity 1, type SomeException).
throwIO and the existing error primitive both raise via SX `raise` with a
uniform "hk-error: msg" string. catch/try/handle parse it back into a
SomeException via hk-exception-of, which strips nested
'Unhandled exception: "..."' host wraps (CEK's host_error formatter) and
the "hk-error: " prefix.
catch and handle evaluate the handler outside the guard scope (build an
"ok"/"exn" outcome tag inside guard, then dispatch outside) so that a
re-throw from the handler propagates past this catch — matching Haskell
semantics rather than infinite-looping in the same guard.
14 unit tests in tests/exceptions.sx (catch success, catch error, try
Right/Left, handle, throwIO + catch/try, evaluate, nested catch, do-bind
through catch, branch on try result, IORef-mutating handler).
Conformance: safediv.hs (8/8) and trycatch.hs (8/8). Scoreboard now
285/285 tests, 36/36 programs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trivial wrapper: apl-run-file = apl-run ∘ file-read, where
file-read is built-in to OCaml SX.
Tests verify primes.apl, life.apl, quicksort.apl all parse
end-to-end (their last form is a :dfn AST). Source-then-call
test confirms the loaded file's defined fn is callable, even
when the algorithm itself can't fully execute (primes' inline
⍵ rebinding still missing — :glyph-token, not :name-token).
Parser: :name clause now detects 'name ← rhs' patterns inside
expressions. When seen, consumes the remaining tokens as RHS,
parses recursively, and emits a (:assign-expr name parsed-rhs)
value segment.
Eval-ast :dyad and :monad: when the right operand is an
:assign-expr node, capture the binding into env before
evaluating the left operand. This realises the primes idiom:
apl-run "(2 = +⌿ 0 = a ∘.| a) / a ← ⍳ 30"
→ 2 3 5 7 11 13 17 19 23 29
Also: top-level x←5 now evaluates to scalar 5 (apl-eval-ast
:assign just unwraps to its RHS value).
Caveat: ⍵-rebinding (the original primes.apl uses
'⍵←⍳⍵') is a :glyph-token; only :name-tokens are handled.
A regular variable name (like 'a') works.
Haskell strings are [Char]. Calling reverse / head / length on a SX raw
string transparently produces a cons-list of char codes (via hk-str-head /
hk-str-tail in runtime.sx), but (==) then compared the original raw string
against the char-code cons-list and always returned False — so
"racecar" == reverse "racecar" was False.
Added hk-try-charlist-to-string and hk-normalize-for-eq in eval.sx; routed
== and /= through hk-normalize-for-eq so a string compares equal to any
cons-list whose elements are valid Unicode code points spelling the same
characters, and "[]" ↔ "".
palindrome.hs lifts from 9/12 → 12/12; conformance 33/34 → 34/34 programs,
266/269 → 269/269 tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Investigation of the long-standing 'why does the runner say 1494/1494 not
1496/1496?' question. The answer is in tests/hs-run-filtered.js:969 — two
tests are skipped via _SKIP_TESTS for documented architectural reasons:
1. 'until event keyword works' — uses 'repeat until event click from #x',
which suspends the OCaml kernel waiting for a click that is never
dispatched from outside K.eval. The sync test runner has no way to
fire the click while the kernel is suspended.
2. 'throttled at <time> drops events within the window' — the HS parser
does not implement the 'throttled at <ms>' modifier. The compiled SX
for the handler is malformed: handler body is the literal symbol
'throttled', the time expression dangles outside the closure as
stray (do 200 ...). Genuinely needs parser+compiler+runtime work,
not just a deadline bump.
Both are documented at the skip site with a comment explaining why they
can't run synchronously. The conformance number is 1494/1494 = 100% on
counted tests, with 2 explicit, justified skips out of 1496 total.
This was the source of the cumulative-vs-isolated test-count discrepancy.
Suite filter runs see them as 'not in this suite,' batched runs see them
as 'continued past'. Either way: not failures.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
apl-permutations was doing (append acc <new-perms>) which is
O(|acc|) and acc grows ~N! big — total cost O(N!²).
Swapped to (append <new-perms> acc) — append is O(|first|)
so cost is O((n+1)·N!_prev) per layer, total O(N!). q(7)
went from 32s to 12s; q(8)=92 now finishes well within the
300s timeout, so the queens(8) test is restored.
497/497. Phase 8 complete.
hk-bind-data-ioref! registers newIORef / readIORef / writeIORef /
modifyIORef / modifyIORef' under the import alias (default IORef).
Representation: dict {"hk-ioref" true "hk-value" v} allocated inside IO.
modifyIORef' uses hk-deep-force on the new value before write.
Side-effect: fixed pre-existing bug in import handler — modname was
reading (nth d 1) (the qualified flag) instead of (nth d 2). All
'import qualified … as Foo' paths were silently no-ops; map.sx unit
suite jumps from 22→26 passing.
Conformance now 33/34 programs, 266/269 tests (only pre-existing
palindrome.hs 9/12 still failing on string-as-list reversal, present
on prior commit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tests/hs-run-batched.js — fresh-kernel-per-batch conformance runner.
Solves the WASM kernel JIT-cache-saturation problem (compiled VmClosures
accumulate over a single process and slow tests at the tail of the run)
by spawning a child Node process per batch. Each batch starts with an
empty cache, so tests at index 1400 perform identically to tests at
index 100. Configurable batch size (HS_BATCH_SIZE, default 150) and
parallelism (HS_PARALLEL, default 1).
This is option 2 from the cache-architecture plan — the lowest-risk fix:
zero kernel changes, deterministic results, runs in the same time as the
single-process version when parallelism matches CPU count.
plans/jit-cache-architecture.md — sketches the SX-wide architectural
fix in three phases:
1. Tiered compilation — call counter on lambdas; only JIT after K
invocations. Filters out one-shot lambdas (test harness, dynamic
eval, REPLs) at the source.
2. LRU eviction — central cache with fixed budget. Predictable memory
ceiling regardless of input pattern.
3. Reset API — jit-reset!, jit-clear-cold!, jit-stats, jit-pin!
primitives for app-driven cache management.
Layer split: cache datastructure + LRU in hosts/ocaml/lib/sx_jit_cache.ml
(new), VM integration in sx_vm.ml, primitives registered in
sx_primitives.ml, declarative spec in spec/primitives.sx, and SX-level
ergonomics (with-jit-threshold, with-fresh-jit, jit-report) in lib/jit.sx.
This is host-specific to the OCaml WASM kernel but the SX API surface is
shared across all hosted languages (HS, Common Lisp, Erlang, etc.).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
programs-e2e.sx exercises the classic-algorithm shapes from
lib/apl/tests/programs/*.apl via the full pipeline (apl-run on
embedded source strings). Tests include factorial-via-∇,
triangular numbers, sum-of-squares, prime-mask building blocks
(divisor counts via outer mod), named-fn composition,
dyadic max-of-two, and a single Newton sqrt step.
The original one-liners (e.g. primes' inline ⍵←⍳⍵) need parser
features we haven't built (compress-as-fn, inline assign) — the
e2e tests use multi-statement equivalents. No file-reading
primitive in OCaml SX, so source is embedded.
Side-fix: ⌿ (first-axis reduce) and ⍀ (first-axis scan) were
silently skipped by the tokenizer — added to apl-glyph-set
and apl-parse-op-glyphs.
Parser: apl-collect-fn-bindings pre-scans stmt-groups for
`name ← { ... }` patterns and populates apl-known-fn-names.
is-fn-tok? consults this list; collect-segments-loop emits
(:fn-name nm) for known names so they parse as functions.
Resolver: apl-resolve-{monadic,dyadic} handle :fn-name by
looking up env, asserting the binding is a dfn, returning
a closure that dispatches to apl-call-dfn{-m,}.
Recursion still works: `fact ← {0=⍵:1 ⋄ ⍵×∇⍵-1} ⋄ fact 5` → 120.
Three small unblockers in one iteration:
- tokenizer: read-digits! now consumes optional ".digits" suffix,
so 3.7 and ¯2.5 are single number tokens.
- tokenizer: ⎕ followed by ← emits a single :name "⎕←" token
(instead of splitting on the assign glyph). Parser registers
⎕← in apl-quad-fn-names; apl-monadic-fn maps to apl-quad-print.
- eval-ast: :str AST nodes evaluate to char arrays. Single-char
strings become rank-0 scalars; multi-char become rank-1 vectors
of single-char strings.