
giles 65dfd75865 plans: HS conformance queue + loop agent briefing

40 clusters across 6 buckets. Bucket E is human-only (WebSocket,
Tokenizer-API, SourceInfo, WebWorker, fetch non-2xx). Agent loop
works A→B→C→D→F serially, one cluster per commit, aborts on
regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-23 21:14:35 +00:00


HS conformance loop agent (single agent, queue-driven)

Role: iterates over plans/hs-conformance-to-100.md until one of the stop conditions in step 7 is met. Each iteration picks the top pending cluster, implements a fix, tests, commits, logs, and moves on. The test pass rate reported by mcp__hs-test__hs_test_run is the north star.

description: HS conformance queue loop
subagent_type: general-purpose
run_in_background: true

Prompt

You are the sole background agent working /root/rose-ash/plans/hs-conformance-to-100.md. You work a prioritized queue, one item per commit, indefinitely. The plan file is the source of truth for what's open, in-progress, done, and blocked. You update it after every iteration.

Iteration protocol (follow exactly)

1. Read state

Read plans/hs-conformance-to-100.md in full.
Pick the first cluster with status [pending]. If all pending clusters are in buckets E (human-only) or F (generator gaps), stop and mark the loop complete.
Before touching anything, set that cluster's status to [in-progress] and commit the plan change alone: HS-plan: claim <cluster-name>.
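
The selection rule in step 1 can be sketched as below. The plan file's on-disk format is not shown in this briefing, so the sketch assumes the clusters have already been parsed into an array of `{ name, bucket, status }` objects listed in plan order (buckets A through F). That shape and the name `pickNextCluster` are hypothetical, for illustration only.

```javascript
// Hypothetical sketch of the step-1 selection rule.
// Assumes clusters are already parsed into { name, bucket, status } objects
// in plan order; this is an illustrative shape, not the plan's real format.
const STOP_BUCKETS = new Set(['E', 'F']);

function pickNextCluster(clusters) {
  const pending = clusters.filter(c => c.status === 'pending');
  // Stop when nothing is pending, or when every pending cluster is in E or F.
  if (pending.every(c => STOP_BUCKETS.has(c.bucket))) {
    return { done: true, cluster: null };
  }
  // Otherwise take the first pending cluster in plan order.
  return { done: false, cluster: pending[0] };
}
```

Note that `every` returns true for an empty array, so a plan with no pending clusters also ends the loop.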

2. Baseline

Record the two numbers you need to verify against:

mcp__hs-test__hs_test_run(suite="<target-suite>", timeout_secs=120)    # the cluster's suite
mcp__hs-test__hs_test_run(start=0, end=195, timeout_secs=180)          # smoke range

Save both pass-counts. These are your before-state.

3. Investigate and fix

For each cluster, the protocol is:

Read the relevant test fixtures to understand what's expected.
Compile a minimal repro with the debug harness (see below).
Trace through the runtime/compiler/parser to find the root cause.
Edit lib/hyperscript/<file>.sx via sx-tree MCP tools.
Copy the edited file to shared/static/wasm/sx/hs-<file>.sx (cp lib/hyperscript/<file>.sx shared/static/wasm/sx/hs-<file>.sx) so the WASM-loaded runner sees the change.
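
The copy step follows a fixed naming rule: the file name gains an hs- prefix and moves to the staging directory. A minimal sketch of that mapping, assuming repository-relative POSIX paths (the helper name `stagingPathFor` is hypothetical):

```javascript
const path = require('path');

// Map a source file under lib/hyperscript/ to its WASM staging copy,
// following the naming rule <f>.sx -> hs-<f>.sx.
function stagingPathFor(sourcePath) {
  const base = path.posix.basename(sourcePath); // e.g. "parser.sx"
  return path.posix.join('shared/static/wasm/sx', 'hs-' + base);
}

console.log(stagingPathFor('lib/hyperscript/parser.sx'));
// prints "shared/static/wasm/sx/hs-parser.sx"
```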

Debug harness for compile inspection (copy-paste into a Node.js snippet):

const fs = require('fs'), path = require('path');
const PROJECT = '/root/rose-ash';
const SX_DIR = path.join(PROJECT, 'shared/static/wasm/sx');
// Load the WASM kernel bundle; it installs globalThis.SxKernel.
eval(fs.readFileSync(path.join(PROJECT, 'shared/static/wasm/sx_browser.bc.js'), 'utf8'));
const K = globalThis.SxKernel;
// Minimal host bindings: property access works, everything else is stubbed.
K.registerNative('host-global', a => (a[0] in globalThis) ? globalThis[a[0]] : null);
K.registerNative('host-get', a => a[0] != null ? (a[0][a[1]] === undefined ? null : a[0][a[1]]) : null);
K.registerNative('host-set!', a => { if (a[0] != null) a[0][a[1]] = a[2]; return a[2]; });
K.registerNative('host-call', a => null);
K.registerNative('host-new', a => null);
K.registerNative('host-typeof', a => 'any');
K.registerNative('host-callback', a => null);
K.registerNative('host-await', a => null);
K.registerNative('load-library!', () => false);
const HS = ['hs-tokenizer','hs-parser','hs-compiler','hs-runtime','hs-integration'];
K.beginModuleLoad();
for (const mod of HS) {
  // Prefer the staged copy (sp); fall back to the source under lib/hyperscript (lp).
  const sp = path.join(SX_DIR, mod + '.sx');
  const lp = path.join(PROJECT, 'lib/hyperscript', mod.replace(/^hs-/, '') + '.sx');
  let s;
  // A module that cannot be read is skipped, with a warning so the gap is visible.
  try { s = fs.existsSync(sp) ? fs.readFileSync(sp, 'utf8') : fs.readFileSync(lp, 'utf8'); } catch (e) { console.warn('SKIPPED:', mod, e.message); continue; }
  try { K.load(s); } catch (e) { console.error('LOAD ERROR:', mod, e.message); }
}
K.endModuleLoad();
// Replace <your source> with the hyperscript snippet to inspect; prints the compiled SX.
console.log(K.eval('(serialize (hs-to-sx (hs-compile "<your source>")))'));

4. Verify

mcp__hs-test__hs_test_run(suite="<target-suite>", timeout_secs=120)    # must be > baseline
mcp__hs-test__hs_test_run(start=0, end=195, timeout_secs=180)          # must be >= baseline

Abort rule: if the suite didn't improve by at least +1 OR the smoke range regressed by any amount, do NOT commit the code. Revert your changes (git checkout -- lib/hyperscript shared/static/wasm/sx/hs-*), update the plan to mark this cluster blocked (<specific reason>), commit the plan, and move to the next cluster.
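
The abort rule reduces to two comparisons against the baseline from step 2. A minimal sketch, assuming the pass counts are held in objects with `suite` and `smoke` fields (that shape and the name `verdict` are hypothetical):

```javascript
// Sketch of the step-4 decision. "before" and "after" hold pass counts for
// the target suite and the smoke range; the object shape is hypothetical.
function verdict(before, after) {
  const suiteDelta = after.suite - before.suite;
  const smokeDelta = after.smoke - before.smoke;
  // The suite must gain at least one passing test.
  if (suiteDelta < 1) {
    return { commit: false, reason: 'suite did not improve', suiteDelta };
  }
  // The smoke range must not lose any passing test.
  if (smokeDelta < 0) {
    return { commit: false, reason: 'smoke range regressed', suiteDelta };
  }
  return { commit: true, reason: null, suiteDelta };
}
```

When `commit` is true, `suiteDelta` is the +N used in the commit message; when it is false, `reason` is a starting point for the blocked note in the plan.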

5. Commit code

One commit for the code:

HS: <cluster name> (+N tests)

<2-4 line summary of the root cause and the fix>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

6. Update plan + commit

In plans/hs-conformance-to-100.md:

Change this cluster's status from [in-progress] to [done (+N)] (or [done (+N) — partial, <what's left>]).
Append a one-paragraph entry at the TOP of the Progress log: date, commit SHA, what you touched, actual delta.

Commit: HS-plan: log <cluster-name> done +N.
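
Each successful iteration produces three commits whose subject lines follow the formats quoted in steps 1, 5 and 6. A small sketch that builds them from a cluster name and delta (the helper name `commitSubjects` is hypothetical):

```javascript
// Subject lines for the three commits of one successful iteration,
// following the formats quoted in steps 1, 5 and 6.
function commitSubjects(clusterName, delta) {
  return {
    claim: `HS-plan: claim ${clusterName}`,
    code: `HS: ${clusterName} (+${delta} tests)`,
    log: `HS-plan: log ${clusterName} done +${delta}`,
  };
}

console.log(commitSubjects('fetch JSON unwrap', 3).code);
// prints "HS: fetch JSON unwrap (+3 tests)"
```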

7. Move on

Go back to step 1. Work as many clusters as you can within your budget. Stop only when:

All pending clusters are blocked, OR
Only buckets E/F remain (human-only work and generator gaps), OR
You've hit your budget of iterations.

Ground rules

Branch: architecture. Commit locally. Never push. Never touch main.
Scope: ONLY lib/hyperscript/**, shared/static/wasm/sx/hs-*, tests/hs-run-filtered.js, tests/playwright/generate-sx-tests.py, plans/hs-conformance-to-100.md. No other files.
SX files: sx-tree MCP tools ONLY. Never Edit/Read/Write on .sx. sx_validate after every edit.
Never edit spec/tests/test-hyperscript-behavioral.sx directly — fix the generator or the runtime.
Never edit spec/, shared/sx/, the OCaml kernel, or web/ — those are out of scope for this loop.
Sync WASM staging. After every edit to lib/hyperscript/<f>.sx: cp lib/hyperscript/<f>.sx shared/static/wasm/sx/hs-<f>.sx. The test runner loads from the staging dir.
One cluster per commit. Short commit message with the +N delta.
Partial fixes are OK. If you achieve +3 on an expected-+5 cluster, commit it, mark partial, move on.
Hard timeout: if stuck >30 min on a cluster, mark blocked and move on.
Don't invent clusters. Work only on items that are in the plan. If you find a new bug, add it as a new pending entry before working on anything else.
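
The Scope rule can be checked mechanically before staging a change. A minimal sketch, assuming normalized repository-relative paths with forward slashes and no `..` segments (the name `inScope` is hypothetical):

```javascript
// Files the Scope rule names exactly, relative to the repository root.
const EXACT = new Set([
  'tests/hs-run-filtered.js',
  'tests/playwright/generate-sx-tests.py',
  'plans/hs-conformance-to-100.md',
]);

// Returns true when relPath falls inside the scope listed in the ground rules.
// Assumes a normalized, repository-relative path.
function inScope(relPath) {
  if (EXACT.has(relPath)) return true;
  // lib/hyperscript/** : anything below that directory.
  if (relPath.startsWith('lib/hyperscript/')) return true;
  // shared/static/wasm/sx/hs-* : hs- prefixed files directly in the staging dir.
  const staging = 'shared/static/wasm/sx/hs-';
  return relPath.startsWith(staging) && !relPath.slice(staging.length).includes('/');
}
```

For example, `inScope('spec/tests/test-hyperscript-behavioral.sx')` is false, which matches the rule against editing that file directly.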

Gotchas from past sessions

env-bind! creates; env-set! mutates existing (walks scope chain).
SX do is R7RS iteration — use begin for multi-expr sequences.
cond / when / let clause bodies eval only the last expr — wrap in begin for side-effects.
guard handler clauses: (guard (e (else (begin ...)))).
list? returns false on raw JS Arrays. host-get node "children" returns a JS Array in the mock, so SX-level (list? kids) silently drops traversal of nested DOM.
append! on a list-valued :local / ref target needs emit-set in compiler (done, see 1613f551).
set result to X now also sets it (done, see emit-set special case for the-result).
make-symbol builds identifier symbols.
Hypertrace tests (196, 199, 200), query-template (615), and repeat forever (1197, 1198) hang under the 200k step limit. Always filter around them.
WASM kernel is shared/static/wasm/sx_browser.bc.js. Primitives json-stringify/json-parse live in browser.sx in the dist. Overriding them at HS runtime requires a new name — we use hs-json-stringify.
hs-element? checks a specific type. hs-to-sx converts parser AST to SX source. hs-compile = hs-parse (hs-tokenize src).
Mock DOM El class in tests/hs-run-filtered.js is a simplified JS class. It doesn't implement outerHTML, selection, innerText-as-getter, form reset's defaultValue tracking perfectly, etc. When the runtime is correct but the test still fails, suspect the mock.
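
Filtering around the hanging tests amounts to splitting the test index space into runs that skip them. A minimal sketch, assuming the numbers in the gotcha above are zero-based test indices and that the suite has 1496 tests as in the stated baseline. The ranges produced are inclusive on both ends; whether the end argument of hs_test_run is inclusive is not stated here, so adjust as needed. The name `safeRanges` is hypothetical.

```javascript
// Test indices reported above as hanging under the step limit.
const HANGING = [196, 199, 200, 615, 1197, 1198];

// Split the indices 0..total-1 into inclusive [start, end] runs
// that leave out every index listed in skip.
function safeRanges(total, skip) {
  const skipSet = new Set(skip);
  const ranges = [];
  let start = null;
  for (let i = 0; i < total; i++) {
    if (skipSet.has(i)) {
      // Close the current run just before the skipped index.
      if (start !== null) ranges.push([start, i - 1]);
      start = null;
    } else if (start === null) {
      start = i;
    }
  }
  if (start !== null) ranges.push([start, total - 1]);
  return ranges;
}

console.log(safeRanges(1496, HANGING));
// ranges: [0,195], [197,198], [201,614], [616,1196], [1199,1495]
```

The first run, 0 to 195, is the same span as the smoke range used in steps 2 and 4.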

Starting state

Branch: architecture, HEAD at or near 6b0334af (HS: JSON clean + FormEncoded + HTML join).
Baseline: 1213/1496 (81.1%).
Plan file exists at plans/hs-conformance-to-100.md with ~30 clusters in buckets A-F.
Begin with cluster 1: fetch JSON unwrap.
