plans: HS conformance queue + loop agent briefing

40 clusters across 6 buckets. Bucket E is human-only (WebSocket, Tokenizer-API, SourceInfo, WebWorker, fetch non-2xx). Agent loop works A→B→C→D→F serially, one cluster per commit, aborts on regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 21:14:35 +00:00
parent 2bd3a6b2ba
commit 65dfd75865
2 changed files with 303 additions and 0 deletions
--- a/plans/agent-briefings/hs-loop.md
+++ b/plans/agent-briefings/hs-loop.md
@@ -0,0 +1,139 @@
+# HS conformance loop agent (single agent, queue-driven)
+
+Role: iterates `plans/hs-conformance-to-100.md` forever. Each iteration picks the top `pending` cluster, implements, tests, commits, logs, moves on. Test pass rate in `mcp__hs-test__hs_test_run` is the north star.
+
+```
+description: HS conformance queue loop
+subagent_type: general-purpose
+run_in_background: true
+```
+
+## Prompt
+
+You are the sole background agent working `/root/rose-ash/plans/hs-conformance-to-100.md`. You work a prioritized queue, one item per commit, indefinitely. The plan file is the source of truth for what's open, in-progress, done, and blocked. You update it after every iteration.
+
+## Iteration protocol (follow exactly)
+
+### 1. Read state
+- Read `plans/hs-conformance-to-100.md` in full.
+- Pick the first cluster with status `[pending]`. If all pending clusters are in buckets E (human-only) or F (generator gaps), stop and mark the loop complete.
+- Before touching anything, set that cluster's status to `[in-progress]` and commit the plan change alone: `HS-plan: claim <cluster-name>`.
+
+### 2. Baseline
+Record the two numbers you need to verify against:
+
+```
+mcp__hs-test__hs_test_run(suite="<target-suite>", timeout_secs=120)    # the cluster's suite
+mcp__hs-test__hs_test_run(start=0, end=195, timeout_secs=180)          # smoke range
+```
+
+Save both pass-counts. These are your before-state.
+
+### 3. Investigate and fix
+
+For each cluster, the protocol is:
+1. Read the relevant test fixtures to understand what's expected.
+2. Compile a minimal repro with the debug harness (see below).
+3. Trace through the runtime/compiler/parser to find the root cause.
+4. Edit `lib/hyperscript/<file>.sx` via sx-tree MCP tools.
+5. `cp` to `shared/static/wasm/sx/hs-<file>.sx` so the WASM-loaded runner sees the change.
+
+**Debug harness for compile inspection** (copy-paste into a Node.js snippet):
+
+```js
+const fs = require('fs'), path = require('path');
+const PROJECT = '/root/rose-ash';
+const SX_DIR = path.join(PROJECT, 'shared/static/wasm/sx');
+eval(fs.readFileSync(path.join(PROJECT, 'shared/static/wasm/sx_browser.bc.js'), 'utf8')); // load the WASM kernel bundle
+const K = globalThis.SxKernel;
+K.registerNative('host-global', a => (a[0] in globalThis) ? globalThis[a[0]] : null);
+K.registerNative('host-get', a => a[0] != null ? (a[0][a[1]] === undefined ? null : a[0][a[1]]) : null);
+K.registerNative('host-set!', a => { if (a[0] != null) a[0][a[1]] = a[2]; return a[2]; });
+K.registerNative('host-call', a => null);      // stub: host calls do nothing in this harness
+K.registerNative('host-new', a => null);       // stub: no host object construction
+K.registerNative('host-typeof', a => 'any');   // stub: every host value reports type 'any'
+K.registerNative('host-callback', a => null);  // stub: callbacks are not wired up
+K.registerNative('host-await', a => null);     // stub: nothing is awaited
+K.registerNative('load-library!', () => false); // stub: no dynamic library loading
+const HS = ['hs-tokenizer','hs-parser','hs-compiler','hs-runtime','hs-integration'];
+K.beginModuleLoad();
+for (const mod of HS) {
+  const sp = path.join(SX_DIR, mod + '.sx');
+  const lp = path.join(PROJECT, 'lib/hyperscript', mod.replace(/^hs-/, '') + '.sx');
+  let s;
+  try { s = fs.existsSync(sp) ? fs.readFileSync(sp, 'utf8') : fs.readFileSync(lp, 'utf8'); } catch (e) { continue; } // prefer the staging copy, fall back to lib/; skip if unreadable
+  try { K.load(s); } catch (e) { console.error('LOAD ERROR:', mod, e.message); }
+}
+K.endModuleLoad();
+console.log(K.eval('(serialize (hs-to-sx (hs-compile "<your source>")))'));
+```
+
+### 4. Verify
+
+```
+mcp__hs-test__hs_test_run(suite="<target-suite>", timeout_secs=120)    # must be > baseline
+mcp__hs-test__hs_test_run(start=0, end=195, timeout_secs=180)          # must be >= baseline
+```
+
+**Abort rule:** if the suite didn't improve by at least +1 OR the smoke range regressed by any amount, do NOT commit the code. Revert your changes to every in-scope code file (`git checkout -- lib/hyperscript shared/static/wasm/sx/hs-* tests/hs-run-filtered.js tests/playwright/generate-sx-tests.py`), update the plan to mark this cluster `blocked (<specific reason>)`, commit the plan, then move to the next cluster.
+
+### 5. Commit code
+
+One commit for the code:
+
+```
+HS: <cluster name> (+N tests)
+
+<2-4 line summary of the root cause and the fix>
+
+Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+```
+
+### 6. Update plan + commit
+
+In `plans/hs-conformance-to-100.md`:
+- Change this cluster's status from `[in-progress]` to `[done (+N)]` (or `[done (+N) — partial, <what's left>]`).
+- Append a one-paragraph entry at the TOP of the Progress log: date, commit SHA, what you touched, actual delta.
+
+Commit: `HS-plan: log <cluster-name> done +N`.
+
+### 7. Move on
+Go back to step 1. Work as many clusters as you can within your budget. Stop only when:
+- All pending clusters are blocked, OR
+- Only buckets E/F remain (human-only work or generator gaps), OR
+- You've hit your budget of iterations.
+
+## Ground rules
+
+- **Branch:** `architecture`. Commit locally. **Never push.** **Never touch `main`.**
+- **Scope:** ONLY `lib/hyperscript/**`, `shared/static/wasm/sx/hs-*`, `tests/hs-run-filtered.js`, `tests/playwright/generate-sx-tests.py`, `plans/hs-conformance-to-100.md`. No other files.
+- **SX files:** sx-tree MCP tools ONLY. Never `Edit`/`Read`/`Write` on `.sx`. `sx_validate` after every edit.
+- **Never edit `spec/tests/test-hyperscript-behavioral.sx`** directly — fix the generator or the runtime.
+- **Never edit `spec/`, `shared/sx/`, the OCaml kernel, or `web/`** — those are out of scope for this loop.
+- **Sync WASM staging.** After every edit to `lib/hyperscript/<f>.sx`: `cp lib/hyperscript/<f>.sx shared/static/wasm/sx/hs-<f>.sx`. The test runner loads from the staging dir.
+- **One cluster per commit.** Short commit message with the `+N` delta.
+- **Partial fixes are OK.** If you achieve +3 on an expected-+5 cluster, commit it, mark partial, move on.
+- **Hard timeout:** if stuck >30 min on a cluster, mark `blocked` and move on.
+- **Don't invent clusters.** Only work items in the plan. If you find a new bug, add it as a new pending entry before working on anything else.
+
+## Gotchas from past sessions
+
+- `env-bind!` creates; `env-set!` mutates existing (walks scope chain).
+- SX `do` is R7RS iteration — use `begin` for multi-expr sequences.
+- `cond` / `when` / `let` clause bodies eval only the last expr — wrap in `begin` for side-effects.
+- `guard` handler clauses: `(guard (e (else (begin ...))))`.
+- `list?` returns **false** on raw JS Arrays. `host-get node "children"` returns a JS Array in the mock, so SX-level `(list? kids)` silently drops traversal of nested DOM.
+- `append!` on a list-valued `:local` / `ref` target needs `emit-set` in compiler (done, see 1613f551).
+- `set result to X` now also sets `it` (done, see emit-set special case for `the-result`).
+- `make-symbol` builds identifier symbols.
+- Hypertrace tests (196, 199, 200), query-template (615), `repeat forever` (1197, 1198) hang under 200k step limit. Always filter around them.
+- WASM kernel is `shared/static/wasm/sx_browser.bc.js`. Primitives `json-stringify`/`json-parse` live in `browser.sx` in the dist. Overriding them at HS runtime requires a new name — we use `hs-json-stringify`.
+- `hs-element?` checks a specific type. `hs-to-sx` converts parser AST to SX source. `hs-compile` = `hs-parse (hs-tokenize src)`.
+- Mock DOM `El` class in `tests/hs-run-filtered.js` is a simplified JS class. It doesn't implement outerHTML, selection, innerText-as-getter, form reset's defaultValue tracking perfectly, etc. When the runtime is correct but the test still fails, suspect the mock.
+
+## Starting state
+
+- Branch: `architecture`, HEAD at or near `6b0334af` (HS: JSON clean + FormEncoded + HTML join).
+- Baseline: **1213/1496 (81.1%)**.
+- Plan file exists at `plans/hs-conformance-to-100.md` with 40 clusters in buckets A-F.
+- Begin with cluster 1: `fetch JSON unwrap`.
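Step 1 of the iteration protocol in the briefing above ("pick the first cluster with status `[pending]`") can be sketched in Node.js as follows. This is a minimal illustration and not part of the commit: `firstPendingCluster` is a hypothetical helper, and it assumes queue entries keep the `N. **[status] name**` shape used in `plans/hs-conformance-to-100.md`.

```javascript
// Sketch only: `firstPendingCluster` is a hypothetical name, not a function in the repo.
// Assumption: each queue entry starts like `12. **[pending] cluster name** ...`.
function firstPendingCluster(planText) {
  // Capture groups: 1 = cluster number, 2 = status text, 3 = cluster name.
  const entry = /^(\d+)\.\s+\*\*\[([^\]]+)\]\s+(.+?)\*\*/;
  for (const line of planText.split('\n')) {
    const m = entry.exec(line);
    if (m && m[2] === 'pending') {
      return { number: Number(m[1]), name: m[3] };
    }
  }
  return null; // nothing pending: the loop has no claimable work
}
```

Returning `null` corresponds to the stop condition. The sketch knows nothing about buckets, so it relies on bucket E entries carrying a `blocked` status rather than `pending`.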
--- a/plans/hs-conformance-to-100.md
+++ b/plans/hs-conformance-to-100.md
@@ -0,0 +1,164 @@
+# Hyperscript conformance → 100%
+
+Goal: take the hyperscript upstream conformance suite from **1213/1496 (81%)** to a clean 100%. Queue-driven — single-agent loop on `architecture` branch, one cluster per commit.
+
+## North star
+
+```
+Baseline: 1213/1496 (81.1%)
+Target:   1496/1496
+Gap:      283 tests  (130 real fails + 153 SKIPs)
+```
+
+Track after each iteration via `mcp__hs-test__hs_test_run` on the relevant suite, not the full suite (full runs take 10+ min and include hanging tests: 196/199/200/615/1197/1198 hang under the 200k step limit).
+
+## How to run tests
+
+```
+mcp__hs-test__hs_test_run(suite="hs-upstream-<cluster>")        # fastest, one suite
+mcp__hs-test__hs_test_run(start=0, end=195)                     # early range
+mcp__hs-test__hs_test_run(start=201, end=614)                   # mid range (skip hypertrace hangs)
+mcp__hs-test__hs_test_run(start=616, end=1196)                  # late-1, skip repeat-forever hangs
+mcp__hs-test__hs_test_run(start=1199)                           # late-2 after hangs
+```
+
+## File layout
+
+Runtime/compiler/parser live in `lib/hyperscript/*.sx`. The test runner at `tests/hs-run-filtered.js` loads `shared/static/wasm/sx/hs-*.sx` — **after every `.sx` edit you must `cp lib/hyperscript/<file>.sx shared/static/wasm/sx/hs-<file>.sx`**.
+
+The test fixtures live in `spec/tests/test-hyperscript-behavioral.sx`, generated by `tests/playwright/generate-sx-tests.py`. **Never edit the behavioral.sx fixture directly** — fix the generator or the runtime.
+
+## Cluster queue
+
+Each cluster below is one commit. Order is rough — a loop agent may skip ahead if a predecessor is blocked. **Status:** `pending` / `in-progress` / `done (+N)` / `blocked (<reason>)`.
+
+### Bucket A: runtime fixes, single-file (low risk, high yield)
+
+1. **[pending] fetch JSON unwrap** — `hs-upstream-fetch` 4 tests (`can do a simple fetch w/ json` + 3 variants) got `{:__host_handle N}`. Root: `hs-fetch` in `runtime.sx` returns raw host Response object instead of parsing JSON body. Fix: when format is `"json"`, unwrap via `host-get "_json"` and `json-parse`. Expected: +4.
+
+2. **[pending] element → HTML via outerHTML** — `asExpression / converts an element into HTML` (1 test) + unlocks response fetches. Mock DOM `El` class in `tests/hs-run-filtered.js` has no `outerHTML` getter. Add a getter computed from `tagName` + `attributes` + `children` (recurse). Expected: +1 direct, + knock-on in fetch.
+
+3. **[pending] Values dict insertion order** — `asExpression / Values | FormEncoded` + `| JSONString` (2 tests) — form fields come out `lastName, phone, firstName, areaCode`. Root: `hs-values-absorb` in `runtime.sx` uses `dict-set!` but keys iterate in non-insertion order. Investigate `hs-gather-form-nodes` walk — the recursive `kids` traversal silently fails when `children` is a JS Array (not sx-list), so nested inputs arrive via a different path. Fix: either coerce children to sx-list at the gather boundary OR rewrite gather to explicitly use sx-level iteration helpers. Expected: +2.
+
+4. **[pending] `not` precedence over `or`** — `expressions/not` 3 tests (`not has higher precedence than or`, `not with numeric truthy/falsy`, `not with string truthy/falsy`). Check parser precedence — `not` should bind tighter than `or`. Fix in `parser.sx` expression-level precedence. Expected: +3.
+
+5. **[pending] `some` selector for nonempty match** — `expressions/some / some returns true for nonempty selector` (1 test). `some .class` probably returns the list, not a boolean. Runtime fix. Expected: +1.
+
+6. **[pending] string template `${x}`** — `expressions/strings / string templates work w/ props` + `w/ braces` (2 tests). Template interpolation isn't substituting property accesses. Check `hs-template` runtime. Expected: +2.
+
+7. **[pending] `put` hyperscript reprocessing** — `put / properly processes hyperscript at end/start/content/symbol` (4 tests, all `Expected 42, got 40`). After a put operation, newly inserted HS scripts aren't being activated. Fix: `hs-put-at!` should `hs-boot-subtree!` on the target after DOM insertion. Expected: +4.
+
+8. **[pending] `select returns selected text`** (1 test, `hs-upstream-select`). Likely `select` command needs to return `window.getSelection().toString()` equivalent. Add host-call to selection API in mock. Expected: +1.
+
+9. **[pending] `wait on event` basics** — `wait / can wait on event`, `on another element`, `waiting ... sets it to the event`, `destructure properties in a wait` (4 tests). Event-waiter suspension issue. Expected: +3-4.
+
+10. **[pending] `swap` variable ↔ property** — `swap / can swap a variable with a property` (1 test). Swap command doesn't handle mixed var/prop targets. Expected: +1.
+
+11. **[pending] `hide` strategy** — `hide / can configure hidden as default`, `can hide with custom strategy`, `can set default to custom strategy`, `hide element then show element retains original display` (4 tests). Strategy config plumbing. Expected: +3-4.
+
+12. **[pending] `show` multi-element + display retention** — `show / can show multiple elements with inline-block`, `can filter over a set of elements using the its symbol` (2 tests). Expected: +2.
+
+13. **[pending] `toggle` multi-class + timed + until-event** — `toggle` (3 assertion-fail tests). Expected: +3.
+
+14. **[pending] `unless` modifier** — `unlessModifier / unless can conditionally execute` (1 test). Parser/compiler addition. Expected: +1.
+
+15. **[pending] `transition` query-ref + multi-prop + initial** — `transition` 3 tests. Expected: +2-3.
+
+16. **[pending] `send can reference sender`** — 1 assertion fail. Expected: +1.
+
+17. **[pending] `tell` semantics** — `tell / attributes refer to the thing being told`, `does not overwrite me symbol`, `your symbol represents thing being told` (3 tests). Expected: +3.
+
+18. **[pending] `throw respond async/sync`** — `throw / can respond to async/sync exceptions in event handler` (2 tests). Expected: +2.
+
+### Bucket B: parser/compiler additions (medium risk, shared files)
+
+19. **[pending] `pick` regex + indices** — `pick` 13 tests. Regex match, flags, `of` syntax, start/end, negative indices. Big enough that a single commit might fail — break into pick-regex and pick-indices if needed. Expected: +10-13.
+
+20. **[pending] `repeat` property for-loops + where** — `repeat / basic property for loop`, `can nest loops`, `where clause can use the for loop variable name` (3 tests). Expected: +3.
+
+21. **[pending] `possessiveExpression` property access via its** — `possessive / can access its properties` (1 test, expected `foo`, got an empty string). Expected: +1.
+
+22. **[pending] window global fn fallback** — `regressions / can invoke functions w/ numbers in name` + unlocks several others. When calling `foo()` where `foo` isn't SX-defined, fall back to `(host-global "foo")`. Design decision: either compile-time emit `(or foo (host-global "foo"))` via a helper, or add runtime lookup in the dispatch path. Expected: +2-4.
+
+23. **[pending] `me symbol works in from expressions`** — `regressions` (1 test, Expected `Foo`). Check `from` expression compilation. Expected: +1.
+
+24. **[pending] `properly interpolates values 2`** — URL interpolation regression (1 test). Likely template string + property access. Expected: +1.
+
+25. **[pending] `can support parenthesized commands and features`** — `parser` (1 test, Expected `clicked`). Parser needs to accept `(cmd...)` grouping in more contexts. Expected: +1.
+
+### Bucket C: feature stubs (DOM observer mocks)
+
+26. **[pending] resize observer mock + `on resize`** — 3 tests. Add a minimal `ResizeObserver` mock to `hs-run-filtered.js`, plus parse/compile `on resize`. Expected: +3.
+
+27. **[pending] intersection observer mock + `on intersection`** — 3 tests. Mock `IntersectionObserver`; compile `on intersection` with margin/threshold modifiers. Expected: +3.
+
+28. **[pending] `ask`/`answer` + prompt/confirm mock** — `askAnswer` 4 tests. **Requires test-name-keyed mock**: first test wants `confirm → true`, second `confirm → false`, third `prompt → "Alice"`, fourth `prompt → null`. Keyed via `_current-test-name` in the runner. Expected: +4.
+
+29. **[pending] `hyperscript:before:init` / `:after:init` / `:parse-error` events** — 6 tests in `bootstrap` + `parser`. Fire DOM events at activation boundaries. Expected: +4-6.
+
+30. **[pending] `logAll` config** — 1 test. Global config that logs each command via `console.log`. Expected: +1.
+
+### Bucket D: medium features (bigger commits, plan-first)
+
+31. **[pending] runtime null-safety error reporting** — 18 tests in `runtimeErrors`. When accessing `.foo` on nil, emit a structured error with position info. One coordinated fix in the compiler emit paths for property access, function calls, set/put. Expected: +15-18.
+
+32. **[pending] MutationObserver mock + `on mutation` dispatch** — 15 tests in `on`. Add MO mock to runner. Compile `on mutation [of attribute/childList/attribute-specific]`. Expected: +10-15.
+
+33. **[pending] cookie API** — 5 tests in `expressions/cookies`. `document.cookie` mock in runner + `the cookies` + `set the xxx cookie` keywords. Expected: +5.
+
+34. **[pending] event modifier DSL** — 8 tests in `on`. `elsewhere`, `every`, `first click`, count filters (`once / twice / 3 times`, ranges), `from elsewhere`. Expected: +6-8.
+
+35. **[pending] namespaced `def`** — 3 tests. `def ns.foo() ...` creates `ns.foo`. Expected: +3.
+
+### Bucket E: subsystems (DO NOT LOOP — human-driven)
+
+36. **[blocked: needs design] WebSocket + `socket` + rpc proxy** — 16 tests. Ship only with intentional design review.
+
+37. **[blocked: needs design] Tokenizer-as-API** — 17 tests. Expose tokens as inspectable SX data.
+
+38. **[blocked: needs design] SourceInfo API** — 4 tests. `(get line N)` / `(get source N)` metadata on compiled AST.
+
+39. **[blocked: needs design] WebWorker plugin** — 1 test.
+
+40. **[blocked: needs design] Fetch non-2xx / before-fetch event / real response object** — 7 tests. Sinon-level route mocks or real fetch interception.
+
+### Bucket F: generator translation gaps (after buckets A-D)
+
+Many tests are `SKIP (untranslated)` because `tests/playwright/generate-sx-tests.py` bailed with `return None`. These need patches to the generator to recognize more JS test patterns. Estimated ~25 recoverable tests. Defer to a dedicated generator-repair cluster once the queue above drains.
+
+---
+
+## Ground rules for the loop agent
+
+1. **One cluster per commit.** Don't batch. Short commit message: `HS: <cluster name> (+N tests)`.
+2. **Baseline first, verify at the end.** Before starting: record the current pass count for the target suite AND for one smoke range (0-195). After fixing: rerun both. Abort and mark blocked if:
+   - Target suite didn't improve by at least +1.
+   - Smoke range regressed (any test flipped pass → fail).
+3. **Never edit `.sx` files with `Edit`/`Read`/`Write`.** Use sx-tree MCP (`sx_read_subtree`, `sx_replace_node`, `sx_insert_child`, `sx_insert_near`, `sx_replace_by_pattern`, `sx_rename_symbol`, `sx_validate`, `sx_write_file`).
+4. **Sync WASM staging.** After every edit to `lib/hyperscript/<f>.sx`, run `cp lib/hyperscript/<f>.sx shared/static/wasm/sx/hs-<f>.sx`.
+5. **Never edit `spec/tests/test-hyperscript-behavioral.sx` directly.** Fix the generator or the runtime.
+6. **Scope:** `lib/hyperscript/**`, `shared/static/wasm/sx/hs-*`, `tests/hs-run-filtered.js`, `tests/playwright/generate-sx-tests.py`, `plans/hs-conformance-to-100.md`. Do not touch `spec/evaluator.sx`, the broader SX kernel, or unrelated files.
+7. **Commit even partial fixes.** If you get +N where N is less than expected, commit what you have and mark the cluster `done (+N) — partial, <what's left>`.
+8. **If stuck >30min on a cluster**, mark it `blocked (<reason>)` in the plan and move to the next pending cluster.
+9. **Branch: `architecture`.** Commit locally. Never push. Never touch `main`.
+10. **Log every iteration** in the Progress log below: one paragraph, what you touched, delta, commit SHA.
+
+## Known gotchas
+
+- `env-bind!` creates bindings; `env-set!` mutates existing ones.
+- SX `do` is R7RS iteration — use `begin` for multi-expr sequences.
+- `cond` / `when` / `let` clause bodies evaluate only the last expr — wrap in `begin`.
+- `list?` in SX checks for `{_type:'list'}` — it returns **false** on raw JS Arrays. `host-get node "children"` returns a JS Array in the mock, so recursion via `(list? kids)` silently drops nested elements.
+- `append!` on a list-valued scoped var (`:s`) requires `emit-set` in the compiler — done, see commit 1613f551.
+- When symbol target is `the-result`, also sync `it` (done, see emit-set).
+- Hypertrace tests (196, 199, 200) and query-template test (615) hang under 200k step limit — always filter around them.
+- `repeat forever` tests (1197, 1198) also hang.
+
+## Progress log
+
+(Reverse chronological — newest at top.)
+
+### 2026-04-23 — cluster fixes session baseline
+- **6b0334af** — `HS: remove bare @attr, set X @attr, JSON clean, FormEncoded, HTML join` (+3)
+- **1613f551** — `HS add/append: Set dedup, @attr support, when-clause result tracking` (+6)
+- Pre-loop baseline: 1213/1496 (81.1%).
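Ground rule 2 in the plan above (baseline first, verify at the end) reduces to a small decision over the four recorded pass counts. The sketch below is an illustration under stated assumptions rather than code from the repo: `commitGate` and the `{ suite, smoke }` shape are hypothetical names for the target-suite and smoke-range (0-195) pass counts.

```javascript
// Sketch only: `commitGate` and its argument shape are hypothetical.
// `before` and `after` each hold two pass counts:
//   suite = passes in the cluster's target suite
//   smoke = passes in the smoke range (tests 0-195)
function commitGate(before, after) {
  const delta = after.suite - before.suite;
  if (after.smoke < before.smoke) {
    return { commit: false, reason: 'smoke range regressed' };
  }
  if (delta < 1) {
    return { commit: false, reason: 'target suite did not improve' };
  }
  return { commit: true, delta }; // delta is the +N for the commit message
}
```

Comparing counts follows the briefing's "must be >= baseline" check. It is a weaker test than the plan's "any test flipped pass → fail" wording, because one new failure can be masked by one new pass; catching individual flips would need per-test results from the runner.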