# Hyperscript conformance → 100%

Goal: take the hyperscript upstream conformance suite from **1213/1496 (81%)** to a clean 100%. Queue-driven — single-agent loop on `architecture` branch, one cluster per commit.

## North star

```
Baseline: 1213/1496 (81.1%)
Target:   1496/1496
Gap:      283 tests (130 real fails + 153 SKIPs)
```

Track after each iteration via `mcp__hs-test__hs_test_run` on the relevant suite, not the whole thing (full runs take 10+min and include hanging tests — 196/199/200/615/1197/1198 hang under the 200k step limit).

## How to run tests

```
mcp__hs-test__hs_test_run(suite="hs-upstream-")  # fastest, one suite
mcp__hs-test__hs_test_run(start=0, end=195)      # early range
mcp__hs-test__hs_test_run(start=201, end=614)    # mid range (skip hypertrace hangs)
mcp__hs-test__hs_test_run(start=616, end=1196)   # late-1, skip repeat-forever hangs
mcp__hs-test__hs_test_run(start=1199)            # late-2 after hangs
```

## File layout

Runtime/compiler/parser live in `lib/hyperscript/*.sx`. The test runner at `tests/hs-run-filtered.js` loads `shared/static/wasm/sx/hs-*.sx` — **after every `.sx` edit you must `cp lib/hyperscript/.sx shared/static/wasm/sx/hs-.sx`**.

The test fixtures live in `spec/tests/test-hyperscript-behavioral.sx`, generated from `tests/playwright/generate-sx-tests.py`. **Never edit the behavioral.sx fixture directly** — fix the generator or the runtime.

## Cluster queue

Each cluster below is one commit. Order is rough — a loop agent may skip ahead if a predecessor is blocked.

**Status:** `pending` / `in-progress` / `done (+N)` / `blocked ()`.

### Parallel-worktree mode

When fanning out multiple clusters at once (Agent with `isolation: "worktree"`), each worktree agent:

1. Works on a fresh copy of the repo — no contention on the mutable tree.
2. Picks **one** cluster, runs the full loop (read, baseline, fix, sync WASM, verify smoke 0-195 + target suite, commit on the worktree's branch).
3. Leaves its branch + commit SHA for the orchestrator.
**Does not push**, does not update `plans/hs-conformance-to-100.md` or the scoreboard — those updates happen in the orchestrator's cherry-pick commit so the ledger stays linear.
4. Scope inside the worktree is unchanged (`lib/hyperscript/**`, `shared/static/wasm/sx/hs-*`, `tests/hs-run-filtered.js`, `tests/playwright/generate-sx-tests.py` + its regen output `spec/tests/test-hyperscript-behavioral.sx`). Do **not** edit the plan or the scoreboard inside the worktree — that's the orchestrator's job.

Orchestrator cherry-picks worktree commits onto `architecture` one at a time; resolves conflicts as they arrive (most will be trivial since each cluster lives in its own parser/compiler branch or in a different mock).

**Good candidates to parallelise:** clusters that touch disjoint surfaces — e.g. 26 (resize observer) and 27 (intersection observer) edit the same mock file but different class stubs; 25 (parenthesised commands) is parser-only; 30 (logAll config) is bootstrap/integration-only. Avoid fanning out clusters that all rewrite the same dispatch spot (`emit-set`, `parse-expr`) in the same commit.

**Cherry-pick footgun (observed 2026-04-24):** `sx-tree`'s pretty-printer reformats large regions when an edit lands in the middle of a big `let`/`fn` body. Two worktree commits whose logical diffs touch *different* defines in the same `.sx` file will still conflict textually because the pretty-print shuffles comments and indentation. Because `.sx` files can't be `Edit`-ed (hook blocks `Edit`/`Write`), conflict markers left by git are unrepairable. **Workaround:** when you see a conflict, abort the cherry-pick and re-apply the worktree commit surgically via `sx_replace_node`/`sx_insert_near` on the specific paths that changed. The logical diff is usually small (5–10 nodes); read it with `git show SHA file.sx` and apply it as a series of tree edits on top of current HEAD.

### Bucket A: runtime fixes, single-file (low risk, high yield)

1. **[done (+4)] fetch JSON unwrap** — `hs-upstream-fetch` 4 tests (`can do a simple fetch w/ json` + 3 variants) got `{:__host_handle N}`. Root: `hs-fetch` in `runtime.sx` returns raw host Response object instead of parsing JSON body. Fix: when format is `"json"`, unwrap via `host-get "_json"` and `json-parse`. Expected: +4.
2. **[done (+1)] element → HTML via outerHTML** — `asExpression / converts an element into HTML` (1 test) + unlocks response fetches. Mock DOM `El` class in `tests/hs-run-filtered.js` has no `outerHTML` getter. Add a getter computed from `tagName` + `attributes` + `children` (recurse). Expected: +1 direct, + knock-on in fetch.
3. **[done (+2)] Values dict insertion order** — `asExpression / Values | FormEncoded` + `| JSONString` (2 tests) — form fields come out `lastName, phone, firstName, areaCode`. Root: `hs-values-absorb` in `runtime.sx` uses `dict-set!` but keys iterate in non-insertion order. Investigate `hs-gather-form-nodes` walk — the recursive `kids` traversal silently fails when `children` is a JS Array (not sx-list), so nested inputs arrive via a different path. Fix: either coerce children to sx-list at the gather boundary OR rewrite gather to explicitly use sx-level iteration helpers. Expected: +2.
4. **[done (+3)] `not` precedence over `or`** — `expressions/not` 3 tests (`not has higher precedence than or`, `not with numeric truthy/falsy`, `not with string truthy/falsy`). Check parser precedence — `not` should bind tighter than `or`. Fix in `parser.sx` expression-level precedence. Expected: +3.
5. **[done (+1)] `some` selector for nonempty match** — `expressions/some / some returns true for nonempty selector` (1 test). `some .class` probably returns the list, not a boolean. Runtime fix. Expected: +1.
6. **[done (+2)] string template `${x}`** — `expressions/strings / string templates work w/ props` + `w/ braces` (2 tests). Template interpolation isn't substituting property accesses. Check `hs-template` runtime. Expected: +2.
7. **[done (+1) — partial, 3 tests remain: inserted-button handler doesn't fire for afterbegin/innerHTML paths; might need targeted trace of hs-boot-subtree! or _setInnerHTML timing] `put` hyperscript reprocessing** — `put / properly processes hyperscript at end/start/content/symbol` (4 tests, all `Expected 42, got 40`). After a put operation, newly inserted HS scripts aren't being activated. Fix: `hs-put-at!` should `hs-boot-subtree!` on the target after DOM insertion. Expected: +4.
8. **[done (+1)] `select returns selected text`** (1 test, `hs-upstream-select`). Runtime `hs-get-selection` helper reads `window.__test_selection` stash (or falls back to real `window.getSelection().toString()`). Compiler rewrites `(ref "selection")` to `(hs-get-selection)`. Generator detects the `createRange` / `setStart` / `setEnd` / `addRange` block and emits a single `(host-set! ... __test_selection ...)` op with the resolved text slice of the target element. Expected: +1.
9. **[done (+4)] `wait on event` basics** — `wait / can wait on event`, `on another element`, `waiting ... sets it to the event`, `destructure properties in a wait` (4 tests). Event-waiter suspension issue. Expected: +3-4.
10. **[done (+1)] `swap` variable ↔ property** — `swap / can swap a variable with a property` (1 test). Swap command doesn't handle mixed var/prop targets. Expected: +1.
11. **[done (+4)] `hide` strategy** — `hide / can configure hidden as default`, `can hide with custom strategy`, `can set default to custom strategy`, `hide element then show element retains original display` (4 tests). Strategy config plumbing. Expected: +3-4.
12. **[done (+2)] `show` multi-element + display retention** — `show / can show multiple elements with inline-block`, `can filter over a set of elements using the its symbol` (2 tests). Expected: +2.
13. **[done (+2) — partial, `can toggle for a fixed amount of time` needs an async mock scheduler (sync io-sleep collapses the toggle/un-toggle into one click frame)] `toggle` multi-class + timed + until-event** — `toggle` (3 assertion-fail tests). Expected: +3.
14. **[done (+1)] `unless` modifier** — `unlessModifier / unless can conditionally execute` (1 test). Parser/compiler addition. Expected: +1.
15. **[done (+2) — partial, `can use initial to transition to original value` needs `on click N` count-filtered events (same sync-mock block as clusters 11/13)] `transition` query-ref + multi-prop + initial** — `transition` 3 tests. Expected: +2-3.
16. **[done (+1)] `send can reference sender`** — 1 assertion fail. Expected: +1.
17. **[blocked: tell semantics are subtle — `me` should stay as the original element for explicit `to me` writes but the implicit default for bare `add .bar` inside `tell X` should be X. Attempted just leaving `you`/`yourself` scoped (dropping the `me` shadow) regressed 4 passing tests (`restores proper implicit me`, `works with an array`, etc.) which rely on bare commands using `me` as told-target. Proper fix requires a `beingTold` symbol distinct from `me`, with bare commands compiling to `beingTold-or-me` and explicit `me` always the original — more than a 30-min cluster budget.] `tell` semantics** — `tell / attributes refer to the thing being told`, `does not overwrite me symbol`, `your symbol represents thing being told` (3 tests). Expected: +3.
18. **[done (+2)] `throw respond async/sync`** — `throw / can respond to async/sync exceptions in event handler` (2 tests). Expected: +2.

### Bucket B: parser/compiler additions (medium risk, shared files)

19. **[done (+13)] `pick` regex + indices** — `pick` 13 tests. Regex match, flags, `of` syntax, start/end, negative indices. Big enough that a single commit might fail — break into pick-regex and pick-indices if needed. Expected: +10-13.
20. **[done (+3)] `repeat` property for-loops + where** — `repeat / basic property for loop`, `can nest loops`, `where clause can use the for loop variable name` (3 tests). Expected: +3.
21. **[done (+1)] `possessiveExpression` property access via its** — `possessive / can access its properties` (1 test, Expected `foo` got ``). Expected: +1.
22. **[done (+1)] window global fn fallback** — `regressions / can invoke functions w/ numbers in name` + `can refer to function in init blocks`. Added `host-call-fn` FFI primitive (commit 337c8265), `hs-win-call` runtime helper, simplified compiler emit (direct hs-win-call, no guard), `def` now also registers fn on `window[name]`. Generator: fixed `\"` escaping in hs-compile string literals. Expected: +2-4.
23. **[done (+1)] `me symbol works in from expressions`** — `regressions` (1 test, Expected `Foo`). Check `from` expression compilation. Expected: +1.
24. **[done (+1)] `properly interpolates values 2`** — URL interpolation regression (1 test). Likely template string + property access. Expected: +1.
25. **[done (+1)] `can support parenthesized commands and features`** — `parser` (1 test, Expected `clicked`). Parser needs to accept `(cmd...)` grouping in more contexts. Expected: +1.

### Bucket C: feature stubs (DOM observer mocks)

26. **[done (+3)] resize observer mock + `on resize`** — 3 tests. Add a minimal `ResizeObserver` mock to `hs-run-filtered.js`, plus parse/compile `on resize`. Expected: +3.
27. **[done (+3)] intersection observer mock + `on intersection`** — 3 tests. Mock `IntersectionObserver`; compile `on intersection` with margin/threshold modifiers. Expected: +3.
28. **[done (+4)] `ask`/`answer` + prompt/confirm mock** — `askAnswer` 4 tests. **Requires test-name-keyed mock**: first test wants `confirm → true`, second `confirm → false`, third `prompt → "Alice"`, fourth `prompt → null`. Keyed via `_current-test-name` in the runner. Expected: +4.
29. **[done (+2) — partial, 4 parser-error tests remain (basic parse error messages, parse-error event, EOF newline crash, evaluate-api-first-error). All require stricter parser error-rejection — `add - to` currently parses silently to `(set! nil (hs-add-to! (- 0 nil) nil))`, `on click blargh end on mouseenter also_bad` parses silently to `(do (hs-on me "click" (fn (event) blargh)) (hs-on me "mouseenter" (fn (event) also_bad)))`. Plus emit-error-collection runtime + hyperscript:parse-error event with detail.errors. Larger than a single cluster budget; recommend bucket-D plan-first.] `hyperscript:before:init` / `:after:init` / `:parse-error` events** — 6 tests in `bootstrap` + `parser`. Fire DOM events at activation boundaries. Expected: +4-6.
30. **[done (+1)] `logAll` config** — 1 test. Global config that console.log's each command. Expected: +1.

### Bucket D: medium features (bigger commits, plan-first)

31. **[blocked: Bucket-D plan-first scope, doesn't fit one cluster budget. All 18 tests are SKIP (untranslated) — generator has no `error("HS")` helper. Required pieces: (a) generator-side `eval-hs-error` helper + recognizer for `expect(await error("HS")).toBe("MSG")` blocks; (b) runtime helpers `hs-null-error!` / `hs-named-target` / `hs-named-target-list` raising `'' is null`; (c) compiler patches at every target-position `(query SEL)` emit to wrap in named-target carrying the original selector source — that's ~17 command emit paths (add, remove, hide, show, measure, settle, trigger, send, set, default, increment, decrement, put, toggle, transition, append, take); (d) function-call null-check at bare `(name)`, `hs-method-call`, and `host-get` chains, deriving the leftmost-uncalled-name `'x'` / `'x.y'` from the parse tree; (e) possessive-base null-check (`set x's y to true` → `'x' is null`). Each piece is straightforward in isolation but the cross-cutting compiler change touches every emit path and needs a coordinated design pass.
Recommend a dedicated design doc + multi-commit worktree like buckets E36-E40.] runtime null-safety error reporting** — 18 tests in `runtimeErrors`. When accessing `.foo` on nil, emit a structured error with position info. One coordinated fix in the compiler emit paths for property access, function calls, set/put. Expected: +15-18.
32. **[done (+7)] MutationObserver mock + `on mutation` dispatch** — 7 tests in `on`. Add MO mock to runner. Compile `on mutation [of attribute/childList/attribute-specific]`. Expected: +10-15.
33. **[done (+4) — partial, 1 test remains: `iterate cookies values work` needs `hs-for-each` to recognise host-array/proxy collections (currently `(list? collection)` returns false for the JS Proxy so the loop body never runs). Out of scope.] cookie API** — 5 tests in `expressions/cookies`. `document.cookie` mock in runner + `the cookies` + `set the xxx cookie` keywords. Expected: +5.
34. **[done (+7) — partial, 1 test remains: `every` keyword multi-handler-execute test needs handler-queue semantics where `wait for X` doesn't block subsequent invocations of the same handler — current `hs-on-every` shares the same dom-listen plumbing as `hs-on` and queues events implicitly via JS event loop, so the third synthetic click waits for the prior handler's `wait for customEvent` to settle. Out of single-cluster scope.] event modifier DSL** — 8 tests in `on`. `elsewhere`, `every`, `first click`, count filters (`once / twice / 3 times`, ranges), `from elsewhere`. Expected: +6-8.
35. **[done (+3)] namespaced `def`** — 3 tests. `def ns.foo() ...` creates `ns.foo`. Expected: +3.

### Bucket E: subsystems (DO NOT LOOP — human-driven)

All five have design docs on their own worktree branches pending review + merge. After merge, status flips to `design-ready` and they become eligible for the loop.

36. **[design-done, pending review — `plans/designs/e36-websocket.md` on `worktree-agent-a9daf73703f520257`] WebSocket + `socket`** — 16 tests.
Upstream shape is `socket NAME URL [with timeout N] [on message [as JSON] …] end` with an **implicit `.rpc` Proxy** (ES6 Proxy lives in JS, not SX), not `with proxy { send, receive }` as this row previously claimed. Design doc has 8-commit checklist, +12–16 delta estimate. Ship only with intentional design review.
37. **[design-done, pending review — `plans/designs/e37-tokenizer-api.md` on `worktree-agent-a6bb61d59cc0be8b4`] Tokenizer-as-API** — 17 tests. Expose tokens as inspectable SX data via `hs-tokens-of` / `hs-stream-token` / `hs-token-type` etc; type-map current `hs-tokenize` output to upstream SCREAMING_SNAKE_CASE. 8-step checklist, +16–17 delta.
38. **[design-done, pending review — `plans/designs/e38-sourceinfo.md` on `agent-e38-sourceinfo`] SourceInfo API** — 4 tests. Inline span-wrapper strategy (not side-channel dict) with compiler-entry unwrap. 4-commit plan.
39. **[design-done, pending review — `plans/designs/e39-webworker.md` on `hs-design-e39-webworker`] WebWorker plugin** — 1 test. Parser-only stub that errors with a link to upstream docs; no runtime, no mock Worker class. Hand-write the test (don't patch the generator). Single commit.
40. **[design-done, pending review — `plans/designs/e40-real-fetch.md` on `worktree-agent-a94612a4283eaa5e0`] Fetch non-2xx / before-fetch event / real response object** — 7 tests. SX-dict Response wrapper `{:_hs-response :ok :status :url :_body :_json :_html}`; restructured `hs-fetch` that always fetches wrapper then converts by format; test-name-keyed `_fetchScripts`. 11-step checklist. Watch for regression on cluster-1 JSON unwrap.

### Bucket F: generator translation gaps (after bucket A-D)

Many tests are `SKIP (untranslated)` because `tests/playwright/generate-sx-tests.py` bailed with `return None`. These need patches to the generator to recognize more JS test patterns. Estimated ~25 recoverable tests. Defer to a dedicated generator-repair cluster once the queue above drains.
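
A minimal sketch of the recognizer-or-bail shape that Bucket F describes, written in Python because the generator is a Python script. Everything here is an assumption for illustration: the internals of `tests/playwright/generate-sx-tests.py` are not shown in this plan, so `Recognizer`, `recognize_error_expect`, `translate`, and the emitted `assert=` form are hypothetical names. Only the `eval-hs-error` helper name and the `expect(await error("HS")).toBe("MSG")` pattern come from the cluster 31 notes.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical recognizer contract: take the JS test body, return SX source,
# or None when the pattern is not understood (the "bail" case).
Recognizer = Callable[[str], Optional[str]]

# Matches: expect(await error("HS SOURCE")).toBe("MESSAGE")
# Each quoted group tolerates backslash-escaped characters.
_ERROR_EXPECT = re.compile(
    r'expect\(\s*await\s+error\(\s*"((?:[^"\\]|\\.)*)"\s*\)\s*\)'
    r'\.toBe\(\s*"((?:[^"\\]|\\.)*)"\s*\)'
)


def recognize_error_expect(js_body: str) -> Optional[str]:
    """Translate one error-expectation block, or bail with None."""
    match = _ERROR_EXPECT.search(js_body)
    if match is None:
        return None
    hs_source, message = match.group(1), match.group(2)
    # `assert=` is an illustrative SX form, not a confirmed fixture helper.
    return f'(assert= (eval-hs-error "{hs_source}") "{message}")'


def translate(js_body: str, recognizers: Sequence[Recognizer]) -> Optional[str]:
    """Try each recognizer in order; None means SKIP (untranslated)."""
    for recognizer in recognizers:
        sx_source = recognizer(js_body)
        if sx_source is not None:
            return sx_source
    return None
```

Under this shape, each recognizer added to the list would move one family of `SKIP (untranslated)` tests into real deftests, while any input that no recognizer claims still falls through to `None`.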

---

## Ground rules for the loop agent

1. **One cluster per commit.** Don't batch. Short commit message: `HS: (+N tests)`.
2. **Baseline first, verify at the end.** Before starting: record the current pass count for the target suite AND for one smoke range (0-195). After fixing: rerun both. Abort and mark blocked if:
   - Target suite didn't improve by at least +1.
   - Smoke range regressed (any test flipped pass → fail).
3. **Never edit `.sx` files with `Edit`/`Read`/`Write`.** Use sx-tree MCP (`sx_read_subtree`, `sx_replace_node`, `sx_insert_child`, `sx_insert_near`, `sx_replace_by_pattern`, `sx_rename_symbol`, `sx_validate`, `sx_write_file`).
4. **Sync WASM staging.** After every edit to `lib/hyperscript/.sx`, run `cp lib/hyperscript/.sx shared/static/wasm/sx/hs-.sx`.
5. **Never edit `spec/tests/test-hyperscript-behavioral.sx` directly.** Fix the generator or the runtime.
6. **Scope:** `lib/hyperscript/**`, `shared/static/wasm/sx/hs-*`, `tests/hs-run-filtered.js`, `tests/playwright/generate-sx-tests.py`, `plans/hs-conformance-to-100.md`. Do not touch `spec/evaluator.sx`, the broader SX kernel, or unrelated files.
7. **Commit even partial fixes.** If you get +N where N is less than expected, commit what you have and mark the cluster `done (+N) — partial, `.
8. **If stuck >30min on a cluster**, mark it `blocked ()` in the plan and move to the next pending cluster.
9. **Branch: `architecture`.** Commit locally. Never push. Never touch `main`.
10. **Log every iteration** in the Progress log below: one paragraph, what you touched, delta, commit SHA.
11. **Update the scoreboard** at `plans/hs-conformance-scoreboard.md` in the SAME plan-update commit: bump the `Merged:` line, update the row's `Status` / `Δ` / `Commit`, and adjust the buckets roll-up counts.
12. **Also expand scope to include** `plans/hs-conformance-scoreboard.md` (for rule 6 purposes).

## Known gotchas

- `env-bind!` creates bindings; `env-set!` mutates existing ones.
- SX `do` is R7RS iteration — use `begin` for multi-expr sequences.
- `cond` / `when` / `let` clause bodies evaluate only the last expr — wrap in `begin`.
- `list?` in SX checks for `{_type:'list'}` — it returns **false** on raw JS Arrays. `host-get node "children"` returns a JS Array in the mock, so recursion via `(list? kids)` silently drops nested elements.
- `append!` on a list-valued scoped var (`:s`) requires `emit-set` in the compiler — done, see commit 1613f551.
- When symbol target is `the-result`, also sync `it` (done, see emit-set).
- Hypertrace tests (196, 199, 200) and query-template test (615) hang under 200k step limit — always filter around them.
- `repeat forever` tests (1197, 1198) also hang.

## Progress log

(Reverse chronological — newest at top.)

### 2026-04-25 — Bucket F: in-expression filter semantics (+1)

- **67a5f137** — `HS: in-expression filter semantics (+1 test)`. `1 in [1, 2, 3]` was returning boolean `true` instead of the filtered list `(list 1)`. Root cause: `in?` compiled to `hs-contains?` which returns boolean for scalar items. Fix: (a) `runtime.sx` adds `hs-in?` returning filtered list for all cases, plus `hs-in-bool?` which wraps with `(not (hs-falsy? ...))` for boolean contexts; (b) `compiler.sx` changes `in?` clause to emit `(hs-in? collection item)` and adds new `in-bool?` clause emitting `(hs-in-bool? collection item)`; (c) `parser.sx` changes `is in` and `am in` comparison forms to produce `in-bool?` so those stay boolean. Suite hs-upstream-expressions/in: 8/9 → 9/9. Smoke 0-195: 173/195 unchanged.

### 2026-04-25 — cluster 22 window global fn fallback (+1)

- **d31565d5** — `HS cluster 22: simplify win-call emit + def→window + init-blocks test (+1)`. Two-part change building on 337c8265 (host-call-fn FFI + hs-win-call runtime).
(a) `compiler.sx` removes the guard wrapper from bare-call and method-call `hs-win-call` emit paths — direct `(hs-win-call name (list args))` is sufficient since hs-win-call returns nil for unknown names; `def` compilation now also emits `(host-set! (host-global "window") name fn)` so every HS-defined function is reachable via window lookup. (b) `generate-sx-tests.py` fixes a quoting bug: `\"here\"` was being embedded as three SX nodes (`""` + symbol + `""`) instead of a single escaped-quote string; fixed with `\\\"` escaping. Hand-rolled deftest for `can refer to function in init blocks` now passes. Suite hs-upstream-core/regressions: 13/16 → 14/16. Smoke 0-195: 172/195 → 173/195.

### 2026-04-25 — cluster 11/33 followups: hide strategy + cookie clear (+2)

- **5ff2b706** — `HS: cluster 11/33 followups (+2 tests)`. Three orthogonal fixes that pick up tests now unblocked by earlier work. (a) `parser.sx` `parse-hide-cmd`/`parse-show-cmd`: added `on` to the keyword set that flips the implicit-`me` target. Previously `on click 1 hide on click 2 show` silently parsed as `(hs-hide! nil ...)` because `parse-expr` started consuming `on` and returned nil; now hide/show recognise a sibling feature and default to `me`. (b) `runtime.sx` `hs-method-call` fallback for non-built-in methods: SX-callables (lambdas) call via `apply`, JS-native functions (e.g. `cookies.clear`) dispatch via `(apply host-call (cons obj (cons method args)))` so the native receives the args list. (c) Generator `hs-cleanup!` body wrapped in `begin` (fn body evaluates only the last expr) and now resets `hs-set-default-hide-strategy! nil` + `hs-set-log-all! false` between tests — the prior `can set default to custom strategy` cluster-11 test had been leaking `_hs-default-hide-strategy` into the rest of the suite, breaking `hide element then show element retains original display`. New cluster-33 hand-roll for `basic clear cookie values work` exercises the method-call fallback.
Suite hs-upstream-hide: 15/16 → 16/16. Suite hs-upstream-expressions/cookies: 3/5 → 4/5. Smoke 0-195 unchanged at 172/195.

### 2026-04-25 — cluster 35 namespaced def + script-tag globals (+3)

- **122053ed** — `HS: namespaced def + script-tag global functions (+3 tests)`. Two-part change: (a) `runtime.sx` `hs-method-call` gains a fallback for unknown methods — `(let ((fn-val (host-get obj method))) (if (callable? fn-val) (apply fn-val args) nil))`. This lets `utils.foo()` dispatch through `(host-get utils "foo")` when `utils` is an SX dict whose `foo` is an SX lambda. (b) Generator hand-rolls 3 deftests since the SX runtime has no `