Hyperscript conformance → 100%

Goal: take the hyperscript upstream conformance suite from 1213/1496 (81%) to a clean 100%. Queue-driven — single-agent loop on architecture branch, one cluster per commit.

North star

Baseline: 1213/1496 (81.1%)
Target:   1496/1496
Gap:      283 tests  (130 real fails + 153 SKIPs)

Track after each iteration via mcp__hs-test__hs_test_run on the relevant suite, not the whole thing (full runs take 10+min and include hanging tests — 196/199/200/615/1197/1198 hang under the 200k step limit).

How to run tests

mcp__hs-test__hs_test_run(suite="hs-upstream-<cluster>")        # fastest, one suite
mcp__hs-test__hs_test_run(start=0, end=195)                     # early range
mcp__hs-test__hs_test_run(start=201, end=614)                   # mid range (skip hypertrace hangs)
mcp__hs-test__hs_test_run(start=616, end=1196)                  # late-1, skip repeat-forever hangs
mcp__hs-test__hs_test_run(start=1199)                           # late-2 after hangs
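The ranges above can be derived mechanically from the list of hanging test indices. A minimal sketch, assuming 0-based indices and inclusive start/end bounds (this plan confirms neither); safeRanges and minLen are hypothetical names, not part of the hs-test MCP tool.

```javascript
// Hypothetical helper: split 0..total-1 into runnable ranges that avoid known hangs.
// ASSUMPTION: `end` is inclusive; adjust if hs_test_run treats it as exclusive.
const HANGS = [196, 199, 200, 615, 1197, 1198];

function safeRanges(total, hangs, minLen = 1) {
  const sorted = [...hangs].sort((a, b) => a - b);
  const ranges = [];
  let start = 0;
  for (const h of sorted) {
    // Emit the stretch before this hang, if there is one.
    if (h - 1 >= start) ranges.push({ start, end: h - 1 });
    start = Math.max(start, h + 1);
  }
  // Tail after the last hang.
  if (start <= total - 1) ranges.push({ start, end: total - 1 });
  // Drop slivers between adjacent hangs (e.g. 197-198) when minLen says so.
  return ranges.filter((r) => r.end - r.start + 1 >= minLen);
}

console.log(safeRanges(1496, HANGS, 3));
```

With minLen = 3 the two-test sliver between 196 and 199 is dropped, which matches the four ranges listed above.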

File layout

Runtime/compiler/parser live in lib/hyperscript/*.sx. The test runner at tests/hs-run-filtered.js loads shared/static/wasm/sx/hs-*.sx, so after every .sx edit you must cp lib/hyperscript/<file>.sx shared/static/wasm/sx/hs-<file>.sx.
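The copy step is easy to forget, so it may be worth scripting. A minimal Node sketch of the path mapping described above; stagingPath and syncStaging are hypothetical helper names, not existing scripts in this repo.

```javascript
// Hypothetical helper: map lib/hyperscript/<file>.sx to its WASM staging copy.
function stagingPath(src) {
  const base = src.split("/").pop(); // e.g. "runtime.sx"
  return "shared/static/wasm/sx/hs-" + base;
}

// Copy one edited source file into staging (paths relative to the repo root).
function syncStaging(src) {
  const fs = require("fs"); // loaded lazily so stagingPath stays dependency-free
  fs.copyFileSync(src, stagingPath(src));
}
```

For example, stagingPath("lib/hyperscript/runtime.sx") maps to shared/static/wasm/sx/hs-runtime.sx.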

The test fixtures live in spec/tests/test-hyperscript-behavioral.sx, generated from tests/playwright/generate-sx-tests.py. Never edit the behavioral.sx fixture directly — fix the generator or the runtime.

Cluster queue

Each cluster below is one commit. Order is rough — a loop agent may skip ahead if a predecessor is blocked. Status: pending / in-progress / done (+N) / blocked (<reason>).

Bucket A: runtime fixes, single-file (low risk, high yield)

  1. [done (+4)] fetch JSON unwrap: hs-upstream-fetch, 4 tests (can do a simple fetch w/ json + 3 variants) got {:__host_handle N}. Root: hs-fetch in runtime.sx returns the raw host Response object instead of parsing the JSON body. Fix: when format is "json", unwrap via host-get "_json" and json-parse. Expected: +4.

  2. [done (+1)] element → HTML via outerHTML: asExpression / converts an element into HTML (1 test) + unlocks response fetches. The mock DOM El class in tests/hs-run-filtered.js has no outerHTML getter. Add a getter computed from tagName + attributes + children (recurse). Expected: +1 direct, plus knock-on in fetch.

  3. [done (+2)] Values dict insertion order: asExpression / Values | FormEncoded + | JSONString (2 tests). Form fields come out lastName, phone, firstName, areaCode. Root: hs-values-absorb in runtime.sx uses dict-set! but keys iterate in non-insertion order. Investigate the hs-gather-form-nodes walk: the recursive kids traversal silently fails when children is a JS Array (not sx-list), so nested inputs arrive via a different path. Fix: either coerce children to sx-list at the gather boundary OR rewrite gather to explicitly use sx-level iteration helpers. Expected: +2.

  4. [pending] not precedence over or: expressions/not, 3 tests (not has higher precedence than or, not with numeric truthy/falsy, not with string truthy/falsy). Check parser precedence: not should bind tighter than or. Fix in parser.sx expression-level precedence. Expected: +3.

  5. [pending] some selector for nonempty match: expressions/some / some returns true for nonempty selector (1 test). some .class probably returns the list, not a boolean. Runtime fix. Expected: +1.

  6. [pending] string template ${x}: expressions/strings / string templates work w/ props + w/ braces (2 tests). Template interpolation isn't substituting property accesses. Check hs-template runtime. Expected: +2.

  7. [pending] put hyperscript reprocessing: put / properly processes hyperscript at end/start/content/symbol (4 tests, all Expected 42, got 40). After a put operation, newly inserted HS scripts aren't being activated. Fix: hs-put-at! should call hs-boot-subtree! on the target after DOM insertion. Expected: +4.

  8. [pending] select returns selected text (1 test, hs-upstream-select). Likely select command needs to return window.getSelection().toString() equivalent. Add host-call to selection API in mock. Expected: +1.

  9. [pending] wait on event basics: wait / can wait on event, on another element, waiting ... sets it to the event, destructure properties in a wait (4 tests). Event-waiter suspension issue. Expected: +3-4.

  10. [pending] swap variable ↔ property: swap / can swap a variable with a property (1 test). Swap command doesn't handle mixed var/prop targets. Expected: +1.

  11. [pending] hide strategy: hide / can configure hidden as default, can hide with custom strategy, can set default to custom strategy, hide element then show element retains original display (4 tests). Strategy config plumbing. Expected: +3-4.

  12. [pending] show multi-element + display retention: show / can show multiple elements with inline-block, can filter over a set of elements using the its symbol (2 tests). Expected: +2.

  13. [pending] toggle multi-class + timed + until-event: toggle (3 assertion-fail tests). Expected: +3.

  14. [pending] unless modifier: unlessModifier / unless can conditionally execute (1 test). Parser/compiler addition. Expected: +1.

  15. [pending] transition query-ref + multi-prop + initial: transition, 3 tests. Expected: +2-3.

  16. [pending] send can reference sender — 1 assertion fail. Expected: +1.

  17. [pending] tell semantics: tell / attributes refer to the thing being told, does not overwrite me symbol, your symbol represents thing being told (3 tests). Expected: +3.

  18. [pending] throw respond async/sync: throw / can respond to async/sync exceptions in event handler (2 tests). Expected: +2.

Bucket B: parser/compiler additions (medium risk, shared files)

  1. [pending] pick regex + indices: pick, 13 tests. Regex match, flags, of syntax, start/end, negative indices. Big enough that a single commit might fail; break into pick-regex and pick-indices if needed. Expected: +10-13.

  2. [pending] repeat property for-loops + where: repeat / basic property for loop, can nest loops, where clause can use the for loop variable name (3 tests). Expected: +3.

  3. [pending] possessiveExpression property access via its: possessive / can access its properties (1 test, Expected foo got ``). Expected: +1.

  4. [pending] window global fn fallback: regressions / can invoke functions w/ numbers in name + unlocks several others. When calling foo() where foo isn't SX-defined, fall back to (host-global "foo"). Design decision: either compile-time emit (or foo (host-global "foo")) via a helper, or add runtime lookup in the dispatch path. Expected: +2-4.

  5. [pending] me symbol works in from expressions: regressions (1 test, Expected Foo). Check from expression compilation. Expected: +1.

  6. [pending] properly interpolates values 2 — URL interpolation regression (1 test). Likely template string + property access. Expected: +1.

  7. [pending] can support parenthesized commands and features: parser (1 test, Expected clicked). Parser needs to accept (cmd...) grouping in more contexts. Expected: +1.
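For item 4 above (window global fn fallback), the runtime-lookup option can be pictured as a two-step resolution: local definitions first, host globals second. This is a sketch in plain JavaScript, not the SX dispatch path; resolveCallable, scope, and host are hypothetical names standing in for the SX environment and (host-global ...).

```javascript
// Hypothetical model of the runtime-lookup option: prefer script-defined
// functions, then fall back to a function of the same name on the host global.
function resolveCallable(name, scope, host = globalThis) {
  if (Object.prototype.hasOwnProperty.call(scope, name)) return scope[name];
  const fromHost = host[name];
  if (typeof fromHost === "function") return fromHost;
  throw new ReferenceError(`'${name}' is not defined`);
}

// Example: a host function with a number in its name, as in the regression test title.
const fakeWindow = { select2: () => "from host" };
const defs = { greet: () => "from script" };
console.log(resolveCallable("greet", defs, fakeWindow)());
console.log(resolveCallable("select2", defs, fakeWindow)());
```

The compile-time alternative would emit the same fallback once per call site instead; which one fits better depends on how often call targets are rebound at runtime.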

Bucket C: feature stubs (DOM observer mocks)

  1. [pending] resize observer mock + on resize — 3 tests. Add a minimal ResizeObserver mock to hs-run-filtered.js, plus parse/compile on resize. Expected: +3.

  2. [pending] intersection observer mock + on intersection — 3 tests. Mock IntersectionObserver; compile on intersection with margin/threshold modifiers. Expected: +3.

  3. [pending] ask/answer + prompt/confirm mock: askAnswer, 4 tests. Requires test-name-keyed mock: first test wants confirm → true, second confirm → false, third prompt → "Alice", fourth prompt → null. Keyed via _current-test-name in the runner. Expected: +4.

  4. [pending] hyperscript:before:init / :after:init / :parse-error events — 6 tests in bootstrap + parser. Fire DOM events at activation boundaries. Expected: +4-6.

  5. [pending] logAll config — 1 test. Global config that console.log's each command. Expected: +1.
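The test-name-keyed mock from item 3 above could look roughly like the following. The test names are placeholders (the real keys would be the upstream askAnswer test titles), and currentTestName stands in for the runner's _current-test-name; makeDialogMocks is a hypothetical helper.

```javascript
// Placeholder keys: replace with the actual upstream test titles.
const DIALOG_ANSWERS = {
  "confirm yes (placeholder)": { confirm: true },
  "confirm no (placeholder)": { confirm: false },
  "prompt value (placeholder)": { prompt: "Alice" },
  "prompt cancel (placeholder)": { prompt: null },
};

let currentTestName = ""; // stands in for the runner's _current-test-name

function makeDialogMocks(getTestName) {
  // Look up the canned answer for the running test, else use a safe default.
  const lookup = (kind, fallback) => {
    const entry = DIALOG_ANSWERS[getTestName()];
    return entry && kind in entry ? entry[kind] : fallback;
  };
  return {
    confirm: (_message) => lookup("confirm", false),
    prompt: (_message) => lookup("prompt", null),
  };
}

const dialogs = makeDialogMocks(() => currentTestName);
```

Reading the test name through a callback keeps the mocks installable once at startup while still answering per test.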

Bucket D: medium features (bigger commits, plan-first)

  1. [pending] runtime null-safety error reporting — 18 tests in runtimeErrors. When accessing .foo on nil, emit a structured error with position info. One coordinated fix in the compiler emit paths for property access, function calls, set/put. Expected: +15-18.

  2. [pending] MutationObserver mock + on mutation dispatch — 15 tests in on. Add MO mock to runner. Compile on mutation [of attribute/childList/attribute-specific]. Expected: +10-15.

  3. [pending] cookie API — 5 tests in expressions/cookies. document.cookie mock in runner + the cookies + set the xxx cookie keywords. Expected: +5.

  4. [pending] event modifier DSL — 8 tests in on. elsewhere, every, first click, count filters (once / twice / 3 times, ranges), from elsewhere. Expected: +6-8.

  5. [pending] namespaced def — 3 tests. def ns.foo() ... creates ns.foo. Expected: +3.

Bucket E: subsystems (DO NOT LOOP — human-driven)

  1. [blocked: needs design] WebSocket + socket + rpc proxy — 16 tests. Ship only with intentional design review.

  2. [blocked: needs design] Tokenizer-as-API — 17 tests. Expose tokens as inspectable SX data.

  3. [blocked: needs design] SourceInfo API — 4 tests. (get line N) / (get source N) metadata on compiled AST.

  4. [blocked: needs design] WebWorker plugin — 1 test.

  5. [blocked: needs design] Fetch non-2xx / before-fetch event / real response object — 7 tests. Sinon-level route mocks or real fetch interception.

Bucket F: generator translation gaps (after bucket A-D)

Many tests are SKIP (untranslated) because tests/playwright/generate-sx-tests.py bailed with return None. These need patches to the generator to recognize more JS test patterns. Estimated ~25 recoverable tests. Defer to a dedicated generator-repair cluster once the queue above drains.


Ground rules for the loop agent

  1. One cluster per commit. Don't batch. Short commit message: HS: <cluster name> (+N tests).
  2. Baseline first, verify at the end. Before starting: record the current pass count for the target suite AND for one smoke range (0-195). After fixing: rerun both. Abort and mark blocked if:
    • Target suite didn't improve by at least +1.
    • Smoke range regressed (any test flipped pass → fail).
  3. Never edit .sx files with Edit/Read/Write. Use sx-tree MCP (sx_read_subtree, sx_replace_node, sx_insert_child, sx_insert_near, sx_replace_by_pattern, sx_rename_symbol, sx_validate, sx_write_file).
  4. Sync WASM staging. After every edit to lib/hyperscript/<f>.sx, run cp lib/hyperscript/<f>.sx shared/static/wasm/sx/hs-<f>.sx.
  5. Never edit spec/tests/test-hyperscript-behavioral.sx directly. Fix the generator or the runtime.
  6. Scope: lib/hyperscript/**, shared/static/wasm/sx/hs-*, tests/hs-run-filtered.js, tests/playwright/generate-sx-tests.py, plans/hs-conformance-to-100.md. Do not touch spec/evaluator.sx, the broader SX kernel, or unrelated files.
  7. Commit even partial fixes. If you get +N where N is less than expected, commit what you have and mark the cluster done (+N) — partial, <what's left>.
  8. If stuck >30min on a cluster, mark it blocked (<reason>) in the plan and move to the next pending cluster.
  9. Branch: architecture. Commit locally. Never push. Never touch main.
  10. Log every iteration in the Progress log below: one paragraph, what you touched, delta, commit SHA.
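Rule 2 is a small decision procedure, sketched here to make the abort conditions unambiguous. The input shape (suitePass count plus a list of passing smoke-test ids) is an assumption about what the loop agent records, and judgeIteration is a hypothetical name.

```javascript
// before/after: { suitePass: number, smokePassing: number[] }
// Returns "commit" only when the target suite improved and no smoke test flipped pass → fail.
function judgeIteration(before, after) {
  const delta = after.suitePass - before.suitePass;
  const stillPassing = new Set(after.smokePassing);
  const regressed = before.smokePassing.filter((id) => !stillPassing.has(id));
  if (regressed.length > 0) {
    return { action: "blocked", reason: "smoke regressed: " + regressed.join(", "), delta };
  }
  if (delta < 1) {
    return { action: "blocked", reason: "target suite did not improve", delta };
  }
  return { action: "commit", delta };
}

console.log(judgeIteration(
  { suitePass: 11, smokePassing: [1, 2, 3] },
  { suitePass: 15, smokePassing: [1, 2, 3] }
));
```

Comparing smoke results by test id rather than by count matters: a run that fixes one smoke test and breaks another keeps the same total but still counts as a regression.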

Known gotchas

  • env-bind! creates bindings; env-set! mutates existing ones.
  • SX do is R7RS iteration — use begin for multi-expr sequences.
  • cond / when / let clause bodies evaluate only the last expr — wrap in begin.
  • list? in SX checks for {_type:'list'} — it returns false on raw JS Arrays. host-get node "children" returns a JS Array in the mock, so recursion via (list? kids) silently drops nested elements.
  • append! on a list-valued scoped var (:s) requires emit-set in the compiler — done, see commit 1613f551.
  • When symbol target is the-result, also sync it (done, see emit-set).
  • Hypertrace tests (196, 199, 200) and query-template test (615) hang under 200k step limit — always filter around them.
  • repeat forever tests (1197, 1198) also hang.
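The list? gotcha above is easier to see in isolation. The sketch below models it in plain JavaScript: the {_type: 'list'} tag comes from the note above, while the items field and the helper names (isSxList, toSxList, countNodes) are assumptions for illustration only.

```javascript
// SX lists are tagged objects; raw JS Arrays carry no tag.
const isSxList = (v) => v !== null && typeof v === "object" && v._type === "list";
// Coerce at the boundary: wrap array-likes, pass real SX lists through.
const toSxList = (v) => (isSxList(v) ? v : { _type: "list", items: Array.from(v ?? []) });

// Count a node and its descendants, recursing only when children "is a list".
function countNodes(node, coerce) {
  const kids = coerce ? toSxList(node.children) : node.children;
  if (!isSxList(kids)) return 1; // raw JS Array: recursion silently stops here
  return 1 + kids.items.reduce((n, k) => n + countNodes(k, coerce), 0);
}

// A form with two children, one of which has a nested input.
const form = { children: [{ children: [{ children: [] }] }, { children: [] }] };
console.log(countNodes(form, false)); // 1: nested elements dropped
console.log(countNodes(form, true));  // 4: all nodes visited
```

Nothing throws in the uncoerced case, which is why the bug shows up as missing or misordered fields rather than as an error.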

Progress log

(Reverse chronological — newest at top.)

2026-04-23 — cluster 3 Values dict insertion order

  • e59c0b8e HS: Values dict insertion order (+2 tests). Root cause was the OCaml kernel's dict implementation iterating keys in scrambled (non-insertion) order. Added _order hidden list tracked by hs-values-absorb, and taught hs-coerce FormEncoded/JSONString branches to iterate via _order when present (filtering the _order marker out). Suite hs-upstream-expressions/asExpression: 28/42 → 30/42. Smoke 0-195: 162/195 unchanged.
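The _order technique from this entry, modeled in JavaScript. UnorderedDict is a stand-in that returns keys sorted to mimic a dict with non-insertion iteration order; absorb and formEncode are hypothetical analogues of hs-values-absorb and the FormEncoded branch of hs-coerce, not their actual code, and the field values are made up.

```javascript
// Stand-in for a dict whose key iteration order is NOT insertion order.
class UnorderedDict {
  constructor() { this.data = new Map(); }
  set(k, v) { this.data.set(k, v); }
  get(k) { return this.data.get(k); }
  has(k) { return this.data.has(k); }
  keys() { return [...this.data.keys()].sort(); } // deliberately not insertion order
}

// Record first-insertion order in a hidden "_order" entry alongside the data.
function absorb(dict, name, value) {
  if (!dict.has("_order")) dict.set("_order", []);
  if (!dict.has(name)) dict.get("_order").push(name);
  dict.set(name, value);
}

// Iterate via _order when present, filtering the marker itself out of the output.
function formEncode(dict) {
  const keys = dict.has("_order") ? dict.get("_order") : dict.keys();
  return keys
    .filter((k) => k !== "_order")
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(dict.get(k))}`)
    .join("&");
}

const values = new UnorderedDict();
absorb(values, "firstName", "John");
absorb(values, "lastName", "Doe");
absorb(values, "areaCode", "555");
absorb(values, "phone", "1234");
console.log(formEncode(values));
```

The cost of the hidden key is that every consumer iterating the dict has to know to skip it, which is why the filter sits in the coercion path.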

2026-04-23 — cluster 2 element→HTML via outerHTML

  • e195b5bd HS: element → HTML via outerHTML (+1 test). Added an outerHTML getter on the mock El class in tests/hs-run-filtered.js. Merges .id/.className (host-set! targets) with .attributes, falls back to innerText/textContent. Suite hs-upstream-expressions/asExpression: 27/42 → 28/42. Smoke 0-195: 162/195 unchanged.
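For reference, a getter of the kind described in this entry might look like the sketch below. The El class here is a minimal stand-in, not the runner's actual mock: its field names and the attribute-map shape are assumptions, and void elements and text escaping are ignored.

```javascript
// Minimal stand-in for the runner's mock element class.
class El {
  constructor(tagName) {
    this.tagName = tagName.toUpperCase();
    this.id = "";
    this.className = "";
    this.attributes = {}; // assumed shape: name -> value
    this.children = [];
    this.textContent = "";
  }

  get outerHTML() {
    const tag = this.tagName.toLowerCase();
    // Merge property-backed attributes (.id, .className) with the explicit map.
    const attrs = { ...this.attributes };
    if (this.id) attrs.id = this.id;
    if (this.className) attrs.class = this.className;
    const attrText = Object.entries(attrs)
      .map(([k, v]) => ` ${k}="${String(v).replace(/&/g, "&amp;").replace(/"/g, "&quot;")}"`)
      .join("");
    // Recurse into children; fall back to text content when there are none.
    const inner = this.children.length > 0
      ? this.children.map((c) => c.outerHTML).join("")
      : this.textContent;
    return `<${tag}${attrText}>${inner}</${tag}>`;
  }
}

const div = new El("div");
div.id = "a";
div.className = "b c";
div.attributes["data-x"] = "1";
const span = new El("span");
span.textContent = "hi";
div.children.push(span);
console.log(div.outerHTML);
```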

2026-04-23 — cluster 1 fetch JSON unwrap

  • 39a597e9 HS: fetch JSON unwrap (+4 tests). Added hs-host-to-sx helper in runtime.sx that converts raw host-handle JS objects/arrays to proper SX dicts/lists via Object.keys/Array walks. hs-fetch now calls it on the result when format is "json". Distinguishes genuine SX dicts from host handles by checking (host-get v "_type") == "dict": genuine SX dicts have the marker, host handles don't. Suite hs-upstream-fetch: 11/23 → 15/23. Smoke 0-195: 162/195 unchanged.
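The conversion this entry describes is a recursive walk. A JavaScript model of it follows; hostToSx is a hypothetical analogue of hs-host-to-sx, and while the _type markers follow this plan, the entries and items field names are assumptions.

```javascript
// Convert a plain JS value (as parsed from a JSON body) into tagged SX-style values.
function hostToSx(v) {
  if (v === null || typeof v !== "object") return v;      // primitives pass through
  if (v._type === "dict" || v._type === "list") return v; // already an SX value
  if (Array.isArray(v)) return { _type: "list", items: v.map(hostToSx) };
  const entries = {};
  for (const k of Object.keys(v)) entries[k] = hostToSx(v[k]);
  return { _type: "dict", entries };
}

console.log(JSON.stringify(hostToSx({ name: "Joe", tags: ["a", "b"] })));
```

One caveat of marker-based detection: a JSON payload that itself contains a "_type" key set to "dict" or "list" would be mistaken for an SX value and returned unconverted.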

2026-04-23 — cluster fixes session baseline

  • 6b0334af HS: remove bare @attr, set X @attr, JSON clean, FormEncoded, HTML join (+3)
  • 1613f551 HS add/append: Set dedup, @attr support, when-clause result tracking (+6)
  • Pre-loop baseline: 1213/1496 (81.1%).