
giles 67d4b9dae5 HS-design: E38 SourceInfo API

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-24 07:08:02 +00:00


E38 — SourceInfo API (design)

Cluster 38 of plans/hs-conformance-to-100.md. Goal: get the 4 tests in hs-upstream-core/sourceInfo passing; they exercise _hyperscript.parse(src).sourceFor() and .lineFor().

Upstream reference: /tmp/hs-upstream/test/core/sourceInfo.js, /tmp/hs-upstream/src/parsetree/base.js (29 lines of impl — sourceFor() slices programSource[startToken.start..endToken.end], lineFor() returns programSource.split("\n")[startToken.line-1]).
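For reference, the two upstream methods can be modelled in a few lines of Python. This is a sketch, not the upstream JS. The Token shape (start, end, line) is an assumption standing in for upstream's token objects. The offsets were worked out by hand for the statement fixture used in test 4.

```python
from dataclasses import dataclass


@dataclass
class Token:
    start: int  # offset of the token's first character
    end: int    # offset one past its last character
    line: int   # 1-based line number


def source_for(program_source: str, start_token: Token, end_token: Token) -> str:
    # sourceFor(): slice from the first token's start to the last token's end.
    return program_source[start_token.start:end_token.end]


def line_for(program_source: str, start_token: Token) -> str:
    # lineFor(): the whole source line holding the node's first token (1-based).
    return program_source.split("\n")[start_token.line - 1]


src = "if true\n log 'it was true'\n log 'it was true'"
if_tok = Token(start=0, end=2, line=1)
true_tok = Token(start=3, end=7, line=1)
log_tok = Token(start=9, end=12, line=2)
print(source_for(src, if_tok, true_tok))
print(line_for(src, log_tok))
```

Note that lineFor returns the raw line, including its leading space. It does not return the node's own slice.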

1. Failing tests

All four currently SKIP (untranslated) (lines 2434–2442 of spec/tests/test-hyperscript-behavioral.sx).

| # | Test name | What it asserts |
|---|---|---|
| 1 | `debug` | `parse("<button.foo/>").sourceFor() == "<button.foo/>"`: single-token round-trip. |
| 2 | `get source works for expressions` | 8 separate `parse(…).sourceFor()` checks over `1`, `a.b`, `a.b()`, `<button.foo/>`, `x + y`, `'foo'`, `.foo`, `#bar`. Also navigates: `elt.root.sourceFor()` ⇒ `"a"` for `"a.b"`; `elt.root.root.sourceFor()` ⇒ `"a"` for `"a.b()"`; `elt.lhs` / `elt.rhs` for `"x + y"`. |
| 3 | `get source works for statements` | `if true log 'it was true'` and `for x in [1, 2, 3] log x then log x end` each round-trip through `sourceFor()`. |
| 4 | `get line works for statements` | `parse("if true\n log 'it was true'\n log 'it was true'")`: `elt.lineFor()` ⇒ `"if true"`, `elt.trueBranch.lineFor()` ⇒ `" log 'it was true'"`, `elt.trueBranch.next.lineFor()` ⇒ `" log 'it was true'"`. |

Key demand: the AST must (a) retain a {start, end, line} span per node; (b) expose navigable sub-nodes (root, lhs, rhs, trueBranch, next); (c) provide sourceFor/lineFor keyed off the original program source.

2. Proposed API

User-visible surface, kept minimal:

```
(hs-parse-ast "SRC")                  ; → parsed node (an AST handle, see §3)
(hs-source-for NODE)                  ; → substring of original source
(hs-line-for   NODE)                  ; → full source line containing NODE's start
(hs-node-get   NODE KEY)              ; → child AST node at field (root / lhs / rhs / true-branch / next …)
```

NODE is a parsed-but-uncompiled AST. It is not a compiled handler, not a runtime event. The upstream API mirrors this: _hyperscript.parse(src) returns a parse tree, never a closure. Keeping the feature scoped to parser output avoids retro-fitting spans onto bytecode or closures.

For the generator's benefit we expose three thin helpers at the test layer only:

```
(hs-src src)                          ; = (hs-source-for (hs-parse-ast src))
(hs-src-at src field-path)            ; = walk (hs-node-get … key) then source-for
(hs-line-at src field-path)           ; = walk (hs-node-get … key) then line-for
```
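The Python model below pins down how a field path is walked for hs-src-at and hs-line-at (both used in §4). Node, walk, src_at and line_at are hypothetical stand-ins for the SX functions. Parsing is replaced by hand-built nodes for the test-4 fixture, so these helpers take a node rather than a source string. All offsets were computed by hand.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class Node:
    kind: str
    start: int   # offset of the node's first character
    end: int     # offset one past its last character
    line: int    # 1-based line of the node's first token
    src: str     # shared reference to the whole program source, not a copy
    fields: dict = field(default_factory=dict)


def source_for(node: Node) -> str:
    return node.src[node.start:node.end]


def line_for(node: Node) -> str:
    return node.src.split("\n")[node.line - 1]


def walk(node: Node, path: list[str]) -> Node:
    # Follow each field key in turn, like repeated hs-node-get calls.
    return reduce(lambda n, key: n.fields[key], path, node)


def src_at(node: Node, path: list[str]) -> str:
    return source_for(walk(node, path))


def line_at(node: Node, path: list[str]) -> str:
    return line_for(walk(node, path))


# Hand-built tree for "if true\n log 'it was true'\n log 'it was true'".
SRC = "if true\n log 'it was true'\n log 'it was true'"
second_log = Node("log", 28, 45, 3, SRC)
first_log = Node("log", 9, 26, 2, SRC, {"next": second_log})
if_node = Node("if", 0, 45, 1, SRC, {"true-branch": first_log})

# The third source line, including its leading space.
print(line_at(if_node, ["true-branch", "next"]))
```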

We do not add (get line thing) as a DSL keyword. That phrase in the plan row was shorthand — the tests actually call host methods .sourceFor() / .lineFor(), not hyperscript statements. Keeping this out of the HS grammar keeps the surface area near zero.

3. Attach strategy

The tokenizer and parser already have the raw material; the information is dropped at two points.

Walk-through

| Stage | File | State today | Change |
|---|---|---|---|
| Tokenize | `lib/hyperscript/tokenizer.sx` | Tokens are `{:type T :value V :pos P}`. Only the start offset is tracked; there is no `end` and no `line`. | Extend `hs-make-token` → `{:type :value :pos :end :line}`. Track a `current-line` counter in `hs-tokenize` that increments on `\n`. `:end` = index after the last consumed char. |
| Parse | `lib/hyperscript/parser.sx` | `hs-parse` takes `(tokens src)` and returns bare SX lists/symbols. Source offsets are consumed internally (see `collect-sx-source` at path `(0 2 2 69)`) but never stored on the output AST. | For every production that returns a node, attach a span dict: wrap the output in `{:hs-ast true :kind … :start START :end END :line LINE :src SRC :children CHILDREN :fields FIELDS}`. `:children` preserves the SX list that `hs-compile` currently consumes downstream; `:fields` is a small dict mapping `:root :lhs :rhs :true-branch :next …` to sub-nodes. |
| Compile | `lib/hyperscript/compiler.sx` | `hs-to-sx` consumes the bare list AST and emits runtime calls. | Add a thin unwrap step at the top of `hs-to-sx`: if the AST is a span-wrapped dict, pull `:children` (or the equivalent raw list) and continue. As long as `hs-to-sx` recurses through that same entry, nested wrapped nodes are unwrapped too. No per-production rewiring is needed, and existing callsites are unaffected. |
| Runtime | `lib/hyperscript/runtime.sx` | Compiled code never sees AST nodes. | No change to compiled code. SourceInfo lives on the parse tree, not on compiled handlers. The only additions here are the read-only accessors from §7. |
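Below is a minimal Python model of the tokenizer change. It assumes a toy token grammar (words, single-quoted strings, single-character operators), not the real hs-tokenize rules. The Tok fields mirror the proposed {:type :value :pos :end :line}. The sketch shows two things: where the line counter is bumped, and what :end means.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    type: str
    value: str
    pos: int   # start offset (what hs-make-token records today)
    end: int   # offset one past the last consumed character (new)
    line: int  # 1-based line of the token's first character (new)


def tokenize(src: str) -> list[Tok]:
    toks: list[Tok] = []
    i, line = 0, 1
    while i < len(src):
        c = src[i]
        if c == "\n":
            line += 1  # bump after the newline, so the next token lands on the next line
            i += 1
        elif c.isspace():
            i += 1
        elif c.isalnum() or c == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            toks.append(Tok("word", src[i:j], i, j, line))
            i = j
        elif c == "'":
            # Assumes the string is terminated; include the closing quote in the token.
            j = src.index("'", i + 1) + 1
            toks.append(Tok("string", src[i:j], i, j, line))
            i = j
        else:
            toks.append(Tok("op", c, i, i + 1, line))
            i += 1
    return toks
```

This is also the shape of the off-by-one unit test suggested in §6. Run the test-4 fixture through the tokenizer and check that the first line is 1 and each \n advances the counter exactly once.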

Side-channel vs inline

Inline wrapper dict is the cheaper option, because:

- Parser output is already heterogeneous (lists, symbols, strings, numbers). A dict wrapper is distinguishable by (dict? x) + (dict-get x :hs-ast), so there is no risk of collision.
- A side-channel (map node → span) would need identity semantics, and SX lists don't have stable identity after any structural transform. We would end up cloning everything.
- The compiler's existing hs-to-sx dispatch is on (first ast). The unwrap step is a single cond branch at its top.
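The Python model below makes the single-branch claim concrete. Wrapped nodes are modelled as dicts carrying an "hs-ast" flag and a "children" entry, standing in for :hs-ast and :children. to_sx is a hypothetical toy that rebuilds the list; it does not emit runtime calls. The unwrap runs at the top of every recursive call, so wrapped sub-nodes nested inside children are stripped as well.

```python
def unwrap(ast):
    # A span-wrapped node is a dict tagged with "hs-ast"; anything else is already bare.
    if isinstance(ast, dict) and ast.get("hs-ast"):
        return ast["children"]
    return ast


def to_sx(ast):
    ast = unwrap(ast)  # the one extra branch, at the top of the dispatch
    if isinstance(ast, list) and ast:
        head, *args = ast
        # Recursing through to_sx means nested wrapped nodes are unwrapped too.
        return [head] + [to_sx(arg) for arg in args]
    return ast
```

Bare input passes through untouched. That is why existing callers that never see wrapped nodes are unaffected.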

Field dictionary

The parser emits nodes in many shapes. :fields names a handful of them so hs-node-get can navigate without the caller needing to know each node's SX shape. Mapping (from the upstream tests):

| Upstream accessor | Our field key | Produced by |
|---|---|---|
| `.root` | `:root` | symbol-with-member / call expressions (`a.b`, `a.b()`). For `a.b` the root is `a`; for `a.b()` the root is `a.b`. |
| `.lhs` / `.rhs` | `:lhs` / `:rhs` | binary operators (`x + y`). |
| `.trueBranch` | `:true-branch` | `if` command; the first command in the consequent. |
| `.next` | `:next` | any command; the following command in a `CommandList`. |

Only these five field keys (:root, :lhs, :rhs, :true-branch, :next) are needed for the 4 tests. Others are deferred.

Span capture

The parser already tracks start offsets via its token cursor; collect-sx-source shows the end-substring pattern. Pattern for every production:

```
(let ((start (current-pos))
      (start-line (current-line)))
  (let ((raw (… existing production …)))
    (let ((end (previous-pos)))
      (hs-ast-wrap raw :kind "…" :start start :end end :line start-line :src src))))
```

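Three tiny helpers (current-pos, previous-pos, current-line) are added to the parser's inner let scope. previous-pos must return the :end of the last consumed token, not its :pos, because upstream slices to endToken.end. With :pos, every slice would drop its final token. hs-ast-wrap lives alongside collect-sx-source.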
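The capture pattern can be sketched as a small recursive-descent parser in Python. This is an illustration under simplifying assumptions, not the SX parser:

- tokens come from a regex;
- only symbols, member access, empty-argument calls and + are handled;
- :line is omitted.

Here previous_end plays the role of previous-pos and returns the last consumed token's end offset. Each production records its start before parsing and wraps with the end afterwards. The :root/:lhs/:rhs fields follow the field dictionary table.

```python
import re
from dataclasses import dataclass, field

# Words/numbers, or any single non-space character (".", "(", ")", "+").
TOKEN_RE = re.compile(r"\w+|\S")


@dataclass
class Node:
    kind: str
    start: int
    end: int
    fields: dict = field(default_factory=dict)


class Parser:
    def __init__(self, src: str):
        self.src = src
        # Each token is (text, start offset, end offset).
        self.toks = [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(src)]
        self.i = 0

    def peek(self):
        return self.toks[self.i][0] if self.i < len(self.toks) else None

    def advance(self) -> str:
        self.i += 1
        return self.toks[self.i - 1][0]

    def current_pos(self) -> int:
        return self.toks[self.i][1]  # start of the next unconsumed token

    def previous_end(self) -> int:
        return self.toks[self.i - 1][2]  # end (not start) of the last consumed token

    def wrap(self, kind: str, start: int, **fields) -> Node:
        # Called after a production finishes, so previous_end() is the node's end.
        return Node(kind, start, self.previous_end(), fields)

    def expr(self) -> Node:
        start = self.current_pos()
        node = self.postfix()
        while self.peek() == "+":
            self.advance()
            rhs = self.postfix()
            node = self.wrap("binary", start, lhs=node, rhs=rhs)
        return node

    def postfix(self) -> Node:
        start = self.current_pos()
        self.advance()  # primary: a single word or number
        node = self.wrap("symbol", start)
        while self.peek() in (".", "("):
            if self.advance() == ".":
                self.advance()  # member name
                node = self.wrap("member", start, root=node)  # a.b: root is a
            else:
                self.advance()  # ")"; arguments are not supported in this sketch
                node = self.wrap("call", start, root=node)    # a.b(): root is a.b
        return node


def source_for(parser: Parser, node: Node) -> str:
    return parser.src[node.start:node.end]
```

Every postfix node reuses the same start. Because of that, a.b() nests as call → member → symbol, which gives the .root / .root.root shape that test 2 expects.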

4. Test mock / generator strategy

Add one pattern to tests/playwright/generate-sx-tests.py (cluster: sourceInfo). Recognise:

```
_hyperscript.parse("SRC").sourceFor()                 → (hs-src "SRC")
_hyperscript.parse("SRC").root.sourceFor()            → (hs-src-at "SRC" (list :root))
_hyperscript.parse("SRC").root.root.sourceFor()       → (hs-src-at "SRC" (list :root :root))
_hyperscript.parse("SRC").lhs.sourceFor()             → (hs-src-at "SRC" (list :lhs))
_hyperscript.parse("SRC").rhs.sourceFor()             → (hs-src-at "SRC" (list :rhs))
_hyperscript.parse("SRC").lineFor()                   → (hs-line-at "SRC" (list))
_hyperscript.parse("SRC").trueBranch.lineFor()        → (hs-line-at "SRC" (list :true-branch))
_hyperscript.parse("SRC").trueBranch.next.lineFor()   → (hs-line-at "SRC" (list :true-branch :next))
```

Object-returning patterns (return { src: …, rootSrc: … }) become one assert= per member. The generator already has the newline escaping infrastructure for string bodies (cluster 17 etc. exercised it).
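A sketch of the matcher in Python, the generator's own language. PATTERN, FIELD_KEYS and translate() are hypothetical names, not existing code in generate-sx-tests.py. The SRC literal is copied through verbatim, which assumes SX string escapes match JS ones. The generator's existing escaping helpers would be the right place to handle that properly. Object-returning patterns are not covered.

```python
import re

# _hyperscript.parse("SRC")<.field>*.sourceFor()  or  ...lineFor()
PATTERN = re.compile(
    r'_hyperscript\.parse\((?P<src>"(?:[^"\\]|\\.)*")\)'
    r"(?P<path>(?:\.(?:root|lhs|rhs|trueBranch|next))*)"
    r"\.(?P<method>sourceFor|lineFor)\(\)"
)

# Upstream accessor name -> SX field keyword.
FIELD_KEYS = {
    "root": ":root",
    "lhs": ":lhs",
    "rhs": ":rhs",
    "trueBranch": ":true-branch",
    "next": ":next",
}


def translate(js: str) -> str:
    m = PATTERN.fullmatch(js.strip())
    if m is None:
        raise ValueError(f"not a sourceInfo expression: {js!r}")
    src = m.group("src")  # JS string literal, copied through verbatim
    keys = [FIELD_KEYS[name] for name in m.group("path").split(".") if name]
    path = "(list" + "".join(" " + k for k in keys) + ")"
    if m.group("method") == "lineFor":
        return f"(hs-line-at {src} {path})"
    if not keys:
        return f"(hs-src {src})"
    return f"(hs-src-at {src} {path})"
```

The output strings follow the mapping table in §4 one-to-one. For example, a bare .lineFor() becomes (hs-line-at "SRC" (list)).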

No mock-DOM changes required — SourceInfo does not touch the DOM. hs-cleanup! is unused here.

5. Test-delta estimate

| Test | Sub-assertions | Blockers today | Delta |
|---|---|---|---|
| `debug` | 1 | Parser must accept `<button.foo/>` as a full expression (it already does: it's a CSS literal). Needs `sourceFor`. | +1 |
| `get source works for expressions` | ~12 (8 round-trips plus navigation checks) | Adds binary operator span (`x + y`) and nested-member navigation (`.root.root`). | +1 (one test; all assertions must pass) |
| `get source works for statements` | 2 | Needs statement-level spans; `if … log …` and `for … end` already parse. | +1 |
| `get line works for statements` | 3 | Needs `:line`, `:true-branch`, `:next` field navigation, and the `lineFor` semantics (newline-indexed string split, not just the node's own source slice). | +1 |

Total: +4 (matches the plan's cluster row).

6. Risks

- AST equality. Wrapping every parser node in a dict changes equal? semantics for any caller that compares AST output structurally. Mitigation: the compiler's entry unwrap means all downstream code sees the bare form, and only new hs-parse-ast callers see the wrapped form. Direct hs-parse/hs-compile/hs-to-sx-from-source keep their existing return shape. For hs-parse this holds only if it strips the wrappers before returning, or wraps only when invoked via hs-parse-ast; otherwise the wrapping productions leak dicts to its callers.
- Serialisation. If AST nodes are ever sent over the wire (they are not today, but the spec/tests runner serialises results for error printing), the wrapper dict grows the payload. Mitigation: keep :src as a reference to the shared program source string (one copy) rather than slicing per node; SX dicts share values.
- Memory. One extra dict per node. The parser currently allocates a list per node; we double that. The largest test program (for x in [1, 2, 3] log x then log x end) has ~15 nodes. Negligible.
- lineFor off-by-one. Upstream uses programSource.split("\n")[startToken.line - 1] and counts lines from 1. Our current-line must mirror this exactly: increment after \n, and the first line is 1. Unit-test the tokenizer on the "if true\n log …\n log …" fixture before wiring the parser.
- Operator associativity and .root. Upstream's a.b() gives .root = (a.b) and .root.root = a. Our parser must record the callee sub-expression as :root of a call node, and the receiver as :root of a member node. A one-line slip here would surface only as a test 2 failure, with nothing else pointing at the cause.

7. Implementation checklist

Four commits. Each commit must pass the baseline smoke range (0–195) before moving on.

1. Tokenizer: add :end and :line to tokens. Extend hs-make-token; track current-line in hs-tokenize; update every emission site (there are ~20). No parser changes yet. Unit-test via a small ad-hoc deftest in the tokenizer's own test fixture, or inline in behavioral.sx under a throwaway suite that is removed before committing. Commit: HS: tokenizer tracks :end and :line.
2. Parser: wrap output nodes with span dict + fields. Introduce hs-ast-wrap, current-pos, previous-pos, current-line. Wrap expression and statement productions. Populate :root :lhs :rhs :true-branch :next for the handful of node shapes the tests exercise. Add entry-unwrap to hs-to-sx so downstream consumers are unaffected. Commit: HS: parser attaches source spans to AST nodes.
3. API: hs-parse-ast, hs-source-for, hs-line-for, hs-node-get + test helpers hs-src, hs-src-at, hs-line-at. Thin functions. Place hs-parse-ast in parser.sx, accessors in runtime.sx (so they're auto-loaded by the behavioral runner), helpers inline in test-hyperscript-behavioral.sx via the generator. Commit: HS: sourceInfo API (sourceFor / lineFor / node-get).
4. Generator: sourceInfo pattern + regenerate 4 tests. Add the pattern matchers from §4 to generate-sx-tests.py. Regenerate spec/tests/test-hyperscript-behavioral.sx. Verify hs-upstream-core/sourceInfo goes from 0/4 to 4/4 with no regression in the 0–195 smoke range. Remember: cp lib/hyperscript/<f>.sx shared/static/wasm/sx/hs-<f>.sx after each .sx touch. Commit: HS: sourceInfo (+4 tests).

Notes

- No runtime behaviour changes. SourceInfo is purely a parser-side facility; the only runtime.sx additions are the read-only accessors from step 3 of §7, and compiled handlers are untouched.
- No changes to the HS DSL grammar. get line / get source are not added as hyperscript keywords; the upstream test file exclusively calls host-side methods on parse-tree objects.
- Upstream's implementation fits in base.js's 29 lines of host JS. Ours lands in about 30 lines of SX plus a generator pattern.
