HS-design: E38 SourceInfo API

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 06:55:30 +00:00
parent df8913e9a1
commit 67d4b9dae5
1 changed files with 144 additions and 0 deletions
--- a/plans/designs/e38-sourceinfo.md
+++ b/plans/designs/e38-sourceinfo.md
@@ -0,0 +1,144 @@
+# E38 — SourceInfo API (design)
+
+Cluster 38 of `plans/hs-conformance-to-100.md`. Goal: 4 tests in `hs-upstream-core/sourceInfo` that exercise `_hyperscript.parse(src).sourceFor()` and `.lineFor()`.
+
+Upstream reference: `/tmp/hs-upstream/test/core/sourceInfo.js`, `/tmp/hs-upstream/src/parsetree/base.js` (29 lines of impl — `sourceFor()` slices `programSource[startToken.start..endToken.end]`, `lineFor()` returns `programSource.split("\n")[startToken.line-1]`).
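
As a rough illustration of the two upstream behaviours described above, here is a minimal Python model. It is a sketch, not the upstream JavaScript: the `Token` class and the function names are hypothetical, and only the slice and split logic is taken from the description.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical token record; only the three fields used below."""
    start: int  # offset of the token's first character
    end: int    # offset just past the token's last character
    line: int   # 1-based line number of the token's first character


def source_for(program_source: str, start_token: Token, end_token: Token) -> str:
    # Slice from the first token's start to the last token's end.
    return program_source[start_token.start:end_token.end]


def line_for(program_source: str, start_token: Token) -> str:
    # Lines are counted from 1, so subtract 1 to index the split result.
    return program_source.split("\n")[start_token.line - 1]


src = "if true\n  log 'it was true'"
if_tok = Token(start=0, end=2, line=1)
true_tok = Token(start=3, end=7, line=1)
log_tok = Token(start=10, end=13, line=2)
```

With these hand-built tokens, `source_for(src, if_tok, true_tok)` yields the text of the first line, and `line_for(src, log_tok)` yields the whole second line including its indentation.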
+
+## 1. Failing tests
+
+All four are currently `SKIP (untranslated)` (lines 2434–2442 of `spec/tests/test-hyperscript-behavioral.sx`).
+
+| # | Test name | What it asserts |
+|---|-----------|-----------------|
+| 1 | `debug` | `parse("<button.foo/>").sourceFor() == "<button.foo/>"` — single-token round-trip. |
+| 2 | `get source works for expressions` | 8 separate `parse(…).sourceFor()` checks over `1`, `a.b`, `a.b()`, `<button.foo/>`, `x + y`, `'foo'`, `.foo`, `#bar`. Also navigates: `elt.root.sourceFor()` ⇒ `"a"` for `"a.b"`; `elt.root.root` for `"a.b()"`; `elt.lhs`/`elt.rhs` for `"x + y"`. |
+| 3 | `get source works for statements` | `if true log 'it was true'` and `for x in [1, 2, 3] log x then log x end` each round-trip through `sourceFor()`. |
+| 4 | `get line works for statements` | `parse("if true\n  log 'it was true'\n    log 'it was true'")` — `elt.lineFor()` ⇒ `"if true"`, `elt.trueBranch.lineFor()` ⇒ `"  log 'it was true'"`, `elt.trueBranch.next.lineFor()` ⇒ `"    log 'it was true'"`. |
+
+Key demand: the AST must (a) retain a `{start, end, line}` span per node; (b) expose navigable sub-nodes (`root`, `lhs`, `rhs`, `trueBranch`, `next`); (c) provide `sourceFor`/`lineFor` keyed off the original program source.
+
+## 2. Proposed API
+
+User-visible surface, kept minimal:
+
+```
+(hs-parse-ast "SRC")                  ; → parsed node (an AST handle, see §3)
+(hs-source-for NODE)                  ; → substring of original source
+(hs-line-for   NODE)                  ; → full source line containing NODE's start
+(hs-node-get   NODE KEY)              ; → child AST node at field (root / lhs / rhs / true-branch / next …)
+```
+
+`NODE` is a **parsed-but-uncompiled** AST. It is not a compiled handler, not a runtime event. The upstream API mirrors this: `_hyperscript.parse(src)` returns a parse tree, never a closure. Keeping the feature scoped to parser output avoids retro-fitting spans onto bytecode or closures.
+
+For the generator's benefit we expose three thin helpers at the test layer only:
+
+```
+(hs-src src)                          ; = (hs-source-for (hs-parse-ast src))
+(hs-src-at src field-path)            ; = walk (hs-node-get … key) then source-for
+(hs-line-at src field-path)           ; = walk (hs-node-get … key) then line-for
+```
+
+We do **not** add `(get line thing)` as a DSL keyword. That phrase in the plan row was shorthand — the tests actually call host methods `.sourceFor()` / `.lineFor()`, not hyperscript statements. Keeping this out of the HS grammar keeps the surface area near zero.
+
+## 3. Attach strategy
+
+The tokenizer and parser already have the raw material; the information is dropped at two points.
+
+### Walk-through
+
+| Stage | File | State today | Change |
+|-------|------|-------------|--------|
+| Tokenize | `lib/hyperscript/tokenizer.sx` | Tokens are `{:type T :value V :pos P}`. Only `start` offset tracked; no `end`, no `line`. | Extend `hs-make-token` → `{:type :value :pos :end :line}`. Track a `current-line` counter in `hs-tokenize` that increments on `\n`. `:end` = index after last consumed char. |
+| Parse | `lib/hyperscript/parser.sx` | `hs-parse` takes `(tokens src)`, returns bare SX lists/symbols. Source offsets are consumed internally (see `collect-sx-source` at path `(0 2 2 69)`) but never stored on the output AST. | For every production that returns a node, attach a span dict: wrap the output in `{:hs-ast true :kind … :start START :end END :line LINE :src SRC :children CHILDREN :fields FIELDS}`. `:children` preserves the SX list that `hs-compile` currently consumes downstream; `:fields` is a small dict mapping `:root :lhs :rhs :true-branch :next …` to sub-nodes. |
+| Compile | `lib/hyperscript/compiler.sx` | `hs-to-sx` consumes the bare list AST and emits runtime calls. | Add a thin unwrap step at the entry: if the AST is a span-wrapped dict, pull `:children` (or equivalent raw list) and continue. No per-production rewiring: after the unwrap, every existing callsite sees the same bare list it sees today. |
+| Runtime | `lib/hyperscript/runtime.sx` | Compiled code never sees AST nodes. | No change. SourceInfo lives on the parse tree, not on compiled handlers. |
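
To make the tokenizer row concrete, the sketch below models `:pos`, `:end` and `:line` tracking in Python. It is an assumption-laden stand-in: the regex and the `tokenize` function are hypothetical and far simpler than `hs-tokenize`; only the bookkeeping (end is the index after the last consumed character, the line counter starts at 1 and increments after each `\n`) follows the table.

```python
import re

# Hypothetical token grammar: whitespace, single-quoted strings, words,
# or any other single character. The real tokenizer is richer.
TOKEN_RE = re.compile(r"\s+|'[^']*'|\w+|.", re.S)


def tokenize(src):
    tokens = []
    line = 1  # first line is 1, matching the upstream convention
    for m in TOKEN_RE.finditer(src):
        text = m.group()
        if not text.isspace():
            tokens.append({
                "value": text,
                "pos": m.start(),  # offset of the first character
                "end": m.end(),    # index after the last consumed character
                "line": line,      # line on which the token starts
            })
        # Advance the counter only after the text has been consumed.
        line += text.count("\n")
    return tokens


fixture = "if true\n  log 'it was true'\n    log 'it was true'"
tokens = tokenize(fixture)
```

On the three-line fixture from test 4, the six tokens should land on lines 1, 1, 2, 2, 3, 3, which is the property the parser later relies on for `lineFor`.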
+
+### Side-channel vs inline
+
+**Inline wrapper dict is the cheaper option**, because:
+
+- Parser output is already heterogeneous (lists, symbols, strings, numbers). A dict wrapper is distinguishable by `(dict? x)` + `(dict-get x :hs-ast)` — no risk of collision.
+- A side-channel `(map node → span)` would need identity semantics, and SX lists don't have stable identity after any structural transform. We would end up cloning everything.
+- The compiler's existing `hs-to-sx` dispatch is on `(first ast)`. The unwrap step is a single `cond` branch at its top.
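
The inline option can be pictured with a small Python model. This is only a sketch under stated assumptions: `ast_wrap`, `is_wrapped` and `to_sx` are hypothetical stand-ins for `hs-ast-wrap`, the `(dict? x)` + `:hs-ast` check, and the unwrap branch in `hs-to-sx`; the list comprehension replaces the real dispatch on the head of the list.

```python
def ast_wrap(raw, kind, start, end, line, src, fields=None):
    """Wrap a bare parser result in a span-carrying dict."""
    return {
        "hs-ast": True, "kind": kind,
        "start": start, "end": end, "line": line,
        "src": src,              # shared reference to the program source
        "children": raw,         # the bare form the compiler already accepts
        "fields": fields or {},  # named sub-nodes for navigation
    }


def is_wrapped(x):
    # Bare parser output is lists, strings and numbers, never such a dict.
    return isinstance(x, dict) and x.get("hs-ast") is True


def to_sx(ast):
    if is_wrapped(ast):  # the one extra branch at the top
        return to_sx(ast["children"])
    if isinstance(ast, list):
        # Stand-in for the existing dispatch on the first element.
        return [to_sx(child) for child in ast]
    return ast


src = "x + y"
x = ast_wrap("x", "symbol", 0, 1, 1, src)
y = ast_wrap("y", "symbol", 4, 5, 1, src)
plus = ast_wrap(["+", x, y], "binary", 0, 5, 1, src, {"lhs": x, "rhs": y})
```

Because the model's `to_sx` recurses, nested wrapped nodes are stripped by the same branch, so `to_sx(plus)` comes back as the bare list the compiler would have seen before wrapping.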
+
+### Field dictionary
+
+The parser emits nodes in many shapes. `:fields` names a handful of them so `hs-node-get` can navigate without the caller learning SX shape. Mapping (from the upstream tests):
+
+| Upstream accessor | Our field key | Produced by |
+|-------------------|---------------|-------------|
+| `.root` | `:root` | symbol-with-member / call expressions (`a.b`, `a.b()`). For `a.b` the root is `a`; for `a.b()` the root is `a.b`. |
+| `.lhs` / `.rhs` | `:lhs` / `:rhs` | binary operators (`x + y`). |
+| `.trueBranch` | `:true-branch` | `if` command; the first command in the consequent. |
+| `.next` | `:next` | any command; the following command in a `CommandList`. |
+
+Only these five field keys (four accessor rows) are needed for the 4 tests. Others are deferred.
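
Field navigation can be modelled in a few lines of Python. The names below (`node_get`, `source_for`, `src_at`, `node`) are hypothetical analogues of `hs-node-get`, `hs-source-for` and `hs-src-at`, and the spans are written by hand rather than produced by a parser.

```python
from functools import reduce

SRC = "a.b()"


def node(start, end, **fields):
    """Build a minimal span node over SRC with named sub-nodes."""
    return {"src": SRC, "start": start, "end": end, "fields": fields}


def node_get(n, key):
    return n["fields"][key]


def source_for(n):
    return n["src"][n["start"]:n["end"]]


def src_at(n, path):
    # Walk the field path left to right, then slice the source.
    return source_for(reduce(node_get, path, n))


a = node(0, 1)                 # a
member = node(0, 3, root=a)    # a.b   (root is the receiver)
call = node(0, 5, root=member) # a.b() (root is the callee)
```

This mirrors the `.root` / `.root.root` chain from test 2: an empty path gives the whole call, one step gives the member expression, and two steps give the receiver.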
+
+### Span capture
+
+The parser already tracks start offsets via its token cursor; `collect-sx-source` shows the end-substring pattern. Pattern for every production:
+
+```
+(let ((start (current-pos))
+      (start-line (current-line)))
+  (let ((raw (… existing production …)))
+    (let ((end (previous-pos)))
+      (hs-ast-wrap raw :kind "…" :start start :end end :line start-line :src src))))
+```
+
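+Two tiny helpers (`current-pos`, `current-line`) are added to the parser's inner `let` scope; `previous-pos` in the pattern above is assumed to return the `:end` of the last consumed token. `hs-ast-wrap` lives alongside `collect-sx-source`.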
+
+## 4. Test mock / generator strategy
+
+Add one pattern to `tests/playwright/generate-sx-tests.py` (cluster: sourceInfo). Recognise:
+
+```js
+_hyperscript.parse("SRC").sourceFor()                 → (hs-src "SRC")
+_hyperscript.parse("SRC").root.sourceFor()            → (hs-src-at "SRC" (list :root))
+_hyperscript.parse("SRC").root.root.sourceFor()       → (hs-src-at "SRC" (list :root :root))
+_hyperscript.parse("SRC").lhs.sourceFor()             → (hs-src-at "SRC" (list :lhs))
+_hyperscript.parse("SRC").rhs.sourceFor()             → (hs-src-at "SRC" (list :rhs))
+_hyperscript.parse("SRC").lineFor()                   → (hs-line-at "SRC" (list))
+_hyperscript.parse("SRC").trueBranch.lineFor()        → (hs-line-at "SRC" (list :true-branch))
+_hyperscript.parse("SRC").trueBranch.next.lineFor()   → (hs-line-at "SRC" (list :true-branch :next))
+```
+
+Object-returning patterns (`return { src: …, rootSrc: … }`) become one `assert=` per member. The generator already has the newline escaping infrastructure for string bodies (cluster 17 etc. exercised it).
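
One possible shape for the matcher, sketched in Python since the generator is a Python script. Everything here is an assumption about how it could be written: `CALL_RE`, `FIELD_KEYS` and `translate` are hypothetical names, the real generator's structure is not shown in this document, and the source string is passed through without re-escaping.

```python
import re

# Upstream accessor name -> SX field key (from the field dictionary table).
FIELD_KEYS = {
    "root": ":root", "lhs": ":lhs", "rhs": ":rhs",
    "trueBranch": ":true-branch", "next": ":next",
}

CALL_RE = re.compile(
    r'_hyperscript\.parse\("(?P<src>(?:[^"\\]|\\.)*)"\)'
    r'(?P<path>(?:\.(?:root|lhs|rhs|trueBranch|next))*)'
    r'\.(?P<method>sourceFor|lineFor)\(\)'
)


def translate(js):
    """Return the SX helper call for one JS expression, or None."""
    m = CALL_RE.fullmatch(js.strip())
    if m is None:
        return None
    src = m.group("src")
    keys = [FIELD_KEYS[name] for name in m.group("path").split(".") if name]
    if m.group("method") == "sourceFor" and not keys:
        return '(hs-src "%s")' % src
    helper = "hs-src-at" if m.group("method") == "sourceFor" else "hs-line-at"
    path = "(list" + "".join(" " + k for k in keys) + ")"
    return '(%s "%s" %s)' % (helper, src, path)
```

Expressions that do not fit the pattern return `None`, which leaves room for the generator to fall back to its existing handling.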
+
+No mock-DOM changes required — SourceInfo does not touch the DOM. `hs-cleanup!` is unused here.
+
+## 5. Test-delta estimate
+
+| Test | Sub-assertions | Blockers today | Delta |
+|------|----------------|----------------|-------|
+| `debug` | 1 | Parser must accept `<button.foo/>` as a full expression (already does — it's a CSS-literal). Needs `sourceFor`. | +1 |
+| `get source works for expressions` | ~9 | Adds binary operator span (`x + y`) and nested-member navigation (`.root.root`). | +1 (one test, all assertions must pass) |
+| `get source works for statements` | 2 | Needs statement-level span; `if … log …` and `for … end` already parse. | +1 |
+| `get line works for statements` | 3 | Needs `:line`, `:true-branch`, `:next` field navigation, and the `lineFor` semantics (newline-indexed string split, not just the node's own source slice). | +1 |
+
+Total: **+4** (matches the plan's cluster row).
+
+## 6. Risks
+
+- **AST equality.** Wrapping every parser node in a dict changes `equal?` semantics for any caller that does structural comparison on AST output. Mitigation: the compiler's entry unwrap means all downstream code sees the bare form. Only new `hs-parse-ast` callers see the wrapped form. Direct `hs-parse`/`hs-compile`/`hs-to-sx-from-source` keep their existing return shape.
+- **Serialisation.** If AST nodes are ever sent over the wire (they are not today, but the `spec/tests` runner serialises results for error printing), the wrapper dict grows the payload. Mitigation: keep `:src` as a reference to the shared program source string (one copy) rather than slicing per node; SX dicts share values.
+- **Memory.** One extra dict per node. The parser currently allocates a list per node; we double that. For the largest test program (`for x in [1, 2, 3] log x then log x end`) this is ~15 nodes. Negligible.
+- **`lineFor` off-by-one.** Upstream uses `programSource.split("\n")[startToken.line - 1]` and counts lines from 1. Our `current-line` must mirror exactly — increment *after* `\n`, first line is `1`. Unit-test the tokenizer on the `"if true\n  log …\n    log …"` fixture before wiring the parser.
+- **Operator associativity and `.root`.** Upstream's `a.b()` gives `.root = (a.b)` and `.root.root = a`. Our parser must record the callee sub-expression as `:root` of a call node, and the receiver as `:root` of a member node. A one-liner slip here would fail test 2 silently.
+
+## 7. Implementation checklist
+
+Four commits. Each commit passes the baseline smoke range (0–195) before moving on.
+
+1. **Tokenizer: add `:end` and `:line` to tokens.** Extend `hs-make-token`; track `current-line` in `hs-tokenize`; update every emission site (there are ~20). No parser changes yet. Unit-test via a small ad-hoc `deftest` in the tokenizer's own test fixture (or inline in `behavioral.sx` under a throwaway suite — remove before commit). Commit: `HS: tokenizer tracks :end and :line`.
+
+2. **Parser: wrap output nodes with span dict + fields.** Introduce `hs-ast-wrap`, `current-pos`, `current-line`. Wrap expression and statement productions. Populate `:root :lhs :rhs :true-branch :next` for the handful of node shapes the tests exercise. Add entry-unwrap to `hs-to-sx` so downstream consumers are unaffected. Commit: `HS: parser attaches source spans to AST nodes`.
+
+3. **API: `hs-parse-ast`, `hs-source-for`, `hs-line-for`, `hs-node-get` + test helpers `hs-src`, `hs-src-at`, `hs-line-at`.** Thin functions. Place `hs-parse-ast` in `parser.sx`, accessors in `runtime.sx` (so they're auto-loaded by the behavioral runner), helpers inline in `test-hyperscript-behavioral.sx` via the generator. Commit: `HS: sourceInfo API (sourceFor / lineFor / node-get)`.
+
+4. **Generator: sourceInfo pattern + regenerate 4 tests.** Add the pattern matchers from §4 to `generate-sx-tests.py`. Regenerate `spec/tests/test-hyperscript-behavioral.sx`. Verify `hs-upstream-core/sourceInfo` goes from 0/4 to 4/4 and no regression in the 0–195 smoke range. Remember: `cp lib/hyperscript/<f>.sx shared/static/wasm/sx/hs-<f>.sx` after each `.sx` touch. Commit: `HS: sourceInfo (+4 tests)`.
+
+## Notes
+
+- No runtime changes. SourceInfo is purely a parser-side facility.
+- No changes to the HS DSL grammar. `get line` / `get source` are *not* added as hyperscript keywords — the upstream test file exclusively calls host-side methods on parse-tree objects.
+- Upstream's impl is 7 lines of host JS. Ours lands in about 30 lines of SX plus a generator pattern.