rose-ash/plans/haskell-on-sx.md
giles cab7ca883f
haskell: operator sections + list comprehensions, Phase 1 parser complete (+22 tests, 211/211)
2026-04-24 20:47:51 +00:00

# Haskell-on-SX: mini-Haskell with real laziness
Mini-Haskell is the research-paper-worthy demo. Laziness is native to the SX runtime (thunks are already a first-class type); algebraic data types map onto tagged lists; typeclasses map onto dictionary passing; IO maps onto `perform`/`resume`. Hindley-Milner inference is the one real piece of new work.
End-state goal: a **Haskell 98 subset** that runs the small classic programs (sieve of Eratosthenes lazy stream, fibonacci as infinite list, naive quicksort, n-queens, expression evaluator) plus a ~150-test corpus.
## Scope decisions (defaults; override as needed)
- **Standard:** Haskell 98 subset. No GHC extensions (no `DataKinds`, no `GADTs`, no `TypeFamilies`, no `TemplateHaskell`).
- **Phases 1-3 are untyped:** we get the evaluator right first with laziness + ADTs, then add HM inference in phase 4. This is deliberate: typing is the hard bit and will take a full phase on its own.
- **Typeclasses:** dictionary passing, no overlap, no orphan instances. Added in phase 5.
- **Layout rule:** yes — phase 1 implements Haskell's indentation-sensitive parsing (painful but required).
- **Test corpus:** custom. No GHC test suite. Bundle classic programs + ~100 hand-written expression-level tests + mini Prelude tests.
## Ground rules
- **Scope:** only `lib/haskell/**` and `plans/haskell-on-sx.md`. No edits to `spec/`, `hosts/`, `shared/`, or other language dirs.
- **SX files:** `sx-tree` MCP tools only.
- **Architecture:** Haskell source → AST → desugared-core → SX AST → CEK. Thunks on the SX side provide laziness natively.
- **Commits:** one feature per commit. Keep `## Progress log` updated.
## Architecture sketch
```
Haskell source
lib/haskell/tokenizer.sx — idents, operators, layout-sensitive indentation
lib/haskell/parser.sx — AST: modules, data decls, type sigs, fn clauses, expressions
lib/haskell/desugar.sx — surface → core: case-of-case, do-notation, list comp, guards
lib/haskell/transpile.sx — core → SX AST, wrapping everything in thunks for laziness
lib/haskell/runtime.sx — force, ADT constructors, Prelude, typeclass dicts (phase 5+)
existing CEK / VM
```
Key mappings:
- **Laziness** = every function argument is an SX thunk; `force` is WHNF reduction. SX already has `make-thunk` from the trampolining evaluator — we reuse it.
- **Pattern match** = forces the scrutinee to WHNF, then structural match on the tag
- **ADT** = `data Maybe a = Nothing | Just a` compiles to tagged lists: `(:Nothing)` and `(:Just <thunk>)`
- **Typeclass** = each class becomes a record type; each instance becomes a record value; each method becomes a projection; the elaborator inserts the dict at each call site (phase 5)
- **IO** = `IO a` is a function `World -> (a, World)` internally; in practice uses `perform`/`resume` for actual side effects
- **Layout** = offside rule; inserted virtual braces + semis during a lexer-parser feedback pass
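
The laziness, ADT, and pattern-match mappings above can be modelled in a few lines. The sketch below uses Python rather than SX, so `Thunk`, `force`, `just`, and `from_maybe` are illustrative names and not the runtime's API. It assumes only that a thunk memoizes its result and that constructors are tagged tuples whose fields are thunks.

```python
class Thunk:
    """A suspended computation, evaluated at most once (call-by-need)."""

    def __init__(self, compute):
        self._compute = compute      # zero-argument callable
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            value = self._compute()
            # A thunk that yields another thunk is chased until a real value appears.
            while isinstance(value, Thunk):
                value = value.force()
            self._value, self._forced = value, True
            self._compute = None     # drop the closure once the value is cached
        return self._value


def force(x):
    """Reduce to weak head normal form: only the outermost constructor is exposed."""
    return x.force() if isinstance(x, Thunk) else x


# data Maybe a = Nothing | Just a  (hypothetical tagged-tuple encoding)
NOTHING = ("Nothing",)


def just(field_thunk):
    return ("Just", field_thunk)     # the field stays unevaluated


def from_maybe(default, m):
    """fromMaybe d m = case m of { Nothing -> d; Just x -> x }"""
    scrutinee = force(m)             # the match forces the scrutinee first
    tag = scrutinee[0]
    if tag == "Nothing":
        return force(default)
    if tag == "Just":
        return force(scrutinee[1])
    raise ValueError(f"no matching alternative for tag {tag!r}")


evaluations = []


def expensive():
    evaluations.append("ran")
    return 42


boxed = Thunk(lambda: just(Thunk(expensive)))
```

Forcing `boxed` through `from_maybe` twice runs `expensive` once, and a default argument such as `Thunk(lambda: 1 // 0)` is never evaluated when the scrutinee is a `Just`.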
## Roadmap
### Phase 1 — tokenizer + parser + layout rule
- [x] Tokenizer: reserved words, qualified names, operators, numbers (int, float, Rational later), chars/strings, comments (`--` and `{-` nested)
- [x] Layout algorithm: turn indentation into virtual `{`, `;`, `}` tokens per Haskell 98 Report §9.3 (numbered §10.3 in Haskell 2010)
- Parser (split into sub-items — implement one per iteration):
  - [x] Expressions: atoms, parens, tuples, lists, ranges, application, infix with full Haskell-98 precedence table, unary `-`, backtick operators, lambdas, `if`, `let`
  - [x] `case … of` and `do`-notation expressions (plus minimal patterns needed for arms/binds: var, wildcard, literal, 0-arity and applied constructor, tuple, list)
  - [x] Patterns — full: `as` patterns, nested, negative literal, `~` lazy, infix constructor (`:` / consym), extend lambdas/let with non-var patterns
  - [x] Top-level decls: function clauses (simple — no guards/where yet), pattern bindings, multi-name type signatures, `data` with type vars and recursive constructors, `type` synonyms, `newtype`, fixity (`infix`/`infixl`/`infixr` with optional precedence, comma-separated ops, backtick names). Types: vars / constructors / application / `->` (right-assoc) / tuples / lists. `hk-parse-top` entry.
  - [x] `where` clauses + guards (on fun-clauses, case alts, and let/do-let bindings — with the let funclause shorthand `let f x = …` now supported)
  - [x] Module header + imports — `module NAME [exports] where …`, qualified/as/hiding/explicit imports, operator exports, `module Foo` exports, dotted names, headerless-with-imports
  - [x] List comprehensions + operator sections — `(op)` / `(op e)` / `(e op)` (excluding `-` from right sections), `[e | q1, q2, …]` with `q-gen` / `q-guard` / `q-let` qualifiers
- [x] AST design modelled on GHC's HsSyn at a surface level — keyword-tagged lists cover modules/imports/decls/types/patterns/expressions; see parser.sx docstrings for the full node catalogue
- [x] Unit tests in `lib/haskell/tests/parse.sx` (43 tokenizer tests, all green)
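
The layout item is easier to follow with a toy version of the algorithm. The sketch below is a deliberately reduced, single-pass model in Python, not the two-pass `layout.sx` implementation: tokens are whitespace-separated words, the `module` header special case is omitted, the Report's `parse-error(t)` rule is not modelled, and a new block is assumed to be indented deeper than its parent. The token names `vlbrace` / `vsemi` / `vrbrace` follow this plan; `lex` and `layout` are hypothetical helpers.

```python
LAYOUT_KEYWORDS = {"let", "where", "do", "of"}


def lex(source):
    """Split on whitespace, keeping (text, 1-based column, first-on-line flag)."""
    tokens = []
    for line in source.splitlines():
        pos, first = 0, True
        for word in line.split():
            pos = line.index(word, pos)
            tokens.append((word, pos + 1, first))
            pos += len(word)
            first = False
    return tokens


def layout(source):
    out = []
    stack = []           # (column, opener); column 0 marks an explicit { } block
    opener = "module"    # a pending block; the top level opens one implicitly
    for text, col, first in lex(source):
        if opener is not None:
            if text == "{":                     # explicit braces switch layout off
                stack.append((0, opener))
                out.append("{")
                opener = None
                continue
            stack.append((col, opener))         # implicit block at this column
            out.append("vlbrace")
            opener = None
        elif first:
            while stack and stack[-1][0] > col:     # dedent closes inner blocks
                stack.pop()
                out.append("vrbrace")
            if stack and stack[-1][0] == col:       # same column starts a new item
                out.append("vsemi")
        # Pragmatic rule: `in` closes an implicit block opened by `let`.
        if text == "in" and stack and stack[-1][0] > 0 and stack[-1][1] == "let":
            stack.pop()
            out.append("vrbrace")
        if text == "}" and stack and stack[-1][0] == 0:
            stack.pop()
        out.append(text)
        if text in LAYOUT_KEYWORDS:
            opener = text
    if opener is not None and out:              # keyword at EOF: empty block
        out += ["vlbrace", "vrbrace"]
    while stack:                                # EOF closes remaining implicit blocks
        column, _ = stack.pop()
        if column > 0:
            out.append("vrbrace")
    return out


example = "f x = case x of\n  0 -> 1\n  _ -> 2\ng = let y = 1 in y\n"
print(" ".join(layout(example)))
```

On `example`, the two `case` alternatives are separated by a `vsemi`, the dedent before `g` closes the `of` block, and the `let` block is closed by the `in` rule rather than by indentation.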
### Phase 2 — desugar + eager-ish eval + ADTs (untyped)
- [ ] Desugar: guards → nested `if`s; `where` → `let`; list comp → `concatMap`-based; do-notation stays for now (desugared in phase 3)
- [ ] `data` declarations register constructors in runtime
- [ ] Pattern match (tag-based, value-level): atoms, vars, wildcards, constructor patterns, `as` patterns, nested
- [ ] Evaluator (still strict internally — laziness in phase 3): `let`, `lambda`, application, `case`, literals, constructors
- [ ] 30+ eval tests in `lib/haskell/tests/eval.sx`
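
Two of the planned desugarings are small enough to sketch. The Python below models AST nodes as tuples whose first element is the tag without the leading colon; the input shapes follow the `:guard`, `:q-gen`, `:q-guard`, and `:q-let` nodes in the progress log, while the output shapes for `if`, `let`, `lambda`, and curried `app` are assumptions made for illustration. Generators with refutable patterns would need an extra fall-through case that is not shown.

```python
def desugar_guards(guards, fallthrough):
    """[(guard c1 e1), (guard c2 e2), ...] -> if c1 then e1 else if c2 then e2 ..."""
    result = fallthrough
    for _tag, cond, body in reversed(guards):
        result = ("if", cond, body, result)
    return result


def desugar_comp(body, quals):
    """Translate [body | quals] following the Haskell Report's scheme:

    [e | ]           = [e]
    [e | b, Q]       = if b then [e | Q] else []
    [e | p <- l, Q]  = concatMap (\\p -> [e | Q]) l     (irrefutable p only)
    [e | let ds, Q]  = let ds in [e | Q]
    """
    if not quals:
        return ("list", [body])
    head, rest = quals[0], quals[1:]
    inner = desugar_comp(body, rest)
    if head[0] == "q-guard":
        return ("if", head[1], inner, ("list", []))
    if head[0] == "q-let":
        return ("let", head[1], inner)
    if head[0] == "q-gen":
        _tag, pat, source = head
        fn = ("lambda", [pat], inner)
        return ("app", ("app", ("var", "concatMap"), fn), source)
    raise ValueError(f"unknown qualifier {head[0]!r}")


# [e | x <- xs, cond]
comp = desugar_comp(
    ("var", "e"),
    [("q-gen", ("p-var", "x"), ("var", "xs")), ("q-guard", ("var", "cond"))],
)
```

Building the result from the last qualifier outward keeps both functions free of special cases for the final guard or generator.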
### Phase 3 — laziness + classic programs
- [ ] Transpile to thunk-wrapped SX: every application arg becomes `(make-thunk (lambda () <arg>))`
- [ ] `force` = SX eval-thunk-to-WHNF primitive
- [ ] Pattern match forces scrutinee before matching
- [ ] Infinite structures: `repeat x`, `iterate f x`, `[1..]`, Fibonacci stream, sieve of Eratosthenes
- [ ] `seq`, `deepseq` from Prelude
- [ ] Do-notation for a stub `IO` monad (just threading, no real side effects yet)
- [ ] Classic programs in `lib/haskell/tests/programs/`:
  - [ ] `fib.hs` — infinite Fibonacci stream
  - [ ] `sieve.hs` — lazy sieve of Eratosthenes
  - [ ] `quicksort.hs` — naive QS
  - [ ] `nqueens.hs`
  - [ ] `calculator.hs` — parser combinator style expression evaluator
- [ ] `lib/haskell/conformance.sh` + runner; `scoreboard.json` + `scoreboard.md`
- [ ] Target: 5/5 classic programs passing
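
To make the infinite-structure targets concrete, here is a minimal Python model of lazy cons cells, assuming memoizing thunks and tagged tuples as described under Key mappings. Every name (`Thunk`, `cons`, `take`, `zip_with`, `lazy_filter`, `sieve`) is illustrative, and the sieve is the naive trial-division version used in the classic demo, not an efficient one.

```python
class Thunk:
    """Memoizing suspension: `compute` runs at most once."""

    def __init__(self, compute):
        self.compute, self.value, self.done = compute, None, False

    def force(self):
        if not self.done:
            self.value, self.done = self.compute(), True
            self.compute = None
        return self.value


NIL = ("[]",)


def cons(head, tail):
    """A list is a Thunk yielding NIL or (":", head_thunk, tail_thunk)."""
    return (":", head, tail)


def take(n, xs):
    out = []
    while n > 0:
        cell = xs.force()
        if cell[0] != ":":
            break
        out.append(cell[1].force())
        xs = cell[2]              # the tail is not forced until it is needed
        n -= 1
    return out


def tail(xs):
    return Thunk(lambda: xs.force()[2].force())


def iterate(f, x):
    """iterate f x = x : iterate f (f x)"""
    return Thunk(lambda: cons(x, iterate(f, Thunk(lambda: f(x.force())))))


def zip_with(f, xs, ys):
    def step():
        a, b = xs.force(), ys.force()
        if a[0] == "[]" or b[0] == "[]":
            return NIL
        head = Thunk(lambda: f(a[1].force(), b[1].force()))
        return cons(head, zip_with(f, a[2], b[2]))
    return Thunk(step)


def lazy_filter(pred, xs):
    def step():
        cell = xs.force()
        while cell[0] == ":":
            if pred(cell[1].force()):
                return cons(cell[1], lazy_filter(pred, cell[2]))
            cell = cell[2].force()
        return NIL
    return Thunk(step)


def sieve(xs):
    """sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]  (infinite input assumed)"""
    def step():
        cell = xs.force()
        p = cell[1].force()
        return cons(cell[1], sieve(lazy_filter(lambda x: x % p != 0, cell[2])))
    return Thunk(step)


# fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
fibs = Thunk(lambda: cons(
    Thunk(lambda: 0),
    Thunk(lambda: cons(Thunk(lambda: 1),
                       zip_with(lambda a, b: a + b, fibs, tail(fibs))))))
primes = sieve(iterate(lambda n: n + 1, Thunk(lambda: 2)))
```

Because each cell is memoized, `fibs` refers to itself without recomputing earlier elements; `take(10, fibs)` and `take(8, primes)` only force as much of each stream as they consume.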
### Phase 4 — Hindley-Milner inference
- [ ] Algorithm W: unification + type schemes + generalisation + instantiation
- [ ] Report type errors with meaningful positions
- [ ] Reject untypeable programs that phase 3 was accepting
- [ ] Type-sig checking: user writes `f :: Int -> Int`; verify
- [ ] Let-polymorphism
- [ ] Unit tests: inference for 50+ expressions
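
Unification is the core that the rest of Algorithm W builds on. The sketch below covers only that step, with an occurs check, over a three-constructor type language. Types reuse the `t-var` / `t-con` / `t-fun` tags from the parser as Python tuples; `apply_subst`, `occurs`, `unify`, and `UnifyError` are hypothetical names, and type schemes, generalisation, and instantiation are left out.

```python
class UnifyError(Exception):
    """Raised when two types cannot be made equal."""


def apply_subst(subst, t):
    """Replace bound type variables in t, following chains of bindings."""
    if t[0] == "t-var":
        return apply_subst(subst, subst[t[1]]) if t[1] in subst else t
    if t[0] == "t-fun":
        return ("t-fun", apply_subst(subst, t[1]), apply_subst(subst, t[2]))
    return t                                   # t-con


def occurs(name, t, subst):
    """True if type variable `name` appears inside t (after substitution)."""
    t = apply_subst(subst, t)
    if t[0] == "t-var":
        return t[1] == name
    if t[0] == "t-fun":
        return occurs(name, t[1], subst) or occurs(name, t[2], subst)
    return False


def unify(a, b, subst):
    """Return an extended substitution making a and b equal, or raise UnifyError."""
    a, b = apply_subst(subst, a), apply_subst(subst, b)
    if a == b:
        return subst
    if a[0] == "t-var":
        if occurs(a[1], b, subst):
            raise UnifyError(f"infinite type: {a[1]} occurs in {b}")
        return {**subst, a[1]: b}
    if b[0] == "t-var":
        return unify(b, a, subst)
    if a[0] == "t-fun" and b[0] == "t-fun":
        subst = unify(a[1], b[1], subst)       # argument types first
        return unify(a[2], b[2], subst)        # then result types
    raise UnifyError(f"cannot unify {a} with {b}")


A, B = ("t-var", "a"), ("t-var", "b")
INT, BOOL = ("t-con", "Int"), ("t-con", "Bool")

# (a -> Int) against (Bool -> b) binds a := Bool and b := Int.
solution = unify(("t-fun", A, INT), ("t-fun", BOOL, B), {})
```

The occurs check is what rejects `a ~ a -> Int`, the kind of untypeable program that the untyped phases would otherwise accept.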
### Phase 5 — typeclasses (dictionary passing)
- [ ] `class` / `instance` declarations
- [ ] Dictionary-passing elaborator: inserts dict args at call sites
- [ ] Standard classes: `Eq`, `Ord`, `Show`, `Num`, `Functor`, `Monad`, `Applicative`
- [ ] `deriving (Eq, Show)` for ADTs
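
Dictionary passing can be illustrated without any type machinery. In the Python sketch below a class is a dict of methods, an instance is a dict value, and a constrained function takes the dictionary as an explicit first argument, which is the argument the elaborator would insert. The names `make_eq`, `eq_int`, `eq_list`, and `elem` are illustrative only, and the code is strict where the real runtime would be lazy.

```python
def make_eq(eq):
    """class Eq a where (==), (/=); the default (/=) is defined via (==)."""
    return {"eq": eq, "neq": lambda a, b: not eq(a, b)}


# instance Eq Int
eq_int = make_eq(lambda a, b: a == b)


def eq_list(eq_elem):
    """instance Eq a => Eq [a]: a dictionary built from the element dictionary."""
    def eq(xs, ys):
        return len(xs) == len(ys) and all(
            eq_elem["eq"](x, y) for x, y in zip(xs, ys)
        )
    return make_eq(eq)


def elem(eq_dict, x, xs):
    """elem :: Eq a => a -> [a] -> Bool, with the Eq dictionary made explicit."""
    return any(eq_dict["eq"](x, y) for y in xs)


# `elem 3 [1, 2, 3]` at type Int elaborates to:
found_int = elem(eq_int, 3, [1, 2, 3])
# `elem [1, 2] [[1], [1, 2]]` at type [Int] elaborates to:
found_list = elem(eq_list(eq_int), [1, 2], [[1], [1, 2]])
```

An instance with a context, such as `Eq a => Eq [a]`, becomes a function from dictionaries to a dictionary, so the elaborator has to build `eq_list(eq_int)` at the call site.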
### Phase 6 — real IO + Prelude completion
- [ ] Real `IO` monad backed by `perform`/`resume`
- [ ] `putStrLn`, `getLine`, `readFile`, `writeFile`, `print`
- [ ] Full-ish Prelude: `Maybe`, `Either`, `List` functions, `Map`-lite
- [ ] Drive scoreboard toward 150+ passing
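
The `World -> (a, World)` reading of `IO` from the Key mappings can be shown as plain state threading. The Python sketch below models the world as a dict holding pending input lines and accumulated output; it performs no real side effects and does not model `perform`/`resume`, whose SX interface is not described in this plan. All names (`pure`, `bind`, `put_str_ln`, `get_line`) are illustrative.

```python
def pure(value):
    """return x: leave the world untouched."""
    return lambda world: (value, world)


def bind(action, k):
    """m >>= k: run m, feed its result to k, run the action k returns."""
    def run(world):
        value, next_world = action(world)
        return k(value)(next_world)
    return run


def put_str_ln(text):
    return lambda world: (None, {**world, "stdout": world["stdout"] + [text]})


def get_line(world):
    line, rest = world["stdin"][0], world["stdin"][1:]
    return line, {**world, "stdin": rest}


# do { name <- getLine; putStrLn ("hello " ++ name); return (length name) }
greet = bind(get_line, lambda name:
        bind(put_str_ln("hello " + name), lambda _:
        pure(len(name))))

result, final_world = greet({"stdin": ["sx"], "stdout": []})
```

Each action returns a fresh world instead of mutating the old one, so the order of effects is fixed entirely by the data dependency that `bind` creates.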
## Progress log
_Newest first._
- **2026-04-24** — Phase 1 parser is now complete. This iteration adds
operator sections and list comprehensions, the two remaining
aexp-level forms, plus ticks the “AST design” item (the keyword-
tagged list shape has accumulated a full HsSyn-level surface).
Changes:
- `hk-parse-infix` now bails on `op )` without consuming the op, so
the paren parser can claim it as a left section.
- `hk-parse-parens` rewritten to recognise five new forms:
`()` (unit), `(op)` → `(:var OP)`, `(op e)` → `(:sect-right OP E)`
(excluded for `-` so that `(- 5)` stays `(:neg 5)`), `(e op)` →
`(:sect-left OP E)`, plus regular parens and tuples. Works for
varsym, consym, reservedop `:`, and backtick-quoted varids.
- `hk-section-op-info` inspects the current token and returns a
`{:name :len}` dict, so the same logic handles 1-token ops and
3-token backtick ops uniformly.
- `hk-parse-list-lit` now recognises a `|` after the first element
and dispatches to `hk-parse-qual` per qualifier (comma-separated),
producing `(:list-comp EXPR QUALS)`. Qualifiers are:
`(:q-gen PAT EXPR)` when a paren-balanced lookahead
(`hk-comp-qual-is-gen?`) finds `<-` before the next `,`/`]`,
`(:q-let BINDS)` for `let …`, and `(:q-guard EXPR)` otherwise.
- `hk-parse-comp-let` accepts `]` or `,` as an implicit block close
(single-line comprehensions never see layout's vrbrace before the
qualifier terminator arrives); explicit `{ }` still closes
strictly.
22 new tests in `lib/haskell/tests/parser-sect-comp.sx` cover
op-references (inc. `(-)`, `(:)`, backtick), right sections (inc.
backtick), left sections, the `(- 5)` → `:neg` corner, plain parens
and tuples, six comprehension shapes (simple, filter, let,
nested-generators, constructor pattern bind, tuple pattern bind,
and a three-qualifier mix). 211/211 green.
- **2026-04-24** — Phase 1: module header + imports. Added
`hk-parse-module-header`, `hk-parse-import`, plus shared helpers for
import/export entity lists (`hk-parse-ent`, `hk-parse-ent-member`,
`hk-parse-ent-list`). New AST:
- `(:module NAME EXPORTS IMPORTS DECLS)` — NAME `nil` means no header,
EXPORTS `nil` means no export list (distinct from empty `()`)
- `(:import QUALIFIED NAME AS SPEC)` — QUALIFIED bool, AS alias or nil,
SPEC nil / `(:spec-items ENTS)` / `(:spec-hiding ENTS)`
- Entity refs: `:ent-var`, `:ent-all` (`Tycon(..)`), `:ent-with`
(`Tycon(m1, m2, …)`), `:ent-module` (exports only).
`hk-parse-program` now dispatches on the leading token: `module`
keyword → full header-plus-body parse (consuming the `where` layout
brace around the module body); otherwise collect any leading
`import` decls and then remaining decls with the existing logic.
The outer shell is `(:module …)` as soon as any header or import is
present, and stays as `(:program DECLS)` otherwise — preserving every
previous test expectation untouched. Handles operator exports `((+:))`,
dotted module names (`Data.Map`), and the Haskell-98 context-sensitive
keywords `qualified`/`as`/`hiding` (all lexed as ordinary varids and
matched only in import position). 16 new tests in
`lib/haskell/tests/parser-module.sx` covering simple/exports/empty
headers, dotted names, operator exports, `module Foo` exports,
qualified/aliased/items/hiding imports, and a headerless-with-imports
file. 189/189 green.
- **2026-04-24** — Phase 1: guards + where clauses. Factored a single
`hk-parse-rhs sep` that all body-producing sites now share: it reads
a plain `sep expr` body or a chain of `| cond sep expr` guards, then
— regardless of which form — looks for an optional `where` block and
wraps accordingly. AST additions:
- `:guarded GUARDS` where each GUARD is `:guard COND EXPR`
- `:where BODY DECLS` where BODY is a plain expr or a `:guarded`
Both can nest (guards inside where). `hk-parse-alt` now routes through
`hk-parse-rhs "->"`, `hk-parse-fun-clause` and `hk-parse-bind` through
`hk-parse-rhs "="`. `hk-parse-where-decls` reuses `hk-parse-decl` so
where-blocks accept any decl form (signatures, fixity, nested funs).
As a side effect, `hk-parse-bind` now also picks up the Haskell-native
`let f x = …` funclause shorthand: a varid followed by one or more
apats produces `(:fun-clause NAME APATS BODY)` instead of a
`(:bind (:p-var …) …)` — keeping the simple `let x = e` shape
unchanged for existing tests. 11 new tests in
`lib/haskell/tests/parser-guards-where.sx` cover two- and three-way
guards, mixed guarded + equality clauses, single- and multi-binding
where blocks, guards plus where, case-alt guards, case-alt where,
let with funclause shorthand, let with guards, and a where containing
a type signature alongside a fun-clause. 173/173 green.
- **2026-04-24** — Phase 1: top-level decls. Refactored `hk-parse-expr` into a
`hk-parser tokens mode` with `:expr` / `:module` dispatch so the big lexical
state is shared (peek/advance/pat/expr helpers all reachable); added public
wrappers `hk-parse-expr`, `hk-parse-module`, and source-level entry
`hk-parse-top`. New type parser (`hk-parse-type` / `hk-parse-btype` /
`hk-parse-atype`): type variables (`:t-var`), type constructors (`:t-con`),
type application (`:t-app`, left-assoc), right-associative function arrow
(`:t-fun`), unit/tuples (`:t-tuple`), and lists (`:t-list`). New decl parser
(`hk-parse-decl` / `hk-parse-program`) producing a `(:program DECLS)` shell:
- `:type-sig NAMES TYPE` — comma-separated multi-name support
- `:fun-clause NAME APATS BODY` — patterns for args, body via existing expr
- `:pat-bind PAT BODY` — top-level pattern bindings like `(a, b) = pair`
- `:data NAME TVARS CONS` with `:con-def CNAME FIELDS` for nullary and
multi-arg constructors, including recursive references
- `:type-syn NAME TVARS TYPE`, `:newtype NAME TVARS CNAME FIELD`
- `:fixity ASSOC PREC OPS` — assoc one of `"l"`/`"r"`/`"n"`, default prec 9,
comma-separated operator names, including backtick-quoted varids.
Sig vs fun-clause disambiguated by a paren-balanced top-level scan for
`::` before the next `;`/`}` (`hk-has-top-dcolon?`). 24 new tests in
`lib/haskell/tests/parser-decls.sx` cover all decl forms, signatures with
application / tuples / lists / right-assoc arrows, nullary and recursive
data types, multi-clause functions, and a mixed program with data + type-
synonym + signature + two function clauses. Not yet: guards, where
clauses, module header, imports, deriving, contexts, GADTs. 162/162 green.
- **2026-04-24** — Phase 1: full patterns. Added `as` patterns
(`name@apat` → `(:p-as NAME PAT)`), lazy patterns (`~apat` →
`(:p-lazy PAT)`), negative literal patterns (`-N` / `-F` resolving
eagerly in the parser so downstream passes see a plain `(:p-int -1)`),
and infix constructor patterns via a right-associative single-band
layer on top of `hk-parse-pat-lhs` for any `consym` or reservedop `:`
(so `x : xs` parses as `(:p-con ":" [x, xs])`, `a :+: b` likewise).
Extended `hk-apat-start?` with `-` and `~` so the pattern-argument
loops in lambdas and constructor applications pick these up.
Lambdas now parse apat parameters instead of bare varids — so the
`:lambda` AST is `(:lambda APATS BODY)` with apats as pattern nodes.
`hk-parse-bind` became a plain `pat = expr` form, so `:bind` now has
a pattern LHS throughout (simple `x = 1` → `(:bind (:p-var "x") …)`);
this picks up `let (x, y) = pair in …` and `let Just x = m in x`
automatically, and flows through `do`-notation lets. Eight existing
tests updated to the pattern-flavoured AST. Also fixed a pragmatic
layout issue that surfaced in multi-line `let`s: when a layout-indent
would emit a spurious `;` just before an `in` token (because the
let block had already been closed by dedent), `hk-peek-next-reserved`
now lets the layout pass skip that indent and leave closing to the
existing `in` handler. 18 new tests in
`lib/haskell/tests/parser-patterns.sx` cover every pattern variant,
lambda with mixed apats, let pattern-bindings (tuple / constructor /
cons), and do-bind with a tuple pattern. 138/138 green.
- **2026-04-24** — Phase 1: `case … of` and `do`-notation parsers. Added `hk-parse-case`
/ `hk-parse-alt`, `hk-parse-do` / `hk-parse-do-stmt` / `hk-parse-do-let`, plus the
minimal pattern language needed to make arms and binds meaningful:
`hk-parse-apat` (var, wildcard `_`, int/float/string/char literal, 0-arity
conid/qconid, paren+tuple, list) and `hk-parse-pat` (conid applied to
apats greedily). AST nodes: `:case SCRUT ALTS`, `:alt PAT BODY`, `:do STMTS`
with stmts `:do-expr E` / `:do-bind PAT E` / `:do-let BINDS`, and pattern
tags `:p-wild` / `:p-int` / `:p-float` / `:p-string` / `:p-char` / `:p-var`
/ `:p-con NAME ARGS` / `:p-tuple` / `:p-list`. `do`-stmts disambiguate
`pat <- e` vs bare expression with a forward paren/bracket/brace-balanced
scan for `<-` before the next `;`/`}` — no backtracking, no AST rewrite.
`case` and `do` accept both implicit (`vlbrace`/`vsemi`/`vrbrace`) and
explicit braces. Added to `hk-parse-lexp` so they participate fully in
operator-precedence expressions. 19 new tests in
`lib/haskell/tests/parser-case-do.sx` cover every pattern variant,
explicit-brace `case`, expression scrutinees, do with bind/let/expr,
multi-binding `let` in `do`, constructor patterns in binds, and
`case`/`do` nested inside `let` and lambda. The full pattern item (as
patterns, negative literals, `~` lazy, lambda/let pattern extension)
remains a separate sub-item. 119/119 green.
- **2026-04-24** — Phase 1: expression parser (`lib/haskell/parser.sx`, ~380 lines).
Pratt-style precedence climbing against a Haskell-98-default op table (24
operators across precedence 0–9, left/right/non assoc, default infixl 9 for
anything unlisted). Supports literals (int/float/string/char), varid/conid
(qualified variants folded into `:var` / `:con`), parens / unit / tuples,
list literals, ranges `[a..b]` and `[a,b..c]`, left-associative application,
unary `-`, backtick operators (`x \`mod\` 3`), lambdas, `if-then-else`, and
`let … in` consuming both virtual and explicit braces. AST uses keyword
tags (`:var`, `:op`, `:lambda`, `:let`, `:bind`, `:tuple`, `:range`,
`:range-step`, `:app`, `:neg`, `:if`, `:list`, `:int`, `:float`, `:string`,
`:char`, `:con`). The parser skips a leading `vlbrace` / `lbrace` so it can
be called on full post-layout output, and uses a `raise`-based error channel
with location-lite messages. 42 new tests in `lib/haskell/tests/parser-expr.sx`
cover literals, identifiers, parens/tuple/unit, list + range, app associativity,
operator precedence (mul over add, cons right-assoc, function-composition
right-assoc, `$` lowest), backtick ops, unary `-`, lambda multi-param,
`if` with infix condition, single- and multi-binding `let` (both implicit
and explicit braces), plus a few mixed nestings. 100/100 green.
- **2026-04-24** — Phase 1: layout algorithm (`lib/haskell/layout.sx`, ~260 lines)
implementing the Haskell 98 layout rule (Report §9.3; §10.3 in Haskell 2010). Two-pass design: a pre-pass augments the raw
token stream with explicit `layout-open` / `layout-indent` markers (suppressing
`<n>` when `{n}` already applies, per note 3), then an L pass consumes the
augmented stream against a stack of implicit/explicit layout contexts and
emits `vlbrace` / `vsemi` / `vrbrace` tokens; newlines are dropped. Supports
the initial module-level implicit open (skipped when the first token is
`module` or `{`), the four layout keywords (`let`/`where`/`do`/`of`), explicit
braces disabling layout, dedent closing nested implicit blocks while also
emitting `vsemi` at the enclosing level, and the pragmatic single-line
`let … in` rule (emit `}` when `in` meets an implicit let). 15 new tests
in `lib/haskell/tests/layout.sx` cover module-start, do/let/where/case/of,
explicit braces, multi-level dedent, line continuation, and EOF close-down.
Shared test helpers moved to `lib/haskell/testlib.sx` so both test files
can share one `hk-test`. `test.sh` preloads tokenizer + layout + testlib.
58/58 green.
- **2026-04-24** — Phase 1: Haskell 98 tokenizer (`lib/haskell/tokenizer.sx`, 490 lines)
covering idents (lower/upper/qvarid/qconid), 23 reserved words, 11 reserved ops,
varsym/consym operator chains, integer/hex/octal/float literals incl. exponent
notation, char + string literals with escape sequences, nested `{- ... -}` block
comments with depth counter, `-- ... EOL` line comments (respecting the
"followed by symbol = not a comment" Haskell 98 rule), backticks, punctuation,
and explicit `newline` tokens for the upcoming layout pass. 43 structural tests
in `lib/haskell/tests/parse.sx`, a lightweight `hk-deep=?` equality helper
and a custom `lib/haskell/test.sh` runner (pipes through the OCaml epoch
protocol, falls back to the main-repo build when run from a worktree). 43/43
green.
Also peeked at `/root/rose-ash/sx-haskell/` per briefing: that directory is a
Haskell program implementing an **SX interpreter** (Types.hs, Eval.hs,
Primitives.hs, etc. — ~2800 lines of .hs) — the *opposite* direction from this
project. Nothing to fold in.
Gotchas hit: `emit!` and `peek` are SX evaluator special forms, so every local
helper uses the `hk-` prefix. `cond`/`when`/`let` clauses evaluate ONLY the
last expression; multi-expression bodies MUST be wrapped in `(do ...)`. These
two together account for all the tokenizer's early crashes.
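
Two tokenizer details from the oldest entry, the depth counter for nested block comments and the rule that dashes followed by a symbol character do not start a comment, can be sketched as follows. This is Python for illustration, not the `tokenizer.sx` code: `skip_block_comment` and `starts_line_comment` are hypothetical helpers, and the second one assumes the caller is already at the start of a lexeme.

```python
# Haskell 98 ASCII symbol characters, plus ':' which also continues an operator.
SYMBOL_CHARS = set("!#$%&*+./<=>?@\\^|-~:")


def skip_block_comment(src, i):
    """Given src[i:i+2] == '{-', return the index just past the matching '-}'."""
    depth = 0
    while i < len(src):
        if src.startswith("{-", i):
            depth += 1                  # nested opener
            i += 2
        elif src.startswith("-}", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i                # outermost comment closed
        else:
            i += 1
    raise SyntaxError("unterminated block comment")


def starts_line_comment(src, i):
    """True if the dashes at i begin a line comment rather than an operator.

    Two or more dashes start a comment only when the run is not followed by
    another symbol character, so `-->` and `--|` lex as operators.
    """
    j = i
    while j < len(src) and src[j] == "-":
        j += 1
    if j - i < 2:
        return False
    return j == len(src) or src[j] not in SYMBOL_CHARS


text = "{- outer {- inner -} still outer -} rest"
after = text[skip_block_comment(text, 0):]
```

Counting depth instead of searching for the first `-}` is what lets `{- outer {- inner -} still outer -}` close at the right place.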
## Blockers
- _(none yet)_