Files
rose-ash/plans/haskell-on-sx.md
giles 58dbbc5d8b
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Has been cancelled
haskell: full patterns — as/lazy/negative/infix + lambda & let pat LHS (+18 tests, 138/138)
2026-04-24 18:34:47 +00:00

224 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Haskell-on-SX: mini-Haskell with real laziness
Mini-Haskell is the research-paper-worthy demo. Laziness is native to the SX runtime (thunks are already a first-class type); algebraic data types map onto tagged lists; typeclasses map onto dictionary passing; IO maps onto `perform`/`resume`. Hindley-Milner inference is the one real piece of new work.
End-state goal: a **Haskell 98 subset** that runs the small classic programs (sieve of Eratosthenes lazy stream, fibonacci as infinite list, naive quicksort, n-queens, expression evaluator) plus a ~150-test corpus.
## Scope decisions (defaults — override)
- **Standard:** Haskell 98 subset. No GHC extensions (no `DataKinds`, no `GADTs`, no `TypeFamilies`, no `TemplateHaskell`).
- **Phase 1-3 are untyped** — we get the evaluator right first with laziness + ADTs, then add HM inference in phase 4. This is deliberate: typing is the hard bit and will take a full phase on its own.
- **Typeclasses:** dictionary passing, no overlap, no orphan instances. Added in phase 5.
- **Layout rule:** yes — phase 1 implements Haskell's indentation-sensitive parsing (painful but required).
- **Test corpus:** custom. No GHC test suite. Bundle classic programs + ~100 hand-written expression-level tests + mini Prelude tests.
## Ground rules
- **Scope:** only `lib/haskell/**` and `plans/haskell-on-sx.md`. No edits to `spec/`, `hosts/`, `shared/`, or other language dirs.
- **SX files:** `sx-tree` MCP tools only.
- **Architecture:** Haskell source → AST → desugared-core → SX AST → CEK. Thunks on the SX side provide laziness natively.
- **Commits:** one feature per commit. Keep `## Progress log` updated.
## Architecture sketch
```
Haskell source
lib/haskell/tokenizer.sx — idents, operators, layout-sensitive indentation
lib/haskell/parser.sx — AST: modules, data decls, type sigs, fn clauses, expressions
lib/haskell/desugar.sx — surface → core: case-of-case, do-notation, list comp, guards
lib/haskell/transpile.sx — core → SX AST, wrapping everything in thunks for laziness
lib/haskell/runtime.sx — force, ADT constructors, Prelude, typeclass dicts (phase 5+)
existing CEK / VM
```
Key mappings:
- **Laziness** = every function argument is an SX thunk; `force` is WHNF reduction. SX already has `make-thunk` from the trampolining evaluator — we reuse it.
- **Pattern match** = forces the scrutinee to WHNF, then structural match on the tag
- **ADT** = `data Maybe a = Nothing | Just a` compiles to tagged lists: `(:Nothing)` and `(:Just <thunk>)`
- **Typeclass** = each class becomes a record type; each instance becomes a record value; each method becomes a projection; the elaborator inserts the dict at each call site (phase 5)
- **IO** = `IO a` is a function `World -> (a, World)` internally; in practice uses `perform`/`resume` for actual side effects
- **Layout** = offside rule; inserted virtual braces + semis during a lexer-parser feedback pass
## Roadmap
### Phase 1 — tokenizer + parser + layout rule
- [x] Tokenizer: reserved words, qualified names, operators, numbers (int, float, Rational later), chars/strings, comments (`--` and `{-` nested)
- [x] Layout algorithm: turn indentation into virtual `{`, `;`, `}` tokens per Haskell 98 §10.3
- Parser (split into sub-items — implement one per iteration):
- [x] Expressions: atoms, parens, tuples, lists, ranges, application, infix with full Haskell-98 precedence table, unary `-`, backtick operators, lambdas, `if`, `let`
- [x] `case … of` and `do`-notation expressions (plus minimal patterns needed for arms/binds: var, wildcard, literal, 0-arity and applied constructor, tuple, list)
- [x] Patterns — full: `as` patterns, nested, negative literal, `~` lazy, infix constructor (`:` / consym), extend lambdas/let with non-var patterns
- [ ] Top-level decls: function clauses, type signatures, `data`, `type`, `newtype`, fixity decls
- [ ] `where` clauses + guards
- [ ] Module header + imports (stub)
- [ ] List comprehensions + operator sections
- [ ] AST design modelled on GHC's HsSyn at a surface level
- [x] Unit tests in `lib/haskell/tests/parse.sx` (43 tokenizer tests, all green)
### Phase 2 — desugar + eager-ish eval + ADTs (untyped)
- [ ] Desugar: guards → nested `if`s; `where``let`; list comp → `concatMap`-based; do-notation stays for now (desugared in phase 3)
- [ ] `data` declarations register constructors in runtime
- [ ] Pattern match (tag-based, value-level): atoms, vars, wildcards, constructor patterns, `as` patterns, nested
- [ ] Evaluator (still strict internally — laziness in phase 3): `let`, `lambda`, application, `case`, literals, constructors
- [ ] 30+ eval tests in `lib/haskell/tests/eval.sx`
### Phase 3 — laziness + classic programs
- [ ] Transpile to thunk-wrapped SX: every application arg becomes `(make-thunk (lambda () <arg>))`
- [ ] `force` = SX eval-thunk-to-WHNF primitive
- [ ] Pattern match forces scrutinee before matching
- [ ] Infinite structures: `repeat x`, `iterate f x`, `[1..]`, Fibonacci stream, sieve of Eratosthenes
- [ ] `seq`, `deepseq` from Prelude
- [ ] Do-notation for a stub `IO` monad (just threading, no real side effects yet)
- [ ] Classic programs in `lib/haskell/tests/programs/`:
- [ ] `fib.hs` — infinite Fibonacci stream
- [ ] `sieve.hs` — lazy sieve of Eratosthenes
- [ ] `quicksort.hs` — naive QS
- [ ] `nqueens.hs`
- [ ] `calculator.hs` — parser combinator style expression evaluator
- [ ] `lib/haskell/conformance.sh` + runner; `scoreboard.json` + `scoreboard.md`
- [ ] Target: 5/5 classic programs passing
### Phase 4 — Hindley-Milner inference
- [ ] Algorithm W: unification + type schemes + generalisation + instantiation
- [ ] Report type errors with meaningful positions
- [ ] Reject untypeable programs that phase 3 was accepting
- [ ] Type-sig checking: user writes `f :: Int -> Int`; verify
- [ ] Let-polymorphism
- [ ] Unit tests: inference for 50+ expressions
### Phase 5 — typeclasses (dictionary passing)
- [ ] `class` / `instance` declarations
- [ ] Dictionary-passing elaborator: inserts dict args at call sites
- [ ] Standard classes: `Eq`, `Ord`, `Show`, `Num`, `Functor`, `Monad`, `Applicative`
- [ ] `deriving (Eq, Show)` for ADTs
### Phase 6 — real IO + Prelude completion
- [ ] Real `IO` monad backed by `perform`/`resume`
- [ ] `putStrLn`, `getLine`, `readFile`, `writeFile`, `print`
- [ ] Full-ish Prelude: `Maybe`, `Either`, `List` functions, `Map`-lite
- [ ] Drive scoreboard toward 150+ passing
## Progress log
_Newest first._
- **2026-04-24** — Phase 1: full patterns. Added `as` patterns
(`name@apat``(:p-as NAME PAT)`), lazy patterns (`~apat`
`(:p-lazy PAT)`), negative literal patterns (`-N` / `-F` resolving
eagerly in the parser so downstream passes see a plain `(:p-int -1)`),
and infix constructor patterns via a right-associative single-band
layer on top of `hk-parse-pat-lhs` for any `consym` or reservedop `:`
(so `x : xs` parses as `(:p-con ":" [x, xs])`, `a :+: b` likewise).
Extended `hk-apat-start?` with `-` and `~` so the pattern-argument
loops in lambdas and constructor applications pick these up.
Lambdas now parse apat parameters instead of bare varids — so the
`:lambda` AST is `(:lambda APATS BODY)` with apats as pattern nodes.
`hk-parse-bind` became a plain `pat = expr` form, so `:bind` now has
a pattern LHS throughout (simple `x = 1``(:bind (:p-var "x") …)`);
this picks up `let (x, y) = pair in …` and `let Just x = m in x`
automatically, and flows through `do`-notation lets. Eight existing
tests updated to the pattern-flavoured AST. Also fixed a pragmatic
layout issue that surfaced in multi-line `let`s: when a layout-indent
would emit a spurious `;` just before an `in` token (because the
let block had already been closed by dedent), `hk-peek-next-reserved`
now lets the layout pass skip that indent and leave closing to the
existing `in` handler. 18 new tests in
`lib/haskell/tests/parser-patterns.sx` cover every pattern variant,
lambda with mixed apats, let pattern-bindings (tuple / constructor /
cons), and do-bind with a tuple pattern. 138/138 green.
- **2026-04-24** — Phase 1: `case … of` and `do`-notation parsers. Added `hk-parse-case`
/ `hk-parse-alt`, `hk-parse-do` / `hk-parse-do-stmt` / `hk-parse-do-let`, plus the
minimal pattern language needed to make arms and binds meaningful:
`hk-parse-apat` (var, wildcard `_`, int/float/string/char literal, 0-arity
conid/qconid, paren+tuple, list) and `hk-parse-pat` (conid applied to
apats greedily). AST nodes: `:case SCRUT ALTS`, `:alt PAT BODY`, `:do STMTS`
with stmts `:do-expr E` / `:do-bind PAT E` / `:do-let BINDS`, and pattern
tags `:p-wild` / `:p-int` / `:p-float` / `:p-string` / `:p-char` / `:p-var`
/ `:p-con NAME ARGS` / `:p-tuple` / `:p-list`. `do`-stmts disambiguate
`pat <- e` vs bare expression with a forward paren/bracket/brace-balanced
scan for `<-` before the next `;`/`}` — no backtracking, no AST rewrite.
`case` and `do` accept both implicit (`vlbrace`/`vsemi`/`vrbrace`) and
explicit braces. Added to `hk-parse-lexp` so they participate fully in
operator-precedence expressions. 19 new tests in
`lib/haskell/tests/parser-case-do.sx` cover every pattern variant,
explicit-brace `case`, expression scrutinees, do with bind/let/expr,
multi-binding `let` in `do`, constructor patterns in binds, and
`case`/`do` nested inside `let` and lambda. The full pattern item (as
patterns, negative literals, `~` lazy, lambda/let pattern extension)
remains a separate sub-item. 119/119 green.
- **2026-04-24** — Phase 1: expression parser (`lib/haskell/parser.sx`, ~380 lines).
Pratt-style precedence climbing against a Haskell-98-default op table (24
operators across precedence 09, left/right/non assoc, default infixl 9 for
anything unlisted). Supports literals (int/float/string/char), varid/conid
(qualified variants folded into `:var` / `:con`), parens / unit / tuples,
list literals, ranges `[a..b]` and `[a,b..c]`, left-associative application,
unary `-`, backtick operators (`x \`mod\` 3`), lambdas, `if-then-else`, and
`let … in` consuming both virtual and explicit braces. AST uses keyword
tags (`:var`, `:op`, `:lambda`, `:let`, `:bind`, `:tuple`, `:range`,
`:range-step`, `:app`, `:neg`, `:if`, `:list`, `:int`, `:float`, `:string`,
`:char`, `:con`). The parser skips a leading `vlbrace` / `lbrace` so it can
be called on full post-layout output, and uses a `raise`-based error channel
with location-lite messages. 42 new tests in `lib/haskell/tests/parser-expr.sx`
cover literals, identifiers, parens/tuple/unit, list + range, app associativity,
operator precedence (mul over add, cons right-assoc, function-composition
right-assoc, `$` lowest), backtick ops, unary `-`, lambda multi-param,
`if` with infix condition, single- and multi-binding `let` (both implicit
and explicit braces), plus a few mixed nestings. 100/100 green.
- **2026-04-24** — Phase 1: layout algorithm (`lib/haskell/layout.sx`, ~260 lines)
implementing Haskell 98 §10.3. Two-pass design: a pre-pass augments the raw
token stream with explicit `layout-open` / `layout-indent` markers (suppressing
`<n>` when `{n}` already applies, per note 3), then an L pass consumes the
augmented stream against a stack of implicit/explicit layout contexts and
emits `vlbrace` / `vsemi` / `vrbrace` tokens; newlines are dropped. Supports
the initial module-level implicit open (skipped when the first token is
`module` or `{`), the four layout keywords (`let`/`where`/`do`/`of`), explicit
braces disabling layout, dedent closing nested implicit blocks while also
emitting `vsemi` at the enclosing level, and the pragmatic single-line
`let … in` rule (emit `}` when `in` meets an implicit let). 15 new tests
in `lib/haskell/tests/layout.sx` cover module-start, do/let/where/case/of,
explicit braces, multi-level dedent, line continuation, and EOF close-down.
Shared test helpers moved to `lib/haskell/testlib.sx` so both test files
can share one `hk-test`. `test.sh` preloads tokenizer + layout + testlib.
58/58 green.
- **2026-04-24** — Phase 1: Haskell 98 tokenizer (`lib/haskell/tokenizer.sx`, 490 lines)
covering idents (lower/upper/qvarid/qconid), 23 reserved words, 11 reserved ops,
varsym/consym operator chains, integer/hex/octal/float literals incl. exponent
notation, char + string literals with escape sequences, nested `{- ... -}` block
comments with depth counter, `-- ... EOL` line comments (respecting the
"followed by symbol = not a comment" Haskell 98 rule), backticks, punctuation,
and explicit `newline` tokens for the upcoming layout pass. 43 structural tests
in `lib/haskell/tests/parse.sx`, a lightweight `hk-deep=?` equality helper
and a custom `lib/haskell/test.sh` runner (pipes through the OCaml epoch
protocol, falls back to the main-repo build when run from a worktree). 43/43
green.
Also peeked at `/root/rose-ash/sx-haskell/` per briefing: that directory is a
Haskell program implementing an **SX interpreter** (Types.hs, Eval.hs,
Primitives.hs, etc. — ~2800 lines of .hs) — the *opposite* direction from this
project. Nothing to fold in.
Gotchas hit: `emit!` and `peek` are SX evaluator special forms, so every local
helper uses the `hk-` prefix. `cond`/`when`/`let` clauses evaluate ONLY the
last expression; multi-expression bodies MUST be wrapped in `(do ...)`. These
two together account for all the tokenizer's early crashes.
## Blockers
- _(none yet)_