# Haskell-on-SX: mini-Haskell with real laziness

Mini-Haskell is the research-paper-worthy demo. Laziness is native to the SX runtime (thunks are already a first-class type); algebraic data types map onto tagged lists; typeclasses map onto dictionary passing; IO maps onto `perform`/`resume`. Hindley-Milner inference is the one real piece of new work.

End-state goal: a **Haskell 98 subset** that runs the small classic programs (sieve of Eratosthenes lazy stream, Fibonacci as infinite list, naive quicksort, n-queens, expression evaluator) plus a ~150-test corpus.

## Scope decisions (defaults — override)

- **Standard:** Haskell 98 subset. No GHC extensions (no `DataKinds`, no `GADTs`, no `TypeFamilies`, no `TemplateHaskell`).
- **Phases 1-3 are untyped** — we get the evaluator right first with laziness + ADTs, then add HM inference in phase 4. This is deliberate: typing is the hard bit and will take a full phase on its own.
- **Typeclasses:** dictionary passing, no overlap, no orphan instances. Added in phase 5.
- **Layout rule:** yes — phase 1 implements Haskell's indentation-sensitive parsing (painful but required).
- **Test corpus:** custom. No GHC test suite. Bundle classic programs + ~100 hand-written expression-level tests + mini Prelude tests.

## Ground rules

- **Scope:** only `lib/haskell/**` and `plans/haskell-on-sx.md`. No edits to `spec/`, `hosts/`, `shared/`, or other language dirs.
- **SX files:** `sx-tree` MCP tools only.
- **Architecture:** Haskell source → AST → desugared-core → SX AST → CEK. Thunks on the SX side provide laziness natively.
- **Commits:** one feature per commit. Keep `## Progress log` updated.
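For orientation, three of the classic programs named in the end-state goal might look roughly like this in plain Haskell 98. This is only a sketch: the plan names the programs but does not give their source, so the definitions `fibs`, `primes`, and `quicksort` below are illustrative assumptions. They also lean on list comprehensions and an `Ord` constraint, which depend on parser and typeclass work from later phases.

```haskell
module Main where

-- Infinite Fibonacci stream: only the demanded prefix is ever computed,
-- which is exactly what thunk-wrapped arguments are meant to provide.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- Lazy "sieve" over an infinite list (the classic trial-division version).
primes :: [Int]
primes = sieve [2 ..]
  where
    sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
    sieve []       = []

-- Naive quicksort: first element as pivot, two list comprehensions.
quicksort :: Ord a => [a] -> [a]
quicksort []       = []
quicksort (p : xs) = quicksort smaller ++ [p] ++ quicksort larger
  where
    smaller = [x | x <- xs, x < p]
    larger  = [x | x <- xs, x >= p]

main :: IO ()
main = do
  print (take 10 fibs)                      -- [0,1,1,2,3,5,8,13,21,34]
  print (take 10 primes)                    -- [2,3,5,7,11,13,17,19,23,29]
  print (quicksort [3, 1, 4, 1, 5, 9, 2, 6]) -- [1,1,2,3,4,5,6,9]
```

Both `take 10` calls terminate only because the lists are built lazily, so programs of this shape double as a quick check that laziness is really in place.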

## Architecture sketch

```
Haskell source
    │
    ▼
lib/haskell/tokenizer.sx — idents, operators, layout-sensitive indentation
    │
    ▼
lib/haskell/parser.sx — AST: modules, data decls, type sigs, fn clauses, expressions
    │
    ▼
lib/haskell/desugar.sx — surface → core: case-of-case, do-notation, list comp, guards
    │
    ▼
lib/haskell/transpile.sx — core → SX AST, wrapping everything in thunks for laziness
    │
    ▼
lib/haskell/runtime.sx — force, ADT constructors, Prelude, typeclass dicts (phase 5+)
    │
    ▼
existing CEK / VM
```

Key mappings:

- **Laziness** = every function argument is an SX thunk; `force` is WHNF reduction. SX already has `make-thunk` from the trampolining evaluator — we reuse it.
- **Pattern match** = forces the scrutinee to WHNF, then structural match on the tag.
- **ADT** = `data Maybe a = Nothing | Just a` compiles to tagged lists: `(:Nothing)` and `(:Just <thunk>)`.
- **Typeclass** = each class becomes a record type; each instance becomes a record value; each method becomes a projection; the elaborator inserts the dict at each call site (phase 5).
- **IO** = `IO a` is a function `World -> (a, World)` internally; in practice uses `perform`/`resume` for actual side effects.
- **Layout** = offside rule; inserted virtual braces + semis during a lexer-parser feedback pass.

## Roadmap

### Phase 1 — tokenizer + parser + layout rule

- [x] Tokenizer: reserved words, qualified names, operators, numbers (int, float, Rational later), chars/strings, comments (`--` and `{-` nested)
- [x] Layout algorithm: turn indentation into virtual `{`, `;`, `}` tokens per the Haskell 98 Report §9.3 (§10.3 in Haskell 2010)
- Parser (split into sub-items — implement one per iteration):
  - [x] Expressions: atoms, parens, tuples, lists, ranges, application, infix with full Haskell-98 precedence table, unary `-`, backtick operators, lambdas, `if`, `let`
  - [x] `case … of` and `do`-notation expressions (plus minimal patterns needed for arms/binds: var, wildcard, literal, 0-arity and applied constructor, tuple, list)
  - [x] Patterns — full: `as` patterns, nested, negative literal, `~` lazy, infix constructor (`:` / consym), extend lambdas/let with non-var patterns
  - [x] Top-level decls: function clauses (simple — no guards/where yet), pattern bindings, multi-name type signatures, `data` with type vars and recursive constructors, `type` synonyms, `newtype`, fixity (`infix`/`infixl`/`infixr` with optional precedence, comma-separated ops, backtick names). Types: vars / constructors / application / `->` (right-assoc) / tuples / lists. `hk-parse-top` entry.
  - [x] `where` clauses + guards (on fun-clauses, case alts, and let/do-let bindings — with the let funclause shorthand `let f x = …` now supported)
  - [x] Module header + imports — `module NAME [exports] where …`, qualified/as/hiding/explicit imports, operator exports, `module Foo` exports, dotted names, headerless-with-imports
  - [ ] List comprehensions + operator sections
- [ ] AST design modelled on GHC's HsSyn at a surface level
- [x] Unit tests in `lib/haskell/tests/parse.sx` (43 tokenizer tests, all green)

### Phase 2 — desugar + eager-ish eval + ADTs (untyped)

- [ ] Desugar: guards → nested `if`s; `where` → `let`; list comp → `concatMap`-based; do-notation stays for now (desugared in phase 3)
- [ ] `data` declarations register constructors in runtime
- [ ] Pattern match (tag-based, value-level): atoms, vars, wildcards, constructor patterns, `as` patterns, nested
- [ ] Evaluator (still strict internally — laziness in phase 3): `let`, `lambda`, application, `case`, literals, constructors
- [ ] 30+ eval tests in `lib/haskell/tests/eval.sx`

### Phase 3 — laziness + classic programs

- [ ] Transpile to thunk-wrapped SX: every application arg becomes `(make-thunk (lambda () <arg>))`
- [ ] `force` = SX eval-thunk-to-WHNF primitive
- [ ] Pattern match forces scrutinee before matching
- [ ] Infinite structures: `repeat x`, `iterate f x`, `[1..]`, Fibonacci stream, sieve of Eratosthenes
- [ ] `seq`, `deepseq` from Prelude
- [ ] Do-notation for a stub `IO` monad (just threading, no real side effects yet)
- [ ] Classic programs in `lib/haskell/tests/programs/`:
  - [ ] `fib.hs` — infinite Fibonacci stream
  - [ ] `sieve.hs` — lazy sieve of Eratosthenes
  - [ ] `quicksort.hs` — naive QS
  - [ ] `nqueens.hs`
  - [ ] `calculator.hs` — parser combinator style expression evaluator
- [ ] `lib/haskell/conformance.sh` + runner; `scoreboard.json` + `scoreboard.md`
- [ ] Target: 5/5 classic programs passing

### Phase 4 — Hindley-Milner inference

- [ ] Algorithm W: unification + type schemes + generalisation + instantiation
- [ ] Report type errors with meaningful positions
- [ ] Reject untypeable programs that phase 3 was accepting
- [ ] Type-sig checking: user writes `f :: Int -> Int`; verify
- [ ] Let-polymorphism
- [ ] Unit tests: inference for 50+ expressions

### Phase 5 — typeclasses (dictionary passing)

- [ ] `class` / `instance` declarations
- [ ] Dictionary-passing elaborator: inserts dict args at call sites
- [ ] Standard classes: `Eq`, `Ord`, `Show`, `Num`, `Functor`, `Monad`, `Applicative`
- [ ] `deriving (Eq, Show)` for ADTs

### Phase 6 — real IO + Prelude completion

- [ ] Real `IO` monad backed by `perform`/`resume`
- [ ] `putStrLn`, `getLine`, `readFile`, `writeFile`, `print`
- [ ] Full-ish Prelude: `Maybe`, `Either`, `List` functions, `Map`-lite
- [ ] Drive scoreboard toward 150+ passing

## Progress log

_Newest first._

- **2026-04-24** — Phase 1: module header + imports. Added `hk-parse-module-header`, `hk-parse-import`, plus shared helpers for import/export entity lists (`hk-parse-ent`, `hk-parse-ent-member`, `hk-parse-ent-list`). New AST:
  - `(:module NAME EXPORTS IMPORTS DECLS)` — NAME `nil` means no header, EXPORTS `nil` means no export list (distinct from empty `()`)
  - `(:import QUALIFIED NAME AS SPEC)` — QUALIFIED bool, AS alias or nil, SPEC nil / `(:spec-items ENTS)` / `(:spec-hiding ENTS)`
  - Entity refs: `:ent-var`, `:ent-all` (`Tycon(..)`), `:ent-with` (`Tycon(m1, m2, …)`), `:ent-module` (exports only).

  `hk-parse-program` now dispatches on the leading token: `module` keyword → full header-plus-body parse (consuming the `where` layout brace around the module body); otherwise collect any leading `import` decls and then remaining decls with the existing logic. The outer shell is `(:module …)` as soon as any header or import is present, and stays as `(:program DECLS)` otherwise — preserving every previous test expectation untouched. Handles operator exports `((+:))`, dotted module names (`Data.Map`), and the Haskell-98 context-sensitive keywords `qualified`/`as`/`hiding` (all lexed as ordinary varids and matched only in import position). 16 new tests in `lib/haskell/tests/parser-module.sx` covering simple/exports/empty headers, dotted names, operator exports, `module Foo` exports, qualified/aliased/items/hiding imports, and a headerless-with-imports file. 189/189 green.
- **2026-04-24** — Phase 1: guards + where clauses. Factored a single `hk-parse-rhs sep` that all body-producing sites now share: it reads a plain `sep expr` body or a chain of `| cond sep expr` guards, then — regardless of which form — looks for an optional `where` block and wraps accordingly. AST additions:
  - `:guarded GUARDS` where each GUARD is `:guard COND EXPR`
  - `:where BODY DECLS` where BODY is a plain expr or a `:guarded`

  Both can nest (guards inside where). `hk-parse-alt` now routes through `hk-parse-rhs "->"`, `hk-parse-fun-clause` and `hk-parse-bind` through `hk-parse-rhs "="`. `hk-parse-where-decls` reuses `hk-parse-decl` so where-blocks accept any decl form (signatures, fixity, nested funs). As a side effect, `hk-parse-bind` now also picks up the Haskell-native `let f x = …` funclause shorthand: a varid followed by one or more apats produces `(:fun-clause NAME APATS BODY)` instead of a `(:bind (:p-var …) …)` — keeping the simple `let x = e` shape unchanged for existing tests.
  11 new tests in `lib/haskell/tests/parser-guards-where.sx` cover two- and three-way guards, mixed guarded + equality clauses, single- and multi-binding where blocks, guards plus where, case-alt guards, case-alt where, let with funclause shorthand, let with guards, and a where containing a type signature alongside a fun-clause. 173/173 green.
- **2026-04-24** — Phase 1: top-level decls. Refactored `hk-parse-expr` into a `hk-parser tokens mode` with `:expr` / `:module` dispatch so the big lexical state is shared (peek/advance/pat/expr helpers all reachable); added public wrappers `hk-parse-expr`, `hk-parse-module`, and source-level entry `hk-parse-top`. New type parser (`hk-parse-type` / `hk-parse-btype` / `hk-parse-atype`): type variables (`:t-var`), type constructors (`:t-con`), type application (`:t-app`, left-assoc), right-associative function arrow (`:t-fun`), unit/tuples (`:t-tuple`), and lists (`:t-list`). New decl parser (`hk-parse-decl` / `hk-parse-program`) producing a `(:program DECLS)` shell:
  - `:type-sig NAMES TYPE` — comma-separated multi-name support
  - `:fun-clause NAME APATS BODY` — patterns for args, body via existing expr
  - `:pat-bind PAT BODY` — top-level pattern bindings like `(a, b) = pair`
  - `:data NAME TVARS CONS` with `:con-def CNAME FIELDS` for nullary and multi-arg constructors, including recursive references
  - `:type-syn NAME TVARS TYPE`, `:newtype NAME TVARS CNAME FIELD`
  - `:fixity ASSOC PREC OPS` — assoc one of `"l"`/`"r"`/`"n"`, default prec 9, comma-separated operator names, including backtick-quoted varids.

  Sig vs fun-clause disambiguated by a paren-balanced top-level scan for `::` before the next `;`/`}` (`hk-has-top-dcolon?`). 24 new tests in `lib/haskell/tests/parser-decls.sx` cover all decl forms, signatures with application / tuples / lists / right-assoc arrows, nullary and recursive data types, multi-clause functions, and a mixed program with data + type-synonym + signature + two function clauses.
  Not yet: guards, where clauses, module header, imports, deriving, contexts, GADTs. 162/162 green.
- **2026-04-24** — Phase 1: full patterns. Added `as` patterns (`name@apat` → `(:p-as NAME PAT)`), lazy patterns (`~apat` → `(:p-lazy PAT)`), negative literal patterns (`-N` / `-F` resolving eagerly in the parser so downstream passes see a plain `(:p-int -1)`), and infix constructor patterns via a right-associative single-band layer on top of `hk-parse-pat-lhs` for any `consym` or reservedop `:` (so `x : xs` parses as `(:p-con ":" [x, xs])`, `a :+: b` likewise). Extended `hk-apat-start?` with `-` and `~` so the pattern-argument loops in lambdas and constructor applications pick these up. Lambdas now parse apat parameters instead of bare varids — so the `:lambda` AST is `(:lambda APATS BODY)` with apats as pattern nodes. `hk-parse-bind` became a plain `pat = expr` form, so `:bind` now has a pattern LHS throughout (simple `x = 1` → `(:bind (:p-var "x") …)`); this picks up `let (x, y) = pair in …` and `let Just x = m in x` automatically, and flows through `do`-notation lets. Eight existing tests updated to the pattern-flavoured AST. Also fixed a pragmatic layout issue that surfaced in multi-line `let`s: when a layout-indent would emit a spurious `;` just before an `in` token (because the let block had already been closed by dedent), `hk-peek-next-reserved` now lets the layout pass skip that indent and leave closing to the existing `in` handler. 18 new tests in `lib/haskell/tests/parser-patterns.sx` cover every pattern variant, lambda with mixed apats, let pattern-bindings (tuple / constructor / cons), and do-bind with a tuple pattern. 138/138 green.
- **2026-04-24** — Phase 1: `case … of` and `do`-notation parsers.
  Added `hk-parse-case` / `hk-parse-alt`, `hk-parse-do` / `hk-parse-do-stmt` / `hk-parse-do-let`, plus the minimal pattern language needed to make arms and binds meaningful: `hk-parse-apat` (var, wildcard `_`, int/float/string/char literal, 0-arity conid/qconid, paren+tuple, list) and `hk-parse-pat` (conid applied to apats greedily). AST nodes: `:case SCRUT ALTS`, `:alt PAT BODY`, `:do STMTS` with stmts `:do-expr E` / `:do-bind PAT E` / `:do-let BINDS`, and pattern tags `:p-wild` / `:p-int` / `:p-float` / `:p-string` / `:p-char` / `:p-var` / `:p-con NAME ARGS` / `:p-tuple` / `:p-list`. `do`-stmts disambiguate `pat <- e` vs bare expression with a forward paren/bracket/brace-balanced scan for `<-` before the next `;`/`}` — no backtracking, no AST rewrite. `case` and `do` accept both implicit (`vlbrace`/`vsemi`/`vrbrace`) and explicit braces. Added to `hk-parse-lexp` so they participate fully in operator-precedence expressions. 19 new tests in `lib/haskell/tests/parser-case-do.sx` cover every pattern variant, explicit-brace `case`, expression scrutinees, do with bind/let/expr, multi-binding `let` in `do`, constructor patterns in binds, and `case`/`do` nested inside `let` and lambda. The full pattern item (as patterns, negative literals, `~` lazy, lambda/let pattern extension) remains a separate sub-item. 119/119 green.
- **2026-04-24** — Phase 1: expression parser (`lib/haskell/parser.sx`, ~380 lines). Pratt-style precedence climbing against a Haskell-98-default op table (24 operators across precedence 0–9, left/right/non assoc, default infixl 9 for anything unlisted). Supports literals (int/float/string/char), varid/conid (qualified variants folded into `:var` / `:con`), parens / unit / tuples, list literals, ranges `[a..b]` and `[a,b..c]`, left-associative application, unary `-`, backtick operators (``x `mod` 3``), lambdas, `if-then-else`, and `let … in` consuming both virtual and explicit braces.
  AST uses keyword tags (`:var`, `:op`, `:lambda`, `:let`, `:bind`, `:tuple`, `:range`, `:range-step`, `:app`, `:neg`, `:if`, `:list`, `:int`, `:float`, `:string`, `:char`, `:con`). The parser skips a leading `vlbrace` / `lbrace` so it can be called on full post-layout output, and uses a `raise`-based error channel with location-lite messages. 42 new tests in `lib/haskell/tests/parser-expr.sx` cover literals, identifiers, parens/tuple/unit, list + range, app associativity, operator precedence (mul over add, cons right-assoc, function-composition right-assoc, `$` lowest), backtick ops, unary `-`, lambda multi-param, `if` with infix condition, single- and multi-binding `let` (both implicit and explicit braces), plus a few mixed nestings. 100/100 green.
- **2026-04-24** — Phase 1: layout algorithm (`lib/haskell/layout.sx`, ~260 lines) implementing the Haskell 98 layout rule (Report §9.3; numbered §10.3 in Haskell 2010). Two-pass design: a pre-pass augments the raw token stream with explicit `layout-open` / `layout-indent` markers (suppressing `<n>` when `{n}` already applies, per note 3), then an L pass consumes the augmented stream against a stack of implicit/explicit layout contexts and emits `vlbrace` / `vsemi` / `vrbrace` tokens; newlines are dropped. Supports the initial module-level implicit open (skipped when the first token is `module` or `{`), the four layout keywords (`let`/`where`/`do`/`of`), explicit braces disabling layout, dedent closing nested implicit blocks while also emitting `vsemi` at the enclosing level, and the pragmatic single-line `let … in` rule (emit `}` when `in` meets an implicit let). 15 new tests in `lib/haskell/tests/layout.sx` cover module-start, do/let/where/case/of, explicit braces, multi-level dedent, line continuation, and EOF close-down. Shared test helpers moved to `lib/haskell/testlib.sx` so both test files can share one `hk-test`. `test.sh` preloads tokenizer + layout + testlib. 58/58 green.
- **2026-04-24** — Phase 1: Haskell 98 tokenizer (`lib/haskell/tokenizer.sx`, 490 lines) covering idents (lower/upper/qvarid/qconid), 23 reserved words, 11 reserved ops, varsym/consym operator chains, integer/hex/octal/float literals incl. exponent notation, char + string literals with escape sequences, nested `{- ... -}` block comments with depth counter, `-- ... EOL` line comments (respecting the "followed by symbol = not a comment" Haskell 98 rule), backticks, punctuation, and explicit `newline` tokens for the upcoming layout pass. 43 structural tests in `lib/haskell/tests/parse.sx`, a lightweight `hk-deep=?` equality helper and a custom `lib/haskell/test.sh` runner (pipes through the OCaml epoch protocol, falls back to the main-repo build when run from a worktree). 43/43 green.

  Also peeked at `/root/rose-ash/sx-haskell/` per briefing: that directory is a Haskell program implementing an **SX interpreter** (Types.hs, Eval.hs, Primitives.hs, etc. — ~2800 lines of .hs) — the *opposite* direction from this project. Nothing to fold in.

  Gotchas hit: `emit!` and `peek` are SX evaluator special forms, so every local helper uses the `hk-` prefix. `cond`/`when`/`let` clauses evaluate ONLY the last expression; multi-expression bodies MUST be wrapped in `(do ...)`. These two together account for all the tokenizer's early crashes.

## Blockers

- _(none yet)_