rose-ash/plans/haskell-on-sx.md
giles 36234f0132
haskell: case/do + minimal patterns (+19 tests, 119/119)
2026-04-24 18:00:58 +00:00


Haskell-on-SX: mini-Haskell with real laziness

Mini-Haskell is the research-paper-worthy demo. Laziness is native to the SX runtime (thunks are already a first-class type); algebraic data types map onto tagged lists; typeclasses map onto dictionary passing; IO maps onto perform/resume. Hindley-Milner inference is the one real piece of new work.

End-state goal: a Haskell 98 subset that runs the small classic programs (sieve of Eratosthenes lazy stream, fibonacci as infinite list, naive quicksort, n-queens, expression evaluator) plus a ~150-test corpus.

Scope decisions (defaults — override)

  • Standard: Haskell 98 subset. No GHC extensions (no DataKinds, no GADTs, no TypeFamilies, no TemplateHaskell).
  • Phases 1-3 are untyped. We get the evaluator right first (laziness + ADTs), then add HM inference in phase 4. This is deliberate: typing is the hard bit and will take a full phase on its own.
  • Typeclasses: dictionary passing, no overlap, no orphan instances. Added in phase 5.
  • Layout rule: yes — phase 1 implements Haskell's indentation-sensitive parsing (painful but required).
  • Test corpus: custom. No GHC test suite. Bundle classic programs + ~100 hand-written expression-level tests + mini Prelude tests.

Ground rules

  • Scope: only lib/haskell/** and plans/haskell-on-sx.md. No edits to spec/, hosts/, shared/, or other language dirs.
  • SX files: sx-tree MCP tools only.
  • Architecture: Haskell source → AST → desugared-core → SX AST → CEK. Thunks on the SX side provide laziness natively.
  • Commits: one feature per commit. Keep ## Progress log updated.

Architecture sketch

Haskell source
    │
    ▼
lib/haskell/tokenizer.sx  — idents, operators, layout-sensitive indentation
    │
    ▼
lib/haskell/parser.sx     — AST: modules, data decls, type sigs, fn clauses, expressions
    │
    ▼
lib/haskell/desugar.sx    — surface → core: case-of-case, do-notation, list comp, guards
    │
    ▼
lib/haskell/transpile.sx  — core → SX AST, wrapping everything in thunks for laziness
    │
    ▼
lib/haskell/runtime.sx    — force, ADT constructors, Prelude, typeclass dicts (phase 5+)
    │
    ▼
existing CEK / VM

Key mappings:

  • Laziness = every function argument is an SX thunk; force is WHNF reduction. SX already has make-thunk from the trampolining evaluator — we reuse it.
  • Pattern match = forces the scrutinee to WHNF, then structural match on the tag
  • ADT = data Maybe a = Nothing | Just a compiles to tagged lists: (:Nothing) and (:Just <thunk>)
  • Typeclass = each class becomes a record type; each instance becomes a record value; each method becomes a projection; the elaborator inserts the dict at each call site (phase 5)
  • IO = conceptually, IO a is a function World -> (a, World); in practice, actual side effects go through perform/resume
  • Layout = offside rule; inserted virtual braces + semis during a lexer-parser feedback pass
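
The laziness, pattern-match, and ADT mappings above can be modelled in a few lines. The following is a minimal Python sketch, not the SX implementation: the `Thunk` class, `force`, and the tuple encoding of `(:Nothing)` / `(:Just <thunk>)` are illustrative stand-ins, and the memoisation shown here is an assumption about how `make-thunk` behaves.

```python
class Thunk:
    """A suspended computation, evaluated at most once (assumed memoised)."""

    def __init__(self, compute):
        self._compute = compute
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            value = self._compute()
            # A thunk may evaluate to another thunk; keep going until a value appears.
            while isinstance(value, Thunk):
                value = value.force()
            self._value = value
            self._forced = True
            self._compute = None  # drop the closure once the result is cached
        return self._value


def force(x):
    """Reduce to WHNF: the outermost constructor is known, fields stay suspended."""
    return x.force() if isinstance(x, Thunk) else x


# data Maybe a = Nothing | Just a, encoded as tagged tuples (hypothetical encoding).
def nothing():
    return ("Nothing",)


def just(arg_thunk):
    return ("Just", arg_thunk)


def from_maybe(default_thunk, scrutinee_thunk):
    """fromMaybe d m = case m of { Nothing -> d; Just x -> x }"""
    value = force(scrutinee_thunk)  # force the scrutinee, then match on the tag
    if value[0] == "Nothing":
        return force(default_thunk)
    if value[0] == "Just":
        return force(value[1])
    raise ValueError("no matching alternative for tag " + value[0])
```

Because matching only inspects the tag, a `Just` wrapping a diverging thunk can still be matched, and the default argument of `from_maybe` is never evaluated when the scrutinee is a `Just`.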

Roadmap

Phase 1 — tokenizer + parser + layout rule

  • Tokenizer: reserved words, qualified names, operators, numbers (int, float, Rational later), chars/strings, comments (-- and {- nested)
  • Layout algorithm: turn indentation into virtual {, ;, } tokens per Haskell 98 §10.3
  • Parser (split into sub-items — implement one per iteration):
    • Expressions: atoms, parens, tuples, lists, ranges, application, infix with full Haskell-98 precedence table, unary -, backtick operators, lambdas, if, let
    • case … of and do-notation expressions (plus minimal patterns needed for arms/binds: var, wildcard, literal, 0-arity and applied constructor, tuple, list)
    • Patterns — full: as patterns, nested, negative literal, ~ lazy, extend lambdas/let with non-var patterns
    • Top-level decls: function clauses, type signatures, data, type, newtype, fixity decls
    • where clauses + guards
    • Module header + imports (stub)
    • List comprehensions + operator sections
  • AST design modelled on GHC's HsSyn at a surface level
  • Unit tests in lib/haskell/tests/parse.sx (43 tokenizer tests, all green)
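
As an illustration of the "infix with full Haskell-98 precedence table" item, here is a hedged Python sketch of precedence climbing over a flat token list. The table is a subset of the Haskell 98 Prelude fixities; the function name, the tuple AST shape, and the assumption that tokens strictly alternate atom / operator are simplifications for this sketch and do not describe `lib/haskell/parser.sx`.

```python
# (precedence, associativity) for a subset of the Haskell 98 Prelude fixities.
OPS = {
    "$": (0, "r"), "==": (4, "n"), ":": (5, "r"), "++": (5, "r"),
    "+": (6, "l"), "-": (6, "l"), "*": (7, "l"), "/": (7, "l"),
    "^": (8, "r"), ".": (9, "r"),
}
DEFAULT_FIXITY = (9, "l")  # Haskell 98 default for operators without a fixity decl


def parse_expr(tokens, pos=0, min_prec=0):
    """Parse tokens[pos:] and return (ast, next_pos).

    Tokens are assumed to alternate atom, operator, atom, ...
    """
    left, pos = tokens[pos], pos + 1
    while pos < len(tokens):
        op = tokens[pos]
        prec, assoc = OPS.get(op, DEFAULT_FIXITY)
        if prec < min_prec:
            break
        # Right-assoc operators let the right operand reuse the same level;
        # left- and non-assoc operators require a strictly tighter one.
        next_min = prec if assoc == "r" else prec + 1
        right, pos = parse_expr(tokens, pos + 1, next_min)
        left = ("op", op, left, right)
        if assoc == "n" and pos < len(tokens):
            if OPS.get(tokens[pos], DEFAULT_FIXITY)[0] == prec:
                raise SyntaxError("non-associative operator chained: " + op)
    return left, pos
```

For example, `parse_expr(["a", ":", "b", ":", "c"])[0]` nests to the right, while `parse_expr(["a", "-", "b", "-", "c"])[0]` nests to the left, and `a == b == c` is rejected.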

Phase 2 — desugar + eager-ish eval + ADTs (untyped)

  • Desugar: guards → nested ifs; where → let; list comp → concatMap-based; do-notation stays for now (desugared in phase 3)
  • data declarations register constructors in runtime
  • Pattern match (tag-based, value-level): atoms, vars, wildcards, constructor patterns, as patterns, nested
  • Evaluator (still strict internally — laziness in phase 3): let, lambda, application, case, literals, constructors
  • 30+ eval tests in lib/haskell/tests/eval.sx
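
The desugaring rules in this phase are small tree rewrites. The sketch below shows them in Python over tuple-shaped ASTs; the node shapes (`("if", c, t, e)`, `("let", binds, body)`, `("gen", name, source)`, `("guard", cond)`) and function names are hypothetical and chosen only for illustration. The list-comprehension rule follows the standard concatMap translation.

```python
def desugar_guards(guards, fallthrough):
    """[(cond, body), ...] becomes nested ("if", cond, body, rest) nodes."""
    result = fallthrough
    for cond, body in reversed(guards):
        result = ("if", cond, body, result)
    return result


def desugar_clause(guards, where_binds, fallthrough):
    """Guards become nested ifs; a where block becomes an enclosing let."""
    body = desugar_guards(guards, fallthrough)
    return ("let", where_binds, body) if where_binds else body


def desugar_list_comp(body, quals):
    """[body | quals] rewritten with concatMap.

    quals is a list of ("gen", name, source) and ("guard", cond) entries.
    """
    if not quals:
        return ("list", [body])
    first, rest = quals[0], quals[1:]
    inner = desugar_list_comp(body, rest)
    if first[0] == "guard":
        # [e | cond, Q]  =  if cond then [e | Q] else []
        return ("if", first[1], inner, ("list", []))
    # [e | x <- xs, Q]  =  concatMap (\x -> [e | Q]) xs
    _, name, source = first
    return ("app", ("app", ("var", "concatMap"), ("lambda", [name], inner)), source)
```

The `fallthrough` argument stands for whatever the evaluator does when no guard succeeds, for example trying the next function clause or raising a pattern-match failure.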

Phase 3 — laziness + classic programs

  • Transpile to thunk-wrapped SX: every application arg becomes (make-thunk (lambda () <arg>))
  • force = SX eval-thunk-to-WHNF primitive
  • Pattern match forces scrutinee before matching
  • Infinite structures: repeat x, iterate f x, [1..], Fibonacci stream, sieve of Eratosthenes
  • seq, deepseq from Prelude
  • Do-notation for a stub IO monad (just threading, no real side effects yet)
  • Classic programs in lib/haskell/tests/programs/:
    • fib.hs — infinite Fibonacci stream
    • sieve.hs — lazy sieve of Eratosthenes
    • quicksort.hs — naive QS
    • nqueens.hs
    • calculator.hs — parser combinator style expression evaluator
  • lib/haskell/conformance.sh + runner; scoreboard.json + scoreboard.md
  • Target: 5/5 classic programs passing
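
To make the phase 3 target concrete, here is a rough Python model of how thunk-wrapped cons cells support the infinite Fibonacci stream and the lazy sieve. It is a sketch under stated assumptions: `delay` stands in for the SX thunk constructor, cells are `("Cons", head_thunk, tail_thunk)` tuples, and all helper names are invented for this example.

```python
def delay(compute):
    """Memoised thunk: compute() runs at most once; call the result to force it."""
    cell = []

    def force():
        if not cell:
            cell.append(compute())
        return cell[0]

    return force


def cons(head, tail):
    return ("Cons", head, tail)  # both fields are thunks


def take(n, stream):
    """Force only the first n cells and their heads."""
    out = []
    while n > 0:
        _, head, rest = stream()
        out.append(head())
        stream = rest
        n -= 1
    return out


def tail(xs):
    return delay(lambda: xs()[2]())


def iterate(f, x):
    return delay(lambda: cons(x, iterate(f, delay(lambda: f(x())))))


def zip_with(f, xs, ys):
    def step():
        _, x, xs_rest = xs()
        _, y, ys_rest = ys()
        return cons(delay(lambda: f(x(), y())), zip_with(f, xs_rest, ys_rest))
    return delay(step)


def filter_stream(pred, xs):
    def step():
        s = xs
        while True:  # skip rejected elements until one passes
            _, head, rest = s()
            if pred(head()):
                return cons(head, filter_stream(pred, rest))
            s = rest
    return delay(step)


def sieve(xs):
    def step():
        _, p, rest = xs()
        return cons(p, sieve(filter_stream(lambda x: x % p() != 0, rest)))
    return delay(step)


# fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
fibs = delay(lambda: cons(
    delay(lambda: 0),
    delay(lambda: cons(delay(lambda: 1),
                       zip_with(lambda a, b: a + b, fibs, tail(fibs))))))

# primes = sieve [2..]
primes = sieve(iterate(lambda n: n + 1, delay(lambda: 2)))
```

`take(10, fibs)` and `take(10, primes)` terminate because each cell is built only when `take` demands it; memoisation is what keeps the self-referential `fibs` from recomputing earlier elements.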

Phase 4 — Hindley-Milner inference

  • Algorithm W: unification + type schemes + generalisation + instantiation
  • Report type errors with meaningful positions
  • Reject untypeable programs that phase 3 was accepting
  • Type-sig checking: user writes f :: Int -> Int; verify
  • Let-polymorphism
  • Unit tests: inference for 50+ expressions
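
Unification is the core of Algorithm W, so a small sketch may help fix ideas. This Python version uses a hypothetical type representation (`("var", name)` and `("con", name, [args])`, with function types as `("con", "->", [arg, result])`) and a dictionary substitution; it covers unification with an occurs check only, not generalisation or instantiation.

```python
def resolve(t, subst):
    """Follow variable bindings until an unbound variable or a constructor."""
    while t[0] == "var" and t[1] in subst:
        t = subst[t[1]]
    return t


def occurs(name, t, subst):
    """True if type variable `name` appears anywhere inside t."""
    t = resolve(t, subst)
    if t[0] == "var":
        return t[1] == name
    return any(occurs(name, arg, subst) for arg in t[2])


def unify(a, b, subst):
    """Return a substitution extending subst that makes a and b equal."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a[0] == "var" and b[0] == "var" and a[1] == b[1]:
        return subst
    if a[0] == "var":
        if occurs(a[1], b, subst):
            raise TypeError("infinite type: " + a[1])
        return {**subst, a[1]: b}
    if b[0] == "var":
        return unify(b, a, subst)
    if a[1] != b[1] or len(a[2]) != len(b[2]):
        raise TypeError("cannot unify " + a[1] + " with " + b[1])
    for x, y in zip(a[2], b[2]):
        subst = unify(x, y, subst)
    return subst


def apply_subst(t, subst):
    """Replace every bound variable in t by its binding."""
    t = resolve(t, subst)
    if t[0] == "var":
        return t
    return ("con", t[1], [apply_subst(arg, subst) for arg in t[2]])
```

Unifying `a -> Int` with `Bool -> b` binds `a` to `Bool` and `b` to `Int`; unifying `a` with `a -> a` fails the occurs check, which is how infinite types are rejected.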

Phase 5 — typeclasses (dictionary passing)

  • class / instance declarations
  • Dictionary-passing elaborator: inserts dict args at call sites
  • Standard classes: Eq, Ord, Show, Num, Functor, Monad, Applicative
  • deriving (Eq, Show) for ADTs
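
The elaboration target for dictionary passing can be pictured as follows. This Python sketch is strict and ignores thunks for brevity; the dictionaries, the `"=="` / `"show"` keys, and the function names are illustrative assumptions rather than the planned runtime layout.

```python
# instance Eq Int / instance Show Int: one record value per instance.
eq_int = {"==": lambda x, y: x == y}
show_int = {"show": lambda n: str(n)}


# instance Eq a => Eq [a]: an instance with a context becomes a function
# from the context's dictionary to the new dictionary.
def eq_list(eq_elem):
    def equals(xs, ys):
        return len(xs) == len(ys) and all(
            eq_elem["=="](x, y) for x, y in zip(xs, ys))
    return {"==": equals}


# instance Show a => Show [a]
def show_list(show_elem):
    return {"show": lambda xs: "[" + ",".join(show_elem["show"](x) for x in xs) + "]"}


# elem :: Eq a => a -> [a] -> Bool
# After elaboration the constraint is an explicit first argument.
def elem(eq_dict, x, xs):
    return any(eq_dict["=="](x, y) for y in xs)
```

A source call such as `elem [1,2] [[0],[1,2]]` would be elaborated to `elem(eq_list(eq_int), [1, 2], [[0], [1, 2]])`: the elaborator builds the dictionary for `Eq [Int]` from the instance functions and inserts it at the call site.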

Phase 6 — real IO + Prelude completion

  • Real IO monad backed by perform/resume
  • putStrLn, getLine, readFile, writeFile, print
  • Full-ish Prelude: Maybe, Either, List functions, Map-lite
  • Drive scoreboard toward 150+ passing
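
One way to picture IO backed by perform/resume is a program that yields effect requests to a handler, which carries them out and resumes the program with the result. The Python sketch below uses generators as an analogy only; the source does not spell out the semantics of SX's perform/resume, and the effect names and `run_io` handler are hypothetical.

```python
def greet():
    """do { putStrLn "Name?"; name <- getLine; putStrLn ("Hello, " ++ name) }"""
    yield ("putStrLn", "Name?")            # perform an effect, ignore the reply
    name = yield ("getLine",)              # perform an effect, bind the reply
    yield ("putStrLn", "Hello, " + name)
    return len(name)


def run_io(program, stdin_lines):
    """Test handler: interprets each performed effect and resumes the program.

    Input is scripted and output is collected, so no real side effects occur.
    """
    pending = list(stdin_lines)
    output = []
    gen = program()
    reply = None
    while True:
        try:
            effect = gen.send(reply)       # resume the program with the last reply
        except StopIteration as done:
            return done.value, output      # the program's final result
        if effect[0] == "putStrLn":
            output.append(effect[1])
            reply = None
        elif effect[0] == "getLine":
            reply = pending.pop(0)
        else:
            raise ValueError("unknown effect: " + effect[0])
```

Swapping the handler is what separates the stub IO of phase 3 from real IO here: the same program runs against scripted input in tests and against actual console and file operations in production.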

Progress log

Newest first.

  • 2026-04-24 — Phase 1: case … of and do-notation parsers. Added hk-parse-case / hk-parse-alt, hk-parse-do / hk-parse-do-stmt / hk-parse-do-let, plus the minimal pattern language needed to make arms and binds meaningful: hk-parse-apat (var, wildcard _, int/float/string/char literal, 0-arity conid/qconid, paren+tuple, list) and hk-parse-pat (conid applied to apats greedily). AST nodes: :case SCRUT ALTS, :alt PAT BODY, :do STMTS with stmts :do-expr E / :do-bind PAT E / :do-let BINDS, and pattern tags :p-wild / :p-int / :p-float / :p-string / :p-char / :p-var / :p-con NAME ARGS / :p-tuple / :p-list. do-stmts disambiguate pat <- e vs bare expression with a forward paren/bracket/brace-balanced scan for <- before the next ;/} — no backtracking, no AST rewrite. case and do accept both implicit (vlbrace/vsemi/vrbrace) and explicit braces. Added to hk-parse-lexp so they participate fully in operator-precedence expressions. 19 new tests in lib/haskell/tests/parser-case-do.sx cover every pattern variant, explicit-brace case, expression scrutinees, do with bind/let/expr, multi-binding let in do, constructor patterns in binds, and case/do nested inside let and lambda. The full pattern item (as patterns, negative literals, ~ lazy, lambda/let pattern extension) remains a separate sub-item. 119/119 green.

  • 2026-04-24 — Phase 1: expression parser (lib/haskell/parser.sx, ~380 lines). Pratt-style precedence climbing against a Haskell-98-default op table (24 operators across precedence 0-9, left/right/non assoc, default infixl 9 for anything unlisted). Supports literals (int/float/string/char), varid/conid (qualified variants folded into :var / :con), parens / unit / tuples, list literals, ranges [a..b] and [a,b..c], left-associative application, unary -, backtick operators (x `mod` 3), lambdas, if-then-else, and let … in consuming both virtual and explicit braces. AST uses keyword tags (:var, :op, :lambda, :let, :bind, :tuple, :range, :range-step, :app, :neg, :if, :list, :int, :float, :string, :char, :con). The parser skips a leading vlbrace / lbrace so it can be called on full post-layout output, and uses a raise-based error channel with location-lite messages. 42 new tests in lib/haskell/tests/parser-expr.sx cover literals, identifiers, parens/tuple/unit, list + range, app associativity, operator precedence (mul over add, cons right-assoc, function-composition right-assoc, $ lowest), backtick ops, unary -, lambda multi-param, if with infix condition, single- and multi-binding let (both implicit and explicit braces), plus a few mixed nestings. 100/100 green.

  • 2026-04-24 — Phase 1: layout algorithm (lib/haskell/layout.sx, ~260 lines) implementing Haskell 98 §10.3. Two-pass design: a pre-pass augments the raw token stream with explicit layout-open / layout-indent markers (suppressing <n> when {n} already applies, per note 3), then an L pass consumes the augmented stream against a stack of implicit/explicit layout contexts and emits vlbrace / vsemi / vrbrace tokens; newlines are dropped. Supports the initial module-level implicit open (skipped when the first token is module or {), the four layout keywords (let/where/do/of), explicit braces disabling layout, dedent closing nested implicit blocks while also emitting vsemi at the enclosing level, and the pragmatic single-line let … in rule (emit } when in meets an implicit let). 15 new tests in lib/haskell/tests/layout.sx cover module-start, do/let/where/case/of, explicit braces, multi-level dedent, line continuation, and EOF close-down. Shared test helpers moved to lib/haskell/testlib.sx so both test files can share one hk-test. test.sh preloads tokenizer + layout + testlib. 58/58 green.

  • 2026-04-24 — Phase 1: Haskell 98 tokenizer (lib/haskell/tokenizer.sx, 490 lines) covering idents (lower/upper/qvarid/qconid), 23 reserved words, 11 reserved ops, varsym/consym operator chains, integer/hex/octal/float literals incl. exponent notation, char + string literals with escape sequences, nested {- ... -} block comments with depth counter, -- ... EOL line comments (respecting the "followed by symbol = not a comment" Haskell 98 rule), backticks, punctuation, and explicit newline tokens for the upcoming layout pass. 43 structural tests in lib/haskell/tests/parse.sx, a lightweight hk-deep=? equality helper and a custom lib/haskell/test.sh runner (pipes through the OCaml epoch protocol, falls back to the main-repo build when run from a worktree). 43/43 green.

    Also peeked at /root/rose-ash/sx-haskell/ per briefing: that directory is a Haskell program implementing an SX interpreter (Types.hs, Eval.hs, Primitives.hs, etc. — ~2800 lines of .hs) — the opposite direction from this project. Nothing to fold in.

    Gotchas hit: emit! and peek are SX evaluator special forms, so every local helper uses the hk- prefix. cond/when/let clauses evaluate ONLY the last expression; multi-expression bodies MUST be wrapped in (do ...). These two together account for all the tokenizer's early crashes.
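
The do-statement disambiguation described in the case/do log entry (a forward, bracket-balanced scan for `<-` before the next `;` or `}`) can be sketched as below. This is a Python illustration over plain string tokens, not the code in `lib/haskell/parser.sx`; the function name is hypothetical, and virtual and explicit braces are both represented as `{` / `}` for simplicity.

```python
OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def is_bind_stmt(tokens, start):
    """True if the statement starting at tokens[start] has the form `pat <- e`.

    Scans forward without consuming anything: an arrow nested inside
    parens, brackets, or braces belongs to an inner construct and is ignored.
    """
    depth = 0
    for tok in tokens[start:]:
        if tok in OPENERS:
            depth += 1
        elif tok in CLOSERS:
            if depth == 0:
                return False  # reached the brace closing the enclosing do block
            depth -= 1
        elif depth == 0:
            if tok == "<-":
                return True
            if tok == ";":
                return False  # statement ended without a top-level arrow
    return False
```

Since the scan only decides which parse function to call, the parser never has to backtrack or rewrite an expression node into a pattern after the fact.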

Blockers

  • (none yet)