# Haskell-on-SX: mini-Haskell with real laziness

Mini-Haskell is the research-paper-worthy demo. Laziness is native to the SX runtime (thunks are already a first-class type); algebraic data types map onto tagged lists; typeclasses map onto dictionary passing; IO maps onto `perform`/`resume`. Hindley-Milner inference is the one real piece of new work.

End-state goal: a **Haskell 98 subset** that runs the small classic programs (sieve of Eratosthenes lazy stream, fibonacci as infinite list, naive quicksort, n-queens, expression evaluator) plus a ~150-test corpus.

## Scope decisions (defaults — override)

- **Standard:** Haskell 98 subset. No GHC extensions (no `DataKinds`, no `GADTs`, no `TypeFamilies`, no `TemplateHaskell`).
- **Phases 1-3 are untyped** — we get the evaluator right first with laziness + ADTs, then add HM inference in phase 4. This is deliberate: typing is the hard bit and will take a full phase on its own.
- **Typeclasses:** dictionary passing, no overlap, no orphan instances. Added in phase 5.
- **Layout rule:** yes — phase 1 implements Haskell's indentation-sensitive parsing (painful but required).
- **Test corpus:** custom. No GHC test suite. Bundle classic programs + ~100 hand-written expression-level tests + mini Prelude tests.

## Ground rules

- **Scope:** only `lib/haskell/**` and `plans/haskell-on-sx.md`. No edits to `spec/`, `hosts/`, `shared/`, or other language dirs.
- **SX files:** `sx-tree` MCP tools only.
- **Architecture:** Haskell source → AST → desugared-core → SX AST → CEK. Thunks on the SX side provide laziness natively.
- **Commits:** one feature per commit. Keep `## Progress log` updated.

## Architecture sketch

```
Haskell source
    │
    ▼
lib/haskell/tokenizer.sx — idents, operators, layout-sensitive indentation
    │
    ▼
lib/haskell/parser.sx — AST: modules, data decls, type sigs, fn clauses, expressions
    │
    ▼
lib/haskell/desugar.sx — surface → core: case-of-case, do-notation, list comp, guards
    │
    ▼
lib/haskell/transpile.sx — core → SX AST, wrapping everything in thunks for laziness
    │
    ▼
lib/haskell/runtime.sx — force, ADT constructors, Prelude, typeclass dicts (phase 5+)
    │
    ▼
existing CEK / VM
```

Key mappings:

- **Laziness** = every function argument is an SX thunk; `force` is WHNF reduction. SX already has `make-thunk` from the trampolining evaluator — we reuse it.
- **Pattern match** = forces the scrutinee to WHNF, then structural match on the tag.
- **ADT** = `data Maybe a = Nothing | Just a` compiles to tagged lists: `(:Nothing)` and `(:Just <thunk>)`
- **Typeclass** = each class becomes a record type; each instance becomes a record value; each method becomes a projection; the elaborator inserts the dict at each call site (phase 5)
- **IO** = `IO a` is a function `World -> (a, World)` internally; in practice it uses `perform`/`resume` for actual side effects
- **Layout** = offside rule; virtual braces + semis are inserted during a lexer-parser feedback pass

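The laziness, pattern-match and ADT mappings can be modelled in a few lines of Haskell. This is only a sketch of the idea, not the SX runtime: `Value`, `Thunk`, `delay` and `isJust` are hypothetical names, and the memoising behaviour of `force` is an assumption about how an SX thunk behaves once evaluated.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical value model: a constructor tag plus lazily evaluated fields,
-- mirroring the (:Just <thunk>) tagged-list shape.
data Value
  = VInt Integer
  | VCon String [Thunk]

-- A thunk is either still delayed or already forced (assumed: memoised).
data ThunkState = Delayed (IO Value) | Forced Value

newtype Thunk = Thunk (IORef ThunkState)

-- Wrap a computation without running it.
delay :: IO Value -> IO Thunk
delay action = Thunk <$> newIORef (Delayed action)

-- Run the delayed computation at most once and cache the result.
-- Fields of a VCon stay unevaluated, so the result is only in WHNF.
force :: Thunk -> IO Value
force (Thunk ref) = do
  state <- readIORef ref
  case state of
    Forced v -> pure v
    Delayed action -> do
      v <- action
      writeIORef ref (Forced v)
      pure v

-- Pattern match: force the scrutinee, then dispatch on the tag only.
isJust :: Thunk -> IO Bool
isJust scrutinee = do
  v <- force scrutinee
  case v of
    VCon "Just" [_] -> pure True
    _               -> pure False
```

Because `isJust` inspects only the tag, a `Just` whose payload thunk would raise an error can still be matched safely; the payload is never forced.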
## Roadmap

### Phase 1 — tokenizer + parser + layout rule
- [x] Tokenizer: reserved words, qualified names, operators, numbers (int, float, Rational later), chars/strings, comments (`--` and `{-` nested)
- [x] Layout algorithm: turn indentation into virtual `{`, `;`, `}` tokens per Haskell 98 §9.3 (§10.3 in the Haskell 2010 Report)
- [ ] Parser: modules, imports (stub), top-level decls, type sigs, function clauses with patterns + guards + where-clauses, expressions with operator precedence, lambdas, `let`, `if`, `case`, `do`, list comp, sections
- [ ] AST design modelled on GHC's HsSyn at a surface level
- [x] Unit tests in `lib/haskell/tests/parse.sx` (43 tokenizer tests, all green)

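As a rough illustration of the layout step, the function below follows the shape of the `L` function in the Haskell Report's layout section. It is a simplified sketch, not the code in `lib/haskell/layout.sx`: the `Tok` type and the name `layout` are hypothetical, virtual and explicit braces are both emitted as plain `{`/`}` strings, and the Report's parse-error rule is replaced by a simpler rule that closes the innermost implicit block whenever `in` is seen.

```haskell
-- Hypothetical token type for the augmented stream.
data Tok
  = Open Int     -- {n}: a layout block opens at column n
  | Indent Int   -- <n>: first token of a line sits at column n
  | Lex String   -- any ordinary lexeme
  deriving (Eq, Show)

-- layout tokens contextStack: 0 on the stack marks an explicit-brace block.
layout :: [Tok] -> [Int] -> [String]
layout (Indent n : ts) (m : ms)
  | m == n = ";" : layout ts (m : ms)          -- same column: new item
  | n < m  = "}" : layout (Indent n : ts) ms   -- dedent: close the block
layout (Indent _ : ts) ms = layout ts ms       -- deeper indent: continuation
layout (Open n : ts) (m : ms)
  | n > m = "{" : layout ts (n : m : ms)
layout (Open n : ts) []
  | n > 0 = "{" : layout ts [n]
layout (Open n : ts) ms = "{" : "}" : layout (Indent n : ts) ms  -- empty block
layout (Lex "}" : ts) (0 : ms) = "}" : layout ts ms
layout (Lex "}" : _) _ = error "layout: unmatched explicit close brace"
layout (Lex "{" : ts) ms = "{" : layout ts (0 : ms)
layout (Lex "in" : ts) (m : ms)
  | m /= 0 = "}" : "in" : layout ts ms         -- simplified let/in rule
layout (Lex t : ts) ms = t : layout ts ms
layout [] [] = []
layout [] (m : ms)
  | m /= 0 = "}" : layout [] ms                -- close implicit blocks at EOF
layout [] _ = error "layout: explicit brace left open at end of input"
```

For the three-line program `f = x where` / `  x = 1` / `  y = 2`, the augmented stream is `Open 1, f, =, x, where, Open 3, x, =, 1, Indent 3, y, =, 2`, and `unwords (layout stream [])` gives `{ f = x where { x = 1 ; y = 2 } }`.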
### Phase 2 — desugar + eager-ish eval + ADTs (untyped)
- [ ] Desugar: guards → nested `if`s; `where` → `let`; list comp → `concatMap`-based; do-notation stays for now (desugared in phase 3)
- [ ] `data` declarations register constructors in runtime
- [ ] Pattern match (tag-based, value-level): atoms, vars, wildcards, constructor patterns, `as` patterns, nested
- [ ] Evaluator (still strict internally — laziness in phase 3): `let`, `lambda`, application, `case`, literals, constructors
- [ ] 30+ eval tests in `lib/haskell/tests/eval.sx`

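To make the desugaring targets concrete, here is each surface form next to a hand-written core equivalent, both as ordinary Haskell. The function names are hypothetical, and the core versions are one plausible output shape rather than what `desugar.sx` will necessarily emit. The guard translation assumes a final `otherwise` branch; guards that can all fail need a fall-through to the next clause, which this sketch does not show.

```haskell
-- Surface form: guards plus a where-clause.
classify :: Int -> String
classify n
  | n < lo    = "low"
  | n > hi    = "high"
  | otherwise = "ok"
  where
    lo = 0
    hi = 10

-- Core form: where becomes let, guards become nested ifs.
classifyCore :: Int -> String
classifyCore n =
  let lo = 0
      hi = 10
  in if n < lo then "low"
     else if n > hi then "high"
     else "ok"

-- Surface form: list comprehension with two generators and a guard.
pairs :: [Int] -> [(Int, Int)]
pairs xs = [(x, y) | x <- xs, even x, y <- [x, x + 1]]

-- Core form: one concatMap per generator, one if per guard.
pairsCore :: [Int] -> [(Int, Int)]
pairsCore xs =
  concatMap
    (\x -> if even x
             then concatMap (\y -> [(x, y)]) [x, x + 1]
             else [])
    xs
```

Pairs like these double as evaluator tests: the surface and core versions must agree on every input.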
### Phase 3 — laziness + classic programs
- [ ] Transpile to thunk-wrapped SX: every application arg becomes `(make-thunk (lambda () <arg>))`
- [ ] `force` = SX eval-thunk-to-WHNF primitive
- [ ] Pattern match forces scrutinee before matching
- [ ] Infinite structures: `repeat x`, `iterate f x`, `[1..]`, Fibonacci stream, sieve of Eratosthenes
- [ ] `seq` and `deepseq` in the mini Prelude (in standard Haskell, `deepseq` lives in `Control.DeepSeq`, not the Prelude)
- [ ] Do-notation for a stub `IO` monad (just threading, no real side effects yet)
- [ ] Classic programs in `lib/haskell/tests/programs/`:
  - [ ] `fib.hs` — infinite Fibonacci stream
  - [ ] `sieve.hs` — lazy sieve of Eratosthenes
  - [ ] `quicksort.hs` — naive QS
  - [ ] `nqueens.hs`
  - [ ] `calculator.hs` — parser combinator style expression evaluator
- [ ] `lib/haskell/conformance.sh` + runner; `scoreboard.json` + `scoreboard.md`
- [ ] Target: 5/5 classic programs passing

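For reference, three of the classic programs fit in a handful of lines of Haskell 98. These are the textbook formulations, shown as a guess at what `fib.hs`, `sieve.hs` and `quicksort.hs` might contain; the actual test files may differ. `quicksort` is fixed to `Int` so that it does not depend on the `Ord` class, which only arrives in phase 5.

```haskell
-- Infinite Fibonacci stream: the list is defined in terms of itself,
-- so it only works if the tail is evaluated lazily.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- Lazy "sieve" over the infinite list [2 ..]; each prime filters the rest.
primes :: [Int]
primes = sieve [2 ..]
  where
    sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
    sieve []       = []

-- Naive quicksort: first element as pivot, two list comprehensions.
quicksort :: [Int] -> [Int]
quicksort []       = []
quicksort (p : xs) =
  quicksort [x | x <- xs, x < p] ++ [p] ++ quicksort [x | x <- xs, x >= p]
```

Under a correct lazy evaluator, `take 10 fibs` yields `[0,1,1,2,3,5,8,13,21,34]` and `take 10 primes` yields `[2,3,5,7,11,13,17,19,23,29]`; a strict evaluator would loop forever on both.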
### Phase 4 — Hindley-Milner inference
- [ ] Algorithm W: unification + type schemes + generalisation + instantiation
- [ ] Report type errors with meaningful positions
- [ ] Reject untypeable programs that phase 3 was accepting
- [ ] Type-sig checking: user writes `f :: Int -> Int`; verify
- [ ] Let-polymorphism
- [ ] Unit tests: inference for 50+ expressions

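Unification is the core of Algorithm W, and a minimal version is small enough to sketch. The code below covers only first-order unification with an occurs check over a three-constructor type language. The `Type` and `Subst` representations are assumptions made for illustration, and type schemes, generalisation, instantiation and source positions are all left out.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical type language: variables, constants, function arrows.
data Type = TVar String | TCon String | TFun Type Type
  deriving (Eq, Show)

-- A substitution maps type variables to types (association list).
type Subst = [(String, Type)]

apply :: Subst -> Type -> Type
apply s (TVar v)   = fromMaybe (TVar v) (lookup v s)
apply _ (TCon c)   = TCon c
apply s (TFun a r) = TFun (apply s a) (apply s r)

-- compose s2 s1 behaves like applying s1 first, then s2.
compose :: Subst -> Subst -> Subst
compose s2 s1 = [(v, apply s2 t) | (v, t) <- s1] ++ s2

occurs :: String -> Type -> Bool
occurs v (TVar w)   = v == w
occurs _ (TCon _)   = False
occurs v (TFun a r) = occurs v a || occurs v r

-- Bind a variable, rejecting infinite types such as a ~ a -> Int.
bind :: String -> Type -> Either String Subst
bind v t
  | t == TVar v = Right []
  | occurs v t  = Left ("occurs check: " ++ v ++ " in " ++ show t)
  | otherwise   = Right [(v, t)]

unify :: Type -> Type -> Either String Subst
unify (TVar v) t = bind v t
unify t (TVar v) = bind v t
unify (TCon a) (TCon b)
  | a == b = Right []
unify (TFun a1 r1) (TFun a2 r2) = do
  s1 <- unify a1 a2
  s2 <- unify (apply s1 r1) (apply s1 r2)
  pure (compose s2 s1)
unify t1 t2 = Left ("cannot unify " ++ show t1 ++ " with " ++ show t2)
```

Unifying `a -> a` with `Int -> b` succeeds and maps both `a` and `b` to `Int`, while unifying `a` with `a -> Int` fails the occurs check. Rejections of this kind are what turn programs accepted by the untyped phases into type errors.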
### Phase 5 — typeclasses (dictionary passing)
- [ ] `class` / `instance` declarations
- [ ] Dictionary-passing elaborator: inserts dict args at call sites
- [ ] Standard classes: `Eq`, `Ord`, `Show`, `Num`, `Functor`, `Monad`, `Applicative`
- [ ] `deriving (Eq, Show)` for ADTs

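The dictionary-passing translation can be written out by hand in Haskell itself. The sketch below shows what an elaborator might produce for a cut-down `Eq` class; `EqDict`, `eqInt`, `eqList` and `elemBy` are hypothetical names, and the real elaborator output will be SX rather than Haskell.

```haskell
-- class Eq a where eq :: a -> a -> Bool
-- The class becomes a record type; the method becomes a field projection.
data EqDict a = EqDict { eq :: a -> a -> Bool }

-- instance Eq Int: an instance becomes a record value.
-- (==) here is the host's primitive Int equality.
eqInt :: EqDict Int
eqInt = EqDict (==)

-- instance Eq a => Eq [a]: a constrained instance becomes a function
-- from the dictionary it needs to the dictionary it provides.
eqList :: EqDict a -> EqDict [a]
eqList d = EqDict go
  where
    go []       []       = True
    go (x : xs) (y : ys) = eq d x y && go xs ys
    go _        _        = False

-- elem :: Eq a => a -> [a] -> Bool
-- The constraint becomes an explicit leading dictionary argument.
elemBy :: EqDict a -> a -> [a] -> Bool
elemBy d x = any (eq d x)
```

A call such as `elem [1, 2] [[0], [1, 2]]` would then elaborate to `elemBy (eqList eqInt) [1, 2] [[0], [1, 2]]`, with the dictionary built from the inferred type `[Int]`. This is why the elaborator depends on phase 4's inference.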
### Phase 6 — real IO + Prelude completion
- [ ] Real `IO` monad backed by `perform`/`resume`
- [ ] `putStrLn`, `getLine`, `readFile`, `writeFile`, `print`
- [ ] Full-ish Prelude: `Maybe`, `Else`, `List` functions, `Map`-lite
- [ ] Drive scoreboard toward 150+ passing

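The `World -> (a, World)` reading of `IO` can be sketched as a pure state-passing monad. Everything here is illustrative: `World` is reduced to a queue of input lines plus an output log, `MiniIO`, `putStrLnM` and `getLineM` are hypothetical names, and the `perform`/`resume` backing used for real effects is not modelled. The `Functor` and `Applicative` instances are there because current GHC requires them for any `Monad` instance.

```haskell
-- Hypothetical world: unread input lines and the lines printed so far.
data World = World { pendingInput :: [String], output :: [String] }

newtype MiniIO a = MiniIO { runMiniIO :: World -> (a, World) }

instance Functor MiniIO where
  fmap f (MiniIO g) = MiniIO $ \w -> let (a, w') = g w in (f a, w')

instance Applicative MiniIO where
  pure a = MiniIO $ \w -> (a, w)
  MiniIO f <*> MiniIO g = MiniIO $ \w ->
    let (h, w')  = f w
        (a, w'') = g w'
    in (h a, w'')

instance Monad MiniIO where
  -- Thread the world from the first action into the second.
  MiniIO g >>= k = MiniIO $ \w ->
    let (a, w') = g w in runMiniIO (k a) w'

putStrLnM :: String -> MiniIO ()
putStrLnM s = MiniIO $ \w -> ((), w { output = output w ++ [s] })

-- Returns the empty string when no input is left (a simplification).
getLineM :: MiniIO String
getLineM = MiniIO $ \w -> case pendingInput w of
  []       -> ("", w)
  (l : ls) -> (l, w { pendingInput = ls })

-- Do-notation desugars to the >>= above, so effects stay ordered.
greet :: MiniIO ()
greet = do
  name <- getLineM
  putStrLnM ("hello, " ++ name)
```

Running `greet` against `World ["sx"] []` leaves `["hello, sx"]` in the output log. The stub `IO` of phase 3 needs only this threading; phase 6 would replace the pure world with real effects.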
## Progress log

_Newest first._

- **2026-04-24** — Phase 1: layout algorithm (`lib/haskell/layout.sx`, ~260 lines)
  implementing Haskell 98 §9.3 (§10.3 in the Haskell 2010 Report). Two-pass
  design: a pre-pass augments the raw
  token stream with explicit `layout-open` / `layout-indent` markers (suppressing
  `<n>` when `{n}` already applies, per note 3), then an L pass consumes the
  augmented stream against a stack of implicit/explicit layout contexts and
  emits `vlbrace` / `vsemi` / `vrbrace` tokens; newlines are dropped. Supports
  the initial module-level implicit open (skipped when the first token is
  `module` or `{`), the four layout keywords (`let`/`where`/`do`/`of`), explicit
  braces disabling layout, dedent closing nested implicit blocks while also
  emitting `vsemi` at the enclosing level, and the pragmatic single-line
  `let … in` rule (emit `}` when `in` meets an implicit let). 15 new tests
  in `lib/haskell/tests/layout.sx` cover module-start, do/let/where/case/of,
  explicit braces, multi-level dedent, line continuation, and EOF close-down.
  Shared test helpers moved to `lib/haskell/testlib.sx` so both test files
  can share one `hk-test`. `test.sh` preloads tokenizer + layout + testlib.
  58/58 green.

- **2026-04-24** — Phase 1: Haskell 98 tokenizer (`lib/haskell/tokenizer.sx`, 490 lines)
  covering idents (lower/upper/qvarid/qconid), 23 reserved words, 11 reserved ops,
  varsym/consym operator chains, integer/hex/octal/float literals incl. exponent
  notation, char + string literals with escape sequences, nested `{- ... -}` block
  comments with depth counter, `-- ... EOL` line comments (respecting the
  "followed by symbol = not a comment" Haskell 98 rule), backticks, punctuation,
  and explicit `newline` tokens for the upcoming layout pass. 43 structural tests
  in `lib/haskell/tests/parse.sx`, a lightweight `hk-deep=?` equality helper
  and a custom `lib/haskell/test.sh` runner (pipes through the OCaml epoch
  protocol, falls back to the main-repo build when run from a worktree). 43/43
  green.

  Also peeked at `/root/rose-ash/sx-haskell/` per briefing: that directory is a
  Haskell program implementing an **SX interpreter** (Types.hs, Eval.hs,
  Primitives.hs, etc. — ~2800 lines of .hs) — the *opposite* direction from this
  project. Nothing to fold in.

  Gotchas hit: `emit!` and `peek` are SX evaluator special forms, so every local
  helper uses the `hk-` prefix. `cond`/`when`/`let` clauses evaluate ONLY the
  last expression; multi-expression bodies MUST be wrapped in `(do ...)`. These
  two together account for all the tokenizer's early crashes.

## Blockers

- _(none yet)_