Files
rose-ash/plans/haskell-on-sx.md
giles 9facbb4836
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Has been cancelled
plans: tick quicksort.hs, progress log 2026-04-25
2026-04-25 18:06:58 +00:00

547 lines
34 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Haskell-on-SX: mini-Haskell with real laziness
Mini-Haskell is the research-paper-worthy demo. Laziness is native to the SX runtime (thunks are already a first-class type); algebraic data types map onto tagged lists; typeclasses map onto dictionary passing; IO maps onto `perform`/`resume`. Hindley-Milner inference is the one real piece of new work.
End-state goal: a **Haskell 98 subset** that runs the small classic programs (sieve of Eratosthenes lazy stream, fibonacci as infinite list, naive quicksort, n-queens, expression evaluator) plus a ~150-test corpus.
## Scope decisions (defaults — override)
- **Standard:** Haskell 98 subset. No GHC extensions (no `DataKinds`, no `GADTs`, no `TypeFamilies`, no `TemplateHaskell`).
- **Phase 1-3 are untyped** — we get the evaluator right first with laziness + ADTs, then add HM inference in phase 4. This is deliberate: typing is the hard bit and will take a full phase on its own.
- **Typeclasses:** dictionary passing, no overlap, no orphan instances. Added in phase 5.
- **Layout rule:** yes — phase 1 implements Haskell's indentation-sensitive parsing (painful but required).
- **Test corpus:** custom. No GHC test suite. Bundle classic programs + ~100 hand-written expression-level tests + mini Prelude tests.
## Ground rules
- **Scope:** only `lib/haskell/**` and `plans/haskell-on-sx.md`. No edits to `spec/`, `hosts/`, `shared/`, or other language dirs.
- **SX files:** `sx-tree` MCP tools only.
- **Architecture:** Haskell source → AST → desugared-core → SX AST → CEK. Thunks on the SX side provide laziness natively.
- **Commits:** one feature per commit. Keep `## Progress log` updated.
## Architecture sketch
```
Haskell source
lib/haskell/tokenizer.sx — idents, operators, layout-sensitive indentation
lib/haskell/parser.sx — AST: modules, data decls, type sigs, fn clauses, expressions
lib/haskell/desugar.sx — surface → core: case-of-case, do-notation, list comp, guards
lib/haskell/transpile.sx — core → SX AST, wrapping everything in thunks for laziness
lib/haskell/runtime.sx — force, ADT constructors, Prelude, typeclass dicts (phase 5+)
existing CEK / VM
```
Key mappings:
- **Laziness** = every function argument is an SX thunk; `force` is WHNF reduction. SX already has `make-thunk` from the trampolining evaluator — we reuse it.
- **Pattern match** = forces the scrutinee to WHNF, then structural match on the tag
- **ADT** = `data Maybe a = Nothing | Just a` compiles to tagged lists: `(:Nothing)` and `(:Just <thunk>)`
- **Typeclass** = each class becomes a record type; each instance becomes a record value; each method becomes a projection; the elaborator inserts the dict at each call site (phase 5)
- **IO** = `IO a` is a function `World -> (a, World)` internally; in practice uses `perform`/`resume` for actual side effects
- **Layout** = offside rule; inserted virtual braces + semis during a lexer-parser feedback pass
## Roadmap
### Phase 1 — tokenizer + parser + layout rule
- [x] Tokenizer: reserved words, qualified names, operators, numbers (int, float, Rational later), chars/strings, comments (`--` and `{-` nested)
- [x] Layout algorithm: turn indentation into virtual `{`, `;`, `}` tokens per Haskell 98 §10.3
- Parser (split into sub-items — implement one per iteration):
- [x] Expressions: atoms, parens, tuples, lists, ranges, application, infix with full Haskell-98 precedence table, unary `-`, backtick operators, lambdas, `if`, `let`
- [x] `case … of` and `do`-notation expressions (plus minimal patterns needed for arms/binds: var, wildcard, literal, 0-arity and applied constructor, tuple, list)
- [x] Patterns — full: `as` patterns, nested, negative literal, `~` lazy, infix constructor (`:` / consym), extend lambdas/let with non-var patterns
- [x] Top-level decls: function clauses (simple — no guards/where yet), pattern bindings, multi-name type signatures, `data` with type vars and recursive constructors, `type` synonyms, `newtype`, fixity (`infix`/`infixl`/`infixr` with optional precedence, comma-separated ops, backtick names). Types: vars / constructors / application / `->` (right-assoc) / tuples / lists. `hk-parse-top` entry.
- [x] `where` clauses + guards (on fun-clauses, case alts, and let/do-let bindings — with the let funclause shorthand `let f x = …` now supported)
- [x] Module header + imports — `module NAME [exports] where …`, qualified/as/hiding/explicit imports, operator exports, `module Foo` exports, dotted names, headerless-with-imports
- [x] List comprehensions + operator sections — `(op)` / `(op e)` / `(e op)` (excluding `-` from right sections), `[e | q1, q2, …]` with `q-gen` / `q-guard` / `q-let` qualifiers
- [x] AST design modelled on GHC's HsSyn at a surface level — keyword-tagged lists cover modules/imports/decls/types/patterns/expressions; see parser.sx docstrings for the full node catalogue
- [x] Unit tests in `lib/haskell/tests/parse.sx` (43 tokenizer tests, all green)
### Phase 2 — desugar + eager-ish eval + ADTs (untyped)
- [x] Desugar: guards → nested `if`s; `where``let`; list comp → `concatMap`-based; do-notation stays for now (desugared in phase 3)
- [x] `data` declarations register constructors in runtime
- [x] Pattern match (tag-based, value-level): atoms, vars, wildcards, constructor patterns, `as` patterns, nested
- [x] Evaluator (still strict internally — laziness in phase 3): `let`, `lambda`, application, `case`, literals, constructors
- [x] 30+ eval tests in `lib/haskell/tests/eval.sx`
### Phase 3 — laziness + classic programs
- [x] Transpile to thunk-wrapped SX: every application arg becomes `(make-thunk (lambda () <arg>))`
- [x] `force` = SX eval-thunk-to-WHNF primitive
- [x] Pattern match forces scrutinee before matching
- [x] Infinite structures: `repeat x`, `iterate f x`, `[1..]`, Fibonacci stream (sieve deferred — needs lazy `++` and is exercised under `Classic programs`)
- [x] `seq`, `deepseq` from Prelude
- [x] Do-notation for a stub `IO` monad (just threading, no real side effects yet)
- [ ] Classic programs in `lib/haskell/tests/programs/`:
- [x] `fib.hs` — infinite Fibonacci stream
- [x] `sieve.hs` — lazy sieve of Eratosthenes
- [x] `quicksort.hs` — naive QS
- [ ] `nqueens.hs`
- [ ] `calculator.hs` — parser combinator style expression evaluator
- [ ] `lib/haskell/conformance.sh` + runner; `scoreboard.json` + `scoreboard.md`
- [ ] Target: 5/5 classic programs passing
### Phase 4 — Hindley-Milner inference
- [ ] Algorithm W: unification + type schemes + generalisation + instantiation
- [ ] Report type errors with meaningful positions
- [ ] Reject untypeable programs that phase 3 was accepting
- [ ] Type-sig checking: user writes `f :: Int -> Int`; verify
- [ ] Let-polymorphism
- [ ] Unit tests: inference for 50+ expressions
### Phase 5 — typeclasses (dictionary passing)
- [ ] `class` / `instance` declarations
- [ ] Dictionary-passing elaborator: inserts dict args at call sites
- [ ] Standard classes: `Eq`, `Ord`, `Show`, `Num`, `Functor`, `Monad`, `Applicative`
- [ ] `deriving (Eq, Show)` for ADTs
### Phase 6 — real IO + Prelude completion
- [ ] Real `IO` monad backed by `perform`/`resume`
- [ ] `putStrLn`, `getLine`, `readFile`, `writeFile`, `print`
- [ ] Full-ish Prelude: `Maybe`, `Either`, `List` functions, `Map`-lite
- [ ] Drive scoreboard toward 150+ passing
## Progress log
_Newest first._
- **2026-04-25** — Classic program `quicksort.hs`: naive functional quicksort.
`qsort (x:xs) = qsort smaller ++ [x] ++ qsort larger where smaller = filter (< x) xs; larger = filter (>= x) xs`.
No new runtime additions needed — right sections, `filter`, `++` all worked out of the box.
5 tests (general sort, empty, singleton, already-sorted, reverse-sorted). 395/395 green.
- **2026-04-25** — Classic program `sieve.hs`: lazy sieve of Eratosthenes.
Added `mod`, `div`, `rem`, `quot` to `hk-binop` (and as first-class
values in `hk-init-env`), enabling backtick operator use. The filter-based
sieve `sieve (p:xs) = p : sieve (filter (\x -> x \`mod\` p /= 0) xs)` works
with the existing lazy cons + Prelude `filter`. 2 new tests in
`lib/haskell/tests/program-sieve.sx` (first 10 primes, 20th prime = 71).
390/390 green.
- **2026-04-25** — First classic program: `fib.hs`. Canonical Haskell
source lives at `lib/haskell/tests/programs/fib.hs` (the
two-cons-cell self-referential fibs definition plus a hand-rolled
`zipPlus`). The runner at `lib/haskell/tests/program-fib.sx`
mirrors the source as an SX string (the OCaml server's
`read-file` lives in the page-helpers env, not the default load
env, so direct file reads from inside `eval` aren't available).
Tests: `take 15 myFibs == [0,1,1,2,3,5,8,13,21,34,55,89,144,233,377]`,
plus a spot-check that the user-defined `zipPlus` is also
reachable. Found and fixed an ordering bug in `hk-bind-decls!`:
pass 3 (0-arity body evaluation) iterated `(keys groups)` whose
order is implementation-defined, so a top-down program where
`result = take 15 myFibs` came after `myFibs = …` could see
`myFibs` still bound to its `nil` placeholder. Now group names
are tracked in source order via a parallel list and pass 3 walks
that. 388/388 green.
- **2026-04-25** — Phase 3 do-notation + stub IO monad. Added a
`hk-desugar-do` pass that follows Haskell 98 §3.14 verbatim:
`do { e } = e`, `do { e ; ss } = e >> do { ss }`,
`do { p <- e ; ss } = e >>= \p -> do { ss }`, and
`do { let ds ; ss } = let ds in do { ss }`. The desugarer's
`:do` branch now invokes this pass directly so the surface
AST forms (`:do-expr`, `:do-bind`, `:do-let`) never reach the
evaluator. IO is represented as a tagged value
`("IO" payload)` — `return` (lazy builtin) wraps; `>>=` (lazy
builtin) forces the action, unwraps, and calls the bound
function on the payload; `>>` (lazy builtin) forces the
action and returns the second one. All three are non-strict
in their action arguments so deeply nested do-blocks don't
walk the whole chain at construction time. 14 new tests in
`lib/haskell/tests/do-io.sx` cover single-stmt do, single
and multi-bind, `>>` sequencing (last action wins), do-let
(single, multi, interleaved with bind), bind-to-`Just`,
bind-to-tuple, do inside a top-level fun, nested do, and
using `(>>=)`/`(>>)` directly as functions. 382/382 green.
- **2026-04-25** — Phase 3 `seq` + `deepseq`. Built-ins were strict
in all args by default (every collected thunk forced before
invoking the underlying SX fn) — that defeats `seq`'s purpose,
which is strict in its first argument and lazy in its second.
Added a tiny `lazy` flag on the builtin record (set by a new
`hk-mk-lazy-builtin` constructor) and routed `hk-apply-builtin`
to skip the auto-force when the flag is true. `seq a b` calls
`hk-force a` then returns `b` unchanged so its laziness is
preserved; `deepseq` does the same with `hk-deep-force`. 9 new
tests in `lib/haskell/tests/seq.sx` cover primitive, computed,
and let-bound first args, deepseq on a list / `Just` /
tuple, seq inside arithmetic, seq via a fun-clause, and
`[seq 1 10, seq 2 20]` to confirm seq composes inside list
literals. The lazy-when-unused negative case is also tested:
`let x = error "never" in 42 == 42`. 368/368 green.
- **2026-04-24** — Phase 3 infinite structures + Prelude. Two
evaluator changes turn the lazy primitives into a working
language:
1. Op-form `:` is now non-strict in both args — `hk-eval-op`
special-cases it before the eager force-and-binop path, so a
cons-cell holds two thunks. This is what makes `repeat x =
x : repeat x`, `iterate f x = x : iterate f (f x)`, and the
classic `fibs = 0 : 1 : zipWith plus fibs (tail fibs)`
terminate when only a finite prefix is consumed.
2. Operators are now first-class values via a small
`hk-make-binop-builtin` helper, so `(+)`, `(*)`, `(==)` etc.
can be passed to `zipWith` and `map`.
Added range support across parser + evaluator: `[from..to]` and
`[from,next..to]` evaluate eagerly via `hk-build-range` (handles
step direction); `[from..]` parses to a new `:range-from` node
that the evaluator desugars to `iterate (+ 1) from`. New
`hk-load-into!` runs the regular pipeline (parse → desugar →
register data → bind decls) on a source string, and `hk-init-env`
preloads `hk-prelude-src` with the Phase-3 Prelude:
`head`, `tail`, `fst`, `snd`, `take`, `drop`, `repeat`, `iterate`,
`length`, `map`, `filter`, `zipWith`, plus `fibs` and `plus`.
25 new tests in `lib/haskell/tests/infinite.sx`, including
`take 10 fibs == [0,1,1,2,3,5,8,13,21,34]`,
`head (drop 99 [1..])`, `iterate (\x -> x * 2) 1` powers of two,
user-defined `ones = 1 : ones`, `naturalsFrom`, range edge cases,
composed `map`/`filter`, and a custom `mySum`. 359/359 green.
Sieve of Eratosthenes is deferred — it needs lazy `++` plus a
`mod` primitive — and lives under `Classic programs` anyway.
- **2026-04-24** — Phase 3 laziness foundation. Added a thunk type to
`lib/haskell/eval.sx` (`hk-mk-thunk` / `hk-is-thunk?`) backed by a
one-shot memoizing `hk-force` that evaluates the deferred AST, then
flips a `forced` flag and caches the value on the thunk dict; the
shared `hk-deep-force` walks the result tree at the test/output
boundary. Three single-line wiring changes in the evaluator make
every application argument lazy: `:app` now wraps its argument in
`hk-mk-thunk` rather than evaluating it. To preserve correctness
where values must be inspected, `hk-apply`, `hk-eval-op`,
`hk-eval-if`, `hk-eval-case`, and `hk-eval` for `:neg` now force
their operand. `hk-apply-builtin` forces every collected arg
before invoking the underlying SX fn so built-ins (`error`, `not`,
`id`) stay strict. The pattern matcher in `match.sx` now forces
the scrutinee just-in-time only for patterns that need to inspect
shape — `p-wild`, `p-var`, `p-as`, and `p-lazy` are no-force
paths, so the value flows through as a thunk and binding
preserves laziness. `hk-match-list-pat` forces at every cons-spine
step. 6 new lazy-specific tests in `lib/haskell/tests/eval.sx`
verify that `(\x y -> x) 1 (error …)` and `(\x y -> y) (error …) 99`
return without diverging, that `case Just (error …) of Just _ -> 7`
short-circuits, that `const` drops its second arg, that
`myHead (1 : error … : [])` returns 1 without touching the tail,
and that `Just (error …)` survives a wildcard-arm `case`. 333/333
green, all prior eval tests preserved by deep-forcing the result
in `hk-eval-expr-source` and `hk-prog-val`.
- **2026-04-24** — Phase 2 evaluator (`lib/haskell/eval.sx`) — ties
the whole pipeline together. Strict semantics throughout (laziness
is Phase 3). Function values are tagged dicts: `closure`,
`multi`(fun), `con-partial`, `builtin`. `hk-apply` unifies dispatch
across all four; closures and multifuns curry one argument at a
time, multifuns trying each clause's pat-list in order once arity
is reached. Top-level `hk-bind-decls!` is three-pass —
collect groups + pre-seed names → install multifuns (so closures
observe later names) → eval 0-arity bodies and pat-binds — making
forward and mutually recursive references work. `hk-eval-let` does
the same trick with a mutable child env. Built-ins:
`error`/`not`/`id`, plus `otherwise = True`. Operators wired:
arithmetic, comparison (returning Bool conses), `&&`, `||`, `:`,
`++`. Sections evaluate the captured operand once and return a
closure synthesized via the existing AST. `hk-eval-program`
registers data decls then binds, returning the env; `hk-run`
fetches `main` if present. Also extended `runtime.sx` to
pre-register the standard Prelude conses (`Maybe`, `Either`,
`Ordering`) so expression-level eval doesn't need a leading
`data` decl. 48 new tests in `lib/haskell/tests/eval.sx` cover
literals, arithmetic precedence, comparison/Bool, `if`, `let`
(incl. recursive factorial), lambdas (incl. constructor pattern
args), constructors, `case` (Just/Nothing/literal/tuple/wildcard),
list literals + cons + `++`, tuples, sections, multi-clause
top-level (factorial, list length via cons pattern, Maybe handler
with default), user-defined `data` with case-style matching, a
binary-tree height program, currying, higher-order (`twice`),
short-circuit `error` via `if`, and the three built-ins. 329/329
green. Phase 2 is now complete; Phase 3 (laziness) is next.
- **2026-04-24** — Phase 2: value-level pattern matcher
(`lib/haskell/match.sx`). Core entry `hk-match pat val env` returns
an extended env dict on success or `nil` on failure (uses `assoc`
rather than `dict-set!` so failed branches never pollute the
caller's env). Constructor values are tagged lists with the
constructor name as the first element; tuples use the tag `"Tuple"`,
lists are chained `(":" h t)` cons cells terminated by `("[]")`.
Value builders `hk-mk-con` / `hk-mk-tuple` / `hk-mk-nil` /
`hk-mk-cons` / `hk-mk-list` keep tests readable. The matcher
handles every pattern node the parser emits:
- `:p-wild` (always matches), `:p-var` (binds), `:p-int` /
`:p-float` / `:p-string` / `:p-char` (literal equality)
- `:p-as` (sub-match then bind whole), `:p-lazy` (eager for now;
laziness wired in phase 3)
- `:p-con` with arity check + recursive arg matching, including
deeply nested patterns and infix `:` cons (uses the same
code path as named constructors)
- `:p-tuple` against `"Tuple"` values, `:p-list` against an
exact-length cons spine.
Helper `hk-parse-pat-source` lifts a real Haskell pattern out of
`case _ of <pat> -> 0`, letting tests drive against parser output.
31 new tests in `lib/haskell/tests/match.sx` cover atomic
patterns, success/failure for each con/tuple/list shape, nested
`Just (Just x)`, cons-vs-empty, `as` over con / wildcard /
failing-sub, `~` lazy, plus four parser-driven cases (`Just x`,
`x : xs`, `(a, b)`, `n@(Just x)`). 281/281 green.
- **2026-04-24** — Phase 2: runtime constructor registry
(`lib/haskell/runtime.sx`). A mutable dict `hk-constructors` keyed
by constructor name, each entry carrying arity and owning type.
`hk-register-data!` walks a `:data` AST and registers every
`:con-def` with its arity (= number of field types) and the type
name; `hk-register-newtype!` does the one-constructor variant;
`hk-register-decls!` / `hk-register-program!` filter a decls list
(or a `:program` / `:module` AST) and call the appropriate
registrar. `hk-load-source!` composes it with `hk-core`
(tokenize → layout → parse → desugar → register). Pre-registers
five built-ins tied to Haskell syntactic forms: `True` / `False`
(Bool), `[]` and `:` (List), `()` (Unit) — everything else comes
from user declarations or the eventual Prelude. Query helpers:
`hk-is-con?`, `hk-con-arity`, `hk-con-type`, `hk-con-names`. 24
new tests in `lib/haskell/tests/runtime.sx` cover each built-in
(arity + type), unknown-name probes, registration of `MyBool` /
`Maybe` / `Either` / recursive `Tree` / `newtype Age`, multi-data
programs, a module-header body, ignoring non-data decls, and
last-wins re-registration. 250/250 green.
- **2026-04-24** — Phase 2 kicks off with `lib/haskell/desugar.sx` — a
tree-walking rewriter that eliminates the three surface-only forms
produced by the parser, leaving a smaller core AST for the evaluator:
- `:where BODY DECLS` → `:let DECLS BODY`
- `:guarded ((:guard C1 E1) (:guard C2 E2) …)` → right-folded
`(:if C1 E1 (:if C2 E2 … (:app (:var "error") (:string "…"))))`
- `:list-comp E QUALS` → Haskell 98 §3.11 translation:
empty quals → `(:list (E))`, `:q-guard` → `(:if … (:list (E)) (:list ()))`,
`:q-gen PAT SRC` → `(concatMap (\PAT -> …) SRC)`, `:q-let BINDS` →
`(:let BINDS …)`. Nested generators compile to nested concatMap.
Every other expression, decl, pattern, and type node is recursed
into and passed through unchanged. Public entries `hk-desugar`,
`hk-core` (tokenize → layout → parse → desugar on a module), and
`hk-core-expr` (the same for an expression). 15 new tests in
`lib/haskell/tests/desugar.sx` cover two- and three-way guards,
case-alt guards, single/multi-binding `where`, guards + `where`
combined, the four list-comprehension cases (single-gen, gen +
filter, gen + let, nested gens), and pass-through for literals,
lambdas, simple fun-clauses, `data` decls, and a module header
wrapping a guarded function. 226/226 green.
- **2026-04-24** — Phase 1 parser is now complete. This iteration adds
operator sections and list comprehensions, the two remaining
aexp-level forms, plus ticks the “AST design” item (the keyword-
tagged list shape has accumulated a full HsSyn-level surface).
Changes:
- `hk-parse-infix` now bails on `op )` without consuming the op, so
the paren parser can claim it as a left section.
- `hk-parse-parens` rewritten to recognise five new forms:
`()` (unit), `(op)` → `(:var OP)`, `(op e)` → `(:sect-right OP E)`
(excluded for `-` so that `(- 5)` stays `(:neg 5)`), `(e op)` →
`(:sect-left OP E)`, plus regular parens and tuples. Works for
varsym, consym, reservedop `:`, and backtick-quoted varids.
- `hk-section-op-info` inspects the current token and returns a
`{:name :len}` dict, so the same logic handles 1-token ops and
3-token backtick ops uniformly.
- `hk-parse-list-lit` now recognises a `|` after the first element
and dispatches to `hk-parse-qual` per qualifier (comma-separated),
producing `(:list-comp EXPR QUALS)`. Qualifiers are:
`(:q-gen PAT EXPR)` when a paren-balanced lookahead
(`hk-comp-qual-is-gen?`) finds `<-` before the next `,`/`]`,
`(:q-let BINDS)` for `let …`, and `(:q-guard EXPR)` otherwise.
- `hk-parse-comp-let` accepts `]` or `,` as an implicit block close
(single-line comprehensions never see layout's vrbrace before the
qualifier terminator arrives); explicit `{ }` still closes
strictly.
22 new tests in `lib/haskell/tests/parser-sect-comp.sx` cover
op-references (inc. `(-)`, `(:)`, backtick), right sections (inc.
backtick), left sections, the `(- 5)` → `:neg` corner, plain parens
and tuples, six comprehension shapes (simple, filter, let,
nested-generators, constructor pattern bind, tuple pattern bind,
and a three-qualifier mix). 211/211 green.
- **2026-04-24** — Phase 1: module header + imports. Added
`hk-parse-module-header`, `hk-parse-import`, plus shared helpers for
import/export entity lists (`hk-parse-ent`, `hk-parse-ent-member`,
`hk-parse-ent-list`). New AST:
- `(:module NAME EXPORTS IMPORTS DECLS)` — NAME `nil` means no header,
EXPORTS `nil` means no export list (distinct from empty `()`)
- `(:import QUALIFIED NAME AS SPEC)` — QUALIFIED bool, AS alias or nil,
SPEC nil / `(:spec-items ENTS)` / `(:spec-hiding ENTS)`
- Entity refs: `:ent-var`, `:ent-all` (`Tycon(..)`), `:ent-with`
(`Tycon(m1, m2, …)`), `:ent-module` (exports only).
`hk-parse-program` now dispatches on the leading token: `module`
keyword → full header-plus-body parse (consuming the `where` layout
brace around the module body); otherwise collect any leading
`import` decls and then remaining decls with the existing logic.
The outer shell is `(:module …)` as soon as any header or import is
present, and stays as `(:program DECLS)` otherwise — preserving every
previous test expectation untouched. Handles operator exports `((+:))`,
dotted module names (`Data.Map`), and the Haskell-98 context-sensitive
keywords `qualified`/`as`/`hiding` (all lexed as ordinary varids and
matched only in import position). 16 new tests in
`lib/haskell/tests/parser-module.sx` covering simple/exports/empty
headers, dotted names, operator exports, `module Foo` exports,
qualified/aliased/items/hiding imports, and a headerless-with-imports
file. 189/189 green.
- **2026-04-24** — Phase 1: guards + where clauses. Factored a single
`hk-parse-rhs sep` that all body-producing sites now share: it reads
a plain `sep expr` body or a chain of `| cond sep expr` guards, then
— regardless of which form — looks for an optional `where` block and
wraps accordingly. AST additions:
- `:guarded GUARDS` where each GUARD is `:guard COND EXPR`
- `:where BODY DECLS` where BODY is a plain expr or a `:guarded`
Both can nest (guards inside where). `hk-parse-alt` now routes through
`hk-parse-rhs "->"`, `hk-parse-fun-clause` and `hk-parse-bind` through
`hk-parse-rhs "="`. `hk-parse-where-decls` reuses `hk-parse-decl` so
where-blocks accept any decl form (signatures, fixity, nested funs).
As a side effect, `hk-parse-bind` now also picks up the Haskell-native
`let f x = …` funclause shorthand: a varid followed by one or more
apats produces `(:fun-clause NAME APATS BODY)` instead of a
`(:bind (:p-var …) …)` — keeping the simple `let x = e` shape
unchanged for existing tests. 11 new tests in
`lib/haskell/tests/parser-guards-where.sx` cover two- and three-way
guards, mixed guarded + equality clauses, single- and multi-binding
where blocks, guards plus where, case-alt guards, case-alt where,
let with funclause shorthand, let with guards, and a where containing
a type signature alongside a fun-clause. 173/173 green.
- **2026-04-24** — Phase 1: top-level decls. Refactored `hk-parse-expr` into a
`hk-parser tokens mode` with `:expr` / `:module` dispatch so the big lexical
state is shared (peek/advance/pat/expr helpers all reachable); added public
wrappers `hk-parse-expr`, `hk-parse-module`, and source-level entry
`hk-parse-top`. New type parser (`hk-parse-type` / `hk-parse-btype` /
`hk-parse-atype`): type variables (`:t-var`), type constructors (`:t-con`),
type application (`:t-app`, left-assoc), right-associative function arrow
(`:t-fun`), unit/tuples (`:t-tuple`), and lists (`:t-list`). New decl parser
(`hk-parse-decl` / `hk-parse-program`) producing a `(:program DECLS)` shell:
- `:type-sig NAMES TYPE` — comma-separated multi-name support
- `:fun-clause NAME APATS BODY` — patterns for args, body via existing expr
- `:pat-bind PAT BODY` — top-level pattern bindings like `(a, b) = pair`
- `:data NAME TVARS CONS` with `:con-def CNAME FIELDS` for nullary and
multi-arg constructors, including recursive references
- `:type-syn NAME TVARS TYPE`, `:newtype NAME TVARS CNAME FIELD`
- `:fixity ASSOC PREC OPS` — assoc one of `"l"`/`"r"`/`"n"`, default prec 9,
comma-separated operator names, including backtick-quoted varids.
Sig vs fun-clause disambiguated by a paren-balanced top-level scan for
`::` before the next `;`/`}` (`hk-has-top-dcolon?`). 24 new tests in
`lib/haskell/tests/parser-decls.sx` cover all decl forms, signatures with
application / tuples / lists / right-assoc arrows, nullary and recursive
data types, multi-clause functions, and a mixed program with data + type-
synonym + signature + two function clauses. Not yet: guards, where
clauses, module header, imports, deriving, contexts, GADTs. 162/162 green.
- **2026-04-24** — Phase 1: full patterns. Added `as` patterns
(`name@apat` → `(:p-as NAME PAT)`), lazy patterns (`~apat` →
`(:p-lazy PAT)`), negative literal patterns (`-N` / `-F` resolving
eagerly in the parser so downstream passes see a plain `(:p-int -1)`),
and infix constructor patterns via a right-associative single-band
layer on top of `hk-parse-pat-lhs` for any `consym` or reservedop `:`
(so `x : xs` parses as `(:p-con ":" [x, xs])`, `a :+: b` likewise).
Extended `hk-apat-start?` with `-` and `~` so the pattern-argument
loops in lambdas and constructor applications pick these up.
Lambdas now parse apat parameters instead of bare varids — so the
`:lambda` AST is `(:lambda APATS BODY)` with apats as pattern nodes.
`hk-parse-bind` became a plain `pat = expr` form, so `:bind` now has
a pattern LHS throughout (simple `x = 1` → `(:bind (:p-var "x") …)`);
this picks up `let (x, y) = pair in …` and `let Just x = m in x`
automatically, and flows through `do`-notation lets. Eight existing
tests updated to the pattern-flavoured AST. Also fixed a pragmatic
layout issue that surfaced in multi-line `let`s: when a layout-indent
would emit a spurious `;` just before an `in` token (because the
let block had already been closed by dedent), `hk-peek-next-reserved`
now lets the layout pass skip that indent and leave closing to the
existing `in` handler. 18 new tests in
`lib/haskell/tests/parser-patterns.sx` cover every pattern variant,
lambda with mixed apats, let pattern-bindings (tuple / constructor /
cons), and do-bind with a tuple pattern. 138/138 green.
- **2026-04-24** — Phase 1: `case … of` and `do`-notation parsers. Added `hk-parse-case`
/ `hk-parse-alt`, `hk-parse-do` / `hk-parse-do-stmt` / `hk-parse-do-let`, plus the
minimal pattern language needed to make arms and binds meaningful:
`hk-parse-apat` (var, wildcard `_`, int/float/string/char literal, 0-arity
conid/qconid, paren+tuple, list) and `hk-parse-pat` (conid applied to
apats greedily). AST nodes: `:case SCRUT ALTS`, `:alt PAT BODY`, `:do STMTS`
with stmts `:do-expr E` / `:do-bind PAT E` / `:do-let BINDS`, and pattern
tags `:p-wild` / `:p-int` / `:p-float` / `:p-string` / `:p-char` / `:p-var`
/ `:p-con NAME ARGS` / `:p-tuple` / `:p-list`. `do`-stmts disambiguate
`pat <- e` vs bare expression with a forward paren/bracket/brace-balanced
scan for `<-` before the next `;`/`}` — no backtracking, no AST rewrite.
`case` and `do` accept both implicit (`vlbrace`/`vsemi`/`vrbrace`) and
explicit braces. Added to `hk-parse-lexp` so they participate fully in
operator-precedence expressions. 19 new tests in
`lib/haskell/tests/parser-case-do.sx` cover every pattern variant,
explicit-brace `case`, expression scrutinees, do with bind/let/expr,
multi-binding `let` in `do`, constructor patterns in binds, and
`case`/`do` nested inside `let` and lambda. The full pattern item (as
patterns, negative literals, `~` lazy, lambda/let pattern extension)
remains a separate sub-item. 119/119 green.
- **2026-04-24** — Phase 1: expression parser (`lib/haskell/parser.sx`, ~380 lines).
Pratt-style precedence climbing against a Haskell-98-default op table (24
operators across precedence 09, left/right/non assoc, default infixl 9 for
anything unlisted). Supports literals (int/float/string/char), varid/conid
(qualified variants folded into `:var` / `:con`), parens / unit / tuples,
list literals, ranges `[a..b]` and `[a,b..c]`, left-associative application,
unary `-`, backtick operators (`x \`mod\` 3`), lambdas, `if-then-else`, and
`let … in` consuming both virtual and explicit braces. AST uses keyword
tags (`:var`, `:op`, `:lambda`, `:let`, `:bind`, `:tuple`, `:range`,
`:range-step`, `:app`, `:neg`, `:if`, `:list`, `:int`, `:float`, `:string`,
`:char`, `:con`). The parser skips a leading `vlbrace` / `lbrace` so it can
be called on full post-layout output, and uses a `raise`-based error channel
with location-lite messages. 42 new tests in `lib/haskell/tests/parser-expr.sx`
cover literals, identifiers, parens/tuple/unit, list + range, app associativity,
operator precedence (mul over add, cons right-assoc, function-composition
right-assoc, `$` lowest), backtick ops, unary `-`, lambda multi-param,
`if` with infix condition, single- and multi-binding `let` (both implicit
and explicit braces), plus a few mixed nestings. 100/100 green.
- **2026-04-24** — Phase 1: layout algorithm (`lib/haskell/layout.sx`, ~260 lines)
implementing Haskell 98 §10.3. Two-pass design: a pre-pass augments the raw
token stream with explicit `layout-open` / `layout-indent` markers (suppressing
`<n>` when `{n}` already applies, per note 3), then an L pass consumes the
augmented stream against a stack of implicit/explicit layout contexts and
emits `vlbrace` / `vsemi` / `vrbrace` tokens; newlines are dropped. Supports
the initial module-level implicit open (skipped when the first token is
`module` or `{`), the four layout keywords (`let`/`where`/`do`/`of`), explicit
braces disabling layout, dedent closing nested implicit blocks while also
emitting `vsemi` at the enclosing level, and the pragmatic single-line
`let … in` rule (emit `}` when `in` meets an implicit let). 15 new tests
in `lib/haskell/tests/layout.sx` cover module-start, do/let/where/case/of,
explicit braces, multi-level dedent, line continuation, and EOF close-down.
Shared test helpers moved to `lib/haskell/testlib.sx` so both test files
can share one `hk-test`. `test.sh` preloads tokenizer + layout + testlib.
58/58 green.
- **2026-04-24** — Phase 1: Haskell 98 tokenizer (`lib/haskell/tokenizer.sx`, 490 lines)
covering idents (lower/upper/qvarid/qconid), 23 reserved words, 11 reserved ops,
varsym/consym operator chains, integer/hex/octal/float literals incl. exponent
notation, char + string literals with escape sequences, nested `{- ... -}` block
comments with depth counter, `-- ... EOL` line comments (respecting the
"followed by symbol = not a comment" Haskell 98 rule), backticks, punctuation,
and explicit `newline` tokens for the upcoming layout pass. 43 structural tests
in `lib/haskell/tests/parse.sx`, a lightweight `hk-deep=?` equality helper
and a custom `lib/haskell/test.sh` runner (pipes through the OCaml epoch
protocol, falls back to the main-repo build when run from a worktree). 43/43
green.
Also peeked at `/root/rose-ash/sx-haskell/` per briefing: that directory is a
Haskell program implementing an **SX interpreter** (Types.hs, Eval.hs,
Primitives.hs, etc. — ~2800 lines of .hs) — the *opposite* direction from this
project. Nothing to fold in.
Gotchas hit: `emit!` and `peek` are SX evaluator special forms, so every local
helper uses the `hk-` prefix. `cond`/`when`/`let` clauses evaluate ONLY the
last expression; multi-expression bodies MUST be wrapped in `(do ...)`. These
two together account for all the tokenizer's early crashes.
## Blockers
- _(none yet)_