# Haskell-on-SX: mini-Haskell with real laziness

Mini-Haskell is the research-paper-worthy demo. Laziness is native to the SX runtime (thunks are already a first-class type); algebraic data types map onto tagged lists; typeclasses map onto dictionary passing; IO maps onto `perform`/`resume`. Hindley-Milner inference is the one real piece of new work.

End-state goal: a **Haskell 98 subset** that runs the small classic programs (sieve of Eratosthenes lazy stream, Fibonacci as infinite list, naive quicksort, n-queens, expression evaluator) plus a ~150-test corpus.

## Scope decisions (defaults — override)

- **Standard:** Haskell 98 subset. No GHC extensions (no `DataKinds`, no `GADTs`, no `TypeFamilies`, no `TemplateHaskell`).
- **Phases 1-3 are untyped** — we get the evaluator right first with laziness + ADTs, then add HM inference in phase 4. This is deliberate: typing is the hard bit and will take a full phase on its own.
- **Typeclasses:** dictionary passing, no overlap, no orphan instances. Added in phase 5.
- **Layout rule:** yes — phase 1 implements Haskell's indentation-sensitive parsing (painful but required).
- **Test corpus:** custom. No GHC test suite. Bundle classic programs + ~100 hand-written expression-level tests + mini Prelude tests.

## Ground rules

- **Scope:** only `lib/haskell/**` and `plans/haskell-on-sx.md`. No edits to `spec/`, `hosts/`, `shared/`, or other language dirs.
- **SX files:** `sx-tree` MCP tools only.
- **Architecture:** Haskell source → AST → desugared-core → SX AST → CEK. Thunks on the SX side provide laziness natively.
- **Commits:** one feature per commit. Keep `## Progress log` updated.

## Architecture sketch

```
Haskell source
    │
    ▼
lib/haskell/tokenizer.sx   — idents, operators, layout-sensitive indentation
    │
    ▼
lib/haskell/parser.sx      — AST: modules, data decls, type sigs, fn clauses, expressions
    │
    ▼
lib/haskell/desugar.sx     — surface → core: case-of-case, do-notation, list comp, guards
    │
    ▼
lib/haskell/transpile.sx   — core → SX AST, wrapping everything in thunks for laziness
    │
    ▼
lib/haskell/runtime.sx     — force, ADT constructors, Prelude, typeclass dicts (phase 5+)
    │
    ▼
existing CEK / VM
```

Key mappings:
- **Laziness** = every function argument is an SX thunk; `force` is WHNF reduction. SX already has `make-thunk` from the trampolining evaluator — we reuse it.
- **Pattern match** = forces the scrutinee to WHNF, then structural match on the tag
- **ADT** = `data Maybe a = Nothing | Just a` compiles to tagged lists: `(:Nothing)` and `(:Just <thunk>)`
- **Typeclass** = each class becomes a record type; each instance becomes a record value; each method becomes a projection; the elaborator inserts the dict at each call site (phase 5)
- **IO** = `IO a` is a function `World -> (a, World)` internally; in practice uses `perform`/`resume` for actual side effects
- **Layout** = offside rule; inserted virtual braces + semis during a lexer-parser feedback pass
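
The **IO** mapping above can be illustrated in plain Haskell. This is only a sketch of the world-passing idea, not the SX runtime's code: `World`, `MyIO`, `runMyIO`, `returnIO`, `bindIO`, `putLine`, and `runDemo` are hypothetical names, and the "world" is assumed to be nothing more than the list of lines written so far, so the example stays pure.

```haskell
-- A toy world: the lines written so far (an assumption for illustration).
newtype World = World [String]

-- An action is a function from a world to a result plus the next world.
newtype MyIO a = MyIO (World -> (a, World))

runMyIO :: MyIO a -> World -> (a, World)
runMyIO (MyIO f) = f

returnIO :: a -> MyIO a
returnIO x = MyIO (\w -> (x, w))

-- Sequencing threads the world from the first action into the second.
bindIO :: MyIO a -> (a -> MyIO b) -> MyIO b
bindIO m k = MyIO (\w ->
  let (x, w') = runMyIO m w
  in runMyIO (k x) w')

putLine :: String -> MyIO ()
putLine s = MyIO (\(World out) -> ((), World (out ++ [s])))

demo :: MyIO Int
demo = putLine "hello" `bindIO` \_ ->
       putLine "world" `bindIO` \_ ->
       returnIO 42

-- Run `demo` from an empty world; returns the result and the output lines.
runDemo :: (Int, [String])
runDemo = let (x, World out) = runMyIO demo (World []) in (x, out)
```

Evaluating `runDemo` in GHCi should give `(42,["hello","world"])`. A real backend would replace the list with actual effects, which is where `perform`/`resume` come in.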

## Roadmap

### Phase 1 — tokenizer + parser + layout rule
- [x] Tokenizer: reserved words, qualified names, operators, numbers (int, float, Rational later), chars/strings, comments (`--` and `{-` nested)
- [x] Layout algorithm: turn indentation into virtual `{`, `;`, `}` tokens per Haskell 98 §10.3
- Parser (split into sub-items — implement one per iteration):
  - [x] Expressions: atoms, parens, tuples, lists, ranges, application, infix with full Haskell-98 precedence table, unary `-`, backtick operators, lambdas, `if`, `let`
  - [x] `case … of` and `do`-notation expressions (plus minimal patterns needed for arms/binds: var, wildcard, literal, 0-arity and applied constructor, tuple, list)
  - [x] Patterns — full: `as` patterns, nested, negative literal, `~` lazy, infix constructor (`:` / consym), extend lambdas/let with non-var patterns
  - [x] Top-level decls: function clauses (simple — no guards/where yet), pattern bindings, multi-name type signatures, `data` with type vars and recursive constructors, `type` synonyms, `newtype`, fixity (`infix`/`infixl`/`infixr` with optional precedence, comma-separated ops, backtick names). Types: vars / constructors / application / `->` (right-assoc) / tuples / lists. `hk-parse-top` entry.
  - [x] `where` clauses + guards (on fun-clauses, case alts, and let/do-let bindings — with the let funclause shorthand `let f x = …` now supported)
  - [x] Module header + imports — `module NAME [exports] where …`, qualified/as/hiding/explicit imports, operator exports, `module Foo` exports, dotted names, headerless-with-imports
  - [x] List comprehensions + operator sections — `(op)` / `(op e)` / `(e op)` (excluding `-` from right sections), `[e | q1, q2, …]` with `q-gen` / `q-guard` / `q-let` qualifiers
- [x] AST design modelled on GHC's HsSyn at a surface level — keyword-tagged lists cover modules/imports/decls/types/patterns/expressions; see parser.sx docstrings for the full node catalogue
- [x] Unit tests in `lib/haskell/tests/parse.sx` (43 tokenizer tests, all green)
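
To make the layout item concrete, here is a small Haskell sketch of what the offside rule means in practice. The function names `classify` and `classify'` are made up for this example; the second definition writes out the braces and semicolons that a Haskell 98 §10.3 layout pass would be expected to insert for the first.

```haskell
-- Implicit layout: indentation delimits the `where` and `case` blocks.
classify :: Int -> String
classify n = describe sign
  where
    sign = compare n 0
    describe s = case s of
      LT -> "negative"
      EQ -> "zero"
      GT -> "positive"

-- The same function with explicit braces and semicolons. Inside explicit
-- braces the layout rule is switched off, so indentation no longer matters.
classify' :: Int -> String
classify' n = describe sign
  where { sign = compare n 0
        ; describe s = case s of { LT -> "negative"
                                 ; EQ -> "zero"
                                 ; GT -> "positive" } }
```

Both definitions should behave identically, for example `map classify [-1, 0, 1]` and `map classify' [-1, 0, 1]` both give `["negative","zero","positive"]`.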

### Phase 2 — desugar + eager-ish eval + ADTs (untyped)
- [x] Desugar: guards → nested `if`s; `where` → `let`; list comp → `concatMap`-based; do-notation stays for now (desugared in phase 3)
- [x] `data` declarations register constructors in runtime
- [x] Pattern match (tag-based, value-level): atoms, vars, wildcards, constructor patterns, `as` patterns, nested
- [x] Evaluator (still strict internally — laziness in phase 3): `let`, `lambda`, application, `case`, literals, constructors
- [x] 30+ eval tests in `lib/haskell/tests/eval.sx`

### Phase 3 — laziness + classic programs
- [x] Transpile to thunk-wrapped SX: every application arg becomes `(make-thunk (lambda () <arg>))`
- [x] `force` = SX eval-thunk-to-WHNF primitive
- [x] Pattern match forces scrutinee before matching
- [x] Infinite structures: `repeat x`, `iterate f x`, `[1..]`, Fibonacci stream (sieve deferred — needs lazy `++` and is exercised under `Classic programs`)
- [x] `seq`, `deepseq` from Prelude
- [x] Do-notation for a stub `IO` monad (just threading, no real side effects yet)
- [ ] Classic programs in `lib/haskell/tests/programs/`:
  - [x] `fib.hs` — infinite Fibonacci stream
  - [x] `sieve.hs` — lazy sieve of Eratosthenes
  - [x] `quicksort.hs` — naive QS
  - [ ] `nqueens.hs`
  - [ ] `calculator.hs` — parser combinator style expression evaluator
- [ ] `lib/haskell/conformance.sh` + runner; `scoreboard.json` + `scoreboard.md`
- [ ] Target: 5/5 classic programs passing

### Phase 4 — Hindley-Milner inference
- [ ] Algorithm W: unification + type schemes + generalisation + instantiation
- [ ] Report type errors with meaningful positions
- [ ] Reject untypeable programs that phase 3 was accepting
- [ ] Type-sig checking: user writes `f :: Int -> Int`; verify
- [ ] Let-polymorphism
- [ ] Unit tests: inference for 50+ expressions
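
As a reference point for the let-polymorphism item, the following Haskell sketch shows the behaviour Hindley-Milner inference is expected to reproduce. The names `letPoly` and `pair` are illustrative only and do not come from the planned test corpus.

```haskell
-- `pair` is let-bound, so its type is generalised to `a -> (a, a)` for
-- every `a`, and it can be used at Int and at Bool in the same body.
letPoly :: ((Int, Int), (Bool, Bool))
letPoly =
  let pair x = (x, x)
  in (pair 1, pair True)

-- A lambda-bound variable stays monomorphic, so the analogous program
-- below is rejected by the type checker (left commented out on purpose):
--
-- lamMono = (\pair -> (pair 1, pair True)) (\x -> (x, x))
```

Accepting the first definition and rejecting the second is a compact check that generalisation happens at `let` and nowhere else.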

### Phase 5 — typeclasses (dictionary passing)
- [ ] `class` / `instance` declarations
- [ ] Dictionary-passing elaborator: inserts dict args at call sites
- [ ] Standard classes: `Eq`, `Ord`, `Show`, `Num`, `Functor`, `Monad`, `Applicative`
- [ ] `deriving (Eq, Show)` for ADTs
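
The dictionary-passing scheme can be written out by hand in ordinary Haskell. The sketch below is an assumption about the general shape of the elaborated output, not the planned elaborator itself; `EqDict`, `eqInt`, `eqList`, and `elemBy` are hypothetical names chosen for the example.

```haskell
-- The class becomes a record type with one field per method.
data EqDict a = EqDict
  { eq  :: a -> a -> Bool
  , neq :: a -> a -> Bool
  }

-- `instance Eq Int` becomes a record value.
eqInt :: EqDict Int
eqInt = EqDict { eq = (==), neq = (/=) }

-- `instance Eq a => Eq [a]`: an instance with a context becomes a
-- function from the context's dictionary to the new dictionary.
eqList :: EqDict a -> EqDict [a]
eqList d = EqDict { eq = go, neq = \xs ys -> not (go xs ys) }
  where
    go []       []       = True
    go (x : xs) (y : ys) = eq d x y && go xs ys
    go _        _        = False

-- `elem :: Eq a => a -> [a] -> Bool` after elaboration: the constraint
-- turns into an explicit dictionary parameter, and each method call is a
-- projection out of that dictionary.
elemBy :: EqDict a -> a -> [a] -> Bool
elemBy _ _ []       = False
elemBy d x (y : ys) = eq d x y || elemBy d x ys
```

A call such as `elem [1, 2] [[1], [1, 2]]` would then elaborate to `elemBy (eqList eqInt) [1, 2] [[1], [1, 2]]`, with the dictionary built at the call site.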

### Phase 6 — real IO + Prelude completion
- [ ] Real `IO` monad backed by `perform`/`resume`
- [ ] `putStrLn`, `getLine`, `readFile`, `writeFile`, `print`
- [ ] Full-ish Prelude: `Maybe`, `Either`, `List` functions, `Map`-lite
- [ ] Drive scoreboard toward 150+ passing

## Progress log

_Newest first._

- **2026-04-25** — Classic program `quicksort.hs`: naive functional quicksort.
  `qsort (x:xs) = qsort smaller ++ [x] ++ qsort larger where smaller = filter (< x) xs; larger = filter (>= x) xs`.
  No new runtime additions needed — right sections, `filter`, `++` all worked out of the box.
  5 tests (general sort, empty, singleton, already-sorted, reverse-sorted). 395/395 green.
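
  For reference, the clause quoted above can be completed into a standalone Haskell definition. The empty-list clause and the type signature are assumptions added here (the entry quotes only the recursive clause, and the Phase 3 evaluator is untyped).

  ```haskell
  qsort :: Ord a => [a] -> [a]
  qsort []     = []
  qsort (x:xs) = qsort smaller ++ [x] ++ qsort larger
    where
      smaller = filter (< x) xs
      larger  = filter (>= x) xs
  ```

  For example, `qsort [3, 1, 2]` evaluates to `[1,2,3]`.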

- **2026-04-25** — Classic program `sieve.hs`: lazy sieve of Eratosthenes.
  Added `mod`, `div`, `rem`, `quot` to `hk-binop` (and as first-class
  values in `hk-init-env`), enabling backtick operator use. The filter-based
  sieve ``sieve (p:xs) = p : sieve (filter (\x -> x `mod` p /= 0) xs)`` works
  with the existing lazy cons + Prelude `filter`. 2 new tests in
  `lib/haskell/tests/program-sieve.sx` (first 10 primes, 20th prime = 71).
  390/390 green.
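
  A standalone Haskell version of the same sieve is sketched below. The name `primes`, the type signatures, and the empty-list clause are additions for this example rather than the contents of `sieve.hs`.

  ```haskell
  sieve :: [Int] -> [Int]
  sieve (p:xs) = p : sieve (filter (\x -> x `mod` p /= 0) xs)
  sieve []     = []

  -- The infinite list of primes; only the demanded prefix is computed.
  primes :: [Int]
  primes = sieve [2..]
  ```

  `take 10 primes` gives `[2,3,5,7,11,13,17,19,23,29]`, and `primes !! 19` (the 20th prime) is `71`, matching the two tests mentioned in the entry.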

- **2026-04-25** — First classic program: `fib.hs`. Canonical Haskell
  source lives at `lib/haskell/tests/programs/fib.hs` (the
  two-cons-cell self-referential fibs definition plus a hand-rolled
  `zipPlus`). The runner at `lib/haskell/tests/program-fib.sx`
  mirrors the source as an SX string (the OCaml server's
  `read-file` lives in the page-helpers env, not the default load
  env, so direct file reads from inside `eval` aren't available).
  Tests: `take 15 myFibs == [0,1,1,2,3,5,8,13,21,34,55,89,144,233,377]`,
  plus a spot-check that the user-defined `zipPlus` is also
  reachable. Found and fixed an ordering bug in `hk-bind-decls!`:
  pass 3 (0-arity body evaluation) iterated `(keys groups)` whose
  order is implementation-defined, so a top-down program where
  `result = take 15 myFibs` came after `myFibs = …` could see
  `myFibs` still bound to its `nil` placeholder. Now group names
  are tracked in source order via a parallel list and pass 3 walks
  that. 388/388 green.
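
  The shape of such a program is sketched below. The names `myFibs`, `zipPlus`, and `result` come from the entry above, but the bodies are a reconstruction and may differ from the actual `fib.hs`.

  ```haskell
  -- Hand-rolled pointwise addition of two lists (assumed definition).
  zipPlus :: [Integer] -> [Integer] -> [Integer]
  zipPlus (x:xs) (y:ys) = (x + y) : zipPlus xs ys
  zipPlus _      _      = []

  -- Two explicit cons cells, then the list refers to itself lazily.
  myFibs :: [Integer]
  myFibs = 0 : 1 : zipPlus myFibs (tail myFibs)

  result :: [Integer]
  result = take 15 myFibs
  ```

  Note that `result` is a 0-arity binding that depends on `myFibs`, which is the situation the `hk-bind-decls!` ordering fix addresses.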

- **2026-04-25** — Phase 3 do-notation + stub IO monad. Added a
  `hk-desugar-do` pass that follows Haskell 98 §3.14 verbatim:
  `do { e } = e`, `do { e ; ss } = e >> do { ss }`,
  `do { p <- e ; ss } = e >>= \p -> do { ss }`, and
  `do { let ds ; ss } = let ds in do { ss }`. The desugarer's
  `:do` branch now invokes this pass directly so the surface
  AST forms (`:do-expr`, `:do-bind`, `:do-let`) never reach the
  evaluator. IO is represented as a tagged value
  `("IO" payload)` — `return` (lazy builtin) wraps; `>>=` (lazy
  builtin) forces the action, unwraps, and calls the bound
  function on the payload; `>>` (lazy builtin) forces the
  action and returns the second one. All three are non-strict
  in their action arguments so deeply nested do-blocks don't
  walk the whole chain at construction time. 14 new tests in
  `lib/haskell/tests/do-io.sx` cover single-stmt do, single
  and multi-bind, `>>` sequencing (last action wins), do-let
  (single, multi, interleaved with bind), bind-to-`Just`,
  bind-to-tuple, do inside a top-level fun, nested do, and
  using `(>>=)`/`(>>)` directly as functions. 382/382 green.
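
  The four translation rules can be checked by hand in GHC. The sketch below uses the `Maybe` monad instead of the stub `IO` monad so that results can be compared with `==`; `safeDiv`, `sugared`, and `desugared` are hypothetical names for this example.

  ```haskell
  safeDiv :: Int -> Int -> Maybe Int
  safeDiv _ 0 = Nothing
  safeDiv x y = Just (x `div` y)

  -- Surface form: one bind, one let, one bare statement, one final expression.
  sugared :: Int -> Int -> Maybe Int
  sugared a b = do
    q <- safeDiv a b
    let r = q + 1
    Just ()
    return (r * 2)

  -- The same block after applying the rules one statement at a time.
  desugared :: Int -> Int -> Maybe Int
  desugared a b =
    safeDiv a b >>= \q ->
      let r = q + 1
      in Just () >> return (r * 2)
  ```

  Both functions should agree on every input, for example `Just 12` for arguments `10` and `2`, and `Nothing` when the divisor is `0`.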

- **2026-04-25** — Phase 3 `seq` + `deepseq`. Built-ins were strict
  in all args by default (every collected thunk forced before
  invoking the underlying SX fn) — that defeats `seq`'s purpose,
  which is strict in its first argument and lazy in its second.
  Added a tiny `lazy` flag on the builtin record (set by a new
  `hk-mk-lazy-builtin` constructor) and routed `hk-apply-builtin`
  to skip the auto-force when the flag is true. `seq a b` calls
  `hk-force a` then returns `b` unchanged so its laziness is
  preserved; `deepseq` does the same with `hk-deep-force`. 9 new
  tests in `lib/haskell/tests/seq.sx` cover primitive, computed,
  and let-bound first args, deepseq on a list / `Just` /
  tuple, seq inside arithmetic, seq via a fun-clause, and
  `[seq 1 10, seq 2 20]` to confirm seq composes inside list
  literals. The lazy-when-unused negative case is also tested:
  `let x = error "never" in 42 == 42`. 368/368 green.
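
  The intended `seq` behaviour can be cross-checked against GHC with a few one-liners. The binding names below are made up for this sketch; `deepseq` is left out because in GHC it lives in a separate package rather than the Prelude.

  ```haskell
  -- `seq a b` forces `a` to WHNF and returns `b` without forcing it: the
  -- second tuple component is never evaluated, so the `error` does not fire.
  seqKeepsSecondLazy :: Int
  seqKeepsSecondLazy = fst (seq (1 :: Int) (42, error "never forced"))

  -- `seq` composes inside list literals.
  seqInList :: [Int]
  seqInList = [seq (1 :: Int) 10, seq (2 :: Int) 20]

  -- An unused binding is never evaluated.
  lazyWhenUnused :: Bool
  lazyWhenUnused = let x = error "never" :: Int in 42 == (42 :: Int)
  ```

  These evaluate to `42`, `[10,20]`, and `True` respectively.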

- **2026-04-24** — Phase 3 infinite structures + Prelude. Two
  evaluator changes turn the lazy primitives into a working
  language:
  1. Op-form `:` is now non-strict in both args — `hk-eval-op`
     special-cases it before the eager force-and-binop path, so a
     cons-cell holds two thunks. This is what makes
     `repeat x = x : repeat x`, `iterate f x = x : iterate f (f x)`, and the
     classic `fibs = 0 : 1 : zipWith plus fibs (tail fibs)`
     terminate when only a finite prefix is consumed.
  2. Operators are now first-class values via a small
     `hk-make-binop-builtin` helper, so `(+)`, `(*)`, `(==)` etc.
     can be passed to `zipWith` and `map`.
  Added range support across parser + evaluator: `[from..to]` and
  `[from,next..to]` evaluate eagerly via `hk-build-range` (handles
  step direction); `[from..]` parses to a new `:range-from` node
  that the evaluator desugars to `iterate (+ 1) from`. New
  `hk-load-into!` runs the regular pipeline (parse → desugar →
  register data → bind decls) on a source string, and `hk-init-env`
  preloads `hk-prelude-src` with the Phase-3 Prelude:
  `head`, `tail`, `fst`, `snd`, `take`, `drop`, `repeat`, `iterate`,
  `length`, `map`, `filter`, `zipWith`, plus `fibs` and `plus`.
  25 new tests in `lib/haskell/tests/infinite.sx`, including
  `take 10 fibs == [0,1,1,2,3,5,8,13,21,34]`,
  `head (drop 99 [1..])`, `iterate (\x -> x * 2) 1` powers of two,
  user-defined `ones = 1 : ones`, `naturalsFrom`, range edge cases,
  composed `map`/`filter`, and a custom `mySum`. 359/359 green.
  Sieve of Eratosthenes is deferred — it needs lazy `++` plus a
  `mod` primitive — and lives under `Classic programs` anyway.
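
  The definitions this entry relies on are short enough to write out. The sketch below uses primed-style names (`myRepeat`, `myIterate`, `rangeFrom`) to avoid clashing with GHC's Prelude; only `ones` and `naturalsFrom` are names taken from the entry, and their bodies here are assumed.

  ```haskell
  myRepeat :: a -> [a]
  myRepeat x = x : myRepeat x

  myIterate :: (a -> a) -> a -> [a]
  myIterate f x = x : myIterate f (f x)

  ones :: [Int]
  ones = 1 : ones

  naturalsFrom :: Int -> [Int]
  naturalsFrom n = n : naturalsFrom (n + 1)

  -- `[from..]` treated as `iterate (+ 1) from`, as in the `:range-from` desugaring.
  rangeFrom :: Int -> [Int]
  rangeFrom = myIterate (+ 1)
  ```

  Each of these only terminates because the tail of every cons cell is left unevaluated until demanded: `take 5 (myIterate (* 2) 1)` gives `[1,2,4,8,16]` and `head (drop 99 (rangeFrom 1))` gives `100`.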

- **2026-04-24** — Phase 3 laziness foundation. Added a thunk type to
  `lib/haskell/eval.sx` (`hk-mk-thunk` / `hk-is-thunk?`) backed by a
  one-shot memoizing `hk-force` that evaluates the deferred AST, then
  flips a `forced` flag and caches the value on the thunk dict; the
  shared `hk-deep-force` walks the result tree at the test/output
  boundary. Three single-line wiring changes in the evaluator make
  every application argument lazy: `:app` now wraps its argument in
  `hk-mk-thunk` rather than evaluating it. To preserve correctness
  where values must be inspected, `hk-apply`, `hk-eval-op`,
  `hk-eval-if`, `hk-eval-case`, and `hk-eval` for `:neg` now force
  their operand. `hk-apply-builtin` forces every collected arg
  before invoking the underlying SX fn so built-ins (`error`, `not`,
  `id`) stay strict. The pattern matcher in `match.sx` now forces
  the scrutinee just-in-time only for patterns that need to inspect
  shape — `p-wild`, `p-var`, `p-as`, and `p-lazy` are no-force
  paths, so the value flows through as a thunk and binding
  preserves laziness. `hk-match-list-pat` forces at every cons-spine
  step. 6 new lazy-specific tests in `lib/haskell/tests/eval.sx`
  verify that `(\x y -> x) 1 (error …)` and `(\x y -> y) (error …) 99`
  return without diverging, that `case Just (error …) of Just _ -> 7`
  short-circuits, that `const` drops its second arg, that
  `myHead (1 : error … : [])` returns 1 without touching the tail,
  and that `Just (error …)` survives a wildcard-arm `case`. 333/333
  green, all prior eval tests preserved by deep-forcing the result
  in `hk-eval-expr-source` and `hk-prog-val`.
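
  The one-shot memoizing behaviour described for `hk-force` can be modelled in Haskell with a mutable cell. This is a sketch of the technique only: `ThunkState`, `Thunk`, `mkThunk`, `force`, and `demo` are hypothetical names, and the real implementation stores a `forced` flag and cached value on an SX dict rather than in an `IORef`.

  ```haskell
  import Data.IORef

  -- A thunk is either still deferred (holding the computation) or forced
  -- (holding the cached value).
  data ThunkState a = Deferred (IO a) | Forced a

  newtype Thunk a = Thunk (IORef (ThunkState a))

  mkThunk :: IO a -> IO (Thunk a)
  mkThunk body = Thunk <$> newIORef (Deferred body)

  -- Run the deferred computation at most once, then cache its value.
  force :: Thunk a -> IO a
  force (Thunk ref) = do
    st <- readIORef ref
    case st of
      Forced v      -> return v
      Deferred body -> do
        v <- body
        writeIORef ref (Forced v)
        return v

  -- Forces the same thunk twice and reports how often its body ran.
  demo :: IO (Int, Int, Int)
  demo = do
    runs <- newIORef (0 :: Int)
    t <- mkThunk (modifyIORef runs (+ 1) >> return 21)
    a <- force t
    b <- force t
    n <- readIORef runs
    return (a + b, b, n)
  ```

  Running `demo` should yield `(42,21,1)`: both calls to `force` return `21`, but the body runs only once.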

- **2026-04-24** — Phase 2 evaluator (`lib/haskell/eval.sx`) — ties
  the whole pipeline together. Strict semantics throughout (laziness
  is Phase 3). Function values are tagged dicts: `closure`,
  `multi`(fun), `con-partial`, `builtin`. `hk-apply` unifies dispatch
  across all four; closures and multifuns curry one argument at a
  time, multifuns trying each clause's pat-list in order once arity
  is reached. Top-level `hk-bind-decls!` is three-pass —
  collect groups + pre-seed names → install multifuns (so closures
  observe later names) → eval 0-arity bodies and pat-binds — making
  forward and mutually recursive references work. `hk-eval-let` does
  the same trick with a mutable child env. Built-ins:
  `error`/`not`/`id`, plus `otherwise = True`. Operators wired:
  arithmetic, comparison (returning Bool conses), `&&`, `||`, `:`,
  `++`. Sections evaluate the captured operand once and return a
  closure synthesized via the existing AST. `hk-eval-program`
  registers data decls then binds, returning the env; `hk-run`
  fetches `main` if present. Also extended `runtime.sx` to
  pre-register the standard Prelude conses (`Maybe`, `Either`,
  `Ordering`) so expression-level eval doesn't need a leading
  `data` decl. 48 new tests in `lib/haskell/tests/eval.sx` cover
  literals, arithmetic precedence, comparison/Bool, `if`, `let`
  (incl. recursive factorial), lambdas (incl. constructor pattern
  args), constructors, `case` (Just/Nothing/literal/tuple/wildcard),
  list literals + cons + `++`, tuples, sections, multi-clause
  top-level (factorial, list length via cons pattern, Maybe handler
  with default), user-defined `data` with case-style matching, a
  binary-tree height program, currying, higher-order (`twice`),
  short-circuit `error` via `if`, and the three built-ins. 329/329
  green. Phase 2 is now complete; Phase 3 (laziness) is next.

- **2026-04-24** — Phase 2: value-level pattern matcher
  (`lib/haskell/match.sx`). Core entry `hk-match pat val env` returns
  an extended env dict on success or `nil` on failure (uses `assoc`
  rather than `dict-set!` so failed branches never pollute the
  caller's env). Constructor values are tagged lists with the
  constructor name as the first element; tuples use the tag `"Tuple"`,
  lists are chained `(":" h t)` cons cells terminated by `("[]")`.
  Value builders `hk-mk-con` / `hk-mk-tuple` / `hk-mk-nil` /
  `hk-mk-cons` / `hk-mk-list` keep tests readable. The matcher
  handles every pattern node the parser emits:
  - `:p-wild` (always matches), `:p-var` (binds), `:p-int` /
    `:p-float` / `:p-string` / `:p-char` (literal equality)
  - `:p-as` (sub-match then bind whole), `:p-lazy` (eager for now;
    laziness wired in phase 3)
  - `:p-con` with arity check + recursive arg matching, including
    deeply nested patterns and infix `:` cons (uses the same
    code path as named constructors)
  - `:p-tuple` against `"Tuple"` values, `:p-list` against an
    exact-length cons spine.
  Helper `hk-parse-pat-source` lifts a real Haskell pattern out of
  `case _ of <pat> -> 0`, letting tests drive against parser output.
  31 new tests in `lib/haskell/tests/match.sx` cover atomic
  patterns, success/failure for each con/tuple/list shape, nested
  `Just (Just x)`, cons-vs-empty, `as` over con / wildcard /
  failing-sub, `~` lazy, plus four parser-driven cases (`Just x`,
  `x : xs`, `(a, b)`, `n@(Just x)`). 281/281 green.
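
  The matcher's contract (extended environment on success, nothing on failure, caller's environment untouched) can be sketched in Haskell as follows. The types `Value`, `Pat`, `Env` and the functions `match` / `matchAll` are simplified stand-ins invented for this example; they cover only a subset of the pattern nodes listed above.

  ```haskell
  -- Constructor values carry a tag plus fields, mirroring the tagged lists.
  data Value = IntV Int | Con String [Value]
    deriving (Eq, Show)

  data Pat
    = PWild
    | PVar String
    | PInt Int
    | PAs String Pat
    | PCon String [Pat]

  type Env = [(String, Value)]

  -- Returns the extended environment on success and Nothing on failure.
  -- Environments are immutable, so a failed branch leaves nothing behind.
  match :: Pat -> Value -> Env -> Maybe Env
  match PWild       _          env = Just env
  match (PVar x)    v          env = Just ((x, v) : env)
  match (PInt n)    (IntV m)   env | n == m = Just env
  match (PAs x p)   v          env = fmap ((x, v) :) (match p v env)
  match (PCon c ps) (Con d vs) env
    | c == d && length ps == length vs = matchAll (zip ps vs) env
  match _           _          _   = Nothing

  -- Match sub-patterns left to right, threading the environment through.
  matchAll :: [(Pat, Value)] -> Env -> Maybe Env
  matchAll []            env = Just env
  matchAll ((p, v) : xs) env = match p v env >>= matchAll xs
  ```

  With a cons cell encoded as `Con ":" [h, t]` and the empty list as `Con "[]" []`, matching the pattern for `x : xs` against the one-element list binds `x` to `IntV 1` and `xs` to the empty list, while matching `Just _` against `Nothing` returns `Nothing`.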

- **2026-04-24** — Phase 2: runtime constructor registry
  (`lib/haskell/runtime.sx`). A mutable dict `hk-constructors` keyed
  by constructor name, each entry carrying arity and owning type.
  `hk-register-data!` walks a `:data` AST and registers every
  `:con-def` with its arity (= number of field types) and the type
  name; `hk-register-newtype!` does the one-constructor variant;
  `hk-register-decls!` / `hk-register-program!` filter a decls list
  (or a `:program` / `:module` AST) and call the appropriate
  registrar. `hk-load-source!` composes it with `hk-core`
  (tokenize → layout → parse → desugar → register). Pre-registers
  five built-ins tied to Haskell syntactic forms: `True` / `False`
  (Bool), `[]` and `:` (List), `()` (Unit) — everything else comes
  from user declarations or the eventual Prelude. Query helpers:
  `hk-is-con?`, `hk-con-arity`, `hk-con-type`, `hk-con-names`. 24
  new tests in `lib/haskell/tests/runtime.sx` cover each built-in
  (arity + type), unknown-name probes, registration of `MyBool` /
  `Maybe` / `Either` / recursive `Tree` / `newtype Age`, multi-data
  programs, a module-header body, ignoring non-data decls, and
  last-wins re-registration. 250/250 green.

- **2026-04-24** — Phase 2 kicks off with `lib/haskell/desugar.sx` — a
  tree-walking rewriter that eliminates the three surface-only forms
  produced by the parser, leaving a smaller core AST for the evaluator:
  - `:where BODY DECLS` → `:let DECLS BODY`
  - `:guarded ((:guard C1 E1) (:guard C2 E2) …)` → right-folded
    `(:if C1 E1 (:if C2 E2 … (:app (:var "error") (:string "…"))))`
  - `:list-comp E QUALS` → Haskell 98 §3.11 translation:
    empty quals → `(:list (E))`, `:q-guard` → `(:if … (:list (E)) (:list ()))`,
    `:q-gen PAT SRC` → `(concatMap (\PAT -> …) SRC)`, `:q-let BINDS` →
    `(:let BINDS …)`. Nested generators compile to nested concatMap.
  Every other expression, decl, pattern, and type node is recursed
  into and passed through unchanged. Public entries `hk-desugar`,
  `hk-core` (tokenize → layout → parse → desugar on a module), and
  `hk-core-expr` (the same for an expression). 15 new tests in
  `lib/haskell/tests/desugar.sx` cover two- and three-way guards,
  case-alt guards, single/multi-binding `where`, guards + `where`
  combined, the four list-comprehension cases (single-gen, gen +
  filter, gen + let, nested gens), and pass-through for literals,
  lambdas, simple fun-clauses, `data` decls, and a module header
  wrapping a guarded function. 226/226 green.
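
  The three rewrites can be seen at the source level by writing each function twice, once in surface syntax and once in the shape the desugarer is described as producing. The names `sign`, `signCore`, `pairs`, and `pairsCore` are invented for this sketch, and the `error` message text is a placeholder since the entry elides the real one.

  ```haskell
  -- Guards plus `where`, as written in source.
  sign :: Int -> String
  sign n
    | n < zero  = "negative"
    | n == zero = "zero"
    | otherwise = "positive"
    where zero = 0

  -- After desugaring: `where` becomes `let`, and the guards become
  -- right-nested `if`s ending in a call to `error`.
  signCore :: Int -> String
  signCore n =
    let zero = 0
    in if n < zero then "negative"
       else if n == zero then "zero"
       else if otherwise then "positive"
       else error "no guard matched"

  -- A comprehension with a generator, a guard, and a let qualifier ...
  pairs :: [(Int, Int)]
  pairs = [ (x, y) | x <- [1 .. 3], odd x, let y = x * x ]

  -- ... and its concatMap-based translation.
  pairsCore :: [(Int, Int)]
  pairsCore =
    concatMap (\x -> if odd x then (let y = x * x in [(x, y)]) else []) [1 .. 3]
  ```

  Each pair of definitions should be observationally equal; `pairs` and `pairsCore` both evaluate to `[(1,1),(3,9)]`.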

- **2026-04-24** — Phase 1 parser is now complete. This iteration adds
  operator sections and list comprehensions, the two remaining
  aexp-level forms, plus ticks the “AST design” item (the keyword-tagged
  list shape has accumulated a full HsSyn-level surface).
  Changes:
  - `hk-parse-infix` now bails on `op )` without consuming the op, so
    the paren parser can claim it as a left section.
  - `hk-parse-parens` rewritten to recognise five new forms:
    `()` (unit), `(op)` → `(:var OP)`, `(op e)` → `(:sect-right OP E)`
    (excluded for `-` so that `(- 5)` stays `(:neg 5)`), `(e op)` →
    `(:sect-left OP E)`, plus regular parens and tuples. Works for
    varsym, consym, reservedop `:`, and backtick-quoted varids.
  - `hk-section-op-info` inspects the current token and returns a
    `{:name :len}` dict, so the same logic handles 1-token ops and
    3-token backtick ops uniformly.
  - `hk-parse-list-lit` now recognises a `|` after the first element
    and dispatches to `hk-parse-qual` per qualifier (comma-separated),
    producing `(:list-comp EXPR QUALS)`. Qualifiers are:
    `(:q-gen PAT EXPR)` when a paren-balanced lookahead
    (`hk-comp-qual-is-gen?`) finds `<-` before the next `,`/`]`,
    `(:q-let BINDS)` for `let …`, and `(:q-guard EXPR)` otherwise.
  - `hk-parse-comp-let` accepts `]` or `,` as an implicit block close
    (single-line comprehensions never see layout's vrbrace before the
    qualifier terminator arrives); explicit `{ }` still closes
    strictly.
  22 new tests in `lib/haskell/tests/parser-sect-comp.sx` cover
  op-references (inc. `(-)`, `(:)`, backtick), right sections (inc.
  backtick), left sections, the `(- 5)` → `:neg` corner, plain parens
  and tuples, six comprehension shapes (simple, filter, let,
  nested-generators, constructor pattern bind, tuple pattern bind,
  and a three-qualifier mix). 211/211 green.
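
  For readers less familiar with sections, the forms the parser now distinguishes look like this in Haskell source. The binding names are arbitrary and exist only for this example.

  ```haskell
  -- Operator reference `(op)`: the operator as an ordinary function.
  total :: Int
  total = foldr (+) 0 [1, 2, 3]

  -- Right section `(op e)`: the missing operand is the left one.
  halves :: [Int]
  halves = map (`div` 2) [10, 20, 30]

  -- Left section `(e op)`: the missing operand is the right one.
  powersOfTwo :: [Int]
  powersOfTwo = map (2 ^) [0, 1, 2, 3]

  -- `(- 5)` is the negative number, not a section ...
  minusFive :: Int
  minusFive = (- 5)

  -- ... so subtracting five from each element needs `subtract`.
  shifted :: [Int]
  shifted = map (subtract 5) [5, 6, 7]
  ```

  These evaluate to `6`, `[5,10,15]`, `[1,2,4,8]`, `-5`, and `[0,1,2]`, which is why `-` has to be excluded from right sections.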

- **2026-04-24** — Phase 1: module header + imports. Added
  `hk-parse-module-header`, `hk-parse-import`, plus shared helpers for
  import/export entity lists (`hk-parse-ent`, `hk-parse-ent-member`,
  `hk-parse-ent-list`). New AST:
  - `(:module NAME EXPORTS IMPORTS DECLS)` — NAME `nil` means no header,
    EXPORTS `nil` means no export list (distinct from empty `()`)
  - `(:import QUALIFIED NAME AS SPEC)` — QUALIFIED bool, AS alias or nil,
    SPEC nil / `(:spec-items ENTS)` / `(:spec-hiding ENTS)`
  - Entity refs: `:ent-var`, `:ent-all` (`Tycon(..)`), `:ent-with`
    (`Tycon(m1, m2, …)`), `:ent-module` (exports only).
  `hk-parse-program` now dispatches on the leading token: `module`
  keyword → full header-plus-body parse (consuming the `where` layout
  brace around the module body); otherwise collect any leading
  `import` decls and then remaining decls with the existing logic.
  The outer shell is `(:module …)` as soon as any header or import is
  present, and stays as `(:program DECLS)` otherwise — preserving every
  previous test expectation untouched. Handles operator exports `((+:))`,
  dotted module names (`Data.Map`), and the Haskell-98 context-sensitive
  keywords `qualified`/`as`/`hiding` (all lexed as ordinary varids and
  matched only in import position). 16 new tests in
  `lib/haskell/tests/parser-module.sx` covering simple/exports/empty
  headers, dotted names, operator exports, `module Foo` exports,
  qualified/aliased/items/hiding imports, and a headerless-with-imports
  file. 189/189 green.

- **2026-04-24** — Phase 1: guards + where clauses. Factored a single
  `hk-parse-rhs sep` that all body-producing sites now share: it reads
  a plain `sep expr` body or a chain of `| cond sep expr` guards, then
  — regardless of which form — looks for an optional `where` block and
  wraps accordingly. AST additions:
  - `:guarded GUARDS` where each GUARD is `:guard COND EXPR`
  - `:where BODY DECLS` where BODY is a plain expr or a `:guarded`
  Both can nest (guards inside where). `hk-parse-alt` now routes through
  `hk-parse-rhs "->"`, `hk-parse-fun-clause` and `hk-parse-bind` through
  `hk-parse-rhs "="`. `hk-parse-where-decls` reuses `hk-parse-decl` so
  where-blocks accept any decl form (signatures, fixity, nested funs).
  As a side effect, `hk-parse-bind` now also picks up the Haskell-native
  `let f x = …` funclause shorthand: a varid followed by one or more
  apats produces `(:fun-clause NAME APATS BODY)` instead of a
  `(:bind (:p-var …) …)` — keeping the simple `let x = e` shape
  unchanged for existing tests. 11 new tests in
  `lib/haskell/tests/parser-guards-where.sx` cover two- and three-way
  guards, mixed guarded + equality clauses, single- and multi-binding
  where blocks, guards plus where, case-alt guards, case-alt where,
  let with funclause shorthand, let with guards, and a where containing
  a type signature alongside a fun-clause. 173/173 green.

- **2026-04-24** — Phase 1: top-level decls. Refactored `hk-parse-expr` into a
  `hk-parser tokens mode` with `:expr` / `:module` dispatch so the big lexical
  state is shared (peek/advance/pat/expr helpers all reachable); added public
  wrappers `hk-parse-expr`, `hk-parse-module`, and source-level entry
  `hk-parse-top`. New type parser (`hk-parse-type` / `hk-parse-btype` /
  `hk-parse-atype`): type variables (`:t-var`), type constructors (`:t-con`),
  type application (`:t-app`, left-assoc), right-associative function arrow
  (`:t-fun`), unit/tuples (`:t-tuple`), and lists (`:t-list`). New decl parser
  (`hk-parse-decl` / `hk-parse-program`) producing a `(:program DECLS)` shell:
  - `:type-sig NAMES TYPE` — comma-separated multi-name support
  - `:fun-clause NAME APATS BODY` — patterns for args, body via existing expr
  - `:pat-bind PAT BODY` — top-level pattern bindings like `(a, b) = pair`
  - `:data NAME TVARS CONS` with `:con-def CNAME FIELDS` for nullary and
    multi-arg constructors, including recursive references
  - `:type-syn NAME TVARS TYPE`, `:newtype NAME TVARS CNAME FIELD`
  - `:fixity ASSOC PREC OPS` — assoc one of `"l"`/`"r"`/`"n"`, default prec 9,
    comma-separated operator names, including backtick-quoted varids.
  Sig vs fun-clause disambiguated by a paren-balanced top-level scan for
  `::` before the next `;`/`}` (`hk-has-top-dcolon?`). 24 new tests in
  `lib/haskell/tests/parser-decls.sx` cover all decl forms, signatures with
  application / tuples / lists / right-assoc arrows, nullary and recursive
  data types, multi-clause functions, and a mixed program with data +
  type-synonym + signature + two function clauses. Not yet: guards, where
  clauses, module header, imports, deriving, contexts, GADTs. 162/162 green.

- **2026-04-24** — Phase 1: full patterns. Added `as` patterns
  (`name@apat` → `(:p-as NAME PAT)`), lazy patterns (`~apat` →
  `(:p-lazy PAT)`), negative literal patterns (`-N` / `-F` resolving
  eagerly in the parser so downstream passes see a plain `(:p-int -1)`),
  and infix constructor patterns via a right-associative single-band
  layer on top of `hk-parse-pat-lhs` for any `consym` or reservedop `:`
  (so `x : xs` parses as `(:p-con ":" [x, xs])`, `a :+: b` likewise).
  Extended `hk-apat-start?` with `-` and `~` so the pattern-argument
  loops in lambdas and constructor applications pick these up.
  Lambdas now parse apat parameters instead of bare varids — so the
  `:lambda` AST is `(:lambda APATS BODY)` with apats as pattern nodes.
  `hk-parse-bind` became a plain `pat = expr` form, so `:bind` now has
  a pattern LHS throughout (simple `x = 1` → `(:bind (:p-var "x") …)`);
  this picks up `let (x, y) = pair in …` and `let Just x = m in x`
  automatically, and flows through `do`-notation lets. Eight existing
  tests updated to the pattern-flavoured AST. Also fixed a pragmatic
  layout issue that surfaced in multi-line `let`s: when a layout-indent
  would emit a spurious `;` just before an `in` token (because the
  let block had already been closed by dedent), `hk-peek-next-reserved`
  now lets the layout pass skip that indent and leave closing to the
  existing `in` handler. 18 new tests in
  `lib/haskell/tests/parser-patterns.sx` cover every pattern variant,
  lambda with mixed apats, let pattern-bindings (tuple / constructor /
  cons), and do-bind with a tuple pattern. 138/138 green.
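
  The four pattern forms added in this entry are shown below in ordinary Haskell, one function per form. All names (`firstAndAll`, `ignorePair`, `isMinusOne`, `Complex`, `realPart`) are illustrative and not taken from the test file.

  ```haskell
  -- As pattern: `whole` names the entire list while `x : _` destructures it.
  firstAndAll :: [Int] -> (Int, [Int])
  firstAndAll whole@(x : _) = (x, whole)
  firstAndAll []            = (0, [])

  -- Lazy pattern: the argument is never forced because neither component
  -- is used, so this returns 0 even when applied to `undefined`.
  ignorePair :: (Int, Int) -> Int
  ignorePair ~(_a, _b) = 0

  -- Negative literal pattern.
  isMinusOne :: Int -> Bool
  isMinusOne (-1) = True
  isMinusOne _    = False

  -- Infix constructor pattern with a user-defined consym.
  data Complex = Int :+: Int

  realPart :: Complex -> Int
  realPart (r :+: _) = r
  ```

  For example, `firstAndAll [1, 2]` gives `(1,[1,2])`, `ignorePair undefined` gives `0`, and `realPart (3 :+: 4)` gives `3`.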

- **2026-04-24** — Phase 1: `case … of` and `do`-notation parsers. Added `hk-parse-case`
  / `hk-parse-alt`, `hk-parse-do` / `hk-parse-do-stmt` / `hk-parse-do-let`, plus the
  minimal pattern language needed to make arms and binds meaningful:
  `hk-parse-apat` (var, wildcard `_`, int/float/string/char literal, 0-arity
  conid/qconid, paren+tuple, list) and `hk-parse-pat` (conid applied to
  apats greedily). AST nodes: `:case SCRUT ALTS`, `:alt PAT BODY`, `:do STMTS`
  with stmts `:do-expr E` / `:do-bind PAT E` / `:do-let BINDS`, and pattern
  tags `:p-wild` / `:p-int` / `:p-float` / `:p-string` / `:p-char` / `:p-var`
  / `:p-con NAME ARGS` / `:p-tuple` / `:p-list`. `do`-stmts disambiguate
  `pat <- e` vs bare expression with a forward paren/bracket/brace-balanced
  scan for `<-` before the next `;`/`}` — no backtracking, no AST rewrite.
  `case` and `do` accept both implicit (`vlbrace`/`vsemi`/`vrbrace`) and
  explicit braces. Added to `hk-parse-lexp` so they participate fully in
  operator-precedence expressions. 19 new tests in
  `lib/haskell/tests/parser-case-do.sx` cover every pattern variant,
  explicit-brace `case`, expression scrutinees, do with bind/let/expr,
  multi-binding `let` in `do`, constructor patterns in binds, and
  `case`/`do` nested inside `let` and lambda. The full pattern item (as
  patterns, negative literals, `~` lazy, lambda/let pattern extension)
  remains a separate sub-item. 119/119 green.

- **2026-04-24** — Phase 1: expression parser (`lib/haskell/parser.sx`, ~380 lines).
  Pratt-style precedence climbing against a Haskell-98-default op table (24
  operators across precedence 0–9, left/right/non assoc, default infixl 9 for
  anything unlisted). Supports literals (int/float/string/char), varid/conid
  (qualified variants folded into `:var` / `:con`), parens / unit / tuples,
  list literals, ranges `[a..b]` and `[a,b..c]`, left-associative application,
  unary `-`, backtick operators (``x `mod` 3``), lambdas, `if-then-else`, and
  `let … in` consuming both virtual and explicit braces. AST uses keyword
  tags (`:var`, `:op`, `:lambda`, `:let`, `:bind`, `:tuple`, `:range`,
  `:range-step`, `:app`, `:neg`, `:if`, `:list`, `:int`, `:float`, `:string`,
  `:char`, `:con`). The parser skips a leading `vlbrace` / `lbrace` so it can
  be called on full post-layout output, and uses a `raise`-based error channel
  with location-lite messages. 42 new tests in `lib/haskell/tests/parser-expr.sx`
  cover literals, identifiers, parens/tuple/unit, list + range, app associativity,
  operator precedence (mul over add, cons right-assoc, function-composition
  right-assoc, `$` lowest), backtick ops, unary `-`, lambda multi-param,
  `if` with infix condition, single- and multi-binding `let` (both implicit
  and explicit braces), plus a few mixed nestings. 100/100 green.
|
||
|
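For readers unfamiliar with precedence climbing, the core loop can be sketched as below. This is a Python illustration under stated assumptions, not the SX parser: `parse_expr`, the abbreviated `FIXITY` table, and the tuple shape of the `:op` / `:var` nodes are all hypothetical, and operands are restricted to bare names.

```python
# Hypothetical sketch of precedence climbing over a small subset of the
# Haskell 98 default fixities. Associativity: "l", "r", or "n" (non-assoc).
FIXITY = {
    "$": (0, "r"), "||": (2, "r"), "&&": (3, "r"),
    "==": (4, "n"), "<": (4, "n"),
    ":": (5, "r"), "++": (5, "r"),
    "+": (6, "l"), "-": (6, "l"),
    "*": (7, "l"), "/": (7, "l"),
    ".": (9, "r"),
}
DEFAULT_FIXITY = (9, "l")  # unlisted operators default to infixl 9

def parse_expr(tokens, pos=0, min_prec=0):
    """Parse `operand (op operand)*` and return (ast, next_pos)."""
    lhs, pos = (":var", tokens[pos]), pos + 1
    while pos < len(tokens):
        op = tokens[pos]
        prec, assoc = FIXITY.get(op, DEFAULT_FIXITY)
        if prec < min_prec:
            break                          # operator belongs to an outer call
        # Right-assoc operators let the right operand reuse the same level.
        next_min = prec if assoc == "r" else prec + 1
        rhs, pos = parse_expr(tokens, pos + 1, next_min)
        lhs = (":op", op, lhs, rhs)
        if assoc == "n" and pos < len(tokens):
            if FIXITY.get(tokens[pos], DEFAULT_FIXITY)[0] == prec:
                raise SyntaxError("non-associative operator chained: " + op)
    return lhs, pos
```

With this table, `a - b - c` groups to the left, `a : b : c` groups to the right, and `a == b == c` is rejected.
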
- **2026-04-24** — Phase 1: layout algorithm (`lib/haskell/layout.sx`, ~260 lines)
implementing Haskell 98 §9.3 (numbered §10.3 in the Haskell 2010 Report). Two-pass design: a pre-pass augments the raw
token stream with explicit `layout-open` / `layout-indent` markers (suppressing
`<n>` when `{n}` already applies, per note 3), then an L pass consumes the
augmented stream against a stack of implicit/explicit layout contexts and
emits `vlbrace` / `vsemi` / `vrbrace` tokens; newlines are dropped. Supports
the initial module-level implicit open (skipped when the first token is
`module` or `{`), the four layout keywords (`let`/`where`/`do`/`of`), explicit
braces disabling layout, dedent closing nested implicit blocks while also
emitting `vsemi` at the enclosing level, and the pragmatic single-line
`let … in` rule (emit `}` when `in` meets an implicit let). 15 new tests
in `lib/haskell/tests/layout.sx` cover module-start, do/let/where/case/of,
explicit braces, multi-level dedent, line continuation, and EOF close-down.
Shared test helpers moved to `lib/haskell/testlib.sx` so both test files
can share one `hk-test`. `test.sh` preloads tokenizer + layout + testlib.
58/58 green.

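The two-pass layout design described above can be sketched compactly. The following is a Python illustration only, assuming a whitespace-split token stream; `tokenize`, `annotate`, and `layout` are hypothetical names, the marker tuples stand in for `layout-open` / `layout-indent`, and the parse-error rule of the Report is reduced to the single-line `let … in` case.

```python
LAYOUT_KEYWORDS = {"let", "where", "do", "of"}

def tokenize(src):
    """Whitespace tokenizer returning (text, line, col) with 1-based columns."""
    toks = []
    for line_no, line in enumerate(src.splitlines(), start=1):
        col = 0
        for word in line.split():
            col = line.index(word, col)
            toks.append((word, line_no, col + 1))
            col += len(word)
    return toks

def annotate(toks):
    """Pre-pass: add ("open", n, keyword) and ("indent", n) markers."""
    out, opener, prev_line = [], "module", 0
    for i, (text, line, col) in enumerate(toks):
        if opener and text != "{" and not (i == 0 and text == "module"):
            out.append(("open", col, opener))   # the Report's {n}
        elif line != prev_line:
            out.append(("indent", col))         # the Report's <n>
        out.append(text)
        opener = text if text in LAYOUT_KEYWORDS else None
        prev_line = line
    if opener and toks:
        out.append(("open", 0, opener))         # layout keyword at EOF
    return out

def layout(src):
    """L pass: emit vlbrace / vsemi / vrbrace against a context stack."""
    out, stack = [], []   # entries are (indent, opener); indent 0 = explicit

    def on_indent(n):
        while stack and stack[-1][0] > n:       # dedent closes implicit blocks
            stack.pop()
            out.append("vrbrace")
        if stack and stack[-1][0] == n:         # same column: new item
            out.append("vsemi")

    for item in annotate(tokenize(src)):
        if isinstance(item, tuple) and item[0] == "indent":
            on_indent(item[1])
        elif isinstance(item, tuple):
            _, n, opener = item
            out.append("vlbrace")
            if n > (stack[-1][0] if stack else 0):
                stack.append((n, opener))
            else:                               # empty block, then treat as <n>
                out.append("vrbrace")
                on_indent(n)
        elif item == "{":
            stack.append((0, "{"))
            out.append(item)
        elif item == "}":
            if not stack or stack[-1][0] != 0:
                raise SyntaxError("unmatched explicit }")
            stack.pop()
            out.append(item)
        else:
            if item == "in" and stack and stack[-1][0] > 0 and stack[-1][1] == "let":
                stack.pop()                     # single-line let … in rule
                out.append("vrbrace")
            out.append(item)
    out.extend("vrbrace" for (n, _) in stack if n > 0)  # EOF close-down
    return out
```

For example, `layout("x = let y = 1 in y")` closes the implicit `let` block just before `in`, and a dedent back to column 1 after a `do` block yields `vrbrace` followed by `vsemi`.
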
- **2026-04-24** — Phase 1: Haskell 98 tokenizer (`lib/haskell/tokenizer.sx`, 490 lines)
covering idents (lower/upper/qvarid/qconid), 23 reserved words, 11 reserved ops,
varsym/consym operator chains, integer/hex/octal/float literals incl. exponent
notation, char + string literals with escape sequences, nested `{- ... -}` block
comments with depth counter, `-- ... EOL` line comments (respecting the
"followed by symbol = not a comment" Haskell 98 rule), backticks, punctuation,
and explicit `newline` tokens for the upcoming layout pass. 43 structural tests
in `lib/haskell/tests/parse.sx`, a lightweight `hk-deep=?` equality helper
and a custom `lib/haskell/test.sh` runner (pipes through the OCaml epoch
protocol, falls back to the main-repo build when run from a worktree). 43/43
green.

Also peeked at `/root/rose-ash/sx-haskell/` per briefing: that directory is a
Haskell program implementing an **SX interpreter** (Types.hs, Eval.hs,
Primitives.hs, etc. — ~2800 lines of .hs) — the *opposite* direction from this
project. Nothing to fold in.

Gotchas hit: `emit!` and `peek` are SX evaluator special forms, so every local
helper uses the `hk-` prefix. `cond`/`when`/`let` clauses evaluate ONLY the
last expression; multi-expression bodies MUST be wrapped in `(do ...)`. These
two together account for all the tokenizer's early crashes.

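The two comment rules mentioned in the tokenizer entry (nested block comments tracked with a depth counter, and `--` only starting a comment when the dash run is not followed by a symbol character) might look roughly like this. It is a Python sketch rather than the SX tokenizer; `skip_block_comment`, `starts_line_comment`, and `SYMBOL_CHARS` are hypothetical names, and the second function assumes it is called at the start of a lexeme.

```python
# ASCII symbol characters that can form Haskell operators.
SYMBOL_CHARS = set("!#$%&*+./<=>?@\\^|-~:")

def skip_block_comment(src, pos):
    """Given src[pos:pos+2] == '{-', return the index just past the matching '-}'."""
    assert src.startswith("{-", pos)
    depth, i = 0, pos
    while i < len(src):
        if src.startswith("{-", i):
            depth += 1            # nested opener
            i += 2
        elif src.startswith("-}", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i          # outermost comment closed
        else:
            i += 1
    raise SyntaxError("unterminated block comment")

def starts_line_comment(src, pos):
    """True if the dash run at pos begins a line comment rather than an operator."""
    if not src.startswith("--", pos):
        return False
    i = pos
    while i < len(src) and src[i] == "-":
        i += 1                    # consume the whole run of dashes
    return i == len(src) or src[i] not in SYMBOL_CHARS
```

Under this sketch `-- hi` and `--- hi` are comments, while `-->` and `--|` are left for the operator lexer.
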
## Blockers

- _(none yet)_