From 3fb0212414904d2e5786564a0ade848eed3f1706 Mon Sep 17 00:00:00 2001
From: giles
Date: Sun, 26 Apr 2026 17:17:14 +0000
Subject: [PATCH] =?UTF-8?q?plan:=20Phase=206=20ADT=20design=20doc=20?=
 =?UTF-8?q?=E2=80=94=20define-type/match=20syntax,=20CEK=20dispatch,=20exh?=
 =?UTF-8?q?austiveness?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Sonnet 4.6
---
 plans/agent-briefings/primitives-loop.md |   6 +-
 plans/designs/sx-adt.md                  | 257 +++++++++++++++++++++++
 2 files changed, 262 insertions(+), 1 deletion(-)
 create mode 100644 plans/designs/sx-adt.md

diff --git a/plans/agent-briefings/primitives-loop.md b/plans/agent-briefings/primitives-loop.md
index 1387e641..fab5ce8b 100644
--- a/plans/agent-briefings/primitives-loop.md
+++ b/plans/agent-briefings/primitives-loop.md
@@ -171,9 +171,12 @@ Fix O(n²) string concatenation in loops across Lua, Ruby, Common Lisp, Tcl.
 The deepest structural gap. Every language uses `{:tag "..." :field ...}` tagged dicts to
 simulate sum types. A native `define-type` + `match` form eliminates this everywhere.
 
-- [ ] Design: write `plans/designs/sx-adt.md` covering syntax, CEK dispatch, interaction with
+- [x] Design: write `plans/designs/sx-adt.md` covering syntax, CEK dispatch, interaction with
   existing `cond`/`case`, exhaustiveness checking, recursive types, pattern variables.
   Draft, then stop — next fire reviews design before implementing.
+  Written: define-type/match syntax, AdtValue runtime rep, stepSfDefineType + MatchFrame
+  CEK dispatch, exhaustiveness warnings via _adt_registry, recursive types, nested patterns,
+  wildcard _, 3-phase impl plan (basic/nested/exhaustiveness), open questions on accessors/singletons/inspect.
 - [ ] Spec: implement `define-type` special form in `spec/evaluator.sx`:
   `(define-type Name (Ctor1 field...) (Ctor2 field...) ...)`
@@ -683,6 +686,7 @@ Brief each language's loop agent (or do inline) after rebasing their branch onto
 
 _Newest first._
 
+- 2026-04-26: Phase 6 Design done — plans/designs/sx-adt.md written. Covers define-type/match syntax, AdtValue CEK runtime, stepSfDefineType+MatchFrame dispatch, exhaustiveness warnings, recursive types, nested patterns, wildcard _. 3-phase impl plan. Next fire: Spec implement define-type.
 - 2026-04-26: Phase 5 complete — string buffer fully landed (d98b5fa2). 17 tests, 17/17 OCaml+JS. Phase 6 (ADTs) next.
 - 2026-04-26: Phase 5 Spec+OCaml+JS step done — StringBuffer of Buffer.t in sx_types.ml; make-string-buffer/append!/->string/length/string-buffer? in sx_primitives.ml; SxStringBuffer with _string_buffer marker + typeOf/dict? fixes in platform.py; JS rebuilt. 17/17 tests OCaml+JS.
 - 2026-04-26: Phase 4 complete — coroutine primitive fully landed (4 commits: spec library + OCaml verified + JS pre-load + 27 tests). Phase 5 (string buffer) next.
diff --git a/plans/designs/sx-adt.md b/plans/designs/sx-adt.md
new file mode 100644
index 00000000..8526e767
--- /dev/null
+++ b/plans/designs/sx-adt.md
@@ -0,0 +1,257 @@
+# SX Algebraic Data Types — Design
+
+## Motivation
+
+Every language implementation currently uses `{:tag "..." :field ...}` tagged dicts to
+simulate sum types. This is verbose, error-prone (typos in tag strings go undetected), and
+produces no exhaustiveness warnings. Native ADTs eliminate the pattern everywhere.
+
+Examples of current workarounds:
+- Haskell `Maybe a` → `{:tag "Just" :value x}` / `{:tag "Nothing"}`
+- Prolog terms → `{:tag "functor" :name "foo" :args (list x y)}`
+- Lua result type → `{:tag "ok" :value v}` / `{:tag "err" :msg s}`
+- Common Lisp `cons` pairs → `{:tag "cons" :car a :cdr b}`
+
+---
+
+## Syntax
+
+### `define-type`
+
+```lisp
+(define-type Name
+  (Ctor1 field1 field2 ...)
+  (Ctor2 field1 ...)
+  ...)
+```
+
+Creates:
+- Constructor functions: `Ctor1`, `Ctor2`, … (callable like normal functions)
+- Type predicate: `Name?` — returns true for any value of type `Name`
+- Constructor predicates: `Ctor1?`, `Ctor2?`, … (optional, auto-generated)
+- Field accessors: `Ctor1-field1`, `Ctor1-field2`, … (optional, auto-generated)
+
+Examples:
+
+```lisp
+(define-type Maybe
+  (Just value)
+  (Nothing))
+
+(define-type Result
+  (Ok value)
+  (Err message))
+
+(define-type Tree
+  (Leaf)
+  (Node left value right))
+
+(define-type List-of
+  (Nil-of)
+  (Cons-of head tail))
+```
+
+Constructors with no fields are zero-argument constructors (singletons by value):
+
+```lisp
+(Nothing) ; => #
+(Leaf) ; => #
+```
+
+### `match`
+
+```lisp
+(match expr
+  ((Ctor1 a b) body)
+  ((Ctor2 x) body)
+  ((Ctor3) body)
+  (else body))
+```
+
+- Clauses are tried in order; first match wins.
+- `else` clause is optional but suppresses exhaustiveness warnings.
+- Pattern variables (`a`, `b`, `x`) are bound in the body scope.
+- Wildcard `_` discards the matched value.
+- Literal patterns: `42`, `"str"`, `true`, `nil` — match by value equality.
+- Nested patterns: `((Node left (Leaf) right) body)` — nested constructor patterns.
+
+Examples:
+
+```lisp
+(match result
+  ((Ok v) (str "got: " v))
+  ((Err m) (str "error: " m)))
+
+(match tree
+  ((Leaf) 0)
+  ((Node l v r) (+ 1 (max (tree-depth l) (tree-depth r)))))
+```
+
+---
+
+## CEK Dispatch
+
+### Runtime representation
+
+ADT values are OCaml records (not dicts) — opaque, non-inspectable via `get`:
+
+```ocaml
+type adt_value = {
+  av_type : string; (* type name, e.g. "Maybe" *)
+  av_ctor : string; (* constructor name, e.g. "Just" *)
+  av_fields: value array; (* positional fields *)
+}
+```
+
+In JS: `{ _adt: true, _type: "Maybe", _ctor: "Just", _fields: [v] }`.
+
+`typeOf` returns the ADT type name (e.g. `"Maybe"`).
+
+### `define-type` — special form
+
+`stepSfDefineType(args, env, kont)`:
+
+1. Parse `Name` and list of `(CtorN field...)` clauses.
+2. For each constructor `CtorK` with fields `[f1, f2, …]`:
+   - Register `CtorK` as a `NativeFn` that takes `|fields|` args and returns an `AdtValue`.
+   - Register `CtorK?` as a predicate (`AdtValue` with matching ctor name → `true`).
+   - Register `CtorK-fN` as field accessor (returns `av_fields[N-1]`, since fields are numbered from 1).
+3. Register `Name?` as a predicate (`AdtValue` with matching type name → `true`).
+4. All bindings go into the current environment via `env-bind!`.
+5. Returns `Nil`.
+
+This is an environment mutation — no new frame needed. Evaluates in one step.
+
+### `match` — special form
+
+`stepSfMatch(args, env, kont)`:
+
+1. Push `MatchFrame` with `clauses` and `env` onto kont.
+2. Return state evaluating the scrutinee `expr`.
+3. `MatchFrame` continue: receive scrutinee value, walk clauses:
+   - For each `((CtorN vars...) body)`:
+     - If scrutinee is an `AdtValue` with `av_ctor = "CtorN"` and `av_fields.length = |vars|`:
+       - Bind `vars[i]` → `av_fields[i]` in fresh child env.
+       - Return state evaluating `body` in that env.
+   - `(else body)` — always matches, body evaluated in current env.
+   - Literal `42`/`"str"` patterns: match by value equality.
+   - Wildcard `_`: always matches, binds nothing.
+4. If no clause matched and no `else`: raise `"match: no clause matched "`.
+
+Frame type: `"match"` — stores `cf_remaining` (clauses), `cf_env` (enclosing env).
+
+---
+
+## Interaction with `cond` / `case`
+
+`match` is the primary dispatch form for ADTs. `cond` / `case` remain unchanged:
+
+- `cond` tests arbitrary boolean expressions — still useful for non-ADT dispatch.
+- `case` matches on equality to literal values — unchanged.
+- `match` is the new form: structural pattern matching on ADT constructors.
+
+They are orthogonal. A `match` clause can contain a `cond`; a `cond` clause can contain a `match`.
+
+---
+
+## Exhaustiveness checking
+
+Emit a **warning** (not an error) when:
+- A `match` has no `else` clause, AND
+- Not all constructors of the scrutinee's type are covered.
+
+Detection: when `define-type` runs, it registers the constructor set in a global table
+`_adt_registry: type_name → [ctor_names]`. At `match` compile/evaluation time:
+- If the scrutinee's type is in `_adt_registry` and not all ctors appear as patterns:
+  - `console.warn("[sx] match: non-exhaustive — missing: Ctor3, Ctor4 for type Maybe")`
+  - Execution continues (warning, not error).
+
+This is best-effort: the scrutinee type is only known at runtime. The warning fires on
+first non-exhaustive match evaluation, not at definition time.
+
+---
+
+## Recursive types
+
+Recursive types work because constructors are registered as functions, and function bodies
+are evaluated lazily:
+
+```lisp
+(define-type Tree
+  (Leaf)
+  (Node left value right))
+
+; Recursive function over a recursive type:
+(define (depth tree)
+  (match tree
+    ((Leaf) 0)
+    ((Node l v r) (+ 1 (max (depth l) (depth r))))))
+```
+
+No special treatment needed — the type definition doesn't need to know about recursion.
+The constructor `Node` accepts any values, including other `Node` or `Leaf` values.
+
+---
+
+## Pattern variables
+
+In `match` clauses, identifiers in the field positions of a constructor pattern that are NOT
+constructor names are treated as pattern variables (bound to matched field values):
+
+```lisp
+(match x
+  ((Just v) v) ; v bound to the wrapped value
+  ((Nothing) nil))
+
+(match pair
+  ((Cons-of h t) (list h t))) ; h, t bound to head and tail
+```
+
+**Wildcard**: `_` is always a wildcard — matches anything, binds nothing.
+
+```lisp
+(match x
+  ((Just _) "has value")
+  ((Nothing) "empty"))
+```
+
+**Nested patterns**:
+
+```lisp
+(match tree
+  ((Node (Leaf) v (Leaf)) (str "leaf node: " v))
+  ((Node l v r) (str "inner node: " v)))
+```
+
+Nested patterns are matched recursively: the inner `(Leaf)` pattern checks that the
+`left` field is itself a `Leaf` ADT value.
+
+---
+
+## Implementation Plan
+
+### Phase 6a — `define-type` + basic `match` (no nested patterns, no exhaustiveness)
+
+1. OCaml: add `AdtValue of adt_value` to `sx_types.ml`.
+2. Evaluator: add `step-sf-define-type` — parse clauses, register ctor fns + predicates + accessors.
+3. Evaluator: add `step-sf-match` + `MatchFrame` — linear scan of clauses, flat patterns only.
+4. JS: same (AdtValue as plain object with `_adt`/`_type`/`_ctor`/`_fields` props).
+
+### Phase 6b — nested patterns (separate fire)
+
+Recursive `matchPattern(pattern, value, env)` helper that:
+- Returns `{matched: bool, bindings: map}`
+- Recursively matches sub-patterns against ADT fields.
+
+### Phase 6c — exhaustiveness warnings (separate fire)
+
+`_adt_registry` global + warning emission on first non-exhaustive match.
+
+---
+
+## Open questions (deferred to review)
+
+1. **Accessor auto-generation**: should `Ctor-field` accessors be generated always, or only on demand? Risk: name collisions if two types have constructors with same field names.
+2. **Singleton constructors**: `(Nothing)` — zero-arg ctor — should these be interned (same object every call) or fresh each time? Interning enables `eq?` checks but requires a global table.
+3. **Printing/inspect**: `inspect` on an AdtValue should show `(Just 42)` not `#`. Implement in `inspect` function or via `display`/`write` (Phase 17 ports).
+4. **Pattern-matching on non-ADT values**: should `match` handle list patterns `(a . b)` and literal patterns in clause heads? Deferred — add only if needed by a language implementation.
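
Review note: the recursive matcher planned for Phase 6b and the registry lookup planned for Phase 6c could be sketched in JavaScript roughly as follows. This is a minimal sketch under stated assumptions, not the implementation. The pattern encoding (a symbol as `{sym: "name"}`, a constructor pattern as an array whose head is the constructor symbol) and the helper names `defineType`, `missingCtors`, `adtRegistry`, and `S` are hypothetical. Only the `_adt`/`_type`/`_ctor`/`_fields` object shape and the `{matched, bindings}` return value come from the design doc. The sketch also threads a bindings map where the doc's signature takes `env`.

```javascript
// Hypothetical pattern encoding (an assumption, not from the design doc):
//   symbol              -> { sym: "name" }
//   constructor pattern -> [ctorSymbol, ...subPatterns]
//   literal             -> plain JS number / string / boolean / null
const adtRegistry = new Map(); // type name -> array of constructor names

// Roughly what define-type would register for one type: one constructor
// function per clause, each producing the JS AdtValue shape from the doc.
function defineType(typeName, ctors) {
  adtRegistry.set(typeName, Object.keys(ctors));
  const api = {};
  for (const [ctor, fields] of Object.entries(ctors)) {
    api[ctor] = (...args) => {
      if (args.length !== fields.length) {
        throw new Error(`${ctor}: expected ${fields.length} args, got ${args.length}`);
      }
      return { _adt: true, _type: typeName, _ctor: ctor, _fields: args };
    };
  }
  return api;
}

const isAdt = (v) => v !== null && typeof v === "object" && v._adt === true;
const isSym = (p) =>
  p !== null && typeof p === "object" && !Array.isArray(p) && typeof p.sym === "string";

// Recursive matcher. Returns { matched, bindings }; a failed match returns an
// empty map so partially collected bindings never leak to the caller.
function matchPattern(pattern, value, bindings = new Map()) {
  const fail = { matched: false, bindings: new Map() };
  if (isSym(pattern)) {
    // "_" is the wildcard and binds nothing; any other symbol is a variable.
    if (pattern.sym !== "_") bindings.set(pattern.sym, value);
    return { matched: true, bindings };
  }
  if (Array.isArray(pattern)) {
    const [head, ...subs] = pattern;
    if (!isAdt(value) || value._ctor !== head.sym || value._fields.length !== subs.length) {
      return fail;
    }
    for (let i = 0; i < subs.length; i++) {
      if (!matchPattern(subs[i], value._fields[i], bindings).matched) return fail;
    }
    return { matched: true, bindings };
  }
  // Literal pattern: value equality on primitives.
  return pattern === value ? { matched: true, bindings } : fail;
}

// Constructors of typeName that no top-level clause pattern mentions.
function missingCtors(typeName, clausePatterns) {
  const covered = new Set(clausePatterns.filter(Array.isArray).map((p) => p[0].sym));
  return (adtRegistry.get(typeName) || []).filter((c) => !covered.has(c));
}

// Usage: the nested (Node (Leaf) v (Leaf)) pattern from the doc.
const S = (name) => ({ sym: name });
const { Leaf, Node } = defineType("Tree", { Leaf: [], Node: ["left", "value", "right"] });
const tree = Node(Leaf(), 7, Leaf());
const result = matchPattern([S("Node"), [S("Leaf")], S("v"), [S("Leaf")]], tree);
console.log(result.matched, result.bindings.get("v"));
console.log(missingCtors("Tree", [[S("Leaf")]]));
```

Returning a fresh map on failure means a clause that fails halfway through cannot pollute the bindings seen by the next clause, which keeps the linear clause scan in `MatchFrame` simple. Whether the real evaluator should build a child env directly, as the CEK section describes for flat patterns, is left open here.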