plan: Phase 6 ADT design doc — define-type/match syntax, CEK dispatch, exhaustiveness

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-04-26 17:17:14 +00:00
parent 518ad37def
commit 3fb0212414
2 changed files with 262 additions and 1 deletions

View File

@@ -171,9 +171,12 @@ Fix O(n²) string concatenation in loops across Lua, Ruby, Common Lisp, Tcl.
The deepest structural gap. Every language uses `{:tag "..." :field ...}` tagged dicts to
simulate sum types. A native `define-type` + `match` form eliminates this everywhere.
- [ ] Design: write `plans/designs/sx-adt.md` covering syntax, CEK dispatch, interaction with
- [x] Design: write `plans/designs/sx-adt.md` covering syntax, CEK dispatch, interaction with
existing `cond`/`case`, exhaustiveness checking, recursive types, pattern variables.
Draft, then stop — next fire reviews design before implementing.
Written: define-type/match syntax, AdtValue runtime rep, stepSfDefineType + MatchFrame
CEK dispatch, exhaustiveness warnings via _adt_registry, recursive types, nested patterns,
wildcard _, 3-phase impl plan (basic/nested/exhaustiveness), open questions on accessors/singletons/inspect.
- [ ] Spec: implement `define-type` special form in `spec/evaluator.sx`:
`(define-type Name (Ctor1 field...) (Ctor2 field...) ...)`
@@ -683,6 +686,7 @@ Brief each language's loop agent (or do inline) after rebasing their branch onto
_Newest first._
- 2026-04-26: Phase 6 Design done — plans/designs/sx-adt.md written. Covers define-type/match syntax, AdtValue CEK runtime, stepSfDefineType+MatchFrame dispatch, exhaustiveness warnings, recursive types, nested patterns, wildcard _. 3-phase impl plan. Next fire: Spec implement define-type.
- 2026-04-26: Phase 5 complete — string buffer fully landed (d98b5fa2). 17 tests, 17/17 OCaml+JS. Phase 6 (ADTs) next.
- 2026-04-26: Phase 5 Spec+OCaml+JS step done — StringBuffer of Buffer.t in sx_types.ml; make-string-buffer/append!/->string/length/string-buffer? in sx_primitives.ml; SxStringBuffer with _string_buffer marker + typeOf/dict? fixes in platform.py; JS rebuilt. 17/17 tests OCaml+JS.
- 2026-04-26: Phase 4 complete — coroutine primitive fully landed (4 commits: spec library + OCaml verified + JS pre-load + 27 tests). Phase 5 (string buffer) next.

257
plans/designs/sx-adt.md Normal file
View File

@@ -0,0 +1,257 @@
# SX Algebraic Data Types — Design
## Motivation
Every language implementation currently uses `{:tag "..." :field ...}` tagged dicts to
simulate sum types. This is verbose, error-prone (typos in tag strings go undetected), and
produces no exhaustiveness warnings. Native ADTs eliminate the pattern everywhere.
Examples of current workarounds:
- Haskell `Maybe a``{:tag "Just" :value x}` / `{:tag "Nothing"}`
- Prolog terms → `{:tag "functor" :name "foo" :args (list x y)}`
- Lua result type → `{:tag "ok" :value v}` / `{:tag "err" :msg s}`
- Common Lisp `cons` pairs → `{:tag "cons" :car a :cdr b}`
---
## Syntax
### `define-type`
```lisp
(define-type Name
(Ctor1 field1 field2 ...)
(Ctor2 field1 ...)
...)
```
Creates:
- Constructor functions: `Ctor1`, `Ctor2`, … (callable like normal functions)
- Type predicate: `Name?` — returns true for any value of type `Name`
- Constructor predicates: `Ctor1?`, `Ctor2?`, … (optional, auto-generated)
- Field accessors: `Ctor1-field1`, `Ctor1-field2`, … (optional, auto-generated)
Examples:
```lisp
(define-type Maybe
(Just value)
(Nothing))
(define-type Result
(Ok value)
(Err message))
(define-type Tree
(Leaf)
(Node left value right))
(define-type List-of
(Nil-of)
(Cons-of head tail))
```
Constructors with no fields are zero-argument constructors (singletons by value):
```lisp
(Nothing) ; => #<Nothing>
(Leaf) ; => #<Leaf>
```
### `match`
```lisp
(match expr
((Ctor1 a b) body)
((Ctor2 x) body)
((Ctor3) body)
(else body))
```
- Clauses are tried in order; first match wins.
- `else` clause is optional but suppresses exhaustiveness warnings.
- Pattern variables (`a`, `b`, `x`) are bound in the body scope.
- Wildcard `_` discards the matched value.
- Literal patterns: `42`, `"str"`, `true`, `nil` — match by value equality.
- Nested patterns: `((Node left (Leaf) right) body)` — nested constructor patterns.
Examples:
```lisp
(match result
((Ok v) (str "got: " v))
((Err m) (str "error: " m)))
(match tree
((Leaf) 0)
((Node l v r) (+ 1 (tree-depth l) (tree-depth r))))
```
---
## CEK Dispatch
### Runtime representation
ADT values are OCaml records (not dicts) — opaque, non-inspectable via `get`:
```ocaml
type adt_value = {
av_type : string; (* type name, e.g. "Maybe" *)
av_ctor : string; (* constructor name, e.g. "Just" *)
av_fields: value array; (* positional fields *)
}
```
In JS: `{ _adt: true, _type: "Maybe", _ctor: "Just", _fields: [v] }`.
`typeOf` returns the ADT type name (e.g. `"Maybe"`).
### `define-type` — special form
`stepSfDefineType(args, env, kont)`:
1. Parse `Name` and list of `(CtorN field...)` clauses.
2. For each constructor `CtorK` with fields `[f1, f2, …]`:
- Register `CtorK` as a `NativeFn` that takes `|fields|` args and returns an `AdtValue`.
- Register `CtorK?` as a predicate (`AdtValue` with matching ctor name → `true`).
- Register `CtorK-fN` as field accessor (returns `av_fields[N]`).
3. Register `Name?` as a predicate (`AdtValue` with matching type name → `true`).
4. All bindings go into the current environment via `env-bind!`.
5. Returns `Nil`.
This is an environment mutation — no new frame needed. Evaluates in one step.
### `match` — special form
`stepSfMatch(args, env, kont)`:
1. Push `MatchFrame` with `clauses` and `env` onto kont.
2. Return state evaluating the scrutinee `expr`.
3. `MatchFrame` continue: receive scrutinee value, walk clauses:
- For each `((CtorN vars...) body)`:
- If scrutinee is an `AdtValue` with `av_ctor = "CtorN"` and `av_fields.length = |vars|`:
- Bind `vars[i]``av_fields[i]` in fresh child env.
- Return state evaluating `body` in that env.
- `(else body)` — always matches, body evaluated in current env.
- Literal `42`/`"str"` patterns: match by value equality.
- Wildcard `_`: always matches, binds nothing.
4. If no clause matched and no `else`: raise `"match: no clause matched <value>"`.
Frame type: `"match"` — stores `cf_remaining` (clauses), `cf_env` (enclosing env).
---
## Interaction with `cond` / `case`
`match` is the primary dispatch form for ADTs. `cond` / `case` remain unchanged:
- `cond` tests arbitrary boolean expressions — still useful for non-ADT dispatch.
- `case` matches on equality to literal values — unchanged.
- `match` is the new form: structural pattern matching on ADT constructors.
They are orthogonal. A `match` clause can contain a `cond`; a `cond` clause can contain a `match`.
---
## Exhaustiveness checking
Emit a **warning** (not an error) when:
- A `match` has no `else` clause, AND
- Not all constructors of the scrutinee's type are covered.
Detection: when `define-type` runs, it registers the constructor set in a global table
`_adt_registry: type_name → [ctor_names]`. At `match` compile/evaluation time:
- If the scrutinee's type is in `_adt_registry` and not all ctors appear as patterns:
- `console.warn("[sx] match: non-exhaustive — missing: Ctor3, Ctor4 for type Maybe")`
- Execution continues (warning, not error).
This is best-effort: the scrutinee type is only known at runtime. The warning fires on
first non-exhaustive match evaluation, not at definition time.
---
## Recursive types
Recursive types work because constructors are registered as functions, and function bodies
are evaluated lazily:
```lisp
(define-type Tree
(Leaf)
(Node left value right))
; Recursive function over a recursive type:
(define (depth tree)
(match tree
((Leaf) 0)
((Node l v r) (+ 1 (max (depth l) (depth r))))))
```
No special treatment needed — the type definition doesn't need to know about recursion.
The constructor `Node` accepts any values, including other `Node` or `Leaf` values.
---
## Pattern variables
In `match` clauses, identifiers in constructor position that are NOT constructor names are
treated as pattern variables (bound to matched field values):
```lisp
(match x
((Just v) v) ; v bound to the wrapped value
((Nothing) nil))
(match pair
((Cons-of h t) (list h t))) ; h, t bound to head and tail
```
**Wildcard**: `_` is always a wildcard — matches anything, binds nothing.
```lisp
(match x
((Just _) "has value")
((Nothing) "empty"))
```
**Nested patterns**:
```lisp
(match tree
((Node (Leaf) v (Leaf)) (str "leaf node: " v))
((Node l v r) (str "inner node: " v)))
```
Nested patterns are matched recursively: the inner `(Leaf)` pattern checks that the
`left` field is itself a `Leaf` ADT value.
---
## Implementation Plan
### Phase 6a — `define-type` + basic `match` (no nested patterns, no exhaustiveness)
1. OCaml: add `AdtValue of adt_value` to `sx_types.ml`.
2. Evaluator: add `step-sf-define-type` — parse clauses, register ctor fns + predicates + accessors.
3. Evaluator: add `step-sf-match` + `MatchFrame` — linear scan of clauses, flat patterns only.
4. JS: same (AdtValue as plain object with `_adt`/`_type`/`_ctor`/`_fields` props).
### Phase 6b — nested patterns (separate fire)
Recursive `matchPattern(pattern, value, env)` helper that:
- Returns `{matched: bool, bindings: map}`
- Recursively matches sub-patterns against ADT fields.
### Phase 6c — exhaustiveness warnings (separate fire)
`_adt_registry` global + warning emission on first non-exhaustive match.
---
## Open questions (deferred to review)
1. **Accessor auto-generation**: should `Ctor-field` accessors be generated always, or only on demand? Risk: name collisions if two types have constructors with same field names.
2. **Singleton constructors**: `(Nothing)` — zero-arg ctor — should these be interned (same object every call) or fresh each time? Interning enables `eq?` checks but requires a global table.
3. **Printing/inspect**: `inspect` on an AdtValue should show `(Just 42)` not `#<adt:Just>`. Implement in `inspect` function or via `display`/`write` (Phase 17 ports).
4. **Pattern-matching on non-ADT values**: should `match` handle list patterns `(a . b)` and literal patterns in clause heads? Deferred — add only if needed by a language implementation.