plan: Phase 6 ADT design doc — define-type/match syntax, CEK dispatch, exhaustiveness
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -171,9 +171,12 @@ Fix O(n²) string concatenation in loops across Lua, Ruby, Common Lisp, Tcl.
|
||||
The deepest structural gap. Every language uses `{:tag "..." :field ...}` tagged dicts to
|
||||
simulate sum types. A native `define-type` + `match` form eliminates this everywhere.
|
||||
|
||||
- [ ] Design: write `plans/designs/sx-adt.md` covering syntax, CEK dispatch, interaction with
|
||||
- [x] Design: write `plans/designs/sx-adt.md` covering syntax, CEK dispatch, interaction with
|
||||
existing `cond`/`case`, exhaustiveness checking, recursive types, pattern variables.
|
||||
Draft, then stop — next fire reviews design before implementing.
|
||||
Written: define-type/match syntax, AdtValue runtime rep, stepSfDefineType + MatchFrame
|
||||
CEK dispatch, exhaustiveness warnings via _adt_registry, recursive types, nested patterns,
|
||||
wildcard _, 3-phase impl plan (basic/nested/exhaustiveness), open questions on accessors/singletons/inspect.
|
||||
|
||||
- [ ] Spec: implement `define-type` special form in `spec/evaluator.sx`:
|
||||
`(define-type Name (Ctor1 field...) (Ctor2 field...) ...)`
|
||||
@@ -683,6 +686,7 @@ Brief each language's loop agent (or do inline) after rebasing their branch onto
|
||||
|
||||
_Newest first._
|
||||
|
||||
- 2026-04-26: Phase 6 Design done — plans/designs/sx-adt.md written. Covers define-type/match syntax, AdtValue CEK runtime, stepSfDefineType+MatchFrame dispatch, exhaustiveness warnings, recursive types, nested patterns, wildcard _. 3-phase impl plan. Next fire: Spec implement define-type.
|
||||
- 2026-04-26: Phase 5 complete — string buffer fully landed (d98b5fa2). 17 tests, 17/17 OCaml+JS. Phase 6 (ADTs) next.
|
||||
- 2026-04-26: Phase 5 Spec+OCaml+JS step done — StringBuffer of Buffer.t in sx_types.ml; make-string-buffer/append!/->string/length/string-buffer? in sx_primitives.ml; SxStringBuffer with _string_buffer marker + typeOf/dict? fixes in platform.py; JS rebuilt. 17/17 tests OCaml+JS.
|
||||
- 2026-04-26: Phase 4 complete — coroutine primitive fully landed (4 commits: spec library + OCaml verified + JS pre-load + 27 tests). Phase 5 (string buffer) next.
|
||||
|
||||
257
plans/designs/sx-adt.md
Normal file
257
plans/designs/sx-adt.md
Normal file
@@ -0,0 +1,257 @@
|
||||
# SX Algebraic Data Types — Design
|
||||
|
||||
## Motivation
|
||||
|
||||
Every language implementation currently uses `{:tag "..." :field ...}` tagged dicts to
|
||||
simulate sum types. This is verbose, error-prone (typos in tag strings go undetected), and
|
||||
produces no exhaustiveness warnings. Native ADTs eliminate the pattern everywhere.
|
||||
|
||||
Examples of current workarounds:
|
||||
- Haskell `Maybe a` → `{:tag "Just" :value x}` / `{:tag "Nothing"}`
|
||||
- Prolog terms → `{:tag "functor" :name "foo" :args (list x y)}`
|
||||
- Lua result type → `{:tag "ok" :value v}` / `{:tag "err" :msg s}`
|
||||
- Common Lisp `cons` pairs → `{:tag "cons" :car a :cdr b}`
|
||||
|
||||
---
|
||||
|
||||
## Syntax
|
||||
|
||||
### `define-type`
|
||||
|
||||
```lisp
|
||||
(define-type Name
|
||||
(Ctor1 field1 field2 ...)
|
||||
(Ctor2 field1 ...)
|
||||
...)
|
||||
```
|
||||
|
||||
Creates:
|
||||
- Constructor functions: `Ctor1`, `Ctor2`, … (callable like normal functions)
|
||||
- Type predicate: `Name?` — returns true for any value of type `Name`
|
||||
- Constructor predicates: `Ctor1?`, `Ctor2?`, … (optional, auto-generated)
|
||||
- Field accessors: `Ctor1-field1`, `Ctor1-field2`, … (optional, auto-generated)
|
||||
|
||||
Examples:
|
||||
|
||||
```lisp
|
||||
(define-type Maybe
|
||||
(Just value)
|
||||
(Nothing))
|
||||
|
||||
(define-type Result
|
||||
(Ok value)
|
||||
(Err message))
|
||||
|
||||
(define-type Tree
|
||||
(Leaf)
|
||||
(Node left value right))
|
||||
|
||||
(define-type List-of
|
||||
(Nil-of)
|
||||
(Cons-of head tail))
|
||||
```
|
||||
|
||||
Constructors with no fields are zero-argument constructors (singletons by value):
|
||||
|
||||
```lisp
|
||||
(Nothing) ; => #<Nothing>
|
||||
(Leaf) ; => #<Leaf>
|
||||
```
|
||||
|
||||
### `match`
|
||||
|
||||
```lisp
|
||||
(match expr
|
||||
((Ctor1 a b) body)
|
||||
((Ctor2 x) body)
|
||||
((Ctor3) body)
|
||||
(else body))
|
||||
```
|
||||
|
||||
- Clauses are tried in order; first match wins.
|
||||
- `else` clause is optional but suppresses exhaustiveness warnings.
|
||||
- Pattern variables (`a`, `b`, `x`) are bound in the body scope.
|
||||
- Wildcard `_` discards the matched value.
|
||||
- Literal patterns: `42`, `"str"`, `true`, `nil` — match by value equality.
|
||||
- Nested patterns: `((Node left (Leaf) right) body)` — nested constructor patterns.
|
||||
|
||||
Examples:
|
||||
|
||||
```lisp
|
||||
(match result
|
||||
((Ok v) (str "got: " v))
|
||||
((Err m) (str "error: " m)))
|
||||
|
||||
(match tree
|
||||
((Leaf) 0)
|
||||
((Node l v r) (+ 1 (tree-depth l) (tree-depth r))))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CEK Dispatch
|
||||
|
||||
### Runtime representation
|
||||
|
||||
ADT values are OCaml records (not dicts) — opaque, non-inspectable via `get`:
|
||||
|
||||
```ocaml
|
||||
type adt_value = {
|
||||
av_type : string; (* type name, e.g. "Maybe" *)
|
||||
av_ctor : string; (* constructor name, e.g. "Just" *)
|
||||
av_fields: value array; (* positional fields *)
|
||||
}
|
||||
```
|
||||
|
||||
In JS: `{ _adt: true, _type: "Maybe", _ctor: "Just", _fields: [v] }`.
|
||||
|
||||
`typeOf` returns the ADT type name (e.g. `"Maybe"`).
|
||||
|
||||
### `define-type` — special form
|
||||
|
||||
`stepSfDefineType(args, env, kont)`:
|
||||
|
||||
1. Parse `Name` and list of `(CtorN field...)` clauses.
|
||||
2. For each constructor `CtorK` with fields `[f1, f2, …]`:
|
||||
- Register `CtorK` as a `NativeFn` that takes `|fields|` args and returns an `AdtValue`.
|
||||
- Register `CtorK?` as a predicate (`AdtValue` with matching ctor name → `true`).
|
||||
- Register `CtorK-fN` as field accessor (returns `av_fields[N]`).
|
||||
3. Register `Name?` as a predicate (`AdtValue` with matching type name → `true`).
|
||||
4. All bindings go into the current environment via `env-bind!`.
|
||||
5. Returns `Nil`.
|
||||
|
||||
This is an environment mutation — no new frame needed. Evaluates in one step.
|
||||
|
||||
### `match` — special form
|
||||
|
||||
`stepSfMatch(args, env, kont)`:
|
||||
|
||||
1. Push `MatchFrame` with `clauses` and `env` onto kont.
|
||||
2. Return state evaluating the scrutinee `expr`.
|
||||
3. `MatchFrame` continue: receive scrutinee value, walk clauses:
|
||||
- For each `((CtorN vars...) body)`:
|
||||
- If scrutinee is an `AdtValue` with `av_ctor = "CtorN"` and `av_fields.length = |vars|`:
|
||||
- Bind `vars[i]` → `av_fields[i]` in fresh child env.
|
||||
- Return state evaluating `body` in that env.
|
||||
- `(else body)` — always matches, body evaluated in current env.
|
||||
- Literal `42`/`"str"` patterns: match by value equality.
|
||||
- Wildcard `_`: always matches, binds nothing.
|
||||
4. If no clause matched and no `else`: raise `"match: no clause matched <value>"`.
|
||||
|
||||
Frame type: `"match"` — stores `cf_remaining` (clauses), `cf_env` (enclosing env).
|
||||
|
||||
---
|
||||
|
||||
## Interaction with `cond` / `case`
|
||||
|
||||
`match` is the primary dispatch form for ADTs. `cond` / `case` remain unchanged:
|
||||
|
||||
- `cond` tests arbitrary boolean expressions — still useful for non-ADT dispatch.
|
||||
- `case` matches on equality to literal values — unchanged.
|
||||
- `match` is the new form: structural pattern matching on ADT constructors.
|
||||
|
||||
They are orthogonal. A `match` clause can contain a `cond`; a `cond` clause can contain a `match`.
|
||||
|
||||
---
|
||||
|
||||
## Exhaustiveness checking
|
||||
|
||||
Emit a **warning** (not an error) when:
|
||||
- A `match` has no `else` clause, AND
|
||||
- Not all constructors of the scrutinee's type are covered.
|
||||
|
||||
Detection: when `define-type` runs, it registers the constructor set in a global table
|
||||
`_adt_registry: type_name → [ctor_names]`. At `match` compile/evaluation time:
|
||||
- If the scrutinee's type is in `_adt_registry` and not all ctors appear as patterns:
|
||||
- `console.warn("[sx] match: non-exhaustive — missing: Ctor3, Ctor4 for type Maybe")`
|
||||
- Execution continues (warning, not error).
|
||||
|
||||
This is best-effort: the scrutinee type is only known at runtime. The warning fires on
|
||||
first non-exhaustive match evaluation, not at definition time.
|
||||
|
||||
---
|
||||
|
||||
## Recursive types
|
||||
|
||||
Recursive types work because constructors are registered as functions, and function bodies
|
||||
are evaluated lazily:
|
||||
|
||||
```lisp
|
||||
(define-type Tree
|
||||
(Leaf)
|
||||
(Node left value right))
|
||||
|
||||
; Recursive function over a recursive type:
|
||||
(define (depth tree)
|
||||
(match tree
|
||||
((Leaf) 0)
|
||||
((Node l v r) (+ 1 (max (depth l) (depth r))))))
|
||||
```
|
||||
|
||||
No special treatment needed — the type definition doesn't need to know about recursion.
|
||||
The constructor `Node` accepts any values, including other `Node` or `Leaf` values.
|
||||
|
||||
---
|
||||
|
||||
## Pattern variables
|
||||
|
||||
In `match` clauses, identifiers in constructor position that are NOT constructor names are
|
||||
treated as pattern variables (bound to matched field values):
|
||||
|
||||
```lisp
|
||||
(match x
|
||||
((Just v) v) ; v bound to the wrapped value
|
||||
((Nothing) nil))
|
||||
|
||||
(match pair
|
||||
((Cons-of h t) (list h t))) ; h, t bound to head and tail
|
||||
```
|
||||
|
||||
**Wildcard**: `_` is always a wildcard — matches anything, binds nothing.
|
||||
|
||||
```lisp
|
||||
(match x
|
||||
((Just _) "has value")
|
||||
((Nothing) "empty"))
|
||||
```
|
||||
|
||||
**Nested patterns**:
|
||||
|
||||
```lisp
|
||||
(match tree
|
||||
((Node (Leaf) v (Leaf)) (str "leaf node: " v))
|
||||
((Node l v r) (str "inner node: " v)))
|
||||
```
|
||||
|
||||
Nested patterns are matched recursively: the inner `(Leaf)` pattern checks that the
|
||||
`left` field is itself a `Leaf` ADT value.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Phase 6a — `define-type` + basic `match` (no nested patterns, no exhaustiveness)
|
||||
|
||||
1. OCaml: add `AdtValue of adt_value` to `sx_types.ml`.
|
||||
2. Evaluator: add `step-sf-define-type` — parse clauses, register ctor fns + predicates + accessors.
|
||||
3. Evaluator: add `step-sf-match` + `MatchFrame` — linear scan of clauses, flat patterns only.
|
||||
4. JS: same (AdtValue as plain object with `_adt`/`_type`/`_ctor`/`_fields` props).
|
||||
|
||||
### Phase 6b — nested patterns (separate fire)
|
||||
|
||||
Recursive `matchPattern(pattern, value, env)` helper that:
|
||||
- Returns `{matched: bool, bindings: map}`
|
||||
- Recursively matches sub-patterns against ADT fields.
|
||||
|
||||
### Phase 6c — exhaustiveness warnings (separate fire)
|
||||
|
||||
`_adt_registry` global + warning emission on first non-exhaustive match.
|
||||
|
||||
---
|
||||
|
||||
## Open questions (deferred to review)
|
||||
|
||||
1. **Accessor auto-generation**: should `Ctor-field` accessors be generated always, or only on demand? Risk: name collisions if two types have constructors with same field names.
|
||||
2. **Singleton constructors**: `(Nothing)` — zero-arg ctor — should these be interned (same object every call) or fresh each time? Interning enables `eq?` checks but requires a global table.
|
||||
3. **Printing/inspect**: `inspect` on an AdtValue should show `(Just 42)` not `#<adt:Just>`. Implement in `inspect` function or via `display`/`write` (Phase 17 ports).
|
||||
4. **Pattern-matching on non-ADT values**: should `match` handle list patterns `(a . b)` and literal patterns in clause heads? Deferred — add only if needed by a language implementation.
|
||||
Reference in New Issue
Block a user