# SX Algebraic Data Types — Design

## Motivation

Every language implementation currently uses `{:tag "..." :field ...}` tagged dicts to
simulate sum types. This is verbose, error-prone (typos in tag strings go undetected), and
produces no exhaustiveness warnings. Native ADTs eliminate the pattern everywhere.

Examples of current workarounds:

- Haskell `Maybe a` → `{:tag "Just" :value x}` / `{:tag "Nothing"}`
- Prolog terms → `{:tag "functor" :name "foo" :args (list x y)}`
- Lua result type → `{:tag "ok" :value v}` / `{:tag "err" :msg s}`
- Common Lisp `cons` pairs → `{:tag "cons" :car a :cdr b}`

---

## Syntax

### `define-type`

```lisp
(define-type Name
  (Ctor1 field1 field2 ...)
  (Ctor2 field1 ...)
  ...)
```

Creates:

- Constructor functions: `Ctor1`, `Ctor2`, … (callable like normal functions)
- Type predicate: `Name?`, which returns true for any value of type `Name`
- Constructor predicates: `Ctor1?`, `Ctor2?`, … (optional, auto-generated)
- Field accessors: `Ctor1-field1`, `Ctor1-field2`, … (optional, auto-generated)
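
To make the naming scheme concrete, here is a small OCaml sketch that lists the bindings a `define-type` form would introduce, given the type name and its constructors. The `ctor_decl` record and `generated_names` function are hypothetical illustrations, not part of the SX runtime.

```ocaml
(* Hypothetical helper: the names `define-type` would bind, following the
   scheme above: Name?, then for each constructor Ctor, Ctor?, Ctor-field. *)
type ctor_decl = { ctor : string; fields : string list }

let generated_names (type_name : string) (ctors : ctor_decl list) : string list =
  let per_ctor c =
    c.ctor :: (c.ctor ^ "?") :: List.map (fun f -> c.ctor ^ "-" ^ f) c.fields
  in
  (type_name ^ "?") :: List.concat_map per_ctor ctors

let () =
  let maybe =
    [ { ctor = "Just"; fields = [ "value" ] }; { ctor = "Nothing"; fields = [] } ]
  in
  (* prints "Maybe? Just Just? Just-value Nothing Nothing?" *)
  generated_names "Maybe" maybe |> String.concat " " |> print_endline
```

Because every accessor name is built by plain string concatenation, this is also where the collision risk raised in Open question 1 originates.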

Examples:

```lisp
(define-type Maybe
  (Just value)
  (Nothing))

(define-type Result
  (Ok value)
  (Err message))

(define-type Tree
  (Leaf)
  (Node left value right))

(define-type List-of
  (Nil-of)
  (Cons-of head tail))
```

Constructors with no fields are zero-argument constructors; every call returns an equal value,
so they behave as singletons by value (whether they are also interned is Open question 2):

```lisp
(Nothing) ; => #<Nothing>
(Leaf)    ; => #<Leaf>
```

### `match`

```lisp
(match expr
  ((Ctor1 a b) body)
  ((Ctor2 x) body)
  ((Ctor3) body)
  (else body))
```

- Clauses are tried in order; the first match wins.
- The `else` clause is optional; when present, it suppresses exhaustiveness warnings.
- Pattern variables (`a`, `b`, `x`) are bound in the body scope.
- Wildcard `_` discards the matched value.
- Literal patterns: `42`, `"str"`, `true`, `nil` match by value equality.
- Nested patterns: `((Node (Leaf) v r) body)` places a constructor pattern inside another.

Examples:

```lisp
(match result
  ((Ok v) (str "got: " v))
  ((Err m) (str "error: " m)))

(match tree
  ((Leaf) 0)
  ((Node l v r) (+ 1 (max (tree-depth l) (tree-depth r)))))
```

---

## CEK Dispatch

### Runtime representation

ADT values are OCaml records rather than dicts, so they are opaque and cannot be inspected via `get`:

```ocaml
type adt_value = {
  av_type   : string;       (* type name, e.g. "Maybe" *)
  av_ctor   : string;       (* constructor name, e.g. "Just" *)
  av_fields : value array;  (* positional fields *)
}
```

In JS: `{ _adt: true, _type: "Maybe", _ctor: "Just", _fields: [v] }`.

`typeOf` returns the ADT type name (e.g. `"Maybe"`).
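
A minimal OCaml sketch of how the record could sit inside the evaluator's value type, and how `typeOf` would read it. The variants other than `AdtValue`, and the strings returned for them, are placeholders: the real `sx_types.ml` is not reproduced in this document. The record and the variant are declared together because fields hold arbitrary values.

```ocaml
(* Hypothetical, cut-down value type; the real one has many more variants. *)
type value =
  | Nil
  | Int of int
  | Str of string
  | AdtValue of adt_value

and adt_value = {
  av_type : string;         (* type name, e.g. "Maybe" *)
  av_ctor : string;         (* constructor name, e.g. "Just" *)
  av_fields : value array;  (* positional fields *)
}

(* typeOf: ADT values report their declared type name. The names for the
   built-in variants below are placeholders. *)
let type_of (v : value) : string =
  match v with
  | Nil -> "nil"
  | Int _ -> "number"
  | Str _ -> "string"
  | AdtValue a -> a.av_type

let () =
  let just42 = AdtValue { av_type = "Maybe"; av_ctor = "Just"; av_fields = [| Int 42 |] } in
  print_endline (type_of just42) (* prints "Maybe" *)
```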

### `define-type` (special form)

`stepSfDefineType(args, env, kont)`:

1. Parse `Name` and the list of `(CtorN field...)` clauses.
2. For each constructor `CtorK` with fields `[f1, f2, …]`:
   - Register `CtorK` as a `NativeFn` that takes `|fields|` args and returns an `AdtValue`.
   - Register `CtorK?` as a predicate (`AdtValue` with matching ctor name → `true`).
   - Register `CtorK-fN` as the accessor for field `fN` (returns `av_fields[N-1]`, since fields are stored 0-indexed).
3. Register `Name?` as a predicate (`AdtValue` with matching type name → `true`).
4. All bindings go into the current environment via `env-bind!`.
5. Return `Nil`.

This is an environment mutation, so no new frame is needed: the form evaluates in one step.
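
The steps above can be sketched in OCaml as follows. This is a hypothetical model, not the evaluator's code: the environment is a plain `Hashtbl` standing in for `env-bind!`, `NativeFn` is a bare OCaml closure over a list of arguments, and `define_type` and `call` are illustrative names.

```ocaml
(* Hypothetical cut-down value type for the sketch. *)
type value =
  | Nil
  | Bool of bool
  | Int of int
  | AdtValue of adt_value
  | NativeFn of (value list -> value)

and adt_value = { av_type : string; av_ctor : string; av_fields : value array }

(* Registers constructors, predicates and accessors, then returns Nil. *)
let define_type (env : (string, value) Hashtbl.t) (type_name : string)
    (ctors : (string * string list) list) : value =
  List.iter
    (fun (ctor, fields) ->
      let arity = List.length fields in
      (* Constructor: checks arity, stores args positionally. *)
      Hashtbl.replace env ctor
        (NativeFn
           (fun args ->
             if List.length args <> arity then
               failwith (Printf.sprintf "%s: expected %d args" ctor arity);
             AdtValue { av_type = type_name; av_ctor = ctor; av_fields = Array.of_list args }));
      (* Constructor predicate: Ctor? *)
      Hashtbl.replace env (ctor ^ "?")
        (NativeFn (function [ AdtValue a ] -> Bool (a.av_ctor = ctor) | _ -> Bool false));
      (* Field accessors: the i-th declared field lives at av_fields.(i). *)
      List.iteri
        (fun i f ->
          Hashtbl.replace env (ctor ^ "-" ^ f)
            (NativeFn
               (function
                 | [ AdtValue a ] when a.av_ctor = ctor -> a.av_fields.(i)
                 | _ -> failwith (ctor ^ "-" ^ f ^ ": not a " ^ ctor))))
        fields)
    ctors;
  (* Type predicate: Name? *)
  Hashtbl.replace env (type_name ^ "?")
    (NativeFn (function [ AdtValue a ] -> Bool (a.av_type = type_name) | _ -> Bool false));
  Nil

(* Looks up a registered NativeFn and applies it. *)
let call env name args =
  match Hashtbl.find env name with
  | NativeFn f -> f args
  | _ -> failwith (name ^ " is not a function")

let () =
  let env = Hashtbl.create 16 in
  ignore (define_type env "Maybe" [ ("Just", [ "value" ]); ("Nothing", []) ]);
  let j = call env "Just" [ Int 42 ] in
  assert (call env "Maybe?" [ j ] = Bool true);
  assert (call env "Just-value" [ j ] = Int 42)
```

Since everything happens through table updates, no continuation is pushed, which matches the one-step evaluation described above.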

### `match` (special form)

`stepSfMatch(args, env, kont)`:

1. Push a `MatchFrame` holding the `clauses` and `env` onto `kont`.
2. Return a state that evaluates the scrutinee `expr`.
3. When the `MatchFrame` continues, it receives the scrutinee value and walks the clauses in order:
   - For each `((CtorN vars...) body)`:
     - If the scrutinee is an `AdtValue` with `av_ctor = "CtorN"` and `av_fields.length = |vars|`:
       - Bind each `vars[i]` (other than `_`) to `av_fields[i]` in a fresh child env.
       - Return a state evaluating `body` in that env.
   - `(else body)` always matches; `body` is evaluated in the enclosing env (`cf_env`).
   - Literal `42`/`"str"` patterns match by value equality.
   - Wildcard `_` always matches and binds nothing.
4. If no clause matched and there is no `else`, raise `"match: no clause matched <value>"`.

Frame type: `"match"`; it stores `cf_remaining` (clauses) and `cf_env` (enclosing env).
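
Step 3 can be sketched as a function from the scrutinee and the remaining clauses to the next state, for flat patterns only (Phase 6a). This is a hypothetical model: environments are association lists (prepending gives a fresh child env), bodies are unevaluated placeholders, and `continue_match` and `Eval` are illustrative names; the real CEK state type is not shown in this document.

```ocaml
type value =
  | Int of int
  | Str of string
  | AdtValue of adt_value

and adt_value = { av_type : string; av_ctor : string; av_fields : value array }

type env = (string * value) list

type pattern =
  | PCtor of string * string list (* (CtorN vars...) *)
  | PLit of value                 (* 42, "str" *)
  | PWild                         (* _ *)
  | PElse                         (* else *)

type body = string (* placeholder for an unevaluated body expression *)

type next = Eval of body * env (* "evaluate this body in this env" *)

let rec continue_match (scrutinee : value) (clauses : (pattern * body) list)
    (cf_env : env) : next =
  match clauses with
  | [] -> failwith "match: no clause matched" (* real error also shows the value *)
  | (pat, body) :: rest -> (
      match (pat, scrutinee) with
      | PCtor (ctor, vars), AdtValue a
        when a.av_ctor = ctor && Array.length a.av_fields = List.length vars ->
          (* Child env: prepend field bindings, skipping the wildcard. *)
          let child =
            List.fold_left2
              (fun acc var field -> if var = "_" then acc else (var, field) :: acc)
              cf_env vars (Array.to_list a.av_fields)
          in
          Eval (body, child)
      | PLit lit, v when lit = v -> Eval (body, cf_env)
      | (PWild | PElse), _ -> Eval (body, cf_env)
      | _ -> continue_match scrutinee rest cf_env)

let () =
  let err = AdtValue { av_type = "Result"; av_ctor = "Err"; av_fields = [| Str "boom" |] } in
  let clauses = [ (PCtor ("Ok", [ "v" ]), "ok-body"); (PCtor ("Err", [ "m" ]), "err-body") ] in
  match continue_match err clauses [] with
  | Eval (b, env) -> Printf.printf "eval %s, m bound: %b\n" b (List.mem_assoc "m" env)
```

Keeping `cf_env` untouched and binding into a prepended list mirrors the "fresh child env" rule: bindings from one clause never leak into the enclosing scope.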

---

## Interaction with `cond` / `case`

`match` is the primary dispatch form for ADTs. `cond` / `case` remain unchanged:

- `cond` tests arbitrary boolean expressions; it is still useful for non-ADT dispatch.
- `case` matches on equality to literal values; unchanged.
- `match` is the new form: structural pattern matching on ADT constructors.

They are orthogonal. A `match` clause can contain a `cond`; a `cond` clause can contain a `match`.

---

## Exhaustiveness checking

Emit a **warning** (not an error) when:

- A `match` has no `else` clause, AND
- Not all constructors of the scrutinee's type are covered.

Detection: when `define-type` runs, it registers the constructor set in a global table
`_adt_registry: type_name → [ctor_names]`. At `match` evaluation time:

- If the scrutinee's type is in `_adt_registry` and not all ctors appear as patterns:
  - Warn, e.g. in JS: `console.warn("[sx] match: non-exhaustive: missing Nothing for type Maybe")`
  - Execution continues (warning, not error).

This is best-effort: the scrutinee's type is only known at runtime, so the warning fires on the
first evaluation of a non-exhaustive `match`, not at definition time.
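
A sketch of the check in OCaml, under two assumptions not stated above: each `match` site has an integer id, and "fires on first evaluation" means once per site. `adt_registry`, `warned_sites`, `register_type`, `missing_ctors` and `check_exhaustive` are hypothetical names; the warning goes to stderr in place of `console.warn`.

```ocaml
(* Global registry filled by define-type: type name -> constructor names. *)
let adt_registry : (string, string list) Hashtbl.t = Hashtbl.create 16

(* Match sites (assumed integer ids) that have already warned. *)
let warned_sites : (int, unit) Hashtbl.t = Hashtbl.create 16

let register_type type_name ctor_names = Hashtbl.replace adt_registry type_name ctor_names

(* Constructors of [type_name] not covered by any clause, in declared order. *)
let missing_ctors type_name covered =
  match Hashtbl.find_opt adt_registry type_name with
  | None -> [] (* unknown type: nothing to check *)
  | Some all -> List.filter (fun c -> not (List.mem c covered)) all

(* Called when a match is evaluated; warns at most once per site. *)
let check_exhaustive ~site ~has_else type_name covered =
  if not has_else then
    match missing_ctors type_name covered with
    | [] -> ()
    | missing ->
        if not (Hashtbl.mem warned_sites site) then begin
          Hashtbl.replace warned_sites site ();
          Printf.eprintf "[sx] match: non-exhaustive: missing %s for type %s\n%!"
            (String.concat ", " missing) type_name
        end

let () =
  register_type "Maybe" [ "Just"; "Nothing" ];
  check_exhaustive ~site:1 ~has_else:false "Maybe" [ "Just" ]; (* warns *)
  check_exhaustive ~site:1 ~has_else:false "Maybe" [ "Just" ]  (* silent *)
```

Because the check runs against the scrutinee's runtime type, a site can stay silent until it first sees a value of a registered type, which is the best-effort behaviour described above.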

---

## Recursive types

Recursive types work because constructors are ordinary functions whose fields are untyped, so a
`Node` can hold any value, and because function bodies are evaluated only when called, so a
function such as `depth` can refer to itself:

```lisp
(define-type Tree
  (Leaf)
  (Node left value right))

; Recursive function over a recursive type:
(define (depth tree)
  (match tree
    ((Leaf) 0)
    ((Node l v r) (+ 1 (max (depth l) (depth r))))))
```

No special treatment is needed: the type definition doesn't need to know about recursion.
The constructor `Node` accepts any values, including other `Node` or `Leaf` values.
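
The same point in OCaml terms: a generic constructor checks only its arity, never the field types, so nesting `Node` values needs nothing from the type definition. This is a hypothetical sketch; `make_ctor`, `leaf`, `node` and `depth` are illustrative names, with `depth` mirroring the SX function above.

```ocaml
type value = Int of int | AdtValue of adt_value
and adt_value = { av_type : string; av_ctor : string; av_fields : value array }

(* Generic constructor: knows its arity but not the types of its fields. *)
let make_ctor type_name ctor arity args =
  if List.length args <> arity then failwith (ctor ^ ": wrong number of args");
  AdtValue { av_type = type_name; av_ctor = ctor; av_fields = Array.of_list args }

let leaf () = make_ctor "Tree" "Leaf" 0 []
let node l v r = make_ctor "Tree" "Node" 3 [ l; v; r ]

(* Depth by dispatching on the constructor name. *)
let rec depth (t : value) : int =
  match t with
  | AdtValue { av_ctor = "Leaf"; _ } -> 0
  | AdtValue { av_ctor = "Node"; av_fields = [| l; _; r |]; _ } -> 1 + max (depth l) (depth r)
  | _ -> failwith "depth: not a Tree"

let () =
  let t = node (node (leaf ()) (Int 1) (leaf ())) (Int 2) (leaf ()) in
  Printf.printf "depth = %d\n" (depth t) (* prints "depth = 2" *)
```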

---

## Pattern variables

In `match` clauses, identifiers in the field positions of a constructor pattern (the arguments
after the constructor name) that are NOT constructor names are treated as pattern variables,
bound to the matched field values:

```lisp
(match x
  ((Just v) v)       ; v bound to the wrapped value
  ((Nothing) nil))

(match pair
  ((Cons-of h t) (list h t)))  ; h, t bound to head and tail
```

**Wildcard**: `_` is always a wildcard; it matches anything and binds nothing.

```lisp
(match x
  ((Just _) "has value")
  ((Nothing) "empty"))
```

**Nested patterns**:

```lisp
(match tree
  ((Node (Leaf) v (Leaf)) (str "leaf node: " v))
  ((Node l v r) (str "inner node: " v)))
```

Nested patterns are matched recursively: each inner `(Leaf)` pattern checks that the
corresponding field (`left` or `right`) is itself a `Leaf` ADT value.

---

## Implementation Plan

### Phase 6a: `define-type` + basic `match` (no nested patterns, no exhaustiveness)

1. OCaml: add `AdtValue of adt_value` to `sx_types.ml`.
2. Evaluator: add `step-sf-define-type`: parse clauses, register ctor fns + predicates + accessors.
3. Evaluator: add `step-sf-match` + `MatchFrame`: linear scan of clauses, flat patterns only.
4. JS: same (AdtValue as plain object with `_adt`/`_type`/`_ctor`/`_fields` props).

### Phase 6b: nested patterns (separate fire)

Recursive `matchPattern(pattern, value, env)` helper that:

- Returns `{matched: bool, bindings: map}`
- Recursively matches sub-patterns against ADT fields.
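
A sketch of the recursive helper in OCaml, with two deliberate deviations from the signature above: it returns `Some bindings` on success and `None` on failure instead of a `{matched; bindings}` record (the same information), and it drops the `env` argument because this sketch never looks anything up in it. `match_pattern` and the `pattern` type are hypothetical.

```ocaml
type value = Int of int | Str of string | AdtValue of adt_value
and adt_value = { av_type : string; av_ctor : string; av_fields : value array }

type pattern =
  | PVar of string                 (* binds the matched value *)
  | PWild                          (* _ : matches, binds nothing *)
  | PLit of value                  (* 42, "str": value equality *)
  | PCtor of string * pattern list (* (Ctor sub-patterns...) *)

let rec match_pattern (p : pattern) (v : value) : (string * value) list option =
  match (p, v) with
  | PWild, _ -> Some []
  | PVar name, _ -> Some [ (name, v) ]
  | PLit lit, _ -> if lit = v then Some [] else None
  | PCtor (ctor, subs), AdtValue a
    when a.av_ctor = ctor && List.length subs = Array.length a.av_fields ->
      (* Match each sub-pattern against its field; any failure fails the whole. *)
      List.fold_left2
        (fun acc sub field ->
          match acc with
          | None -> None
          | Some bs -> (
              match match_pattern sub field with
              | None -> None
              | Some bs' -> Some (bs @ bs')))
        (Some []) subs (Array.to_list a.av_fields)
  | PCtor _, _ -> None

let () =
  let leaf = AdtValue { av_type = "Tree"; av_ctor = "Leaf"; av_fields = [||] } in
  let t = AdtValue { av_type = "Tree"; av_ctor = "Node"; av_fields = [| leaf; Int 7; leaf |] } in
  (* (Node (Leaf) v (Leaf)) *)
  let pat = PCtor ("Node", [ PCtor ("Leaf", []); PVar "v"; PCtor ("Leaf", []) ]) in
  match match_pattern pat t with
  | Some [ ("v", Int 7) ] -> print_endline "leaf node: 7"
  | _ -> print_endline "no match"
```

Flat Phase 6a patterns are the special case where every sub-pattern is a `PVar` or `PWild`, so this helper could replace the flat scan without changing clause order or binding behaviour.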

### Phase 6c: exhaustiveness warnings (separate fire)

`_adt_registry` global + warning emission on first non-exhaustive match.

---

## Open questions (deferred to review)

1. **Accessor auto-generation**: should `Ctor-field` accessors be generated always, or only on demand? Risk: name collisions if two types have constructors with same field names.
2. **Singleton constructors**: `(Nothing)` is a zero-arg ctor. Should these be interned (same object every call) or fresh each time? Interning enables `eq?` checks but requires a global table.
3. **Printing/inspect**: `inspect` on an AdtValue should show `(Just 42)`, not `#<adt:Just>`. Implement in the `inspect` function or via `display`/`write` (Phase 17 ports).
4. **Pattern-matching on non-ADT values**: should `match` handle list patterns `(a . b)` and literal patterns in clause heads? Deferred; add only if needed by a language implementation.