SX Algebraic Data Types — Design
Motivation
Every language implementation currently uses {:tag "..." :field ...} tagged dicts to
simulate sum types. This is verbose, error-prone (typos in tag strings go undetected), and
produces no exhaustiveness warnings. Native ADTs eliminate the pattern everywhere.
Examples of current workarounds:
- Haskell Maybe a → {:tag "Just" :value x} / {:tag "Nothing"}
- Prolog terms → {:tag "functor" :name "foo" :args (list x y)}
- Lua result type → {:tag "ok" :value v} / {:tag "err" :msg s}
- Common Lisp cons pairs → {:tag "cons" :car a :cdr b}
Syntax
define-type
(define-type Name
(Ctor1 field1 field2 ...)
(Ctor2 field1 ...)
...)
Creates:
- Constructor functions: Ctor1, Ctor2, … (callable like normal functions)
- Type predicate: Name?, which returns true for any value of type Name
- Constructor predicates: Ctor1?, Ctor2?, … (optional, auto-generated)
- Field accessors: Ctor1-field1, Ctor1-field2, … (optional, auto-generated)
Examples:
(define-type Maybe
(Just value)
(Nothing))
(define-type Result
(Ok value)
(Err message))
(define-type Tree
(Leaf)
(Node left value right))
(define-type List-of
(Nil-of)
(Cons-of head tail))
Constructors with no fields are zero-argument constructors (singletons by value):
(Nothing) ; => #<Nothing>
(Leaf) ; => #<Leaf>
match
(match expr
((Ctor1 a b) body)
((Ctor2 x) body)
((Ctor3) body)
(else body))
- Clauses are tried in order; first match wins.
- else clause is optional but suppresses exhaustiveness warnings.
- Pattern variables (a, b, x) are bound in the body scope.
- Wildcard _ discards the matched value.
- Literal patterns: 42, "str", true, nil (match by value equality).
- Nested patterns: ((Node left (Leaf) right) body), i.e. nested constructor patterns.
Examples:
(match result
((Ok v) (str "got: " v))
((Err m) (str "error: " m)))
(match tree
((Leaf) 0)
((Node l v r) (+ 1 (max (tree-depth l) (tree-depth r)))))
CEK Dispatch
Runtime representation
ADT values are OCaml records (not dicts) — opaque, non-inspectable via get:
type adt_value = {
av_type : string; (* type name, e.g. "Maybe" *)
av_ctor : string; (* constructor name, e.g. "Just" *)
av_fields: value array; (* positional fields *)
}
In JS: { _adt: true, _type: "Maybe", _ctor: "Just", _fields: [v] }.
typeOf returns the ADT type name (e.g. "Maybe").
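A minimal sketch of the JS shape described above. The helper names makeAdt and isAdt are hypothetical, chosen for illustration only, and the non-ADT fallback in typeOf is a placeholder rather than the runtime's real behaviour:

```javascript
// Hypothetical helper: builds the plain-object ADT value described above.
function makeAdt(type, ctor, fields) {
  // Freezing is an assumption; it keeps the tag and fields from being mutated.
  return Object.freeze({
    _adt: true,
    _type: type,
    _ctor: ctor,
    _fields: Object.freeze([...fields]),
  });
}

// Hypothetical helper: true only for values carrying the _adt marker.
function isAdt(v) {
  return v !== null && typeof v === "object" && v._adt === true;
}

// typeOf reports the ADT type name for ADT values.
// The fallback to JS typeof is a stand-in for the existing non-ADT logic.
function typeOf(v) {
  return isAdt(v) ? v._type : typeof v;
}

const just42 = makeAdt("Maybe", "Just", [42]);
console.log(typeOf(just42)); // Maybe
console.log(just42._ctor, just42._fields[0]); // Just 42
```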
define-type — special form
stepSfDefineType(args, env, kont):
- Parse Name and list of (CtorN field...) clauses.
- For each constructor CtorK with fields [f1, f2, …]:
  - Register CtorK as a NativeFn that takes |fields| args and returns an AdtValue.
  - Register CtorK? as a predicate (AdtValue with matching ctor name → true).
  - Register CtorK-fN as field accessor (returns the N-th field, i.e. av_fields[N-1] since the array is zero-indexed).
- Register Name? as a predicate (AdtValue with matching type name → true).
- All bindings go into the current environment via env-bind!.
- Returns Nil.
This is an environment mutation — no new frame needed. Evaluates in one step.
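The registration steps above can be sketched in JS roughly as follows. This is an illustration under assumptions, not the evaluator's code: a Map stands in for the SX environment, envBind stands in for env-bind!, defineType is a hypothetical name, and throwing on an arity mismatch is a guess at the intended behaviour.

```javascript
// Assumption: a Map models the environment; envBind mimics env-bind!.
function envBind(env, name, value) {
  env.set(name, value);
}

// ctorSpecs is a list like [["Just", "value"], ["Nothing"]].
function defineType(env, typeName, ctorSpecs) {
  for (const [ctorName, ...fieldNames] of ctorSpecs) {
    // Constructor: takes |fields| args and returns an ADT value.
    envBind(env, ctorName, (...args) => {
      if (args.length !== fieldNames.length) {
        throw new Error(`${ctorName}: expected ${fieldNames.length} args, got ${args.length}`);
      }
      return { _adt: true, _type: typeName, _ctor: ctorName, _fields: args };
    });
    // Constructor predicate: ADT value with matching ctor name.
    envBind(env, `${ctorName}?`, (v) => Boolean(v && v._adt === true && v._ctor === ctorName));
    // Field accessors: field i of the declaration reads _fields[i] (zero-indexed).
    fieldNames.forEach((field, i) => {
      envBind(env, `${ctorName}-${field}`, (v) => {
        if (!(v && v._adt === true && v._ctor === ctorName)) {
          throw new Error(`${ctorName}-${field}: not a ${ctorName}`);
        }
        return v._fields[i];
      });
    });
  }
  // Type predicate: ADT value with matching type name.
  envBind(env, `${typeName}?`, (v) => Boolean(v && v._adt === true && v._type === typeName));
  return null; // stands in for Nil
}

const env = new Map();
defineType(env, "Maybe", [["Just", "value"], ["Nothing"]]);
const j = env.get("Just")(42);
console.log(env.get("Maybe?")(j)); // true
console.log(env.get("Just-value")(j)); // 42
console.log(env.get("Nothing?")(j)); // false
```

Everything happens in a single call with no continuation involved, which mirrors the "one step, no new frame" property noted above.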
match — special form
stepSfMatch(args, env, kont):
- Push MatchFrame with clauses and env onto kont.
- Return state evaluating the scrutinee expr.
- MatchFrame continue: receive scrutinee value, walk clauses:
  - For each ((CtorN vars...) body):
    - If scrutinee is an AdtValue with av_ctor = "CtorN" and av_fields.length = |vars|:
      - Bind vars[i] → av_fields[i] in fresh child env.
      - Return state evaluating body in that env.
  - (else body): always matches, body evaluated in current env.
  - Literal 42 / "str" patterns: match by value equality.
  - Wildcard _: always matches, binds nothing.
- If no clause matched and no else: raise "match: no clause matched <value>".
Frame type: "match" — stores cf_remaining (clauses), cf_env (enclosing env).
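The clause walk performed when the MatchFrame continues might look like the sketch below (flat patterns only). The clause encoding is hypothetical: each clause is { pattern, body }, a pattern is one of { ctor, vars }, { else: true }, { wildcard: true } or { literal: v }, and body is a JS function receiving the environment instead of an SX expression to evaluate. A copied Map stands in for a fresh child env.

```javascript
// Hypothetical stand-in for the MatchFrame continue step.
function continueMatch(scrutinee, clauses, env) {
  for (const { pattern, body } of clauses) {
    // else and _ always match and bind nothing.
    if (pattern.else || pattern.wildcard) return body(env);
    // Literal patterns match by value equality (=== is a simplification).
    if ("literal" in pattern) {
      if (pattern.literal === scrutinee) return body(env);
      continue;
    }
    // Constructor pattern: ctor name and field count must both agree.
    const hit =
      scrutinee !== null &&
      typeof scrutinee === "object" &&
      scrutinee._adt === true &&
      scrutinee._ctor === pattern.ctor &&
      scrutinee._fields.length === pattern.vars.length;
    if (hit) {
      const child = new Map(env); // fresh child env (copy models scope chaining)
      pattern.vars.forEach((name, i) => {
        if (name !== "_") child.set(name, scrutinee._fields[i]);
      });
      return body(child);
    }
  }
  throw new Error(`match: no clause matched ${JSON.stringify(scrutinee)}`);
}

const ok = { _adt: true, _type: "Result", _ctor: "Ok", _fields: [7] };
const resultClauses = [
  { pattern: { ctor: "Ok", vars: ["v"] }, body: (e) => "got: " + e.get("v") },
  { pattern: { ctor: "Err", vars: ["m"] }, body: (e) => "error: " + e.get("m") },
];
console.log(continueMatch(ok, resultClauses, new Map())); // got: 7
```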
Interaction with cond / case
match is the primary dispatch form for ADTs. cond / case remain unchanged:
- cond tests arbitrary boolean expressions; still useful for non-ADT dispatch.
- case matches on equality to literal values; unchanged.
- match is the new form: structural pattern matching on ADT constructors.
They are orthogonal. A match clause can contain a cond; a cond clause can contain a match.
Exhaustiveness checking
Emit a warning (not an error) when:
- A match has no else clause, AND
- Not all constructors of the scrutinee's type are covered.
Detection: when define-type runs, it registers the constructor set in a global table
_adt_registry: type_name → [ctor_names]. At match compile/evaluation time:
- If the scrutinee's type is in _adt_registry and not all ctors appear as patterns:
  - console.warn("[sx] match: non-exhaustive — missing: Ctor3, Ctor4 for type Maybe")
  - Execution continues (warning, not error).
This is best-effort: the scrutinee type is only known at runtime. The warning fires on first non-exhaustive match evaluation, not at definition time.
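One way the registry lookup and one-time warning could fit together is sketched below. The names adtRegistry, registerType, missingCtors and warnIfNonExhaustive are hypothetical, and keying the "already warned" set by a match-site id is an assumption about how "first evaluation" would be tracked. The message format is copied from the example above.

```javascript
const adtRegistry = new Map(); // type name -> array of ctor names
const warnedSites = new Set(); // match sites that have already warned (assumption)

// Would be called by define-type when it runs.
function registerType(typeName, ctorNames) {
  adtRegistry.set(typeName, ctorNames);
}

// Returns ctor names not covered; empty if else is present or the type is unknown.
function missingCtors(typeName, coveredCtors, hasElse) {
  if (hasElse || !adtRegistry.has(typeName)) return [];
  const covered = new Set(coveredCtors);
  return adtRegistry.get(typeName).filter((c) => !covered.has(c));
}

// Emits at most one warning per site; returns true if a warning was emitted.
function warnIfNonExhaustive(siteId, typeName, coveredCtors, hasElse, warn = console.warn) {
  const missing = missingCtors(typeName, coveredCtors, hasElse);
  if (missing.length === 0 || warnedSites.has(siteId)) return false;
  warnedSites.add(siteId);
  warn(`[sx] match: non-exhaustive — missing: ${missing.join(", ")} for type ${typeName}`);
  return true; // execution continues either way: warning, not error
}

registerType("Tree", ["Leaf", "Node"]);
warnIfNonExhaustive("depth-fn", "Tree", ["Leaf"], false); // warns once
warnIfNonExhaustive("depth-fn", "Tree", ["Leaf"], false); // silent on repeat
```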
Recursive types
Recursive types work because constructors are registered as functions, and function bodies are evaluated lazily:
(define-type Tree
(Leaf)
(Node left value right))
; Recursive function over a recursive type:
(define (depth tree)
(match tree
((Leaf) 0)
((Node l v r) (+ 1 (max (depth l) (depth r))))))
No special treatment needed — the type definition doesn't need to know about recursion.
The constructor Node accepts any values, including other Node or Leaf values.
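The same point can be seen in the JS representation: a constructor only packs its arguments, so nesting needs no support from the type definition. In this sketch Leaf, Node and depth are hand-written stand-ins for what define-type and match would provide.

```javascript
// Hand-written stand-ins for the constructors define-type would register.
const Leaf = () => ({ _adt: true, _type: "Tree", _ctor: "Leaf", _fields: [] });
const Node = (left, value, right) => ({
  _adt: true,
  _type: "Tree",
  _ctor: "Node",
  _fields: [left, value, right],
});

// Mirrors the SX depth function: 0 for Leaf, 1 + deeper subtree for Node.
function depth(tree) {
  if (tree._ctor === "Leaf") return 0;
  const [l, , r] = tree._fields; // the value field is not needed here
  return 1 + Math.max(depth(l), depth(r));
}

const t = Node(Node(Leaf(), 1, Leaf()), 2, Leaf());
console.log(depth(t)); // 2
```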
Pattern variables
In match clauses, identifiers in the field positions of a constructor pattern that are NOT
constructor names are treated as pattern variables (bound to matched field values):
(match x
((Just v) v) ; v bound to the wrapped value
((Nothing) nil))
(match pair
((Cons-of h t) (list h t))) ; h, t bound to head and tail
Wildcard: _ is always a wildcard — matches anything, binds nothing.
(match x
((Just _) "has value")
((Nothing) "empty"))
Nested patterns:
(match tree
((Node (Leaf) v (Leaf)) (str "leaf node: " v))
((Node l v r) (str "inner node: " v)))
Nested patterns are matched recursively: the inner (Leaf) pattern checks that the
left field is itself a Leaf ADT value.
Implementation Plan
Phase 6a — define-type + basic match (no nested patterns, no exhaustiveness)
- OCaml: add AdtValue of adt_value to sx_types.ml.
- Evaluator: add step-sf-define-type: parse clauses, register ctor fns + predicates + accessors.
- Evaluator: add step-sf-match + MatchFrame: linear scan of clauses, flat patterns only.
- JS: same (AdtValue as plain object with _adt/_type/_ctor/_fields props).
Phase 6b — nested patterns (separate fire)
Recursive matchPattern(pattern, value, env) helper that:
- Returns {matched: bool, bindings: map}
- Recursively matches sub-patterns against ADT fields.
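A possible shape for the recursive helper, as a sketch only. The pattern encoding is hypothetical: "_" is the wildcard, any other string is a pattern variable, an array [ctorName, ...subpatterns] is a constructor pattern, and { lit: v } is a literal. The third parameter here is an accumulated bindings map rather than the env named above, which is a simplification.

```javascript
// Returns { matched, bindings }; bindings is a Map of variable name -> value.
function matchPattern(pattern, value, bindings = new Map()) {
  if (pattern === "_") return { matched: true, bindings }; // wildcard binds nothing
  if (typeof pattern === "string") {
    // Pattern variable: copy the map so failed branches leave no trace.
    return { matched: true, bindings: new Map(bindings).set(pattern, value) };
  }
  if (Array.isArray(pattern)) {
    const [ctor, ...subs] = pattern;
    const shapeOk =
      value !== null &&
      typeof value === "object" &&
      value._adt === true &&
      value._ctor === ctor &&
      value._fields.length === subs.length;
    if (!shapeOk) return { matched: false, bindings };
    let acc = bindings;
    for (let i = 0; i < subs.length; i++) {
      // Recurse into each field with the bindings gathered so far.
      const r = matchPattern(subs[i], value._fields[i], acc);
      if (!r.matched) return { matched: false, bindings };
      acc = r.bindings;
    }
    return { matched: true, bindings: acc };
  }
  // Literal pattern: value equality (=== is a simplification).
  return { matched: pattern.lit === value, bindings };
}

const leaf = { _adt: true, _type: "Tree", _ctor: "Leaf", _fields: [] };
const mkNode = (l, v, r) => ({ _adt: true, _type: "Tree", _ctor: "Node", _fields: [l, v, r] });

const hit = matchPattern(["Node", ["Leaf"], "v", ["Leaf"]], mkNode(leaf, 5, leaf));
console.log(hit.matched, hit.bindings.get("v")); // true 5

const miss = matchPattern(["Node", ["Leaf"], "v", ["Leaf"]], mkNode(mkNode(leaf, 1, leaf), 5, leaf));
console.log(miss.matched); // false
```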
Phase 6c — exhaustiveness warnings (separate fire)
_adt_registry global + warning emission on first non-exhaustive match.
Open questions (deferred to review)
- Accessor auto-generation: should Ctor-field accessors be generated always, or only on demand? Risk: name collisions if two types have constructors with same field names.
- Singleton constructors: (Nothing) is a zero-arg ctor. Should these be interned (same object every call) or fresh each time? Interning enables eq? checks but requires a global table.
- Printing/inspect: inspect on an AdtValue should show (Just 42) not #<adt:Just>. Implement in inspect function or via display/write (Phase 17 ports).
- Pattern-matching on non-ADT values: should match handle list patterns (a . b) and literal patterns in clause heads? Deferred; add only if needed by a language implementation.
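To make the singleton question concrete, the sketch below contrasts the two options for zero-arg constructors. It does not pick one. makeNullary and the singletons table are hypothetical names, and JS === stands in for eq?.

```javascript
const singletons = new Map(); // global table: "Type/Ctor" -> shared value

// Returns a zero-arg constructor; intern=true shares one object across calls.
function makeNullary(type, ctor, intern) {
  const fresh = () =>
    Object.freeze({ _adt: true, _type: type, _ctor: ctor, _fields: Object.freeze([]) });
  if (!intern) return fresh;
  return () => {
    const key = `${type}/${ctor}`;
    if (!singletons.has(key)) singletons.set(key, fresh());
    return singletons.get(key);
  };
}

const internedNothing = makeNullary("Maybe", "Nothing", true);
const freshNothing = makeNullary("Maybe", "Nothing", false);

console.log(internedNothing() === internedNothing()); // true
console.log(freshNothing() === freshNothing()); // false
```

With fresh allocation, identity comparison between two (Nothing) values fails even though match treats them the same, which is the trade-off the question is weighing against the cost of the global table.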