# SX Algebraic Data Types — Design

## Motivation

Every language implementation currently uses `{:tag "..." :field ...}` tagged dicts to simulate sum types. This is verbose, error-prone (typos in tag strings go undetected), and produces no exhaustiveness warnings. Native ADTs eliminate the pattern everywhere.

Examples of current workarounds:

- Haskell `Maybe a` → `{:tag "Just" :value x}` / `{:tag "Nothing"}`
- Prolog terms → `{:tag "functor" :name "foo" :args (list x y)}`
- Lua result type → `{:tag "ok" :value v}` / `{:tag "err" :msg s}`
- Common Lisp `cons` pairs → `{:tag "cons" :car a :cdr b}`

---

## Syntax

### `define-type`

```lisp
(define-type Name
  (Ctor1 field1 field2 ...)
  (Ctor2 field1 ...)
  ...)
```

Creates:

- Constructor functions: `Ctor1`, `Ctor2`, … (callable like normal functions)
- Type predicate: `Name?` — returns true for any value of type `Name`
- Constructor predicates: `Ctor1?`, `Ctor2?`, … (optional, auto-generated)
- Field accessors: `Ctor1-field1`, `Ctor1-field2`, … (optional, auto-generated)

Examples:

```lisp
(define-type Maybe
  (Just value)
  (Nothing))

(define-type Result
  (Ok value)
  (Err message))

(define-type Tree
  (Leaf)
  (Node left value right))

(define-type List-of
  (Nil-of)
  (Cons-of head tail))
```

Constructors with no fields are zero-argument constructors (singletons by value):

```lisp
(Nothing)  ; => #
(Leaf)     ; => #
```

### `match`

```lisp
(match expr
  ((Ctor1 a b) body)
  ((Ctor2 x) body)
  ((Ctor3) body)
  (else body))
```

- Clauses are tried in order; first match wins.
- `else` clause is optional but suppresses exhaustiveness warnings.
- Pattern variables (`a`, `b`, `x`) are bound in the body scope.
- Wildcard `_` discards the matched value.
- Literal patterns: `42`, `"str"`, `true`, `nil` — match by value equality.
- Nested patterns: `((Node left (Leaf) right) body)` — nested constructor patterns.

Examples:

```lisp
(match result
  ((Ok v) (str "got: " v))
  ((Err m) (str "error: " m)))

(match tree
  ((Leaf) 0)
  ((Node l v r) (+ 1 (max (tree-depth l) (tree-depth r)))))
```

---

## CEK Dispatch

### Runtime representation

ADT values are OCaml records (not dicts) — opaque, non-inspectable via `get`:

```ocaml
type adt_value = {
  av_type : string;         (* type name, e.g. "Maybe" *)
  av_ctor : string;         (* constructor name, e.g. "Just" *)
  av_fields : value array;  (* positional fields *)
}
```

In JS: `{ _adt: true, _type: "Maybe", _ctor: "Just", _fields: [v] }`. `typeOf` returns the ADT type name (e.g. `"Maybe"`).

### `define-type` — special form

`stepSfDefineType(args, env, kont)`:

1. Parse `Name` and list of `(CtorN field...)` clauses.
2. For each constructor `CtorK` with fields `[f1, f2, …]`:
   - Register `CtorK` as a `NativeFn` that takes `|fields|` args and returns an `AdtValue`.
   - Register `CtorK?` as a predicate (`AdtValue` with matching ctor name → `true`).
   - Register `CtorK-fN` as field accessor (returns `av_fields[N]`).
3. Register `Name?` as a predicate (`AdtValue` with matching type name → `true`).
4. All bindings go into the current environment via `env-bind!`.
5. Returns `Nil`.

This is an environment mutation — no new frame needed. Evaluates in one step.

### `match` — special form

`stepSfMatch(args, env, kont)`:

1. Push `MatchFrame` with `clauses` and `env` onto kont.
2. Return state evaluating the scrutinee `expr`.
3. `MatchFrame` continue: receive scrutinee value, walk clauses:
   - For each `((CtorN vars...) body)`:
     - If scrutinee is an `AdtValue` with `av_ctor = "CtorN"` and `av_fields.length = |vars|`:
       - Bind `vars[i]` → `av_fields[i]` in fresh child env.
       - Return state evaluating `body` in that env.
   - `(else body)` — always matches, body evaluated in current env.
   - Literal `42`/`"str"` patterns: match by value equality.
   - Wildcard `_`: always matches, binds nothing.
4. If no clause matched and no `else`: raise `"match: no clause matched "`.

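The two special forms above can be sketched in the JS representation as follows. This is a minimal illustration under assumptions, not the evaluator code: the environment is modelled as a plain `Map` (standing in for `env-bind!`), the names `defineType` and `matchFlat` are hypothetical, clause bodies are modelled as JS functions receiving the bindings, and the CEK frame machinery is omitted entirely.

```javascript
// Build an ADT value in the JS shape described above.
function makeAdt(type, ctor, fields) {
  return { _adt: true, _type: type, _ctor: ctor, _fields: fields };
}

// Hypothetical stand-in for the define-type special form.
// ctorSpecs is an array of [ctorName, [fieldName, ...]] pairs.
function defineType(env, typeName, ctorSpecs) {
  for (const [ctor, fieldNames] of ctorSpecs) {
    // Constructor: checks arity, returns an ADT value.
    env.set(ctor, (...args) => {
      if (args.length !== fieldNames.length) {
        throw new Error(`${ctor}: expected ${fieldNames.length} args, got ${args.length}`);
      }
      return makeAdt(typeName, ctor, args);
    });
    // Constructor predicate, e.g. "Just?".
    env.set(`${ctor}?`, (v) => Boolean(v && v._adt) && v._ctor === ctor);
    // Positional field accessors, e.g. "Just-value".
    fieldNames.forEach((field, i) => {
      env.set(`${ctor}-${field}`, (v) => v._fields[i]);
    });
  }
  // Type predicate, e.g. "Maybe?".
  env.set(`${typeName}?`, (v) => Boolean(v && v._adt) && v._type === typeName);
  return null; // stands in for Nil
}

// Hypothetical flat matcher: linear scan, first matching clause wins.
// A clause is { ctor, vars, body } or { isElse: true, body }.
function matchFlat(scrutinee, clauses) {
  for (const clause of clauses) {
    if (clause.isElse) return clause.body(new Map());
    const isAdt = Boolean(scrutinee && scrutinee._adt);
    if (isAdt && scrutinee._ctor === clause.ctor &&
        scrutinee._fields.length === clause.vars.length) {
      const bindings = new Map(); // plays the role of the fresh child env
      clause.vars.forEach((name, i) => {
        if (name !== "_") bindings.set(name, scrutinee._fields[i]);
      });
      return clause.body(bindings);
    }
  }
  throw new Error("match: no clause matched");
}

// Usage: (define-type Maybe (Just value) (Nothing))
const env = new Map();
defineType(env, "Maybe", [["Just", ["value"]], ["Nothing", []]]);
const just42 = env.get("Just")(42);
const describe = (m) => matchFlat(m, [
  { ctor: "Just", vars: ["v"], body: (b) => `got: ${b.get("v")}` },
  { ctor: "Nothing", vars: [], body: () => "empty" },
]);
```

Because every binding is a plain function stored in the environment, the sketch mirrors the point made above that `define-type` only mutates the environment and needs no continuation frame of its own.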
Frame type: `"match"` — stores `cf_remaining` (clauses), `cf_env` (enclosing env).

---

## Interaction with `cond` / `case`

`match` is the primary dispatch form for ADTs. `cond` / `case` remain unchanged:

- `cond` tests arbitrary boolean expressions — still useful for non-ADT dispatch.
- `case` matches on equality to literal values — unchanged.
- `match` is the new form: structural pattern matching on ADT constructors.

They are orthogonal. A `match` clause can contain a `cond`; a `cond` clause can contain a `match`.

---

## Exhaustiveness checking

Emit a **warning** (not an error) when:

- A `match` has no `else` clause, AND
- Not all constructors of the scrutinee's type are covered.

Detection: when `define-type` runs, it registers the constructor set in a global table `_adt_registry: type_name → [ctor_names]`. At `match` compile/evaluation time:

- If the scrutinee's type is in `_adt_registry` and not all ctors appear as patterns:
  - `console.warn("[sx] match: non-exhaustive — missing: Ctor3, Ctor4 for type Maybe")`
  - Execution continues (warning, not error).

This is best-effort: the scrutinee type is only known at runtime. The warning fires on first non-exhaustive match evaluation, not at definition time.

---

## Recursive types

Recursive types work because constructors are registered as functions, and function bodies are evaluated lazily:

```lisp
(define-type Tree
  (Leaf)
  (Node left value right))

; Recursive function over a recursive type:
(define (depth tree)
  (match tree
    ((Leaf) 0)
    ((Node l v r) (+ 1 (max (depth l) (depth r))))))
```

No special treatment needed — the type definition doesn't need to know about recursion. The constructor `Node` accepts any values, including other `Node` or `Leaf` values.

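The following sketch ties the last two sections together: it builds a recursive `Tree` value in the JS representation and runs the registry-based exhaustiveness check against it. It is an illustration under assumptions only. The names `adtRegistry`, `missingCtors`, and `warnIfNonExhaustive` are hypothetical, the registry is populated by hand rather than by `define-type`, and the set of covered constructors is passed in as a plain array.

```javascript
// Hypothetical registry: type name -> constructor names.
// In the design this would be filled in when define-type runs.
const adtRegistry = new Map();
adtRegistry.set("Tree", ["Leaf", "Node"]);

// Constructors are ordinary functions, so Node can hold other Tree values.
const Leaf = () => ({ _adt: true, _type: "Tree", _ctor: "Leaf", _fields: [] });
const Node = (left, value, right) =>
  ({ _adt: true, _type: "Tree", _ctor: "Node", _fields: [left, value, right] });

// Constructors of the scrutinee's type that no clause covers.
// Returns [] when there is an else clause or the type is unknown.
function missingCtors(scrutinee, coveredCtors, hasElse) {
  if (hasElse || !scrutinee || !scrutinee._adt) return [];
  const all = adtRegistry.get(scrutinee._type);
  if (!all) return [];
  return all.filter((ctor) => !coveredCtors.includes(ctor));
}

// Emits the warning and lets execution continue (warning, not error).
function warnIfNonExhaustive(scrutinee, coveredCtors, hasElse) {
  const missing = missingCtors(scrutinee, coveredCtors, hasElse);
  if (missing.length > 0) {
    console.warn(
      `[sx] match: non-exhaustive \u2014 missing: ${missing.join(", ")} ` +
      `for type ${scrutinee._type}`
    );
  }
  return missing;
}

// Recursive function over the recursive value, mirroring `depth` above.
function depth(tree) {
  if (tree._ctor === "Leaf") return 0;
  const [left, , right] = tree._fields;
  return 1 + Math.max(depth(left), depth(right));
}

const tree = Node(Node(Leaf(), 1, Leaf()), 2, Leaf());
```

Since the check needs the scrutinee's `_type`, it can only run once a value is available, which is why the warning is described as best-effort and tied to evaluation rather than definition.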
---

## Pattern variables

In `match` clauses, identifiers in the argument positions of a constructor pattern that are NOT themselves constructor patterns are treated as pattern variables (bound to matched field values):

```lisp
(match x
  ((Just v) v)       ; v bound to the wrapped value
  ((Nothing) nil))

(match pair
  ((Cons-of h t) (list h t)))  ; h, t bound to head and tail
```

**Wildcard**: `_` is always a wildcard — matches anything, binds nothing.

```lisp
(match x
  ((Just _) "has value")
  ((Nothing) "empty"))
```

**Nested patterns**:

```lisp
(match tree
  ((Node (Leaf) v (Leaf)) (str "leaf node: " v))
  ((Node l v r) (str "inner node: " v)))
```

Nested patterns are matched recursively: the inner `(Leaf)` pattern checks that the `left` field is itself a `Leaf` ADT value.

---

## Implementation Plan

### Phase 6a — `define-type` + basic `match` (no nested patterns, no exhaustiveness)

1. OCaml: add `AdtValue of adt_value` to `sx_types.ml`.
2. Evaluator: add `step-sf-define-type` — parse clauses, register ctor fns + predicates + accessors.
3. Evaluator: add `step-sf-match` + `MatchFrame` — linear scan of clauses, flat patterns only.
4. JS: same (AdtValue as plain object with `_adt`/`_type`/`_ctor`/`_fields` props).

### Phase 6b — nested patterns (separate fire)

Recursive `matchPattern(pattern, value, env)` helper that:

- Returns `{matched: bool, bindings: map}`
- Recursively matches sub-patterns against ADT fields.

### Phase 6c — exhaustiveness warnings (separate fire)

`_adt_registry` global + warning emission on first non-exhaustive match.

---

## Open questions (deferred to review)

1. **Accessor auto-generation**: should `Ctor-field` accessors be generated always, or only on demand? Risk: name collisions if two types have constructors with same field names.
2. **Singleton constructors**: `(Nothing)` — zero-arg ctor — should these be interned (same object every call) or fresh each time? Interning enables `eq?` checks but requires a global table.
3. **Printing/inspect**: `inspect` on an AdtValue should show `(Just 42)` not `#`. Implement in `inspect` function or via `display`/`write` (Phase 17 ports).
4. **Pattern-matching on non-ADT values**: should `match` handle list patterns `(a . b)` and literal patterns in clause heads? Deferred — add only if needed by a language implementation.
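
As a concrete reference point for the Phase 6b helper, a recursive `matchPattern` could look roughly like the sketch below. Everything about the pattern encoding is an assumption made for illustration: patterns are pre-parsed into objects with a hypothetical `kind` field (`"ctor"`, `"var"`, `"wild"`, `"lit"`), and the third parameter is a bindings `Map` rather than the `env` named in the plan. Literal patterns use JS strict equality here, which may differ from SX value equality.

```javascript
// Hypothetical pattern shapes:
//   { kind: "ctor", name: "Node", args: [pattern, ...] }
//   { kind: "var",  name: "v" }
//   { kind: "wild" }
//   { kind: "lit",  value: 42 }
// Returns { matched, bindings }. Callers should pass a fresh Map per clause,
// because bindings collected before a failed sub-match are not rolled back.
function matchPattern(pattern, value, bindings = new Map()) {
  switch (pattern.kind) {
    case "wild":
      return { matched: true, bindings };
    case "var":
      bindings.set(pattern.name, value);
      return { matched: true, bindings };
    case "lit":
      return { matched: value === pattern.value, bindings };
    case "ctor": {
      const shapeOk = Boolean(value && value._adt) &&
        value._ctor === pattern.name &&
        value._fields.length === pattern.args.length;
      if (!shapeOk) return { matched: false, bindings };
      // Recurse into each field with the corresponding sub-pattern.
      for (let i = 0; i < pattern.args.length; i++) {
        const sub = matchPattern(pattern.args[i], value._fields[i], bindings);
        if (!sub.matched) return { matched: false, bindings };
      }
      return { matched: true, bindings };
    }
    default:
      throw new Error(`unknown pattern kind: ${pattern.kind}`);
  }
}

// Usage: the pattern (Node (Leaf) v (Leaf)) from the nested-patterns example.
const adt = (type, ctor, fields) => ({ _adt: true, _type: type, _ctor: ctor, _fields: fields });
const leaf = adt("Tree", "Leaf", []);
const leafNode = adt("Tree", "Node", [leaf, 7, leaf]);
const innerNode = adt("Tree", "Node", [leafNode, 9, leaf]);

const leafPattern = { kind: "ctor", name: "Leaf", args: [] };
const leafNodePattern = {
  kind: "ctor",
  name: "Node",
  args: [leafPattern, { kind: "var", name: "v" }, leafPattern],
};

const hit = matchPattern(leafNodePattern, leafNode);
const miss = matchPattern(leafNodePattern, innerNode);
```

In this sketch `hit.matched` is true with `v` bound to the node's value, while `miss.matched` is false because the left field of `innerNode` is a `Node` rather than a `Leaf`. Handling literals in the same helper also gives one possible shape for open question 4, without committing to list patterns.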