# OCaml-on-SX: substrate validation + HM + reference oracle

The strict-ML answer to "does the SX substrate really do what we claim it does?" OCaml has *exactly* the feature set SX was designed around — CEK, records, ADTs, exceptions, modules, refs, strict evaluation — so implementing it on SX is the strongest possible test of the substrate. Phase 5 also produces a real Hindley-Milner inferencer that feeds back into `lib/guest/hm.sx`, and the resulting OCaml interpreter serves as a reference oracle for every other guest language (when SX behavior is ambiguous, native OCaml answers).

**End-state goal:** OCaml Phases 1–5 running on the SX CEK, with a vendored slice of the official OCaml testsuite as the oracle corpus. HM extracted into `lib/guest/hm.sx` once Haskell-on-SX adopts it as second consumer.

**Out of scope (this plan):** Dream web framework — moved to `plans/dream-on-sx.md`, only spins up if a target user appears. Full standard library — only the minimal slice needed for substrate validation and the oracle role.

**Conditional:** ReasonML syntax variant (Phase 8) — kept in the plan but deferred until Phases 1–2 land and a decision is made to ship a user-facing OCaml.

## What this covers that nothing else in the set does

- **Strict ML semantics** — unlike Haskell, OCaml is call-by-value with explicit `Lazy.t` for laziness. Pattern match is exhaustive. Polymorphic variants. Structural equality.
- **First-class modules and functors** — modules as values (Phase 4); functors as SX higher-order functions over module records. Unlike Haskell typeclasses, OCaml's module system is explicit and compositional. **The hardest test of the substrate** — if Phase 4 takes 3000 lines instead of 800, the substrate is telling us something.
- **Mutable state without monads** — `ref`, `:=`, `!` are primitives. Arrays. `Hashtbl`. The IO model is direct.
- **Reference oracle** — when other guest languages disagree about a semantic edge case (HM in Haskell-on-SX vs in OCaml-on-SX, exception ordering, equality semantics), native OCaml is the tiebreaker. The vendored testsuite slice (Phase 5.1) makes this oracle role concrete.

## Sequencing dependency

**OCaml-on-SX should not start until lib-guest Steps 0–7 are complete.** OCaml's tokenizer should consume `lib/guest/lex.sx` (lib-guest Step 3); its precedence parser should consume `lib/guest/pratt.sx` (Step 4); its pattern matcher should consume `lib/guest/match.sx` (Step 6). Starting OCaml early means it hand-rolls these and never validates the abstraction — losing one of the main strategic payoffs.

Reciprocally, **lib-guest Step 8 (HM extraction) waits on OCaml-on-SX Phase 5** — extracting HM with only Haskell as consumer is speculative; with both Haskell and OCaml the two-language rule is satisfied for real.

## Ground rules

- **Scope:** only touch `lib/ocaml/**`, `lib/reasonml/**` (Phase 8 only), and `plans/ocaml-on-sx.md`. Do **not** edit `spec/`, `hosts/`, `shared/`, `lib/dream/**` (separate plan), or other `lib//`.
- **Consume `lib/guest/`** wherever it covers a need (lex, pratt, match, ast). Hand-rolling instead of consuming defeats the substrate-validation goal.
- **Shared-file issues** go under "Blockers" below with a minimal repro; do not fix here.
- **SX files:** use `sx-tree` MCP tools only.
- **Architecture:** OCaml source → AST → SX AST → CEK. No standalone OCaml evaluator. The OCaml AST is walked by an `ocaml-eval` function in SX that produces SX values.
- **Type system:** deferred until Phase 5. Phases 1–4 are intentionally untyped — get the evaluator right first, then layer HM inference on top.
- **Commits:** one feature per commit. Keep `## Progress log` updated and tick boxes.
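
The "walk the AST, produce values, no types until Phase 5" rule in the ground rules can be pictured with a small sketch. The real `ocaml-eval` is to be written in SX. The OCaml below is only an illustration of the shape of an untyped, strict, AST-walking evaluator. The mini-AST, the `value` type, `Eval_error`, and `eval` are all hypothetical names invented for this example, not part of the plan's code.

```ocaml
(* Hypothetical mini-AST: a tiny stand-in for what the parser would emit. *)
type expr =
  | Int of int
  | Bool of bool
  | Var of string
  | Fun of string * expr            (* fun x -> e *)
  | App of expr * expr              (* e1 e2 *)
  | Let of string * expr * expr     (* let x = e1 in e2 *)
  | If of expr * expr * expr
  | Add of expr * expr

(* Runtime values; a closure captures the environment it was defined in. *)
type value =
  | VInt of int
  | VBool of bool
  | VClosure of string * expr * env
and env = (string * value) list

exception Eval_error of string

(* Untyped evaluation: nothing is checked up front, so ill-typed
   programs only fail when the offending node is actually evaluated. *)
let rec eval (env : env) (e : expr) : value =
  match e with
  | Int n -> VInt n
  | Bool b -> VBool b
  | Var x ->
    (match List.assoc_opt x env with
     | Some v -> v
     | None -> raise (Eval_error ("unbound variable " ^ x)))
  | Fun (x, body) -> VClosure (x, body, env)
  | App (f, arg) ->
    (match eval env f with
     | VClosure (x, body, closure_env) ->
       (* Call-by-value: the argument is evaluated before the body runs. *)
       let v = eval env arg in
       eval ((x, v) :: closure_env) body
     | _ -> raise (Eval_error "application of a non-function"))
  | Let (x, e1, e2) -> eval ((x, eval env e1) :: env) e2
  | If (c, t, f) ->
    (match eval env c with
     | VBool true -> eval env t
     | VBool false -> eval env f
     | _ -> raise (Eval_error "if condition is not a bool"))
  | Add (a, b) ->
    (match eval env a, eval env b with
     | VInt x, VInt y -> VInt (x + y)
     | _ -> raise (Eval_error "+ expects two ints"))

(* let add1 = fun x -> x + 1 in add1 41 *)
let program =
  Let ("add1", Fun ("x", Add (Var "x", Int 1)), App (Var "add1", Int 41))

let () = assert (eval [] program = VInt 42)
```

In this sketch a type error such as `Add (Int 1, Bool true)` surfaces as a runtime `Eval_error`, which is the behaviour an untyped Phase 1–4 evaluator would be expected to have before an inference pass is layered on top.
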
## Architecture sketch

```
OCaml source text
    │
    ▼
lib/ocaml/tokenizer.sx  — keywords, operators, string/char literals, comments
    │                      (built on lib/guest/lex.sx)
    ▼
lib/ocaml/parser.sx     — OCaml AST: let/let rec, fun, match, if, begin/end,
    │                      module/struct/functor, type decls, expressions
    │                      (precedence via lib/guest/pratt.sx)
    ▼
lib/ocaml/desugar.sx    — surface → core: tuple patterns, or-patterns,
    │                      sequence (;) → (do), when guards, field punning
    ▼
lib/ocaml/transpile.sx  — OCaml AST → SX AST
    │
    ▼
lib/ocaml/runtime.sx    — ADT constructors, module primitives, ref/array ops,
    │                      minimal Stdlib shims (Phase 6)
    ▼
SX CEK evaluator (both JS and OCaml hosts)
```

## Semantic mappings

| OCaml construct | SX mapping |
|----------------|-----------|
| `let x = e` (top-level) | `(define x e)` |
| `let f x y = e` | `(define (f x y) e)` |
| `let rec f x = e` | `(define (f x) e)` — SX define is already recursive |
| `fun x -> e` | `(fn (x) e)` |
| `e1 \|> f` | `(f e1)` — pipe desugars to reverse application |
| `e1; e2` | `(do e1 e2)` |
| `begin e1; e2; e3 end` | `(do e1 e2 e3)` |
| `if c then e1 else e2` | `(if c e1 e2)` |
| `match x with \| P -> e` | `(match x (P e) ...)` via the SX-core Phase 6 ADT primitive |
| `type t = A \| B of int` | `(define-type t (A) (B v))` |
| `module M = struct ... end` | SX dict `{:let-bindings ...}` — module as record |
| `functor (M : S) -> ...` | `(fn (M) ...)` — functor as SX lambda over module record |
| `open M` | inject M's bindings into scope via `env-merge` |
| `M.field` | `(get M :field)` |
| `{ r with f = v }` | `(dict-set r :f v)` |
| `ref x` | `(make-ref x)` — mutable cell |
| `!r` | `(deref-ref r)` |
| `r := v` | `(set-ref! r v)` |
| `(a, b, c)` | tagged list `(:tuple a b c)` |
| `[1; 2; 3]` | `(list 1 2 3)` |
| `[\| 1; 2; 3 \|]` | `(make-array 1 2 3)` (Phase 6) |
| `try e with \| Ex -> h` | `(guard (fn (ex) h) e)` via SX exception system |
| `raise Ex` | `(perform (:raise Ex))` |
| `Printf.sprintf "%d" x` | `(format "%d" x)` |

## Roadmap

### Phase 1 — Tokenizer + parser

- [ ] **Tokenizer** built on `lib/guest/lex.sx`: keywords (`let`, `rec`, `in`, `fun`, `function`, `match`, `with`, `type`, `of`, `module`, `struct`, `end`, `functor`, `sig`, `open`, `include`, `if`, `then`, `else`, `begin`, `try`, `exception`, `raise`, `mutable`, `for`, `while`, `do`, `done`, `and`, `as`, `when`), operators (`->`, `|>`, `<|`, `@@`, `@`, `:=`, `!`, `::`, `**`, `:`, `;`, `;;`), identifiers (lower, upper/ctor, labels `~label:`, optional `?label:`), char literals `'c'`, string literals (escaped + heredoc `{|...|}`), int/float literals, nested block comments `(* ... *)` (OCaml has no line-comment syntax).
- [ ] **Parser** with precedence via `lib/guest/pratt.sx`: top-level `let`/`let rec`/`type`/`module`/`exception`/`open`/`include` declarations; expressions: literals, identifiers, constructor application, lambda, application (left-assoc), binary ops with precedence table, `if`/`then`/`else`, `match`/`with`, `try`/`with`, `let`/`in`, `begin`/`end`, `fun`/`function`, tuples, list literals, record literals/updates, field access, sequences `;`, unit `()`.
- [ ] **Patterns:** constructor, literal, variable, wildcard `_`, tuple, list cons `::`, list literal, record, `as`, or-pattern `P1 | P2`, `when` guard.
- [ ] OCaml is **not** indentation-sensitive — no layout algorithm needed.
- [ ] Tests in `lib/ocaml/tests/parse.sx` — 50+ round-trip parse tests.

### Phase 2 — Core evaluator (untyped)

- [ ] `ocaml-eval` entry: walks OCaml AST, produces SX values.
- [ ] `let`/`let rec`/`let ... in` (mutually recursive with `and`).
- [ ] Lambda + application (curried by default — auto-curry multi-param defs).
- [ ] `fun`/`function` (single-arg lambda with immediate match on arg).
- [ ] `if`/`then`/`else`, `begin`/`end`, sequence `;`.
- [ ] Arithmetic, comparison, boolean ops, string `^`, `mod`.
- [ ] Unit `()` value; `ignore`.
- [ ] References: `ref`, `!`, `:=`.
- [ ] Mutable record fields.
- [ ] `for i = lo to hi do ... done` loop; `while cond do ... done`.
- [ ] `try`/`with` — maps to SX `guard`; `raise` via perform.
- [ ] Tests in `lib/ocaml/tests/eval.sx` — 50+ tests, pure + imperative.

### Phase 3 — ADTs + pattern matching

- [ ] `type` declarations: `type t = A | B of t1 * t2 | C of { x: int }`.
- [ ] Constructors as tagged lists: `A` → `(:A)`, `B(1, "x")` → `(:B 1 "x")`.
- [ ] `match`/`with` consumes `lib/guest/match.sx`: constructor, literal, variable, wildcard, tuple, list cons/nil, `as` binding, or-patterns, nested patterns, `when` guard.
- [ ] Exhaustiveness: runtime error on incomplete match (no compile-time check yet).
- [ ] Built-in types: `option` (`None`/`Some`), `result` (`Ok`/`Error`), `list` (nil/cons), `bool`, `unit`, `exn`.
- [ ] `exception` declarations; built-in: `Not_found`, `Invalid_argument`, `Failure`, `Match_failure`.
- [ ] Polymorphic variants (surface syntax `` `Tag value ``; runtime same tagged list).
- [ ] Tests in `lib/ocaml/tests/adt.sx` — 40+ tests: ADTs, match, option/result.

### Phase 4 — Modules + functors

**The hardest test of the substrate.** First-class modules + functors are where the SX/CEK story either works elegantly or reveals a missing piece. Track line count vs equivalent OCaml stdlib implementations as the substrate-validation signal.

- [ ] `module M = struct let x = 1 let f y = x + y end` → SX dict `{:x 1 :f }`.
- [ ] `module type S = sig val x : int val f : int -> int end` → interface record (runtime stub; typed checking in Phase 5).
- [ ] `module M : S = struct ... end` — coercive sealing (runtime: pass-through).
- [ ] `functor (M : S) -> struct ... end` → SX `(fn (M) ...)`.
- [ ] `module F = Functor(Base)` — functor application.
- [ ] `open M` — merge M's dict into current env (`env-merge`).
- [ ] `include M` — same as open at structure level.
- [ ] `M.name` — dict get via `:name` key.
- [ ] First-class modules (pack/unpack) — deferred to Phase 5.
- [ ] Standard module hierarchy stubs: `List`, `Option`, `Result`, `String`, `Int`, `Printf`, `Hashtbl` (filled in Phase 6).
- [ ] Tests in `lib/ocaml/tests/modules.sx` — 30+ tests.

### Phase 5 — Hindley-Milner type inference

This is one of the headline payoffs of the whole plan. The inferencer built here is the seed of `lib/guest/hm.sx` (lib-guest Step 8) — once Haskell-on-SX adopts it as second consumer, it gets extracted.

- [ ] Algorithm W: `gen`/`inst`, `unify`, `infer-expr`, `infer-decl`.
- [ ] Type variables: `'a`, `'b`; unification with occurs check.
- [ ] Let-polymorphism: generalise at let-bindings.
- [ ] ADT types: `type 'a option = None | Some of 'a`.
- [ ] Function types, tuple types, record types.
- [ ] Type signatures: `val f : int -> int` — verify against inferred type.
- [ ] Module type checking: seal against `sig` (Phase 4 stubs become real checks).
- [ ] Error reporting: position-tagged errors with expected vs actual types.
- [ ] First-class modules: `(module M : S)` pack; `(val m : (module S))` unpack.
- [ ] No rank-2 polymorphism, no GADTs (out of scope).
- [ ] Tests in `lib/ocaml/tests/types.sx` — 60+ inference tests.

### Phase 5.1 — Vendor OCaml testsuite slice (oracle corpus)

The oracle role only works against a real test corpus. Vendor a slice of the official OCaml testsuite (from `ocaml/ocaml` `testsuite/tests/`).

- [ ] Pick ~100–200 tests covering: basic eval, ADTs, modules, functors, pattern matching, exceptions, refs, simple stdlib (List, Option, Result, String). Skip tests that depend on Phase 6 stdlib not implemented or on out-of-scope features (GADTs, objects, Lwt, Unix module, etc.).
- [ ] Vendored at `lib/ocaml/testsuite/` with a manifest of which tests are included and why each excluded test was dropped.
- [ ] `lib/ocaml/conformance.sh` runs the slice via the epoch protocol, writes `lib/ocaml/scoreboard.{json,md}`.
- [ ] Each iteration after Phase 5.1 lands: scoreboard is the regression bar, just like other guests.
- [ ] License: official OCaml testsuite is LGPL — confirm rose-ash repo can vendor LGPL test files (header preserved). If not, write equivalent tests from scratch sourced from the OCaml manual.

### Phase 6 — Minimal stdlib slice

**Trimmed from the original 150+ functions to ~30** — only what HM tests, the Phase 5.1 testsuite slice, and the oracle role need. Full stdlib (`Hashtbl.iter`, `Map.Make`, `Set.Make`, `Format`, `Sys`, `Bytes`, …) becomes a conditional follow-on if a target user appears.

- [ ] `List`: `map`, `filter`, `fold_left`, `fold_right`, `length`, `rev`, `append`, `iter`, `for_all`, `exists`, `find_opt`, `mem`.
- [ ] `Option`: `map`, `bind`, `get`, `value`, `is_none`, `is_some`.
- [ ] `Result`: `map`, `bind`, `get_ok`, `get_error`, `is_ok`, `is_error`.
- [ ] `String`: `length`, `sub`, `concat`, `split_on_char`, `trim`.
- [ ] `Printf`: `sprintf` only — wires to SX `(format ...)`.
- [ ] `Hashtbl`: `create`, `add`, `find_opt`, `replace`, `mem` — backed by SX mutable dict.
- [ ] Tests in `lib/ocaml/tests/stdlib.sx` — 40+ tests across the slice. Phase 5.1 testsuite slice exercises these in real programs.

### Phase 7 — Dream web framework

**Moved to `plans/dream-on-sx.md`.** Spins up only if a target user appears. The plan there inherits OCaml-on-SX Phases 1–5 + the Phase 6 slice plus whatever additional stdlib Dream needs (likely `Bytes`, `Format`, more `String`, `Sys.argv`).

### Phase 8 — ReasonML syntax variant `[deferred]`

`[deferred — depends on Phases 1–2 landing + decision to ship a user-facing OCaml]`.
ReasonML is OCaml with a JS-friendly surface: semicolons, `let` with `=` everywhere, `=>` for lambdas, `switch` for match, `{j|...|j}` string interpolation. Same semantics — different tokenizer + parser, same `lib/ocaml/transpile.sx` output. The cheapest user-facing payoff in the plan but only worthwhile if there's a concrete user goal (e.g. JSX-flavoured frontend syntax for SX components, attracting React refugees). Don't start without that target.

- [ ] **Tokenizer** in `lib/reasonml/tokenizer.sx`: `let x = e;`, `(x, y) => e`, `switch (x) { | Pat => e | ... }`, JSX, `{j|hello $(name)|j}`, `let f : int => int = x => x + 1`.
- [ ] **Parser** in `lib/reasonml/parser.sx`: produce same OCaml AST nodes; JSX → SX component calls (`` → `(~comp :x 1)`); auto-curry multi-arg.
- [ ] Shared transpiler delegates to `lib/ocaml/transpile.sx`.
- [ ] Tests in `lib/reasonml/tests/` — 40+.

## The meta-circular angle

SX is bootstrapped to OCaml (`hosts/ocaml/`). Running OCaml inside SX running on OCaml is the "mother tongue" closure: OCaml → SX → OCaml. This means:

- The OCaml host's native pattern matching and ADTs are exact reference semantics for the SX-level implementation — any mismatch is a bug.
- The SX `match` / `define-type` primitives were built knowing OCaml was the intended target.
- When debugging the transpiler, the OCaml REPL is always available as oracle.
- The vendored testsuite slice (Phase 5.1) makes the oracle role mechanical, not just rhetorical.

## Key dependencies

- **lib-guest Steps 0–7** — must complete before OCaml-on-SX starts. OCaml consumes `lib/guest/lex.sx`, `lib/guest/pratt.sx`, `lib/guest/match.sx`. Hand-rolling defeats the substrate-validation goal.
- **Phase 6 ADT primitive** (`define-type`/`match`) in the SX core — required before Phase 3.
- **HO forms** and first-class lambdas — already in spec, no blocker.
- **Module system** (Phase 4) is independent of type inference (Phase 5) — can overlap.
- **lib-guest Step 8** (HM extraction) — *waits on this plan's Phase 5*. The two are paired.

## Progress log

_Newest first._

_(awaiting lib-guest Steps 0–7)_

## Blockers

_(none yet)_