Design + ops scaffolding for the next phase of work, none of it touching
substrate or guest code.
lib-guest.md: rewrites Architectural framing as a 5-layer stack
(substrate → lib/guest → languages → shared/ → applications),
recursive dependency-direction rule, scaled two-consumer rule. Adds
Phase B (long-running stratification) with sub-layer matrix
(core/typed/relational/effects/layout/lazy/oo), language profiles, and
the long-running-discipline section. Preserves existing Phase A
progress log and rules.
ocaml-on-sx.md: scope reduced to substrate validation + HM + reference
oracle. Phases 1-5 + minimal stdlib slice + vendored testsuite slice.
Dream carved out into dream-on-sx.md; Phase 8 (ReasonML) deferred.
Records lib-guest sequencing dependency.
datalog-on-sx.md: adds Phase 4 built-in predicates + body arithmetic,
Phase 6 magic sets, safety analysis in Phase 3, Non-goals section.
New chisel plans (forward-looking, not yet launchable):
kernel-on-sx.md — first-class everything, env-as-value endgame
idris-on-sx.md — dependent types, evidence chisel
probabilistic-on-sx.md — weighted nondeterminism + traces
maude-on-sx.md — rewriting as primitive
linear-on-sx.md — resource model, artdag-relevant
Loop briefings (4 active, 1 cold):
minikanren-loop.md, ocaml-loop.md, datalog-loop.md, elm-loop.md, koka-loop.md
Restore scripts mirror the loop pattern:
restore-{minikanren,ocaml,datalog,jit-perf,lib-guest}.sh
Each captures worktree state, plan progress, MCP health, tmux status.
Includes the .mcp.json absolute-path patch instruction (fresh worktrees
have no _build/, so the relative mcp_tree path fails on first launch).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
221 lines
15 KiB
Markdown
# OCaml-on-SX: substrate validation + HM + reference oracle

The strict-ML answer to "does the SX substrate really do what we claim it does?" OCaml has *exactly* the feature set SX was designed around — CEK, records, ADTs, exceptions, modules, refs, strict evaluation — so implementing it on SX is the strongest possible test of the substrate. Phase 5 also produces a real Hindley-Milner inferencer that feeds back into `lib/guest/hm.sx`, and the resulting OCaml interpreter serves as a reference oracle for every other guest language (when SX behavior is ambiguous, native OCaml answers).

**End-state goal:** OCaml Phases 1–5 running on the SX CEK, with a vendored slice of the official OCaml testsuite as the oracle corpus. HM extracted into `lib/guest/hm.sx` once Haskell-on-SX adopts it as its second consumer.

**Out of scope (this plan):** Dream web framework, moved to `plans/dream-on-sx.md`; it only spins up if a target user appears. Full standard library: only the minimal slice needed for substrate validation and the oracle role is in scope.

**Conditional:** ReasonML syntax variant (Phase 8) — kept in the plan but deferred until Phases 1–2 land and a decision is made to ship a user-facing OCaml.

## What this covers that nothing else in the set does

- **Strict ML semantics** — unlike Haskell, OCaml is call-by-value with explicit `Lazy.t` for laziness. Matches are checked for exhaustiveness (the compiler warns; an unmatched value raises `Match_failure` at runtime). Polymorphic variants. Structural equality (`=`) alongside physical equality (`==`).
- **First-class modules and functors** — modules as runtime values (module records in Phase 4; OCaml-level first-class pack/unpack in Phase 5); functors as SX higher-order functions over module records. Unlike Haskell typeclasses, OCaml's module system is explicit and compositional. **The hardest test of the substrate** — if Phase 4 takes 3000 lines instead of 800, the substrate is telling us something.
- **Mutable state without monads** — `ref`, `:=`, `!` are primitives. Arrays. `Hashtbl`. The IO model is direct.
- **Reference oracle** — when other guest languages disagree about a semantic edge case (HM in Haskell-on-SX vs in OCaml-on-SX, exception ordering, equality semantics), native OCaml is the tiebreaker. The vendored testsuite slice (Phase 5.1) makes this oracle role concrete.

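The equality edge cases listed above can be checked directly against native OCaml. These checks are illustrative and are not taken from the official testsuite. They cover three cases:

- structural `=` versus physical `==` on two fresh refs;
- `=` versus `compare` on `nan`;
- structural equality on closures, which raises an exception rather than returning `false`.

```ocaml
(* Structural equality (=) compares contents; physical equality (==) compares identity. *)
let r1 = ref 0
let r2 = ref 0
let structural_equal = r1 = r2     (* true: same contents *)
let physically_equal = r1 == r2    (* false: two separate allocations *)

(* = follows IEEE 754 for nan; compare treats nan as equal to itself. *)
let nan_eq = nan = nan             (* false *)
let nan_cmp = compare nan nan      (* 0 *)

(* Structural equality on functions is a runtime error, not false. *)
let closures_comparable =
  try ignore ((fun x -> x + 1) = (fun x -> x + 1)); true
  with Invalid_argument _ -> false (* raised with "compare: functional value" *)
```

An SX-hosted implementation that returns `false` for the closure case, or `true` for `nan = nan`, would diverge from the oracle on exactly these inputs.
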
## Sequencing dependency

**OCaml-on-SX should not start until lib-guest Steps 0–7 are complete.** OCaml's tokenizer should consume `lib/guest/lex.sx` (lib-guest Step 3); its precedence parser should consume `lib/guest/pratt.sx` (Step 4); its pattern matcher should consume `lib/guest/match.sx` (Step 6). Starting OCaml early means it hand-rolls these and never validates the abstraction — losing one of the main strategic payoffs.

Reciprocally, **lib-guest Step 8 (HM extraction) waits on OCaml-on-SX Phase 5** — extracting HM with only Haskell as consumer is speculative; with both Haskell and OCaml the two-language rule is satisfied for real.

## Ground rules

- **Scope:** only touch `lib/ocaml/**`, `lib/reasonml/**` (Phase 8 only), and `plans/ocaml-on-sx.md`. Do **not** edit `spec/`, `hosts/`, `shared/`, `lib/dream/**` (separate plan), or other `lib/<lang>/`.
- **Consume `lib/guest/`** wherever it covers a need (lex, pratt, match, ast). Hand-rolling instead of consuming defeats the substrate-validation goal.
- **Shared-file issues** go under "Blockers" below with a minimal repro; do not fix here.
- **SX files:** use `sx-tree` MCP tools only.
- **Architecture:** OCaml source → AST → SX AST → CEK. No standalone OCaml evaluator. The OCaml AST is walked by an `ocaml-eval` function in SX that produces SX values.
- **Type system:** deferred until Phase 5. Phases 1–4 are intentionally untyped — get the evaluator right first, then layer HM inference on top.
- **Commits:** one feature per commit. Keep `## Progress log` updated and tick boxes.

## Architecture sketch

```
OCaml source text
    │
    ▼
lib/ocaml/tokenizer.sx   — keywords, operators, string/char literals, comments
    │  (built on lib/guest/lex.sx)
    ▼
lib/ocaml/parser.sx      — OCaml AST: let/let rec, fun, match, if, begin/end,
    │                      module/struct/functor, type decls, expressions
    │  (precedence via lib/guest/pratt.sx)
    ▼
lib/ocaml/desugar.sx     — surface → core: tuple patterns, or-patterns,
    │                      sequence (;) → (do), when guards, field punning
    ▼
lib/ocaml/transpile.sx   — OCaml AST → SX AST
    │
    ▼
lib/ocaml/runtime.sx     — ADT constructors, module primitives, ref/array ops,
    │                      minimal Stdlib shims (Phase 6)
    ▼
SX CEK evaluator (both JS and OCaml hosts)
```

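The desugar and transpile stages can be made concrete with a deliberately tiny model, written in OCaml rather than SX. It uses a hypothetical surface AST; the constructor names are illustrative and are not the node types of `lib/ocaml/parser.sx`. The model lowers that AST straight to SX text using the `define`/`fn`/`do`/`if` forms from the semantic mappings. A real `desugar.sx`/`transpile.sx` would produce SX AST values rather than strings, and would cover far more forms.

```ocaml
(* Hypothetical, heavily simplified surface AST; not the real parser's node types. *)
type expr =
  | Int of int
  | Var of string
  | Fun of string list * expr        (* fun x y -> e *)
  | App of expr * expr list          (* f a b *)
  | Pipe of expr * expr              (* e |> f *)
  | Seq of expr list                 (* e1; e2; e3 *)
  | If of expr * expr * expr         (* if c then t else e *)

type decl = Let of string * string list * expr   (* let f x y = e *)

let spaced xs = String.concat " " xs

(* Desugar and emit in one pass: each surface form maps to one SX form. *)
let rec sx_of_expr = function
  | Int n -> string_of_int n
  | Var x -> x
  | Fun (params, body) ->
    Printf.sprintf "(fn (%s) %s)" (spaced params) (sx_of_expr body)
  | App (f, args) -> "(" ^ spaced (List.map sx_of_expr (f :: args)) ^ ")"
  | Pipe (e, f) -> sx_of_expr (App (f, [ e ]))   (* e |> f  becomes  (f e) *)
  | Seq es -> "(do " ^ spaced (List.map sx_of_expr es) ^ ")"
  | If (c, t, e) ->
    Printf.sprintf "(if %s %s %s)" (sx_of_expr c) (sx_of_expr t) (sx_of_expr e)

let sx_of_decl (Let (name, params, body)) =
  match params with
  | [] -> Printf.sprintf "(define %s %s)" name (sx_of_expr body)
  | _ -> Printf.sprintf "(define (%s %s) %s)" name (spaced params) (sx_of_expr body)
```

For example, `sx_of_decl (Let ("f", ["x"; "y"], App (Var "+", [Var "x"; Var "y"])))` yields `(define (f x y) (+ x y))`. A `Pipe` node never reaches the output, which matches the "pipe desugars to reverse application" mapping.
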
## Semantic mappings

| OCaml construct | SX mapping |
|----------------|-----------|
| `let x = e` (top-level) | `(define x e)` |
| `let f x y = e` | `(define (f x y) e)` |
| `let rec f x = e` | `(define (f x) e)` — SX define is already recursive |
| `fun x -> e` | `(fn (x) e)` |
| `e1 \|> f` | `(f e1)` — pipe desugars to reverse application |
| `e1; e2` | `(do e1 e2)` |
| `begin e1; e2; e3 end` | `(do e1 e2 e3)` |
| `if c then e1 else e2` | `(if c e1 e2)` |
| `match x with \| P -> e` | `(match x (P e) ...)` via the SX core's Phase 6 ADT primitive |
| `type t = A \| B of int` | `(define-type t (A) (B v))` |
| `module M = struct ... end` | SX dict `{:let-bindings ...}` — module as record |
| `functor (M : S) -> ...` | `(fn (M) ...)` — functor as SX lambda over module record |
| `open M` | inject M's bindings into scope via `env-merge` |
| `M.field` | `(get M :field)` |
| `{ r with f = v }` | `(dict-set r :f v)` (must not mutate `r`: OCaml record update returns a copy) |
| `ref x` | `(make-ref x)` — mutable cell |
| `!r` | `(deref-ref r)` |
| `r := v` | `(set-ref! r v)` |
| `(a, b, c)` | tagged list `(:tuple a b c)` |
| `[1; 2; 3]` | `(list 1 2 3)` |
| `[\| 1; 2; 3 \|]` | `(make-array 1 2 3)` (Phase 6) |
| `try e with \| Ex -> h` | `(guard (fn (ex) h) e)` via SX exception system |
| `raise Ex` | `(perform (:raise Ex))` |
| `Printf.sprintf "%d" x` | `(format "%d" x)` |

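Two rows of the table are worth checking against native OCaml.

- **Refs.** `Stdlib` defines `'a ref` as a record with a single mutable field `contents`. The `make-ref`/`deref-ref`/`set-ref!` mapping is therefore a special case of mutable record fields.
- **Record update.** `{ r with f = v }` builds a new record and leaves `r` untouched.

The names `cell`, `make_cell`, `get_cell`, `set_cell` and `point` below are illustrative only.

```ocaml
(* Same shape as Stdlib's 'a ref: one mutable field named contents. *)
type 'a cell = { mutable contents : 'a }

let make_cell x = { contents = x }     (* ref x   *)
let get_cell c = c.contents            (* !r      *)
let set_cell c v = c.contents <- v     (* r := v  *)

(* Functional record update: q is a new record; p keeps its old fields. *)
type point = { x : int; y : int }

let p = { x = 1; y = 2 }
let q = { p with x = 10 }

(* The real Stdlib ref behaves the same way. *)
let r = ref 1
let () = r := 2
```
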
## Roadmap

### Phase 1 — Tokenizer + parser

- [ ] **Tokenizer** built on `lib/guest/lex.sx`: keywords (`let`, `rec`, `in`, `fun`, `function`, `match`, `with`, `type`, `of`, `module`, `struct`, `end`, `functor`, `sig`, `val`, `open`, `include`, `if`, `then`, `else`, `begin`, `try`, `exception`, `raise`, `mutable`, `for`, `to`, `downto`, `while`, `do`, `done`, `and`, `as`, `when`, `mod`), operators (`->`, `|>`, `<|`, `@@`, `@`, `:=`, `!`, `::`, `**`, `:`, `;`, `;;`), identifiers (lower, upper/ctor, labels `~label:`, optional `?label:`), char literals `'c'`, string literals (escaped + quoted strings `{|...|}`), int/float literals, nested block comments `(* ... *)` (OCaml has no line comments).
- [ ] **Parser** with precedence via `lib/guest/pratt.sx`: top-level `let`/`let rec`/`type`/`module`/`exception`/`open`/`include` declarations; expressions: literals, identifiers, constructor application, lambda, application (left-assoc), binary ops with precedence table, `if`/`then`/`else`, `match`/`with`, `try`/`with`, `let`/`in`, `begin`/`end`, `fun`/`function`, tuples, list literals, record literals/updates, field access, sequences `;`, unit `()`.
- [ ] **Patterns:** constructor, literal, variable, wildcard `_`, tuple, list cons `::`, list literal, record, `as`, or-pattern `P1 | P2`, `when` guard.
- [ ] OCaml is **not** indentation-sensitive — no layout algorithm needed.
- [ ] Tests in `lib/ocaml/tests/parse.sx` — 50+ round-trip parse tests.

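The core of a Pratt parser is small enough to sketch in OCaml itself. Below is a minimal binding-power loop over a hypothetical token type. It handles binary operators only: no application, prefix operators, or `if`/`match`. The precedence and associativity levels follow the operator table in the OCaml manual. The sketch is a reference for the table the parser needs; it does not assume anything about the API of `lib/guest/pratt.sx`.

How associativity is encoded:

- **Left-associative** operators take a right binding power one above their left binding power.
- **Right-associative** operators use the same value on both sides, which is what makes `2 ** 3 ** 2` nest to the right.

```ocaml
(* Hypothetical token and AST types for the sketch. *)
type token = Int of int | Op of string

type expr = Num of int | Bin of string * expr * expr

(* (left, right) binding powers, highest precedence first, following the
   OCaml manual's operator table. Left-assoc: right = left + 1.
   Right-assoc: right = left. *)
let binding_power = function
  | "**" -> Some (90, 90)
  | "*" | "/" | "mod" -> Some (80, 81)
  | "+" | "-" -> Some (70, 71)
  | "::" -> Some (60, 60)
  | "@" | "^" -> Some (50, 50)
  | "=" | "<" | ">" | "|>" -> Some (40, 41)
  | "&&" -> Some (30, 30)
  | "||" -> Some (20, 20)
  | ":=" -> Some (10, 10)
  | _ -> None

let parse tokens =
  let rest = ref tokens in
  let next () =
    match !rest with
    | t :: ts -> rest := ts; t
    | [] -> failwith "unexpected end of input"
  in
  (* Parse an operand, then keep absorbing operators that bind at least
     as tightly as min_bp. *)
  let rec expr min_bp =
    let lhs =
      match next () with
      | Int n -> Num n
      | Op o -> failwith ("expected an operand, got " ^ o)
    in
    loop lhs min_bp
  and loop lhs min_bp =
    match !rest with
    | Op o :: _ -> (
      match binding_power o with
      | Some (lbp, rbp) when lbp >= min_bp ->
        ignore (next ());
        loop (Bin (o, lhs, expr rbp)) min_bp
      | _ -> lhs)
    | _ -> lhs
  in
  let e = expr 0 in
  if !rest <> [] then failwith "trailing tokens";
  e

(* Render the tree as an S-expression string. *)
let rec to_sexp = function
  | Num n -> string_of_int n
  | Bin (o, l, r) -> Printf.sprintf "(%s %s %s)" o (to_sexp l) (to_sexp r)
```

Feeding it the tokens of `1 + 2 = 3 && 4 < 5` yields `(&& (= (+ 1 2) 3) (< 4 5))`.
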
### Phase 2 — Core evaluator (untyped)

- [ ] `ocaml-eval` entry: walks OCaml AST, produces SX values.
- [ ] `let`/`let rec`/`let ... in` (mutually recursive with `and`).
- [ ] Lambda + application (curried by default — auto-curry multi-param defs).
- [ ] `fun` (multi-parameter, curried) and `function` (single-arg lambda with an immediate match on its argument).
- [ ] `if`/`then`/`else`, `begin`/`end`, sequence `;`.
- [ ] Arithmetic, comparison, boolean ops, string `^`, `mod`.
- [ ] Unit `()` value; `ignore`.
- [ ] References: `ref`, `!`, `:=`.
- [ ] Mutable record fields.
- [ ] `for i = lo to hi do ... done` loop (both bounds inclusive; `downto` counts down); `while cond do ... done`.
- [ ] `try`/`with` — maps to SX `guard`; `raise` via perform.
- [ ] Tests in `lib/ocaml/tests/eval.sx` — 50+ tests, pure + imperative.

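Several Phase 2 semantics can be written as native-OCaml checks of the kind the oracle role is meant to supply. These are illustrative cases, not taken from the official testsuite:

- partial application of a multi-parameter definition;
- `function` as a one-argument lambda that matches immediately;
- mutual recursion with `and`;
- inclusive `for` bounds, counting up with `to` and down with `downto`;
- `try`/`with` catching a built-in exception raised by a primitive.

```ocaml
(* let add x y = ... is curried: applying one argument yields a function. *)
let add x y = x + y
let inc = add 1

(* function = single-argument lambda whose body is an immediate match. *)
let describe = function
  | 0 -> "zero"
  | n when n < 0 -> "negative"
  | _ -> "positive"

(* Mutual recursion through let rec ... and ... *)
let rec is_even n = n = 0 || is_odd (n - 1)
and is_odd n = n <> 0 && is_even (n - 1)

(* for loops include both bounds; a ref carries the mutable accumulator. *)
let sum_to n =
  let acc = ref 0 in
  for i = 1 to n do acc := !acc + i done;
  !acc

(* downto counts down; the list records the visiting order. *)
let countdown n =
  let seen = ref [] in
  for i = n downto 1 do seen := i :: !seen done;
  List.rev !seen

(* try/with also catches exceptions raised by primitives such as integer division. *)
let safe_div a b = try a / b with Division_by_zero -> 0
```
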
### Phase 3 — ADTs + pattern matching

- [ ] `type` declarations: `type t = A | B of t1 * t2 | C of { x: int }`.
- [ ] Constructors as tagged lists: `A` → `(:A)`, `B(1, "x")` → `(:B 1 "x")`.
- [ ] `match`/`with` consumes `lib/guest/match.sx`: constructor, literal, variable, wildcard, tuple, list cons/nil, `as` binding, or-patterns, nested patterns, `when` guard.
- [ ] Exhaustiveness: an incomplete match raises `Match_failure` at runtime, as in OCaml (no compile-time exhaustiveness check yet).
- [ ] Built-in types: `option` (`None`/`Some`), `result` (`Ok`/`Error`), `list` (nil/cons), `bool`, `unit`, `exn`.
- [ ] `exception` declarations; built-in: `Not_found`, `Invalid_argument`, `Failure`, `Match_failure`.
- [ ] Polymorphic variants (surface syntax `` `Tag value ``; runtime same tagged list).
- [ ] Tests in `lib/ocaml/tests/adt.sx` — 40+ tests: ADTs, match, option/result.

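Here is a sketch of the runtime side of matching over the tagged-list representation. It is written in OCaml as a model of what the SX implementation has to do. The names `value`, `pat`, `match_pat` and `eval_match` are hypothetical and are not the `lib/guest/match.sx` API.

The sketch leaves out `when` guards. It also skips the check that both sides of an or-pattern bind the same variables. Clauses are tried in order and the first match wins. When no clause matches, it raises OCaml's own `Match_failure`.

```ocaml
(* Hypothetical model of runtime values: constructors are tagged lists,
   so B (1, "x") is represented as VTag ("B", [VInt 1; VStr "x"]). *)
type value =
  | VInt of int
  | VStr of string
  | VTag of string * value list

type pat =
  | PWild                       (* _ *)
  | PVar of string              (* x *)
  | PInt of int                 (* 42 *)
  | PCon of string * pat list   (* C (p1, p2) *)
  | POr of pat * pat            (* p1 | p2 *)
  | PAs of pat * string         (* p as x *)

type env = (string * value) list

(* Returns the extended bindings on success, None on mismatch. *)
let rec match_pat (p : pat) (v : value) (env : env) : env option =
  match p, v with
  | PWild, _ -> Some env
  | PVar x, _ -> Some ((x, v) :: env)
  | PInt n, VInt m -> if n = m then Some env else None
  | PCon (c, ps), VTag (c', vs) when c = c' && List.length ps = List.length vs ->
    match_list ps vs env
  | POr (p1, p2), _ -> (
    match match_pat p1 v env with
    | Some env' -> Some env'
    | None -> match_pat p2 v env)
  | PAs (p', x), _ -> Option.map (fun env' -> (x, v) :: env') (match_pat p' v env)
  | _ -> None

and match_list ps vs env =
  match ps, vs with
  | [], [] -> Some env
  | p :: ps', v :: vs' -> (
    match match_pat p v env with
    | Some env' -> match_list ps' vs' env'
    | None -> None)
  | _ -> None

(* First matching clause wins; no match raises Match_failure, as in OCaml. *)
let eval_match (v : value) (clauses : (pat * (env -> 'a)) list) : 'a =
  let rec go = function
    | [] -> raise (Match_failure ("<sketch>", 0, 0))
    | (p, body) :: rest -> (
      match match_pat p v [] with
      | Some env -> body env
      | None -> go rest)
  in
  go clauses
```
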
### Phase 4 — Modules + functors

**The hardest test of the substrate.** First-class modules + functors are where the SX/CEK story either works elegantly or reveals a missing piece. Track line count vs equivalent OCaml stdlib implementations as the substrate-validation signal.

- [ ] `module M = struct let x = 1 let f y = x + y end` → SX dict `{:x 1 :f <fn>}`.
- [ ] `module type S = sig val x : int val f : int -> int end` → interface record (runtime stub; type checking in Phase 5).
- [ ] `module M : S = struct ... end` — coercive sealing (runtime: pass-through).
- [ ] `functor (M : S) -> struct ... end` → SX `(fn (M) ...)`.
- [ ] `module F = Functor(Base)` — functor application.
- [ ] `open M` — merge M's dict into current env (`env-merge`).
- [ ] `include M` — like `open`, but M's bindings also become fields of the enclosing structure (re-exported), not just visible in scope.
- [ ] `M.name` — dict get via `:name` key.
- [ ] First-class modules (pack/unpack) — deferred to Phase 5.
- [ ] Standard module hierarchy stubs: `List`, `Option`, `Result`, `String`, `Int`, `Printf`, `Hashtbl` (filled in Phase 6).
- [ ] Tests in `lib/ocaml/tests/modules.sx` — 30+ tests.

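A small OCaml model of the Phase 4 mappings can pin down `open` versus `include` before the SX version is written. Everything here (`value`, `field`, `double`, `open_in`, `include_in`) is a hypothetical illustration. Signatures, sealing and type checking are ignored, as they are in the runtime-only Phase 4.

```ocaml
(* Hypothetical model of "module as record": a module value is an association
   list of named fields, and a functor is an ordinary function on such values. *)
type value =
  | Int of int
  | Fn of (value -> value)
  | Module of (string * value) list

let field m name =                     (* M.name  ~  (get M :name) *)
  match m with
  | Module fields -> List.assoc name fields
  | _ -> invalid_arg "field: not a module"

let to_int = function Int n -> n | _ -> invalid_arg "to_int"
let call f x = match f with Fn g -> g x | _ -> invalid_arg "call"

(* module Base = struct let x = 1 let f y = x + y end *)
let base =
  let x = Int 1 in
  Module [ ("x", x); ("f", Fn (fun y -> Int (to_int x + to_int y))) ]

(* module Double (M : sig val x : int end) = struct let x = M.x * 2 end *)
let double m = Module [ ("x", Int (to_int (field m "x") * 2)) ]

(* open M: M's fields are consulted before the enclosing scope. *)
let open_in m scope =
  match m with Module fields -> fields @ scope | _ -> invalid_arg "open_in"

(* struct include M ... end: M's fields become fields of the new module;
   the structure's own later definitions shadow them (List.assoc takes the first hit). *)
let include_in m own =
  match m with Module fields -> Module (own @ fields) | _ -> invalid_arg "include_in"
```

One property worth keeping shows up in `struct include Base let x = 7 end`. The new `x` shadows `Base.x` for later code, but the included `f` still closes over `Base`'s `x`. The association-list model reproduces this because `f` captured its environment when `base` was built.
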
### Phase 5 — Hindley-Milner type inference

This is one of the headline payoffs of the whole plan. The inferencer built here is the seed of `lib/guest/hm.sx` (lib-guest Step 8) — once Haskell-on-SX adopts it as its second consumer, it gets extracted.

- [ ] Algorithm W: `gen`/`inst`, `unify`, `infer-expr`, `infer-decl`.
- [ ] Type variables: `'a`, `'b`; unification with occurs check.
- [ ] Let-polymorphism: generalise at let-bindings, subject to OCaml's value restriction (so `let r = ref []` stays monomorphic).
- [ ] ADT types: `type 'a option = None | Some of 'a`.
- [ ] Function types, tuple types, record types.
- [ ] Type signatures: `val f : int -> int` — verify against inferred type.
- [ ] Module type checking: seal against `sig` (Phase 4 stubs become real checks).
- [ ] Error reporting: position-tagged errors with expected vs actual types.
- [ ] First-class modules: `(module M : S)` pack; `(val m : S)` unpack.
- [ ] No rank-2 polymorphism, no GADTs (out of scope).
- [ ] Tests in `lib/ocaml/tests/types.sx` — 60+ inference tests.

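Here is a compact sketch of the inference core for a lambda-calculus subset, written in OCaml. It departs from textbook Algorithm W in two ways:

- It uses destructive unification over mutable type variables instead of explicit substitutions, which avoids composing substitutions.
- It generalizes using let-nesting levels, the technique OCaml's own type checker is built on.

For this subset it should infer the same principal types as W. All names are hypothetical. The sketch omits ADTs, records, modules, error positions, and the value restriction.

```ocaml
(* Hypothetical names throughout; a sketch, not lib/guest/hm.sx. *)
type ty =
  | TCon of string                 (* int, bool *)
  | TArrow of ty * ty
  | TVar of tvar ref               (* mutable inference variable *)
  | TGen of int                    (* quantified variable in a let-bound scheme *)
and tvar =
  | Unbound of int * int           (* id, let-nesting level *)
  | Link of ty

type expr =
  | Int of int
  | Bool of bool
  | Var of string
  | Lam of string * expr
  | App of expr * expr
  | Let of string * expr * expr

let next_id = ref 0
let fresh level = incr next_id; TVar (ref (Unbound (!next_id, level)))

(* Follow links, compressing paths along the way. *)
let rec repr = function
  | TVar ({ contents = Link t } as r) -> let t' = repr t in r := Link t'; t'
  | t -> t

(* Occurs check; also lowers levels so later generalization stays sound. *)
let rec occurs r level t =
  match repr t with
  | TVar r' when r == r' -> failwith "occurs check: infinite type"
  | TVar ({ contents = Unbound (id, l) } as r') -> if l > level then r' := Unbound (id, level)
  | TArrow (a, b) -> occurs r level a; occurs r level b
  | _ -> ()

let rec unify t1 t2 =
  match repr t1, repr t2 with
  | TCon a, TCon b when a = b -> ()
  | TArrow (a1, b1), TArrow (a2, b2) -> unify a1 a2; unify b1 b2
  | TVar r1, TVar r2 when r1 == r2 -> ()
  | TVar ({ contents = Unbound (_, l) } as r), t
  | t, TVar ({ contents = Unbound (_, l) } as r) -> occurs r l t; r := Link t
  | _ -> failwith "type mismatch"

(* Variables created deeper than the enclosing let are quantified. *)
let rec generalize level t =
  match repr t with
  | TVar { contents = Unbound (id, l) } when l > level -> TGen id
  | TArrow (a, b) -> TArrow (generalize level a, generalize level b)
  | t -> t

(* Replace each quantified variable by a fresh one, consistently. *)
let instantiate level t =
  let subst = Hashtbl.create 8 in
  let rec go t =
    match repr t with
    | TGen id -> (
      match Hashtbl.find_opt subst id with
      | Some v -> v
      | None -> let v = fresh level in Hashtbl.add subst id v; v)
    | TArrow (a, b) -> TArrow (go a, go b)
    | t -> t
  in
  go t

let rec infer env level = function
  | Int _ -> TCon "int"
  | Bool _ -> TCon "bool"
  | Var x -> (
    match List.assoc_opt x env with
    | Some t -> instantiate level t
    | None -> failwith ("unbound variable " ^ x))
  | Lam (x, body) ->
    let tx = fresh level in
    TArrow (tx, infer ((x, tx) :: env) level body)
  | App (f, a) ->
    let tf = infer env level f in
    let ta = infer env level a in
    let tr = fresh level in
    unify tf (TArrow (ta, tr));
    tr
  | Let (x, e, body) ->
    let te = infer env (level + 1) e in
    infer ((x, generalize level te) :: env) level body

let rec show t =
  match repr t with
  | TCon c -> c
  | TArrow (a, b) ->
    let lhs = match repr a with TArrow _ -> "(" ^ show a ^ ")" | _ -> show a in
    lhs ^ " -> " ^ show b
  | TVar { contents = Unbound (id, _) } -> "'t" ^ string_of_int id
  | TVar { contents = Link _ } -> assert false  (* repr never returns a Link *)
  | TGen id -> "'a" ^ string_of_int id
```

Two checks exercise the key paths:

- **Generalization.** Using a let-bound identity at both `bool` and `int`, as in `let id = fun x -> x in (fun _ -> id 1) (id true)`, infers `int`. Without generalization it would fail to unify.
- **Occurs check.** `fun x -> x x` is rejected by the occurs check.
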
### Phase 5.1 — Vendor OCaml testsuite slice (oracle corpus)

The oracle role only works against a real test corpus. Vendor a slice of the official OCaml testsuite (from `ocaml/ocaml` `testsuite/tests/`).

- [ ] Pick ~100–200 tests covering: basic eval, ADTs, modules, functors, pattern matching, exceptions, refs, simple stdlib (List, Option, Result, String). Skip tests that depend on Phase 6 stdlib not implemented or on out-of-scope features (GADTs, objects, Lwt, Unix module, etc.).
- [ ] Vendored at `lib/ocaml/testsuite/` with a manifest of which tests are included and why each excluded test was dropped.
- [ ] `lib/ocaml/conformance.sh` runs the slice via the epoch protocol, writes `lib/ocaml/scoreboard.{json,md}`.
- [ ] Each iteration after Phase 5.1 lands: scoreboard is the regression bar, just like other guests.
- [ ] License: official OCaml testsuite is LGPL — confirm rose-ash repo can vendor LGPL test files (header preserved). If not, write equivalent tests from scratch sourced from the OCaml manual.

### Phase 6 — Minimal stdlib slice

**Trimmed from the original 150+ functions to ~35** — only what HM tests, the Phase 5.1 testsuite slice, and the oracle role need. Full stdlib (`Hashtbl.iter`, `Map.Make`, `Set.Make`, `Format`, `Sys`, `Bytes`, …) becomes a conditional follow-on if a target user appears.

- [ ] `List`: `map`, `filter`, `fold_left`, `fold_right`, `length`, `rev`, `append`, `iter`, `for_all`, `exists`, `find_opt`, `mem`.
- [ ] `Option`: `map`, `bind`, `get`, `value`, `is_none`, `is_some`.
- [ ] `Result`: `map`, `bind`, `get_ok`, `get_error`, `is_ok`, `is_error`.
- [ ] `String`: `length`, `sub`, `concat`, `split_on_char`, `trim`.
- [ ] `Printf`: `sprintf` only — wires to SX `(format ...)`.
- [ ] `Hashtbl`: `create`, `add`, `find_opt`, `replace`, `mem` — backed by SX mutable dict.
- [ ] Tests in `lib/ocaml/tests/stdlib.sx` — 40+ tests across the slice. Phase 5.1 testsuite slice exercises these in real programs.

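Several functions in the Phase 6 slice have edge cases that are easy to get wrong when backing them with SX primitives, and native OCaml settles them. These checks are illustrative oracle cases, not drawn from the vendored testsuite. `Hashtbl.length` is not in the slice; it is used only to make the shadowing behaviour of `Hashtbl.add` visible.

```ocaml
(* Hashtbl.add shadows an existing binding instead of replacing it;
   Hashtbl.replace overwrites the most recent one. *)
let h : (string, int) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.add h "k" 1; Hashtbl.add h "k" 2
let after_two_adds = (Hashtbl.find_opt h "k", Hashtbl.length h)   (* (Some 2, 2) *)
let () = Hashtbl.replace h "k" 3
let after_replace = (Hashtbl.find_opt h "k", Hashtbl.length h)    (* (Some 3, 2) *)

(* split_on_char keeps empty fields and never returns []. *)
let fields = String.split_on_char ',' "a,,b"       (* ["a"; ""; "b"] *)
let no_fields = String.split_on_char ',' ""        (* [""] *)
let trimmed = String.trim "  x \n"                 (* "x" *)
let middle = String.sub "hello" 1 3                (* "ell" *)

(* fold_left and fold_right associate in opposite directions. *)
let left = List.fold_left (fun acc x -> x :: acc) [] [1; 2; 3]    (* [3; 2; 1] *)
let right = List.fold_right (fun x acc -> x :: acc) [1; 2; 3] []  (* [1; 2; 3] *)

(* Partial accessors raise Invalid_argument instead of returning a default. *)
let option_get_raises = try ignore (Option.get None); false with Invalid_argument _ -> true
let get_ok_raises = try ignore (Result.get_ok (Error "e")); false with Invalid_argument _ -> true

let formatted = Printf.sprintf "%d-%s" 4 "x"       (* "4-x" *)
```

The `Hashtbl.add` case matters for the "backed by SX mutable dict" plan. A plain dict keeps one value per key, so supporting `add` faithfully likely needs a per-key stack of bindings; otherwise the divergence should be recorded.
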
### Phase 7 — Dream web framework

**Moved to `plans/dream-on-sx.md`.** Spins up only if a target user appears. The plan there inherits OCaml-on-SX Phases 1–5 + the Phase 6 slice plus whatever additional stdlib Dream needs (likely `Bytes`, `Format`, more `String`, `Sys.argv`).

### Phase 8 — ReasonML syntax variant `[deferred]`

`[deferred — depends on Phases 1–2 landing + decision to ship a user-facing OCaml]`.

ReasonML is OCaml with a JS-friendly surface: semicolons, `let` with `=` everywhere, `=>` for lambdas, `switch` for match, `{j|...|j}` string interpolation. Same semantics — different tokenizer + parser, same `lib/ocaml/transpile.sx` output.

The cheapest user-facing payoff in the plan but only worthwhile if there's a concrete user goal (e.g. JSX-flavoured frontend syntax for SX components, attracting React refugees). Don't start without that target.

- [ ] **Tokenizer** in `lib/reasonml/tokenizer.sx`: `let x = e;`, `(x, y) => e`, `switch (x) { | Pat => e | ... }`, JSX, `{j|hello $(name)|j}`, `let f : int => int = x => x + 1`.
- [ ] **Parser** in `lib/reasonml/parser.sx`: produce same OCaml AST nodes; JSX → SX component calls (`<Comp x=1 />` → `(~comp :x 1)`); auto-curry multi-arg.
- [ ] Shared transpiler delegates to `lib/ocaml/transpile.sx`.
- [ ] Tests in `lib/reasonml/tests/` — 40+.

## The meta-circular angle

SX is bootstrapped to OCaml (`hosts/ocaml/`). Running OCaml inside SX running on OCaml is the "mother tongue" closure: OCaml → SX → OCaml. This means:

- The OCaml host's native pattern matching and ADTs are exact reference semantics for the SX-level implementation — any mismatch is a bug.
- The SX `match` / `define-type` primitives were built knowing OCaml was the intended target.
- When debugging the transpiler, the OCaml REPL is always available as oracle.
- The vendored testsuite slice (Phase 5.1) makes the oracle role mechanical, not just rhetorical.

## Key dependencies

- **lib-guest Steps 0–7** — must complete before OCaml-on-SX starts. OCaml consumes `lib/guest/lex.sx`, `lib/guest/pratt.sx`, `lib/guest/match.sx`. Hand-rolling defeats the substrate-validation goal.
- **Phase 6 ADT primitive** (`define-type`/`match`) in the SX core — required before Phase 3.
- **HO forms** and first-class lambdas — already in spec, no blocker.
- **Module system** (Phase 4) is independent of type inference (Phase 5) — can overlap.
- **lib-guest Step 8** (HM extraction) — *waits on this plan's Phase 5*. The two are paired.

## Progress log

_Newest first._

_(awaiting lib-guest Steps 0–7)_

## Blockers

_(none yet)_