Files
rose-ash/plans/ocaml-on-sx.md
giles 9dd9fb9c37 plans: layered-stack framing + chisel sequence + loop scaffolding
Design + ops scaffolding for the next phase of work, none of it touching
substrate or guest code.

lib-guest.md: rewrites Architectural framing as a 5-layer stack
  (substrate → lib/guest → languages → shared/ → applications),
  recursive dependency-direction rule, scaled two-consumer rule. Adds
  Phase B (long-running stratification) with sub-layer matrix
  (core/typed/relational/effects/layout/lazy/oo), language profiles, and
  the long-running-discipline section. Preserves existing Phase A
  progress log and rules.

ocaml-on-sx.md: scope reduced to substrate validation + HM + reference
  oracle. Phases 1-5 + minimal stdlib slice + vendored testsuite slice.
  Dream carved out into dream-on-sx.md; Phase 8 (ReasonML) deferred.
  Records lib-guest sequencing dependency.

datalog-on-sx.md: adds Phase 4 built-in predicates + body arithmetic,
  Phase 6 magic sets, safety analysis in Phase 3, Non-goals section.

New chisel plans (forward-looking, not yet launchable):
  kernel-on-sx.md       — first-class everything, env-as-value endgame
  idris-on-sx.md        — dependent types, evidence chisel
  probabilistic-on-sx.md — weighted nondeterminism + traces
  maude-on-sx.md        — rewriting as primitive
  linear-on-sx.md       — resource model, artdag-relevant

Loop briefings (4 active, 1 cold):
  minikanren-loop.md, ocaml-loop.md, datalog-loop.md, elm-loop.md, koka-loop.md

Restore scripts mirror the loop pattern:
  restore-{minikanren,ocaml,datalog,jit-perf,lib-guest}.sh
  Each captures worktree state, plan progress, MCP health, tmux status.
  Includes the .mcp.json absolute-path patch instruction (fresh worktrees
  have no _build/, so the relative mcp_tree path fails on first launch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:27:50 +00:00

221 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# OCaml-on-SX: substrate validation + HM + reference oracle
The strict-ML answer to "does the SX substrate really do what we claim it does?" OCaml has *exactly* the feature set SX was designed around — CEK, records, ADTs, exceptions, modules, refs, strict evaluation — so implementing it on SX is the strongest possible test of the substrate. Phase 5 also produces a real Hindley-Milner inferencer that feeds back into `lib/guest/hm.sx`, and the resulting OCaml interpreter serves as a reference oracle for every other guest language (when SX behavior is ambiguous, native OCaml answers).
**End-state goal:** OCaml Phases 15 running on the SX CEK, with a vendored slice of the official OCaml testsuite as the oracle corpus. HM extracted into `lib/guest/hm.sx` once Haskell-on-SX adopts it as second consumer.
**Out of scope (this plan):** Dream web framework — moved to `plans/dream-on-sx.md`, only spins up if a target user appears. Full standard library — only the minimal slice needed for substrate validation and the oracle role.
**Conditional:** ReasonML syntax variant (Phase 8) — kept in the plan but deferred until Phases 12 land and a decision is made to ship a user-facing OCaml.
## What this covers that nothing else in the set does
- **Strict ML semantics** — unlike Haskell, OCaml is call-by-value with explicit `Lazy.t` for laziness. Pattern match is exhaustive. Polymorphic variants. Structural equality.
- **First-class modules and functors** — modules as values (Phase 4); functors as SX higher-order functions over module records. Unlike Haskell typeclasses, OCaml's module system is explicit and compositional. **The hardest test of the substrate** — if Phase 4 takes 3000 lines instead of 800, the substrate is telling us something.
- **Mutable state without monads** — `ref`, `:=`, `!` are primitives. Arrays. `Hashtbl`. The IO model is direct.
- **Reference oracle** — when other guest languages disagree about a semantic edge case (HM in Haskell-on-SX vs in OCaml-on-SX, exception ordering, equality semantics), native OCaml is the tiebreaker. The vendored testsuite slice (Phase 5.1) makes this oracle role concrete.
## Sequencing dependency
**OCaml-on-SX should not start until lib-guest Steps 07 are complete.** OCaml's tokenizer should consume `lib/guest/lex.sx` (lib-guest Step 3); its precedence parser should consume `lib/guest/pratt.sx` (Step 4); its pattern matcher should consume `lib/guest/match.sx` (Step 6). Starting OCaml early means it hand-rolls these and never validates the abstraction — losing one of the main strategic payoffs.
Reciprocally, **lib-guest Step 8 (HM extraction) waits on OCaml-on-SX Phase 5** — extracting HM with only Haskell as consumer is speculative; with both Haskell and OCaml the two-language rule is satisfied for real.
## Ground rules
- **Scope:** only touch `lib/ocaml/**`, `lib/reasonml/**` (Phase 8 only), and `plans/ocaml-on-sx.md`. Do **not** edit `spec/`, `hosts/`, `shared/`, `lib/dream/**` (separate plan), or other `lib/<lang>/`.
- **Consume `lib/guest/`** wherever it covers a need (lex, pratt, match, ast). Hand-rolling instead of consuming defeats the substrate-validation goal.
- **Shared-file issues** go under "Blockers" below with a minimal repro; do not fix here.
- **SX files:** use `sx-tree` MCP tools only.
- **Architecture:** OCaml source → AST → SX AST → CEK. No standalone OCaml evaluator. The OCaml AST is walked by an `ocaml-eval` function in SX that produces SX values.
- **Type system:** deferred until Phase 5. Phases 14 are intentionally untyped — get the evaluator right first, then layer HM inference on top.
- **Commits:** one feature per commit. Keep `## Progress log` updated and tick boxes.
## Architecture sketch
```
OCaml source text
lib/ocaml/tokenizer.sx — keywords, operators, string/char literals, comments
│ (built on lib/guest/lex.sx)
lib/ocaml/parser.sx — OCaml AST: let/let rec, fun, match, if, begin/end,
│ module/struct/functor, type decls, expressions
│ (precedence via lib/guest/pratt.sx)
lib/ocaml/desugar.sx — surface → core: tuple patterns, or-patterns,
│ sequence (;) → (do), when guards, field punning
lib/ocaml/transpile.sx — OCaml AST → SX AST
lib/ocaml/runtime.sx — ADT constructors, module primitives, ref/array ops,
│ minimal Stdlib shims (Phase 6)
SX CEK evaluator (both JS and OCaml hosts)
```
## Semantic mappings
| OCaml construct | SX mapping |
|----------------|-----------|
| `let x = e` (top-level) | `(define x e)` |
| `let f x y = e` | `(define (f x y) e)` |
| `let rec f x = e` | `(define (f x) e)` — SX define is already recursive |
| `fun x -> e` | `(fn (x) e)` |
| `e1 \|> f` | `(f e1)` — pipe desugars to reverse application |
| `e1; e2` | `(do e1 e2)` |
| `begin e1; e2; e3 end` | `(do e1 e2 e3)` |
| `if c then e1 else e2` | `(if c e1 e2)` |
| `match x with \| P -> e` | `(match x (P e) ...)` via Phase 6 ADT primitive |
| `type t = A \| B of int` | `(define-type t (A) (B v))` |
| `module M = struct ... end` | SX dict `{:let-bindings ...}` — module as record |
| `functor (M : S) -> ...` | `(fn (M) ...)` — functor as SX lambda over module record |
| `open M` | inject M's bindings into scope via `env-merge` |
| `M.field` | `(get M :field)` |
| `{ r with f = v }` | `(dict-set r :f v)` |
| `ref x` | `(make-ref x)` — mutable cell |
| `!r` | `(deref-ref r)` |
| `r := v` | `(set-ref! r v)` |
| `(a, b, c)` | tagged list `(:tuple a b c)` |
| `[1; 2; 3]` | `(list 1 2 3)` |
| `[\| 1; 2; 3 \|]` | `(make-array 1 2 3)` (Phase 6) |
| `try e with \| Ex -> h` | `(guard (fn (ex) h) e)` via SX exception system |
| `raise Ex` | `(perform (:raise Ex))` |
| `Printf.sprintf "%d" x` | `(format "%d" x)` |
## Roadmap
### Phase 1 — Tokenizer + parser
- [ ] **Tokenizer** built on `lib/guest/lex.sx`: keywords (`let`, `rec`, `in`, `fun`, `function`, `match`, `with`, `type`, `of`, `module`, `struct`, `end`, `functor`, `sig`, `open`, `include`, `if`, `then`, `else`, `begin`, `try`, `exception`, `raise`, `mutable`, `for`, `while`, `do`, `done`, `and`, `as`, `when`), operators (`->`, `|>`, `<|`, `@@`, `@`, `:=`, `!`, `::`, `**`, `:`, `;`, `;;`), identifiers (lower, upper/ctor, labels `~label:`, optional `?label:`), char literals `'c'`, string literals (escaped + heredoc `{|...|}`), int/float literals, line comments `(*` nested block comments `*)`.
- [ ] **Parser** with precedence via `lib/guest/pratt.sx`: top-level `let`/`let rec`/`type`/`module`/`exception`/`open`/`include` declarations; expressions: literals, identifiers, constructor application, lambda, application (left-assoc), binary ops with precedence table, `if`/`then`/`else`, `match`/`with`, `try`/`with`, `let`/`in`, `begin`/`end`, `fun`/`function`, tuples, list literals, record literals/updates, field access, sequences `;`, unit `()`.
- [ ] **Patterns:** constructor, literal, variable, wildcard `_`, tuple, list cons `::`, list literal, record, `as`, or-pattern `P1 | P2`, `when` guard.
- [ ] OCaml is **not** indentation-sensitive — no layout algorithm needed.
- [ ] Tests in `lib/ocaml/tests/parse.sx` — 50+ round-trip parse tests.
### Phase 2 — Core evaluator (untyped)
- [ ] `ocaml-eval` entry: walks OCaml AST, produces SX values.
- [ ] `let`/`let rec`/`let ... in` (mutually recursive with `and`).
- [ ] Lambda + application (curried by default — auto-curry multi-param defs).
- [ ] `fun`/`function` (single-arg lambda with immediate match on arg).
- [ ] `if`/`then`/`else`, `begin`/`end`, sequence `;`.
- [ ] Arithmetic, comparison, boolean ops, string `^`, `mod`.
- [ ] Unit `()` value; `ignore`.
- [ ] References: `ref`, `!`, `:=`.
- [ ] Mutable record fields.
- [ ] `for i = lo to hi do ... done` loop; `while cond do ... done`.
- [ ] `try`/`with` — maps to SX `guard`; `raise` via perform.
- [ ] Tests in `lib/ocaml/tests/eval.sx` — 50+ tests, pure + imperative.
### Phase 3 — ADTs + pattern matching
- [ ] `type` declarations: `type t = A | B of t1 * t2 | C of { x: int }`.
- [ ] Constructors as tagged lists: `A``(:A)`, `B(1, "x")``(:B 1 "x")`.
- [ ] `match`/`with` consumes `lib/guest/match.sx`: constructor, literal, variable, wildcard, tuple, list cons/nil, `as` binding, or-patterns, nested patterns, `when` guard.
- [ ] Exhaustiveness: runtime error on incomplete match (no compile-time check yet).
- [ ] Built-in types: `option` (`None`/`Some`), `result` (`Ok`/`Error`), `list` (nil/cons), `bool`, `unit`, `exn`.
- [ ] `exception` declarations; built-in: `Not_found`, `Invalid_argument`, `Failure`, `Match_failure`.
- [ ] Polymorphic variants (surface syntax `` `Tag value ``; runtime same tagged list).
- [ ] Tests in `lib/ocaml/tests/adt.sx` — 40+ tests: ADTs, match, option/result.
### Phase 4 — Modules + functors
**The hardest test of the substrate.** First-class modules + functors are where the SX/CEK story either works elegantly or reveals a missing piece. Track line count vs equivalent OCaml stdlib implementations as the substrate-validation signal.
- [ ] `module M = struct let x = 1 let f y = x + y end` → SX dict `{:x 1 :f <fn>}`.
- [ ] `module type S = sig val x : int val f : int -> int end` → interface record (runtime stub; typed checking in Phase 5).
- [ ] `module M : S = struct ... end` — coercive sealing (runtime: pass-through).
- [ ] `functor (M : S) -> struct ... end` → SX `(fn (M) ...)`.
- [ ] `module F = Functor(Base)` — functor application.
- [ ] `open M` — merge M's dict into current env (`env-merge`).
- [ ] `include M` — same as open at structure level.
- [ ] `M.name` — dict get via `:name` key.
- [ ] First-class modules (pack/unpack) — deferred to Phase 5.
- [ ] Standard module hierarchy stubs: `List`, `Option`, `Result`, `String`, `Int`, `Printf`, `Hashtbl` (filled in Phase 6).
- [ ] Tests in `lib/ocaml/tests/modules.sx` — 30+ tests.
### Phase 5 — Hindley-Milner type inference
This is one of the headline payoffs of the whole plan. The inferencer built here is the seed of `lib/guest/hm.sx` (lib-guest Step 8) — once Haskell-on-SX adopts it as second consumer, it gets extracted.
- [ ] Algorithm W: `gen`/`inst`, `unify`, `infer-expr`, `infer-decl`.
- [ ] Type variables: `'a`, `'b`; unification with occur-check.
- [ ] Let-polymorphism: generalise at let-bindings.
- [ ] ADT types: `type 'a option = None | Some of 'a`.
- [ ] Function types, tuple types, record types.
- [ ] Type signatures: `val f : int -> int` — verify against inferred type.
- [ ] Module type checking: seal against `sig` (Phase 4 stubs become real checks).
- [ ] Error reporting: position-tagged errors with expected vs actual types.
- [ ] First-class modules: `(module M : S)` pack; `(val m : (module S))` unpack.
- [ ] No rank-2 polymorphism, no GADTs (out of scope).
- [ ] Tests in `lib/ocaml/tests/types.sx` — 60+ inference tests.
### Phase 5.1 — Vendor OCaml testsuite slice (oracle corpus)
The oracle role only works against a real test corpus. Vendor a slice of the official OCaml testsuite (from `ocaml/ocaml` `testsuite/tests/`).
- [ ] Pick ~100200 tests covering: basic eval, ADTs, modules, functors, pattern matching, exceptions, refs, simple stdlib (List, Option, Result, String). Skip tests that depend on Phase 6 stdlib not implemented or on out-of-scope features (GADTs, objects, Lwt, Unix module, etc.).
- [ ] Vendored at `lib/ocaml/testsuite/` with a manifest of which tests are included and why each excluded test was dropped.
- [ ] `lib/ocaml/conformance.sh` runs the slice via the epoch protocol, writes `lib/ocaml/scoreboard.{json,md}`.
- [ ] Each iteration after Phase 5.1 lands: scoreboard is the regression bar, just like other guests.
- [ ] License: official OCaml testsuite is LGPL — confirm rose-ash repo can vendor LGPL test files (header preserved). If not, write equivalent tests from scratch sourced from the OCaml manual.
### Phase 6 — Minimal stdlib slice
**Trimmed from the original 150+ functions to ~30** — only what HM tests, the Phase 5.1 testsuite slice, and the oracle role need. Full stdlib (`Hashtbl.iter`, `Map.Make`, `Set.Make`, `Format`, `Sys`, `Bytes`, …) becomes a conditional follow-on if a target user appears.
- [ ] `List`: `map`, `filter`, `fold_left`, `fold_right`, `length`, `rev`, `append`, `iter`, `for_all`, `exists`, `find_opt`, `mem`.
- [ ] `Option`: `map`, `bind`, `get`, `value`, `is_none`, `is_some`.
- [ ] `Result`: `map`, `bind`, `get_ok`, `get_error`, `is_ok`, `is_error`.
- [ ] `String`: `length`, `sub`, `concat`, `split_on_char`, `trim`.
- [ ] `Printf`: `sprintf` only — wires to SX `(format ...)`.
- [ ] `Hashtbl`: `create`, `add`, `find_opt`, `replace`, `mem` — backed by SX mutable dict.
- [ ] Tests in `lib/ocaml/tests/stdlib.sx` — 40+ tests across the slice. Phase 5.1 testsuite slice exercises these in real programs.
### Phase 7 — Dream web framework
**Moved to `plans/dream-on-sx.md`.** Spins up only if a target user appears. The plan there inherits OCaml-on-SX Phases 15 + the Phase 6 slice plus whatever additional stdlib Dream needs (likely `Bytes`, `Format`, more `String`, `Sys.argv`).
### Phase 8 — ReasonML syntax variant `[deferred]`
`[deferred — depends on Phases 12 landing + decision to ship a user-facing OCaml]`.
ReasonML is OCaml with a JS-friendly surface: semicolons, `let` with `=` everywhere, `=>` for lambdas, `switch` for match, `{j|...|j}` string interpolation. Same semantics — different tokenizer + parser, same `lib/ocaml/transpile.sx` output.
The cheapest user-facing payoff in the plan but only worthwhile if there's a concrete user goal (e.g. JSX-flavoured frontend syntax for SX components, attracting React refugees). Don't start without that target.
- [ ] **Tokenizer** in `lib/reasonml/tokenizer.sx`: `let x = e;`, `(x, y) => e`, `switch (x) { | Pat => e | ... }`, JSX, `{j|hello $(name)|j}`, `let f : int => int = x => x + 1`.
- [ ] **Parser** in `lib/reasonml/parser.sx`: produce same OCaml AST nodes; JSX → SX component calls (`<Comp x=1 />``(~comp :x 1)`); auto-curry multi-arg.
- [ ] Shared transpiler delegates to `lib/ocaml/transpile.sx`.
- [ ] Tests in `lib/reasonml/tests/` — 40+.
## The meta-circular angle
SX is bootstrapped to OCaml (`hosts/ocaml/`). Running OCaml inside SX running on OCaml is the "mother tongue" closure: OCaml → SX → OCaml. This means:
- The OCaml host's native pattern matching and ADTs are exact reference semantics for the SX-level implementation — any mismatch is a bug.
- The SX `match` / `define-type` primitives were built knowing OCaml was the intended target.
- When debugging the transpiler, the OCaml REPL is always available as oracle.
- The vendored testsuite slice (Phase 5.1) makes the oracle role mechanical, not just rhetorical.
## Key dependencies
- **lib-guest Steps 07** — must complete before OCaml-on-SX starts. OCaml consumes `lib/guest/lex.sx`, `lib/guest/pratt.sx`, `lib/guest/match.sx`. Hand-rolling defeats the substrate-validation goal.
- **Phase 6 ADT primitive** (`define-type`/`match`) in the SX core — required before Phase 3.
- **HO forms** and first-class lambdas — already in spec, no blocker.
- **Module system** (Phase 4) is independent of type inference (Phase 5) — can overlap.
- **lib-guest Step 8** (HM extraction) — *waits on this plan's Phase 5*. The two are paired.
## Progress log
_Newest first._
_(awaiting lib-guest Steps 07)_
## Blockers
_(none yet)_