rose-ash/plans/ocaml-on-sx.md
giles 9dd9fb9c37 plans: layered-stack framing + chisel sequence + loop scaffolding
Design + ops scaffolding for the next phase of work, none of it touching
substrate or guest code.

lib-guest.md: rewrites Architectural framing as a 5-layer stack
  (substrate → lib/guest → languages → shared/ → applications),
  recursive dependency-direction rule, scaled two-consumer rule. Adds
  Phase B (long-running stratification) with sub-layer matrix
  (core/typed/relational/effects/layout/lazy/oo), language profiles, and
  the long-running-discipline section. Preserves existing Phase A
  progress log and rules.

ocaml-on-sx.md: scope reduced to substrate validation + HM + reference
  oracle. Phases 1-5 + minimal stdlib slice + vendored testsuite slice.
  Dream carved out into dream-on-sx.md; Phase 8 (ReasonML) deferred.
  Records lib-guest sequencing dependency.

datalog-on-sx.md: adds Phase 4 built-in predicates + body arithmetic,
  Phase 6 magic sets, safety analysis in Phase 3, Non-goals section.

New chisel plans (forward-looking, not yet launchable):
  kernel-on-sx.md       — first-class everything, env-as-value endgame
  idris-on-sx.md        — dependent types, evidence chisel
  probabilistic-on-sx.md — weighted nondeterminism + traces
  maude-on-sx.md        — rewriting as primitive
  linear-on-sx.md       — resource model, artdag-relevant

Loop briefings (4 active, 1 cold):
  minikanren-loop.md, ocaml-loop.md, datalog-loop.md, elm-loop.md, koka-loop.md

Restore scripts mirror the loop pattern:
  restore-{minikanren,ocaml,datalog,jit-perf,lib-guest}.sh
  Each captures worktree state, plan progress, MCP health, tmux status.
  Includes the .mcp.json absolute-path patch instruction (fresh worktrees
  have no _build/, so the relative mcp_tree path fails on first launch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:27:50 +00:00


OCaml-on-SX: substrate validation + HM + reference oracle

The strict-ML answer to "does the SX substrate really do what we claim it does?" OCaml has exactly the feature set SX was designed around — CEK, records, ADTs, exceptions, modules, refs, strict evaluation — so implementing it on SX is the strongest possible test of the substrate. Phase 5 also produces a real Hindley-Milner inferencer that feeds back into lib/guest/hm.sx, and the resulting OCaml interpreter serves as a reference oracle for every other guest language (when SX behavior is ambiguous, native OCaml answers).

End-state goal: OCaml Phases 1-5 running on the SX CEK, with a vendored slice of the official OCaml testsuite as the oracle corpus. HM extracted into lib/guest/hm.sx once Haskell-on-SX adopts it as second consumer.

Out of scope (this plan): Dream web framework — moved to plans/dream-on-sx.md, only spins up if a target user appears. Full standard library — only the minimal slice needed for substrate validation and the oracle role.

Conditional: ReasonML syntax variant (Phase 8), kept in the plan but deferred until Phases 1-2 land and a decision is made to ship a user-facing OCaml.

What this covers that nothing else in the set does

  • Strict ML semantics — unlike Haskell, OCaml is call-by-value with explicit Lazy.t for laziness. Pattern match is exhaustive. Polymorphic variants. Structural equality.
  • First-class modules and functors — modules as values (Phase 4); functors as SX higher-order functions over module records. Unlike Haskell typeclasses, OCaml's module system is explicit and compositional. The hardest test of the substrate — if Phase 4 takes 3000 lines instead of 800, the substrate is telling us something.
  • Mutable state without monads: ref, :=, ! are primitives. Arrays. Hashtbl. The IO model is direct.
  • Reference oracle — when other guest languages disagree about a semantic edge case (HM in Haskell-on-SX vs in OCaml-on-SX, exception ordering, equality semantics), native OCaml is the tiebreaker. The vendored testsuite slice (Phase 5.1) makes this oracle role concrete.

Sequencing dependency

OCaml-on-SX should not start until lib-guest Steps 0-7 are complete. OCaml's tokenizer should consume lib/guest/lex.sx (lib-guest Step 3); its precedence parser should consume lib/guest/pratt.sx (Step 4); its pattern matcher should consume lib/guest/match.sx (Step 6). Starting OCaml early means it hand-rolls these and never validates the abstraction, losing one of the main strategic payoffs.

Reciprocally, lib-guest Step 8 (HM extraction) waits on OCaml-on-SX Phase 5 — extracting HM with only Haskell as consumer is speculative; with both Haskell and OCaml the two-language rule is satisfied for real.

Ground rules

  • Scope: only touch lib/ocaml/**, lib/reasonml/** (Phase 8 only), and plans/ocaml-on-sx.md. Do not edit spec/, hosts/, shared/, lib/dream/** (separate plan), or other lib/<lang>/.
  • Consume lib/guest/ wherever it covers a need (lex, pratt, match, ast). Hand-rolling instead of consuming defeats the substrate-validation goal.
  • Shared-file issues go under "Blockers" below with a minimal repro; do not fix here.
  • SX files: use sx-tree MCP tools only.
  • Architecture: OCaml source → AST → SX AST → CEK. No standalone OCaml evaluator. The OCaml AST is walked by an ocaml-eval function in SX that produces SX values.
  • Type system: deferred until Phase 5. Phases 1-4 are intentionally untyped: get the evaluator right first, then layer HM inference on top.
  • Commits: one feature per commit. Keep ## Progress log updated and tick boxes.

Architecture sketch

OCaml source text
    │
    ▼
lib/ocaml/tokenizer.sx   — keywords, operators, string/char literals, comments
    │                      (built on lib/guest/lex.sx)
    ▼
lib/ocaml/parser.sx      — OCaml AST: let/let rec, fun, match, if, begin/end,
    │                      module/struct/functor, type decls, expressions
    │                      (precedence via lib/guest/pratt.sx)
    ▼
lib/ocaml/desugar.sx     — surface → core: tuple patterns, or-patterns,
    │                      sequence (;) → (do), when guards, field punning
    ▼
lib/ocaml/transpile.sx   — OCaml AST → SX AST
    │
    ▼
lib/ocaml/runtime.sx     — ADT constructors, module primitives, ref/array ops,
    │                      minimal Stdlib shims (Phase 6)
    ▼
SX CEK evaluator (both JS and OCaml hosts)

Semantic mappings

OCaml construct            SX mapping
-------------------------  -----------------------------------------------------
let x = e (top-level)      (define x e)
let f x y = e              (define (f x y) e)
let rec f x = e            (define (f x) e); SX define is already recursive
fun x -> e                 (fn (x) e)
e1 |> f                    (f e1); pipe desugars to reverse application
e1; e2                     (do e1 e2)
begin e1; e2; e3 end       (do e1 e2 e3)
if c then e1 else e2       (if c e1 e2)
match x with | P -> e      (match x (P e) ...) via Phase 6 ADT primitive
type t = A | B of int      (define-type t (A) (B v))
module M = struct ... end  SX dict {:let-bindings ...}; module as record
functor (M : S) -> ...     (fn (M) ...); functor as SX lambda over module record
open M                     inject M's bindings into scope via env-merge
M.field                    (get M :field)
{ r with f = v }           (dict-set r :f v)
ref x                      (make-ref x); mutable cell
!r                         (deref-ref r)
r := v                     (set-ref! r v)
(a, b, c)                  tagged list (:tuple a b c)
[1; 2; 3]                  (list 1 2 3)
[| 1; 2; 3 |]              (make-array 1 2 3) (Phase 6)
try e with | Ex -> h       (guard (fn (ex) h) e) via SX exception system
raise Ex                   (perform (:raise Ex))
Printf.sprintf "%d" x      (format "%d" x)
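
A few of the mappings above can be modelled as a pretty-printer from a toy AST to SX text. This is only a sketch in native OCaml: the expr and decl types and the sx / sx_decl functions are hypothetical names invented for illustration, and the real transpiler (lib/ocaml/transpile.sx) would emit SX AST nodes rather than strings.

```ocaml
(* Hypothetical mini-AST covering a handful of rows from the mapping table. *)
type expr =
  | Var of string
  | Int of int
  | Fun of string * expr          (* fun x -> e *)
  | App of expr * expr            (* f x *)
  | Pipe of expr * expr           (* e1 |> f *)
  | Seq of expr list              (* e1; e2; ... *)
  | If of expr * expr * expr      (* if c then e1 else e2 *)
  | Tuple of expr list            (* (a, b, c) *)
  | Deref of expr                 (* !r *)
  | Assign of expr * expr         (* r := v *)

(* let name p1 p2 ... = body; an empty parameter list is a plain value. *)
type decl = Let of string * string list * expr

(* Render an expression as SX source text, one case per table row. *)
let rec sx = function
  | Var x -> x
  | Int n -> string_of_int n
  | Fun (x, e) -> Printf.sprintf "(fn (%s) %s)" x (sx e)
  | App (f, a) -> Printf.sprintf "(%s %s)" (sx f) (sx a)
  | Pipe (e, f) -> Printf.sprintf "(%s %s)" (sx f) (sx e)
  | Seq es -> Printf.sprintf "(do %s)" (String.concat " " (List.map sx es))
  | If (c, t, e) -> Printf.sprintf "(if %s %s %s)" (sx c) (sx t) (sx e)
  | Tuple es -> Printf.sprintf "(:tuple %s)" (String.concat " " (List.map sx es))
  | Deref r -> Printf.sprintf "(deref-ref %s)" (sx r)
  | Assign (r, v) -> Printf.sprintf "(set-ref! %s %s)" (sx r) (sx v)

(* Top-level let becomes define, with or without a parameter list. *)
let sx_decl (Let (name, params, body)) =
  match params with
  | [] -> Printf.sprintf "(define %s %s)" name (sx body)
  | ps ->
    Printf.sprintf "(define (%s %s) %s)" name (String.concat " " ps) (sx body)
```

For example, sx (Pipe (Var "x", Var "f")) yields "(f x)", and sx_decl (Let ("f", ["x"; "y"], Var "x")) yields "(define (f x y) x)".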

Roadmap

Phase 1 — Tokenizer + parser

  • Tokenizer built on lib/guest/lex.sx: keywords (let, rec, in, fun, function, match, with, type, of, module, struct, end, functor, sig, open, include, if, then, else, begin, try, exception, raise, mutable, for, while, do, done, and, as, when), operators (->, |>, <|, @@, @, :=, !, ::, **, :, ;, ;;), identifiers (lower, upper/ctor, labels ~label:, optional ?label:), char literals 'c', string literals (escaped + heredoc {|...|}), int/float literals, comments (* ... *), which nest (OCaml has no line-comment syntax).
  • Parser with precedence via lib/guest/pratt.sx: top-level let/let rec/type/module/exception/open/include declarations; expressions: literals, identifiers, constructor application, lambda, application (left-assoc), binary ops with precedence table, if/then/else, match/with, try/with, let/in, begin/end, fun/function, tuples, list literals, record literals/updates, field access, sequences ;, unit ().
  • Patterns: constructor, literal, variable, wildcard _, tuple, list cons ::, list literal, record, as, or-pattern P1 | P2, when guard.
  • OCaml is not indentation-sensitive — no layout algorithm needed.
  • Tests in lib/ocaml/tests/parse.sx — 50+ round-trip parse tests.
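
Nested block comments are the one tokenizer item that needs a depth counter rather than a simple scan for the closing delimiter. A minimal sketch in native OCaml follows; skip_comment is a hypothetical helper name, and the sketch deliberately ignores string literals inside comments, which the real OCaml lexer also tracks.

```ocaml
(* Given a string and the index of an opening "(*", return the index just
   past the matching "*)", or None if the comment is unterminated or no
   comment starts at that index. Depth is incremented on every nested "(*". *)
let skip_comment (s : string) (i : int) : int option =
  let n = String.length s in
  let rec go pos depth =
    if depth = 0 then Some pos
    else if pos + 1 >= n then None
    else if s.[pos] = '(' && s.[pos + 1] = '*' then go (pos + 2) (depth + 1)
    else if s.[pos] = '*' && s.[pos + 1] = ')' then go (pos + 2) (depth - 1)
    else go (pos + 1) depth
  in
  if i + 1 < n && s.[i] = '(' && s.[i + 1] = '*' then go (i + 2) 1 else None
```

On the input "(* a (* b *) c *) x" starting at index 0, the function returns Some 17, the position of the space before x.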

Phase 2 — Core evaluator (untyped)

  • ocaml-eval entry: walks OCaml AST, produces SX values.
  • let/let rec/let ... in (mutually recursive with and).
  • Lambda + application (curried by default — auto-curry multi-param defs).
  • fun/function (single-arg lambda with immediate match on arg).
  • if/then/else, begin/end, sequence ;.
  • Arithmetic, comparison, boolean ops, string ^, mod.
  • Unit () value; ignore.
  • References: ref, !, :=.
  • Mutable record fields.
  • for i = lo to hi do ... done loop; while cond do ... done.
  • try/with — maps to SX guard; raise via perform.
  • Tests in lib/ocaml/tests/eval.sx — 50+ tests, pure + imperative.
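
The "curried by default" item can be illustrated with a toy environment-passing evaluator. This is a sketch in native OCaml, not the SX implementation: the expr and value types and the curry and eval functions are hypothetical, and the real ocaml-eval runs on the SX CEK rather than recursing in the host.

```ocaml
type expr =
  | Int of int
  | Var of string
  | Lam of string * expr          (* single-argument lambda *)
  | App of expr * expr
  | Add of expr * expr

type value =
  | VInt of int
  | VClosure of string * expr * env
and env = (string * value) list

(* Auto-curry: let f x y = body is treated as fun x -> fun y -> body. *)
let curry (params : string list) (body : expr) : expr =
  List.fold_right (fun p acc -> Lam (p, acc)) params body

let rec eval (env : env) (e : expr) : value =
  match e with
  | Int n -> VInt n
  | Var x -> List.assoc x env
  | Lam (x, b) -> VClosure (x, b, env)
  | Add (a, b) ->
    (match eval env a, eval env b with
     | VInt x, VInt y -> VInt (x + y)
     | _ -> failwith "type error: + expects ints")
  | App (f, a) ->
    (match eval env f with
     | VClosure (x, b, cenv) -> eval ((x, eval env a) :: cenv) b
     | VInt _ -> failwith "type error: not a function")
```

With let add = curry ["x"; "y"] (Add (Var "x", Var "y")), applying add to Int 2 alone yields a closure over y, and applying that closure to Int 3 yields VInt 5, which is the partial-application behaviour the bullet asks for.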

Phase 3 — ADTs + pattern matching

  • type declarations: type t = A | B of t1 * t2 | C of { x: int }.
  • Constructors as tagged lists: A → (:A), B(1, "x") → (:B 1 "x").
  • match/with consumes lib/guest/match.sx: constructor, literal, variable, wildcard, tuple, list cons/nil, as binding, or-patterns, nested patterns, when guard.
  • Exhaustiveness: runtime error on incomplete match (no compile-time check yet).
  • Built-in types: option (None/Some), result (Ok/Error), list (nil/cons), bool, unit, exn.
  • exception declarations; built-in: Not_found, Invalid_argument, Failure, Match_failure.
  • Polymorphic variants (surface syntax `Tag value; runtime same tagged list).
  • Tests in lib/ocaml/tests/adt.sx — 40+ tests: ADTs, match, option/result.
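
The tagged-list representation and the runtime-only exhaustiveness check can be modelled in a few lines. This is a sketch in native OCaml under stated assumptions: the value and pattern types and the match_pat and first_match functions are hypothetical stand-ins, and the real matcher is expected to come from lib/guest/match.sx.

```ocaml
type value =
  | Int of int
  | Str of string
  | Tagged of string * value list     (* models (:Tag arg ...) *)

type pattern =
  | PWild
  | PVar of string
  | PInt of int
  | PCtor of string * pattern list
  | POr of pattern * pattern

(* Return the variable bindings if the pattern matches, None otherwise. *)
let rec match_pat (pat : pattern) (v : value) : (string * value) list option =
  match pat, v with
  | PWild, _ -> Some []
  | PVar x, _ -> Some [ (x, v) ]
  | PInt n, Int m when n = m -> Some []
  | PCtor (t, ps), Tagged (t', vs)
    when t = t' && List.length ps = List.length vs ->
    List.fold_left2
      (fun acc p arg ->
         match acc with
         | None -> None
         | Some bs ->
           (match match_pat p arg with
            | None -> None
            | Some bs' -> Some (bs @ bs')))
      (Some []) ps vs
  | POr (p1, p2), _ ->
    (match match_pat p1 v with
     | Some bs -> Some bs
     | None -> match_pat p2 v)
  | _ -> None

(* Try arms in order; an incomplete match only fails at runtime. *)
let rec first_match arms v =
  match arms with
  | [] -> failwith "Match_failure"
  | (pat, body) :: rest ->
    (match match_pat pat v with
     | Some bindings -> body bindings
     | None -> first_match rest v)
```

Matching PCtor ("Some", [PVar "x"]) against Tagged ("Some", [Int 1]) binds x to Int 1, while a match whose only arm is None fails with "Match_failure" when handed a Some value.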

Phase 4 — Modules + functors

The hardest test of the substrate. First-class modules + functors are where the SX/CEK story either works elegantly or reveals a missing piece. Track line count vs equivalent OCaml stdlib implementations as the substrate-validation signal.

  • module M = struct let x = 1 let f y = x + y end → SX dict {:x 1 :f <fn>}.
  • module type S = sig val x : int val f : int -> int end → interface record (runtime stub; typed checking in Phase 5).
  • module M : S = struct ... end — coercive sealing (runtime: pass-through).
  • functor (M : S) -> struct ... end → SX (fn (M) ...).
  • module F = Functor(Base) — functor application.
  • open M — merge M's dict into current env (env-merge).
  • include M — same as open at structure level.
  • M.name — dict get via :name key.
  • First-class modules (pack/unpack) — deferred to Phase 5.
  • Standard module hierarchy stubs: List, Option, Result, String, Int, Printf, Hashtbl (filled in Phase 6).
  • Tests in lib/ocaml/tests/modules.sx — 30+ tests.
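
The module-as-dict and functor-as-lambda encodings can be previewed with a small model. This is a sketch in native OCaml: an association list stands in for the SX dict, and the names value, get, base, twice_functor, and open_module are hypothetical, chosen only for this illustration.

```ocaml
type value =
  | Int of int
  | Fn of (value -> value)
  | Module of (string * value) list   (* stand-in for an SX dict *)

(* M.name: look a field up in the module record. *)
let get (m : value) (name : string) : value =
  match m with
  | Module fields -> List.assoc name fields
  | _ -> failwith "not a module"

(* module Base = struct let x = 1 let f y = x + y end *)
let base : value =
  let x = 1 in
  Module
    [ ("x", Int x);
      ("f", Fn (function Int y -> Int (x + y) | _ -> failwith "int expected")) ]

(* functor (M : S) -> struct let twice y = M.f (M.f y) end,
   modelled as an ordinary function from module record to module record. *)
let twice_functor (m : value) : value =
  let f =
    match get m "f" with
    | Fn f -> f
    | _ -> failwith "f must be a function"
  in
  Module [ ("twice", Fn (fun y -> f (f y))) ]

(* open M: module bindings shadow whatever the environment already held. *)
let open_module (m : value) (env : (string * value) list) =
  match m with
  | Module fields -> fields @ env
  | _ -> failwith "not a module"
```

Applying twice_functor to base and calling its twice field on Int 10 gives Int 12. Sealing against a signature is a pass-through in this model, matching the runtime behaviour described for Phase 4.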

Phase 5 — Hindley-Milner type inference

This is one of the headline payoffs of the whole plan. The inferencer built here is the seed of lib/guest/hm.sx (lib-guest Step 8) — once Haskell-on-SX adopts it as second consumer, it gets extracted.

  • Algorithm W: gen/inst, unify, infer-expr, infer-decl.
  • Type variables: 'a, 'b; unification with occur-check.
  • Let-polymorphism: generalise at let-bindings.
  • ADT types: type 'a option = None | Some of 'a.
  • Function types, tuple types, record types.
  • Type signatures: val f : int -> int — verify against inferred type.
  • Module type checking: seal against sig (Phase 4 stubs become real checks).
  • Error reporting: position-tagged errors with expected vs actual types.
  • First-class modules: (module M : S) pack; (val m : (module S)) unpack.
  • No rank-2 polymorphism, no GADTs (out of scope).
  • Tests in lib/ocaml/tests/types.sx — 60+ inference tests.
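
The unification step with occurs check is the core of the inferencer. The sketch below is native OCaml over a deliberately tiny type language (int, type variables, arrows); the ty type and the apply, occurs, and unify functions are hypothetical names, and generalisation and instantiation are left out.

```ocaml
type ty =
  | TInt
  | TVar of int
  | TArrow of ty * ty

exception Type_error of string

(* A substitution as an association list from variable id to type. *)
type subst = (int * ty) list

(* Resolve every bound variable in a type, following chains of bindings. *)
let rec apply (s : subst) (t : ty) : ty =
  match t with
  | TInt -> TInt
  | TVar v ->
    (match List.assoc_opt v s with
     | Some t' -> apply s t'
     | None -> t)
  | TArrow (a, b) -> TArrow (apply s a, apply s b)

(* Does variable v appear anywhere inside the type? *)
let rec occurs (v : int) (t : ty) : bool =
  match t with
  | TInt -> false
  | TVar w -> v = w
  | TArrow (a, b) -> occurs v a || occurs v b

(* Extend the substitution so both types become equal, or raise. *)
let rec unify (s : subst) (t1 : ty) (t2 : ty) : subst =
  match apply s t1, apply s t2 with
  | TInt, TInt -> s
  | TVar a, TVar b when a = b -> s
  | TVar a, t | t, TVar a ->
    if occurs a t then raise (Type_error "occurs check") else (a, t) :: s
  | TArrow (a1, b1), TArrow (a2, b2) -> unify (unify s a1 a2) b1 b2
  | _ -> raise (Type_error "cannot unify")
```

Unifying 'a -> int with int -> 'b binds both variables to int, while unifying 'a with 'a -> int is rejected by the occurs check, which is what prevents infinite types such as the one fun x -> x x would otherwise require.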

Phase 5.1 — Vendor OCaml testsuite slice (oracle corpus)

The oracle role only works against a real test corpus. Vendor a slice of the official OCaml testsuite (from ocaml/ocaml testsuite/tests/).

  • Pick ~100-200 tests covering: basic eval, ADTs, modules, functors, pattern matching, exceptions, refs, simple stdlib (List, Option, Result, String). Skip tests that depend on Phase 6 stdlib not implemented or on out-of-scope features (GADTs, objects, Lwt, Unix module, etc.).
  • Vendored at lib/ocaml/testsuite/ with a manifest of which tests are included and why each excluded test was dropped.
  • lib/ocaml/conformance.sh runs the slice via the epoch protocol, writes lib/ocaml/scoreboard.{json,md}.
  • Each iteration after Phase 5.1 lands: scoreboard is the regression bar, just like other guests.
  • License: official OCaml testsuite is LGPL — confirm rose-ash repo can vendor LGPL test files (header preserved). If not, write equivalent tests from scratch sourced from the OCaml manual.

Phase 6 — Minimal stdlib slice

Trimmed from the original 150+ functions to ~30 — only what HM tests, the Phase 5.1 testsuite slice, and the oracle role need. Full stdlib (Hashtbl.iter, Map.Make, Set.Make, Format, Sys, Bytes, …) becomes a conditional follow-on if a target user appears.

  • List: map, filter, fold_left, fold_right, length, rev, append, iter, for_all, exists, find_opt, mem.
  • Option: map, bind, get, value, is_none, is_some.
  • Result: map, bind, get_ok, get_error, is_ok, is_error.
  • String: length, sub, concat, split_on_char, trim.
  • Printf: sprintf only — wires to SX (format ...).
  • Hashtbl: create, add, find_opt, replace, mem — backed by SX mutable dict.
  • Tests in lib/ocaml/tests/stdlib.sx — 40+ tests across the slice. Phase 5.1 testsuite slice exercises these in real programs.
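
As a rough idea of what a program confined to this slice looks like, here is a small native OCaml sample that uses only functions from the lists above. The function names (word_lengths, total, first_long, counts, report) are made up for illustration, and the Option module assumes OCaml 4.08 or later on the native side.

```ocaml
(* Lengths of the space-separated words in a string. *)
let word_lengths (s : string) : int list =
  String.split_on_char ' ' (String.trim s)
  |> List.filter (fun w -> String.length w > 0)
  |> List.map String.length

(* Sum of all word lengths. *)
let total (s : string) : int = List.fold_left ( + ) 0 (word_lengths s)

(* Length of the first word longer than three characters, or 0. *)
let first_long (s : string) : int =
  String.split_on_char ' ' s
  |> List.find_opt (fun w -> String.length w > 3)
  |> Option.map String.length
  |> Option.value ~default:0

(* Occurrence counts, backed by a mutable hash table. *)
let counts words =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun w ->
       let n = Option.value ~default:0 (Hashtbl.find_opt tbl w) in
       Hashtbl.replace tbl w (n + 1))
    words;
  tbl

(* Printf.sprintf is the only formatting entry point in the slice. *)
let report (s : string) : string = Printf.sprintf "%d" (total s)
```

Running the same file natively and under OCaml-on-SX and comparing results is the kind of check the oracle role implies; for instance, total "  ab cde  " is 5 and first_long "a bb cccc dd" is 4.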

Phase 7 — Dream web framework

Moved to plans/dream-on-sx.md. Spins up only if a target user appears. The plan there inherits OCaml-on-SX Phases 1-5 + the Phase 6 slice plus whatever additional stdlib Dream needs (likely Bytes, Format, more String, Sys.argv).

Phase 8 — ReasonML syntax variant [deferred]

[deferred: depends on Phases 1-2 landing + decision to ship a user-facing OCaml].

ReasonML is OCaml with a JS-friendly surface: semicolons, let with = everywhere, => for lambdas, switch for match, {j|...|j} string interpolation. Same semantics — different tokenizer + parser, same lib/ocaml/transpile.sx output.

The cheapest user-facing payoff in the plan but only worthwhile if there's a concrete user goal (e.g. JSX-flavoured frontend syntax for SX components, attracting React refugees). Don't start without that target.

  • Tokenizer in lib/reasonml/tokenizer.sx: let x = e;, (x, y) => e, switch (x) { | Pat => e | ... }, JSX, {j|hello $(name)|j}, let f : int => int = x => x + 1.
  • Parser in lib/reasonml/parser.sx: produce same OCaml AST nodes; JSX → SX component calls (<Comp x=1 /> → (~comp :x 1)); auto-curry multi-arg.
  • Shared transpiler delegates to lib/ocaml/transpile.sx.
  • Tests in lib/reasonml/tests/ — 40+.

The meta-circular angle

SX is bootstrapped to OCaml (hosts/ocaml/). Running OCaml inside SX running on OCaml is the "mother tongue" closure: OCaml → SX → OCaml. This means:

  • The OCaml host's native pattern matching and ADTs are exact reference semantics for the SX-level implementation — any mismatch is a bug.
  • The SX match / define-type primitives were built knowing OCaml was the intended target.
  • When debugging the transpiler, the OCaml REPL is always available as oracle.
  • The vendored testsuite slice (Phase 5.1) makes the oracle role mechanical, not just rhetorical.

Key dependencies

  • lib-guest Steps 0-7: must complete before OCaml-on-SX starts. OCaml consumes lib/guest/lex.sx, lib/guest/pratt.sx, lib/guest/match.sx. Hand-rolling defeats the substrate-validation goal.
  • Phase 6 ADT primitive (define-type/match) in the SX core — required before Phase 3.
  • HO forms and first-class lambdas — already in spec, no blocker.
  • Module system (Phase 4) is independent of type inference (Phase 5) — can overlap.
  • lib-guest Step 8 (HM extraction) — waits on this plan's Phase 5. The two are paired.

Progress log

Newest first.

(awaiting lib-guest Steps 0-7)

Blockers

(none yet)