Design + ops scaffolding for the next phase of work, none of it touching
substrate or guest code.
lib-guest.md: rewrites Architectural framing as a 5-layer stack
(substrate → lib/guest → languages → shared/ → applications),
recursive dependency-direction rule, scaled two-consumer rule. Adds
Phase B (long-running stratification) with sub-layer matrix
(core/typed/relational/effects/layout/lazy/oo), language profiles, and
the long-running-discipline section. Preserves existing Phase A
progress log and rules.
ocaml-on-sx.md: scope reduced to substrate validation + HM + reference
oracle. Phases 1-5 + minimal stdlib slice + vendored testsuite slice.
Dream carved out into dream-on-sx.md; Phase 8 (ReasonML) deferred.
Records lib-guest sequencing dependency.
datalog-on-sx.md: adds Phase 4 built-in predicates + body arithmetic,
Phase 6 magic sets, safety analysis in Phase 3, Non-goals section.
New chisel plans (forward-looking, not yet launchable):
kernel-on-sx.md — first-class everything, env-as-value endgame
idris-on-sx.md — dependent types, evidence chisel
probabilistic-on-sx.md — weighted nondeterminism + traces
maude-on-sx.md — rewriting as primitive
linear-on-sx.md — resource model, artdag-relevant
Loop briefings (4 active, 1 cold):
minikanren-loop.md, ocaml-loop.md, datalog-loop.md, elm-loop.md, koka-loop.md
Restore scripts mirror the loop pattern:
restore-{minikanren,ocaml,datalog,jit-perf,lib-guest}.sh
Each captures worktree state, plan progress, MCP health, tmux status.
Includes the .mcp.json absolute-path patch instruction (fresh worktrees
have no _build/, so the relative mcp_tree path fails on first launch).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OCaml-on-SX: substrate validation + HM + reference oracle
The strict-ML answer to "does the SX substrate really do what we claim it does?" OCaml has exactly the feature set SX was designed around — CEK, records, ADTs, exceptions, modules, refs, strict evaluation — so implementing it on SX is the strongest possible test of the substrate. Phase 5 also produces a real Hindley-Milner inferencer that feeds back into lib/guest/hm.sx, and the resulting OCaml interpreter serves as a reference oracle for every other guest language (when SX behavior is ambiguous, native OCaml answers).
End-state goal: OCaml Phases 1–5 running on the SX CEK, with a vendored slice of the official OCaml testsuite as the oracle corpus. HM extracted into lib/guest/hm.sx once Haskell-on-SX adopts it as second consumer.
Out of scope (this plan): Dream web framework — moved to plans/dream-on-sx.md, only spins up if a target user appears. Full standard library — only the minimal slice needed for substrate validation and the oracle role.
Conditional: ReasonML syntax variant (Phase 8) — kept in the plan but deferred until Phases 1–2 land and a decision is made to ship a user-facing OCaml.
What this covers that nothing else in the set does
- Strict ML semantics: unlike Haskell, OCaml is call-by-value with explicit `Lazy.t` for laziness. Pattern matching is checked for exhaustiveness. Polymorphic variants. Structural equality.
- First-class modules and functors: modules as values (Phase 4); functors as SX higher-order functions over module records. Unlike Haskell typeclasses, OCaml's module system is explicit and compositional. The hardest test of the substrate: if Phase 4 takes 3000 lines instead of 800, the substrate is telling us something.
- Mutable state without monads: `ref`, `:=`, `!` are primitives. Arrays. `Hashtbl`. The IO model is direct.
- Reference oracle: when other guest languages disagree about a semantic edge case (HM in Haskell-on-SX vs in OCaml-on-SX, exception ordering, equality semantics), native OCaml is the tiebreaker. The vendored testsuite slice (Phase 5.1) makes this oracle role concrete.
Sequencing dependency
OCaml-on-SX should not start until lib-guest Steps 0–7 are complete. OCaml's tokenizer should consume lib/guest/lex.sx (lib-guest Step 3); its precedence parser should consume lib/guest/pratt.sx (Step 4); its pattern matcher should consume lib/guest/match.sx (Step 6). Starting OCaml early means it hand-rolls these and never validates the abstraction — losing one of the main strategic payoffs.
Reciprocally, lib-guest Step 8 (HM extraction) waits on OCaml-on-SX Phase 5 — extracting HM with only Haskell as consumer is speculative; with both Haskell and OCaml the two-language rule is satisfied for real.
Ground rules
- Scope: only touch `lib/ocaml/**`, `lib/reasonml/**` (Phase 8 only), and `plans/ocaml-on-sx.md`. Do not edit `spec/`, `hosts/`, `shared/`, `lib/dream/**` (separate plan), or other `lib/<lang>/`.
- Consume `lib/guest/` wherever it covers a need (lex, pratt, match, ast). Hand-rolling instead of consuming defeats the substrate-validation goal.
- Shared-file issues go under "Blockers" below with a minimal repro; do not fix here.
- SX files: use `sx-tree` MCP tools only.
- Architecture: OCaml source → AST → SX AST → CEK. No standalone OCaml evaluator. The OCaml AST is walked by an `ocaml-eval` function in SX that produces SX values.
- Type system: deferred until Phase 5. Phases 1–4 are intentionally untyped: get the evaluator right first, then layer HM inference on top.
- Commits: one feature per commit. Keep `## Progress log` updated and tick boxes.
Architecture sketch
OCaml source text
│
▼
lib/ocaml/tokenizer.sx — keywords, operators, string/char literals, comments
│ (built on lib/guest/lex.sx)
▼
lib/ocaml/parser.sx — OCaml AST: let/let rec, fun, match, if, begin/end,
│ module/struct/functor, type decls, expressions
│ (precedence via lib/guest/pratt.sx)
▼
lib/ocaml/desugar.sx — surface → core: tuple patterns, or-patterns,
│ sequence (;) → (do), when guards, field punning
▼
lib/ocaml/transpile.sx — OCaml AST → SX AST
│
▼
lib/ocaml/runtime.sx — ADT constructors, module primitives, ref/array ops,
│ minimal Stdlib shims (Phase 6)
▼
SX CEK evaluator (both JS and OCaml hosts)
Semantic mappings
| OCaml construct | SX mapping |
|---|---|
| `let x = e` (top-level) | `(define x e)` |
| `let f x y = e` | `(define (f x y) e)` |
| `let rec f x = e` | `(define (f x) e)`; SX `define` is already recursive |
| `fun x -> e` | `(fn (x) e)` |
| `e1 \|> f` | `(f e1)`; pipe desugars to reverse application |
| `e1; e2` | `(do e1 e2)` |
| `begin e1; e2; e3 end` | `(do e1 e2 e3)` |
| `if c then e1 else e2` | `(if c e1 e2)` |
| `match x with \| P -> e` | `(match x (P e) ...)` via Phase 6 ADT primitive |
| `type t = A \| B of int` | `(define-type t (A) (B v))` |
| `module M = struct ... end` | SX dict `{:let-bindings ...}`; module as record |
| `functor (M : S) -> ...` | `(fn (M) ...)`; functor as SX lambda over module record |
| `open M` | inject M's bindings into scope via `env-merge` |
| `M.field` | `(get M :field)` |
| `{ r with f = v }` | `(dict-set r :f v)` |
| `ref x` | `(make-ref x)`; mutable cell |
| `!r` | `(deref-ref r)` |
| `r := v` | `(set-ref! r v)` |
| `(a, b, c)` | tagged list `(:tuple a b c)` |
| `[1; 2; 3]` | `(list 1 2 3)` |
| `[\| 1; 2; 3 \|]` | `(make-array 1 2 3)` (Phase 6) |
| `try e with \| Ex -> h` | `(guard (fn (ex) h) e)` via SX exception system |
| `raise Ex` | `(perform (:raise Ex))` |
| `Printf.sprintf "%d" x` | `(format "%d" x)` |
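To make the mappings concrete, here is a small OCaml program that touches several of them in one place (top-level `let`, `let rec`, `fun`, `|>`, sequencing, ref cells, tuples, lists, `try`/`raise`, `Printf.sprintf`). It illustrates only the source side; every name in it (`sum_to`, `Empty_input`, and so on) is invented for the example, and the SX it lowers to is whatever the mappings above specify.

```ocaml
(* Illustrative source program; all names are hypothetical. *)
exception Empty_input

let scale = 3                                   (* top-level let *)

let rec sum_to n =                              (* let rec *)
  if n <= 0 then 0 else n + sum_to (n - 1)

let triple = fun x -> x * scale                 (* fun x -> e *)

let counter = ref 0                             (* ref x *)

(* r := v, !r, and sequencing with ; *)
let bump () = counter := !counter + 1; !counter

let head_or_raise = function
  | [] -> raise Empty_input                     (* raise Ex *)
  | x :: _ -> x

(* try e with | Ex -> h *)
let safe_head xs = try Some (head_or_raise xs) with Empty_input -> None

(* tuple, list literal, and reverse application *)
let pair = (sum_to 4, [1; 2; 3] |> List.map triple)

let report = Printf.sprintf "%d" (fst pair)
```

A program of this size is roughly what one early end-to-end test could look like: each binding exercises one mapping, so a failure points at a single lowering rule.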
Roadmap
Phase 1 — Tokenizer + parser
- Tokenizer built on `lib/guest/lex.sx`: keywords (`let`, `rec`, `in`, `fun`, `function`, `match`, `with`, `type`, `of`, `module`, `struct`, `end`, `functor`, `sig`, `open`, `include`, `if`, `then`, `else`, `begin`, `try`, `exception`, `raise`, `mutable`, `for`, `while`, `do`, `done`, `and`, `as`, `when`), operators (`->`, `|>`, `<|`, `@@`, `@`, `:=`, `!`, `::`, `**`, `:`, `;`, `;;`), identifiers (lower, upper/ctor, labels `~label:`, optional `?label:`), char literals `'c'`, string literals (escaped + heredoc `{|...|}`), int/float literals, comments `(* ... *)` (nested block comments).
- Parser with precedence via `lib/guest/pratt.sx`: top-level `let`/`let rec`/`type`/`module`/`exception`/`open`/`include` declarations; expressions: literals, identifiers, constructor application, lambda, application (left-assoc), binary ops with precedence table, `if`/`then`/`else`, `match`/`with`, `try`/`with`, `let`/`in`, `begin`/`end`, `fun`/`function`, tuples, list literals, record literals/updates, field access, sequences `;`, unit `()`.
- Patterns: constructor, literal, variable, wildcard `_`, tuple, list cons `::`, list literal, record, `as`, or-pattern `P1 | P2`, `when` guard.
- OCaml is not indentation-sensitive, so no layout algorithm is needed.
- Tests in `lib/ocaml/tests/parse.sx`: 50+ round-trip parse tests.
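One tokenizer detail worth pinning down early is that OCaml block comments nest, so the lexer needs a depth counter rather than a search for the first `*)`. A minimal sketch in OCaml follows; `skip_comment` is a hypothetical helper, not part of `lib/guest/lex.sx`, and it deliberately ignores string literals inside comments, which the real OCaml lexer does track.

```ocaml
(* Returns the index just past the comment that opens at [start].
   Assumes s.[start] = '(' and s.[start + 1] = '*'.
   Raises Failure if the comment is never closed.
   Simplification: string literals inside comments are not recognised. *)
let skip_comment (s : string) (start : int) : int =
  let n = String.length s in
  let rec go i depth =
    if depth = 0 then i
    else if i + 1 >= n then failwith "unterminated comment"
    else if s.[i] = '(' && s.[i + 1] = '*' then go (i + 2) (depth + 1)
    else if s.[i] = '*' && s.[i + 1] = ')' then go (i + 2) (depth - 1)
    else go (i + 1) depth
  in
  go (start + 2) 1
```

For the input `(* a (* b *) c *) x`, the inner `*)` only drops the depth from 2 to 1, so scanning continues to the outer close and the function returns 17, the index of the space before `x`.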
Phase 2 — Core evaluator (untyped)
- `ocaml-eval` entry: walks OCaml AST, produces SX values.
- `let`/`let rec`/`let ... in` (mutually recursive with `and`).
- Lambda + application (curried by default; auto-curry multi-param defs).
- `fun`/`function` (single-arg lambda with immediate match on arg).
- `if`/`then`/`else`, `begin`/`end`, sequence `;`.
- Arithmetic, comparison, boolean ops, string `^`, `mod`.
- Unit `()` value; `ignore`.
- References: `ref`, `!`, `:=`.
- Mutable record fields.
- `for i = lo to hi do ... done` loop; `while cond do ... done`.
- `try`/`with` maps to SX `guard`; `raise` via `perform`.
- Tests in `lib/ocaml/tests/eval.sx`: 50+ tests, pure + imperative.
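The auto-curry rule is the part of this phase most easily gotten subtly wrong, so here is a toy environment-passing evaluator, written in OCaml, that shows one way to do it. This is a sketch under stated assumptions: the `expr` and `value` types are hypothetical stand-ins for the real AST and SX values, and the actual `ocaml-eval` is an SX function, not OCaml.

```ocaml
(* Hypothetical mini-AST; the real one is produced by lib/ocaml/parser.sx. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Fun of string list * expr     (* fun x y -> body, multi-parameter *)
  | App of expr * expr            (* application takes one argument *)
  | Let of string * expr * expr

type value =
  | VInt of int
  | VClosure of string * expr * (string * value) list

let rec eval env = function
  | Int n -> VInt n
  | Var x -> List.assoc x env     (* raises Not_found if unbound *)
  | Add (a, b) ->
    (match eval env a, eval env b with
     | VInt x, VInt y -> VInt (x + y)
     | _ -> failwith "Add expects integers")
  (* Auto-curry: peel one parameter per closure; the remaining
     parameters stay wrapped in a Fun that becomes the closure body. *)
  | Fun ([], body) -> eval env body
  | Fun (p :: rest, body) -> VClosure (p, Fun (rest, body), env)
  | App (f, arg) ->
    (match eval env f with
     | VClosure (p, body, cenv) -> eval ((p, eval env arg) :: cenv) body
     | VInt _ -> failwith "cannot apply an integer")
  | Let (x, e, body) -> eval ((x, eval env e) :: env) body

(* let add = fun x y -> x + y in let inc = add 1 in inc 41 *)
let program =
  Let ("add", Fun (["x"; "y"], Add (Var "x", Var "y")),
       Let ("inc", App (Var "add", Int 1), App (Var "inc", Int 41)))
```

Evaluating `program` in the empty environment yields `VInt 42`: the partial application `add 1` produces a closure waiting for `y`, which is the behaviour the "curried by default" bullet asks for.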
Phase 3 — ADTs + pattern matching
- `type` declarations: `type t = A | B of t1 * t2 | C of { x: int }`.
- Constructors as tagged lists: `A` → `(:A)`, `B(1, "x")` → `(:B 1 "x")`.
- `match`/`with` consumes `lib/guest/match.sx`: constructor, literal, variable, wildcard, tuple, list cons/nil, `as` binding, or-patterns, nested patterns, `when` guard.
- Exhaustiveness: runtime error on incomplete match (no compile-time check yet).
- Built-in types: `option` (`None`/`Some`), `result` (`Ok`/`Error`), `list` (nil/cons), `bool`, `unit`, `exn`.
- `exception` declarations; built-in: `Not_found`, `Invalid_argument`, `Failure`, `Match_failure`.
- Polymorphic variants (surface syntax `` `Tag value ``; runtime same tagged list).
- Tests in `lib/ocaml/tests/adt.sx`: 40+ tests covering ADTs, match, option/result.
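The tagged-list encoding and the runtime-error rule for incomplete matches can be modelled in a few lines. The sketch below is written in OCaml and is only a model: `value`, `pattern`, `matches`, `run_match`, and `Match_failure_at_runtime` are hypothetical names, and the real matcher is `lib/guest/match.sx`, whose interface is not assumed here.

```ocaml
(* A constructor value is a tag plus its arguments, like (:B 1 "x"). *)
type value =
  | Int of int
  | Str of string
  | Tagged of string * value list

type pattern =
  | PWild                            (* _ *)
  | PVar of string                   (* x *)
  | PInt of int                      (* integer literal *)
  | PCtor of string * pattern list   (* A, B (p1, p2) *)
  | POr of pattern * pattern         (* p1 | p2 *)

exception Match_failure_at_runtime

(* Some bindings when the pattern matches, None otherwise. *)
let rec matches pat v =
  match pat, v with
  | PWild, _ -> Some []
  | PVar x, _ -> Some [ (x, v) ]
  | PInt n, Int m when n = m -> Some []
  | PCtor (tag, ps), Tagged (tag', vs)
    when tag = tag' && List.length ps = List.length vs ->
    (* Match sub-patterns left to right, accumulating bindings. *)
    List.fold_left2
      (fun acc p v ->
        match acc with
        | None -> None
        | Some bs -> Option.map (fun bs' -> bs @ bs') (matches p v))
      (Some []) ps vs
  | POr (p1, p2), _ ->
    (match matches p1 v with Some bs -> Some bs | None -> matches p2 v)
  | _ -> None

(* Try clauses in order; falling off the end is a runtime error. *)
let rec run_match v = function
  | [] -> raise Match_failure_at_runtime
  | (pat, body) :: rest ->
    (match matches pat v with
     | Some bindings -> body bindings
     | None -> run_match v rest)

let describe v =
  run_match v
    [ (PCtor ("A", []), fun _ -> "A");
      (PCtor ("B", [ PVar "n"; PWild ]),
       fun bs ->
         match List.assoc "n" bs with
         | Int n -> "B " ^ string_of_int n
         | _ -> "B ?") ]
```

Here `describe (Tagged ("B", [Int 1; Str "x"]))` returns `"B 1"`, while `describe (Tagged ("C", []))` raises, mirroring the "runtime error on incomplete match" rule until Phase 5 can add a static check.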
Phase 4 — Modules + functors
The hardest test of the substrate. First-class modules + functors are where the SX/CEK story either works elegantly or reveals a missing piece. Track line count vs equivalent OCaml stdlib implementations as the substrate-validation signal.
- `module M = struct let x = 1 let f y = x + y end` → SX dict `{:x 1 :f <fn>}`.
- `module type S = sig val x : int val f : int -> int end` → interface record (runtime stub; typed checking in Phase 5).
- `module M : S = struct ... end`: coercive sealing (runtime: pass-through).
- `functor (M : S) -> struct ... end` → SX `(fn (M) ...)`.
- `module F = Functor(Base)`: functor application.
- `open M`: merge M's dict into current env (`env-merge`).
- `include M`: same as `open` at structure level.
- `M.name`: dict get via `:name` key.
- First-class modules (pack/unpack): deferred to Phase 5.
- Standard module hierarchy stubs: `List`, `Option`, `Result`, `String`, `Int`, `Printf`, `Hashtbl` (filled in Phase 6).
- Tests in `lib/ocaml/tests/modules.sx`: 30+ tests.
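The "module as record, functor as function" encoding can be checked against native OCaml directly. The sketch below writes the same program twice: once with real modules and a functor, and once with a plain record standing in for the SX dict. The names `Base`, `Twice`, `s`, `base`, and `twice` are invented for the illustration.

```ocaml
(* Native OCaml: signature, sealed module, functor, functor application. *)
module type S = sig
  val x : int
  val f : int -> int
end

module Base : S = struct
  let x = 1
  let f y = x + y
end

module Twice (M : S) : S = struct
  let x = M.x * 2
  let f y = M.f (M.f y)
end

module T = Twice (Base)

(* The same program with modules modelled as records and the functor as
   an ordinary function. The record type [s] is a stand-in for the SX
   dict {:x ... :f ...}; field access replaces M.name lookup. *)
type s = { x : int; f : int -> int }

let base : s =
  let x = 1 in
  { x; f = (fun y -> x + y) }

let twice (m : s) : s = { x = m.x * 2; f = (fun y -> m.f (m.f y)) }

let t = twice base
```

Both versions agree (`T.x` and `t.x` are 2; `T.f 10` and `t.f 10` are 12), which is the kind of pairing the Phase 4 tests can use: the native module program is the oracle, and the record version is the shape the SX side is expected to produce.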
Phase 5 — Hindley-Milner type inference
This is one of the headline payoffs of the whole plan. The inferencer built here is the seed of lib/guest/hm.sx (lib-guest Step 8) — once Haskell-on-SX adopts it as second consumer, it gets extracted.
- Algorithm W: `gen`/`inst`, `unify`, `infer-expr`, `infer-decl`.
- Type variables: `'a`, `'b`; unification with occurs check.
- Let-polymorphism: generalise at let-bindings.
- ADT types: `type 'a option = None | Some of 'a`.
- Function types, tuple types, record types.
- Type signatures: `val f : int -> int`, verified against the inferred type.
- Module type checking: seal against `sig` (Phase 4 stubs become real checks).
- Error reporting: position-tagged errors with expected vs actual types.
- First-class modules: `(module M : S)` pack; `(val m : S)` unpack.
- No rank-2 polymorphism, no GADTs (out of scope).
- Tests in `lib/ocaml/tests/types.sx`: 60+ inference tests.
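For orientation, the core of Algorithm W fits in well under a hundred lines for a lambda calculus with `let`. The sketch below is written in OCaml and uses mutable type variables for the substitution; it is an illustration of the technique, not the planned `lib/guest/hm.sx` design, and all names (`repr`, `generalize`, `instantiate`, and so on) are assumptions. It omits ADTs, tuples, records, and position-tagged errors.

```ocaml
type ty =
  | TInt
  | TBool
  | TFun of ty * ty
  | TVar of tvar ref
and tvar = Unbound of int | Link of ty

type scheme = Forall of int list * ty   (* quantified variable ids *)

type expr =
  | Int of int
  | Bool of bool
  | Var of string
  | Lam of string * expr
  | App of expr * expr
  | Let of string * expr * expr

let counter = ref 0
let fresh () = incr counter; TVar (ref (Unbound !counter))

(* Follow links to the current representative of a type. *)
let rec repr = function
  | TVar { contents = Link t } -> repr t
  | t -> t

(* Occurs check: does variable [id] appear inside [t]? *)
let rec occurs id t =
  match repr t with
  | TVar { contents = Unbound id' } -> id = id'
  | TFun (a, b) -> occurs id a || occurs id b
  | _ -> false

let rec unify a b =
  match repr a, repr b with
  | TInt, TInt | TBool, TBool -> ()
  | TFun (a1, r1), TFun (a2, r2) -> unify a1 a2; unify r1 r2
  | TVar r1, TVar r2 when r1 == r2 -> ()
  | TVar ({ contents = Unbound id } as r), t
  | t, TVar ({ contents = Unbound id } as r) ->
    if occurs id t then failwith "occurs check" else r := Link t
  | _ -> failwith "type mismatch"

let rec free_vars t =
  match repr t with
  | TVar { contents = Unbound id } -> [ id ]
  | TFun (a, b) -> free_vars a @ free_vars b
  | _ -> []

let env_free_vars env =
  List.concat_map
    (fun (_, Forall (qs, t)) ->
      List.filter (fun id -> not (List.mem id qs)) (free_vars t))
    env

(* gen: quantify the variables of [t] that are not free in [env]. *)
let generalize env t =
  let env_fvs = env_free_vars env in
  let qs = List.filter (fun id -> not (List.mem id env_fvs)) (free_vars t) in
  Forall (List.sort_uniq compare qs, t)

(* inst: replace each quantified variable with a fresh one. *)
let instantiate (Forall (qs, t)) =
  let subst = List.map (fun id -> (id, fresh ())) qs in
  let rec go t =
    match repr t with
    | TVar { contents = Unbound id } as v ->
      (match List.assoc_opt id subst with Some t' -> t' | None -> v)
    | TFun (a, b) -> TFun (go a, go b)
    | t -> t
  in
  go t

let rec infer env = function
  | Int _ -> TInt
  | Bool _ -> TBool
  | Var x -> instantiate (List.assoc x env)
  | Lam (x, body) ->
    let a = fresh () in
    TFun (a, infer ((x, Forall ([], a)) :: env) body)
  | App (f, arg) ->
    let tf = infer env f in
    let ta = infer env arg in
    let r = fresh () in
    unify tf (TFun (ta, r));
    r
  | Let (x, e, body) ->
    let te = infer env e in
    infer ((x, generalize env te) :: env) body
```

Two quick checks show the interesting cases. With `id` bound by `let`, it can be used at both `int` and `bool` in the same body, which only works because of generalisation at the let-binding. Self-application `fun x -> x x` is rejected by the occurs check instead of looping.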
Phase 5.1 — Vendor OCaml testsuite slice (oracle corpus)
The oracle role only works against a real test corpus. Vendor a slice of the official OCaml testsuite (from ocaml/ocaml testsuite/tests/).
- Pick ~100–200 tests covering: basic eval, ADTs, modules, functors, pattern matching, exceptions, refs, simple stdlib (List, Option, Result, String). Skip tests that depend on Phase 6 stdlib not implemented or on out-of-scope features (GADTs, objects, Lwt, Unix module, etc.).
- Vendored at `lib/ocaml/testsuite/` with a manifest of which tests are included and why each excluded test was dropped.
- `lib/ocaml/conformance.sh` runs the slice via the epoch protocol, writes `lib/ocaml/scoreboard.{json,md}`.
- Each iteration after Phase 5.1 lands: scoreboard is the regression bar, just like other guests.
- License: official OCaml testsuite is LGPL — confirm rose-ash repo can vendor LGPL test files (header preserved). If not, write equivalent tests from scratch sourced from the OCaml manual.
Phase 6 — Minimal stdlib slice
Trimmed from the original 150+ functions to ~30 — only what HM tests, the Phase 5.1 testsuite slice, and the oracle role need. Full stdlib (Hashtbl.iter, Map.Make, Set.Make, Format, Sys, Bytes, …) becomes a conditional follow-on if a target user appears.
- `List`: `map`, `filter`, `fold_left`, `fold_right`, `length`, `rev`, `append`, `iter`, `for_all`, `exists`, `find_opt`, `mem`.
- `Option`: `map`, `bind`, `get`, `value`, `is_none`, `is_some`.
- `Result`: `map`, `bind`, `get_ok`, `get_error`, `is_ok`, `is_error`.
- `String`: `length`, `sub`, `concat`, `split_on_char`, `trim`.
- `Printf`: `sprintf` only; wires to SX `(format ...)`.
- `Hashtbl`: `create`, `add`, `find_opt`, `replace`, `mem`; backed by SX mutable dict.
- Tests in `lib/ocaml/tests/stdlib.sx`: 40+ tests across the slice. Phase 5.1 testsuite slice exercises these in real programs.
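As a rough check that the slice is large enough to write real programs, the following OCaml snippet uses only functions from the lists above (plus core operators). It is an illustrative test-style program; the names `words`, `counts`, `safe_div`, and the sample input are made up for the example.

```ocaml
(* Uses only String, List, Option, Result, Hashtbl and Printf functions
   that appear in the Phase 6 slice. *)
let words = String.split_on_char ' ' (String.trim "  to be or not to be ")

(* Word frequencies via the Hashtbl subset. *)
let counts : (string, int) Hashtbl.t = Hashtbl.create 16

let () =
  List.iter
    (fun w ->
      let n = Option.value (Hashtbl.find_opt counts w) ~default:0 in
      Hashtbl.replace counts w (n + 1))
    words

let total_chars = List.fold_left (fun acc w -> acc + String.length w) 0 words

let safe_div a b = if b = 0 then Error "division by zero" else Ok (a / b)

(* Result.bind threads the first failure through unchanged. *)
let chained = Result.bind (safe_div 100 5) (fun q -> safe_div q 2)
let failed = Result.bind (safe_div 1 0) (fun q -> safe_div q 2)

let summary =
  Printf.sprintf "%d words, %d chars" (List.length words) total_chars
```

With this input, `summary` is `"6 words, 13 chars"`, `Hashtbl.find_opt counts "to"` is `Some 2`, `chained` is `Ok 10`, and `failed` is `Error "division by zero"`.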
Phase 7 — Dream web framework
Moved to plans/dream-on-sx.md. Spins up only if a target user appears. The plan there inherits OCaml-on-SX Phases 1–5 + the Phase 6 slice plus whatever additional stdlib Dream needs (likely Bytes, Format, more String, Sys.argv).
Phase 8 — ReasonML syntax variant [deferred]
[deferred — depends on Phases 1–2 landing + decision to ship a user-facing OCaml].
ReasonML is OCaml with a JS-friendly surface: semicolons, let with = everywhere, => for lambdas, switch for match, {j|...|j} string interpolation. Same semantics — different tokenizer + parser, same lib/ocaml/transpile.sx output.
The cheapest user-facing payoff in the plan but only worthwhile if there's a concrete user goal (e.g. JSX-flavoured frontend syntax for SX components, attracting React refugees). Don't start without that target.
- Tokenizer in `lib/reasonml/tokenizer.sx`: `let x = e;`, `(x, y) => e`, `switch (x) { | Pat => e | ... }`, JSX, `{j|hello $(name)|j}`, `let f : int => int = x => x + 1`.
- Parser in `lib/reasonml/parser.sx`: produce same OCaml AST nodes; JSX → SX component calls (`<Comp x=1 />` → `(~comp :x 1)`); auto-curry multi-arg.
- Shared transpiler delegates to `lib/ocaml/transpile.sx`.
- Tests in `lib/reasonml/tests/`: 40+.
The meta-circular angle
SX is bootstrapped to OCaml (hosts/ocaml/). Running OCaml inside SX running on OCaml is the "mother tongue" closure: OCaml → SX → OCaml. This means:
- The OCaml host's native pattern matching and ADTs are exact reference semantics for the SX-level implementation — any mismatch is a bug.
- The SX `match`/`define-type` primitives were built knowing OCaml was the intended target.
- When debugging the transpiler, the OCaml REPL is always available as oracle.
- The vendored testsuite slice (Phase 5.1) makes the oracle role mechanical, not just rhetorical.
Key dependencies
- lib-guest Steps 0–7: must complete before OCaml-on-SX starts. OCaml consumes `lib/guest/lex.sx`, `lib/guest/pratt.sx`, `lib/guest/match.sx`. Hand-rolling defeats the substrate-validation goal.
- Phase 6 ADT primitive (`define-type`/`match`) in the SX core: required before Phase 3.
- HO forms and first-class lambdas: already in spec, no blocker.
- Module system (Phase 4) is independent of type inference (Phase 5) — can overlap.
- lib-guest Step 8 (HM extraction) — waits on this plan's Phase 5. The two are paired.
Progress log
Newest first.
(awaiting lib-guest Steps 0–7)
Blockers
(none yet)