Added parser support for OCaml array literal syntax:
[| e1; e2; ...; en |] --> Array.of_list [e1; e2; ...; en]
[||] --> Array.of_list []
Desugaring keeps the array representation unchanged (ref-of-list)
since Array.of_list is a no-op constructor for that backing.
Tokenizer emits [, |, |, ] as separate ops; parser detects [ followed
by | and enters array-literal mode, terminating on |].
Baseline lis.ml exercises the syntax:
let lis arr =
let n = Array.length arr in
let dp = Array.make n 1 in
for i = 1 to n - 1 do
for j = 0 to i - 1 do
if arr.(j) < arr.(i) && dp.(j) + 1 > dp.(i) then
dp.(i) <- dp.(j) + 1
done
done;
let best = ref 0 in
for i = 0 to n - 1 do
if dp.(i) > !best then best := dp.(i)
done;
!best
lis [|10; 22; 9; 33; 21; 50; 41; 60; 80|] = 6
137 baseline programs total.
Three parser changes:
1. at-app-start? returns true on op '~' or '?' so the app loop
keeps consuming labeled args.
2. The app arg parser handles:
~name:VAL drop label, parse VAL as the arg
?name:VAL same
~name punning -- treat as (:var name)
?name same
3. try-consume-param! drops '~' or '?' and treats the following
ident as a regular positional param name.
Caveats:
- Order in the call must match definition order; we don't reorder
by label name.
- Optional args don't auto-wrap in Some, so the function body sees
the raw value for ?x:V.
Lets us write idiomatic-looking OCaml even though the runtime is
positional underneath:
let f ~x ~y = x + y in f ~x:3 ~y:7 = 10
let x = 4 in let y = 5 in f ~x ~y = 20 (punning)
let f ?x ~y = x + y in f ?x:1 ~y:2 = 3
parse-decl-let lives in the outer ocaml-parse-program scope and does
not have access to parse-pattern (which is local to ocaml-parse).
Source-slicing approach instead:
1. detect '(IDENT, ...)' in collect-params
2. scan tokens to the matching ')' (tracking nested parens)
3. slice the pattern source string from src
4. push (synth_name, pat_src) onto tuple-srcs
Then after collecting params, the rhs source string gets wrapped with
'match SN with PAT_SRC -> (RHS_SRC)' for each tuple-param,
innermost-first, and the final string is fed through ocaml-parse.
End result is the same AST shape as the iteration-102 inner-let
case: a function whose body destructures a synthetic name.
let f (a, b) = a + b ;; f (3, 7) = 10
let g x (a, b) = x + a + b ;; g 1 (2, 3) = 6
let h (a, b) (c, d) = a * b + c * d
;; h (1, 2) (3, 4) = 14
Mirrors iteration 101's parse-fun change inside parse-let's
parse-one!:
- same '(IDENT, ...)' detection on collect-params
- same __pat_N synth name for the function param
- same innermost-first match-wrapping
Difference: for inner-let the wrapping is applied to the rhs of the
let-binding (which is the function value), not directly to a fun
body.
let f (a, b) = a + b in f (3, 7) = 10
let g x (a, b) = x + a + b in g 1 (2, 3) = 6
let h (a, b) (c, d) = a * b + c * d
in h (1, 2) (3, 4) = 14
parse-fun's collect-params now detects '(IDENT, ...)' as a
tuple-pattern parameter (lookahead at peek-tok-at 1/2 distinguishes
from '(x : T)' and '()' cases that try-consume-param! already
handles). For each tuple param it:
1. parse-pattern to get the full pattern AST
2. generate a synthetic __pat_N name as the actual fun parameter
3. push (synth_name, pattern) onto tuple-binds
After parsing the body, wraps it innermost-first with one
'match __pat_N with PAT -> ...' per tuple-param. The user-visible
result is a (:fun (params...) body) where params are all simple
names but the body destructures.
Also retroactively simplifies Hashtbl.keys/values from
'fun pair -> match pair with (k, _) -> k' to plain
'fun (k, _) -> k', closing the iteration-99 workaround.
(fun (a, b) -> a + b) (3, 7) = 10
List.map (fun (a, b) -> a * b)
[(1, 2); (3, 4); (5, 6)] = [2; 12; 30]
List.map (fun (k, _) -> k)
[("a", 1); ("b", 2)] = ["a"; "b"]
(fun a (b, c) d -> a + b + c + d) 1 (2, 3) 4 = 10
When 'let' is followed by '(', parse-let now reads a full pattern
(via the existing parse-pattern used by match), expects '=', then
'in', and desugars to:
let PATTERN = EXPR in BODY => match EXPR with PATTERN -> BODY
This reuses the entire pattern-matching machinery, so any pattern
the match parser accepts works here too — paren-tuples, nested
tuples, cons patterns, list patterns. No 'rec' allowed for pattern
bindings (real OCaml's restriction).
let (a, b) = (1, 2) in a + b = 3
let (a, b, c) = (10, 20, 30) in a + b + c = 60
let pair = (5, 7) in
let (x, y) = pair in x * y = 35
Also retroactively cleaned up Printf's iter-97 width-pos packing
hack ('width * 1000000 + spec_pos') — it's now
'let (width, spec_pos) = parse_width_loop after_flags in ...' like
real OCaml.
parse-atom-postfix's '.()' branch now disambiguates between let-open
and array-get based on whether the head is a module path (':con' or
':field' chain rooted in ':con'). Module paths still emit
(:let-open M EXPR); everything else emits (:array-get ARR I).
Eval handles :array-get by reading the cell's underlying list at
index. The '<-' assignment handler now also accepts :array-get lhs
and rewrites the cell with one position changed.
Idiomatic OCaml array code now works:
let a = Array.make 5 0 in
for i = 0 to 4 do a.(i) <- i * i done;
a.(3) + a.(4) = 25
let a = Array.init 4 (fun i -> i + 1) in
a.(0) + a.(1) + a.(2) + a.(3) = 10
List.(length [1;2;3]) = 3 (* unchanged: List is a module *)
Array module (runtime.sx, OCaml syntax):
Backed by a 'ref of list'. make/length/get/init build the cell;
set rewrites the underlying list with one cell changed (O(n) but
works for short arrays in baseline programs). Includes
iter/iteri/map/mapi/fold_left/to_list/of_list/copy/blit/fill.
(op) operator sections (parser.sx, parse-atom):
When the token after '(' is a binop (any op with non-zero
precedence in the binop table) and the next token is ')', emit
(:fun ('a' 'b') (:op OP a b)) — i.e. (+) becomes fun a b -> a + b.
Recognises every binop including 'mod', 'land', '^', '@', '::',
etc.
Lets us write:
List.fold_left (+) 0 [1;2;3;4;5] = 15
let f = ( * ) in f 6 7 = 42
List.map ((-) 10) [1;2;3] = [9;8;7]
let a = Array.make 5 7 in
Array.set a 2 99;
Array.fold_left (+) 0 a = 127
Tokenizer already had 'lazy' as a keyword. This commit wires it through:
parser : parse-prefix emits (:lazy EXPR), like the existing 'assert'
handler.
eval : creates a one-element cell with state ('Thunk' expr env).
host : _lazy_force flips the cell to ('Forced' v) on first call
and returns the cached value thereafter.
runtime : module Lazy = struct let force lz = _lazy_force lz end.
Memoisation confirmed by tracking a side-effect counter through two
forces of the same lazy:
let counter = ref 0 in
let lz = lazy (counter := !counter + 1; 42) in
let a = Lazy.force lz in
let b = Lazy.force lz in
(a + b) * 100 + !counter = 8401 (= 84*100 + 1)
Tokenizer already classified 'assert' as a keyword; this commit wires
it through:
parser : parse-prefix dispatches like 'not' — advance, recur, wrap
as (:assert EXPR).
eval : evaluate operand; nil on truthy, host-error 'Assert_failure'
on false. Caught cleanly by existing try/with.
assert true; 42 = 42
let x = 5 in assert (x = 5); x + 1 = 6
try (assert false; 0) with _ -> 99 = 99
Side-quests required to land caesar.ml:
1. Top-level 'let r = expr in body' is now an expression decl, not a
broken decl-let. ocaml-parse-program's dispatch now checks
has-matching-in? at every top-level let; if matched, slices via
skip-let-rhs-boundary (which already opens depth on a leading let
with matching in) and ocaml-parse on the slice, wrapping as :expr.
2. runtime.sx: added String.make / String.init / String.map. Used by
caesar.ml's encode = String.init n (fun i -> shift_char s.[i] k).
3. baseline run.sh per-program timeout 240->480s (system load on the
shared host frequently exceeds 240s for large baselines).
caesar.ml exercises:
* the new top-level let-in expression dispatch
* s.[i] string indexing
* Char.code / Char.chr round-trip math
* String.init with a closure that captures k
Test value: Char.code r.[0] + Char.code r.[4] after ROT13(ROT13('hello')) = 104 + 111 = 215.
parse-atom-postfix now dispatches three cases after consuming '.':
.field -> existing field/module access
.(EXPR) -> existing local-open
.[EXPR] -> new string-get syntax (this commit)
Eval reduces (:string-get S I) to host (nth S I), which already returns
a one-character string for OCaml's char model.
Lets us write idiomatic OCaml string traversal:
let s = "hi" in
let n = ref 0 in
for i = 0 to String.length s - 1 do
n := !n + Char.code s.[i]
done;
!n (* = 209 *)
Side-quest emerged from adding roman.ml baseline (Roman numeral greedy
encoding): top-level 'let () = expr' was unsupported because
ocaml-parse-program's parse-decl-let consumed an ident strictly. Now
parse-decl-let recognises a leading '()' as a unit binding and
synthesises a __unit_NN name (matching how parse-let already handles
inner-let unit patterns).
roman.ml exercises:
* tuple list literal [(int * string); ...]
* recursive pattern match on tuple-cons
* String.length + List.fold_left
* the new top-level let () support (sanity in a comment, even though
the program ends with a bare expression for the test harness)
Bumped lib/ocaml/test.sh server timeout 180->360s — the recent surge in
test count plus a CPU-contended host was crowding out the sole epoch
reaching the deeper smarts.
In parse-atom-postfix, after consuming '.', if the next token is '(',
parse the inner expression and emit (:let-open M EXPR) instead of
:field. Cleanly composes with the existing :let-open evaluator and
loops to allow chained dot postfixes.
List.(length [1;2;3]) = 3
List.(map (fun x -> x + 1) [1;2;3]) = [2;3;4]
Option.(map (fun x -> x * 10) (Some 4)) = Some 40
Parser detects 'let open' as a separate let-form, parses M as a path
(Ctor(.Ctor)*) directly via inline AST construction (no source slicing
since cur-pos is only available in ocaml-parse-program), and emits
(:let-open PATH BODY).
Eval resolves the path to a module dict and merges its bindings into
the env for body evaluation. Now:
let open List in map (fun x -> x * 2) [1;2;3] = [2;4;6]
let open Option in map (fun x -> x + 1) (Some 5) = Some 6
let NAME [PARAMS] : T = expr and (expr : T) parse and skip the type
source. Runtime no-op since SX is dynamic. Works in inline let,
top-level let, and parenthesised expressions:
let x : int = 5 ;; x + 1 -> 6
let f (x : int) : int = x + 1 in f 41 -> 42
(5 : int) -> 5
((1 + 2) : int) * 3 -> 9
Parser: in parse-decl-type, dispatch on the post-= token:
'|' or Ctor -> sum type
'{' -> record type
otherwise -> type alias (skip to boundary)
AST (:type-alias NAME PARAMS) with body discarded. Runtime no-op since
SX has no nominal types.
poly_stack.ml baseline exercises:
module type ELEMENT = sig type t val show : t -> string end
module IntElem = struct type t = int let show x = ... end
module Make (E : ELEMENT) = struct ... use E.show ... end
module IntStack = Make(IntElem)
Demonstrates the substrate handles signature decls + abstract types +
functor parameter with sig constraint.
parse-try now consumes optional 'when GUARD-EXPR' before -> and emits
(:case-when PAT GUARD BODY). Eval try clause loop dispatches on case /
case-when and falls through on guard false — same semantics as match.
Examples:
try raise (E 5) with | E n when n > 0 -> n | _ -> 0 = 5
try raise (E (-3)) with | E n when n > 0 -> n | _ -> 0 = 0
try raise (E 5) with | E n when n > 100 -> n | E n -> n + 1000 = 1005
parse-function now consumes optional 'when GUARD-EXPR' before -> and
emits (:case-when PAT GUARD BODY) — same handling as match clauses.
function-style sign extraction now works:
(function | n when n > 0 -> 1 | n when n < 0 -> -1 | _ -> 0)
<- added to op-table at level 1 (same as :=). Eval short-circuits on
<- to mutate the lhs's field via host SX dict-set!. The lhs must be a
:field expression; otherwise raises.
Tested:
let r = { x = 1; y = 2 } in r.x <- 5; r.x (5)
let r = { x = 0 } in for i = 1 to 5 do r.x <- r.x + i done; r.x (15)
let r = { name = ...; age = 30 } in r.name <- "Alice"; r.name
The 'mutable' keyword in record type decls is parsed-and-discarded;
runtime semantics: every field is mutable. Phase 2 closes this gap
without changing the dict-based record representation.
type r = { x : int; mutable y : string } parses to
(:type-def-record NAME PARAMS FIELDS) with FIELDS each (NAME) or
(:mutable NAME). Parser dispatches on { after = to parse field list.
Field-type sources are skipped (HM registration TBD). Runtime no-op
since records already work as dynamic dicts.
Recursive-descent calculator parses '(1 + 2) * 3 + 4' = 13. Two parser
bugs fixed:
1. parse-let now handles inline 'let rec a () = ... and b () = ... in
body' via new (:let-rec-mut BINDINGS BODY) and (:let-mut BINDINGS
BODY) AST shapes; eval handles both.
2. has-matching-in? lookahead no longer stops at 'and' — 'and' is
internal to let-rec, not a decl boundary. Without this fix, the
inner 'let rec a () = ... and b () = ...' inside a let-decl rhs
would have been treated as the start of a new top-level decl.
Baseline exercises mutually-recursive functions, while-loops, ref-cell
imperative parsing, and ADT-based AST construction.
Parser fix: at-app-start? and parse-app's loop recognise prefix !
as a deref of the next app arg. So 'List.rev !b' parses as
'(:app List.rev (:deref b))' instead of stalling at !.
Buffer module backed by a ref holding string list:
create _ = ref []
add_string b s = b := s :: !b
contents b = String.concat "" (List.rev !b)
add_char/length/clear/reset
Parser: when | follows a pattern inside parens, build (:por ALT1 ALT2
...). Eval: try alternatives, succeed on first match. Top-level |
remains the clause separator — parens-only avoids ambiguity without
lookahead.
Examples now work:
match n with | (1 | 2 | 3) -> 100 | _ -> 0
match c with | (Red | Green) -> 1 | Blue -> 2
module type S = sig DECLS end is parsed-and-discarded — sig..end
balanced skipping in parse-decl-module-type. AST (:module-type-def
NAME). Runtime no-op (signatures are type-level only).
Allows real OCaml programs with module type decls to parse and run
without stripping the sig blocks.
Parser: { f1 = pat; f2 = pat; ... } in pattern position emits
(:precord (FIELDNAME PAT)...). Mixed with the existing { in
expression position via the at-pattern-atom? whitelist.
Eval: :precord matches against a dict; required fields must be present
and each pat must match the field's value. Can mix literal+var:
'match { x = 1; y = y } with | { x = 1; y = y } -> y' matches only
when x is 1.
lib/ocaml/baseline/{factorial,list_ops,option_match,module_use,sum_squares}.ml
exercised through ocaml-run-program (file-read F). lib/ocaml/baseline/
run.sh runs them and compares against expected.json — all 5 pass.
To make module_use.ml (with nested let-in) parse, parser's
skip-let-rhs-boundary! now uses has-matching-in? lookahead: a let at
depth 0 in a let-decl rhs opens a nested block IFF a matching in
exists before any decl-keyword. Without that in, the let is a new
top-level decl (preserves test 274 'let x = 1 let y = 2').
This is the first piece of Phase 5.1 'vendor a slice of OCaml
testsuite' — handcrafted fixtures for now, real testsuite TBD.
List: concat/flatten, init, find/find_opt, partition, mapi/iteri,
assoc/assoc_opt. Option: iter/fold/to_list. Result: get_ok/get_error/
map_error/to_option.
Fixed skip-to-boundary! in parser to track let..in / begin..end /
struct..end / for/while..done nesting via a depth counter — without
this, nested-let inside a top-level decl body trips over the
decl-boundary detector. Stdlib functions like List.init / mapi / iteri
use begin..end to make their nested-let intent explicit.
exception NAME [of TYPE] parses to (:exception-def NAME [ARG-SRC]).
Runtime is a no-op: raise/match already work on tagged ctor values, so
'exception E of int;; try raise (E 5) with | E n -> n' end-to-end with
zero new eval logic.
Parser: type [PARAMS] NAME = | Ctor [of T1 [* T2]*] | ...
- PARAMS: optional 'a or ('a, 'b) tyvar list
- AST: (:type-def NAME PARAMS CTORS) with each CTOR (NAME ARG-SOURCES)
- Argument types captured as raw source strings (treated opaquely at
runtime since ctor dispatch is dynamic)
Runtime is a no-op — constructors and pattern matching already work
dynamically. Phase 5 will use these decls to register ctor types for
HM checking.
Pattern parser top wraps cons-pat with 'as ident' -> (:pas PAT NAME).
Match clause parser consumes optional 'when GUARD-EXPR' before -> and
emits (:case-when PAT GUARD BODY) instead of :case.
Eval: :pas matches inner pattern then binds the alias name; case-when
checks the guard after a successful match and falls through to the next
clause if the guard is false.
Or-patterns deferred — ambiguous with clause separator without
parens-only support.
Parser: { f = e; f = e; ... } -> (:record (F E)...). { base with f = e;
... } -> (:record-update BASE (F E)...). Eval builds a dict from field
bindings; record-update merges the new fields over the base dict — the
same dict representation already used for modules.
{ also added to at-app-start? so records are valid arg atoms. Field
access via the existing :field postfix unifies record/module access.
Record patterns deferred to a later iteration.
Parser: try-consume-param! handles ident, wildcard _ (fresh __wild_N
name), unit () (fresh __unit_N), typed (x : T) (skips signature).
parse-fun and parse-let (inline) reuse the helper; top-level
parse-decl-let inlines a similar test.
test.sh timeout bumped from 60s to 180s — the growing suite was hitting
the cap and reporting spurious failures.
Parser collects multiple bindings via 'and', emitting (:def-rec-mut
BINDINGS) for let-rec chains and (:def-mut BINDINGS) for non-rec.
Single bindings keep the existing (:def …) / (:def-rec …) shapes.
Eval (def-rec-mut): allocate placeholder cell per binding, build joint
env where each name forwards through its cell, then evaluate each rhs
against the joint env and fill the cells. Even/odd mutual-rec works.
Parser: module F (M) (N) ... = struct DECLS end -> (:functor-def NAME
PARAMS DECLS). module N = expr (non-struct) -> (:module-alias NAME
BODY-SRC). Functor params accept (P) or (P : Sig) — signatures
parsed-and-skipped via skip-optional-sig.
Eval: ocaml-make-functor builds curried host-SX closures from module
dicts to a module dict. ocaml-resolve-module-path extended for :app so
F(A), F(A)(B), and Outer.Inner all resolve to dicts.
Phase 4 LOC ~290 cumulative (still well under 2000).
Parser: open Path and include Path top-level decls; Path is Ctor (.Ctor)*.
Eval resolves via ocaml-resolve-module-path (same :con-as-module-lookup
escape hatch used by :field). open extends the env with the module's
bindings; include also merges into the surrounding module's exports
(when inside a struct...end).
Path resolver lets M.Sub.x work for nested modules. Phase 4 LOC ~165.
module M = struct DECLS end parsed by sub-tokenising the body source
between struct and the matching end (nesting tracked via struct/begin/
sig/end). Field access is a postfix layer above parse-atom, binding
tighter than application: f r.x -> (:app f (:field r "x")).
Eval (:module-def NAME DECLS) builds a dict via ocaml-eval-module
running decls in a sub-env. (:field EXPR NAME) looks up dict fields,
treating (:con NAME) heads as module-name lookups instead of nullary
ctors so M.x works with M as a module.
Phase 4 LOC so far: ~110 lines (well under 2000 budget).
Parser: try EXPR with | pat -> handler | ... -> (:try EXPR CLAUSES).
Eval delegates to SX guard with else matching the raised value against
clause patterns; re-raises on no-match. raise/failwith/invalid_arg
shipped as builtins. failwith "msg" raises ("Failure" msg) so
| Failure msg -> ... patterns match.
Sugar for fun + match. AST (:function CLAUSES) -> unary closure that
runs ocaml-match-clauses on its arg. let rec recognises :function as a
recursive rhs and ties the knot via cell, so
let rec map f = function | [] -> [] | h::t -> f h :: map f t
works. ocaml-match-eval refactored to share clause-walk with function.
Parser: for i = lo to|downto hi do body done, while cond do body done.
AST: (:for NAME LO HI :ascend|:descend BODY) and (:while COND BODY).
Eval re-binds the loop var per iteration; both forms evaluate to unit.
ref is a builtin boxing its arg in a one-element list. Prefix ! parses
to (:deref ...) and reads via (nth cell 0). := joins the binop
precedence table at level 1 right-assoc and mutates via set-nth!.
Closures share the underlying cell.
Two-phase grammar: parse-expr-no-seq (prior entry) + parse-expr wraps
it with ;-chaining. List bodies keep parse-expr-no-seq so ; remains a
separator inside [...]. Match clause bodies use the seq variant and stop
at | — real OCaml semantics. Trailing ; before end/)/|/in/then/else/eof
permitted.
Patterns: wildcard, literal, var, ctor (nullary + arg, flattens tuple
args so Pair(a,b) -> (:pcon "Pair" PA PB)), tuple, list literal, cons
:: (right-assoc), unit. Match: leading | optional, (:match SCRUT
CLAUSES) with each clause (:case PAT BODY). Body parsed via parse-expr
because | is below level-1 binop precedence.
ocaml-parse-program: program = decls + bare exprs, ;;-separated.
Each decl is (:def …), (:def-rec …), or (:expr …). Body parsing
re-feeds the source slice through ocaml-parse — shared-state refactor
deferred.