Real OCaml's Seq.t is 'unit -> Cons of elt * Seq.t | Nil' — a lazy
thunk that lets you build infinite sequences. Ours is just a list,
which gives the right shape for everything in baseline programs that
don't rely on laziness (taking from infinite sequences would force
memory).
API: empty, cons, return, is_empty, iter, iteri, map, filter,
filter_map, fold_left, length, take, drop, append, to_list,
of_list, init, unfold.
unfold takes a step fn 'acc -> Option (elt * acc)' and threads
through until it returns None:
Seq.fold_left (+) 0
(Seq.unfold (fun n -> if n > 4 then None
else Some (n, n + 1))
1)
= 1 + 2 + 3 + 4 = 10
Functors were already wired through ocaml-make-functor in eval.sx
(curried host closure consuming module dicts) but had no explicit
tests for the user-defined Ord application path. This commit adds
three smoke tests that confirm:
module IntOrd = struct let compare a b = a - b end
module S = Set.Make (IntOrd)
S.elements (fold-add [5;1;3;1;5]) sums to 9 (dedupe + sort)
S.mem 2 (S.add 1 (S.add 2 (S.add 3 S.empty))) = true
M.cardinal (M.add 1 'a' (M.add 2 'b' M.empty)) = 2
The Ord parameter is properly threaded through the functor body —
elements are sorted in compare order and dedupe works.
Three parser changes:
1. at-app-start? returns true on op '~' or '?' so the app loop
keeps consuming labeled args.
2. The app arg parser handles:
~name:VAL drop label, parse VAL as the arg
?name:VAL same
~name punning -- treat as (:var name)
?name same
3. try-consume-param! drops '~' or '?' and treats the following
ident as a regular positional param name.
Caveats:
- Order in the call must match definition order; we don't reorder
by label name.
- Optional args don't auto-wrap in Some, so the function body sees
the raw value for ?x:V.
Lets us write idiomatic-looking OCaml even though the runtime is
positional underneath:
let f ~x ~y = x + y in f ~x:3 ~y:7 = 10
let x = 4 in let y = 5 in f ~x ~y = 20 (punning)
let f ?x ~y = x + y in f ?x:1 ~y:2 = 3
parse-decl-let lives in the outer ocaml-parse-program scope and does
not have access to parse-pattern (which is local to ocaml-parse).
Source-slicing approach instead:
1. detect '(IDENT, ...)' in collect-params
2. scan tokens to the matching ')' (tracking nested parens)
3. slice the pattern source string from src
4. push (synth_name, pat_src) onto tuple-srcs
Then after collecting params, the rhs source string gets wrapped with
'match SN with PAT_SRC -> (RHS_SRC)' for each tuple-param,
innermost-first, and the final string is fed through ocaml-parse.
End result is the same AST shape as the iteration-102 inner-let
case: a function whose body destructures a synthetic name.
let f (a, b) = a + b ;; f (3, 7) = 10
let g x (a, b) = x + a + b ;; g 1 (2, 3) = 6
let h (a, b) (c, d) = a * b + c * d
;; h (1, 2) (3, 4) = 14
Mirrors iteration 101's parse-fun change inside parse-let's
parse-one!:
- same '(IDENT, ...)' detection on collect-params
- same __pat_N synth name for the function param
- same innermost-first match-wrapping
Difference: for inner-let the wrapping is applied to the rhs of the
let-binding (which is the function value), not directly to a fun
body.
let f (a, b) = a + b in f (3, 7) = 10
let g x (a, b) = x + a + b in g 1 (2, 3) = 6
let h (a, b) (c, d) = a * b + c * d
in h (1, 2) (3, 4) = 14
parse-fun's collect-params now detects '(IDENT, ...)' as a
tuple-pattern parameter (lookahead at peek-tok-at 1/2 distinguishes
from '(x : T)' and '()' cases that try-consume-param! already
handles). For each tuple param it:
1. parse-pattern to get the full pattern AST
2. generate a synthetic __pat_N name as the actual fun parameter
3. push (synth_name, pattern) onto tuple-binds
After parsing the body, wraps it innermost-first with one
'match __pat_N with PAT -> ...' per tuple-param. The user-visible
result is a (:fun (params...) body) where params are all simple
names but the body destructures.
Also retroactively simplifies Hashtbl.keys/values from
'fun pair -> match pair with (k, _) -> k' to plain
'fun (k, _) -> k', closing the iteration-99 workaround.
(fun (a, b) -> a + b) (3, 7) = 10
List.map (fun (a, b) -> a * b)
[(1, 2); (3, 4); (5, 6)] = [2; 12; 30]
List.map (fun (k, _) -> k)
[("a", 1); ("b", 2)] = ["a"; "b"]
(fun a (b, c) d -> a + b + c + d) 1 (2, 3) 4 = 10
Linear-congruential PRNG with mutable seed (_state ref). API:
init s seed the PRNG
self_init () default seed (1)
int bound 0 <= n < bound
bool () fair coin
float bound uniform in [0, bound)
bits () 30 bits
Stepping rule:
state := (state * 1103515245 + 12345) mod 2147483647
result := |state| mod bound
Same seed reproduces the same sequence. Real OCaml's Random uses
Lagged Fibonacci; ours is simpler but adequate for shuffles and
Monte Carlo demos in baseline programs.
Random.init 42; Random.int 100 = 48
Random.init 1; Random.int 10 = 0
Two new host primitives:
_hashtbl_remove t k -> dissoc the key from the underlying dict
_hashtbl_clear t -> reset the cell to {}
Eight new OCaml-syntax helpers in runtime.sx Hashtbl module:
bindings t = _hashtbl_to_list t
keys t = List.map (fun (k, _) -> k) (...)
values t = List.map (fun (_, v) -> v) (...)
to_seq t = bindings t
to_seq_keys / to_seq_values
remove / clear / reset
The keys/values implementations use a 'fun pair -> match pair with
(k, _) -> k' indirection because parse-fun does not currently allow
tuple patterns directly on parameters. Same restriction we worked
around in iteration 98's let-pattern desugaring.
Also: a detour attempting to add top-level 'let (a, b) = expr'
support was started but reverted — parse-decl-let in the outer
ocaml-parse-program scope does not have access to parse-pattern
(which is local to ocaml-parse). Will need a slice + re-parse trick
later.
When 'let' is followed by '(', parse-let now reads a full pattern
(via the existing parse-pattern used by match), expects '=', then
'in', and desugars to:
let PATTERN = EXPR in BODY => match EXPR with PATTERN -> BODY
This reuses the entire pattern-matching machinery, so any pattern
the match parser accepts works here too — paren-tuples, nested
tuples, cons patterns, list patterns. No 'rec' allowed for pattern
bindings (real OCaml's restriction).
let (a, b) = (1, 2) in a + b = 3
let (a, b, c) = (10, 20, 30) in a + b + c = 60
let pair = (5, 7) in
let (x, y) = pair in x * y = 35
Also retroactively cleaned up Printf's iter-97 width-pos packing
hack ('width * 1000000 + spec_pos') — it's now
'let (width, spec_pos) = parse_width_loop after_flags in ...' like
real OCaml.
The Printf walker now parses optional flags + width digits between
'%' and the spec letter:
- left-align (default is right-align)
0 zero-pad (default is space-pad; only honoured when not left-aligned)
Nd... decimal width digits (any number)
After formatting the argument into a base string with the existing
spec dispatch (%d/%i/%u/%s/%f/%c/%b/%x/%X/%o), the result is padded
to the requested width.
Workaround: width and spec_pos are returned packed as
width * 1000000 + spec_pos
because the parser does not yet support tuple destructuring in let
('let (a, b) = expr in body' fails with 'expected ident'). TODO: lift
that limitation; for now the encoding round-trips losslessly for any
practical width.
Printf.sprintf '%5d' 42 = ' 42'
Printf.sprintf '%-5d|' 42 = '42 |'
Printf.sprintf '%05d' 42 = '00042'
Printf.sprintf '%4s' 'hi' = ' hi'
Printf.sprintf 'hi=%-3d, hex=%04x' 9 15 = 'hi=9 , hex=000f'
The previous List.sort was O(n^2) insertion sort. Replaced with a
straightforward mergesort:
split lst -> alternating-take into ([odd], [even])
merge xs ys -> classic two-finger merge under cmp
sort cmp xs -> base cases [], [x]; otherwise split + recursive
sort on each half + merge
Tuple destructuring on the split result is expressed via nested
match — let-tuple-destructuring would be cleaner but works today.
This benefits sort_uniq (which calls sort first), Set.Make.add via
sort etc., and any user program using List.sort. Stable_sort is
already aliased to sort.
Three things in this commit:
1. Integer / is now truncate-toward-zero on ints, IEEE on floats. The
eval-op handler for '/' checks (number? + (= (round x) x)) on both
sides; if both integral, applies host floor/ceil based on sign;
otherwise falls through to host '/'.
2. Fixes Int.rem, which was returning 0 because (a - b * (a / b))
was using float division: 17 - 5 * 3.4 = 0.0. Now Int.rem 17 5 = 2.
3. Int module fleshed out:
max_int / min_int / zero / one / minus_one,
succ / pred / neg, add / sub / mul / div / rem,
equal, compare.
Also adds globals: max_int, min_int, abs_float, float_of_int,
int_of_float (the latter two are identity in our dynamic runtime).
17 / 5 = 3
-17 / 5 = -3 (trunc toward zero)
Int.rem 17 5 = 2
Int.compare 5 3 = 1
Eight new Array functions, all in OCaml syntax inside runtime.sx,
delegating to the corresponding List operation on the cell's
underlying list:
sort cmp a -> a := List.sort cmp !a (* mutates the cell *)
stable_sort = sort
fast_sort = sort
append a b -> ref (List.append !a !b)
sub a pos n -> ref (take n (drop pos !a))
exists p -> List.exists p !a
for_all p -> List.for_all p !a
mem x a -> List.mem x !a
Round-trip:
let a = Array.of_list [3;1;4;1;5;9;2;6] in
Array.sort compare a;
Array.to_list a = [1;1;2;3;4;5;6;9]
parse-atom-postfix's '.()' branch now disambiguates between let-open
and array-get based on whether the head is a module path (':con' or
':field' chain rooted in ':con'). Module paths still emit
(:let-open M EXPR); everything else emits (:array-get ARR I).
Eval handles :array-get by reading the cell's underlying list at
index. The '<-' assignment handler now also accepts :array-get lhs
and rewrites the cell with one position changed.
Idiomatic OCaml array code now works:
let a = Array.make 5 0 in
for i = 0 to 4 do a.(i) <- i * i done;
a.(3) + a.(4) = 25
let a = Array.init 4 (fun i -> i + 1) in
a.(0) + a.(1) + a.(2) + a.(3) = 10
List.(length [1;2;3]) = 3 (* unchanged: List is a module *)
Array module (runtime.sx, OCaml syntax):
Backed by a 'ref of list'. make/length/get/init build the cell;
set rewrites the underlying list with one cell changed (O(n) but
works for short arrays in baseline programs). Includes
iter/iteri/map/mapi/fold_left/to_list/of_list/copy/blit/fill.
(op) operator sections (parser.sx, parse-atom):
When the token after '(' is a binop (any op with non-zero
precedence in the binop table) and the next token is ')', emit
(:fun ('a' 'b') (:op OP a b)) — i.e. (+) becomes fun a b -> a + b.
Recognises every binop including 'mod', 'land', '^', '@', '::',
etc.
Lets us write:
List.fold_left (+) 0 [1;2;3;4;5] = 15
let f = ( * ) in f 6 7 = 42
List.map ((-) 10) [1;2;3] = [9;8;7]
let a = Array.make 5 7 in
Array.set a 2 99;
Array.fold_left (+) 0 a = 127
Tokenizer already classified backtick-uppercase as a ctor identical
to a nominal one, but it had never been exercised by the suite. This
commit adds three smoke tests confirming that nullary, n-ary, and
list-of-polyvariant patterns all match:
let x = polyvar(Foo) in match x with polyvar(Foo) -> 1 | polyvar(Bar) -> 2
let x = polyvar(Pair) (5, 7) in
match x with polyvar(Pair) (a, b) -> a + b | _ -> 0
List.map (fun x -> match x with polyvar(On) -> 1 | polyvar(Off) -> 0)
[polyvar(On); polyvar(Off); polyvar(On)]
(In the actual SX, polyvar(X) is the literal backtick-X — backticks
in this commit message are escaped to avoid shell interpretation.)
Since OCaml-on-SX is dynamic, there's no structural row inference,
but matching by tag works.
sort_uniq:
Sort with the user comparator, then walk the sorted list dropping
any element equal to its predecessor. Output is sorted and unique.
List.sort_uniq compare [3;1;2;1;3;2;4] = [1;2;3;4]
find_map:
Walk until the user fn returns Some v; return that. If all None,
return None.
List.find_map (fun x -> if x > 5 then Some (x * 2) else None)
[1;2;3;6;7]
= Some 12
Both defined in OCaml syntax in runtime.sx — no host primitive
needed since they're pure list traversals over existing operations.
Six new String functions, all in OCaml syntax inside runtime.sx:
iter : index-walk with side-effecting f
iteri : iter with index
fold_left : thread accumulator left-to-right
fold_right: thread accumulator right-to-left
to_seq : return a char list (lazy in real OCaml; eager here)
of_seq : concat a char list back to a string
Round-trip:
String.of_seq (List.rev (String.to_seq "hello")) = "olleh"
Note: real OCaml's Seq is lazy. We return a plain list because the
existing stdlib already provides exhaustive list operations and we
don't yet have lazy sequences. If a baseline needs Seq.unfold or
similar, we'll graduate to a proper Seq module then.
frequency.ml exercises the recently-added Hashtbl.iter / fold +
Hashtbl.find_opt + s.[i] indexing + for-loop together: build a
char-count table for 'abracadabra' then take the max via
Hashtbl.fold. Expected = 5 (a x 5). Total 25 baseline programs.
Format module added as a thin alias of Printf — sprintf, printf, and
asprintf all delegate to Printf.sprintf. The dynamic runtime doesn't
distinguish boxes/breaks, so format strings work the same as in
Printf and most Format-using OCaml programs now compile.
Tokenizer already had 'lazy' as a keyword. This commit wires it through:
parser : parse-prefix emits (:lazy EXPR), like the existing 'assert'
handler.
eval : creates a one-element cell with state ('Thunk' expr env).
host : _lazy_force flips the cell to ('Forced' v) on first call
and returns the cached value thereafter.
runtime : module Lazy = struct let force lz = _lazy_force lz end.
Memoisation confirmed by tracking a side-effect counter through two
forces of the same lazy:
let counter = ref 0 in
let lz = lazy (counter := !counter + 1; 42) in
let a = Lazy.force lz in
let b = Lazy.force lz in
(a + b) * 100 + !counter = 8401 (= 84*100 + 1)
New host primitive _hashtbl_to_list returns the entries as a list of
OCaml tuples — ('tuple' k v) form, matching the AST representation
that the pattern-match VM (:ptuple) expects. Without that exact
shape, '(k, v) :: rest' patterns fail to match.
Hashtbl.iter / Hashtbl.fold in runtime walk that list with the user
fn. This closes a long-standing gap: previously Hashtbl was opaque
once values were written (we could only find_opt one key at a time).
let t = Hashtbl.create 4 in
Hashtbl.add t "a" 1; Hashtbl.add t "b" 2; Hashtbl.add t "c" 3;
Hashtbl.fold (fun _ v acc -> acc + v) t 0 = 6
Replaces the stub sprintf in runtime.sx with a real implementation:
walk fmt char-by-char accumulating a prefix; on recognised %X return a
one-arg fn that formats the arg and recurses on the rest of fmt. The
function self-curries to the spec count — there's no separate arity
machinery, just a closure chain.
Specs: %d (int), %s (string), %f (float), %c (char/string in our model),
%b (bool), %% (literal). Unknown specs pass through.
Same expression returns a string (no specs) or a function (>=1 spec) —
OCaml proper would reject this; works fine in OCaml-on-SX's dynamic
runtime.
Also adds top-level aliases:
string_of_int = _string_of_int
string_of_float = _string_of_float
string_of_bool = if b then "true" else "false"
int_of_string = _int_of_string
Printf.sprintf "x=%d" 42 = "x=42"
Printf.sprintf "%s = %d" "answer" 42 = "answer = 42"
Printf.sprintf "%d%%" 50 = "50%"
Tokenizer already classified 'assert' as a keyword; this commit wires
it through:
parser : parse-prefix dispatches like 'not' — advance, recur, wrap
as (:assert EXPR).
eval : evaluate operand; nil on truthy, host-error 'Assert_failure'
on false. Caught cleanly by existing try/with.
assert true; 42 = 42
let x = 5 in assert (x = 5); x + 1 = 6
try (assert false; 0) with _ -> 99 = 99
parse-atom-postfix now dispatches three cases after consuming '.':
.field -> existing field/module access
.(EXPR) -> existing local-open
.[EXPR] -> new string-get syntax (this commit)
Eval reduces (:string-get S I) to host (nth S I), which already returns
a one-character string for OCaml's char model.
Lets us write idiomatic OCaml string traversal:
let s = "hi" in
let n = ref 0 in
for i = 0 to String.length s - 1 do
n := !n + Char.code s.[i]
done;
!n (* = 209 *)
Side-quest emerged from adding roman.ml baseline (Roman numeral greedy
encoding): top-level 'let () = expr' was unsupported because
ocaml-parse-program's parse-decl-let consumed an ident strictly. Now
parse-decl-let recognises a leading '()' as a unit binding and
synthesises a __unit_NN name (matching how parse-let already handles
inner-let unit patterns).
roman.ml exercises:
* tuple list literal [(int * string); ...]
* recursive pattern match on tuple-cons
* String.length + List.fold_left
* the new top-level let () support (sanity in a comment, even though
the program ends with a bare expression for the test harness)
Bumped lib/ocaml/test.sh server timeout 180->360s — the recent surge in
test count plus a CPU-contended host was crowding out the sole epoch
reaching the deeper smarts.
In parse-atom-postfix, after consuming '.', if the next token is '(',
parse the inner expression and emit (:let-open M EXPR) instead of
:field. Cleanly composes with the existing :let-open evaluator and
loops to allow chained dot postfixes.
List.(length [1;2;3]) = 3
List.(map (fun x -> x + 1) [1;2;3]) = [2;3;4]
Option.(map (fun x -> x * 10) (Some 4)) = Some 40
Parser detects 'let open' as a separate let-form, parses M as a path
(Ctor(.Ctor)*) directly via inline AST construction (no source slicing
since cur-pos is only available in ocaml-parse-program), and emits
(:let-open PATH BODY).
Eval resolves the path to a module dict and merges its bindings into
the env for body evaluation. Now:
let open List in map (fun x -> x * 2) [1;2;3] = [2;4;6]
let open Option in map (fun x -> x + 1) (Some 5) = Some 6
ocaml-eval-module now handles :def-mut and :def-rec-mut decls so
'module M = struct let rec a n = ... and b n = ... end' works. The
def-rec-mut version uses cell-based mutual recursion exactly as the
top-level version.
let NAME [PARAMS] : T = expr and (expr : T) parse and skip the type
source. Runtime no-op since SX is dynamic. Works in inline let,
top-level let, and parenthesised expressions:
let x : int = 5 ;; x + 1 -> 6
let f (x : int) : int = x + 1 in f 41 -> 42
(5 : int) -> 5
((1 + 2) : int) * 3 -> 9
Parser: in parse-decl-type, dispatch on the post-= token:
'|' or Ctor -> sum type
'{' -> record type
otherwise -> type alias (skip to boundary)
AST (:type-alias NAME PARAMS) with body discarded. Runtime no-op since
SX has no nominal types.
poly_stack.ml baseline exercises:
module type ELEMENT = sig type t val show : t -> string end
module IntElem = struct type t = int let show x = ... end
module Make (E : ELEMENT) = struct ... use E.show ... end
module IntStack = Make(IntElem)
Demonstrates the substrate handles signature decls + abstract types +
functor parameter with sig constraint.
parse-try now consumes optional 'when GUARD-EXPR' before -> and emits
(:case-when PAT GUARD BODY). Eval try clause loop dispatches on case /
case-when and falls through on guard false — same semantics as match.
Examples:
try raise (E 5) with | E n when n > 0 -> n | _ -> 0 = 5
try raise (E (-3)) with | E n when n > 0 -> n | _ -> 0 = 0
try raise (E 5) with | E n when n > 100 -> n | E n -> n + 1000 = 1005
parse-function now consumes optional 'when GUARD-EXPR' before -> and
emits (:case-when PAT GUARD BODY) — same handling as match clauses.
function-style sign extraction now works:
(function | n when n > 0 -> 1 | n when n < 0 -> -1 | _ -> 0)
ocaml-type-of-program now handles :def-mut (sequential generalize) and
:def-rec-mut (pre-bind tvs, infer rhs, unify, generalize all, infer
body — same algorithm as the inline let-rec-mut version).
Mutual top-level recursion now type-checks:
let rec even n = ... and odd n = ...;; even 10 : Bool
let rec map f xs = ... and length lst = ...;; map :
('a -> 'b) -> 'a list -> 'b list
<- added to op-table at level 1 (same as :=). Eval short-circuits on
<- to mutate the lhs's field via host SX dict-set!. The lhs must be a
:field expression; otherwise raises.
Tested:
let r = { x = 1; y = 2 } in r.x <- 5; r.x (5)
let r = { x = 0 } in for i = 1 to 5 do r.x <- r.x + i done; r.x (15)
let r = { name = ...; age = 30 } in r.name <- "Alice"; r.name
The 'mutable' keyword in record type decls is parsed-and-discarded;
runtime semantics: every field is mutable. Phase 2 closes this gap
without changing the dict-based record representation.
type r = { x : int; mutable y : string } parses to
(:type-def-record NAME PARAMS FIELDS) with FIELDS each (NAME) or
(:mutable NAME). Parser dispatches on { after = to parse field list.
Field-type sources are skipped (HM registration TBD). Runtime no-op
since records already work as dynamic dicts.
print_string / print_endline / print_int / print_newline now route to
SX display primitive (not the non-existent print/println). print_endline
appends '\n'.
let _ = expr ;; at top level confirmed working via the
wildcard-param parser.
ocaml-infer-let-mut: each rhs inferred in parent env, generalized
sequentially before adding to body env.
ocaml-infer-let-rec-mut: pre-bind all names with fresh tvs; infer
each rhs against the joint env, unify each with its tv, then
generalize all and infer body.
Mutual recursion now type-checks:
let rec even n = if n = 0 then true else odd (n - 1)
and odd n = if n = 0 then false else even (n - 1)
in even : Int -> Bool
Option: join, to_result, some, none.
Result: value, iter, fold.
Bytes: length, get, of_string, to_string, concat, sub — thin alias of
String (SX has no separate immutable byte type).
Ordering fix: Bytes module placed after String so its closures capture
String in scope. Earlier draft put Bytes before String which made
String.* lookups fail with 'not a record/module' (treated as nullary
ctor).
Parser fix: at-app-start? and parse-app's loop recognise prefix !
as a deref of the next app arg. So 'List.rev !b' parses as
'(:app List.rev (:deref b))' instead of stalling at !.
Buffer module backed by a ref holding string list:
create _ = ref []
add_string b s = b := s :: !b
contents b = String.concat "" (List.rev !b)
add_char/length/clear/reset
Both written in OCaml inside lib/ocaml/runtime.sx:
module Map = struct module Make (Ord) = struct
let empty = []
let add k v m = ... (* sorted insert via Ord.compare *)
let find_opt / find / mem / remove / bindings / cardinal
end end
module Set = struct module Make (Ord) = struct
let empty = []
let mem / add / remove / elements / cardinal
end end
Sorted association list / sorted list backing — linear ops but
correct. Strong substrate-validation: Map.Make is a non-trivial
functor implemented entirely on top of the OCaml-on-SX evaluator.
os_type="SX", word_size=64, max_array_length, max_string_length,
executable_name="ocaml-on-sx", big_endian=false, unix=true,
win32=false, cygwin=false. Constants-only for now — argv/getenv_opt/
command would need host platform integration.
ocaml-hm-parse-type-src recognises primitive type names (int/bool/
string/float/unit), tyvars 'a, and simple parametric T list / T option.
Replaces the previous int-by-default placeholder in
ocaml-hm-register-type-def!.
So 'type tag = TStr of string | TInt of int' correctly registers
TStr : string -> tag and TInt : int -> tag. Pattern-match on tag
gives proper field types in the body. Multi-arg / function types
still fall back to a fresh tv.
Parser: when | follows a pattern inside parens, build (:por ALT1 ALT2
...). Eval: try alternatives, succeed on first match. Top-level |
remains the clause separator — parens-only avoids ambiguity without
lookahead.
Examples now work:
match n with | (1 | 2 | 3) -> 100 | _ -> 0
match c with | (Red | Green) -> 1 | Blue -> 2