The previous List.sort was O(n^2) insertion sort. Replaced with a
straightforward mergesort:
split lst -> alternating-take into ([odd], [even])
merge xs ys -> classic two-finger merge under cmp
sort cmp xs -> base cases [], [x]; otherwise split + recursive
sort on each half + merge
Tuple destructuring on the split result is expressed via a nested
match. let-tuple-destructuring would be cleaner, but the nested match
works today.
This benefits sort_uniq (which calls sort first), Set.Make.add via
sort etc., and any user program using List.sort. List.stable_sort is
already aliased to sort.
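A minimal sketch of the three functions described above, in plain OCaml. The names follow the commit text; the bodies are an assumption about the shape, not a copy of runtime.sx. Worth checking: in this sketch the alternating split interleaves positions, so elements that compare equal may not keep their input order.

```ocaml
(* Sketch only: names mirror the commit text, bodies are assumed. *)

(* Alternating take: 1st, 3rd, ... go left; 2nd, 4th, ... go right. *)
let rec split lst =
  match lst with
  | [] -> ([], [])
  | [x] -> ([x], [])
  | x :: y :: rest ->
    (* nested match instead of let-tuple destructuring *)
    (match split rest with
     | (odds, evens) -> (x :: odds, y :: evens))

(* Classic two-finger merge under cmp. *)
let rec merge cmp xs ys =
  match xs, ys with
  | [], l | l, [] -> l
  | x :: xt, y :: yt ->
    if cmp x y <= 0 then x :: merge cmp xt ys
    else y :: merge cmp xs yt

let rec sort cmp xs =
  match xs with
  | [] | [_] -> xs
  | _ ->
    (match split xs with
     | (a, b) -> merge cmp (sort cmp a) (sort cmp b))
```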
Three things in this commit:
1. Integer / is now truncate-toward-zero on ints, IEEE on floats. The
eval-op handler for '/' checks (number? + (= (round x) x)) on both
sides; if both integral, applies host floor/ceil based on sign;
otherwise falls through to host '/'.
2. Fixes Int.rem, which was returning 0 because (a - b * (a / b))
was using float division: 17 - 5 * 3.4 = 0.0. Now Int.rem 17 5 = 2.
3. Int module fleshed out:
max_int / min_int / zero / one / minus_one,
succ / pred / neg, add / sub / mul / div / rem,
equal, compare.
Also adds globals: max_int, min_int, abs_float, float_of_int,
int_of_float (the latter two are identity in our dynamic runtime).
17 / 5 = 3
-17 / 5 = -3 (trunc toward zero)
Int.rem 17 5 = 2
Int.compare 5 3 = 1
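The division rule can be sketched in plain OCaml by modelling host numbers as floats. That modelling, and the helper names is_integral / div / rem, are assumptions for illustration only; the real handler lives in the SX evaluator.

```ocaml
(* Assumption: every host number is a float, as in a dynamic runtime. *)
let is_integral v = Float.equal (Float.round v) v

(* Truncate toward zero when both sides are integral, else IEEE division. *)
let div x y =
  if is_integral x && is_integral y then
    let q = x /. y in
    if q >= 0.0 then Float.floor q else Float.ceil q
  else x /. y

(* rem built on the truncating division: a - b * (a / b). *)
let rem a b = a -. b *. div a b
```

With float division in place of div, rem 17. 5. would compute 17 - 5 * 3.4 = 0, which is the bug item 2 describes.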
Eight new Array functions, all in OCaml syntax inside runtime.sx,
delegating to the corresponding List operation on the cell's
underlying list:
sort cmp a -> a := List.sort cmp !a (* mutates the cell *)
stable_sort = sort
fast_sort = sort
append a b -> ref (List.append !a !b)
sub a pos n -> ref (take n (drop pos !a))
exists p a -> List.exists p !a
for_all p a -> List.for_all p !a
mem x a -> List.mem x !a
Round-trip:
let a = Array.of_list [3;1;4;1;5;9;2;6] in
Array.sort compare a;
Array.to_list a = [1;1;2;3;4;5;6;9]
Five '+++++.' groups, cumulative accumulator 5+10+15+20+25 = 75.
This is a brainfuck *subset* — only > < + - . (no [ ] looping). That's
intentional: the goal is to stress imperative idioms that the recently
added Array module + array indexing syntax + s.[i] make ergonomic, all
in one program.
Exercises:
Array.make 256 0
arr.(!ptr)
arr.(!ptr) <- arr.(!ptr) + 1
prog.[!pc]
ref / ! / :=
while + nested if/else if/else if for op dispatch
25 baseline programs total.
Counts primes <= 50, expected 15.
Stresses the recently-added Array module + the new array-indexing
syntax together with nested control flow:
let sieve = Array.make (n + 1) true in
sieve.(0) <- false;
sieve.(1) <- false;
for i = 2 to n do
if sieve.(i) then begin
let j = ref (i * i) in
while !j <= n do
sieve.(!j) <- false;
j := !j + i
done
end
done;
...
Exercises: Array.make, arr.(i), arr.(i) <- v, nested for/while,
begin..end blocks, ref/!/:=, integer arithmetic. 24 baseline
programs total.
parse-atom-postfix's '.()' branch now disambiguates between let-open
and array-get based on whether the head is a module path (':con' or
':field' chain rooted in ':con'). Module paths still emit
(:let-open M EXPR); everything else emits (:array-get ARR I).
Eval handles :array-get by reading the cell's underlying list at
index. The '<-' assignment handler now also accepts :array-get lhs
and rewrites the cell with one position changed.
Idiomatic OCaml array code now works:
let a = Array.make 5 0 in
for i = 0 to 4 do a.(i) <- i * i done;
a.(3) + a.(4) = 25
let a = Array.init 4 (fun i -> i + 1) in
a.(0) + a.(1) + a.(2) + a.(3) = 10
List.(length [1;2;3]) = 3 (* unchanged: List is a module *)
Array module (runtime.sx, OCaml syntax):
Backed by a 'ref of list'. make/init build the cell and length/get
read it; set rewrites the underlying list with one element changed
(O(n), which is acceptable for short arrays in baseline programs). Includes
iter/iteri/map/mapi/fold_left/to_list/of_list/copy/blit/fill.
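A rough sketch of the 'ref of list' representation, written as ordinary OCaml. The module name Arr is hypothetical (chosen to avoid shadowing the real Array), and only a few of the listed functions are shown.

```ocaml
(* Hypothetical module name; the representation is a ref holding a list. *)
module Arr = struct
  let make n v = ref (List.init n (fun _ -> v))
  let init n f = ref (List.init n f)
  let length a = List.length !a
  let get a i = List.nth !a i
  (* O(n): rebuild the list with position i replaced. *)
  let set a i v = a := List.mapi (fun j x -> if j = i then v else x) !a
  let fold_left f acc a = List.fold_left f acc !a
end

let a = Arr.make 5 7
let () = Arr.set a 2 99
let total = Arr.fold_left (+) 0 a
```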
(op) operator sections (parser.sx, parse-atom):
When the token after '(' is a binop (any op with non-zero
precedence in the binop table) and the next token is ')', emit
(:fun ('a' 'b') (:op OP a b)) — i.e. (+) becomes fun a b -> a + b.
Recognises every binop including 'mod', 'land', '^', '@', '::',
etc.
Lets us write:
List.fold_left (+) 0 [1;2;3;4;5] = 15
let f = ( * ) in f 6 7 = 42
List.map ((-) 10) [1;2;3] = [9;8;7]
let a = Array.make 5 7 in
Array.set a 2 99;
Array.fold_left (+) 0 a = 127
Inline CSV-like text:
a,1,extra
b,2,extra
c,3,extra
d,4,extra
Two-stage String.split_on_char: first on '\n' for rows, then on ','
for fields per row. List.fold_left accumulates int_of_string of the
second field across rows. Result = 1+2+3+4 = 10.
Exercises char escapes inside string literals ('\n'), nested
String.split_on_char, List.fold_left with a non-trivial closure body,
and int_of_string. 23 baseline programs total.
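For reference, the two-stage split can be written roughly as follows. This is a sketch using the inline data shown above, not the baseline file itself; the binding names csv and total are made up here.

```ocaml
let csv = "a,1,extra\nb,2,extra\nc,3,extra\nd,4,extra"

(* Split into rows, then split each row into fields and add field 2. *)
let total =
  String.split_on_char '\n' csv
  |> List.fold_left
       (fun acc row ->
          match String.split_on_char ',' row with
          | _ :: n :: _ -> acc + int_of_string n
          | _ -> acc)
       0
```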
Tokenizer already classified backtick-uppercase as a ctor identical
to a nominal one, but it had never been exercised by the suite. This
commit adds three smoke tests confirming that nullary, n-ary, and
list-of-polyvariant patterns all match:
let x = polyvar(Foo) in match x with polyvar(Foo) -> 1 | polyvar(Bar) -> 2
let x = polyvar(Pair) (5, 7) in
match x with polyvar(Pair) (a, b) -> a + b | _ -> 0
List.map (fun x -> match x with polyvar(On) -> 1 | polyvar(Off) -> 0)
[polyvar(On); polyvar(Off); polyvar(On)]
(In the actual SX, polyvar(X) is the literal backtick-X — backticks
in this commit message are escaped to avoid shell interpretation.)
Since OCaml-on-SX is dynamic, there's no structural row inference,
but matching by tag works.
sort_uniq:
Sort with the user comparator, then walk the sorted list dropping
any element equal to its predecessor. Output is sorted and unique.
List.sort_uniq compare [3;1;2;1;3;2;4] = [1;2;3;4]
find_map:
Walk until the user fn returns Some v; return that. If all None,
return None.
List.find_map (fun x -> if x > 5 then Some (x * 2) else None)
[1;2;3;6;7]
= Some 12
Both defined in OCaml syntax in runtime.sx — no host primitive
needed since they're pure list traversals over existing operations.
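One way the two traversals might look in plain OCaml. The bodies are a sketch under the description above, not the runtime.sx definitions; this dedup drops an element equal to its successor, which yields the same output as dropping one equal to its predecessor.

```ocaml
let sort_uniq cmp xs =
  (* Walk the sorted list, skipping an element equal to the next one. *)
  let rec dedup = function
    | x :: (y :: _ as rest) ->
      if cmp x y = 0 then dedup rest else x :: dedup rest
    | l -> l
  in
  dedup (List.sort cmp xs)

let rec find_map f = function
  | [] -> None
  | x :: rest ->
    (match f x with
     | Some _ as hit -> hit
     | None -> find_map f rest)
```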
Six new String functions, all in OCaml syntax inside runtime.sx:
iter : index-walk with side-effecting f
iteri : iter with index
fold_left : thread accumulator left-to-right
fold_right: thread accumulator right-to-left
to_seq : return a char list (lazy in real OCaml; eager here)
of_seq : concat a char list back to a string
Round-trip:
String.of_seq (List.rev (String.to_seq "hello")) = "olleh"
Note: real OCaml's Seq is lazy. We return a plain list because the
existing stdlib already provides exhaustive list operations and we
don't yet have lazy sequences. If a baseline needs Seq.unfold or
similar, we'll graduate to a proper Seq module then.
frequency.ml exercises the recently-added Hashtbl.iter / fold +
Hashtbl.find_opt + s.[i] indexing + for-loop together: build a
char-count table for 'abracadabra' then take the max via
Hashtbl.fold. Expected = 5 (a x 5). Total 25 baseline programs.
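A sketch of the same idea in standard OCaml. The function name max_char_count is hypothetical, and Hashtbl.replace is used here for brevity even though the message only names find_opt, iter, and fold.

```ocaml
let max_char_count s =
  let t = Hashtbl.create 16 in
  for i = 0 to String.length s - 1 do
    let c = s.[i] in
    let n = match Hashtbl.find_opt t c with Some n -> n | None -> 0 in
    Hashtbl.replace t c (n + 1)
  done;
  (* Take the maximum count over all entries. *)
  Hashtbl.fold (fun _ n best -> max n best) t 0
```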
Format module added as a thin alias of Printf — sprintf, printf, and
asprintf all delegate to Printf.sprintf. The dynamic runtime doesn't
distinguish boxes/breaks, so format strings work the same as in
Printf and most Format-using OCaml programs now compile.
Tokenizer already had 'lazy' as a keyword. This commit wires it through:
parser : parse-prefix emits (:lazy EXPR), like the existing 'assert'
handler.
eval : creates a one-element cell with state ('Thunk' expr env).
host : _lazy_force flips the cell to ('Forced' v) on first call
and returns the cached value thereafter.
runtime : module Lazy = struct let force lz = _lazy_force lz end.
Memoisation confirmed by tracking a side-effect counter through two
forces of the same lazy:
let counter = ref 0 in
let lz = lazy (counter := !counter + 1; 42) in
let a = Lazy.force lz in
let b = Lazy.force lz in
(a + b) * 100 + !counter = 8401 (= 84*100 + 1)
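The Thunk / Forced cell can be modelled in plain OCaml along these lines. This is a sketch: the real cell stores an expression plus environment, whereas here a closure stands in for both, and make_lazy / force are illustrative names.

```ocaml
type 'a state = Thunk of (unit -> 'a) | Forced of 'a

(* A one-element mutable cell holding the current state. *)
let make_lazy f = ref (Thunk f)

let force cell =
  match !cell with
  | Forced v -> v
  | Thunk f ->
    let v = f () in
    cell := Forced v;   (* flip the cell so later forces reuse v *)
    v

let counter = ref 0
let lz = make_lazy (fun () -> counter := !counter + 1; 42)
let a = force lz
let b = force lz
let result = (a + b) * 100 + !counter
```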
New host primitive _hashtbl_to_list returns the entries as a list of
OCaml tuples — ('tuple' k v) form, matching the AST representation
that the pattern-match VM (:ptuple) expects. Without that exact
shape, '(k, v) :: rest' patterns fail to match.
Hashtbl.iter / Hashtbl.fold in runtime walk that list with the user
fn. This closes a long-standing gap: previously Hashtbl was opaque
once values were written (we could only find_opt one key at a time).
let t = Hashtbl.create 4 in
Hashtbl.add t "a" 1; Hashtbl.add t "b" 2; Hashtbl.add t "c" 3;
Hashtbl.fold (fun _ v acc -> acc + v) t 0 = 6
Replaces the stub sprintf in runtime.sx with a real implementation:
walk fmt char-by-char accumulating a prefix; on recognised %X return a
one-arg fn that formats the arg and recurses on the rest of fmt. The
function self-curries to the spec count — there's no separate arity
machinery, just a closure chain.
Specs: %d (int), %s (string), %f (float), %c (char/string in our model),
%b (bool), %% (literal). Unknown specs pass through.
Same expression returns a string (no specs) or a function (>=1 spec) —
OCaml proper would reject this; works fine in OCaml-on-SX's dynamic
runtime.
Also adds top-level aliases:
string_of_int = _string_of_int
string_of_float = _string_of_float
string_of_bool = if b then "true" else "false"
int_of_string = _int_of_string
Printf.sprintf "x=%d" 42 = "x=42"
Printf.sprintf "%s = %d" "answer" 42 = "answer = 42"
Printf.sprintf "%d%%" 50 = "50%"
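The closure chain can be approximated in typed OCaml by making the "string or function" result explicit. Everything below (the arg and out types, render, apply) is a hypothetical model for illustration; the dynamic runtime needs none of this wrapping, and this sketch does not check that the spec letter matches the argument kind.

```ocaml
type arg = I of int | S of string | F of float | B of bool

(* Either a finished string, or a one-arg function awaiting the next value. *)
type out = Done of string | Need of (arg -> out)

let render = function
  | I n -> string_of_int n
  | S s -> s
  | F f -> string_of_float f
  | B b -> string_of_bool b

let sprintf fmt =
  let n = String.length fmt in
  let rec go i acc =
    if i >= n then Done acc
    else if fmt.[i] = '%' && i + 1 < n then
      (match fmt.[i + 1] with
       | '%' -> go (i + 2) (acc ^ "%")
       | 'd' | 's' | 'f' | 'c' | 'b' ->
         Need (fun a -> go (i + 2) (acc ^ render a))
       | c -> go (i + 2) (acc ^ "%" ^ String.make 1 c))  (* unknown: pass through *)
    else go (i + 1) (acc ^ String.make 1 fmt.[i])
  in
  go 0 ""

(* Feed arguments one at a time until the chain bottoms out. *)
let rec apply out args =
  match out, args with
  | Done s, _ -> s
  | Need k, a :: rest -> apply (k a) rest
  | Need _, [] -> invalid_arg "sprintf: too few arguments"
```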
Tokenizer already classified 'assert' as a keyword; this commit wires
it through:
parser : parse-prefix dispatches like 'not' — advance, recur, wrap
as (:assert EXPR).
eval : evaluate operand; nil on truthy, host-error 'Assert_failure'
on false. Caught cleanly by existing try/with.
assert true; 42 = 42
let x = 5 in assert (x = 5); x + 1 = 6
try (assert false; 0) with _ -> 99 = 99
Recursive Levenshtein edit distance with no memoization (the test
strings are short enough for the exponential-without-memo version to
fit in <2 minutes on contended hosts). Sums distances for five short
pairs:
('abc','abx') + ('ab','ba') + ('abc','axyc') + ('','abcd') + ('ab','')
= 1 + 2 + 2 + 4 + 2 = 11
Exercises:
* curried four-arg recursion
* s.[i] equality test (char comparison)
* min nested twice for the three-way recurrence
* mixed empty-string base cases
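A sketch of the unmemoised recurrence with a curried four-argument helper. The names lev and dist are assumptions; the baseline program may be organised differently.

```ocaml
(* lev s t i j: edit distance between the first i chars of s and first j of t. *)
let rec lev s t i j =
  if i = 0 then j
  else if j = 0 then i
  else
    let cost = if s.[i - 1] = t.[j - 1] then 0 else 1 in
    min (lev s t (i - 1) j + 1)
      (min (lev s t i (j - 1) + 1) (lev s t (i - 1) (j - 1) + cost))

let dist s t = lev s t (String.length s) (String.length t)

let total =
  dist "abc" "abx" + dist "ab" "ba" + dist "abc" "axyc"
  + dist "" "abcd" + dist "ab" ""
```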
Side-quests required to land caesar.ml:
1. Top-level 'let r = expr in body' is now an expression decl, not a
broken decl-let. ocaml-parse-program's dispatch now checks
has-matching-in? at every top-level let; if matched, slices via
skip-let-rhs-boundary (which already opens depth on a leading let
with matching in) and ocaml-parse on the slice, wrapping as :expr.
2. runtime.sx: added String.make / String.init / String.map. Used by
caesar.ml's encode = String.init n (fun i -> shift_char s.[i] k).
3. baseline run.sh per-program timeout 240->480s (under system load on the
shared host, large baselines frequently run longer than 240s).
caesar.ml exercises:
* the new top-level let-in expression dispatch
* s.[i] string indexing
* Char.code / Char.chr round-trip math
* String.init with a closure that captures k
Test value: Char.code r.[0] + Char.code r.[4] after ROT13(ROT13('hello')) = 104 + 111 = 215.
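A sketch of the encode step in standard OCaml. The body of shift_char is assumed (the message only names it), and a non-negative shift k is assumed.

```ocaml
(* Rotate letters by k within their case; leave other characters alone. *)
let shift_char c k =
  if c >= 'a' && c <= 'z' then
    Char.chr ((Char.code c - Char.code 'a' + k) mod 26 + Char.code 'a')
  else if c >= 'A' && c <= 'Z' then
    Char.chr ((Char.code c - Char.code 'A' + k) mod 26 + Char.code 'A')
  else c

(* The closure passed to String.init captures both s and k. *)
let encode s k = String.init (String.length s) (fun i -> shift_char s.[i] k)

let r = encode (encode "hello" 13) 13
let check = Char.code r.[0] + Char.code r.[4]
```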
parse-atom-postfix now dispatches three cases after consuming '.':
.field -> existing field/module access
.(EXPR) -> existing local-open
.[EXPR] -> new string-get syntax (this commit)
Eval reduces (:string-get S I) to host (nth S I), which already returns
a one-character string for OCaml's char model.
Lets us write idiomatic OCaml string traversal:
let s = "hi" in
let n = ref 0 in
for i = 0 to String.length s - 1 do
n := !n + Char.code s.[i]
done;
!n (* = 209 *)
OCaml kernel changes:
sx_types.ml:
- Add l_call_count : int field to lambda type — counts how many times
a named lambda has been invoked through the VM dispatch path.
- Add module-level refs jit_threshold (default 4), jit_compiled_count,
jit_skipped_count, jit_threshold_skipped_count for stats.
Refs live here (not sx_vm) so sx_primitives can read them without
creating a sx_primitives → sx_vm dependency cycle.
sx_vm.ml:
- In the Lambda case of cek_call_or_suspend, before triggering the JIT,
increment l.l_call_count. Only call jit_compile_ref if count >= the
runtime-tunable threshold. Below threshold, fall through to the
existing cek_call_or_suspend path (interpreter-style).
sx_primitives.ml:
- Register jit-stats — returns dict {threshold, compiled, compile-failed,
below-threshold}.
- Register jit-set-threshold! N — change threshold at runtime.
- Register jit-reset-counters! — zero the stats counters.
bin/run_tests.ml:
- Add l_call_count = 0 to the test-fixture lambda construction.
Effect: lambdas only get JIT-compiled on their 4th invocation (at the default threshold). One-shot
lambdas (test harness wrappers, eval-hs throwaways, REPL inputs) never enter
the JIT cache, eliminating the cumulative slowdown that the batched runner
currently works around. Hot paths (component renders, event handlers) cross
the threshold within a handful of calls and get the full JIT speed.
Phase 2 (LRU eviction) and Phase 3 (jit-reset! / jit-clear-cold!) follow.
Verified: 4771 passed, 1111 failed in OCaml run_tests.exe — identical to
baseline before this change. No regressions; tiered logic is correct.
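The counting logic can be sketched in isolation as follows. The record type, the compiled flag, the tier type, and call_lambda are hypothetical stand-ins; the real code calls jit_compile_ref inside cek_call_or_suspend.

```ocaml
type tier = Interpreted | Jit

(* Hypothetical stand-in for the kernel's lambda record. *)
type lambda = { mutable l_call_count : int; mutable compiled : bool }

let jit_threshold = ref 4
let jit_compiled_count = ref 0
let jit_threshold_skipped_count = ref 0

let call_lambda l =
  (* Count first, then decide which path to take. *)
  l.l_call_count <- l.l_call_count + 1;
  if l.compiled then Jit
  else if l.l_call_count >= !jit_threshold then begin
    l.compiled <- true;              (* stands in for jit_compile_ref *)
    incr jit_compiled_count;
    Jit
  end else begin
    incr jit_threshold_skipped_count;
    Interpreted
  end
```

With the default threshold, the first three calls take the interpreter path and the fourth triggers compilation.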
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Side-quest emerged from adding roman.ml baseline (Roman numeral greedy
encoding): top-level 'let () = expr' was unsupported because
ocaml-parse-program's parse-decl-let consumed an ident strictly. Now
parse-decl-let recognises a leading '()' as a unit binding and
synthesises a __unit_NN name (matching how parse-let already handles
inner-let unit patterns).
roman.ml exercises:
* tuple list literal [(int * string); ...]
* recursive pattern match on tuple-cons
* String.length + List.fold_left
* the new top-level let () support (sanity in a comment, even though
the program ends with a bare expression for the test harness)
Bumped lib/ocaml/test.sh server timeout 180->360s — the recent surge in
test count plus a CPU-contended host was crowding out the sole epoch
reaching the deeper smarts.
Parser hk-parse-parens gains a `::` arm after the first inner expression:
consume `::`, parse a type via the existing hk-parse-type, expect `)`,
emit (:type-ann EXPR TYPE). Sections, tuples, parenthesised expressions
and unit `()` are unchanged.
Desugar drops the annotation — :type-ann E _ → (hk-desugar E) — since
the existing eval path has no type-directed dispatch. Phase 20 will
extend infer.sx to consume the annotation and unify against the
inferred type.
tests/parse-extras.sx (12/12) covers literal, arithmetic, function arg,
string, bool, tuple, nested annotation, function-typed annotation, and
no-regression checks for plain parens / 3-tuples / left+right sections.
eval (66/0), exceptions (14/0), typecheck (15/0), records (14/0), ioref
(13/0) all still clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bytecode + sx_browser.bc.{js,wasm.js} regenerated from sources updated
by the hs-f merge (e8246340). No semantic change — these are build
outputs catching up to their inputs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tcl tokenizer treats $::g-name as $::g + literal -name, so the var
lookup fails. Renamed test vars to ::gname / ::nval (no hyphens).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Five "typed ok: …" tests in tests/typecheck.sx compared an unforced thunk
against an integer/list. The untyped-path convention is hk-deep-force on
the result; hk-run-typed follows the same shape but the tests omitted
that wrap. Added hk-deep-force around hk-run-typed in those five tests.
typecheck.sx now 15/15; infer.sx still 75/75.
Plan adds three phases capturing the remaining type-system work:
- Phase 20: Algorithm W gaps (case, do, record accessors, expression
annotations).
- Phase 21: type classes with qualified types ([Eq a] => …) and
constraint propagation, integrated with the existing dict-passing
evaluator.
- Phase 22: typecheck-then-run as the default conformance path, with a
≥ 30/36 typechecking threshold before swap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In parse-atom-postfix, after consuming '.', if the next token is '(',
parse the inner expression and emit (:let-open M EXPR) instead of
:field. Cleanly composes with the existing :let-open evaluator and
loops to allow chained dot postfixes.
List.(length [1;2;3]) = 3
List.(map (fun x -> x + 1) [1;2;3]) = [2;3;4]
Option.(map (fun x -> x * 10) (Some 4)) = Some 40
Parser detects 'let open' as a separate let-form, parses M as a path
(Ctor(.Ctor)*) directly via inline AST construction (no source slicing
since cur-pos is only available in ocaml-parse-program), and emits
(:let-open PATH BODY).
Eval resolves the path to a module dict and merges its bindings into
the env for body evaluation. Now:
let open List in map (fun x -> x * 2) [1;2;3] = [2;4;6]
let open Option in map (fun x -> x + 1) (Some 5) = Some 6
ocaml-eval-module now handles :def-mut and :def-rec-mut decls so
'module M = struct let rec a n = ... and b n = ... end' works. The
def-rec-mut version uses cell-based mutual recursion exactly as the
top-level version.
Graph BFS using Queue + Hashtbl visited-set + List.assoc_opt + List.iter.
Returns 6 for a graph where A reaches B/C/D/E/F. Demonstrates 3 stdlib
modules (Queue, Hashtbl, List) cooperating in a real algorithm.
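A sketch of such a BFS in standard OCaml. The adjacency list below is made up for illustration (the message does not give the edges), as is the name bfs_count.

```ocaml
(* Hypothetical graph: A reaches B, C, D, E, F. *)
let graph =
  [ ("A", ["B"; "C"]); ("B", ["D"]); ("C", ["E"]);
    ("D", ["F"]); ("E", []); ("F", []) ]

(* Count the nodes reachable from start, including start itself. *)
let bfs_count start =
  let visited = Hashtbl.create 16 in
  let q = Queue.create () in
  Hashtbl.add visited start ();
  Queue.add start q;
  let count = ref 0 in
  while not (Queue.is_empty q) do
    let node = Queue.pop q in
    incr count;
    let next =
      match List.assoc_opt node graph with Some ns -> ns | None -> [] in
    List.iter
      (fun n ->
         if not (Hashtbl.mem visited n) then begin
           Hashtbl.add visited n ();
           Queue.add n q
         end)
      next
  done;
  !count
```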
'let NAME [PARAMS] : T = expr' and '(expr : T)' now parse, skipping the
type source. Runtime no-op since SX is dynamic. Works in inline let,
top-level let, and parenthesised expressions:
let x : int = 5 ;; x + 1 -> 6
let f (x : int) : int = x + 1 in f 41 -> 42
(5 : int) -> 5
((1 + 2) : int) * 3 -> 9
Parser: in parse-decl-type, dispatch on the post-= token:
'|' or Ctor -> sum type
'{' -> record type
otherwise -> type alias (skip to boundary)
AST (:type-alias NAME PARAMS) with body discarded. Runtime no-op since
SX has no nominal types.
poly_stack.ml baseline exercises:
module type ELEMENT = sig type t val show : t -> string end
module IntElem = struct type t = int let show x = ... end
module Make (E : ELEMENT) = struct ... use E.show ... end
module IntStack = Make(IntElem)
Demonstrates the substrate handles signature decls + abstract types +
functor parameter with sig constraint.
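A filled-in version of that outline, as a sketch. The elided bodies ('...') are not known from the message, so the stack operations and the show format below are assumptions.

```ocaml
module type ELEMENT = sig
  type t
  val show : t -> string
end

module IntElem = struct
  type t = int
  let show x = string_of_int x
end

(* Assumed body: a list-backed stack that renders elements via E.show. *)
module Make (E : ELEMENT) = struct
  let empty : E.t list = []
  let push x s = x :: s
  let show s = "[" ^ String.concat "; " (List.map E.show s) ^ "]"
end

module IntStack = Make (IntElem)

let rendered = IntStack.(show (push 2 (push 1 empty)))
```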
parse-try now consumes optional 'when GUARD-EXPR' before -> and emits
(:case-when PAT GUARD BODY). Eval try clause loop dispatches on case /
case-when and falls through on guard false — same semantics as match.
Examples:
try raise (E 5) with | E n when n > 0 -> n | _ -> 0 = 5
try raise (E (-3)) with | E n when n > 0 -> n | _ -> 0 = 0
try raise (E 5) with | E n when n > 100 -> n | E n -> n + 1000 = 1005
parse-function now consumes optional 'when GUARD-EXPR' before -> and
emits (:case-when PAT GUARD BODY) — same handling as match clauses.
function-style sign extraction now works:
(function | n when n > 0 -> 1 | n when n < 0 -> -1 | _ -> 0)
Group anagrams by canonical (sorted-chars) key using Hashtbl +
List.sort. Demonstrates char-by-char traversal via String.get + for-loop +
ref accumulator + Hashtbl as a multi-valued counter.
Untyped lambda calculus interpreter inside OCaml-on-SX:
type term = Var | Abs of string * term | App | Num of int
type value = VNum of int | VClos of string * term * env
let rec eval env t = match t with ...
(\x.\y.x) 7 99 = 7. The substrate handles two ADTs, recursive eval,
closure-based env, and pattern matching all written as a single
self-contained OCaml program — strong validation.
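A sketch of what such an interpreter might look like. The payloads on Var and App, the env definition, and the eval body are assumptions filled in for illustration; the message elides them.

```ocaml
type term =
  | Var of string            (* payload assumed *)
  | Abs of string * term
  | App of term * term       (* payload assumed *)
  | Num of int

type value = VNum of int | VClos of string * term * env
and env = (string * value) list   (* assumed: association list *)

let rec eval env t =
  match t with
  | Num n -> VNum n
  | Var x -> List.assoc x env
  | Abs (x, body) -> VClos (x, body, env)   (* capture the defining env *)
  | App (f, a) ->
    (match eval env f with
     | VClos (x, body, cenv) -> eval ((x, eval env a) :: cenv) body
     | VNum _ -> failwith "cannot apply a number")

(* (\x.\y.x) 7 99 *)
let k = Abs ("x", Abs ("y", Var "x"))
let result = eval [] (App (App (k, Num 7), Num 99))
```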
ocaml-type-of-program now handles :def-mut (sequential generalize) and
:def-rec-mut (pre-bind tvs, infer rhs, unify, generalize all, infer
body — same algorithm as the inline let-rec-mut version).
Mutual top-level recursion now type-checks:
let rec even n = ... and odd n = ...;; even 10 : Bool
let rec map f xs = ... and length lst = ...;; map :
('a -> 'b) -> 'a list -> 'b list
Memoized fibonacci using Hashtbl.find_opt + Hashtbl.add.
fib(25) = 75025. Demonstrates mutable Hashtbl through the OCaml
stdlib API in real recursive code.
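A minimal sketch of the memoised version in standard OCaml. The table name memo and its initial size are arbitrary choices here.

```ocaml
let memo : (int, int) Hashtbl.t = Hashtbl.create 64

let rec fib n =
  if n < 2 then n
  else
    match Hashtbl.find_opt memo n with
    | Some v -> v                      (* cache hit *)
    | None ->
      let v = fib (n - 1) + fib (n - 2) in
      Hashtbl.add memo n v;            (* remember for later calls *)
      v
```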
4-queens via recursive backtracking + List.fold_left. Returns 2 (the
two solutions of 4-queens). Per-program timeout in run.sh bumped to
240s — the tree-walking interpreter is slow on heavy recursion but
correct.
The substrate handles full backtracking + safe-check recursion +
list-driven candidate enumeration end-to-end.
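One possible shape for the backtracking search, as a sketch. The function names queens, safe, and solve and the column-list representation are assumptions, not the baseline source.

```ocaml
let queens n =
  (* placed holds the columns of earlier rows, most recent row first. *)
  let safe col placed =
    let rec check dist = function
      | [] -> true
      | c :: rest ->
        c <> col && abs (c - col) <> dist && check (dist + 1) rest
    in
    check 1 placed
  in
  let rec solve row placed =
    if row = n then 1
    else
      (* Try every column in this row and sum the solution counts. *)
      List.fold_left
        (fun acc col ->
           if safe col placed then acc + solve (row + 1) (col :: placed)
           else acc)
        0
        (List.init n (fun c -> c))
  in
  solve 0 []
```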
Counter-style record with two mutable fields. Validates the new
r.f <- v field mutation end-to-end through type decl + record literal
+ field access + field assignment + sequence operator.
type counter = { mutable count : int; mutable last : int }
let bump c = c.count <- c.count + 1 ; c.last <- c.count
After 5 bumps: count=5, last=5, sum=10.
<- added to op-table at level 1 (same as :=). Eval short-circuits on
<- to mutate the lhs's field via host SX dict-set!. The lhs must be a
:field expression; otherwise raises.
Tested:
let r = { x = 1; y = 2 } in r.x <- 5; r.x (5)
let r = { x = 0 } in for i = 1 to 5 do r.x <- r.x + i done; r.x (15)
let r = { name = ...; age = 30 } in r.name <- "Alice"; r.name
The 'mutable' keyword in record type decls is parsed-and-discarded;
runtime semantics: every field is mutable. Phase 2 closes this gap
without changing the dict-based record representation.