Linear-congruential PRNG with mutable seed (_state ref). API:
init s seed the PRNG
self_init () default seed (1)
int bound 0 <= n < bound
bool () fair coin
float bound uniform in [0, bound)
bits () 30 bits
Stepping rule:
state := (state * 1103515245 + 12345) mod 2147483647
result := |state| mod bound
Same seed reproduces the same sequence. Real OCaml's Random uses
Lagged Fibonacci; ours is simpler but adequate for shuffles and
Monte Carlo demos in baseline programs.
Random.init 42; Random.int 100 = 48
Random.init 1; Random.int 10 = 0
Two new host primitives:
_hashtbl_remove t k -> dissoc the key from the underlying dict
_hashtbl_clear t -> reset the cell to {}
Eight new OCaml-syntax helpers in runtime.sx Hashtbl module:
bindings t = _hashtbl_to_list t
keys t = List.map (fun (k, _) -> k) (...)
values t = List.map (fun (_, v) -> v) (...)
to_seq t = bindings t
to_seq_keys / to_seq_values
remove / clear / reset
The keys/values implementations use a 'fun pair -> match pair with
(k, _) -> k' indirection because parse-fun does not currently allow
tuple patterns directly on parameters. Same restriction we worked
around in iteration 98's let-pattern desugaring.
Also: a detour attempting to add top-level 'let (a, b) = expr'
support was started but reverted — parse-decl-let in the outer
ocaml-parse-program scope does not have access to parse-pattern
(which is local to ocaml-parse). Will need a slice + re-parse trick
later.
When 'let' is followed by '(', parse-let now reads a full pattern
(via the existing parse-pattern used by match), expects '=', then
'in', and desugars to:
let PATTERN = EXPR in BODY => match EXPR with PATTERN -> BODY
This reuses the entire pattern-matching machinery, so any pattern
the match parser accepts works here too — paren-tuples, nested
tuples, cons patterns, list patterns. No 'rec' allowed for pattern
bindings (real OCaml's restriction).
let (a, b) = (1, 2) in a + b = 3
let (a, b, c) = (10, 20, 30) in a + b + c = 60
let pair = (5, 7) in
let (x, y) = pair in x * y = 35
Also retroactively cleaned up Printf's iter-97 width-pos packing
hack ('width * 1000000 + spec_pos') — it's now
'let (width, spec_pos) = parse_width_loop after_flags in ...' like
real OCaml.
The Printf walker now parses optional flags + width digits between
'%' and the spec letter:
- left-align (default is right-align)
0 zero-pad (default is space-pad; only honoured when not left-aligned)
Nd... decimal width digits (any number)
After formatting the argument into a base string with the existing
spec dispatch (%d/%i/%u/%s/%f/%c/%b/%x/%X/%o), the result is padded
to the requested width.
Workaround: width and spec_pos are returned packed as
width * 1000000 + spec_pos
because the parser does not yet support tuple destructuring in let
('let (a, b) = expr in body' fails with 'expected ident'). TODO: lift
that limitation; for now the encoding round-trips losslessly for any
practical width.
Printf.sprintf '%5d' 42 = ' 42'
Printf.sprintf '%-5d|' 42 = '42 |'
Printf.sprintf '%05d' 42 = '00042'
Printf.sprintf '%4s' 'hi' = ' hi'
Printf.sprintf 'hi=%-3d, hex=%04x' 9 15 = 'hi=9 , hex=000f'
The previous List.sort was O(n^2) insertion sort. Replaced with a
straightforward mergesort:
split lst -> alternating-take into ([odd], [even])
merge xs ys -> classic two-finger merge under cmp
sort cmp xs -> base cases [], [x]; otherwise split + recursive
sort on each half + merge
Tuple destructuring on the split result is expressed via nested
match — let-tuple-destructuring would be cleaner but works today.
This benefits sort_uniq (which calls sort first), Set.Make.add via
sort etc., and any user program using List.sort. Stable_sort is
already aliased to sort.
Three things in this commit:
1. Integer / is now truncate-toward-zero on ints, IEEE on floats. The
eval-op handler for '/' checks (number? + (= (round x) x)) on both
sides; if both integral, applies host floor/ceil based on sign;
otherwise falls through to host '/'.
2. Fixes Int.rem, which was returning 0 because (a - b * (a / b))
was using float division: 17 - 5 * 3.4 = 0.0. Now Int.rem 17 5 = 2.
3. Int module fleshed out:
max_int / min_int / zero / one / minus_one,
succ / pred / neg, add / sub / mul / div / rem,
equal, compare.
Also adds globals: max_int, min_int, abs_float, float_of_int,
int_of_float (the latter two are identity in our dynamic runtime).
17 / 5 = 3
-17 / 5 = -3 (trunc toward zero)
Int.rem 17 5 = 2
Int.compare 5 3 = 1
Eight new Array functions, all in OCaml syntax inside runtime.sx,
delegating to the corresponding List operation on the cell's
underlying list:
sort cmp a -> a := List.sort cmp !a (* mutates the cell *)
stable_sort = sort
fast_sort = sort
append a b -> ref (List.append !a !b)
sub a pos n -> ref (take n (drop pos !a))
exists p -> List.exists p !a
for_all p -> List.for_all p !a
mem x a -> List.mem x !a
Round-trip:
let a = Array.of_list [3;1;4;1;5;9;2;6] in
Array.sort compare a;
Array.to_list a = [1;1;2;3;4;5;6;9]
Five '+++++.' groups, cumulative accumulator 5+10+15+20+25 = 75.
This is a brainfuck *subset* — only > < + - . (no [ ] looping). That's
intentional: the goal is to stress imperative idioms that the recently
added Array module + array indexing syntax + s.[i] make ergonomic, all
in one program.
Exercises:
Array.make 256 0
arr.(!ptr)
arr.(!ptr) <- arr.(!ptr) + 1
prog.[!pc]
ref / ! / :=
while + nested if/else if/else if for op dispatch
25 baseline programs total.
Counts primes <= 50, expected 15.
Stresses the recently-added Array module + the new array-indexing
syntax together with nested control flow:
let sieve = Array.make (n + 1) true in
sieve.(0) <- false;
sieve.(1) <- false;
for i = 2 to n do
if sieve.(i) then begin
let j = ref (i * i) in
while !j <= n do
sieve.(!j) <- false;
j := !j + i
done
end
done;
...
Exercises: Array.make, arr.(i), arr.(i) <- v, nested for/while,
begin..end blocks, ref/!/:=, integer arithmetic. 24 baseline
programs total.
parse-atom-postfix's '.()' branch now disambiguates between let-open
and array-get based on whether the head is a module path (':con' or
':field' chain rooted in ':con'). Module paths still emit
(:let-open M EXPR); everything else emits (:array-get ARR I).
Eval handles :array-get by reading the cell's underlying list at
index. The '<-' assignment handler now also accepts :array-get lhs
and rewrites the cell with one position changed.
Idiomatic OCaml array code now works:
let a = Array.make 5 0 in
for i = 0 to 4 do a.(i) <- i * i done;
a.(3) + a.(4) = 25
let a = Array.init 4 (fun i -> i + 1) in
a.(0) + a.(1) + a.(2) + a.(3) = 10
List.(length [1;2;3]) = 3 (* unchanged: List is a module *)
Array module (runtime.sx, OCaml syntax):
Backed by a 'ref of list'. make/length/get/init build the cell;
set rewrites the underlying list with one cell changed (O(n) but
works for short arrays in baseline programs). Includes
iter/iteri/map/mapi/fold_left/to_list/of_list/copy/blit/fill.
(op) operator sections (parser.sx, parse-atom):
When the token after '(' is a binop (any op with non-zero
precedence in the binop table) and the next token is ')', emit
(:fun ('a' 'b') (:op OP a b)) — i.e. (+) becomes fun a b -> a + b.
Recognises every binop including 'mod', 'land', '^', '@', '::',
etc.
Lets us write:
List.fold_left (+) 0 [1;2;3;4;5] = 15
let f = ( * ) in f 6 7 = 42
List.map ((-) 10) [1;2;3] = [9;8;7]
let a = Array.make 5 7 in
Array.set a 2 99;
Array.fold_left (+) 0 a = 127
Inline CSV-like text:
a,1,extra
b,2,extra
c,3,extra
d,4,extra
Two-stage String.split_on_char: first on '\n' for rows, then on ','
for fields per row. List.fold_left accumulates int_of_string of the
second field across rows. Result = 1+2+3+4 = 10.
Exercises char escapes inside string literals ('\n'), nested
String.split_on_char, List.fold_left with a non-trivial closure body,
and int_of_string. 23 baseline programs total.
Tokenizer already classified backtick-uppercase as a ctor identical
to a nominal one, but it had never been exercised by the suite. This
commit adds three smoke tests confirming that nullary, n-ary, and
list-of-polyvariant patterns all match:
let x = polyvar(Foo) in match x with polyvar(Foo) -> 1 | polyvar(Bar) -> 2
let x = polyvar(Pair) (5, 7) in
match x with polyvar(Pair) (a, b) -> a + b | _ -> 0
List.map (fun x -> match x with polyvar(On) -> 1 | polyvar(Off) -> 0)
[polyvar(On); polyvar(Off); polyvar(On)]
(In the actual SX, polyvar(X) is the literal backtick-X — backticks
in this commit message are escaped to avoid shell interpretation.)
Since OCaml-on-SX is dynamic, there's no structural row inference,
but matching by tag works.
sort_uniq:
Sort with the user comparator, then walk the sorted list dropping
any element equal to its predecessor. Output is sorted and unique.
List.sort_uniq compare [3;1;2;1;3;2;4] = [1;2;3;4]
find_map:
Walk until the user fn returns Some v; return that. If all None,
return None.
List.find_map (fun x -> if x > 5 then Some (x * 2) else None)
[1;2;3;6;7]
= Some 12
Both defined in OCaml syntax in runtime.sx — no host primitive
needed since they're pure list traversals over existing operations.
Six new String functions, all in OCaml syntax inside runtime.sx:
iter : index-walk with side-effecting f
iteri : iter with index
fold_left : thread accumulator left-to-right
fold_right: thread accumulator right-to-left
to_seq : return a char list (lazy in real OCaml; eager here)
of_seq : concat a char list back to a string
Round-trip:
String.of_seq (List.rev (String.to_seq "hello")) = "olleh"
Note: real OCaml's Seq is lazy. We return a plain list because the
existing stdlib already provides exhaustive list operations and we
don't yet have lazy sequences. If a baseline needs Seq.unfold or
similar, we'll graduate to a proper Seq module then.
frequency.ml exercises the recently-added Hashtbl.iter / fold +
Hashtbl.find_opt + s.[i] indexing + for-loop together: build a
char-count table for 'abracadabra' then take the max via
Hashtbl.fold. Expected = 5 (a x 5). Total 25 baseline programs.
Format module added as a thin alias of Printf — sprintf, printf, and
asprintf all delegate to Printf.sprintf. The dynamic runtime doesn't
distinguish boxes/breaks, so format strings work the same as in
Printf and most Format-using OCaml programs now compile.
Tokenizer already had 'lazy' as a keyword. This commit wires it through:
parser : parse-prefix emits (:lazy EXPR), like the existing 'assert'
handler.
eval : creates a one-element cell with state ('Thunk' expr env).
host : _lazy_force flips the cell to ('Forced' v) on first call
and returns the cached value thereafter.
runtime : module Lazy = struct let force lz = _lazy_force lz end.
Memoisation confirmed by tracking a side-effect counter through two
forces of the same lazy:
let counter = ref 0 in
let lz = lazy (counter := !counter + 1; 42) in
let a = Lazy.force lz in
let b = Lazy.force lz in
(a + b) * 100 + !counter = 8401 (= 84*100 + 1)
New host primitive _hashtbl_to_list returns the entries as a list of
OCaml tuples — ('tuple' k v) form, matching the AST representation
that the pattern-match VM (:ptuple) expects. Without that exact
shape, '(k, v) :: rest' patterns fail to match.
Hashtbl.iter / Hashtbl.fold in runtime walk that list with the user
fn. This closes a long-standing gap: previously Hashtbl was opaque
once values were written (we could only find_opt one key at a time).
let t = Hashtbl.create 4 in
Hashtbl.add t "a" 1; Hashtbl.add t "b" 2; Hashtbl.add t "c" 3;
Hashtbl.fold (fun _ v acc -> acc + v) t 0 = 6
Replaces the stub sprintf in runtime.sx with a real implementation:
walk fmt char-by-char accumulating a prefix; on recognised %X return a
one-arg fn that formats the arg and recurses on the rest of fmt. The
function self-curries to the spec count — there's no separate arity
machinery, just a closure chain.
Specs: %d (int), %s (string), %f (float), %c (char/string in our model),
%b (bool), %% (literal). Unknown specs pass through.
Same expression returns a string (no specs) or a function (>=1 spec) —
OCaml proper would reject this; works fine in OCaml-on-SX's dynamic
runtime.
Also adds top-level aliases:
string_of_int = _string_of_int
string_of_float = _string_of_float
string_of_bool = if b then "true" else "false"
int_of_string = _int_of_string
Printf.sprintf "x=%d" 42 = "x=42"
Printf.sprintf "%s = %d" "answer" 42 = "answer = 42"
Printf.sprintf "%d%%" 50 = "50%"
Tokenizer already classified 'assert' as a keyword; this commit wires
it through:
parser : parse-prefix dispatches like 'not' — advance, recur, wrap
as (:assert EXPR).
eval : evaluate operand; nil on truthy, host-error 'Assert_failure'
on false. Caught cleanly by existing try/with.
assert true; 42 = 42
let x = 5 in assert (x = 5); x + 1 = 6
try (assert false; 0) with _ -> 99 = 99
Recursive Levenshtein edit distance with no memoization (the test
strings are short enough for the exponential-without-memo version to
fit in <2 minutes on contended hosts). Sums distances for five short
pairs:
('abc','abx') + ('ab','ba') + ('abc','axyc') + ('','abcd') + ('ab','')
= 1 + 2 + 2 + 4 + 2 = 11
Exercises:
* curried four-arg recursion
* s.[i] equality test (char comparison)
* min nested twice for the three-way recurrence
* mixed empty-string base cases
Side-quests required to land caesar.ml:
1. Top-level 'let r = expr in body' is now an expression decl, not a
broken decl-let. ocaml-parse-program's dispatch now checks
has-matching-in? at every top-level let; if matched, slices via
skip-let-rhs-boundary (which already opens depth on a leading let
with matching in) and ocaml-parse on the slice, wrapping as :expr.
2. runtime.sx: added String.make / String.init / String.map. Used by
caesar.ml's encode = String.init n (fun i -> shift_char s.[i] k).
3. baseline run.sh per-program timeout 240->480s (system load on the
shared host frequently exceeds 240s for large baselines).
caesar.ml exercises:
* the new top-level let-in expression dispatch
* s.[i] string indexing
* Char.code / Char.chr round-trip math
* String.init with a closure that captures k
Test value: Char.code r.[0] + Char.code r.[4] after ROT13(ROT13('hello')) = 104 + 111 = 215.
parse-atom-postfix now dispatches three cases after consuming '.':
.field -> existing field/module access
.(EXPR) -> existing local-open
.[EXPR] -> new string-get syntax (this commit)
Eval reduces (:string-get S I) to host (nth S I), which already returns
a one-character string for OCaml's char model.
Lets us write idiomatic OCaml string traversal:
let s = "hi" in
let n = ref 0 in
for i = 0 to String.length s - 1 do
n := !n + Char.code s.[i]
done;
!n (* = 209 *)
OCaml kernel changes:
sx_types.ml:
- Add l_call_count : int field to lambda type — counts how many times
a named lambda has been invoked through the VM dispatch path.
- Add module-level refs jit_threshold (default 4), jit_compiled_count,
jit_skipped_count, jit_threshold_skipped_count for stats.
Refs live here (not sx_vm) so sx_primitives can read them without
creating a sx_primitives → sx_vm dependency cycle.
sx_vm.ml:
- In the Lambda case of cek_call_or_suspend, before triggering the JIT,
increment l.l_call_count. Only call jit_compile_ref if count >= the
runtime-tunable threshold. Below threshold, fall through to the
existing cek_call_or_suspend path (interpreter-style).
sx_primitives.ml:
- Register jit-stats — returns dict {threshold, compiled, compile-failed,
below-threshold}.
- Register jit-set-threshold! N — change threshold at runtime.
- Register jit-reset-counters! — zero the stats counters.
bin/run_tests.ml:
- Add l_call_count = 0 to the test-fixture lambda construction.
Effect: lambdas only get JIT-compiled after the 4th invocation. One-shot
lambdas (test harness wrappers, eval-hs throwaways, REPL inputs) never enter
the JIT cache, eliminating the cumulative slowdown that the batched runner
currently works around. Hot paths (component renders, event handlers) cross
the threshold within a handful of calls and get the full JIT speed.
Phase 2 (LRU eviction) and Phase 3 (jit-reset! / jit-clear-cold!) follow.
Verified: 4771 passed, 1111 failed in OCaml run_tests.exe — identical to
baseline before this change. No regressions; tiered logic is correct.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes for Iverson's dfn
{1≥≢⍵:⍵ ⋄ p←⍵⌷⍨?≢⍵ ⋄ (∇⍵⌿⍨⍵<p),(p=⍵)/⍵,∇⍵⌿⍨⍵>p}:
1. parser: standalone op-glyph branch (/ ⌿ \ ⍀) now consumes a
following ⍨ or ¨ and emits :derived-fn — `⍵⌿⍨⍵<p` parses
as compress-commute (was previously dropping ⍨)
2. tokenizer: `name←...` (no spaces) now tokenizes as separate
:name + :assign instead of eating ← into the name. ⎕← still
stays one token for the output op
3. inline p←⍵⌷⍨?≢⍵ mid-dfn now works via existing :assign-expr
Full suite 585/585. Phase 10 complete (all 7 items ticked).
Remaining gaps for a future phase: heterogeneous-strand inner
product is the only unfinished part — life works after dropping ⊃,
quicksort works directly.
Side-quest emerged from adding roman.ml baseline (Roman numeral greedy
encoding): top-level 'let () = expr' was unsupported because
ocaml-parse-program's parse-decl-let consumed an ident strictly. Now
parse-decl-let recognises a leading '()' as a unit binding and
synthesises a __unit_NN name (matching how parse-let already handles
inner-let unit patterns).
roman.ml exercises:
* tuple list literal [(int * string); ...]
* recursive pattern match on tuple-cons
* String.length + List.fold_left
* the new top-level let () support (sanity in a comment, even though
the program ends with a bare expression for the test harness)
Bumped lib/ocaml/test.sh server timeout 180->360s — the recent surge in
test count plus a CPU-contended host was crowding out the sole epoch
reaching the deeper smarts.
Parser hk-parse-parens gains a `::` arm after the first inner expression:
consume `::`, parse a type via the existing hk-parse-type, expect `)`,
emit (:type-ann EXPR TYPE). Sections, tuples, parenthesised expressions
and unit `()` are unchanged.
Desugar drops the annotation — :type-ann E _ → (hk-desugar E) — since
the existing eval path has no type-directed dispatch. Phase 20 will
extend infer.sx to consume the annotation and unify against the
inferred type.
tests/parse-extras.sx (12/12) covers literal, arithmetic, function arg,
string, bool, tuple, nested annotation, function-typed annotation, and
no-regression checks for plain parens / 3-tuples / left+right sections.
eval (66/0), exceptions (14/0), typecheck (15/0), records (14/0), ioref
(13/0) all still clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bytecode + sx_browser.bc.{js,wasm.js} regenerated from sources updated
by the hs-f merge (e8246340). No semantic change — these are build
outputs catching up to their inputs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- apl-partition: new partition where M[i]>M[i-1] (init prev=0);
continue where M[i]≤prev∧M[i]≠0; drop cells where M[i]=0
- Returns apl-vector of apl-vector parts
- pipeline 140/140
Tcl tokenizer treats $::g-name as $::g + literal -name, so the var
lookup fails. Renamed test vars to ::gname / ::nval (no hyphens).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>