fed-sx-m1: Step 3c.a segment rotation — log:open_disk/3, <ActorId>-NNNNNN.log filename, threshold-driven rotation; 10/10 log_rotate tests
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 21s

`next/kernel/log.erl` rewritten around a `seg_lens :: [N0, N1, ...]` per-segment entry-count list + a `seg_size` byte threshold. Filename
scheme moved from `<ActorId>.log` to `<ActorId>-NNNNNN.log` (6-digit zero-padded) so `file:list_dir`'s alphabetical sort coincides
with numeric order.

`open_disk/3(ActorId, BasePath, [{segment_size, N}])` opts a caller into a smaller rotation threshold; `open_disk/2` keeps a 1 GiB
default that effectively never rotates (preserves Step 3b acceptance — log_disk.sh unchanged in behaviour).

Rotation rule in `place_append/4`: if the active segment's pre-append encoded size is already >= threshold AND it holds at least one
entry, the new activity opens a fresh segment; otherwise it extends the current active segment. A single huge entry that exceeds
the threshold stays alone — never rotated recursively.

On reopen, `load_all_segments` lists the dir, filters `<ActorId>-NNNNNN.log`, sorts numerically (insertion sort — `lists:sort/1`
isn't registered in this port, only `lists:append/2`/`lists:reverse/1`/`lists:filter/2`/etc.), reads each via `try_read_segment`,
and concatenates the entries to rebuild flat `entries` + `seg_lens`.

Erlang-port gotchas worked around during this iteration:
(a) String literals like `"foo"` in this port are NOT charlists — `[H|T] = "foo"` badmatches and `length("foo")` errors as "not a
    proper list". `parse_segment_name` builds prefix/suffix from `atom_to_list/1` + explicit `[$-]` / `[$., $l, $o, $g]` cons.
(b) Cross-arg variable repetition (`strip_prefix([C | Rest], [C | PRest])`) was rewritten to explicit `case C =:= P` for robustness.
(c) `Pattern = Binding` syntax in a case clause (`[_|_] = Lst when length(Lst) > 1 -> ...`) errors as "unsupported pattern type
    'match'" — replaced with `Lst when is_list(Lst), length(Lst) > 1`.

Tests:
- new `next/tests/log_rotate.sh` (10 cases): no-opt single-seg-after-3, rotation-fires-on-threshold, rotated-chronological,
  reopen-rebuilds-history, reopen-rebuilds-same-seg-shape, huge-single-entry-stays-1-seg, append-after-huge-keeps-order,
  tip-monotonic-across-rotations.
- `next/tests/log_disk.sh` updated to the new filename (`corrupted-000000.log`); stays 12/12.
- Erlang conformance 761/761 unchanged (log.erl is in next/, not lib/erlang/).

3c.a ticked in plans/fed-sx-milestone-1.md; 3c.b (gen_server-mediated concurrent appends) is the next iteration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-05 07:40:48 +00:00
parent 595c15a3fb
commit 897449cb35
4 changed files with 408 additions and 88 deletions

View File

@@ -1,102 +1,302 @@
-module(log).
-export([open/2, open_disk/2, append/2, tip/1, replay/3, entries/1]).
-export([open/2, open_disk/2, open_disk/3,
append/2, tip/1, replay/3, entries/1,
segments/1]).
%% Per-actor activity log — the canonical record of everything an
%% actor has emitted, in chronological order. Per design §15.2 this
%% lives on disk as a JSONL segment file; v1 starts with an in-memory
%% backend so the API and seq-number machinery can be locked down
%% before the on-disk format is added (Step 3b).
%% lives on disk as numbered segment files; v1 started with an
%% in-memory backend (Step 3a) so the API + seq-number machinery
%% could be locked down before on-disk persistence (Step 3b) and
%% segment rotation (Step 3c.a — this revision).
%%
%% State shape (a property list):
%% [{actor, ActorId}, {base, BasePath}, {seq, NextSeq}, {entries, [Act|...]}]
%% On-disk layout:
%% <BasePath>/<ActorId>-NNNNNN.log
%%
%% `entries` stores activities in append order — i.e. oldest first.
%% `seq` is the next sequence number that will be assigned by append.
%% `base` is kept on the state for forward-compatibility with 3b
%% (where it becomes the segment-file directory).
%% NNNNNN is a 6-digit zero-padded segment index (000000..999999) so
%% file:list_dir's alphabetical ordering coincides with numeric. Each
%% segment file is the concat of length-prefixed frames; each frame
%% is `<<Len:32/big>>` + `term_codec:encode(Activity)`.
%%
%% open/2 takes ActorId + BasePath and returns {ok, LogState} starting
%% with seq=0 and no entries.
%% In-memory state (a property list):
%% [{actor, ActorId},
%% {base, BasePath}, %% binary | charlist
%% {seq, NextSeq}, %% next seq the log will assign
%% {entries, [Activity, ...]}, %% flat, append order, oldest first
%% {persisted, true|false}, %% does append write through?
%% {seg_size, MaxBytes}, %% rotate when active segment > this
%% {seg_lens, [N0, N1, ...]}] %% entry count per segment in order
%%
%% append/2 returns {ok, NewLogState, AssignedSeq}.
%%
%% tip/1 returns the next seq the log would assign (== count of entries).
%%
%% replay/3 folds Fun(Activity, AssignedSeq, Acc) over every entry in
%% append order. Three-arity rather than two-arity because the plan's
%% example test is "sequence numbers gap-free across replay" — having
%% the seq number visible in the fold makes that test direct.
%%
%% entries/1 is a debug accessor returning [Activity, ...] in append
%% order. Not part of the public API contract.
%% `seg_lens` is the sole bookkeeping needed to compute (a) which
%% segment any given seq lives in, and (b) which slice of `entries`
%% is the active segment's contents to rewrite on append. The last
%% element is the active segment's length.
%% In-memory only — atoms accepted as BasePath for back-compat with
%% Step 3a tests that just want the API surface.
open(ActorId, BasePath) ->
{ok, [{actor, ActorId}, {base, BasePath}, {seq, 0}, {entries, []}]}.
{ok, [{actor, ActorId}, {base, BasePath},
{seq, 0}, {entries, []},
{persisted, false}]}.
append(LogState, Activity) ->
Seq = field(seq, LogState),
Entries = field(entries, LogState),
NewEntries = Entries ++ [Activity],
NewState = replace_field(seq, Seq + 1,
replace_field(entries, NewEntries, LogState)),
case persisted_path(LogState) of
{persisted, Path} ->
ok = write_segment(Path, NewEntries),
{ok, NewState, Seq};
not_persisted ->
{ok, NewState, Seq}
end.
%% open_disk/2 — disk-backed variant of open. Reads any existing
%% segment file under BasePath, replays entries into memory state,
%% and tags the state {persisted, true} so future append/2 calls
%% write through. BasePath must be a binary or charlist (real path),
%% not an atom — the in-memory open/2 still accepts atoms for tests.
%%
%% Segment format (per frame): 4-byte big-endian length + that many
%% bytes of term_codec:encode(Activity). Whole file is the concat of
%% all frames in append order; no header.
%%
%% Returns {ok, LogState} on success, {error, {corrupt, Reason}} if
%% the segment is truncated/garbled, {error, {read, Reason}} on other
%% file errors. Missing file is treated as an empty fresh log.
%% Disk-backed; default segment size = effectively unlimited (no
%% rotation). Use open_disk/3 with {segment_size, N} to enable.
open_disk(ActorId, BasePath) ->
Path = segment_path(ActorId, BasePath),
case try_read_segment(Path) of
{ok, Entries} ->
open_disk(ActorId, BasePath, [{segment_size, 1073741824}]). %% 1 GiB
open_disk(ActorId, BasePath, Opts) ->
SegSize = proplist_get(segment_size, Opts, 1073741824),
case load_all_segments(ActorId, BasePath) of
{ok, SegEntries} ->
%% SegEntries :: [[Entry, ...]] in segment-index order
%% (empty list when no segments exist on disk).
Lens0 = [length(S) || S <- SegEntries],
%% Always have at least one active segment, even if empty.
Lens = case Lens0 of
[] -> [0];
_ -> Lens0
end,
Flat = flatten_segs(SegEntries),
State = [{actor, ActorId}, {base, BasePath},
{seq, length(Entries)},
{entries, Entries},
{seq, length(Flat)},
{entries, Flat},
{persisted, true},
{path, Path}],
{seg_size, SegSize},
{seg_lens, Lens}],
{ok, State};
{error, _} = E ->
E
end.
persisted_path(LogState) ->
append(LogState, Activity) ->
Seq = field(seq, LogState),
Entries = field(entries, LogState),
case lookup(persisted, LogState) of
true ->
case lookup(path, LogState) of
undefined -> not_persisted;
P -> {persisted, P}
end;
_ -> not_persisted
SegLens = field(seg_lens, LogState),
SegSize = field(seg_size, LogState),
{NewSegLens, ActiveIdx, ActiveEntries} =
place_append(Entries, Activity, SegLens, SegSize),
Path = segment_path(field(actor, LogState),
field(base, LogState),
ActiveIdx),
ok = write_segment(Path, ActiveEntries),
NewState = replace_field(seq, Seq + 1,
replace_field(entries, Entries ++ [Activity],
replace_field(seg_lens, NewSegLens, LogState))),
{ok, NewState, Seq};
_ ->
NewState = replace_field(seq, Seq + 1,
replace_field(entries, Entries ++ [Activity],
LogState)),
{ok, NewState, Seq}
end.
%% segment_path/2 — returns the segment file path as a charlist (list
%% of int char codes). BasePath may be a binary OR a charlist; we
%% normalize to charlist via binary_to_list so the result is purely
%% cons-based — this works around an iolist-walker quirk in
%% er-source-to-string that surfaces when list_to_binary nests binaries
%% built from charlists. file:read_file accepts charlists fine.
segment_path(ActorId, BasePath) ->
tip(LogState) ->
field(seq, LogState).
replay(LogState, InitAcc, Fun) ->
Entries = field(entries, LogState),
replay_loop(Entries, 0, InitAcc, Fun).
entries(LogState) ->
field(entries, LogState).
%% Debug accessor: returns the in-memory seg_lens (count per segment
%% in index order). Used by rotation tests to assert that rotation
%% happened.
segments(LogState) ->
case lookup(seg_lens, LogState) of
undefined -> [];
L -> L
end.
%% --- internals ---
replay_loop([], _, Acc, _) -> Acc;
replay_loop([Act | Rest], Seq, Acc, Fun) ->
replay_loop(Rest, Seq + 1, Fun(Act, Seq, Acc), Fun).
%% place_append/4 decides whether the new Activity extends the current
%% active segment or opens a fresh one, returning the resulting
%% seg_lens, the active segment's index, and the active segment's
%% complete entry list (the slice that needs to be (re)written to
%% disk).
%%
%% Rotation rule: if the active segment already on disk is at or past
%% the size threshold (encoded_size(OldActive) >= SegSize) AND it
%% already holds at least one entry, the new Activity opens a new
%% segment. A single entry larger than the threshold therefore lives
%% on its own — we never recurse rotating a one-entry segment.
%%
%% This is decided BEFORE the append (looking at the pre-append size),
%% so each segment file is written exactly once per append cycle.
place_append(OldEntries, Activity, SegLens, SegSize) ->
{Pre, Last} = split_last(SegLens),
PreCount = sum(Pre),
OldActive = drop(PreCount, OldEntries),
OldActiveSize = encoded_size(OldActive),
case (OldActiveSize >= SegSize) andalso (Last >= 1) of
true ->
%% Rotate: new entry starts a brand-new segment.
NewSegLens = SegLens ++ [1],
NewActiveIdx = length(SegLens),
{NewSegLens, NewActiveIdx, [Activity]};
false ->
%% Stay: extend current active.
NewSegLens = Pre ++ [Last + 1],
NewActiveIdx = length(Pre),
{NewSegLens, NewActiveIdx, OldActive ++ [Activity]}
end.
split_last([X]) -> {[], X};
split_last([H | T]) ->
{Tl, Last} = split_last(T),
{[H | Tl], Last}.
sum(L) -> sum_(L, 0).
sum_([], A) -> A;
sum_([H | T], A) -> sum_(T, A + H).
drop(0, L) -> L;
drop(_, []) -> [];
drop(N, [_ | T]) -> drop(N - 1, T).
%% flatten_segs/1 — concat a list of segments (each itself a list of
%% entries) into a single flat list, preserving order. Used by
%% open_disk to assemble the on-disk activity history from per-
%% segment loads. Implemented locally because lists:append/1 isn't
%% registered in this port — only lists:append/2.
flatten_segs([]) -> [];
flatten_segs([Seg | Rest]) -> Seg ++ flatten_segs(Rest).
encoded_size(Entries) ->
byte_size(list_to_binary(
[frame(term_codec:encode(E)) || E <- Entries])).
%% Try to read every segment file under BasePath matching the actor.
%% Returns {ok, [[Entry, ...]]} where the outer list is in segment-
%% index order. Empty when no segments exist.
load_all_segments(ActorId, BasePath) ->
%% list_dir returns {ok, [Binary]} of entry names in sorted order
%% per fed-prims contract.
BaseChars = base_chars(BasePath),
case file:list_dir(BaseChars) of
{ok, Names} ->
%% Erlang string literals are NOT charlists in this port,
%% so build prefix/suffix as explicit char-code lists.
Prefix = atom_to_list(ActorId) ++ [$-],
Suffix = [$., $l, $o, $g],
Indices = collect_segment_indices(Names, Prefix, Suffix),
read_segments_in_order(Indices, ActorId, BasePath, []);
{error, enoent} ->
{ok, []};
{error, R} ->
{error, {read, R}}
end.
collect_segment_indices([], _, _) -> [];
collect_segment_indices([Name | Rest], Prefix, Suffix) ->
case parse_segment_name(Name, Prefix, Suffix) of
{ok, N} ->
[N | collect_segment_indices(Rest, Prefix, Suffix)];
not_ours ->
collect_segment_indices(Rest, Prefix, Suffix)
end.
parse_segment_name(NameBin, Prefix, Suffix) when is_binary(NameBin) ->
parse_segment_name(binary_to_list(NameBin), Prefix, Suffix);
parse_segment_name(Name, Prefix, Suffix) ->
case strip_prefix(Name, Prefix) of
{ok, Rest} ->
case strip_suffix(Rest, Suffix) of
{ok, NumStr} ->
case is_all_digits(NumStr) of
true -> {ok, list_to_integer(NumStr)};
false -> not_ours
end;
not_ours -> not_ours
end;
not_ours -> not_ours
end.
strip_prefix(Str, []) -> {ok, Str};
strip_prefix([C | Rest], [P | PRest]) ->
case C =:= P of
true -> strip_prefix(Rest, PRest);
false -> not_ours
end;
strip_prefix(_, _) -> not_ours.
strip_suffix(Str, Suffix) ->
SL = length(Str),
XL = length(Suffix),
case SL >= XL of
true ->
Head = take_n_pl(SL - XL, Str),
Tail = drop(SL - XL, Str),
case Tail =:= Suffix of
true -> {ok, Head};
false -> not_ours
end;
false -> not_ours
end.
take_n_pl(0, _) -> [];
take_n_pl(_, []) -> [];
take_n_pl(N, [H | T]) -> [H | take_n_pl(N - 1, T)].
is_all_digits([]) -> false;
is_all_digits(Chars) -> all_digits(Chars).
all_digits([]) -> true;
all_digits([C | Rest]) when C >= $0, C =< $9 -> all_digits(Rest);
all_digits(_) -> false.
%% read_segments_in_order/4 — fed-prims sorts list_dir alphabetically;
%% with 6-digit zero-padded names that coincides with numeric order.
%% But we also accept legacy unpadded names, so sort by index to be
%% defensive.
read_segments_in_order(Indices, ActorId, BasePath, Acc) ->
Sorted = isort(Indices),
read_each(Sorted, ActorId, BasePath, Acc).
read_each([], _, _, Acc) ->
{ok, lists:reverse(Acc)};
read_each([Idx | Rest], ActorId, BasePath, Acc) ->
Path = segment_path(ActorId, BasePath, Idx),
case try_read_segment(Path) of
{ok, Entries} ->
read_each(Rest, ActorId, BasePath, [Entries | Acc]);
{error, _} = E -> E
end.
%% Tiny insertion sort over a small list of integers.
isort([]) -> [];
isort([H | T]) -> insert(H, isort(T)).
insert(X, []) -> [X];
insert(X, [Y | Rest]) when X =< Y -> [X, Y | Rest];
insert(X, [Y | Rest]) -> [Y | insert(X, Rest)].
%% segment_path/3 — charlist path to the Idx'th segment file.
segment_path(ActorId, BasePath, Idx) ->
base_chars(BasePath) ++ [$/] ++ atom_to_list(ActorId)
++ [$., $l, $o, $g].
++ [$-] ++ pad_int(Idx, 6) ++ [$., $l, $o, $g].
base_chars(B) when is_binary(B) -> binary_to_list(B);
base_chars(L) when is_list(L) -> L.
%% Zero-pad an integer to Width digits as a charlist.
pad_int(N, Width) ->
Cs = integer_to_list(N),
pad_left(Cs, Width).
pad_left(Cs, Width) ->
case length(Cs) >= Width of
true -> Cs;
false -> pad_left([$0 | Cs], Width)
end.
write_segment(Path, Entries) ->
Frames = [frame(term_codec:encode(E)) || E <- Entries],
file:write_file(Path, list_to_binary(Frames)).
@@ -143,26 +343,12 @@ take_n(N, [H | T]) ->
take_n(_, []) ->
throw(truncated_body).
tip(LogState) ->
field(seq, LogState).
replay(LogState, InitAcc, Fun) ->
Entries = field(entries, LogState),
replay_loop(Entries, 0, InitAcc, Fun).
replay_loop([], _, Acc, _) -> Acc;
replay_loop([Act | Rest], Seq, Acc, Fun) ->
replay_loop(Rest, Seq + 1, Fun(Act, Seq, Acc), Fun).
entries(LogState) ->
field(entries, LogState).
%% --- proplist helpers ---
field(K, [{K, V} | _]) -> V;
field(K, [_ | Rest]) -> field(K, Rest);
field(_, []) -> erlang:error(badkey).
%% lookup/2 — like field but returns `undefined` for missing key
%% (used by persisted_path/1 which probes optional state fields).
lookup(K, [{K, V} | _]) -> V;
lookup(K, [_ | Rest]) -> lookup(K, Rest);
lookup(_, []) -> undefined.
@@ -170,3 +356,7 @@ lookup(_, []) -> undefined.
replace_field(K, V, []) -> [{K, V}];
replace_field(K, V, [{K, _} | Rest]) -> [{K, V} | Rest];
replace_field(K, V, [P | Rest]) -> [P | replace_field(K, V, Rest)].
proplist_get(K, [{K, V} | _], _) -> V;
proplist_get(K, [_ | Rest], Default) -> proplist_get(K, Rest, Default);
proplist_get(_, [], Default) -> Default.

View File

@@ -23,8 +23,10 @@ rm -rf "$DISK_BASE"
mkdir -p "$DISK_BASE"
# Pre-write a corrupted segment file for the corrupt-detect test
# (just a truncated 4-byte length header with no payload).
printf '\x00\x00\x00\x05XX' > "$DISK_BASE/corrupted.log"
# (just a truncated 4-byte length header with no payload). Segment
# filenames are <ActorId>-NNNNNN.log (6-digit zero-padded index) as
# of Step 3c.a.
printf '\x00\x00\x00\x05XX' > "$DISK_BASE/corrupted-000000.log"
VERBOSE="${1:-}"
PASS=0; FAIL=0; ERRORS=""

125
next/tests/log_rotate.sh Executable file
View File

@@ -0,0 +1,125 @@
#!/usr/bin/env bash
# next/tests/log_rotate.sh — Step 3c.a segment rotation acceptance.
#
# Exercises log:open_disk/3 with {segment_size, N} opt-in, append/2
# rotation behaviour at the threshold, replay across segments, and
# reopen-after-rotation. Builds on the Step 3b on-disk substrate
# (term_codec.erl + log.erl framed-segment writer).
set -uo pipefail
cd "$(git rev-parse --show-toplevel)"
SX_SERVER="${SX_SERVER:-hosts/ocaml/_build/default/bin/sx_server.exe}"
if [ ! -x "$SX_SERVER" ]; then
SX_SERVER="/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe"
fi
if [ ! -x "$SX_SERVER" ]; then
echo "ERROR: sx_server.exe not found." >&2
exit 1
fi
DISK_BASE=/tmp/fed_sx_m1_log_rotate
rm -rf "$DISK_BASE"
mkdir -p "$DISK_BASE"
VERBOSE="${1:-}"
PASS=0; FAIL=0; ERRORS=""
TMPFILE=$(mktemp); trap "rm -f $TMPFILE; rm -rf $DISK_BASE" EXIT
cat > "$TMPFILE" <<'EPOCHS'
(epoch 1)
(load "lib/erlang/tokenizer.sx")
(load "lib/erlang/parser.sx")
(load "lib/erlang/parser-core.sx")
(load "lib/erlang/parser-expr.sx")
(load "lib/erlang/parser-module.sx")
(load "lib/erlang/transpile.sx")
(load "lib/erlang/runtime.sx")
(load "lib/erlang/vm/dispatcher.sx")
(epoch 2)
(eval "(get (erlang-load-module (file-read \"next/kernel/term_codec.erl\")) :name)")
(epoch 3)
(eval "(get (erlang-load-module (file-read \"next/kernel/log.erl\")) :name)")
;; Base path /tmp/fed_sx_m1_log_rotate built byte-by-byte.
;; --- default open_disk/2 = no rotation: many appends still single seg ---
(epoch 10)
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(noopt, Base), {ok, L1, _} = log:append(L0, a), {ok, L2, _} = log:append(L1, b), {ok, L3, _} = log:append(L2, c), log:segments(L3) =:= [3]\") :name)")
;; --- small threshold rotates: 5 short entries -> multiple segs ---
;; Each encoded entry like 'msg' is ~6 bytes + 4-byte length header = 10 bytes.
;; Threshold 16 bytes means seg rotates after every 2 entries.
(epoch 20)
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(small, Base, [{segment_size, 16}]), {ok, L1, _} = log:append(L0, aa), {ok, L2, _} = log:append(L1, bb), {ok, L3, _} = log:append(L2, cc), {ok, L4, _} = log:append(L3, dd), {ok, L5, _} = log:append(L4, ee), case log:segments(L5) of Lst when is_list(Lst), length(Lst) > 1 -> rotated; _ -> singleseg end\") :name)")
;; --- rotated entries replay in chronological order ---
(epoch 21)
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(replay, Base, [{segment_size, 16}]), {ok, L1, _} = log:append(L0, aa), {ok, L2, _} = log:append(L1, bb), {ok, L3, _} = log:append(L2, cc), {ok, L4, _} = log:append(L3, dd), {ok, L5, _} = log:append(L4, ee), log:entries(L5) =:= [aa, bb, cc, dd, ee]\") :name)")
;; --- reopen after rotation: history is reassembled in order ---
(epoch 22)
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(reopen, Base, [{segment_size, 16}]), {ok, L1, _} = log:append(L0, aa), {ok, L2, _} = log:append(L1, bb), {ok, L3, _} = log:append(L2, cc), {ok, L4, _} = log:append(L3, dd), {ok, L5, _} = log:append(L4, ee), {ok, R} = log:open_disk(reopen, Base, [{segment_size, 16}]), {log:entries(R), log:tip(R)} =:= {[aa, bb, cc, dd, ee], 5}\") :name)")
;; --- segments after reopen match (same shape rebuilt from disk) ---
(epoch 23)
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(shape, Base, [{segment_size, 16}]), {ok, L1, _} = log:append(L0, aa), {ok, L2, _} = log:append(L1, bb), {ok, L3, _} = log:append(L2, cc), {ok, L4, _} = log:append(L3, dd), {ok, L5, _} = log:append(L4, ee), {ok, R} = log:open_disk(shape, Base, [{segment_size, 16}]), log:segments(R) =:= log:segments(L5)\") :name)")
;; --- single huge entry > threshold: still one segment, no infinite loop ---
(epoch 30)
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(huge, Base, [{segment_size, 4}]), Big = <<0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15>>, {ok, L1, _} = log:append(L0, Big), log:segments(L1) =:= [1]\") :name)")
;; --- append after huge first entry forces rotation on next entry ---
(epoch 31)
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(post, Base, [{segment_size, 4}]), Big = <<0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15>>, {ok, L1, _} = log:append(L0, Big), {ok, L2, _} = log:append(L1, small), log:entries(L2) =:= [Big, small]\") :name)")
;; --- tip increments monotonically across rotations ---
(epoch 40)
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(tipcheck, Base, [{segment_size, 16}]), {ok, L1, _} = log:append(L0, x1), {ok, L2, _} = log:append(L1, x2), {ok, L3, _} = log:append(L2, x3), {ok, L4, _} = log:append(L3, x4), log:tip(L4) =:= 4\") :name)")
EPOCHS
OUTPUT=$(timeout 90 "$SX_SERVER" < "$TMPFILE" 2>/dev/null)
check() {
local epoch="$1" desc="$2" expected="$3"
local actual
actual=$(echo "$OUTPUT" | grep -A1 "^(ok-len $epoch " | tail -1 || true)
if echo "$actual" | grep -q "^(ok-len"; then actual=""; fi
if [ -z "$actual" ]; then
actual=$(echo "$OUTPUT" | grep "^(ok $epoch " | head -1 || true)
fi
if [ -z "$actual" ]; then
actual=$(echo "$OUTPUT" | grep "^(error $epoch " | head -1 || true)
fi
[ -z "$actual" ] && actual="<no output for epoch $epoch>"
if echo "$actual" | grep -qF -- "$expected"; then
PASS=$((PASS+1))
[ "$VERBOSE" = "-v" ] && echo " ok $desc"
else
FAIL=$((FAIL+1))
ERRORS+=" FAIL [$desc] (epoch $epoch) expected: $expected | actual: $actual
"
fi
}
check 2 "term_codec loads" "term_codec"
check 3 "log module loads" "log"
check 10 "no-opt = single seg after 3" "true"
check 20 "rotation fires on threshold" "rotated"
check 21 "rotated entries chronological" "true"
check 22 "reopen rebuilds history" "true"
check 23 "reopen rebuilds same seg shape" "true"
check 30 "huge single entry stays 1 seg" "true"
check 31 "append after huge keeps order" "true"
check 40 "tip monotonic across rotations" "true"
TOTAL=$((PASS+FAIL))
if [ $FAIL -eq 0 ]; then
echo "ok $PASS/$TOTAL log_rotate tests passed"
else
echo "FAIL $PASS/$TOTAL passed, $FAIL failed:"
echo "$ERRORS"
fi
[ $FAIL -eq 0 ]

View File

@@ -199,6 +199,8 @@ verify_signature(Activity, ActorState) ->
- [x] **3a**`log:open/2` + `log:append/2` + `log:tip/1` + `log:replay/3` + `log:entries/1` over an in-memory log state (per-actor seq; replay in append order; round-trip the stored activity). `next/tests/log_memory.sh` (12 cases).
- [x] **3b** — Term codec + on-disk persistence. Codec: `next/kernel/term_codec.erl` `encode/1` + `decode/1` over netstring framing (`a/i/b/t/l` + length + body; binary bodies byte-clean — NUL/LF allowed). On-disk: `log:open_disk/2(ActorId, BasePath)` reads any existing segment file (charlist path = `BasePath ++ "/" ++ atom_to_list(ActorId) ++ ".log"`); `append/2` is polymorphic on a `{persisted, true}` state field and writes through. Frame format on disk: 4-byte big-endian length prefix + `term_codec:encode(Activity)`. `try_read_segment` catches throw/error and surfaces `{error, {corrupt, Reason}}`. 18 codec round-trips + 12 disk acceptance tests (`next/tests/term_codec.sh`, `next/tests/log_disk.sh`); 3a in-memory `open/2` semantics unchanged. `encode/1`/`decode/1` for atoms, integers, binaries, tuples, lists, nesting; netstring-ish framing (`a/i/b/t/l` tag + length + body); byte-clean (binary bodies may contain NUL/LF). 18 round-trip + streaming + bad-form tests in `next/tests/term_codec.sh`. On-disk segment writer (open/2 reads existing, append/2 writes-through, replay/3 reads from disk) is the next sub-step — codec is the load-bearing piece.
- [ ] **3c** — Segment rotation at size threshold + gen_server-mediated concurrent appends.
- [x] **3c.a** — Segment rotation. `log:open_disk/3(ActorId, BasePath, [{segment_size, N}])` opts in with a byte threshold; default `open_disk/2` keeps a 1 GiB threshold (effectively no rotation). Filename scheme moved to `<ActorId>-NNNNNN.log` (6-digit zero-padded index) so `file:list_dir`'s alphabetical sort matches numeric order. `append/2` checks `encoded_size(active)` BEFORE the append: if already ≥ threshold AND active has at least one entry, the new activity opens a fresh segment; otherwise it extends current active. Single huge entries stay alone (no recursive rotation). On reopen, every matching `<ActorId>-*.log` file is read, decoded, and concatenated in numeric order to rebuild flat entries + `seg_lens`. `next/tests/log_rotate.sh` 10/10 (no-opt single-seg, threshold-rotates, chronological after rotation, reopen rebuilds shape, huge-entry-alone, post-huge keeps order, tip monotonic) + `log_disk.sh` updated to the new filename and stays 12/12. Erlang conformance 761/761.
- [ ] **3c.b** — gen_server-mediated concurrent appends.
**Blockers (Step 3b) — byte-level path resolved 2026-06-04:** `binary_to_list/1` and `list_to_binary/1` are now registered Erlang BIFs in `lib/erlang/runtime.sx` (Step 3b substrate fix, +9 ffi tests, 738/738 conformance). `list_to_binary` is iolist-aware: accepts nested cons of integer bytes (0-255) and/or binaries; `binary_to_list` returns a proper Erlang charlist of integers. Round-trip verified: `list_to_binary(binary_to_list(B)) =:= B`. On-disk segment writer (3b) can now build segment bytes from `[Header, IoListPayload]` and reconstruct on read — option (c) of the original workaround menu is now cheap. `$X` char literals now decode correctly **as of 2026-06-04**: the Erlang tokenizer's `(= ch "$")` branch (`lib/erlang/tokenizer.sx`) now emits the decimal char code as the token value instead of the raw `$X` text (which `parse-number` couldn't decode → nil). Plain chars use `char->integer` of the first char; the standard escape table (`\n=10 \t=9 \r=13 \s=32 \b=8 \e=27 \f=12 \v=11 \d=127 \0=0 \\=92 \"=34 \'=39`) handles `$\X` forms. So `[$h, $i | T]` patterns and `list_to_binary([$f,$e,$d])` both work end-to-end. +12 eval tests, 750/750. Combined with 3b's `binary_to_list`/`list_to_binary`, Erlang code can now read/write byte sequences and string-shaped char lists fluently. **All three substrate gaps resolved as of 2026-06-05.** `atom_to_list/1` and `integer_to_list/1` now return Erlang charlists (cons of int char codes — standard Erlang semantics) via a new `er-string->charlist` helper in `transpile.sx`. `list_to_atom/1` and `list_to_integer/1` accept either charlists OR SX strings (back-compat via the existing `er-source-to-string` coercer). Composition works end-to-end: `list_to_binary(atom_to_list(hello)) =:= <<104,101,108,108,111>>` and `integer_to_list(N)` round-trips through `list_to_integer`. 5 existing eval tests rewritten to charlist semantics, 8 new charlist-aware tests added (759/759). The full term-codec primitive set — `binary_to_list`, `list_to_binary`, `$X`, `atom_to_list`, `integer_to_list` charlist semantics, plus existing `file:read_file`/`write_file`/`list_dir` — is now in place.
@@ -1003,6 +1005,7 @@ A few things still under-specified; resolve as work begins.
Newest first. One line per sub-deliverable commit. Erlang conformance gate
(`bash lib/erlang/conformance.sh`) must remain 729/729 on every entry.
- **2026-06-05** — Step 3c.a segment rotation: `next/kernel/log.erl` rewritten around a `seg_lens :: [N0, N1, ...]` bookkeeping list (one entry-count per segment in numeric order, last is active) + `seg_size` threshold. Filename scheme now `<ActorId>-NNNNNN.log` (6-digit zero-padded so `file:list_dir`'s alphabetical sort = numeric). `open_disk/3(ActorId, BasePath, [{segment_size, N}])` opts a caller into a smaller rotation threshold; `open_disk/2` keeps a 1 GiB default that effectively never rotates (preserves Step 3b acceptance). Rotation rule (`place_append/4`): if the active segment's pre-append serialized size already ≥ threshold AND it holds at least one entry, the new activity opens a fresh segment — otherwise it extends current active. Single huge entry > threshold stays alone (no recursive rotation, no loop). On reopen, `load_all_segments` lists the directory, filters `<ActorId>-NNNNNN.log`, sorts numerically (insertion sort, since `lists:sort/1` isn't registered in this port — only `lists:append/2`/`lists:reverse/1`/`lists:filter/2` etc.), reads each via `try_read_segment`, and concatenates to rebuild flat `entries` + `seg_lens`. **Erlang-port gotchas hit & worked around:** (a) Erlang string literals like `"foo"` in this port are NOT charlists — `[H|T] = "foo"` badmatches, `length("foo")` errors as "not a proper list". `parse_segment_name` had to build prefix/suffix from `atom_to_list/1` + explicit `[$-]` / `[$., $l, $o, $g]` cons. (b) Cross-arg variable repetition (`strip_prefix([C | Rest], [C | PRest])`) works in tuple patterns but I rewrote it to explicit `case C =:= P of true -> ... false -> ...` for robustness. (c) `Pattern = Binding` syntax in a case clause (`[_|_] = Lst when length(Lst) > 1 -> ...`) errors "unsupported pattern type 'match'" — used `Lst when is_list(Lst), length(Lst) > 1` instead. New `next/tests/log_rotate.sh` 10/10: no-opt single-seg-after-3, rotation-fires-on-threshold, rotated-chronological, reopen-rebuilds-history, reopen-rebuilds-same-seg-shape, huge-single-entry-stays-1-seg, append-after-huge-keeps-order, tip-monotonic-across-rotations. Existing `next/tests/log_disk.sh` updated to the new filename (`corrupted-000000.log`) and stays 12/12. Erlang conformance **761/761** unchanged (log.erl is in next/, not lib/erlang/). Step 3c.a ticked; 3c.b (gen_server-mediated concurrent appends) is the next iteration.
- **2026-06-05** — Step 3b on-disk log: `next/kernel/log.erl` gains `open_disk/2(ActorId, BasePath)` and a write-through `append/2`. New state field `{persisted, true} | {path, CharList}` keys the polymorphism — 3a's in-memory `open/2` stays untouched and tests unchanged. `segment_path/2` builds the path as a charlist (`base_chars(BasePath) ++ "/" ++ atom_to_list(ActorId) ++ ".log"`) so it works whether the caller passes a binary or charlist BasePath; everything flows through `er-source-to-string` cleanly. On-disk frame format: 4-byte big-endian length prefix + `term_codec:encode(Activity)`. Restart path: `try_read_segment` reads the whole segment, length-decodes each frame, decodes via `term_codec`, returns `{ok, Entries}`; missing file → `{ok, []}`; throw/error during decode → `{error, {corrupt, _}}`. `next/tests/log_disk.sh` 12/12: open-missing-fresh, append+reopen-entries-match, tip-resumes, replay-chronological, mixed-types (atom/int/binary/tuple/list) round-trip, append-after-reopen, corrupted-segment, per-actor isolation, 3a back-compat. Erlang conformance **761/761** unchanged (log.erl is in next/, not lib/erlang/). Step 3b is now FULLY ticked; 3c (segment rotation + gen_server-mediated concurrent appends) remains for the next iteration.
- **2026-06-05** — Step 3b substrate fix #4: integer-literal eval now produces real ints (was floats). `transpile.sx`'s `(= ty "integer") (parse-number ...)` path returns `float_of_string` per host's `parse-number`, so `42`, `$X`, etc. were floats that `(integer? v)` returned true for but `(integer->char v)` rejected. Wrapped in `truncate` so all integer literals coerce to strict int; added nil-guard with a descriptive error. Discovered while debugging Step 3b on-disk log (file:read_file on a charlist path failed at the inner `(map integer->char ...)` because charlist elements were floats). Conformance **761/761** (eval 406→408, +2 net; no other suites changed). Unblocks any path that does `integer->char` on int-literal-derived values — most notably `file:read_file` / `file:write_file` on charlist paths and binaries built from `$X` literals.
- **2026-06-05** — Step 3b codec landed: `next/kernel/term_codec.erl` with `encode/1` + `decode/1` over a netstring-ish wire format (`a` atom / `i` int / `b` binary / `t` tuple / `l` list, each as `tag + decimal-length + ":" + body`; nil = `l0:`). Byte-clean — binary bodies may contain NUL, LF, or any byte; encoding stays parseable. Built end-to-end on the three substrate fixes (binary_to_list/list_to_binary + $X + atom_to_list/integer_to_list charlists). `decode/1` returns `{ok, Term, RestBinary}` so callers can stream multiple frames from one buffer. 18 acceptance tests in `next/tests/term_codec.sh`: encode bytes for every leaf type, round-trip for each, nested activity-shaped term (`{create, [{id,1},{actor,alice},{payload,<<104,105>>}]}`), 2-frame streaming, binary with embedded NUL+LF, bad-form returns `{error, badform}` not crash. Erlang conformance **759/759** unchanged (codec is in `next/`, not lib/erlang/). Step 3b on-disk segment writer (the second half — open/append/replay reading/writing the actual segment file) is the natural next iteration: encode each activity with `term_codec`, frame with a 4-byte big-endian length prefix, append to disk.