fed-sx-m1: Step 3c.a segment rotation — log:open_disk/3, <ActorId>-NNNNNN.log filename, threshold-driven rotation; 10/10 log_rotate tests
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 21s
`next/kernel/log.erl` rewritten around a `seg_lens :: [N0, N1, ...]` per-segment entry-count list + a `seg_size` byte threshold. Filename
scheme moved from `<ActorId>.log` to `<ActorId>-NNNNNN.log` (6-digit zero-padded) so `file:list_dir`'s alphabetical sort coincides
with numeric order.
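A minimal sketch of why the zero padding matters, assuming stock Erlang/OTP rather than this port (it relies on string literals being charlists and on `lists:sort/1`, neither of which the port provides). The module name `segname_sketch` and the actor `alice` are illustrative only.

```erlang
-module(segname_sketch).
-export([segment_name/2, demo/0]).

%% Build "<ActorId>-NNNNNN.log" with a 6-digit zero-padded index.
%% Assumes stock OTP: io_lib:format/2 with the ~6..0B directive
%% (field width 6, pad character 0, base-10 integer).
segment_name(ActorId, Idx) when is_atom(ActorId), is_integer(Idx), Idx >= 0 ->
    lists:flatten(io_lib:format("~s-~6..0B.log", [atom_to_list(ActorId), Idx])).

%% Sort padded and unpadded names for indices 2, 10 and 1 lexically,
%% the way an alphabetically sorted directory listing would order them.
demo() ->
    Indices  = [2, 10, 1],
    Padded   = lists:sort([segment_name(alice, I) || I <- Indices]),
    Unpadded = lists:sort(["alice-" ++ integer_to_list(I) ++ ".log"
                           || I <- Indices]),
    {Padded, Unpadded}.
```

In the returned pair, the padded names come back in index order 1, 2, 10, while the unpadded ones come back as 1, 10, 2, which is the misordering the new scheme avoids.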
`open_disk/3(ActorId, BasePath, [{segment_size, N}])` opts a caller into a smaller rotation threshold; `open_disk/2` keeps a 1 GiB
default that effectively never rotates (preserves Step 3b acceptance — log_disk.sh unchanged in behaviour).
Rotation rule in `place_append/4`: if the active segment's pre-append encoded size is already >= threshold AND it holds at least one
entry, the new activity opens a fresh segment; otherwise it extends the current active segment. A single huge entry that exceeds
the threshold stays alone — never rotated recursively.
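The rule can be sketched in isolation as a pure decision function plus a small simulation over entry sizes. This is an illustration assuming stock Erlang/OTP (`lists:foldl/3`, `lists:reverse/1`), not the code in `log.erl`; the module name `rotation_sketch` and both function names are hypothetical.

```erlang
-module(rotation_sketch).
-export([decide/3, simulate/2]).

%% decide(ActiveBytes, ActiveCount, Threshold) -> rotate | extend
%% Rotate only when the active segment is already at or past the
%% threshold BEFORE the append and holds at least one entry.
decide(ActiveBytes, ActiveCount, Threshold)
  when ActiveBytes >= Threshold, ActiveCount >= 1 -> rotate;
decide(_ActiveBytes, _ActiveCount, _Threshold) -> extend.

%% simulate(Sizes, Threshold) -> seg_lens-style list of entry counts.
%% Sizes is the encoded size in bytes of each appended entry, in order.
%% The accumulator is {ClosedCountsNewestFirst, ActiveCount, ActiveBytes}.
simulate(Sizes, Threshold) ->
    {Closed, Count, _Bytes} =
        lists:foldl(
          fun(Size, {Done, ActiveCount, ActiveBytes}) ->
                  case decide(ActiveBytes, ActiveCount, Threshold) of
                      rotate -> {[ActiveCount | Done], 1, Size};
                      extend -> {Done, ActiveCount + 1, ActiveBytes + Size}
                  end
          end,
          {[], 0, 0}, Sizes),
    lists:reverse([Count | Closed]).
```

With five 10-byte entries and a 16-byte threshold, `simulate/2` yields `[2, 2, 1]`. With a single 20-byte entry and a 4-byte threshold it yields `[1]`: the oversized entry sits alone, and only the next append opens a new segment.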
On reopen, `load_all_segments` lists the dir, filters `<ActorId>-NNNNNN.log`, sorts numerically (insertion sort — `lists:sort/1`
isn't registered in this port, only `lists:append/2`/`lists:reverse/1`/`lists:filter/2`/etc.), reads each via `try_read_segment`,
and concatenates the entries to rebuild flat `entries` + `seg_lens`.
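The filter-and-sort step on reopen might look like the following under stock Erlang/OTP, where string literals are charlists and `lists:sort/1`, `lists:prefix/2`, `lists:suffix/2`, `lists:sublist/3` and `lists:all/2` exist. The port needs the hand-rolled equivalents in this commit instead. The module name `reopen_sketch` and the sample file names are assumptions for illustration.

```erlang
-module(reopen_sketch).
-export([segment_indices/2]).

%% Given directory entry names (charlists) and an actor atom, return the
%% indices of that actor's segment files in ascending numeric order.
%% Names that do not match "<ActorId>-<digits>.log" are skipped.
segment_indices(Names, ActorId) ->
    Prefix = atom_to_list(ActorId) ++ "-",
    lists:sort([N || Name <- Names, {ok, N} <- [parse(Name, Prefix)]]).

parse(Name, Prefix) ->
    case lists:prefix(Prefix, Name) andalso lists:suffix(".log", Name) of
        true ->
            %% The digits sit between the prefix and the 4-char ".log" suffix.
            Digits = lists:sublist(Name, length(Prefix) + 1,
                                   length(Name) - length(Prefix) - 4),
            IsDigit = fun(C) -> C >= $0 andalso C =< $9 end,
            case Digits =/= [] andalso lists:all(IsDigit, Digits) of
                true  -> {ok, list_to_integer(Digits)};
                false -> not_ours
            end;
        false ->
            not_ours
    end.
```

Sorting the parsed integers instead of the names keeps the order correct even when a legacy unpadded name such as `alice-1.log` is present.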
Erlang-port gotchas worked around during this iteration:
(a) String literals like `"foo"` in this port are NOT charlists — `[H|T] = "foo"` badmatches and `length("foo")` errors as "not a
proper list". `parse_segment_name` builds prefix/suffix from `atom_to_list/1` + explicit `[$-]` / `[$., $l, $o, $g]` cons.
(b) Cross-arg variable repetition (`strip_prefix([C | Rest], [C | PRest])`) was rewritten to explicit `case C =:= P` for robustness.
(c) `Pattern = Binding` syntax in a case clause (`[_|_] = Lst when length(Lst) > 1 -> ...`) errors as "unsupported pattern type
'match'" — replaced with `Lst when is_list(Lst), length(Lst) > 1`.
Tests:
- new `next/tests/log_rotate.sh` (10 checks: two module-load checks, term_codec-loads and log-loads, plus 8 rotation cases): no-opt single-seg-after-3, rotation-fires-on-threshold, rotated-chronological,
reopen-rebuilds-history, reopen-rebuilds-same-seg-shape, huge-single-entry-stays-1-seg, append-after-huge-keeps-order,
tip-monotonic-across-rotations.
- `next/tests/log_disk.sh` updated to the new filename (`corrupted-000000.log`); stays 12/12.
- Erlang conformance 761/761 unchanged (log.erl is in next/, not lib/erlang/).
3c.a ticked in plans/fed-sx-milestone-1.md; 3c.b (gen_server-mediated concurrent appends) is the next iteration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -1,102 +1,302 @@
|
||||
-module(log).
|
||||
-export([open/2, open_disk/2, append/2, tip/1, replay/3, entries/1]).
|
||||
-export([open/2, open_disk/2, open_disk/3,
|
||||
append/2, tip/1, replay/3, entries/1,
|
||||
segments/1]).
|
||||
|
||||
%% Per-actor activity log — the canonical record of everything an
|
||||
%% actor has emitted, in chronological order. Per design §15.2 this
|
||||
%% lives on disk as a JSONL segment file; v1 starts with an in-memory
|
||||
%% backend so the API and seq-number machinery can be locked down
|
||||
%% before the on-disk format is added (Step 3b).
|
||||
%% lives on disk as numbered segment files; v1 started with an
|
||||
%% in-memory backend (Step 3a) so the API + seq-number machinery
|
||||
%% could be locked down before on-disk persistence (Step 3b) and
|
||||
%% segment rotation (Step 3c.a — this revision).
|
||||
%%
|
||||
%% State shape (a property list):
|
||||
%% [{actor, ActorId}, {base, BasePath}, {seq, NextSeq}, {entries, [Act|...]}]
|
||||
%% On-disk layout:
|
||||
%% <BasePath>/<ActorId>-NNNNNN.log
|
||||
%%
|
||||
%% `entries` stores activities in append order — i.e. oldest first.
|
||||
%% `seq` is the next sequence number that will be assigned by append.
|
||||
%% `base` is kept on the state for forward-compatibility with 3b
|
||||
%% (where it becomes the segment-file directory).
|
||||
%% NNNNNN is a 6-digit zero-padded segment index (000000..999999) so
|
||||
%% file:list_dir's alphabetical ordering coincides with numeric. Each
|
||||
%% segment file is the concat of length-prefixed frames; each frame
|
||||
%% is `<<Len:32/big>>` + `term_codec:encode(Activity)`.
|
||||
%%
|
||||
%% open/2 takes ActorId + BasePath and returns {ok, LogState} starting
|
||||
%% with seq=0 and no entries.
|
||||
%% In-memory state (a property list):
|
||||
%% [{actor, ActorId},
|
||||
%% {base, BasePath}, %% binary | charlist
|
||||
%% {seq, NextSeq}, %% next seq the log will assign
|
||||
%% {entries, [Activity, ...]}, %% flat, append order, oldest first
|
||||
%% {persisted, true|false}, %% does append write through?
|
||||
{seg_size, MaxBytes}, %% rotate when active segment's pre-append size >= this
|
||||
%% {seg_lens, [N0, N1, ...]}] %% entry count per segment in order
|
||||
%%
|
||||
%% append/2 returns {ok, NewLogState, AssignedSeq}.
|
||||
%%
|
||||
%% tip/1 returns the next seq the log would assign (== count of entries).
|
||||
%%
|
||||
%% replay/3 folds Fun(Activity, AssignedSeq, Acc) over every entry in
|
||||
%% append order. Three-arity rather than two-arity because the plan's
|
||||
%% example test is "sequence numbers gap-free across replay" — having
|
||||
%% the seq number visible in the fold makes that test direct.
|
||||
%%
|
||||
%% entries/1 is a debug accessor returning [Activity, ...] in append
|
||||
%% order. Not part of the public API contract.
|
||||
%% `seg_lens` is the sole bookkeeping needed to compute (a) which
|
||||
%% segment any given seq lives in, and (b) which slice of `entries`
|
||||
%% is the active segment's contents to rewrite on append. The last
|
||||
%% element is the active segment's length.
|
||||
|
||||
%% In-memory only — atoms accepted as BasePath for back-compat with
|
||||
%% Step 3a tests that just want the API surface.
|
||||
open(ActorId, BasePath) ->
|
||||
{ok, [{actor, ActorId}, {base, BasePath}, {seq, 0}, {entries, []}]}.
|
||||
{ok, [{actor, ActorId}, {base, BasePath},
|
||||
{seq, 0}, {entries, []},
|
||||
{persisted, false}]}.
|
||||
|
||||
append(LogState, Activity) ->
|
||||
Seq = field(seq, LogState),
|
||||
Entries = field(entries, LogState),
|
||||
NewEntries = Entries ++ [Activity],
|
||||
NewState = replace_field(seq, Seq + 1,
|
||||
replace_field(entries, NewEntries, LogState)),
|
||||
case persisted_path(LogState) of
|
||||
{persisted, Path} ->
|
||||
ok = write_segment(Path, NewEntries),
|
||||
{ok, NewState, Seq};
|
||||
not_persisted ->
|
||||
{ok, NewState, Seq}
|
||||
end.
|
||||
|
||||
%% open_disk/2 — disk-backed variant of open. Reads any existing
|
||||
%% segment file under BasePath, replays entries into memory state,
|
||||
%% and tags the state {persisted, true} so future append/2 calls
|
||||
%% write through. BasePath must be a binary or charlist (real path),
|
||||
%% not an atom — the in-memory open/2 still accepts atoms for tests.
|
||||
%%
|
||||
%% Segment format (per frame): 4-byte big-endian length + that many
|
||||
%% bytes of term_codec:encode(Activity). Whole file is the concat of
|
||||
%% all frames in append order; no header.
|
||||
%%
|
||||
%% Returns {ok, LogState} on success, {error, {corrupt, Reason}} if
|
||||
%% the segment is truncated/garbled, {error, {read, Reason}} on other
|
||||
%% file errors. Missing file is treated as an empty fresh log.
|
||||
%% Disk-backed; default segment size = effectively unlimited (no
|
||||
%% rotation). Use open_disk/3 with {segment_size, N} to enable.
|
||||
open_disk(ActorId, BasePath) ->
|
||||
Path = segment_path(ActorId, BasePath),
|
||||
case try_read_segment(Path) of
|
||||
{ok, Entries} ->
|
||||
open_disk(ActorId, BasePath, [{segment_size, 1073741824}]). %% 1 GiB
|
||||
|
||||
open_disk(ActorId, BasePath, Opts) ->
|
||||
SegSize = proplist_get(segment_size, Opts, 1073741824),
|
||||
case load_all_segments(ActorId, BasePath) of
|
||||
{ok, SegEntries} ->
|
||||
%% SegEntries :: [[Entry, ...]] in segment-index order
|
||||
%% (empty list when no segments exist on disk).
|
||||
Lens0 = [length(S) || S <- SegEntries],
|
||||
%% Always have at least one active segment, even if empty.
|
||||
Lens = case Lens0 of
|
||||
[] -> [0];
|
||||
_ -> Lens0
|
||||
end,
|
||||
Flat = flatten_segs(SegEntries),
|
||||
State = [{actor, ActorId}, {base, BasePath},
|
||||
{seq, length(Entries)},
|
||||
{entries, Entries},
|
||||
{seq, length(Flat)},
|
||||
{entries, Flat},
|
||||
{persisted, true},
|
||||
{path, Path}],
|
||||
{seg_size, SegSize},
|
||||
{seg_lens, Lens}],
|
||||
{ok, State};
|
||||
{error, _} = E ->
|
||||
E
|
||||
end.
|
||||
|
||||
persisted_path(LogState) ->
|
||||
append(LogState, Activity) ->
|
||||
Seq = field(seq, LogState),
|
||||
Entries = field(entries, LogState),
|
||||
case lookup(persisted, LogState) of
|
||||
true ->
|
||||
case lookup(path, LogState) of
|
||||
undefined -> not_persisted;
|
||||
P -> {persisted, P}
|
||||
end;
|
||||
_ -> not_persisted
|
||||
SegLens = field(seg_lens, LogState),
|
||||
SegSize = field(seg_size, LogState),
|
||||
{NewSegLens, ActiveIdx, ActiveEntries} =
|
||||
place_append(Entries, Activity, SegLens, SegSize),
|
||||
Path = segment_path(field(actor, LogState),
|
||||
field(base, LogState),
|
||||
ActiveIdx),
|
||||
ok = write_segment(Path, ActiveEntries),
|
||||
NewState = replace_field(seq, Seq + 1,
|
||||
replace_field(entries, Entries ++ [Activity],
|
||||
replace_field(seg_lens, NewSegLens, LogState))),
|
||||
{ok, NewState, Seq};
|
||||
_ ->
|
||||
NewState = replace_field(seq, Seq + 1,
|
||||
replace_field(entries, Entries ++ [Activity],
|
||||
LogState)),
|
||||
{ok, NewState, Seq}
|
||||
end.
|
||||
|
||||
%% segment_path/2 — returns the segment file path as a charlist (list
|
||||
%% of int char codes). BasePath may be a binary OR a charlist; we
|
||||
%% normalize to charlist via binary_to_list so the result is purely
|
||||
%% cons-based — this works around an iolist-walker quirk in
|
||||
%% er-source-to-string that surfaces when list_to_binary nests binaries
|
||||
%% built from charlists. file:read_file accepts charlists fine.
|
||||
segment_path(ActorId, BasePath) ->
|
||||
tip(LogState) ->
|
||||
field(seq, LogState).
|
||||
|
||||
replay(LogState, InitAcc, Fun) ->
|
||||
Entries = field(entries, LogState),
|
||||
replay_loop(Entries, 0, InitAcc, Fun).
|
||||
|
||||
entries(LogState) ->
|
||||
field(entries, LogState).
|
||||
|
||||
%% Debug accessor: returns the in-memory seg_lens (count per segment
|
||||
%% in index order). Used by rotation tests to assert that rotation
|
||||
%% happened.
|
||||
segments(LogState) ->
|
||||
case lookup(seg_lens, LogState) of
|
||||
undefined -> [];
|
||||
L -> L
|
||||
end.
|
||||
|
||||
%% --- internals ---
|
||||
|
||||
replay_loop([], _, Acc, _) -> Acc;
|
||||
replay_loop([Act | Rest], Seq, Acc, Fun) ->
|
||||
replay_loop(Rest, Seq + 1, Fun(Act, Seq, Acc), Fun).
|
||||
|
||||
%% place_append/4 decides whether the new Activity extends the current
|
||||
%% active segment or opens a fresh one, returning the resulting
|
||||
%% seg_lens, the active segment's index, and the active segment's
|
||||
%% complete entry list (the slice that needs to be (re)written to
|
||||
%% disk).
|
||||
%%
|
||||
%% Rotation rule: if the active segment already on disk is at or past
|
||||
%% the size threshold (encoded_size(OldActive) >= SegSize) AND it
|
||||
%% already holds at least one entry, the new Activity opens a new
|
||||
%% segment. A single entry larger than the threshold therefore lives
|
||||
%% on its own — we never recurse rotating a one-entry segment.
|
||||
%%
|
||||
%% This is decided BEFORE the append (looking at the pre-append size),
|
||||
%% so each segment file is written exactly once per append cycle.
|
||||
place_append(OldEntries, Activity, SegLens, SegSize) ->
|
||||
{Pre, Last} = split_last(SegLens),
|
||||
PreCount = sum(Pre),
|
||||
OldActive = drop(PreCount, OldEntries),
|
||||
OldActiveSize = encoded_size(OldActive),
|
||||
case (OldActiveSize >= SegSize) andalso (Last >= 1) of
|
||||
true ->
|
||||
%% Rotate: new entry starts a brand-new segment.
|
||||
NewSegLens = SegLens ++ [1],
|
||||
NewActiveIdx = length(SegLens),
|
||||
{NewSegLens, NewActiveIdx, [Activity]};
|
||||
false ->
|
||||
%% Stay: extend current active.
|
||||
NewSegLens = Pre ++ [Last + 1],
|
||||
NewActiveIdx = length(Pre),
|
||||
{NewSegLens, NewActiveIdx, OldActive ++ [Activity]}
|
||||
end.
|
||||
|
||||
split_last([X]) -> {[], X};
|
||||
split_last([H | T]) ->
|
||||
{Tl, Last} = split_last(T),
|
||||
{[H | Tl], Last}.
|
||||
|
||||
sum(L) -> sum_(L, 0).
|
||||
sum_([], A) -> A;
|
||||
sum_([H | T], A) -> sum_(T, A + H).
|
||||
|
||||
drop(0, L) -> L;
|
||||
drop(_, []) -> [];
|
||||
drop(N, [_ | T]) -> drop(N - 1, T).
|
||||
|
||||
%% flatten_segs/1 — concat a list of segments (each itself a list of
|
||||
%% entries) into a single flat list, preserving order. Used by
|
||||
%% open_disk to assemble the on-disk activity history from per-
|
||||
%% segment loads. Implemented locally because lists:append/1 isn't
|
||||
%% registered in this port — only lists:append/2.
|
||||
flatten_segs([]) -> [];
|
||||
flatten_segs([Seg | Rest]) -> Seg ++ flatten_segs(Rest).
|
||||
|
||||
encoded_size(Entries) ->
|
||||
byte_size(list_to_binary(
|
||||
[frame(term_codec:encode(E)) || E <- Entries])).
|
||||
|
||||
%% Try to read every segment file under BasePath matching the actor.
|
||||
%% Returns {ok, [[Entry, ...]]} where the outer list is in segment-
|
||||
%% index order. Empty when no segments exist.
|
||||
load_all_segments(ActorId, BasePath) ->
|
||||
%% list_dir returns {ok, [Binary]} of entry names in sorted order
|
||||
%% per fed-prims contract.
|
||||
BaseChars = base_chars(BasePath),
|
||||
case file:list_dir(BaseChars) of
|
||||
{ok, Names} ->
|
||||
%% Erlang string literals are NOT charlists in this port,
|
||||
%% so build prefix/suffix as explicit char-code lists.
|
||||
Prefix = atom_to_list(ActorId) ++ [$-],
|
||||
Suffix = [$., $l, $o, $g],
|
||||
Indices = collect_segment_indices(Names, Prefix, Suffix),
|
||||
read_segments_in_order(Indices, ActorId, BasePath, []);
|
||||
{error, enoent} ->
|
||||
{ok, []};
|
||||
{error, R} ->
|
||||
{error, {read, R}}
|
||||
end.
|
||||
|
||||
collect_segment_indices([], _, _) -> [];
|
||||
collect_segment_indices([Name | Rest], Prefix, Suffix) ->
|
||||
case parse_segment_name(Name, Prefix, Suffix) of
|
||||
{ok, N} ->
|
||||
[N | collect_segment_indices(Rest, Prefix, Suffix)];
|
||||
not_ours ->
|
||||
collect_segment_indices(Rest, Prefix, Suffix)
|
||||
end.
|
||||
|
||||
parse_segment_name(NameBin, Prefix, Suffix) when is_binary(NameBin) ->
|
||||
parse_segment_name(binary_to_list(NameBin), Prefix, Suffix);
|
||||
parse_segment_name(Name, Prefix, Suffix) ->
|
||||
case strip_prefix(Name, Prefix) of
|
||||
{ok, Rest} ->
|
||||
case strip_suffix(Rest, Suffix) of
|
||||
{ok, NumStr} ->
|
||||
case is_all_digits(NumStr) of
|
||||
true -> {ok, list_to_integer(NumStr)};
|
||||
false -> not_ours
|
||||
end;
|
||||
not_ours -> not_ours
|
||||
end;
|
||||
not_ours -> not_ours
|
||||
end.
|
||||
|
||||
strip_prefix(Str, []) -> {ok, Str};
|
||||
strip_prefix([C | Rest], [P | PRest]) ->
|
||||
case C =:= P of
|
||||
true -> strip_prefix(Rest, PRest);
|
||||
false -> not_ours
|
||||
end;
|
||||
strip_prefix(_, _) -> not_ours.
|
||||
|
||||
strip_suffix(Str, Suffix) ->
|
||||
SL = length(Str),
|
||||
XL = length(Suffix),
|
||||
case SL >= XL of
|
||||
true ->
|
||||
Head = take_n_pl(SL - XL, Str),
|
||||
Tail = drop(SL - XL, Str),
|
||||
case Tail =:= Suffix of
|
||||
true -> {ok, Head};
|
||||
false -> not_ours
|
||||
end;
|
||||
false -> not_ours
|
||||
end.
|
||||
|
||||
take_n_pl(0, _) -> [];
|
||||
take_n_pl(_, []) -> [];
|
||||
take_n_pl(N, [H | T]) -> [H | take_n_pl(N - 1, T)].
|
||||
|
||||
is_all_digits([]) -> false;
|
||||
is_all_digits(Chars) -> all_digits(Chars).
|
||||
|
||||
all_digits([]) -> true;
|
||||
all_digits([C | Rest]) when C >= $0, C =< $9 -> all_digits(Rest);
|
||||
all_digits(_) -> false.
|
||||
|
||||
%% read_segments_in_order/4 — fed-prims sorts list_dir alphabetically;
|
||||
%% with 6-digit zero-padded names that coincides with numeric order.
|
||||
%% But we also accept legacy unpadded names, so sort by index to be
|
||||
%% defensive.
|
||||
read_segments_in_order(Indices, ActorId, BasePath, Acc) ->
|
||||
Sorted = isort(Indices),
|
||||
read_each(Sorted, ActorId, BasePath, Acc).
|
||||
|
||||
read_each([], _, _, Acc) ->
|
||||
{ok, lists:reverse(Acc)};
|
||||
read_each([Idx | Rest], ActorId, BasePath, Acc) ->
|
||||
Path = segment_path(ActorId, BasePath, Idx),
|
||||
case try_read_segment(Path) of
|
||||
{ok, Entries} ->
|
||||
read_each(Rest, ActorId, BasePath, [Entries | Acc]);
|
||||
{error, _} = E -> E
|
||||
end.
|
||||
|
||||
%% Tiny insertion sort over a small list of integers.
|
||||
isort([]) -> [];
|
||||
isort([H | T]) -> insert(H, isort(T)).
|
||||
insert(X, []) -> [X];
|
||||
insert(X, [Y | Rest]) when X =< Y -> [X, Y | Rest];
|
||||
insert(X, [Y | Rest]) -> [Y | insert(X, Rest)].
|
||||
|
||||
%% segment_path/3 — charlist path to the Idx'th segment file.
|
||||
segment_path(ActorId, BasePath, Idx) ->
|
||||
base_chars(BasePath) ++ [$/] ++ atom_to_list(ActorId)
|
||||
++ [$., $l, $o, $g].
|
||||
++ [$-] ++ pad_int(Idx, 6) ++ [$., $l, $o, $g].
|
||||
|
||||
base_chars(B) when is_binary(B) -> binary_to_list(B);
|
||||
base_chars(L) when is_list(L) -> L.
|
||||
|
||||
%% Zero-pad an integer to Width digits as a charlist.
|
||||
pad_int(N, Width) ->
|
||||
Cs = integer_to_list(N),
|
||||
pad_left(Cs, Width).
|
||||
|
||||
pad_left(Cs, Width) ->
|
||||
case length(Cs) >= Width of
|
||||
true -> Cs;
|
||||
false -> pad_left([$0 | Cs], Width)
|
||||
end.
|
||||
|
||||
write_segment(Path, Entries) ->
|
||||
Frames = [frame(term_codec:encode(E)) || E <- Entries],
|
||||
file:write_file(Path, list_to_binary(Frames)).
|
||||
@@ -143,26 +343,12 @@ take_n(N, [H | T]) ->
|
||||
take_n(_, []) ->
|
||||
throw(truncated_body).
|
||||
|
||||
tip(LogState) ->
|
||||
field(seq, LogState).
|
||||
|
||||
replay(LogState, InitAcc, Fun) ->
|
||||
Entries = field(entries, LogState),
|
||||
replay_loop(Entries, 0, InitAcc, Fun).
|
||||
|
||||
replay_loop([], _, Acc, _) -> Acc;
|
||||
replay_loop([Act | Rest], Seq, Acc, Fun) ->
|
||||
replay_loop(Rest, Seq + 1, Fun(Act, Seq, Acc), Fun).
|
||||
|
||||
entries(LogState) ->
|
||||
field(entries, LogState).
|
||||
%% --- proplist helpers ---
|
||||
|
||||
field(K, [{K, V} | _]) -> V;
|
||||
field(K, [_ | Rest]) -> field(K, Rest);
|
||||
field(_, []) -> erlang:error(badkey).
|
||||
|
||||
%% lookup/2 — like field but returns `undefined` for missing key
|
||||
%% (used by persisted_path/1 which probes optional state fields).
|
||||
lookup(K, [{K, V} | _]) -> V;
|
||||
lookup(K, [_ | Rest]) -> lookup(K, Rest);
|
||||
lookup(_, []) -> undefined.
|
||||
@@ -170,3 +356,7 @@ lookup(_, []) -> undefined.
|
||||
replace_field(K, V, []) -> [{K, V}];
|
||||
replace_field(K, V, [{K, _} | Rest]) -> [{K, V} | Rest];
|
||||
replace_field(K, V, [P | Rest]) -> [P | replace_field(K, V, Rest)].
|
||||
|
||||
proplist_get(K, [{K, V} | _], _) -> V;
|
||||
proplist_get(K, [_ | Rest], Default) -> proplist_get(K, Rest, Default);
|
||||
proplist_get(_, [], Default) -> Default.
|
||||
|
||||
@@ -23,8 +23,10 @@ rm -rf "$DISK_BASE"
|
||||
mkdir -p "$DISK_BASE"
|
||||
|
||||
# Pre-write a corrupted segment file for the corrupt-detect test
|
||||
# (just a truncated 4-byte length header with no payload).
|
||||
printf '\x00\x00\x00\x05XX' > "$DISK_BASE/corrupted.log"
|
||||
# (a 4-byte length header declaring 5 bytes, followed by only 2). Segment
|
||||
# filenames are <ActorId>-NNNNNN.log (6-digit zero-padded index) as
|
||||
# of Step 3c.a.
|
||||
printf '\x00\x00\x00\x05XX' > "$DISK_BASE/corrupted-000000.log"
|
||||
|
||||
VERBOSE="${1:-}"
|
||||
PASS=0; FAIL=0; ERRORS=""
|
||||
|
||||
125
next/tests/log_rotate.sh
Executable file
@@ -0,0 +1,125 @@
|
||||
#!/usr/bin/env bash
|
||||
# next/tests/log_rotate.sh — Step 3c.a segment rotation acceptance.
|
||||
#
|
||||
# Exercises log:open_disk/3 with {segment_size, N} opt-in, append/2
|
||||
# rotation behaviour at the threshold, replay across segments, and
|
||||
# reopen-after-rotation. Builds on the Step 3b on-disk substrate
|
||||
# (term_codec.erl + log.erl framed-segment writer).
|
||||
|
||||
set -uo pipefail
|
||||
cd "$(git rev-parse --show-toplevel)"
|
||||
|
||||
SX_SERVER="${SX_SERVER:-hosts/ocaml/_build/default/bin/sx_server.exe}"
|
||||
if [ ! -x "$SX_SERVER" ]; then
|
||||
SX_SERVER="/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe"
|
||||
fi
|
||||
if [ ! -x "$SX_SERVER" ]; then
|
||||
echo "ERROR: sx_server.exe not found." >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
DISK_BASE=/tmp/fed_sx_m1_log_rotate
|
||||
rm -rf "$DISK_BASE"
|
||||
mkdir -p "$DISK_BASE"
|
||||
|
||||
VERBOSE="${1:-}"
|
||||
PASS=0; FAIL=0; ERRORS=""
|
||||
TMPFILE=$(mktemp); trap "rm -f $TMPFILE; rm -rf $DISK_BASE" EXIT
|
||||
|
||||
cat > "$TMPFILE" <<'EPOCHS'
|
||||
(epoch 1)
|
||||
(load "lib/erlang/tokenizer.sx")
|
||||
(load "lib/erlang/parser.sx")
|
||||
(load "lib/erlang/parser-core.sx")
|
||||
(load "lib/erlang/parser-expr.sx")
|
||||
(load "lib/erlang/parser-module.sx")
|
||||
(load "lib/erlang/transpile.sx")
|
||||
(load "lib/erlang/runtime.sx")
|
||||
(load "lib/erlang/vm/dispatcher.sx")
|
||||
|
||||
(epoch 2)
|
||||
(eval "(get (erlang-load-module (file-read \"next/kernel/term_codec.erl\")) :name)")
|
||||
|
||||
(epoch 3)
|
||||
(eval "(get (erlang-load-module (file-read \"next/kernel/log.erl\")) :name)")
|
||||
|
||||
;; Base path /tmp/fed_sx_m1_log_rotate built byte-by-byte.
|
||||
;; --- default open_disk/2 = no rotation: many appends still single seg ---
|
||||
(epoch 10)
|
||||
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(noopt, Base), {ok, L1, _} = log:append(L0, a), {ok, L2, _} = log:append(L1, b), {ok, L3, _} = log:append(L2, c), log:segments(L3) =:= [3]\") :name)")
|
||||
|
||||
;; --- small threshold rotates: 5 short entries -> multiple segs ---
|
||||
;; Each encoded entry like 'msg' is ~6 bytes + 4-byte length header = 10 bytes.
|
||||
;; Threshold 16 bytes means seg rotates after every 2 entries.
|
||||
(epoch 20)
|
||||
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(small, Base, [{segment_size, 16}]), {ok, L1, _} = log:append(L0, aa), {ok, L2, _} = log:append(L1, bb), {ok, L3, _} = log:append(L2, cc), {ok, L4, _} = log:append(L3, dd), {ok, L5, _} = log:append(L4, ee), case log:segments(L5) of Lst when is_list(Lst), length(Lst) > 1 -> rotated; _ -> singleseg end\") :name)")
|
||||
|
||||
;; --- rotated entries replay in chronological order ---
|
||||
(epoch 21)
|
||||
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(replay, Base, [{segment_size, 16}]), {ok, L1, _} = log:append(L0, aa), {ok, L2, _} = log:append(L1, bb), {ok, L3, _} = log:append(L2, cc), {ok, L4, _} = log:append(L3, dd), {ok, L5, _} = log:append(L4, ee), log:entries(L5) =:= [aa, bb, cc, dd, ee]\") :name)")
|
||||
|
||||
;; --- reopen after rotation: history is reassembled in order ---
|
||||
(epoch 22)
|
||||
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(reopen, Base, [{segment_size, 16}]), {ok, L1, _} = log:append(L0, aa), {ok, L2, _} = log:append(L1, bb), {ok, L3, _} = log:append(L2, cc), {ok, L4, _} = log:append(L3, dd), {ok, L5, _} = log:append(L4, ee), {ok, R} = log:open_disk(reopen, Base, [{segment_size, 16}]), {log:entries(R), log:tip(R)} =:= {[aa, bb, cc, dd, ee], 5}\") :name)")
|
||||
|
||||
;; --- segments after reopen match (same shape rebuilt from disk) ---
|
||||
(epoch 23)
|
||||
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(shape, Base, [{segment_size, 16}]), {ok, L1, _} = log:append(L0, aa), {ok, L2, _} = log:append(L1, bb), {ok, L3, _} = log:append(L2, cc), {ok, L4, _} = log:append(L3, dd), {ok, L5, _} = log:append(L4, ee), {ok, R} = log:open_disk(shape, Base, [{segment_size, 16}]), log:segments(R) =:= log:segments(L5)\") :name)")
|
||||
|
||||
;; --- single huge entry > threshold: still one segment, no infinite loop ---
|
||||
(epoch 30)
|
||||
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(huge, Base, [{segment_size, 4}]), Big = <<0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15>>, {ok, L1, _} = log:append(L0, Big), log:segments(L1) =:= [1]\") :name)")
|
||||
|
||||
;; --- append after huge first entry forces rotation on next entry ---
|
||||
(epoch 31)
|
||||
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(post, Base, [{segment_size, 4}]), Big = <<0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15>>, {ok, L1, _} = log:append(L0, Big), {ok, L2, _} = log:append(L1, small), log:entries(L2) =:= [Big, small]\") :name)")
|
||||
|
||||
;; --- tip increments monotonically across rotations ---
|
||||
(epoch 40)
|
||||
(eval "(get (erlang-eval-ast \"Base = list_to_binary([$/, $t, $m, $p, $/, $f, $e, $d, $_, $s, $x, $_, $m, $1, $_, $l, $o, $g, $_, $r, $o, $t, $a, $t, $e]), {ok, L0} = log:open_disk(tipcheck, Base, [{segment_size, 16}]), {ok, L1, _} = log:append(L0, x1), {ok, L2, _} = log:append(L1, x2), {ok, L3, _} = log:append(L2, x3), {ok, L4, _} = log:append(L3, x4), log:tip(L4) =:= 4\") :name)")
|
||||
EPOCHS
|
||||
|
||||
OUTPUT=$(timeout 90 "$SX_SERVER" < "$TMPFILE" 2>/dev/null)
|
||||
|
||||
check() {
|
||||
local epoch="$1" desc="$2" expected="$3"
|
||||
local actual
|
||||
actual=$(echo "$OUTPUT" | grep -A1 "^(ok-len $epoch " | tail -1 || true)
|
||||
if echo "$actual" | grep -q "^(ok-len"; then actual=""; fi
|
||||
if [ -z "$actual" ]; then
|
||||
actual=$(echo "$OUTPUT" | grep "^(ok $epoch " | head -1 || true)
|
||||
fi
|
||||
if [ -z "$actual" ]; then
|
||||
actual=$(echo "$OUTPUT" | grep "^(error $epoch " | head -1 || true)
|
||||
fi
|
||||
[ -z "$actual" ] && actual="<no output for epoch $epoch>"
|
||||
|
||||
if echo "$actual" | grep -qF -- "$expected"; then
|
||||
PASS=$((PASS+1))
|
||||
[ "$VERBOSE" = "-v" ] && echo " ok $desc"
|
||||
else
|
||||
FAIL=$((FAIL+1))
|
||||
ERRORS+=" FAIL [$desc] (epoch $epoch) expected: $expected | actual: $actual
|
||||
"
|
||||
fi
|
||||
}
|
||||
|
||||
check 2 "term_codec loads" "term_codec"
|
||||
check 3 "log module loads" "log"
|
||||
check 10 "no-opt = single seg after 3" "true"
|
||||
check 20 "rotation fires on threshold" "rotated"
|
||||
check 21 "rotated entries chronological" "true"
|
||||
check 22 "reopen rebuilds history" "true"
|
||||
check 23 "reopen rebuilds same seg shape" "true"
|
||||
check 30 "huge single entry stays 1 seg" "true"
|
||||
check 31 "append after huge keeps order" "true"
|
||||
check 40 "tip monotonic across rotations" "true"
|
||||
|
||||
TOTAL=$((PASS+FAIL))
|
||||
if [ $FAIL -eq 0 ]; then
|
||||
echo "ok $PASS/$TOTAL log_rotate tests passed"
|
||||
else
|
||||
echo "FAIL $PASS/$TOTAL passed, $FAIL failed:"
|
||||
echo "$ERRORS"
|
||||
fi
|
||||
[ $FAIL -eq 0 ]
|
||||
@@ -199,6 +199,8 @@ verify_signature(Activity, ActorState) ->
|
||||
- [x] **3a** — `log:open/2` + `log:append/2` + `log:tip/1` + `log:replay/3` + `log:entries/1` over an in-memory log state (per-actor seq; replay in append order; round-trip the stored activity). `next/tests/log_memory.sh` (12 cases).
|
||||
- [x] **3b** — Term codec + on-disk persistence. Codec: `next/kernel/term_codec.erl` `encode/1` + `decode/1` over netstring framing (`a/i/b/t/l` + length + body; binary bodies byte-clean — NUL/LF allowed). On-disk: `log:open_disk/2(ActorId, BasePath)` reads any existing segment file (charlist path = `BasePath ++ "/" ++ atom_to_list(ActorId) ++ ".log"`); `append/2` is polymorphic on a `{persisted, true}` state field and writes through. Frame format on disk: 4-byte big-endian length prefix + `term_codec:encode(Activity)`. `try_read_segment` catches throw/error and surfaces `{error, {corrupt, Reason}}`. 18 codec round-trips + 12 disk acceptance tests (`next/tests/term_codec.sh`, `next/tests/log_disk.sh`); 3a in-memory `open/2` semantics unchanged. `encode/1`/`decode/1` for atoms, integers, binaries, tuples, lists, nesting; netstring-ish framing (`a/i/b/t/l` tag + length + body); byte-clean (binary bodies may contain NUL/LF). 18 round-trip + streaming + bad-form tests in `next/tests/term_codec.sh`. On-disk segment writer (open/2 reads existing, append/2 writes-through, replay/3 reads from disk) is the next sub-step — codec is the load-bearing piece.
|
||||
- [ ] **3c** — Segment rotation at size threshold + gen_server-mediated concurrent appends.
|
||||
- [x] **3c.a** — Segment rotation. `log:open_disk/3(ActorId, BasePath, [{segment_size, N}])` opts in with a byte threshold; default `open_disk/2` keeps a 1 GiB threshold (effectively no rotation). Filename scheme moved to `<ActorId>-NNNNNN.log` (6-digit zero-padded index) so `file:list_dir`'s alphabetical sort matches numeric order. `append/2` checks `encoded_size(active)` BEFORE the append: if already ≥ threshold AND active has at least one entry, the new activity opens a fresh segment; otherwise it extends current active. Single huge entries stay alone (no recursive rotation). On reopen, every matching `<ActorId>-*.log` file is read, decoded, and concatenated in numeric order to rebuild flat entries + `seg_lens`. `next/tests/log_rotate.sh` 10/10 (no-opt single-seg, threshold-rotates, chronological after rotation, reopen rebuilds shape, huge-entry-alone, post-huge keeps order, tip monotonic) + `log_disk.sh` updated to the new filename and stays 12/12. Erlang conformance 761/761.
|
||||
- [ ] **3c.b** — gen_server-mediated concurrent appends.
|
||||
|
||||
**Blockers (Step 3b) — byte-level path resolved 2026-06-04:** `binary_to_list/1` and `list_to_binary/1` are now registered Erlang BIFs in `lib/erlang/runtime.sx` (Step 3b substrate fix, +9 ffi tests, 738/738 conformance). `list_to_binary` is iolist-aware: accepts nested cons of integer bytes (0-255) and/or binaries; `binary_to_list` returns a proper Erlang charlist of integers. Round-trip verified: `list_to_binary(binary_to_list(B)) =:= B`. On-disk segment writer (3b) can now build segment bytes from `[Header, IoListPayload]` and reconstruct on read — option (c) of the original workaround menu is now cheap. `$X` char literals now decode correctly **as of 2026-06-04**: the Erlang tokenizer's `(= ch "$")` branch (`lib/erlang/tokenizer.sx`) now emits the decimal char code as the token value instead of the raw `$X` text (which `parse-number` couldn't decode → nil). Plain chars use `char->integer` of the first char; the standard escape table (`\n=10 \t=9 \r=13 \s=32 \b=8 \e=27 \f=12 \v=11 \d=127 \0=0 \\=92 \"=34 \'=39`) handles `$\X` forms. So `[$h, $i | T]` patterns and `list_to_binary([$f,$e,$d])` both work end-to-end. +12 eval tests, 750/750. Combined with 3b's `binary_to_list`/`list_to_binary`, Erlang code can now read/write byte sequences and string-shaped char lists fluently. **All three substrate gaps resolved as of 2026-06-05.** `atom_to_list/1` and `integer_to_list/1` now return Erlang charlists (cons of int char codes — standard Erlang semantics) via a new `er-string->charlist` helper in `transpile.sx`. `list_to_atom/1` and `list_to_integer/1` accept either charlists OR SX strings (back-compat via the existing `er-source-to-string` coercer). Composition works end-to-end: `list_to_binary(atom_to_list(hello)) =:= <<104,101,108,108,111>>` and `integer_to_list(N)` round-trips through `list_to_integer`. 5 existing eval tests rewritten to charlist semantics, 8 new charlist-aware tests added (759/759). 
The full term-codec primitive set — `binary_to_list`, `list_to_binary`, `$X`, `atom_to_list`, `integer_to_list` charlist semantics, plus existing `file:read_file`/`write_file`/`list_dir` — is now in place.
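The compositions quoted above can be collected into a small module. This is a minimal sketch in standard Erlang/OTP; the module and function names (`charlist_sketch`, `greeting_bytes/0`, `starts_with_hi/1`, `int_round_trip/1`) are hypothetical and not part of the repository, and it has not been run against the port itself.

```erlang
-module(charlist_sketch).
-export([greeting_bytes/0, starts_with_hi/1, int_round_trip/1]).

%% atom -> charlist -> binary, the composition quoted in the note above.
greeting_bytes() ->
    list_to_binary(atom_to_list(hello)).

%% $X literals are integer char codes, so they can appear in cons patterns.
starts_with_hi([$h, $i | _Rest]) -> true;
starts_with_hi(_Other) -> false.

%% integer_to_list/1 yields a charlist that list_to_integer/1 reads back.
int_round_trip(N) when is_integer(N) ->
    list_to_integer(integer_to_list(N)) =:= N.
```

The sketch builds every charlist from `$X` literals or BIF results rather than string literals, since the log records elsewhere that `"foo"` literals are not charlists in this port.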
@@ -1003,6 +1005,7 @@ A few things still under-specified; resolve as work begins.
Newest first. One line per sub-deliverable commit. Erlang conformance gate
(`bash lib/erlang/conformance.sh`) must remain 729/729 on every entry.
- **2026-06-05** — Step 3c.a segment rotation: `next/kernel/log.erl` rewritten around a `seg_lens :: [N0, N1, ...]` bookkeeping list (one entry-count per segment in numeric order, last is active) + `seg_size` threshold. Filename scheme now `<ActorId>-NNNNNN.log` (6-digit zero-padded so `file:list_dir`'s alphabetical sort = numeric). `open_disk/3(ActorId, BasePath, [{segment_size, N}])` opts a caller into a smaller rotation threshold; `open_disk/2` keeps a 1 GiB default that effectively never rotates (preserves Step 3b acceptance). Rotation rule (`place_append/4`): if the active segment's pre-append serialized size already ≥ threshold AND it holds at least one entry, the new activity opens a fresh segment — otherwise it extends current active. Single huge entry > threshold stays alone (no recursive rotation, no loop). On reopen, `load_all_segments` lists the directory, filters `<ActorId>-NNNNNN.log`, sorts numerically (insertion sort, since `lists:sort/1` isn't registered in this port — only `lists:append/2`/`lists:reverse/1`/`lists:filter/2` etc.), reads each via `try_read_segment`, and concatenates to rebuild flat `entries` + `seg_lens`. **Erlang-port gotchas hit & worked around:** (a) Erlang string literals like `"foo"` in this port are NOT charlists — `[H|T] = "foo"` badmatches, `length("foo")` errors as "not a proper list". `parse_segment_name` had to build prefix/suffix from `atom_to_list/1` + explicit `[$-]` / `[$., $l, $o, $g]` cons. (b) Cross-arg variable repetition (`strip_prefix([C | Rest], [C | PRest])`) works in tuple patterns but I rewrote it to explicit `case C =:= P of true -> ... false -> ...` for robustness. (c) `Pattern = Binding` syntax in a case clause (`[_|_] = Lst when length(Lst) > 1 -> ...`) errors "unsupported pattern type 'match'" — used `Lst when is_list(Lst), length(Lst) > 1` instead. 
New `next/tests/log_rotate.sh` 10/10: no-opt single-seg-after-3, rotation-fires-on-threshold, rotated-chronological, reopen-rebuilds-history, reopen-rebuilds-same-seg-shape, huge-single-entry-stays-1-seg, append-after-huge-keeps-order, tip-monotonic-across-rotations. Existing `next/tests/log_disk.sh` updated to the new filename (`corrupted-000000.log`) and stays 12/12. Erlang conformance **761/761** unchanged (log.erl is in next/, not lib/erlang/). Step 3c.a ticked; 3c.b (gen_server-mediated concurrent appends) is the next iteration.
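The filename scheme, the rotation rule, and the insertion sort from the 3c.a entry above can be sketched as follows. This is an illustration under stated assumptions, not the code in `next/kernel/log.erl`: the module `rotate_sketch` and all of its function names and arities are hypothetical (the real `place_append/4` takes four arguments whose meaning the log does not spell out), and the sketch tracks only entry counts, not the entries themselves.

```erlang
-module(rotate_sketch).
-export([segment_name/2, should_rotate/3, place/3, isort/1]).

%% Build "<ActorId>-NNNNNN.log" as a charlist from cons cells and BIF
%% results only, avoiding string literals (gotcha (a) in the entry above).
segment_name(ActorId, Index)
  when is_atom(ActorId), is_integer(Index), Index >= 0 ->
    Digits = integer_to_list(Index),
    atom_to_list(ActorId) ++ [$-] ++ pad_zeros(6 - length(Digits))
        ++ Digits ++ [$., $l, $o, $g].

pad_zeros(N) when N > 0 -> [$0 | pad_zeros(N - 1)];
pad_zeros(_) -> [].

%% Rotate only if the active segment is already at or over the threshold
%% before the append AND it holds at least one entry.
should_rotate(ActiveBytes, ActiveCount, Threshold) ->
    ActiveBytes >= Threshold andalso ActiveCount >= 1.

%% SegLens holds per-segment entry counts; the last element is the active
%% segment. Returns the updated list after placing one new entry.
place([], _ActiveBytes, _Threshold) ->
    [1];
place(SegLens, ActiveBytes, Threshold) ->
    [Active | OlderRev] = lists:reverse(SegLens),
    case should_rotate(ActiveBytes, Active, Threshold) of
        true  -> lists:reverse([1, Active | OlderRev]);
        false -> lists:reverse([Active + 1 | OlderRev])
    end.

%% Insertion sort over integers, standing in for lists:sort/1.
isort(List) -> isort(List, []).

isort([], Acc) -> Acc;
isort([H | T], Acc) -> isort(T, insert(H, Acc)).

insert(X, []) -> [X];
insert(X, [H | T]) when X =< H -> [X, H | T];
insert(X, [H | T]) -> [H | insert(X, T)].
```

Because the check uses the pre-append size, an entry larger than the threshold is written into an empty active segment without rotating, and the next append then opens a fresh segment. That matches the "huge single entry stays alone" behaviour described in the entry.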
- **2026-06-05** — Step 3b on-disk log: `next/kernel/log.erl` gains `open_disk/2(ActorId, BasePath)` and a write-through `append/2`. New state field `{persisted, true} | {path, CharList}` keys the polymorphism — 3a's in-memory `open/2` stays untouched and tests unchanged. `segment_path/2` builds the path as a charlist (`base_chars(BasePath) ++ "/" ++ atom_to_list(ActorId) ++ ".log"`) so it works whether the caller passes a binary or charlist BasePath; everything flows through `er-source-to-string` cleanly. On-disk frame format: 4-byte big-endian length prefix + `term_codec:encode(Activity)`. Restart path: `try_read_segment` reads the whole segment, length-decodes each frame, decodes via `term_codec`, returns `{ok, Entries}`; missing file → `{ok, []}`; throw/error during decode → `{error, {corrupt, _}}`. `next/tests/log_disk.sh` 12/12: open-missing-fresh, append+reopen-entries-match, tip-resumes, replay-chronological, mixed-types (atom/int/binary/tuple/list) round-trip, append-after-reopen, corrupted-segment, per-actor isolation, 3a back-compat. Erlang conformance **761/761** unchanged (log.erl is in next/, not lib/erlang/). Step 3b is now FULLY ticked; 3c (segment rotation + gen_server-mediated concurrent appends) remains for the next iteration.
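The frame format from the 3b entry above (4-byte big-endian length prefix followed by the encoded activity) can be sketched with standard Erlang bit syntax. Assumptions: the module `frame_sketch` and its functions are hypothetical, payloads are opaque binaries standing in for `term_codec:encode/1` output, and the log does not confirm that this port supports bit-syntax patterns, so treat this as standard Erlang/OTP only.

```erlang
-module(frame_sketch).
-export([frame/1, unframe_all/1]).

%% Prefix a payload with its byte length as a 32-bit big-endian integer.
frame(Payload) when is_binary(Payload) ->
    Len = byte_size(Payload),
    <<Len:32/big-unsigned-integer, Payload/binary>>.

%% Split a whole segment into its payloads, oldest first.
unframe_all(Segment) when is_binary(Segment) ->
    unframe_all(Segment, []).

unframe_all(<<>>, Acc) ->
    {ok, lists:reverse(Acc)};
unframe_all(<<Len:32/big-unsigned-integer, Payload:Len/binary, Rest/binary>>, Acc) ->
    unframe_all(Rest, [Payload | Acc]);
unframe_all(Trailing, _Acc) ->
    %% Truncated header or payload: report it instead of crashing.
    {error, {corrupt, Trailing}}.
```

A missing file would be handled before this point (the entry maps it to `{ok, []}`); the sketch covers only the length-decoding step of `try_read_segment`.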
- **2026-06-05** — Step 3b substrate fix #4: integer-literal eval now produces real ints (was floats). `transpile.sx`'s `(= ty "integer") (parse-number ...)` path returns `float_of_string` per host's `parse-number`, so `42`, `$X`, etc. were floats that `(integer? v)` returned true for but `(integer->char v)` rejected. Wrapped in `truncate` so all integer literals coerce to strict int; added nil-guard with a descriptive error. Discovered while debugging Step 3b on-disk log (file:read_file on a charlist path failed at the inner `(map integer->char ...)` because charlist elements were floats). Conformance **761/761** (eval 406→408, +2 net; no other suites changed). Unblocks any path that does `integer->char` on int-literal-derived values — most notably `file:read_file` / `file:write_file` on charlist paths and binaries built from `$X` literals.
- **2026-06-05** — Step 3b codec landed: `next/kernel/term_codec.erl` with `encode/1` + `decode/1` over a netstring-ish wire format (`a` atom / `i` int / `b` binary / `t` tuple / `l` list, each as `tag + decimal-length + ":" + body`; nil = `l0:`). Byte-clean — binary bodies may contain NUL, LF, or any byte; encoding stays parseable. Built end-to-end on the three substrate fixes (binary_to_list/list_to_binary + $X + atom_to_list/integer_to_list charlists). `decode/1` returns `{ok, Term, RestBinary}` so callers can stream multiple frames from one buffer. 18 acceptance tests in `next/tests/term_codec.sh`: encode bytes for every leaf type, round-trip for each, nested activity-shaped term (`{create, [{id,1},{actor,alice},{payload,<<104,105>>}]}`), 2-frame streaming, binary with embedded NUL+LF, bad-form returns `{error, badform}` not crash. Erlang conformance **759/759** unchanged (codec is in `next/`, not lib/erlang/). Step 3b on-disk segment writer (the second half — open/append/replay reading/writing the actual segment file) is the natural next iteration: encode each activity with `term_codec`, frame with a 4-byte big-endian length prefix, append to disk.
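For the leaf types, the `tag + decimal-length + ":" + body` layout from the codec entry above might look like the sketch below. It is not `next/kernel/term_codec.erl`: the module name `codec_sketch` is hypothetical, and it assumes that an atom body is the atom's name and an integer body is its decimal text, which the entry does not state. Tuples and lists are left out because the entry does not say whether their length field counts bytes or elements, so `l0:` is not decoded here.

```erlang
-module(codec_sketch).
-export([encode/1, decode/1]).

%% Leaf encoders: tag byte, decimal byte length, ":", then the body.
encode(A) when is_atom(A)    -> tagged($a, list_to_binary(atom_to_list(A)));
encode(I) when is_integer(I) -> tagged($i, list_to_binary(integer_to_list(I)));
encode(B) when is_binary(B)  -> tagged($b, B).

tagged(Tag, Body) ->
    list_to_binary([Tag, integer_to_list(byte_size(Body)), $:, Body]).

%% Decode one leaf frame and hand back the unread remainder so callers
%% can stream several frames from a single buffer.
decode(<<Tag, Rest/binary>>) when Tag =:= $a; Tag =:= $i; Tag =:= $b ->
    case read_length(Rest, 0, false) of
        {ok, Len, AfterColon} when byte_size(AfterColon) >= Len ->
            <<Body:Len/binary, Tail/binary>> = AfterColon,
            try {ok, build(Tag, Body), Tail}
            catch error:badarg -> {error, badform}
            end;
        _ ->
            {error, badform}
    end;
decode(_) ->
    {error, badform}.

%% Accumulate decimal digits up to the ":" separator; at least one digit
%% is required.
read_length(<<D, Rest/binary>>, Acc, _SeenDigit) when D >= $0, D =< $9 ->
    read_length(Rest, Acc * 10 + (D - $0), true);
read_length(<<$:, Rest/binary>>, Acc, true) ->
    {ok, Acc, Rest};
read_length(_, _, _) ->
    error.

build($a, Body) -> list_to_atom(binary_to_list(Body));
build($i, Body) -> list_to_integer(binary_to_list(Body));
build($b, Body) -> Body.
```

Because the length is read before the body, a binary body may contain `:`, NUL, or LF without confusing the parser, which is the byte-clean property the entry describes.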