fed-sx-m1: Step 3c.a segment rotation — log:open_disk/3, <ActorId>-NNNNNN.log filename, threshold-driven rotation; 10/10 log_rotate tests
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 21s
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 21s
`next/kernel/log.erl` rewritten around a `seg_lens :: [N0, N1, ...]` per-segment entry-count list + a `seg_size` byte threshold. Filename
scheme moved from `<ActorId>.log` to `<ActorId>-NNNNNN.log` (6-digit zero-padded) so `file:list_dir`'s alphabetical sort coincides
with numeric order.
`open_disk/3(ActorId, BasePath, [{segment_size, N}])` opts a caller into a smaller rotation threshold; `open_disk/2` keeps a 1 GiB
default that effectively never rotates (preserves Step 3b acceptance — log_disk.sh unchanged in behaviour).
Rotation rule in `place_append/4`: if the active segment's pre-append encoded size is already >= threshold AND it holds at least one
entry, the new activity opens a fresh segment; otherwise it extends the current active segment. A single huge entry that exceeds
the threshold stays alone — never rotated recursively.
On reopen, `load_all_segments` lists the dir, filters `<ActorId>-NNNNNN.log`, sorts numerically (insertion sort — `lists:sort/1`
isn't registered in this port, only `lists:append/2`/`lists:reverse/1`/`lists:filter/2`/etc.), reads each via `try_read_segment`,
and concatenates the entries to rebuild flat `entries` + `seg_lens`.
Erlang-port gotchas worked around during this iteration:
(a) String literals like `"foo"` in this port are NOT charlists — `[H|T] = "foo"` badmatches and `length("foo")` errors as "not a
proper list". `parse_segment_name` builds prefix/suffix from `atom_to_list/1` + explicit `[$-]` / `[$., $l, $o, $g]` cons.
(b) Cross-arg variable repetition (`strip_prefix([C | Rest], [C | PRest])`) was rewritten to explicit `case C =:= P` for robustness.
(c) `Pattern = Binding` syntax in a case clause (`[_|_] = Lst when length(Lst) > 1 -> ...`) errors as "unsupported pattern type
'match'" — replaced with `Lst when is_list(Lst), length(Lst) > 1`.
Tests:
- new `next/tests/log_rotate.sh` (10 cases): no-opt single-seg-after-3, rotation-fires-on-threshold, rotated-chronological,
reopen-rebuilds-history, reopen-rebuilds-same-seg-shape, huge-single-entry-stays-1-seg, append-after-huge-keeps-order,
tip-monotonic-across-rotations.
- `next/tests/log_disk.sh` updated to the new filename (`corrupted-000000.log`); stays 12/12.
- Erlang conformance 761/761 unchanged (log.erl is in next/, not lib/erlang/).
3c.a ticked in plans/fed-sx-milestone-1.md; 3c.b (gen_server-mediated concurrent appends) is the next iteration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -199,6 +199,8 @@ verify_signature(Activity, ActorState) ->
|
||||
- [x] **3a** — `log:open/2` + `log:append/2` + `log:tip/1` + `log:replay/3` + `log:entries/1` over an in-memory log state (per-actor seq; replay in append order; round-trip the stored activity). `next/tests/log_memory.sh` (12 cases).
|
||||
- [x] **3b** — Term codec + on-disk persistence. Codec: `next/kernel/term_codec.erl` `encode/1` + `decode/1` over netstring framing (`a/i/b/t/l` + length + body; binary bodies byte-clean — NUL/LF allowed). On-disk: `log:open_disk/2(ActorId, BasePath)` reads any existing segment file (charlist path = `BasePath ++ "/" ++ atom_to_list(ActorId) ++ ".log"`); `append/2` is polymorphic on a `{persisted, true}` state field and writes through. Frame format on disk: 4-byte big-endian length prefix + `term_codec:encode(Activity)`. `try_read_segment` catches throw/error and surfaces `{error, {corrupt, Reason}}`. 18 codec round-trips + 12 disk acceptance tests (`next/tests/term_codec.sh`, `next/tests/log_disk.sh`); 3a in-memory `open/2` semantics unchanged. `encode/1`/`decode/1` for atoms, integers, binaries, tuples, lists, nesting; netstring-ish framing (`a/i/b/t/l` tag + length + body); byte-clean (binary bodies may contain NUL/LF). 18 round-trip + streaming + bad-form tests in `next/tests/term_codec.sh`. On-disk segment writer (open/2 reads existing, append/2 writes-through, replay/3 reads from disk) is the next sub-step — codec is the load-bearing piece.
|
||||
- [ ] **3c** — Segment rotation at size threshold + gen_server-mediated concurrent appends.
|
||||
- [x] **3c.a** — Segment rotation. `log:open_disk/3(ActorId, BasePath, [{segment_size, N}])` opts in with a byte threshold; default `open_disk/2` keeps a 1 GiB threshold (effectively no rotation). Filename scheme moved to `<ActorId>-NNNNNN.log` (6-digit zero-padded index) so `file:list_dir`'s alphabetical sort matches numeric order. `append/2` checks `encoded_size(active)` BEFORE the append: if already ≥ threshold AND active has at least one entry, the new activity opens a fresh segment; otherwise it extends current active. Single huge entries stay alone (no recursive rotation). On reopen, every matching `<ActorId>-*.log` file is read, decoded, and concatenated in numeric order to rebuild flat entries + `seg_lens`. `next/tests/log_rotate.sh` 10/10 (no-opt single-seg, threshold-rotates, chronological after rotation, reopen rebuilds shape, huge-entry-alone, post-huge keeps order, tip monotonic) + `log_disk.sh` updated to the new filename and stays 12/12. Erlang conformance 761/761.
|
||||
- [ ] **3c.b** — gen_server-mediated concurrent appends.
|
||||
|
||||
**Blockers (Step 3b) — byte-level path resolved 2026-06-04:** `binary_to_list/1` and `list_to_binary/1` are now registered Erlang BIFs in `lib/erlang/runtime.sx` (Step 3b substrate fix, +9 ffi tests, 738/738 conformance). `list_to_binary` is iolist-aware: accepts nested cons of integer bytes (0-255) and/or binaries; `binary_to_list` returns a proper Erlang charlist of integers. Round-trip verified: `list_to_binary(binary_to_list(B)) =:= B`. On-disk segment writer (3b) can now build segment bytes from `[Header, IoListPayload]` and reconstruct on read — option (c) of the original workaround menu is now cheap. `$X` char literals now decode correctly **as of 2026-06-04**: the Erlang tokenizer's `(= ch "$")` branch (`lib/erlang/tokenizer.sx`) now emits the decimal char code as the token value instead of the raw `$X` text (which `parse-number` couldn't decode → nil). Plain chars use `char->integer` of the first char; the standard escape table (`\n=10 \t=9 \r=13 \s=32 \b=8 \e=27 \f=12 \v=11 \d=127 \0=0 \\=92 \"=34 \'=39`) handles `$\X` forms. So `[$h, $i | T]` patterns and `list_to_binary([$f,$e,$d])` both work end-to-end. +12 eval tests, 750/750. Combined with 3b's `binary_to_list`/`list_to_binary`, Erlang code can now read/write byte sequences and string-shaped char lists fluently. **All three substrate gaps resolved as of 2026-06-05.** `atom_to_list/1` and `integer_to_list/1` now return Erlang charlists (cons of int char codes — standard Erlang semantics) via a new `er-string->charlist` helper in `transpile.sx`. `list_to_atom/1` and `list_to_integer/1` accept either charlists OR SX strings (back-compat via the existing `er-source-to-string` coercer). Composition works end-to-end: `list_to_binary(atom_to_list(hello)) =:= <<104,101,108,108,111>>` and `integer_to_list(N)` round-trips through `list_to_integer`. 5 existing eval tests rewritten to charlist semantics, 8 new charlist-aware tests added (759/759). The full term-codec primitive set — `binary_to_list`, `list_to_binary`, `$X`, `atom_to_list`, `integer_to_list` charlist semantics, plus existing `file:read_file`/`write_file`/`list_dir` — is now in place.
|
||||
|
||||
@@ -1003,6 +1005,7 @@ A few things still under-specified; resolve as work begins.
|
||||
Newest first. One line per sub-deliverable commit. Erlang conformance gate
|
||||
(`bash lib/erlang/conformance.sh`) must remain 729/729 on every entry.
|
||||
|
||||
- **2026-06-05** — Step 3c.a segment rotation: `next/kernel/log.erl` rewritten around a `seg_lens :: [N0, N1, ...]` bookkeeping list (one entry-count per segment in numeric order, last is active) + `seg_size` threshold. Filename scheme now `<ActorId>-NNNNNN.log` (6-digit zero-padded so `file:list_dir`'s alphabetical sort = numeric). `open_disk/3(ActorId, BasePath, [{segment_size, N}])` opts a caller into a smaller rotation threshold; `open_disk/2` keeps a 1 GiB default that effectively never rotates (preserves Step 3b acceptance). Rotation rule (`place_append/4`): if the active segment's pre-append serialized size already ≥ threshold AND it holds at least one entry, the new activity opens a fresh segment — otherwise it extends current active. Single huge entry > threshold stays alone (no recursive rotation, no loop). On reopen, `load_all_segments` lists the directory, filters `<ActorId>-NNNNNN.log`, sorts numerically (insertion sort, since `lists:sort/1` isn't registered in this port — only `lists:append/2`/`lists:reverse/1`/`lists:filter/2` etc.), reads each via `try_read_segment`, and concatenates to rebuild flat `entries` + `seg_lens`. **Erlang-port gotchas hit & worked around:** (a) Erlang string literals like `"foo"` in this port are NOT charlists — `[H|T] = "foo"` badmatches, `length("foo")` errors as "not a proper list". `parse_segment_name` had to build prefix/suffix from `atom_to_list/1` + explicit `[$-]` / `[$., $l, $o, $g]` cons. (b) Cross-arg variable repetition (`strip_prefix([C | Rest], [C | PRest])`) works in tuple patterns but I rewrote it to explicit `case C =:= P of true -> ... false -> ...` for robustness. (c) `Pattern = Binding` syntax in a case clause (`[_|_] = Lst when length(Lst) > 1 -> ...`) errors "unsupported pattern type 'match'" — used `Lst when is_list(Lst), length(Lst) > 1` instead. New `next/tests/log_rotate.sh` 10/10: no-opt single-seg-after-3, rotation-fires-on-threshold, rotated-chronological, reopen-rebuilds-history, reopen-rebuilds-same-seg-shape, huge-single-entry-stays-1-seg, append-after-huge-keeps-order, tip-monotonic-across-rotations. Existing `next/tests/log_disk.sh` updated to the new filename (`corrupted-000000.log`) and stays 12/12. Erlang conformance **761/761** unchanged (log.erl is in next/, not lib/erlang/). Step 3c.a ticked; 3c.b (gen_server-mediated concurrent appends) is the next iteration.
|
||||
- **2026-06-05** — Step 3b on-disk log: `next/kernel/log.erl` gains `open_disk/2(ActorId, BasePath)` and a write-through `append/2`. New state field `{persisted, true} | {path, CharList}` keys the polymorphism — 3a's in-memory `open/2` stays untouched and tests unchanged. `segment_path/2` builds the path as a charlist (`base_chars(BasePath) ++ "/" ++ atom_to_list(ActorId) ++ ".log"`) so it works whether the caller passes a binary or charlist BasePath; everything flows through `er-source-to-string` cleanly. On-disk frame format: 4-byte big-endian length prefix + `term_codec:encode(Activity)`. Restart path: `try_read_segment` reads the whole segment, length-decodes each frame, decodes via `term_codec`, returns `{ok, Entries}`; missing file → `{ok, []}`; throw/error during decode → `{error, {corrupt, _}}`. `next/tests/log_disk.sh` 12/12: open-missing-fresh, append+reopen-entries-match, tip-resumes, replay-chronological, mixed-types (atom/int/binary/tuple/list) round-trip, append-after-reopen, corrupted-segment, per-actor isolation, 3a back-compat. Erlang conformance **761/761** unchanged (log.erl is in next/, not lib/erlang/). Step 3b is now FULLY ticked; 3c (segment rotation + gen_server-mediated concurrent appends) remains for the next iteration.
|
||||
- **2026-06-05** — Step 3b substrate fix #4: integer-literal eval now produces real ints (was floats). `transpile.sx`'s `(= ty "integer") (parse-number ...)` path returns `float_of_string` per host's `parse-number`, so `42`, `$X`, etc. were floats that `(integer? v)` returned true for but `(integer->char v)` rejected. Wrapped in `truncate` so all integer literals coerce to strict int; added nil-guard with a descriptive error. Discovered while debugging Step 3b on-disk log (file:read_file on a charlist path failed at the inner `(map integer->char ...)` because charlist elements were floats). Conformance **761/761** (eval 406→408, +2 net; no other suites changed). Unblocks any path that does `integer->char` on int-literal-derived values — most notably `file:read_file` / `file:write_file` on charlist paths and binaries built from `$X` literals.
|
||||
- **2026-06-05** — Step 3b codec landed: `next/kernel/term_codec.erl` with `encode/1` + `decode/1` over a netstring-ish wire format (`a` atom / `i` int / `b` binary / `t` tuple / `l` list, each as `tag + decimal-length + ":" + body`; nil = `l0:`). Byte-clean — binary bodies may contain NUL, LF, or any byte; encoding stays parseable. Built end-to-end on the three substrate fixes (binary_to_list/list_to_binary + $X + atom_to_list/integer_to_list charlists). `decode/1` returns `{ok, Term, RestBinary}` so callers can stream multiple frames from one buffer. 18 acceptance tests in `next/tests/term_codec.sh`: encode bytes for every leaf type, round-trip for each, nested activity-shaped term (`{create, [{id,1},{actor,alice},{payload,<<104,105>>}]}`), 2-frame streaming, binary with embedded NUL+LF, bad-form returns `{error, badform}` not crash. Erlang conformance **759/759** unchanged (codec is in `next/`, not lib/erlang/). Step 3b on-disk segment writer (the second half — open/append/replay reading/writing the actual segment file) is the natural next iteration: encode each activity with `term_codec`, frame with a 4-byte big-endian length prefix, append to disk.
|
||||
|
||||
Reference in New Issue
Block a user