From 4be6988963440c959603a85e67c8d47bd83fbd53 Mon Sep 17 00:00:00 2001 From: giles Date: Sat, 6 Jun 2026 19:14:01 +0000 Subject: [PATCH] =?UTF-8?q?persist:=20crash/restart=20recovery=20integrati?= =?UTF-8?q?on=20+=20migration=20notes=20=E2=80=94=20Phase=204=20complete?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit recovery.sx: 6-test end-to-end crash/restart of an order ledger (log + subscription kv read model + snapshot + compaction + invoice blob ref) on the durable backend; everything survives a restart over the same disk + content store, seq continues, two restarts converge. Migration notes (mem → durable under a live subsystem) added to the plan. Roadmap done, 111/111. Co-Authored-By: Claude Opus 4.8 (1M context) --- lib/persist/conformance.sh | 2 +- lib/persist/scoreboard.json | 7 +- lib/persist/scoreboard.md | 3 +- lib/persist/tests/recovery.sx | 126 ++++++++++++++++++++++++++++++++++ plans/persist-on-sx.md | 50 +++++++++++++- 5 files changed, 180 insertions(+), 8 deletions(-) create mode 100644 lib/persist/tests/recovery.sx diff --git a/lib/persist/conformance.sh b/lib/persist/conformance.sh index 847e2af1..46aededa 100755 --- a/lib/persist/conformance.sh +++ b/lib/persist/conformance.sh @@ -13,7 +13,7 @@ if [ ! -x "$SX_SERVER" ]; then exit 1 fi -SUITES=(event log kv project subscribe concurrency snapshot compaction durable blob) +SUITES=(event log kv project subscribe concurrency snapshot compaction durable blob recovery) OUT_JSON="lib/persist/scoreboard.json" OUT_MD="lib/persist/scoreboard.md" diff --git a/lib/persist/scoreboard.json b/lib/persist/scoreboard.json index c7cbfe23..c5f2f969 100644 --- a/lib/persist/scoreboard.json +++ b/lib/persist/scoreboard.json @@ -9,9 +9,10 @@ "snapshot": {"pass": 11, "fail": 0}, "compaction": {"pass": 11, "fail": 0}, "durable": {"pass": 15, "fail": 0}, - "blob": {"pass": 14, "fail": 0} + "blob": {"pass": 14, "fail": 0}, + "recovery": {"pass": 6, "fail": 0} }, - "total_pass": 105, + "total_pass": 111, "total_fail": 0, - "total": 105 + "total": 111 } diff --git a/lib/persist/scoreboard.md b/lib/persist/scoreboard.md index a2d95c81..d7f95884 100644 --- a/lib/persist/scoreboard.md +++ b/lib/persist/scoreboard.md @@ -14,4 +14,5 @@ _Generated by `lib/persist/conformance.sh`_ | compaction | 11 | 0 | 11 | | durable | 15 | 0 | 15 | | blob | 14 | 0 | 14 | -| **Total** | **105** | **0** | **105** | +| recovery | 6 | 0 | 6 | +| **Total** | **111** | **0** | **111** | diff --git a/lib/persist/tests/recovery.sx b/lib/persist/tests/recovery.sx new file mode 100644 index 00000000..b31054c3 --- /dev/null +++ b/lib/persist/tests/recovery.sx @@ -0,0 +1,126 @@ +; Phase 4 — crash/restart integration. A whole subsystem (an order ledger: +; event log + a kv read model kept by a subscription + a periodic snapshot + an +; invoice blob ref) on the durable backend must survive a restart. "Crash" = +; drop every in-process object (backend, hub, projections); "restart" = rebuild +; them over the SAME disk + blob store. Nothing but the disk and content store +; carries across, exactly as a real process restart. + +(define rec-count (fn (acc e) (+ acc 1))) + +(persist-test + "log survives restart and seq continues" + (let + ((disk (persist/mem-backend))) + (begin + (let + ((db (persist/mock-durable disk))) + (begin + (persist/append db "orders" "placed" 0 {:id "a"}) + (persist/append db "orders" "placed" 1 {:id "b"}))) + (let + ((db2 (persist/mock-durable disk))) + (list + (persist/project-fold db2 "orders" rec-count 0) + (persist/event-seq + (persist/append db2 "orders" "placed" 2 {:id "c"})))))) + (list 2 3)) +(persist-test + "subscription-driven kv read model survives restart" + (let + ((disk (persist/mem-backend))) + (begin + (let + ((h (persist/hub (persist/mock-durable disk)))) + (begin + (persist/subscribe + h + "orders" + (fn + (bk s e) + (persist/kv-update + bk + "order-count" + 0 + (fn (n) (+ n 1))))) + (persist/publish h "orders" "placed" 0 {}) + (persist/publish h "orders" "placed" 1 {}))) + (let + ((db2 (persist/mock-durable disk))) + (persist/kv-get db2 "order-count")))) + 2) +(persist-test + "snapshot taken before crash drives replay after restart" + (let + ((disk (persist/mem-backend))) + (begin + (let + ((db (persist/mock-durable disk))) + (begin + (persist/append db "orders" "placed" 0 {}) + (persist/append db "orders" "placed" 1 {}) + (persist/checkpoint db "orders" "count" rec-count 0) + (persist/append db "orders" "placed" 2 {}))) + (let + ((db2 (persist/mock-durable disk))) + (equal? + (persist/project-value + (persist/replay db2 "orders" "count" rec-count 0)) + (persist/project-fold db2 "orders" rec-count 0))))) + true) +(persist-test + "compacted log still replays correctly after restart" + (let + ((disk (persist/mem-backend))) + (begin + (let + ((db (persist/mock-durable disk))) + (begin + (persist/append db "orders" "placed" 0 {}) + (persist/append db "orders" "placed" 1 {}) + (persist/append db "orders" "placed" 2 {}) + (persist/compact db "orders" "count" rec-count 0) + (persist/append db "orders" "placed" 3 {}))) + (let + ((db2 (persist/mock-durable disk))) + (persist/project-value + (persist/replay db2 "orders" "count" rec-count 0))))) + 4) +(persist-test + "invoice blob ref survives restart, bytes fetched from content store" + (let + ((disk (persist/mem-backend)) (store (persist/mem-backend))) + (begin + (let + ((db (persist/mock-durable disk)) (blob (persist/mock-blob store))) + (persist/kv-put + db + "invoice" + (persist/blob-store blob "INVOICEPDF" "application/pdf"))) + (let + ((db2 (persist/mock-durable disk)) + (blob2 (persist/mock-blob store))) + (persist/blob-fetch blob2 (persist/kv-get db2 "invoice"))))) + "INVOICEPDF") +(persist-test + "two independent restarts converge to the same state (determinism)" + (let + ((disk (persist/mem-backend))) + (begin + (let + ((db (persist/mock-durable disk))) + (begin + (persist/append db "orders" "placed" 0 {}) + (persist/append db "orders" "placed" 1 {}) + (persist/append db "orders" "placed" 2 {}))) + (equal? + (persist/project-fold + (persist/mock-durable disk) + "orders" + rec-count + 0) + (persist/project-fold + (persist/mock-durable disk) + "orders" + rec-count + 0)))) + true) diff --git a/plans/persist-on-sx.md b/plans/persist-on-sx.md index 6f9915e0..ebbf1e01 100644 --- a/plans/persist-on-sx.md +++ b/plans/persist-on-sx.md @@ -42,7 +42,7 @@ read models (feeds, indices, audit logs) update incrementally. ## Status (rolling) -`bash lib/persist/conformance.sh` → **105/105** (Phases 1–3 done, Phase 4 in progress) +`bash lib/persist/conformance.sh` → **111/111** (Phases 1–4 complete) ## Ground rules @@ -105,14 +105,58 @@ lib/persist/backend.sx lib/persist/api.sx ## Phase 4 — Durable backends via kernel IO - [x] file/log backend driven through `perform` (IO-suspension boundary) - [x] blob backend interface (store ref/CID; bytes live in artdag/IPFS) -- [ ] crash/restart replay test (mock IO platform) -- [ ] migration notes for swapping mem → durable under a live subsystem +- [x] crash/restart replay test (mock IO platform) +- [x] migration notes for swapping mem → durable under a live subsystem + +### Migration notes — mem → durable under a live subsystem + +The facet API takes the backend as its first argument and never names a concrete +backend, so swapping storage is a one-line change at the open site: + +``` +(persist/open) ; in-memory (test / ephemeral) +(persist/mock-durable (persist/mem-backend)); durable protocol, in-process disk +(persist/durable-backend) ; production: ops cross perform → host +``` + +Everything above the backend — `append`/`read`/`project`/`subscribe`/`snapshot` +/`compact` — is byte-identical across all three. A subsystem migrates by: + +1. **Pick the seam.** The subsystem holds one backend value (today an in-memory + list). Replace its construction with `persist/open`/`durable-backend`; leave + every call site untouched. +2. **Backfill.** For an existing in-memory store, replay its current state into + the durable backend once (append historical events / `kv-put` current + values) before cutting reads over. New writes go to durable from then on. +3. **Read models rebuild themselves.** A projection is pure `(fold step seed)`; + after cutover, `persist/replay` (snapshot + tail) reconstructs every read + model from the durable log — no bespoke migration of derived state. +4. **Blobs first, by reference.** Move large payloads into the content store and + store only `persist/blob-ref`s; the log/kv stay small, so the backfill in (2) + never copies bytes. +5. **Concurrency is already handled.** Two writers racing a stream get a + `persist/conflict?` result, not corruption — the same on mem or durable, so + no new code is needed at cutover. + +The only behavioural difference durable introduces is that each op crosses the +kernel IO-suspension boundary (`perform`): under the real kernel the call +suspends and the host resumes it transparently, so the facet code is unaware. +Tests prove this by routing the identical request shapes through `persist/serve` +over an in-process disk (the mock-IO harness). ## Consumers (post-foundation, not in scope here) feed/-log, flow store, mod/audit, search index, acl grants, identity sessions all become `persist` log or kv. Track each migration in that subsystem's plan. ## Progress log +- **Phase 4c+4d (111/111) — Phase 4 complete, roadmap done.** `recovery.sx` — a + 6-test crash/restart integration: an order ledger (event log + subscription + kv read model + snapshot + compaction + invoice blob ref) over the durable + backend, where "crash" drops every in-process object and "restart" rebuilds + over the same disk + content store. Log, read model, snapshot, compacted + replay, and blob ref all survive; seq continues; two restarts converge + (determinism). Migration notes (mem → durable under a live subsystem) added + inline above. - **Phase 4b (105/105).** `blob.sx` — large objects stay out of persist. A blob ref is `{:cid :size :mime}`; the blob store is a SEPARATE injected dependency (`persist/blob-io` over an injectable transport, perform in prod / mock