persist: crash/restart recovery integration + migration notes — Phase 4 complete
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 37s

recovery.sx: 6-test end-to-end crash/restart of an order ledger (log +
subscription kv read model + snapshot + compaction + invoice blob ref) on the
durable backend; everything survives a restart over the same disk + content
store, seq continues, two restarts converge. Migration notes (mem → durable
under a live subsystem) added to the plan. Roadmap done, 111/111.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:14:01 +00:00
parent 1c7b602978
commit 4be6988963
5 changed files with 180 additions and 8 deletions

@@ -13,7 +13,7 @@ if [ ! -x "$SX_SERVER" ]; then
exit 1
fi
-SUITES=(event log kv project subscribe concurrency snapshot compaction durable blob)
+SUITES=(event log kv project subscribe concurrency snapshot compaction durable blob recovery)
OUT_JSON="lib/persist/scoreboard.json"
OUT_MD="lib/persist/scoreboard.md"

@@ -9,9 +9,10 @@
"snapshot": {"pass": 11, "fail": 0},
"compaction": {"pass": 11, "fail": 0},
"durable": {"pass": 15, "fail": 0},
-"blob": {"pass": 14, "fail": 0}
+"blob": {"pass": 14, "fail": 0},
+"recovery": {"pass": 6, "fail": 0}
},
-"total_pass": 105,
+"total_pass": 111,
"total_fail": 0,
-"total": 105
+"total": 111
}

@@ -14,4 +14,5 @@ _Generated by `lib/persist/conformance.sh`_
| compaction | 11 | 0 | 11 |
| durable | 15 | 0 | 15 |
| blob | 14 | 0 | 14 |
-| **Total** | **105** | **0** | **105** |
+| recovery | 6 | 0 | 6 |
+| **Total** | **111** | **0** | **111** |

@@ -0,0 +1,126 @@
; Phase 4 — crash/restart integration. A whole subsystem (an order ledger:
; event log + a kv read model kept by a subscription + a periodic snapshot + an
; invoice blob ref) on the durable backend must survive a restart. "Crash" =
; drop every in-process object (backend, hub, projections); "restart" = rebuild
; them over the SAME disk + blob store. Nothing but the disk and content store
; carries across, exactly as a real process restart.
(define rec-count (fn (acc e) (+ acc 1)))
(persist-test
"log survives restart and seq continues"
(let
((disk (persist/mem-backend)))
(begin
(let
((db (persist/mock-durable disk)))
(begin
(persist/append db "orders" "placed" 0 {:id "a"})
(persist/append db "orders" "placed" 1 {:id "b"})))
(let
((db2 (persist/mock-durable disk)))
(list
(persist/project-fold db2 "orders" rec-count 0)
(persist/event-seq
(persist/append db2 "orders" "placed" 2 {:id "c"}))))))
(list 2 3))
(persist-test
"subscription-driven kv read model survives restart"
(let
((disk (persist/mem-backend)))
(begin
(let
((h (persist/hub (persist/mock-durable disk))))
(begin
(persist/subscribe
h
"orders"
(fn
(bk s e)
(persist/kv-update
bk
"order-count"
0
(fn (n) (+ n 1)))))
(persist/publish h "orders" "placed" 0 {})
(persist/publish h "orders" "placed" 1 {})))
(let
((db2 (persist/mock-durable disk)))
(persist/kv-get db2 "order-count"))))
2)
(persist-test
"snapshot taken before crash drives replay after restart"
(let
((disk (persist/mem-backend)))
(begin
(let
((db (persist/mock-durable disk)))
(begin
(persist/append db "orders" "placed" 0 {})
(persist/append db "orders" "placed" 1 {})
(persist/checkpoint db "orders" "count" rec-count 0)
(persist/append db "orders" "placed" 2 {})))
(let
((db2 (persist/mock-durable disk)))
(equal?
(persist/project-value
(persist/replay db2 "orders" "count" rec-count 0))
(persist/project-fold db2 "orders" rec-count 0)))))
true)
(persist-test
"compacted log still replays correctly after restart"
(let
((disk (persist/mem-backend)))
(begin
(let
((db (persist/mock-durable disk)))
(begin
(persist/append db "orders" "placed" 0 {})
(persist/append db "orders" "placed" 1 {})
(persist/append db "orders" "placed" 2 {})
(persist/compact db "orders" "count" rec-count 0)
(persist/append db "orders" "placed" 3 {})))
(let
((db2 (persist/mock-durable disk)))
(persist/project-value
(persist/replay db2 "orders" "count" rec-count 0)))))
4)
(persist-test
"invoice blob ref survives restart, bytes fetched from content store"
(let
((disk (persist/mem-backend)) (store (persist/mem-backend)))
(begin
(let
((db (persist/mock-durable disk)) (blob (persist/mock-blob store)))
(persist/kv-put
db
"invoice"
(persist/blob-store blob "INVOICEPDF" "application/pdf")))
(let
((db2 (persist/mock-durable disk))
(blob2 (persist/mock-blob store)))
(persist/blob-fetch blob2 (persist/kv-get db2 "invoice")))))
"INVOICEPDF")
(persist-test
"two independent restarts converge to the same state (determinism)"
(let
((disk (persist/mem-backend)))
(begin
(let
((db (persist/mock-durable disk)))
(begin
(persist/append db "orders" "placed" 0 {})
(persist/append db "orders" "placed" 1 {})
(persist/append db "orders" "placed" 2 {})))
(equal?
(persist/project-fold
(persist/mock-durable disk)
"orders"
rec-count
0)
(persist/project-fold
(persist/mock-durable disk)
"orders"
rec-count
0))))
true)
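
The six tests above pin down one contract. After a crash, only the disk and the content store carry over, yet the log, snapshots, compacted replay and seq numbering all resume. The sketch below illustrates that contract in Python rather than SX, because the `persist` internals are not shown here. It assumes a snapshot stores `(seq, value)` and that compaction drops every event at or below the snapshot seq. `Disk`, `DurableLog`, `Conflict` and their methods are hypothetical stand-ins, not the `persist` API.

```python
# Hypothetical Python model of crash/restart over a shared disk; not the persist API.
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Everything that survives a simulated crash: plain data, no live objects."""
    events: dict = field(default_factory=dict)     # stream -> [(seq, type, data)]
    heads: dict = field(default_factory=dict)      # stream -> last assigned seq
    snapshots: dict = field(default_factory=dict)  # (stream, name) -> (seq, value)


@dataclass(frozen=True)
class Conflict:
    """Returned, not raised, when a writer's expected seq is stale."""
    expected: int
    actual: int


class DurableLog:
    """Hypothetical handle over a Disk; discarding it models a crash."""

    def __init__(self, disk):
        self.disk = disk

    def head(self, stream):
        return self.disk.heads.get(stream, 0)

    def append(self, stream, etype, expected, data):
        # Optimistic concurrency: the caller passes the seq it last saw.
        actual = self.head(stream)
        if expected != actual:
            return Conflict(expected, actual)
        seq = actual + 1
        self.disk.events.setdefault(stream, []).append((seq, etype, data))
        self.disk.heads[stream] = seq  # kept separately so compaction cannot reset it
        return seq

    def fold(self, stream, step, seed):
        # Plain fold over retained events; only complete before any compaction.
        acc = seed
        for event in self.disk.events.get(stream, []):
            acc = step(acc, event)
        return acc

    def replay(self, stream, name, step, seed):
        # Snapshot + tail: resume from the stored value, fold only newer events.
        snap_seq, acc = self.disk.snapshots.get((stream, name), (0, seed))
        for event in self.disk.events.get(stream, []):
            if event[0] > snap_seq:
                acc = step(acc, event)
        return acc

    def checkpoint(self, stream, name, step, seed):
        value = self.replay(stream, name, step, seed)
        self.disk.snapshots[(stream, name)] = (self.head(stream), value)
        return value

    def compact(self, stream, name, step, seed):
        # Snapshot first, then drop every event the snapshot already covers.
        snap_seq = self.head(stream)
        self.checkpoint(stream, name, step, seed)
        self.disk.events[stream] = [
            e for e in self.disk.events.get(stream, []) if e[0] > snap_seq
        ]


def count(acc, _event):
    """Fold step that ignores the payload, like `rec-count` above."""
    return acc + 1


disk = Disk()
db = DurableLog(disk)
for expected in range(3):
    db.append("orders", "placed", expected, {})
db.compact("orders", "count", count, 0)
db.append("orders", "placed", 3, {})
del db  # "crash": only `disk` survives

db2 = DurableLog(disk)  # "restart" over the same disk
print(db2.replay("orders", "count", count, 0))  # 4
print(db2.append("orders", "placed", 3, {}))    # Conflict(expected=3, actual=4)
```

The head seq lives on disk, separate from the event list. That is why `append` continues at 4 even after compaction has removed events 1 to 3.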

@@ -42,7 +42,7 @@ read models (feeds, indices, audit logs) update incrementally.
## Status (rolling)
-`bash lib/persist/conformance.sh` → **105/105** (Phases 1–3 done, Phase 4 in progress)
+`bash lib/persist/conformance.sh` → **111/111** (Phases 1–4 complete)
## Ground rules
@@ -105,14 +105,58 @@ lib/persist/backend.sx lib/persist/api.sx
## Phase 4 — Durable backends via kernel IO
- [x] file/log backend driven through `perform` (IO-suspension boundary)
- [x] blob backend interface (store ref/CID; bytes live in artdag/IPFS)
-- [ ] crash/restart replay test (mock IO platform)
-- [ ] migration notes for swapping mem → durable under a live subsystem
+- [x] crash/restart replay test (mock IO platform)
+- [x] migration notes for swapping mem → durable under a live subsystem
### Migration notes — mem → durable under a live subsystem
The facet API takes the backend as its first argument and never names a concrete
backend, so swapping storage is a one-line change at the open site:
```
(persist/open)                               ; in-memory (test / ephemeral)
(persist/mock-durable (persist/mem-backend)) ; durable protocol, in-process disk
(persist/durable-backend)                    ; production: ops cross perform → host
```
Code above the backend (calls to `append`, `read`, `project`, `subscribe`,
`snapshot`, `compact`) is byte-identical across all three. A subsystem migrates by:
1. **Pick the seam.** The subsystem holds one backend value (today an in-memory
list). Replace its construction with `persist/open` (in-memory) or
`persist/durable-backend`; leave every call site untouched.
2. **Backfill.** For an existing in-memory store, replay its current state into
the durable backend once (append historical events / `kv-put` current
values) before cutting reads over. New writes go to durable from then on.
3. **Read models rebuild themselves.** A projection is pure `(fold step seed)`;
after cutover, `persist/replay` (snapshot + tail) reconstructs every read
model from the durable log — no bespoke migration of derived state.
4. **Blobs first, by reference.** Before the backfill in (2), move large payloads
into the content store and keep only `persist/blob-ref`s in the log and kv.
Both stay small, and the backfill copies refs, never bytes.
5. **Concurrency is already handled.** Two writers racing a stream get a
`persist/conflict?` result, not corruption — the same on mem or durable, so
no new code is needed at cutover.
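
Steps 2 to 4 can be sketched together. The sketch below is a hypothetical Python model, not the `persist` API. The legacy store is a list of order dicts with invoice bytes inline, and the content store is a dict keyed by a SHA-256 hex digest that stands in for a CID. The returned ref only mirrors the `{:cid :size :mime}` shape of a blob ref.

```python
# Hypothetical Python model of backfill + blob refs + read-model rebuild;
# not the persist API. SHA-256 hex digests stand in for real CIDs.
import hashlib


def blob_store(content_store, data, mime):
    """Put bytes in the content store and return only a small ref."""
    cid = hashlib.sha256(data).hexdigest()  # stand-in for a real CID
    content_store[cid] = data               # idempotent: same bytes, same key
    return {"cid": cid, "size": len(data), "mime": mime}


def blob_fetch(content_store, ref):
    return content_store[ref["cid"]]


def backfill(legacy_orders, log, kv, content_store):
    """One-off copy of legacy state: blobs first, then events and current values."""
    for order in legacy_orders:
        # Step 4 runs before step 2, so the log and kv only ever hold the ref.
        ref = blob_store(content_store, order["invoice_pdf"], "application/pdf")
        log.append(("placed", {"id": order["id"], "invoice": ref}))
        kv["invoice/" + order["id"]] = ref


def rebuild(log, step, seed):
    """Step 3: a read model is a pure fold over the log; nothing to migrate."""
    acc = seed
    for event in log:
        acc = step(acc, event)
    return acc


legacy = [
    {"id": "a", "invoice_pdf": b"INVOICE-A"},
    {"id": "b", "invoice_pdf": b"INVOICE-B"},
]
log, kv, content = [], {}, {}
backfill(legacy, log, kv, content)
order_count = rebuild(log, lambda n, _event: n + 1, 0)
print(order_count)                           # 2
print(blob_fetch(content, kv["invoice/a"]))  # b'INVOICE-A'
```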
The only behavioural difference the durable backend introduces is that each op
crosses the kernel IO-suspension boundary (`perform`): under the real kernel the
call suspends and the host resumes it transparently, so facet code is unaware of
it. The tests check this by routing identical request shapes through
`persist/serve` over an in-process disk (the mock-IO harness).
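
The suspend/resume behaviour can be pictured with Python generators standing in for `perform`. Facet code yields a request, and a driver resumes it with the host's reply. This is only an analogy. The `{"op": ...}` request shape, `serve` and `run` are hypothetical and are not the request protocol of `persist/serve`.

```python
# Hypothetical generator-based analogy for perform/resume; not persist/serve.


def kv_update(key, default, fn):
    """Facet code: never touches storage directly, only yields requests."""
    current = yield {"op": "kv-get", "key": key, "default": default}
    new = fn(current)
    yield {"op": "kv-put", "key": key, "value": new}
    return new


def serve(disk, request):
    """In-process 'host': answers one request against a dict-backed disk."""
    if request["op"] == "kv-get":
        return disk.get(request["key"], request["default"])
    if request["op"] == "kv-put":
        disk[request["key"]] = request["value"]
        return None
    raise ValueError(f"unknown op: {request['op']}")


def run(disk, facet_call):
    """Drive a suspended facet call, resuming it with each reply until it returns."""
    reply = None
    try:
        while True:
            request = facet_call.send(reply)  # the first send(None) starts it
            reply = serve(disk, request)
    except StopIteration as done:
        return done.value


disk = {}
run(disk, kv_update("order-count", 0, lambda n: n + 1))
run(disk, kv_update("order-count", 0, lambda n: n + 1))
print(disk)  # {'order-count': 2}
```

Swapping the dict-backed `serve` for one that forwards each request to a real host changes only the driver. `kv_update` itself is untouched, which is the property the migration notes rely on.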
## Consumers (post-foundation, not in scope here)
feed/-log, flow store, mod/audit, search index, acl grants, identity sessions all
become `persist` log or kv. Track each migration in that subsystem's plan.
## Progress log
- **Phase 4c+4d (111/111) — Phase 4 complete, roadmap done.** `recovery.sx` — a
6-test crash/restart integration: an order ledger (event log + subscription
kv read model + snapshot + compaction + invoice blob ref) over the durable
backend, where "crash" drops every in-process object and "restart" rebuilds
over the same disk + content store. Log, read model, snapshot, compacted
replay, and blob ref all survive; seq continues; two restarts converge
(determinism). Migration notes (mem → durable under a live subsystem) added
inline above.
- **Phase 4b (105/105).** `blob.sx` — large objects stay out of persist. A blob
ref is `{:cid :size :mime}`; the blob store is a SEPARATE injected dependency
(`persist/blob-io` over an injectable transport, perform in prod / mock