rose-ash/plans/persist-on-sx.md
giles 200b93c1f6
persist: Blocker spec for the host durable-storage adapter
Document the one gap to real durability: a hosts/ servicer for the persist/*
IO ops. Includes the silent-data-loss repro (durable-backend currently no-ops
under sx_server's default resolver), the full op contract table, hard
invariants (monotonic last-seq, etc.), the blob adapter shape, where to
register in sx_server.ml, and an acceptance test (swap transport, run durable +
recovery suites against real storage, survive a real restart).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 20:52:44 +00:00

# persist-on-sx: Durable state on the SX kernel
> **DRAFT outline.** Foundation subsystem — the durable substrate the other five
> currently fake with in-memory mutable lists. Build this first.
>
> **"persist" = persistence / data store, NOT the shop.** The shop/commerce vertical
> is `commerce-on-sx`.

rose-ash needs durable state: every subsystem (feed log, flow store, mod audit,
search index, acl grants, sessions) today hand-rolls an in-memory structure that
vanishes on restart. `persist-on-sx` is the one durable substrate they share. It
lives directly on the SX kernel's IO-suspension primitives (`perform`/`cek-resume`
— the third CEK phase) so a read/write `perform`s and the kernel persists at the
boundary. Concrete storage backends are injected.
## Does it cover ALL persistence? No — and on purpose.
Event-sourcing-everything is a known trap (replay cost, event schema evolution,
awkward ad-hoc queries, 5MB images in a log). So persist owns the **durable
source-of-truth substrate**, exposed as **two facets over one backend protocol**,
with two things explicitly delegated out:
| Shape | Owner | Notes |
|-------|-------|-------|
| **Event streams** (append-only, history matters) | persist — **log facet** | feed activities, mod audit, order ledger, flow state, content edits |
| **Current-state values** (KV / document, no history) | persist — **kv facet** | profiles, stock counts, config, session blobs; also where projections materialize |
| **Snapshots / read models** (derived, queryable) | persist — projections → kv/log | rebuildable from the log; persisted so you don't replay to answer a query |
| **Blobs / large objects** (images, media) | **delegated** → content-addressed store (artdag/IPFS already) | persist stores the *reference/CID*, never the bytes |
| **Cache** (ephemeral, evictable) | **out of scope** | not persistence — different lifecycle (Redis-shaped) |
| **Ad-hoc relational query** | the subsystem, over a projected read model | the log is bad at "all orders by X in March"; project into a queryable kv/SQL backend |

So: persist is the **single durable substrate** for state that's either a stream of
changes or a current value — but it does **not** force everything into an event
log, it does **not** hold blobs (only their content-addressed refs), and it does
**not** do caching. Those boundaries are the whole point of calling it a substrate
rather than "the database."
End-state: `log` (append/read streams) + `kv` (get/put/delete by key) facets, an
injectable backend protocol (mem → file → Postgres → IPFS-ref), pure projections
with incremental snapshots, optimistic concurrency, and a subscription hook so
read models (feeds, indices, audit logs) update incrementally.
## Status (rolling)
`bash lib/persist/conformance.sh` → **201/201** (Phases 1-4 complete + extensions + a reference migration)
## Ground rules
- **Scope:** only `lib/persist/**` and `plans/persist-on-sx.md`. May **import** the
kernel's IO-suspension surface (`perform`, platform IO ops) — verify what's
exported first. Do not add host primitives; a missing durable IO op is a Blockers
entry (it belongs in `hosts/`, out of scope).
- **Architecture:** an event is `{:stream :seq :type :at :data}`; the log is an
ordered append-only vector; a projection is `(fold step seed events)`; a kv value
is `(get/put/delete key)`. Both facets sit on one injected backend
`{:append :read :kv-get :kv-put :snapshot-read :snapshot-write}`. The in-memory
backend is the test default; real backends wire in unchanged.
- **Determinism:** replay is pure — same log → same state, always. No clocks or
randomness inside projections; time lives on the event.
- **Blobs:** store the content-address/CID and metadata; never the bytes. The blob
backend is a separate injected dependency.
- **Commits:** one feature per commit. Progress log + tick boxes.
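
The architecture rule above (an event is a record, a projection is a fold) can be illustrated outside SX. What follows is a minimal Python sketch, not the `lib/persist` implementation: `make_event`, `project`, and `balance_step` are hypothetical names, and the account example is invented for illustration.

```python
from functools import reduce


def make_event(stream, seq, event_type, at, data):
    """Build an event with the five fields named in the plan."""
    return {"stream": stream, "seq": seq, "type": event_type, "at": at, "data": data}


def project(events, step, seed):
    """A projection is a left fold of a pure step function over the events."""
    return reduce(step, events, seed)


def balance_step(value, event):
    """Pure step: no clock, no randomness; any time needed is read from event['at']."""
    amount = event["data"]["amount"]
    if event["type"] == "deposit":
        return value + amount
    if event["type"] == "withdraw":
        return value - amount
    return value


log = [
    make_event("acct", 1, "deposit", 100, {"amount": 50}),
    make_event("acct", 2, "withdraw", 105, {"amount": 20}),
]
balance = project(log, balance_step, 0)
```

Because `balance_step` depends only on its arguments, folding the same log twice gives the same value, which is the determinism rule stated above.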
## Architecture sketch
```
Command / write                             Read model / value
(append stream type data)                   (project stream step seed)
(kv-put key value)                          (kv-get key)
        │                                             ▲
        ▼                                             │
lib/persist/event.sx                        lib/persist/project.sx
- {:stream :seq :type :at :data}            - fold step seed; incremental from snapshot
        │                                             ▲
        ▼                                             │
lib/persist/log.sx    lib/persist/kv.sx     lib/persist/snapshot.sx
- append/read         - get/put/delete      - checkpoint; replay = snapshot + tail
- optimistic seq      - current-state
        │                  │                          ▲
        └──────────────────┴── (perform → backend) ───┘
lib/persist/backend.sx                      lib/persist/api.sx
- injected protocol                         - (persist/append) (persist/project)
- mem | file | pg | ipfs-ref                - (persist/kv-get/put) (persist/subscribe)
        └── blobs → content-addressed store (artdag/IPFS), by reference only
```
## Phase 1 — Log + kv + in-memory backend
- [x] `event.sx` — event record, stream/seq helpers
- [x] `backend.sx` — injectable protocol + in-memory impl (log + kv)
- [x] `log.sx`: `append` (optimistic seq), `read`, `read-from`
- [x] `kv.sx`: `get`/`put`/`delete` current-state
- [x] `api.sx` + tests + scoreboard + conformance.sh
## Phase 2 — Projections + subscriptions
- [x] `project.sx`: `(project stream step seed)`, incremental fold
- [x] subscription hook — projection / kv read model re-runs on append
- [x] concurrency conflict surfaced as a real result, not a crash
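
The last item, a conflict surfaced as a value rather than a crash, is easiest to see with two writers. The sketch below is a hedged Python model of the idea, not the SX code: `MemBackend`, `append_expect`, and `is_conflict` are hypothetical stand-ins, and it assumes that any mismatch between the expected and actual last seq counts as a conflict.

```python
class MemBackend:
    """Toy in-memory backend: rows per stream plus a separate seq counter."""

    def __init__(self):
        self.events = {}  # stream -> list of stored events
        self.seqs = {}    # stream -> highest seq ever assigned


def append_expect(backend, stream, expected, event_type, at, data):
    """Append only if the stream is still at `expected`; otherwise return a conflict value."""
    actual = backend.seqs.get(stream, 0)
    if actual != expected:  # assumption: any mismatch is treated as a conflict
        return {"conflict": True, "expected": expected, "actual": actual}
    event = {"stream": stream, "seq": actual + 1, "type": event_type, "at": at, "data": data}
    backend.events.setdefault(stream, []).append(event)
    backend.seqs[stream] = actual + 1
    return event


def is_conflict(result):
    return result.get("conflict") is True


backend = MemBackend()
seen = 0  # both writers read the stream position before either writes
first = append_expect(backend, "orders", seen, "placed", 1, {"id": "a"})
second = append_expect(backend, "orders", seen, "placed", 2, {"id": "b"})
# The losing writer re-reads the actual position from the conflict and retries.
retry = append_expect(backend, "orders", second["actual"], "placed", 2, {"id": "b"})
```

The losing writer gets a plain dict it can inspect, and nothing is written until the retry succeeds.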
## Phase 3 — Snapshots + replay
- [x] `snapshot.sx` — checkpoint a projection; replay = snapshot + tail
- [x] compaction policy; replay-determinism tests
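
The claim "replay = snapshot + tail" can be checked with a small model. This is a minimal Python sketch under stated assumptions: `project`, `project_resume`, and `compact` are hypothetical functions that mirror the names in this plan, and projection state is modelled as a dict with `value` and `seq`.

```python
from functools import reduce


def project_resume(events, step, state):
    """Fold only events newer than the state's seq (the tail)."""
    tail = [e for e in events if e["seq"] > state["seq"]]
    value = reduce(step, tail, state["value"])
    seq = tail[-1]["seq"] if tail else state["seq"]
    return {"value": value, "seq": seq}


def project(events, step, seed):
    """Full replay is a resume from the empty state."""
    return project_resume(events, step, {"value": seed, "seq": 0})


def compact(events, snapshot):
    """Drop every event already covered by the snapshot."""
    return [e for e in events if e["seq"] > snapshot["seq"]]


def sum_step(value, event):
    return value + event["data"]["n"]


events = [
    {"stream": "s", "seq": i, "type": "add", "at": i, "data": {"n": i}}
    for i in range(1, 7)
]
full = project(events, sum_step, 0)          # replay of all six events
snapshot = project(events[:4], sum_step, 0)  # checkpoint taken after seq 4
stored = compact(events, snapshot)           # only seq 5 and 6 remain on disk
replayed = project_resume(stored, sum_step, snapshot)
```

After compaction the log holds two events, yet resuming from the snapshot reproduces the state of a full replay.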
## Phase 4 — Durable backends via kernel IO
- [x] file/log backend driven through `perform` (IO-suspension boundary)
- [x] blob backend interface (store ref/CID; bytes live in artdag/IPFS)
- [x] crash/restart replay test (mock IO platform)
- [x] migration notes for swapping mem → durable under a live subsystem
### Migration notes — mem → durable under a live subsystem
The facet API takes the backend as its first argument and never names a concrete
backend, so swapping storage is a one-line change at the open site:
```
(persist/open)                                ; in-memory (test / ephemeral)
(persist/mock-durable (persist/mem-backend))  ; durable protocol, in-process disk
(persist/durable-backend)                     ; production: ops cross perform → host
```
Everything above the backend — `append`/`read`/`project`/`subscribe`/`snapshot`
/`compact` — is byte-identical across all three. A subsystem migrates by:
1. **Pick the seam.** The subsystem holds one backend value (today an in-memory
list). Replace its construction with `persist/open`/`durable-backend`; leave
every call site untouched.
2. **Backfill.** For an existing in-memory store, replay its current state into
the durable backend once (append historical events / `kv-put` current
values) before cutting reads over. New writes go to durable from then on.
3. **Read models rebuild themselves.** A projection is pure `(fold step seed)`;
after cutover, `persist/replay` (snapshot + tail) reconstructs every read
model from the durable log — no bespoke migration of derived state.
4. **Blobs first, by reference.** Move large payloads into the content store and
store only `persist/blob-ref`s; the log/kv stay small, so the backfill in (2)
never copies bytes.
5. **Concurrency is already handled.** Two writers racing a stream get a
`persist/conflict?` result, not corruption — the same on mem or durable, so
no new code is needed at cutover.

The only behavioural difference durable introduces is that each op crosses the
kernel IO-suspension boundary (`perform`): under the real kernel the call
suspends and the host resumes it transparently, so the facet code is unaware.
Tests prove this by routing the identical request shapes through `persist/serve`
over an in-process disk (the mock-IO harness).
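
The injectable-transport idea described above can be sketched in Python. This is an illustration only: `serve`, `make_backend`, and `mock_durable` here are hypothetical Python analogues of the SX names, only two kv ops are modelled, and the kernel's real `perform` suspension is not modelled at all.

```python
def serve(disk, request):
    """Reference host: answer a request of the shape {op, args} against a dict 'disk'."""
    op, args = request["op"], request["args"]
    if op == "persist/kv-put":
        key, value = args
        disk[key] = value
        return None
    if op == "persist/kv-get":
        (key,) = args
        return disk.get(key)
    raise ValueError(f"unhandled op: {op}")


def make_backend(transport):
    """The backend only builds requests; where they go is decided by the transport."""
    def kv_put(key, value):
        transport({"op": "persist/kv-put", "args": [key, value]})

    def kv_get(key):
        return transport({"op": "persist/kv-get", "args": [key]})

    return {"kv-put": kv_put, "kv-get": kv_get}


def mock_durable(disk):
    """Mock transport: route the same request shapes through serve in-process."""
    return make_backend(lambda request: serve(disk, request))


disk = {}
before = mock_durable(disk)
before["kv-put"]("session/1", {"user": "giles"})
del before                    # "crash": drop the backend object, keep the disk
after = mock_durable(disk)    # "restart": a fresh backend over the same disk
recovered = after["kv-get"]("session/1")
```

A production transport would replace the lambda with a call that crosses the IO boundary; the request dicts would stay the same.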
## Extensions (post-roadmap)
- [x] `view.sx` — materialized views: bundle stream + fold + snapshot name;
`view-attach` keeps the snapshot current on every publish so `view-peek` is an
O(1) read. The consumer-facing read-model abstraction (feed indices, audit
rollups, search counters).
- [x] `kv.sx` CAS — `persist/kv-cas` (compare-and-swap) + `persist/kv-put-new`
(create-only): atomic current-state updates, conflict as a real value (kv
analogue of log `append-expect`). For sessions, acl grants, stock counts.
- [x] `catalog.sx` — stream catalog: `persist/streams`/`stream-count`/
`stream-exists?`/`total-events`. Backend `:streams` op (from seq high-water
marks, so compacted streams still list), threaded through mem + durable.
- [x] `query.sx` — read-side scans: `read-between` (seq range), `read-since`/
`read-window` (by `:at`), `read-by-type`, `read-where`, `count-where`. Pure
reads for audit windows / type filters / since-cursors.
- [x] `batch.sx`: `persist/append-batch` commits a list of `(type at data)`
specs as one contiguous block; `persist/append-batch-expect` is transactional
(all-or-nothing guarded by optimistic concurrency). For an order + its line
items as one commit.
- [x] `upcast.sx` — event schema evolution: register a pure `(event -> event)`
upcaster per type; `read-upcast`/`project-upcast` lift old events to the
current shape on read so projections see one shape. Immutable registry;
`upcast-data` helper merges new `:data` fields. Addresses the schema-evolution
trap without rewriting history.
- [x] `idempotency.sx` — exactly-once append under retries: `persist/append-once`
keyed by a caller idempotency key (per stream), returning the same event on a
repeat. Marker lives in kv, so idempotency holds across restart. `seen?` check.
- [x] `global.sx` — global commit ordering across streams (the primitive feed's
unified timeline needs). `persist/gappend` records a pointer in a reserved
`$global` index whose seq is the commit position; `read-global`/
`project-global` replay every event in commit order; `global-from` for
incremental consumers. Opt-in (plain `append` never touches it); reserved
index hidden from the public catalog. Deterministic across restart.
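
Of the extensions above, schema evolution by upcasting is the least obvious, so here is a hedged Python sketch of the mechanism. The function names echo the SX ones (`upcast-data`, `read-upcast`) but are hypothetical Python stand-ins, and the `granted` event with its `scope` field is an invented example.

```python
def upcast_data(event, extra):
    """Copy the event with extra fields merged into data; stream/seq/type/at are kept."""
    return {**event, "data": {**event["data"], **extra}}


def register(registry, event_type, upcaster):
    """Immutable registry: registering returns a new dict and leaves the old one alone."""
    return {**registry, event_type: upcaster}


def read_upcast(registry, events):
    """Lift each stored event to the current shape on read; stored events are untouched."""
    identity = lambda event: event
    return [registry.get(e["type"], identity)(e) for e in events]


def grant_v2(event):
    """Legacy grants had no scope; assume they meant read-only (illustrative default)."""
    if "scope" in event["data"]:
        return event
    return upcast_data(event, {"scope": "read"})


registry = register({}, "granted", grant_v2)
stored = [
    {"stream": "acl", "seq": 1, "type": "granted", "at": 10, "data": {"user": "a"}},
    {"stream": "acl", "seq": 2, "type": "granted", "at": 20,
     "data": {"user": "b", "scope": "write"}},
    {"stream": "acl", "seq": 3, "type": "revoked", "at": 30, "data": {"user": "a"}},
]
lifted = read_upcast(registry, stored)
```

Projections that consume `lifted` see one shape for every grant, while the stored history is never rewritten.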
## Consumers (post-foundation, not in scope here)
feed/log, flow store, mod/audit, search index, acl grants, identity sessions all
become `persist` log or kv. Track each migration in that subsystem's plan.
**Reference migration:** `lib/persist/examples/acl.sx` is a worked, tested
template — an ACL-grants store rebuilt on persist (grants/revokes as events,
current set as a projection, O(1) checks via a materialized view, an audit-window
query). It carries an explicit BEFORE (hand-rolled ephemeral map) → AFTER
diff in its header and proves the headline win (grants survive restart) on the
durable backend. Other subsystem loops copy this pattern; it does not touch the
real `lib/acl`.
## Progress log
- **Reference migration: acl grants (201/201).** `lib/persist/examples/acl.sx`:
a worked, in-scope template migrating an ACL-grants store from a hand-rolled
ephemeral map to persist: grants/revokes as events, current set as a
projection, O(1) checks via a materialized view, audit via `read-window`.
Header carries the BEFORE→AFTER diff. 10 tests, incl. grants surviving restart
on the durable backend (the capability the BEFORE version lacked). The pattern
other subsystem loops copy.
- **Ext: global commit ordering (191/191).** `global.sx`: `persist/gappend`
records a pointer in a reserved `$global` index (its seq = global commit
position); `read-global`/`project-global` resolve pointers to events in commit
order; `global-from` for incremental global consumers. Opt-in; `$`-streams are
now reserved + hidden from the public catalog (`streams-all` reveals them).
Gives feed its cross-stream timeline. 11 tests incl. durable + restart
determinism.
- **Ext: exactly-once append (180/180).** `idempotency.sx`:
`persist/append-once` appends at most once per (stream, idempotency key),
returning the same event on a repeat; the marker lives in kv so it survives
restart (verified on durable). `persist/seen?` check. 9 tests.
- **Ext: event schema evolution (171/171).** `upcast.sx` — per-type pure
`(event -> event)` upcasters in an immutable registry; `read-upcast`/
`project-upcast` lift legacy events to the current shape on read so
projections never branch on version. `upcast-data` merges new `:data` fields
keeping stream/seq/type/at. 9 tests incl. mixed old/new + durable.
- **Ext: atomic batch append (162/162).** `batch.sx`: `persist/append-batch`
commits `(type at data)` specs as one contiguous block (real cons-list, in
order); `persist/append-batch-expect` checks the stream is still at expected
before writing any event, so the batch is all-or-nothing under a concurrent
writer. 10 tests incl. conflict-writes-nothing + durable.
- **Ext: read-side query helpers (152/152).** `query.sx`: `read-between` (seq
range), `read-since`/`read-window` (by `:at`), `read-by-type`, `read-where`,
`count-where`. Pure scans over `persist/read`; for ad-hoc relational queries
consumers still project into a kv read model. 9 tests incl. durable.
- **Ext: stream catalog (143/143).** New backend op `:streams` (keys of the seq
high-water-mark dict, threaded through mem-backend + durable serve/io-backend)
so fully-compacted streams still enumerate. `catalog.sx`:
`persist/streams`/`stream-count`/`stream-exists?`/`total-events`. 10 tests
incl. durable + restart.
- **Ext: kv compare-and-swap (133/133).** `persist/kv-cas` sets a key only if
its current value equals expected, else returns `{:conflict :expected
:actual}`; `persist/kv-put-new` is create-only. The kv analogue of log
`append-expect` — atomic current-state for sessions/acl/stock. 11 tests incl.
racer + retry + durable backend.
- **Ext: materialized views (122/122).** `view.sx`: `persist/view` bundles
stream + step + seed + snapshot name; `view-attach` subscribes it to a hub so
every publish refreshes the snapshot incrementally; `view-peek` is then an
O(1) current read (no fold), `view-value` always folds the tail so it's never
stale. 11 tests incl. on durable backend + a sum-over-data view.
- **Phase 4c+4d (111/111) — Phase 4 complete, roadmap done.** `recovery.sx` — a
6-test crash/restart integration: an order ledger (event log + subscription
kv read model + snapshot + compaction + invoice blob ref) over the durable
backend, where "crash" drops every in-process object and "restart" rebuilds
over the same disk + content store. Log, read model, snapshot, compacted
replay, and blob ref all survive; seq continues; two restarts converge
(determinism). Migration notes (mem → durable under a live subsystem) added
inline above.
- **Phase 4b (105/105).** `blob.sx` — large objects stay out of persist. A blob
ref is `{:cid :size :mime}`; the blob store is a SEPARATE injected dependency
(`persist/blob-io` over an injectable transport, perform in prod / mock
content store in tests). `persist/blob-store` puts bytes and returns ONLY the
ref; `persist/blob-fetch` retrieves bytes via the ref. Mock store is
content-addressed (same bytes dedupe). 14 tests assert the invariant: a ref in
the log/kv carries the CID, never the bytes (`has-key? :bytes` is false).
- **Phase 4a (91/91).** `durable.sx` — a backend whose every op crosses the
kernel IO boundary via `(perform {:op "persist/..." :args (...)})`. The
transport is injectable: `persist/durable-backend` uses the kernel's
`perform` (suspends; host resumes); `persist/mock-durable` uses
`persist/serve` over an in-memory disk. `persist/serve` is the reference host
+ the mock-IO harness. Because the request shapes are identical, the ENTIRE
facet stack (log/kv/project/snapshot/compaction) runs unchanged on
mock-durable — verified. Crash/restart (drop backend, keep disk) recovers log
+ kv + snapshot by replay; seq counter continues. 15 tests. See Blockers for
why end-to-end perform suspension isn't exercised under sx_server.exe.
- **Phase 3b (76/76) — Phase 3 complete.** Backend refactor: `last-seq` is now
a monotonic per-stream high-water mark (backend `seqs` dict), not physical
length, so a compacted log keeps assigning climbing seqs. Added backend
`:truncate-through` + `persist/truncate`. `compaction.sx`: `persist/compact`
checkpoints then drops events with seq <= snapshot seq; `should-compact?`/
`maybe-compact` give an explicit "compact every N tail events" policy. 11
tests: post-compaction replay value == uncompacted full replay (determinism),
seq continuity after truncation, idempotence. `persist/count` = physical
stored count (shrinks on compaction) vs `persist/last-seq` = logical.
- **Phase 3a (65/65).** `snapshot.sx` — a snapshot is a projection state
`{:value :seq}` stored in the kv facet under `snapshot/<name>`.
`persist/checkpoint` replays + saves; `persist/replay` = snapshot + tail.
11 tests assert the headline both ways: snapshot+tail == full replay (value
and whole state), plus replay determinism.
- **Phase 2c (54/54) — Phase 2 complete.** `concurrency.sx` — optimistic
concurrency: `persist/append-expect b stream expected ...` refuses the append
if the stream advanced past `expected`, returning a conflict VALUE
`{:conflict true :expected :actual}` (never a crash, never a silent
overwrite). `persist/conflict?` + accessors; caller re-reads actual and
retries. 8 tests incl. two-writer race + retry.
- **Phase 2b (46/46).** `subscribe.sx`: `persist/hub` wraps a backend with
per-stream callbacks. `persist/publish` appends then fires subscribers
`(backend stream event)`; direct `persist/append` bypasses them by design
(bulk load/replay). Canonical use: callback re-runs `project-resume` or bumps
a kv counter so read models update on write. 9 tests.
- **Phase 2a (37/37).** `project.sx` — projection state `{:value :seq}`;
`persist/project` folds whole stream from seed, `persist/project-resume`
folds only the tail (seq > prior seq) so read models update incrementally.
step is pure `(value event) -> value`. 9 tests incl. resume==full-from-zero.
- **Phase 1 complete (28/28).** `event.sx` (event record + accessors),
`backend.sx` (injectable protocol + in-memory log/kv impl, closure state via
set!), `log.sx` (append/read/read-from, sequential per-stream seq, stream
isolation), `kv.sx` (get/put/delete/has?/keys/get-or/update), `api.sx`
(`persist/open` — mem default, backend injectable). conformance.sh + three
suites (event/log/kv). Gotcha logged in Blockers: `map` returns an
array-backed list not `equal?` to a `(list ...)` literal — assertions build
compared lists with list/nth.
## Blockers
### OPEN — host durable-storage adapter (the only gap to real durability)
**Owner:** a `hosts/` loop (NOT this one — `lib/persist/**` is the scope fence,
and `sx_build` is forbidden here). **Without it, durable persistence silently
drops all writes.**
**Symptom / minimal repro.** `persist/durable-backend` performs
`{:op "persist/..." :args (...)}` for every storage op. Under `sx_server.exe`
the kernel's default IO resolver answers unknown ops with `nil` — so the durable
backend does not error, it *silently no-ops*:
```
; load event/backend/log/durable, then:
(let ((b (persist/durable-backend)))
(begin (persist/append b "s" "x" 0 {})
(persist/append b "s" "x" 0 {})
(list (persist/event-seq (persist/append b "s" "x" 0 {}))
(persist/count b "s")
(persist/read b "s"))))
; => (1 0 nil) ; every append gets seq 1, nothing stored, reads empty — DATA LOSS
```
The in-memory backend (`persist/open`) is correct and complete; this gap is
*only* the production transport.
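
The `(1 0 nil)` result follows mechanically once every op is answered with nil. The Python sketch below reproduces the arithmetic under one stated assumption: that a nil `last-seq` is read as 0 before `log.sx` adds 1. `nil_resolver`, `append`, `read`, and `count` are hypothetical stand-ins, not the SX functions.

```python
def nil_resolver(request):
    """Stands in for a resolver that answers every unknown op with nil."""
    return None


def append(transport, stream, event_type, at, data):
    # Assumption: a nil last-seq is treated as 0, so the next seq is always 1.
    last = transport({"op": "persist/last-seq", "args": [stream]}) or 0
    event = {"stream": stream, "seq": last + 1, "type": event_type, "at": at, "data": data}
    transport({"op": "persist/append", "args": [stream, event]})  # silently dropped
    return event


def read(transport, stream):
    return transport({"op": "persist/read", "args": [stream]})


def count(transport, stream):
    return len(read(transport, stream) or [])


seqs = [append(nil_resolver, "s", "x", 0, {})["seq"] for _ in range(3)]
result = (seqs[-1], count(nil_resolver, "s"), read(nil_resolver, "s"))
```

No call raises, which is why the failure is silent: the caller receives a well-formed event each time and has no signal that nothing was stored.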
**What to build.** A host servicer that answers the `persist/*` IO ops against a
real store (sqlite/files/pg). It is the production twin of `persist/serve`
(`lib/persist/durable.sx`) — same op names, same request/response shapes — so
mirror that function and back it with durable storage instead of a mem-backend.
**Op contract** (request `{:op :args}` → response). `args` is a positional list;
events are dicts `{:stream :seq :type :at :data}`:
| op | args | returns | semantics |
|----|------|---------|-----------|
| `persist/append` | `(stream event)` | (ignored) | store `event` in `stream` |
| `persist/read` | `(stream)` | event list (oldest-first) | currently-stored events |
| `persist/last-seq` | `(stream)` | number | **monotonic high-water mark** (see below) |
| `persist/streams` | `()` | stream-name list | every stream ever appended to |
| `persist/truncate` | `(stream n)` | (ignored) | drop events with `seq <= n` |
| `persist/kv-get` | `(key)` | value or nil | |
| `persist/kv-put` | `(key val)` | (ignored) | upsert |
| `persist/kv-delete`| `(key)` | (ignored) | remove key |
| `persist/kv-has?` | `(key)` | boolean | |
| `persist/kv-keys` | `()` | key list | |

**Hard invariants** (the facets above rely on these; mem-backend + `persist/serve`
are the reference):
1. **`last-seq` is a per-stream monotonic counter, NOT the row count.** It must
keep climbing after `truncate`, so a compacted stream never reassigns a seq.
Store the counter separately from the rows.
2. `append` is the only seq-assigner upstream (`log.sx` does `last-seq + 1`); the
host must not renumber.
3. `read` returns events in append order with `:seq` intact (post-truncate it
returns only the surviving tail).
4. `streams` is the set of streams that ever had an append (survives full
compaction) — keep it keyed off the seq counters, like mem-backend's `seqs`.
5. Values round-trip structurally: dicts/lists/numbers/strings/nil/booleans in =
same out (event `:data`, kv values, blob refs).
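
Since the text above mentions a Python-bridge handler as one possible home, here is a hedged Python sketch of how invariants 1, 3, and 4 might be honoured over sqlite. It is an illustration under assumptions, not the adapter: the class name, table layout, and the choice to return 0 as `last-seq` for an unknown stream are all hypothetical, the kv ops are omitted, and JSON is used for values (it round-trips dicts, lists, numbers, strings, nil, and booleans, but only with string dict keys).

```python
import json
import sqlite3


class SqliteServicer:
    """Hypothetical servicer for the log half of the persist/* ops."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript(
            """
            CREATE TABLE IF NOT EXISTS events (
                stream TEXT NOT NULL, seq INTEGER NOT NULL, body TEXT NOT NULL,
                PRIMARY KEY (stream, seq));
            CREATE TABLE IF NOT EXISTS seqs (
                stream TEXT PRIMARY KEY, last_seq INTEGER NOT NULL);
            """
        )

    def last_seq(self, stream):
        row = self.db.execute(
            "SELECT last_seq FROM seqs WHERE stream = ?", (stream,)).fetchone()
        return row[0] if row else 0  # assumption: unknown stream reports 0

    def handle(self, request):
        op, args = request["op"], request["args"]
        if op == "persist/append":
            stream, event = args
            high = max(self.last_seq(stream), event["seq"])
            with self.db:  # one transaction: the row and the counter move together
                self.db.execute("INSERT INTO events VALUES (?, ?, ?)",
                                (stream, event["seq"], json.dumps(event)))
                self.db.execute("INSERT OR REPLACE INTO seqs VALUES (?, ?)",
                                (stream, high))
            return None
        if op == "persist/read":
            (stream,) = args
            rows = self.db.execute(
                "SELECT body FROM events WHERE stream = ? ORDER BY seq", (stream,))
            return [json.loads(body) for (body,) in rows]
        if op == "persist/last-seq":
            (stream,) = args
            return self.last_seq(stream)
        if op == "persist/streams":
            rows = self.db.execute("SELECT stream FROM seqs ORDER BY stream")
            return [name for (name,) in rows]
        if op == "persist/truncate":
            stream, n = args
            with self.db:  # rows go; the counter in seqs is deliberately untouched
                self.db.execute(
                    "DELETE FROM events WHERE stream = ? AND seq <= ?", (stream, n))
            return None
        raise ValueError(f"unhandled op: {op}")


store = SqliteServicer()
for seq in (1, 2, 3):
    event = {"stream": "s", "seq": seq, "type": "x", "at": 0, "data": {"n": seq}}
    store.handle({"op": "persist/append", "args": ["s", event]})
store.handle({"op": "persist/truncate", "args": ["s", 3]})
```

After the truncate, the stream holds no rows but still reports a last seq of 3 and still appears in `persist/streams`, because both are read from the counter table rather than from the event rows.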
**Blobs** are a *separate* adapter with the same pattern: ops `blob/put`
`(bytes mime)` → cid, `blob/get` `(cid)` → bytes, `blob/has?` `(cid)` → bool
(see `lib/persist/blob.sx` / `persist/blob-serve`). Back it with the
content-addressed store (artdag/IPFS); persist only ever stores the returned ref.
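
The blob adapter shape can be sketched the same way. This Python model is hypothetical throughout: `MockContentStore`, `blob_store`, and `blob_fetch` are illustrative names, and a SHA-256 hex digest stands in for a real artdag/IPFS CID, which has a different format.

```python
import hashlib


class MockContentStore:
    """In-memory content-addressed store answering the three blob ops."""

    def __init__(self):
        self._bytes_by_cid = {}

    def handle(self, request):
        op, args = request["op"], request["args"]
        if op == "blob/put":
            data, _mime = args
            cid = hashlib.sha256(data).hexdigest()  # assumption: stand-in for a CID
            self._bytes_by_cid[cid] = data          # identical bytes share one entry
            return cid
        if op == "blob/get":
            (cid,) = args
            return self._bytes_by_cid[cid]
        if op == "blob/has?":
            (cid,) = args
            return cid in self._bytes_by_cid
        raise ValueError(f"unhandled op: {op}")


def blob_store(store, data, mime):
    """Put the bytes in the content store and return only the reference."""
    cid = store.handle({"op": "blob/put", "args": [data, mime]})
    return {"cid": cid, "size": len(data), "mime": mime}


def blob_fetch(store, ref):
    """Resolve a reference back to its bytes."""
    return store.handle({"op": "blob/get", "args": [ref["cid"]]})


store = MockContentStore()
invoice = b"%PDF-1.7 illustrative invoice bytes"
ref = blob_store(store, invoice, "application/pdf")
same = blob_store(store, invoice, "application/pdf")
event = {"stream": "orders", "seq": 1, "type": "invoiced", "at": 0, "data": {"invoice": ref}}
```

The event carries only the small ref dict, so the log and kv stay small however large the payload is.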
**Where to register.** `hosts/ocaml/bin/sx_server.ml`:
- the in-process resolver `Sx_types._cek_io_resolver` (~line 3864) — add a
`"persist/..."` match arm dispatching to the new storage module (used by
SSR/`eval_with_io`); and/or
- the bridge path in `cek_run_with_io` (~lines 528-576), which currently forwards
unknown ops via `io_request op args` to the external bridge — a Python-bridge
handler is the alternative home if storage lives Python-side.
Pick one home; the op names are the contract, not the location.
**Acceptance test.** Swap the transport: point a `persist/io-backend` at the new
host servicer (instead of `persist/serve` over a mem disk) and run the existing
`durable` + `recovery` suites — they must stay green, and state must survive an
actual process restart (kill the server, restart, replay → recovered). That is
exactly what `lib/persist/tests/durable.sx` and `recovery.sx` already assert
against the mock; the host adapter just makes the disk real.
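
The restart half of the acceptance test can be approximated in-process. The sketch below is a hypothetical file-backed store in Python (`FileStore` and its file names are invented), where "restart" means discarding the object and reopening the same directory; a real acceptance run would kill and relaunch the server process instead.

```python
import json
import os
import tempfile


class FileStore:
    """Append-only JSON-lines log plus a separate per-stream seq counter file."""

    def __init__(self, root):
        os.makedirs(root, exist_ok=True)
        self.log_path = os.path.join(root, "events.jsonl")
        self.seq_path = os.path.join(root, "seqs.json")

    def _seqs(self):
        if not os.path.exists(self.seq_path):
            return {}
        with open(self.seq_path) as handle:
            return json.load(handle)

    def append(self, stream, event):
        with open(self.log_path, "a") as handle:
            handle.write(json.dumps(event) + "\n")
            handle.flush()
            os.fsync(handle.fileno())  # push the row to disk before acknowledging
        seqs = self._seqs()
        seqs[stream] = max(seqs.get(stream, 0), event["seq"])
        with open(self.seq_path, "w") as handle:
            json.dump(seqs, handle)

    def read(self, stream):
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path) as handle:
            return [e for e in map(json.loads, handle) if e["stream"] == stream]

    def last_seq(self, stream):
        return self._seqs().get(stream, 0)


root = tempfile.mkdtemp()
before = FileStore(root)
for seq in (1, 2):
    before.append("orders", {"stream": "orders", "seq": seq, "type": "placed",
                             "at": seq, "data": {"id": seq}})
del before               # "crash": every in-process object is gone
after = FileStore(root)  # "restart": reopen the same directory
recovered = after.read("orders")
next_seq = after.last_seq("orders") + 1
```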
---
- **Phase 4 perform-suspension not exercised end-to-end under sx_server.exe (by
design, not a bug).** The CEK suspension primitives (`cek-step-loop`,
`cek-resume`, `cek-suspended?`, `cek-io-request`) and a settable SX-level IO
hook are only bound by the `run_tests` OCaml binary (out of scope: hosts/, and
sx_build is forbidden). Under `sx_server.exe`, an unhandled `perform` resolves
through the OCaml io-request/io-response stdin bridge (production path) — not
callable from the pure-eval conformance harness. Resolution: the durable
backend's transport is injectable, so the production path is one line
`(perform req)` (kernel-handled) and ALL durable logic is tested through the
mock transport (`persist/serve` over an in-memory disk). The single untested
line is the kernel primitive itself. No host primitive needed; nothing to fix.
- **Not a blocker, a testing convention:** `map` returns an array-backed list
that is NOT `equal?` to a `(list ...)` cons-literal (two `map` results do
compare equal to each other). When asserting list-shaped results against a
`(list ...)` literal, build the compared value with `list`/`nth`/`cons`, not
`map`. `into`/list-coercion needs the IO bridge and is unusable in the
pure-eval harness.