# persist-on-sx: Durable state on the SX kernel

> **DRAFT outline.** Foundation subsystem — the durable substrate the other five
> currently fake with in-memory mutable lists. Build this first.
>
> **"persist" = persistence / data store, NOT the shop.** The shop/commerce vertical
> is `commerce-on-sx`.

rose-ash needs durable state: every subsystem (feed log, flow store, mod audit, search index, acl grants, sessions) today hand-rolls an in-memory structure that vanishes on restart. `persist-on-sx` is the one durable substrate they share. It lives directly on the SX kernel's IO-suspension primitives (`perform`/`cek-resume` — the third CEK phase) so a read/write `perform`s and the kernel persists at the boundary. Concrete storage backends are injected.

## Does it cover ALL persistence? No — and on purpose.

Event-sourcing-everything is a known trap (replay cost, event schema evolution, awkward ad-hoc queries, 5MB images in a log). So persist owns the **durable source-of-truth substrate**, exposed as **two facets over one backend protocol**, with two things explicitly delegated out:

| Shape | Owner | Notes |
|-------|-------|-------|
| **Event streams** (append-only, history matters) | persist — **log facet** | feed activities, mod audit, order ledger, flow state, content edits |
| **Current-state values** (KV / document, no history) | persist — **kv facet** | profiles, stock counts, config, session blobs; also where projections materialize |
| **Snapshots / read models** (derived, queryable) | persist — projections → kv/log | rebuildable from the log; persisted so you don't replay to answer a query |
| **Blobs / large objects** (images, media) | **delegated** → content-addressed store (artdag/IPFS already) | persist stores the *reference/CID*, never the bytes |
| **Cache** (ephemeral, evictable) | **out of scope** | not persistence — different lifecycle (Redis-shaped) |
| **Ad-hoc relational query** | the subsystem, over a projected read model | the log is bad at "all orders by X in March"; project into a queryable kv/SQL backend |

So: persist is the **single durable substrate** for state that's either a stream of changes or a current value — but it does **not** force everything into an event log, it does **not** hold blobs (only their content-addressed refs), and it does **not** do caching. Those boundaries are the whole point of calling it a substrate rather than "the database."

End-state: `log` (append/read streams) + `kv` (get/put/delete by key) facets, an injectable backend protocol (mem → file → Postgres → IPFS-ref), pure projections with incremental snapshots, optimistic concurrency, and a subscription hook so read models (feeds, indices, audit logs) update incrementally.

## Status (rolling)

`bash lib/persist/conformance.sh` → **201/201** (Phases 1–4 complete + extensions + a reference migration)

## Ground rules

- **Scope:** only `lib/persist/**` and `plans/persist-on-sx.md`. May **import** the kernel's IO-suspension surface (`perform`, platform IO ops) — verify what's exported first. Do not add host primitives; a missing durable IO op is a Blockers entry (it belongs in `hosts/`, out of scope).
- **Architecture:** an event is `{:stream :seq :type :at :data}`; the log is an ordered append-only vector; a projection is `(fold step seed events)`; a kv value is `(get/put/delete key)`. Both facets sit on one injected backend `{:append :read :kv-get :kv-put :snapshot-read :snapshot-write}`. The in-memory backend is the test default; real backends wire in unchanged.
- **Determinism:** replay is pure — same log → same state, always. No clocks or randomness inside projections; time lives on the event.
- **Blobs:** store the content-address/CID and metadata; never the bytes. The blob backend is a separate injected dependency.
- **Commits:** one feature per commit. Progress log + tick boxes.

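
The Architecture and Determinism rules above can be illustrated with a small model. This is a Python sketch of the idea, not the SX implementation: `make_mem_backend`, `append`, `read`, `project`, and `stock_step` are hypothetical names, and seq assignment by list length is a simplification that ignores compaction.

```python
from functools import reduce

def make_mem_backend():
    """Hypothetical in-memory backend: per-stream logs plus a kv dict."""
    return {"logs": {}, "kv": {}}

def append(backend, stream, type_, at, data):
    """Append an event; seq is the next position within the stream.

    Simplification: uses list length, so it does not model compaction.
    """
    log = backend["logs"].setdefault(stream, [])
    event = {"stream": stream, "seq": len(log) + 1,
             "type": type_, "at": at, "data": data}
    log.append(event)
    return event

def read(backend, stream):
    """Return the stream's events, oldest first."""
    return list(backend["logs"].get(stream, []))

def project(backend, stream, step, seed):
    """A projection is a pure left fold of step over the stream's events."""
    return reduce(step, read(backend, stream), seed)

def stock_step(count, event):
    # Pure: depends only on the prior value and the event.
    # No clock reads here; the time is carried on the event as "at".
    if event["type"] == "received":
        return count + event["data"]["qty"]
    if event["type"] == "sold":
        return count - event["data"]["qty"]
    return count

backend = make_mem_backend()
append(backend, "stock/widget", "received", 100, {"qty": 10})
append(backend, "stock/widget", "sold", 101, {"qty": 3})
append(backend, "stock/widget", "sold", 102, {"qty": 2})

first = project(backend, "stock/widget", stock_step, 0)
second = project(backend, "stock/widget", stock_step, 0)
print(first, second)  # 5 5
```

Because `stock_step` reads nothing but its arguments, folding the same log twice gives the same value, which is the property the replay-determinism tests rely on.
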
## Architecture sketch

```
  Command / write                     Read model / value
  (append stream type data)           (project stream step seed)
  (kv-put key value)                  (kv-get key)
        │                                   ▲
        ▼                                   │
  lib/persist/event.sx                lib/persist/project.sx
  — {:stream :seq :type :at :data}    — fold step seed; incremental from snapshot
        │                                   ▲
        ▼                                   │
  lib/persist/log.sx    lib/persist/kv.sx    lib/persist/snapshot.sx
  — append/read         — get/put/delete     — checkpoint; replay = snapshot + tail
  — optimistic seq      — current-state
        │                  │                          ▲
        └──────────────────┴── (perform → backend) ───┘
                           │
  lib/persist/backend.sx              lib/persist/api.sx
  — injected protocol                 — (persist/append) (persist/project)
  — mem | file | pg | ipfs-ref        — (persist/kv-get/put) (persist/subscribe)
        │
        └── blobs → content-addressed store (artdag/IPFS), by reference only
```

## Phase 1 — Log + kv + in-memory backend

- [x] `event.sx` — event record, stream/seq helpers
- [x] `backend.sx` — injectable protocol + in-memory impl (log + kv)
- [x] `log.sx` — `append` (optimistic seq), `read`, `read-from`
- [x] `kv.sx` — `get`/`put`/`delete` current-state
- [x] `api.sx` + tests + scoreboard + conformance.sh

## Phase 2 — Projections + subscriptions

- [x] `project.sx` — `(project stream step seed)`, incremental fold
- [x] subscription hook — projection / kv read model re-runs on append
- [x] concurrency conflict surfaced as a real result, not a crash

## Phase 3 — Snapshots + replay

- [x] `snapshot.sx` — checkpoint a projection; replay = snapshot + tail
- [x] compaction policy; replay-determinism tests

## Phase 4 — Durable backends via kernel IO

- [x] file/log backend driven through `perform` (IO-suspension boundary)
- [x] blob backend interface (store ref/CID; bytes live in artdag/IPFS)
- [x] crash/restart replay test (mock IO platform)
- [x] migration notes for swapping mem → durable under a live subsystem

### Migration notes — mem → durable under a live subsystem

The facet API takes the backend as its first argument and never names a concrete backend, so swapping storage is a one-line change at the open site:

```
(persist/open)                                ; in-memory (test / ephemeral)
(persist/mock-durable (persist/mem-backend))  ; durable protocol, in-process disk
(persist/durable-backend)                     ; production: ops cross perform → host
```

Everything above the backend — `append`/`read`/`project`/`subscribe`/`snapshot`/`compact` — is byte-identical across all three. A subsystem migrates by:

1. **Pick the seam.** The subsystem holds one backend value (today an in-memory list). Replace its construction with `persist/open`/`durable-backend`; leave every call site untouched.
2. **Backfill.** For an existing in-memory store, replay its current state into the durable backend once (append historical events / `kv-put` current values) before cutting reads over. New writes go to durable from then on.
3. **Read models rebuild themselves.** A projection is pure `(fold step seed)`; after cutover, `persist/replay` (snapshot + tail) reconstructs every read model from the durable log — no bespoke migration of derived state.
4. **Blobs first, by reference.** Move large payloads into the content store and store only `persist/blob-ref`s; the log/kv stay small, so the backfill in (2) never copies bytes.
5. **Concurrency is already handled.** Two writers racing a stream get a `persist/conflict?` result, not corruption — the same on mem or durable, so no new code is needed at cutover.

The only behavioural difference durable introduces is that each op crosses the kernel IO-suspension boundary (`perform`): under the real kernel the call suspends and the host resumes it transparently, so the facet code is unaware. Tests prove this by routing the identical request shapes through `persist/serve` over an in-process disk (the mock-IO harness).

## Extensions (post-roadmap)

- [x] `view.sx` — materialized views: bundle stream + fold + snapshot name; `view-attach` keeps the snapshot current on every publish so `view-peek` is an O(1) read. The consumer-facing read-model abstraction (feed indices, audit rollups, search counters).
- [x] `kv.sx` CAS — `persist/kv-cas` (compare-and-swap) + `persist/kv-put-new` (create-only): atomic current-state updates, conflict as a real value (kv analogue of log `append-expect`). For sessions, acl grants, stock counts.
- [x] `catalog.sx` — stream catalog: `persist/streams`/`stream-count`/`stream-exists?`/`total-events`. Backend `:streams` op (from seq high-water marks, so compacted streams still list), threaded through mem + durable.
- [x] `query.sx` — read-side scans: `read-between` (seq range), `read-since`/`read-window` (by `:at`), `read-by-type`, `read-where`, `count-where`. Pure reads for audit windows / type filters / since-cursors.
- [x] `batch.sx` — `persist/append-batch` commits a list of `(type at data)` specs as one contiguous block; `persist/append-batch-expect` is transactional (all-or-nothing guarded by optimistic concurrency). For an order + its line items as one commit.
- [x] `upcast.sx` — event schema evolution: register a pure `(event -> event)` upcaster per type; `read-upcast`/`project-upcast` lift old events to the current shape on read so projections see one shape. Immutable registry; `upcast-data` helper merges new `:data` fields. Addresses the schema-evolution trap without rewriting history.
- [x] `idempotency.sx` — exactly-once append under retries: `persist/append-once` keyed by a caller idempotency key (per stream), returning the same event on a repeat. Marker lives in kv, so idempotency holds across restart. `seen?` check.
- [x] `global.sx` — global commit ordering across streams (the primitive feed's unified timeline needs). `persist/gappend` records a pointer in a reserved `$global` index whose seq is the commit position; `read-global`/`project-global` replay every event in commit order; `global-from` for incremental consumers. Opt-in (plain `append` never touches it); reserved index hidden from the public catalog. Deterministic across restart.

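
Several of the extensions above return a conflict as an ordinary value rather than raising. As a rough illustration of that pattern for the kv compare-and-swap, here is a Python model. It is a sketch under assumptions: `kv_cas`, `kv_put_new`, and `is_conflict` are hypothetical stand-ins for `persist/kv-cas`, `persist/kv-put-new`, and a conflict predicate, and returning the new value on success is assumed, since the plan does not say what a successful call returns.

```python
def kv_cas(store, key, expected, new):
    """Set key to new only if its current value equals expected.

    On mismatch, return a conflict value instead of raising.
    Assumption: a successful swap returns the new value.
    """
    actual = store.get(key)
    if actual != expected:
        return {"conflict": True, "expected": expected, "actual": actual}
    store[key] = new
    return new

def kv_put_new(store, key, value):
    """Create-only put: conflict if the key already exists."""
    if key in store:
        return {"conflict": True, "expected": None, "actual": store[key]}
    store[key] = value
    return value

def is_conflict(result):
    """True when a result is a conflict value rather than a stored value."""
    return isinstance(result, dict) and result.get("conflict") is True

store = {}
kv_put_new(store, "stock/widget", 10)

stale = store["stock/widget"]            # writer A reads 10
kv_cas(store, "stock/widget", 10, 7)     # writer B wins the race: 10 -> 7

# Writer A still believes the value is 10, so its swap is refused.
result = kv_cas(store, "stock/widget", stale, stale - 1)
print(result)  # {'conflict': True, 'expected': 10, 'actual': 7}

# Writer A re-reads the actual value from the conflict and retries.
retry = kv_cas(store, "stock/widget", result["actual"], result["actual"] - 1)
print(retry, store["stock/widget"])  # 6 6
```

The caller decides what a conflict means (retry, merge, or give up), which is why the result is data and not an exception.
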
## Consumers (post-foundation, not in scope here)

feed/-log, flow store, mod/audit, search index, acl grants, identity sessions all become `persist` log or kv. Track each migration in that subsystem's plan.

**Reference migration:** `lib/persist/examples/acl.sx` is a worked, tested template — an ACL-grants store rebuilt on persist (grants/revokes as events, current set as a projection, O(1) checks via a materialized view, an audit-window query). It carries an explicit BEFORE (hand-rolled ephemeral map) → AFTER diff in its header and proves the headline win (grants survive restart) on the durable backend. Other subsystem loops copy this pattern; it does not touch the real `lib/acl`.

## Progress log

- **Reference migration: acl grants (201/201).** `lib/persist/examples/acl.sx` — a worked, in-scope template migrating an ACL-grants store from a hand-rolled ephemeral map to persist: grants/revokes as events, current set as a projection, O(1) checks via a materialized view, audit via `read-window`. Header carries the BEFORE→AFTER diff. 10 tests, incl. grants surviving restart on the durable backend (the capability the BEFORE version lacked). The pattern other subsystem loops copy.
- **Ext: global commit ordering (191/191).** `global.sx` — `persist/gappend` records a pointer in a reserved `$global` index (its seq = global commit position); `read-global`/`project-global` resolve pointers to events in commit order; `global-from` for incremental global consumers. Opt-in; `$`-streams are now reserved + hidden from the public catalog (`streams-all` reveals them). Gives feed its cross-stream timeline. 11 tests incl. durable + restart determinism.
- **Ext: exactly-once append (180/180).** `idempotency.sx` — `persist/append-once` appends at most once per (stream, idempotency key), returning the same event on a repeat; the marker lives in kv so it survives restart (verified on durable). `persist/seen?` check. 9 tests.
- **Ext: event schema evolution (171/171).** `upcast.sx` — per-type pure `(event -> event)` upcasters in an immutable registry; `read-upcast`/`project-upcast` lift legacy events to the current shape on read so projections never branch on version. `upcast-data` merges new `:data` fields keeping stream/seq/type/at. 9 tests incl. mixed old/new + durable.
- **Ext: atomic batch append (162/162).** `batch.sx` — `persist/append-batch` commits `(type at data)` specs as one contiguous block (real cons-list, in order); `persist/append-batch-expect` checks the stream is still at expected before writing any event, so the batch is all-or-nothing under a concurrent writer. 10 tests incl. conflict-writes-nothing + durable.
- **Ext: read-side query helpers (152/152).** `query.sx` — `read-between` (seq range), `read-since`/`read-window` (by `:at`), `read-by-type`, `read-where`, `count-where`. Pure scans over `persist/read`; for ad-hoc relational queries consumers still project into a kv read model. 9 tests incl. durable.
- **Ext: stream catalog (143/143).** New backend op `:streams` (keys of the seq high-water-mark dict, threaded through mem-backend + durable serve/io-backend) so fully-compacted streams still enumerate. `catalog.sx`: `persist/streams`/`stream-count`/`stream-exists?`/`total-events`. 10 tests incl. durable + restart.
- **Ext: kv compare-and-swap (133/133).** `persist/kv-cas` sets a key only if its current value equals expected, else returns `{:conflict :expected :actual}`; `persist/kv-put-new` is create-only. The kv analogue of log `append-expect` — atomic current-state for sessions/acl/stock. 11 tests incl. racer + retry + durable backend.
- **Ext: materialized views (122/122).** `view.sx` — `persist/view` bundles stream + step + seed + snapshot name; `view-attach` subscribes it to a hub so every publish refreshes the snapshot incrementally; `view-peek` is then an O(1) current read (no fold), `view-value` always folds the tail so it's never stale. 11 tests incl. on durable backend + a sum-over-data view.
- **Phase 4c+4d (111/111) — Phase 4 complete, roadmap done.** `recovery.sx` — a 6-test crash/restart integration: an order ledger (event log + subscription kv read model + snapshot + compaction + invoice blob ref) over the durable backend, where "crash" drops every in-process object and "restart" rebuilds over the same disk + content store. Log, read model, snapshot, compacted replay, and blob ref all survive; seq continues; two restarts converge (determinism). Migration notes (mem → durable under a live subsystem) added inline above.
- **Phase 4b (105/105).** `blob.sx` — large objects stay out of persist. A blob ref is `{:cid :size :mime}`; the blob store is a SEPARATE injected dependency (`persist/blob-io` over an injectable transport, perform in prod / mock content store in tests). `persist/blob-store` puts bytes and returns ONLY the ref; `persist/blob-fetch` retrieves bytes via the ref. Mock store is content-addressed (same bytes dedupe). 14 tests assert the invariant: a ref in the log/kv carries the CID, never the bytes (`has-key? :bytes` is false).
- **Phase 4a (91/91).** `durable.sx` — a backend whose every op crosses the kernel IO boundary via `(perform {:op "persist/..." :args (...)})`. The transport is injectable: `persist/durable-backend` uses the kernel's `perform` (suspends; host resumes); `persist/mock-durable` uses `persist/serve` over an in-memory disk. `persist/serve` is the reference host + the mock-IO harness. Because the request shapes are identical, the ENTIRE facet stack (log/kv/project/snapshot/compaction) runs unchanged on mock-durable — verified. Crash/restart (drop backend, keep disk) recovers log + kv + snapshot by replay; seq counter continues. 15 tests. See Blockers for why end-to-end perform suspension isn't exercised under sx_server.exe.
- **Phase 3b (76/76) — Phase 3 complete.** Backend refactor: `last-seq` is now a monotonic per-stream high-water mark (backend `seqs` dict), not physical length, so a compacted log keeps assigning climbing seqs. Added backend `:truncate-through` + `persist/truncate`. `compaction.sx` — `persist/compact` checkpoints then drops events with seq <= snapshot seq; `should-compact?`/`maybe-compact` give an explicit "compact every N tail events" policy. 11 tests: post-compaction replay value == uncompacted full replay (determinism), seq continuity after truncation, idempotence. `persist/count` = physical stored count (shrinks on compaction) vs `persist/last-seq` = logical.
- **Phase 3a (65/65).** `snapshot.sx` — a snapshot is a projection state `{:value :seq}` stored in the kv facet under `snapshot/`. `persist/checkpoint` replays + saves; `persist/replay` = snapshot + tail. 11 tests assert the headline both ways: snapshot+tail == full replay (value and whole state), plus replay determinism.
- **Phase 2c (54/54) — Phase 2 complete.** `concurrency.sx` — optimistic concurrency: `persist/append-expect b stream expected ...` refuses the append if the stream advanced past `expected`, returning a conflict VALUE `{:conflict true :expected :actual}` (never a crash, never a silent overwrite). `persist/conflict?` + accessors; caller re-reads actual and retries. 8 tests incl. two-writer race + retry.
- **Phase 2b (46/46).** `subscribe.sx` — `persist/hub` wraps a backend with per-stream callbacks. `persist/publish` appends then fires subscribers `(backend stream event)`; direct `persist/append` bypasses them by design (bulk load/replay). Canonical use: callback re-runs `project-resume` or bumps a kv counter so read models update on write. 9 tests.
- **Phase 2a (37/37).** `project.sx` — projection state `{:value :seq}`; `persist/project` folds whole stream from seed, `persist/project-resume` folds only the tail (seq > prior seq) so read models update incrementally. step is pure `(value event) -> value`. 9 tests incl. resume==full-from-zero.
- **Phase 1 complete (28/28).** `event.sx` (event record + accessors), `backend.sx` (injectable protocol + in-memory log/kv impl, closure state via set!), `log.sx` (append/read/read-from, sequential per-stream seq, stream isolation), `kv.sx` (get/put/delete/has?/keys/get-or/update), `api.sx` (`persist/open` — mem default, backend injectable). conformance.sh + three suites (event/log/kv). Gotcha logged in Blockers: `map` returns an array-backed list not `equal?` to a `(list ...)` literal — assertions build compared lists with list/nth.

## Blockers

### OPEN — host durable-storage adapter (the only gap to real durability)

**Owner:** a `hosts/` loop (NOT this one — `lib/persist/**` is the scope fence, and `sx_build` is forbidden here). **Without it, durable persistence silently drops all writes.**

**Symptom / minimal repro.** `persist/durable-backend` performs `{:op "persist/..." :args (...)}` for every storage op. Under `sx_server.exe` the kernel's default IO resolver answers unknown ops with `nil` — so the durable backend does not error, it *silently no-ops*:

```
; load event/backend/log/durable, then:
(let ((b (persist/durable-backend)))
  (begin
    (persist/append b "s" "x" 0 {})
    (persist/append b "s" "x" 0 {})
    (list (persist/event-seq (persist/append b "s" "x" 0 {}))
          (persist/count b "s")
          (persist/read b "s"))))
; => (1 0 nil)   ; every append gets seq 1, nothing stored, reads empty — DATA LOSS
```

The in-memory backend (`persist/open`) is correct and complete; this gap is *only* the production transport.

**What to build.** A host servicer that answers the `persist/*` IO ops against a real store (sqlite/files/pg). It is the production twin of `persist/serve` (`lib/persist/durable.sx`) — same op names, same request/response shapes — so mirror that function and back it with durable storage instead of a mem-backend.

**Op contract** (request `{:op :args}` → response). `args` is a positional list; events are dicts `{:stream :seq :type :at :data}`:

| op | args | returns | semantics |
|----|------|---------|-----------|
| `persist/append` | `(stream event)` | (ignored) | store `event` in `stream` |
| `persist/read` | `(stream)` | event list (oldest-first) | currently-stored events |
| `persist/last-seq` | `(stream)` | number | **monotonic high-water mark** (see below) |
| `persist/streams` | `()` | stream-name list | every stream ever appended to |
| `persist/truncate` | `(stream n)` | (ignored) | drop events with `seq <= n` |
| `persist/kv-get` | `(key)` | value or nil | |
| `persist/kv-put` | `(key val)` | (ignored) | upsert |
| `persist/kv-delete` | `(key)` | (ignored) | remove key |
| `persist/kv-has?` | `(key)` | boolean | |
| `persist/kv-keys` | `()` | key list | |

**Hard invariants** (the facets above rely on these; mem-backend + `persist/serve` are the reference):

1. **`last-seq` is a per-stream monotonic counter, NOT the row count.** It must keep climbing after `truncate`, so a compacted stream never reassigns a seq. Store the counter separately from the rows.
2. `append` is the only seq-assigner upstream (`log.sx` does `last-seq + 1`); the host must not renumber.
3. `read` returns events in append order with `:seq` intact (post-truncate it returns only the surviving tail).
4. `streams` is the set of streams that ever had an append (survives full compaction) — keep it keyed off the seq counters, like mem-backend's `seqs`.
5. Values round-trip structurally: dicts/lists/numbers/strings/nil/booleans in = same out (event `:data`, kv values, blob refs).

**Blobs** are a *separate* adapter with the same pattern: ops `blob/put` `(bytes mime)` → cid, `blob/get` `(cid)` → bytes, `blob/has?` `(cid)` → bool (see `lib/persist/blob.sx` / `persist/blob-serve`). Back it with the content-addressed store (artdag/IPFS); persist only ever stores the returned ref.

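
To make hard invariants 1 to 4 concrete, the following Python model services the log half of the op contract. It is a sketch only: `MemDisk` and the `append` helper are hypothetical, the kv ops are omitted, and returning 0 from `persist/last-seq` for a stream that was never appended to is an assumption the plan does not state.

```python
class MemDisk:
    """Hypothetical model of a servicer for the log ops of the contract."""

    def __init__(self):
        self.rows = {}  # stream -> stored events (shrinks on truncate)
        self.seqs = {}  # stream -> high-water mark (never shrinks)

    def serve(self, request):
        op, args = request["op"], request["args"]
        if op == "persist/append":
            stream, event = args
            self.rows.setdefault(stream, []).append(event)  # no renumbering
            self.seqs[stream] = max(self.seqs.get(stream, 0), event["seq"])
            return None
        if op == "persist/read":
            (stream,) = args
            return list(self.rows.get(stream, []))
        if op == "persist/last-seq":
            (stream,) = args
            return self.seqs.get(stream, 0)  # assumed 0 for an unknown stream
        if op == "persist/streams":
            return sorted(self.seqs)  # keyed off counters, not rows
        if op == "persist/truncate":
            stream, n = args
            kept = [e for e in self.rows.get(stream, []) if e["seq"] > n]
            self.rows[stream] = kept
            return None
        # Fail loudly on unknown ops instead of answering nil.
        raise ValueError(f"unknown op: {op}")

def append(disk, stream, type_, at, data):
    """Upstream seq assignment as in invariant 2: last-seq + 1."""
    seq = disk.serve({"op": "persist/last-seq", "args": [stream]}) + 1
    event = {"stream": stream, "seq": seq, "type": type_, "at": at, "data": data}
    disk.serve({"op": "persist/append", "args": [stream, event]})
    return event

disk = MemDisk()
for at in range(3):
    append(disk, "orders", "placed", at, {})

# Full compaction: every stored row is dropped, the counter is untouched.
disk.serve({"op": "persist/truncate", "args": ["orders", 3]})
after = append(disk, "orders", "placed", 3, {})

print(after["seq"])                                        # 4
print(disk.serve({"op": "persist/streams", "args": []}))  # ['orders']
```

Keeping `seqs` apart from `rows` is what lets the seq climb to 4 after truncation and lets a fully compacted stream still appear in `persist/streams`. Raising on an unknown op is a choice made for this sketch; it contrasts with the silent `nil` answer described in the repro above.
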

**Where to register.** `hosts/ocaml/bin/sx_server.ml`:

- the in-process resolver `Sx_types._cek_io_resolver` (~line 3864) — add a `"persist/..."` match arm dispatching to the new storage module (used by SSR/`eval_with_io`); and/or
- the bridge path in `cek_run_with_io` (~line 528–576), which currently forwards unknown ops via `io_request op args` to the external bridge — a Python-bridge handler is the alternative home if storage lives Python-side.

Pick one home; the op names are the contract, not the location.

**Acceptance test.** Swap the transport: point a `persist/io-backend` at the new host servicer (instead of `persist/serve` over a mem disk) and run the existing `durable` + `recovery` suites — they must stay green, and state must survive an actual process restart (kill the server, restart, replay → recovered). That is exactly what `lib/persist/tests/durable.sx` and `recovery.sx` already assert against the mock; the host adapter just makes the disk real.

---

- **Phase 4 perform-suspension not exercised end-to-end under sx_server.exe (by design, not a bug).** The CEK suspension primitives (`cek-step-loop`, `cek-resume`, `cek-suspended?`, `cek-io-request`) and a settable SX-level IO hook are only bound by the `run_tests` OCaml binary (out of scope: hosts/, and sx_build is forbidden). Under `sx_server.exe`, an unhandled `perform` resolves through the OCaml io-request/io-response stdin bridge (production path) — not callable from the pure-eval conformance harness. Resolution: the durable backend's transport is injectable, so the production path is one line `(perform req)` (kernel-handled) and ALL durable logic is tested through the mock transport (`persist/serve` over an in-memory disk). The single untested line is the kernel primitive itself. No host primitive needed; nothing to fix.
- **Not a blocker, a testing convention:** `map` returns an array-backed list that is NOT `equal?` to a `(list ...)` cons-literal (two `map` results do compare equal to each other). When asserting list-shaped results against a `(list ...)` literal, build the compared value with `list`/`nth`/`cons`, not `map`. `into`/list-coercion needs the IO bridge and is unusable in the pure-eval harness.