rose-ash/plans/persist-on-sx.md
giles ecdaeea223
persist: materialized views — stay current on write, O(1) read + 11 tests
view.sx: persist/view bundles stream + fold + snapshot name; view-attach
subscribes it to a hub so each publish refreshes the snapshot incrementally,
making view-peek an O(1) current read. view-value always folds the tail so it
is never stale. The consumer read-model abstraction (feed indices, audit
rollups, search counters). 122/122.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:16:16 +00:00

# persist-on-sx: Durable state on the SX kernel
> **DRAFT outline.** Foundation subsystem — the durable substrate the other five
> currently fake with in-memory mutable lists. Build this first.
>
> **"persist" = persistence / data store, NOT the shop.** The shop/commerce vertical
> is `commerce-on-sx`.
rose-ash needs durable state: every subsystem (feed log, flow store, mod audit,
search index, acl grants, sessions) today hand-rolls an in-memory structure that
vanishes on restart. `persist-on-sx` is the one durable substrate they share. It
lives directly on the SX kernel's IO-suspension primitives (`perform`/`cek-resume`,
the third CEK phase), so each read or write issues a `perform` and the kernel
persists at the boundary. Concrete storage backends are injected.
## Does it cover ALL persistence? No — and on purpose.
Event-sourcing-everything is a known trap (replay cost, event schema evolution,
awkward ad-hoc queries, 5MB images in a log). So persist owns the **durable
source-of-truth substrate**, exposed as **two facets over one backend protocol**,
with two things explicitly delegated out:
| Shape | Owner | Notes |
|-------|-------|-------|
| **Event streams** (append-only, history matters) | persist — **log facet** | feed activities, mod audit, order ledger, flow state, content edits |
| **Current-state values** (KV / document, no history) | persist — **kv facet** | profiles, stock counts, config, session blobs; also where projections materialize |
| **Snapshots / read models** (derived, queryable) | persist — projections → kv/log | rebuildable from the log; persisted so you don't replay to answer a query |
| **Blobs / large objects** (images, media) | **delegated** → content-addressed store (artdag/IPFS already) | persist stores the *reference/CID*, never the bytes |
| **Cache** (ephemeral, evictable) | **out of scope** | not persistence — different lifecycle (Redis-shaped) |
| **Ad-hoc relational query** | the subsystem, over a projected read model | the log is bad at "all orders by X in March"; project into a queryable kv/SQL backend |
So: persist is the **single durable substrate** for state that's either a stream of
changes or a current value — but it does **not** force everything into an event
log, it does **not** hold blobs (only their content-addressed refs), and it does
**not** do caching. Those boundaries are the whole point of calling it a substrate
rather than "the database."
End-state: `log` (append/read streams) + `kv` (get/put/delete by key) facets, an
injectable backend protocol (mem → file → Postgres → IPFS-ref), pure projections
with incremental snapshots, optimistic concurrency, and a subscription hook so
read models (feeds, indices, audit logs) update incrementally.
## Status (rolling)
`bash lib/persist/conformance.sh` → **122/122** (Phases 1-4 complete + extensions)
## Ground rules
- **Scope:** only `lib/persist/**` and `plans/persist-on-sx.md`. May **import** the
kernel's IO-suspension surface (`perform`, platform IO ops) — verify what's
exported first. Do not add host primitives; a missing durable IO op is a Blockers
entry (it belongs in `hosts/`, out of scope).
- **Architecture:** an event is `{:stream :seq :type :at :data}`; the log is an
ordered append-only vector; a projection is `(fold step seed events)`; the kv
facet is `get`/`put`/`delete` by key. Both facets sit on one injected backend
`{:append :read :kv-get :kv-put :snapshot-read :snapshot-write}`. The in-memory
backend is the test default; real backends wire in unchanged.
- **Determinism:** replay is pure — same log → same state, always. No clocks or
randomness inside projections; time lives on the event.
- **Blobs:** store the content-address/CID and metadata; never the bytes. The blob
backend is a separate injected dependency.
- **Commits:** one feature per commit. Progress log + tick boxes.
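
The architecture and determinism rules above can be modelled in a few lines. What follows is a minimal Python sketch, not SX and not the code in `lib/persist/`; the names `make_event`, `project`, and `count_by_type` are hypothetical and exist only for illustration. It treats an event as a plain record and a projection as a pure fold, so replaying the same log yields the same state.

```python
from functools import reduce

def make_event(stream, seq, type_, at, data):
    """Event record mirroring the {:stream :seq :type :at :data} shape."""
    return {"stream": stream, "seq": seq, "type": type_, "at": at, "data": data}

def project(step, seed, events):
    """A projection is a pure left fold of `step` over the ordered events."""
    return reduce(step, events, seed)

def count_by_type(state, event):
    """Pure step: builds a new dict, reads no clock and no randomness."""
    new_state = dict(state)
    new_state[event["type"]] = new_state.get(event["type"], 0) + 1
    return new_state

# Time lives on the event (`at`), never inside the projection.
log = [
    make_event("orders", 1, "placed", "2026-01-01T00:00:00Z", {"id": "a"}),
    make_event("orders", 2, "paid", "2026-01-01T00:05:00Z", {"id": "a"}),
    make_event("orders", 3, "placed", "2026-01-02T09:00:00Z", {"id": "b"}),
]

first = project(count_by_type, {}, log)
second = project(count_by_type, {}, log)   # replaying gives the same state
```

Because `count_by_type` never mutates its input and consults nothing outside the event, `first` and `second` are equal by construction.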
## Architecture sketch
```
Command / write                               Read model / value
  (append stream type data)                     (project stream step seed)
  (kv-put key value)                            (kv-get key)
          │                                             ▲
          ▼                                             │
lib/persist/event.sx                          lib/persist/project.sx
  - {:stream :seq :type :at :data}              - fold step seed; incremental from snapshot
          │                                             ▲
          ▼                                             │
lib/persist/log.sx      lib/persist/kv.sx     lib/persist/snapshot.sx
  - append/read           - get/put/delete      - checkpoint; replay = snapshot + tail
  - optimistic seq        - current-state
          │                     │                       ▲
          └─────────────────────┴── (perform → backend) ┘
lib/persist/backend.sx                        lib/persist/api.sx
  - injected protocol                           - (persist/append) (persist/project)
  - mem | file | pg | ipfs-ref                  - (persist/kv-get/put) (persist/subscribe)
          └── blobs → content-addressed store (artdag/IPFS), by reference only
```
## Phase 1 — Log + kv + in-memory backend
- [x] `event.sx` — event record, stream/seq helpers
- [x] `backend.sx` — injectable protocol + in-memory impl (log + kv)
- [x] `log.sx`: `append` (optimistic seq), `read`, `read-from`
- [x] `kv.sx`: `get`/`put`/`delete` current-state
- [x] `api.sx` + tests + scoreboard + conformance.sh
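
As a rough illustration of the Phase 1 shape, here is a Python sketch (an assumption-laden model, not the SX source) of an in-memory backend exposed as a dict of functions closing over private state, with facet functions that take the backend as their first argument. The key names, the `kv_delete` entry, and the `len + 1` sequence rule are simplifications chosen for this example; the snapshot operations of the real protocol are omitted.

```python
def mem_backend():
    """In-memory backend: functions closing over private log and kv state."""
    streams = {}   # stream name -> ordered list of events
    kv = {}        # key -> current value

    def append(stream, type_, at, data):
        events = streams.setdefault(stream, [])
        event = {"stream": stream, "seq": len(events) + 1,
                 "type": type_, "at": at, "data": data}
        events.append(event)
        return event

    def read(stream, after_seq=0):
        return [e for e in streams.get(stream, []) if e["seq"] > after_seq]

    def kv_get(key, default=None):
        return kv.get(key, default)

    def kv_put(key, value):
        kv[key] = value
        return value

    def kv_delete(key):
        return kv.pop(key, None)

    return {"append": append, "read": read,
            "kv_get": kv_get, "kv_put": kv_put, "kv_delete": kv_delete}

# Facet functions name no concrete backend; they only call through the protocol.
def log_append(backend, stream, type_, at, data):
    return backend["append"](stream, type_, at, data)

def log_read_from(backend, stream, after_seq):
    return backend["read"](stream, after_seq)

b = mem_backend()
e1 = log_append(b, "feed", "post", "t0", {"n": 1})
e2 = log_append(b, "feed", "post", "t1", {"n": 2})
e3 = log_append(b, "audit", "flag", "t2", {"n": 3})   # separate stream, own seq
```

Each stream numbers its events independently, so `e3` starts again at seq 1 while `e2` is at seq 2.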
## Phase 2 — Projections + subscriptions
- [x] `project.sx`: `(project stream step seed)`, incremental fold
- [x] subscription hook — projection / kv read model re-runs on append
- [x] concurrency conflict surfaced as a real result, not a crash
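
The "conflict as a real result" item can be sketched as follows. This is a hedged Python model with hypothetical names (`append_expect`, `is_conflict`); it assumes that any mismatch between the expected and actual sequence number counts as a conflict, which is slightly stricter than "the stream advanced past `expected`".

```python
def append_expect(log, expected, event_type, data):
    """Append only if the log is still at `expected`; otherwise return a
    conflict value instead of raising or overwriting."""
    actual = log[-1]["seq"] if log else 0
    if actual != expected:
        return {"conflict": True, "expected": expected, "actual": actual}
    event = {"seq": actual + 1, "type": event_type, "data": data}
    log.append(event)
    return event

def is_conflict(result):
    return result.get("conflict") is True

log = []
seen_by_a = 0            # both writers read the stream at seq 0
seen_by_b = 0

won = append_expect(log, seen_by_a, "reserve", {"by": "a"})
lost = append_expect(log, seen_by_b, "reserve", {"by": "b"})   # stale expectation

# The losing writer re-reads the actual seq from the conflict and retries.
retried = append_expect(log, lost["actual"], "reserve", {"by": "b"})
```

The caller decides what a retry means; the log itself only guarantees that a stale write is refused and reported.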
## Phase 3 — Snapshots + replay
- [x] `snapshot.sx` — checkpoint a projection; replay = snapshot + tail
- [x] compaction policy; replay-determinism tests
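
The Phase 3 invariant (snapshot + tail equals full replay, also after compaction) can be checked with a small model. The Python below is a sketch under assumptions: `fold_state` and `total` are hypothetical names, and projection state is modelled as a `{value, seq}` dict.

```python
from functools import reduce

def fold_state(step, state, events):
    """Fold only the events beyond state['seq'] into a new {value, seq} state."""
    tail = [e for e in events if e["seq"] > state["seq"]]
    value = reduce(step, tail, state["value"])
    seq = tail[-1]["seq"] if tail else state["seq"]
    return {"value": value, "seq": seq}

def total(value, event):
    """Pure step: running sum over event data."""
    return value + event["data"]

events = [{"seq": n, "data": n * 10} for n in range(1, 6)]   # seqs 1..5
seed = {"value": 0, "seq": 0}

full = fold_state(total, seed, events)            # replay from zero
snapshot = fold_state(total, seed, events[:3])    # checkpoint at seq 3
replayed = fold_state(total, snapshot, events)    # snapshot + tail

# Compaction: drop every event already covered by the snapshot.
compacted = [e for e in events if e["seq"] > snapshot["seq"]]
after_compaction = fold_state(total, snapshot, compacted)
```

Resuming from the snapshot skips events with `seq <= 3`, so it does not matter whether those events are still physically present.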
## Phase 4 — Durable backends via kernel IO
- [x] file/log backend driven through `perform` (IO-suspension boundary)
- [x] blob backend interface (store ref/CID; bytes live in artdag/IPFS)
- [x] crash/restart replay test (mock IO platform)
- [x] migration notes for swapping mem → durable under a live subsystem
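
One way to picture the injectable transport and the crash/restart test is the Python sketch below. It is a model, not the SX implementation: the op strings `"example/append"` and `"example/read"` are invented for this example (the real op names are not spelled out here), and `serve`, `durable_backend`, and `mock_durable` only loosely mirror the roles described for their SX namesakes.

```python
def serve(disk, request):
    """Reference host: applies a request dict to a plain-dict 'disk'."""
    op, args = request["op"], request["args"]
    if op == "example/append":          # hypothetical op name
        stream, event_type, data = args
        events = disk["streams"].setdefault(stream, [])
        event = {"stream": stream, "seq": len(events) + 1,
                 "type": event_type, "data": data}
        events.append(event)
        return event
    if op == "example/read":            # hypothetical op name
        (stream,) = args
        return list(disk["streams"].get(stream, []))
    raise ValueError(f"unknown op: {op}")

def durable_backend(transport):
    """Backend whose every op is a request handed to an injected transport."""
    return {
        "append": lambda stream, event_type, data: transport(
            {"op": "example/append", "args": (stream, event_type, data)}),
        "read": lambda stream: transport(
            {"op": "example/read", "args": (stream,)}),
    }

def mock_durable(disk):
    """Test transport: route requests straight to `serve` over an in-process disk."""
    return durable_backend(lambda request: serve(disk, request))

disk = {"streams": {}}
backend = mock_durable(disk)
backend["append"]("ledger", "order-placed", {"id": 1})
backend["append"]("ledger", "order-paid", {"id": 1})

backend = None                       # "crash": drop the in-process object
restarted = mock_durable(disk)       # "restart": rebuild over the same disk
recovered = restarted["read"]("ledger")
next_event = restarted["append"]("ledger", "order-shipped", {"id": 1})
```

Swapping the lambda in `mock_durable` for a transport that crosses a real IO boundary would leave `durable_backend` and everything above it unchanged, which is the property the mock harness is meant to exercise.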
### Migration notes — mem → durable under a live subsystem
The facet API takes the backend as its first argument and never names a concrete
backend, so swapping storage is a one-line change at the open site:
```
(persist/open)                               ; in-memory (test / ephemeral)
(persist/mock-durable (persist/mem-backend)) ; durable protocol, in-process disk
(persist/durable-backend)                    ; production: ops cross perform → host
```
Everything above the backend — `append`/`read`/`project`/`subscribe`/`snapshot`
/`compact` — is byte-identical across all three. A subsystem migrates by:
1. **Pick the seam.** The subsystem holds one backend value (today an in-memory
list). Replace its construction with `persist/open`/`durable-backend`; leave
every call site untouched.
2. **Backfill.** For an existing in-memory store, replay its current state into
the durable backend once (append historical events / `kv-put` current
values) before cutting reads over. New writes go to durable from then on.
3. **Read models rebuild themselves.** A projection is pure `(fold step seed)`;
after cutover, `persist/replay` (snapshot + tail) reconstructs every read
model from the durable log — no bespoke migration of derived state.
4. **Blobs first, by reference.** Move large payloads into the content store and
store only `persist/blob-ref`s; the log/kv stay small, so the backfill in (2)
never copies bytes.
5. **Concurrency is already handled.** Two writers racing a stream get a
`persist/conflict?` result, not corruption — the same on mem or durable, so
no new code is needed at cutover.
The only behavioural difference durable introduces is that each op crosses the
kernel IO-suspension boundary (`perform`): under the real kernel the call
suspends and the host resumes it transparently, so the facet code is unaware.
Tests prove this by routing the identical request shapes through `persist/serve`
over an in-process disk (the mock-IO harness).
## Extensions (post-roadmap)
- [x] `view.sx` — materialized views: bundle stream + fold + snapshot name;
`view-attach` keeps the snapshot current on every publish so `view-peek` is an
O(1) read. The consumer-facing read-model abstraction (feed indices, audit
rollups, search counters).
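
The difference between an O(1) peek and an always-fresh value can be modelled in a few lines. This Python sketch uses hypothetical `Hub` and `View` classes over a single stream; it assumes, as an inference from the publish/append split, that a direct append leaves the snapshot behind until the next publish.

```python
from functools import reduce

class Hub:
    """A single-stream log with subscriber callbacks fired on publish."""
    def __init__(self):
        self.events = []
        self.subscribers = []

    def append(self, data):
        """Direct append: stores the event without notifying subscribers."""
        event = {"seq": len(self.events) + 1, "data": data}
        self.events.append(event)
        return event

    def publish(self, data):
        """Append, then notify every subscriber."""
        event = self.append(data)
        for callback in self.subscribers:
            callback(event)
        return event

class View:
    """Bundles a stream, a fold step, and a stored {value, seq} snapshot."""
    def __init__(self, hub, step, seed):
        self.hub, self.step = hub, step
        self.snapshot = {"value": seed, "seq": 0}

    def _fold_tail(self):
        tail = [e for e in self.hub.events if e["seq"] > self.snapshot["seq"]]
        value = reduce(self.step, tail, self.snapshot["value"])
        seq = tail[-1]["seq"] if tail else self.snapshot["seq"]
        return {"value": value, "seq": seq}

    def attach(self):
        """Refresh the snapshot incrementally on every publish."""
        def refresh(_event):
            self.snapshot = self._fold_tail()
        self.hub.subscribers.append(refresh)

    def peek(self):
        return self.snapshot["value"]        # O(1): no fold

    def value(self):
        return self._fold_tail()["value"]    # folds the tail, never stale

hub = Hub()
view = View(hub, lambda total, event: total + event["data"], 0)
view.attach()
hub.publish(5)
hub.publish(7)
peek_after_publish = view.peek()     # 12
hub.append(3)                        # bypasses subscribers
stale_peek = view.peek()             # still 12
fresh_value = view.value()           # 15
```

A later `publish` folds the whole unseen tail, including the directly appended event, so the snapshot catches up on the next write.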
## Consumers (post-foundation, not in scope here)
The feed log, flow store, mod audit, search index, acl grants, and identity
sessions all become `persist` log or kv consumers. Track each migration in that
subsystem's plan.
## Progress log
- **Ext: materialized views (122/122).** `view.sx`: `persist/view` bundles
stream + step + seed + snapshot name; `view-attach` subscribes it to a hub so
every publish refreshes the snapshot incrementally; `view-peek` is then an
O(1) current read (no fold), `view-value` always folds the tail so it's never
stale. 11 tests incl. on durable backend + a sum-over-data view.
- **Phase 4c+4d (111/111) — Phase 4 complete, roadmap done.** `recovery.sx` — a
6-test crash/restart integration: an order ledger (event log + subscription
kv read model + snapshot + compaction + invoice blob ref) over the durable
backend, where "crash" drops every in-process object and "restart" rebuilds
over the same disk + content store. Log, read model, snapshot, compacted
replay, and blob ref all survive; seq continues; two restarts converge
(determinism). Migration notes (mem → durable under a live subsystem) added
inline above.
- **Phase 4b (105/105).** `blob.sx` — large objects stay out of persist. A blob
ref is `{:cid :size :mime}`; the blob store is a SEPARATE injected dependency
(`persist/blob-io` over an injectable transport, perform in prod / mock
content store in tests). `persist/blob-store` puts bytes and returns ONLY the
ref; `persist/blob-fetch` retrieves bytes via the ref. Mock store is
content-addressed (same bytes dedupe). 14 tests assert the invariant: a ref in
the log/kv carries the CID, never the bytes (`has-key? :bytes` is false).
- **Phase 4a (91/91).** `durable.sx` — a backend whose every op crosses the
kernel IO boundary via `(perform {:op "persist/..." :args (...)})`. The
transport is injectable: `persist/durable-backend` uses the kernel's
`perform` (suspends; host resumes); `persist/mock-durable` uses
`persist/serve` over an in-memory disk. `persist/serve` is the reference host
+ the mock-IO harness. Because the request shapes are identical, the ENTIRE
facet stack (log/kv/project/snapshot/compaction) runs unchanged on
mock-durable — verified. Crash/restart (drop backend, keep disk) recovers log
+ kv + snapshot by replay; seq counter continues. 15 tests. See Blockers for
why end-to-end perform suspension isn't exercised under sx_server.exe.
- **Phase 3b (76/76) — Phase 3 complete.** Backend refactor: `last-seq` is now
a monotonic per-stream high-water mark (backend `seqs` dict), not physical
length, so a compacted log keeps assigning climbing seqs. Added backend
`:truncate-through` + `persist/truncate`. `compaction.sx`: `persist/compact`
checkpoints then drops events with seq <= snapshot seq; `should-compact?`/
`maybe-compact` give an explicit "compact every N tail events" policy. 11
tests: post-compaction replay value == uncompacted full replay (determinism),
seq continuity after truncation, idempotence. `persist/count` = physical
stored count (shrinks on compaction) vs `persist/last-seq` = logical.
- **Phase 3a (65/65).** `snapshot.sx` — a snapshot is a projection state
`{:value :seq}` stored in the kv facet under `snapshot/<name>`.
`persist/checkpoint` replays + saves; `persist/replay` = snapshot + tail.
11 tests assert the headline both ways: snapshot+tail == full replay (value
and whole state), plus replay determinism.
- **Phase 2c (54/54) — Phase 2 complete.** `concurrency.sx` — optimistic
concurrency: `persist/append-expect b stream expected ...` refuses the append
if the stream advanced past `expected`, returning a conflict VALUE
`{:conflict true :expected :actual}` (never a crash, never a silent
overwrite). `persist/conflict?` + accessors; caller re-reads actual and
retries. 8 tests incl. two-writer race + retry.
- **Phase 2b (46/46).** `subscribe.sx`: `persist/hub` wraps a backend with
per-stream callbacks. `persist/publish` appends then fires subscribers
`(backend stream event)`; direct `persist/append` bypasses them by design
(bulk load/replay). Canonical use: callback re-runs `project-resume` or bumps
a kv counter so read models update on write. 9 tests.
- **Phase 2a (37/37).** `project.sx` — projection state `{:value :seq}`;
`persist/project` folds whole stream from seed, `persist/project-resume`
folds only the tail (seq > prior seq) so read models update incrementally.
step is pure `(value event) -> value`. 9 tests incl. resume==full-from-zero.
- **Phase 1 complete (28/28).** `event.sx` (event record + accessors),
`backend.sx` (injectable protocol + in-memory log/kv impl, closure state via
set!), `log.sx` (append/read/read-from, sequential per-stream seq, stream
isolation), `kv.sx` (get/put/delete/has?/keys/get-or/update), `api.sx`
(`persist/open` — mem default, backend injectable). conformance.sh + three
suites (event/log/kv). Gotcha logged in Blockers: `map` returns an
array-backed list not `equal?` to a `(list ...)` literal — assertions build
compared lists with list/nth.
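
The blob invariant from the Phase 4b entry (a ref carries the CID, never the bytes) can be illustrated with a mock content store. This Python sketch is an assumption-heavy model: `MockContentStore`, `blob_store`, and `blob_fetch` are hypothetical stand-ins, and a SHA-256 hex digest plays the role of a CID, which is not how real IPFS CIDs are encoded.

```python
import hashlib

class MockContentStore:
    """Content-addressed store: the key is derived from the bytes themselves."""
    def __init__(self):
        self.objects = {}   # cid -> bytes

    def put(self, data):
        cid = hashlib.sha256(data).hexdigest()   # stand-in for a real CID
        self.objects[cid] = data                 # same bytes map to the same key
        return cid

    def get(self, cid):
        return self.objects[cid]

def blob_store(store, data, mime):
    """Put the bytes in the content store and return ONLY the reference."""
    cid = store.put(data)
    return {"cid": cid, "size": len(data), "mime": mime}

def blob_fetch(store, ref):
    """Retrieve the bytes through the reference."""
    return store.get(ref["cid"])

store = MockContentStore()
invoice = b"%PDF-1.7 example invoice bytes"
ref = blob_store(store, invoice, "application/pdf")
ref_again = blob_store(store, invoice, "application/pdf")   # dedupes

# Only the small ref goes into the log or kv; the payload stays in the store.
ledger_event = {"seq": 1, "type": "invoice-issued", "data": {"invoice": ref}}
```

Keeping the event payload down to `{cid, size, mime}` is what lets a backfill copy references rather than bytes.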
## Blockers
- **Phase 4 perform-suspension not exercised end-to-end under sx_server.exe (by
design, not a bug).** The CEK suspension primitives (`cek-step-loop`,
`cek-resume`, `cek-suspended?`, `cek-io-request`) and a settable SX-level IO
hook are only bound by the `run_tests` OCaml binary (out of scope: hosts/, and
sx_build is forbidden). Under `sx_server.exe`, an unhandled `perform` resolves
through the OCaml io-request/io-response stdin bridge (production path) — not
callable from the pure-eval conformance harness. Resolution: the durable
backend's transport is injectable, so the production path is one line
`(perform req)` (kernel-handled) and ALL durable logic is tested through the
mock transport (`persist/serve` over an in-memory disk). The single untested
line is the kernel primitive itself. No host primitive needed; nothing to fix.
- **Not a blocker, a testing convention:** `map` returns an array-backed list
that is NOT `equal?` to a `(list ...)` cons-literal (two `map` results do
compare equal to each other). When asserting list-shaped results against a
`(list ...)` literal, build the compared value with `list`/`nth`/`cons`, not
`map`. `into`/list-coercion needs the IO bridge and is unusable in the
pure-eval harness.