persist-on-sx: Durable state on the SX kernel
DRAFT outline. Foundation subsystem — the durable substrate the other five currently fake with in-memory mutable lists. Build this first.
"persist" = persistence / data store, NOT the shop. The shop/commerce vertical is
commerce-on-sx.
rose-ash needs durable state: every subsystem (feed log, flow store, mod audit,
search index, acl grants, sessions) today hand-rolls an in-memory structure that
vanishes on restart. persist-on-sx is the one durable substrate they share. It
lives directly on the SX kernel's IO-suspension primitives (perform/cek-resume
— the third CEK phase) so a read/write performs and the kernel persists at the
boundary. Concrete storage backends are injected.
Does it cover ALL persistence? No — and on purpose.
Event-sourcing-everything is a known trap (replay cost, event schema evolution, awkward ad-hoc queries, 5MB images in a log). So persist owns the durable source-of-truth substrate, exposed as two facets over one backend protocol, with two things explicitly delegated out:
| Shape | Owner | Notes |
|---|---|---|
| Event streams (append-only, history matters) | persist — log facet | feed activities, mod audit, order ledger, flow state, content edits |
| Current-state values (KV / document, no history) | persist — kv facet | profiles, stock counts, config, session blobs; also where projections materialize |
| Snapshots / read models (derived, queryable) | persist — projections → kv/log | rebuildable from the log; persisted so you don't replay to answer a query |
| Blobs / large objects (images, media) | delegated → content-addressed store (artdag/IPFS already) | persist stores the reference/CID, never the bytes |
| Cache (ephemeral, evictable) | out of scope | not persistence — different lifecycle (Redis-shaped) |
| Ad-hoc relational query | the subsystem, over a projected read model | the log is bad at "all orders by X in March"; project into a queryable kv/SQL backend |
So: persist is the single durable substrate for state that's either a stream of changes or a current value — but it does not force everything into an event log, it does not hold blobs (only their content-addressed refs), and it does not do caching. Those boundaries are the whole point of calling it a substrate rather than "the database."
End-state: log (append/read streams) + kv (get/put/delete by key) facets, an
injectable backend protocol (mem → file → Postgres → IPFS-ref), pure projections
with incremental snapshots, optimistic concurrency, and a subscription hook so
read models (feeds, indices, audit logs) update incrementally.
Status (rolling)
bash lib/persist/conformance.sh → 122/122 (Phases 1–4 complete + extensions)
Ground rules
- Scope: only `lib/persist/**` and `plans/persist-on-sx.md`. May import the kernel's IO-suspension surface (`perform`, platform IO ops); verify what's exported first. Do not add host primitives; a missing durable IO op is a Blockers entry (it belongs in `hosts/`, out of scope).
- Architecture: an event is `{:stream :seq :type :at :data}`; the log is an ordered append-only vector; a projection is `(fold step seed events)`; a kv value is `(get/put/delete key)`. Both facets sit on one injected backend `{:append :read :kv-get :kv-put :snapshot-read :snapshot-write}`. The in-memory backend is the test default; real backends wire in unchanged.
- Determinism: replay is pure: same log → same state, always. No clocks or randomness inside projections; time lives on the event.
- Blobs: store the content-address/CID and metadata; never the bytes. The blob backend is a separate injected dependency.
- Commits: one feature per commit. Progress log + tick boxes.
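The architecture rule above (event record, append-only log, projection as a fold) can be modelled in a few lines. This is a Python sketch of the idea only, not the SX code: `Event`, `MemBackend`, `project` and `count_by_type` are hypothetical names, and timestamps are plain integers supplied by the caller.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any


@dataclass(frozen=True)
class Event:
    """Mirrors the {:stream :seq :type :at :data} shape."""
    stream: str
    seq: int
    type: str
    at: int      # time lives on the event; folds never read a clock
    data: Any


class MemBackend:
    """Hypothetical in-memory backend: one append-only list per stream."""

    def __init__(self):
        self.logs = {}

    def append(self, stream, type_, at, data):
        log = self.logs.setdefault(stream, [])
        event = Event(stream, len(log) + 1, type_, at, data)
        log.append(event)
        return event

    def read(self, stream):
        return list(self.logs.get(stream, []))


def project(backend, stream, step, seed):
    """A projection is just a left fold of a pure step over the stream."""
    return reduce(step, backend.read(stream), seed)


def count_by_type(state, event):
    # Pure step: builds a new dict instead of mutating the old state.
    return {**state, event.type: state.get(event.type, 0) + 1}


b = MemBackend()
b.append("orders", "placed", 1000, {"id": 1})
b.append("orders", "placed", 1001, {"id": 2})
b.append("orders", "shipped", 1002, {"id": 1})

first = project(b, "orders", count_by_type, {})
second = project(b, "orders", count_by_type, {})
print(first)            # {'placed': 2, 'shipped': 1}
print(first == second)  # True: same log, same state
```

Because the step takes its time from `event.at` and touches nothing outside its arguments, running the fold twice over the same log gives the same state, which is the determinism rule in miniature.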
Architecture sketch
    Command / write                               Read model / value
    (append stream type data)                     (project stream step seed)
    (kv-put key value)                            (kv-get key)
            │                                             ▲
            ▼                                             │
    lib/persist/event.sx                          lib/persist/project.sx
    - {:stream :seq :type :at :data}              - fold step seed; incremental from snapshot
            │                                             ▲
            ▼                                             │
    lib/persist/log.sx    lib/persist/kv.sx       lib/persist/snapshot.sx
    - append/read         - get/put/delete        - checkpoint; replay = snapshot + tail
    - optimistic seq      - current-state
            │                  │                          ▲
            └──────────────────┴── (perform → backend) ───┘
                               │
                      lib/persist/backend.sx          lib/persist/api.sx
                      - injected protocol             - (persist/append) (persist/project)
                      - mem | file | pg | ipfs-ref    - (persist/kv-get/put) (persist/subscribe)
                               │
                               └── blobs → content-addressed store (artdag/IPFS), by reference only
Phase 1 — Log + kv + in-memory backend
- `event.sx`: event record, stream/seq helpers
- `backend.sx`: injectable protocol + in-memory impl (log + kv)
- `log.sx`: `append` (optimistic seq), `read`, `read-from`
- `kv.sx`: `get`/`put`/`delete` current-state
- `api.sx` + tests + scoreboard + conformance.sh
Phase 2 — Projections + subscriptions
- `project.sx`: `(project stream step seed)`, incremental fold
- subscription hook: projection / kv read model re-runs on append
- concurrency conflict surfaced as a real result, not a crash
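The last item, a conflict surfaced as a value rather than a crash, can be sketched as follows. This is a hedged Python model: `append_expect` and `is_conflict` are hypothetical stand-ins, the log is a bare list, and the conflict check here is a simple "expected seq must equal the current seq" comparison, which is an assumption about the exact rule.

```python
def append_expect(log, expected, event_type, data):
    """Append only if the stream is still at `expected`; otherwise
    return a conflict value and leave the log untouched."""
    actual = len(log)
    if actual != expected:
        return {"conflict": True, "expected": expected, "actual": actual}
    event = {"seq": actual + 1, "type": event_type, "data": data}
    log.append(event)
    return event


def is_conflict(result):
    return result.get("conflict") is True


log = []

# Two writers both read the stream at seq 0 before either writes.
seen_by_a = len(log)
seen_by_b = len(log)

first = append_expect(log, seen_by_a, "reserve", {"sku": "x", "by": "a"})
second = append_expect(log, seen_by_b, "reserve", {"sku": "x", "by": "b"})

print(is_conflict(first))   # False: writer A wins
print(second)               # {'conflict': True, 'expected': 0, 'actual': 1}

# Writer B re-reads the actual seq from the conflict and retries.
retry = append_expect(log, second["actual"], "reserve", {"sku": "x", "by": "b"})
print(retry["seq"])         # 2
```

The losing writer gets an ordinary value it can inspect, so the retry loop lives in the caller and nothing is overwritten silently.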
Phase 3 — Snapshots + replay
- `snapshot.sx`: checkpoint a projection; replay = snapshot + tail
- compaction policy; replay-determinism tests
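The identity "replay = snapshot + tail" is easiest to see on a tiny log. The following Python sketch assumes a projection state of the form `{value, seq}` and a hypothetical `fold_from` helper that folds only events newer than the state's seq; it is a model of the idea, not the SX implementation.

```python
def fold_from(events, step, state):
    """Fold only the events with seq greater than state['seq'];
    return the new {value, seq} state."""
    value, seq = state["value"], state["seq"]
    for event in events:
        if event["seq"] > seq:
            value, seq = step(value, event), event["seq"]
    return {"value": value, "seq": seq}


def total(value, event):
    # Pure step: running sum of the event payloads.
    return value + event["data"]


SEED = {"value": 0, "seq": 0}
log = [{"seq": i, "data": i * 10} for i in range(1, 6)]   # seqs 1..5

# Checkpoint after the first three events.
snapshot = fold_from(log[:3], total, SEED)
print(snapshot)   # {'value': 60, 'seq': 3}

# More events arrive after the checkpoint.
log.append({"seq": 6, "data": 60})

replayed = fold_from(log, total, snapshot)   # snapshot + tail
full = fold_from(log, total, SEED)           # full replay from zero
print(replayed == full)                      # True
print(full)                                  # {'value': 210, 'seq': 6}
```

The equality holds because the step is pure and the snapshot records the seq it was taken at, so the tail is exactly the events the snapshot has not seen.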
Phase 4 — Durable backends via kernel IO
- file/log backend driven through `perform` (IO-suspension boundary)
- blob backend interface (store ref/CID; bytes live in artdag/IPFS)
- crash/restart replay test (mock IO platform)
- migration notes for swapping mem → durable under a live subsystem
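The blob rule (persist holds a reference, the content store holds the bytes) can be sketched like this. It is a Python model under stated assumptions: `MockContentStore` is a hypothetical in-memory store, and a SHA-256 hex digest stands in for a real CID, which in IPFS is a differently encoded multihash.

```python
import hashlib


class MockContentStore:
    """Hypothetical content-addressed store: the key is derived from the bytes."""

    def __init__(self):
        self.objects = {}

    def store(self, data: bytes, mime: str) -> dict:
        # Assumption: a sha256 hex digest plays the role of the CID.
        cid = hashlib.sha256(data).hexdigest()
        self.objects[cid] = data          # identical bytes land on the same key
        return {"cid": cid, "size": len(data), "mime": mime}

    def fetch(self, ref: dict) -> bytes:
        return self.objects[ref["cid"]]


store = MockContentStore()
invoice = b"%PDF-1.7 example invoice bytes"

ref = store.store(invoice, "application/pdf")
same = store.store(invoice, "application/pdf")

# The event that goes into the log carries only the reference.
event = {"type": "invoice-issued", "data": {"invoice": ref}}

print("bytes" in event["data"]["invoice"])                 # False
print(ref == same, len(store.objects))                     # True 1
print(store.fetch(event["data"]["invoice"]) == invoice)    # True
```

Keeping the store as a separate object mirrors the separate injected dependency: the log and kv never see the payload, only the small `{cid, size, mime}` record.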
Migration notes — mem → durable under a live subsystem
The facet API takes the backend as its first argument and never names a concrete backend, so swapping storage is a one-line change at the open site:
    (persist/open)                                 ; in-memory (test / ephemeral)
    (persist/mock-durable (persist/mem-backend))   ; durable protocol, in-process disk
    (persist/durable-backend)                      ; production: ops cross perform → host
Everything above the backend (append/read/project/subscribe/snapshot/compact) is byte-identical across all three. A subsystem migrates by:
1. Pick the seam. The subsystem holds one backend value (today an in-memory list). Replace its construction with `persist/open` / `durable-backend`; leave every call site untouched.
2. Backfill. For an existing in-memory store, replay its current state into the durable backend once (append historical events / `kv-put` current values) before cutting reads over. New writes go to durable from then on.
3. Read models rebuild themselves. A projection is pure `(fold step seed)`; after cutover, `persist/replay` (snapshot + tail) reconstructs every read model from the durable log, with no bespoke migration of derived state.
4. Blobs first, by reference. Move large payloads into the content store and store only `persist/blob-ref`s; the log/kv stay small, so the backfill in (2) never copies bytes.
5. Concurrency is already handled. Two writers racing a stream get a `persist/conflict?` result, not corruption. This is the same on mem or durable, so no new code is needed at cutover.
The only behavioural difference durable introduces is that each op crosses the
kernel IO-suspension boundary (perform): under the real kernel the call
suspends and the host resumes it transparently, so the facet code is unaware.
Tests prove this by routing the identical request shapes through persist/serve
over an in-process disk (the mock-IO harness).
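The injectable-transport idea described above can be modelled compactly: the backend only builds request values, and whoever supplies the transport decides where they go. In this Python sketch every name (`Disk`, `serve`, `DurableBackend`, `mock_durable`) and both op strings (`demo/append`, `demo/read`) are hypothetical; the real request ops are not spelled out in this plan.

```python
class Disk:
    """Hypothetical stand-in for host storage; it outlives a 'crash'."""

    def __init__(self):
        self.logs = {}


def serve(disk, request):
    """Reference host: interpret one request value against the disk."""
    op, args = request["op"], request["args"]
    if op == "demo/append":
        stream, data = args
        log = disk.logs.setdefault(stream, [])
        event = {"stream": stream, "seq": len(log) + 1, "data": data}
        log.append(event)
        return event
    if op == "demo/read":
        (stream,) = args
        return list(disk.logs.get(stream, []))
    raise ValueError(f"unknown op: {op}")


class DurableBackend:
    """Every operation becomes a request handed to the injected transport."""

    def __init__(self, transport):
        self.transport = transport

    def append(self, stream, data):
        return self.transport({"op": "demo/append", "args": (stream, data)})

    def read(self, stream):
        return self.transport({"op": "demo/read", "args": (stream,)})


def mock_durable(disk):
    # Test wiring: requests are served in-process instead of suspending.
    return DurableBackend(lambda request: serve(disk, request))


disk = Disk()
backend = mock_durable(disk)
backend.append("ledger", {"amount": 5})
backend.append("ledger", {"amount": 7})

del backend                      # "crash": drop the in-process object
restarted = mock_durable(disk)   # "restart": rebuild over the same disk
third = restarted.append("ledger", {"amount": 1})

print(third["seq"])                                           # 3
print([e["data"]["amount"] for e in restarted.read("ledger")])  # [5, 7, 1]
```

In production the lambda would be replaced by the kernel's suspension call; the `DurableBackend` code and everything layered on it would not change, which is why the mock path exercises the same logic.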
Extensions (post-roadmap)
- `view.sx`: materialized views. Bundles stream + fold + snapshot name; `view-attach` keeps the snapshot current on every publish so `view-peek` is an O(1) read. The consumer-facing read-model abstraction (feed indices, audit rollups, search counters).
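The difference between peeking at a view and asking for its value can be shown with a small model. This Python sketch is an illustration under assumptions, not the SX API: `Hub` and `View` are hypothetical classes, a single stream is modelled, and a direct `append` is assumed to bypass subscribers while `publish` notifies them.

```python
class Hub:
    """Hypothetical hub: one log, named snapshots, and subscriber callbacks."""

    def __init__(self):
        self.log = []
        self.snapshots = {}
        self.subscribers = []

    def append(self, data):
        # Direct append: stored, but subscribers are not notified.
        event = {"seq": len(self.log) + 1, "data": data}
        self.log.append(event)
        return event

    def publish(self, data):
        event = self.append(data)
        for callback in self.subscribers:
            callback(event)
        return event


class View:
    """Bundles a step, a seed and a snapshot name over the hub's stream."""

    def __init__(self, hub, name, step, seed):
        self.hub, self.name, self.step = hub, name, step
        hub.snapshots[name] = {"value": seed, "seq": 0}

    def _fold_tail(self):
        state = self.hub.snapshots[self.name]
        value, seq = state["value"], state["seq"]
        for event in self.hub.log:
            if event["seq"] > seq:
                value, seq = self.step(value, event), event["seq"]
        return {"value": value, "seq": seq}

    def attach(self):
        # Refresh the stored snapshot incrementally on every publish.
        def refresh(_event):
            self.hub.snapshots[self.name] = self._fold_tail()
        self.hub.subscribers.append(refresh)

    def peek(self):
        return self.hub.snapshots[self.name]["value"]   # no fold

    def value(self):
        return self._fold_tail()["value"]               # folds the tail


hub = Hub()
view = View(hub, "sum", lambda total, e: total + e["data"], 0)
view.attach()

hub.publish(3)
hub.publish(4)
print(view.peek())    # 7

hub.append(10)        # bypasses subscribers
print(view.peek())    # 7  (snapshot not refreshed)
print(view.value())   # 17 (tail folded on demand)
```

The trade-off is visible in the last three lines: `peek` is a dictionary lookup that is current as long as writes go through `publish`, while `value` pays for a tail fold and is correct regardless of how events arrived.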
Consumers (post-foundation, not in scope here)
feed/-log, flow store, mod/audit, search index, acl grants, identity sessions all
become persist log or kv. Track each migration in that subsystem's plan.
Progress log
- Ext: materialized views (122/122). `view.sx`: `persist/view` bundles stream + step + seed + snapshot name; `view-attach` subscribes it to a hub so every publish refreshes the snapshot incrementally; `view-peek` is then an O(1) current read (no fold), `view-value` always folds the tail so it's never stale. 11 tests incl. on durable backend + a sum-over-data view.
- Phase 4c+4d (111/111): Phase 4 complete, roadmap done. `recovery.sx`: a 6-test crash/restart integration. An order ledger (event log + subscription kv read model + snapshot + compaction + invoice blob ref) runs over the durable backend, where "crash" drops every in-process object and "restart" rebuilds over the same disk + content store. Log, read model, snapshot, compacted replay, and blob ref all survive; seq continues; two restarts converge (determinism). Migration notes (mem → durable under a live subsystem) added inline above.
- Phase 4b (105/105). `blob.sx`: large objects stay out of persist. A blob ref is `{:cid :size :mime}`; the blob store is a SEPARATE injected dependency (`persist/blob-io` over an injectable transport, perform in prod / mock content store in tests). `persist/blob-store` puts bytes and returns ONLY the ref; `persist/blob-fetch` retrieves bytes via the ref. Mock store is content-addressed (same bytes dedupe). 14 tests assert the invariant: a ref in the log/kv carries the CID, never the bytes (`has-key? :bytes` is false).
- Phase 4a (91/91). `durable.sx`: a backend whose every op crosses the kernel IO boundary via `(perform {:op "persist/..." :args (...)})`. The transport is injectable: `persist/durable-backend` uses the kernel's `perform` (suspends; host resumes); `persist/mock-durable` uses `persist/serve` over an in-memory disk. `persist/serve` is the reference host + the mock-IO harness. Because the request shapes are identical, the ENTIRE facet stack (log/kv/project/snapshot/compaction) runs unchanged on mock-durable (verified). Crash/restart (drop backend, keep disk) recovers log + kv + snapshot by replay; seq counter continues. 15 tests. See Blockers for why end-to-end perform suspension isn't exercised under sx_server.exe.
- Phase 3b (76/76): Phase 3 complete. Backend refactor: `last-seq` is now a monotonic per-stream high-water mark (backend `seqs` dict), not physical length, so a compacted log keeps assigning climbing seqs. Added backend `:truncate-through` + `persist/truncate`. `compaction.sx`: `persist/compact` checkpoints then drops events with seq <= snapshot seq; `should-compact?`/`maybe-compact` give an explicit "compact every N tail events" policy. 11 tests: post-compaction replay value == uncompacted full replay (determinism), seq continuity after truncation, idempotence. `persist/count` = physical stored count (shrinks on compaction) vs `persist/last-seq` = logical.
- Phase 3a (65/65). `snapshot.sx`: a snapshot is a projection state `{:value :seq}` stored in the kv facet under `snapshot/<name>`. `persist/checkpoint` replays + saves; `persist/replay` = snapshot + tail. 11 tests assert the headline both ways: snapshot+tail == full replay (value and whole state), plus replay determinism.
- Phase 2c (54/54): Phase 2 complete. `concurrency.sx`: optimistic concurrency. `persist/append-expect b stream expected ...` refuses the append if the stream advanced past `expected`, returning a conflict VALUE `{:conflict true :expected :actual}` (never a crash, never a silent overwrite). `persist/conflict?` + accessors; caller re-reads actual and retries. 8 tests incl. two-writer race + retry.
- Phase 2b (46/46). `subscribe.sx`: `persist/hub` wraps a backend with per-stream callbacks. `persist/publish` appends then fires subscribers `(backend stream event)`; direct `persist/append` bypasses them by design (bulk load/replay). Canonical use: callback re-runs `project-resume` or bumps a kv counter so read models update on write. 9 tests.
- Phase 2a (37/37). `project.sx`: projection state `{:value :seq}`; `persist/project` folds whole stream from seed, `persist/project-resume` folds only the tail (seq > prior seq) so read models update incrementally. step is pure `(value event) -> value`. 9 tests incl. resume==full-from-zero.
- Phase 1 complete (28/28). `event.sx` (event record + accessors), `backend.sx` (injectable protocol + in-memory log/kv impl, closure state via `set!`), `log.sx` (append/read/read-from, sequential per-stream seq, stream isolation), `kv.sx` (get/put/delete/has?/keys/get-or/update), `api.sx` (`persist/open`, mem default, backend injectable). conformance.sh + three suites (event/log/kv). Gotcha logged in Blockers: `map` returns an array-backed list not `equal?` to a `(list ...)` literal; assertions build compared lists with list/nth.
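The Phase 3b entry separates the logical sequence (a high-water mark) from the physical event count. A minimal Python model of that split, with a hypothetical `Stream` class and a `should_compact` policy helper whose threshold rule is an assumption, might look like this:

```python
class Stream:
    """Hypothetical stream whose seq is a high-water mark, not a length."""

    def __init__(self):
        self.events = []
        self.high_water = 0

    def append(self, data):
        self.high_water += 1                 # never derived from len(events)
        event = {"seq": self.high_water, "data": data}
        self.events.append(event)
        return event

    def truncate_through(self, seq):
        # Drop events with seq <= the snapshot seq; the mark is untouched.
        self.events = [e for e in self.events if e["seq"] > seq]

    def count(self):
        return len(self.events)              # physical, shrinks on compaction

    def last_seq(self):
        return self.high_water               # logical, only ever climbs


def should_compact(stream, snapshot_seq, every_n):
    """Assumed policy: compact once the tail past the snapshot reaches every_n."""
    return stream.last_seq() - snapshot_seq >= every_n


s = Stream()
for amount in (10, 20, 30, 40):
    s.append(amount)

print(should_compact(s, 0, 3))    # True: four tail events, threshold three

s.truncate_through(3)             # snapshot covered seqs 1..3
print(s.count(), s.last_seq())    # 1 4

nxt = s.append(50)
print(nxt["seq"])                 # 5, not 2

s.truncate_through(3)             # repeating the truncation changes nothing
print(s.count())                  # 2
```

If seq were computed from the list length, the append after truncation would reuse seq 2 and collide with history already folded into the snapshot; tracking the mark separately avoids that.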
Blockers
- Phase 4 perform-suspension not exercised end-to-end under sx_server.exe (by design, not a bug). The CEK suspension primitives (`cek-step-loop`, `cek-resume`, `cek-suspended?`, `cek-io-request`) and a settable SX-level IO hook are only bound by the `run_tests` OCaml binary (out of scope: hosts/, and sx_build is forbidden). Under `sx_server.exe`, an unhandled `perform` resolves through the OCaml io-request/io-response stdin bridge (production path), which is not callable from the pure-eval conformance harness. Resolution: the durable backend's transport is injectable, so the production path is one line `(perform req)` (kernel-handled) and ALL durable logic is tested through the mock transport (`persist/serve` over an in-memory disk). The single untested line is the kernel primitive itself. No host primitive needed; nothing to fix.
- Not a blocker, a testing convention: `map` returns an array-backed list that is NOT `equal?` to a `(list ...)` cons-literal (two `map` results do compare equal to each other). When asserting list-shaped results against a `(list ...)` literal, build the compared value with `list`/`nth`/`cons`, not `map`. `into`/list-coercion needs the IO bridge and is unusable in the pure-eval harness.