Files
rose-ash/plans/persist-on-sx.md
giles aff7d1e84f
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m0s
persist: compaction — drop snapshotted prefix, monotonic seq + 11 tests
Backend now tracks last-seq as a monotonic high-water mark (survives
truncation) and exposes :truncate-through. compaction.sx: persist/compact
checkpoints then drops events with seq <= snapshot seq; should-compact?/
maybe-compact give an explicit every-N policy. Determinism: post-compaction
replay value == uncompacted full replay. Phase 3 complete, 76/76.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:42:06 +00:00

161 lines
9.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# persist-on-sx: Durable state on the SX kernel
> **DRAFT outline.** Foundation subsystem — the durable substrate the other five
> currently fake with in-memory mutable lists. Build this first.
>
> **"persist" = persistence / data store, NOT the shop.** The shop/commerce vertical
> is `commerce-on-sx`.
rose-ash needs durable state: every subsystem (feed log, flow store, mod audit,
search index, acl grants, sessions) today hand-rolls an in-memory structure that
vanishes on restart. `persist-on-sx` is the one durable substrate they share. It
lives directly on the SX kernel's IO-suspension primitives (`perform`/`cek-resume`
— the third CEK phase) so a read/write `perform`s and the kernel persists at the
boundary. Concrete storage backends are injected.
## Does it cover ALL persistence? No — and on purpose.
Event-sourcing-everything is a known trap (replay cost, event schema evolution,
awkward ad-hoc queries, 5MB images in a log). So persist owns the **durable
source-of-truth substrate**, exposed as **two facets over one backend protocol**,
with two things explicitly delegated out:
| Shape | Owner | Notes |
|-------|-------|-------|
| **Event streams** (append-only, history matters) | persist — **log facet** | feed activities, mod audit, order ledger, flow state, content edits |
| **Current-state values** (KV / document, no history) | persist — **kv facet** | profiles, stock counts, config, session blobs; also where projections materialize |
| **Snapshots / read models** (derived, queryable) | persist — projections → kv/log | rebuildable from the log; persisted so you don't replay to answer a query |
| **Blobs / large objects** (images, media) | **delegated** → content-addressed store (artdag/IPFS already) | persist stores the *reference/CID*, never the bytes |
| **Cache** (ephemeral, evictable) | **out of scope** | not persistence — different lifecycle (Redis-shaped) |
| **Ad-hoc relational query** | the subsystem, over a projected read model | the log is bad at "all orders by X in March"; project into a queryable kv/SQL backend |
So: persist is the **single durable substrate** for state that's either a stream of
changes or a current value — but it does **not** force everything into an event
log, it does **not** hold blobs (only their content-addressed refs), and it does
**not** do caching. Those boundaries are the whole point of calling it a substrate
rather than "the database."
End-state: `log` (append/read streams) + `kv` (get/put/delete by key) facets, an
injectable backend protocol (mem → file → Postgres → IPFS-ref), pure projections
with incremental snapshots, optimistic concurrency, and a subscription hook so
read models (feeds, indices, audit logs) update incrementally.
## Status (rolling)
`bash lib/persist/conformance.sh`**76/76** (Phases 13 done)
## Ground rules
- **Scope:** only `lib/persist/**` and `plans/persist-on-sx.md`. May **import** the
kernel's IO-suspension surface (`perform`, platform IO ops) — verify what's
exported first. Do not add host primitives; a missing durable IO op is a Blockers
entry (it belongs in `hosts/`, out of scope).
- **Architecture:** an event is `{:stream :seq :type :at :data}`; the log is an
ordered append-only vector; a projection is `(fold step seed events)`; a kv value
is `(get/put/delete key)`. Both facets sit on one injected backend
`{:append :read :kv-get :kv-put :snapshot-read :snapshot-write}`. The in-memory
backend is the test default; real backends wire in unchanged.
- **Determinism:** replay is pure — same log → same state, always. No clocks or
randomness inside projections; time lives on the event.
- **Blobs:** store the content-address/CID and metadata; never the bytes. The blob
backend is a separate injected dependency.
- **Commits:** one feature per commit. Progress log + tick boxes.
## Architecture sketch
```
Command / write Read model / value
(append stream type data) (project stream step seed)
(kv-put key value) (kv-get key)
│ ▲
▼ │
lib/persist/event.sx lib/persist/project.sx
— {:stream :seq :type :at :data} — fold step seed; incremental from snapshot
│ ▲
▼ │
lib/persist/log.sx lib/persist/kv.sx lib/persist/snapshot.sx
— append/read — get/put/delete — checkpoint; replay = snapshot + tail
— optimistic seq — current-state
│ │ ▲
└──────────────────┴── (perform → backend) ───┘
lib/persist/backend.sx lib/persist/api.sx
— injected protocol — (persist/append) (persist/project)
— mem | file | pg | ipfs-ref — (persist/kv-get/put) (persist/subscribe)
└── blobs → content-addressed store (artdag/IPFS), by reference only
```
## Phase 1 — Log + kv + in-memory backend
- [x] `event.sx` — event record, stream/seq helpers
- [x] `backend.sx` — injectable protocol + in-memory impl (log + kv)
- [x] `log.sx``append` (optimistic seq), `read`, `read-from`
- [x] `kv.sx``get`/`put`/`delete` current-state
- [x] `api.sx` + tests + scoreboard + conformance.sh
## Phase 2 — Projections + subscriptions
- [x] `project.sx``(project stream step seed)`, incremental fold
- [x] subscription hook — projection / kv read model re-runs on append
- [x] concurrency conflict surfaced as a real result, not a crash
## Phase 3 — Snapshots + replay
- [x] `snapshot.sx` — checkpoint a projection; replay = snapshot + tail
- [x] compaction policy; replay-determinism tests
## Phase 4 — Durable backends via kernel IO
- [ ] file/log backend driven through `perform` (IO-suspension boundary)
- [ ] blob backend interface (store ref/CID; bytes live in artdag/IPFS)
- [ ] crash/restart replay test (mock IO platform)
- [ ] migration notes for swapping mem → durable under a live subsystem
## Consumers (post-foundation, not in scope here)
feed/-log, flow store, mod/audit, search index, acl grants, identity sessions all
become `persist` log or kv. Track each migration in that subsystem's plan.
## Progress log
- **Phase 3b (76/76) — Phase 3 complete.** Backend refactor: `last-seq` is now
a monotonic per-stream high-water mark (backend `seqs` dict), not physical
length, so a compacted log keeps assigning climbing seqs. Added backend
`:truncate-through` + `persist/truncate`. `compaction.sx``persist/compact`
checkpoints then drops events with seq <= snapshot seq; `should-compact?`/
`maybe-compact` give an explicit "compact every N tail events" policy. 11
tests: post-compaction replay value == uncompacted full replay (determinism),
seq continuity after truncation, idempotence. `persist/count` = physical
stored count (shrinks on compaction) vs `persist/last-seq` = logical.
- **Phase 3a (65/65).** `snapshot.sx` — a snapshot is a projection state
`{:value :seq}` stored in the kv facet under `snapshot/<name>`.
`persist/checkpoint` replays + saves; `persist/replay` = snapshot + tail.
11 tests assert the headline both ways: snapshot+tail == full replay (value
and whole state), plus replay determinism.
- **Phase 2c (54/54) — Phase 2 complete.** `concurrency.sx` — optimistic
concurrency: `persist/append-expect b stream expected ...` refuses the append
if the stream advanced past `expected`, returning a conflict VALUE
`{:conflict true :expected :actual}` (never a crash, never a silent
overwrite). `persist/conflict?` + accessors; caller re-reads actual and
retries. 8 tests incl. two-writer race + retry.
- **Phase 2b (46/46).** `subscribe.sx``persist/hub` wraps a backend with
per-stream callbacks. `persist/publish` appends then fires subscribers
`(backend stream event)`; direct `persist/append` bypasses them by design
(bulk load/replay). Canonical use: callback re-runs `project-resume` or bumps
a kv counter so read models update on write. 9 tests.
- **Phase 2a (37/37).** `project.sx` — projection state `{:value :seq}`;
`persist/project` folds whole stream from seed, `persist/project-resume`
folds only the tail (seq > prior seq) so read models update incrementally.
step is pure `(value event) -> value`. 9 tests incl. resume==full-from-zero.
- **Phase 1 complete (28/28).** `event.sx` (event record + accessors),
`backend.sx` (injectable protocol + in-memory log/kv impl, closure state via
set!), `log.sx` (append/read/read-from, sequential per-stream seq, stream
isolation), `kv.sx` (get/put/delete/has?/keys/get-or/update), `api.sx`
(`persist/open` — mem default, backend injectable). conformance.sh + three
suites (event/log/kv). Gotcha logged in Blockers: `map` returns an
array-backed list not `equal?` to a `(list ...)` literal — assertions build
compared lists with list/nth.
## Blockers
- **Not a blocker, a testing convention:** `map` returns an array-backed list
that is NOT `equal?` to a `(list ...)` cons-literal (two `map` results do
compare equal to each other). When asserting list-shaped results against a
`(list ...)` literal, build the compared value with `list`/`nth`/`cons`, not
`map`. `into`/list-coercion needs the IO bridge and is unusable in the
pure-eval harness.