persist: Blocker spec for the host durable-storage adapter

Document the one gap to real durability: a hosts/ servicer for the persist/*
IO ops. Includes the silent-data-loss repro (durable-backend currently no-ops
under sx_server's default resolver), the full op contract table, hard
invariants (monotonic last-seq, etc.), the blob adapter shape, where to
register in sx_server.ml, and an acceptance test (swap transport, run durable +
recovery suites against real storage, survive a real restart).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 20:52:44 +00:00
parent 84d5732b38
commit 200b93c1f6


@@ -308,6 +308,90 @@ real `lib/acl`.
compared lists with list/nth.
## Blockers
### OPEN — host durable-storage adapter (the only gap to real durability)
**Owner:** a `hosts/` loop (NOT this one — `lib/persist/**` is the scope fence,
and `sx_build` is forbidden here). **Without it, durable persistence silently
drops all writes.**
**Symptom / minimal repro.** `persist/durable-backend` performs
`{:op "persist/..." :args (...)}` for every storage op. Under `sx_server.exe`
the kernel's default IO resolver answers unknown ops with `nil` — so the durable
backend does not error, it *silently no-ops*:
```
; load event/backend/log/durable, then:
(let ((b (persist/durable-backend)))
  (begin (persist/append b "s" "x" 0 {})
         (persist/append b "s" "x" 0 {})
         (list (persist/event-seq (persist/append b "s" "x" 0 {}))
               (persist/count b "s")
               (persist/read b "s"))))
; => (1 0 nil) ; every append gets seq 1, nothing stored, reads empty: DATA LOSS
```
The in-memory backend (`persist/open`) is correct and complete; this gap is
*only* the production transport.
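The mechanism behind the repro can be modeled outside SX. The following is a minimal Python sketch, not the kernel's code: `default_resolver` and `append_via` are hypothetical names, and treating a nil `last-seq` as 0 is an assumption chosen to match the `(1 0 nil)` result above.

```python
def default_resolver(request):
    """Model of a resolver with no persist/* handler: every op answers None."""
    return None


def append_via(resolver, stream, event_type, data):
    """Model of the upstream append: seq = last-seq + 1, then store the event."""
    last = resolver({"op": "persist/last-seq", "args": [stream]})
    seq = (last or 0) + 1  # assumption: a nil last-seq is treated as 0
    event = {"stream": stream, "seq": seq, "type": event_type, "at": 0, "data": data}
    resolver({"op": "persist/append", "args": [stream, event]})  # silently dropped
    return event


# Three appends in a row: nothing errors, and nothing is stored.
seqs = [append_via(default_resolver, "s", "x", {})["seq"] for _ in range(3)]
stored = default_resolver({"op": "persist/read", "args": ["s"]})
```

Because `last-seq` never advances, each append computes the same seq, and the read comes back empty without any error being raised along the way.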
**What to build.** A host servicer that answers the `persist/*` IO ops against a
real store (sqlite/files/pg). It is the production twin of `persist/serve`
(`lib/persist/durable.sx`) — same op names, same request/response shapes — so
mirror that function and back it with durable storage instead of a mem-backend.
**Op contract** (request `{:op :args}` → response). `args` is a positional list;
events are dicts `{:stream :seq :type :at :data}`:
| op | args | returns | semantics |
|----|------|---------|-----------|
| `persist/append` | `(stream event)` | (ignored) | store `event` in `stream` |
| `persist/read` | `(stream)` | event list (oldest-first) | currently-stored events |
| `persist/last-seq` | `(stream)` | number | **monotonic high-water mark** (see below) |
| `persist/streams` | `()` | stream-name list | every stream ever appended to |
| `persist/truncate` | `(stream n)` | (ignored) | drop events with `seq <= n` |
| `persist/kv-get` | `(key)` | value or nil | |
| `persist/kv-put` | `(key val)` | (ignored) | upsert |
| `persist/kv-delete`| `(key)` | (ignored) | remove key |
| `persist/kv-has?` | `(key)` | boolean | |
| `persist/kv-keys` | `()` | key list | |
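To make the request and response shapes concrete, here is a rough in-memory model of the contract in Python. It is a sketch under assumptions, not a port of `persist/serve`: `MemStore` and `serve` are hypothetical names, requests are modeled as plain dicts with string keys, and returning 0 from `persist/last-seq` for a stream that was never appended to is an assumption.

```python
class MemStore:
    def __init__(self):
        self.rows = {}  # stream -> list of event dicts, in append order
        self.seqs = {}  # stream -> high-water mark, kept apart from rows
        self.kv = {}    # key -> value


def serve(store, request):
    """Answer one {:op :args} request against the store."""
    op, args = request["op"], request["args"]
    if op == "persist/append":
        stream, event = args
        store.rows.setdefault(stream, []).append(event)
        store.seqs[stream] = max(store.seqs.get(stream, 0), event["seq"])
        return None
    if op == "persist/read":
        return list(store.rows.get(args[0], []))
    if op == "persist/last-seq":
        return store.seqs.get(args[0], 0)  # assumption: 0 for unknown streams
    if op == "persist/streams":
        return list(store.seqs)
    if op == "persist/truncate":
        stream, n = args
        store.rows[stream] = [e for e in store.rows.get(stream, []) if e["seq"] > n]
        return None
    if op == "persist/kv-get":
        return store.kv.get(args[0])
    if op == "persist/kv-put":
        store.kv[args[0]] = args[1]
        return None
    if op == "persist/kv-delete":
        store.kv.pop(args[0], None)
        return None
    if op == "persist/kv-has?":
        return args[0] in store.kv
    if op == "persist/kv-keys":
        return list(store.kv)
    # Design choice of this sketch: fail loudly instead of answering nil.
    raise KeyError(f"unhandled op: {op}")


store = MemStore()
serve(store, {"op": "persist/append",
              "args": ["s", {"stream": "s", "seq": 1, "type": "x", "at": 0, "data": {}}]})
serve(store, {"op": "persist/kv-put", "args": ["k", [1, "two", None]]})
```

Raising on an unknown op is a choice made here for illustration; it would turn the silent no-op described above into a visible failure.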
**Hard invariants** (the facets above rely on these; mem-backend + `persist/serve`
are the reference):
1. **`last-seq` is a per-stream monotonic counter, NOT the row count.** It must
keep climbing after `truncate`, so a compacted stream never reassigns a seq.
Store the counter separately from the rows.
2. `append` is the only seq-assigner upstream (`log.sx` does `last-seq + 1`); the
host must not renumber.
3. `read` returns events in append order with `:seq` intact (post-truncate it
returns only the surviving tail).
4. `streams` is the set of streams that ever had an append (survives full
compaction) — keep it keyed off the seq counters, like mem-backend's `seqs`.
5. Values round-trip structurally: dicts/lists/numbers/strings/nil/booleans in =
same out (event `:data`, kv values, blob refs).
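One way these invariants could map onto a real store is sketched below with Python's `sqlite3` module. The table layout, the function names, and the use of JSON for structural round-tripping are all assumptions of this sketch (JSON in particular requires string dict keys, so how SX keyword keys are encoded is left open).

```python
import json
import sqlite3


def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS events ("
               "stream TEXT, seq INTEGER, body TEXT, PRIMARY KEY (stream, seq))")
    # Invariant 1: the counter lives in its own table, apart from the rows.
    db.execute("CREATE TABLE IF NOT EXISTS seqs ("
               "stream TEXT PRIMARY KEY, last_seq INTEGER NOT NULL)")
    return db


def append(db, stream, event):
    with db:  # one transaction: the row and the counter move together
        db.execute("INSERT INTO events VALUES (?, ?, ?)",
                   (stream, event["seq"], json.dumps(event)))
        db.execute("INSERT OR IGNORE INTO seqs VALUES (?, 0)", (stream,))
        # Invariant 2: record the caller's seq, never assign a new one.
        db.execute("UPDATE seqs SET last_seq = MAX(last_seq, ?) WHERE stream = ?",
                   (event["seq"], stream))


def read(db, stream):
    # Seqs are assigned in increasing order upstream, so seq order is append order.
    rows = db.execute("SELECT body FROM events WHERE stream = ? ORDER BY seq", (stream,))
    return [json.loads(body) for (body,) in rows]


def last_seq(db, stream):
    row = db.execute("SELECT last_seq FROM seqs WHERE stream = ?", (stream,)).fetchone()
    return row[0] if row else 0  # assumption: 0 for a never-appended stream


def truncate(db, stream, n):
    with db:  # deletes rows only; the seqs table is untouched
        db.execute("DELETE FROM events WHERE stream = ? AND seq <= ?", (stream, n))


def streams(db):
    # Invariant 4: keyed off the counters, so fully compacted streams still appear.
    return [s for (s,) in db.execute("SELECT stream FROM seqs ORDER BY stream")]


db = open_store()
for seq in (1, 2, 3):
    append(db, "s", {"stream": "s", "seq": seq, "type": "x", "at": 0,
                     "data": {"n": [seq, None, True]}})
truncate(db, "s", 3)  # full compaction
after_truncate = (read(db, "s"), last_seq(db, "s"), streams(db))
append(db, "s", {"stream": "s", "seq": last_seq(db, "s") + 1, "type": "x", "at": 0,
                 "data": {"n": [4, None, True]}})
```

After the full truncate the stream has no rows, yet `last_seq` still reports 3 and the stream is still listed, so the next append lands on seq 4 instead of reusing seq 1.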
**Blobs** are a *separate* adapter with the same pattern: ops `blob/put`
`(bytes mime)` → cid, `blob/get` `(cid)` → bytes, `blob/has?` `(cid)` → bool
(see `lib/persist/blob.sx` / `persist/blob-serve`). Back it with the
content-addressed store (artdag/IPFS); persist only ever stores the returned ref.
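The blob ops can be sketched the same way. This Python example stands in for the real content-addressed store with a local directory, and it uses a SHA-256 hex digest as the cid; both are assumptions for illustration, since artdag/IPFS would define their own cid format. `FileBlobStore` is a hypothetical name, and returning `None` from `get` for a missing cid is also an assumption (the contract above does not say).

```python
import hashlib
import os
import tempfile


class FileBlobStore:
    """Directory-backed stand-in for a content-addressed blob store."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, cid):
        return os.path.join(self.root, cid)

    def put(self, data, mime):
        # The cid is derived from the bytes alone; mime is accepted to match
        # the op signature but is not stored in this sketch.
        cid = hashlib.sha256(data).hexdigest()
        path = self._path(cid)
        if not os.path.exists(path):
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(data)
            os.replace(tmp, path)  # publish the blob in one rename
        return cid

    def get(self, cid):
        if not self.has(cid):
            return None  # assumption: missing blobs answer None
        with open(self._path(cid), "rb") as f:
            return f.read()

    def has(self, cid):
        return os.path.exists(self._path(cid))


blobs = FileBlobStore(tempfile.mkdtemp())
cid = blobs.put(b"hello", "text/plain")
```

Putting the same bytes twice yields the same cid, which is what lets persist store only the returned ref.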
**Where to register.** `hosts/ocaml/bin/sx_server.ml`:
- the in-process resolver `Sx_types._cek_io_resolver` (~line 3864) — add a
`"persist/..."` match arm dispatching to the new storage module (used by
SSR/`eval_with_io`); and/or
- the bridge path in `cek_run_with_io` (~lines 528-576), which currently forwards
unknown ops via `io_request op args` to the external bridge — a Python-bridge
handler is the alternative home if storage lives Python-side.
Pick one home; the op names are the contract, not the location.
**Acceptance test.** Swap the transport: point a `persist/io-backend` at the new
host servicer (instead of `persist/serve` over a mem disk) and run the existing
`durable` + `recovery` suites — they must stay green, and state must survive an
actual process restart (kill the server, restart, replay → recovered). That is
exactly what `lib/persist/tests/durable.sx` and `recovery.sx` already assert
against the mock; the host adapter just makes the disk real.
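The restart half of that check can be approximated in a few lines. The Python sketch below simulates a restart by closing and reopening a SQLite file; it is not a substitute for killing the real server, and the schema and function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS seqs ("
               "stream TEXT PRIMARY KEY, last_seq INTEGER NOT NULL)")
    db.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, val TEXT)")
    db.commit()
    return db


def record_seq(db, stream, seq):
    """Record the high-water mark for a seq that was assigned upstream."""
    with db:
        db.execute("INSERT OR IGNORE INTO seqs VALUES (?, 0)", (stream,))
        db.execute("UPDATE seqs SET last_seq = MAX(last_seq, ?) WHERE stream = ?",
                   (seq, stream))


def last_seq(db, stream):
    row = db.execute("SELECT last_seq FROM seqs WHERE stream = ?", (stream,)).fetchone()
    return row[0] if row else 0


path = os.path.join(tempfile.mkdtemp(), "persist.db")

# Session 1: write, then "crash" by closing the connection.
db = open_db(path)
record_seq(db, "s", 1)
record_seq(db, "s", 2)
with db:
    db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("k", "v"))
db.close()

# Session 2: a fresh connection must see the same state.
db = open_db(path)
recovered_seq = last_seq(db, "s")
recovered_val = db.execute("SELECT val FROM kv WHERE key = ?", ("k",)).fetchone()[0]
next_seq = recovered_seq + 1  # what the upstream append would compute next
db.close()
```

If the counter were held only in memory, the second session would compute seq 1 again, which is the same failure as the silent no-op at the top of this section.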
---
- **Phase 4 perform-suspension not exercised end-to-end under sx_server.exe (by
design, not a bug).** The CEK suspension primitives (`cek-step-loop`,
`cek-resume`, `cek-suspended?`, `cek-io-request`) and a settable SX-level IO