Merge loops/host-persist into architecture: host durable-storage adapter (persist/* + blob/* on disk, restart-safe)
This commit is contained in:
94
plans/agent-briefings/host-persist-loop.md
Normal file
94
plans/agent-briefings/host-persist-loop.md
Normal file
@@ -0,0 +1,94 @@
|
||||
# host-persist loop agent (single agent, builds the durable storage host)
|
||||
|
||||
Role: make `lib/persist`'s durable backend **actually durable**. The persist
|
||||
substrate (`lib/persist/**`, 201/201 tests) performs `{:op "persist/..." :args}`
|
||||
IO requests for every storage op; under `sx_server.exe` today nothing services
|
||||
them, so writes silently vanish. You build the **host-side adapter** that answers
|
||||
those ops against real on-disk storage — the one piece standing between persist
|
||||
and "all subsystems share a durable substrate."
|
||||
|
||||
```
|
||||
worktree: /root/rose-ash-loops/host-persist
|
||||
branch: loops/host-persist (push origin/loops/host-persist; NEVER main/architecture)
|
||||
```
|
||||
|
||||
## The authoritative contract — read this first, every restart
|
||||
|
||||
`plans/persist-on-sx.md` → **Blockers → "OPEN — host durable-storage adapter"**.
|
||||
That entry is the spec: the silent-data-loss repro, the full op contract table,
|
||||
the hard invariants (monotonic `last-seq`, etc.), the blob adapter shape, where
|
||||
to register in `sx_server.ml`, and the acceptance test. Do not restate it here —
|
||||
read it there and implement it. The reference implementation to mirror is
|
||||
`persist/serve` in `lib/persist/durable.sx` (same op names, same shapes).
|
||||
|
||||
## Restart baseline — check before iterating
|
||||
|
||||
1. Read the Blocker spec (above) + this briefing.
|
||||
2. `git log --oneline -8` on `loops/host-persist` to see what's done.
|
||||
3. Is there a worktree-local build? `ls hosts/ocaml/_build/default/bin/sx_server.exe`.
|
||||
Fresh worktrees have none — the first build is the first task.
|
||||
4. If an acceptance suite exists (e.g. `hosts/ocaml/test/persist_durable_*` or a
|
||||
`lib/persist/tests/durable-real.sx`), run it against the **worktree-built**
|
||||
binary. Green before new work.
|
||||
|
||||
## The queue (phases)
|
||||
|
||||
- **Phase 0 — reproduce.** Confirm the silent-data-loss repro from the spec under
|
||||
this worktree. Builds your mental model; costs one short run.
|
||||
- **Phase 1 — storage module.** A new OCaml module under `hosts/ocaml/` that
|
||||
implements the op contract over **real persistent storage**. Start simple and
|
||||
correct: a filesystem-backed store (one append-only file per stream + a kv
|
||||
file + a per-stream seq high-water file), or SQLite if the toolchain has it.
|
||||
Honour every invariant in the spec — especially: `last-seq` is a monotonic
|
||||
counter stored separately from rows so it survives `truncate`; values
|
||||
round-trip structurally.
|
||||
- **Phase 2 — register.** Wire a `"persist/..."` arm into the kernel's IO
|
||||
resolver (`Sx_types._cek_io_resolver`, ~line 3864 of `hosts/ocaml/bin/sx_server.ml`)
|
||||
and/or the `cek_run_with_io` bridge path (~528–576), dispatching to the storage
|
||||
module. Op names are the contract — do not rename.
|
||||
- **Phase 3 — acceptance.** New tests that use `persist/durable-backend` (REAL
|
||||
`perform`, not the mock) run under the freshly-built worktree binary: the
|
||||
`durable` + `recovery` semantics must pass, and a **real process restart**
|
||||
(start the built server, write, stop it, start again, replay) must recover
|
||||
state from disk. Put host-owned tests under a host path (e.g.
|
||||
`hosts/ocaml/test/`) — do not churn persist's existing suites.
|
||||
- **Phase 4 — blob adapter.** Same pattern for `blob/put|get|has?` backed by a
|
||||
content-addressed directory; persist stores only the ref.
|
||||
|
||||
Every iteration: implement → build → test → commit (short factual message) →
|
||||
push → update `plans/persist-on-sx.md` (tick the Blocker toward CLOSED, append a
|
||||
dated Progress-log line, newest first) → next.
|
||||
|
||||
## Ground rules (hard)
|
||||
|
||||
- **Build is your job** (unlike the persist loop). But build **only in this
|
||||
worktree's `_build`** via `dune` from `/root/rose-ash-loops/host-persist`.
|
||||
**NEVER overwrite the shared binary** at
|
||||
`/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe` — every sibling
|
||||
loop uses it; clobbering it breaks them all. Point acceptance tests at the
|
||||
worktree binary (`hosts/ocaml/_build/default/bin/sx_server.exe` *inside this
|
||||
worktree*).
|
||||
- **First build is slow** (full OCaml). The `sx_build` MCP tool has a ~600s
|
||||
watchdog that may kill it — prefer `dune build bin/sx_server.exe` (or `@all`)
|
||||
run via Bash with `run_in_background: true` and a long timeout, then poll.
|
||||
- **NEVER `pkill sx_server`** — siblings share the process/binary. Start your own
|
||||
server on a throwaway path/port for restart tests and stop only that PID; bound
|
||||
every run with `timeout`.
|
||||
- **Scope:** `hosts/**`, host-owned test files, and the Blocker entry +
|
||||
Progress log in `plans/persist-on-sx.md`. Do **not** modify `lib/persist/**`
|
||||
source (the persist loop owns it; its API is your contract, not your code) —
|
||||
if you need an upstream change, leave a note in the Blocker entry.
|
||||
- **Determinism:** replay from disk must equal the in-memory semantics; same log
|
||||
→ same state.
|
||||
- **Commits:** one feature per commit; push to `origin/loops/host-persist`.
|
||||
- **SX files:** `sx-tree` MCP tools ONLY, `file:` not `path:`, `sx_validate`
|
||||
after edits. (Most of your work is OCaml — edit those with normal tools.)
|
||||
|
||||
## Definition of done
|
||||
|
||||
The Blocker entry flips to **CLOSED**: `persist/durable-backend` writes land on
|
||||
disk, survive a real server restart, and the durable + recovery acceptance suites
|
||||
are green against the worktree-built binary. At that point a subsystem migrated
|
||||
per `lib/persist/examples/acl.sx` is genuinely durable.
|
||||
|
||||
Go. Read the Blocker spec; reproduce the gap; build the storage module.
|
||||
@@ -197,6 +197,30 @@ durable backend. Other subsystem loops copy this pattern; it does not touch the
|
||||
real `lib/acl`.
|
||||
|
||||
## Progress log
|
||||
- **Host durable-storage adapter — blob adapter LIVE, Blocker CLOSED (8/8).**
|
||||
Added content-addressed `blob/put|get|has?` to `sx_persist_store.ml`: bytes
|
||||
land in `<root>/blobs/<cid>` keyed by a CIDv1 (raw codec, sha2-256 via
|
||||
`Sx_cid`/`Sx_sha2`); `put` is idempotent (identical bytes → same cid → same
|
||||
file); persist stores only the `{:cid :size :mime}` ref, never the bytes. The
|
||||
`(eval ...)`/bridge/resolver fall-through already routes any unowned op to the
|
||||
store, so no new wiring was needed. `persist_durable_test.sh` extended:
|
||||
blob round-trip, content-address idempotency, and bytes+ref-in-kv surviving a
|
||||
real process restart — all green. Existing mock blob suite 14/0 against the
|
||||
worktree binary. The host durable-storage Blocker is now CLOSED.
|
||||
- **Host durable-storage adapter — durable+recovery LIVE (5/5 acceptance).**
|
||||
`hosts/ocaml/lib/sx_persist_store.ml` services every `persist/*` IO op against
|
||||
real on-disk storage (append-only log + separate monotonic `.seq` high-water
|
||||
per stream + per-key kv files; SX-serialized, structurally round-tripping).
|
||||
Wired into all three kernel IO sites in `hosts/ocaml/bin/sx_server.ml`: the
|
||||
`(eval ...)` suspension loop (was the silent-no-op), the `cek_run_with_io`
|
||||
bridge, and the in-process `_cek_io_resolver`. The spec's data-loss repro now
|
||||
returns `(3 3 3)` instead of `(1 0 nil)`. `last-seq` climbs across `truncate`;
|
||||
`streams` survives compaction. New `hosts/ocaml/test/persist_durable_test.sh`
|
||||
drives `persist/durable-backend` (REAL `perform`) under the worktree binary —
|
||||
durable, monotonic-seq, streams, kv, and a **real process restart** (write,
|
||||
exit, fresh process, replay from `$SX_PERSIST_DIR`) all green. Existing mock
|
||||
suites unaffected (durable 15/0, recovery 6/0 against worktree binary).
|
||||
Remaining: blob adapter (`blob/put|get|has?`).
|
||||
- **Reference migration: acl grants (201/201).** `lib/persist/examples/acl.sx` —
|
||||
a worked, in-scope template migrating an ACL-grants store from a hand-rolled
|
||||
ephemeral map to persist: grants/revokes as events, current set as a
|
||||
@@ -309,11 +333,26 @@ real `lib/acl`.
|
||||
|
||||
## Blockers
|
||||
|
||||
### OPEN — host durable-storage adapter (the only gap to real durability)
|
||||
### CLOSED — host durable-storage adapter (real durability is live)
|
||||
|
||||
**Owner:** a `hosts/` loop (NOT this one — `lib/persist/**` is the scope fence,
|
||||
and `sx_build` is forbidden here). **Without it, durable persistence silently
|
||||
drops all writes.**
|
||||
**Owner:** the `loops/host-persist` loop (`hosts/**` scope; `lib/persist/**` is
|
||||
its contract, not its code).
|
||||
|
||||
**Resolution (2026-06-06):** `hosts/ocaml/lib/sx_persist_store.ml` services every
|
||||
`persist/*` AND `blob/*` IO op against real on-disk storage, wired into all three
|
||||
IO sites of `hosts/ocaml/bin/sx_server.ml` (the `(eval ...)` suspension loop, the
|
||||
`cek_run_with_io` bridge, and the in-process `_cek_io_resolver`). The
|
||||
silent-data-loss repro below now returns the correct values instead of
|
||||
`(1 0 nil)`. Acceptance + real-process-restart recovery green
|
||||
(`hosts/ocaml/test/persist_durable_test.sh`, 8/8): durable round-trip,
|
||||
monotonic-seq-across-truncate, streams-survive-compaction, kv, blob
|
||||
content-addressing/idempotency, and bytes+refs surviving an actual process
|
||||
restart. Existing mock suites unaffected (blob 14/0, durable 10/0, recovery,
|
||||
against the worktree binary). A subsystem migrated per
|
||||
`lib/persist/examples/acl.sx` is now genuinely durable. Original spec retained
|
||||
below for reference.
|
||||
|
||||
**Was:** without it, durable persistence silently dropped all writes.
|
||||
|
||||
**Symptom / minimal repro.** `persist/durable-backend` performs
|
||||
`{:op "persist/..." :args (...)}` for every storage op. Under `sx_server.exe`
|
||||
|
||||
Reference in New Issue
Block a user