rose-ash/plans/agent-briefings/host-persist-loop.md
giles 65f274c573
briefings: add host-persist loop briefing (durable storage host adapter)
Briefing for the loop that builds the host-side servicer for persist/* IO ops,
making lib/persist's durable backend actually durable. Points at the Blocker
spec in plans/persist-on-sx.md as the authoritative contract; hard rules on
build isolation (worktree _build only, never clobber the shared binary) and not
pkilling the shared sx_server.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 22:18:03 +00:00

# host-persist loop agent (single agent, builds the durable storage host)
Role: make `lib/persist`'s durable backend **actually durable**. The persist
substrate (`lib/persist/**`, 201/201 tests) performs `{:op "persist/..." :args}`
IO requests for every storage op; under `sx_server.exe` today nothing services
them, so writes silently vanish. You build the **host-side adapter** that answers
those ops against real on-disk storage — the one piece standing between persist
and "all subsystems share a durable substrate."
```
worktree: /root/rose-ash-loops/host-persist
branch: loops/host-persist (push origin/loops/host-persist; NEVER main/architecture)
```
## The authoritative contract — read this first, every restart
In `plans/persist-on-sx.md`, see **Blockers → "OPEN — host durable-storage adapter"**.
That entry is the spec: the silent-data-loss repro, the full op contract table,
the hard invariants (monotonic `last-seq`, etc.), the blob adapter shape, where
to register in `sx_server.ml`, and the acceptance test. Do not restate it here —
read it there and implement it. The reference implementation to mirror is
`persist/serve` in `lib/persist/durable.sx` (same op names, same shapes).
## Restart baseline — check before iterating
1. Read the Blocker spec (above) + this briefing.
2. `git log --oneline -8` on `loops/host-persist` to see what's done.
3. Is there a worktree-local build? `ls hosts/ocaml/_build/default/bin/sx_server.exe`.
Fresh worktrees have none — the first build is the first task.
4. If an acceptance suite exists (e.g. `hosts/ocaml/test/persist_durable_*` or a
`lib/persist/tests/durable-real.sx`), run it against the **worktree-built**
binary. Green before new work.
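
Step 3 of the baseline can be sketched as a small shell check. This is a minimal sketch, not part of the spec: the function names `worktree_binary` and `has_local_build` and the `WORKTREE` variable are hypothetical; only the relative binary path and the worktree location come from this briefing.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from the spec).

# Print the path where a worktree-local sx_server.exe would live.
worktree_binary() {
  local worktree="$1"
  printf '%s/hosts/ocaml/_build/default/bin/sx_server.exe\n' "$worktree"
}

# Succeed only if that binary exists and is executable.
has_local_build() {
  [ -x "$(worktree_binary "$1")" ]
}

# Usage: decide whether the first task is the build.
wt="${WORKTREE:-/root/rose-ash-loops/host-persist}"
if has_local_build "$wt"; then
  echo "local build present: $(worktree_binary "$wt")"
else
  echo "no local build under $wt; the first task is the build"
fi
```

Resolving the binary path from the worktree root, rather than hard-coding it, keeps the check from accidentally pointing at the shared binary.
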
## The queue (phases)
- **Phase 0 — reproduce.** Confirm the silent-data-loss repro from the spec under
this worktree. Builds your mental model; costs one short run.
- **Phase 1 — storage module.** A new OCaml module under `hosts/ocaml/` that
implements the op contract over **real persistent storage**. Start simple and
correct: a filesystem-backed store (one append-only file per stream + a kv
file + a per-stream seq high-water file), or SQLite if the toolchain has it.
Honour every invariant in the spec — especially: `last-seq` is a monotonic
counter stored separately from rows so it survives `truncate`; values
round-trip structurally.
- **Phase 2 — register.** Wire a `"persist/..."` arm into the kernel's IO
resolver (`Sx_types._cek_io_resolver`, ~line 3864 of `hosts/ocaml/bin/sx_server.ml`)
and/or the `cek_run_with_io` bridge path (~lines 528–576), dispatching to the storage
module. Op names are the contract — do not rename.
- **Phase 3 — acceptance.** New tests that use `persist/durable-backend` (REAL
`perform`, not the mock) run under the freshly-built worktree binary: the
`durable` + `recovery` semantics must pass, and a **real process restart**
(start the built server, write, stop it, start again, replay) must recover
state from disk. Put host-owned tests under a host path (e.g.
`hosts/ocaml/test/`) — do not churn persist's existing suites.
- **Phase 4 — blob adapter.** Same pattern for `blob/put|get|has?` backed by a
content-addressed directory; persist stores only the ref.
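
The Phase 1 invariant that `last-seq` survives `truncate` can be illustrated with a shell sketch of one possible on-disk layout. Everything here is an assumption for illustration: the file names (`<stream>.log`, `<stream>.seq`), the tab-separated row format, the function names, and the reading of `truncate` as "drop all rows". The real module is OCaml and must follow the op contract in the Blocker spec; this sketch also ignores fsync and crash consistency.

```bash
#!/usr/bin/env bash
# Illustrative layout only (ASSUMPTION): one append-only log per stream plus a
# separate high-water file, so dropping rows never resets the counter.

# Print the stream's last sequence number, or 0 if nothing was ever appended.
stream_last_seq() {
  local root="$1" stream="$2"
  if [ -f "$root/$stream.seq" ]; then cat "$root/$stream.seq"; else echo 0; fi
}

# Append one row, bump the high-water mark, and print the new sequence number.
stream_append() {
  local root="$1" stream="$2" payload="$3" seq
  mkdir -p "$root"
  seq=$(( $(stream_last_seq "$root" "$stream") + 1 ))
  printf '%s\t%s\n' "$seq" "$payload" >> "$root/$stream.log"
  # Write to a temp file and rename, so the counter file is never half-written.
  printf '%s\n' "$seq" > "$root/$stream.seq.tmp"
  mv "$root/$stream.seq.tmp" "$root/$stream.seq"
  echo "$seq"
}

# Drop all rows; deliberately leave the .seq file untouched.
stream_truncate() {
  local root="$1" stream="$2"
  : > "$root/$stream.log"
}

# Usage: the counter keeps climbing across a truncate.
demo_root="$(mktemp -d)"
stream_append "$demo_root" events "first"    # prints 1
stream_append "$demo_root" events "second"   # prints 2
stream_truncate "$demo_root" events
stream_last_seq "$demo_root" events          # prints 2
stream_append "$demo_root" events "third"    # prints 3
rm -rf "$demo_root"
```

If the counter were derived from the rows instead, a truncate followed by an append would reuse sequence number 1, which is the failure the invariant rules out.
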
Every iteration: implement → build → test → commit (short factual message) →
push → update `plans/persist-on-sx.md` (tick the Blocker toward CLOSED, append a
dated Progress-log line, newest first) → next.
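
For Phase 4, a content-addressed directory can be sketched in a few lines of shell. This is a hedged illustration, not the adapter shape from the spec: the choice of SHA-256, the flat directory layout, and the function names `blob_put`, `blob_get`, and `blob_has` are assumptions. It relies on GNU `sha256sum` and `mktemp`.

```bash
#!/usr/bin/env bash
# Illustrative content-addressed store (ASSUMPTION: SHA-256 hex digest as the ref).

# Read bytes from stdin, store them under their digest, and print the ref.
blob_put() {
  local dir="$1" tmp ref
  mkdir -p "$dir"
  tmp="$(mktemp "$dir/.incoming.XXXXXX")"
  cat > "$tmp"
  ref="$(sha256sum "$tmp" | cut -d' ' -f1)"
  # Rename into place; storing the same bytes twice yields the same file.
  mv "$tmp" "$dir/$ref"
  echo "$ref"
}

# Succeed if a blob with this ref exists.
blob_has() {
  [ -f "$1/$2" ]
}

# Write the blob's bytes to stdout.
blob_get() {
  cat "$1/$2"
}

# Usage: the caller keeps only the ref, as persist does.
store="$(mktemp -d)"
ref="$(printf 'hello' | blob_put "$store")"
blob_has "$store" "$ref" && blob_get "$store" "$ref"
echo
rm -rf "$store"
```

Because the ref is derived from the bytes, a repeated `put` is idempotent and a `get` can be verified by rehashing.
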
## Ground rules (hard)
- **Build is your job** (unlike the persist loop). But build **only in this
worktree's `_build`** via `dune` from `/root/rose-ash-loops/host-persist`.
**NEVER overwrite the shared binary** at
`/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe` — every sibling
loop uses it; clobbering it breaks them all. Point acceptance tests at the
worktree binary (`hosts/ocaml/_build/default/bin/sx_server.exe` *inside this
worktree*).
- **First build is slow** (full OCaml). The `sx_build` MCP tool has a ~600s
watchdog that may kill it — prefer `dune build bin/sx_server.exe` (or `@all`)
run via Bash with `run_in_background: true` and a long timeout, then poll.
- **NEVER `pkill sx_server`** — siblings share the process/binary. Start your own
server on a throwaway path/port for restart tests and stop only that PID; bound
every run with `timeout`.
- **Scope:** `hosts/**`, host-owned test files, and the Blocker entry +
Progress log in `plans/persist-on-sx.md`. Do **not** modify `lib/persist/**`
source (the persist loop owns it; its API is your contract, not your code) —
if you need an upstream change, leave a note in the Blocker entry.
- **Determinism:** replay from disk must equal the in-memory semantics; same log
→ same state.
- **Commits:** one feature per commit; push to `origin/loops/host-persist`.
- **SX files:** `sx-tree` MCP tools ONLY, `file:` not `path:`, `sx_validate`
after edits. (Most of your work is OCaml — edit those with normal tools.)
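
The "stop only that PID" and "bound every run with `timeout`" rules can be sketched as a pair of shell helpers. This is a minimal sketch under stated assumptions: `start_own`, `stop_own`, and `OWN_PID` are hypothetical names, and `sleep` stands in for the worktree-built server because its command-line flags are not described in this briefing. It uses GNU `timeout`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: run a process under a hard time bound and stop it by PID,
# never by name, so sibling loops' servers are left alone.

# start_own <max-seconds> <command...>; records the PID in OWN_PID.
start_own() {
  local limit="$1"; shift
  timeout "$limit" "$@" >/dev/null 2>&1 &
  OWN_PID=$!
}

# stop_own <pid>; signals only that PID and waits for it to exit.
stop_own() {
  kill "$1" 2>/dev/null || true
  wait "$1" 2>/dev/null || true
}

# Usage with a stand-in process (ASSUMPTION: replace `sleep 60` with the
# worktree binary and whatever arguments it actually takes).
start_own 120 sleep 60
echo "started PID $OWN_PID"
stop_own "$OWN_PID"
echo "stopped PID $OWN_PID"
```

Setting `OWN_PID` in the calling shell, rather than printing it from a subshell, keeps the process a direct child so that `wait` can reap it. The `timeout` wrapper bounds the run even if the stop step is never reached.
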
## Definition of done
The Blocker entry flips to **CLOSED**: `persist/durable-backend` writes land on
disk, survive a real server restart, and the durable + recovery acceptance suites
are green against the worktree-built binary. At that point a subsystem migrated
per `lib/persist/examples/acl.sx` is genuinely durable.
Go. Read the Blocker spec; reproduce the gap; build the storage module.