rose-ash/plans/agent-briefings/host-persist-loop.md
giles 65f274c573
briefings: add host-persist loop briefing (durable storage host adapter)
Briefing for the loop that builds the host-side servicer for persist/* IO ops,
making lib/persist's durable backend actually durable. Points at the Blocker
spec in plans/persist-on-sx.md as the authoritative contract; hard rules on
build isolation (worktree _build only, never clobber the shared binary) and not
pkilling the shared sx_server.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 22:18:03 +00:00


host-persist loop agent (single agent, builds the durable storage host)

Role: make lib/persist's durable backend actually durable. The persist substrate (lib/persist/**, 201/201 tests) performs {:op "persist/..." :args} IO requests for every storage op; under sx_server.exe today nothing services them, so writes silently vanish. You build the host-side adapter that answers those ops against real on-disk storage — the one piece standing between persist and "all subsystems share a durable substrate."
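A minimal sketch of what "servicing" those requests could look like on the host side, assuming a simplified request shape. The `value`, `request`, and `outcome` types and the `register`/`resolve` functions are hypothetical stand-ins, not the kernel's real types or resolver signature, and `persist/example-op` is a placeholder rather than an op from the contract. The design point it illustrates: an op under `persist/` with no handler is reported explicitly instead of being dropped.

```ocaml
(* Hypothetical stand-ins: the real value and request types live in the kernel. *)
type value =
  | Nil
  | Int of int
  | Str of string
  | List of value list

type request = { op : string; args : value list }

type outcome =
  | Not_ours              (* op is outside persist/, leave it to other arms *)
  | Handled of value      (* op was serviced by a registered handler *)
  | Unknown_op of string  (* persist/ op with no handler: report it loudly *)

let prefix = "persist/"

let has_prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Handlers keyed by the exact op name, so names are never rewritten. *)
let handlers : (string, value list -> value) Hashtbl.t = Hashtbl.create 16

let register op f = Hashtbl.replace handlers op f

let resolve (req : request) : outcome =
  if not (has_prefix req.op) then Not_ours
  else
    match Hashtbl.find_opt handlers req.op with
    | Some f -> Handled (f req.args)
    | None -> Unknown_op req.op
```

Keying the table on the full op string keeps the contract's names as the single source of truth; where this hooks into sx_server.ml is defined by the Blocker spec, not by this sketch.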

worktree:    /root/rose-ash-loops/host-persist
branch:      loops/host-persist   (push origin/loops/host-persist; NEVER main/architecture)

The authoritative contract — read this first, every restart

plans/persist-on-sx.md → Blockers → "OPEN — host durable-storage adapter". That entry is the spec: the silent-data-loss repro, the full op contract table, the hard invariants (monotonic last-seq, etc.), the blob adapter shape, where to register in sx_server.ml, and the acceptance test. Do not restate it here; read it there and implement it. The reference implementation to mirror is persist/serve in lib/persist/durable.sx (same op names, same shapes).

Restart baseline — check before iterating

  1. Read the Blocker spec (above) + this briefing.
  2. git log --oneline -8 on loops/host-persist to see what's done.
  3. Is there a worktree-local build? ls hosts/ocaml/_build/default/bin/sx_server.exe. Fresh worktrees have none — the first build is the first task.
  4. If an acceptance suite exists (e.g. hosts/ocaml/test/persist_durable_* or a lib/persist/tests/durable-real.sx), run it against the worktree-built binary. Green before new work.

The queue (phases)

  • Phase 0 — reproduce. Confirm the silent-data-loss repro from the spec under this worktree. Builds your mental model; costs one short run.
  • Phase 1 — storage module. A new OCaml module under hosts/ocaml/ that implements the op contract over real persistent storage. Start simple and correct: a filesystem-backed store (one append-only file per stream + a kv file + a per-stream seq high-water file), or SQLite if the toolchain has it. Honour every invariant in the spec — especially: last-seq is a monotonic counter stored separately from rows so it survives truncate; values round-trip structurally.
  • Phase 2 — register. Wire a "persist/..." arm into the kernel's IO resolver (Sx_types._cek_io_resolver, ~line 3864 of hosts/ocaml/bin/sx_server.ml) and/or the cek_run_with_io bridge path (~lines 528–576), dispatching to the storage module. Op names are the contract; do not rename them.
  • Phase 3 — acceptance. New tests that use persist/durable-backend (REAL perform, not the mock) run under the freshly-built worktree binary: the durable + recovery semantics must pass, and a real process restart (start the built server, write, stop it, start again, replay) must recover state from disk. Put host-owned tests under a host path (e.g. hosts/ocaml/test/) — do not churn persist's existing suites.
  • Phase 4 — blob adapter. Same pattern for blob/put|get|has? backed by a content-addressed directory; persist stores only the ref.
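The Phase 1 invariant that last-seq survives truncate can be sketched with the filesystem layout suggested above. This is an illustration under stated assumptions, not the spec's design: the `.log` and `.seq` file names, the tab-separated row format, plain string rows, and a truncate that drops every row are all hypothetical, and stream names are assumed to be safe as file names. Fsync and structural value encoding are left out.

```ocaml
(* Hypothetical layout: <dir>/<stream>.log holds rows, <dir>/<stream>.seq holds
   the high-water mark. Neither file name comes from the spec. *)
let log_path dir stream = Filename.concat dir (stream ^ ".log")
let seq_path dir stream = Filename.concat dir (stream ^ ".seq")

(* Read the high-water mark; a stream that was never written reports 0. *)
let last_seq dir stream =
  let path = seq_path dir stream in
  if not (Sys.file_exists path) then 0
  else
    let ic = open_in path in
    Fun.protect ~finally:(fun () -> close_in ic)
      (fun () -> int_of_string (String.trim (input_line ic)))

(* Write the new counter to a temp file, then rename it over the old one,
   so the counter file is always swapped in whole. *)
let write_seq dir stream n =
  let path = seq_path dir stream in
  let tmp = path ^ ".tmp" in
  let oc = open_out tmp in
  Fun.protect ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc (string_of_int n ^ "\n"));
  Sys.rename tmp path

(* Append one row and return its seq. The counter is bumped first: a crash
   between the two writes leaves a gap, never a reused seq. *)
let append dir stream row =
  let seq = last_seq dir stream + 1 in
  write_seq dir stream seq;
  let oc =
    open_out_gen [Open_wronly; Open_append; Open_creat] 0o644
      (log_path dir stream)
  in
  Fun.protect ~finally:(fun () -> close_out oc)
    (fun () -> Printf.fprintf oc "%d\t%s\n" seq row);
  seq

(* Drop all rows (open_out truncates the file); the counter file is untouched. *)
let truncate dir stream = close_out (open_out (log_path dir stream))

(* Return the raw log lines in file order. *)
let rows dir stream =
  let path = log_path dir stream in
  if not (Sys.file_exists path) then []
  else
    let ic = open_in path in
    Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
      let rec loop acc =
        match input_line ic with
        | line -> loop (line :: acc)
        | exception End_of_file -> List.rev acc
      in
      loop [])
```

Because the counter lives in its own file, appending after a truncate continues from the old high-water mark instead of restarting at 1.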

Every iteration: implement → build → test → commit (short factual message) → push → update plans/persist-on-sx.md (tick the Blocker toward CLOSED, append a dated Progress-log line, newest first) → next.

Ground rules (hard)

  • Build is your job (unlike the persist loop). But build only in this worktree's _build via dune from /root/rose-ash-loops/host-persist. NEVER overwrite the shared binary at /root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe — every sibling loop uses it; clobbering it breaks them all. Point acceptance tests at the worktree binary (hosts/ocaml/_build/default/bin/sx_server.exe inside this worktree).
  • First build is slow (full OCaml). The sx_build MCP tool has a ~600s watchdog that may kill it — prefer dune build bin/sx_server.exe (or @all) run via Bash with run_in_background: true and a long timeout, then poll.
  • NEVER pkill sx_server — siblings share the process/binary. Start your own server on a throwaway path/port for restart tests and stop only that PID; bound every run with timeout.
  • Scope: hosts/**, host-owned test files, and the Blocker entry + Progress log in plans/persist-on-sx.md. Do not modify lib/persist/** source (the persist loop owns it; its API is your contract, not your code) — if you need an upstream change, leave a note in the Blocker entry.
  • Determinism: replay from disk must equal the in-memory semantics; same log → same state.
  • Commits: one feature per commit; push to origin/loops/host-persist.
  • SX files: sx-tree MCP tools ONLY, file: not path:, sx_validate after edits. (Most of your work is OCaml — edit those with normal tools.)
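The determinism rule above (same log, same state) is easiest to keep when the live path and the recovery path share one pure step function. A minimal sketch, assuming a toy key-value log: the `entry` type, `apply`, and `replay` are hypothetical names for illustration, and the real op set is whatever the Blocker spec defines.

```ocaml
module State = Map.Make (String)

(* Hypothetical log entries; the real op set is defined by the spec. *)
type entry =
  | Put of string * string
  | Del of string

(* One pure step: no clocks, no randomness, nothing read outside the arguments. *)
let apply state = function
  | Put (k, v) -> State.add k v state
  | Del k -> State.remove k state

(* Recovery is a left fold over the log in seq order. *)
let replay log = List.fold_left apply State.empty log
```

If the running server updates its in-memory state with the same `apply` that recovery folds over the on-disk log, the two cannot drift apart, and replaying a given log twice yields equal states.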

Definition of done

The Blocker entry flips to CLOSED: persist/durable-backend writes land on disk, survive a real server restart, and the durable + recovery acceptance suites are green against the worktree-built binary. At that point a subsystem migrated per lib/persist/examples/acl.sx is genuinely durable.

Go. Read the Blocker spec; reproduce the gap; build the storage module.