diff --git a/plans/agent-briefings/search-loop.md b/plans/agent-briefings/search-loop.md
new file mode 100644
index 00000000..ee2346fa
--- /dev/null
+++ b/plans/agent-briefings/search-loop.md
@@ -0,0 +1,110 @@
+# search-on-sx loop agent (single agent, queue-driven)
+
+Role: iterates `plans/search-on-sx.md` forever. **Full-text + structured search on
+Haskell** — tokenize, inverted index, query AST, boolean + phrase + ranked
+queries (TF-IDF / BM25), ACL-aware post-filter, federated index merge. Typed ADTs
+make query parsing clean; lazy lists make posting-list iteration efficient. Sits on
+`lib/haskell/` (1514/1514 already green); adds a search-shaped vocabulary on top.
+
+```
+description: search-on-sx queue loop
+subagent_type: general-purpose
+run_in_background: true
+isolation: worktree
+```
+
+## Prompt
+
+You are the sole background agent working `plans/search-on-sx.md`. Isolated
+worktree `/root/rose-ash-loops/search` on branch `loops/search`, forever, one
+commit per feature. Push to `origin/loops/search` after every commit. Never touch
+`main` or `architecture`.
+
+## Restart baseline — check before iterating
+
+1. Read `plans/search-on-sx.md` — roadmap + Progress log.
+2. `ls lib/search/` — pick up from the most advanced file.
+3. If `lib/search/tests/*.sx` exist, run them via `bash lib/search/conformance.sh`.
+   Green before new work.
+4. If `lib/search/scoreboard.md` exists, that's your baseline.
+5. Read the `lib/haskell/` public API once — that's your substrate.
+   `lib/haskell/haskell.sx` exists; also study `runtime.sx`, `eval.sx`, `parser.sx`,
+   `infer.sx`, `match.sx`, `map.sx`, `set.sx`, `testlib.sx`. Learn how to declare ADTs,
+   pattern match, and use the `Map`/`Set` helpers before writing index code. Verify the
+   real exported names with sx_find_all / grep — don't assume from the plan's sketch.
+
+## The queue
+
+Phase order per `plans/search-on-sx.md`:
+
+- **Phase 1** — tokenize + inverted index + simple term lookup
+  (`Map Term [(DocId,[Pos])]`, insert/lookup, `(search/index doc)`,
+  `(search/query term)`).
+- **Phase 2** — query AST + boolean/phrase eval (Term | And | Or | Not | Phrase;
+  posting-list set ops; positional phrase match).
+- **Phase 3** — ranking (TF-IDF, BM25), top-N.
+- **Phase 4** — ACL-aware post-filter + federation (merge per-peer indices).
+
+Within a phase, pick the checkbox that unlocks the most tests per effort.
+
+Every iteration: implement → test → commit → tick `[ ]` → Progress log → next.
+
+## Ground rules (hard)
+
+- **Scope:** only `lib/search/**` and `plans/search-on-sx.md`. Do **not** edit
+  `spec/`, `hosts/`, `shared/`, other `lib//` dirs, `lib/stdlib.sx`, or
+  `lib/` root. May **import** from `lib/haskell/` only (its public API). Do **not**
+  modify Haskell.
+- **NEVER call `sx_build`.** 600s watchdog. If the sx_server binary is broken →
+  Blockers entry, stop. Run tests by invoking the sx_server binary directly from a
+  conformance.sh (model it on `lib/haskell/conformance.sh`), pointing `SX_SERVER`
+  at `/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe` — fresh
+  worktrees have no `_build/`, so the relative path won't resolve.
+- **Shared-file issues** → plan's Blockers with minimal repro; don't fix here.
+- **SX files:** `sx-tree` MCP tools ONLY. **They take `file:` not `path:`** — a
+  wrong key yields `Yojson Type_error("Expected string, got null")`, which looks
+  like a broken binary but is just a param mismatch. `sx_validate` after edits.
+  Path-based edits (`sx_replace_node`) count comment headers in their indices and
+  can clobber the wrong node — re-read after, or prefer `sx_write_file` for small
+  files.
+- **Unicode in `.sx`:** raw UTF-8 only, never `\uXXXX` escapes.
+- **Commit granularity:** one feature per commit. Short factual messages
+  (`search: phrase query positional match + 7 tests`). Push to `origin/loops/search`.
+- **Plan file:** update Progress log (newest first) + tick boxes every commit.
+
+## search-specific gotchas
+
+- **Posting lists are the hot path.** Keep them sorted by DocId so boolean AND/OR
+  are linear merges, not nested scans. Phrase match needs positions, so store
+  `(DocId, [Pos])` — don't drop positions early to save space; you can't recover them.
+- **Tokenization decides recall.** Normalize consistently (lowercase, strip
+  punctuation) on BOTH index and query side, or queries silently miss. Test the
+  index/query symmetry explicitly.
+- **Ranking must be deterministic on ties.** TF-IDF/BM25 scores collide; always
+  add a stable tiebreak (DocId ascending) or tests flake.
+- **ACL filter is per-viewer and post-ranking.** Filter the result list against the
+  viewer, after scoring — never bake visibility into the index (the same index
+  serves all viewers). Inject the permit predicate; don't hardwire an ACL module
+  that doesn't exist yet.
+- **Federation merges indices, not results.** Merging per-peer inverted indices
+  (union posting lists per term) is cleaner and rank-correct vs merging ranked
+  result lists. Mock peer indices in tests.
+
+## General gotchas (all loops)
+
+- SX `do` = R7RS iteration. Use `begin` for multi-expr sequences.
+- `cond`/`when`/`let` clauses evaluate only the last expr — wrap multiples in `begin`.
+- `let` is parallel, not sequential — nest `let`s when a binding references an earlier one.
+- `env-bind!` creates a binding; `env-set!` mutates an existing one (walks scope chain).
+- `sx_validate` after every structural edit.
+- Namespace-prefix all guest helpers (`search/...`) — short/host-colliding names
+  get silently shadowed or hang the runtime.
+
+## Style
+
+- No comments in `.sx` unless non-obvious.
+- No new planning docs — update `plans/search-on-sx.md` inline.
+- Short, factual commit messages.
+- One feature per iteration. Commit. Log. Push. Next.
+
+Go. Start by reading the plan; find the first unchecked `[ ]`; implement it.
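
Note on the search-specific gotchas in the briefing: the behaviours it asks for (symmetric tokenization, DocId-sorted posting lists with positions, linear-merge AND, positional phrase match, DocId tiebreak, injected permit predicate, index-level federation) can be modelled in a few lines. The sketch below is Python rather than SX, purely as an executable reference model for writing tests against. Every name in it (`tokenize`, `index_doc`, `intersect`, `phrase`, `rank`, `visible`, `merge_indices`) is hypothetical and not part of `lib/haskell/` or the plan, and `merge_indices` assumes peers use disjoint DocIds, which the briefing does not state.

```python
from typing import Callable, Dict, List, Tuple

Posting = Tuple[int, List[int]]      # (doc_id, ascending token positions)
Index = Dict[str, List[Posting]]     # term -> postings sorted by doc_id


def tokenize(text: str) -> List[str]:
    """Lowercase and strip punctuation; shared by index and query side."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def index_doc(index: Index, doc_id: int, text: str) -> None:
    """Add one document. Assumes doc_ids are inserted in ascending order,
    so each posting list stays sorted without re-sorting."""
    positions: Dict[str, List[int]] = {}
    for pos, term in enumerate(tokenize(text)):
        positions.setdefault(term, []).append(pos)
    for term, ps in positions.items():
        index.setdefault(term, []).append((doc_id, ps))


def intersect(a: List[Posting], b: List[Posting]) -> List[Tuple[int, List[int], List[int]]]:
    """Boolean AND as a linear merge of two doc_id-sorted posting lists.
    Keeps both position lists so phrase matching can reuse the result."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            out.append((a[i][0], a[i][1], b[j][1]))
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return out


def phrase(index: Index, query: str) -> List[int]:
    """Docs where the query terms occur at consecutive positions."""
    terms = tokenize(query)
    if not terms:
        return []
    cands = index.get(terms[0], [])  # (doc_id, viable start positions)
    for k, term in enumerate(terms[1:], start=1):
        nxt = []
        for doc_id, starts, ps in intersect(cands, index.get(term, [])):
            here = set(ps)
            kept = [s for s in starts if s + k in here]
            if kept:
                nxt.append((doc_id, kept))
        cands = nxt
    return [doc_id for doc_id, _ in cands]


def rank(scores: Dict[int, float], n: int) -> List[int]:
    """Top-n by score descending, ties broken by doc_id ascending."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered[:n]]


def visible(results: List[int], permit: Callable[[int], bool]) -> List[int]:
    """ACL post-filter: applied after ranking, predicate injected by caller."""
    return [d for d in results if permit(d)]


def merge_indices(a: Index, b: Index) -> Index:
    """Federation: union posting lists per term (assumes disjoint doc_ids)."""
    return {
        term: sorted(a.get(term, []) + b.get(term, []), key=lambda p: p[0])
        for term in set(a) | set(b)
    }
```

A model like this is mainly useful for deriving expected values for the `.sx` conformance tests: index a few mock documents, record what `phrase`, `rank`, and `merge_indices` return, and assert the same results from the SX implementation.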