Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
search-on-sx loop agent (single agent, queue-driven)
Role: iterates plans/search-on-sx.md forever. Full-text + structured search on
Haskell — tokenize, inverted index, query AST, boolean + phrase + ranked
queries (TF-IDF / BM25), ACL-aware post-filter, federated index merge. Typed ADTs
make query parsing clean; lazy lists make posting-list iteration efficient. Sits on
lib/haskell/ (1514/1514 already green); adds a search-shaped vocabulary on top.
description: search-on-sx queue loop
subagent_type: general-purpose
run_in_background: true
isolation: worktree
Prompt
You are the sole background agent working plans/search-on-sx.md. Isolated
worktree /root/rose-ash-loops/search on branch loops/search, forever, one
commit per feature. Push to origin/loops/search after every commit. Never touch
main or architecture.
Restart baseline — check before iterating
- Read plans/search-on-sx.md: roadmap + Progress log.
- Run ls lib/search/ and pick up from the most advanced file.
- If lib/search/tests/*.sx exist, run them via bash lib/search/conformance.sh. Green before new work.
- If lib/search/scoreboard.md exists, that's your baseline.
- Read the lib/haskell/ public API once; that's your substrate. lib/haskell/haskell.sx exists; also study runtime.sx, eval.sx, parser.sx, infer.sx, match.sx, map.sx, set.sx, testlib.sx. Learn how to declare ADTs, pattern match, and use the Map/Set helpers before writing index code. Verify the real exported names with sx_find_all / grep; don't assume from the plan's sketch.
The queue
Phase order per plans/search-on-sx.md:
- Phase 1: tokenize + inverted index + simple term lookup (Map Term [(DocId,[Pos])], insert/lookup, (search/index doc), (search/query term)).
- Phase 2: query AST + boolean/phrase eval (Term | And | Or | Not | Phrase; posting-list set ops; positional phrase match).
- Phase 3: ranking (TF-IDF, BM25), top-N.
- Phase 4: ACL-aware post-filter + federation (merge per-peer indices).
Within a phase, pick the checkbox that unlocks the most tests per effort.
Every iteration: implement → test → commit → tick [ ] → Progress log → next.
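For orientation only, the Phase 1 index shape and the Phase 2 positional phrase match can be sketched in Python. This is a language-neutral illustration, not the lib/search/ implementation: the function names (tokenize, index_doc, lookup, phrase_docs) and the exact normalization rule are assumptions, and the real code must use whatever the lib/haskell/ substrate actually exports.

```python
import re
from collections import defaultdict

# Hypothetical helper names; the index maps term -> [(doc_id, [pos, ...])],
# mirroring the plan's Map Term [(DocId,[Pos])] shape.

def tokenize(text):
    """Lowercase and split on non-alphanumerics; shared by index and query side."""
    return re.findall(r"[a-z0-9]+", text.lower())

def index_doc(index, doc_id, text):
    """Add one document to the index, recording every position of every term."""
    positions = defaultdict(list)
    for pos, term in enumerate(tokenize(text)):
        positions[term].append(pos)
    for term, pos_list in positions.items():
        postings = index.setdefault(term, [])
        postings.append((doc_id, pos_list))
        postings.sort(key=lambda p: p[0])  # keep postings ordered by doc id
    return index

def lookup(index, term):
    """Postings for one term, normalized the same way as indexed text."""
    toks = tokenize(term)
    return index.get(toks[0], []) if toks else []

def phrase_docs(index, phrase):
    """Doc ids where the phrase's terms occur at consecutive positions."""
    terms = tokenize(phrase)
    if not terms:
        return []
    # Candidate start positions per document, seeded from the first term.
    starts = {d: set(ps) for d, ps in index.get(terms[0], [])}
    for offset, term in enumerate(terms[1:], start=1):
        here = {d: set(ps) for d, ps in index.get(term, [])}
        starts = {d: {s for s in ss if s + offset in here[d]}
                  for d, ss in starts.items() if d in here}
    return sorted(d for d, ss in starts.items() if ss)
```

Because lookup and phrase_docs run the query through the same tokenize as index_doc, a query for "QUICK" finds documents indexed from "quick", which is the index/query symmetry the tests should pin down.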
Ground rules (hard)
- Scope: only lib/search/** and plans/search-on-sx.md. Do not edit spec/, hosts/, shared/, other lib/<lang>/ dirs, lib/stdlib.sx, or the lib/ root. May import from lib/haskell/ only (its public API). Do not modify Haskell.
- NEVER call sx_build. 600s watchdog. If the sx_server binary is broken → Blockers entry, stop. Run tests by invoking the sx_server binary directly from a conformance.sh (model it on lib/haskell/conformance.sh), pointing SX_SERVER at /root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe. Fresh worktrees have no _build/, so the relative path won't resolve.
- Shared-file issues → plan's Blockers with minimal repro; don't fix here.
- SX files: sx-tree MCP tools ONLY. They take file: not path:. A wrong key yields Yojson Type_error("Expected string, got null"), which looks like a broken binary but is just a param mismatch. sx_validate after edits. Path-based edits (sx_replace_node) count comment headers in their indices and can clobber the wrong node; re-read after, or prefer sx_write_file for small files.
- Unicode in .sx: raw UTF-8 only, never \uXXXX escapes.
- Commit granularity: one feature per commit. Short factual messages (search: phrase query positional match + 7 tests). Push to origin/loops/search.
- Plan file: update Progress log (newest first) + tick boxes every commit.
search-specific gotchas
- Posting lists are the hot path. Keep them sorted by DocId so boolean AND/OR are linear merges, not nested scans. Phrase match needs positions, so store (DocId, [Pos]). Don't drop positions early to save space; you can't recover them.
- Tokenization decides recall. Normalize consistently (lowercase, strip punctuation) on BOTH index and query side, or queries silently miss. Test the index/query symmetry explicitly.
- Ranking must be deterministic on ties. TF-IDF/BM25 scores collide; always add a stable tiebreak (DocId ascending) or tests flake.
- ACL filter is per-viewer and post-ranking. Filter the result list against the viewer, after scoring. Never bake visibility into the index (the same index serves all viewers). Inject the permit predicate; don't hardwire an ACL module that doesn't exist yet.
- Federation merges indices, not results. Merging per-peer inverted indices (union posting lists per term) is cleaner and rank-correct vs merging ranked result lists. Mock peer indices in tests.
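The gotchas above can be illustrated with a small Python sketch: a linear-merge AND over sorted doc-id lists, a per-term union of peer indices, a score sort with a DocId tiebreak, and a post-ranking filter that takes an injected permit predicate. All names here (intersect, merge_indices, rank, search) are hypothetical, the scoring is plain TF-IDF rather than BM25, globally unique doc ids across peers are assumed, and applying top-N after the filter is one possible reading of "post-ranking", not something the plan specifies.

```python
import math

def intersect(a, b):
    """Boolean AND of two ascending doc-id lists in a single linear pass."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def merge_indices(peers):
    """Federation: union posting lists per term, then re-sort by doc id.
    Assumes doc ids are unique across peers."""
    merged = {}
    for index in peers:
        for term, postings in index.items():
            merged.setdefault(term, []).extend(postings)
    return {t: sorted(p, key=lambda x: x[0]) for t, p in merged.items()}

def rank(index, n_docs, terms):
    """Sum of tf * idf per document; ties broken by ascending doc id."""
    scores = {}
    for term in terms:
        postings = index.get(term, [])
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, positions in postings:
            scores[doc_id] = scores.get(doc_id, 0.0) + len(positions) * idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def search(index, n_docs, terms, permit, top_n=10):
    """Score first, then drop documents the viewer may not see, then cut to top_n."""
    ranked = rank(index, n_docs, terms)
    return [(d, s) for d, s in ranked if permit(d)][:top_n]
```

In this sketch the index never stores visibility; the same merged index can serve any viewer by passing a different permit function to search, and a test can mock peers as plain dictionaries.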
General gotchas (all loops)
- SX do = R7RS iteration. Use begin for multi-expr sequences.
- cond/when/let clauses evaluate only the last expr; wrap multiples in begin.
- let is parallel, not sequential; nest lets when a binding references an earlier one.
- env-bind! creates a binding; env-set! mutates an existing one (walks scope chain).
- sx_validate after every structural edit.
- Namespace-prefix all guest helpers (search/...); short/host-colliding names get silently shadowed or hang the runtime.
Style
- No comments in .sx unless non-obvious.
- No new planning docs; update plans/search-on-sx.md inline.
- Short, factual commit messages.
- One feature per iteration. Commit. Log. Push. Next.
Go. Start by reading the plan; find the first unchecked [ ]; implement it.