# search-on-sx loop agent (single agent, queue-driven)

Role: iterates `plans/search-on-sx.md` forever. **Full-text + structured search on
Haskell** — tokenize, inverted index, query AST, boolean + phrase + ranked
queries (TF-IDF / BM25), ACL-aware post-filter, federated index merge. Typed ADTs
make query parsing clean; lazy lists make posting-list iteration efficient. Sits on
`lib/haskell/` (1514/1514 already green); adds a search-shaped vocabulary on top.

```
description: search-on-sx queue loop
subagent_type: general-purpose
run_in_background: true
isolation: worktree
```

## Prompt

You are the sole background agent working `plans/search-on-sx.md`. Work in the
isolated worktree `/root/rose-ash-loops/search` on branch `loops/search`, forever,
one commit per feature. Push to `origin/loops/search` after every commit. Never
touch `main` or `architecture`.

## Restart baseline — check before iterating

1. Read `plans/search-on-sx.md` — roadmap + Progress log.
2. `ls lib/search/` — pick up from the most advanced file.
3. If `lib/search/tests/*.sx` exist, run them via `bash lib/search/conformance.sh`.
   Green before new work.
4. If `lib/search/scoreboard.md` exists, that's your baseline.
5. Read the `lib/haskell/` public API once; that's your substrate.
   `lib/haskell/haskell.sx` exists; also study `runtime.sx`, `eval.sx`, `parser.sx`,
   `infer.sx`, `match.sx`, `map.sx`, `set.sx`, `testlib.sx`. Learn how to declare ADTs,
   pattern match, and use the `Map`/`Set` helpers before writing index code. Verify the
   real exported names with `sx_find_all` / `grep`; don't assume from the plan's sketch.

## The queue

Phase order per `plans/search-on-sx.md`:

- **Phase 1** — tokenize + inverted index + simple term lookup
  (`Map Term [(DocId,[Pos])]`, insert/lookup, `(search/index doc)`,
  `(search/query term)`).
- **Phase 2** — query AST + boolean/phrase eval (Term | And | Or | Not | Phrase;
  posting-list set ops; positional phrase match).
- **Phase 3** — ranking (TF-IDF, BM25), top-N.
- **Phase 4** — ACL-aware post-filter + federation (merge per-peer indices).
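
For orientation, the Phase 1 and Phase 2 data shape can be sketched as below. This is a minimal illustration in Python, not the Haskell-on-SX implementation: `tokenize`, `index_docs`, `lookup` and `phrase` are hypothetical names, and the ASCII-only tokenizer is an assumption. The real code should use the `Map` helpers actually exported by `lib/haskell/`.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (ASCII only in this sketch).

    The same function is used for documents and for queries.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def index_docs(docs):
    """Build term -> [(doc_id, [positions])], postings sorted by doc_id."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return {term: sorted(postings.items()) for term, postings in index.items()}


def lookup(index, term):
    """Posting list for a single term, normalized like indexed text."""
    tokens = tokenize(term)
    return index.get(tokens[0], []) if tokens else []


def phrase(index, text):
    """Doc ids where the query tokens occur at consecutive positions."""
    terms = tokenize(text)
    if not terms:
        return []
    per_term = [dict(index.get(t, [])) for t in terms]
    hits = []
    for doc_id in sorted(per_term[0]):
        if not all(doc_id in postings for postings in per_term):
            continue
        starts = per_term[0][doc_id]
        if any(all(s + i in per_term[i][doc_id] for i in range(len(terms)))
               for s in starts):
            hits.append(doc_id)
    return hits
```

With `docs = {2: "The quick brown fox.", 1: "Quick! The fox is brown."}`, `lookup(index_docs(docs), "QUICK")` returns `[(1, [0]), (2, [1])]`, and `phrase(index_docs(docs), "quick brown")` returns `[2]`, because only doc 2 has the two tokens adjacent.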

Within a phase, pick the checkbox that unlocks the most tests per effort.

Every iteration: implement → test → commit → tick `[ ]` → Progress log → next.

## Ground rules (hard)

- **Scope:** only `lib/search/**` and `plans/search-on-sx.md`. Do **not** edit
  `spec/`, `hosts/`, `shared/`, other `lib/<lang>/` dirs, `lib/stdlib.sx`, or
  `lib/` root. May **import** from `lib/haskell/` only (its public API). Do **not**
  modify `lib/haskell/` itself.
- **NEVER call `sx_build`.** 600s watchdog. If the sx_server binary is broken →
  Blockers entry, stop. Run tests by invoking the sx_server binary directly from a
  conformance.sh (model it on `lib/haskell/conformance.sh`), pointing `SX_SERVER`
  at `/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe` — fresh
  worktrees have no `_build/`, so the relative path won't resolve.
- **Shared-file issues** → plan's Blockers with minimal repro; don't fix here.
- **SX files:** `sx-tree` MCP tools ONLY. **They take `file:` not `path:`** — a
  wrong key yields `Yojson Type_error("Expected string, got null")`, which looks
  like a broken binary but is just a param mismatch. `sx_validate` after edits.
  Path-based edits (`sx_replace_node`) count comment headers in their indices and
  can clobber the wrong node — re-read after, or prefer `sx_write_file` for small
  files.
- **Unicode in `.sx`:** raw UTF-8 only, never `\uXXXX` escapes.
- **Commit granularity:** one feature per commit. Short factual messages
  (`search: phrase query positional match + 7 tests`). Push to `origin/loops/search`.
- **Plan file:** update Progress log (newest first) + tick boxes every commit.

## search-specific gotchas

- **Posting lists are the hot path.** Keep them sorted by DocId so boolean AND/OR
  are linear merges, not nested scans. Phrase match needs positions, so store
  `(DocId, [Pos])` — don't drop positions early to save space; you can't recover them.
- **Tokenization decides recall.** Normalize consistently (lowercase, strip
  punctuation) on BOTH index and query side, or queries silently miss. Test the
  index/query symmetry explicitly.
- **Ranking must be deterministic on ties.** TF-IDF/BM25 scores collide; always
  add a stable tiebreak (DocId ascending) or tests flake.
- **ACL filter is per-viewer and post-ranking.** Filter the result list against the
  viewer, after scoring — never bake visibility into the index (the same index
  serves all viewers). Inject the permit predicate; don't hardwire an ACL module
  that doesn't exist yet.
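- **Federation merges indices, not results.** Merging per-peer inverted indices
  (union posting lists per term) is cleaner and rank-correct: IDF and length
  statistics are then computed over the combined corpus, whereas ranked result lists
  from separate peers carry scores that are not comparable. Mock peer indices in tests.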
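
Four of the gotchas above can be sketched in a few lines. This is a minimal illustration in Python, not the Haskell-on-SX code; `intersect`, `rank`, `visible` and `merge_indices` are hypothetical names, posting lists are assumed to be `[(doc_id, [positions])]` sorted by doc id, and doc ids are assumed to be globally unique across peers (a doc id shared by two peers has its positions unioned).

```python
def intersect(a, b):
    """Boolean AND as a linear merge of two DocId-sorted posting lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            out.append(a[i][0])
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return out


def rank(scores):
    """Sort by score descending, then DocId ascending so ties are stable."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def visible(ranked, permit, viewer):
    """Post-filter an already ranked list with an injected permit predicate."""
    return [(doc_id, score) for doc_id, score in ranked if permit(viewer, doc_id)]


def merge_indices(*peers):
    """Union posting lists per term across peer indices, keeping DocId order."""
    merged = {}
    for peer in peers:
        for term, postings in peer.items():
            by_doc = merged.setdefault(term, {})
            for doc_id, positions in postings:
                seen = set(by_doc.get(doc_id, []))
                by_doc[doc_id] = sorted(seen | set(positions))
    return {term: sorted(by_doc.items()) for term, by_doc in merged.items()}
```

A top-N query in this sketch would be `visible(rank(scores), permit, viewer)[:n]`: truncating after the filter means a viewer still gets up to N results when some high-scoring documents are hidden from them.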

## General gotchas (all loops)

- SX `do` = R7RS iteration. Use `begin` for multi-expr sequences.
- `cond`/`when`/`let` clauses evaluate only the last expr — wrap multiples in `begin`.
- `let` is parallel, not sequential — nest `let`s when a binding references an earlier one.
- `env-bind!` creates a binding; `env-set!` mutates an existing one (walks scope chain).
- `sx_validate` after every structural edit.
- Namespace-prefix all guest helpers (`search/...`) — short/host-colliding names
  get silently shadowed or hang the runtime.

## Style

- No comments in `.sx` unless non-obvious.
- No new planning docs — update `plans/search-on-sx.md` inline.
- Short, factual commit messages.
- One feature per iteration. Commit. Log. Push. Next.

Go. Start by reading the plan; find the first unchecked `[ ]`; implement it.