# search-on-sx: Full-text + structured search on Haskell

rose-ash needs search across pages, posts, threads, federated content. Tokenize, index, query, rank, filter by visibility. Typed ADTs make query parsing clean, lazy lists make posting-list iteration efficient, and Haskell-on-SX is at 1514/1514.

End-state: a Haskell-on-SX layer with inverted index, query AST, boolean + phrase + ranked queries (TF-IDF, BM25), ACL-aware post-filter, and a federation extension that merges per-peer indices.

## Status (rolling)

`bash lib/search/conformance.sh` → **0/0** (not yet started)

## Ground rules

- **Scope:** only touch `lib/search/**` and `plans/search-on-sx.md`. Do **not** edit `spec/`, `hosts/`, `shared/`, `lib/haskell/**`, or other `lib//`. You may **import** from `lib/haskell/` (public API in `lib/haskell/haskell.sx`); do **not** modify Haskell.
- **Shared-file issues** go under "Blockers" with a minimal repro; do not fix here.
- **SX files:** use `sx-tree` MCP tools only.
- **Architecture:** index = `Map Term [(DocId, [Pos])]`. Query AST = ADT. Eval = fold of posting lists with set ops + ranking math. Ranking is pure (no IO until result emission).
- **Commits:** one feature per commit. Keep Progress log updated and tick boxes.
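The architecture rule above (positional inverted index, query ADT, evaluation by set operations) can be sketched in plain Haskell. This is a minimal illustration under stated assumptions, not the planned `lib/search` code: `String` stands in for `Text`, `DocId` and `Pos` are assumed to be `Int`, it uses the `containers` package rather than the Haskell-on-SX `Map`, and `Not` is interpreted as a complement against every document id present in the index. The helper names `postings`, `allDocs`, and `phraseDocs` are hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Char (isAlphaNum, toLower)

-- Assumed concrete types; the plan only fixes the shape Map Term [(DocId, [Pos])].
type Term  = String
type DocId = Int
type Pos   = Int
type Index = Map.Map Term [(DocId, [Pos])]

-- Query ADT as listed in Phase 2 (String in place of Text).
data Query
  = Term Term
  | And Query Query
  | Or Query Query
  | Not Query
  | Phrase [Term]

-- Lowercase, turn non-alphanumerics into spaces, split on whitespace,
-- and pair each term with its position in the document.
tokenize :: String -> [(Term, Pos)]
tokenize s = zip (words (map norm s)) [0 ..]
  where norm c = if isAlphaNum c then toLower c else ' '

-- Add one document. Re-inserting an existing DocId is not handled here;
-- a real implementation would delete the old postings first.
insert :: DocId -> String -> Index -> Index
insert d txt ix = Map.unionWith (++) ix docPostings
  where
    positions   = Map.fromListWith (flip (++)) [(t, [p]) | (t, p) <- tokenize txt]
    docPostings = Map.map (\ps -> [(d, ps)]) positions

postings :: Index -> Term -> [(DocId, [Pos])]
postings ix t = Map.findWithDefault [] t ix

-- Every document id that appears anywhere in the index (the universe for Not).
allDocs :: Index -> Set.Set DocId
allDocs ix = Set.fromList [d | ps <- Map.elems ix, (d, _) <- ps]

-- A document matches a phrase if some start position p has term i at p + i.
-- Shifting term i's positions by -i turns that into a set intersection.
phraseDocs :: Index -> [Term] -> Set.Set DocId
phraseDocs _  [] = Set.empty
phraseDocs ix ts = Map.keysSet (Map.filter (not . Set.null) starts)
  where
    shifted i t = Map.fromListWith Set.union
      [(d, Set.fromList (map (subtract i) ps)) | (d, ps) <- postings ix t]
    starts = foldr1 (Map.intersectionWith Set.intersection)
                    (zipWith shifted [0 ..] ts)

-- Boolean evaluation: each constructor maps to one set operation.
eval :: Index -> Query -> [DocId]
eval ix = Set.toAscList . go
  where
    go (Term t)    = Set.fromList (map fst (postings ix t))
    go (And a b)   = Set.intersection (go a) (go b)
    go (Or a b)    = Set.union (go a) (go b)
    go (Not q)     = Set.difference (allDocs ix) (go q)
    go (Phrase ts) = phraseDocs ix ts
```

To try it, load the file in GHCi, build an index with `insert 1 "Alice met Bob" Map.empty`, and evaluate queries such as `eval ix (Phrase ["met", "bob"])`. Materializing sets keeps the sketch short; the lazy posting-list merge the plan mentions would replace `Set.intersection` and `Set.union` with streaming merges over sorted lists.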
## Architecture sketch

```
Document                          Query
{:id :text :tags}                 "alice AND bob OR phrase \"x y\""
        │                                 │
        ▼                                 ▼
lib/search/tokenize.sx            lib/search/parse.sx
  — tokenize :: Text → [Term]       — parse :: Text → Query
  — normalize (lowercase, strip)    — Query = Term | And | Or
  — (optionally) stem                       | Not | Phrase
        │                                 │
        ▼                                 ▼
lib/search/index.sx               lib/search/eval.sx
  — Map Term [(DocId, [Pos])]       — eval :: Index → Query → [DocId]
  — insert / delete / lookup        — boolean + phrase positions
  — persistence (optional later)
        │                                 │
        ▼
        └────────────────► lib/search/rank.sx
                             — TF-IDF / BM25 scoring
                             — top-N
                                   │
                                   ▼
                           lib/search/api.sx
                             — (search/index doc)
                             — (search/query q)
                             — (search/top n q)
                                   │
                                   ▼
                           lib/search/fed.sx
                             — federated query (merge peer results)
                             — ACL filter post-merge
```

## Phase 1 — Tokenize + index

- [ ] `lib/search/tokenize.sx` — normalize (lowercase, strip punctuation), split on whitespace, return positions
- [ ] `lib/search/index.sx` — inverted index data structure (typed `Map` from haskell lib); `insert`, `delete`, `lookup`
- [ ] `lib/search/api.sx` — `(search/index doc)`, `(search/lookup term)`
- [ ] `lib/search/tests/index.sx` — 15+ cases: tokenize, insert + lookup, update, delete, multi-doc
- [ ] `lib/search/scoreboard.{json,md}`
- [ ] `lib/search/conformance.sh`

## Phase 2 — Query AST + boolean evaluation

- [ ] Query ADT: `Term Text | And Query Query | Or Query Query | Not Query | Phrase [Text]`
- [ ] `lib/search/parse.sx` — query syntax parser (boolean operators, quoted phrases)
- [ ] `lib/search/eval.sx` — boolean eval via set ops on posting lists
- [ ] phrase eval — adjacency check using positions
- [ ] `lib/search/tests/boolean.sx` — 25+ cases: term, and, or, not, phrase, composition, parser edge cases

## Phase 3 — Ranking

- [ ] document frequency tracking — extend index with `df` per term
- [ ] TF-IDF scoring
- [ ] BM25 scoring (configurable k1, b)
- [ ] top-N retrieval (heap-based)
- [ ] `lib/search/tests/rank.sx` — 20+ cases: TF-IDF behavior, BM25 vs TF-IDF, ranking stability, top-N correctness

## Phase 4 — ACL filter + federation

- [ ] post-filter — each candidate result tested via `(acl/permit? viewer :read doc)`
- [ ] federated query — fan out to peer instances via fed-sx, merge results
- [ ] merge policy — interleave by rank, dedupe by `(peer, doc-id)`
- [ ] `lib/search/tests/integration.sx` — federated search with ACL filter

## Progress log

(loop fills this in)

## Blockers

(loop fills this in)