Plans for acl-on-sx (Datalog), flow-on-sx (Scheme), feed-on-sx (APL), mod-on-sx (Prolog), search-on-sx (Haskell). Each is a 4-phase queue sitting on its respective guest language, targeting rose-ash needs: access control, durable workflows, activity feeds, moderation, search. Federation extension in Phase 4 of each (plugs into fed-sx). Briefings for the three loops we're kicking off now: acl-loop, flow-loop, feed-loop. mod-sx and search-sx briefings will follow once the first three have surfaced any shared infrastructure worth extracting to lib/guest/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
search-on-sx: Full-text + structured search on Haskell
rose-ash needs search across pages, posts, threads, and federated content: tokenize, index, query, rank, and filter by visibility. Typed ADTs make query parsing clean, lazy lists make posting-list iteration efficient, and Haskell-on-SX is at 1514/1514.
End-state: a Haskell-on-SX layer with inverted index, query AST, boolean + phrase + ranked queries (TF-IDF, BM25), ACL-aware post-filter, and a federation extension that merges per-peer indices.
Status (rolling)
`bash lib/search/conformance.sh` → 0/0 (not yet started)
Ground rules
- Scope: only touch `lib/search/**` and `plans/search-on-sx.md`. Do not edit `spec/`, `hosts/`, `shared/`, `lib/haskell/**`, or other `lib/<lang>/`. You may import from `lib/haskell/` (public API in `lib/haskell/haskell.sx`); do not modify Haskell.
- Shared-file issues go under "Blockers" with a minimal repro; do not fix here.
- SX files: use `sx-tree` MCP tools only.
- Architecture: index = `Map Term [(DocId, [Pos])]`. Query AST = ADT. Eval = fold of posting lists with set ops + ranking math. Ranking is pure (no IO until result emission).
- Commits: one feature per commit. Keep Progress log updated and tick boxes.
Architecture sketch
    Document                                Query
    {:id :text :tags}                       "alice AND bob OR phrase \"x y\""
            │                                       │
            ▼                                       ▼
    lib/search/tokenize.sx                  lib/search/parse.sx
    - tokenize :: Text → [Term]             - parse :: Text → Query
    - normalize (lowercase, strip)          - Query = Term | And | Or
    - (optionally) stem                             | Not | Phrase
            │                                       │
            ▼                                       ▼
    lib/search/index.sx                     lib/search/eval.sx
    - Map Term [(DocId, [Pos])]             - eval :: Index → Query → [DocId]
    - insert / delete / lookup              - boolean + phrase positions
    - persistence (optional later)                  │
            │                                       ▼
            └────────────────────────────►  lib/search/rank.sx
                                            - TF-IDF / BM25 scoring
                                            - top-N
                                                    │
                                                    ▼
                                            lib/search/api.sx
                                            - (search/index doc)
                                            - (search/query q)
                                            - (search/top n q)
                                                    │
                                                    ▼
                                            lib/search/fed.sx
                                            - federated query (merge peer results)
                                            - ACL filter post-merge
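The `parse :: Text → Query` step in the sketch could look roughly like the following. This is a minimal sketch in plain GHC Haskell, not the planned `lib/search/parse.sx`. The plan does not fix a grammar, so everything here is an assumption: `NOT` binds tighter than `AND`, `AND` binds tighter than `OR`, keywords are upper-case only, and two terms with no operator between them are rejected. `Tok`, `lexQ`, and `parseQuery` are hypothetical names.

```haskell
import Data.Char (isSpace, toLower)

data Query
  = Term String
  | And Query Query
  | Or Query Query
  | Not Query
  | Phrase [String]
  deriving (Show, Eq)

-- Hypothetical token type for the lexer stage.
data Tok = TAnd | TOr | TNot | TWord String | TQuote [String]
  deriving (Show, Eq)

-- Split the input into operator, word, and quoted-phrase tokens.
lexQ :: String -> [Tok]
lexQ [] = []
lexQ (c : cs)
  | isSpace c = lexQ cs
  | c == '"'  = let (body, rest) = break (== '"') cs
                in TQuote (words (map toLower body)) : lexQ (drop 1 rest)
  | otherwise = let (w, rest) = break (\x -> isSpace x || x == '"') (c : cs)
                in keyword w : lexQ rest
  where
    keyword "AND" = TAnd
    keyword "OR"  = TOr
    keyword "NOT" = TNot
    keyword w     = TWord (map toLower w)

-- orExpr := andExpr ("OR" orExpr)?   (assumed lowest precedence)
parseOr :: [Tok] -> Maybe (Query, [Tok])
parseOr ts = do
  (l, rest) <- parseAnd ts
  case rest of
    TOr : rest' -> do
      (r, rest'') <- parseOr rest'
      Just (Or l r, rest'')
    _ -> Just (l, rest)

-- andExpr := notExpr ("AND" andExpr)?
parseAnd :: [Tok] -> Maybe (Query, [Tok])
parseAnd ts = do
  (l, rest) <- parseNot ts
  case rest of
    TAnd : rest' -> do
      (r, rest'') <- parseAnd rest'
      Just (And l r, rest'')
    _ -> Just (l, rest)

-- notExpr := "NOT" notExpr | word | quoted phrase
parseNot :: [Tok] -> Maybe (Query, [Tok])
parseNot (TNot : ts) = do
  (q, rest) <- parseNot ts
  Just (Not q, rest)
parseNot (TWord w : ts)   = Just (Term w, ts)
parseNot (TQuote ws : ts) = Just (Phrase ws, ts)
parseNot _                = Nothing

-- Succeeds only when the whole input is consumed.
parseQuery :: String -> Maybe Query
parseQuery s = case parseOr (lexQ s) of
  Just (q, []) -> Just q
  _            -> Nothing
```

Under these assumptions `alice AND bob OR "x y"` groups as `(alice AND bob) OR "x y"`. Whether adjacent terms should mean an implicit `AND` is left open by the plan and would change `parseAnd`.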
Phase 1 — Tokenize + index
- `lib/search/tokenize.sx`: normalize (lowercase, strip punctuation), split on whitespace, return positions
- `lib/search/index.sx`: inverted index data structure (typed `Map` from haskell lib); `insert`, `delete`, `lookup`
- `lib/search/api.sx`: `(search/index doc)`, `(search/lookup term)`
- `lib/search/tests/index.sx`: 15+ cases covering tokenize, insert + lookup, update, delete, multi-doc
- `lib/search/scoreboard.{json,md}`
- `lib/search/conformance.sh`
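As a rough illustration of the Phase 1 data flow, here is a minimal sketch in plain GHC Haskell using `Data.Map.Strict` from `containers`. Whether Haskell-on-SX exposes the same module is an assumption, and `termPositions` and `lookupTerm` are hypothetical helper names. Re-inserting a document replaces its old postings, which is one possible reading of "update".

```haskell
import           Data.Char (isAlphaNum, toLower)
import qualified Data.Map.Strict as Map

type Term  = String
type DocId = Int
type Pos   = Int
type Index = Map.Map Term [(DocId, [Pos])]

-- Lowercase, strip punctuation, split on whitespace, and number the
-- surviving tokens from 0.
tokenize :: String -> [(Term, Pos)]
tokenize txt = zip (filter (not . null) (map clean (words txt))) [0 ..]
  where
    clean = map toLower . filter isAlphaNum

-- Positions of each term within one document, in ascending order.
termPositions :: String -> Map.Map Term [Pos]
termPositions txt =
  Map.fromListWith (flip (++)) [(t, [p]) | (t, p) <- tokenize txt]

-- Remove every posting for a document; drop terms left with no postings.
delete :: DocId -> Index -> Index
delete d = Map.filter (not . null) . Map.map (filter ((/= d) . fst))

-- Insert (or replace) a document. New postings are appended, so posting
-- lists are in insertion order, not necessarily sorted by DocId.
insert :: DocId -> String -> Index -> Index
insert d txt idx =
  Map.unionWith (++) (delete d idx)
                     (Map.map (\ps -> [(d, ps)]) (termPositions txt))

-- Posting list for a term; empty when the term is unknown.
lookupTerm :: Term -> Index -> [(DocId, [Pos])]
lookupTerm = Map.findWithDefault []
```

For example, indexing document 1 as `"Hello, world! Hello"` and document 2 as `"Hello again"` leaves `"hello"` with two postings, one per document, and deleting document 1 removes `"world"` from the index entirely.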
Phase 2 — Query AST + boolean evaluation
- Query ADT: `Term Text | And Query Query | Or Query Query | Not Query | Phrase [Text]`
- `lib/search/parse.sx`: query syntax parser (boolean operators, quoted phrases)
- `lib/search/eval.sx`: boolean eval via set ops on posting lists
- phrase eval: adjacency check using positions
- `lib/search/tests/boolean.sx`: 25+ cases covering term, and, or, not, phrase, composition, parser edge cases
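The boolean and phrase evaluation described above can be sketched as a fold over the query ADT. This is a minimal sketch in plain GHC Haskell with `containers`, not the planned `lib/search/eval.sx`. It returns a `Set DocId` rather than the `[DocId]` shown in the architecture sketch, and it treats `Not` as the complement against every document present in the index; both are assumptions. `postings`, `allDocs`, and `phraseDocs` are hypothetical helper names.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type DocId = Int
type Pos   = Int
type Index = Map.Map String [(DocId, [Pos])]

data Query
  = Term String
  | And Query Query
  | Or Query Query
  | Not Query
  | Phrase [String]
  deriving (Show, Eq)

-- Postings for one term, keyed by document.
postings :: Index -> String -> Map.Map DocId [Pos]
postings idx t = Map.fromList (Map.findWithDefault [] t idx)

-- Universe used by Not: every document that has at least one posting.
allDocs :: Index -> Set.Set DocId
allDocs = Set.fromList . map fst . concat . Map.elems

eval :: Index -> Query -> Set.Set DocId
eval idx q = case q of
  Term t    -> Map.keysSet (postings idx t)
  And a b   -> Set.intersection (eval idx a) (eval idx b)
  Or a b    -> Set.union (eval idx a) (eval idx b)
  Not a     -> Set.difference (allDocs idx) (eval idx a)
  Phrase ts -> phraseDocs idx ts

-- Adjacency check: a document matches when some start position p has
-- word i of the phrase at position p + i, for every i.
phraseDocs :: Index -> [String] -> Set.Set DocId
phraseDocs _ [] = Set.empty
phraseDocs idx ts = Map.keysSet (Map.filter (not . Set.null) starts)
  where
    -- Shift each word's positions back by its offset in the phrase so
    -- that matching occurrences line up on the same start position.
    shifted =
      [ Map.map (Set.fromList . map (subtract i)) (postings idx t)
      | (i, t) <- zip [0 ..] ts
      ]
    starts = foldr1 (Map.intersectionWith Set.intersection) shifted
```

`Set.toAscList (eval idx q)` recovers a list when the `[DocId]` shape is needed. A lazy merge of sorted posting lists would avoid building sets, but it requires posting lists sorted by `DocId`, which this sketch does not assume.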
Phase 3 — Ranking
- document frequency tracking: extend index with `df` per term
- TF-IDF scoring
- BM25 scoring (configurable k1, b)
- top-N retrieval (heap-based)
- `lib/search/tests/rank.sx`: 20+ cases covering TF-IDF behavior, BM25 vs TF-IDF, ranking stability, top-N correctness
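The scoring items above might be sketched as pure functions like these, in plain GHC Haskell with `containers`. Several things are assumptions rather than the plan's design: `Corpus` and its fields are hypothetical, `df` is read off the posting-list length instead of a stored counter, the BM25 idf is the commonly used non-negative variant `ln((N - df + 0.5) / (df + 0.5) + 1)`, and `topN` uses a full sort instead of the heap the plan calls for.

```haskell
import           Data.List (sortBy)
import qualified Data.Map.Strict as Map
import           Data.Ord (Down (..), comparing)

type Term  = String
type DocId = Int
type Pos   = Int

-- Hypothetical container: the inverted index plus per-document token counts.
data Corpus = Corpus
  { cIndex   :: Map.Map Term [(DocId, [Pos])]
  , cDocLens :: Map.Map DocId Int
  }

-- Term frequency: number of positions of the term in the document.
tf :: Corpus -> Term -> DocId -> Double
tf c t d =
  maybe 0 (fromIntegral . length) (lookup d (Map.findWithDefault [] t (cIndex c)))

-- Document frequency: number of documents containing the term.
df :: Corpus -> Term -> Double
df c t = fromIntegral (length (Map.findWithDefault [] t (cIndex c)))

nDocs :: Corpus -> Double
nDocs = fromIntegral . Map.size . cDocLens

-- Classic TF-IDF: sum over query terms of tf * ln(N / df).
tfidf :: Corpus -> [Term] -> DocId -> Double
tfidf c ts d = sum [tf c t d * log (nDocs c / df c t) | t <- ts, df c t > 0]

-- BM25 with configurable k1 and b.
bm25 :: Double -> Double -> Corpus -> [Term] -> DocId -> Double
bm25 k1 b c ts d = sum (map termScore ts)
  where
    n     = nDocs c
    dl    = fromIntegral (Map.findWithDefault 0 d (cDocLens c))
    avgdl = fromIntegral (sum (Map.elems (cDocLens c))) / max 1 n
    norm  = if avgdl > 0 then dl / avgdl else 1
    termScore t
      | f == 0    = 0
      | otherwise = idf * f * (k1 + 1) / (f + k1 * (1 - b + b * norm))
      where
        f   = tf c t d
        idf = log ((n - df c t + 0.5) / (df c t + 0.5) + 1)

-- Top-N by descending score; ties broken by ascending DocId so the
-- ordering is deterministic. Full sort here, not a heap.
topN :: Int -> (DocId -> Double) -> [DocId] -> [(DocId, Double)]
topN n score ds =
  take n (sortBy (comparing (Down . snd) <> comparing fst)
                 [(d, score d) | d <- ds])
```

A call such as `topN 10 (bm25 1.2 0.75 corpus terms) candidates` ranks the candidate set from boolean evaluation; the values 1.2 and 0.75 are conventional defaults, not something the plan specifies. Scoring stays pure, matching the ground rule that ranking does no IO.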
Phase 4 — ACL filter + federation
- post-filter: each candidate result tested via `(acl/permit? viewer :read doc)`
- federated query: fan out to peer instances via fed-sx, merge results
- merge policy: interleave by rank, dedupe by `(peer, doc-id)`
- `lib/search/tests/integration.sx`: federated search with ACL filter
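One possible shape for the merge policy and post-filter, as a minimal sketch in plain GHC Haskell. The `Hit` record and the function names are hypothetical. "Interleave by rank" is read here as ordering by descending score, which assumes peer scores are comparable. The `permitted` predicate is a stand-in for `(acl/permit? viewer :read doc)`, whose real interface is not shown in this plan. No fed-sx calls are modelled; the input is simply one result list per peer.

```haskell
import           Data.List (sortBy)
import           Data.Ord (Down (..), comparing)
import qualified Data.Set as Set

type Peer  = String
type DocId = Int

-- Hypothetical result record returned by each peer.
data Hit = Hit
  { hitPeer  :: Peer
  , hitDoc   :: DocId
  , hitScore :: Double
  } deriving (Show, Eq)

-- Merge per-peer result lists: order by descending score, then keep only
-- the first (highest-scoring) hit for each (peer, doc-id) pair.
mergeHits :: [[Hit]] -> [Hit]
mergeHits = dedupe Set.empty . sortBy (comparing (Down . hitScore)) . concat
  where
    dedupe _ [] = []
    dedupe seen (h : hs)
      | key `Set.member` seen = dedupe seen hs
      | otherwise             = h : dedupe (Set.insert key seen) hs
      where
        key = (hitPeer h, hitDoc h)

-- ACL post-filter applied after the merge, as in the architecture sketch.
federated :: (Hit -> Bool) -> [[Hit]] -> [Hit]
federated permitted = filter permitted . mergeHits
```

Filtering after the merge keeps the ACL check in one place, at the cost of ranking documents the viewer may never see. If per-peer scores turn out not to be comparable, a round-robin interleave by per-peer rank would be an alternative reading of the merge policy.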
Progress log
(loop fills this in)
Blockers
(loop fills this in)