rose-ash/plans/search-on-sx.md
giles c3a0727645
plans: five rose-ash subsystem plans + three loop briefings
Plans for acl-on-sx (Datalog), flow-on-sx (Scheme), feed-on-sx (APL),
mod-on-sx (Prolog), search-on-sx (Haskell). Each is a 4-phase queue
sitting on its respective guest language, targeting rose-ash needs:
access control, durable workflows, activity feeds, moderation, search.
Federation extension in Phase 4 of each (plugs into fed-sx).

Briefings for the three loops we're kicking off now: acl-loop,
flow-loop, feed-loop. mod-sx and search-sx briefings will follow
once the first three have surfaced any shared infrastructure
worth extracting to lib/guest/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 15:55:39 +00:00

search-on-sx: Full-text + structured search on Haskell

rose-ash needs search across pages, posts, threads, and federated content. The pipeline is: tokenize, index, query, rank, and filter by visibility. Haskell fits the job: typed ADTs make query parsing clean, lazy lists make posting-list iteration efficient, and Haskell-on-SX is at 1514/1514.

End-state: a Haskell-on-SX layer with inverted index, query AST, boolean + phrase + ranked queries (TF-IDF, BM25), ACL-aware post-filter, and a federation extension that merges per-peer indices.

Status (rolling)

bash lib/search/conformance.sh → 0/0 (not yet started)

Ground rules

  • Scope: only touch lib/search/** and plans/search-on-sx.md. Do not edit spec/, hosts/, shared/, lib/haskell/**, or other lib/<lang>/. You may import from lib/haskell/ (public API in lib/haskell/haskell.sx); do not modify Haskell.
  • Shared-file issues go under "Blockers" with a minimal repro; do not fix here.
  • SX files: use sx-tree MCP tools only.
  • Architecture: index = Map Term [(DocId, [Pos])]. Query AST = ADT. Eval = fold of posting lists with set ops + ranking math. Ranking is pure (no IO until result emission).
  • Commits: one feature per commit. Keep Progress log updated and tick boxes.

Architecture sketch

Document                               Query
  {:id :text :tags}                       "alice AND bob OR phrase \"x y\""
        │                                       │
        ▼                                       ▼
lib/search/tokenize.sx                  lib/search/parse.sx
  — tokenize :: Text → [Term]             — parse :: Text → Query
  — normalize (lowercase, strip)          — Query = Term | And | Or
  — (optionally) stem                              | Not | Phrase
        │                                       │
        ▼                                       ▼
lib/search/index.sx                     lib/search/eval.sx
  — Map Term [(DocId, [Pos])]             — eval :: Index → Query → [DocId]
  — insert / delete / lookup              — boolean + phrase positions
  — persistence (optional later)                 │
        │                                       ▼
        └────────────────► lib/search/rank.sx
                            — TF-IDF / BM25 scoring
                            — top-N
                                  │
                                  ▼
                          lib/search/api.sx
                            — (search/index doc)
                            — (search/query q)
                            — (search/top n q)
                                  │
                                  ▼
                          lib/search/fed.sx
                            — federated query (merge peer results)
                            — ACL filter post-merge
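
To make the Query column of the sketch concrete, here is a minimal recursive-descent parser in plain Haskell. It is only an illustration under assumptions this plan does not fix: String stands in for Text, OR binds looser than AND, NOT is a prefix operator, a bare quoted string is a phrase (the `phrase` keyword in the diagram's example is not modeled), and terms are not normalized. The names `parseQuery`, `lexQuery`, `pOr`, `pAnd`, and `pUnary` are hypothetical, and the sketch has not been run on Haskell-on-SX.

```haskell
import Data.Char (isSpace)

data Query
  = Term String
  | And Query Query
  | Or Query Query
  | Not Query
  | Phrase [String]
  deriving (Eq, Show)

-- Hypothetical token type: words, quoted phrases, and the three operators.
data Tok = TWord String | TQuote [String] | TAnd | TOr | TNot
  deriving (Eq, Show)

-- Split the input into tokens; an unterminated quote runs to end of input.
lexQuery :: String -> [Tok]
lexQuery s = case dropWhile isSpace s of
  "" -> []
  ('"' : rest) ->
    let (body, after) = break (== '"') rest
    in TQuote (words body) : lexQuery (drop 1 after)
  s' ->
    let (w, after) = break (\c -> isSpace c || c == '"') s'
    in keyword w : lexQuery after
  where
    keyword "AND" = TAnd
    keyword "OR"  = TOr
    keyword "NOT" = TNot
    keyword w     = TWord w

-- Assumed grammar:
--   or    := and ("OR" and)*
--   and   := unary ("AND" unary)*
--   unary := "NOT" unary | word | quoted
-- Returns Nothing on any syntax error or leftover tokens.
parseQuery :: String -> Maybe Query
parseQuery s = case pOr (lexQuery s) of
  Just (q, []) -> Just q
  _            -> Nothing

pOr :: [Tok] -> Maybe (Query, [Tok])
pOr ts = pAnd ts >>= uncurry loop
  where
    loop l (TOr : rest) = pAnd rest >>= \(r, rest') -> loop (Or l r) rest'
    loop l rest         = Just (l, rest)

pAnd :: [Tok] -> Maybe (Query, [Tok])
pAnd ts = pUnary ts >>= uncurry loop
  where
    loop l (TAnd : rest) = pUnary rest >>= \(r, rest') -> loop (And l r) rest'
    loop l rest          = Just (l, rest)

pUnary :: [Tok] -> Maybe (Query, [Tok])
pUnary (TNot : rest)      = fmap (\(q, rest') -> (Not q, rest')) (pUnary rest)
pUnary (TWord w : rest)   = Just (Term w, rest)
pUnary (TQuote ws : rest) = Just (Phrase ws, rest)
pUnary _                  = Nothing
```

Under these assumptions, `parseQuery "alice AND bob OR \"x y\""` yields `Just (Or (And (Term "alice") (Term "bob")) (Phrase ["x","y"]))`. Returning `Maybe Query` rather than a bare `Query` is a design choice of this sketch, made so parser edge cases can be tested without exceptions.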

Phase 1 — Tokenize + index

  • lib/search/tokenize.sx — normalize (lowercase, strip punctuation), split on whitespace, return positions
  • lib/search/index.sx — inverted index data structure (typed Map from haskell lib); insert, delete, lookup
  • lib/search/api.sx — (search/index doc), (search/lookup term)
  • lib/search/tests/index.sx — 15+ cases: tokenize, insert + lookup, update, delete, multi-doc
  • lib/search/scoreboard.{json,md}
  • lib/search/conformance.sh
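
The tokenize and index items above could be sketched as follows in plain Haskell. This is a hedged illustration, not the planned lib/search code: it uses `Data.Map.Strict` from the containers package rather than the typed Map from the Haskell-on-SX lib, String instead of Text, and Int for DocId. The names `docPostings`, `insertDoc`, `deleteDoc`, and `lookupTerm` are hypothetical.

```haskell
import Data.Char (isAlphaNum, toLower)
import Data.List (insertBy)
import Data.Ord (comparing)
import qualified Data.Map.Strict as Map

type Term  = String
type DocId = Int   -- assumption: the plan does not fix the id type
type Pos   = Int
type Index = Map.Map Term [(DocId, [Pos])]

-- Split on whitespace, lowercase, strip non-alphanumerics, drop tokens
-- that become empty, then number the surviving terms from 0.
tokenize :: String -> [(Term, Pos)]
tokenize text = zip terms [0 ..]
  where
    terms     = filter (not . null) (map normalize (words text))
    normalize = map toLower . filter isAlphaNum

-- Positions of each term within one document, in ascending order.
docPostings :: String -> Map.Map Term [Pos]
docPostings text =
  Map.fromListWith (flip (++)) [(t, [p]) | (t, p) <- tokenize text]

-- Remove every posting for a document; terms left with no postings vanish.
deleteDoc :: DocId -> Index -> Index
deleteDoc d = Map.mapMaybe prune
  where
    prune ps = case filter ((/= d) . fst) ps of
      []  -> Nothing
      ps' -> Just ps'

-- Insert or update: old postings for the document are dropped first,
-- and each posting list stays sorted by DocId.
insertDoc :: DocId -> String -> Index -> Index
insertDoc d text idx = Map.unionWith merge (deleteDoc d idx) fresh
  where
    fresh     = Map.map (\ps -> [(d, ps)]) (docPostings text)
    merge old = foldr (insertBy (comparing fst)) old

-- Posting list for a term, or the empty list if the term is unknown.
lookupTerm :: Term -> Index -> [(DocId, [Pos])]
lookupTerm = Map.findWithDefault []
```

Treating insert as delete-then-insert keeps the update case simple at the cost of a full pass over the index; a reverse map from DocId to its terms would avoid that pass, if it turns out to matter.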

Phase 2 — Query AST + boolean evaluation

  • Query ADT: Term Text | And Query Query | Or Query Query | Not Query | Phrase [Text]
  • lib/search/parse.sx — query syntax parser (boolean operators, quoted phrases)
  • lib/search/eval.sx — boolean eval via set ops on posting lists
  • phrase eval — adjacency check using positions
  • lib/search/tests/boolean.sx — 25+ cases: term, and, or, not, phrase, composition, parser edge cases
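
A minimal sketch of the boolean and phrase evaluation listed above, in plain Haskell with `Data.Set` doing the set operations. Several points are assumptions rather than decisions of this plan: String stands in for Text, DocId is Int, `Not` is evaluated as the complement within the documents that appear somewhere in the index, and the helper names `docsOf`, `allDocs`, and `phraseDocs` are hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Term  = String
type DocId = Int
type Pos   = Int
type Index = Map.Map Term [(DocId, [Pos])]

data Query
  = Term Term
  | And Query Query
  | Or Query Query
  | Not Query
  | Phrase [Term]
  deriving (Eq, Show)

-- Documents containing a term.
docsOf :: Index -> Term -> Set.Set DocId
docsOf idx t = Set.fromList (map fst (Map.findWithDefault [] t idx))

-- Every document that has at least one posting.
allDocs :: Index -> Set.Set DocId
allDocs idx = Set.fromList [d | ps <- Map.elems idx, (d, _) <- ps]

-- Boolean evaluation as a fold over the query tree; results are ascending.
eval :: Index -> Query -> [DocId]
eval idx = Set.toAscList . go
  where
    go (Term t)    = docsOf idx t
    go (And a b)   = Set.intersection (go a) (go b)
    go (Or a b)    = Set.union (go a) (go b)
    go (Not q)     = Set.difference (allDocs idx) (go q)
    go (Phrase ws) = phraseDocs idx ws

-- Adjacency check: shift the positions of the i-th word back by i, then
-- intersect per document. Any surviving position is a phrase start.
phraseDocs :: Index -> [Term] -> Set.Set DocId
phraseDocs _ []   = Set.empty
phraseDocs idx ws = Map.keysSet (Map.filter (not . Set.null) starts)
  where
    shifted =
      [ Map.fromList
          [ (d, Set.fromList (map (subtract i) ps))
          | (d, ps) <- Map.findWithDefault [] w idx ]
      | (i, w) <- zip [0 ..] ws ]
    starts = foldr1 (Map.intersectionWith Set.intersection) shifted
```

The position-shifting trick turns the phrase check into the same intersection operation used for `And`, which keeps the evaluator small; a streaming merge over sorted posting lists would be the more memory-friendly variant.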

Phase 3 — Ranking

  • document frequency tracking — extend index with df per term
  • TF-IDF scoring
  • BM25 scoring (configurable k1, b)
  • top-N retrieval (heap-based)
  • lib/search/tests/rank.sx — 20+ cases: TF-IDF behavior, BM25 vs TF-IDF, ranking stability, top-N correctness
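
The ranking items above could look roughly like this in plain Haskell. Everything beyond the names TF-IDF, BM25, k1, and b is an assumption of the sketch: document frequency is read off the posting-list length instead of being stored, a hypothetical `Corpus` record carries per-document token counts, TF-IDF uses raw term frequency times ln(N/df), BM25 uses the non-negative idf variant ln((N - df + 0.5)/(df + 0.5) + 1), and a size-bounded `Data.Set` stands in for the heap in top-N.

```haskell
import Data.List (foldl')
import Data.Ord (Down (..))
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Term  = String
type DocId = Int
type Pos   = Int
type Index = Map.Map Term [(DocId, [Pos])]

-- Hypothetical wrapper: the index plus token counts per document.
data Corpus = Corpus
  { corpusIndex  :: Index
  , corpusDocLen :: Map.Map DocId Int
  }

-- Number of occurrences of a term in a document.
termFreq :: Index -> Term -> DocId -> Double
termFreq idx t d =
  maybe 0 (fromIntegral . length) (lookup d (Map.findWithDefault [] t idx))

-- Number of documents containing a term.
docFreq :: Index -> Term -> Double
docFreq idx t = fromIntegral (length (Map.findWithDefault [] t idx))

docCount :: Corpus -> Double
docCount = fromIntegral . Map.size . corpusDocLen

tfIdf :: Corpus -> [Term] -> DocId -> Double
tfIdf c ts d =
  sum [ termFreq idx t d * log (docCount c / df)
      | t <- ts, let df = docFreq idx t, df > 0 ]
  where idx = corpusIndex c

-- BM25 with caller-supplied k1 and b.
bm25 :: Double -> Double -> Corpus -> [Term] -> DocId -> Double
bm25 k1 b c ts d =
  sum [ idf t * f * (k1 + 1) / (f + k1 * (1 - b + b * len / avgLen))
      | t <- ts, let f = termFreq idx t d, f > 0 ]
  where
    idx    = corpusIndex c
    n      = docCount c
    len    = fromIntegral (Map.findWithDefault 0 d (corpusDocLen c))
    avgLen = fromIntegral (sum (Map.elems (corpusDocLen c))) / n
    idf t  = log ((n - df + 0.5) / (df + 0.5) + 1)
      where df = docFreq idx t

-- Keep at most n entries while folding; ties go to the smaller DocId.
topN :: Int -> [(DocId, Double)] -> [(DocId, Double)]
topN n scored = [(d, s) | (s, Down d) <- Set.toDescList kept]
  where
    kept = foldl' step Set.empty scored
    step acc (d, s)
      | Set.size acc' > n = Set.deleteMin acc'
      | otherwise         = acc'
      where acc' = Set.insert (s, Down d) acc
```

Both scorers are pure functions of the corpus and the query terms, in line with the ground rule that ranking does no IO. Breaking score ties by DocId is one way to get the ranking stability the tests call for.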

Phase 4 — ACL filter + federation

  • post-filter — each candidate result tested via (acl/permit? viewer :read doc)
  • federated query — fan out to peer instances via fed-sx, merge results
  • merge policy — interleave by rank, dedupe by (peer, doc-id)
  • lib/search/tests/integration.sx — federated search with ACL filter
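
One possible reading of the merge policy and post-filter above, sketched in plain Haskell. It assumes that each peer returns hits already sorted by descending score, that scores are comparable across peers (if they are not, a round-robin interleave by rank position would be the alternative), and that the ACL check is passed in as a plain predicate standing in for (acl/permit? viewer :read doc). The `Hit` record and the function names are hypothetical; the fan-out through fed-sx is not modeled.

```haskell
import qualified Data.Set as Set

type Peer  = String
type DocId = Int

-- Hypothetical result record returned by each peer.
data Hit = Hit
  { hitPeer  :: Peer
  , hitDoc   :: DocId
  , hitScore :: Double
  } deriving (Eq, Show)

-- Merge two lists that are each sorted by descending score.
-- On equal scores the left list wins, so the merge is stable.
mergeRanked :: [Hit] -> [Hit] -> [Hit]
mergeRanked [] ys = ys
mergeRanked xs [] = xs
mergeRanked (x : xs) (y : ys)
  | hitScore x >= hitScore y = x : mergeRanked xs (y : ys)
  | otherwise                = y : mergeRanked (x : xs) ys

-- Keep the first (highest-ranked) hit for each (peer, doc-id) pair.
dedupe :: [Hit] -> [Hit]
dedupe = go Set.empty
  where
    go _ [] = []
    go seen (h : hs)
      | key `Set.member` seen = go seen hs
      | otherwise             = h : go (Set.insert key seen) hs
      where key = (hitPeer h, hitDoc h)

-- Merge all peers' results, dedupe, then apply the ACL predicate last.
federated :: (Hit -> Bool) -> [[Hit]] -> [Hit]
federated permitted = filter permitted . dedupe . foldr mergeRanked []
```

Because the merge and the filter are lazy, taking the first n results of `federated` only forces as much of each peer's list as is needed, which fits the lazy-list rationale given at the top of the plan.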

Progress log

(loop fills this in)

Blockers

(loop fills this in)