Plans for acl-on-sx (Datalog), flow-on-sx (Scheme), feed-on-sx (APL), mod-on-sx (Prolog), search-on-sx (Haskell). Each is a 4-phase queue sitting on its respective guest language, targeting rose-ash needs: access control, durable workflows, activity feeds, moderation, search. Federation extension in Phase 4 of each (plugs into fed-sx). Briefings for the three loops we're kicking off now: acl-loop, flow-loop, feed-loop. mod-sx and search-sx briefings will follow once the first three have surfaced any shared infrastructure worth extracting to lib/guest/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# search-on-sx: Full-text + structured search on Haskell

rose-ash needs search across pages, posts, threads, federated content. Tokenize,
index, query, rank, filter by visibility. Typed ADTs make query parsing clean,
lazy lists make posting-list iteration efficient, and Haskell-on-SX is at 1514/1514.

End-state: a Haskell-on-SX layer with inverted index, query AST, boolean +
phrase + ranked queries (TF-IDF, BM25), ACL-aware post-filter, and a federation
extension that merges per-peer indices.

## Status (rolling)

`bash lib/search/conformance.sh` → **0/0** (not yet started)

## Ground rules

- **Scope:** only touch `lib/search/**` and `plans/search-on-sx.md`. Do **not** edit
  `spec/`, `hosts/`, `shared/`, `lib/haskell/**`, or other `lib/<lang>/`. You may
  **import** from `lib/haskell/` (public API in `lib/haskell/haskell.sx`); do **not**
  modify Haskell.
- **Shared-file issues** go under "Blockers" with a minimal repro; do not fix here.
- **SX files:** use `sx-tree` MCP tools only.
- **Architecture:** index = `Map Term [(DocId, [Pos])]`. Query AST = ADT. Eval =
  fold of posting lists with set ops + ranking math. Ranking is pure (no IO until
  result emission).
- **Commits:** one feature per commit. Keep Progress log updated and tick boxes.

## Architecture sketch

```
Document                            Query
{:id :text :tags}                   "alice AND bob OR phrase \"x y\""
        │                                   │
        ▼                                   ▼
lib/search/tokenize.sx              lib/search/parse.sx
— tokenize :: Text → [Term]         — parse :: Text → Query
— normalize (lowercase, strip)      — Query = Term | And | Or
— (optionally) stem                           | Not | Phrase
        │                                   │
        ▼                                   ▼
lib/search/index.sx                 lib/search/eval.sx
— Map Term [(DocId, [Pos])]         — eval :: Index → Query → [DocId]
— insert / delete / lookup          — boolean + phrase positions
— persistence (optional later)              │
        │                                   ▼
        └────────────────►          lib/search/rank.sx
                                    — TF-IDF / BM25 scoring
                                    — top-N
                                            │
                                            ▼
                                    lib/search/api.sx
                                    — (search/index doc)
                                    — (search/query q)
                                    — (search/top n q)
                                            │
                                            ▼
                                    lib/search/fed.sx
                                    — federated query (merge peer results)
                                    — ACL filter post-merge
```

## Phase 1 — Tokenize + index

- [ ] `lib/search/tokenize.sx` — normalize (lowercase, strip punctuation), split on
  whitespace, return positions
- [ ] `lib/search/index.sx` — inverted index data structure (typed `Map` from
  haskell lib); `insert`, `delete`, `lookup`
- [ ] `lib/search/api.sx` — `(search/index doc)`, `(search/lookup term)`
- [ ] `lib/search/tests/index.sx` — 15+ cases: tokenize, insert + lookup, update,
  delete, multi-doc
- [ ] `lib/search/scoreboard.{json,md}`
- [ ] `lib/search/conformance.sh`

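The tokenizer and index shape from the Phase 1 checklist can be sketched in plain Haskell as follows. This is an illustration only, not the Haskell-on-SX implementation: `String` stands in for `Text`, `DocId` is assumed to be an `Int`, and the names `insertDoc`, `deleteDoc`, and `lookupTerm` are hypothetical. "Strip punctuation" is read here as deleting non-alphanumeric characters before splitting.

```haskell
import           Data.Char       (isAlphaNum, isSpace, toLower)
import qualified Data.Map.Strict as Map

type Term  = String   -- assumption: String stands in for Text
type DocId = Int      -- assumption: numeric document ids
type Pos   = Int
type Index = Map.Map Term [(DocId, [Pos])]

-- Lowercase, delete punctuation, split on whitespace, and pair
-- every term with its 0-based token position.
tokenize :: String -> [(Term, Pos)]
tokenize txt = zip (words cleaned) [0 ..]
  where
    cleaned = [toLower c | c <- txt, isAlphaNum c || isSpace c]

-- Add one document's postings; positions are grouped per term in
-- ascending order. The document is assumed not to be indexed yet.
insertDoc :: DocId -> String -> Index -> Index
insertDoc doc txt idx = Map.unionWith (++) idx fresh
  where
    fresh = Map.map (\ps -> [(doc, ps)])
          $ Map.fromListWith (flip (++)) [(t, [p]) | (t, p) <- tokenize txt]

-- Remove every posting of a document and drop terms left without postings.
deleteDoc :: DocId -> Index -> Index
deleteDoc doc = Map.filter (not . null) . Map.map (filter ((/= doc) . fst))

-- Posting list for a term; an unknown term yields the empty list.
lookupTerm :: Term -> Index -> [(DocId, [Pos])]
lookupTerm = Map.findWithDefault []
```

Under these assumptions an update is `insertDoc d newText . deleteDoc d`, which keeps a single posting per document and term.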
## Phase 2 — Query AST + boolean evaluation

- [ ] Query ADT: `Term Text | And Query Query | Or Query Query | Not Query |
  Phrase [Text]`
- [ ] `lib/search/parse.sx` — query syntax parser (boolean operators, quoted phrases)
- [ ] `lib/search/eval.sx` — boolean eval via set ops on posting lists
- [ ] phrase eval — adjacency check using positions
- [ ] `lib/search/tests/boolean.sx` — 25+ cases: term, and, or, not, phrase,
  composition, parser edge cases

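A minimal sketch of the Query ADT and its evaluation, assuming the index shape `Map Term [(DocId, [Pos])]` from the ground rules. It uses `String` in place of `Text`, takes the universe for `Not` to be every document id present in the index (an assumption, since the plan does not say where the universe comes from), and leaves the parser out. The helper names `postings`, `allDocs`, and `phraseDocs` are hypothetical.

```haskell
import           Data.List       (intersect)
import qualified Data.Map.Strict as Map
import qualified Data.Set        as Set

type DocId = Int
type Pos   = Int
type Index = Map.Map String [(DocId, [Pos])]

data Query
  = Term String
  | And Query Query
  | Or Query Query
  | Not Query
  | Phrase [String]

-- Posting list of a term, keyed by document.
postings :: Index -> String -> Map.Map DocId [Pos]
postings idx t = Map.fromList (Map.findWithDefault [] t idx)

-- Assumption: the universe for Not is every document id in the index.
allDocs :: Index -> Set.Set DocId
allDocs = Set.fromList . map fst . concat . Map.elems

-- Boolean evaluation as set operations over posting lists.
eval :: Index -> Query -> [DocId]
eval idx = Set.toAscList . go
  where
    go (Term t)    = Map.keysSet (postings idx t)
    go (And a b)   = go a `Set.intersection` go b
    go (Or a b)    = go a `Set.union` go b
    go (Not q)     = allDocs idx `Set.difference` go q
    go (Phrase ts) = phraseDocs idx ts

-- Adjacency check: shift the positions of the i-th phrase term back by i,
-- so a phrase match appears as a start position shared by all terms.
phraseDocs :: Index -> [String] -> Set.Set DocId
phraseDocs _   [] = Set.empty
phraseDocs idx ts = Map.keysSet (Map.filter (not . null) starts)
  where
    shifted = [Map.map (map (subtract i)) (postings idx t) | (i, t) <- zip [0 ..] ts]
    starts  = foldr1 (Map.intersectionWith intersect) shifted
```

The list-based `intersect` is quadratic in the number of positions; a merge over sorted position lists would be the natural replacement once posting lists grow.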
## Phase 3 — Ranking

- [ ] document frequency tracking — extend index with `df` per term
- [ ] TF-IDF scoring
- [ ] BM25 scoring (configurable k1, b)
- [ ] top-N retrieval (heap-based)
- [ ] `lib/search/tests/rank.sx` — 20+ cases: TF-IDF behavior, BM25 vs TF-IDF,
  ranking stability, top-N correctness

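The scoring items above might look roughly like this in plain Haskell. The sketch makes several assumptions: `df` is derived from posting-list length rather than stored, the `Stats` record and all function names are hypothetical, the BM25 IDF is one common variant, `ln (1 + (N - df + 0.5) / (df + 0.5))`, and top-N is done with a full sort for brevity instead of the heap the plan calls for.

```haskell
import           Data.List       (sortBy)
import           Data.Ord        (Down (..), comparing)
import qualified Data.Map.Strict as Map

type DocId = Int
type Pos   = Int
type Index = Map.Map String [(DocId, [Pos])]

-- Hypothetical corpus statistics kept alongside the index.
data Stats = Stats
  { docCount :: Int                -- N, number of indexed documents
  , docLens  :: Map.Map DocId Int  -- tokens per document
  }

-- Configurable BM25 parameters k1 and b.
data BM25 = BM25 { bmK1 :: Double, bmB :: Double }

avgLen :: Stats -> Double
avgLen s = fromIntegral (sum (Map.elems (docLens s)))
         / fromIntegral (max 1 (docCount s))

-- TF-IDF weight of one term in one document: tf * ln (N / df).
tfIdf :: Stats -> Int -> Int -> Double
tfIdf s df tf = fromIntegral tf * log (fromIntegral (docCount s) / fromIntegral df)

-- BM25 score per document, summed over the query terms.
bm25 :: BM25 -> Stats -> Index -> [String] -> Map.Map DocId Double
bm25 cfg s idx terms = Map.fromListWith (+)
  [ (doc, idf * tf * (bmK1 cfg + 1) / (tf + bmK1 cfg * norm doc))
  | t <- terms
  , let ps  = Map.findWithDefault [] t idx
  , let df  = fromIntegral (length ps)
  , let idf = log (1 + (n - df + 0.5) / (df + 0.5))
  , (doc, pos) <- ps
  , let tf = fromIntegral (length pos)
  ]
  where
    n        = fromIntegral (docCount s)
    norm doc = 1 - bmB cfg + bmB cfg * docLen doc / avgLen s
    docLen d = fromIntegral (Map.findWithDefault 0 d (docLens s))

-- Top-N by descending score; ties fall back to DocId so ranking is stable.
topN :: Int -> Map.Map DocId Double -> [(DocId, Double)]
topN k = take k . sortBy (comparing (\(d, sc) -> (Down sc, d))) . Map.toList
```

Scoring stays pure, in line with the ground rule that ranking does no IO until result emission. Breaking ties on `DocId` is one way to get the ranking stability the test list mentions.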
## Phase 4 — ACL filter + federation

- [ ] post-filter — each candidate result tested via `(acl/permit? viewer :read doc)`
- [ ] federated query — fan out to peer instances via fed-sx, merge results
- [ ] merge policy — interleave by rank, dedupe by `(peer, doc-id)`
- [ ] `lib/search/tests/integration.sx` — federated search with ACL filter

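One possible reading of the merge policy and post-filter, sketched in plain Haskell. It assumes that scores from different peers are directly comparable, so "interleave by rank" becomes a merge by descending score; a round-robin over per-peer rank positions would be an equally valid reading. The `Hit` record, `mergeHits`, and `federatedTop` are hypothetical names, and the `permit` argument stands in for `(acl/permit? viewer :read doc)`.

```haskell
import           Data.List (sortBy)
import           Data.Ord  (Down (..), comparing)
import qualified Data.Set  as Set

type Peer  = String   -- hypothetical peer identifier
type DocId = Int

data Hit = Hit { hitPeer :: Peer, hitDoc :: DocId, hitScore :: Double }
  deriving (Eq, Show)

-- Merge per-peer result lists by descending score, keeping only the
-- best-scoring hit for each (peer, doc-id) pair.
mergeHits :: [[Hit]] -> [Hit]
mergeHits = dedupe Set.empty . sortBy (comparing (Down . hitScore)) . concat
  where
    dedupe _ [] = []
    dedupe seen (h : hs)
      | key `Set.member` seen = dedupe seen hs
      | otherwise             = h : dedupe (Set.insert key seen) hs
      where
        key = (hitPeer h, hitDoc h)

-- ACL filter applied after the merge, then truncated to the top n.
-- permit stands in for (acl/permit? viewer :read doc).
federatedTop :: (Hit -> Bool) -> Int -> [[Hit]] -> [Hit]
federatedTop permit n = take n . filter permit . mergeHits
```

Filtering before `take n` means a viewer still gets up to `n` results when some merged hits are not readable, at the cost of checking more candidates.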
## Progress log

(loop fills this in)

## Blockers

(loop fills this in)