search: Phase 2 query AST + boolean/phrase eval + 28 tests

Query ADT (Term|And|Or|Not|Phrase) and evalQuery over docid-sorted posting lists: boolean ops are linear merges, Not subtracts from the allDocs universe, and Phrase checks positional adjacency.

Batched both test suites into one program eval each (search-batch) so they finish under heavy CPU load. 46/46 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:47:42 +00:00
parent b8cf3eb1b8
commit 0f0da0319c
9 changed files with 264 additions and 125 deletions
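The boolean strategy described in the commit message can be sketched in Haskell, the language the search library is written in. This is a minimal sketch under stated assumptions, not the code in query.sx: docids are taken to be `Int`, each Term's posting list is assumed to be already reduced to an ascending, duplicate-free list of docids, the merge bodies are guesses at what the `sortedUnion`/`sortedInter`/`sortedDiff` helpers named in the plan do, and `notDocs` is a hypothetical name for the Not case, which receives the allDocs universe as an argument. The block has no `main`; load it into GHCi to try it.

```haskell
-- Sketch of boolean set operations over docid-sorted posting lists.
-- Assumption: DocId is Int and every input list is ascending and
-- duplicate-free; each merge then returns a list with the same property.
type DocId = Int

-- Or: keep every docid from either list, emitting shared ids once.
sortedUnion :: [DocId] -> [DocId] -> [DocId]
sortedUnion xs [] = xs
sortedUnion [] ys = ys
sortedUnion xa@(x : xs) ya@(y : ys)
  | x < y     = x : sortedUnion xs ya
  | x > y     = y : sortedUnion xa ys
  | otherwise = x : sortedUnion xs ys

-- And: skip past the smaller head; emit a docid only when both heads match.
sortedInter :: [DocId] -> [DocId] -> [DocId]
sortedInter [] _ = []
sortedInter _ [] = []
sortedInter xa@(x : xs) ya@(y : ys)
  | x < y     = sortedInter xs ya
  | x > y     = sortedInter xa ys
  | otherwise = x : sortedInter xs ys

-- Difference xs \ ys: keep the docids of xs that never appear in ys.
sortedDiff :: [DocId] -> [DocId] -> [DocId]
sortedDiff [] _ = []
sortedDiff xs [] = xs
sortedDiff xa@(x : xs) ya@(y : ys)
  | x < y     = x : sortedDiff xs ya
  | x > y     = sortedDiff xa ys
  | otherwise = sortedDiff xs ys

-- Not: every indexed doc that the subquery did not match.
-- Hypothetical helper: the allDocs universe is passed in, not derived here.
notDocs :: [DocId] -> [DocId] -> [DocId]
notDocs allDocs hits = sortedDiff allDocs hits
```

For example, `sortedInter [1,3,5,7] [3,4,5]` gives `[3,5]` and `notDocs [1..6] [2,4]` gives `[1,3,5,6]`. Each merge walks both inputs once, so it costs O(m + n), and because every result stays sorted it can feed straight into an enclosing And/Or/Not without re-sorting.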
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -76,13 +76,14 @@ lib/search/index.sx                     lib/search/eval.sx

 ## Phase 2 — Query AST + boolean evaluation

-- [ ] Query ADT: `Term Text | And Query Query | Or Query Query | Not Query |
-  Phrase [Text]`
+- [x] Query ADT: `Term String | And Query Query | Or Query Query | Not Query |
+  Phrase [String]` (in `lib/search/query.sx`)
 - [ ] `lib/search/parse.sx` — query syntax parser (boolean operators, quoted phrases)
-- [ ] `lib/search/eval.sx` — boolean eval via set ops on posting lists
-- [ ] phrase eval — adjacency check using positions
-- [ ] `lib/search/tests/boolean.sx` — 25+ cases: term, and, or, not, phrase,
-  composition, parser edge cases
+- [x] `lib/search/query.sx` — boolean eval via set ops on docid-sorted posting lists
+  (sortedUnion/Inter/Diff, Not over allDocs universe)
+- [x] phrase eval — positional adjacency check (phraseInDoc / phraseStartsAt)
+- [x] `lib/search/tests/boolean.sx` — 28 cases: term, and, or, not, phrase,
+  composition (parser edge cases move to the parse.sx suite)

 ## Phase 3 — Ranking

@@ -102,6 +103,14 @@ lib/search/index.sx                     lib/search/eval.sx

 ## Progress log

+- **Phase 2 boolean/phrase eval (46/46 total).** Query ADT
+  `Term|And|Or|Not|Phrase` + `evalQuery :: Index -> Query -> [DocId]` in query.sx.
+  Boolean ops are linear merges over docid-sorted posting lists; Not subtracts from
+  the allDocs universe; Phrase checks positional adjacency. 28 tests in boolean.sx.
+  Refactored both suites to **batch all cases into one program eval** (search-batch
+  in testlib) — under the heavy CPU load on this box (load average ~11 on 2 cores), 18–28 separate
+  hk-eval-program calls timed out; one combined eval per suite is ~20× faster.
+  Parser (parse.sx) is the remaining Phase 2 box.
 - **Phase 1 complete (18/18).** Tokenizer (lowercase + strip punctuation + positions),
  inverted index as sorted assoc-list `[(Term,[(DocId,[Pos])])]`, indexDoc/deleteDoc/
  lookupTerm/docFreq/allTerms. Search lib is Haskell source assembled into `search/src`
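The positional adjacency check behind Phrase can be sketched in Haskell, the search library's language, under stated assumptions. It uses the index shape from the Phase 1 log entry, `[(Term,[(DocId,[Pos])])]`, with `Int` docids and positions. `phraseStartsAt` and `phraseInDoc` borrow the helper names from the plan, but their signatures and bodies here are assumptions; `positionsIn`, `evalPhrase`, and `sampleIndex` are hypothetical. The block has no `main`; load it into GHCi to try it.

```haskell
import Data.Maybe (fromMaybe)

type Term  = String
type DocId = Int
type Pos   = Int

-- Assumed index shape, from the Phase 1 log entry: a term-sorted assoc-list
-- whose posting lists are docid-sorted and carry token positions.
type Index = [(Term, [(DocId, [Pos])])]

-- Hypothetical helper: positions of term t in doc d, empty when absent.
positionsIn :: Index -> Term -> DocId -> [Pos]
positionsIn idx t d = fromMaybe [] (lookup t idx >>= lookup d)

-- The phrase starts at p when its i-th term occurs at position p + i.
-- Argument: one position list per phrase term, in phrase order.
phraseStartsAt :: [[Pos]] -> Pos -> Bool
phraseStartsAt termPositions p =
  and [ (p + i) `elem` ps | (i, ps) <- zip [0 ..] termPositions ]

-- A match can only begin where the first phrase term occurs.
phraseInDoc :: [[Pos]] -> Bool
phraseInDoc [] = False
phraseInDoc termPositions@(firstPs : _) =
  any (phraseStartsAt termPositions) firstPs

-- Hypothetical: docs containing the phrase. Candidates come from the first
-- term's posting list, so the output is already in docid order.
evalPhrase :: Index -> [Term] -> [DocId]
evalPhrase _ [] = []
evalPhrase idx terms@(t : _) =
  [ d
  | (d, _) <- fromMaybe [] (lookup t idx)
  , phraseInDoc [ positionsIn idx t' d | t' <- terms ]
  ]

-- Hypothetical sample: doc 1 is "quick brown fox"; doc 2 has "fox" at 0 and
-- "quick brown" at 2-3 (its word at position 1 is left out of this sample).
sampleIndex :: Index
sampleIndex =
  [ ("brown", [(1, [1]), (2, [3])])
  , ("fox",   [(1, [2]), (2, [0])])
  , ("quick", [(1, [0]), (2, [2])])
  ]
```

Here `evalPhrase sampleIndex ["quick", "brown"]` gives `[1,2]`, while `evalPhrase sampleIndex ["brown", "fox"]` gives `[1]`: doc 2 contains both words, but not adjacently. A fuller version would first narrow the candidates by intersecting the posting lists of all phrase terms, and would replace the linear `elem` position lookups with a merge or a set.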