search: Phase 2 query parser + 32 tests
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 46s

Query tokenizer + recursive-descent parser: OR<AND<NOT precedence, implicit AND
on adjacency, quoted phrases, parens, case-insensitive keywords. parseQuery,
searchQuery, showQ. Worked around haskell-on-sx parser limits (ord-based
delimiters; multi-clause fns instead of []-pattern case alts). 78/78.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:43:10 +00:00
parent 0f0da0319c
commit 4c84decc01
7 changed files with 189 additions and 9 deletions
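
Below is a minimal standard-Haskell sketch of the grammar the commit message describes: OR < AND < NOT, implicit AND on adjacency, quoted phrases, parens, and case-insensitive keywords. It is not the code in `lib/search/parse.sx`. The `Token` type, the `P` alias, `Maybe`-based error reporting and the fully parenthesised `showQ` output format are hypothetical, and the `parseQuery`/`showQ` signatures are guesses. It mirrors the haskell-on-sx workarounds, though GHC would accept either form:

- Delimiters are compared by code point (`ord c == 34`).
- End-of-input is checked with a multi-clause helper rather than a `[]` pattern in a `case` alternative.

```haskell
import Data.Char (isSpace, ord, toUpper)

-- Query ADT with the constructors listed in the plan.
data Query = Term String | And Query Query | Or Query Query | Not Query | Phrase [String]
  deriving (Eq, Show)

-- Hypothetical token type for this sketch.
data Token = TWord String | TPhrase [String] | TAnd | TOr | TNot | TLParen | TRParen
  deriving (Eq, Show)

-- Delimiters matched by code point (34 = '"', 40 = '(', 41 = ')').
isQuote, isLParen, isRParen :: Char -> Bool
isQuote c = ord c == 34
isLParen c = ord c == 40
isRParen c = ord c == 41

tokenize :: String -> [Token]
tokenize [] = []
tokenize (c : cs)
  | isSpace c = tokenize cs
  | isLParen c = TLParen : tokenize cs
  | isRParen c = TRParen : tokenize cs
  | isQuote c =
      -- Everything up to the closing quote is one phrase; an unclosed
      -- quote runs to the end of the input.
      let (body, rest) = break isQuote cs
       in TPhrase (words body) : tokenize (drop 1 rest)
  | otherwise =
      let (w, rest) = break isDelim (c : cs)
       in keyword w : tokenize rest
  where
    isDelim x = isSpace x || isQuote x || isLParen x || isRParen x

-- AND / OR / NOT are recognised in any letter case; other words are terms.
keyword :: String -> Token
keyword w = case map toUpper w of
  "AND" -> TAnd
  "OR" -> TOr
  "NOT" -> TNot
  _ -> TWord w

-- Grammar, loosest to tightest binding:
--   or   := and ("OR" and)*
--   and  := not (["AND"] not)*      -- adjacency means AND
--   not  := "NOT" not | atom
--   atom := WORD | PHRASE | "(" or ")"
type P = [Token] -> Maybe (Query, [Token])

parseOr :: P
parseOr ts = parseAnd ts >>= uncurry loop
  where
    loop lhs (TOr : rest) = parseAnd rest >>= \(rhs, rest') -> loop (Or lhs rhs) rest'
    loop lhs rest = Just (lhs, rest)

parseAnd :: P
parseAnd ts = parseNot ts >>= uncurry loop
  where
    loop lhs (TAnd : rest) = parseNot rest >>= \(rhs, rest') -> loop (And lhs rhs) rest'
    -- Implicit AND: any token that can start an operand continues the chain.
    loop lhs rest@(t : _)
      | startsOperand t = parseNot rest >>= \(rhs, rest') -> loop (And lhs rhs) rest'
    loop lhs rest = Just (lhs, rest)
    startsOperand t = t `notElem` [TAnd, TOr, TRParen]

parseNot :: P
parseNot (TNot : rest) = fmap (\(q, rest') -> (Not q, rest')) (parseNot rest)
parseNot ts = parseAtom ts

parseAtom :: P
parseAtom (TWord w : rest) = Just (Term w, rest)
parseAtom (TPhrase ws : rest) = Just (Phrase ws, rest)
parseAtom (TLParen : rest) = parseOr rest >>= closeParen
  where
    closeParen (q, TRParen : rest') = Just (q, rest')
    closeParen _ = Nothing
parseAtom _ = Nothing

-- Nothing on a syntax error or leftover tokens. The end-of-input test is a
-- multi-clause helper rather than a [] pattern inside a case alternative.
parseQuery :: String -> Maybe Query
parseQuery s = finish (parseOr (tokenize s))
  where
    finish (Just (q, [])) = Just q
    finish _ = Nothing

-- Fully parenthesised rendering, handy for AST-shape assertions in tests.
showQ :: Query -> String
showQ (Term w) = w
showQ (Phrase ws) = "\"" ++ unwords ws ++ "\""
showQ (Not q) = "(NOT " ++ showQ q ++ ")"
showQ (And a b) = "(" ++ showQ a ++ " AND " ++ showQ b ++ ")"
showQ (Or a b) = "(" ++ showQ a ++ " OR " ++ showQ b ++ ")"
```

With this sketch, `fmap showQ (parseQuery "a b OR c")` gives `Just "((a AND b) OR c)"`. Adjacency is handled at the AND level, so it binds tighter than OR. Each precedence level is one function that calls the next-tighter level for its operands, so precedence falls out of the call structure rather than needing a table.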

@@ -10,7 +10,7 @@ extension that merges per-peer indices.
## Status (rolling)
`bash lib/search/conformance.sh`: **18/18** (Phase 1 complete)
`bash lib/search/conformance.sh`: **78/78** (Phases 1–2 complete)
## Ground rules
@@ -78,7 +78,9 @@ lib/search/index.sx lib/search/eval.sx
- [x] Query ADT: `Term String | And Query Query | Or Query Query | Not Query |
Phrase [String]` (in `lib/search/query.sx`)
- [ ] `lib/search/parse.sx` — query syntax parser (boolean operators, quoted phrases)
- [x] `lib/search/parse.sx` — query syntax parser: tokenizer + recursive-descent
(OR < AND < NOT precedence, implicit AND on adjacency, quoted phrases, parens,
case-insensitive keywords); `parseQuery`, `searchQuery`, `showQ`
- [x] `lib/search/query.sx` — boolean eval via set ops on docid-sorted posting lists
(sortedUnion/Inter/Diff, Not over allDocs universe)
- [x] phrase eval — positional adjacency check (phraseInDoc / phraseStartsAt)
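
The boolean and phrase evaluation described in these checklist items can be sketched the same way. This assumes a hypothetical toy `Index` built from association lists, where each posting carries the term's positions in that doc. The helpers `termPostings` and `positionsIn` are likewise hypothetical. `sortedInter` and `sortedDiff` expand the plan's `sortedUnion/Inter/Diff` shorthand. The signatures of `phraseInDoc` and `phraseStartsAt` are assumptions rather than the ones in `lib/search/query.sx`; `evalQuery` follows the `Index -> Query -> [DocId]` shape given in the plan.

```haskell
import Data.Maybe (fromMaybe)

type DocId = Int

-- Same Query shape as in the plan.
data Query = Term String | And Query Query | Or Query Query | Not Query | Phrase [String]
  deriving (Eq, Show)

-- Hypothetical index: the sorted doc-id universe plus, per term, a
-- docid-sorted posting list carrying the term's positions in each doc.
data Index = Index
  { allDocs :: [DocId]
  , postings :: [(String, [(DocId, [Int])])]
  }

-- Linear merges over ascending, duplicate-free doc-id lists: O(m + n).
sortedUnion, sortedInter, sortedDiff :: [DocId] -> [DocId] -> [DocId]
sortedUnion xs [] = xs
sortedUnion [] ys = ys
sortedUnion (x : xs) (y : ys)
  | x < y = x : sortedUnion xs (y : ys)
  | x > y = y : sortedUnion (x : xs) ys
  | otherwise = x : sortedUnion xs ys

sortedInter _ [] = []
sortedInter [] _ = []
sortedInter (x : xs) (y : ys)
  | x < y = sortedInter xs (y : ys)
  | x > y = sortedInter (x : xs) ys
  | otherwise = x : sortedInter xs ys

sortedDiff xs [] = xs
sortedDiff [] _ = []
sortedDiff (x : xs) (y : ys)
  | x < y = x : sortedDiff xs (y : ys)
  | x > y = sortedDiff (x : xs) ys
  | otherwise = sortedDiff xs ys

-- Postings for a term; an unknown term has none.
termPostings :: Index -> String -> [(DocId, [Int])]
termPostings ix t = fromMaybe [] (lookup t (postings ix))

-- Positions of a term in one doc; empty if the term is absent there.
positionsIn :: Index -> String -> DocId -> [Int]
positionsIn ix t d = fromMaybe [] (lookup d (termPostings ix t))

-- The phrase starts at position p if word i of the phrase sits at p + i.
phraseStartsAt :: Index -> [String] -> DocId -> Int -> Bool
phraseStartsAt ix ws d p = and [(p + i) `elem` positionsIn ix w d | (i, w) <- zip [0 ..] ws]

-- Try every occurrence of the first word as a possible phrase start.
phraseInDoc :: Index -> [String] -> DocId -> Bool
phraseInDoc _ [] _ = False
phraseInDoc ix ws@(w0 : _) d = any (phraseStartsAt ix ws d) (positionsIn ix w0 d)

evalQuery :: Index -> Query -> [DocId]
evalQuery ix (Term t) = map fst (termPostings ix t)
evalQuery ix (And a b) = sortedInter (evalQuery ix a) (evalQuery ix b)
evalQuery ix (Or a b) = sortedUnion (evalQuery ix a) (evalQuery ix b)
evalQuery ix (Not q) = sortedDiff (allDocs ix) (evalQuery ix q)
evalQuery _ (Phrase []) = []
evalQuery ix (Phrase ws) = filter (phraseInDoc ix ws) candidates
  where
    -- Docs containing every word; only these need the positional check.
    candidates = foldr1 sortedInter [evalQuery ix (Term w) | w <- ws]
```

Phrase evaluation first intersects the doc lists of all the words to get candidates. It then keeps a candidate only if some position `p` of the first word has word `i` at `p + i`. `Not` is the only operator that needs the whole universe, so it is a single `sortedDiff` against `allDocs`.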
@@ -103,6 +105,16 @@ lib/search/index.sx lib/search/eval.sx
## Progress log
- **Phase 2 complete — parser (78/78 total).** Query tokenizer (ord-based
delimiters, quoted phrases) + recursive-descent parser with OR<AND<NOT precedence,
implicit AND on adjacency, parens, case-insensitive keywords. `parseQuery`,
`searchQuery`, `showQ` (canonical render for AST tests). 32 tests in parse.sx.
**haskell-on-sx parser gotchas hit while writing this (see parse.sx header):**
(1) escaped char literals like `'\"'` break the tokenizer — match delimiters by
`ord c == 34`; (2) an `[]` *pattern* inside a `case` alt breaks the parser — use
multi-clause functions instead; (3) `case`/constructor patterns and `let (a,b)=..`
are fine. Embedded Haskell string literals in a `.sx` source string need single
`\"`, not `\\\"`.
- **Phase 2 boolean/phrase eval (46/46 total).** Query ADT
`Term|And|Or|Not|Phrase` + `evalQuery :: Index -> Query -> [DocId]` in query.sx.
Boolean ops are linear merges over docid-sorted posting lists; Not subtracts from