searchRankTfIdf/searchRankBm25 parse a boolean query, filter docs via evalQuery,
then rank survivors by relevance over the query's leaf terms (queryTerms) — the
filter-then-rank pattern. 225/225.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A synonym map [(Term,[Term])] expands a query term to itself + synonyms
(expandTerm); synDocs unions and synRankTfIdf ranks the expanded set. 214/214.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
nearDocs k t1 t2 returns docs where both terms occur within k positions
(unordered); candidates from the posting intersection, filtered on positional
postings. 205/205.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deterministic English suffix stripping (stem), stemText/stemTokens, indexStemmed.
Worked around two haskell-on-sx string gotchas: take/drop over a String yield
char codes (rebuild via joinChars . map chr), and isSuffixOf's reverse trips ++
(manual suffix compare). 196/196.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
highlight marks query-matching (normalized) tokens with [..]; snippet extracts a
context window around the first match. 178/178.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
editDist as an O(m*n) row-based Levenshtein DP (naive recursion is exponential
and times out under load); fuzzyTerms/fuzzyDocs/fuzzyRankTfIdf expand a term to
indexed terms within a max edit distance. 166/166.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
paginate windows a ranked list (take lim . drop off); pageTfIdf/pageBm25 and
resultCount. 148/148.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
prefixTerms matches indexed terms by prefix (allTerms + isPrefixOf); prefixDocs
unions their docs; prefixRankTfIdf ranks via the matched terms. 136/136.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fedIndex merges per-peer inverted indices (union posting lists per term) after
relabelling local DocIds to global gid = peer*1000 + local — dedupe by
(peer,doc-id) is automatic and positions survive, so ranking runs once over the
merge and interleaves peers by score. ACL is a post-rank filter over an injected
permit predicate (searchTfIdfAcl/topNTfIdfAcl/searchBm25Acl). Roadmap complete,
122/122.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rankTfIdf and rankBm25 (configurable k1/b) over the candidate set, float scores
with deterministic DocId tiebreak; topNTfIdf/topNBm25. df/idf derived from
posting-list length. Tests cover tf/idf behavior, a BM25-vs-TF-IDF flip from
length-norm + tf-saturation, the b-parameter effect, tiebreak stability. 101/101.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Query ADT (Term|And|Or|Not|Phrase) and evalQuery over docid-sorted posting
lists: boolean ops as linear merges, Not over the allDocs universe, Phrase via
positional adjacency. Batched both test suites into one program eval each
(search-batch) so they finish under heavy CPU load. 46/46.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tokenizer (lowercase, strip punctuation, positions) and a sorted assoc-list
inverted index [(Term,[(DocId,[Pos])])] with indexDoc/deleteDoc/lookupTerm/
docFreq/allTerms. Search lib is haskell-on-sx source assembled into search/src;
tests reuse hk-test counters via a search-eval helper. conformance.sh models
lib/haskell.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>