search: Phase 3 ranking TF-IDF + BM25 + top-N + 23 tests

rankTfIdf and rankBm25 (configurable k1/b) over the candidate set, float scores with deterministic DocId tiebreak; topNTfIdf/topNBm25. df/idf derived from posting-list length. Tests cover tf/idf behavior, a BM25-vs-TF-IDF flip from length-norm + tf-saturation, the b-parameter effect, tiebreak stability. 101/101.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:56:50 +00:00
parent 4c84decc01
commit a3f9d4f6c9
7 changed files with 132 additions and 14 deletions
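
The ranking described in the commit message can be sketched roughly as follows. This is an illustrative sketch, not the code in `search/rank-src`: the `Index` record, the helper names (`dfOf`, `tfOf`, `rankBy`), the `Sketch`-suffixed functions, and `exampleIndex` are all hypothetical, and the two idf formulas are common textbook variants assumed here rather than taken from the commit. What it does follow from the message: df is the posting-list length, scores are floats, ties break on ascending DocId, and BM25 takes `k1` and `b` as parameters.

```haskell
import Data.List (sortBy)
import Data.Maybe (fromMaybe)
import Data.Ord (Down (..), comparing)

type DocId = Int

-- Hypothetical index shape (an assumption, not the layout used in lib/search):
-- per-document token counts plus, for each term, a posting list of (DocId, tf).
data Index = Index
  { docLens  :: [(DocId, Int)]
  , postings :: [(String, [(DocId, Int)])]
  }

postingsFor :: Index -> String -> [(DocId, Int)]
postingsFor idx t = fromMaybe [] (lookup t (postings idx))

-- Document frequency is the posting-list length, as the commit message states.
dfOf :: Index -> String -> Double
dfOf idx t = fromIntegral (length (postingsFor idx t))

tfOf :: Index -> String -> DocId -> Double
tfOf idx t d = fromIntegral (fromMaybe 0 (lookup d (postingsFor idx t)))

numDocs :: Index -> Double
numDocs = fromIntegral . length . docLens

-- Plain TF-IDF (assumed variant): sum of tf * ln(N / df); absent terms add 0.
scoreTfIdf :: Index -> [String] -> DocId -> Double
scoreTfIdf idx terms d = sum [tfOf idx t d * idf t | t <- terms]
  where
    idf t = let df = dfOf idx t in if df == 0 then 0 else log (numDocs idx / df)

-- Okapi BM25 with the smoothed idf ln(1 + (N - df + 0.5) / (df + 0.5)).
scoreBm25 :: Double -> Double -> Index -> [String] -> DocId -> Double
scoreBm25 k1 b idx terms d = sum [idf t * saturated (tfOf idx t d) | t <- terms]
  where
    n = numDocs idx
    avgLen = if n == 0 then 0 else fromIntegral (sum (map snd (docLens idx))) / n
    docLen = fromIntegral (fromMaybe 0 (lookup d (docLens idx)))
    lenRatio = if avgLen == 0 then 0 else docLen / avgLen
    lengthNorm = 1 - b + b * lenRatio          -- b = 0 disables length normalization
    idf t = let df = dfOf idx t in log (1 + (n - df + 0.5) / (df + 0.5))
    saturated tf
      | tf == 0 = 0                            -- also avoids 0/0 when k1 is 0
      | otherwise = tf * (k1 + 1) / (tf + k1 * lengthNorm)

-- Score every candidate, then sort by descending score, ascending DocId on ties.
rankBy :: (DocId -> Double) -> [DocId] -> [(DocId, Double)]
rankBy score cands =
  sortBy (comparing (Down . snd) <> comparing fst) [(d, score d) | d <- cands]

rankTfIdfSketch :: Index -> [String] -> [DocId] -> [(DocId, Double)]
rankTfIdfSketch idx terms = rankBy (scoreTfIdf idx terms)

rankBm25Sketch :: Double -> Double -> Index -> [String] -> [DocId] -> [(DocId, Double)]
rankBm25Sketch k1 b idx terms = rankBy (scoreBm25 k1 b idx terms)

topNSketch :: Int -> [(DocId, Double)] -> [(DocId, Double)]
topNSketch = take

-- Made-up data: doc 1 is long with tf 3, doc 2 is short with tf 2, doc 3 lacks the term.
exampleIndex :: Index
exampleIndex = Index
  { docLens  = [(1, 100), (2, 5), (3, 5)]
  , postings = [("cat", [(1, 3), (2, 2)])]
  }
```

On `exampleIndex`, the TF-IDF sketch puts doc 1 ahead of doc 2 for the query `["cat"]` because its raw tf is higher. The BM25 sketch with `k1 = 1.2` and `b = 0.75` puts doc 2 first, since doc 1's length is penalized and its extra occurrence saturates. Setting `b = 0` removes the length penalty and restores doc 1 to the top. Candidates with equal scores come back in ascending DocId order regardless of input order.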
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -2,7 +2,8 @@
 ;; Tests and callers concatenate `search/src` with their own top-level bindings
 ;; (e.g. "result = lookupTerm \"cat\" idx\n") and evaluate via the haskell-on-sx
 ;; interpreter. Public Haskell entry points: indexDoc, lookupTerm, deleteDoc,
-;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery, searchQuery.
+;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery, searchQuery,
+;; rankTfIdf, rankBm25, topNTfIdf, topNBm25.

 (define
  search/src
@@ -13,4 +14,6 @@
    "\n"
    search/query-src
    "\n"
-    search/parse-src))
+    search/parse-src
+    "\n"
+    search/rank-src))