search: Phase 2 query AST + boolean/phrase eval + 28 tests
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 53s
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 53s
Query ADT (Term|And|Or|Not|Phrase) and evalQuery over docid-sorted posting lists: boolean ops as linear merges, Not over the allDocs universe, Phrase via positional adjacency. Batched both test suites into one program eval each (search-batch) so they finish under heavy CPU load. 46/46. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -2,6 +2,8 @@
|
|||||||
;; Tests and callers concatenate `search/src` with their own top-level bindings
|
;; Tests and callers concatenate `search/src` with their own top-level bindings
|
||||||
;; (e.g. "result = lookupTerm \"cat\" idx\n") and evaluate via the haskell-on-sx
|
;; (e.g. "result = lookupTerm \"cat\" idx\n") and evaluate via the haskell-on-sx
|
||||||
;; interpreter. Public Haskell entry points: indexDoc, lookupTerm, deleteDoc,
|
;; interpreter. Public Haskell entry points: indexDoc, lookupTerm, deleteDoc,
|
||||||
;; docFreq, allTerms, tokens, positioned.
|
;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery.
|
||||||
|
|
||||||
(define search/src (str search/tokenize-src "\n" search/index-src))
|
(define
|
||||||
|
search/src
|
||||||
|
(str search/tokenize-src "\n" search/index-src "\n" search/query-src))
|
||||||
|
|||||||
@@ -20,10 +20,12 @@ PRELOADS=(
|
|||||||
lib/haskell/testlib.sx
|
lib/haskell/testlib.sx
|
||||||
lib/search/tokenize.sx
|
lib/search/tokenize.sx
|
||||||
lib/search/index.sx
|
lib/search/index.sx
|
||||||
|
lib/search/query.sx
|
||||||
lib/search/api.sx
|
lib/search/api.sx
|
||||||
lib/search/testlib.sx
|
lib/search/testlib.sx
|
||||||
)
|
)
|
||||||
|
|
||||||
SUITES=(
|
SUITES=(
|
||||||
"index:lib/search/tests/index.sx"
|
"index:lib/search/tests/index.sx"
|
||||||
|
"boolean:lib/search/tests/boolean.sx"
|
||||||
)
|
)
|
||||||
|
|||||||
11
lib/search/query.sx
Normal file
11
lib/search/query.sx
Normal file
@@ -0,0 +1,11 @@
|
|||||||
|
;; search query AST + boolean/phrase evaluation — Haskell source fragment.
|
||||||
|
;; Depends on tokenize + index.
|
||||||
|
;; data Query = Term String | And Query Query | Or Query Query
|
||||||
|
;; | Not Query | Phrase [String]
|
||||||
|
;; evalQuery :: Index -> Query -> [DocId] (sorted, unique)
|
||||||
|
;; Boolean ops are linear merges over docid-sorted posting lists; Not uses
|
||||||
|
;; allDocs as the universe; Phrase checks positional adjacency.
|
||||||
|
|
||||||
|
(define
|
||||||
|
search/query-src
|
||||||
|
"data Query = Term String | And Query Query | Or Query Query | Not Query | Phrase [String]\ndocsWith t idx = map fst (lookupTerm t idx)\nsortedUnion [] ys = ys\nsortedUnion xs [] = xs\nsortedUnion (x:xs) (y:ys) = if x < y then x : sortedUnion xs (y:ys) else if x > y then y : sortedUnion (x:xs) ys else x : sortedUnion xs ys\nsortedInter [] ys = []\nsortedInter xs [] = []\nsortedInter (x:xs) (y:ys) = if x < y then sortedInter xs (y:ys) else if x > y then sortedInter (x:xs) ys else x : sortedInter xs ys\nsortedDiff [] ys = []\nsortedDiff xs [] = xs\nsortedDiff (x:xs) (y:ys) = if x < y then x : sortedDiff xs (y:ys) else if x > y then sortedDiff (x:xs) ys else sortedDiff xs ys\nmergeDocs acc e = sortedUnion acc (map fst (snd e))\nallDocs idx = foldl mergeDocs [] idx\nposIn t d idx = case lookup d (lookupTerm t idx) of { Nothing -> []; Just ps -> ps }\nelemSorted x [] = False\nelemSorted x (y:ys) = if x == y then True else if x < y then False else elemSorted x ys\nphraseAtAll [] d idx p i = True\nphraseAtAll (t:ts) d idx p i = if elemSorted (p + i) (posIn t d idx) then phraseAtAll ts d idx p (i + 1) else False\nphraseStartsAt ts d idx p = phraseAtAll ts d idx p 0\nphraseInDoc [] d idx = True\nphraseInDoc (t0:rest) d idx = any (phraseStartsAt (t0:rest) d idx) (posIn t0 d idx)\nphraseHere ts idx d = phraseInDoc ts d idx\ninterStep idx acc tt = sortedInter acc (docsWith tt idx)\nphraseCands [] idx = allDocs idx\nphraseCands (t:ts) idx = foldl (interStep idx) (docsWith t idx) ts\nphraseDocs ts idx = filter (phraseHere ts idx) (phraseCands ts idx)\nevalQuery idx q = case q of { Term t -> docsWith t idx ; And a b -> sortedInter (evalQuery idx a) (evalQuery idx b) ; Or a b -> sortedUnion (evalQuery idx a) (evalQuery idx b) ; Not a -> sortedDiff (allDocs idx) (evalQuery idx a) ; Phrase ts -> phraseDocs ts idx }\n")
|
||||||
@@ -1,10 +1,11 @@
|
|||||||
{
|
{
|
||||||
"lang": "search",
|
"lang": "search",
|
||||||
"total_passed": 18,
|
"total_passed": 46,
|
||||||
"total_failed": 0,
|
"total_failed": 0,
|
||||||
"total": 18,
|
"total": 46,
|
||||||
"suites": [
|
"suites": [
|
||||||
{"name":"index","passed":18,"failed":0,"total":18}
|
{"name":"index","passed":18,"failed":0,"total":18},
|
||||||
|
{"name":"boolean","passed":28,"failed":0,"total":28}
|
||||||
],
|
],
|
||||||
"generated": "2026-06-06T18:12:50+00:00"
|
"generated": "2026-06-06T18:46:54+00:00"
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,7 +1,8 @@
|
|||||||
# search scoreboard
|
# search scoreboard
|
||||||
|
|
||||||
**18 / 18 passing** (0 failure(s)).
|
**46 / 46 passing** (0 failure(s)).
|
||||||
|
|
||||||
| Suite | Passed | Total | Status |
|
| Suite | Passed | Total | Status |
|
||||||
|-------|--------|-------|--------|
|
|-------|--------|-------|--------|
|
||||||
| index | 18 | 18 | ok |
|
| index | 18 | 18 | ok |
|
||||||
|
| boolean | 28 | 28 | ok |
|
||||||
|
|||||||
@@ -27,3 +27,24 @@
|
|||||||
(search-hk->sx
|
(search-hk->sx
|
||||||
(hk-deep-force
|
(hk-deep-force
|
||||||
(get (hk-eval-program (hk-core (str search/src extra))) name)))))
|
(get (hk-eval-program (hk-core (str search/src extra))) name)))))
|
||||||
|
|
||||||
|
(define
|
||||||
|
search-join
|
||||||
|
(fn
|
||||||
|
(sep xs)
|
||||||
|
(cond
|
||||||
|
((empty? xs) "")
|
||||||
|
((empty? (rest xs)) (first xs))
|
||||||
|
(:else (str (first xs) sep (search-join sep (rest xs)))))))
|
||||||
|
|
||||||
|
;; Batch many haskell expressions into ONE program evaluation (amortizes the
|
||||||
|
;; cost of parsing/binding search/src — important under heavy CPU load).
|
||||||
|
;; `setup` is extra top-level Haskell; `exprs` is a list of expression strings
|
||||||
|
;; whose results form a single haskell list. Returns the SX list of results.
|
||||||
|
(define
|
||||||
|
search-batch
|
||||||
|
(fn
|
||||||
|
(setup exprs)
|
||||||
|
(search-eval
|
||||||
|
(str setup "\nresult = [" (search-join ", " exprs) "]\n")
|
||||||
|
"result")))
|
||||||
|
|||||||
123
lib/search/tests/boolean.sx
Normal file
123
lib/search/tests/boolean.sx
Normal file
@@ -0,0 +1,123 @@
|
|||||||
|
;; Phase 2 — query AST + boolean/phrase evaluation (hand-built Query values).
|
||||||
|
;; Corpus:
|
||||||
|
;; doc 1 "the quick brown dog" -> the quick brown dog
|
||||||
|
;; doc 2 "a quick brown fox" -> a quick brown fox
|
||||||
|
;; doc 3 "the dog barks loudly" -> the dog barks loudly
|
||||||
|
;; All queries run in ONE program evaluation (search-batch) to stay fast.
|
||||||
|
|
||||||
|
(define
|
||||||
|
search-corpus
|
||||||
|
"idx = indexDoc 3 \"the dog barks loudly\" (indexDoc 2 \"a quick brown fox\" (indexDoc 1 \"the quick brown dog\" emptyIndex))\n")
|
||||||
|
|
||||||
|
(define
|
||||||
|
bool-cases
|
||||||
|
(list
|
||||||
|
(list
|
||||||
|
"term in two docs"
|
||||||
|
"evalQuery idx (Term \"quick\")"
|
||||||
|
(list 1 2))
|
||||||
|
(list
|
||||||
|
"term in two docs (the)"
|
||||||
|
"evalQuery idx (Term \"the\")"
|
||||||
|
(list 1 3))
|
||||||
|
(list "term in one doc" "evalQuery idx (Term \"fox\")" (list 2))
|
||||||
|
(list "term absent" "evalQuery idx (Term \"zzz\")" (list))
|
||||||
|
(list
|
||||||
|
"term case-sensitive at AST level"
|
||||||
|
"evalQuery idx (Term \"QUICK\")"
|
||||||
|
(list))
|
||||||
|
(list "term on empty index" "evalQuery emptyIndex (Term \"cat\")" (list))
|
||||||
|
(list
|
||||||
|
"and both terms"
|
||||||
|
"evalQuery idx (And (Term \"quick\") (Term \"brown\"))"
|
||||||
|
(list 1 2))
|
||||||
|
(list
|
||||||
|
"and overlap subset"
|
||||||
|
"evalQuery idx (And (Term \"the\") (Term \"dog\"))"
|
||||||
|
(list 1 3))
|
||||||
|
(list
|
||||||
|
"and disjoint is empty"
|
||||||
|
"evalQuery idx (And (Term \"the\") (Term \"fox\"))"
|
||||||
|
(list))
|
||||||
|
(list
|
||||||
|
"and right-nested"
|
||||||
|
"evalQuery idx (And (Term \"the\") (And (Term \"dog\") (Term \"barks\")))"
|
||||||
|
(list 3))
|
||||||
|
(list
|
||||||
|
"or two singletons"
|
||||||
|
"evalQuery idx (Or (Term \"fox\") (Term \"barks\"))"
|
||||||
|
(list 2 3))
|
||||||
|
(list
|
||||||
|
"or all docs"
|
||||||
|
"evalQuery idx (Or (Term \"quick\") (Term \"the\"))"
|
||||||
|
(list 1 2 3))
|
||||||
|
(list
|
||||||
|
"or with absent term"
|
||||||
|
"evalQuery idx (Or (Term \"fox\") (Term \"zzz\"))"
|
||||||
|
(list 2))
|
||||||
|
(list "not term" "evalQuery idx (Not (Term \"the\"))" (list 2))
|
||||||
|
(list "not term 2" "evalQuery idx (Not (Term \"quick\"))" (list 3))
|
||||||
|
(list
|
||||||
|
"and with not"
|
||||||
|
"evalQuery idx (And (Term \"quick\") (Not (Term \"the\")))"
|
||||||
|
(list 2))
|
||||||
|
(list
|
||||||
|
"double negation"
|
||||||
|
"evalQuery idx (Not (Not (Term \"fox\")))"
|
||||||
|
(list 2))
|
||||||
|
(list
|
||||||
|
"or of and with term"
|
||||||
|
"evalQuery idx (Or (And (Term \"the\") (Term \"dog\")) (Term \"fox\"))"
|
||||||
|
(list 1 2 3))
|
||||||
|
(list
|
||||||
|
"phrase adjacent both docs"
|
||||||
|
"evalQuery idx (Phrase [\"quick\", \"brown\"])"
|
||||||
|
(list 1 2))
|
||||||
|
(list
|
||||||
|
"phrase adjacent one doc"
|
||||||
|
"evalQuery idx (Phrase [\"brown\", \"dog\"])"
|
||||||
|
(list 1))
|
||||||
|
(list
|
||||||
|
"phrase the quick"
|
||||||
|
"evalQuery idx (Phrase [\"the\", \"quick\"])"
|
||||||
|
(list 1))
|
||||||
|
(list
|
||||||
|
"phrase dog barks"
|
||||||
|
"evalQuery idx (Phrase [\"dog\", \"barks\"])"
|
||||||
|
(list 3))
|
||||||
|
(list
|
||||||
|
"phrase non-adjacent empty"
|
||||||
|
"evalQuery idx (Phrase [\"quick\", \"dog\"])"
|
||||||
|
(list))
|
||||||
|
(list
|
||||||
|
"phrase order matters"
|
||||||
|
"evalQuery idx (Phrase [\"brown\", \"quick\"])"
|
||||||
|
(list))
|
||||||
|
(list
|
||||||
|
"phrase single term"
|
||||||
|
"evalQuery idx (Phrase [\"dog\"])"
|
||||||
|
(list 1 3))
|
||||||
|
(list
|
||||||
|
"phrase three terms"
|
||||||
|
"evalQuery idx (Phrase [\"the\", \"dog\", \"barks\"])"
|
||||||
|
(list 3))
|
||||||
|
(list
|
||||||
|
"and of phrase and term"
|
||||||
|
"evalQuery idx (And (Phrase [\"quick\", \"brown\"]) (Term \"dog\"))"
|
||||||
|
(list 1))
|
||||||
|
(list
|
||||||
|
"not of phrase"
|
||||||
|
"evalQuery idx (Not (Phrase [\"quick\", \"brown\"]))"
|
||||||
|
(list 3))))
|
||||||
|
|
||||||
|
(define
|
||||||
|
bool-results
|
||||||
|
(search-batch search-corpus (map (fn (c) (nth c 1)) bool-cases)))
|
||||||
|
|
||||||
|
(map-indexed
|
||||||
|
(fn
|
||||||
|
(i c)
|
||||||
|
(hk-test (nth c 0) (nth bool-results i) (nth c 2)))
|
||||||
|
bool-cases)
|
||||||
|
|
||||||
|
{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
|
||||||
@@ -1,119 +1,88 @@
|
|||||||
;; Phase 1 — tokenize + inverted index.
|
;; Phase 1 — tokenize + inverted index.
|
||||||
|
;; All cases run in ONE program evaluation (search-batch) to stay fast under load.
|
||||||
|
;; Scalar results (docFreq) are wrapped as singleton lists so the batch is a list
|
||||||
|
;; of lists.
|
||||||
|
|
||||||
(hk-test
|
(define
|
||||||
|
index-cases
|
||||||
|
(list
|
||||||
|
(list
|
||||||
"tokens basic lowercases"
|
"tokens basic lowercases"
|
||||||
(search-eval "\nresult = tokens \"The Cat sat\"\n" "result")
|
"tokens \"The Cat sat\""
|
||||||
(list "the" "cat" "sat"))
|
(list "the" "cat" "sat"))
|
||||||
|
(list
|
||||||
(hk-test
|
|
||||||
"tokens strips punctuation"
|
"tokens strips punctuation"
|
||||||
(search-eval "\nresult = tokens \"Hello, World!\"\n" "result")
|
"tokens \"Hello, World!\""
|
||||||
(list "hello" "world"))
|
(list "hello" "world"))
|
||||||
|
(list "tokens collapses whitespace" "tokens \" a b \"" (list "a" "b"))
|
||||||
(hk-test
|
(list "tokens empty is empty" "tokens \"\"" (list))
|
||||||
"tokens collapses whitespace"
|
(list "tokens keeps digits" "tokens \"abc123 x9\"" (list "abc123" "x9"))
|
||||||
(search-eval "\nresult = tokens \" a b \"\n" "result")
|
(list
|
||||||
(list "a" "b"))
|
|
||||||
|
|
||||||
(hk-test
|
|
||||||
"tokens empty is empty"
|
|
||||||
(search-eval "\nresult = tokens \"\"\n" "result")
|
|
||||||
(list))
|
|
||||||
|
|
||||||
(hk-test
|
|
||||||
"tokens keeps digits"
|
|
||||||
(search-eval "\nresult = tokens \"abc123 x9\"\n" "result")
|
|
||||||
(list "abc123" "x9"))
|
|
||||||
|
|
||||||
(hk-test
|
|
||||||
"positioned attaches ordinals"
|
"positioned attaches ordinals"
|
||||||
(search-eval "\nresult = positioned \"a b a\"\n" "result")
|
"positioned \"a b a\""
|
||||||
(list (list "a" 0) (list "b" 1) (list "a" 2)))
|
(list
|
||||||
|
(list "a" 0)
|
||||||
(hk-test
|
(list "b" 1)
|
||||||
|
(list "a" 2)))
|
||||||
|
(list
|
||||||
"index + lookup single doc"
|
"index + lookup single doc"
|
||||||
(search-eval
|
"lookupTerm \"cat\" (indexDoc 1 \"the cat sat\" emptyIndex)"
|
||||||
"\nresult = lookupTerm \"cat\" (indexDoc 1 \"the cat sat\" emptyIndex)\n"
|
|
||||||
"result")
|
|
||||||
(list (list 1 (list 1))))
|
(list (list 1 (list 1))))
|
||||||
|
(list
|
||||||
(hk-test
|
|
||||||
"lookup missing term is empty"
|
"lookup missing term is empty"
|
||||||
(search-eval
|
"lookupTerm \"dog\" (indexDoc 1 \"the cat sat\" emptyIndex)"
|
||||||
"\nresult = lookupTerm \"dog\" (indexDoc 1 \"the cat sat\" emptyIndex)\n"
|
|
||||||
"result")
|
|
||||||
(list))
|
(list))
|
||||||
|
(list
|
||||||
(hk-test
|
|
||||||
"lookup records all positions"
|
"lookup records all positions"
|
||||||
(search-eval
|
"lookupTerm \"the\" (indexDoc 1 \"the cat the dog the\" emptyIndex)"
|
||||||
"\nresult = lookupTerm \"the\" (indexDoc 1 \"the cat the dog the\" emptyIndex)\n"
|
|
||||||
"result")
|
|
||||||
(list (list 1 (list 0 2 4))))
|
(list (list 1 (list 0 2 4))))
|
||||||
|
(list
|
||||||
(hk-test
|
|
||||||
"multi-doc posting list sorted by docid"
|
"multi-doc posting list sorted by docid"
|
||||||
(search-eval
|
"lookupTerm \"x\" (indexDoc 1 \"x y\" (indexDoc 2 \"x z\" emptyIndex))"
|
||||||
"\nresult = lookupTerm \"x\" (indexDoc 1 \"x y\" (indexDoc 2 \"x z\" emptyIndex))\n"
|
|
||||||
"result")
|
|
||||||
(list
|
(list
|
||||||
(list 1 (list 0))
|
(list 1 (list 0))
|
||||||
(list 2 (list 0))))
|
(list 2 (list 0))))
|
||||||
|
(list
|
||||||
(hk-test
|
|
||||||
"index/query case symmetry"
|
"index/query case symmetry"
|
||||||
(search-eval
|
"lookupTerm \"cat\" (indexDoc 1 \"CAT Cat cat\" emptyIndex)"
|
||||||
"\nresult = lookupTerm \"cat\" (indexDoc 1 \"CAT Cat cat\" emptyIndex)\n"
|
|
||||||
"result")
|
|
||||||
(list (list 1 (list 0 1 2))))
|
(list (list 1 (list 0 1 2))))
|
||||||
|
(list
|
||||||
(hk-test
|
|
||||||
"re-index replaces a doc"
|
"re-index replaces a doc"
|
||||||
(search-eval
|
"lookupTerm \"a\" (indexDoc 1 \"a a a\" (indexDoc 1 \"a\" emptyIndex))"
|
||||||
"\nresult = lookupTerm \"a\" (indexDoc 1 \"a a a\" (indexDoc 1 \"a\" emptyIndex))\n"
|
|
||||||
"result")
|
|
||||||
(list (list 1 (list 0 1 2))))
|
(list (list 1 (list 0 1 2))))
|
||||||
|
(list
|
||||||
(hk-test
|
|
||||||
"delete removes a doc"
|
"delete removes a doc"
|
||||||
(search-eval
|
"lookupTerm \"cat\" (deleteDoc 1 (indexDoc 1 \"the cat\" emptyIndex))"
|
||||||
"\nresult = lookupTerm \"cat\" (deleteDoc 1 (indexDoc 1 \"the cat\" emptyIndex))\n"
|
|
||||||
"result")
|
|
||||||
(list))
|
(list))
|
||||||
|
(list
|
||||||
(hk-test
|
|
||||||
"delete leaves other docs"
|
"delete leaves other docs"
|
||||||
(search-eval
|
"lookupTerm \"cat\" (deleteDoc 2 (indexDoc 2 \"big cat\" (indexDoc 1 \"the cat\" emptyIndex)))"
|
||||||
"\nresult = lookupTerm \"cat\" (deleteDoc 2 (indexDoc 2 \"big cat\" (indexDoc 1 \"the cat\" emptyIndex)))\n"
|
|
||||||
"result")
|
|
||||||
(list (list 1 (list 1))))
|
(list (list 1 (list 1))))
|
||||||
|
(list
|
||||||
(hk-test
|
|
||||||
"docFreq counts docs"
|
"docFreq counts docs"
|
||||||
(search-eval
|
"[docFreq \"cat\" (indexDoc 2 \"a cat\" (indexDoc 1 \"the cat\" emptyIndex))]"
|
||||||
"\nresult = docFreq \"cat\" (indexDoc 2 \"a cat\" (indexDoc 1 \"the cat\" emptyIndex))\n"
|
(list 2))
|
||||||
"result")
|
(list
|
||||||
2)
|
|
||||||
|
|
||||||
(hk-test
|
|
||||||
"docFreq zero for missing"
|
"docFreq zero for missing"
|
||||||
(search-eval
|
"[docFreq \"zzz\" (indexDoc 1 \"a b\" emptyIndex)]"
|
||||||
"\nresult = docFreq \"zzz\" (indexDoc 1 \"a b\" emptyIndex)\n"
|
(list 0))
|
||||||
"result")
|
(list
|
||||||
0)
|
|
||||||
|
|
||||||
(hk-test
|
|
||||||
"allTerms sorted and unique"
|
"allTerms sorted and unique"
|
||||||
(search-eval
|
"allTerms (indexDoc 1 \"banana apple cherry apple\" emptyIndex)"
|
||||||
"\nresult = allTerms (indexDoc 1 \"banana apple cherry apple\" emptyIndex)\n"
|
|
||||||
"result")
|
|
||||||
(list "apple" "banana" "cherry"))
|
(list "apple" "banana" "cherry"))
|
||||||
|
(list
|
||||||
(hk-test
|
|
||||||
"allTerms merged across docs"
|
"allTerms merged across docs"
|
||||||
(search-eval
|
"allTerms (indexDoc 2 \"d a\" (indexDoc 1 \"c b\" emptyIndex))"
|
||||||
"\nresult = allTerms (indexDoc 2 \"d a\" (indexDoc 1 \"c b\" emptyIndex))\n"
|
(list "a" "b" "c" "d"))))
|
||||||
"result")
|
|
||||||
(list "a" "b" "c" "d"))
|
(define
|
||||||
|
index-results
|
||||||
|
(search-batch "" (map (fn (c) (nth c 1)) index-cases)))
|
||||||
|
|
||||||
|
(map-indexed
|
||||||
|
(fn
|
||||||
|
(i c)
|
||||||
|
(hk-test (nth c 0) (nth index-results i) (nth c 2)))
|
||||||
|
index-cases)
|
||||||
|
|
||||||
{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
|
{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
|
||||||
|
|||||||
@@ -76,13 +76,14 @@ lib/search/index.sx lib/search/eval.sx
|
|||||||
|
|
||||||
## Phase 2 — Query AST + boolean evaluation
|
## Phase 2 — Query AST + boolean evaluation
|
||||||
|
|
||||||
- [ ] Query ADT: `Term Text | And Query Query | Or Query Query | Not Query |
|
- [x] Query ADT: `Term String | And Query Query | Or Query Query | Not Query |
|
||||||
Phrase [Text]`
|
Phrase [String]` (in `lib/search/query.sx`)
|
||||||
- [ ] `lib/search/parse.sx` — query syntax parser (boolean operators, quoted phrases)
|
- [ ] `lib/search/parse.sx` — query syntax parser (boolean operators, quoted phrases)
|
||||||
- [ ] `lib/search/eval.sx` — boolean eval via set ops on posting lists
|
- [x] `lib/search/query.sx` — boolean eval via set ops on docid-sorted posting lists
|
||||||
- [ ] phrase eval — adjacency check using positions
|
(sortedUnion/Inter/Diff, Not over allDocs universe)
|
||||||
- [ ] `lib/search/tests/boolean.sx` — 25+ cases: term, and, or, not, phrase,
|
- [x] phrase eval — positional adjacency check (phraseInDoc / phraseStartsAt)
|
||||||
composition, parser edge cases
|
- [x] `lib/search/tests/boolean.sx` — 28 cases: term, and, or, not, phrase,
|
||||||
|
composition (parser edge cases move to the parse.sx suite)
|
||||||
|
|
||||||
## Phase 3 — Ranking
|
## Phase 3 — Ranking
|
||||||
|
|
||||||
@@ -102,6 +103,14 @@ lib/search/index.sx lib/search/eval.sx
|
|||||||
|
|
||||||
## Progress log
|
## Progress log
|
||||||
|
|
||||||
|
- **Phase 2 boolean/phrase eval (46/46 total).** Query ADT
|
||||||
|
`Term|And|Or|Not|Phrase` + `evalQuery :: Index -> Query -> [DocId]` in query.sx.
|
||||||
|
Boolean ops are linear merges over docid-sorted posting lists; Not subtracts from
|
||||||
|
the allDocs universe; Phrase checks positional adjacency. 28 tests in boolean.sx.
|
||||||
|
Refactored both suites to **batch all cases into one program eval** (search-batch
|
||||||
|
in testlib) — under the heavy CPU load on this box (~11 on 2 cores), 18–28 separate
|
||||||
|
hk-eval-program calls timed out; one combined eval per suite is ~20× faster.
|
||||||
|
Parser (parse.sx) is the remaining Phase 2 box.
|
||||||
- **Phase 1 complete (18/18).** Tokenizer (lowercase + strip punctuation + positions),
|
- **Phase 1 complete (18/18).** Tokenizer (lowercase + strip punctuation + positions),
|
||||||
inverted index as sorted assoc-list `[(Term,[(DocId,[Pos])])]`, indexDoc/deleteDoc/
|
inverted index as sorted assoc-list `[(Term,[(DocId,[Pos])])]`, indexDoc/deleteDoc/
|
||||||
lookupTerm/docFreq/allTerms. Search lib is Haskell source assembled into `search/src`
|
lookupTerm/docFreq/allTerms. Search lib is Haskell source assembled into `search/src`
|
||||||
|
|||||||
Reference in New Issue
Block a user