search: Phase 1 tokenizer + inverted index + 18 tests
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 53s
Tokenizer (lowercase, strip punctuation, positions) and a sorted assoc-list inverted index [(Term,[(DocId,[Pos])])] with indexDoc/deleteDoc/lookupTerm/docFreq/allTerms.

Search lib is haskell-on-sx source assembled into search/src; tests reuse hk-test counters via a search-eval helper. conformance.sh models lib/haskell.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
119
lib/search/tests/index.sx
Normal file
@@ -0,0 +1,119 @@
;; Phase 1 — tokenize + inverted index.

(hk-test
  "tokens basic lowercases"
  (search-eval "\nresult = tokens \"The Cat sat\"\n" "result")
  (list "the" "cat" "sat"))

(hk-test
  "tokens strips punctuation"
  (search-eval "\nresult = tokens \"Hello, World!\"\n" "result")
  (list "hello" "world"))

(hk-test
  "tokens collapses whitespace"
  (search-eval "\nresult = tokens \" a b \"\n" "result")
  (list "a" "b"))

(hk-test
  "tokens empty is empty"
  (search-eval "\nresult = tokens \"\"\n" "result")
  (list))

(hk-test
  "tokens keeps digits"
  (search-eval "\nresult = tokens \"abc123 x9\"\n" "result")
  (list "abc123" "x9"))

(hk-test
  "positioned attaches ordinals"
  (search-eval "\nresult = positioned \"a b a\"\n" "result")
  (list (list "a" 0) (list "b" 1) (list "a" 2)))

(hk-test
  "index + lookup single doc"
  (search-eval
    "\nresult = lookupTerm \"cat\" (indexDoc 1 \"the cat sat\" emptyIndex)\n"
    "result")
  (list (list 1 (list 1))))

(hk-test
  "lookup missing term is empty"
  (search-eval
    "\nresult = lookupTerm \"dog\" (indexDoc 1 \"the cat sat\" emptyIndex)\n"
    "result")
  (list))

(hk-test
  "lookup records all positions"
  (search-eval
    "\nresult = lookupTerm \"the\" (indexDoc 1 \"the cat the dog the\" emptyIndex)\n"
    "result")
  (list (list 1 (list 0 2 4))))

(hk-test
  "multi-doc posting list sorted by docid"
  (search-eval
    "\nresult = lookupTerm \"x\" (indexDoc 1 \"x y\" (indexDoc 2 \"x z\" emptyIndex))\n"
    "result")
  (list
    (list 1 (list 0))
    (list 2 (list 0))))

(hk-test
  "index/query case symmetry"
  (search-eval
    "\nresult = lookupTerm \"cat\" (indexDoc 1 \"CAT Cat cat\" emptyIndex)\n"
    "result")
  (list (list 1 (list 0 1 2))))

(hk-test
  "re-index replaces a doc"
  (search-eval
    "\nresult = lookupTerm \"a\" (indexDoc 1 \"a a a\" (indexDoc 1 \"a\" emptyIndex))\n"
    "result")
  (list (list 1 (list 0 1 2))))

(hk-test
  "delete removes a doc"
  (search-eval
    "\nresult = lookupTerm \"cat\" (deleteDoc 1 (indexDoc 1 \"the cat\" emptyIndex))\n"
    "result")
  (list))

(hk-test
  "delete leaves other docs"
  (search-eval
    "\nresult = lookupTerm \"cat\" (deleteDoc 2 (indexDoc 2 \"big cat\" (indexDoc 1 \"the cat\" emptyIndex)))\n"
    "result")
  (list (list 1 (list 1))))

(hk-test
  "docFreq counts docs"
  (search-eval
    "\nresult = docFreq \"cat\" (indexDoc 2 \"a cat\" (indexDoc 1 \"the cat\" emptyIndex))\n"
    "result")
  2)

(hk-test
  "docFreq zero for missing"
  (search-eval
    "\nresult = docFreq \"zzz\" (indexDoc 1 \"a b\" emptyIndex)\n"
    "result")
  0)

(hk-test
  "allTerms sorted and unique"
  (search-eval
    "\nresult = allTerms (indexDoc 1 \"banana apple cherry apple\" emptyIndex)\n"
    "result")
  (list "apple" "banana" "cherry"))

(hk-test
  "allTerms merged across docs"
  (search-eval
    "\nresult = allTerms (indexDoc 2 \"d a\" (indexDoc 1 \"c b\" emptyIndex))\n"
    "result")
  (list "a" "b" "c" "d"))

{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
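
For orientation, here is a minimal plain-Haskell sketch of a tokenizer and sorted assoc-list index that would satisfy the tests in this file. It is not the haskell-on-sx source from search/src, which this page does not show. Only the function names and the `[(Term,[(DocId,[Pos])])]` shape come from the commit message; every function body, the type aliases, and the choice to delete punctuation outright (rather than treat it as a separator) are assumptions.

```haskell
import Data.Char (isAlphaNum, isSpace, toLower)
import Data.Maybe (fromMaybe)

type Term  = String
type DocId = Int
type Pos   = Int

-- Sorted association list: terms ascending, postings ascending by DocId.
type Index = [(Term, [(DocId, [Pos])])]

emptyIndex :: Index
emptyIndex = []

-- Assumed normalisation: drop non-alphanumerics, lowercase, split on whitespace.
tokens :: String -> [Term]
tokens = words . map toLower . filter (\c -> isAlphaNum c || isSpace c)

-- Pair each token with its zero-based ordinal.
positioned :: String -> [(Term, Pos)]
positioned s = zip (tokens s) [0 ..]

-- Remove one document's postings and drop terms left with none.
deleteDoc :: DocId -> Index -> Index
deleteDoc d ix =
  [ (t, ps') | (t, ps) <- ix
             , let ps' = filter ((/= d) . fst) ps
             , not (null ps') ]

-- Re-indexing replaces: old postings for the doc are deleted first.
indexDoc :: DocId -> String -> Index -> Index
indexDoc d text ix = foldl addOne (deleteDoc d ix) (positioned text)
  where
    addOne acc (t, p) = insertTerm t p acc

    -- Ordered insert on the term key.
    insertTerm t p [] = [(t, [(d, [p])])]
    insertTerm t p (e@(t', ps) : rest)
      | t < t'    = (t, [(d, [p])]) : e : rest
      | t == t'   = (t', insertPosting p ps) : rest
      | otherwise = e : insertTerm t p rest

    -- Ordered insert on DocId; positions arrive ascending, so append.
    insertPosting p [] = [(d, [p])]
    insertPosting p (e@(d', qs) : rest)
      | d < d'    = (d, [p]) : e : rest
      | d == d'   = (d', qs ++ [p]) : rest
      | otherwise = e : insertPosting p rest

lookupTerm :: Term -> Index -> [(DocId, [Pos])]
lookupTerm t = fromMaybe [] . lookup t

docFreq :: Term -> Index -> Int
docFreq t = length . lookupTerm t

allTerms :: Index -> [Term]
allTerms = map fst

main :: IO ()
main = do
  let ix = indexDoc 1 "x y" (indexDoc 2 "x z" emptyIndex)
  print (lookupTerm "x" ix)
  print (allTerms ix)
```

Appending each position with `++` keeps the sketch short at quadratic cost per posting; an implementation that expects long documents would more likely accumulate positions in reverse or use a map.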