Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 53s
Tokenizer (lowercase, strip punctuation, positions) and a sorted assoc-list inverted index [(Term,[(DocId,[Pos])])] with indexDoc/deleteDoc/lookupTerm/ docFreq/allTerms. Search lib is haskell-on-sx source assembled into search/src; tests reuse hk-test counters via a search-eval helper. conformance.sh models lib/haskell. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
9 lines
970 B
Plaintext
9 lines
970 B
Plaintext
;; search tokenizer — Haskell source fragment.
|
|
;; normalize (lowercase + strip punctuation), split on whitespace, attach positions.
|
|
;; tokens :: String -> [String]
|
|
;; positioned :: String -> [(String, Int)] -- 0-based ordinal positions
|
|
|
|
(define
|
|
search/tokenize-src
|
|
"lowerChar c = chr (toLower (ord c))\nnormChar c = if isAlphaNum c then lowerChar c else ' '\nisBlankCh c = c == ' '\ndropBlanks [] = []\ndropBlanks (c:cs) = if isBlankCh c then dropBlanks cs else c:cs\ntakeWord [] = []\ntakeWord (c:cs) = if isBlankCh c then [] else c : takeWord cs\nafterWord [] = []\nafterWord (c:cs) = if isBlankCh c then c:cs else afterWord cs\nsplitWords s = let s2 = dropBlanks s in if null s2 then [] else takeWord s2 : splitWords (afterWord s2)\nappendStr a b = a ++ b\njoinChars cs = foldr appendStr \"\" cs\ntokens s = map joinChars (splitWords (map normChar s))\nposFrom i [] = []\nposFrom i (x:xs) = (x, i) : posFrom (i + 1) xs\npositioned s = posFrom 0 (tokens s)\n")
|