search: stemming (suffix stripping) + 18 tests
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 16s
Deterministic English suffix stripping (stem), stemText/stemTokens, indexStemmed. Worked around two haskell-on-sx string gotchas: take/drop over a String yield char codes (rebuild via joinChars . map chr), and isSuffixOf's reverse trips ++ (manual suffix compare). 196/196.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
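The "manual suffix compare" workaround and the general shape of a suffix-stripping stemmer can be sketched in standard GHC Haskell as follows. This is only an illustration, not the code from this commit: the rule table, the `minStem` threshold, and the helpers `endsWith` and `dropEnd` are hypothetical, and the commit message does not list the actual rules. Under GHC, `take` and `drop` on a `String` already return a `String`, so the `joinChars . map chr` rebuild step that haskell-on-sx needs is not shown.

```haskell
module Main where

-- Manual suffix test: compare the last (length suf) characters of the word
-- against the suffix, without calling reverse or isSuffixOf.
endsWith :: String -> String -> Bool
endsWith suf word =
  let n = length word
      k = length suf
  in k <= n && drop (n - k) word == suf

-- Drop the last k characters of a word.
dropEnd :: Int -> String -> String
dropEnd k word = take (length word - k) word

-- Hypothetical rule table of (suffix, replacement) pairs, tried in order.
-- Illustrative only; the real rule set is not given in the commit.
rules :: [(String, String)]
rules =
  [ ("sses", "ss")
  , ("ies", "y")
  , ("ss", "ss")
  , ("ing", "")
  , ("ed", "")
  , ("s", "")
  ]

-- Assumed minimum number of characters that must remain before the suffix,
-- so that short words such as "bus" are left alone.
minStem :: Int
minStem = 3

-- Apply the first rule whose suffix matches and leaves a long-enough stem.
-- The result depends only on the input word, so it is deterministic.
stem :: String -> String
stem word = go rules
  where
    go [] = word
    go ((suf, rep) : rest)
      | endsWith suf word && length word - length suf >= minStem =
          dropEnd (length suf) word ++ rep
      | otherwise = go rest

-- Stem every token in a list.
stemTokens :: [String] -> [String]
stemTokens = map stem

main :: IO ()
main = mapM_ (putStrLn . stem) ["classes", "queries", "indexing", "ranked", "docs", "bus"]
```

With these assumed rules, the six sample words map to `class`, `query`, `index`, `rank`, `doc`, and `bus`. Trying rules in a fixed order and stopping at the first match keeps the behavior easy to predict, at the cost of some over- and under-stemming that a fuller algorithm would handle.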
@@ -116,10 +116,16 @@ lib/search/index.sx lib/search/eval.sx
 - [x] result pagination (offset / limit) — `paginate`, `pageTfIdf`, `pageBm25`,
   `resultCount` — 12 tests
 - [x] snippet / highlight generation (`highlight`, `snippet`) — 12 tests
-- [ ] stemming (suffix stripping) — recall-improving normalizer
+- [x] stemming (suffix stripping) — `stem`, `stemText`, `stemTokens`, `indexStemmed`
+  — 18 tests

 ## Progress log

+- **Extension: stemming (196/196 total).** Deterministic English suffix stripping
+  (`stem`), `stemText`/`stemTokens`, `indexStemmed`. Two haskell-on-sx gotchas: take/drop
+  over a String yield char CODES not char strings (rebuild via `joinChars . map chr`),
+  and isSuffixOf's `reverse` trips `++` on the String repr (manual suffix compare). All
+  five planned extensions now done; the loop can keep adding search vocabulary. 18 tests.
 - **Extension: highlight/snippet (178/178 total).** `highlight terms text` marks
   query-matching (normalized) tokens with [..]; `snippet ctx terms text` extracts a
   context window around the first match. 12 tests.
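
The highlight/snippet entry in the progress log describes two behaviors: bracketing tokens whose normalized form matches a query term, and cutting a context window around the first match. A minimal sketch of both in standard GHC Haskell is shown here. It reuses the names `highlight` and `snippet` and the argument order from the log, but everything else is an assumption: the `normalize` function (lowercase, alphanumerics only), whitespace tokenization via `words`, measuring `ctx` in tokens, and returning an empty string when nothing matches.

```haskell
module Main where

import Data.Char (isAlphaNum, toLower)

-- Assumed normalizer: lowercase and keep only alphanumeric characters.
normalize :: String -> String
normalize = map toLower . filter isAlphaNum

-- True when the token's normalized form equals a normalized query term.
matches :: [String] -> String -> Bool
matches terms tok = normalize tok `elem` map normalize terms

-- Wrap every matching token in square brackets; other tokens pass through.
-- Runs of whitespace collapse to single spaces because of words/unwords.
highlight :: [String] -> String -> String
highlight terms = unwords . map mark . words
  where
    mark tok
      | matches terms tok = "[" ++ tok ++ "]"
      | otherwise = tok

-- Keep up to ctx tokens on each side of the first matching token.
-- Returns the empty string when no token matches (an assumed convention).
snippet :: Int -> [String] -> String -> String
snippet ctx terms text =
  case break (matches terms) toks of
    (_, []) -> ""
    (before, _) ->
      let i = length before
          start = max 0 (i - ctx)
      in unwords (take (i - start + 1 + ctx) (drop start toks))
  where
    toks = words text

main :: IO ()
main = do
  let text = "The quick brown Fox, jumps over the lazy dog"
  putStrLn (highlight ["fox"] text)
  putStrLn (snippet 2 ["fox"] text)
```

For the sample sentence, `highlight ["fox"]` yields `The quick brown [Fox,] jumps over the lazy dog`, and `snippet 2 ["fox"]` yields `quick brown Fox, jumps over`. Matching on the normalized token while emitting the original token keeps punctuation and casing intact in the output.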