search: stemming (suffix stripping) + 18 tests
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 16s

Deterministic English suffix stripping (stem), stemText/stemTokens, indexStemmed.
Worked around two haskell-on-sx string gotchas: take/drop over a String yield
char codes (rebuild via joinChars . map chr), and isSuffixOf's reverse trips ++
(manual suffix compare). 196/196.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-06 22:50:19 +00:00
parent 7231cb651f
commit 911a2f57c0
7 changed files with 82 additions and 8 deletions

View File

@@ -116,10 +116,16 @@ lib/search/index.sx lib/search/eval.sx
- [x] result pagination (offset / limit) — `paginate`, `pageTfIdf`, `pageBm25`,
`resultCount` — 12 tests
- [x] snippet / highlight generation (`highlight`, `snippet`) — 12 tests
- [ ] stemming (suffix stripping) — recall-improving normalizer
- [x] stemming (suffix stripping) — `stem`, `stemText`, `stemTokens`, `indexStemmed`
— 18 tests
## Progress log
- **Extension: stemming (196/196 total).** Deterministic English suffix stripping
(`stem`), `stemText`/`stemTokens`, `indexStemmed`. Two haskell-on-sx gotchas: take/drop
over a String yield char CODES not char strings (rebuild via `joinChars . map chr`),
and isSuffixOf's `reverse` trips `++` on the String repr (manual suffix compare). All
five planned extensions now done; the loop can keep adding search vocabulary. 18 tests.
- **Extension: highlight/snippet (178/178 total).** `highlight terms text` marks
query-matching (normalized) tokens with [..]; `snippet ctx terms text` extracts a
context window around the first match. 12 tests.