From e2de5a4675bc6c37610c5886a9c9ec91360d7c66 Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 17:27:20 +0000
Subject: [PATCH 01/15] briefings: add search-on-sx loop briefing

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 plans/agent-briefings/search-loop.md | 110 +++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100644 plans/agent-briefings/search-loop.md
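
Note for reviewers (not part of the commit): the briefing added by this patch
asks for a stable tiebreak in ranking and for the ACL filter to run after
scoring with an injected permit predicate. A minimal Haskell sketch of that
shape follows, assuming scores are already available as (DocId, Double) pairs.
The names rankAll, rankTopN, visibleTopN and permit are hypothetical and do not
come from lib/search/ or the plan.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

type DocId = Int

-- Order by score descending, then DocId ascending, so documents with
-- equal scores always come out in the same order.
rankAll :: [(DocId, Double)] -> [(DocId, Double)]
rankAll = sortBy (comparing (Down . snd) <> comparing fst)

-- Top-N document ids from the ranked list.
rankTopN :: Int -> [(DocId, Double)] -> [DocId]
rankTopN n = map fst . take n . rankAll

-- Post-filter: rank first, then drop hits the viewer may not see.
-- `permit` is injected by the caller; no ACL module is assumed here.
visibleTopN :: (DocId -> Bool) -> Int -> [(DocId, Double)] -> [DocId]
visibleTopN permit n = take n . filter permit . map fst . rankAll
```

Filtering before `take n` is a choice made in this sketch so that a viewer
still gets up to N visible hits; the briefing does not specify that order.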
diff --git a/plans/agent-briefings/search-loop.md b/plans/agent-briefings/search-loop.md
new file mode 100644
index 00000000..ee2346fa
--- /dev/null
+++ b/plans/agent-briefings/search-loop.md
@@ -0,0 +1,110 @@
+# search-on-sx loop agent (single agent, queue-driven)
+
+Role: iterates `plans/search-on-sx.md` forever. **Full-text + structured search on
+Haskell** — tokenize, inverted index, query AST, boolean + phrase + ranked
+queries (TF-IDF / BM25), ACL-aware post-filter, federated index merge. Typed ADTs
+make query parsing clean; lazy lists make posting-list iteration efficient. Sits on
+`lib/haskell/` (1514/1514 already green); adds a search-shaped vocabulary on top.
+
+```
+description: search-on-sx queue loop
+subagent_type: general-purpose
+run_in_background: true
+isolation: worktree
+```
+
+## Prompt
+
+You are the sole background agent working `plans/search-on-sx.md`. Isolated
+worktree `/root/rose-ash-loops/search` on branch `loops/search`, forever, one
+commit per feature. Push to `origin/loops/search` after every commit. Never touch
+`main` or `architecture`.
+
+## Restart baseline — check before iterating
+
+1. Read `plans/search-on-sx.md` — roadmap + Progress log.
+2. `ls lib/search/` — pick up from the most advanced file.
+3. If `lib/search/tests/*.sx` exist, run them via `bash lib/search/conformance.sh`.
+   Green before new work.
+4. If `lib/search/scoreboard.md` exists, that's your baseline.
+5. Read the `lib/haskell/` public API once; that's your substrate.
+   `lib/haskell/haskell.sx` exists; also study `runtime.sx`, `eval.sx`, `parser.sx`, `infer.sx`,
+   `match.sx`, `map.sx`, `set.sx`, `testlib.sx`. Learn how to declare ADTs, pattern
+   match, and use the `Map`/`Set` helpers before writing index code. Verify the real
+   exported names with sx_find_all / grep — don't assume from the plan's sketch.
+
+## The queue
+
+Phase order per `plans/search-on-sx.md`:
+
+- **Phase 1** — tokenize + inverted index + simple term lookup
+  (`Map Term [(DocId,[Pos])]`, insert/lookup, `(search/index doc)`,
+  `(search/query term)`).
+- **Phase 2** — query AST + boolean/phrase eval (Term | And | Or | Not | Phrase;
+  posting-list set ops; positional phrase match).
+- **Phase 3** — ranking (TF-IDF, BM25), top-N.
+- **Phase 4** — ACL-aware post-filter + federation (merge per-peer indices).
+
+Within a phase, pick the checkbox that unlocks the most tests per effort.
+
+Every iteration: implement → test → commit → tick `[ ]` → Progress log → next.
+
+## Ground rules (hard)
+
+- **Scope:** only `lib/search/**` and `plans/search-on-sx.md`. Do **not** edit
+  `spec/`, `hosts/`, `shared/`, other `lib/<lang>/` dirs, `lib/stdlib.sx`, or
+  `lib/` root. May **import** from `lib/haskell/` only (its public API). Do **not**
+  modify Haskell.
+- **NEVER call `sx_build`.** 600s watchdog. If the sx_server binary is broken →
+  Blockers entry, stop. Run tests by invoking the sx_server binary directly from a
+  conformance.sh (model it on `lib/haskell/conformance.sh`), pointing `SX_SERVER`
+  at `/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe` — fresh
+  worktrees have no `_build/`, so the relative path won't resolve.
+- **Shared-file issues** → plan's Blockers with minimal repro; don't fix here.
+- **SX files:** `sx-tree` MCP tools ONLY. **They take `file:` not `path:`** — a
+  wrong key yields `Yojson Type_error("Expected string, got null")`, which looks
+  like a broken binary but is just a param mismatch. `sx_validate` after edits.
+  Path-based edits (`sx_replace_node`) count comment headers in their indices and
+  can clobber the wrong node — re-read after, or prefer `sx_write_file` for small
+  files.
+- **Unicode in `.sx`:** raw UTF-8 only, never `\uXXXX` escapes.
+- **Commit granularity:** one feature per commit. Short factual messages
+  (`search: phrase query positional match + 7 tests`). Push to `origin/loops/search`.
+- **Plan file:** update Progress log (newest first) + tick boxes every commit.
+
+## search-specific gotchas
+
+- **Posting lists are the hot path.** Keep them sorted by DocId so boolean AND/OR
+  are linear merges, not nested scans. Phrase match needs positions, so store
+  `(DocId, [Pos])` — don't drop positions early to save space; you can't recover them.
+- **Tokenization decides recall.** Normalize consistently (lowercase, strip
+  punctuation) on BOTH index and query side, or queries silently miss. Test the
+  index/query symmetry explicitly.
+- **Ranking must be deterministic on ties.** TF-IDF/BM25 scores collide; always
+  add a stable tiebreak (DocId ascending) or tests flake.
+- **ACL filter is per-viewer and post-ranking.** Filter the result list against the
+  viewer, after scoring — never bake visibility into the index (the same index
+  serves all viewers). Inject the permit predicate; don't hardwire an ACL module
+  that doesn't exist yet.
+- **Federation merges indices, not results.** Merging per-peer inverted indices
+  (union posting lists per term) is cleaner and rank-correct vs merging ranked
+  result lists. Mock peer indices in tests.
+
+## General gotchas (all loops)
+
+- SX `do` = R7RS iteration. Use `begin` for multi-expr sequences.
+- `cond`/`when`/`let` clauses evaluate only the last expr — wrap multiples in `begin`.
+- `let` is parallel, not sequential — nest `let`s when a binding references an earlier one.
+- `env-bind!` creates a binding; `env-set!` mutates an existing one (walks scope chain).
+- `sx_validate` after every structural edit.
+- Namespace-prefix all guest helpers (`search/...`) — short/host-colliding names
+  get silently shadowed or hang the runtime.
+
+## Style
+
+- No comments in `.sx` unless non-obvious.
+- No new planning docs — update `plans/search-on-sx.md` inline.
+- Short, factual commit messages.
+- One feature per iteration. Commit. Log. Push. Next.
+
+Go. Start by reading the plan; find the first unchecked `[ ]`; implement it.

From b8cf3eb1b86516176f310ccac640cfeee77356f8 Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 18:21:49 +0000
Subject: [PATCH 02/15] search: Phase 1 tokenizer + inverted index + 18 tests

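Tokenizer (lowercase, strip punctuation, positions) and a sorted assoc-list
inverted index [(Term,[(DocId,[Pos])])] with indexDoc/deleteDoc/lookupTerm/
docFreq/allTerms. Re-indexing a doc overwrites postings only for terms in the
new text; call deleteDoc first to drop terms that no longer occur. The search
lib is haskell-on-sx source assembled into search/src; tests reuse the hk-test
counters via a search-eval helper. conformance.sh is a thin wrapper over
lib/guest/conformance.sh, configured like lib/haskell's.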

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |   7 +++
 lib/search/conformance.conf |  29 +++++++++
 lib/search/conformance.sh   |   3 +
 lib/search/index.sx         |  15 +++++
 lib/search/scoreboard.json  |  10 +++
 lib/search/scoreboard.md    |   7 +++
 lib/search/testlib.sx       |  29 +++++++++
 lib/search/tests/index.sx   | 119 ++++++++++++++++++++++++++++++++++++
 lib/search/tokenize.sx      |   8 +++
 plans/search-on-sx.md       |  36 +++++++----
 10 files changed, 252 insertions(+), 11 deletions(-)
 create mode 100644 lib/search/api.sx
 create mode 100644 lib/search/conformance.conf
 create mode 100755 lib/search/conformance.sh
 create mode 100644 lib/search/index.sx
 create mode 100644 lib/search/scoreboard.json
 create mode 100644 lib/search/scoreboard.md
 create mode 100644 lib/search/testlib.sx
 create mode 100644 lib/search/tests/index.sx
 create mode 100644 lib/search/tokenize.sx
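
Note for reviewers (not part of the commit): the tokenizer and index in this
patch live inside SX string literals, which makes them hard to read. The same
sorted assoc-list design is sketched below as plain GHC Haskell. It is an
approximation, not the committed source: it relies on Data.Char, `words`, `zip`
and guards, which the haskell-on-sx subset may or may not support, and the
helper `dedupe` is a name introduced here for illustration.

```haskell
import Data.Char (isAlphaNum, toLower)

type Term = String
type DocId = Int
type Pos = Int
type Posting = (DocId, [Pos])
type Index = [(Term, [Posting])]

-- Lowercase, turn every non-alphanumeric character into a space, split.
tokens :: String -> [Term]
tokens = words . map norm
  where
    norm c = if isAlphaNum c then toLower c else ' '

-- Pair each token with its 0-based ordinal position.
positioned :: String -> [(Term, Pos)]
positioned s = zip (tokens s) [0 ..]

-- Insert or overwrite one document's posting, keeping DocIds ascending.
insPosting :: DocId -> [Pos] -> [Posting] -> [Posting]
insPosting d ps [] = [(d, ps)]
insPosting d ps (q : qs)
  | d < fst q = (d, ps) : q : qs
  | d == fst q = (d, ps) : qs
  | otherwise = q : insPosting d ps qs

-- Insert a posting under a term, keeping terms ascending.
insTerm :: Term -> DocId -> [Pos] -> Index -> Index
insTerm t d ps [] = [(t, [(d, ps)])]
insTerm t d ps (e : es)
  | t < fst e = (t, [(d, ps)]) : e : es
  | t == fst e = (t, insPosting d ps (snd e)) : es
  | otherwise = e : insTerm t d ps es

-- Group positions per term, then fold each group into the index.
indexDoc :: DocId -> String -> Index -> Index
indexDoc d text idx = foldl step idx grouped
  where
    toks = positioned text
    grouped = [(t, [p | (t', p) <- toks, t' == t]) | t <- dedupe (map fst toks)]
    step ix (t, ps) = insTerm t d ps ix
    -- Keep the first occurrence of each term, in order of appearance.
    dedupe = foldr (\x acc -> x : filter (/= x) acc) []

-- Posting list for a term, or [] when the term is absent.
lookupTerm :: Term -> Index -> [Posting]
lookupTerm t idx = maybe [] id (lookup t idx)
```

As in the patch, `indexDoc` on an already indexed DocId overwrites postings only
for terms that occur in the new text.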

diff --git a/lib/search/api.sx b/lib/search/api.sx
new file mode 100644
index 00000000..8a06d444
--- /dev/null
+++ b/lib/search/api.sx
@@ -0,0 +1,7 @@
+;; search public API — assembles the canonical Haskell source from all layers.
+;; Tests and callers concatenate `search/src` with their own top-level bindings
+;; (e.g. "result = lookupTerm \"cat\" idx\n") and evaluate via the haskell-on-sx
+;; interpreter. Public Haskell entry points: indexDoc, lookupTerm, deleteDoc,
+;; docFreq, allTerms, tokens, positioned.
+
+(define search/src (str search/tokenize-src "\n" search/index-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
new file mode 100644
index 00000000..cc75c6e0
--- /dev/null
+++ b/lib/search/conformance.conf
@@ -0,0 +1,29 @@
+# search-on-sx conformance config — sourced by lib/guest/conformance.sh.
+
+LANG_NAME=search
+SCOREBOARD_DIR=lib/search
+MODE=counters
+COUNTERS_PASS=hk-test-pass
+COUNTERS_FAIL=hk-test-fail
+TIMEOUT_PER_SUITE=600
+
+PRELOADS=(
+  lib/haskell/tokenizer.sx
+  lib/haskell/layout.sx
+  lib/haskell/parser.sx
+  lib/haskell/desugar.sx
+  lib/haskell/runtime.sx
+  lib/haskell/match.sx
+  lib/haskell/eval.sx
+  lib/haskell/map.sx
+  lib/haskell/set.sx
+  lib/haskell/testlib.sx
+  lib/search/tokenize.sx
+  lib/search/index.sx
+  lib/search/api.sx
+  lib/search/testlib.sx
+)
+
+SUITES=(
+  "index:lib/search/tests/index.sx"
+)
diff --git a/lib/search/conformance.sh b/lib/search/conformance.sh
new file mode 100755
index 00000000..e50befa3
--- /dev/null
+++ b/lib/search/conformance.sh
@@ -0,0 +1,3 @@
+#!/usr/bin/env bash
+# Thin wrapper — see lib/guest/conformance.sh and lib/search/conformance.conf.
+exec bash "$(dirname "$0")/../guest/conformance.sh" "$(dirname "$0")/conformance.conf" "$@"
diff --git a/lib/search/index.sx b/lib/search/index.sx
new file mode 100644
index 00000000..3d285ec9
--- /dev/null
+++ b/lib/search/index.sx
@@ -0,0 +1,15 @@
+;; search inverted index — Haskell source fragment (depends on tokenize).
+;; Index = [(Term, [(DocId, [Pos])])], sorted by Term; postings sorted by DocId.
+;; Data.Map's public API lacks toList/keys/map/filter, so a sorted assoc-list
+;; index is used — it is the conceptual `Map Term [(DocId,[Pos])]` and exposes
+;; term iteration (allTerms) and df naturally for ranking.
+;;   emptyIndex :: Index
+;;   indexDoc   :: DocId -> String -> Index -> Index   (re-index replaces postings per term; stale terms remain)
+;;   lookupTerm :: Term -> Index -> [(DocId, [Pos])]
+;;   deleteDoc  :: DocId -> Index -> Index
+;;   docFreq    :: Term -> Index -> Int
+;;   allTerms   :: Index -> [Term]
+
+(define
+  search/index-src
+  "emptyIndex = []\ngroupBump [] t p = [(t, [p])]\ngroupBump (g:gs) t p = if fst g == t then (t, snd g ++ [p]) : gs else g : groupBump gs t p\ngroupStep acc tp = groupBump acc (fst tp) (snd tp)\ngroupTok pairs = foldl groupStep [] pairs\ninsPosting d ps [] = [(d, ps)]\ninsPosting d ps (q:qs) = if d < fst q then (d, ps) : q : qs else if d == fst q then (d, ps) : qs else q : insPosting d ps qs\ninsTerm t d ps [] = [(t, [(d, ps)])]\ninsTerm t d ps (e:es) = if t < fst e then (t, [(d, ps)]) : e : es else if t == fst e then (fst e, insPosting d ps (snd e)) : es else e : insTerm t d ps es\nindexStep d ix tp = insTerm (fst tp) d (snd tp) ix\nindexDoc d text idx = foldl (indexStep d) idx (groupTok (positioned text))\nlookupTerm t idx = case lookup t idx of { Nothing -> []; Just pl -> pl }\ndocFreq t idx = length (lookupTerm t idx)\nallTerms idx = map fst idx\npostingKeep d q = fst q /= d\ndropTermDoc d e = (fst e, filter (postingKeep d) (snd e))\nplKeep e = not (null (snd e))\ndeleteDoc d idx = filter plKeep (map (dropTermDoc d) idx)\n")
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
new file mode 100644
index 00000000..4c5202b0
--- /dev/null
+++ b/lib/search/scoreboard.json
@@ -0,0 +1,10 @@
+{
+  "lang": "search",
+  "total_passed": 18,
+  "total_failed": 0,
+  "total": 18,
+  "suites": [
+    {"name":"index","passed":18,"failed":0,"total":18}
+  ],
+  "generated": "2026-06-06T18:12:50+00:00"
+}
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
new file mode 100644
index 00000000..cf9cabce
--- /dev/null
+++ b/lib/search/scoreboard.md
@@ -0,0 +1,7 @@
+# search scoreboard
+
+**18 / 18 passing** (0 failure(s)).
+
+| Suite | Passed | Total | Status |
+|-------|--------|-------|--------|
+| index | 18 | 18 | ok |
diff --git a/lib/search/testlib.sx b/lib/search/testlib.sx
new file mode 100644
index 00000000..9c965b05
--- /dev/null
+++ b/lib/search/testlib.sx
@@ -0,0 +1,29 @@
+;; search test helpers — convert forced haskell values to plain SX and run
+;; programs built on top of search/src. Reuses hk-test / counters from
+;; lib/haskell/testlib.sx (preloaded by the conformance config).
+
+;; Recursively turn a forced HK value into plain SX:
+;;   cons-list  -> SX list,  Tuple -> SX list,  leaves unchanged.
+(define
+  search-hk->sx
+  (fn
+    (v)
+    (cond
+      ((and (list? v) (not (empty? v)) (= (first v) "[]")) (list))
+      ((and (list? v) (not (empty? v)) (= (first v) ":"))
+        (cons
+          (search-hk->sx (nth v 1))
+          (search-hk->sx (nth v 2))))
+      ((and (list? v) (not (empty? v)) (= (first v) "Tuple"))
+        (map search-hk->sx (rest v)))
+      (:else v))))
+
+;; Evaluate `extra` (extra top-level Haskell bindings) on top of search/src
+;; and return binding `name` as plain SX.
+(define
+  search-eval
+  (fn
+    (extra name)
+    (search-hk->sx
+      (hk-deep-force
+        (get (hk-eval-program (hk-core (str search/src extra))) name)))))
diff --git a/lib/search/tests/index.sx b/lib/search/tests/index.sx
new file mode 100644
index 00000000..2e9cb700
--- /dev/null
+++ b/lib/search/tests/index.sx
@@ -0,0 +1,119 @@
+;; Phase 1 — tokenize + inverted index.
+
+(hk-test
+  "tokens basic lowercases"
+  (search-eval "\nresult = tokens \"The Cat sat\"\n" "result")
+  (list "the" "cat" "sat"))
+
+(hk-test
+  "tokens strips punctuation"
+  (search-eval "\nresult = tokens \"Hello, World!\"\n" "result")
+  (list "hello" "world"))
+
+(hk-test
+  "tokens collapses whitespace"
+  (search-eval "\nresult = tokens \"  a   b  \"\n" "result")
+  (list "a" "b"))
+
+(hk-test
+  "tokens empty is empty"
+  (search-eval "\nresult = tokens \"\"\n" "result")
+  (list))
+
+(hk-test
+  "tokens keeps digits"
+  (search-eval "\nresult = tokens \"abc123 x9\"\n" "result")
+  (list "abc123" "x9"))
+
+(hk-test
+  "positioned attaches ordinals"
+  (search-eval "\nresult = positioned \"a b a\"\n" "result")
+  (list (list "a" 0) (list "b" 1) (list "a" 2)))
+
+(hk-test
+  "index + lookup single doc"
+  (search-eval
+    "\nresult = lookupTerm \"cat\" (indexDoc 1 \"the cat sat\" emptyIndex)\n"
+    "result")
+  (list (list 1 (list 1))))
+
+(hk-test
+  "lookup missing term is empty"
+  (search-eval
+    "\nresult = lookupTerm \"dog\" (indexDoc 1 \"the cat sat\" emptyIndex)\n"
+    "result")
+  (list))
+
+(hk-test
+  "lookup records all positions"
+  (search-eval
+    "\nresult = lookupTerm \"the\" (indexDoc 1 \"the cat the dog the\" emptyIndex)\n"
+    "result")
+  (list (list 1 (list 0 2 4))))
+
+(hk-test
+  "multi-doc posting list sorted by docid"
+  (search-eval
+    "\nresult = lookupTerm \"x\" (indexDoc 1 \"x y\" (indexDoc 2 \"x z\" emptyIndex))\n"
+    "result")
+  (list
+    (list 1 (list 0))
+    (list 2 (list 0))))
+
+(hk-test
+  "index/query case symmetry"
+  (search-eval
+    "\nresult = lookupTerm \"cat\" (indexDoc 1 \"CAT Cat cat\" emptyIndex)\n"
+    "result")
+  (list (list 1 (list 0 1 2))))
+
+(hk-test
+  "re-index replaces a doc"
+  (search-eval
+    "\nresult = lookupTerm \"a\" (indexDoc 1 \"a a a\" (indexDoc 1 \"a\" emptyIndex))\n"
+    "result")
+  (list (list 1 (list 0 1 2))))
+
+(hk-test
+  "delete removes a doc"
+  (search-eval
+    "\nresult = lookupTerm \"cat\" (deleteDoc 1 (indexDoc 1 \"the cat\" emptyIndex))\n"
+    "result")
+  (list))
+
+(hk-test
+  "delete leaves other docs"
+  (search-eval
+    "\nresult = lookupTerm \"cat\" (deleteDoc 2 (indexDoc 2 \"big cat\" (indexDoc 1 \"the cat\" emptyIndex)))\n"
+    "result")
+  (list (list 1 (list 1))))
+
+(hk-test
+  "docFreq counts docs"
+  (search-eval
+    "\nresult = docFreq \"cat\" (indexDoc 2 \"a cat\" (indexDoc 1 \"the cat\" emptyIndex))\n"
+    "result")
+  2)
+
+(hk-test
+  "docFreq zero for missing"
+  (search-eval
+    "\nresult = docFreq \"zzz\" (indexDoc 1 \"a b\" emptyIndex)\n"
+    "result")
+  0)
+
+(hk-test
+  "allTerms sorted and unique"
+  (search-eval
+    "\nresult = allTerms (indexDoc 1 \"banana apple cherry apple\" emptyIndex)\n"
+    "result")
+  (list "apple" "banana" "cherry"))
+
+(hk-test
+  "allTerms merged across docs"
+  (search-eval
+    "\nresult = allTerms (indexDoc 2 \"d a\" (indexDoc 1 \"c b\" emptyIndex))\n"
+    "result")
+  (list "a" "b" "c" "d"))
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/lib/search/tokenize.sx b/lib/search/tokenize.sx
new file mode 100644
index 00000000..7c1d74d3
--- /dev/null
+++ b/lib/search/tokenize.sx
@@ -0,0 +1,8 @@
+;; search tokenizer — Haskell source fragment.
+;; normalize (lowercase + strip punctuation), split on whitespace, attach positions.
+;; tokens :: String -> [String]
+;; positioned :: String -> [(String, Int)]  -- 0-based ordinal positions
+
+(define
+  search/tokenize-src
+  "lowerChar c = chr (toLower (ord c))\nnormChar c = if isAlphaNum c then lowerChar c else ' '\nisBlankCh c = c == ' '\ndropBlanks [] = []\ndropBlanks (c:cs) = if isBlankCh c then dropBlanks cs else c:cs\ntakeWord [] = []\ntakeWord (c:cs) = if isBlankCh c then [] else c : takeWord cs\nafterWord [] = []\nafterWord (c:cs) = if isBlankCh c then c:cs else afterWord cs\nsplitWords s = let s2 = dropBlanks s in if null s2 then [] else takeWord s2 : splitWords (afterWord s2)\nappendStr a b = a ++ b\njoinChars cs = foldr appendStr \"\" cs\ntokens s = map joinChars (splitWords (map normChar s))\nposFrom i [] = []\nposFrom i (x:xs) = (x, i) : posFrom (i + 1) xs\npositioned s = posFrom 0 (tokens s)\n")
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index 9e0045d4..1baf6e9a 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -10,7 +10,7 @@ extension that merges per-peer indices.
 
 ## Status (rolling)
 
-`bash lib/search/conformance.sh` → **0/0** (not yet started)
+`bash lib/search/conformance.sh` → **18/18** (Phase 1 complete)
 
 ## Ground rules
 
@@ -61,15 +61,18 @@ lib/search/index.sx                     lib/search/eval.sx
 
 ## Phase 1 — Tokenize + index
 
-- [ ] `lib/search/tokenize.sx` — normalize (lowercase, strip punctuation), split on
+- [x] `lib/search/tokenize.sx` — normalize (lowercase, strip punctuation), split on
   whitespace, return positions
-- [ ] `lib/search/index.sx` — inverted index data structure (typed `Map` from
-  haskell lib); `insert`, `delete`, `lookup`
-- [ ] `lib/search/api.sx` — `(search/index doc)`, `(search/lookup term)`
-- [ ] `lib/search/tests/index.sx` — 15+ cases: tokenize, insert + lookup, update,
-  delete, multi-doc
-- [ ] `lib/search/scoreboard.{json,md}`
-- [ ] `lib/search/conformance.sh`
+- [x] `lib/search/index.sx` — inverted index data structure; `indexDoc`, `deleteDoc`,
+  `lookupTerm`, `docFreq`, `allTerms`. (Data.Map's public API lacks
+  toList/keys/map/filter, so a sorted assoc-list `[(Term,[(DocId,[Pos])])]` is used —
+  the conceptual `Map Term [(DocId,[Pos])]` with free term iteration.)
+- [x] `lib/search/api.sx` — assembles `search/src` (tokenize + index); Haskell entry
+  points `indexDoc` / `lookupTerm`
+- [x] `lib/search/tests/index.sx` — 18 cases: tokenize, insert + lookup, update,
+  delete, multi-doc, positions, docFreq, allTerms
+- [x] `lib/search/scoreboard.{json,md}`
+- [x] `lib/search/conformance.sh`
 
 ## Phase 2 — Query AST + boolean evaluation
 
@@ -99,8 +102,19 @@ lib/search/index.sx                     lib/search/eval.sx
 
 ## Progress log
 
-(loop fills this in)
+- **Phase 1 complete (18/18).** Tokenizer (lowercase + strip punctuation + positions),
+  inverted index as sorted assoc-list `[(Term,[(DocId,[Pos])])]`, indexDoc/deleteDoc/
+  lookupTerm/docFreq/allTerms. Search lib is Haskell source assembled into `search/src`
+  and evaluated via the haskell-on-sx interpreter; tests reuse `hk-test` counters and a
+  `search-eval` helper that forces HK values to plain SX. conformance.sh models
+  lib/haskell (MODE=counters, COUNTERS_PASS/FAIL=hk-test-pass/fail).
 
 ## Blockers
 
-(loop fills this in)
+- **None.** Note: the box is heavily CPU-oversubscribed by sibling loop agents
+  (load ~11 on 2 cores); each program eval is ~10× slower than nominal, so suite
+  timeout is set to 600s. Runs are correct, just slow.
+- **Data.Map public API gap (informational, not fixing):** the haskell-on-sx
+  `import Data.Map` binds only empty/singleton/insert/lookup/member/size/null/delete/
+  insertWith/adjust/findWithDefault — no toList/keys/elems/map/filter/unionWith. Index
+  uses a pure assoc-list instead so term iteration and federation merge stay simple.

From 0f0da0319c83285644f5b8299b8c09b3816a93c5 Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 18:47:42 +0000
Subject: [PATCH 03/15] search: Phase 2 query AST + boolean/phrase eval + 28
 tests

Query ADT (Term|And|Or|Not|Phrase) and evalQuery over docid-sorted posting
lists: boolean ops as linear merges, Not over the allDocs universe, Phrase via
positional adjacency. Batched both test suites into one program eval each
(search-batch) so they finish under heavy CPU load. 46/46.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |   6 +-
 lib/search/conformance.conf |   2 +
 lib/search/query.sx         |  11 ++
 lib/search/scoreboard.json  |   9 +-
 lib/search/scoreboard.md    |   3 +-
 lib/search/testlib.sx       |  21 ++++
 lib/search/tests/boolean.sx | 123 +++++++++++++++++++++++
 lib/search/tests/index.sx   | 193 +++++++++++++++---------------------
 plans/search-on-sx.md       |  21 ++--
 9 files changed, 264 insertions(+), 125 deletions(-)
 create mode 100644 lib/search/query.sx
 create mode 100644 lib/search/tests/boolean.sx
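
Note for reviewers (not part of the commit): a readable sketch of the query
evaluation described in the commit message, written as plain GHC Haskell rather
than the SX-embedded source. It assumes an index of the same shape as Phase 1
and hand-builds one for the three-document corpus used by tests/boolean.sx.
The names postings, inter, diff, phraseIn, candidates and corpus are
introduced here for illustration and differ from the committed helpers.

```haskell
type Term = String
type DocId = Int
type Pos = Int
type Index = [(Term, [(DocId, [Pos])])]

data Query = Term String | And Query Query | Or Query Query
           | Not Query | Phrase [String]

postings :: Term -> Index -> [(DocId, [Pos])]
postings t idx = maybe [] id (lookup t idx)

docsWith :: Term -> Index -> [DocId]
docsWith t = map fst . postings t

-- Linear merges over ascending, duplicate-free DocId lists.
union, inter, diff :: [DocId] -> [DocId] -> [DocId]
union [] ys = ys
union xs [] = xs
union (x : xs) (y : ys) = case compare x y of
  LT -> x : union xs (y : ys)
  GT -> y : union (x : xs) ys
  EQ -> x : union xs ys
inter (x : xs) (y : ys) = case compare x y of
  LT -> inter xs (y : ys)
  GT -> inter (x : xs) ys
  EQ -> x : inter xs ys
inter _ _ = []
diff xs [] = xs
diff [] _ = []
diff (x : xs) (y : ys) = case compare x y of
  LT -> x : diff xs (y : ys)
  GT -> diff (x : xs) ys
  EQ -> diff xs ys

-- Universe for Not: every DocId that appears anywhere in the index.
allDocs :: Index -> [DocId]
allDocs idx = foldl union [] [map fst pl | (_, pl) <- idx]

-- True when the terms occur at consecutive positions in document d.
phraseIn :: [Term] -> Index -> DocId -> Bool
phraseIn [] _ _ = True
phraseIn ts@(t0 : _) idx d = any startsAt (positionsIn t0)
  where
    positionsIn t = maybe [] id (lookup d (postings t idx))
    startsAt p = and [(p + i) `elem` positionsIn t | (i, t) <- zip [0 ..] ts]

evalQuery :: Index -> Query -> [DocId]
evalQuery idx q = case q of
    Term t -> docsWith t idx
    And a b -> inter (evalQuery idx a) (evalQuery idx b)
    Or a b -> union (evalQuery idx a) (evalQuery idx b)
    Not a -> diff (allDocs idx) (evalQuery idx a)
    Phrase ts -> filter (phraseIn ts idx) (candidates ts)
  where
    -- Only documents containing every phrase term need a position check.
    candidates [] = allDocs idx
    candidates (t : rest) = foldl inter (docsWith t idx) [docsWith r idx | r <- rest]

-- doc 1 "the quick brown dog", doc 2 "a quick brown fox",
-- doc 3 "the dog barks loudly"
corpus :: Index
corpus =
  [ ("a", [(2, [0])]), ("barks", [(3, [2])])
  , ("brown", [(1, [2]), (2, [2])]), ("dog", [(1, [3]), (3, [1])])
  , ("fox", [(2, [3])]), ("loudly", [(3, [3])])
  , ("quick", [(1, [1]), (2, [1])]), ("the", [(1, [0]), (3, [0])]) ]
```

Each boolean operator walks both inputs once, which is why posting lists have
to stay sorted by DocId.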

diff --git a/lib/search/api.sx b/lib/search/api.sx
index 8a06d444..e2da2bb6 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -2,6 +2,8 @@
 ;; Tests and callers concatenate `search/src` with their own top-level bindings
 ;; (e.g. "result = lookupTerm \"cat\" idx\n") and evaluate via the haskell-on-sx
 ;; interpreter. Public Haskell entry points: indexDoc, lookupTerm, deleteDoc,
-;; docFreq, allTerms, tokens, positioned.
+;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery.
 
-(define search/src (str search/tokenize-src "\n" search/index-src))
+(define
+  search/src
+  (str search/tokenize-src "\n" search/index-src "\n" search/query-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index cc75c6e0..4e418e9f 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -20,10 +20,12 @@ PRELOADS=(
   lib/haskell/testlib.sx
   lib/search/tokenize.sx
   lib/search/index.sx
+  lib/search/query.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
 
 SUITES=(
   "index:lib/search/tests/index.sx"
+  "boolean:lib/search/tests/boolean.sx"
 )
diff --git a/lib/search/query.sx b/lib/search/query.sx
new file mode 100644
index 00000000..23025908
--- /dev/null
+++ b/lib/search/query.sx
@@ -0,0 +1,11 @@
+;; search query AST + boolean/phrase evaluation — Haskell source fragment.
+;; Depends on tokenize + index.
+;;   data Query = Term String | And Query Query | Or Query Query
+;;              | Not Query | Phrase [String]
+;;   evalQuery :: Index -> Query -> [DocId]   (sorted, unique)
+;; Boolean ops are linear merges over docid-sorted posting lists; Not uses
+;; allDocs as the universe; Phrase checks positional adjacency.
+
+(define
+  search/query-src
+  "data Query = Term String | And Query Query | Or Query Query | Not Query | Phrase [String]\ndocsWith t idx = map fst (lookupTerm t idx)\nsortedUnion [] ys = ys\nsortedUnion xs [] = xs\nsortedUnion (x:xs) (y:ys) = if x < y then x : sortedUnion xs (y:ys) else if x > y then y : sortedUnion (x:xs) ys else x : sortedUnion xs ys\nsortedInter [] ys = []\nsortedInter xs [] = []\nsortedInter (x:xs) (y:ys) = if x < y then sortedInter xs (y:ys) else if x > y then sortedInter (x:xs) ys else x : sortedInter xs ys\nsortedDiff [] ys = []\nsortedDiff xs [] = xs\nsortedDiff (x:xs) (y:ys) = if x < y then x : sortedDiff xs (y:ys) else if x > y then sortedDiff (x:xs) ys else sortedDiff xs ys\nmergeDocs acc e = sortedUnion acc (map fst (snd e))\nallDocs idx = foldl mergeDocs [] idx\nposIn t d idx = case lookup d (lookupTerm t idx) of { Nothing -> []; Just ps -> ps }\nelemSorted x [] = False\nelemSorted x (y:ys) = if x == y then True else if x < y then False else elemSorted x ys\nphraseAtAll [] d idx p i = True\nphraseAtAll (t:ts) d idx p i = if elemSorted (p + i) (posIn t d idx) then phraseAtAll ts d idx p (i + 1) else False\nphraseStartsAt ts d idx p = phraseAtAll ts d idx p 0\nphraseInDoc [] d idx = True\nphraseInDoc (t0:rest) d idx = any (phraseStartsAt (t0:rest) d idx) (posIn t0 d idx)\nphraseHere ts idx d = phraseInDoc ts d idx\ninterStep idx acc tt = sortedInter acc (docsWith tt idx)\nphraseCands [] idx = allDocs idx\nphraseCands (t:ts) idx = foldl (interStep idx) (docsWith t idx) ts\nphraseDocs ts idx = filter (phraseHere ts idx) (phraseCands ts idx)\nevalQuery idx q = case q of { Term t -> docsWith t idx ; And a b -> sortedInter (evalQuery idx a) (evalQuery idx b) ; Or a b -> sortedUnion (evalQuery idx a) (evalQuery idx b) ; Not a -> sortedDiff (allDocs idx) (evalQuery idx a) ; Phrase ts -> phraseDocs ts idx }\n")
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index 4c5202b0..51e8a2ec 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,10 +1,11 @@
 {
   "lang": "search",
-  "total_passed": 18,
+  "total_passed": 46,
   "total_failed": 0,
-  "total": 18,
+  "total": 46,
   "suites": [
-    {"name":"index","passed":18,"failed":0,"total":18}
+    {"name":"index","passed":18,"failed":0,"total":18},
+    {"name":"boolean","passed":28,"failed":0,"total":28}
   ],
-  "generated": "2026-06-06T18:12:50+00:00"
+  "generated": "2026-06-06T18:46:54+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index cf9cabce..a214ce29 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,7 +1,8 @@
 # search scoreboard
 
-**18 / 18 passing** (0 failure(s)).
+**46 / 46 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
 | index | 18 | 18 | ok |
+| boolean | 28 | 28 | ok |
diff --git a/lib/search/testlib.sx b/lib/search/testlib.sx
index 9c965b05..1e2212d0 100644
--- a/lib/search/testlib.sx
+++ b/lib/search/testlib.sx
@@ -27,3 +27,24 @@
     (search-hk->sx
       (hk-deep-force
         (get (hk-eval-program (hk-core (str search/src extra))) name)))))
+
+(define
+  search-join
+  (fn
+    (sep xs)
+    (cond
+      ((empty? xs) "")
+      ((empty? (rest xs)) (first xs))
+      (:else (str (first xs) sep (search-join sep (rest xs)))))))
+
+;; Batch many haskell expressions into ONE program evaluation (amortizes the
+;; cost of parsing/binding search/src — important under heavy CPU load).
+;; `setup` is extra top-level Haskell; `exprs` is a list of expression strings
+;; whose results form a single haskell list. Returns the SX list of results.
+(define
+  search-batch
+  (fn
+    (setup exprs)
+    (search-eval
+      (str setup "\nresult = [" (search-join ", " exprs) "]\n")
+      "result")))
diff --git a/lib/search/tests/boolean.sx b/lib/search/tests/boolean.sx
new file mode 100644
index 00000000..f6e48ea8
--- /dev/null
+++ b/lib/search/tests/boolean.sx
@@ -0,0 +1,123 @@
+;; Phase 2 — query AST + boolean/phrase evaluation (hand-built Query values).
+;; Corpus:
+;;   doc 1 "the quick brown dog"   -> the quick brown dog
+;;   doc 2 "a quick brown fox"     -> a quick brown fox
+;;   doc 3 "the dog barks loudly"  -> the dog barks loudly
+;; All queries run in ONE program evaluation (search-batch) to stay fast.
+
+(define
+  search-corpus
+  "idx = indexDoc 3 \"the dog barks loudly\" (indexDoc 2 \"a quick brown fox\" (indexDoc 1 \"the quick brown dog\" emptyIndex))\n")
+
+(define
+  bool-cases
+  (list
+    (list
+      "term in two docs"
+      "evalQuery idx (Term \"quick\")"
+      (list 1 2))
+    (list
+      "term in two docs (the)"
+      "evalQuery idx (Term \"the\")"
+      (list 1 3))
+    (list "term in one doc" "evalQuery idx (Term \"fox\")" (list 2))
+    (list "term absent" "evalQuery idx (Term \"zzz\")" (list))
+    (list
+      "term case-sensitive at AST level"
+      "evalQuery idx (Term \"QUICK\")"
+      (list))
+    (list "term on empty index" "evalQuery emptyIndex (Term \"cat\")" (list))
+    (list
+      "and both terms"
+      "evalQuery idx (And (Term \"quick\") (Term \"brown\"))"
+      (list 1 2))
+    (list
+      "and overlap subset"
+      "evalQuery idx (And (Term \"the\") (Term \"dog\"))"
+      (list 1 3))
+    (list
+      "and disjoint is empty"
+      "evalQuery idx (And (Term \"the\") (Term \"fox\"))"
+      (list))
+    (list
+      "and right-nested"
+      "evalQuery idx (And (Term \"the\") (And (Term \"dog\") (Term \"barks\")))"
+      (list 3))
+    (list
+      "or two singletons"
+      "evalQuery idx (Or (Term \"fox\") (Term \"barks\"))"
+      (list 2 3))
+    (list
+      "or all docs"
+      "evalQuery idx (Or (Term \"quick\") (Term \"the\"))"
+      (list 1 2 3))
+    (list
+      "or with absent term"
+      "evalQuery idx (Or (Term \"fox\") (Term \"zzz\"))"
+      (list 2))
+    (list "not term" "evalQuery idx (Not (Term \"the\"))" (list 2))
+    (list "not term 2" "evalQuery idx (Not (Term \"quick\"))" (list 3))
+    (list
+      "and with not"
+      "evalQuery idx (And (Term \"quick\") (Not (Term \"the\")))"
+      (list 2))
+    (list
+      "double negation"
+      "evalQuery idx (Not (Not (Term \"fox\")))"
+      (list 2))
+    (list
+      "or of and with term"
+      "evalQuery idx (Or (And (Term \"the\") (Term \"dog\")) (Term \"fox\"))"
+      (list 1 2 3))
+    (list
+      "phrase adjacent both docs"
+      "evalQuery idx (Phrase [\"quick\", \"brown\"])"
+      (list 1 2))
+    (list
+      "phrase adjacent one doc"
+      "evalQuery idx (Phrase [\"brown\", \"dog\"])"
+      (list 1))
+    (list
+      "phrase the quick"
+      "evalQuery idx (Phrase [\"the\", \"quick\"])"
+      (list 1))
+    (list
+      "phrase dog barks"
+      "evalQuery idx (Phrase [\"dog\", \"barks\"])"
+      (list 3))
+    (list
+      "phrase non-adjacent empty"
+      "evalQuery idx (Phrase [\"quick\", \"dog\"])"
+      (list))
+    (list
+      "phrase order matters"
+      "evalQuery idx (Phrase [\"brown\", \"quick\"])"
+      (list))
+    (list
+      "phrase single term"
+      "evalQuery idx (Phrase [\"dog\"])"
+      (list 1 3))
+    (list
+      "phrase three terms"
+      "evalQuery idx (Phrase [\"the\", \"dog\", \"barks\"])"
+      (list 3))
+    (list
+      "and of phrase and term"
+      "evalQuery idx (And (Phrase [\"quick\", \"brown\"]) (Term \"dog\"))"
+      (list 1))
+    (list
+      "not of phrase"
+      "evalQuery idx (Not (Phrase [\"quick\", \"brown\"]))"
+      (list 3))))
+
+(define
+  bool-results
+  (search-batch search-corpus (map (fn (c) (nth c 1)) bool-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth bool-results i) (nth c 2)))
+  bool-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/lib/search/tests/index.sx b/lib/search/tests/index.sx
index 2e9cb700..9415866f 100644
--- a/lib/search/tests/index.sx
+++ b/lib/search/tests/index.sx
@@ -1,119 +1,88 @@
 ;; Phase 1 — tokenize + inverted index.
+;; All cases run in ONE program evaluation (search-batch) to stay fast under load.
+;; Scalar results (docFreq) are wrapped as singleton lists so the batch is a list
+;; of lists.
 
-(hk-test
-  "tokens basic lowercases"
-  (search-eval "\nresult = tokens \"The Cat sat\"\n" "result")
-  (list "the" "cat" "sat"))
-
-(hk-test
-  "tokens strips punctuation"
-  (search-eval "\nresult = tokens \"Hello, World!\"\n" "result")
-  (list "hello" "world"))
-
-(hk-test
-  "tokens collapses whitespace"
-  (search-eval "\nresult = tokens \"  a   b  \"\n" "result")
-  (list "a" "b"))
-
-(hk-test
-  "tokens empty is empty"
-  (search-eval "\nresult = tokens \"\"\n" "result")
-  (list))
-
-(hk-test
-  "tokens keeps digits"
-  (search-eval "\nresult = tokens \"abc123 x9\"\n" "result")
-  (list "abc123" "x9"))
-
-(hk-test
-  "positioned attaches ordinals"
-  (search-eval "\nresult = positioned \"a b a\"\n" "result")
-  (list (list "a" 0) (list "b" 1) (list "a" 2)))
-
-(hk-test
-  "index + lookup single doc"
-  (search-eval
-    "\nresult = lookupTerm \"cat\" (indexDoc 1 \"the cat sat\" emptyIndex)\n"
-    "result")
-  (list (list 1 (list 1))))
-
-(hk-test
-  "lookup missing term is empty"
-  (search-eval
-    "\nresult = lookupTerm \"dog\" (indexDoc 1 \"the cat sat\" emptyIndex)\n"
-    "result")
-  (list))
-
-(hk-test
-  "lookup records all positions"
-  (search-eval
-    "\nresult = lookupTerm \"the\" (indexDoc 1 \"the cat the dog the\" emptyIndex)\n"
-    "result")
-  (list (list 1 (list 0 2 4))))
-
-(hk-test
-  "multi-doc posting list sorted by docid"
-  (search-eval
-    "\nresult = lookupTerm \"x\" (indexDoc 1 \"x y\" (indexDoc 2 \"x z\" emptyIndex))\n"
-    "result")
+(define
+  index-cases
   (list
-    (list 1 (list 0))
-    (list 2 (list 0))))
+    (list
+      "tokens basic lowercases"
+      "tokens \"The Cat sat\""
+      (list "the" "cat" "sat"))
+    (list
+      "tokens strips punctuation"
+      "tokens \"Hello, World!\""
+      (list "hello" "world"))
+    (list "tokens collapses whitespace" "tokens \"  a   b  \"" (list "a" "b"))
+    (list "tokens empty is empty" "tokens \"\"" (list))
+    (list "tokens keeps digits" "tokens \"abc123 x9\"" (list "abc123" "x9"))
+    (list
+      "positioned attaches ordinals"
+      "positioned \"a b a\""
+      (list
+        (list "a" 0)
+        (list "b" 1)
+        (list "a" 2)))
+    (list
+      "index + lookup single doc"
+      "lookupTerm \"cat\" (indexDoc 1 \"the cat sat\" emptyIndex)"
+      (list (list 1 (list 1))))
+    (list
+      "lookup missing term is empty"
+      "lookupTerm \"dog\" (indexDoc 1 \"the cat sat\" emptyIndex)"
+      (list))
+    (list
+      "lookup records all positions"
+      "lookupTerm \"the\" (indexDoc 1 \"the cat the dog the\" emptyIndex)"
+      (list (list 1 (list 0 2 4))))
+    (list
+      "multi-doc posting list sorted by docid"
+      "lookupTerm \"x\" (indexDoc 1 \"x y\" (indexDoc 2 \"x z\" emptyIndex))"
+      (list
+        (list 1 (list 0))
+        (list 2 (list 0))))
+    (list
+      "index/query case symmetry"
+      "lookupTerm \"cat\" (indexDoc 1 \"CAT Cat cat\" emptyIndex)"
+      (list (list 1 (list 0 1 2))))
+    (list
+      "re-index replaces a doc"
+      "lookupTerm \"a\" (indexDoc 1 \"a a a\" (indexDoc 1 \"a\" emptyIndex))"
+      (list (list 1 (list 0 1 2))))
+    (list
+      "delete removes a doc"
+      "lookupTerm \"cat\" (deleteDoc 1 (indexDoc 1 \"the cat\" emptyIndex))"
+      (list))
+    (list
+      "delete leaves other docs"
+      "lookupTerm \"cat\" (deleteDoc 2 (indexDoc 2 \"big cat\" (indexDoc 1 \"the cat\" emptyIndex)))"
+      (list (list 1 (list 1))))
+    (list
+      "docFreq counts docs"
+      "[docFreq \"cat\" (indexDoc 2 \"a cat\" (indexDoc 1 \"the cat\" emptyIndex))]"
+      (list 2))
+    (list
+      "docFreq zero for missing"
+      "[docFreq \"zzz\" (indexDoc 1 \"a b\" emptyIndex)]"
+      (list 0))
+    (list
+      "allTerms sorted and unique"
+      "allTerms (indexDoc 1 \"banana apple cherry apple\" emptyIndex)"
+      (list "apple" "banana" "cherry"))
+    (list
+      "allTerms merged across docs"
+      "allTerms (indexDoc 2 \"d a\" (indexDoc 1 \"c b\" emptyIndex))"
+      (list "a" "b" "c" "d"))))
 
-(hk-test
-  "index/query case symmetry"
-  (search-eval
-    "\nresult = lookupTerm \"cat\" (indexDoc 1 \"CAT Cat cat\" emptyIndex)\n"
-    "result")
-  (list (list 1 (list 0 1 2))))
+(define
+  index-results
+  (search-batch "" (map (fn (c) (nth c 1)) index-cases)))
 
-(hk-test
-  "re-index replaces a doc"
-  (search-eval
-    "\nresult = lookupTerm \"a\" (indexDoc 1 \"a a a\" (indexDoc 1 \"a\" emptyIndex))\n"
-    "result")
-  (list (list 1 (list 0 1 2))))
-
-(hk-test
-  "delete removes a doc"
-  (search-eval
-    "\nresult = lookupTerm \"cat\" (deleteDoc 1 (indexDoc 1 \"the cat\" emptyIndex))\n"
-    "result")
-  (list))
-
-(hk-test
-  "delete leaves other docs"
-  (search-eval
-    "\nresult = lookupTerm \"cat\" (deleteDoc 2 (indexDoc 2 \"big cat\" (indexDoc 1 \"the cat\" emptyIndex)))\n"
-    "result")
-  (list (list 1 (list 1))))
-
-(hk-test
-  "docFreq counts docs"
-  (search-eval
-    "\nresult = docFreq \"cat\" (indexDoc 2 \"a cat\" (indexDoc 1 \"the cat\" emptyIndex))\n"
-    "result")
-  2)
-
-(hk-test
-  "docFreq zero for missing"
-  (search-eval
-    "\nresult = docFreq \"zzz\" (indexDoc 1 \"a b\" emptyIndex)\n"
-    "result")
-  0)
-
-(hk-test
-  "allTerms sorted and unique"
-  (search-eval
-    "\nresult = allTerms (indexDoc 1 \"banana apple cherry apple\" emptyIndex)\n"
-    "result")
-  (list "apple" "banana" "cherry"))
-
-(hk-test
-  "allTerms merged across docs"
-  (search-eval
-    "\nresult = allTerms (indexDoc 2 \"d a\" (indexDoc 1 \"c b\" emptyIndex))\n"
-    "result")
-  (list "a" "b" "c" "d"))
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth index-results i) (nth c 2)))
+  index-cases)
 
 {:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index 1baf6e9a..43c60240 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -76,13 +76,14 @@ lib/search/index.sx                     lib/search/eval.sx
 
 ## Phase 2 — Query AST + boolean evaluation
 
-- [ ] Query ADT: `Term Text | And Query Query | Or Query Query | Not Query |
-  Phrase [Text]`
+- [x] Query ADT: `Term String | And Query Query | Or Query Query | Not Query |
+  Phrase [String]` (in `lib/search/query.sx`)
 - [ ] `lib/search/parse.sx` — query syntax parser (boolean operators, quoted phrases)
-- [ ] `lib/search/eval.sx` — boolean eval via set ops on posting lists
-- [ ] phrase eval — adjacency check using positions
-- [ ] `lib/search/tests/boolean.sx` — 25+ cases: term, and, or, not, phrase,
-  composition, parser edge cases
+- [x] `lib/search/query.sx` — boolean eval via set ops on docid-sorted posting lists
+  (sortedUnion/Inter/Diff, Not over allDocs universe)
+- [x] phrase eval — positional adjacency check (phraseInDoc / phraseStartsAt)
+- [x] `lib/search/tests/boolean.sx` — 28 cases: term, and, or, not, phrase,
+  composition (parser edge cases move to the parse.sx suite)
 
 ## Phase 3 — Ranking
 
@@ -102,6 +103,14 @@ lib/search/index.sx                     lib/search/eval.sx
 
 ## Progress log
 
+- **Phase 2 boolean/phrase eval (46/46 total).** Query ADT
+  `Term|And|Or|Not|Phrase` + `evalQuery :: Index -> Query -> [DocId]` in query.sx.
+  Boolean ops are linear merges over docid-sorted posting lists; Not subtracts from
+  the allDocs universe; Phrase checks positional adjacency. 28 tests in boolean.sx.
+  Refactored both suites to **batch all cases into one program eval** (search-batch
+  in testlib) — under the heavy CPU load on this box (~11 on 2 cores), 18–28 separate
+  hk-eval-program calls timed out; one combined eval per suite is ~20× faster.
+  Parser (parse.sx) is the remaining Phase 2 box.
 - **Phase 1 complete (18/18).** Tokenizer (lowercase + strip punctuation + positions),
   inverted index as sorted assoc-list `[(Term,[(DocId,[Pos])])]`, indexDoc/deleteDoc/
   lookupTerm/docFreq/allTerms. Search lib is Haskell source assembled into `search/src`

From 4c84decc016eaa896a532e5a95712a8bf304a22a Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 19:43:10 +0000
Subject: [PATCH 04/15] search: Phase 2 query parser + 32 tests

Query tokenizer + recursive-descent parser: OR<AND<NOT precedence, implicit AND
on adjacency, quoted phrases, parens, case-insensitive keywords. parseQuery,
searchQuery, showQ. Worked around haskell-on-sx parser limits (ord-based
delimiters; multi-clause fns instead of []-pattern case alts). 78/78.
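
The precedence rules can be sketched in ordinary Haskell, outside the
haskell-on-sx subset. This is an illustration over pre-tokenized input, not the
shipped parse.sx: phrases and the character tokenizer are left out, and the
local `go`/`step` helpers and the name `startsAtom` are made up for the example.

```haskell
-- Standalone sketch of the OR < AND < NOT grammar (illustrative only).
data Query = Term String | And Query Query | Or Query Query | Not Query
  deriving (Eq, Show)

data Tok = TAnd | TOr | TNot | TLP | TRP | TWord String
  deriving (Eq, Show)

-- atom = '(' expr ')' | word
parseAtom :: [Tok] -> (Query, [Tok])
parseAtom (TLP : ts) = case parseExpr ts of
  (q, TRP : rest) -> (q, rest)
  (q, rest)       -> (q, rest)   -- tolerate a missing ')'
parseAtom (TWord w : ts) = (Term w, ts)
parseAtom ts = (Term "", ts)     -- fallback for malformed input

-- notExpr = NOT notExpr | atom
parseNot :: [Tok] -> (Query, [Tok])
parseNot (TNot : ts) = let (q, rest) = parseNot ts in (Not q, rest)
parseNot ts = parseAtom ts

-- True when the next token can begin a notExpr (triggers implicit AND).
startsAtom :: [Tok] -> Bool
startsAtom (TWord _ : _) = True
startsAtom (TLP : _)     = True
startsAtom (TNot : _)    = True
startsAtom _             = False

-- andExpr = notExpr ((AND | <implicit>) notExpr)*, left-associative
parseAnd :: [Tok] -> (Query, [Tok])
parseAnd ts = let (l, rest) = parseNot ts in go l rest
  where
    go l (TAnd : us) = step l us
    go l us | startsAtom us = step l us
            | otherwise     = (l, us)
    step l us = let (r, rest) = parseNot us in go (And l r) rest

-- orExpr = andExpr (OR andExpr)*, left-associative
parseExpr :: [Tok] -> (Query, [Tok])
parseExpr ts = let (l, rest) = parseAnd ts in go l rest
  where
    go l (TOr : us) = let (r, rest) = parseAnd us in go (Or l r) rest
    go l us         = (l, us)
```

Each precedence level calls the next-tighter one for its operands, so
`cat AND dog OR bird` groups as `(cat AND dog) OR bird` without any explicit
precedence table.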

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |  11 ++-
 lib/search/conformance.conf |   2 +
 lib/search/parse.sx         |  18 +++++
 lib/search/scoreboard.json  |   9 +--
 lib/search/scoreboard.md    |   3 +-
 lib/search/tests/parse.sx   | 139 ++++++++++++++++++++++++++++++++++++
 plans/search-on-sx.md       |  16 ++++-
 7 files changed, 189 insertions(+), 9 deletions(-)
 create mode 100644 lib/search/parse.sx
 create mode 100644 lib/search/tests/parse.sx

diff --git a/lib/search/api.sx b/lib/search/api.sx
index e2da2bb6..5a275f4d 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -2,8 +2,15 @@
 ;; Tests and callers concatenate `search/src` with their own top-level bindings
 ;; (e.g. "result = lookupTerm \"cat\" idx\n") and evaluate via the haskell-on-sx
 ;; interpreter. Public Haskell entry points: indexDoc, lookupTerm, deleteDoc,
-;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery.
+;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery, searchQuery.
 
 (define
   search/src
-  (str search/tokenize-src "\n" search/index-src "\n" search/query-src))
+  (str
+    search/tokenize-src
+    "\n"
+    search/index-src
+    "\n"
+    search/query-src
+    "\n"
+    search/parse-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index 4e418e9f..6e9e8309 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -21,6 +21,7 @@ PRELOADS=(
   lib/search/tokenize.sx
   lib/search/index.sx
   lib/search/query.sx
+  lib/search/parse.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -28,4 +29,5 @@ PRELOADS=(
 SUITES=(
   "index:lib/search/tests/index.sx"
   "boolean:lib/search/tests/boolean.sx"
+  "parse:lib/search/tests/parse.sx"
 )
diff --git a/lib/search/parse.sx b/lib/search/parse.sx
new file mode 100644
index 00000000..a1dc4c8b
--- /dev/null
+++ b/lib/search/parse.sx
@@ -0,0 +1,18 @@
+;; search query parser — Haskell source fragment. Depends on tokenize + query.
+;; Grammar (precedence OR < AND < NOT):
+;;   expr   = orExpr
+;;   orExpr = andExpr (OR andExpr)*
+;;   andExpr= notExpr ((AND | <implicit>) notExpr)*   -- adjacency means AND
+;;   notExpr= NOT notExpr | atom
+;;   atom   = '(' expr ')' | '"' word+ '"' | word
+;; Keywords AND/OR/NOT are case-insensitive; bare words are normalized via tokens.
+;; Gotchas: delimiters matched by ord (escaped char literals like '\"' break the
+;; haskell-on-sx tokenizer); an [] *pattern* inside a `case` alt also breaks the
+;; parser, so qNormTerm/qDropRP/showQ are written as multi-clause functions.
+;;   parseQuery  :: String -> Query
+;;   searchQuery :: String -> Index -> [DocId]
+;;   showQ       :: Query -> String   -- canonical render for tests/debug
+
+(define
+  search/parse-src
+  "data QTok = TAnd | TOr | TNot | TLP | TRP | TWord String | TPhrase [String]\nqIsSpace c = ord c == 32\nqIsLP c = ord c == 40\nqIsRP c = ord c == 41\nqIsQuote c = ord c == 34\nqDelim c = qIsSpace c || qIsLP c || qIsRP c || qIsQuote c\nqReadWord [] = ([], [])\nqReadWord (c:cs) = if qDelim c then ([], c:cs) else let (w, rest) = qReadWord cs in (c:w, rest)\nqReadPhrase [] = ([], [])\nqReadPhrase (c:cs) = if qIsQuote c then ([], cs) else let (w, rest) = qReadPhrase cs in (c:w, rest)\ntoUpperCh c = chr (toUpper (ord c))\nqUpper w = joinChars (map toUpperCh w)\nqFirstTok [] = \"\"\nqFirstTok (x:xs) = x\nqNormTerm w = qFirstTok (tokens w)\nqClassify w = if qUpper w == \"AND\" then TAnd else if qUpper w == \"OR\" then TOr else if qUpper w == \"NOT\" then TNot else TWord (qNormTerm w)\nqPhraseTok cs = let (p, rest) = qReadPhrase cs in TPhrase (tokens p) : qtokens rest\nqWordTok cs = let (w, rest) = qReadWord cs in qClassify w : qtokens rest\nqtokens [] = []\nqtokens (c:cs) = if qIsSpace c then qtokens cs else if qIsLP c then TLP : qtokens cs else if qIsRP c then TRP : qtokens cs else if qIsQuote c then qPhraseTok cs else qWordTok (c:cs)\nqDropRP (q, (TRP:rest)) = (q, rest)\nqDropRP (q, ts) = (q, ts)\nparseAtom [] = (Term \"\", [])\nparseAtom (TLP:ts) = qDropRP (parseExpr ts)\nparseAtom (TPhrase ps : ts) = (Phrase ps, ts)\nparseAtom (TWord w : ts) = (Term w, ts)\nparseAtom ts = (Term \"\", ts)\nqWrapNot (q, ts) = (Not q, ts)\nparseNot (TNot:ts) = qWrapNot (parseNot ts)\nparseNot ts = parseAtom ts\nqStartsAtom (TWord w : ts) = True\nqStartsAtom (TPhrase p : ts) = True\nqStartsAtom (TLP : ts) = True\nqStartsAtom (TNot : ts) = True\nqStartsAtom ts = False\nqAndStep left ts = let (r, rest) = parseNot ts in parseAndR (And left r) rest\nparseAndR left (TAnd:ts) = qAndStep left ts\nparseAndR left ts = if qStartsAtom ts then qAndStep left ts else (left, ts)\nparseAnd ts = let (l, rest) = parseNot ts in parseAndR l rest\nparseOrR left (TOr:ts) = let (r, rest) = parseAnd ts in parseOrR (Or left r) rest\nparseOrR left ts = (left, ts)\nparseExpr ts = let (l, rest) = parseAnd ts in parseOrR l rest\nparseQuery s = fst (parseExpr (qtokens s))\nsearchQuery s idx = evalQuery idx (parseQuery s)\njoinSp [] = \"\"\njoinSp [x] = x\njoinSp (x:xs) = x ++ \"-\" ++ joinSp xs\nshowQ (Term t) = \"T:\" ++ t\nshowQ (And a b) = \"(\" ++ showQ a ++ \" & \" ++ showQ b ++ \")\"\nshowQ (Or a b) = \"(\" ++ showQ a ++ \" | \" ++ showQ b ++ \")\"\nshowQ (Not a) = \"!\" ++ showQ a\nshowQ (Phrase ts) = \"P:\" ++ joinSp ts\n")
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index 51e8a2ec..4aab2a38 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,11 +1,12 @@
 {
   "lang": "search",
-  "total_passed": 46,
+  "total_passed": 78,
   "total_failed": 0,
-  "total": 46,
+  "total": 78,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
-    {"name":"boolean","passed":28,"failed":0,"total":28}
+    {"name":"boolean","passed":28,"failed":0,"total":28},
+    {"name":"parse","passed":32,"failed":0,"total":32}
   ],
-  "generated": "2026-06-06T18:46:54+00:00"
+  "generated": "2026-06-06T19:42:39+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index a214ce29..0a71fd42 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,8 +1,9 @@
 # search scoreboard
 
-**46 / 46 passing** (0 failure(s)).
+**78 / 78 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
 | index | 18 | 18 | ok |
 | boolean | 28 | 28 | ok |
+| parse | 32 | 32 | ok |
diff --git a/lib/search/tests/parse.sx b/lib/search/tests/parse.sx
new file mode 100644
index 00000000..8f7f0ebd
--- /dev/null
+++ b/lib/search/tests/parse.sx
@@ -0,0 +1,139 @@
+;; Phase 2 — query parser (parseQuery / searchQuery).
+;; AST cases assert showQ (parseQuery s); search cases assert searchQuery s idx
+;; against the standard corpus. Each group runs in one batched program eval.
+;;   doc 1 "the quick brown dog"   doc 2 "a quick brown fox"   doc 3 "the dog barks loudly"
+
+(define
+  parse-corpus
+  "idx = indexDoc 3 \"the dog barks loudly\" (indexDoc 2 \"a quick brown fox\" (indexDoc 1 \"the quick brown dog\" emptyIndex))\n")
+
+(define
+  ast-cases
+  (list
+    (list "single term" "showQ (parseQuery \"cat\")" "T:cat")
+    (list "term normalized" "showQ (parseQuery \"CAT\")" "T:cat")
+    (list "explicit and" "showQ (parseQuery \"cat AND dog\")" "(T:cat & T:dog)")
+    (list
+      "lowercase and keyword"
+      "showQ (parseQuery \"cat and dog\")"
+      "(T:cat & T:dog)")
+    (list "implicit and" "showQ (parseQuery \"cat dog\")" "(T:cat & T:dog)")
+    (list "or" "showQ (parseQuery \"cat OR dog\")" "(T:cat | T:dog)")
+    (list "not" "showQ (parseQuery \"NOT cat\")" "!T:cat")
+    (list
+      "and binds tighter than or"
+      "showQ (parseQuery \"cat AND dog OR bird\")"
+      "((T:cat & T:dog) | T:bird)")
+    (list
+      "or then and"
+      "showQ (parseQuery \"cat OR dog AND bird\")"
+      "(T:cat | (T:dog & T:bird))")
+    (list
+      "parens override precedence"
+      "showQ (parseQuery \"(cat OR dog) AND bird\")"
+      "((T:cat | T:dog) & T:bird)")
+    (list
+      "and with not"
+      "showQ (parseQuery \"cat AND NOT dog\")"
+      "(T:cat & !T:dog)")
+    (list
+      "two-word phrase"
+      "showQ (parseQuery \"\\\"quick brown\\\"\")"
+      "P:quick-brown")
+    (list
+      "three-word phrase"
+      "showQ (parseQuery \"\\\"quick brown fox\\\"\")"
+      "P:quick-brown-fox")
+    (list
+      "and left-assoc"
+      "showQ (parseQuery \"a AND b AND c\")"
+      "((T:a & T:b) & T:c)")
+    (list
+      "or left-assoc"
+      "showQ (parseQuery \"a OR b OR c\")"
+      "((T:a | T:b) | T:c)")
+    (list
+      "punctuation stripped"
+      "showQ (parseQuery \"cat, dog!\")"
+      "(T:cat & T:dog)")))
+
+(define
+  search-cases
+  (list
+    (list "term" "searchQuery \"quick\" idx" (list 1 2))
+    (list
+      "term normalized"
+      "searchQuery \"QUICK\" idx"
+      (list 1 2))
+    (list
+      "explicit and"
+      "searchQuery \"quick AND brown\" idx"
+      (list 1 2))
+    (list
+      "implicit and"
+      "searchQuery \"quick brown\" idx"
+      (list 1 2))
+    (list "and disjoint" "searchQuery \"the AND fox\" idx" (list))
+    (list "or" "searchQuery \"fox OR barks\" idx" (list 2 3))
+    (list "not" "searchQuery \"NOT the\" idx" (list 2))
+    (list "and not" "searchQuery \"quick AND NOT the\" idx" (list 2))
+    (list
+      "precedence and-or"
+      "searchQuery \"the AND dog OR fox\" idx"
+      (list 1 2 3))
+    (list
+      "precedence or-and"
+      "searchQuery \"fox OR the AND dog\" idx"
+      (list 1 2 3))
+    (list
+      "parens"
+      "searchQuery \"the AND (dog OR fox)\" idx"
+      (list 1 3))
+    (list
+      "phrase"
+      "searchQuery \"\\\"quick brown\\\"\" idx"
+      (list 1 2))
+    (list
+      "phrase one doc"
+      "searchQuery \"\\\"brown dog\\\"\" idx"
+      (list 1))
+    (list
+      "phrase and term"
+      "searchQuery \"\\\"quick brown\\\" AND dog\" idx"
+      (list 1))
+    (list
+      "not phrase"
+      "searchQuery \"NOT \\\"quick brown\\\"\" idx"
+      (list 3))
+    (list
+      "implicit and terms"
+      "searchQuery \"dog barks\" idx"
+      (list 3))))
+
+(define
+  ast-results
+  (search-batch "" (map (fn (c) (nth c 1)) ast-cases)))
+(define
+  search-results
+  (search-batch
+    parse-corpus
+    (map (fn (c) (nth c 1)) search-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test
+      (str "ast: " (nth c 0))
+      (nth ast-results i)
+      (nth c 2)))
+  ast-cases)
+(map-indexed
+  (fn
+    (i c)
+    (hk-test
+      (str "search: " (nth c 0))
+      (nth search-results i)
+      (nth c 2)))
+  search-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index 43c60240..1ebb57b6 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -10,7 +10,7 @@ extension that merges per-peer indices.
 
 ## Status (rolling)
 
-`bash lib/search/conformance.sh` → **18/18** (Phase 1 complete)
+`bash lib/search/conformance.sh` → **78/78** (Phases 1–2 complete)
 
 ## Ground rules
 
@@ -78,7 +78,9 @@ lib/search/index.sx                     lib/search/eval.sx
 
 - [x] Query ADT: `Term String | And Query Query | Or Query Query | Not Query |
   Phrase [String]` (in `lib/search/query.sx`)
-- [ ] `lib/search/parse.sx` — query syntax parser (boolean operators, quoted phrases)
+- [x] `lib/search/parse.sx` — query syntax parser: tokenizer + recursive-descent
+  (OR < AND < NOT precedence, implicit AND on adjacency, quoted phrases, parens,
+  case-insensitive keywords); `parseQuery`, `searchQuery`, `showQ`
 - [x] `lib/search/query.sx` — boolean eval via set ops on docid-sorted posting lists
   (sortedUnion/Inter/Diff, Not over allDocs universe)
 - [x] phrase eval — positional adjacency check (phraseInDoc / phraseStartsAt)
@@ -103,6 +105,16 @@ lib/search/index.sx                     lib/search/eval.sx
 
 ## Progress log
 
+- **Phase 2 complete — parser (78/78 total).** Query tokenizer (ord-based
+  delimiters, quoted phrases) + recursive-descent parser with OR<AND<NOT precedence,
+  implicit AND on adjacency, parens, case-insensitive keywords. `parseQuery`,
+  `searchQuery`, `showQ` (canonical render for AST tests). 32 tests in parse.sx.
+  **haskell-on-sx parser gotchas hit while writing this (see parse.sx header):**
+  (1) escaped char literals like `'\"'` break the tokenizer — match delimiters by
+  `ord c == 34`; (2) an `[]` *pattern* inside a `case` alt breaks the parser — use
+  multi-clause functions instead; (3) `case`/constructor patterns and `let (a,b)=..`
+  are fine. Embedded Haskell string literals in a `.sx` source string need single
+  `\"`, not `\\\"`.
 - **Phase 2 boolean/phrase eval (46/46 total).** Query ADT
   `Term|And|Or|Not|Phrase` + `evalQuery :: Index -> Query -> [DocId]` in query.sx.
   Boolean ops are linear merges over docid-sorted posting lists; Not subtracts from

From a3f9d4f6c90e66d6efd282e726eb5297d6328e44 Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 19:56:50 +0000
Subject: [PATCH 05/15] search: Phase 3 ranking TF-IDF + BM25 + top-N + 23
 tests

rankTfIdf and rankBm25 (configurable k1/b) over the candidate set, float scores
with deterministic DocId tiebreak; topNTfIdf/topNBm25. df/idf derived from
posting-list length. Tests cover tf/idf behavior, a BM25-vs-TF-IDF flip from
length-norm + tf-saturation, the b-parameter effect, tiebreak stability. 101/101.
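
The flip can be checked by hand on the idx2 corpus from tests/rank.sx (doc 1
"cat", doc 2 "cat cat dog elephant frog grape", doc 3 "zzz"). The sketch below
is plain Haskell and assumes the same term-frequency factor as bm25Term with
the idf multiplier dropped, since idf is identical for both documents on a
single-term query. `bm25Tf`, `doc1` and `doc2` are names invented for the
example.

```haskell
-- Saturating, length-normalised tf factor of BM25 (idf multiplier omitted).
-- Arguments: k1, b, average doc length, this doc's length, term frequency.
bm25Tf :: Double -> Double -> Double -> Double -> Double -> Double
bm25Tf k1 b avgdl len tf =
  tf * (k1 + 1) / (tf + k1 * (1 - b + b * len / avgdl))

-- idx2 document lengths are 1, 6 and 1 tokens.
avgdl :: Double
avgdl = (1 + 6 + 1) / 3

-- Scores for the query term "cat" as a function of b, with k1 = 1.5.
doc1, doc2 :: Double -> Double
doc1 b = bm25Tf 1.5 b avgdl 1 1  -- doc 1: tf = 1, length 1
doc2 b = bm25Tf 1.5 b avgdl 6 2  -- doc 2: tf = 2, length 6
```

With b = 0.75 the short document wins (roughly 1.39 against 1.02), while with
b = 0 length is ignored and the higher tf wins (1.0 against about 1.43), which
matches the orderings asserted in the "bm25 tf+length flips tfidf" and
"bm25 b=0 ignores length" cases. Plain TF-IDF ranks doc 2 first because its
score is linear in tf.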

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |  7 ++-
 lib/search/conformance.conf |  2 +
 lib/search/rank.sx          | 14 ++++++
 lib/search/scoreboard.json  |  9 ++--
 lib/search/scoreboard.md    |  3 +-
 lib/search/tests/rank.sx    | 90 +++++++++++++++++++++++++++++++++++++
 plans/search-on-sx.md       | 21 ++++++---
 7 files changed, 132 insertions(+), 14 deletions(-)
 create mode 100644 lib/search/rank.sx
 create mode 100644 lib/search/tests/rank.sx

diff --git a/lib/search/api.sx b/lib/search/api.sx
index 5a275f4d..2eaeac96 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -2,7 +2,8 @@
 ;; Tests and callers concatenate `search/src` with their own top-level bindings
 ;; (e.g. "result = lookupTerm \"cat\" idx\n") and evaluate via the haskell-on-sx
 ;; interpreter. Public Haskell entry points: indexDoc, lookupTerm, deleteDoc,
-;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery, searchQuery.
+;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery, searchQuery,
+;; rankTfIdf, rankBm25, topNTfIdf, topNBm25.
 
 (define
   search/src
@@ -13,4 +14,6 @@
     "\n"
     search/query-src
     "\n"
-    search/parse-src))
+    search/parse-src
+    "\n"
+    search/rank-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index 6e9e8309..9793c9cc 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -22,6 +22,7 @@ PRELOADS=(
   lib/search/index.sx
   lib/search/query.sx
   lib/search/parse.sx
+  lib/search/rank.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -30,4 +31,5 @@ SUITES=(
   "index:lib/search/tests/index.sx"
   "boolean:lib/search/tests/boolean.sx"
   "parse:lib/search/tests/parse.sx"
+  "rank:lib/search/tests/rank.sx"
 )
diff --git a/lib/search/rank.sx b/lib/search/rank.sx
new file mode 100644
index 00000000..efe40bb5
--- /dev/null
+++ b/lib/search/rank.sx
@@ -0,0 +1,14 @@
+;; search ranking — Haskell source fragment. Depends on tokenize + index + query.
+;; Ranked retrieval over the candidate set (docs containing any query term).
+;; Scores are floats; ties broken by DocId ascending (deterministic).
+;;   numDocs    :: Index -> Int
+;;   docFreq    :: Term -> Index -> Int           (from index)
+;;   docLen     :: DocId -> Index -> Int
+;;   rankTfIdf  :: [Term] -> Index -> [DocId]
+;;   topNTfIdf  :: Int -> [Term] -> Index -> [DocId]
+;;   rankBm25   :: Float -> Float -> [Term] -> Index -> [DocId]   (k1, b)
+;;   topNBm25   :: Int -> Float -> Float -> [Term] -> Index -> [DocId]
+
+(define
+  search/rank-src
+  "numDocs idx = length (allDocs idx)\ntfIn t d idx = length (posIn t d idx)\nqIdf n df = if df == 0 then 0 else log (n / df)\nidf t idx = qIdf (numDocs idx) (docFreq t idx)\ntermScoreTf idx d t = tfIn t d idx * idf t idx\ntfidfDoc ts idx d = sum (map (termScoreTf idx d) ts)\ncandStep idx acc t = sortedUnion acc (docsWith t idx)\ncandDocs ts idx = foldl (candStep idx) [] ts\ncmpScore p1 p2 = if fst p1 > fst p2 then LT else if fst p1 < fst p2 then GT else compare (snd p1) (snd p2)\nmkPair f ts idx d = (f ts idx d, d)\nrankWith f ts idx = map snd (sortBy cmpScore (map (mkPair f ts idx) (candDocs ts idx)))\nrankTfIdf ts idx = rankWith tfidfDoc ts idx\ntopNTfIdf n ts idx = take n (rankTfIdf ts idx)\ntfAt d idx t = tfIn t d idx\ndocLen d idx = sum (map (tfAt d idx) (allTerms idx))\nlenAt idx d = docLen d idx\navgDocLen idx = sum (map (lenAt idx) (allDocs idx)) / numDocs idx\nbm25idf t idx = log ((numDocs idx - docFreq t idx + 0.5) / (docFreq t idx + 0.5) + 1)\nbm25Term k1 b avgdl idx d t = bm25idf t idx * (tfIn t d idx * (k1 + 1)) / (tfIn t d idx + k1 * (1 - b + b * docLen d idx / avgdl))\nbm25Doc k1 b ts idx d = sum (map (bm25Term k1 b (avgDocLen idx) idx d) ts)\nrankBm25 k1 b ts idx = rankWith (bm25Doc k1 b) ts idx\ntopNBm25 n k1 b ts idx = take n (rankBm25 k1 b ts idx)\n")
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index 4aab2a38..eb9509f9 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,12 +1,13 @@
 {
   "lang": "search",
-  "total_passed": 78,
+  "total_passed": 101,
   "total_failed": 0,
-  "total": 78,
+  "total": 101,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
     {"name":"boolean","passed":28,"failed":0,"total":28},
-    {"name":"parse","passed":32,"failed":0,"total":32}
+    {"name":"parse","passed":32,"failed":0,"total":32},
+    {"name":"rank","passed":23,"failed":0,"total":23}
   ],
-  "generated": "2026-06-06T19:42:39+00:00"
+  "generated": "2026-06-06T19:56:08+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index 0a71fd42..747a4d04 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,9 +1,10 @@
 # search scoreboard
 
-**78 / 78 passing** (0 failure(s)).
+**101 / 101 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
 | index | 18 | 18 | ok |
 | boolean | 28 | 28 | ok |
 | parse | 32 | 32 | ok |
+| rank | 23 | 23 | ok |
diff --git a/lib/search/tests/rank.sx b/lib/search/tests/rank.sx
new file mode 100644
index 00000000..6200106f
--- /dev/null
+++ b/lib/search/tests/rank.sx
@@ -0,0 +1,90 @@
+;; Phase 3 — ranking (TF-IDF, BM25, top-N). Deterministic: ties broken by DocId.
+;; Corpora:
+;;   idx1: 1 "alpha alpha alpha gamma"  2 "alpha"  3 "beta"
+;;   idx2: 1 "cat"  2 "cat cat dog elephant frog grape"  3 "zzz"
+;;   idx3: 1 "kite"  2 "kite"   (identical docs -> tiebreak)
+
+(define
+  rank-setup
+  "idx1 = indexDoc 3 \"beta\" (indexDoc 2 \"alpha\" (indexDoc 1 \"alpha alpha alpha gamma\" emptyIndex))\nidx2 = indexDoc 3 \"zzz\" (indexDoc 2 \"cat cat dog elephant frog grape\" (indexDoc 1 \"cat\" emptyIndex))\nidx3 = indexDoc 2 \"kite\" (indexDoc 1 \"kite\" emptyIndex)\n")
+
+(define
+  rank-cases
+  (list
+    (list
+      "tfidf tf ordering"
+      "rankTfIdf [\"alpha\"] idx1"
+      (list 1 2))
+    (list
+      "tfidf rare term boosts"
+      "rankTfIdf [\"alpha\", \"beta\"] idx1"
+      (list 1 3 2))
+    (list
+      "tfidf single-doc term"
+      "rankTfIdf [\"gamma\"] idx1"
+      (list 1))
+    (list "tfidf absent term empty" "rankTfIdf [\"nope\"] idx1" (list))
+    (list "tfidf empty query empty" "rankTfIdf [] idx1" (list))
+    (list
+      "tfidf candidate union tie by docid"
+      "rankTfIdf [\"beta\", \"gamma\"] idx1"
+      (list 1 3))
+    (list
+      "tfidf tf ordering idx2"
+      "rankTfIdf [\"cat\"] idx2"
+      (list 2 1))
+    (list "topN tfidf 1" "topNTfIdf 1 [\"alpha\"] idx1" (list 1))
+    (list
+      "topN tfidf 2"
+      "topNTfIdf 2 [\"alpha\", \"beta\"] idx1"
+      (list 1 3))
+    (list
+      "topN exceeds results"
+      "topNTfIdf 10 [\"gamma\"] idx1"
+      (list 1))
+    (list "topN zero" "topNTfIdf 0 [\"alpha\"] idx1" (list))
+    (list
+      "bm25 tf+length flips tfidf"
+      "rankBm25 1.5 0.75 [\"cat\"] idx2"
+      (list 1 2))
+    (list
+      "bm25 b=0 ignores length"
+      "rankBm25 1.5 0.0 [\"cat\"] idx2"
+      (list 2 1))
+    (list
+      "bm25 alpha idx1"
+      "rankBm25 1.5 0.75 [\"alpha\"] idx1"
+      (list 1 2))
+    (list "bm25 absent empty" "rankBm25 1.5 0.75 [\"nope\"] idx1" (list))
+    (list
+      "bm25 single-doc term"
+      "rankBm25 1.5 0.75 [\"gamma\"] idx1"
+      (list 1))
+    (list "bm25 topN 1" "topNBm25 1 1.5 0.75 [\"cat\"] idx2" (list 1))
+    (list
+      "bm25 same candidate set"
+      "sort (rankBm25 1.5 0.75 [\"alpha\", \"beta\"] idx1)"
+      (list 1 2 3))
+    (list
+      "tfidf stable tiebreak"
+      "rankTfIdf [\"kite\"] idx3"
+      (list 1 2))
+    (list
+      "bm25 stable tiebreak"
+      "rankBm25 1.5 0.75 [\"kite\"] idx3"
+      (list 1 2))
+    (list "numDocs" "[numDocs idx1]" (list 3))
+    (list "docLen counts tokens" "[docLen 1 idx1]" (list 4))
+    (list "docFreq via index" "[docFreq \"alpha\" idx1]" (list 2))))
+
+(define
+  rank-results
+  (search-batch rank-setup (map (fn (c) (nth c 1)) rank-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth rank-results i) (nth c 2)))
+  rank-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index 1ebb57b6..d1c0689b 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -10,7 +10,7 @@ extension that merges per-peer indices.
 
 ## Status (rolling)
 
-`bash lib/search/conformance.sh` → **78/78** (Phases 1–2 complete)
+`bash lib/search/conformance.sh` → **101/101** (Phases 1–3 complete)
 
 ## Ground rules
 
@@ -89,12 +89,13 @@ lib/search/index.sx                     lib/search/eval.sx
 
 ## Phase 3 — Ranking
 
-- [ ] document frequency tracking — extend index with `df` per term
-- [ ] TF-IDF scoring
-- [ ] BM25 scoring (configurable k1, b)
-- [ ] top-N retrieval (heap-based)
-- [ ] `lib/search/tests/rank.sx` — 20+ cases: TF-IDF behavior, BM25 vs TF-IDF,
-  ranking stability, top-N correctness
+- [x] document frequency — `docFreq`/`idf`/`bm25idf` derived from the index
+  (posting-list length); no separate df store needed
+- [x] TF-IDF scoring (`rankTfIdf`)
+- [x] BM25 scoring, configurable k1/b (`rankBm25 k1 b`)
+- [x] top-N retrieval (`topNTfIdf`/`topNBm25` — sortBy + take; stable DocId tiebreak)
+- [x] `lib/search/tests/rank.sx` — 23 cases: TF-IDF tf/idf behavior, BM25 length-norm
+  + tf-saturation flips vs TF-IDF, b-parameter effect, tiebreak stability, top-N
 
 ## Phase 4 — ACL filter + federation
 
@@ -105,6 +106,12 @@ lib/search/index.sx                     lib/search/eval.sx
 
 ## Progress log
 
+- **Phase 3 complete — ranking (101/101 total).** TF-IDF (`rankTfIdf`) and BM25
+  (`rankBm25 k1 b`) over the candidate set (docs containing any query term), scores
+  as floats with deterministic DocId-ascending tiebreak; `topNTfIdf`/`topNBm25` via
+  sortBy+take. df/idf derived from posting-list length (no separate df store). 23
+  tests incl. a BM25-vs-TF-IDF flip (length-norm + tf-saturation) and the b-parameter
+  effect. Float division/`log`/float literals all work in haskell-on-sx.
 - **Phase 2 complete — parser (78/78 total).** Query tokenizer (ord-based
   delimiters, quoted phrases) + recursive-descent parser with OR<AND<NOT precedence,
   implicit AND on adjacency, parens, case-insensitive keywords. `parseQuery`,

From 77ab827b91d29c9bec992bedd4c7b8e23252103e Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 20:08:08 +0000
Subject: [PATCH 06/15] search: Phase 4 federation merge + ACL post-filter + 21
 tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

fedIndex merges per-peer inverted indices (union of posting lists per term) after
relabelling local DocIds to global gid = peer*1000 + local. The relabelling is
injective as long as local ids stay below 1000, so dedupe by (peer,doc-id) is
automatic and positions survive; ranking runs once over the merge and interleaves
peers by score. ACL is a post-rank filter over an injected permit predicate
(searchTfIdfAcl/topNTfIdfAcl/searchBm25Acl). Roadmap complete, 122/122.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx               |   7 ++-
 lib/search/conformance.conf     |   2 +
 lib/search/fed.sx               |  16 +++++
 lib/search/scoreboard.json      |   9 +--
 lib/search/scoreboard.md        |   3 +-
 lib/search/tests/integration.sx | 102 ++++++++++++++++++++++++++++++++
 plans/search-on-sx.md           |  21 +++++--
 7 files changed, 148 insertions(+), 12 deletions(-)
 create mode 100644 lib/search/fed.sx
 create mode 100644 lib/search/tests/integration.sx
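
The merge and the post-rank ACL filter described in the commit message can be
sketched in plain Haskell as below. This is a minimal illustration, not the
code in fed.sx: the Index shape (association list sorted by term, postings
sorted by DocId), the *Sketch names, topNAcl and the sample peers are
assumptions made for the example.

```haskell
type DocId   = Int
type PeerId  = Int
type Term    = String
type Posting = (DocId, [Int])      -- doc id plus token positions
type Index   = [(Term, [Posting])] -- assumed shape: sorted by term

-- Global id. Distinct (peer, local) pairs stay distinct only while
-- 0 <= local < 1000.
gid :: PeerId -> DocId -> DocId
gid peer local = peer * 1000 + local

-- Rewrite every posting of one peer's index to use global ids.
relabel :: PeerId -> Index -> Index
relabel peer idx = [ (t, [ (gid peer d, ps) | (d, ps) <- pl ]) | (t, pl) <- idx ]

-- Union two posting lists sorted by DocId; on equal ids keep the left one.
mergePostings :: [Posting] -> [Posting] -> [Posting]
mergePostings [] ys = ys
mergePostings xs [] = xs
mergePostings (x:xs) (y:ys)
  | fst x < fst y = x : mergePostings xs (y:ys)
  | fst x > fst y = y : mergePostings (x:xs) ys
  | otherwise     = x : mergePostings xs ys

-- Union two indices sorted by term, merging posting lists of shared terms.
mergeIndex :: Index -> Index -> Index
mergeIndex [] b = b
mergeIndex a [] = a
mergeIndex ((t, pl):xs) ((u, ql):ys)
  | t < u     = (t, pl) : mergeIndex xs ((u, ql):ys)
  | t > u     = (u, ql) : mergeIndex ((t, pl):xs) ys
  | otherwise = (t, mergePostings pl ql) : mergeIndex xs ys

fedIndexSketch :: [(PeerId, Index)] -> Index
fedIndexSketch = foldl (\acc (p, idx) -> mergeIndex acc (relabel p idx)) []

-- Post-rank ACL: filtering keeps the ranked order of the survivors.
aclFilterSketch :: (DocId -> Bool) -> [DocId] -> [DocId]
aclFilterSketch = filter

-- Filter first, then truncate, so denied docs never use up a top-N slot.
topNAcl :: Int -> (DocId -> Bool) -> [DocId] -> [DocId]
topNAcl n permit ranked = take n (aclFilterSketch permit ranked)

-- Sample peers mirroring the integration-test corpus.
peer1, peer2 :: Index
peer1 = [("alpha", [(1, [0]), (2, [0])]), ("beta", [(1, [1])]), ("gamma", [(2, [1])])]
peer2 = [("alpha", [(1, [0])]), ("beta", [(2, [0])]), ("delta", [(1, [1])]), ("gamma", [(2, [1])])]

fedExample :: Index
fedExample = fedIndexSketch [(1, peer1), (2, peer2)]
```

Here `lookup "beta" fedExample` yields `Just [(1001,[1]),(2002,[0])]`, the same
shape as the "fed preserves positions" case. With a ranked list
`[2001, 1001, 1002]`, `topNAcl 1 (< 2000)` yields `[1001]`, whereas taking the
top 1 before filtering would yield an empty list.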

diff --git a/lib/search/api.sx b/lib/search/api.sx
index 2eaeac96..a9a3fe12 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -3,7 +3,8 @@
 ;; (e.g. "result = lookupTerm \"cat\" idx\n") and evaluate via the haskell-on-sx
 ;; interpreter. Public Haskell entry points: indexDoc, lookupTerm, deleteDoc,
 ;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery, searchQuery,
-;; rankTfIdf, rankBm25, topNTfIdf, topNBm25.
+;; rankTfIdf, rankBm25, topNTfIdf, topNBm25, fedIndex, aclFilter, searchTfIdfAcl,
+;; topNTfIdfAcl, searchBm25Acl.
 
 (define
   search/src
@@ -16,4 +17,6 @@
     "\n"
     search/parse-src
     "\n"
-    search/rank-src))
+    search/rank-src
+    "\n"
+    search/fed-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index 9793c9cc..b2ef2f74 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -23,6 +23,7 @@ PRELOADS=(
   lib/search/query.sx
   lib/search/parse.sx
   lib/search/rank.sx
+  lib/search/fed.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -32,4 +33,5 @@ SUITES=(
   "boolean:lib/search/tests/boolean.sx"
   "parse:lib/search/tests/parse.sx"
   "rank:lib/search/tests/rank.sx"
+  "integration:lib/search/tests/integration.sx"
 )
diff --git a/lib/search/fed.sx b/lib/search/fed.sx
new file mode 100644
index 00000000..36b59462
--- /dev/null
+++ b/lib/search/fed.sx
@@ -0,0 +1,16 @@
+;; search federation + ACL — Haskell source fragment. Depends on index + rank.
+;; Federation merges per-peer INDICES (not ranked results): each peer's local
+;; DocIds are relabelled to global ids `gid peer local = peer*1000 + local`
+;; (dedupe by (peer,doc-id) is automatic via the bijection), then posting lists
+;; are unioned per term. Ranking then runs once over the merged index, which is
+;; rank-correct. ACL is a post-rank filter: an injected `permit :: DocId -> Bool`
+;; predicate (viewer baked in by the caller) — never baked into the index.
+;;   fedIndex       :: [(PeerId, Index)] -> Index
+;;   aclFilter      :: (DocId -> Bool) -> [DocId] -> [DocId]
+;;   searchTfIdfAcl :: (DocId -> Bool) -> [Term] -> Index -> [DocId]
+;;   topNTfIdfAcl   :: Int -> (DocId -> Bool) -> [Term] -> Index -> [DocId]
+;;   searchBm25Acl  :: (DocId -> Bool) -> Float -> Float -> [Term] -> Index -> [DocId]
+
+(define
+  search/fed-src
+  "gid peer local = peer * 1000 + local\nfedRelabelPosting peer p = (gid peer (fst p), snd p)\nfedRelabelEntry peer e = (fst e, map (fedRelabelPosting peer) (snd e))\nfedRelabelIndex peer idx = map (fedRelabelEntry peer) idx\nfedInsP p [] = [p]\nfedInsP p (q:qs) = if fst p < fst q then p : q : qs else if fst p == fst q then p : qs else q : fedInsP p qs\nfedMergePL a b = foldr fedInsP b a\nfedInsTerm t pl [] = [(t, pl)]\nfedInsTerm t pl (x:xs) = if t < fst x then (t, pl) : x : xs else if t == fst x then (fst x, fedMergePL pl (snd x)) : xs else x : fedInsTerm t pl xs\nfedMergeEntry idx e = fedInsTerm (fst e) (snd e) idx\nfedMergeTwo a b = foldl fedMergeEntry a b\nfedAddPeer acc pair = fedMergeTwo acc (fedRelabelIndex (fst pair) (snd pair))\nfedIndex pairs = foldl fedAddPeer emptyIndex pairs\naclFilter permit docs = filter permit docs\nsearchTfIdfAcl permit ts idx = aclFilter permit (rankTfIdf ts idx)\ntopNTfIdfAcl n permit ts idx = take n (aclFilter permit (rankTfIdf ts idx))\nsearchBm25Acl permit k1 b ts idx = aclFilter permit (rankBm25 k1 b ts idx)\n")
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index eb9509f9..d1cb07da 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,13 +1,14 @@
 {
   "lang": "search",
-  "total_passed": 101,
+  "total_passed": 122,
   "total_failed": 0,
-  "total": 101,
+  "total": 122,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
     {"name":"boolean","passed":28,"failed":0,"total":28},
     {"name":"parse","passed":32,"failed":0,"total":32},
-    {"name":"rank","passed":23,"failed":0,"total":23}
+    {"name":"rank","passed":23,"failed":0,"total":23},
+    {"name":"integration","passed":21,"failed":0,"total":21}
   ],
-  "generated": "2026-06-06T19:56:08+00:00"
+  "generated": "2026-06-06T20:07:30+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index 747a4d04..03a1d66c 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,6 +1,6 @@
 # search scoreboard
 
-**101 / 101 passing** (0 failure(s)).
+**122 / 122 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
@@ -8,3 +8,4 @@
 | boolean | 28 | 28 | ok |
 | parse | 32 | 32 | ok |
 | rank | 23 | 23 | ok |
+| integration | 21 | 21 | ok |
diff --git a/lib/search/tests/integration.sx b/lib/search/tests/integration.sx
new file mode 100644
index 00000000..8c10685e
--- /dev/null
+++ b/lib/search/tests/integration.sx
@@ -0,0 +1,102 @@
+;; Phase 4 — federation (merge per-peer indices) + ACL post-filter.
+;; Peers (global id = peer*1000 + local):
+;;   peer 1: 1 "alpha beta"  2 "alpha gamma"     -> 1001 1002
+;;   peer 2: 1 "alpha delta" 2 "beta gamma"      -> 2001 2002
+;; ACL predicates are injected (viewer baked in by the caller), applied post-rank.
+
+(define
+  fed-setup
+  "p1 = indexDoc 2 \"alpha gamma\" (indexDoc 1 \"alpha beta\" emptyIndex)\np2 = indexDoc 2 \"beta gamma\" (indexDoc 1 \"alpha delta\" emptyIndex)\nfed = fedIndex [(1, p1), (2, p2)]\npermitP1 g = g < 2000\npermitNone g = False\npermitList g = elem g [1002, 2001]\n")
+
+(define
+  fed-cases
+  (list
+    (list
+      "fed merges all docs"
+      "sort (allDocs fed)"
+      (list 1001 1002 2001 2002))
+    (list
+      "fed docFreq across peers"
+      "[docFreq \"alpha\" fed]"
+      (list 3))
+    (list "fed docFreq beta" "[docFreq \"beta\" fed]" (list 2))
+    (list "fed numDocs" "[numDocs fed]" (list 4))
+    (list
+      "fed term lookup spans peers"
+      "map fst (lookupTerm \"gamma\" fed)"
+      (list 1002 2002))
+    (list
+      "fed preserves positions"
+      "lookupTerm \"beta\" fed"
+      (list
+        (list 1001 (list 1))
+        (list 2002 (list 0))))
+    (list
+      "fed rank alpha tie by gid"
+      "rankTfIdf [\"alpha\"] fed"
+      (list 1001 1002 2001))
+    (list
+      "fed rank beta"
+      "rankTfIdf [\"beta\"] fed"
+      (list 1001 2002))
+    (list
+      "fed boolean and"
+      "searchQuery \"alpha AND beta\" fed"
+      (list 1001))
+    (list
+      "fed boolean or"
+      "searchQuery \"delta OR barks\" fed"
+      (list 2001))
+    (list
+      "fed phrase within peer1"
+      "searchQuery \"\\\"alpha beta\\\"\" fed"
+      (list 1001))
+    (list
+      "fed phrase within peer2"
+      "searchQuery \"\\\"beta gamma\\\"\" fed"
+      (list 2002))
+    (list
+      "fed phrase peer2 alpha delta"
+      "searchQuery \"\\\"alpha delta\\\"\" fed"
+      (list 2001))
+    (list "fed empty peer list" "allDocs (fedIndex [])" (list))
+    (list
+      "fed single relabelled peer"
+      "rankTfIdf [\"alpha\"] (fedIndex [(5, p1)])"
+      (list 5001 5002))
+    (list
+      "acl peer1 only"
+      "aclFilter permitP1 (rankTfIdf [\"alpha\"] fed)"
+      (list 1001 1002))
+    (list
+      "acl allowlist preserves rank order"
+      "aclFilter permitList (rankTfIdf [\"alpha\"] fed)"
+      (list 1002 2001))
+    (list
+      "acl topN after filter"
+      "topNTfIdfAcl 1 permitP1 [\"alpha\"] fed"
+      (list 1001))
+    (list
+      "acl denies all"
+      "aclFilter permitNone (rankTfIdf [\"alpha\"] fed)"
+      (list))
+    (list
+      "acl on bm25"
+      "searchBm25Acl permitP1 1.5 0.75 [\"alpha\"] fed"
+      (list 1001 1002))
+    (list
+      "acl end-to-end tfidf"
+      "searchTfIdfAcl permitP1 [\"alpha\"] fed"
+      (list 1001 1002))))
+
+(define
+  fed-results
+  (search-batch fed-setup (map (fn (c) (nth c 1)) fed-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth fed-results i) (nth c 2)))
+  fed-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index d1c0689b..7324db74 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -10,7 +10,7 @@ extension that merges per-peer indices.
 
 ## Status (rolling)
 
-`bash lib/search/conformance.sh` → **101/101** (Phases 1–3 complete)
+`bash lib/search/conformance.sh` → **122/122** (Phases 1–4 complete)
 
 ## Ground rules
 
@@ -99,13 +99,24 @@ lib/search/index.sx                     lib/search/eval.sx
 
 ## Phase 4 — ACL filter + federation
 
-- [ ] post-filter — each candidate result tested via `(acl/permit? viewer :read doc)`
-- [ ] federated query — fan out to peer instances via fed-sx, merge results
-- [ ] merge policy — interleave by rank, dedupe by `(peer, doc-id)`
-- [ ] `lib/search/tests/integration.sx` — federated search with ACL filter
+- [x] post-filter — `aclFilter`/`searchTfIdfAcl`/`topNTfIdfAcl`/`searchBm25Acl` take an
+  injected `permit :: DocId -> Bool` predicate, applied post-rank (never in the index)
+- [x] federated query — `fedIndex :: [(PeerId, Index)] -> Index` merges per-peer
+  inverted indices (union posting lists per term); rank/search run once over the merge
+- [x] merge policy — relabel local DocIds to global `gid = peer*1000 + local`
+  (bijection ⇒ dedupe by (peer,doc-id) is automatic); ranking interleaves peers by score
+- [x] `lib/search/tests/integration.sx` — 21 cases: index merge, cross-peer df/lookup,
+  position preservation, boolean/phrase over the merge, ACL filter + top-N + bm25
 
 ## Progress log
 
+- **Phase 4 complete — federation + ACL (122/122 total). Roadmap done.** `fedIndex`
+  merges per-peer inverted indices (union posting lists per term) after relabelling
+  local DocIds to global `gid = peer*1000 + local` — the bijection makes (peer,doc-id)
+  dedupe automatic and keeps positions, so ranking runs once over the merge and
+  interleaves peers by score (rank-correct). ACL is a post-rank `filter` over an
+  injected `permit :: DocId -> Bool` (viewer baked in by the caller) — never in the
+  index; `searchTfIdfAcl`/`topNTfIdfAcl`/`searchBm25Acl`. 21 integration tests.
 - **Phase 3 complete — ranking (101/101 total).** TF-IDF (`rankTfIdf`) and BM25
   (`rankBm25 k1 b`) over the candidate set (docs containing any query term), scores
   as floats with deterministic DocId-ascending tiebreak; `topNTfIdf`/`topNBm25` via

From 9d3b775b2583ea1fe9d8e4f0ef54675794e24bbd Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 20:22:23 +0000
Subject: [PATCH 07/15] search: prefix/wildcard queries + 14 tests

prefixTerms matches indexed terms by prefix (allTerms + isPrefixOf); prefixDocs
unions their docs; prefixRankTfIdf ranks via the matched terms. 136/136.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |  6 ++--
 lib/search/conformance.conf |  2 ++
 lib/search/prefix.sx        | 10 ++++++
 lib/search/scoreboard.json  |  9 +++---
 lib/search/scoreboard.md    |  3 +-
 lib/search/tests/prefix.sx  | 63 +++++++++++++++++++++++++++++++++++++
 plans/search-on-sx.md       | 11 +++++++
 7 files changed, 97 insertions(+), 7 deletions(-)
 create mode 100644 lib/search/prefix.sx
 create mode 100644 lib/search/tests/prefix.sx
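
A rough sketch of the prefix expansion in plain Haskell, for readers who want
to see the idea outside the .sx string. The simplified Index type (terms mapped
to sorted doc ids, no positions), the *Sketch names and unionSorted are
assumptions for illustration; the real code goes through allTerms and candStep.

```haskell
import Data.List (isPrefixOf)

type DocId = Int
type Term  = String
type Index = [(Term, [DocId])] -- assumed: sorted by term, doc ids ascending

-- Every indexed term that starts with the given prefix, in index order.
prefixTermsSketch :: String -> Index -> [Term]
prefixTermsSketch pre idx = filter (pre `isPrefixOf`) (map fst idx)

-- Union of two ascending doc-id lists without duplicates.
unionSorted :: [DocId] -> [DocId] -> [DocId]
unionSorted [] ys = ys
unionSorted xs [] = xs
unionSorted (x:xs) (y:ys)
  | x < y     = x : unionSorted xs (y:ys)
  | x > y     = y : unionSorted (x:xs) ys
  | otherwise = x : unionSorted xs ys

-- OR together the doc lists of all matching terms.
prefixDocsSketch :: String -> Index -> [DocId]
prefixDocsSketch pre idx =
  foldl unionSorted [] [ ds | (t, ds) <- idx, pre `isPrefixOf` t ]

-- Corpus from tests/prefix.sx: 1 "alpha alpine", 2 "beta apple", 3 "banana alpha".
sampleIndex :: Index
sampleIndex =
  [("alpha", [1, 3]), ("alpine", [1]), ("apple", [2]), ("banana", [3]), ("beta", [2])]
```

On this corpus `prefixTermsSketch "al" sampleIndex` gives `["alpha","alpine"]`
and `prefixDocsSketch "a" sampleIndex` gives `[1,2,3]`, in line with the prefix
suite.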

diff --git a/lib/search/api.sx b/lib/search/api.sx
index a9a3fe12..84918b5e 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -4,7 +4,7 @@
 ;; interpreter. Public Haskell entry points: indexDoc, lookupTerm, deleteDoc,
 ;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery, searchQuery,
 ;; rankTfIdf, rankBm25, topNTfIdf, topNBm25, fedIndex, aclFilter, searchTfIdfAcl,
-;; topNTfIdfAcl, searchBm25Acl.
+;; topNTfIdfAcl, searchBm25Acl, prefixTerms, prefixDocs, prefixRankTfIdf.
 
 (define
   search/src
@@ -19,4 +19,6 @@
     "\n"
     search/rank-src
     "\n"
-    search/fed-src))
+    search/fed-src
+    "\n"
+    search/prefix-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index b2ef2f74..c5d09b5c 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -24,6 +24,7 @@ PRELOADS=(
   lib/search/parse.sx
   lib/search/rank.sx
   lib/search/fed.sx
+  lib/search/prefix.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -34,4 +35,5 @@ SUITES=(
   "parse:lib/search/tests/parse.sx"
   "rank:lib/search/tests/rank.sx"
   "integration:lib/search/tests/integration.sx"
+  "prefix:lib/search/tests/prefix.sx"
 )
diff --git a/lib/search/prefix.sx b/lib/search/prefix.sx
new file mode 100644
index 00000000..d50a5b1b
--- /dev/null
+++ b/lib/search/prefix.sx
@@ -0,0 +1,10 @@
+;; search prefix / wildcard queries — Haskell source fragment. Depends on index +
+;; rank (reuses candStep / rankTfIdf). A prefix matches every indexed term that
+;; starts with it; the matching terms are unioned (OR) into a docid set.
+;;   prefixTerms     :: String -> Index -> [Term]      (sorted, from allTerms)
+;;   prefixDocs      :: String -> Index -> [DocId]     (sorted union)
+;;   prefixRankTfIdf :: String -> Index -> [DocId]     (ranked by the matched terms)
+
+(define
+  search/prefix-src
+  "prefixTerms pre idx = filter (isPrefixOf pre) (allTerms idx)\nprefixDocs pre idx = foldl (candStep idx) [] (prefixTerms pre idx)\nprefixRankTfIdf pre idx = rankTfIdf (prefixTerms pre idx) idx\n")
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index d1cb07da..df5e60d7 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,14 +1,15 @@
 {
   "lang": "search",
-  "total_passed": 122,
+  "total_passed": 136,
   "total_failed": 0,
-  "total": 122,
+  "total": 136,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
     {"name":"boolean","passed":28,"failed":0,"total":28},
     {"name":"parse","passed":32,"failed":0,"total":32},
     {"name":"rank","passed":23,"failed":0,"total":23},
-    {"name":"integration","passed":21,"failed":0,"total":21}
+    {"name":"integration","passed":21,"failed":0,"total":21},
+    {"name":"prefix","passed":14,"failed":0,"total":14}
   ],
-  "generated": "2026-06-06T20:07:30+00:00"
+  "generated": "2026-06-06T20:21:41+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index 03a1d66c..0578f296 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,6 +1,6 @@
 # search scoreboard
 
-**122 / 122 passing** (0 failure(s)).
+**136 / 136 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
@@ -9,3 +9,4 @@
 | parse | 32 | 32 | ok |
 | rank | 23 | 23 | ok |
 | integration | 21 | 21 | ok |
+| prefix | 14 | 14 | ok |
diff --git a/lib/search/tests/prefix.sx b/lib/search/tests/prefix.sx
new file mode 100644
index 00000000..97776491
--- /dev/null
+++ b/lib/search/tests/prefix.sx
@@ -0,0 +1,63 @@
+;; Extension — prefix / wildcard queries.
+;; Corpus: 1 "alpha alpine"  2 "beta apple"  3 "banana alpha"
+;; allTerms sorted: alpha alpine apple banana beta
+
+(define
+  prefix-setup
+  "idx = indexDoc 3 \"banana alpha\" (indexDoc 2 \"beta apple\" (indexDoc 1 \"alpha alpine\" emptyIndex))\n")
+
+(define
+  prefix-cases
+  (list
+    (list
+      "prefix terms two matches"
+      "prefixTerms \"al\" idx"
+      (list "alpha" "alpine"))
+    (list
+      "prefix terms narrower"
+      "prefixTerms \"alp\" idx"
+      (list "alpha" "alpine"))
+    (list
+      "prefix terms wide"
+      "prefixTerms \"a\" idx"
+      (list "alpha" "alpine" "apple"))
+    (list "prefix terms single" "prefixTerms \"ban\" idx" (list "banana"))
+    (list "prefix terms exact term" "prefixTerms \"beta\" idx" (list "beta"))
+    (list "prefix terms none" "prefixTerms \"z\" idx" (list))
+    (list
+      "prefix docs union"
+      "prefixDocs \"al\" idx"
+      (list 1 3))
+    (list "prefix docs single term" "prefixDocs \"ban\" idx" (list 3))
+    (list
+      "prefix docs wide"
+      "prefixDocs \"a\" idx"
+      (list 1 2 3))
+    (list "prefix docs none" "prefixDocs \"z\" idx" (list))
+    (list
+      "prefix docs exact"
+      "prefixDocs \"alpha\" idx"
+      (list 1 3))
+    (list
+      "prefix rank ranks by matched terms"
+      "prefixRankTfIdf \"al\" idx"
+      (list 1 3))
+    (list
+      "prefix rank single doc"
+      "prefixRankTfIdf \"ban\" idx"
+      (list 3))
+    (list "prefix rank empty" "prefixRankTfIdf \"z\" idx" (list))))
+
+(define
+  prefix-results
+  (search-batch
+    prefix-setup
+    (map (fn (c) (nth c 1)) prefix-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth prefix-results i) (nth c 2)))
+  prefix-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index 7324db74..2f440bd0 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -108,8 +108,19 @@ lib/search/index.sx                     lib/search/eval.sx
 - [x] `lib/search/tests/integration.sx` — 21 cases: index merge, cross-peer df/lookup,
   position preservation, boolean/phrase over the merge, ACL filter + top-N + bm25
 
+## Extensions (post-roadmap, search-shaped vocabulary)
+
+- [x] prefix / wildcard queries (`prefixTerms`, `prefixDocs`, `prefixRankTfIdf`) — 14 tests
+- [ ] fuzzy matching — edit distance term expansion
+- [ ] result pagination (offset / limit)
+- [ ] snippet / highlight generation
+- [ ] stemming (suffix stripping) — recall-improving normalizer
+
 ## Progress log
 
+- **Extension: prefix/wildcard queries (136/136 total).** `prefixTerms` matches every
+  indexed term starting with a prefix (via allTerms + isPrefixOf); `prefixDocs` unions
+  their docs; `prefixRankTfIdf` ranks treating the matched terms as the query. 14 tests.
 - **Phase 4 complete — federation + ACL (122/122 total). Roadmap done.** `fedIndex`
   merges per-peer inverted indices (union posting lists per term) after relabelling
   local DocIds to global `gid = peer*1000 + local` — the bijection makes (peer,doc-id)

From 3ab8270a584ef20b5ed9d63819bae72e8dd28ea4 Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 20:55:25 +0000
Subject: [PATCH 08/15] search: result pagination (offset/limit) + 12 tests

paginate windows a ranked list by offset and limit (take lim . drop off);
pageTfIdf/pageBm25 apply it to ranked results and resultCount returns the number
of ranked hits. 148/148.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |  7 +++--
 lib/search/conformance.conf |  2 ++
 lib/search/page.sx          | 11 ++++++++
 lib/search/scoreboard.json  |  9 ++++---
 lib/search/scoreboard.md    |  3 ++-
 lib/search/tests/page.sx    | 53 +++++++++++++++++++++++++++++++++++++
 plans/search-on-sx.md       |  7 ++++-
 7 files changed, 84 insertions(+), 8 deletions(-)
 create mode 100644 lib/search/page.sx
 create mode 100644 lib/search/tests/page.sx
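
The offset/limit window is small enough to show in full. The sketch below is
plain Haskell under assumed names; pageOf is a hypothetical convenience that is
not part of this patch.

```haskell
-- Drop `off` results, then keep at most `lim` of what remains.
paginateSketch :: Int -> Int -> [a] -> [a]
paginateSketch off lim = take lim . drop off

-- Hypothetical helper: 1-based page number with a fixed page size.
pageOf :: Int -> Int -> [a] -> [a]
pageOf pageNo size = paginateSketch ((pageNo - 1) * size) size
```

For example `paginateSketch 1 2 [10, 20, 30, 40]` gives `[20,30]` and
`paginateSketch 9 2 [10, 20]` gives `[]`, matching the raw-list cases in
tests/page.sx. Because `drop` and `take` are total, an offset past the end or a
zero limit needs no special casing.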

diff --git a/lib/search/api.sx b/lib/search/api.sx
index 84918b5e..cef49db4 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -4,7 +4,8 @@
 ;; interpreter. Public Haskell entry points: indexDoc, lookupTerm, deleteDoc,
 ;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery, searchQuery,
 ;; rankTfIdf, rankBm25, topNTfIdf, topNBm25, fedIndex, aclFilter, searchTfIdfAcl,
-;; topNTfIdfAcl, searchBm25Acl, prefixTerms, prefixDocs, prefixRankTfIdf.
+;; topNTfIdfAcl, searchBm25Acl, prefixTerms, prefixDocs, prefixRankTfIdf,
+;; paginate, pageTfIdf, pageBm25, resultCount.
 
 (define
   search/src
@@ -21,4 +22,6 @@
     "\n"
     search/fed-src
     "\n"
-    search/prefix-src))
+    search/prefix-src
+    "\n"
+    search/page-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index c5d09b5c..79b14819 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -25,6 +25,7 @@ PRELOADS=(
   lib/search/rank.sx
   lib/search/fed.sx
   lib/search/prefix.sx
+  lib/search/page.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -36,4 +37,5 @@ SUITES=(
   "rank:lib/search/tests/rank.sx"
   "integration:lib/search/tests/integration.sx"
   "prefix:lib/search/tests/prefix.sx"
+  "page:lib/search/tests/page.sx"
 )
diff --git a/lib/search/page.sx b/lib/search/page.sx
new file mode 100644
index 00000000..93b57dd4
--- /dev/null
+++ b/lib/search/page.sx
@@ -0,0 +1,11 @@
+;; search pagination — Haskell source fragment. Depends on rank.
+;; Windows a ranked result list by offset/limit (offset >= length -> empty;
+;; limit clamps to what remains).
+;;   paginate    :: Int -> Int -> [DocId] -> [DocId]   (offset, limit)
+;;   pageTfIdf   :: Int -> Int -> [Term] -> Index -> [DocId]
+;;   pageBm25    :: Int -> Int -> Float -> Float -> [Term] -> Index -> [DocId]
+;;   resultCount :: [Term] -> Index -> Int
+
+(define
+  search/page-src
+  "paginate off lim docs = take lim (drop off docs)\npageTfIdf off lim ts idx = paginate off lim (rankTfIdf ts idx)\npageBm25 off lim k1 b ts idx = paginate off lim (rankBm25 k1 b ts idx)\nresultCount ts idx = length (rankTfIdf ts idx)\n")
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index df5e60d7..16472224 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,15 +1,16 @@
 {
   "lang": "search",
-  "total_passed": 136,
+  "total_passed": 148,
   "total_failed": 0,
-  "total": 136,
+  "total": 148,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
     {"name":"boolean","passed":28,"failed":0,"total":28},
     {"name":"parse","passed":32,"failed":0,"total":32},
     {"name":"rank","passed":23,"failed":0,"total":23},
     {"name":"integration","passed":21,"failed":0,"total":21},
-    {"name":"prefix","passed":14,"failed":0,"total":14}
+    {"name":"prefix","passed":14,"failed":0,"total":14},
+    {"name":"page","passed":12,"failed":0,"total":12}
   ],
-  "generated": "2026-06-06T20:21:41+00:00"
+  "generated": "2026-06-06T20:54:50+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index 0578f296..9cdc93b3 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,6 +1,6 @@
 # search scoreboard
 
-**136 / 136 passing** (0 failure(s)).
+**148 / 148 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
@@ -10,3 +10,4 @@
 | rank | 23 | 23 | ok |
 | integration | 21 | 21 | ok |
 | prefix | 14 | 14 | ok |
+| page | 12 | 12 | ok |
diff --git a/lib/search/tests/page.sx b/lib/search/tests/page.sx
new file mode 100644
index 00000000..6ad77310
--- /dev/null
+++ b/lib/search/tests/page.sx
@@ -0,0 +1,53 @@
+;; Extension — result pagination (offset / limit) over ranked results.
+;; Corpus (tf of "x" descending): 1 x4  2 x3  3 x2  4 x1  5 y(no x)
+;; rankTfIdf ["x"] -> [1,2,3,4]
+
+(define
+  page-setup
+  "idx = indexDoc 5 \"y\" (indexDoc 4 \"x\" (indexDoc 3 \"x x\" (indexDoc 2 \"x x x\" (indexDoc 1 \"x x x x other\" emptyIndex))))\n")
+
+(define
+  page-cases
+  (list
+    (list "first page" "pageTfIdf 0 2 [\"x\"] idx" (list 1 2))
+    (list
+      "second page"
+      "pageTfIdf 2 2 [\"x\"] idx"
+      (list 3 4))
+    (list
+      "sliding window"
+      "pageTfIdf 1 2 [\"x\"] idx"
+      (list 2 3))
+    (list
+      "limit exceeds remaining"
+      "pageTfIdf 3 10 [\"x\"] idx"
+      (list 4))
+    (list "offset past end" "pageTfIdf 4 2 [\"x\"] idx" (list))
+    (list "limit zero" "pageTfIdf 0 0 [\"x\"] idx" (list))
+    (list
+      "whole result"
+      "pageTfIdf 0 10 [\"x\"] idx"
+      (list 1 2 3 4))
+    (list
+      "paginate raw list"
+      "paginate 1 2 [10, 20, 30, 40]"
+      (list 20 30))
+    (list "paginate raw past end" "paginate 9 2 [10, 20]" (list))
+    (list
+      "bm25 page window size"
+      "[length (pageBm25 0 2 1.5 0.75 [\"x\"] idx)]"
+      (list 2))
+    (list "result count" "[resultCount [\"x\"] idx]" (list 4))
+    (list "result count zero" "[resultCount [\"zzz\"] idx]" (list 0))))
+
+(define
+  page-results
+  (search-batch page-setup (map (fn (c) (nth c 1)) page-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth page-results i) (nth c 2)))
+  page-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index 2f440bd0..32444f20 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -112,12 +112,17 @@ lib/search/index.sx                     lib/search/eval.sx
 
 - [x] prefix / wildcard queries (`prefixTerms`, `prefixDocs`, `prefixRankTfIdf`) — 14 tests
 - [ ] fuzzy matching — edit distance term expansion
-- [ ] result pagination (offset / limit)
+- [x] result pagination (offset / limit) — `paginate`, `pageTfIdf`, `pageBm25`,
+  `resultCount` — 12 tests
 - [ ] snippet / highlight generation
 - [ ] stemming (suffix stripping) — recall-improving normalizer
 
 ## Progress log
 
+- **Extension: pagination (148/148 total).** `paginate off lim` windows a ranked list
+  (take lim . drop off); `pageTfIdf`/`pageBm25` + `resultCount`. 12 tests. Note the
+  full conformance now runs 8 suites sequentially and needs an overall timeout ~1900s
+  under the heavy box load.
 - **Extension: prefix/wildcard queries (136/136 total).** `prefixTerms` matches every
   indexed term starting with a prefix (via allTerms + isPrefixOf); `prefixDocs` unions
   their docs; `prefixRankTfIdf` ranks treating the matched terms as the query. 14 tests.

From 5945b51cfd5a271b761b9aa34765b1fcad622050 Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 21:47:56 +0000
Subject: [PATCH 09/15] search: fuzzy matching via edit distance + 18 tests

editDist is an O(m*n) row-based Levenshtein DP (the naive recursion is
exponential and times out under load); fuzzyTerms/fuzzyDocs/fuzzyRankTfIdf expand
a term to the indexed terms within a max edit distance. 166/166.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |  7 +++-
 lib/search/conformance.conf |  2 +
 lib/search/fuzzy.sx         | 12 ++++++
 lib/search/scoreboard.json  |  9 +++--
 lib/search/scoreboard.md    |  3 +-
 lib/search/tests/fuzzy.sx   | 74 +++++++++++++++++++++++++++++++++++++
 plans/search-on-sx.md       |  7 +++-
 7 files changed, 106 insertions(+), 8 deletions(-)
 create mode 100644 lib/search/fuzzy.sx
 create mode 100644 lib/search/tests/fuzzy.sx
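
The row-based DP can be written compactly with scanl. This is a sketch of the
same technique in ordinary Haskell, not the code in fuzzy.sx; the *Sketch names
are assumptions, and the term list is passed directly instead of going through
allTerms.

```haskell
-- Levenshtein distance, one DP row at a time. The row for the empty prefix of
-- xs is [0 .. length ys]; each character of xs produces the next row.
editDistSketch :: String -> String -> Int
editDistSketch xs ys = last (foldl nextRow [0 .. length ys] xs)
  where
    nextRow [] _ = []
    nextRow prev@(p:ps) x = scanl step (p + 1) (zip3 ys prev ps)
      where
        -- left: cell just computed in this row; diag/up: cells of prev row.
        step left (y, diag, up) =
          minimum [up + 1, left + 1, diag + (if x == y then 0 else 1)]

-- Keep the terms within maxd edits of the query term.
fuzzyTermsSketch :: Int -> String -> [String] -> [String]
fuzzyTermsSketch maxd term = filter (\t -> editDistSketch term t <= maxd)
```

With the fuzzy-suite vocabulary, `fuzzyTermsSketch 1 "color"
["color", "colored", "colour", "flavor", "kitten"]` gives `["color","colour"]`.
Each row costs O(n) and there are m rows, so the work is O(m*n) with only one
row live at a time.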

diff --git a/lib/search/api.sx b/lib/search/api.sx
index cef49db4..c55c7f31 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -5,7 +5,8 @@
 ;; docFreq, allTerms, tokens, positioned, evalQuery, parseQuery, searchQuery,
 ;; rankTfIdf, rankBm25, topNTfIdf, topNBm25, fedIndex, aclFilter, searchTfIdfAcl,
 ;; topNTfIdfAcl, searchBm25Acl, prefixTerms, prefixDocs, prefixRankTfIdf,
-;; paginate, pageTfIdf, pageBm25, resultCount.
+;; paginate, pageTfIdf, pageBm25, resultCount, editDist, fuzzyTerms, fuzzyDocs,
+;; fuzzyRankTfIdf.
 
 (define
   search/src
@@ -24,4 +25,6 @@
     "\n"
     search/prefix-src
     "\n"
-    search/page-src))
+    search/page-src
+    "\n"
+    search/fuzzy-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index 79b14819..0fef2c39 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -26,6 +26,7 @@ PRELOADS=(
   lib/search/fed.sx
   lib/search/prefix.sx
   lib/search/page.sx
+  lib/search/fuzzy.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -38,4 +39,5 @@ SUITES=(
   "integration:lib/search/tests/integration.sx"
   "prefix:lib/search/tests/prefix.sx"
   "page:lib/search/tests/page.sx"
+  "fuzzy:lib/search/tests/fuzzy.sx"
 )
diff --git a/lib/search/fuzzy.sx b/lib/search/fuzzy.sx
new file mode 100644
index 00000000..9a757abb
--- /dev/null
+++ b/lib/search/fuzzy.sx
@@ -0,0 +1,12 @@
+;; search fuzzy matching — Haskell source fragment. Depends on index + rank.
+;; Levenshtein edit distance (O(m*n) row-based DP — the naive recursive version is
+;; exponential and far too slow under load) expands a query term to all indexed
+;; terms within a max distance, then unions / ranks their docs.
+;;   editDist        :: String -> String -> Int
+;;   fuzzyTerms      :: Int -> String -> Index -> [Term]   (sorted)
+;;   fuzzyDocs       :: Int -> String -> Index -> [DocId]  (sorted union)
+;;   fuzzyRankTfIdf  :: Int -> String -> Index -> [DocId]
+
+(define
+  search/fuzzy-src
+  "edMin3 a b c = min a (min b c)\nedCost x y = if x == y then 0 else 1\nedUpto i n = if i > n then [] else i : edUpto (i + 1) n\nedLast [x] = x\nedLast (x:xs) = edLast xs\nedNrow x [] prev left = []\nedNrow x (y:ys) prev left = let v = edMin3 (head (tail prev) + 1) (left + 1) (head prev + edCost x y) in v : edNrow x ys (tail prev) v\nedRow x ys prev = let f = head prev + 1 in f : edNrow x ys prev f\nedRows [] ys prev = prev\nedRows (x:xs) ys prev = edRows xs ys (edRow x ys prev)\neditDist xs ys = edLast (edRows xs ys (edUpto 0 (length ys)))\nqWithinDist maxd term t = editDist term t <= maxd\nfuzzyTerms maxd term idx = filter (qWithinDist maxd term) (allTerms idx)\nfuzzyDocs maxd term idx = foldl (candStep idx) [] (fuzzyTerms maxd term idx)\nfuzzyRankTfIdf maxd term idx = rankTfIdf (fuzzyTerms maxd term idx) idx\n")
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index 16472224..b0baf95a 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,8 +1,8 @@
 {
   "lang": "search",
-  "total_passed": 148,
+  "total_passed": 166,
   "total_failed": 0,
-  "total": 148,
+  "total": 166,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
     {"name":"boolean","passed":28,"failed":0,"total":28},
@@ -10,7 +10,8 @@
     {"name":"rank","passed":23,"failed":0,"total":23},
     {"name":"integration","passed":21,"failed":0,"total":21},
     {"name":"prefix","passed":14,"failed":0,"total":14},
-    {"name":"page","passed":12,"failed":0,"total":12}
+    {"name":"page","passed":12,"failed":0,"total":12},
+    {"name":"fuzzy","passed":18,"failed":0,"total":18}
   ],
-  "generated": "2026-06-06T20:54:50+00:00"
+  "generated": "2026-06-06T21:47:28+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index 9cdc93b3..74440558 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,6 +1,6 @@
 # search scoreboard
 
-**148 / 148 passing** (0 failure(s)).
+**166 / 166 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
@@ -11,3 +11,4 @@
 | integration | 21 | 21 | ok |
 | prefix | 14 | 14 | ok |
 | page | 12 | 12 | ok |
+| fuzzy | 18 | 18 | ok |
diff --git a/lib/search/tests/fuzzy.sx b/lib/search/tests/fuzzy.sx
new file mode 100644
index 00000000..0b5c3fbd
--- /dev/null
+++ b/lib/search/tests/fuzzy.sx
@@ -0,0 +1,74 @@
+;; Extension — fuzzy matching via Levenshtein edit distance.
+;; Corpus: 1 "color flavor"  2 "colour kitten"  3 "colored"
+;; allTerms: color colored colour flavor kitten
+
+(define
+  fuzzy-setup
+  "idx = indexDoc 3 \"colored\" (indexDoc 2 \"colour kitten\" (indexDoc 1 \"color flavor\" emptyIndex))\n")
+
+(define
+  fuzzy-cases
+  (list
+    (list
+      "editDist substitution"
+      "[editDist \"kitten\" \"sitten\"]"
+      (list 1))
+    (list "editDist equal" "[editDist \"abc\" \"abc\"]" (list 0))
+    (list "editDist deletion" "[editDist \"abc\" \"ab\"]" (list 1))
+    (list "editDist insertion" "[editDist \"ab\" \"abc\"]" (list 1))
+    (list "editDist from empty" "[editDist \"\" \"abc\"]" (list 3))
+    (list "editDist both empty" "[editDist \"\" \"\"]" (list 0))
+    (list
+      "editDist classic"
+      "[editDist \"kitten\" \"sitting\"]"
+      (list 3))
+    (list
+      "editDist color colour"
+      "[editDist \"color\" \"colour\"]"
+      (list 1))
+    (list
+      "editDist color colored"
+      "[editDist \"color\" \"colored\"]"
+      (list 2))
+    (list
+      "fuzzy terms dist 1"
+      "fuzzyTerms 1 \"color\" idx"
+      (list "color" "colour"))
+    (list
+      "fuzzy terms dist 2"
+      "fuzzyTerms 2 \"color\" idx"
+      (list "color" "colored" "colour"))
+    (list "fuzzy terms exact" "fuzzyTerms 0 \"color\" idx" (list "color"))
+    (list
+      "fuzzy terms other word"
+      "fuzzyTerms 1 \"flavour\" idx"
+      (list "flavor"))
+    (list
+      "fuzzy docs dist 1"
+      "fuzzyDocs 1 \"color\" idx"
+      (list 1 2))
+    (list
+      "fuzzy docs dist 2"
+      "fuzzyDocs 2 \"color\" idx"
+      (list 1 2 3))
+    (list "fuzzy docs none" "fuzzyDocs 1 \"zzzzz\" idx" (list))
+    (list
+      "fuzzy rank dist 1"
+      "fuzzyRankTfIdf 1 \"color\" idx"
+      (list 1 2))
+    (list
+      "fuzzy rank dist 2"
+      "fuzzyRankTfIdf 2 \"color\" idx"
+      (list 1 2 3))))
+
+(define
+  fuzzy-results
+  (search-batch fuzzy-setup (map (fn (c) (nth c 1)) fuzzy-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth fuzzy-results i) (nth c 2)))
+  fuzzy-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index 32444f20..791c04dc 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -111,7 +111,8 @@ lib/search/index.sx                     lib/search/eval.sx
 ## Extensions (post-roadmap, search-shaped vocabulary)
 
 - [x] prefix / wildcard queries (`prefixTerms`, `prefixDocs`, `prefixRankTfIdf`) — 14 tests
-- [ ] fuzzy matching — edit distance term expansion
+- [x] fuzzy matching — edit distance term expansion (`editDist`, `fuzzyTerms`,
+  `fuzzyDocs`, `fuzzyRankTfIdf`) — 18 tests
 - [x] result pagination (offset / limit) — `paginate`, `pageTfIdf`, `pageBm25`,
   `resultCount` — 12 tests
 - [ ] snippet / highlight generation
@@ -119,6 +120,10 @@ lib/search/index.sx                     lib/search/eval.sx
 
 ## Progress log
 
+- **Extension: fuzzy matching (166/166 total).** Levenshtein `editDist` as an O(m*n)
+  row-based DP (the naive recursive version is exponential and times out under load),
+  `fuzzyTerms`/`fuzzyDocs`/`fuzzyRankTfIdf` expand a term to indexed terms within a max
+  edit distance. 18 tests.
 - **Extension: pagination (148/148 total).** `paginate off lim` windows a ranked list
   (take lim . drop off); `pageTfIdf`/`pageBm25` + `resultCount`. 12 tests. Note the
   full conformance now runs 8 suites sequentially and needs an overall timeout ~1900s
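
Reviewer note, not part of the patch: the row-based DP described in the fuzzy-matching log entry can be sketched in plain GHC Haskell as below. This is an illustration under assumptions, not the haskell-on-sx fragment in `lib/search/fuzzy.sx`. The names `levenshtein` and `withinDist` are hypothetical, and ordinary `String` semantics are assumed.

```haskell
import Data.List (foldl')

-- Levenshtein distance, keeping only the previous DP row.
-- Row i holds d(i,0) .. d(i,n) for the first i symbols of xs.
levenshtein :: Eq a => [a] -> [a] -> Int
levenshtein xs ys = last (foldl' nextRow [0 .. length ys] xs)
  where
    nextRow [] _ = []
    nextRow prev@(p0 : rest) x = scanl step (p0 + 1) (zip3 ys prev rest)
      where
        -- left = d(i,j-1), diag = d(i-1,j-1), up = d(i-1,j)
        step left (y, diag, up) =
          minimum [up + 1, left + 1, diag + (if x == y then 0 else 1)]

-- Hypothetical helper: keep the candidate terms within maxd edits of term.
withinDist :: Int -> String -> [String] -> [String]
withinDist maxd term = filter (\t -> levenshtein term t <= maxd)
```

Each fold step builds one row from the previous one, so the work is O(m*n) for inputs of length m and n, which is the property the log entry contrasts with the exponential naive recursion.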

From 7231cb651f1d7be649adfe70ade87485352ed255 Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 22:08:00 +0000
Subject: [PATCH 10/15] search: highlight + snippet generation + 12 tests

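highlight marks query-matching tokens with [..] and returns the normalized
(lowercased, punctuation-stripped) text; snippet ctx extracts a window of up to
2*ctx+1 tokens around the first match, falling back to the head of the text when
nothing matches. 178/178 tests passing.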

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx             |  6 ++--
 lib/search/conformance.conf   |  2 ++
 lib/search/highlight.sx       | 10 ++++++
 lib/search/scoreboard.json    |  9 ++---
 lib/search/scoreboard.md      |  3 +-
 lib/search/tests/highlight.sx | 66 +++++++++++++++++++++++++++++++++++
 plans/search-on-sx.md         |  5 ++-
 7 files changed, 93 insertions(+), 8 deletions(-)
 create mode 100644 lib/search/highlight.sx
 create mode 100644 lib/search/tests/highlight.sx
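
Reviewer note, not part of the patch: a plain GHC Haskell sketch of the highlight/snippet behaviour exercised by the new tests. The `tokens` normalizer here is an assumption (drop non-alphanumerics, lowercase, split on whitespace); the real one lives in the tokenize fragment and may differ. `mark` is a hypothetical name for the role `hlMark` plays in the patch.

```haskell
import Data.Char (isAlphaNum, isSpace, toLower)
import Data.List (findIndex)

-- Assumed normalizer: strip punctuation, lowercase, split on whitespace.
tokens :: String -> [String]
tokens = words . map toLower . filter (\c -> isAlphaNum c || isSpace c)

-- Wrap a token in [..] when it is one of the query terms.
mark :: [String] -> String -> String
mark terms t = if t `elem` terms then "[" ++ t ++ "]" else t

highlight :: [String] -> String -> String
highlight terms = unwords . map (mark terms) . tokens

-- Window of at most 2*ctx+1 tokens starting ctx tokens before the first
-- match (clamped at 0); with no match the window starts at the head.
snippet :: Int -> [String] -> String -> String
snippet ctx terms text = unwords (map (mark terms) window)
  where
    toks = tokens text
    start = maybe 0 (\i -> max 0 (i - ctx)) (findIndex (`elem` terms) toks)
    window = take (2 * ctx + 1) (drop start toks)
```

Because the window has a fixed width and only its start is clamped, a match at the very first token still yields up to 2*ctx+1 tokens, which matches the "snippet at start" case in the test file.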

diff --git a/lib/search/api.sx b/lib/search/api.sx
index c55c7f31..7abbe781 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -6,7 +6,7 @@
 ;; rankTfIdf, rankBm25, topNTfIdf, topNBm25, fedIndex, aclFilter, searchTfIdfAcl,
 ;; topNTfIdfAcl, searchBm25Acl, prefixTerms, prefixDocs, prefixRankTfIdf,
 ;; paginate, pageTfIdf, pageBm25, resultCount, editDist, fuzzyTerms, fuzzyDocs,
-;; fuzzyRankTfIdf.
+;; fuzzyRankTfIdf, highlight, snippet.
 
 (define
   search/src
@@ -27,4 +27,6 @@
     "\n"
     search/page-src
     "\n"
-    search/fuzzy-src))
+    search/fuzzy-src
+    "\n"
+    search/highlight-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index 0fef2c39..28c7ddf6 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -27,6 +27,7 @@ PRELOADS=(
   lib/search/prefix.sx
   lib/search/page.sx
   lib/search/fuzzy.sx
+  lib/search/highlight.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -40,4 +41,5 @@ SUITES=(
   "prefix:lib/search/tests/prefix.sx"
   "page:lib/search/tests/page.sx"
   "fuzzy:lib/search/tests/fuzzy.sx"
+  "highlight:lib/search/tests/highlight.sx"
 )
diff --git a/lib/search/highlight.sx b/lib/search/highlight.sx
new file mode 100644
index 00000000..4c5def99
--- /dev/null
+++ b/lib/search/highlight.sx
@@ -0,0 +1,10 @@
+;; search highlight / snippet — Haskell source fragment. Depends on tokenize.
+;; Operates on document text (not the index): marks query-matching tokens with
+;; [..] and extracts a context window around the first match. Tokens are
+;; normalized (lowercase, punctuation-stripped) by `tokens`, matching index side.
+;;   highlight :: [Term] -> String -> String
+;;   snippet   :: Int -> [Term] -> String -> String   (ctx tokens each side of 1st match)
+
+(define
+  search/highlight-src
+  "hlMark terms t = if elem t terms then \"[\" ++ t ++ \"]\" else t\nhighlight terms text = unwords (map (hlMark terms) (tokens text))\nhlIdxFrom terms [] i = 0 - 1\nhlIdxFrom terms (t:ts) i = if elem t terms then i else hlIdxFrom terms ts (i + 1)\nhlIdx terms toks = hlIdxFrom terms toks 0\nhlMax0 x = if x < 0 then 0 else x\nsnipStart ctx i = if i < 0 then 0 else hlMax0 (i - ctx)\nsnipToks ctx terms toks = unwords (map (hlMark terms) (take (2 * ctx + 1) (drop (snipStart ctx (hlIdx terms toks)) toks)))\nsnippet ctx terms text = snipToks ctx terms (tokens text)\n")
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index b0baf95a..a3ebb24c 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,8 +1,8 @@
 {
   "lang": "search",
-  "total_passed": 166,
+  "total_passed": 178,
   "total_failed": 0,
-  "total": 166,
+  "total": 178,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
     {"name":"boolean","passed":28,"failed":0,"total":28},
@@ -11,7 +11,8 @@
     {"name":"integration","passed":21,"failed":0,"total":21},
     {"name":"prefix","passed":14,"failed":0,"total":14},
     {"name":"page","passed":12,"failed":0,"total":12},
-    {"name":"fuzzy","passed":18,"failed":0,"total":18}
+    {"name":"fuzzy","passed":18,"failed":0,"total":18},
+    {"name":"highlight","passed":12,"failed":0,"total":12}
   ],
-  "generated": "2026-06-06T21:47:28+00:00"
+  "generated": "2026-06-06T22:07:05+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index 74440558..767c5fc2 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,6 +1,6 @@
 # search scoreboard
 
-**166 / 166 passing** (0 failure(s)).
+**178 / 178 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
@@ -12,3 +12,4 @@
 | prefix | 14 | 14 | ok |
 | page | 12 | 12 | ok |
 | fuzzy | 18 | 18 | ok |
+| highlight | 12 | 12 | ok |
diff --git a/lib/search/tests/highlight.sx b/lib/search/tests/highlight.sx
new file mode 100644
index 00000000..3a5001d5
--- /dev/null
+++ b/lib/search/tests/highlight.sx
@@ -0,0 +1,66 @@
+;; Extension — highlight + snippet over document text.
+;; Text: "the quick brown fox jumps"
+
+(define
+  hl-cases
+  (list
+    (list
+      "highlight two terms"
+      "highlight [\"quick\", \"fox\"] \"the quick brown fox jumps\""
+      "the [quick] brown [fox] jumps")
+    (list
+      "highlight none"
+      "highlight [] \"the quick brown fox jumps\""
+      "the quick brown fox jumps")
+    (list
+      "highlight absent term"
+      "highlight [\"zzz\"] \"the quick brown fox jumps\""
+      "the quick brown fox jumps")
+    (list
+      "highlight first token"
+      "highlight [\"the\"] \"the quick brown fox jumps\""
+      "[the] quick brown fox jumps")
+    (list
+      "highlight normalizes text"
+      "highlight [\"quick\"] \"The Quick, brown!\""
+      "the [quick] brown")
+    (list
+      "snippet around middle"
+      "snippet 1 [\"brown\"] \"the quick brown fox jumps\""
+      "quick [brown] fox")
+    (list
+      "snippet at start"
+      "snippet 1 [\"the\"] \"the quick brown fox jumps\""
+      "[the] quick brown")
+    (list
+      "snippet near end"
+      "snippet 1 [\"fox\"] \"the quick brown fox jumps\""
+      "brown [fox] jumps")
+    (list
+      "snippet ctx zero"
+      "snippet 0 [\"brown\"] \"the quick brown fox jumps\""
+      "[brown]")
+    (list
+      "snippet clamps at end"
+      "snippet 2 [\"jumps\"] \"the quick brown fox jumps\""
+      "brown fox [jumps]")
+    (list
+      "snippet no match shows head"
+      "snippet 1 [\"zzz\"] \"the quick brown fox jumps\""
+      "the quick brown")
+    (list
+      "snippet wide window"
+      "snippet 5 [\"brown\"] \"the quick brown fox jumps\""
+      "the quick [brown] fox jumps")))
+
+(define
+  hl-results
+  (search-batch "" (map (fn (c) (nth c 1)) hl-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth hl-results i) (nth c 2)))
+  hl-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index 791c04dc..b2702653 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -115,11 +115,14 @@ lib/search/index.sx                     lib/search/eval.sx
   `fuzzyDocs`, `fuzzyRankTfIdf`) — 18 tests
 - [x] result pagination (offset / limit) — `paginate`, `pageTfIdf`, `pageBm25`,
   `resultCount` — 12 tests
-- [ ] snippet / highlight generation
+- [x] snippet / highlight generation (`highlight`, `snippet`) — 12 tests
 - [ ] stemming (suffix stripping) — recall-improving normalizer
 
 ## Progress log
 
+- **Extension: highlight/snippet (178/178 total).** `highlight terms text` marks
+  query-matching (normalized) tokens with [..]; `snippet ctx terms text` extracts a
+  context window around the first match. 12 tests.
 - **Extension: fuzzy matching (166/166 total).** Levenshtein `editDist` as an O(m*n)
   row-based DP (the naive recursive version is exponential and times out under load),
   `fuzzyTerms`/`fuzzyDocs`/`fuzzyRankTfIdf` expand a term to indexed terms within a max

From 911a2f57c07ff99bef4e986a682b31af220cb6ea Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 22:50:19 +0000
Subject: [PATCH 11/15] search: stemming (suffix stripping) + 18 tests

Deterministic English suffix stripping (stem), plus stemText/stemTokens and
indexStemmed. Works around two haskell-on-sx string gotchas: take/drop over a
String yield char codes rather than strings (rebuild via joinChars . map chr),
and isSuffixOf's reverse trips ++ on the String representation (replaced by a
manual suffix compare). 196/196 tests passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |  6 +++--
 lib/search/conformance.conf |  2 ++
 lib/search/scoreboard.json  |  9 +++----
 lib/search/scoreboard.md    |  3 ++-
 lib/search/stem.sx          | 15 ++++++++++++
 lib/search/tests/stem.sx    | 47 +++++++++++++++++++++++++++++++++++++
 plans/search-on-sx.md       |  8 ++++++-
 7 files changed, 82 insertions(+), 8 deletions(-)
 create mode 100644 lib/search/stem.sx
 create mode 100644 lib/search/tests/stem.sx
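
Reviewer note, not part of the patch: the rule order in `stem` is easier to audit when written as guards. The sketch below is plain GHC Haskell and assumes standard `String` semantics, so it can use `isSuffixOf` and `take` directly; the haskell-on-sx fragment cannot, for the reasons given in the commit message. `dropEnd` is a hypothetical local helper.

```haskell
import Data.List (isSuffixOf)

-- Suffix rules are tried top to bottom; the length guards keep short
-- words such as "is" and "red" untouched.
stem :: String -> String
stem w
  | "ies" `isSuffixOf` w && n >= 5 = dropEnd 3 ++ "y"
  | "ss" `isSuffixOf` w = w
  | "es" `isSuffixOf` w && n >= 5 = dropEnd 2
  | "s" `isSuffixOf` w && n >= 4 = dropEnd 1
  | "ing" `isSuffixOf` w && n >= 6 = dropEnd 3
  | "ed" `isSuffixOf` w && n >= 5 = dropEnd 2
  | otherwise = w
  where
    n = length w
    dropEnd k = take (n - k) w
```

The "ss" guard sits ahead of the "es" and "s" rules so that words like "pass" are returned unchanged, and there is no consonant undoubling, which is why the test suite pins "running" to the literal result "runn".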

diff --git a/lib/search/api.sx b/lib/search/api.sx
index 7abbe781..5ac85924 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -6,7 +6,7 @@
 ;; rankTfIdf, rankBm25, topNTfIdf, topNBm25, fedIndex, aclFilter, searchTfIdfAcl,
 ;; topNTfIdfAcl, searchBm25Acl, prefixTerms, prefixDocs, prefixRankTfIdf,
 ;; paginate, pageTfIdf, pageBm25, resultCount, editDist, fuzzyTerms, fuzzyDocs,
-;; fuzzyRankTfIdf, highlight, snippet.
+;; fuzzyRankTfIdf, highlight, snippet, stem, stemText, stemTokens, indexStemmed.
 
 (define
   search/src
@@ -29,4 +29,6 @@
     "\n"
     search/fuzzy-src
     "\n"
-    search/highlight-src))
+    search/highlight-src
+    "\n"
+    search/stem-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index 28c7ddf6..8c5375b7 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -28,6 +28,7 @@ PRELOADS=(
   lib/search/page.sx
   lib/search/fuzzy.sx
   lib/search/highlight.sx
+  lib/search/stem.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -42,4 +43,5 @@ SUITES=(
   "page:lib/search/tests/page.sx"
   "fuzzy:lib/search/tests/fuzzy.sx"
   "highlight:lib/search/tests/highlight.sx"
+  "stem:lib/search/tests/stem.sx"
 )
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index a3ebb24c..4c88e5e3 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,8 +1,8 @@
 {
   "lang": "search",
-  "total_passed": 178,
+  "total_passed": 196,
   "total_failed": 0,
-  "total": 178,
+  "total": 196,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
     {"name":"boolean","passed":28,"failed":0,"total":28},
@@ -12,7 +12,8 @@
     {"name":"prefix","passed":14,"failed":0,"total":14},
     {"name":"page","passed":12,"failed":0,"total":12},
     {"name":"fuzzy","passed":18,"failed":0,"total":18},
-    {"name":"highlight","passed":12,"failed":0,"total":12}
+    {"name":"highlight","passed":12,"failed":0,"total":12},
+    {"name":"stem","passed":18,"failed":0,"total":18}
   ],
-  "generated": "2026-06-06T22:07:05+00:00"
+  "generated": "2026-06-06T22:49:33+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index 767c5fc2..7e20b449 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,6 +1,6 @@
 # search scoreboard
 
-**178 / 178 passing** (0 failure(s)).
+**196 / 196 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
@@ -13,3 +13,4 @@
 | page | 12 | 12 | ok |
 | fuzzy | 18 | 18 | ok |
 | highlight | 12 | 12 | ok |
+| stem | 18 | 18 | ok |
diff --git a/lib/search/stem.sx b/lib/search/stem.sx
new file mode 100644
index 00000000..816c3269
--- /dev/null
+++ b/lib/search/stem.sx
@@ -0,0 +1,15 @@
+;; search stemming — Haskell source fragment. Depends on tokenize + index.
+;; Lightweight, deterministic English suffix stripping (recall-improving
+;; normalizer). Rules are checked most-specific first; conservative length guards
+;; avoid mangling short words. Not a full Porter stemmer.
+;; Gotcha: take/drop over a String yield char CODES (ints), not char strings, so
+;; rebuild strings with `stStr = joinChars . map chr`. (isSuffixOf's reverse also
+;; trips `++` on the String representation, hence the manual stEnds.)
+;;   stem        :: String -> String
+;;   stemText    :: String -> String        (tokenize + stem + rejoin)
+;;   stemTokens  :: String -> [String]
+;;   indexStemmed:: DocId -> String -> Index -> Index   (index the stemmed text)
+
+(define
+  search/stem-src
+  "stStr cs = joinChars (map chr cs)\nstEnds suf w = let n = length w in let m = length suf in if m > n then False else stStr (drop (n - m) w) == suf\nstDropEnd k w = stStr (take (length w - k) w)\nstem w = if stEnds \"ies\" w && length w >= 5 then stDropEnd 3 w ++ \"y\" else if stEnds \"ss\" w then w else if stEnds \"es\" w && length w >= 5 then stDropEnd 2 w else if stEnds \"s\" w && length w >= 4 then stDropEnd 1 w else if stEnds \"ing\" w && length w >= 6 then stDropEnd 3 w else if stEnds \"ed\" w && length w >= 5 then stDropEnd 2 w else w\nstemTokens s = map stem (tokens s)\nstemText s = unwords (stemTokens s)\nindexStemmed d text idx = indexDoc d (stemText text) idx\n")
diff --git a/lib/search/tests/stem.sx b/lib/search/tests/stem.sx
new file mode 100644
index 00000000..cffd6c36
--- /dev/null
+++ b/lib/search/tests/stem.sx
@@ -0,0 +1,47 @@
+;; Extension — stemming (suffix stripping). Scalar string results wrapped in [].
+
+(define
+  stem-cases
+  (list
+    (list "stem plural s" "[stem \"cats\"]" (list "cat"))
+    (list "stem plural dogs" "[stem \"dogs\"]" (list "dog"))
+    (list "stem keeps ss" "[stem \"pass\"]" (list "pass"))
+    (list "stem short s unchanged" "[stem \"is\"]" (list "is"))
+    (list "stem es boxes" "[stem \"boxes\"]" (list "box"))
+    (list "stem es wishes" "[stem \"wishes\"]" (list "wish"))
+    (list "stem ies cities" "[stem \"cities\"]" (list "city"))
+    (list "stem ies parties" "[stem \"parties\"]" (list "party"))
+    (list "stem ing jumping" "[stem \"jumping\"]" (list "jump"))
+    (list "stem ing running literal" "[stem \"running\"]" (list "runn"))
+    (list "stem ed jumped" "[stem \"jumped\"]" (list "jump"))
+    (list "stem ed wanted" "[stem \"wanted\"]" (list "want"))
+    (list "stem short ed unchanged" "[stem \"red\"]" (list "red"))
+    (list "stem no suffix" "[stem \"cat\"]" (list "cat"))
+    (list
+      "stemText normalizes and stems"
+      "[stemText \"Cats Running!\"]"
+      (list "cat runn"))
+    (list
+      "stemTokens list"
+      "stemTokens \"boxes and cats\""
+      (list "box" "and" "cat"))
+    (list
+      "indexStemmed unifies plural"
+      "map fst (lookupTerm \"cat\" (indexStemmed 2 \"a cat\" (indexStemmed 1 \"the cats\" emptyIndex)))"
+      (list 1 2))
+    (list
+      "indexStemmed stem query"
+      "map fst (lookupTerm (stem \"boxes\") (indexStemmed 1 \"many boxes\" emptyIndex))"
+      (list 1))))
+
+(define
+  stem-results
+  (search-batch "" (map (fn (c) (nth c 1)) stem-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth stem-results i) (nth c 2)))
+  stem-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index b2702653..c2c71b7b 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -116,10 +116,16 @@ lib/search/index.sx                     lib/search/eval.sx
 - [x] result pagination (offset / limit) — `paginate`, `pageTfIdf`, `pageBm25`,
   `resultCount` — 12 tests
 - [x] snippet / highlight generation (`highlight`, `snippet`) — 12 tests
-- [ ] stemming (suffix stripping) — recall-improving normalizer
+- [x] stemming (suffix stripping) — `stem`, `stemText`, `stemTokens`, `indexStemmed`
+  — 18 tests
 
 ## Progress log
 
+- **Extension: stemming (196/196 total).** Deterministic English suffix stripping
+  (`stem`), `stemText`/`stemTokens`, `indexStemmed`. Two haskell-on-sx gotchas: take/drop
+  over a String yield char CODES not char strings (rebuild via `joinChars . map chr`),
+  and isSuffixOf's `reverse` trips `++` on the String repr (manual suffix compare). All
+  five planned extensions now done; the loop can keep adding search vocabulary. 18 tests.
 - **Extension: highlight/snippet (178/178 total).** `highlight terms text` marks
   query-matching (normalized) tokens with [..]; `snippet ctx terms text` extracts a
   context window around the first match. 12 tests.

From cf4e613e433c70ace0cfa4f9858bb09383478ae0 Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 23:01:42 +0000
Subject: [PATCH 12/15] search: proximity/NEAR search + 9 tests

nearDocs k t1 t2 idx returns the sorted docs in which t1 and t2 occur within k
positions of each other, in either order. Candidates come from the posting
intersection and are then filtered on the positional postings. 205/205 tests
passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |  7 ++++--
 lib/search/conformance.conf |  2 ++
 lib/search/near.sx          |  8 ++++++
 lib/search/scoreboard.json  |  9 ++++---
 lib/search/scoreboard.md    |  3 ++-
 lib/search/tests/near.sx    | 49 +++++++++++++++++++++++++++++++++++++
 plans/search-on-sx.md       |  4 +++
 7 files changed, 75 insertions(+), 7 deletions(-)
 create mode 100644 lib/search/near.sx
 create mode 100644 lib/search/tests/near.sx
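
Reviewer note, not part of the patch: a self-contained GHC Haskell sketch of the NEAR check, to make the "intersect, then filter on positions" shape concrete. The `Index` type here (term to doc to positions, built on `Data.Map` from `containers`) and the whitespace-only `indexDoc` are assumptions for illustration and do not reflect the actual index representation in `lib/search/index.sx`.

```haskell
import qualified Data.Map.Strict as M

type DocId = Int
type Index = M.Map String (M.Map DocId [Int])

-- Assumed toy indexer: whitespace tokens, 0-based positions.
indexDoc :: DocId -> String -> Index -> Index
indexDoc d text idx = foldr add idx (zip (words text) [0 ..])
  where
    add (t, p) = M.insertWith (M.unionWith (++)) t (M.singleton d [p])

positions :: String -> DocId -> Index -> [Int]
positions t d idx = M.findWithDefault [] d (M.findWithDefault M.empty t idx)

-- Docs containing both terms, kept when some pair of positions is
-- at most k apart (order of the two terms does not matter).
nearDocs :: Int -> String -> String -> Index -> [DocId]
nearDocs k t1 t2 idx = [d | d <- M.keys both, close d]
  where
    docs t = M.findWithDefault M.empty t idx
    both = M.intersection (docs t1) (docs t2)
    close d =
      or [abs (p - q) <= k | p <- positions t1 d idx, q <- positions t2 d idx]
```

The pairwise comparison is quadratic in the number of positions per document, the same as the `nrAnyClose` scan in the patch. A merge over two sorted position lists would be linear, should that ever matter.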

diff --git a/lib/search/api.sx b/lib/search/api.sx
index 5ac85924..84f5e943 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -6,7 +6,8 @@
 ;; rankTfIdf, rankBm25, topNTfIdf, topNBm25, fedIndex, aclFilter, searchTfIdfAcl,
 ;; topNTfIdfAcl, searchBm25Acl, prefixTerms, prefixDocs, prefixRankTfIdf,
 ;; paginate, pageTfIdf, pageBm25, resultCount, editDist, fuzzyTerms, fuzzyDocs,
-;; fuzzyRankTfIdf, highlight, snippet, stem, stemText, stemTokens, indexStemmed.
+;; fuzzyRankTfIdf, highlight, snippet, stem, stemText, stemTokens, indexStemmed,
+;; nearDocs.
 
 (define
   search/src
@@ -31,4 +32,6 @@
     "\n"
     search/highlight-src
     "\n"
-    search/stem-src))
+    search/stem-src
+    "\n"
+    search/near-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index 8c5375b7..f92d61f5 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -29,6 +29,7 @@ PRELOADS=(
   lib/search/fuzzy.sx
   lib/search/highlight.sx
   lib/search/stem.sx
+  lib/search/near.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -44,4 +45,5 @@ SUITES=(
   "fuzzy:lib/search/tests/fuzzy.sx"
   "highlight:lib/search/tests/highlight.sx"
   "stem:lib/search/tests/stem.sx"
+  "near:lib/search/tests/near.sx"
 )
diff --git a/lib/search/near.sx b/lib/search/near.sx
new file mode 100644
index 00000000..93893abc
--- /dev/null
+++ b/lib/search/near.sx
@@ -0,0 +1,8 @@
+;; search proximity (NEAR) — Haskell source fragment. Depends on query (posIn,
+;; docsWith, sortedInter). Finds docs where two terms occur within k positions of
+;; each other (unordered), using the positional postings.
+;;   nearDocs :: Int -> Term -> Term -> Index -> [DocId]   (sorted)
+
+(define
+  search/near-src
+  "nrAbsDiff a b = if a > b then a - b else b - a\nnrCloseTo k x [] = False\nnrCloseTo k x (y:ys) = if nrAbsDiff x y <= k then True else nrCloseTo k x ys\nnrAnyClose k [] ys = False\nnrAnyClose k (x:xs) ys = if nrCloseTo k x ys then True else nrAnyClose k xs ys\nnearInDoc k t1 t2 d idx = nrAnyClose k (posIn t1 d idx) (posIn t2 d idx)\nnearHere k t1 t2 idx d = nearInDoc k t1 t2 d idx\nnearDocs k t1 t2 idx = filter (nearHere k t1 t2 idx) (sortedInter (docsWith t1 idx) (docsWith t2 idx))\n")
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index 4c88e5e3..a7c01f7d 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,8 +1,8 @@
 {
   "lang": "search",
-  "total_passed": 196,
+  "total_passed": 205,
   "total_failed": 0,
-  "total": 196,
+  "total": 205,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
     {"name":"boolean","passed":28,"failed":0,"total":28},
@@ -13,7 +13,8 @@
     {"name":"page","passed":12,"failed":0,"total":12},
     {"name":"fuzzy","passed":18,"failed":0,"total":18},
     {"name":"highlight","passed":12,"failed":0,"total":12},
-    {"name":"stem","passed":18,"failed":0,"total":18}
+    {"name":"stem","passed":18,"failed":0,"total":18},
+    {"name":"near","passed":9,"failed":0,"total":9}
   ],
-  "generated": "2026-06-06T22:49:33+00:00"
+  "generated": "2026-06-06T23:01:07+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index 7e20b449..985b7b97 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,6 +1,6 @@
 # search scoreboard
 
-**196 / 196 passing** (0 failure(s)).
+**205 / 205 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
@@ -14,3 +14,4 @@
 | fuzzy | 18 | 18 | ok |
 | highlight | 12 | 12 | ok |
 | stem | 18 | 18 | ok |
+| near | 9 | 9 | ok |
diff --git a/lib/search/tests/near.sx b/lib/search/tests/near.sx
new file mode 100644
index 00000000..0caa32a8
--- /dev/null
+++ b/lib/search/tests/near.sx
@@ -0,0 +1,49 @@
+;; Extension — proximity (NEAR) search: terms within k positions, unordered.
+;; Corpus:
+;;   1 "the quick brown fox"    the0 quick1 brown2 fox3
+;;   2 "quick the lazy fox dog" quick0 the1 lazy2 fox3 dog4
+;;   3 "fox runs quick"         fox0 runs1 quick2
+
+(define
+  near-setup
+  "idx = indexDoc 3 \"fox runs quick\" (indexDoc 2 \"quick the lazy fox dog\" (indexDoc 1 \"the quick brown fox\" emptyIndex))\n")
+
+(define
+  near-cases
+  (list
+    (list
+      "near adjacent one doc"
+      "nearDocs 1 \"quick\" \"brown\" idx"
+      (list 1))
+    (list
+      "near adjacent both docs"
+      "nearDocs 1 \"quick\" \"the\" idx"
+      (list 1 2))
+    (list
+      "near within 2"
+      "nearDocs 2 \"quick\" \"fox\" idx"
+      (list 1 3))
+    (list "near too far at k1" "nearDocs 1 \"quick\" \"fox\" idx" (list))
+    (list
+      "near unordered symmetric"
+      "nearDocs 2 \"fox\" \"quick\" idx"
+      (list 1 3))
+    (list "near wider window" "nearDocs 5 \"the\" \"dog\" idx" (list 2))
+    (list "near absent term" "nearDocs 1 \"quick\" \"zzz\" idx" (list))
+    (list "near needs both terms" "nearDocs 3 \"brown\" \"dog\" idx" (list))
+    (list
+      "near same docs only"
+      "nearDocs 3 \"fox\" \"runs\" idx"
+      (list 3))))
+
+(define
+  near-results
+  (search-batch near-setup (map (fn (c) (nth c 1)) near-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth near-results i) (nth c 2)))
+  near-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index c2c71b7b..775aa82a 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -118,9 +118,13 @@ lib/search/index.sx                     lib/search/eval.sx
 - [x] snippet / highlight generation (`highlight`, `snippet`) — 12 tests
 - [x] stemming (suffix stripping) — `stem`, `stemText`, `stemTokens`, `indexStemmed`
   — 18 tests
+- [x] proximity / NEAR — `nearDocs k t1 t2` (unordered, within k positions) — 9 tests
 
 ## Progress log
 
+- **Extension: proximity/NEAR (205/205 total).** `nearDocs k t1 t2 idx` returns docs
+  where both terms occur within k positions (unordered), candidates = posting
+  intersection, filtered on the positional postings. 9 tests.
 - **Extension: stemming (196/196 total).** Deterministic English suffix stripping
   (`stem`), `stemText`/`stemTokens`, `indexStemmed`. Two haskell-on-sx gotchas: take/drop
   over a String yield char CODES not char strings (rebuild via `joinChars . map chr`),

From cfa68c3db34c2d1405ebaaadb3369d50493d98f0 Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 23:27:03 +0000
Subject: [PATCH 13/15] search: synonym / query expansion + 9 tests

A synonym map [(Term,[Term])] expands a query term to itself plus its synonyms
(expandTerm); synDocs unions the matching docs and synRankTfIdf ranks over the
expanded term set. 214/214 tests passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |  6 +++--
 lib/search/conformance.conf |  2 ++
 lib/search/scoreboard.json  |  9 ++++---
 lib/search/scoreboard.md    |  3 ++-
 lib/search/syn.sx           | 10 +++++++
 lib/search/tests/syn.sx     | 53 +++++++++++++++++++++++++++++++++++++
 plans/search-on-sx.md       |  4 +++
 7 files changed, 80 insertions(+), 7 deletions(-)
 create mode 100644 lib/search/syn.sx
 create mode 100644 lib/search/tests/syn.sx
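
Reviewer note, not part of the patch: a plain GHC Haskell sketch of the expansion step and the doc union. `expandTerm` follows the signature documented in `lib/search/syn.sx`. The `Corpus` type and `docsWith` are hypothetical stand-ins for the real index (a list of docs scanned by whitespace token), so `synDocs` here takes a corpus rather than an `Index`.

```haskell
import Data.List (nub, sort)
import Data.Maybe (fromMaybe)

type Term = String
type DocId = Int
type Corpus = [(DocId, String)]

-- A term expands to itself followed by its synonyms, if it has any.
expandTerm :: [(Term, [Term])] -> Term -> [Term]
expandTerm synmap t = t : fromMaybe [] (lookup t synmap)

-- Hypothetical stand-in for a posting lookup.
docsWith :: Corpus -> Term -> [DocId]
docsWith corpus t = [d | (d, text) <- corpus, t `elem` words text]

-- Union of the docs matching any expanded term, sorted and deduplicated.
synDocs :: [(Term, [Term])] -> Term -> Corpus -> [DocId]
synDocs synmap t corpus =
  sort (nub (concatMap (docsWith corpus) (expandTerm synmap t)))
```

Expansion is one level deep and one-directional: "car" reaches "automobile", but a query for "automobile" does not reach "car" unless the map lists that entry too.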

diff --git a/lib/search/api.sx b/lib/search/api.sx
index 84f5e943..14ba1774 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -7,7 +7,7 @@
 ;; topNTfIdfAcl, searchBm25Acl, prefixTerms, prefixDocs, prefixRankTfIdf,
 ;; paginate, pageTfIdf, pageBm25, resultCount, editDist, fuzzyTerms, fuzzyDocs,
 ;; fuzzyRankTfIdf, highlight, snippet, stem, stemText, stemTokens, indexStemmed,
-;; nearDocs.
+;; nearDocs, expandTerm, synDocs, synRankTfIdf.
 
 (define
   search/src
@@ -34,4 +34,6 @@
     "\n"
     search/stem-src
     "\n"
-    search/near-src))
+    search/near-src
+    "\n"
+    search/syn-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index f92d61f5..96d38540 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -30,6 +30,7 @@ PRELOADS=(
   lib/search/highlight.sx
   lib/search/stem.sx
   lib/search/near.sx
+  lib/search/syn.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -46,4 +47,5 @@ SUITES=(
   "highlight:lib/search/tests/highlight.sx"
   "stem:lib/search/tests/stem.sx"
   "near:lib/search/tests/near.sx"
+  "syn:lib/search/tests/syn.sx"
 )
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index a7c01f7d..6f965309 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,8 +1,8 @@
 {
   "lang": "search",
-  "total_passed": 205,
+  "total_passed": 214,
   "total_failed": 0,
-  "total": 205,
+  "total": 214,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
     {"name":"boolean","passed":28,"failed":0,"total":28},
@@ -14,7 +14,8 @@
     {"name":"fuzzy","passed":18,"failed":0,"total":18},
     {"name":"highlight","passed":12,"failed":0,"total":12},
     {"name":"stem","passed":18,"failed":0,"total":18},
-    {"name":"near","passed":9,"failed":0,"total":9}
+    {"name":"near","passed":9,"failed":0,"total":9},
+    {"name":"syn","passed":9,"failed":0,"total":9}
   ],
-  "generated": "2026-06-06T23:01:07+00:00"
+  "generated": "2026-06-06T23:25:35+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index 985b7b97..0f54edbb 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,6 +1,6 @@
 # search scoreboard
 
-**205 / 205 passing** (0 failure(s)).
+**214 / 214 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
@@ -15,3 +15,4 @@
 | highlight | 12 | 12 | ok |
 | stem | 18 | 18 | ok |
 | near | 9 | 9 | ok |
+| syn | 9 | 9 | ok |
diff --git a/lib/search/syn.sx b/lib/search/syn.sx
new file mode 100644
index 00000000..6072cd65
--- /dev/null
+++ b/lib/search/syn.sx
@@ -0,0 +1,10 @@
+;; search synonym / query expansion — Haskell source fragment. Depends on index +
+;; rank. A synonym map is an assoc list [(Term, [Term])]; a query term is expanded
+;; to itself plus its synonyms, then the expanded set is unioned / ranked.
+;;   expandTerm    :: [(Term,[Term])] -> Term -> [Term]
+;;   synDocs       :: [(Term,[Term])] -> Term -> Index -> [DocId]
+;;   synRankTfIdf  :: [(Term,[Term])] -> Term -> Index -> [DocId]
+
+(define
+  search/syn-src
+  "synLookup synmap t = case lookup t synmap of { Nothing -> [] ; Just ss -> ss }\nexpandTerm synmap t = t : synLookup synmap t\nsynDocs synmap t idx = foldl (candStep idx) [] (expandTerm synmap t)\nsynRankTfIdf synmap t idx = rankTfIdf (expandTerm synmap t) idx\n")
diff --git a/lib/search/tests/syn.sx b/lib/search/tests/syn.sx
new file mode 100644
index 00000000..aaeea7bd
--- /dev/null
+++ b/lib/search/tests/syn.sx
@@ -0,0 +1,53 @@
+;; Extension — synonym / query expansion.
+;; synmap: car -> automobile, vehicle ; big -> large
+;; Corpus: 1 "fast car"  2 "shiny automobile"  3 "big truck"  4 "large house"  5 "vehicle review"
+
+(define
+  syn-setup
+  "synmap = [(\"car\", [\"automobile\", \"vehicle\"]), (\"big\", [\"large\"])]\nidx = indexDoc 5 \"vehicle review\" (indexDoc 4 \"large house\" (indexDoc 3 \"big truck\" (indexDoc 2 \"shiny automobile\" (indexDoc 1 \"fast car\" emptyIndex))))\n")
+
+(define
+  syn-cases
+  (list
+    (list
+      "expand term with synonyms"
+      "expandTerm synmap \"car\""
+      (list "car" "automobile" "vehicle"))
+    (list
+      "expand single synonym"
+      "expandTerm synmap \"big\""
+      (list "big" "large"))
+    (list "expand unknown term" "expandTerm synmap \"banana\"" (list "banana"))
+    (list
+      "syn docs union"
+      "synDocs synmap \"car\" idx"
+      (list 1 2 5))
+    (list
+      "syn docs single synonym"
+      "synDocs synmap \"big\" idx"
+      (list 3 4))
+    (list
+      "syn docs no synonyms"
+      "synDocs synmap \"house\" idx"
+      (list 4))
+    (list "syn docs absent" "synDocs synmap \"plane\" idx" (list))
+    (list
+      "syn rank expanded"
+      "synRankTfIdf synmap \"car\" idx"
+      (list 1 2 5))
+    (list
+      "syn rank single"
+      "synRankTfIdf synmap \"big\" idx"
+      (list 3 4))))
+
+(define
+  syn-results
+  (search-batch syn-setup (map (fn (c) (nth c 1)) syn-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth syn-results i) (nth c 2)))
+  syn-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index 775aa82a..2e62c53b 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -119,9 +119,13 @@ lib/search/index.sx                     lib/search/eval.sx
 - [x] stemming (suffix stripping) — `stem`, `stemText`, `stemTokens`, `indexStemmed`
   — 18 tests
 - [x] proximity / NEAR — `nearDocs k t1 t2` (unordered, within k positions) — 9 tests
+- [x] synonym / query expansion — `expandTerm`, `synDocs`, `synRankTfIdf` — 9 tests
 
 ## Progress log
 
+- **Extension: synonyms/query expansion (214/214 total).** A synonym map
+  `[(Term,[Term])]` expands a query term to itself + synonyms (`expandTerm`); `synDocs`
+  unions, `synRankTfIdf` ranks the expanded set. 9 tests.
 - **Extension: proximity/NEAR (205/205 total).** `nearDocs k t1 t2 idx` returns docs
   where both terms occur within k positions (unordered), candidates = posting
   intersection, filtered on the positional postings. 9 tests.

From db2a5dc6ab255cceeca37b058332c8224924dc31 Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sat, 6 Jun 2026 23:58:37 +0000
Subject: [PATCH 14/15] search: boolean-filtered ranked search + 11 tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

searchRankTfIdf/searchRankBm25 parse a boolean query, filter docs via evalQuery,
then rank the survivors by relevance over the query's leaf terms (queryTerms,
which also collects terms under NOT and inside phrases). This is the
filter-then-rank pattern. 225/225 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |  7 ++--
 lib/search/conformance.conf |  2 ++
 lib/search/rankq.sx         | 11 ++++++
 lib/search/scoreboard.json  |  9 ++---
 lib/search/scoreboard.md    |  3 +-
 lib/search/tests/rankq.sx   | 67 +++++++++++++++++++++++++++++++++++++
 plans/search-on-sx.md       |  6 ++++
 7 files changed, 98 insertions(+), 7 deletions(-)
 create mode 100644 lib/search/rankq.sx
 create mode 100644 lib/search/tests/rankq.sx
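
Note (not part of the commit message): a minimal, self-contained sketch of the
filter-then-rank idea, for readers without the rest of lib/search/ to hand.
Everything here is a hypothetical stand-in: `Doc`, `matches`, `score` and
`searchRank` are illustration-only names, the score is raw term frequency
rather than the library's TF-IDF or BM25, and the `Phrase` constructor is left
out. Only the shape of `queryTerms` is taken from the patch.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- Hypothetical stand-ins; the real Query, Index, evalQuery and tfidfDoc
-- live in lib/search/ and may differ in shape.
data Query = Term String | And Query Query | Or Query Query | Not Query

type Doc = (Int, [String]) -- (doc id, tokens)

-- Leaf terms of a query, mirroring queryTerms in the patch (minus Phrase).
queryTerms :: Query -> [String]
queryTerms (Term t)  = [t]
queryTerms (And a b) = queryTerms a ++ queryTerms b
queryTerms (Or a b)  = queryTerms a ++ queryTerms b
queryTerms (Not a)   = queryTerms a

-- Boolean filter: does a document satisfy the query?
matches :: Query -> Doc -> Bool
matches (Term t)  (_, ts) = t `elem` ts
matches (And a b) d       = matches a d && matches b d
matches (Or a b)  d       = matches a d || matches b d
matches (Not a)   d       = not (matches a d)

-- Toy relevance (an assumption): total raw frequency of the query terms.
score :: [String] -> Doc -> Int
score terms (_, ts) = length [t | t <- ts, t `elem` terms]

-- Filter by the boolean query, then rank survivors by descending score,
-- breaking ties by ascending doc id.
searchRank :: Query -> [Doc] -> [Int]
searchRank q docs =
  map fst (sortBy (comparing (\d -> (Down (score terms d), fst d))) survivors)
  where
    terms     = queryTerms q
    survivors = filter (matches q) docs

-- Same four documents as the rankq test corpus.
corpus :: [Doc]
corpus =
  [ (1, words "apple apple banana")
  , (2, words "apple cherry")
  , (3, words "banana cherry")
  , (4, words "apple banana cherry")
  ]
```

With this toy score, `searchRank (Or (Term "apple") (Term "banana")) corpus`
orders the documents as `[1,4,2,3]`, and an AND NOT query over the same terms
keeps only document 2. Terms under NOT add nothing to a survivor's score,
since a surviving document cannot contain them.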

diff --git a/lib/search/api.sx b/lib/search/api.sx
index 14ba1774..29f445af 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -7,7 +7,8 @@
 ;; topNTfIdfAcl, searchBm25Acl, prefixTerms, prefixDocs, prefixRankTfIdf,
 ;; paginate, pageTfIdf, pageBm25, resultCount, editDist, fuzzyTerms, fuzzyDocs,
 ;; fuzzyRankTfIdf, highlight, snippet, stem, stemText, stemTokens, indexStemmed,
-;; nearDocs, expandTerm, synDocs, synRankTfIdf.
+;; nearDocs, expandTerm, synDocs, synRankTfIdf, queryTerms, searchRankTfIdf,
+;; searchRankBm25.
 
 (define
   search/src
@@ -36,4 +37,6 @@
     "\n"
     search/near-src
     "\n"
-    search/syn-src))
+    search/syn-src
+    "\n"
+    search/rankq-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index 96d38540..9c7b006e 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -31,6 +31,7 @@ PRELOADS=(
   lib/search/stem.sx
   lib/search/near.sx
   lib/search/syn.sx
+  lib/search/rankq.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -48,4 +49,5 @@ SUITES=(
   "stem:lib/search/tests/stem.sx"
   "near:lib/search/tests/near.sx"
   "syn:lib/search/tests/syn.sx"
+  "rankq:lib/search/tests/rankq.sx"
 )
diff --git a/lib/search/rankq.sx b/lib/search/rankq.sx
new file mode 100644
index 00000000..77b70468
--- /dev/null
+++ b/lib/search/rankq.sx
@@ -0,0 +1,11 @@
+;; search boolean-filtered ranked search — Haskell source fragment.
+;; Depends on parse (parseQuery/Query), query (evalQuery), rank (tfidfDoc/bm25Doc/
+;; cmpScore). Filters by the boolean query, then ranks the surviving docs by
+;; relevance over the query's leaf terms — the real-world filter-then-rank pattern.
+;;   queryTerms       :: Query -> [Term]
+;;   searchRankTfIdf  :: String -> Index -> [DocId]
+;;   searchRankBm25   :: Float -> Float -> String -> Index -> [DocId]
+
+(define
+  search/rankq-src
+  "queryTerms (Term t) = [t]\nqueryTerms (And a b) = queryTerms a ++ queryTerms b\nqueryTerms (Or a b) = queryTerms a ++ queryTerms b\nqueryTerms (Not a) = queryTerms a\nqueryTerms (Phrase ts) = ts\nmkSubPair f terms idx d = (f terms idx d, d)\nrankSubsetWith f terms docs idx = map snd (sortBy cmpScore (map (mkSubPair f terms idx) docs))\nsearchRankTfIdf s idx = let q = parseQuery s in rankSubsetWith tfidfDoc (queryTerms q) (evalQuery idx q) idx\nsearchRankBm25 k1 b s idx = let q = parseQuery s in rankSubsetWith (bm25Doc k1 b) (queryTerms q) (evalQuery idx q) idx\n")
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index 6f965309..3ea5b5ee 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,8 +1,8 @@
 {
   "lang": "search",
-  "total_passed": 214,
+  "total_passed": 225,
   "total_failed": 0,
-  "total": 214,
+  "total": 225,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
     {"name":"boolean","passed":28,"failed":0,"total":28},
@@ -15,7 +15,8 @@
     {"name":"highlight","passed":12,"failed":0,"total":12},
     {"name":"stem","passed":18,"failed":0,"total":18},
     {"name":"near","passed":9,"failed":0,"total":9},
-    {"name":"syn","passed":9,"failed":0,"total":9}
+    {"name":"syn","passed":9,"failed":0,"total":9},
+    {"name":"rankq","passed":11,"failed":0,"total":11}
   ],
-  "generated": "2026-06-06T23:25:35+00:00"
+  "generated": "2026-06-06T23:58:05+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index 0f54edbb..2cc7fd9c 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,6 +1,6 @@
 # search scoreboard
 
-**214 / 214 passing** (0 failure(s)).
+**225 / 225 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
@@ -16,3 +16,4 @@
 | stem | 18 | 18 | ok |
 | near | 9 | 9 | ok |
 | syn | 9 | 9 | ok |
+| rankq | 11 | 11 | ok |
diff --git a/lib/search/tests/rankq.sx b/lib/search/tests/rankq.sx
new file mode 100644
index 00000000..dd360310
--- /dev/null
+++ b/lib/search/tests/rankq.sx
@@ -0,0 +1,67 @@
+;; Extension — boolean-filtered ranked search (filter then rank by relevance).
+;; Corpus:
+;;   1 "apple apple banana"   apple2 banana1
+;;   2 "apple cherry"         apple1 cherry1
+;;   3 "banana cherry"        banana1 cherry1
+;;   4 "apple banana cherry"  apple1 banana1 cherry1
+
+(define
+  rankq-setup
+  "idx = indexDoc 4 \"apple banana cherry\" (indexDoc 3 \"banana cherry\" (indexDoc 2 \"apple cherry\" (indexDoc 1 \"apple apple banana\" emptyIndex)))\n")
+
+(define
+  rankq-cases
+  (list
+    (list
+      "queryTerms and"
+      "queryTerms (parseQuery \"apple AND banana\")"
+      (list "apple" "banana"))
+    (list
+      "queryTerms or not"
+      "queryTerms (parseQuery \"a OR NOT b\")"
+      (list "a" "b"))
+    (list
+      "queryTerms phrase"
+      "queryTerms (parseQuery \"\\\"x y\\\" OR z\")"
+      (list "x" "y" "z"))
+    (list
+      "and filter ranked by tf"
+      "searchRankTfIdf \"apple AND banana\" idx"
+      (list 1 4))
+    (list
+      "single term ranked tie"
+      "searchRankTfIdf \"cherry\" idx"
+      (list 2 3 4))
+    (list
+      "or filter ranked"
+      "searchRankTfIdf \"apple OR banana\" idx"
+      (list 1 4 2 3))
+    (list
+      "and-not narrows then ranks"
+      "searchRankTfIdf \"apple AND NOT banana\" idx"
+      (list 2))
+    (list
+      "phrase filter ranked"
+      "searchRankTfIdf \"\\\"apple banana\\\"\" idx"
+      (list 1 4))
+    (list "no matches" "searchRankTfIdf \"zzz\" idx" (list))
+    (list
+      "bm25 boolean ranked subset"
+      "sort (searchRankBm25 1.5 0.75 \"apple OR banana\" idx)"
+      (list 1 2 3 4))
+    (list
+      "bm25 and filter"
+      "searchRankBm25 1.5 0.75 \"apple AND NOT banana\" idx"
+      (list 2))))
+
+(define
+  rankq-results
+  (search-batch rankq-setup (map (fn (c) (nth c 1)) rankq-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth rankq-results i) (nth c 2)))
+  rankq-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index 2e62c53b..cf8c530a 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -120,9 +120,15 @@ lib/search/index.sx                     lib/search/eval.sx
   — 18 tests
 - [x] proximity / NEAR — `nearDocs k t1 t2` (unordered, within k positions) — 9 tests
 - [x] synonym / query expansion — `expandTerm`, `synDocs`, `synRankTfIdf` — 9 tests
+- [x] boolean-filtered ranked search — `queryTerms`, `searchRankTfIdf`,
+  `searchRankBm25` (filter by boolean query, rank survivors by relevance) — 11 tests
 
 ## Progress log
 
+- **Extension: boolean-filtered ranked search (225/225 total).** `searchRankTfIdf`/
+  `searchRankBm25` parse a boolean query, filter docs via evalQuery, then rank the
+  survivors by relevance over the query's leaf terms (`queryTerms`) — the real-world
+  filter-then-rank pattern. 11 tests.
 - **Extension: synonyms/query expansion (214/214 total).** A synonym map
   `[(Term,[Term])]` expands a query term to itself + synonyms (`expandTerm`); `synDocs`
   unions, `synRankTfIdf` ranks the expanded set. 9 tests.

From 5d62d08e1c5786b25863bd4f19539631fc5fb568 Mon Sep 17 00:00:00 2001
From: giles <giles.bradshaw@rose-ash.com>
Date: Sun, 7 Jun 2026 00:46:22 +0000
Subject: [PATCH 15/15] search: did-you-mean spelling suggestion + 9 tests

suggest/suggestN rank indexed terms by edit distance to a (possibly misspelled)
query term, breaking ties alphabetically. 234/234 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lib/search/api.sx           |  6 ++++--
 lib/search/conformance.conf |  2 ++
 lib/search/scoreboard.json  |  9 ++++----
 lib/search/scoreboard.md    |  3 ++-
 lib/search/suggest.sx       |  9 ++++++++
 lib/search/tests/suggest.sx | 42 +++++++++++++++++++++++++++++++++++++
 plans/search-on-sx.md       |  5 +++++
 7 files changed, 69 insertions(+), 7 deletions(-)
 create mode 100644 lib/search/suggest.sx
 create mode 100644 lib/search/tests/suggest.sx
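
Note (not part of the commit message): a minimal, self-contained sketch of
ranking a vocabulary by edit distance with an alphabetical tiebreak. It is an
illustration under assumptions, not the library code: the vocabulary is a
plain list of strings instead of an `Index`, and `editDist` here is an
ordinary Levenshtein distance written for this note, since the library's own
`editDist` is defined outside this patch.

```haskell
import Data.List (sort)

-- Levenshtein distance, computed one row at a time.
-- A stand-in for the library's editDist (assumed, not copied from it).
editDist :: String -> String -> Int
editDist s t = last (foldl step [0 .. length t] s)
  where
    step [] _ = []
    step prev@(p : ps) c = scanl compute (p + 1) (zip3 t prev ps)
      where
        -- left: cell to the left in the new row; diag/up: cells in prev.
        compute left (tc, diag, up) =
          minimum [up + 1, left + 1, diag + (if tc == c then 0 else 1)]

-- The n closest vocabulary terms. Sorting (distance, term) pairs orders by
-- distance first and alphabetically second, which gives the tiebreak.
suggestN :: Int -> String -> [String] -> [String]
suggestN n term vocab =
  take n (map snd (sort [(editDist term v, v) | v <- vocab]))

-- The single best suggestion, or "" when the vocabulary is empty.
suggest :: String -> [String] -> String
suggest term vocab = case suggestN 1 term vocab of
  []      -> ""
  (x : _) -> x

-- Same terms as the suggest test corpus.
vocab :: [String]
vocab = ["apple", "apply", "ample", "banana", "orange"]
```

For example, `suggestN 2 "aple" vocab` yields `["ample","apple"]`: both are
one edit away, so the alphabetical tiebreak decides the order. An empty query
term is ranked purely by term length, so `suggest "" vocab` is `"ample"`.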

diff --git a/lib/search/api.sx b/lib/search/api.sx
index 29f445af..dd66031b 100644
--- a/lib/search/api.sx
+++ b/lib/search/api.sx
@@ -8,7 +8,7 @@
 ;; paginate, pageTfIdf, pageBm25, resultCount, editDist, fuzzyTerms, fuzzyDocs,
 ;; fuzzyRankTfIdf, highlight, snippet, stem, stemText, stemTokens, indexStemmed,
 ;; nearDocs, expandTerm, synDocs, synRankTfIdf, queryTerms, searchRankTfIdf,
-;; searchRankBm25.
+;; searchRankBm25, suggestN, suggest.
 
 (define
   search/src
@@ -39,4 +39,6 @@
     "\n"
     search/syn-src
     "\n"
-    search/rankq-src))
+    search/rankq-src
+    "\n"
+    search/suggest-src))
diff --git a/lib/search/conformance.conf b/lib/search/conformance.conf
index 9c7b006e..ec0fa631 100644
--- a/lib/search/conformance.conf
+++ b/lib/search/conformance.conf
@@ -32,6 +32,7 @@ PRELOADS=(
   lib/search/near.sx
   lib/search/syn.sx
   lib/search/rankq.sx
+  lib/search/suggest.sx
   lib/search/api.sx
   lib/search/testlib.sx
 )
@@ -50,4 +51,5 @@ SUITES=(
   "near:lib/search/tests/near.sx"
   "syn:lib/search/tests/syn.sx"
   "rankq:lib/search/tests/rankq.sx"
+  "suggest:lib/search/tests/suggest.sx"
 )
diff --git a/lib/search/scoreboard.json b/lib/search/scoreboard.json
index 3ea5b5ee..d548e4b3 100644
--- a/lib/search/scoreboard.json
+++ b/lib/search/scoreboard.json
@@ -1,8 +1,8 @@
 {
   "lang": "search",
-  "total_passed": 225,
+  "total_passed": 234,
   "total_failed": 0,
-  "total": 225,
+  "total": 234,
   "suites": [
     {"name":"index","passed":18,"failed":0,"total":18},
     {"name":"boolean","passed":28,"failed":0,"total":28},
@@ -16,7 +16,8 @@
     {"name":"stem","passed":18,"failed":0,"total":18},
     {"name":"near","passed":9,"failed":0,"total":9},
     {"name":"syn","passed":9,"failed":0,"total":9},
-    {"name":"rankq","passed":11,"failed":0,"total":11}
+    {"name":"rankq","passed":11,"failed":0,"total":11},
+    {"name":"suggest","passed":9,"failed":0,"total":9}
   ],
-  "generated": "2026-06-06T23:58:05+00:00"
+  "generated": "2026-06-07T00:44:05+00:00"
 }
diff --git a/lib/search/scoreboard.md b/lib/search/scoreboard.md
index 2cc7fd9c..4a59608e 100644
--- a/lib/search/scoreboard.md
+++ b/lib/search/scoreboard.md
@@ -1,6 +1,6 @@
 # search scoreboard
 
-**225 / 225 passing** (0 failure(s)).
+**234 / 234 passing** (0 failure(s)).
 
 | Suite | Passed | Total | Status |
 |-------|--------|-------|--------|
@@ -17,3 +17,4 @@
 | near | 9 | 9 | ok |
 | syn | 9 | 9 | ok |
 | rankq | 11 | 11 | ok |
+| suggest | 9 | 9 | ok |
diff --git a/lib/search/suggest.sx b/lib/search/suggest.sx
new file mode 100644
index 00000000..7b06b1fb
--- /dev/null
+++ b/lib/search/suggest.sx
@@ -0,0 +1,9 @@
+;; search did-you-mean / spelling suggestion — Haskell source fragment.
+;; Depends on fuzzy (editDist) + index (allTerms). Ranks indexed terms by edit
+;; distance to a (possibly misspelled) query term; ties broken alphabetically.
+;;   suggestN :: Int -> String -> Index -> [Term]
+;;   suggest  :: String -> Index -> Term   ("" if the index has no terms)
+
+(define
+  search/suggest-src
+  "sgMk term t = (editDist term t, t)\nsgPairs term idx = map (sgMk term) (allTerms idx)\nsgCmp p1 p2 = if fst p1 < fst p2 then LT else if fst p1 > fst p2 then GT else compare (snd p1) (snd p2)\nsuggestN n term idx = take n (map snd (sortBy sgCmp (sgPairs term idx)))\nsgHead [] = \"\"\nsgHead (x:xs) = x\nsuggest term idx = sgHead (suggestN 1 term idx)\n")
diff --git a/lib/search/tests/suggest.sx b/lib/search/tests/suggest.sx
new file mode 100644
index 00000000..164b43ec
--- /dev/null
+++ b/lib/search/tests/suggest.sx
@@ -0,0 +1,42 @@
+;; Extension — did-you-mean / spelling suggestion.
+;; Corpus terms (sorted): ample apple apply banana orange
+
+(define
+  suggest-setup
+  "idx = indexDoc 1 \"apple apply ample banana orange\" emptyIndex\n")
+
+(define
+  suggest-cases
+  (list
+    (list "suggest exact term" "[suggest \"apple\" idx]" (list "apple"))
+    (list
+      "suggest misspelled banana"
+      "[suggest \"bananna\" idx]"
+      (list "banana"))
+    (list
+      "suggest missing letter orange"
+      "[suggest \"orang\" idx]"
+      (list "orange"))
+    (list "suggest closest apply" "[suggest \"aply\" idx]" (list "apply"))
+    (list "suggestN 1 banana" "suggestN 1 \"bananna\" idx" (list "banana"))
+    (list
+      "suggestN 2 ties alpha"
+      "suggestN 2 \"aple\" idx"
+      (list "ample" "apple"))
+    (list "suggest empty term shortest" "[suggest \"\" idx]" (list "ample"))
+    (list "suggest empty index" "[suggest \"apple\" emptyIndex]" (list ""))
+    (list "suggestN empty index" "suggestN 1 \"apple\" emptyIndex" (list))))
+
+(define
+  suggest-results
+  (search-batch
+    suggest-setup
+    (map (fn (c) (nth c 1)) suggest-cases)))
+
+(map-indexed
+  (fn
+    (i c)
+    (hk-test (nth c 0) (nth suggest-results i) (nth c 2)))
+  suggest-cases)
+
+{:fail hk-test-fail :pass hk-test-pass :fails hk-test-fails}
diff --git a/plans/search-on-sx.md b/plans/search-on-sx.md
index cf8c530a..4cd93e8f 100644
--- a/plans/search-on-sx.md
+++ b/plans/search-on-sx.md
@@ -122,9 +122,14 @@ lib/search/index.sx                     lib/search/eval.sx
 - [x] synonym / query expansion — `expandTerm`, `synDocs`, `synRankTfIdf` — 9 tests
 - [x] boolean-filtered ranked search — `queryTerms`, `searchRankTfIdf`,
   `searchRankBm25` (filter by boolean query, rank survivors by relevance) — 11 tests
+- [x] did-you-mean / spelling suggestion — `suggest`, `suggestN` (closest indexed
+  terms by edit distance, alphabetical tiebreak) — 9 tests
 
 ## Progress log
 
+- **Extension: did-you-mean / spelling suggestion (234/234 total).** `suggest`/`suggestN`
+  rank indexed terms by edit distance to a (misspelled) query term, alphabetical
+  tiebreak. 9 tests.
 - **Extension: boolean-filtered ranked search (225/225 total).** `searchRankTfIdf`/
   `searchRankBm25` parse a boolean query, filter docs via evalQuery, then rank the
   survivors by relevance over the query's leaf terms (`queryTerms`) — the real-world