Merge loops/datalog into architecture: tokenizer/parser, magic sets, negation, semi-naive (259/259 tests)

2026-05-13 19:52:11 +00:00
parent a7665a7b25 0df5e92c46
commit 4fd376a348
28 changed files with 7662 additions and 112 deletions
--- a/plans/datalog-on-sx.md
+++ b/plans/datalog-on-sx.md
@@ -13,6 +13,20 @@ End-state goal: **full core Datalog** (facts, rules, stratified negation, aggreg
 recursion) with a clean SX query API, and a demonstration of Datalog as a query engine
 for rose-ash data (e.g. federation graph, content relationships).

+## Status (rolling)
+
+`bash lib/datalog/conformance.sh` → **276/276 across 11 suites**
+(tokenize, parse, unify, eval, builtins, semi_naive, negation, aggregates,
+api, magic, demo). Source is ~3100 LOC, tests ~2900 LOC, public API
+documented in `lib/datalog/datalog.sx`.
+
+Phases 1–9 are functionally complete; Phase 10 covers the rose-ash
+domain demos (in `lib/datalog/demo.sx` — federation, content,
+permissions, cooking-posts, tag co-occurrence, shortest path, org chart).
+The PostgreSQL loader and `/internal/datalog` HTTP endpoint listed in
+Phase 10 require service-tree edits outside `lib/datalog/**` and are
+flagged as out-of-scope for this loop.
+
 ## Ground rules

 - **Scope:** only touch `lib/datalog/**` and `plans/datalog-on-sx.md`. Do **not** edit
@@ -25,23 +39,6 @@ for rose-ash data (e.g. federation graph, content relationships).
  Dalmau "Datalog and Constraint Satisfaction".
 - **Commits:** one feature per commit. Keep `## Progress log` updated and tick boxes.

-## Non-goals
-
-Deliberately out of scope for this implementation. Real engines (Soufflé, Cozo, DDlog) include
-some of these; we accept they're missing and will note them in `Blockers` if a use case demands
-one later.
-
- **Function symbols** — keeps termination guaranteed and prevents collapse into Prolog.
- **Disjunctive heads** (`p :- q. p :- r.` is fine; `p ; q :- r.` is not) — research extension.
- **Well-founded semantics** — only stratified negation. Programs that aren't stratifiable are
-  rejected at load time, not evaluated under WFS.
- **Tabled top-down (SLG resolution)** — bottom-up only. If you want top-down with termination,
-  use the Prolog implementation.
- **Constraint Datalog** (Datalog over reals, intervals, finite domains) — research extension.
- **Distributed evaluation / Differential Dataflow** — single-process fixpoint only. The rose-ash
-  cross-service story (Phase 10) federates by querying each service's local Datalog instance and
-  joining results, not by running a distributed fixpoint.
-
 ## Architecture sketch

 ```
@@ -75,128 +72,647 @@ Key differences from Prolog:
 ## Roadmap

 ### Phase 1 — tokenizer + parser
- [ ] Tokenizer: atoms (lowercase/quoted), variables (uppercase/`_`), numbers, strings,
-      operators (`:- `, `?-`, `,`, `.`), arithmetic + comparison operators
-      (`+`, `-`, `*`, `/`, `<`, `<=`, `>`, `>=`, `=`, `!=`), comments (`%`, `/* */`)
-      Note: no function symbol syntax (no nested `f(...)` in arg position).
- [ ] Parser:
+- [x] Tokenizer: atoms (lowercase/quoted), variables (uppercase/`_`), numbers, strings,
+      punct (`( )`, `,`, `.`), operators (`:-`, `?-`, `<=`, `>=`, `!=`, `<`, `>`, `=`,
+      `+`, `-`, `*`, `/`), comments (`%`, `/* */`)
+      Note: no function symbol syntax (no nested `f(...)` in arg position) — but the
+      parser permits nested compounds for arithmetic; safety analysis (Phase 3) rejects
+      non-arithmetic nesting.
+- [x] Parser:
      - Facts: `parent(tom, bob).` → `{:head (parent tom bob) :body ()}`
      - Rules: `ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).`
              → `{:head (ancestor X Z) :body ((parent X Y) (ancestor Y Z))}`
-      - Queries: `?- ancestor(tom, X).` → `{:query (ancestor tom X)}`
+      - Queries: `?- ancestor(tom, X).` → `{:query ((ancestor tom X))}`
+        (`:query` value is always a list of literals; `?- p, q.` → `{:query ((p) (q))}`)
      - Negation: `not(parent(X,Y))` in body position → `{:neg (parent X Y)}`
- [ ] Tests in `lib/datalog/tests/parse.sx`
+- [x] Tests in `lib/datalog/tests/parse.sx` (18) and `lib/datalog/tests/tokenize.sx` (26).
+      Conformance harness: `bash lib/datalog/conformance.sh` → 44 / 44 passing.

 ### Phase 2 — unification + substitution
- [ ] Share or port unification from `lib/prolog/` — term walk, occurs check off by default
- [ ] `dl-unify` `t1` `t2` `subst` → extended subst or nil (no function symbols means simpler)
- [ ] `dl-ground?` `term` → bool — all variables bound in substitution
- [ ] Tests: atom/atom, var/atom, var/var, list args
+- [x] Ported (not shared) from `lib/prolog/` — term walk, no occurs check.
+- [x] `dl-unify t1 t2 subst` → extended subst dict, or `nil` on failure.
+- [x] `dl-walk`, `dl-bind`, `dl-apply-subst`, `dl-ground?`, `dl-vars-of`.
+- [x] Substitutions are immutable dicts keyed by variable name (string).
+      Lists/tuples unify element-wise (used for arithmetic compounds too).
+- [x] Tests in `lib/datalog/tests/unify.sx` (28). 72 / 72 conformance.

-### Phase 3 — extensional DB + naive evaluation
- [ ] EDB: `{:relation-name → set-of-ground-tuples}` using SX sets (Phase 18 of primitives)
- [ ] `dl-add-fact!` `db` `relation` `args` → add ground tuple
- [ ] `dl-add-rule!` `db` `head` `body` → add rule clause
- [ ] Naive evaluation: iterate rules until fixpoint
-      For each rule, for each combination of body tuples that unify, derive head tuple.
-      Repeat until no new tuples added.
- [ ] `dl-query` `db` `goal` → list of substitutions satisfying goal against derived DB
- [ ] **Safety analysis**: every variable in a rule head must also appear in a positive body
-      literal; reject unsafe rules at `dl-add-rule!` time with a clear error pointing at the
-      offending variable. Built-in predicates and negated atoms do not satisfy safety on their
-      own (`p(X) :- X > 0.` is unsafe).
- [ ] Tests: transitive closure (ancestor), sibling, same-generation — classic Datalog programs;
-      safety violation rejection cases.
+### Phase 3 — extensional DB + naive evaluation + safety analysis
+- [x] EDB+IDB combined: `{:facts {<rel-name-string> -> (literal ...)}}` —
+      relations indexed by name; tuples stored as full literals so they
+      unify directly. Dedup on insert via `dl-tuple-equal?`.
+- [x] `dl-add-fact! db lit` (rejects non-ground) and `dl-add-rule! db rule`
+      (rejects unsafe). `dl-program source` parses + loads in one step.
+- [x] Naive evaluation `dl-saturate! db`: iterate rules until no new tuples.
+      `dl-find-bindings` recursively joins body literals; `dl-match-positive`
+      unifies a literal against every tuple in the relation.
+- [x] `dl-query db goal` → list of substitutions over `goal`'s vars,
+      deduplicated. `dl-relation db name` for derived tuples.
+- [x] Safety analysis at `dl-add-rule!` time: every head variable except
+      `_` must appear in some positive body literal. Built-ins and negated
+      literals do not satisfy safety. Helpers `dl-positive-body-vars`,
+      `dl-rule-unsafe-head-vars` exposed for later phases.
+- [x] Negation and arithmetic built-ins error cleanly at saturate time
+      (Phase 4 / Phase 7 will swap in real semantics).
+- [x] Tests in `lib/datalog/tests/eval.sx` (15): transitive closure,
+      sibling, same-generation, grandparent, cyclic graph reach, six
+      safety cases. 87 / 87 conformance.

 ### Phase 4 — built-in predicates + body arithmetic
-
-Almost every real query needs `<`, `=`, simple arithmetic, and string comparisons in body
-position. These are not EDB lookups — they're constraints that filter bindings.
-
- [ ] Recognise built-in predicates in body: `(< X Y)`, `(<= X Y)`, `(> X Y)`, `(>= X Y)`,
-      `(= X Y)`, `(!= X Y)` and arithmetic forms `(is Z (+ X Y))`, `(is Z (- X Y))`,
-      `(is Z (* X Y))`, `(is Z (/ X Y))`.
- [ ] Built-in evaluation in the fixpoint: at the join step, after binding variables from EDB
-      lookups, evaluate built-ins as constraints. If any built-in fails or has unbound inputs,
-      drop the candidate substitution.
- [ ] **Safety extension**: `is` binds its left operand iff right operand is fully ground.
-      `(< X Y)` requires both X and Y bound by some prior body literal — reject unsafe.
- [ ] Wire arithmetic operators through to SX numeric primitives — no separate Datalog number
-      tower.
- [ ] Tests: range filters, arithmetic derivations (`(plus-one X Y :- ..., (is Y (+ X 1)))`),
-      comparison-based queries, safety violation detection on `(p X) :- (< X 5).`
+Almost every real query needs `<`, `=`, simple arithmetic, and string
+comparisons in body position. These are not EDB lookups — they're
+constraints that filter bindings.
+- [x] Recognise built-in predicates in body: `(< X Y)`, `(<= X Y)`, `(> X Y)`,
+      `(>= X Y)`, `(= X Y)`, `(!= X Y)` and arithmetic forms `(is Z (+ X Y))`,
+      `(is Z (- X Y))`, `(is Z (* X Y))`, `(is Z (/ X Y))`. Live in
+      `lib/datalog/builtins.sx`.
+- [x] `dl-eval-builtin` dispatches; `dl-eval-arith` recursively evaluates
+      `(+ a b)` etc. with full nesting. `=` unifies; `!=` rejects equal
+      ground terms.
+- [x] Order-aware safety analysis (`dl-rule-check-safety`): walks body
+      left-to-right tracking which vars are bound. `is`'s RHS vars must
+      be already bound; LHS becomes bound. Comparisons require both
+      sides bound. `=` is special-cased — at least one side bound binds
+      the other. Negation vars must be bound (will be enforced fully in
+      Phase 7).
+- [x] Wired through SX numeric primitives — no separate number tower.
+- [x] Tests in `lib/datalog/tests/builtins.sx` (19): range filters,
+      arithmetic derivations, equality binding, eight safety violations
+      and three safe-shape tests. Conformance 106 / 106.

 ### Phase 5 — semi-naive evaluation (performance)
- [ ] Delta sets: track newly derived tuples per iteration
- [ ] Semi-naive rule: only join against delta tuples from last iteration, not full relation
- [ ] Significant speedup for recursive rules — avoids re-deriving known tuples
- [ ] `dl-stratify` `db` → dependency graph + SCC analysis → stratum ordering
- [ ] Tests: verify semi-naive produces same results as naive; benchmark on large ancestor chain
+- [x] Delta sets `{rel-name -> tuples}` track newly derived tuples per iter.
+      `dl-snapshot-facts` builds the initial delta from the EDB.
+- [x] Semi-naive rule: for each rule, walk every positive body literal
+      position; substitute that one with the per-relation delta and join
+      the rest against the previous-iteration DB (`dl-find-bindings-semi`).
+      Candidates are collected before mutating the DB so the "full" sides
+      see a consistent snapshot.
+- [x] `dl-collect-rule-candidates` falls back to a naive single pass when
+      a rule has no positive body literal (e.g. `(p X) :- (= X 5).`).
+- [x] `dl-saturate!` is now semi-naive by default; `dl-saturate-naive!`
+      kept for differential testing and a reference implementation.
+- [x] Tests in `lib/datalog/tests/semi_naive.sx` (8) — every recursive
+      program from earlier suites is run under both saturators with
+      per-relation tuple counts compared (cheap, robust under bundled
+      conformance session). A chain-5 differential exercises multiple
+      semi-naive iterations against the recursive ancestor rule.
+      Larger chains hit prohibitive wall-clock under conformance CPU
+      contention with other agents — a future Blocker tracks switching
+      `dl-tuple-member?` from O(n²) list scan to a hash-set per relation.

-### Phase 6 — magic sets (goal-directed bottom-up)
-
-Naive bottom-up evaluation derives **all** consequences of all rules before answering, even when
-the query touches a tiny slice of the EDB. Magic sets rewrite the program so the fixpoint only
-derives tuples relevant to the goal — a major perf win for "what's reachable from node X" style
-queries on large graphs.
-
- [ ] Adornments: annotate rule predicates with bound (`b`) / free (`f`) patterns based on how
-      they're called (`ancestor^bf(tom, X)` vs `ancestor^ff(X, Y)`).
- [ ] Magic transformation: for each adorned predicate, generate a `magic_<pred>` relation and
-      rewrite rule bodies to filter through it. Seed with `magic_<query-pred>(<bound-args>)`.
- [ ] Sideways information passing strategy (SIPS): left-to-right by default; pluggable.
- [ ] Optional pass — guarded behind `(dl-set-strategy! db :magic)`; default remains semi-naive.
- [ ] Tests: ancestor query from a single root on a 10k-node graph — magic-rewritten version
-      should be O(reachable) instead of O(graph). Equivalence vs naive on small inputs.
+### Phase 6 — magic sets (goal-directed bottom-up, opt-in)
+Naive bottom-up derives **all** consequences before answering. Magic sets
+rewrite the program so the fixpoint only derives tuples relevant to the
+goal — a major perf win for "what's reachable from node X" queries on
+large graphs.
+- [x] Adornments: `dl-adorn-goal goal` and `dl-adorn-lit lit bound` in
+      `lib/datalog/magic.sx`. Per-arg `b`/`f` based on whether the arg
+      is a constant or a variable already in the bound set.
+- [x] Magic transformation: `dl-magic-rewrite rules query-rel adn args`
+      generates `{:rules <rewritten-rules> :seed <magic-seed>}`. Each
+      original rule is gated with a `magic_<rel>^<adn>(bound)` filter,
+      and propagation rules are emitted for each positive non-builtin
+      body literal. Worklist over `(rel, adn)` pairs starts from the
+      query and stops when no new pairs appear. EDB facts pass through
+      unchanged.
+- [x] Sideways information passing strategy (SIPS): left-to-right
+      `dl-rule-sips rule head-adornment` walks body literals tracking
+      the bound set, returning `({:lit :adornment} ...)`. Recognises
+      `is`/aggregate result-vars as new binders; comparisons and
+      negation pass through with computed adornments. (Pluggable
+      strategies are future work.)
+- [x] `dl-set-strategy! db strategy` hook + `dl-get-strategy db`. Default
+      `:semi-naive`. `:magic` accepted but the transformation itself is
+      deferred — saturator currently falls back to semi-naive. Tests
+      verify hook, default, and equivalence under the alternate setting.
+- [x] Equivalence test: rewritten ancestor program over the same EDB
+      derives the same number of `ancestor` tuples and returns the
+      same query answers as the unrewritten program (chain-3 case).
+- [x] `dl-magic-query db query-goal` — top-level driver. Builds a
+      fresh internal db with the caller's EDB facts, the magic seed,
+      and the rewritten rules; saturates and queries. Caller's db is
+      untouched. Equivalent to `dl-query` for fully-stratifiable
+      programs (sole motivation is a perf alternative on goal-shaped
+      queries against large recursive relations).
+- [ ] Perf test: 10k-node reachability with magic vs semi-naive.
+      Left to a future iteration — would need a benchmarking harness
+      for large graphs and the conformance budget can't afford it.

 ### Phase 7 — stratified negation
- [ ] Dependency graph analysis: which relations depend on which (positively or negatively)
- [ ] Stratification check: error if negation is in a cycle (non-stratifiable program)
- [ ] Evaluation: process strata in order — lower stratum fully computed before using its
-      complement in a higher stratum
- [ ] `not(P)` in rule body: at evaluation time, check P is NOT in the derived EDB
- [ ] Tests: non-member (`not(member(X,L))`), colored-graph (`not(same-color(X,Y))`),
-      stratification error detection
+- [x] Dependency graph: `dl-build-dep-graph db` returns `{head -> ({:rel
+      :neg} ...)}`. Built-ins drop out (they're not relations).
+- [x] Reachability via Floyd-Warshall in `dl-build-reach`; cycles
+      detected by `reach[A][B] && reach[B][A]`. Programs are
+      non-stratifiable iff any negative dependency falls inside an SCC.
+      `dl-check-stratifiable` returns nil on success or a clear message.
+- [x] `dl-compute-strata` propagates stratum numbers iteratively:
+      `stratum(R) = max over deps of (stratum(dep) + (1 if negated else 0))`.
+- [x] Saturator refactor: `dl-saturate-rules! db rules` is the semi-
+      naive worker; `dl-saturate! db` rejects non-stratifiable programs,
+      groups rules by head's stratum, and runs the worker on each
+      stratum in increasing order.
+- [x] `not(P)` in body: `dl-match-negation` walks the inner literal
+      under the current subst and uses `dl-match-positive` — succeeds
+      iff zero matches. Order-aware safety in `dl-rule-check-safety`
+      (already present from Phase 4) requires negation vars to be
+      bound by an earlier positive literal.
+- [x] Tests in `lib/datalog/tests/negation.sx` (10): EDB and IDB
+      negation, two-step strata, multi-level strata, with-arithmetic,
+      empty-result and always-fail cases, non-stratifiability
+      rejection, and a negation safety violation.

 ### Phase 8 — aggregation (Datalog+)
- [ ] `count(X, Goal)` → number of distinct X satisfying Goal
- [ ] `sum(X, Goal)` → sum of X values satisfying Goal
- [ ] `min(X, Goal)` / `max(X, Goal)` → min/max of X satisfying Goal
- [ ] `group-by` semantics: `count(X, sibling(bob, X))` → count of bob's siblings
- [ ] Aggregation breaks stratification — evaluate in a separate post-fixpoint pass
- [ ] Tests: social network statistics, grade aggregation, inventory sums
+- [x] `(count R V Goal)`, `(sum R V Goal)`, `(min R V Goal)`,
+      `(max R V Goal)`, `(findall L V Goal)` — first arg is the result
+      variable, second is the aggregated variable, third is the goal
+      literal. `findall` returns the distinct-value list itself; the
+      others reduce. Live in `lib/datalog/aggregates.sx`.
+- [x] `dl-eval-aggregate`: runs `dl-find-bindings` on the goal under the
+      current subst (which provides outer-context bindings), collects
+      distinct values of the aggregated var, applies the aggregate.
+      `count`/`sum` produce 0 when no matches; `min`/`max` produce no
+      binding (rule fails) when empty.
+- [x] Group-by emerges naturally: outer-context vars in the goal are
+      substituted from the current subst, so `popular(P) :- post(P),
+      count(N, U, liked(U, P)), >=(N, 3).` correctly counts per-post.
+- [x] Stratification: `dl-aggregate-dep-edge` returns a negation-like
+      edge so the aggregate's goal relation is fully derived before the
+      aggregate fires. Non-monotonicity respected.
+- [x] Safety: aggregate body lit binds the result var; goal-internal
+      vars are existentially quantified and don't need outer binding.
+- [x] Tests in `lib/datalog/tests/aggregates.sx` (10): count siblings,
+      sum prices, min/max scores, count over derived relation,
+      empty-input cases for each operator, popularity threshold with
+      group-by, distinct-counted-once.

 ### Phase 9 — SX embedding API
- [ ] `(dl-program facts rules)` → database from SX data directly (no parsing required)
+- [x] `(dl-program-data facts rules)` builds a db from SX data —
+      `facts` is a list of literals, `rules` is a list of either
+      dicts `{:head … :body …}` or lists `(<head…> <- <body…>)`.
+      Variables are SX symbols whose first char is uppercase or `_`,
+      matching the parser's convention.
      ```
-      (dl-program
-        '((parent tom bob) (parent tom liz) (parent bob ann))
-        '((ancestor X Z :- (parent X Y) (ancestor Y Z))
-          (ancestor X Y :- (parent X Y))))
+      (dl-program-data
+        '((parent tom bob) (parent bob ann))
+        '((ancestor X Y <- (parent X Y))
+          (ancestor X Z <- (parent X Y) (ancestor Y Z))))
      ```
- [ ] `(dl-query db '(ancestor tom ?X))` → `((ann) (bob) (liz) (pat))`
- [ ] `(dl-assert! db '(parent ann pat))` → incremental fact addition + re-derive
- [ ] `(dl-retract! db '(parent tom bob))` → fact removal + re-derive from scratch
- [ ] Integration demo: federation graph query — `(ancestor actor1 actor2)` over
-      rose-ash ActivityPub follow relationships
+- [x] `(dl-rule head body)` constructor for the dict form.
+- [x] `(dl-query db '(ancestor tom X))` already worked — same query API
+      consumes the SX-data goal. Now also accepts a *list* of body
+      literals for conjunctive queries:
+      `(dl-query db '((p X) (q X)))`,
+      `(dl-query db (list '(n X) '(> X 2)))`. Auto-dispatched via
+      `dl-query-coerce` on first-element shape.
+- [x] `(dl-assert! db '(parent ann pat))` → adds the fact and re-saturates.
+- [x] `(dl-retract! db '(parent bob ann))` → drops matching tuples from
+      the EDB list, wipes every relation that has a rule (those are IDB),
+      and re-saturates from the surviving EDB.
+- [x] Tests in `lib/datalog/tests/api.sx` (9): closure via data API,
+      dict-rule form, dl-rule constructor, dl-assert! incremental,
+      dl-retract! removes derived, cyclic-graph reach via data,
+      assert into empty db, fact-style rule (no arrow), coerce dict.
+- [x] Integration demo: federation graph query — `(reachable A B)` /
+      `(mutual A B)` / `(foaf A C)` over `(follows ACTOR-A ACTOR-B)` in
+      `lib/datalog/demo.sx`. Tests in `lib/datalog/tests/demo.sx`.
+      Wiring this to actual rose-ash ActivityPub data is Phase 10
+      service work and is out of scope for this loop.

 ### Phase 10 — Datalog as a query language for rose-ash
- [ ] Schema: map SQLAlchemy model relationships to Datalog EDB facts
-      (e.g. `(follows user1 user2)`, `(authored user post)`, `(tagged post tag)`)
- [ ] Loader: `dl-load-from-db!` — query PostgreSQL, populate Datalog EDB
- [ ] Query examples:
-      - `?- ancestor(me, X), authored(X, Post), tagged(Post, cooking).`
-        → posts about cooking by people I follow (transitively)
-      - `?- popular(Post) :- tagged(Post, T), count(L, (liked(L, Post))) >= 10.`
-        → posts with 10+ likes
- [ ] Expose as a rose-ash service endpoint: `POST /internal/datalog` with program + query
+- [x] Schema sketches in `lib/datalog/demo.sx`:
+      - **Federation**: `(follows A B)` → `(mutual A B)`, `(reachable A B)`,
+        `(foaf A C)` (friend-of-a-friend, distinct).
+      - **Content**: `(authored A P)`, `(liked U P)`, `(tagged P T)` →
+        `(post-likes P N)` via aggregation, `(popular P)` for likes ≥ 3,
+        `(interesting Me P)` joining follows + authored + popular.
+      - **Permissions**: `(member A G)`, `(subgroup C P)`, `(allowed G R)`
+        → `(in-group A G)` over transitive subgroups, `(can-access A R)`.
+      - **Cooking-posts** (the canonical example): `(reach Me Them)` over
+        the follow graph, then `(cooking-post-by-network Me P)` joining
+        reach + authored + `(tagged P cooking)`.
+- [ ] Loader `dl-load-from-db!` — out of scope for this loop
+      (would need to edit `shared/services/` outside `lib/datalog/`).
+      Programs in `demo.sx` already document the EDB shape expected
+      from such a loader. `dl-program-data` consumes the same shape.
+- [x] Query examples covered by `lib/datalog/tests/demo.sx` (10):
+      mutuals, transitive reach, FOAF, popular posts, interesting feed,
+      post likes count, direct/subgroup/transitive group access, no
+      access without grant.
+- [ ] Service endpoint `POST /internal/datalog` — out of scope as above.
+      Once exposed, server-side handler would be `dl-program-data` +
+      `dl-query`, returning JSON-encoded substitutions.

 ## Blockers

-_(none yet)_
+- **Saturation perf**: three rounds done.
+  - hash-set membership in `dl-add-fact!` (Phase 5b)
+  - indexed iteration in `dl-find-bindings` (Phase 5c)
+  - first-arg index per relation (Phase 5e) — when a body literal's
+    first arg walks to a non-variable, dl-match-positive looks up
+    by `(str arg)` instead of scanning the full relation.
+  chain-25 saturation drops from ~33s to ~18s real (10s user).
+  chain-50 still long (~120s+) due to dict-copy overhead in
+  unification subst threading. Future: per-rule "compiled" body
+  with pre-resolved var positions, slot-based subst representation
+  to avoid `assoc` per binding.

 ## Progress log

 _Newest first._

-_(awaiting phase 1)_
+- 2026-05-11 — `dl-set-strategy!` accepted arbitrary keyword values
+  silently. Typos like `:semi_naive` or `:semiNaive` were stored
+  uninspected; the saturator then used the default and the user
+  never learned their setting was a typo. Validator added: strategy
+  must be one of `:semi-naive`, `:naive`, `:magic`. 1 regression test;
+  276/276.
+
+- 2026-05-11 — Anonymous-variable renamer collided with user-written
+  `_anon<N>` symbols. The renamer started counter at 0 and produced
+  `_anon1, _anon2, ...` unconditionally; if the user wrote
+  `q(_anon1) :- p(_anon1, _).` the `_` got renamed to `_anon1` too,
+  collapsing the two positions of `p` to a single var and returning
+  the empty result instead of `{a, c}`. Fix: scan each rule (and
+  query) for the max `_anon<N>` and start the renamer past it. The
+  renamer constructor now takes a `start` arg; new helpers
+  `dl-max-anon-num` / `dl-max-anon-num-list` walk the rule tree.
+  1 regression test; 275/275.
+
+- 2026-05-11 — `dl-magic-query` could silently diverge from
+  `dl-query` when an aggregate's inner-goal relation was IDB. The
+  rewriter passes aggregate body lits through unchanged (no magic
+  propagation for them), so the inner relation was empty in the
+  magic db and the aggregate returned 0. Probe:
+  `dl-eval-magic "u(a). u(b). u(c). u(d). banned(b). banned(d).
+                  active(X) :- u(X), not(banned(X)).
+                  n(N) :- count(N, X, active(X))." "?- n(N)."`
+  returned `N=0` instead of `N=2`. Fix: `dl-magic-query` now
+  pre-saturates the source db before copying facts into the magic
+  db. This guarantees equivalence with `dl-query` for every
+  stratified program; the magic benefit comes from goal-directed
+  re-derivation of the query relation under the seed (which still
+  matters for large recursive joins). The existing test suite's
+  aggregate cases happened to dodge this because the inner goals
+  were all EDB. 1 new regression test; 274/274.
+
+- 2026-05-11 — Anonymous `_` in a negated literal was incorrectly
+  flagged by the safety check. The canonical idiom
+  `orphan(X) :- person(X), not(parent(X, _))` was rejected with
+  "negation refers to unbound variable(s) (\"_anon1\")" because the
+  parser renames each `_` to a fresh `_anon*` symbol and the negation
+  safety walk demanded all vars in the negated lit be bound by an
+  earlier positive body literal. Anonymous vars in negation are
+  existentially quantified — they shouldn't need outer binding.
+  Added `dl-non-anon-vars` filter; `dl-process-neg!` now strips
+  `_anon*` names from `needed` before the binding check. 2 new
+  regression tests; 273/273.
+
+- 2026-05-11 — Compound terms in fact-arg / rule-head positions were
+  silently stored as unreduced expressions. `p(+(1, 2)).` resulted
+  in a tuple `(p (+ 1 2))` (dl-ground? sees no free variables, so it
+  passed). `double(*(X, 2)) :- n(X).` saturated to `double((* 3 2))`
+  rather than `double(6)`. Datalog has no function symbols in arg
+  positions — `dl-add-fact!` and `dl-add-rule!` now reject compound
+  args via a new `dl-simple-term?` (number / string / symbol).
+  Compounds remain legal in body literals where they encode `is` /
+  arithmetic / aggregate sub-goals. 2 new regression tests; 271/271.
+
+- 2026-05-11 — Quoted atoms with uppercase-or-underscore-leading
+  names were misclassified as variables. `p('Hello World').` ran
+  through the tokenizer's `"atom"` branch and through the parser's
+  `string->symbol`, producing a symbol named "Hello World". dl-var?
+  checks the first character — "H" is uppercase, so the fact was
+  rejected as non-ground. Fix: tokenizer emits `"string"` for any
+  `'...'` quoted form, so quoted atoms become opaque string constants
+  (matching how Datalog idiomatically treats them — the alternative
+  was a per-symbol "quoted" marker which would have rippled through
+  unification and dl-var?). Updated the existing tokenize test and
+  added one for `'Hello'`; also added a parse-level regression. 269/269.
+
+- 2026-05-11 — Type-mixed comparisons were silently inconsistent:
+  `<(X, 5)` with `X` bound to a string returned `()` (no result, no
+  error), while `X` bound to a symbol raised "Expected number, got
+  symbol". Both should fail loudly. Added `dl-compare-typeok?` —
+  `<`, `<=`, `>`, `>=` now require both operands to share a primitive
+  type (both numbers or both strings) and raise otherwise. `!=` is
+  exempted since it's a polymorphic inequality test built on
+  `dl-tuple-equal?`. 2 new regression tests; 267/267.
+
+- 2026-05-11 — Body literal shape validation in
+  `dl-rule-check-safety`: a dict that isn't `{:neg ...}` (e.g. typo'd
+  `{:negs ...}`) used to silently fall through every dispatch clause,
+  contributing zero bound vars; the user would then see a confusing
+  "head var X unbound" error pointing at the head, not the malformed
+  body. Same for body lits that are bare numbers / strings / symbols.
+  Both shapes now raise a clear error naming the offending lit. 1 new
+  regression test; 265/265.
+
+- 2026-05-11 — Division by zero in `is` silently produced IEEE
+  infinity instead of raising. `is(R, /(X, 0))` returned `R = inf`,
+  which then flowed through comparisons and aggregations to produce
+  nonsense results. `dl-eval-arith` now raises with a clear
+  "division by zero in <expr>" message. 1 new test; 264/264.
+
+- 2026-05-11 — Aggregate variable validation: `count(N, Y, p(X))`
+  silently returned `N = 1` because `Y` was never bound in `p(X)` —
+  every match contributed the same unbound symbol, which dl-val-member?
+  deduped to a single entry. Similarly `sum(S, Y, p(X))` raised a
+  confusing "expected number" error from the underlying `+`. Added
+  a third validator in `dl-eval-aggregate`: the agg-var must appear
+  in the goal literal. Error names the variable and the goal and
+  explains the consequence. 1 new test; 263/263.
+
+- 2026-05-11 — `dl-retract!` was silently destroying EDB facts in
+  "mixed" relations (those with BOTH user-asserted facts AND a rule
+  defining the same head). The retract pass wiped every rule-head
+  relation wholesale and then re-saturated — but the saturator only
+  re-derives the IDB portion, so explicit EDB facts vanished even
+  for a no-op retract of a non-existent tuple. Probe:
+  `(let ((db (dl-program "p(a). p(b). p(X) :- q(X). q(c).")))
+     (dl-retract! db (quote (p z))) (dl-query db (quote (p X))))`
+  went from `{a,b,c}` to just `{c}`.
+  Fix: tracked `:edb-keys` provenance in the db. `dl-add-fact!` (public
+  API) marks the tuple as EDB; saturator calls new internal
+  `dl-add-derived!` which doesn't mark it. `dl-retract!` now strips
+  only the IDB-derived portion of rule-head relations and preserves
+  EDB-marked tuples through the re-saturate pass. 2 new regression
+  tests; 262/262.
+
+- 2026-05-11 — Eval-semantics bug-hunt: nested `not(not(P))` was
+  silently misinterpreted. Outer-level `not(...)` is parsed as
+  negation, but the inner `not(banned(X))` was parsed as a regular
+  positive literal naming a relation called `not`. Since no `not`
+  relation existed, the inner match was empty and the outer
+  negation succeeded vacuously, making `vip(X) :- u(X), not(not(banned(X))).`
+  equivalent to `vip(X) :- u(X).` (a silent double-negation = identity
+  fallacy). Fix in `dl-rule-check-safety`: both the positive-literal
+  branch and `dl-process-neg!` now flag any body literal whose head
+  is in `dl-reserved-rel-names`. Error message names the relation and
+  points the user at intermediate-relation stratified negation. 1 new
+  regression test; 260/260.
+
+- 2026-05-10 — Bug-hunt round on parser/safety surfaced 7 real
+  bugs, each fixed with regression tests:
+  - Reserved relation names (`not`, `count`, `<`, `is`, ...) were
+    accepted as rule/fact heads — would silently shadow built-ins.
+  - Negative number literals (`n(-1).`) failed to parse — users
+    had to express them as `(- 0 1)` or via `is`.
+  - Unterminated block comment `/* ...` silently consumed the
+    rest of the input. Now raises with the position.
+  - Same silent-consume bug in unterminated string / quoted-atom.
+  - Empty-list rule head and non-list rule body weren't validated;
+    they'd crash later in `rest`. dl-add-rule! now checks shape.
+  - dl-magic-query with non-list / non-dict goal crashed cryptically.
+  - Tokenizer silently swallowed unrecognised characters (`?`, `!`,
+    `#`, `@`, etc.) — typos produced confusing downstream errors.
+
+- 2026-05-08 — Phase 6 driver: `dl-magic-query db query-goal`.
+  Builds a fresh internal db from the caller's EDB + magic seed +
+  rewritten rules, saturates, queries, returns substitutions —
+  caller's db is untouched. Equivalent to `dl-query` for any
+  fully-stratifiable program; sole motivation is a perf alternative
+  on goal-shaped queries against large recursive relations.
+  2 new tests cover equivalence and non-mutation.
+
+- 2026-05-08 — Phase 6 magic-sets rewriter. `dl-magic-rewrite rules
+  query-rel adn args` returns `{:rules <rewritten> :seed <seed-fact>}`.
+  Worklist over `(rel, adn)` pairs starts from the query, gates each
+  original rule with a `magic_<rel>^<adn>(bound)` filter, and emits
+  propagation rules for each positive non-builtin body literal so
+  that magic spreads to body relations. EDB facts pass through.
+  3 new tests cover seed structure, equivalence on chain-3 by
+  ancestor-relation tuple count, and same-query-answers under
+  the rewritten program. The plumbing for a `dl-saturate-magic!`
+  driver and large-graph perf benchmarks is still future work.
+
+- 2026-05-08 — Phase 6 building blocks for the magic-sets
+  transformation: `dl-magic-rel-name`, `dl-magic-lit`,
+  `dl-bound-args`. The rewriter that generates magic seed and
+  propagation rules is still future work; with these primitives
+  in place it's a straightforward worklist algorithm. 4 new tests.
+
+- 2026-05-08 — Phase 6 adornments + SIPS in
+  `lib/datalog/magic.sx`. Inspection helpers — `dl-adorn-goal` and
+  `dl-adorn-lit` compute per-arg `b`/`f` patterns under a bound
+  set; `dl-rule-sips rule head-adornment` walks body literals
+  left-to-right propagating the bound set, recognising `is` and
+  aggregate result-vars as new binders. Lays groundwork for a
+  later magic-sets transformation. 10 new tests cover pure
+  adornment, SIPS over a chain rule, head-fully-bound rules,
+  comparisons, and `is`. Saturator does not yet consume these.
+
+- 2026-05-08 — Comprehensive integration test in api suite: a
+  single program exercising recursion (`reach` transitive closure)
+  + stratified negation (`safe X Y :- reach X Y, not banned Y`) +
+  aggregation (`reach_count` via count) + comparison (`>= N 2`)
+  composed end-to-end via `dl-eval source query-source`. Confirms
+  the full pipeline (parser → safety → stratifier → semi-naive +
+  aggregate post-pass → query) on a non-trivial program.
+
+- 2026-05-08 — Bug fix: aggregates work as top-level query goals.
+  `dl-match-lit` (the naive matcher used by `dl-find-bindings`) was
+  missing the `dl-aggregate?` dispatch — it was only present in
+  `dl-fbs-aux` (semi-naive). Symptom: `(dl-query db '(count N X (p X)))`
+  silently returned `()`. Also updated `dl-query-user-vars` to project
+  only the result var (first arg) of an aggregate goal — the
+  aggregated var and inner-goal vars are existentials and should not
+  appear in the projected substitution. 2 new aggregate tests cover
+  the regression.
+
+- 2026-05-08 — Convenience: `dl-eval source query-source`. Parses
+  both strings, builds a db, saturates, runs the query, returns
+  the substitution list. Single-call user-friendly entry. 2 new
+  api tests cover ancestor and multi-goal queries.
+
+- 2026-05-08 — Phase 6 stub: `dl-set-strategy! db strategy` and
+  `dl-get-strategy db` user-facing hooks. Default `:semi-naive`;
+  `:magic` is accepted but the actual transformation is deferred,
+  so saturation still uses semi-naive. Lets us tick the
+  "Optional pass — guarded behind dl-set-strategy!" Phase 6 box.
+  3 new eval tests.
+
+- 2026-05-08 — Demo: weighted-DAG shortest path. `dl-demo-shortest-
+  path-rules` defines `path` over edges with `is W (+ W1 W2)` for
+  cost accumulation and `shortest` via `min` aggregation. 3 demo
+  tests cover direct/multi-hop choice, multi-hop wins on cheaper
+  route, and unreachable-empty. Added `dl-summary db` inspection
+  helper returning `{<rel>: count}` (4 eval tests).
+
+- 2026-05-08 — Phase 5e perf: first-arg index per relation. db gains
+  `:facts-index {<rel>: {<first-arg-key>: tuples}}` mirroring the
+  existing `:facts-keys` membership index. `dl-add-fact!` populates
+  it; `dl-match-positive` walks the body literal's first arg under
+  the current subst — if it's bound to a non-var, look up by
+  `(str arg)` and iterate only the matching subset. chain-25
+  saturation 33s → 18s real (~2x). chain-50 still slow (~120s+)
+  but tractable; next bottleneck is subst dict copies during
+  unification. Differential test bumped to chain-12, semi-only
+  count to chain-25.
+
+- 2026-05-08 — Demo: tag co-occurrence. `(cotagged P T1 T2)` — post
+  has both T1 and T2 with T1 != T2 — and `(tag-pair-count T1 T2 N)`
+  counting posts per distinct tag pair. Demonstrates count
+  aggregation grouped by outer-context vars. 2 new demo tests.
+
+- 2026-05-08 — `dl-query` accepts a list of body literals for
+  conjunctive queries, in addition to a single positive literal.
+  `dl-query-coerce` dispatches based on the first element's shape:
+  positive lit (head is a symbol) or `:neg` dict → wrap as singleton;
+  list of lits → use as-is. `dl-query-user-vars` collects the union
+  of vars across all goals (deduped, `_` filtered) for projection.
+  2 new api tests: multi-goal AND, and conjunction with comparison.
+
+- 2026-05-08 — Bug fix: `dl-check-stratifiable` now rejects recursion
+  through aggregation (e.g., `q(N) :- count(N, X, q(X))`). The
+  stratifier was already adding negation-like edges for aggregates,
+  but the cycle scan only looked at explicit `:neg` literals. Added
+  the matching aggregate branch to the body iteration. Also adds
+  doc-only `lib/datalog/datalog.sx` with the public-API surface
+  (since `load` is an epoch command and can't recurse from within an
+  `.sx` file). 3 new aggregate tests cover recursion-rejection,
+  negation-and-aggregation coexistence, and min-over-empty-derived.
+
+- 2026-05-08 — Phase 10 demo + canonical query. Added the "cooking
+  posts by people I follow (transitively)" example from the plan:
+  `dl-demo-cooking-rules` defines `reach` over the follow graph
+  (recursive transitive closure) and `cooking-post-by-network` that
+  joins reach with `authored` and `(tagged P cooking)`. 3 demo
+  tests cover transitive network, direct-only follow, and
+  empty-network cases.
+
+- 2026-05-08 — Phase 8 extension: `findall L V Goal` aggregate. Bind
+  L to the list of distinct V values for which Goal holds (or the
+  empty list when no matches). Implemented as a one-line case in
+  `dl-do-aggregate`. 3 new tests: EDB, derived relation, empty.
+  Useful for "give me all the X such that …" queries without
+  scalar reduction.
+
+- 2026-05-08 — Phase 5d semantic fix: anonymous `_` variables are
+  renamed per occurrence at `dl-add-rule!` and `dl-query` time so
+  `(p X _) (p _ Y)` no longer unifies the two `_`s. New helpers
+  `dl-rename-anon-term`, `dl-rename-anon-lit`, `dl-make-anon-renamer`,
+  `dl-rename-anon-rule` in db.sx; eval.sx's dl-query renames the goal
+  before search and projects only user-named vars (`_` is filtered
+  out of the projection list). The "underscore in head" test now
+  correctly rejects `(p X _) :- q(X).` — after renaming, the head's
+  fresh anon var has no body binder. Two new eval tests verify
+  rule-level and goal-level independence. 155/155 expected.
+
+- 2026-05-08 — Phase 5c perf: indexed `dl-find-bindings`. Replaced
+  the recursive `(rest lits)` walk with `dl-fb-aux lits db subst i n`
+  using `nth lits i`. Eliminates O(N²) list-copy per body of length
+  N. chain-15 saturation 25s → 16s; chain-25 finishes in 33s real
+  (vs. timeout previously). Bumped semi_naive tests: differential
+  on chain-10, semi-only count on chain-15 (was chain-5/chain-5).
+  153/153.
+
+- 2026-05-08 — Phase 10 syntactic demo. New `lib/datalog/demo.sx`
+  with three programs over rose-ash-shaped data: federation
+  (`mutual`, `reachable`, `foaf`), content recommendation
+  (`post-likes` via count aggregation, `popular`, `interesting`),
+  and role-based permissions (`in-group` over transitive subgroups,
+  `can-access`). 10 demo tests pass against synthetic EDB tuples.
+  Postgres loader and `/internal/datalog` HTTP endpoint remain
+  out of scope for this loop (they need service-tree edits beyond
+  `lib/datalog/**`). Conformance now 153/153.
+
+- 2026-05-08 — Phase 5b perf: hash-set membership in `dl-add-fact!`.
+  db gains a parallel `:facts-keys {<rel>: {<tuple-string>: true}}`
+  index alongside `:facts`. `dl-tuple-key` derives a stable string
+  key via `(str lit)` — `(p 30)` and `(p 30.0)` collide correctly
+  because SX prints them identically. Insertion is O(1) instead of
+  O(n). chain-7 saturation drops from ~12s to ~6s; chain-15 from
+  ~50s to ~25s under shared CPU. Larger chains are still slow due
+  to body-join overhead in dl-find-bindings (Blocker updated).
+  `dl-retract!` updated to keep both indices consistent. 143/143.
+
+- 2026-05-08 — Phase 9 done. New `lib/datalog/api.sx` exposes a
+  parser-free embedding: `dl-program-data facts rules` accepts SX
+  data lists, with rules in either dict form or list form using
+  `<-` as the rule arrow (since SX parses `:-` as a keyword).
+  `dl-rule head body` constructs the dict. `dl-assert! db lit` adds
+  a fact and re-saturates; `dl-retract! db lit` drops the fact from
+  EDB, wipes all rule-headed (IDB) relations, and re-saturates from
+  scratch — the simplest correct semantics until provenance tracking
+  arrives in a later phase. 9 API tests; conformance now 143/143.
+
+- 2026-05-08 — Phase 8 done. New `lib/datalog/aggregates.sx` (~110
+  LOC): count / sum / min / max. Each is a body literal of shape
+  `(op R V Goal)` — `dl-eval-aggregate` runs `dl-find-bindings` on
+  the goal under the outer subst (so outer vars in the goal get
+  substituted, giving group-by-style aggregation), collects the
+  distinct values of `V`, and binds `R`. Empty input: count/sum
+  return 0; min/max produce no binding (rule fails). Stratifier
+  extended via `dl-aggregate-dep-edge` so the aggregate's goal
+  relation is fully derived before the aggregate fires. Safety check
+  treats goal-internal vars as existentials (no outer binding
+  required); only the result var becomes bound. Conformance now
+  134 / 134.
+
+- 2026-05-08 — Phase 7 done (Phase 6 magic sets deferred — opt-in,
+  semi-naive default suffices for current test suite). New
+  `lib/datalog/strata.sx` (~290 LOC): dep graph build, Floyd-Warshall
+  reachability, SCC-via-mutual-reachability for non-stratifiability
+  detection, iterative stratum computation, rule grouping by head
+  stratum. eval.sx split: `dl-saturate-rules!` is the per-rule-set
+  semi-naive worker, `dl-saturate!` is now the stratified driver
+  (errors out on non-stratifiable programs). `dl-match-negation` in
+  eval.sx: succeeds iff inner positive match is empty. Stratum-keyed
+  dicts use `(str s)` since SX dicts only accept string/keyword keys.
+  10 negation tests cover EDB/IDB negation, multi-level strata,
+  non-stratifiability rejection, and a negation safety violation.
+
+- 2026-05-08 — Phase 5 done. `lib/datalog/eval.sx` rewritten to
+  semi-naive default. `dl-saturate!` tracks a per-relation delta and
+  on each iteration walks every positive body position substituting
+  delta for that one literal — joining the rest against the full DB
+  snapshot. `dl-saturate-naive!` retained as the reference. Rules
+  with no positive body literal (e.g. `(p X) :- (= X 5).`) fall back
+  to a naive one-shot via `dl-collect-rule-candidates`. 8 tests
+  differentially compare the two saturators using per-relation tuple
+  counts (cheap). Chain-5 differential exercises multi-iteration
+  recursive saturation. Larger chains made conformance.sh time out
+  due to O(n) `dl-tuple-member?` × CPU sharing with other loop
+  agents — added a Blocker to swap to a hash-set for membership.
+  Also tightened `dl-tuple-member?` to use indexed iteration instead
+  of recursive `rest` (was creating a fresh list per step).
+
+- 2026-05-07 — Phase 4 done. `lib/datalog/builtins.sx` (~280 LOC) adds
+  `(< X Y)`, `(<= X Y)`, `(> X Y)`, `(>= X Y)`, `(= X Y)`, `(!= X Y)`,
+  and `(is X expr)` with `+ - * /`. `dl-eval-builtin` dispatches;
+  `dl-eval-arith` recursively evaluates nested compounds. Safety
+  check is now order-aware — it walks body literals left-to-right
+  tracking the bound set, requires comparison/`is` inputs to be
+  already bound, and special-cases `=` (binds the var-side; both
+  sides must include at least one bound to bind the other). Phase 3's
+  simple safety check stays in db.sx as a forward-reference fallback;
+  builtins.sx redefines `dl-rule-check-safety` to the comprehensive
+  version. eval.sx's `dl-match-lit` now dispatches built-ins through
+  `dl-eval-builtin`. 19 builtins tests; conformance 106 / 106.
+
+- 2026-05-07 — Phase 3 done. `lib/datalog/db.sx` (~250 LOC) holds facts
+  indexed by relation name plus the rules list, with `dl-add-fact!` /
+  `dl-add-rule!` (rejects non-ground facts and unsafe rules);
+  `lib/datalog/eval.sx` (~150 LOC) implements the naive bottom-up
+  fixpoint via `dl-find-bindings`/`dl-match-positive`/`dl-saturate!`
+  and `dl-query` (deduped projected substitutions). Safety analysis
+  rejects unsafe head vars at load time. Negation and arithmetic
+  built-ins raise clean errors (lifted in later phases). 15 eval
+  tests cover transitive closure, sibling, same-generation, cyclic
+  graph reach, and six safety violations. Conformance 87 / 87.
+
+- 2026-05-07 — Phase 2 done. `lib/datalog/unify.sx` (~140 LOC):
+  `dl-var?` (case + underscore), `dl-walk`, `dl-bind`, `dl-unify` (returns
+  extended dict subst or `nil`), `dl-apply-subst`, `dl-ground?`, `dl-vars-of`.
+  Substitutions are immutable dicts; `assoc` builds extended copies. 28
+  unify tests; conformance now 72 / 72.
+
+- 2026-05-07 — Phase 1 done. `lib/datalog/tokenizer.sx` (~190 LOC) emits
+  `{:type :value :pos}` tokens; `lib/datalog/parser.sx` (~150 LOC) produces
+  `{:head … :body …}` / `{:query …}` clauses, with nested compounds
+  permitted for arithmetic and `not(...)` desugared to `{:neg …}`. 44 / 44
+  via `bash lib/datalog/conformance.sh` (26 tokenize + 18 parse). Local
+  helpers namespace-prefixed (`dl-emit!`, `dl-peek`) after a host-primitive
+  shadow clash. Test harness uses a custom `dl-deep-equal?` that handles
+  out-of-order dict keys and number repr (`equal?` fails on dict key order
+  and on `30` vs `30.0`).