rose-ash/plans/datalog-on-sx.md
giles a080ce656c
datalog: magic-sets rewriter (Phase 6, 202/202)
dl-magic-rewrite rules query-rel adn args returns:
  {:rules <rewritten-rules> :seed <magic-seed-fact>}

Worklist over (rel, adn) pairs starts from the query and stops
when no new pairs appear. For each rule with head matching a
worklist pair:
  - Adorned rule: head :- magic_<rel>^<adn>(bound), body...
  - Propagation rules: for each positive non-builtin body lit
    at position i:
      magic_<lit-rel>^<lit-adn>(bound-of-lit) :-
        magic_<rel>^<adn>(bound-of-head),
        body[0..i-1]
  - Add (lit-rel, lit-adn) to the worklist.

Built-ins, negation, and aggregates pass through without
generating propagation rules. EDB facts are unchanged.

3 new tests cover seed structure, equivalence on chain-3 (full
closure, 6 ancestor tuples — magic helps only when the EDB has
nodes outside the seed's transitive cone), and same-query-answers
under the rewritten program. Total 202/202.

Wiring up a `dl-saturate-magic!` driver and large-graph perf
benchmarks is left for a future iteration.
2026-05-08 09:58:36 +00:00


Datalog-on-SX: Datalog on the CEK/VM

Datalog is a declarative query language: a restricted subset of Prolog with no function symbols, only relations. Programs are sets of facts and rules; queries ask what follows. Evaluation is bottom-up (fixpoint iteration) rather than Prolog's top-down DFS — which means no infinite loops, guaranteed termination, and efficient incremental updates.

The unique angle: Datalog is a natural companion to the Prolog implementation already in progress (lib/prolog/). The parser and term representation can share infrastructure; the evaluator is an entirely different fixpoint engine rather than a DFS solver.

End-state goal: full core Datalog (facts, rules, stratified negation, aggregation, recursion) with a clean SX query API, and a demonstration of Datalog as a query engine for rose-ash data (e.g. federation graph, content relationships).

Ground rules

  • Scope: only touch lib/datalog/** and plans/datalog-on-sx.md. Do not edit spec/, hosts/, shared/, lib/prolog/**, or other lib/<lang>/.
  • Shared-file issues go under "Blockers" below with a minimal repro; do not fix here.
  • SX files: use sx-tree MCP tools only.
  • Architecture: Datalog source → term AST → fixpoint evaluator. No transpiler to SX AST — the evaluator is written in SX and works directly on term structures.
  • Reference: Ramakrishnan & Ullman "A Survey of Deductive Database Systems"; Dalmau "Datalog and Constraint Satisfaction".
  • Commits: one feature per commit. Keep ## Progress log updated and tick boxes.

Architecture sketch

Datalog source text
    │
    ▼
lib/datalog/tokenizer.sx   — atoms, variables, numbers, strings, punct (?- :- , . ( ) [ ])
    │
    ▼
lib/datalog/parser.sx      — facts: atom(args). rules: head :- body. queries: ?- goal.
    │                        No function symbols (only constants and variables in args).
    ▼
lib/datalog/db.sx          — extensional DB (EDB): ground facts; IDB: derived relations;
    │                        clause index by relation name/arity
    ▼
lib/datalog/eval.sx        — bottom-up fixpoint: semi-naive evaluation with delta sets;
    │                        stratification for negation; incremental update API
    ▼
lib/datalog/query.sx       — query API: (datalog-query db goal) → list of substitutions;
                             SX embedding: define facts/rules as SX data directly

Key differences from Prolog:

  • No function symbols — args are atoms, numbers, strings, or variables only. No f(a,b).
  • No cuts — no procedural control.
  • Bottom-up — derive all consequences of all rules before answering; no search tree.
  • Termination guaranteed — no infinite derivation chains (no function symbols → finite Herbrand base).
  • Stratified negation: not(P) legal iff P does not recursively depend on its own negation.
  • Aggregation: count, sum, min, max over derived tuples (Datalog+).

Roadmap

Phase 1 — tokenizer + parser

  • Tokenizer: atoms (lowercase/quoted), variables (uppercase/_), numbers, strings, punct (( ), ,, .), operators (:-, ?-, <=, >=, !=, <, >, =, +, -, *, /), comments (%, /* */). Note: there is no function-symbol syntax (no nested f(...) in arg position), but the parser permits nested compounds for arithmetic; safety analysis (Phase 3) rejects non-arithmetic nesting.
  • Parser:
    • Facts: parent(tom, bob). → {:head (parent tom bob) :body ()}
    • Rules: ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z). → {:head (ancestor X Z) :body ((parent X Y) (ancestor Y Z))}
    • Queries: ?- ancestor(tom, X). → {:query ((ancestor tom X))} (:query value is always a list of literals; ?- p, q. → {:query ((p) (q))})
    • Negation: not(parent(X,Y)) in body position → {:neg (parent X Y)}
  • Tests in lib/datalog/tests/parse.sx (18) and lib/datalog/tests/tokenize.sx (26). Conformance harness: bash lib/datalog/conformance.sh → 44 / 44 passing.
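
The token classes listed above can be illustrated with a small regex tokenizer. This is a Python sketch for illustration only, not the SX code in lib/datalog/tokenizer.sx; the regex, the function name `tokenize`, and the token type names are assumptions, and only the {:type :value :pos} token shape comes from this plan.

```python
import re

# Hypothetical token grammar; alternatives are tried in order, so comments
# are recognised before the "/" operator and ":-" before single-char operators.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>%[^\n]*|/\*.*?\*/)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<string>"[^"]*")
  | (?P<var>[A-Z_][A-Za-z0-9_]*)
  | (?P<atom>[a-z][A-Za-z0-9_]*|'[^']*')
  | (?P<op>:-|\?-|<=|>=|!=|[<>=+\-*/])
  | (?P<punct>[(),.\[\]])
""", re.VERBOSE | re.DOTALL)


def tokenize(src):
    """Return a list of {"type", "value", "pos"} dicts, skipping whitespace and comments."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup not in ("ws", "comment"):
            tokens.append({"type": m.lastgroup, "value": m.group(), "pos": pos})
        pos = m.end()
    return tokens


toks = tokenize("ancestor(X,Z) :- parent(X,Y). % trailing comment")
print([t["value"] for t in toks])
```

Trying the alternatives in a fixed order keeps the sketch short; a hand-written scanner would make the same precedence decisions explicitly.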

Phase 2 — unification + substitution

  • Ported (not shared) from lib/prolog/ — term walk, no occurs check.
  • dl-unify t1 t2 subst → extended subst dict, or nil on failure.
  • dl-walk, dl-bind, dl-apply-subst, dl-ground?, dl-vars-of.
  • Substitutions are immutable dicts keyed by variable name (string). Lists/tuples unify element-wise (used for arithmetic compounds too).
  • Tests in lib/datalog/tests/unify.sx (28). 72 / 72 conformance.
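
The unification contract above (walk, bind, extended substitution or failure, no occurs check) can be sketched as follows. This is illustrative Python, not the SX in lib/datalog/unify.sx; the names `is_var`, `walk`, and `unify` and the tuple encoding of literals are assumptions, and Python's None stands in for nil.

```python
def is_var(t):
    # Assumed encoding: variables are strings starting with uppercase or "_".
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def walk(t, subst):
    # Follow bindings until reaching a non-variable or an unbound variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def unify(t1, t2, subst):
    """Return an extended copy of subst, or None on failure. No occurs check."""
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if is_var(t1):
        # Same unbound variable on both sides: nothing to bind.
        return subst if t1 == t2 else {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        if len(t1) != len(t2):
            return None
        # Element-wise unification, threading the substitution through.
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return subst if t1 == t2 else None


print(unify(("parent", "X", "bob"), ("parent", "tom", "Y"), {}))
```

Building a fresh dict per binding mirrors the immutable-substitution design described above; the input substitution is never mutated.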

Phase 3 — extensional DB + naive evaluation + safety analysis

  • EDB+IDB combined: {:facts {<rel-name-string> -> (literal ...)}} — relations indexed by name; tuples stored as full literals so they unify directly. Dedup on insert via dl-tuple-equal?.
  • dl-add-fact! db lit (rejects non-ground) and dl-add-rule! db rule (rejects unsafe). dl-program source parses + loads in one step.
  • Naive evaluation dl-saturate! db: iterate rules until no new tuples. dl-find-bindings recursively joins body literals; dl-match-positive unifies a literal against every tuple in the relation.
  • dl-query db goal → list of substitutions over goal's vars, deduplicated. dl-relation db name for derived tuples.
  • Safety analysis at dl-add-rule! time: every head variable except _ must appear in some positive body literal. Built-ins and negated literals do not satisfy safety. Helpers dl-positive-body-vars, dl-rule-unsafe-head-vars exposed for later phases.
  • Negation and arithmetic built-ins error cleanly at saturate time (Phase 4 / Phase 7 will swap in real semantics).
  • Tests in lib/datalog/tests/eval.sx (15): transitive closure, sibling, same-generation, grandparent, cyclic graph reach, six safety cases. 87 / 87 conformance.
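
As a rough picture of the naive fixpoint described above, the following Python sketch iterates all rules until no new tuples appear. It is not the SX implementation: `match`, `find_bindings`, and `saturate_naive` are hypothetical names, facts are plain tuples whose first element is the relation name, and only positive body literals are handled.

```python
def is_var(t):
    # Assumed encoding: variables are strings starting with uppercase or "_".
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def match(lit, fact, subst):
    """One-way match of a body literal against a ground fact."""
    if len(lit) != len(fact):
        return None
    out = dict(subst)
    for a, b in zip(lit, fact):
        if is_var(a):
            if out.setdefault(a, b) != b:  # already bound to something else
                return None
        elif a != b:
            return None
    return out


def find_bindings(body, facts, subst):
    """Recursively join the body literals, yielding every satisfying substitution."""
    if not body:
        yield subst
        return
    for fact in facts:
        s = match(body[0], fact, subst)
        if s is not None:
            yield from find_bindings(body[1:], facts, s)


def saturate_naive(edb, rules):
    """Apply every rule to the whole database until a pass derives nothing new."""
    facts = set(edb)
    while True:
        new = set()
        for head, body in rules:
            for s in find_bindings(body, facts, {}):
                derived = tuple(s.get(a, a) for a in head)
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new


RULES = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
EDB = {("parent", "a", "b"), ("parent", "b", "c"), ("parent", "c", "d")}
print(sorted(f for f in saturate_naive(EDB, RULES) if f[0] == "ancestor"))
```

On this three-edge chain the sketch derives six ancestor tuples. New tuples are collected per pass and merged afterwards, so the set being iterated is never mutated mid-join.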

Phase 4 — built-in predicates + body arithmetic

Almost every real query needs <, =, simple arithmetic, and string comparisons in body position. These are not EDB lookups — they're constraints that filter bindings.

  • Recognise built-in predicates in body: (< X Y), (<= X Y), (> X Y), (>= X Y), (= X Y), (!= X Y) and arithmetic forms (is Z (+ X Y)), (is Z (- X Y)), (is Z (* X Y)), (is Z (/ X Y)). Live in lib/datalog/builtins.sx.
  • dl-eval-builtin dispatches; dl-eval-arith recursively evaluates (+ a b) etc. with full nesting. = unifies; != rejects equal ground terms.
  • Order-aware safety analysis (dl-rule-check-safety): walks body left-to-right tracking which vars are bound. is's RHS vars must be already bound; LHS becomes bound. Comparisons require both sides bound. = is special-cased — at least one side bound binds the other. Negation vars must be bound (will be enforced fully in Phase 7).
  • Wired through SX numeric primitives — no separate number tower.
  • Tests in lib/datalog/tests/builtins.sx (19): range filters, arithmetic derivations, equality binding, eight safety violations and three safe-shape tests. Conformance 106 / 106.
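
The order-aware safety walk described above can be sketched in a few lines. This Python version is an illustration under assumed encodings (literals as tuples, negation as ("not", inner)); `check_safety` is a hypothetical stand-in for dl-rule-check-safety, not its actual code.

```python
COMPARISONS = {"<", "<=", ">", ">=", "!="}


def is_var(t):
    # Assumed encoding: variables are strings starting with uppercase or "_".
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def vars_of(t):
    """Variables in a term; the first element of a tuple is its operator or relation name."""
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        out = set()
        for arg in t[1:]:
            out |= vars_of(arg)
        return out
    return set()


def check_safety(head, body):
    """Return None when the rule is safe, otherwise a short message."""
    bound = set()
    for lit in body:
        op = lit[0]
        if op in COMPARISONS:
            missing = vars_of(lit) - bound          # both sides must be bound
        elif op == "is":
            missing = vars_of(lit[2]) - bound       # RHS must be bound
            if not missing:
                bound |= vars_of(lit[1])            # LHS becomes bound
        elif op == "=":
            left, right = vars_of(lit[1]), vars_of(lit[2])
            missing = set()
            if left <= bound or right <= bound:     # one bound side binds the other
                bound |= left | right
            else:
                missing = (left | right) - bound
        elif op == "not":
            missing = vars_of(lit[1]) - bound       # negation binds nothing
        else:
            missing = set()
            bound |= vars_of(lit)                   # positive literal binds its vars
        if missing:
            return f"unbound in {op}: {sorted(missing)}"
    missing = vars_of(head) - bound - {"_"}
    return f"unsafe head vars: {sorted(missing)}" if missing else None


print(check_safety(("big", "X"), [("n", "X"), (">", "X", 2)]))   # safe
print(check_safety(("big", "X"), [(">", "X", 2), ("n", "X")]))   # comparison too early
```

The same rule body is accepted or rejected depending on literal order, which is the point of tracking the bound set left to right.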

Phase 5 — semi-naive evaluation (performance)

  • Delta sets {rel-name -> tuples} track newly derived tuples per iter. dl-snapshot-facts builds the initial delta from the EDB.
  • Semi-naive rule: for each rule, walk every positive body literal position; substitute that one with the per-relation delta and join the rest against the previous-iteration DB (dl-find-bindings-semi). Candidates are collected before mutating the DB so the "full" sides see a consistent snapshot.
  • dl-collect-rule-candidates falls back to a naive single pass when a rule has no positive body literal (e.g. (p X) :- (= X 5).).
  • dl-saturate! is now semi-naive by default; dl-saturate-naive! kept for differential testing and a reference implementation.
  • Tests in lib/datalog/tests/semi_naive.sx (8) — every recursive program from earlier suites is run under both saturators with per-relation tuple counts compared (cheap, robust under bundled conformance session). A chain-5 differential exercises multiple semi-naive iterations against the recursive ancestor rule. Larger chains hit prohibitive wall-clock under conformance CPU contention with other agents — a future Blocker tracks switching dl-tuple-member? from O(n²) list scan to a hash-set per relation.
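
To make the delta idea concrete, here is a minimal Python sketch of semi-naive saturation: for each rule and each body position, that one literal is matched against the delta and the remaining literals against the full snapshot. The names (`join`, `saturate_semi`, `chain`) and the tuple encoding are assumptions for illustration; the SX worker differs in detail.

```python
def is_var(t):
    # Assumed encoding: variables are strings starting with uppercase or "_".
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def match(lit, fact, subst):
    """One-way match of a body literal against a ground fact."""
    if len(lit) != len(fact):
        return None
    out = dict(subst)
    for a, b in zip(lit, fact):
        if is_var(a):
            if out.setdefault(a, b) != b:
                return None
        elif a != b:
            return None
    return out


def join(body, pos, full, delta, subst, i=0):
    """Join body literals; the literal at index pos reads the delta, the rest read full."""
    if i == len(body):
        yield subst
        return
    source = delta if i == pos else full
    for fact in source:
        s = match(body[i], fact, subst)
        if s is not None:
            yield from join(body, pos, full, delta, s, i + 1)


def saturate_semi(edb, rules):
    facts = set(edb)
    delta = set(edb)          # initial delta is the whole EDB
    while delta:
        candidates = set()    # collected before mutating facts
        for head, body in rules:
            for pos in range(len(body)):
                for s in join(body, pos, facts, delta, {}):
                    candidates.add(tuple(s.get(a, a) for a in head))
        delta = candidates - facts
        facts |= delta
    return facts


def chain(n):
    # parent(n0, n1), ..., parent(n{n-1}, n{n}): n edges over n + 1 nodes.
    return {("parent", f"n{i}", f"n{i + 1}") for i in range(n)}


RULES = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
print(sum(1 for f in saturate_semi(chain(5), RULES) if f[0] == "ancestor"))
```

A chain of n edges yields n(n+1)/2 ancestor tuples, which gives a cheap per-relation count to compare against a naive reference saturator.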

Phase 6 — magic sets (goal-directed bottom-up, opt-in)

Naive bottom-up derives all consequences before answering. Magic sets rewrite the program so the fixpoint only derives tuples relevant to the goal — a major perf win for "what's reachable from node X" queries on large graphs.

  • Adornments: dl-adorn-goal goal and dl-adorn-lit lit bound in lib/datalog/magic.sx. Per-arg b/f based on whether the arg is a constant or a variable already in the bound set.
  • Magic transformation: dl-magic-rewrite rules query-rel adn args generates {:rules <rewritten-rules> :seed <magic-seed>}. Each original rule is gated with a magic_<rel>^<adn>(bound) filter, and propagation rules are emitted for each positive non-builtin body literal. Worklist over (rel, adn) pairs starts from the query and stops when no new pairs appear. EDB facts pass through unchanged.
  • Sideways information passing strategy (SIPS): left-to-right dl-rule-sips rule head-adornment walks body literals tracking the bound set, returning ({:lit :adornment} ...). Recognises is/aggregate result-vars as new binders; comparisons and negation pass through with computed adornments. (Pluggable strategies are future work.)
  • dl-set-strategy! db strategy hook + dl-get-strategy db. Default :semi-naive. :magic is accepted, but no dl-saturate-magic! driver is wired up yet, so the saturator currently falls back to semi-naive. Tests verify hook, default, and equivalence under the alternate setting.
  • Equivalence test: rewritten ancestor program over the same EDB derives the same number of ancestor tuples and returns the same query answers as the unrewritten program (chain-3 case).
  • Perf test: 10k-node reachability with magic vs semi-naive. Pending — would need a dl-saturate-magic! driver that builds a temporary db from rewrite output. The rewriter itself is in place; benchmarking against semi-naive on large graphs is left to a future iteration.
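
The worklist rewrite described above can be sketched in Python as follows. This is an approximation for illustration, not the code in lib/datalog/magic.sx: `adorn`, `magic_lit`, and `magic_rewrite` are hypothetical names, the sketch handles only positive non-builtin body literals, and it takes the query literal directly instead of the (query-rel adn args) triple.

```python
def is_var(t):
    # Assumed encoding: variables are strings starting with uppercase or "_".
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def adorn(lit, bound):
    # "b" for constants and already-bound variables, "f" otherwise.
    return "".join("b" if (not is_var(a) or a in bound) else "f" for a in lit[1:])


def magic_lit(lit, adn):
    # magic_<rel>^<adn>(bound args of lit)
    args = tuple(a for a, c in zip(lit[1:], adn) if c == "b")
    return (f"magic_{lit[0]}^{adn}",) + args


def magic_rewrite(rules, query):
    adn0 = adorn(query, set())
    seed = magic_lit(query, adn0)
    out = []
    seen = {(query[0], adn0)}
    work = [(query[0], adn0)]
    while work:
        rel, adn = work.pop()
        for head, body in rules:
            if head[0] != rel or len(head) - 1 != len(adn):
                continue
            guard = magic_lit(head, adn)
            bound = {a for a, c in zip(head[1:], adn) if c == "b" and is_var(a)}
            prefix = [guard]
            for lit in body:
                lit_adn = adorn(lit, bound)
                # Propagation rule: magic for this literal follows from the
                # head's magic fact plus the body literals to its left.
                out.append((magic_lit(lit, lit_adn), list(prefix)))
                if (lit[0], lit_adn) not in seen:
                    seen.add((lit[0], lit_adn))
                    work.append((lit[0], lit_adn))
                prefix.append(lit)
                bound |= {a for a in lit[1:] if is_var(a)}
            # Adorned rule: the original rule gated by its magic guard.
            out.append((head, prefix))
    return {"rules": out, "seed": seed}


RULES = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
rewritten = magic_rewrite(RULES, ("ancestor", "tom", "X"))
print(rewritten["seed"])
for head, body in rewritten["rules"]:
    print(head, ":-", body)
```

For the query ancestor(tom, X) the seed is magic_ancestor^bf(tom), and the recursive rule contributes the propagation rule magic_ancestor^bf(Y) :- magic_ancestor^bf(X), parent(X, Y). Saturating the rewritten rules together with the seed and the EDB is the job of the driver that this plan still lists as pending.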

Phase 7 — stratified negation

  • Dependency graph: dl-build-dep-graph db returns {head -> ({:rel :neg} ...)}. Built-ins drop out (they're not relations).
  • Reachability via Floyd-Warshall in dl-build-reach; cycles detected by reach[A][B] && reach[B][A]. Programs are non-stratifiable iff any negative dependency falls inside an SCC. dl-check-stratifiable returns nil on success or a clear message.
  • dl-compute-strata propagates stratum numbers iteratively: stratum(R) = max over deps of (stratum(dep) + (1 if negated else 0)).
  • Saturator refactor: dl-saturate-rules! db rules is the semi-naive worker; dl-saturate! db rejects non-stratifiable programs, groups rules by head's stratum, and runs the worker on each stratum in increasing order.
  • not(P) in body: dl-match-negation walks the inner literal under the current subst and uses dl-match-positive — succeeds iff zero matches. Order-aware safety in dl-rule-check-safety (already present from Phase 4) requires negation vars to be bound by an earlier positive literal.
  • Tests in lib/datalog/tests/negation.sx (10): EDB and IDB negation, two-step strata, multi-level strata, with-arithmetic, empty-result and always-fail cases, non-stratifiability rejection, and a negation safety violation.
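
The stratification steps above (Floyd-Warshall reachability, rejection of negative edges inside a cycle, iterative stratum propagation) can be sketched together. This is illustrative Python; the function name `stratify` and the (relation, negated) edge encoding are assumptions, not the API of lib/datalog/strata.sx.

```python
def stratify(deps):
    """deps maps head relation -> list of (dep_relation, negated) edges.

    Returns {relation: stratum}; raises ValueError when a negative edge
    lies on a cycle (program is not stratifiable).
    """
    rels = set(deps) | {d for edges in deps.values() for d, _ in edges}

    # Floyd-Warshall transitive closure over the dependency graph.
    reach = {a: {b: False for b in rels} for a in rels}
    for head, edges in deps.items():
        for dep, _ in edges:
            reach[head][dep] = True
    for k in rels:
        for a in rels:
            for b in rels:
                if reach[a][k] and reach[k][b]:
                    reach[a][b] = True

    # A negative edge head -> dep is inside a cycle iff dep also reaches head.
    for head, edges in deps.items():
        for dep, negated in edges:
            if negated and reach[dep][head]:
                raise ValueError(f"{head} depends negatively on {dep} within a cycle")

    # stratum(R) = max over deps of stratum(dep) + (1 if negated else 0),
    # propagated until nothing changes.
    stratum = {r: 0 for r in rels}
    changed = True
    while changed:
        changed = False
        for head, edges in deps.items():
            for dep, negated in edges:
                need = stratum[dep] + (1 if negated else 0)
                if need > stratum[head]:
                    stratum[head] = need
                    changed = True
    return stratum


deps = {
    "reach": [("edge", False), ("reach", False)],
    "safe": [("reach", False), ("banned", True)],
}
print(stratify(deps))
```

The propagation loop terminates because the earlier check has already ruled out cycles through a negative edge; positive cycles add zero and settle immediately.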

Phase 8 — aggregation (Datalog+)

  • (count R V Goal), (sum R V Goal), (min R V Goal), (max R V Goal), (findall L V Goal) — first arg is the result variable, second is the aggregated variable, third is the goal literal. findall returns the distinct-value list itself; the others reduce. Live in lib/datalog/aggregates.sx.
  • dl-eval-aggregate: runs dl-find-bindings on the goal under the current subst (which provides outer-context bindings), collects distinct values of the aggregated var, applies the aggregate. count/sum produce 0 when no matches; min/max produce no binding (rule fails) when empty.
  • Group-by emerges naturally: outer-context vars in the goal are substituted from the current subst, so popular(P) :- post(P), count(N, U, liked(U, P)), >=(N, 3). correctly counts per-post.
  • Stratification: dl-aggregate-dep-edge returns a negation-like edge so the aggregate's goal relation is fully derived before the aggregate fires. Non-monotonicity respected.
  • Safety: aggregate body lit binds the result var; goal-internal vars are existentially quantified and don't need outer binding.
  • Tests in lib/datalog/tests/aggregates.sx (10): count siblings, sum prices, min/max scores, count over derived relation, empty-input cases for each operator, popularity threshold with group-by, distinct-counted-once.
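
The aggregate semantics above (distinct values, 0 for empty count/sum, no binding for empty min/max, group-by through outer bindings) can be illustrated with a short Python sketch. The function `aggregate`, the sample data, and the sorted order of the findall result are assumptions made for the example; None stands in for "no binding, rule fails".

```python
def aggregate(op, values):
    """Reduce the distinct values collected for the aggregated variable."""
    distinct = set(values)
    if op == "count":
        return len(distinct)          # 0 when there are no matches
    if op == "sum":
        return sum(distinct)          # 0 when there are no matches
    if op in ("min", "max"):
        if not distinct:
            return None               # no binding: the enclosing rule fails
        return min(distinct) if op == "min" else max(distinct)
    if op == "findall":
        return sorted(distinct)       # ordering is an assumption of this sketch
    raise ValueError(f"unknown aggregate {op!r}")


# popular(P) :- post(P), count(N, U, liked(U, P)), >=(N, 3).
posts = ["p1", "p2", "p3"]
liked = {("alice", "p1"), ("bob", "p1"), ("carol", "p1"), ("alice", "p2")}


def like_count(post):
    # P is bound by the outer literal post(P), so the goal liked(U, P) is
    # evaluated once per post: this is where group-by comes from.
    return aggregate("count", [u for (u, p) in liked if p == post])


popular = [p for p in posts if like_count(p) >= 3]
print(popular)
```

Because the reduction works on distinct values, a repeated value contributes once to count and sum in this sketch, matching the "distinct-counted-once" test described above.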

Phase 9 — SX embedding API

  • (dl-program-data facts rules) builds a db from SX data — facts is a list of literals, rules is a list of either dicts {:head … :body …} or lists (<head…> <- <body…>). Variables are SX symbols whose first char is uppercase or _, matching the parser's convention. (dl-program-data '((parent tom bob) (parent bob ann)) '((ancestor X Y <- (parent X Y)) (ancestor X Z <- (parent X Y) (ancestor Y Z))))
  • (dl-rule head body) constructor for the dict form.
  • (dl-query db '(ancestor tom X)) already worked — same query API consumes the SX-data goal. Now also accepts a list of body literals for conjunctive queries: (dl-query db '((p X) (q X))), (dl-query db (list '(n X) '(> X 2))). Auto-dispatched via dl-query-coerce on first-element shape.
  • (dl-assert! db '(parent ann pat)) → adds the fact and re-saturates.
  • (dl-retract! db '(parent bob ann)) → drops matching tuples from the EDB list, wipes every relation that has a rule (those are IDB), and re-saturates from the surviving EDB.
  • Tests in lib/datalog/tests/api.sx (9): closure via data API, dict-rule form, dl-rule constructor, dl-assert! incremental, dl-retract! removes derived, cyclic-graph reach via data, assert into empty db, fact-style rule (no arrow), coerce dict.
  • Integration demo: federation graph query — (ancestor actor1 actor2) over rose-ash ActivityPub follow relationships (Phase 10).

Phase 10 — Datalog as a query language for rose-ash

  • Schema sketches in lib/datalog/demo.sx:
    • Federation: (follows A B) → (mutual A B), (reachable A B), (foaf A C) (friend-of-a-friend, distinct).
    • Content: (authored A P), (liked U P), (tagged P T) → (post-likes P N) via aggregation, (popular P) for likes ≥ 3, (interesting Me P) joining follows + authored + popular.
    • Permissions: (member A G), (subgroup C P), (allowed G R) → (in-group A G) over transitive subgroups, (can-access A R).
    • Cooking-posts (the canonical example): (reach Me Them) over the follow graph, then (cooking-post-by-network Me P) joining reach + authored + (tagged P cooking).
  • Loader dl-load-from-db! — out of scope for this loop (would need to edit shared/services/ outside lib/datalog/). Programs in demo.sx already document the EDB shape expected from such a loader. dl-program-data consumes the same shape.
  • Query examples covered by lib/datalog/tests/demo.sx (10): mutuals, transitive reach, FOAF, popular posts, interesting feed, post likes count, direct/subgroup/transitive group access, no access without grant.
  • Service endpoint POST /internal/datalog — out of scope as above. Once exposed, server-side handler would be dl-program-data + dl-query, returning JSON-encoded substitutions.

Blockers

  • Saturation perf: three rounds done.
    • hash-set membership in dl-add-fact! (Phase 5b)
    • indexed iteration in dl-find-bindings (Phase 5c)
    • first-arg index per relation (Phase 5e) — when a body literal's first arg walks to a non-variable, dl-match-positive looks up by (str arg) instead of scanning the full relation. chain-25 saturation drops from ~33s to ~18s real (10s user). chain-50 still long (~120s+) due to dict-copy overhead in unification subst threading. Future: per-rule "compiled" body with pre-resolved var positions, slot-based subst representation to avoid assoc per binding.
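
The two index layers mentioned above (a membership set keyed by a tuple string, and a first-argument index) can be pictured with this Python sketch. The class `Relation` and its methods are hypothetical; the db in lib/datalog/db.sx stores these as :facts-keys and :facts-index dicts, and Python's str does not reproduce SX's number printing.

```python
from collections import defaultdict


class Relation:
    """One relation's tuples plus two lookup indices (illustrative only)."""

    def __init__(self):
        self.tuples = []                      # insertion-ordered tuple list
        self.keys = set()                     # membership index: string key per tuple
        self.by_first = defaultdict(list)     # first-arg index: key -> tuples

    def add(self, tup):
        key = str(tup)
        if key in self.keys:                  # O(1) duplicate check instead of a list scan
            return False
        self.keys.add(key)
        self.tuples.append(tup)
        self.by_first[str(tup[0])].append(tup)
        return True

    def candidates(self, first_arg=None):
        # With a bound first argument, only the matching bucket is scanned;
        # otherwise fall back to the full relation.
        if first_arg is None:
            return self.tuples
        return self.by_first.get(str(first_arg), [])


parent = Relation()
for pair in [("a", "b"), ("b", "c"), ("a", "d"), ("a", "b")]:
    parent.add(pair)
print(parent.candidates("a"))
```

A matcher would call candidates with the walked first argument when it is a non-variable, which is the lookup-instead-of-scan step the Phase 5e note describes.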

Progress log

Newest first.

  • 2026-05-08 — Phase 6 magic-sets rewriter. dl-magic-rewrite rules query-rel adn args returns {:rules <rewritten> :seed <seed-fact>}. Worklist over (rel, adn) pairs starts from the query, gates each original rule with a magic_<rel>^<adn>(bound) filter, and emits propagation rules for each positive non-builtin body literal so that magic spreads to body relations. EDB facts pass through. 3 new tests cover seed structure, equivalence on chain-3 by ancestor-relation tuple count, and same-query-answers under the rewritten program. The plumbing for a dl-saturate-magic! driver and large-graph perf benchmarks is still future work.

  • 2026-05-08 — Phase 6 building blocks for the magic-sets transformation: dl-magic-rel-name, dl-magic-lit, dl-bound-args. The rewriter that generates magic seed and propagation rules is still future work; with these primitives in place it's a straightforward worklist algorithm. 4 new tests.

  • 2026-05-08 — Phase 6 adornments + SIPS in lib/datalog/magic.sx. Inspection helpers — dl-adorn-goal and dl-adorn-lit compute per-arg b/f patterns under a bound set; dl-rule-sips rule head-adornment walks body literals left-to-right propagating the bound set, recognising is and aggregate result-vars as new binders. Lays groundwork for a later magic-sets transformation. 10 new tests cover pure adornment, SIPS over a chain rule, head-fully-bound rules, comparisons, and is. Saturator does not yet consume these.

  • 2026-05-08 — Comprehensive integration test in api suite: a single program exercising recursion (reach transitive closure) + stratified negation (safe X Y :- reach X Y, not banned Y) + aggregation (reach_count via count) + comparison (>= N 2) composed end-to-end via dl-eval source query-source. Confirms the full pipeline (parser → safety → stratifier → semi-naive + aggregate post-pass → query) on a non-trivial program.

  • 2026-05-08 — Bug fix: aggregates work as top-level query goals. dl-match-lit (the naive matcher used by dl-find-bindings) was missing the dl-aggregate? dispatch — it was only present in dl-fbs-aux (semi-naive). Symptom: (dl-query db '(count N X (p X))) silently returned (). Also updated dl-query-user-vars to project only the result var (first arg) of an aggregate goal — the aggregated var and inner-goal vars are existentials and should not appear in the projected substitution. 2 new aggregate tests cover the regression.

  • 2026-05-08 — Convenience: dl-eval source query-source. Parses both strings, builds a db, saturates, runs the query, returns the substitution list. Single-call user-friendly entry. 2 new api tests cover ancestor and multi-goal queries.

  • 2026-05-08 — Phase 6 stub: dl-set-strategy! db strategy and dl-get-strategy db user-facing hooks. Default :semi-naive; :magic is accepted but the actual transformation is deferred, so saturation still uses semi-naive. Lets us tick the "Optional pass — guarded behind dl-set-strategy!" Phase 6 box. 3 new eval tests.

  • 2026-05-08 — Demo: weighted-DAG shortest path. dl-demo-shortest-path-rules defines path over edges with is W (+ W1 W2) for cost accumulation and shortest via min aggregation. 3 demo tests cover direct/multi-hop choice, multi-hop wins on cheaper route, and unreachable-empty. Added dl-summary db inspection helper returning {<rel>: count} (4 eval tests).

  • 2026-05-08 — Phase 5e perf: first-arg index per relation. db gains :facts-index {<rel>: {<first-arg-key>: tuples}} mirroring the existing :facts-keys membership index. dl-add-fact! populates it; dl-match-positive walks the body literal's first arg under the current subst — if it's bound to a non-var, look up by (str arg) and iterate only the matching subset. chain-25 saturation 33s → 18s real (~2x). chain-50 still slow (~120s+) but tractable; next bottleneck is subst dict copies during unification. Differential test bumped to chain-12, semi-only count to chain-25.

  • 2026-05-08 — Demo: tag co-occurrence. (cotagged P T1 T2) — post has both T1 and T2 with T1 != T2 — and (tag-pair-count T1 T2 N) counting posts per distinct tag pair. Demonstrates count aggregation grouped by outer-context vars. 2 new demo tests.

  • 2026-05-08 — dl-query accepts a list of body literals for conjunctive queries, in addition to a single positive literal. dl-query-coerce dispatches based on the first element's shape: positive lit (head is a symbol) or :neg dict → wrap as singleton; list of lits → use as-is. dl-query-user-vars collects the union of vars across all goals (deduped, _ filtered) for projection. 2 new api tests: multi-goal AND, and conjunction with comparison.

  • 2026-05-08 — Bug fix: dl-check-stratifiable now rejects recursion through aggregation (e.g., q(N) :- count(N, X, q(X))). The stratifier was already adding negation-like edges for aggregates, but the cycle scan only looked at explicit :neg literals. Added the matching aggregate branch to the body iteration. Also adds doc-only lib/datalog/datalog.sx with the public-API surface (since load is an epoch command and can't recurse from within an .sx file). 3 new aggregate tests cover recursion-rejection, negation-and-aggregation coexistence, and min-over-empty-derived.

  • 2026-05-08 — Phase 10 demo + canonical query. Added the "cooking posts by people I follow (transitively)" example from the plan: dl-demo-cooking-rules defines reach over the follow graph (recursive transitive closure) and cooking-post-by-network that joins reach with authored and (tagged P cooking). 3 demo tests cover transitive network, direct-only follow, and empty-network cases.

  • 2026-05-08 — Phase 8 extension: findall L V Goal aggregate. Bind L to the list of distinct V values for which Goal holds (or the empty list when no matches). Implemented as a one-line case in dl-do-aggregate. 3 new tests: EDB, derived relation, empty. Useful for "give me all the X such that …" queries without scalar reduction.

  • 2026-05-08 — Phase 5d semantic fix: anonymous _ variables are renamed per occurrence at dl-add-rule! and dl-query time so (p X _) (p _ Y) no longer unifies the two _s. New helpers dl-rename-anon-term, dl-rename-anon-lit, dl-make-anon-renamer, dl-rename-anon-rule in db.sx; eval.sx's dl-query renames the goal before search and projects only user-named vars (_ is filtered out of the projection list). The "underscore in head" test now correctly rejects (p X _) :- q(X). — after renaming, the head's fresh anon var has no body binder. Two new eval tests verify rule-level and goal-level independence. 155/155 expected.

  • 2026-05-08 — Phase 5c perf: indexed dl-find-bindings. Replaced the recursive (rest lits) walk with dl-fb-aux lits db subst i n using nth lits i. Eliminates O(N²) list-copy per body of length N. chain-15 saturation 25s → 16s; chain-25 finishes in 33s real (vs. timeout previously). Bumped semi_naive tests: differential on chain-10, semi-only count on chain-15 (was chain-5/chain-5). 153/153.

  • 2026-05-08 — Phase 10 syntactic demo. New lib/datalog/demo.sx with three programs over rose-ash-shaped data: federation (mutual, reachable, foaf), content recommendation (post-likes via count aggregation, popular, interesting), and role-based permissions (in-group over transitive subgroups, can-access). 10 demo tests pass against synthetic EDB tuples. Postgres loader and /internal/datalog HTTP endpoint remain out of scope for this loop (they need service-tree edits beyond lib/datalog/**). Conformance now 153/153.

  • 2026-05-08 — Phase 5b perf: hash-set membership in dl-add-fact!. db gains a parallel :facts-keys {<rel>: {<tuple-string>: true}} index alongside :facts. dl-tuple-key derives a stable string key via (str lit); (p 30) and (p 30.0) collide correctly because SX prints them identically. Insertion is O(1) instead of O(n). chain-7 saturation drops from ~12s to ~6s; chain-15 from ~50s to ~25s under shared CPU. Larger chains are still slow due to body-join overhead in dl-find-bindings (Blocker updated). dl-retract! updated to keep both indices consistent. 143/143.

  • 2026-05-08 — Phase 9 done. New lib/datalog/api.sx exposes a parser-free embedding: dl-program-data facts rules accepts SX data lists, with rules in either dict form or list form using <- as the rule arrow (since SX parses :- as a keyword). dl-rule head body constructs the dict. dl-assert! db lit adds a fact and re-saturates; dl-retract! db lit drops the fact from EDB, wipes all rule-headed (IDB) relations, and re-saturates from scratch — the simplest correct semantics until provenance tracking arrives in a later phase. 9 API tests; conformance now 143/143.

  • 2026-05-08 — Phase 8 done. New lib/datalog/aggregates.sx (~110 LOC): count / sum / min / max. Each is a body literal of shape (op R V Goal). dl-eval-aggregate runs dl-find-bindings on the goal under the outer subst (so outer vars in the goal get substituted, giving group-by-style aggregation), collects the distinct values of V, and binds R. Empty input: count/sum return 0; min/max produce no binding (rule fails). Stratifier extended via dl-aggregate-dep-edge so the aggregate's goal relation is fully derived before the aggregate fires. Safety check treats goal-internal vars as existentials (no outer binding required); only the result var becomes bound. Conformance now 134 / 134.

  • 2026-05-08 — Phase 7 done (Phase 6 magic sets deferred — opt-in, semi-naive default suffices for current test suite). New lib/datalog/strata.sx (~290 LOC): dep graph build, Floyd-Warshall reachability, SCC-via-mutual-reachability for non-stratifiability detection, iterative stratum computation, rule grouping by head stratum. eval.sx split: dl-saturate-rules! is the per-rule-set semi-naive worker, dl-saturate! is now the stratified driver (errors out on non-stratifiable programs). dl-match-negation in eval.sx: succeeds iff inner positive match is empty. Stratum-keyed dicts use (str s) since SX dicts only accept string/keyword keys. 10 negation tests cover EDB/IDB negation, multi-level strata, non-stratifiability rejection, and a negation safety violation.

  • 2026-05-08 — Phase 5 done. lib/datalog/eval.sx rewritten to semi-naive default. dl-saturate! tracks a per-relation delta and on each iteration walks every positive body position substituting delta for that one literal — joining the rest against the full DB snapshot. dl-saturate-naive! retained as the reference. Rules with no positive body literal (e.g. (p X) :- (= X 5).) fall back to a naive one-shot via dl-collect-rule-candidates. 8 tests differentially compare the two saturators using per-relation tuple counts (cheap). Chain-5 differential exercises multi-iteration recursive saturation. Larger chains made conformance.sh time out due to O(n) dl-tuple-member? × CPU sharing with other loop agents — added a Blocker to swap to a hash-set for membership. Also tightened dl-tuple-member? to use indexed iteration instead of recursive rest (was creating a fresh list per step).

  • 2026-05-07 — Phase 4 done. lib/datalog/builtins.sx (~280 LOC) adds (< X Y), (<= X Y), (> X Y), (>= X Y), (= X Y), (!= X Y), and (is X expr) with + - * /. dl-eval-builtin dispatches; dl-eval-arith recursively evaluates nested compounds. Safety check is now order-aware — it walks body literals left-to-right tracking the bound set, requires comparison/is inputs to be already bound, and special-cases = (binds the var-side; both sides must include at least one bound to bind the other). Phase 3's simple safety check stays in db.sx as a forward-reference fallback; builtins.sx redefines dl-rule-check-safety to the comprehensive version. eval.sx's dl-match-lit now dispatches built-ins through dl-eval-builtin. 19 builtins tests; conformance 106 / 106.

  • 2026-05-07 — Phase 3 done. lib/datalog/db.sx (~250 LOC) holds facts indexed by relation name plus the rules list, with dl-add-fact! / dl-add-rule! (rejects non-ground facts and unsafe rules); lib/datalog/eval.sx (~150 LOC) implements the naive bottom-up fixpoint via dl-find-bindings/dl-match-positive/dl-saturate! and dl-query (deduped projected substitutions). Safety analysis rejects unsafe head vars at load time. Negation and arithmetic built-ins raise clean errors (lifted in later phases). 15 eval tests cover transitive closure, sibling, same-generation, cyclic graph reach, and six safety violations. Conformance 87 / 87.

  • 2026-05-07 — Phase 2 done. lib/datalog/unify.sx (~140 LOC): dl-var? (case + underscore), dl-walk, dl-bind, dl-unify (returns extended dict subst or nil), dl-apply-subst, dl-ground?, dl-vars-of. Substitutions are immutable dicts; assoc builds extended copies. 28 unify tests; conformance now 72 / 72.

  • 2026-05-07 — Phase 1 done. lib/datalog/tokenizer.sx (~190 LOC) emits {:type :value :pos} tokens; lib/datalog/parser.sx (~150 LOC) produces {:head … :body …} / {:query …} clauses, with nested compounds permitted for arithmetic and not(...) desugared to {:neg …}. 44 / 44 via bash lib/datalog/conformance.sh (26 tokenize + 18 parse). Local helpers namespace-prefixed (dl-emit!, dl-peek) after a host-primitive shadow clash. Test harness uses a custom dl-deep-equal? that handles out-of-order dict keys and number repr (equal? fails on dict key order and on 30 vs 30.0).