The renamer for anonymous `_` variables started at counter 0 and
produced `_anon1, _anon2, ...` unconditionally. A user writing the
same naming convention would see their variables shadowed:
(dl-eval "p(a, b). p(c, d). q(_anon1) :- p(_anon1, _)."
"?- q(X).")
=> () ; should be ({:X a} {:X c})
The `_` got renamed to `_anon1` too, collapsing the two positions
of `p` to a single var (forcing args to be equal — which neither
tuple satisfies).
Fix: scan each rule (and query goal) for the highest `_anon<N>`
already present and start the renamer past it. New helpers
`dl-max-anon-num` / `dl-max-anon-num-list` / `dl-try-parse-int`
walk the rule tree; `dl-make-anon-renamer` now takes a `start`
argument; `dl-rename-anon-rule` and the query-time renamer in
`dl-query` both compute the start from the input.
1 regression test; conformance 275/275.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
40 KiB
Datalog-on-SX: Datalog on the CEK/VM
Datalog is a declarative query language: a restricted subset of Prolog with no function symbols, only relations. Programs are sets of facts and rules; queries ask what follows. Evaluation is bottom-up (fixpoint iteration) rather than Prolog's top-down DFS — which means no infinite loops, guaranteed termination, and efficient incremental updates.
The unique angle: Datalog is a natural companion to the Prolog implementation already in
progress (lib/prolog/). The parser and term representation can share infrastructure;
the evaluator is an entirely different fixpoint engine rather than a DFS solver.
End-state goal: full core Datalog (facts, rules, stratified negation, aggregation, recursion) with a clean SX query API, and a demonstration of Datalog as a query engine for rose-ash data (e.g. federation graph, content relationships).
Status (rolling)
bash lib/datalog/conformance.sh → 275/275 across 11 suites
(tokenize, parse, unify, eval, builtins, semi_naive, negation, aggregates,
api, magic, demo). Source is ~3100 LOC, tests ~2900 LOC, public API
documented in lib/datalog/datalog.sx.
Phases 1–9 are functionally complete; Phase 10 covers the rose-ash
domain demos (in lib/datalog/demo.sx — federation, content,
permissions, cooking-posts, tag co-occurrence, shortest path, org chart).
The PostgreSQL loader and /internal/datalog HTTP endpoint listed in
Phase 10 require service-tree edits outside lib/datalog/** and are
flagged as out-of-scope for this loop.
Ground rules
- Scope: only touch
lib/datalog/**andplans/datalog-on-sx.md. Do not editspec/,hosts/,shared/,lib/prolog/**, or otherlib/<lang>/. - Shared-file issues go under "Blockers" below with a minimal repro; do not fix here.
- SX files: use
sx-treeMCP tools only. - Architecture: Datalog source → term AST → fixpoint evaluator. No transpiler to SX AST — the evaluator is written in SX and works directly on term structures.
- Reference: Ramakrishnan & Ullman "A Survey of Deductive Database Systems"; Dalmau "Datalog and Constraint Satisfaction".
- Commits: one feature per commit. Keep
## Progress logupdated and tick boxes.
Architecture sketch
Datalog source text
│
▼
lib/datalog/tokenizer.sx — atoms, variables, numbers, strings, punct (?- :- , . ( ) [ ])
│
▼
lib/datalog/parser.sx — facts: atom(args). rules: head :- body. queries: ?- goal.
│ No function symbols (only constants and variables in args).
▼
lib/datalog/db.sx — extensional DB (EDB): ground facts; IDB: derived relations;
│ clause index by relation name/arity
▼
lib/datalog/eval.sx — bottom-up fixpoint: semi-naive evaluation with delta sets;
│ stratification for negation; incremental update API
▼
lib/datalog/query.sx — query API: (datalog-query db goal) → list of substitutions;
SX embedding: define facts/rules as SX data directly
Key differences from Prolog:
- No function symbols — args are atoms, numbers, strings, or variables only. No
f(a,b). - No cuts — no procedural control.
- Bottom-up — derive all consequences of all rules before answering; no search tree.
- Termination guaranteed — no infinite derivation chains (no function symbols → finite Herbrand base).
- Stratified negation —
not(P)legal iff P does not recursively depend on its own negation. - Aggregation —
count,sum,min,maxover derived tuples (Datalog+).
Roadmap
Phase 1 — tokenizer + parser
- Tokenizer: atoms (lowercase/quoted), variables (uppercase/
_), numbers, strings, punct (( ),,,.), operators (:-,?-,<=,>=,!=,<,>,=,+,-,*,/), comments (%,/* */) Note: no function symbol syntax (no nestedf(...)in arg position) — but the parser permits nested compounds for arithmetic; safety analysis (Phase 3) rejects non-arithmetic nesting. - Parser:
- Facts:
parent(tom, bob).→{:head (parent tom bob) :body ()}- Rules:ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).→{:head (ancestor X Z) :body ((parent X Y) (ancestor Y Z))}- Queries:?- ancestor(tom, X).→{:query ((ancestor tom X))}(:queryvalue is always a list of literals;?- p, q.→{:query ((p) (q))}) - Negation:not(parent(X,Y))in body position →{:neg (parent X Y)} - Tests in
lib/datalog/tests/parse.sx(18) andlib/datalog/tests/tokenize.sx(26). Conformance harness:bash lib/datalog/conformance.sh→ 44 / 44 passing.
Phase 2 — unification + substitution
- Ported (not shared) from
lib/prolog/— term walk, no occurs check. dl-unify t1 t2 subst→ extended subst dict, ornilon failure.dl-walk,dl-bind,dl-apply-subst,dl-ground?,dl-vars-of.- Substitutions are immutable dicts keyed by variable name (string). Lists/tuples unify element-wise (used for arithmetic compounds too).
- Tests in
lib/datalog/tests/unify.sx(28). 72 / 72 conformance.
Phase 3 — extensional DB + naive evaluation + safety analysis
- EDB+IDB combined:
{:facts {<rel-name-string> -> (literal ...)}}— relations indexed by name; tuples stored as full literals so they unify directly. Dedup on insert viadl-tuple-equal?. dl-add-fact! db lit(rejects non-ground) anddl-add-rule! db rule(rejects unsafe).dl-program sourceparses + loads in one step.- Naive evaluation
dl-saturate! db: iterate rules until no new tuples.dl-find-bindingsrecursively joins body literals;dl-match-positiveunifies a literal against every tuple in the relation. dl-query db goal→ list of substitutions overgoal's vars, deduplicated.dl-relation db namefor derived tuples.- Safety analysis at
dl-add-rule!time: every head variable except_must appear in some positive body literal. Built-ins and negated literals do not satisfy safety. Helpersdl-positive-body-vars,dl-rule-unsafe-head-varsexposed for later phases. - Negation and arithmetic built-ins error cleanly at saturate time (Phase 4 / Phase 7 will swap in real semantics).
- Tests in
lib/datalog/tests/eval.sx(15): transitive closure, sibling, same-generation, grandparent, cyclic graph reach, six safety cases. 87 / 87 conformance.
Phase 4 — built-in predicates + body arithmetic
Almost every real query needs <, =, simple arithmetic, and string
comparisons in body position. These are not EDB lookups — they're
constraints that filter bindings.
- Recognise built-in predicates in body:
(< X Y),(<= X Y),(> X Y),(>= X Y),(= X Y),(!= X Y)and arithmetic forms(is Z (+ X Y)),(is Z (- X Y)),(is Z (* X Y)),(is Z (/ X Y)). Live inlib/datalog/builtins.sx. dl-eval-builtindispatches;dl-eval-arithrecursively evaluates(+ a b)etc. with full nesting.=unifies;!=rejects equal ground terms.- Order-aware safety analysis (
dl-rule-check-safety): walks body left-to-right tracking which vars are bound.is's RHS vars must be already bound; LHS becomes bound. Comparisons require both sides bound.=is special-cased — at least one side bound binds the other. Negation vars must be bound (will be enforced fully in Phase 7). - Wired through SX numeric primitives — no separate number tower.
- Tests in
lib/datalog/tests/builtins.sx(19): range filters, arithmetic derivations, equality binding, eight safety violations and three safe-shape tests. Conformance 106 / 106.
Phase 5 — semi-naive evaluation (performance)
- Delta sets
{rel-name -> tuples}track newly derived tuples per iter.dl-snapshot-factsbuilds the initial delta from the EDB. - Semi-naive rule: for each rule, walk every positive body literal
position; substitute that one with the per-relation delta and join
the rest against the previous-iteration DB (
dl-find-bindings-semi). Candidates are collected before mutating the DB so the "full" sides see a consistent snapshot. dl-collect-rule-candidatesfalls back to a naive single pass when a rule has no positive body literal (e.g.(p X) :- (= X 5).).dl-saturate!is now semi-naive by default;dl-saturate-naive!kept for differential testing and a reference implementation.- Tests in
lib/datalog/tests/semi_naive.sx(8) — every recursive program from earlier suites is run under both saturators with per-relation tuple counts compared (cheap, robust under bundled conformance session). A chain-5 differential exercises multiple semi-naive iterations against the recursive ancestor rule. Larger chains hit prohibitive wall-clock under conformance CPU contention with other agents — a future Blocker tracks switchingdl-tuple-member?from O(n²) list scan to a hash-set per relation.
Phase 6 — magic sets (goal-directed bottom-up, opt-in)
Naive bottom-up derives all consequences before answering. Magic sets rewrite the program so the fixpoint only derives tuples relevant to the goal — a major perf win for "what's reachable from node X" queries on large graphs.
- Adornments:
dl-adorn-goal goalanddl-adorn-lit lit boundinlib/datalog/magic.sx. Per-argb/fbased on whether the arg is a constant or a variable already in the bound set. - Magic transformation:
dl-magic-rewrite rules query-rel adn argsgenerates{:rules <rewritten-rules> :seed <magic-seed>}. Each original rule is gated with amagic_<rel>^<adn>(bound)filter, and propagation rules are emitted for each positive non-builtin body literal. Worklist over(rel, adn)pairs starts from the query and stops when no new pairs appear. EDB facts pass through unchanged. - Sideways information passing strategy (SIPS): left-to-right
dl-rule-sips rule head-adornmentwalks body literals tracking the bound set, returning({:lit :adornment} ...). Recognisesis/aggregate result-vars as new binders; comparisons and negation pass through with computed adornments. (Pluggable strategies are future work.) dl-set-strategy! db strategyhook +dl-get-strategy db. Default:semi-naive.:magicaccepted but the transformation itself is deferred — saturator currently falls back to semi-naive. Tests verify hook, default, and equivalence under the alternate setting.- Equivalence test: rewritten ancestor program over the same EDB
derives the same number of
ancestortuples and returns the same query answers as the unrewritten program (chain-3 case). dl-magic-query db query-goal— top-level driver. Builds a fresh internal db with the caller's EDB facts, the magic seed, and the rewritten rules; saturates and queries. Caller's db is untouched. Equivalent todl-queryfor fully-stratifiable programs (sole motivation is a perf alternative on goal-shaped queries against large recursive relations).- Perf test: 10k-node reachability with magic vs semi-naive. Left to a future iteration — would need a benchmarking harness for large graphs and the conformance budget can't afford it.
Phase 7 — stratified negation
- Dependency graph:
dl-build-dep-graph dbreturns{head -> ({:rel :neg} ...)}. Built-ins drop out (they're not relations). - Reachability via Floyd-Warshall in
dl-build-reach; cycles detected byreach[A][B] && reach[B][A]. Programs are non-stratifiable iff any negative dependency falls inside an SCC.dl-check-stratifiablereturns nil on success or a clear message. dl-compute-stratapropagates stratum numbers iteratively:stratum(R) = max over deps of (stratum(dep) + (1 if negated else 0)).- Saturator refactor:
dl-saturate-rules! db rulesis the semi- naive worker;dl-saturate! dbrejects non-stratifiable programs, groups rules by head's stratum, and runs the worker on each stratum in increasing order. not(P)in body:dl-match-negationwalks the inner literal under the current subst and usesdl-match-positive— succeeds iff zero matches. Order-aware safety indl-rule-check-safety(already present from Phase 4) requires negation vars to be bound by an earlier positive literal.- Tests in
lib/datalog/tests/negation.sx(10): EDB and IDB negation, two-step strata, multi-level strata, with-arithmetic, empty-result and always-fail cases, non-stratifiability rejection, and a negation safety violation.
Phase 8 — aggregation (Datalog+)
(count R V Goal),(sum R V Goal),(min R V Goal),(max R V Goal),(findall L V Goal)— first arg is the result variable, second is the aggregated variable, third is the goal literal.findallreturns the distinct-value list itself; the others reduce. Live inlib/datalog/aggregates.sx.dl-eval-aggregate: runsdl-find-bindingson the goal under the current subst (which provides outer-context bindings), collects distinct values of the aggregated var, applies the aggregate.count/sumproduce 0 when no matches;min/maxproduce no binding (rule fails) when empty.- Group-by emerges naturally: outer-context vars in the goal are
substituted from the current subst, so
popular(P) :- post(P), count(N, U, liked(U, P)), >=(N, 3).correctly counts per-post. - Stratification:
dl-aggregate-dep-edgereturns a negation-like edge so the aggregate's goal relation is fully derived before the aggregate fires. Non-monotonicity respected. - Safety: aggregate body lit binds the result var; goal-internal vars are existentially quantified and don't need outer binding.
- Tests in
lib/datalog/tests/aggregates.sx(10): count siblings, sum prices, min/max scores, count over derived relation, empty-input cases for each operator, popularity threshold with group-by, distinct-counted-once.
Phase 9 — SX embedding API
(dl-program-data facts rules)builds a db from SX data —factsis a list of literals,rulesis a list of either dicts{:head … :body …}or lists(<head…> <- <body…>). Variables are SX symbols whose first char is uppercase or_, matching the parser's convention.(dl-program-data '((parent tom bob) (parent bob ann)) '((ancestor X Y <- (parent X Y)) (ancestor X Z <- (parent X Y) (ancestor Y Z))))(dl-rule head body)constructor for the dict form.(dl-query db '(ancestor tom X))already worked — same query API consumes the SX-data goal. Now also accepts a list of body literals for conjunctive queries:(dl-query db '((p X) (q X))),(dl-query db (list '(n X) '(> X 2))). Auto-dispatched viadl-query-coerceon first-element shape.(dl-assert! db '(parent ann pat))→ adds the fact and re-saturates.(dl-retract! db '(parent bob ann))→ drops matching tuples from the EDB list, wipes every relation that has a rule (those are IDB), and re-saturates from the surviving EDB.- Tests in
lib/datalog/tests/api.sx(9): closure via data API, dict-rule form, dl-rule constructor, dl-assert! incremental, dl-retract! removes derived, cyclic-graph reach via data, assert into empty db, fact-style rule (no arrow), coerce dict. - Integration demo: federation graph query —
(reachable A B)/(mutual A B)/(foaf A C)over(follows ACTOR-A ACTOR-B)inlib/datalog/demo.sx. Tests inlib/datalog/tests/demo.sx. Wiring this to actual rose-ash ActivityPub data is Phase 10 service work and is out of scope for this loop.
Phase 10 — Datalog as a query language for rose-ash
- Schema sketches in
lib/datalog/demo.sx: - Federation:(follows A B)→(mutual A B),(reachable A B),(foaf A C)(friend-of-a-friend, distinct). - Content:(authored A P),(liked U P),(tagged P T)→(post-likes P N)via aggregation,(popular P)for likes ≥ 3,(interesting Me P)joining follows + authored + popular. - Permissions:(member A G),(subgroup C P),(allowed G R)→(in-group A G)over transitive subgroups,(can-access A R). - Cooking-posts (the canonical example):(reach Me Them)over the follow graph, then(cooking-post-by-network Me P)joining reach + authored +(tagged P cooking). - Loader
dl-load-from-db!— out of scope for this loop (would need to editshared/services/outsidelib/datalog/). Programs indemo.sxalready document the EDB shape expected from such a loader.dl-program-dataconsumes the same shape. - Query examples covered by
lib/datalog/tests/demo.sx(10): mutuals, transitive reach, FOAF, popular posts, interesting feed, post likes count, direct/subgroup/transitive group access, no access without grant. - Service endpoint
POST /internal/datalog— out of scope as above. Once exposed, server-side handler would bedl-program-data+dl-query, returning JSON-encoded substitutions.
Blockers
- Saturation perf: three rounds done.
- hash-set membership in
dl-add-fact!(Phase 5b) - indexed iteration in
dl-find-bindings(Phase 5c) - first-arg index per relation (Phase 5e) — when a body literal's
first arg walks to a non-variable, dl-match-positive looks up
by
(str arg)instead of scanning the full relation. chain-25 saturation drops from ~33s to ~18s real (10s user). chain-50 still long (~120s+) due to dict-copy overhead in unification subst threading. Future: per-rule "compiled" body with pre-resolved var positions, slot-based subst representation to avoidassocper binding.
- hash-set membership in
Progress log
Newest first.
-
2026-05-11 — Anonymous-variable renamer collided with user-written
_anon<N>symbols. The renamer started counter at 0 and produced_anon1, _anon2, ...unconditionally; if the user wroteq(_anon1) :- p(_anon1, _).the_got renamed to_anon1too, collapsing the two positions ofpto a single var and returning the empty result instead of{a, c}. Fix: scan each rule (and query) for the max_anon<N>and start the renamer past it. The renamer constructor now takes astartarg; new helpersdl-max-anon-num/dl-max-anon-num-listwalk the rule tree. 1 regression test; 275/275. -
2026-05-11 —
dl-magic-querycould silently diverge fromdl-querywhen an aggregate's inner-goal relation was IDB. The rewriter passes aggregate body lits through unchanged (no magic propagation for them), so the inner relation was empty in the magic db and the aggregate returned 0. Probe:dl-eval-magic "u(a). u(b). u(c). u(d). banned(b). banned(d). active(X) :- u(X), not(banned(X)). n(N) :- count(N, X, active(X))." "?- n(N)."returnedN=0instead ofN=2. Fix:dl-magic-querynow pre-saturates the source db before copying facts into the magic db. This guarantees equivalence withdl-queryfor every stratified program; the magic benefit comes from goal-directed re-derivation of the query relation under the seed (which still matters for large recursive joins). The existing test suite's aggregate cases happened to dodge this because the inner goals were all EDB. 1 new regression test; 274/274. -
2026-05-11 — Anonymous
_in a negated literal was incorrectly flagged by the safety check. The canonical idiomorphan(X) :- person(X), not(parent(X, _))was rejected with "negation refers to unbound variable(s) ("_anon1")" because the parser renames each_to a fresh_anon*symbol and the negation safety walk demanded all vars in the negated lit be bound by an earlier positive body literal. Anonymous vars in negation are existentially quantified — they shouldn't need outer binding. Addeddl-non-anon-varsfilter;dl-process-neg!now strips_anon*names fromneededbefore the binding check. 2 new regression tests; 273/273. -
2026-05-11 — Compound terms in fact-arg / rule-head positions were silently stored as unreduced expressions.
p(+(1, 2)).resulted in a tuple(p (+ 1 2))(dl-ground? sees no free variables, so it passed).double(*(X, 2)) :- n(X).saturated todouble((* 3 2))rather thandouble(6). Datalog has no function symbols in arg positions —dl-add-fact!anddl-add-rule!now reject compound args via a newdl-simple-term?(number / string / symbol). Compounds remain legal in body literals where they encodeis/ arithmetic / aggregate sub-goals. 2 new regression tests; 271/271. -
2026-05-11 — Quoted atoms with uppercase-or-underscore-leading names were misclassified as variables.
p('Hello World').ran through the tokenizer's"atom"branch and through the parser'sstring->symbol, producing a symbol named "Hello World". dl-var? checks the first character — "H" is uppercase, so the fact was rejected as non-ground. Fix: tokenizer emits"string"for any'...'quoted form, so quoted atoms become opaque string constants (matching how Datalog idiomatically treats them — the alternative was a per-symbol "quoted" marker which would have rippled through unification and dl-var?). Updated the existing tokenize test and added one for'Hello'; also added a parse-level regression. 269/269. -
2026-05-11 — Type-mixed comparisons were silently inconsistent:
<(X, 5)withXbound to a string returned()(no result, no error), whileXbound to a symbol raised "Expected number, got symbol". Both should fail loudly. Addeddl-compare-typeok?—<,<=,>,>=now require both operands to share a primitive type (both numbers or both strings) and raise otherwise.!=is exempted since it's a polymorphic inequality test built ondl-tuple-equal?. 2 new regression tests; 267/267. -
2026-05-11 — Body literal shape validation in
dl-rule-check-safety: a dict that isn't{:neg ...}(e.g. typo'd{:negs ...}) used to silently fall through every dispatch clause, contributing zero bound vars; the user would then see a confusing "head var X unbound" error pointing at the head, not the malformed body. Same for body lits that are bare numbers / strings / symbols. Both shapes now raise a clear error naming the offending lit. 1 new regression test; 265/265. -
2026-05-11 — Division by zero in
issilently produced IEEE infinity instead of raising.is(R, /(X, 0))returnedR = inf, which then flowed through comparisons and aggregations to produce nonsense results.dl-eval-arithnow raises with a clear "division by zero in " message. 1 new test; 264/264. -
2026-05-11 — Aggregate variable validation:
count(N, Y, p(X))silently returnedN = 1becauseYwas never bound inp(X)— every match contributed the same unbound symbol, which dl-val-member? deduped to a single entry. Similarlysum(S, Y, p(X))raised a confusing "expected number" error from the underlying+. Added a third validator indl-eval-aggregate: the agg-var must appear in the goal literal. Error names the variable and the goal and explains the consequence. 1 new test; 263/263. -
2026-05-11 —
dl-retract!was silently destroying EDB facts in "mixed" relations (those with BOTH user-asserted facts AND a rule defining the same head). The retract pass wiped every rule-head relation wholesale and then re-saturated — but the saturator only re-derives the IDB portion, so explicit EDB facts vanished even for a no-op retract of a non-existent tuple. Probe:(let ((db (dl-program "p(a). p(b). p(X) :- q(X). q(c)."))) (dl-retract! db (quote (p z))) (dl-query db (quote (p X))))went from{a,b,c}to just{c}. Fix: tracked:edb-keysprovenance in the db.dl-add-fact!(public API) marks the tuple as EDB; saturator calls new internaldl-add-derived!which doesn't mark it.dl-retract!now strips only the IDB-derived portion of rule-head relations and preserves EDB-marked tuples through the re-saturate pass. 2 new regression tests; 262/262. -
2026-05-11 — Eval-semantics bug-hunt: nested
not(not(P))was silently misinterpreted. Outer-levelnot(...)is parsed as negation, but the innernot(banned(X))was parsed as a regular positive literal naming a relation callednot. Since nonotrelation existed, the inner match was empty and the outer negation succeeded vacuously, makingvip(X) :- u(X), not(not(banned(X))).equivalent tovip(X) :- u(X).(a silent double-negation = identity fallacy). Fix indl-rule-check-safety: both the positive-literal branch anddl-process-neg!now flag any body literal whose head is indl-reserved-rel-names. Error message names the relation and points the user at intermediate-relation stratified negation. 1 new regression test; 260/260. -
2026-05-10 — Bug-hunt round on parser/safety surfaced 7 real bugs, each fixed with regression tests:
- Reserved relation names (
not,count,<,is, ...) were accepted as rule/fact heads — would silently shadow built-ins. - Negative number literals (
n(-1).) failed to parse — users had to express them as(- 0 1)or viais. - Unterminated block comment
/* ...silently consumed the rest of the input. Now raises with the position. - Same silent-consume bug in unterminated string / quoted-atom.
- Empty-list rule head and non-list rule body weren't validated;
they'd crash later in
rest. dl-add-rule! now checks shape. - dl-magic-query with non-list / non-dict goal crashed cryptically.
- Tokenizer silently swallowed unrecognised characters (
?,!,#,@, etc.) — typos produced confusing downstream errors.
- Reserved relation names (
-
2026-05-08 — Phase 6 driver:
dl-magic-query db query-goal. Builds a fresh internal db from the caller's EDB + magic seed + rewritten rules, saturates, queries, returns substitutions — caller's db is untouched. Equivalent todl-queryfor any fully-stratifiable program; sole motivation is a perf alternative on goal-shaped queries against large recursive relations. 2 new tests cover equivalence and non-mutation. -
2026-05-08 — Phase 6 magic-sets rewriter.
dl-magic-rewrite rules query-rel adn argsreturns{:rules <rewritten> :seed <seed-fact>}. Worklist over(rel, adn)pairs starts from the query, gates each original rule with amagic_<rel>^<adn>(bound)filter, and emits propagation rules for each positive non-builtin body literal so that magic spreads to body relations. EDB facts pass through. 3 new tests cover seed structure, equivalence on chain-3 by ancestor-relation tuple count, and same-query-answers under the rewritten program. The plumbing for adl-saturate-magic!driver and large-graph perf benchmarks is still future work. -
2026-05-08 — Phase 6 building blocks for the magic-sets transformation:
dl-magic-rel-name,dl-magic-lit,dl-bound-args. The rewriter that generates magic seed and propagation rules is still future work; with these primitives in place it's a straightforward worklist algorithm. 4 new tests. -
2026-05-08 — Phase 6 adornments + SIPS in
lib/datalog/magic.sx. Inspection helpers —dl-adorn-goalanddl-adorn-litcompute per-argb/fpatterns under a bound set;dl-rule-sips rule head-adornmentwalks body literals left-to-right propagating the bound set, recognisingisand aggregate result-vars as new binders. Lays groundwork for a later magic-sets transformation. 10 new tests cover pure adornment, SIPS over a chain rule, head-fully-bound rules, comparisons, andis. Saturator does not yet consume these. -
2026-05-08 — Comprehensive integration test in api suite: a single program exercising recursion (
reachtransitive closure)- stratified negation (
safe X Y :- reach X Y, not banned Y) + aggregation (reach_countvia count) + comparison (>= N 2) composed end-to-end viadl-eval source query-source. Confirms the full pipeline (parser → safety → stratifier → semi-naive + aggregate post-pass → query) on a non-trivial program.
- stratified negation (
-
2026-05-08 — Bug fix: aggregates work as top-level query goals.
dl-match-lit(the naive matcher used bydl-find-bindings) was missing thedl-aggregate?dispatch — it was only present indl-fbs-aux(semi-naive). Symptom:(dl-query db '(count N X (p X)))silently returned(). Also updateddl-query-user-varsto project only the result var (first arg) of an aggregate goal — the aggregated var and inner-goal vars are existentials and should not appear in the projected substitution. 2 new aggregate tests cover the regression. -
2026-05-08 — Convenience:
dl-eval source query-source. Parses both strings, builds a db, saturates, runs the query, returns the substitution list. Single-call user-friendly entry. 2 new api tests cover ancestor and multi-goal queries. -
2026-05-08 — Phase 6 stub:
dl-set-strategy! db strategyanddl-get-strategy dbuser-facing hooks. Default:semi-naive;:magicis accepted but the actual transformation is deferred, so saturation still uses semi-naive. Lets us tick the "Optional pass — guarded behind dl-set-strategy!" Phase 6 box. 3 new eval tests. -
2026-05-08 — Demo: weighted-DAG shortest path.
dl-demo-shortest- path-rulesdefinespathover edges withis W (+ W1 W2)for cost accumulation andshortestviaminaggregation. 3 demo tests cover direct/multi-hop choice, multi-hop wins on cheaper route, and unreachable-empty. Addeddl-summary dbinspection helper returning{<rel>: count}(4 eval tests). -
2026-05-08 — Phase 5e perf: first-arg index per relation. db gains
:facts-index {<rel>: {<first-arg-key>: tuples}}mirroring the existing:facts-keysmembership index.dl-add-fact!populates it;dl-match-positivewalks the body literal's first arg under the current subst — if it's bound to a non-var, look up by(str arg)and iterate only the matching subset. chain-25 saturation 33s → 18s real (~2x). chain-50 still slow (~120s+) but tractable; next bottleneck is subst dict copies during unification. Differential test bumped to chain-12, semi-only count to chain-25. -
2026-05-08 — Demo: tag co-occurrence.
(cotagged P T1 T2)— post has both T1 and T2 with T1 != T2 — and(tag-pair-count T1 T2 N)counting posts per distinct tag pair. Demonstrates count aggregation grouped by outer-context vars. 2 new demo tests. -
2026-05-08 —
dl-queryaccepts a list of body literals for conjunctive queries, in addition to a single positive literal.dl-query-coercedispatches based on the first element's shape: positive lit (head is a symbol) or:negdict → wrap as singleton; list of lits → use as-is.dl-query-user-varscollects the union of vars across all goals (deduped,_filtered) for projection. 2 new api tests: multi-goal AND, and conjunction with comparison. -
2026-05-08 — Bug fix:
dl-check-stratifiablenow rejects recursion through aggregation (e.g.,q(N) :- count(N, X, q(X))). The stratifier was already adding negation-like edges for aggregates, but the cycle scan only looked at explicit:negliterals. Added the matching aggregate branch to the body iteration. Also adds doc-onlylib/datalog/datalog.sxwith the public-API surface (sinceloadis an epoch command and can't recurse from within an.sxfile). 3 new aggregate tests cover recursion-rejection, negation-and-aggregation coexistence, and min-over-empty-derived. -
2026-05-08 — Phase 10 demo + canonical query. Added the "cooking posts by people I follow (transitively)" example from the plan:
dl-demo-cooking-rulesdefinesreachover the follow graph (recursive transitive closure) andcooking-post-by-networkthat joins reach withauthoredand(tagged P cooking). 3 demo tests cover transitive network, direct-only follow, and empty-network cases. -
2026-05-08 — Phase 8 extension:
findall L V Goalaggregate. Bind L to the list of distinct V values for which Goal holds (or the empty list when no matches). Implemented as a one-line case indl-do-aggregate. 3 new tests: EDB, derived relation, empty. Useful for "give me all the X such that …" queries without scalar reduction. -
2026-05-08 — Phase 5d semantic fix: anonymous
_variables are renamed per occurrence atdl-add-rule!anddl-querytime so(p X _) (p _ Y)no longer unifies the two_s. New helpersdl-rename-anon-term,dl-rename-anon-lit,dl-make-anon-renamer,dl-rename-anon-rulein db.sx; eval.sx's dl-query renames the goal before search and projects only user-named vars (_is filtered out of the projection list). The "underscore in head" test now correctly rejects(p X _) :- q(X).— after renaming, the head's fresh anon var has no body binder. Two new eval tests verify rule-level and goal-level independence. 155/155 expected. -
2026-05-08 — Phase 5c perf: indexed
dl-find-bindings. Replaced the recursive(rest lits)walk withdl-fb-aux lits db subst i nusingnth lits i. Eliminates O(N²) list-copy per body of length N. chain-15 saturation 25s → 16s; chain-25 finishes in 33s real (vs. timeout previously). Bumped semi_naive tests: differential on chain-10, semi-only count on chain-15 (was chain-5/chain-5). 153/153. -
2026-05-08 — Phase 10 syntactic demo. New
lib/datalog/demo.sxwith three programs over rose-ash-shaped data: federation (mutual,reachable,foaf), content recommendation (post-likesvia count aggregation,popular,interesting), and role-based permissions (in-groupover transitive subgroups,can-access). 10 demo tests pass against synthetic EDB tuples. Postgres loader and/internal/datalogHTTP endpoint remain out of scope for this loop (they need service-tree edits beyondlib/datalog/**). Conformance now 153/153. -
2026-05-08 — Phase 5b perf: hash-set membership in
dl-add-fact!. db gains a parallel:facts-keys {<rel>: {<tuple-string>: true}}index alongside:facts.dl-tuple-keyderives a stable string key via(str lit)—(p 30)and(p 30.0)collide correctly because SX prints them identically. Insertion is O(1) instead of O(n). chain-7 saturation drops from ~12s to ~6s; chain-15 from ~50s to ~25s under shared CPU. Larger chains are still slow due to body-join overhead in dl-find-bindings (Blocker updated).dl-retract!updated to keep both indices consistent. 143/143. -
2026-05-08 — Phase 9 done. New
lib/datalog/api.sxexposes a parser-free embedding:dl-program-data facts rulesaccepts SX data lists, with rules in either dict form or list form using<-as the rule arrow (since SX parses:-as a keyword).dl-rule head bodyconstructs the dict.dl-assert! db litadds a fact and re-saturates;dl-retract! db litdrops the fact from EDB, wipes all rule-headed (IDB) relations, and re-saturates from scratch — the simplest correct semantics until provenance tracking arrives in a later phase. 9 API tests; conformance now 143/143. -
2026-05-08 — Phase 8 done. New
lib/datalog/aggregates.sx(~110 LOC): count / sum / min / max. Each is a body literal of shape(op R V Goal)—dl-eval-aggregaterunsdl-find-bindingson the goal under the outer subst (so outer vars in the goal get substituted, giving group-by-style aggregation), collects the distinct values ofV, and bindsR. Empty input: count/sum return 0; min/max produce no binding (rule fails). Stratifier extended viadl-aggregate-dep-edgeso the aggregate's goal relation is fully derived before the aggregate fires. Safety check treats goal-internal vars as existentials (no outer binding required); only the result var becomes bound. Conformance now 134 / 134. -
2026-05-08 — Phase 7 done (Phase 6 magic sets deferred — opt-in, semi-naive default suffices for current test suite). New
lib/datalog/strata.sx(~290 LOC): dep graph build, Floyd-Warshall reachability, SCC-via-mutual-reachability for non-stratifiability detection, iterative stratum computation, rule grouping by head stratum. eval.sx split:dl-saturate-rules!is the per-rule-set semi-naive worker,dl-saturate!is now the stratified driver (errors out on non-stratifiable programs).dl-match-negationin eval.sx: succeeds iff inner positive match is empty. Stratum-keyed dicts use(str s)since SX dicts only accept string/keyword keys. 10 negation tests cover EDB/IDB negation, multi-level strata, non-stratifiability rejection, and a negation safety violation. -
2026-05-08 — Phase 5 done.
lib/datalog/eval.sxrewritten to semi-naive default.dl-saturate!tracks a per-relation delta and on each iteration walks every positive body position substituting delta for that one literal — joining the rest against the full DB snapshot.dl-saturate-naive!retained as the reference. Rules with no positive body literal (e.g.(p X) :- (= X 5).) fall back to a naive one-shot viadl-collect-rule-candidates. 8 tests differentially compare the two saturators using per-relation tuple counts (cheap). Chain-5 differential exercises multi-iteration recursive saturation. Larger chains made conformance.sh time out due to O(n)dl-tuple-member?× CPU sharing with other loop agents — added a Blocker to swap to a hash-set for membership. Also tighteneddl-tuple-member?to use indexed iteration instead of recursiverest(was creating a fresh list per step). -
2026-05-07 — Phase 4 done.
lib/datalog/builtins.sx(~280 LOC) adds(< X Y),(<= X Y),(> X Y),(>= X Y),(= X Y),(!= X Y), and(is X expr)with+ - * /.dl-eval-builtindispatches;dl-eval-arithrecursively evaluates nested compounds. Safety check is now order-aware — it walks body literals left-to-right tracking the bound set, requires comparison/isinputs to be already bound, and special-cases=(binds the var-side; both sides must include at least one bound to bind the other). Phase 3's simple safety check stays in db.sx as a forward-reference fallback; builtins.sx redefinesdl-rule-check-safetyto the comprehensive version. eval.sx'sdl-match-litnow dispatches built-ins throughdl-eval-builtin. 19 builtins tests; conformance 106 / 106. -
2026-05-07 — Phase 3 done.
lib/datalog/db.sx(~250 LOC) holds facts indexed by relation name plus the rules list, withdl-add-fact!/dl-add-rule!(rejects non-ground facts and unsafe rules);lib/datalog/eval.sx(~150 LOC) implements the naive bottom-up fixpoint viadl-find-bindings/dl-match-positive/dl-saturate!anddl-query(deduped projected substitutions). Safety analysis rejects unsafe head vars at load time. Negation and arithmetic built-ins raise clean errors (lifted in later phases). 15 eval tests cover transitive closure, sibling, same-generation, cyclic graph reach, and six safety violations. Conformance 87 / 87. -
2026-05-07 — Phase 2 done.
lib/datalog/unify.sx(~140 LOC):dl-var?(case + underscore),dl-walk,dl-bind,dl-unify(returns extended dict subst ornil),dl-apply-subst,dl-ground?,dl-vars-of. Substitutions are immutable dicts;assocbuilds extended copies. 28 unify tests; conformance now 72 / 72. -
2026-05-07 — Phase 1 done.
lib/datalog/tokenizer.sx(~190 LOC) emits{:type :value :pos}tokens;lib/datalog/parser.sx(~150 LOC) produces{:head … :body …}/{:query …}clauses, with nested compounds permitted for arithmetic andnot(...)desugared to{:neg …}. 44 / 44 viabash lib/datalog/conformance.sh(26 tokenize + 18 parse). Local helpers namespace-prefixed (dl-emit!,dl-peek) after a host-primitive shadow clash. Test harness uses a customdl-deep-equal?that handles out-of-order dict keys and number repr (equal?fails on dict key order and on30vs30.0).