Datalog-on-SX: Datalog on the CEK/VM
Datalog is a declarative query language: a restricted subset of Prolog with no function symbols, only relations. Programs are sets of facts and rules; queries ask what follows. Evaluation is bottom-up (fixpoint iteration) rather than Prolog's top-down DFS — which means no infinite loops, guaranteed termination, and efficient incremental updates.
The unique angle: Datalog is a natural companion to the Prolog implementation already in
progress (lib/prolog/). The parser and term representation can share infrastructure;
the evaluator is an entirely different fixpoint engine rather than a DFS solver.
End-state goal: full core Datalog (facts, rules, stratified negation, aggregation, recursion) with a clean SX query API, and a demonstration of Datalog as a query engine for rose-ash data (e.g. federation graph, content relationships).
Status (rolling)
bash lib/datalog/conformance.sh → 208/208 across 11 suites
(tokenize, parse, unify, eval, builtins, semi_naive, negation, aggregates,
api, magic, demo). Source is ~3000 LOC, tests ~2700 LOC, public API
documented in lib/datalog/datalog.sx.
Phases 1–9 are functionally complete; Phase 10 covers the rose-ash
domain demos (in lib/datalog/demo.sx). The PostgreSQL loader and
/internal/datalog HTTP endpoint listed in Phase 10 require service-
tree edits outside lib/datalog/** and are flagged as out-of-scope
for this loop.
Ground rules
- Scope: only touch `lib/datalog/**` and `plans/datalog-on-sx.md`. Do not edit `spec/`, `hosts/`, `shared/`, `lib/prolog/**`, or other `lib/<lang>/`.
- Shared-file issues go under "Blockers" below with a minimal repro; do not fix here.
- SX files: use `sx-tree` MCP tools only.
- Architecture: Datalog source → term AST → fixpoint evaluator. No transpiler to SX AST; the evaluator is written in SX and works directly on term structures.
- Reference: Ramakrishnan & Ullman "A Survey of Deductive Database Systems"; Dalmau "Datalog and Constraint Satisfaction".
- Commits: one feature per commit. Keep `## Progress log` updated and tick boxes.
Architecture sketch
Datalog source text
│
▼
lib/datalog/tokenizer.sx — atoms, variables, numbers, strings, punct (?- :- , . ( ) [ ])
│
▼
lib/datalog/parser.sx — facts: atom(args). rules: head :- body. queries: ?- goal.
│ No function symbols (only constants and variables in args).
▼
lib/datalog/db.sx — extensional DB (EDB): ground facts; IDB: derived relations;
│ clause index by relation name/arity
▼
lib/datalog/eval.sx — bottom-up fixpoint: semi-naive evaluation with delta sets;
│ stratification for negation; incremental update API
▼
lib/datalog/query.sx — query API: (datalog-query db goal) → list of substitutions;
SX embedding: define facts/rules as SX data directly
Key differences from Prolog:
- No function symbols: args are atoms, numbers, strings, or variables only. No `f(a,b)`.
- No cuts: no procedural control.
- Bottom-up: derive all consequences of all rules before answering; no search tree.
- Termination guaranteed for pure Datalog: no infinite derivation chains (no function symbols → finite Herbrand base). Arithmetic `is` (Phase 4) can mint new constants, so rules that use it fall outside this guarantee.
- Stratified negation: `not(P)` is legal iff P does not recursively depend on its own negation.
- Aggregation: `count`, `sum`, `min`, `max` over derived tuples (Datalog+).
Roadmap
Phase 1 — tokenizer + parser
- Tokenizer: atoms (lowercase/quoted), variables (uppercase/`_`), numbers, strings, punct (`(`, `)`, `,`, `.`), operators (`:-`, `?-`, `<=`, `>=`, `!=`, `<`, `>`, `=`, `+`, `-`, `*`, `/`), comments (`%`, `/* */`). Note: no function symbol syntax (no nested `f(...)` in arg position), but the parser permits nested compounds for arithmetic; safety analysis (Phase 3) rejects non-arithmetic nesting.
- Parser:
  - Facts: `parent(tom, bob).` → `{:head (parent tom bob) :body ()}`
  - Rules: `ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).` → `{:head (ancestor X Z) :body ((parent X Y) (ancestor Y Z))}`
  - Queries: `?- ancestor(tom, X).` → `{:query ((ancestor tom X))}` (the `:query` value is always a list of literals; `?- p, q.` → `{:query ((p) (q))}`)
  - Negation: `not(parent(X,Y))` in body position → `{:neg (parent X Y)}`
- Tests in `lib/datalog/tests/parse.sx` (18) and `lib/datalog/tests/tokenize.sx` (26). Conformance harness: `bash lib/datalog/conformance.sh` → 44 / 44 passing.
Phase 2 — unification + substitution
- Ported (not shared) from `lib/prolog/`: term walk, no occurs check.
- `dl-unify t1 t2 subst` → extended subst dict, or `nil` on failure.
- `dl-walk`, `dl-bind`, `dl-apply-subst`, `dl-ground?`, `dl-vars-of`.
- Substitutions are immutable dicts keyed by variable name (string). Lists/tuples unify element-wise (used for arithmetic compounds too).
- Tests in `lib/datalog/tests/unify.sx` (28). 72 / 72 conformance.
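The unification contract described above can be sketched in Python. This is an illustration only, not the SX code in `lib/datalog/unify.sx`: the names `is_var`, `walk`, and `unify` are hypothetical stand-ins, terms are modelled as Python tuples, and variables are assumed to be strings starting with an uppercase letter or `_`.

```python
def is_var(term):
    # Assumed convention for this sketch: variables are strings that start
    # with an uppercase letter or "_"; everything else is a constant.
    return isinstance(term, str) and bool(term) and (term[0].isupper() or term[0] == "_")


def walk(term, subst):
    # Follow variable bindings until reaching a constant or an unbound variable.
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(t1, t2, subst):
    # Returns an extended copy of subst, or None on failure (no occurs check).
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if is_var(t1):
        return subst if t1 == t2 else {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        if len(t1) != len(t2):
            return None
        for a, b in zip(t1, t2):  # element-wise, as for compound terms
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return subst if t1 == t2 else None


print(unify(("parent", "X", "bob"), ("parent", "tom", "Y"), {}))
```

Each successful binding builds a new dict, so the caller's substitution is never mutated; that mirrors the "immutable dicts" choice above at the cost of one copy per binding.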
Phase 3 — extensional DB + naive evaluation + safety analysis
- EDB+IDB combined: `{:facts {<rel-name-string> -> (literal ...)}}`. Relations are indexed by name; tuples are stored as full literals so they unify directly. Dedup on insert via `dl-tuple-equal?`.
- `dl-add-fact! db lit` (rejects non-ground) and `dl-add-rule! db rule` (rejects unsafe).
- `dl-program source` parses + loads in one step.
- Naive evaluation `dl-saturate! db`: iterate rules until no new tuples. `dl-find-bindings` recursively joins body literals; `dl-match-positive` unifies a literal against every tuple in the relation.
- `dl-query db goal` → list of substitutions over `goal`'s vars, deduplicated.
- `dl-relation db name` for derived tuples.
- Safety analysis at `dl-add-rule!` time: every head variable except `_` must appear in some positive body literal. Built-ins and negated literals do not satisfy safety. Helpers `dl-positive-body-vars`, `dl-rule-unsafe-head-vars` exposed for later phases.
- Negation and arithmetic built-ins error cleanly at saturate time (Phase 4 / Phase 7 will swap in real semantics).
- Tests in `lib/datalog/tests/eval.sx` (15): transitive closure, sibling, same-generation, grandparent, cyclic graph reach, six safety cases. 87 / 87 conformance.
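As a rough picture of the naive fixpoint described above, here is a minimal Python sketch. It is not the SX evaluator: `match`, `find_bindings`, and `saturate` are hypothetical names, facts are plain tuples, and rules are `(head, body)` pairs containing positive literals only.

```python
def is_var(term):
    # Sketch convention: variables are strings starting with uppercase or "_".
    return isinstance(term, str) and bool(term) and (term[0].isupper() or term[0] == "_")


def match(pattern, fact, subst):
    # Match one body literal against one ground fact; None on mismatch.
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    out = dict(subst)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if out.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return out


def find_bindings(body, facts, subst):
    # Join body literals left to right, yielding every consistent substitution.
    if not body:
        yield subst
        return
    for fact in facts:
        s = match(body[0], fact, subst)
        if s is not None:
            yield from find_bindings(body[1:], facts, s)


def saturate(facts, rules):
    # Re-run every rule over the whole fact set until nothing new appears.
    facts = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for s in find_bindings(body, facts, {}):
                derived = (head[0],) + tuple(s.get(a, a) for a in head[1:])
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new


rules = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
edb = {("parent", "tom", "bob"), ("parent", "bob", "ann"), ("parent", "ann", "pat")}
db = saturate(edb, rules)
```

Because facts live in a set and the loop stops when a pass adds nothing, a cyclic graph terminates as well; the cost is that every pass rejoins all facts, which is what Phase 5 addresses.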
Phase 4 — built-in predicates + body arithmetic
Almost every real query needs <, =, simple arithmetic, and string
comparisons in body position. These are not EDB lookups — they're
constraints that filter bindings.
- Recognise built-in predicates in body: `(< X Y)`, `(<= X Y)`, `(> X Y)`, `(>= X Y)`, `(= X Y)`, `(!= X Y)` and arithmetic forms `(is Z (+ X Y))`, `(is Z (- X Y))`, `(is Z (* X Y))`, `(is Z (/ X Y))`. Live in `lib/datalog/builtins.sx`.
- `dl-eval-builtin` dispatches; `dl-eval-arith` recursively evaluates `(+ a b)` etc. with full nesting. `=` unifies; `!=` rejects equal ground terms.
- Order-aware safety analysis (`dl-rule-check-safety`): walks body left-to-right tracking which vars are bound. `is`'s RHS vars must be already bound; LHS becomes bound. Comparisons require both sides bound. `=` is special-cased: if at least one side is bound, it binds the other. Negation vars must be bound (will be enforced fully in Phase 7).
- Wired through SX numeric primitives; no separate number tower.
- Tests in `lib/datalog/tests/builtins.sx` (19): range filters, arithmetic derivations, equality binding, eight safety violations and three safe-shape tests. Conformance 106 / 106.
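The order-aware safety walk can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not `dl-rule-check-safety` itself: `unsafe_vars` is a hypothetical name, literals are tuples whose first element is the operator or relation name, negation is written `("not", lit)`, and anonymous `_` handling is left out.

```python
COMPARISONS = {"<", "<=", ">", ">=", "!="}


def is_var(term):
    # Sketch convention: variables are strings starting with uppercase or "_".
    return isinstance(term, str) and bool(term) and (term[0].isupper() or term[0] == "_")


def vars_of(term):
    # Collect the variables of a term, descending into nested compounds.
    if is_var(term):
        return {term}
    if isinstance(term, tuple):
        found = set()
        for arg in term[1:]:
            found |= vars_of(arg)
        return found
    return set()


def unsafe_vars(head, body):
    # Walk the body left to right; return the offending variables (empty = safe).
    bound = set()
    for lit in body:
        op = lit[0]
        if op == "is":                      # RHS must be bound; LHS becomes bound
            missing = vars_of(lit[2]) - bound
            if missing:
                return missing
            bound |= vars_of(lit[1])
        elif op in COMPARISONS:             # both sides must already be bound
            missing = vars_of(lit) - bound
            if missing:
                return missing
        elif op == "=":                     # one bound side binds the other
            left, right = vars_of(lit[1]), vars_of(lit[2])
            if not (left <= bound or right <= bound):
                return (left | right) - bound
            bound |= left | right
        elif op == "not":                   # negation never binds anything
            missing = vars_of(lit[1]) - bound
            if missing:
                return missing
        else:                               # positive literal binds its vars
            bound |= vars_of(lit)
    return vars_of(head) - bound
```

For example, `unsafe_vars(("big", "X"), [("n", "X"), (">", "X", 2)])` is empty, while swapping the two body literals reports `X`, since the comparison would run before anything binds it.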
Phase 5 — semi-naive evaluation (performance)
- Delta sets `{rel-name -> tuples}` track newly derived tuples per iter. `dl-snapshot-facts` builds the initial delta from the EDB.
- Semi-naive rule: for each rule, walk every positive body literal position; substitute that one with the per-relation delta and join the rest against the previous-iteration DB (`dl-find-bindings-semi`). Candidates are collected before mutating the DB so the "full" sides see a consistent snapshot.
- `dl-collect-rule-candidates` falls back to a naive single pass when a rule has no positive body literal (e.g. `(p X) :- (= X 5).`).
- `dl-saturate!` is now semi-naive by default; `dl-saturate-naive!` is kept for differential testing and as a reference implementation.
- Tests in `lib/datalog/tests/semi_naive.sx` (8): every recursive program from earlier suites is run under both saturators with per-relation tuple counts compared (cheap, robust under bundled conformance session). A chain-5 differential exercises multiple semi-naive iterations against the recursive ancestor rule. Larger chains hit prohibitive wall-clock under conformance CPU contention with other agents; a future Blocker tracks switching `dl-tuple-member?` from an O(n) list scan per lookup to a hash-set per relation.
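The delta-substitution idea can be sketched as follows. This Python version is an assumption-laden illustration rather than a port of `dl-find-bindings-semi`: the names are hypothetical, only positive literals are handled, and the non-delta positions join against the full fact set derived so far.

```python
def is_var(term):
    # Sketch convention: variables are strings starting with uppercase or "_".
    return isinstance(term, str) and bool(term) and (term[0].isupper() or term[0] == "_")


def match(pattern, fact, subst):
    # Match one body literal against one ground fact; None on mismatch.
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    out = dict(subst)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if out.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return out


def join(body, delta_pos, facts, delta, subst, i=0):
    # Literal delta_pos reads only the delta; every other literal reads all facts.
    if i == len(body):
        yield subst
        return
    source = delta if i == delta_pos else facts
    for fact in source:
        s = match(body[i], fact, subst)
        if s is not None:
            yield from join(body, delta_pos, facts, delta, s, i + 1)


def semi_naive(edb, rules):
    facts, delta = set(edb), set(edb)      # the first delta is the EDB itself
    while delta:
        candidates = set()                  # collected before facts is mutated
        for head, body in rules:
            for pos in range(len(body)):
                for s in join(body, pos, facts, delta, {}):
                    candidates.add((head[0],) + tuple(s.get(a, a) for a in head[1:]))
        delta = candidates - facts          # only genuinely new tuples survive
        facts |= delta
    return facts


rules = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
chain5 = {("parent", i, i + 1) for i in range(5)}
result = semi_naive(chain5, rules)
```

A derivation is only attempted when at least one body literal matches a tuple from the previous round, so once the ancestor relation stops growing the loop ends without rejoining old facts against old facts.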
Phase 6 — magic sets (goal-directed bottom-up, opt-in)
Naive bottom-up derives all consequences before answering. Magic sets rewrite the program so the fixpoint only derives tuples relevant to the goal — a major perf win for "what's reachable from node X" queries on large graphs.
- Adornments: `dl-adorn-goal goal` and `dl-adorn-lit lit bound` in `lib/datalog/magic.sx`. Per-arg `b`/`f` based on whether the arg is a constant or a variable already in the bound set.
- Magic transformation: `dl-magic-rewrite rules query-rel adn args` generates `{:rules <rewritten-rules> :seed <magic-seed>}`. Each original rule is gated with a `magic_<rel>^<adn>(bound)` filter, and propagation rules are emitted for each positive non-builtin body literal. Worklist over `(rel, adn)` pairs starts from the query and stops when no new pairs appear. EDB facts pass through unchanged.
- Sideways information passing strategy (SIPS): left-to-right. `dl-rule-sips rule head-adornment` walks body literals tracking the bound set, returning `({:lit :adornment} ...)`. Recognises `is`/aggregate result-vars as new binders; comparisons and negation pass through with computed adornments. (Pluggable strategies are future work.)
- `dl-set-strategy! db strategy` hook + `dl-get-strategy db`. Default `:semi-naive`. `:magic` is accepted but the transformation itself is deferred; the saturator currently falls back to semi-naive. Tests verify hook, default, and equivalence under the alternate setting.
- Equivalence test: rewritten ancestor program over the same EDB derives the same number of `ancestor` tuples and returns the same query answers as the unrewritten program (chain-3 case).
- `dl-magic-query db query-goal`: top-level driver. Builds a fresh internal db with the caller's EDB facts, the magic seed, and the rewritten rules; saturates and queries. Caller's db is untouched. Equivalent to `dl-query` for fully-stratifiable programs (sole motivation is a perf alternative on goal-shaped queries against large recursive relations).
- Perf test: 10k-node reachability with magic vs semi-naive. Left to a future iteration: it would need a benchmarking harness for large graphs, and the conformance budget can't afford it.
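To make the adornment and left-to-right SIPS steps concrete, here is a small Python sketch. It covers positive literals only and uses hypothetical names (`adorn`, `sips`); the treatment of `is`, aggregates, comparisons, and negation described above is deliberately left out.

```python
def is_var(term):
    # Sketch convention: variables are strings starting with uppercase or "_".
    return isinstance(term, str) and bool(term) and (term[0].isupper() or term[0] == "_")


def adorn(lit, bound):
    # "b" for a constant or an already-bound variable, "f" for a free variable.
    return "".join("b" if (not is_var(a) or a in bound) else "f" for a in lit[1:])


def sips(head, head_adornment, body):
    # Seed the bound set from the head's "b" positions, then walk the body
    # left to right; each positive literal binds all of its variables.
    bound = {a for a, flag in zip(head[1:], head_adornment) if flag == "b" and is_var(a)}
    out = []
    for lit in body:
        out.append((lit, adorn(lit, bound)))
        bound |= {a for a in lit[1:] if is_var(a)}
    return out


goal_adornment = adorn(("ancestor", "tom", "X"), set())
plan = sips(
    ("ancestor", "X", "Z"),
    goal_adornment,
    [("parent", "X", "Y"), ("ancestor", "Y", "Z")],
)
```

For the query `ancestor(tom, X)` the goal adornment is `bf`; under that adornment the recursive rule's body literals both come out as `bf`, which is why a single `(ancestor, bf)` pair is enough for the worklist in this example.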
Phase 7 — stratified negation
- Dependency graph: `dl-build-dep-graph db` returns `{head -> ({:rel :neg} ...)}`. Built-ins drop out (they're not relations).
- Reachability via Floyd-Warshall in `dl-build-reach`; cycles detected by `reach[A][B] && reach[B][A]`. Programs are non-stratifiable iff any negative dependency falls inside an SCC. `dl-check-stratifiable` returns nil on success or a clear message.
- `dl-compute-strata` propagates stratum numbers iteratively: `stratum(R) = max over deps of (stratum(dep) + (1 if negated else 0))`.
- Saturator refactor: `dl-saturate-rules! db rules` is the semi-naive worker; `dl-saturate! db` rejects non-stratifiable programs, groups rules by head's stratum, and runs the worker on each stratum in increasing order.
- `not(P)` in body: `dl-match-negation` walks the inner literal under the current subst and uses `dl-match-positive`; it succeeds iff there are zero matches. Order-aware safety in `dl-rule-check-safety` (already present from Phase 4) requires negation vars to be bound by an earlier positive literal.
- Tests in `lib/datalog/tests/negation.sx` (10): EDB and IDB negation, two-step strata, multi-level strata, with-arithmetic, empty-result and always-fail cases, non-stratifiability rejection, and a negation safety violation.
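The stratification steps above can be sketched in Python as one function. This is a hedged illustration, not `lib/datalog/strata.sx`: `stratify` is a hypothetical name, the dependency graph is assumed to be a dict of `head -> [(body_relation, negated)]`, and failure is signalled with an exception rather than a message return.

```python
def stratify(deps):
    # deps: {head: [(body_rel, negated), ...]}; returns {rel: stratum}.
    rels = set(deps) | {r for edges in deps.values() for r, _ in edges}

    # Transitive closure (Warshall): reach[a][b] means a depends on b.
    reach = {a: {b: False for b in rels} for a in rels}
    for head, edges in deps.items():
        for rel, _ in edges:
            reach[head][rel] = True
    for k in rels:
        for i in rels:
            for j in rels:
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True

    # A negative edge head -> rel sits inside a cycle iff rel reaches head back.
    for head, edges in deps.items():
        for rel, negated in edges:
            if negated and reach[rel][head]:
                raise ValueError(f"not stratifiable: {head} negates {rel} inside a cycle")

    # Iterate stratum(R) = max(stratum(dep) + (1 if negated else 0)) to a fixpoint.
    strata = {r: 0 for r in rels}
    changed = True
    while changed:
        changed = False
        for head, edges in deps.items():
            for rel, negated in edges:
                need = strata[rel] + (1 if negated else 0)
                if need > strata[head]:
                    strata[head] = need
                    changed = True
    return strata


strata = stratify({
    "reach": [("edge", False), ("reach", False)],
    "safe": [("reach", False), ("banned", True)],
})
```

Checking for negative cycles before the stratum loop is what makes that loop terminate: once no negative edge lies on a cycle, positive cycles can only equalise strata, never raise them indefinitely.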
Phase 8 — aggregation (Datalog+)
- `(count R V Goal)`, `(sum R V Goal)`, `(min R V Goal)`, `(max R V Goal)`, `(findall L V Goal)`: first arg is the result variable, second is the aggregated variable, third is the goal literal. `findall` returns the distinct-value list itself; the others reduce. Live in `lib/datalog/aggregates.sx`.
- `dl-eval-aggregate`: runs `dl-find-bindings` on the goal under the current subst (which provides outer-context bindings), collects distinct values of the aggregated var, applies the aggregate. `count`/`sum` produce 0 when no matches; `min`/`max` produce no binding (rule fails) when empty.
- Group-by emerges naturally: outer-context vars in the goal are substituted from the current subst, so `popular(P) :- post(P), count(N, U, liked(U, P)), >=(N, 3).` correctly counts per-post.
- Stratification: `dl-aggregate-dep-edge` returns a negation-like edge so the aggregate's goal relation is fully derived before the aggregate fires. Non-monotonicity respected.
- Safety: aggregate body lit binds the result var; goal-internal vars are existentially quantified and don't need outer binding.
- Tests in `lib/datalog/tests/aggregates.sx` (10): count siblings, sum prices, min/max scores, count over derived relation, empty-input cases for each operator, popularity threshold with group-by, distinct-counted-once.
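The aggregate semantics listed above (distinct values, empty-input behaviour, group-by through outer bindings) can be sketched in Python. The names `aggregate` and `popular` are hypothetical, `None` stands in for "no binding, rule fails", and sorting the `findall` result is an assumption of this sketch, since the text does not state an ordering.

```python
def aggregate(op, values):
    # Aggregates run over the distinct values of the aggregated variable.
    distinct = set(values)
    if op == "count":
        return len(distinct)                # 0 on empty input
    if op == "sum":
        return sum(distinct)                # 0 on empty input
    if op == "min":
        return min(distinct) if distinct else None   # None = rule fails
    if op == "max":
        return max(distinct) if distinct else None
    if op == "findall":
        return sorted(distinct)             # ordering is this sketch's choice
    raise ValueError(f"unknown aggregate: {op}")


def popular(posts, liked, threshold=3):
    # popular(P) :- post(P), count(N, U, liked(U, P)), >=(N, 3).
    # P is bound by post(P) before the aggregate runs, so the count is per post.
    out = []
    for p in posts:
        n = aggregate("count", [u for (u, q) in liked if q == p])
        if n >= threshold:
            out.append(p)
    return out


liked = [("ann", "p1"), ("bob", "p1"), ("cat", "p1"), ("ann", "p1"), ("ann", "p2")]
hits = popular(["p1", "p2", "p3"], liked)
```

The duplicate `("ann", "p1")` entry is counted once, which corresponds to the "distinct-counted-once" test case named above.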
Phase 9 — SX embedding API
- `(dl-program-data facts rules)` builds a db from SX data: `facts` is a list of literals, `rules` is a list of either dicts `{:head … :body …}` or lists `(<head…> <- <body…>)`. Variables are SX symbols whose first char is uppercase or `_`, matching the parser's convention. Example: `(dl-program-data '((parent tom bob) (parent bob ann)) '((ancestor X Y <- (parent X Y)) (ancestor X Z <- (parent X Y) (ancestor Y Z))))`
- `(dl-rule head body)` constructor for the dict form.
- `(dl-query db '(ancestor tom X))` already worked: the same query API consumes the SX-data goal. It now also accepts a list of body literals for conjunctive queries: `(dl-query db '((p X) (q X)))`, `(dl-query db (list '(n X) '(> X 2)))`. Auto-dispatched via `dl-query-coerce` on first-element shape.
- `(dl-assert! db '(parent ann pat))` → adds the fact and re-saturates.
- `(dl-retract! db '(parent bob ann))` → drops matching tuples from the EDB list, wipes every relation that has a rule (those are IDB), and re-saturates from the surviving EDB.
- Tests in `lib/datalog/tests/api.sx` (9): closure via data API, dict-rule form, dl-rule constructor, dl-assert! incremental, dl-retract! removes derived, cyclic-graph reach via data, assert into empty db, fact-style rule (no arrow), coerce dict.
- Integration demo: federation graph query, `(reachable A B)` / `(mutual A B)` / `(foaf A C)` over `(follows ACTOR-A ACTOR-B)` in `lib/datalog/demo.sx`. Tests in `lib/datalog/tests/demo.sx`. Wiring this to actual rose-ash ActivityPub data is Phase 10 service work and is out of scope for this loop.
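The assert/retract semantics above (add and re-saturate; drop, wipe rule-headed relations, re-saturate) can be modelled with a small Python class. This is a sketch under assumptions, not the SX API: `Db`, `assert_fact`, and `retract_fact` are hypothetical names, and the built-in saturator is a naive one over positive literals only.

```python
def is_var(term):
    # Sketch convention: variables are strings starting with uppercase or "_".
    return isinstance(term, str) and bool(term) and (term[0].isupper() or term[0] == "_")


def match(pattern, fact, subst):
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    out = dict(subst)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if out.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return out


def bindings(body, facts, subst):
    if not body:
        yield subst
        return
    for fact in facts:
        s = match(body[0], fact, subst)
        if s is not None:
            yield from bindings(body[1:], facts, s)


class Db:
    def __init__(self, facts, rules):
        self.facts, self.rules = set(facts), list(rules)
        self.saturate()

    def saturate(self):
        while True:
            new = {
                (head[0],) + tuple(s.get(a, a) for a in head[1:])
                for head, body in self.rules
                for s in bindings(body, self.facts, {})
            } - self.facts
            if not new:
                return
            self.facts |= new

    def assert_fact(self, fact):
        self.facts.add(fact)
        self.saturate()

    def retract_fact(self, fact):
        # Drop the fact, wipe every relation that has a rule, then re-derive.
        idb = {head[0] for head, _ in self.rules}
        self.facts = {f for f in self.facts if f != fact and f[0] not in idb}
        self.saturate()


rules = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
db = Db({("parent", "tom", "bob"), ("parent", "bob", "ann")}, rules)
```

Wiping and re-deriving is simple and correct without provenance tracking, but, as in the description above, a relation that has both facts and rules loses its stored facts on retract.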
Phase 10 — Datalog as a query language for rose-ash
- Schema sketches in `lib/datalog/demo.sx`:
  - Federation: `(follows A B)` → `(mutual A B)`, `(reachable A B)`, `(foaf A C)` (friend-of-a-friend, distinct).
  - Content: `(authored A P)`, `(liked U P)`, `(tagged P T)` → `(post-likes P N)` via aggregation, `(popular P)` for likes ≥ 3, `(interesting Me P)` joining follows + authored + popular.
  - Permissions: `(member A G)`, `(subgroup C P)`, `(allowed G R)` → `(in-group A G)` over transitive subgroups, `(can-access A R)`.
  - Cooking-posts (the canonical example): `(reach Me Them)` over the follow graph, then `(cooking-post-by-network Me P)` joining reach + authored + `(tagged P cooking)`.
- Loader `dl-load-from-db!`: out of scope for this loop (would need to edit `shared/services/` outside `lib/datalog/`). Programs in `demo.sx` already document the EDB shape expected from such a loader. `dl-program-data` consumes the same shape.
- Query examples covered by `lib/datalog/tests/demo.sx` (10): mutuals, transitive reach, FOAF, popular posts, interesting feed, post likes count, direct/subgroup/transitive group access, no access without grant.
- Service endpoint `POST /internal/datalog`: out of scope as above. Once exposed, the server-side handler would be `dl-program-data` + `dl-query`, returning JSON-encoded substitutions.
Blockers
- Saturation perf: three rounds done.
  - hash-set membership in `dl-add-fact!` (Phase 5b)
  - indexed iteration in `dl-find-bindings` (Phase 5c)
  - first-arg index per relation (Phase 5e): when a body literal's first arg walks to a non-variable, `dl-match-positive` looks up by `(str arg)` instead of scanning the full relation. chain-25 saturation drops from ~33s to ~18s real (10s user). chain-50 is still long (~120s+) due to dict-copy overhead in unification subst threading. Future: per-rule "compiled" body with pre-resolved var positions, slot-based subst representation to avoid `assoc` per binding.
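The two indices mentioned in this blocker (a membership set for dedup and a first-argument index for lookups) can be pictured with the Python sketch below. `Relation`, `add`, and `candidates` are hypothetical names. The sketch keys on the tuples and values directly, whereas the SX code keys on `(str …)` strings because SX dicts only accept string/keyword keys.

```python
from collections import defaultdict


class Relation:
    def __init__(self):
        self.tuples = []                     # insertion-ordered tuple list
        self.keys = set()                    # membership index for O(1) dedup
        self.by_first = defaultdict(list)    # first-arg value -> matching tuples

    def add(self, tup):
        # Returns True when the tuple is new, False when it was a duplicate.
        if tup in self.keys:
            return False
        self.keys.add(tup)
        self.tuples.append(tup)
        self.by_first[tup[0]].append(tup)
        return True

    def candidates(self, first_arg=None):
        # With a bound first argument, scan only the matching bucket;
        # otherwise fall back to the full relation.
        if first_arg is None:
            return self.tuples
        return self.by_first.get(first_arg, [])


parent = Relation()
for pair in [("tom", "bob"), ("tom", "liz"), ("bob", "ann"), ("tom", "bob")]:
    parent.add(pair)
```

Here `parent.candidates("tom")` touches two tuples instead of three; on long chains, where each node has a single successor, the same lookup shrinks every join step to one tuple.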
Progress log
Newest first.
- 2026-05-08: Phase 6 driver `dl-magic-query db query-goal`. Builds a fresh internal db from the caller's EDB + magic seed + rewritten rules, saturates, queries, and returns substitutions; the caller's db is untouched. Equivalent to `dl-query` for any fully-stratifiable program; the sole motivation is a perf alternative on goal-shaped queries against large recursive relations. 2 new tests cover equivalence and non-mutation.
- 2026-05-08: Phase 6 magic-sets rewriter. `dl-magic-rewrite rules query-rel adn args` returns `{:rules <rewritten> :seed <seed-fact>}`. Worklist over `(rel, adn)` pairs starts from the query, gates each original rule with a `magic_<rel>^<adn>(bound)` filter, and emits propagation rules for each positive non-builtin body literal so that magic spreads to body relations. EDB facts pass through. 3 new tests cover seed structure, equivalence on chain-3 by ancestor-relation tuple count, and same-query-answers under the rewritten program. The plumbing for a `dl-saturate-magic!` driver and large-graph perf benchmarks is still future work.
- 2026-05-08: Phase 6 building blocks for the magic-sets transformation: `dl-magic-rel-name`, `dl-magic-lit`, `dl-bound-args`. The rewriter that generates magic seed and propagation rules is still future work; with these primitives in place it's a straightforward worklist algorithm. 4 new tests.
- 2026-05-08: Phase 6 adornments + SIPS in `lib/datalog/magic.sx`. Inspection helpers: `dl-adorn-goal` and `dl-adorn-lit` compute per-arg `b`/`f` patterns under a bound set; `dl-rule-sips rule head-adornment` walks body literals left-to-right propagating the bound set, recognising `is` and aggregate result-vars as new binders. Lays groundwork for a later magic-sets transformation. 10 new tests cover pure adornment, SIPS over a chain rule, head-fully-bound rules, comparisons, and `is`. Saturator does not yet consume these.
- 2026-05-08: Comprehensive integration test in api suite: a single program exercising recursion (`reach` transitive closure) + stratified negation (`safe X Y :- reach X Y, not banned Y`) + aggregation (`reach_count` via count) + comparison (`>= N 2`) composed end-to-end via `dl-eval source query-source`. Confirms the full pipeline (parser → safety → stratifier → semi-naive + aggregate post-pass → query) on a non-trivial program.
- 2026-05-08: Bug fix: aggregates work as top-level query goals. `dl-match-lit` (the naive matcher used by `dl-find-bindings`) was missing the `dl-aggregate?` dispatch; it was only present in `dl-fbs-aux` (semi-naive). Symptom: `(dl-query db '(count N X (p X)))` silently returned `()`. Also updated `dl-query-user-vars` to project only the result var (first arg) of an aggregate goal; the aggregated var and inner-goal vars are existentials and should not appear in the projected substitution. 2 new aggregate tests cover the regression.
- 2026-05-08: Convenience: `dl-eval source query-source`. Parses both strings, builds a db, saturates, runs the query, returns the substitution list. Single-call user-friendly entry. 2 new api tests cover ancestor and multi-goal queries.
- 2026-05-08: Phase 6 stub: `dl-set-strategy! db strategy` and `dl-get-strategy db` user-facing hooks. Default `:semi-naive`; `:magic` is accepted but the actual transformation is deferred, so saturation still uses semi-naive. Lets us tick the "Optional pass, guarded behind dl-set-strategy!" Phase 6 box. 3 new eval tests.
- 2026-05-08: Demo: weighted-DAG shortest path. `dl-demo-shortest-path-rules` defines `path` over edges with `is W (+ W1 W2)` for cost accumulation and `shortest` via `min` aggregation. 3 demo tests cover direct/multi-hop choice, multi-hop wins on cheaper route, and unreachable-empty. Added `dl-summary db` inspection helper returning `{<rel>: count}` (4 eval tests).
- 2026-05-08: Phase 5e perf: first-arg index per relation. db gains `:facts-index {<rel>: {<first-arg-key>: tuples}}` mirroring the existing `:facts-keys` membership index. `dl-add-fact!` populates it; `dl-match-positive` walks the body literal's first arg under the current subst and, if it's bound to a non-var, looks up by `(str arg)` and iterates only the matching subset. chain-25 saturation 33s → 18s real (~2x). chain-50 still slow (~120s+) but tractable; next bottleneck is subst dict copies during unification. Differential test bumped to chain-12, semi-only count to chain-25.
- 2026-05-08: Demo: tag co-occurrence. `(cotagged P T1 T2)` (post has both T1 and T2 with T1 != T2) and `(tag-pair-count T1 T2 N)` counting posts per distinct tag pair. Demonstrates count aggregation grouped by outer-context vars. 2 new demo tests.
- 2026-05-08: `dl-query` accepts a list of body literals for conjunctive queries, in addition to a single positive literal. `dl-query-coerce` dispatches based on the first element's shape: positive lit (head is a symbol) or `:neg` dict → wrap as singleton; list of lits → use as-is. `dl-query-user-vars` collects the union of vars across all goals (deduped, `_` filtered) for projection. 2 new api tests: multi-goal AND, and conjunction with comparison.
- 2026-05-08: Bug fix: `dl-check-stratifiable` now rejects recursion through aggregation (e.g., `q(N) :- count(N, X, q(X))`). The stratifier was already adding negation-like edges for aggregates, but the cycle scan only looked at explicit `:neg` literals. Added the matching aggregate branch to the body iteration. Also adds doc-only `lib/datalog/datalog.sx` with the public-API surface (since `load` is an epoch command and can't recurse from within an `.sx` file). 3 new aggregate tests cover recursion-rejection, negation-and-aggregation coexistence, and min-over-empty-derived.
- 2026-05-08: Phase 10 demo + canonical query. Added the "cooking posts by people I follow (transitively)" example from the plan: `dl-demo-cooking-rules` defines `reach` over the follow graph (recursive transitive closure) and `cooking-post-by-network` that joins reach with `authored` and `(tagged P cooking)`. 3 demo tests cover transitive network, direct-only follow, and empty-network cases.
- 2026-05-08: Phase 8 extension: `findall L V Goal` aggregate. Bind L to the list of distinct V values for which Goal holds (or the empty list when no matches). Implemented as a one-line case in `dl-do-aggregate`. 3 new tests: EDB, derived relation, empty. Useful for "give me all the X such that …" queries without scalar reduction.
- 2026-05-08: Phase 5d semantic fix: anonymous `_` variables are renamed per occurrence at `dl-add-rule!` and `dl-query` time so `(p X _) (p _ Y)` no longer unifies the two `_`s. New helpers `dl-rename-anon-term`, `dl-rename-anon-lit`, `dl-make-anon-renamer`, `dl-rename-anon-rule` in db.sx; eval.sx's `dl-query` renames the goal before search and projects only user-named vars (`_` is filtered out of the projection list). The "underscore in head" test now correctly rejects `(p X _) :- q(X).`: after renaming, the head's fresh anon var has no body binder. Two new eval tests verify rule-level and goal-level independence. 155/155 expected.
- 2026-05-08: Phase 5c perf: indexed `dl-find-bindings`. Replaced the recursive `(rest lits)` walk with `dl-fb-aux lits db subst i n` using `nth lits i`. Eliminates O(N²) list-copy per body of length N. chain-15 saturation 25s → 16s; chain-25 finishes in 33s real (vs. timeout previously). Bumped semi_naive tests: differential on chain-10, semi-only count on chain-15 (was chain-5/chain-5). 153/153.
- 2026-05-08: Phase 10 syntactic demo. New `lib/datalog/demo.sx` with three programs over rose-ash-shaped data: federation (`mutual`, `reachable`, `foaf`), content recommendation (`post-likes` via count aggregation, `popular`, `interesting`), and role-based permissions (`in-group` over transitive subgroups, `can-access`). 10 demo tests pass against synthetic EDB tuples. Postgres loader and `/internal/datalog` HTTP endpoint remain out of scope for this loop (they need service-tree edits beyond `lib/datalog/**`). Conformance now 153/153.
- 2026-05-08: Phase 5b perf: hash-set membership in `dl-add-fact!`. db gains a parallel `:facts-keys {<rel>: {<tuple-string>: true}}` index alongside `:facts`. `dl-tuple-key` derives a stable string key via `(str lit)`; `(p 30)` and `(p 30.0)` collide correctly because SX prints them identically. Insertion is O(1) instead of O(n). chain-7 saturation drops from ~12s to ~6s; chain-15 from ~50s to ~25s under shared CPU. Larger chains are still slow due to body-join overhead in `dl-find-bindings` (Blocker updated). `dl-retract!` updated to keep both indices consistent. 143/143.
- 2026-05-08: Phase 9 done. New `lib/datalog/api.sx` exposes a parser-free embedding: `dl-program-data facts rules` accepts SX data lists, with rules in either dict form or list form using `<-` as the rule arrow (since SX parses `:-` as a keyword). `dl-rule head body` constructs the dict. `dl-assert! db lit` adds a fact and re-saturates; `dl-retract! db lit` drops the fact from EDB, wipes all rule-headed (IDB) relations, and re-saturates from scratch, which is the simplest correct semantics until provenance tracking arrives in a later phase. 9 API tests; conformance now 143/143.
- 2026-05-08: Phase 8 done. New `lib/datalog/aggregates.sx` (~110 LOC): count / sum / min / max. Each is a body literal of shape `(op R V Goal)`. `dl-eval-aggregate` runs `dl-find-bindings` on the goal under the outer subst (so outer vars in the goal get substituted, giving group-by-style aggregation), collects the distinct values of `V`, and binds `R`. Empty input: count/sum return 0; min/max produce no binding (rule fails). Stratifier extended via `dl-aggregate-dep-edge` so the aggregate's goal relation is fully derived before the aggregate fires. Safety check treats goal-internal vars as existentials (no outer binding required); only the result var becomes bound. Conformance now 134 / 134.
- 2026-05-08: Phase 7 done (Phase 6 magic sets deferred: opt-in, semi-naive default suffices for current test suite). New `lib/datalog/strata.sx` (~290 LOC): dep graph build, Floyd-Warshall reachability, SCC-via-mutual-reachability for non-stratifiability detection, iterative stratum computation, rule grouping by head stratum. eval.sx split: `dl-saturate-rules!` is the per-rule-set semi-naive worker, `dl-saturate!` is now the stratified driver (errors out on non-stratifiable programs). `dl-match-negation` in eval.sx: succeeds iff inner positive match is empty. Stratum-keyed dicts use `(str s)` since SX dicts only accept string/keyword keys. 10 negation tests cover EDB/IDB negation, multi-level strata, non-stratifiability rejection, and a negation safety violation.
- 2026-05-08: Phase 5 done. `lib/datalog/eval.sx` rewritten to semi-naive default. `dl-saturate!` tracks a per-relation delta and on each iteration walks every positive body position substituting delta for that one literal, joining the rest against the full DB snapshot. `dl-saturate-naive!` retained as the reference. Rules with no positive body literal (e.g. `(p X) :- (= X 5).`) fall back to a naive one-shot via `dl-collect-rule-candidates`. 8 tests differentially compare the two saturators using per-relation tuple counts (cheap). Chain-5 differential exercises multi-iteration recursive saturation. Larger chains made conformance.sh time out due to O(n) `dl-tuple-member?` × CPU sharing with other loop agents; added a Blocker to swap to a hash-set for membership. Also tightened `dl-tuple-member?` to use indexed iteration instead of recursive `rest` (was creating a fresh list per step).
- 2026-05-07: Phase 4 done. `lib/datalog/builtins.sx` (~280 LOC) adds `(< X Y)`, `(<= X Y)`, `(> X Y)`, `(>= X Y)`, `(= X Y)`, `(!= X Y)`, and `(is X expr)` with `+ - * /`. `dl-eval-builtin` dispatches; `dl-eval-arith` recursively evaluates nested compounds. Safety check is now order-aware: it walks body literals left-to-right tracking the bound set, requires comparison/`is` inputs to be already bound, and special-cases `=` (at least one side must be bound; it then binds the other). Phase 3's simple safety check stays in db.sx as a forward-reference fallback; builtins.sx redefines `dl-rule-check-safety` to the comprehensive version. eval.sx's `dl-match-lit` now dispatches built-ins through `dl-eval-builtin`. 19 builtins tests; conformance 106 / 106.
- 2026-05-07: Phase 3 done. `lib/datalog/db.sx` (~250 LOC) holds facts indexed by relation name plus the rules list, with `dl-add-fact!` / `dl-add-rule!` (rejects non-ground facts and unsafe rules); `lib/datalog/eval.sx` (~150 LOC) implements the naive bottom-up fixpoint via `dl-find-bindings` / `dl-match-positive` / `dl-saturate!` and `dl-query` (deduped projected substitutions). Safety analysis rejects unsafe head vars at load time. Negation and arithmetic built-ins raise clean errors (lifted in later phases). 15 eval tests cover transitive closure, sibling, same-generation, cyclic graph reach, and six safety violations. Conformance 87 / 87.
- 2026-05-07: Phase 2 done. `lib/datalog/unify.sx` (~140 LOC): `dl-var?` (case + underscore), `dl-walk`, `dl-bind`, `dl-unify` (returns extended dict subst or `nil`), `dl-apply-subst`, `dl-ground?`, `dl-vars-of`. Substitutions are immutable dicts; `assoc` builds extended copies. 28 unify tests; conformance now 72 / 72.
- 2026-05-07: Phase 1 done. `lib/datalog/tokenizer.sx` (~190 LOC) emits `{:type :value :pos}` tokens; `lib/datalog/parser.sx` (~150 LOC) produces `{:head … :body …}` / `{:query …}` clauses, with nested compounds permitted for arithmetic and `not(...)` desugared to `{:neg …}`. 44 / 44 via `bash lib/datalog/conformance.sh` (26 tokenize + 18 parse). Local helpers namespace-prefixed (`dl-emit!`, `dl-peek`) after a host-primitive shadow clash. Test harness uses a custom `dl-deep-equal?` that handles out-of-order dict keys and number repr (`equal?` fails on dict key order and on `30` vs `30.0`).