Datalog-on-SX: Datalog on the CEK/VM
Datalog is a declarative query language: a restricted subset of Prolog with no function symbols, only relations. Programs are sets of facts and rules; queries ask what follows. Evaluation is bottom-up (fixpoint iteration) rather than Prolog's top-down DFS. Because the set of derivable facts is finite, evaluation always terminates, and the same fixpoint machinery supports efficient incremental updates.
The unique angle: Datalog is a natural companion to the Prolog implementation already in
progress (lib/prolog/). The parser and term representation can share infrastructure;
the evaluator is an entirely different fixpoint engine rather than a DFS solver.
End-state goal: full core Datalog (facts, rules, stratified negation, aggregation, recursion) with a clean SX query API, and a demonstration of Datalog as a query engine for rose-ash data (e.g. federation graph, content relationships).
Ground rules
- Scope: only touch `lib/datalog/**` and `plans/datalog-on-sx.md`. Do not edit `spec/`, `hosts/`, `shared/`, `lib/prolog/**`, or other `lib/<lang>/`.
- Shared-file issues go under "Blockers" below with a minimal repro; do not fix here.
- SX files: use `sx-tree` MCP tools only.
- Architecture: Datalog source → term AST → fixpoint evaluator. No transpiler to SX AST; the evaluator is written in SX and works directly on term structures.
- Reference: Ramakrishnan & Ullman "A Survey of Deductive Database Systems"; Dalmau "Datalog and Constraint Satisfaction".
- Commits: one feature per commit. Keep `## Progress log` updated and tick boxes.
Architecture sketch
Datalog source text
│
▼
lib/datalog/tokenizer.sx — atoms, variables, numbers, strings, punct (?- :- , . ( ) [ ])
│
▼
lib/datalog/parser.sx — facts: atom(args). rules: head :- body. queries: ?- goal.
│ No function symbols (only constants and variables in args).
▼
lib/datalog/db.sx — extensional DB (EDB): ground facts; IDB: derived relations;
│ clause index by relation name/arity
▼
lib/datalog/eval.sx — bottom-up fixpoint: semi-naive evaluation with delta sets;
│ stratification for negation; incremental update API
▼
lib/datalog/query.sx — query API: (datalog-query db goal) → list of substitutions;
SX embedding: define facts/rules as SX data directly
Key differences from Prolog:
- No function symbols: args are atoms, numbers, strings, or variables only. No `f(a,b)`.
- No cuts: no procedural control.
- Bottom-up: derive all consequences of all rules before answering; no search tree.
- Termination guaranteed: no infinite derivation chains (no function symbols → finite Herbrand base).
- Stratified negation: `not(P)` is legal iff P does not recursively depend on its own negation.
- Aggregation: `count`, `sum`, `min`, `max` over derived tuples (Datalog+).
Roadmap
Phase 1 — tokenizer + parser
- Tokenizer: atoms (lowercase/quoted), variables (uppercase/`_`), numbers, strings, punct (`(`, `)`, `,`, `.`), operators (`:-`, `?-`, `<=`, `>=`, `!=`, `<`, `>`, `=`, `+`, `-`, `*`, `/`), comments (`%`, `/* */`). Note: no function symbol syntax (no nested `f(...)` in arg position), but the parser permits nested compounds for arithmetic; safety analysis (Phase 3) rejects non-arithmetic nesting.
- Parser:
  - Facts: `parent(tom, bob).` → `{:head (parent tom bob) :body ()}`
  - Rules: `ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).` → `{:head (ancestor X Z) :body ((parent X Y) (ancestor Y Z))}`
  - Queries: `?- ancestor(tom, X).` → `{:query ((ancestor tom X))}` (the `:query` value is always a list of literals; `?- p, q.` → `{:query ((p) (q))}`)
  - Negation: `not(parent(X,Y))` in body position → `{:neg (parent X Y)}`
- Tests in `lib/datalog/tests/parse.sx` (18) and `lib/datalog/tests/tokenize.sx` (26). Conformance harness: `bash lib/datalog/conformance.sh` → 44 / 44 passing.
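The tokenizer step above can be illustrated with a minimal Python sketch. It is not the SX implementation: `TOKEN_RE` and `tokenize` are hypothetical names, the character classes are guesses at the lexical rules listed above, and only the `{:type :value :pos}` token shape is taken from the progress log.

```python
import re

# Hypothetical token grammar. Alternatives are tried in order, so multi-char
# operators such as ":-" and "<=" are listed before single-char ones.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+|%[^\n]*|/\*.*?\*/)        # whitespace and comments, skipped
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<string>"[^"]*")
  | (?P<var>[A-Z_][A-Za-z0-9_]*)
  | (?P<atom>[a-z][A-Za-z0-9_]*|'[^']*')
  | (?P<op>:-|\?-|<=|>=|!=|[<>=+\-*/])
  | (?P<punct>[(),.])
""", re.VERBOSE | re.DOTALL)


def tokenize(src):
    """Return a list of {"type", "value", "pos"} dicts for a Datalog source string."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        if m.lastgroup != "ws":
            tokens.append({"type": m.lastgroup, "value": m.group(), "pos": pos})
        pos = m.end()
    return tokens


tokens = tokenize("ancestor(X,Z) :- parent(X,Y). % transitive step")
```

A number followed directly by the clause terminator (`30.`) still tokenizes as a number plus `.`, because the fractional part requires a digit after the dot.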
Phase 2 — unification + substitution
- Ported (not shared) from `lib/prolog/`: term walk, no occurs check.
- `dl-unify t1 t2 subst` → extended subst dict, or `nil` on failure.
- `dl-walk`, `dl-bind`, `dl-apply-subst`, `dl-ground?`, `dl-vars-of`.
- Substitutions are immutable dicts keyed by variable name (string). Lists/tuples unify element-wise (used for arithmetic compounds too).
- Tests in `lib/datalog/tests/unify.sx` (28). 72 / 72 conformance.
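As an illustration of unification over immutable substitutions, here is a minimal Python sketch. It assumes terms are Python tuples and strings, that variables are strings starting with an uppercase letter or `_`, and that `_` never binds; the function names mirror the `dl-` helpers but the code is not a port of the SX source.

```python
def is_var(t):
    # Assumption: a variable is a string starting with uppercase or "_".
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def walk(t, subst):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def unify(t1, t2, subst):
    """Return an extended copy of subst, or None on failure. No occurs check."""
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == "_" or t2 == "_":      # anonymous variable matches anything
        return subst
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}    # new dict; the input subst is untouched
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):    # element-wise, threading the substitution
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None


def apply_subst(t, subst):
    """Replace bound variables in t, recursing into tuples."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return tuple(apply_subst(a, subst) for a in t)
    return t


s = unify(("parent", "X", "bob"), ("parent", "tom", "Y"), {})
```

Building a fresh dict on every bind keeps earlier substitutions valid, which is what lets a join backtrack to a previous binding without undoing anything.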
Phase 3 — extensional DB + naive evaluation + safety analysis
- EDB+IDB combined: `{:facts {<rel-name-string> -> (literal ...)}}`. Relations are indexed by name; tuples are stored as full literals so they unify directly. Dedup on insert via `dl-tuple-equal?`.
- `dl-add-fact! db lit` (rejects non-ground) and `dl-add-rule! db rule` (rejects unsafe). `dl-program source` parses + loads in one step.
- Naive evaluation `dl-saturate! db`: iterate rules until no new tuples. `dl-find-bindings` recursively joins body literals; `dl-match-positive` unifies a literal against every tuple in the relation.
- `dl-query db goal` → list of substitutions over `goal`'s vars, deduplicated. `dl-relation db name` for derived tuples.
- Safety analysis at `dl-add-rule!` time: every head variable except `_` must appear in some positive body literal. Built-ins and negated literals do not satisfy safety. Helpers `dl-positive-body-vars`, `dl-rule-unsafe-head-vars` exposed for later phases.
- Negation and arithmetic built-ins error cleanly at saturate time (Phase 4 / Phase 7 will swap in real semantics).
- Tests in `lib/datalog/tests/eval.sx` (15): transitive closure, sibling, same-generation, grandparent, cyclic graph reach, six safety cases. 87 / 87 conformance.
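The naive fixpoint and the safety check described above can be sketched in Python as follows. This is an illustration under assumptions, not the SX code: literals are tuples `(rel, arg, ...)`, a rule is a `(head, body)` pair with positive body literals only, and all function names are hypothetical stand-ins for the `dl-` functions.

```python
def is_var(t):
    # Assumption: variables are strings starting with uppercase or "_".
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def match(lit, fact, subst):
    """Match a body literal against a ground fact; return extended subst or None."""
    if lit[0] != fact[0] or len(lit) != len(fact):
        return None
    out = dict(subst)
    for a, b in zip(lit[1:], fact[1:]):
        if a == "_":
            continue
        if is_var(a):
            if out.setdefault(a, b) != b:   # already bound to something else
                return None
        elif a != b:
            return None
    return out


def find_bindings(body, facts, subst):
    """Join body literals left to right, yielding every satisfying substitution."""
    if not body:
        yield subst
        return
    for fact in list(facts.get(body[0][0], ())):
        s = match(body[0], fact, subst)
        if s is not None:
            yield from find_bindings(body[1:], facts, s)


def unsafe_head_vars(rule):
    """Head variables (other than "_") that no positive body literal binds."""
    head, body = rule
    bound = {a for lit in body for a in lit[1:] if is_var(a)}
    return [a for a in head[1:] if is_var(a) and a != "_" and a not in bound]


def saturate(facts, rules):
    """Apply every rule repeatedly until a full pass derives no new tuple."""
    for rule in rules:
        if unsafe_head_vars(rule):
            raise ValueError(f"unsafe rule, unbound head vars: {unsafe_head_vars(rule)}")
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for s in list(find_bindings(body, facts, {})):
                new = (head[0],) + tuple(s.get(a, a) for a in head[1:])
                rel = facts.setdefault(head[0], set())
                if new not in rel:
                    rel.add(new)
                    changed = True
    return facts


facts = {"parent": {("parent", "tom", "bob"), ("parent", "bob", "ann")}}
rules = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
saturate(facts, rules)
```

Storing each relation as a set gives dedup on insert for free, and it is also why a cyclic graph terminates: once every reachable pair is present, a pass adds nothing and the loop exits.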
Phase 4 — semi-naive evaluation (performance)
- Delta sets: track newly derived tuples per iteration
- Semi-naive rule: only join against delta tuples from last iteration, not full relation
- Significant speedup for recursive rules — avoids re-deriving known tuples
- `dl-stratify db` → dependency graph + SCC analysis → stratum ordering
- Tests: verify semi-naive produces same results as naive; benchmark on large ancestor chain
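To make the delta idea concrete, here is a small Python sketch specialised to the `ancestor` rules, with relations as sets of pairs. Both functions are hypothetical illustrations; a general engine would apply the same delta substitution to each recursive body literal in turn.

```python
def ancestors_naive(parent):
    """Re-join parent against the full ancestor relation on every round."""
    total = set(parent)
    while True:
        derived = {(x, z) for x, y in parent for y2, z in total if y == y2}
        if derived <= total:
            return total
        total |= derived


def ancestors_semi_naive(parent):
    """Join parent only against the tuples that were new in the previous round."""
    total = set(parent)      # every ancestor tuple derived so far
    delta = set(parent)      # tuples first derived in the last round
    while delta:
        # Index delta by its first column so the join is a dict lookup.
        by_first = {}
        for y, z in delta:
            by_first.setdefault(y, []).append(z)
        derived = {(x, z) for x, y in parent for z in by_first.get(y, ())}
        delta = derived - total      # keep only genuinely new tuples
        total |= delta
    return total


chain = {(i, i + 1) for i in range(5)}
assert ancestors_semi_naive(chain) == ancestors_naive(chain)
```

The equality check at the end is the shape of test this phase calls for: both strategies must reach the same fixpoint, and only the amount of repeated work differs.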
Phase 5 — stratified negation
- Dependency graph analysis: which relations depend on which (positively or negatively)
- Stratification check: error if negation is in a cycle (non-stratifiable program)
- Evaluation: process strata in order — lower stratum fully computed before using its complement in a higher stratum
- `not(P)` in rule body: at evaluation time, check that P is absent from the fully computed relation of the lower stratum (EDB facts plus derived tuples)
- Tests: non-member (`not(member(X,L))`), colored-graph (`not(same-color(X,Y))`), stratification error detection
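The stratification check can be sketched in Python with a standard iterative stratum assignment, shown here in place of the SCC analysis mentioned in Phase 4. The rule encoding, a head relation name plus `(body_relation, negated)` pairs, and the function name `stratify` are assumptions made for the example.

```python
def stratify(rules):
    """Assign each relation a stratum number, or raise if negation sits in a cycle.

    rules: list of (head_rel, [(body_rel, negated), ...]).
    A head must be at least as high as each positive dependency and strictly
    higher than each negated one.
    """
    rels = {h for h, _ in rules} | {b for _, body in rules for b, _ in body}
    stratum = {r: 0 for r in rels}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for rel, negated in body:
                need = stratum[rel] + (1 if negated else 0)
                if stratum[head] < need:
                    # A stratifiable program never needs more strata than it
                    # has relations, so reaching that bound means a cycle
                    # passes through a negation.
                    if need >= len(rels):
                        raise ValueError(f"not stratifiable: negation in a cycle through {head}")
                    stratum[head] = need
                    changed = True
    return stratum


strata = stratify([
    ("reach", [("edge", False)]),
    ("reach", [("edge", False), ("reach", False)]),
    ("unreachable", [("node", False), ("reach", True)]),
])
```

Evaluating relations in ascending stratum order then guarantees that `reach` is complete before `unreachable` tests for absence.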
Phase 6 — aggregation (Datalog+)
- `count(X, Goal)` → number of distinct X satisfying Goal
- `sum(X, Goal)` → sum of X values satisfying Goal
- `min(X, Goal)` / `max(X, Goal)` → min/max of X satisfying Goal
- `group-by` semantics: `count(X, sibling(bob, X))` → count of bob's siblings
- Aggregation breaks stratification, so evaluate it in a separate post-fixpoint pass
- Tests: social network statistics, grade aggregation, inventory sums
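A post-fixpoint aggregation pass could look like the following Python sketch. It assumes the relation is already saturated and stored as a set of tuples; `aggregate` and its positional parameters are hypothetical. Note that it aggregates over distinct values per group, which matches the `count` definition above but is only one possible reading for `sum`.

```python
from collections import defaultdict


def aggregate(tuples, group_pos, value_pos, op):
    """Group tuples by one column and fold another column with count/sum/min/max."""
    groups = defaultdict(set)            # set → distinct values per group
    for t in tuples:
        groups[t[group_pos]].add(t[value_pos])
    fold = {"count": len, "sum": sum, "min": min, "max": max}[op]
    return {key: fold(values) for key, values in groups.items()}


# sibling(Person, Sibling) tuples, as they might look after saturation.
sibling = {("bob", "liz"), ("bob", "ann"), ("liz", "bob")}
sibling_counts = aggregate(sibling, group_pos=0, value_pos=1, op="count")
```

Running the pass only after the fixpoint means every group is complete when it is folded, so the result never has to be revised.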
Phase 7 — SX embedding API
- `(dl-program facts rules)` → database from SX data directly (no parsing required)
- `(dl-program '((parent tom bob) (parent tom liz) (parent bob ann)) '((ancestor X Z :- (parent X Y) (ancestor Y Z)) (ancestor X Y :- (parent X Y))))`
- `(dl-query db '(ancestor tom ?X))` → `((ann) (bob) (liz))`
- `(dl-assert! db '(parent ann pat))` → incremental fact addition + re-derive; the query above then returns `((ann) (bob) (liz) (pat))`
- `(dl-retract! db '(parent tom bob))` → fact removal + re-derive from scratch
- Integration demo: federation graph query, `(ancestor actor1 actor2)` over rose-ash ActivityPub follow relationships
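The assert/retract behaviour can be sketched in Python by keeping the asserted facts apart from the derived ones. The `Db` class and the `ancestors` helper are hypothetical, and for brevity this sketch recomputes from scratch on both operations; an incremental `dl-assert!` could instead seed a semi-naive delta with the new fact.

```python
def ancestors(parent):
    """Transitive closure of a set of (parent, child) pairs."""
    total = set(parent)
    while True:
        new = {(x, z) for x, y in parent for y2, z in total if y == y2} - total
        if not new:
            return total
        total |= new


class Db:
    """Keeps asserted facts (EDB) separate from derived tuples (IDB)."""

    def __init__(self, edb, derive):
        self.edb = set(edb)
        self.derive = derive
        self.idb = derive(self.edb)

    def assert_fact(self, fact):
        self.edb.add(fact)
        self.idb = self.derive(self.edb)

    def retract_fact(self, fact):
        # Derived tuples may have depended on the removed fact, so the
        # derived relation is rebuilt from the remaining EDB.
        self.edb.discard(fact)
        self.idb = self.derive(self.edb)

    def descendants_of(self, who):
        return sorted(z for x, z in self.idb if x == who)


db = Db({("tom", "bob"), ("tom", "liz"), ("bob", "ann")}, ancestors)
before = db.descendants_of("tom")
db.assert_fact(("ann", "pat"))
after_assert = db.descendants_of("tom")
db.retract_fact(("tom", "bob"))
after_retract = db.descendants_of("tom")
```

Keeping the EDB separate is what makes retraction tractable: without it there is no way to tell which stored tuples were asserted and which were consequences of the removed fact.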
Phase 8 — Datalog as a query language for rose-ash
- Schema: map SQLAlchemy model relationships to Datalog EDB facts (e.g. `(follows user1 user2)`, `(authored user post)`, `(tagged post tag)`)
- Loader: `dl-load-from-db!` queries PostgreSQL and populates the Datalog EDB
- Query examples:
  - `?- ancestor(me, X), authored(X, Post), tagged(Post, cooking).` → posts about cooking by people I follow (transitively)
  - Rule: `popular(Post) :- tagged(Post, T), count(L, (liked(L, Post))) >= 10.` → posts with 10+ likes
- Expose as a rose-ash service endpoint: `POST /internal/datalog` with program + query
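The loader step amounts to turning result rows into ground tuples tagged with a relation name. The Python sketch below uses the standard-library `sqlite3` module with an in-memory database purely as a stand-in for PostgreSQL; the `follows` table, its column names, and `load_facts` are all hypothetical.

```python
import sqlite3


def load_facts(conn, relation, sql):
    """Run a query and tag each row with the relation name, giving ground EDB tuples."""
    return {(relation, *row) for row in conn.execute(sql)}


# In-memory stand-in for the real database; schema invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (follower TEXT, followee TEXT)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [("user1", "user2"), ("user2", "user3")],
)
edb = load_facts(conn, "follows", "SELECT follower, followee FROM follows")
```

Because every loaded value is a constant from a row, the resulting tuples are ground by construction and would pass the non-ground check in `dl-add-fact!`.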
Blockers
(none yet)
Progress log
Newest first.
- 2026-05-07: Phase 3 done. `lib/datalog/db.sx` (~250 LOC) holds facts indexed by relation name plus the rules list, with `dl-add-fact!` / `dl-add-rule!` (rejects non-ground facts and unsafe rules); `lib/datalog/eval.sx` (~150 LOC) implements the naive bottom-up fixpoint via `dl-find-bindings` / `dl-match-positive` / `dl-saturate!` and `dl-query` (deduped projected substitutions). Safety analysis rejects unsafe head vars at load time. Negation and arithmetic built-ins raise clean errors (lifted in later phases). 15 eval tests cover transitive closure, sibling, same-generation, grandparent, cyclic graph reach, and six safety violations. Conformance 87 / 87.
- 2026-05-07: Phase 2 done. `lib/datalog/unify.sx` (~140 LOC): `dl-var?` (case + underscore), `dl-walk`, `dl-bind`, `dl-unify` (returns extended dict subst or `nil`), `dl-apply-subst`, `dl-ground?`, `dl-vars-of`. Substitutions are immutable dicts; `assoc` builds extended copies. 28 unify tests; conformance now 72 / 72.
- 2026-05-07: Phase 1 done. `lib/datalog/tokenizer.sx` (~190 LOC) emits `{:type :value :pos}` tokens; `lib/datalog/parser.sx` (~150 LOC) produces `{:head … :body …}` / `{:query …}` clauses, with nested compounds permitted for arithmetic and `not(...)` desugared to `{:neg …}`. 44 / 44 via `bash lib/datalog/conformance.sh` (26 tokenize + 18 parse). Local helpers namespace-prefixed (`dl-emit!`, `dl-peek`) after a host-primitive shadow clash. Test harness uses a custom `dl-deep-equal?` that handles out-of-order dict keys and number repr (`equal?` fails on dict key order and on `30` vs `30.0`).