# Datalog-on-SX: Datalog on the CEK/VM

Datalog is a declarative query language: a restricted subset of Prolog with no function symbols, only relations. Programs are sets of facts and rules; queries ask what follows. Evaluation is bottom-up (fixpoint iteration) rather than Prolog's top-down DFS. Because the Herbrand base is finite, the fixpoint is always reached: no infinite loops, guaranteed termination, and a good fit for incremental updates.

The unique angle: Datalog is a natural companion to the Prolog implementation already in progress (`lib/prolog/`). The parser and term representation can share infrastructure; the evaluator is an entirely different fixpoint engine rather than a DFS solver.

End-state goal: **full core Datalog** (facts, rules, stratified negation, aggregation, recursion) with a clean SX query API, and a demonstration of Datalog as a query engine for rose-ash data (e.g. federation graph, content relationships).

## Ground rules

- **Scope:** only touch `lib/datalog/**` and `plans/datalog-on-sx.md`. Do **not** edit `spec/`, `hosts/`, `shared/`, `lib/prolog/**`, or any other directory under `lib/`.
- **Shared-file issues** go under "Blockers" below with a minimal repro; do not fix here.
- **SX files:** use `sx-tree` MCP tools only.
- **Architecture:** Datalog source → term AST → fixpoint evaluator. No transpiler to SX AST: the evaluator is written in SX and works directly on term structures.
- **Reference:** Ramakrishnan & Ullman "A Survey of Deductive Database Systems"; Dalmau "Datalog and Constraint Satisfaction".
- **Commits:** one feature per commit. Keep `## Progress log` updated and tick boxes.

## Architecture sketch

```
Datalog source text
    │
    ▼
lib/datalog/tokenizer.sx  -- atoms, variables, numbers, strings, punct (?- :- , . ( ) [ ])
    │
    ▼
lib/datalog/parser.sx     -- facts: atom(args). rules: head :- body. queries: ?- goal.
    │                        No function symbols (only constants and variables in args).
    ▼
lib/datalog/db.sx         -- extensional DB (EDB): ground facts; IDB: derived relations;
    │                        clause index by relation name/arity
    ▼
lib/datalog/eval.sx       -- bottom-up fixpoint: semi-naive evaluation with delta sets;
    │                        stratification for negation; incremental update API
    ▼
lib/datalog/query.sx      -- query API: (dl-query db goal) → list of substitutions;
                             SX embedding: define facts/rules as SX data directly
```

Key differences from Prolog:

- **No function symbols:** args are atoms, numbers, strings, or variables only. No `f(a,b)`.
- **No cuts:** no procedural control.
- **Bottom-up:** derive all consequences of all rules before answering; no search tree.
- **Termination guaranteed:** no infinite derivation chains (no function symbols → finite Herbrand base).
- **Stratified negation:** `not(P)` is legal only if P does not depend, directly or transitively, on its own negation.
- **Aggregation:** `count`, `sum`, `min`, `max` over derived tuples (Datalog+).

## Roadmap

### Phase 1: tokenizer + parser

- [ ] Tokenizer: atoms (lowercase/quoted), variables (uppercase/`_`), numbers, strings, operators (`:-`, `?-`, `,`, `.`), comments (`%`, `/* */`)
  Note: no function symbol syntax (no nested `f(...)` in arg position).
- [ ] Parser:
  - Facts: `parent(tom, bob).` → `{:head (parent tom bob) :body ()}`
  - Rules: `ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).` → `{:head (ancestor X Z) :body ((parent X Y) (ancestor Y Z))}`
  - Queries: `?- ancestor(tom, X).` → `{:query (ancestor tom X)}`
  - Negation: `not(parent(X,Y))` in body position → `{:neg (parent X Y)}`
- [ ] Tests in `lib/datalog/tests/parse.sx`

### Phase 2: unification + substitution

- [ ] Share or port unification from `lib/prolog/`: term walk, occurs check off by default
- [ ] `dl-unify` `t1` `t2` `subst` → extended subst or nil (no function symbols means simpler)
- [ ] `dl-ground?` `term` → bool (all variables bound in substitution)
- [ ] Tests: atom/atom, var/atom, var/var, list args

### Phase 3: extensional DB + naive evaluation

- [ ] EDB: `{:relation-name → set-of-ground-tuples}` using SX sets (Phase 18 of primitives)
- [ ] `dl-add-fact!` `db` `relation` `args` → add ground tuple
- [ ] `dl-add-rule!` `db` `head` `body` → add rule clause
- [ ] Naive evaluation: iterate rules until fixpoint
  For each rule, for each combination of body tuples that unify, derive head tuple.
  Repeat until no new tuples added.
- [ ] `dl-query` `db` `goal` → list of substitutions satisfying goal against derived DB
- [ ] Tests: classic Datalog programs (transitive closure via ancestor, sibling, same-generation)

### Phase 4: semi-naive evaluation (performance)

- [ ] Delta sets: track newly derived tuples per iteration
- [ ] Semi-naive rule: each rule application must use at least one delta tuple from the last iteration, rather than joining full relations only
- [ ] Significant speedup for recursive rules: avoids re-deriving known tuples
- [ ] `dl-stratify` `db` → dependency graph + SCC analysis → stratum ordering
- [ ] Tests: verify semi-naive produces same results as naive; benchmark on large ancestor chain

### Phase 5: stratified negation

- [ ] Dependency graph analysis: which relations depend on which (positively or negatively)
- [ ] Stratification check: error if negation is in a cycle (non-stratifiable program)
- [ ] Evaluation: process strata in order; a lower stratum is fully computed before a higher stratum uses its complement
- [ ] `not(P)` in rule body: at evaluation time, check that P is NOT among the tuples already derived by lower strata (EDB plus completed IDB relations)
- [ ] Tests: non-member (`not(member(X,L))`), colored-graph (`not(same-color(X,Y))`), stratification error detection

### Phase 6: aggregation (Datalog+)

- [ ] `count(X, Goal)` → number of distinct X satisfying Goal
- [ ] `sum(X, Goal)` → sum of X values satisfying Goal
- [ ] `min(X, Goal)` / `max(X, Goal)` → min/max of X satisfying Goal
- [ ] `group-by` semantics: `count(X, sibling(bob, X))` → count of bob's siblings
- [ ] Aggregation breaks stratification: evaluate in a separate post-fixpoint pass
- [ ] Tests: social network statistics, grade aggregation, inventory sums

### Phase 7: SX embedding API

- [ ] `(dl-program facts rules)` → database from SX data directly (no parsing required)
  ```
  (dl-program
    '((parent tom bob) (parent tom liz) (parent bob ann))
    '((ancestor X Z :- (parent X Y) (ancestor Y Z))
      (ancestor X Y :- (parent X Y))))
  ```
- [ ] `(dl-query db '(ancestor tom ?X))` → `((ann) (bob) (liz))`
- [ ] `(dl-assert! db '(parent ann pat))` → incremental fact addition + re-derive
- [ ] `(dl-retract! db '(parent tom bob))` → fact removal + re-derive from scratch
- [ ] Integration demo: federation graph query, `(ancestor actor1 actor2)` over rose-ash ActivityPub follow relationships

### Phase 8: Datalog as a query language for rose-ash

- [ ] Schema: map SQLAlchemy model relationships to Datalog EDB facts (e.g. `(follows user1 user2)`, `(authored user post)`, `(tagged post tag)`)
- [ ] Loader: `dl-load-from-db!` queries PostgreSQL and populates the Datalog EDB
- [ ] Query examples:
  - `?- ancestor(me, X), authored(X, Post), tagged(Post, cooking).` → posts about cooking by people I follow (transitively)
  - `popular(Post) :- tagged(Post, T), count(L, (liked(L, Post))) >= 10.` → rule (not a query, so no `?-`) deriving posts with 10+ likes
- [ ] Expose as a rose-ash service endpoint: `POST /internal/datalog` with program + query

## Blockers

_(none yet)_

## Progress log

_Newest first._

_(awaiting phase 1)_
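
## Reference sketch: naive fixpoint

The naive evaluation loop described in Phase 3 can be illustrated with a short runnable sketch. It is written in Python rather than SX purely so it can be executed as-is. Every name in it (`is_var`, `unify`, `match_body`, `naive_fixpoint`, `query`) is hypothetical and not part of the planned `lib/datalog/` API, and the tuple-based term encoding is an assumption made for brevity. It covers positive rules only: no negation and no aggregation.

```python
# Hypothetical sketch of naive bottom-up evaluation (positive Datalog only).
# Atoms are tuples: (relation, arg1, arg2, ...). Variables are strings that
# start with an uppercase letter or "_", mirroring the Phase 1 tokenizer rule.

def is_var(term):
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")

def unify(pattern, fact, subst):
    """Match one body atom against a ground fact; return extended subst or None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    subst = dict(subst)  # copy so callers can backtrack safely
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            # Bind the variable, or check consistency with an earlier binding.
            if subst.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return subst

def match_body(body, facts, subst):
    """Yield every substitution that satisfies all body atoms against facts."""
    if not body:
        yield subst
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        extended = unify(first, fact, subst)
        if extended is not None:
            yield from match_body(rest, facts, extended)

def naive_fixpoint(facts, rules):
    """Apply every rule to the full fact set until no new tuple appears."""
    derived = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for subst in match_body(body, derived, {}):
                tup = (head[0],) + tuple(subst.get(a, a) for a in head[1:])
                if tup not in derived:
                    new.add(tup)
        if not new:
            return derived  # fixpoint reached
        derived |= new

def query(db, goal):
    """Return sorted bindings for the goal's variables, one tuple per answer."""
    return sorted(
        tuple(s[v] for v in goal[1:] if is_var(v))
        for s in match_body([goal], db, {})
    )

facts = {("parent", "tom", "bob"), ("parent", "tom", "liz"), ("parent", "bob", "ann")}
rules = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
db = naive_fixpoint(facts, rules)
print(query(db, ("ancestor", "tom", "X")))  # [('ann',), ('bob',), ('liz',)]
```

The loop re-joins every rule against the whole `derived` set on each pass. That repeated work is what the semi-naive variant in Phase 4 is meant to avoid, by requiring at least one body atom per rule application to match a tuple from the previous pass's `new` set.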