diff --git a/plans/datalog-on-sx.md b/plans/datalog-on-sx.md
index c978043c..6f74f6d6 100644
--- a/plans/datalog-on-sx.md
+++ b/plans/datalog-on-sx.md
@@ -103,23 +103,59 @@ Key differences from Prolog:
   sibling, same-generation, grandparent, cyclic graph reach, six safety cases.
   87 / 87 conformance.
 
-### Phase 4 — semi-naive evaluation (performance)
+### Phase 4 — built-in predicates + body arithmetic
+Almost every real query needs `<`, `=`, simple arithmetic, and string
+comparisons in body position. These are not EDB lookups — they're
+constraints that filter bindings.
+- [ ] Recognise built-in predicates in body: `(< X Y)`, `(<= X Y)`, `(> X Y)`,
+  `(>= X Y)`, `(= X Y)`, `(!= X Y)` and arithmetic forms `(is Z (+ X Y))`,
+  `(is Z (- X Y))`, `(is Z (* X Y))`, `(is Z (/ X Y))`.
+- [ ] Built-in evaluation: at the join step, after binding variables from
+  EDB lookups, evaluate built-ins as constraints. If any built-in fails
+  or has unbound inputs, drop the candidate substitution.
+- [ ] **Safety extension**: `is` binds its left operand iff right operand is
+  fully ground. `(< X Y)` requires both X and Y bound by some prior body
+  literal — reject unsafe at `dl-add-rule!` time.
+- [ ] Wire arithmetic operators through to SX numeric primitives — no
+  separate Datalog number tower.
+- [ ] Tests: range filters, arithmetic derivations, comparison-based
+  queries, safety violation on `(p X) :- (< X 5).`
+
+### Phase 5 — semi-naive evaluation (performance)
 - [ ] Delta sets: track newly derived tuples per iteration
 - [ ] Semi-naive rule: only join against delta tuples from last iteration, not full relation
 - [ ] Significant speedup for recursive rules — avoids re-deriving known tuples
-- [ ] `dl-stratify` `db` → dependency graph + SCC analysis → stratum ordering
 - [ ] Tests: verify semi-naive produces same results as naive; benchmark on large ancestor chain
 
-### Phase 5 — stratified negation
+### Phase 6 — magic sets (goal-directed bottom-up, opt-in)
+Naive bottom-up derives **all** consequences before answering. Magic sets
+rewrite the program so the fixpoint only derives tuples relevant to the
+goal — a major perf win for "what's reachable from node X" queries on
+large graphs.
+- [ ] Adornments: annotate rule predicates with bound (`b`) / free (`f`)
+  patterns based on how they're called.
+- [ ] Magic transformation: for each adorned predicate, generate a
+  `magic_` relation and rewrite rule bodies to filter through it.
+- [ ] Sideways information passing strategy (SIPS): left-to-right by
+  default; pluggable.
+- [ ] Optional pass — `(dl-set-strategy! db :magic)`; default semi-naive.
+- [ ] Tests: equivalence vs naive on small inputs; perf win on a 10k-node
+  reachability query from a single root.
+
+### Phase 7 — stratified negation
 - [ ] Dependency graph analysis: which relations depend on which (positively or negatively)
 - [ ] Stratification check: error if negation is in a cycle (non-stratifiable program)
-- [ ] Evaluation: process strata in order — lower stratum fully computed before using its
-  complement in a higher stratum
-- [ ] `not(P)` in rule body: at evaluation time, check P is NOT in the derived EDB
-- [ ] Tests: non-member (`not(member(X,L))`), colored-graph (`not(same-color(X,Y))`),
-  stratification error detection
+- [ ] `dl-stratify db` → SCC analysis → stratum ordering
+- [ ] Evaluation: process strata in order — lower stratum fully computed
+  before using its complement in a higher stratum
+- [ ] `not(P)` in rule body: at evaluation time, check P is NOT in the
+  derived EDB
+- [ ] Safety extension: every variable in a negative literal must also appear
+  in some positive body literal of the same rule
+- [ ] Tests: non-member (`not(member(X,L))`), colored-graph
+  (`not(same-color(X,Y))`), stratification error detection
 
-### Phase 6 — aggregation (Datalog+)
+### Phase 8 — aggregation (Datalog+)
 - [ ] `count(X, Goal)` → number of distinct X satisfying Goal
 - [ ] `sum(X, Goal)` → sum of X values satisfying Goal
 - [ ] `min(X, Goal)` / `max(X, Goal)` → min/max of X satisfying Goal
@@ -127,7 +163,7 @@ Key differences from Prolog:
 - [ ] Aggregation breaks stratification — evaluate in a separate post-fixpoint pass
 - [ ] Tests: social network statistics, grade aggregation, inventory sums
 
-### Phase 7 — SX embedding API
+### Phase 9 — SX embedding API
 - [ ] `(dl-program facts rules)` → database from SX data directly (no parsing required)
   ```
   (dl-program
@@ -141,7 +177,7 @@ Key differences from Prolog:
 - [ ] Integration demo: federation graph query — `(ancestor actor1 actor2)` over
   rose-ash ActivityPub follow relationships
 
-### Phase 8 — Datalog as a query language for rose-ash
+### Phase 10 — Datalog as a query language for rose-ash
 - [ ] Schema: map SQLAlchemy model relationships to Datalog EDB facts (e.g.
   `(follows user1 user2)`, `(authored user post)`, `(tagged post tag)`)
 - [ ] Loader: `dl-load-from-db!` — query PostgreSQL, populate Datalog EDB
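The following is a minimal sketch of the Phase 4 built-in handling. It is written in Python only for illustration: the plan targets SX, and none of these names come from it. `check_safety` plays the role of the add-time check. It rejects a rule when a comparison, or the right-hand side of `is`, uses a variable that no earlier body literal binds. `eval_body` evaluates the body left to right, treating comparisons as filters and `is` as a binder. The term encoding is an assumption: uppercase strings are variables, and tuples such as `("is", "Z", ("+", "X", "Y"))` are built-ins.

```python
import operator

# Hypothetical encoding for this sketch only: variables are strings starting with
# an uppercase letter; a literal is a tuple (pred, arg, ...); arithmetic terms are
# tuples like ("+", "X", "Y").
CMP = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq, "!=": operator.ne}
ARITH = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}


def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def term_vars(t):
    """Variables in a term, looking inside arithmetic forms."""
    if isinstance(t, tuple):
        return set().union(*(term_vars(a) for a in t[1:]))
    return {t} if is_var(t) else set()


def check_safety(head, body):
    """Raise ValueError if a built-in could see an unbound input (add-time check)."""
    bound = set()
    for lit in body:
        pred, args = lit[0], lit[1:]
        if pred in CMP:
            needed = term_vars(args[0]) | term_vars(args[1])
        elif pred == "is":
            needed = term_vars(args[1])          # only the right side must be ground
        else:
            needed = set()
        if needed - bound:
            raise ValueError(f"unsafe literal {lit}: unbound {sorted(needed - bound)}")
        if pred == "is":
            bound |= term_vars(args[0])          # `is` binds its left operand
        elif pred not in CMP:
            bound |= {a for a in args if is_var(a)}  # ordinary literal binds its vars
    if {a for a in head[1:] if is_var(a)} - bound:
        raise ValueError(f"unsafe head {head}")


def resolve(t, s):
    """Evaluate a term under substitution s; raises KeyError if a variable is unbound."""
    if isinstance(t, tuple):
        return ARITH[t[0]](resolve(t[1], s), resolve(t[2], s))
    return s[t] if is_var(t) else t


def eval_builtin(lit, s):
    """Return the (possibly extended) substitution, or None to drop the candidate."""
    pred, args = lit[0], lit[1:]
    try:
        if pred in CMP:
            return s if CMP[pred](resolve(args[0], s), resolve(args[1], s)) else None
        value = resolve(args[1], s)              # pred == "is"
    except (KeyError, ZeroDivisionError):
        return None                              # unbound input or bad arithmetic
    target = args[0]
    if is_var(target) and target not in s:
        return {**s, target: value}
    return s if resolve(target, s) == value else None


def unify(args, fact, s):
    """Match literal arguments against a stored fact, extending substitution s."""
    s = dict(s)
    for a, v in zip(args, fact):
        if is_var(a):
            if s.setdefault(a, v) != v:
                return None
        elif a != v:
            return None
    return s


def eval_body(body, db, s):
    """Yield every substitution satisfying body; db maps pred -> set of tuples."""
    if not body:
        yield s
        return
    lit, rest = body[0], body[1:]
    if lit[0] in CMP or lit[0] == "is":
        s2 = eval_builtin(lit, s)
        if s2 is not None:
            yield from eval_body(rest, db, s2)
    else:
        for fact in db.get(lit[0], ()):
            s2 = unify(lit[1:], fact, s)
            if s2 is not None:
                yield from eval_body(rest, db, s2)
```

With `age` facts for alice (34), bob (17), and carol (52), the body `age(P, A), A >= 18, B is A + 1` yields `('alice', 35)` and `('carol', 53)`. Calling `check_safety` on the plan's `(p X) :- (< X 5).` raises `ValueError`.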
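Here is the Phase 5 delta idea applied to the ancestor example. This is a hypothetical Python sketch that hard-codes one linear recursive rule rather than a general rule engine. Each round joins `parent` only against the tuples first derived in the previous round. A naive version is included so the two can be checked for identical results, as the plan's test item asks.

```python
from collections import defaultdict


def naive_ancestor(parent):
    """Naive evaluation: every round re-joins parent against the FULL ancestor set."""
    anc = set(parent)
    while True:
        derived = anc | {(x, z) for (x, y) in parent for (y2, z) in anc if y == y2}
        if derived == anc:
            return anc
        anc = derived


def semi_naive_ancestor(parent):
    """ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z)."""
    parents_of = defaultdict(list)     # index parent(X, Y) on Y
    for x, y in parent:
        parents_of[y].append(x)
    anc = set(parent)                  # base rule
    delta = set(parent)                # tuples first derived in the previous round
    while delta:
        # Join the recursive rule against delta only, then keep genuinely new tuples.
        new = {(x, z) for (y, z) in delta for x in parents_of[y]} - anc
        anc |= new
        delta = new
    return anc
```

Because `new` subtracts everything already known, each ancestor tuple enters `delta` exactly once. That means it is joined against `parent` only once, whereas the naive loop re-joins every known tuple on every round.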
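The sketch below shows one possible way `dl-stratify` could work. It uses Python and a made-up rule encoding: a head predicate plus a list of `(body_predicate, negated)` pairs. The steps are:

1. Build the head-to-body dependency graph.
2. Find SCCs with Tarjan's algorithm.
3. Reject any negative edge inside an SCC.
4. Give each SCC a stratum one higher than any SCC it negates.

Tarjan emits an SCC only after every SCC reachable from it, so a predicate's dependencies always have their strata assigned first.

```python
def stratify(rules):
    """rules: list of (head_pred, [(body_pred, negated), ...]) (hypothetical encoding).
    Returns {pred: stratum}; raises ValueError if a negation sits inside a cycle."""
    graph = {}                                   # head -> [(body_pred, negated)]
    for head, body in rules:
        graph.setdefault(head, [])
        for pred, neg in body:
            graph[head].append((pred, neg))
            graph.setdefault(pred, [])

    # Tarjan's SCC algorithm (recursive; fine for rule sets of modest size).
    # An SCC is appended only after every SCC reachable from it.
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w, _ in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)

    stratum = {}
    for scc in sccs:                             # dependencies come first
        level = 0
        for p in scc:
            for q, neg in graph[p]:
                if q in scc:
                    if neg:
                        raise ValueError(f"not stratifiable: {p} negates {q} in a cycle")
                else:
                    level = max(level, stratum[q] + (1 if neg else 0))
        for p in scc:
            stratum[p] = level
    return stratum
```

For `unreachable(X) :- node(X), not(reach(X))` on top of a recursive `reach`, the recursive predicates and the EDB all land in stratum 0, and `unreachable` lands in stratum 1. A pair like `p :- not(q). q :- p.` is rejected.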
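To make the Phase 6 rewrite concrete, the sketch below applies the magic-sets transformation by hand to a reachability program. It targets the query `reach(root, Y)` (adornment `bf`), and both the original and rewritten programs are evaluated naively in Python. This illustrates what the transformation produces. It is not the automatic adornment/SIPS pass the plan describes, and `full_reach` and `magic_reach` are hypothetical names.

```python
def full_reach(edge):
    """Unrewritten program, evaluated naively:
         reach(X, Y) :- edge(X, Y).
         reach(X, Y) :- edge(X, Z), reach(Z, Y)."""
    reach = set(edge)
    while True:
        new = {(x, y) for (x, z) in edge for (z2, y) in reach if z == z2} - reach
        if not new:
            return reach
        reach |= new


def magic_reach(edge, root):
    """Same program after a hand-applied magic-sets rewrite for reach(root, Y):
         magic(root).
         magic(Z)    :- magic(X), edge(X, Z).
         reach(X, Y) :- magic(X), edge(X, Y).
         reach(X, Y) :- magic(X), edge(X, Z), reach(Z, Y).
    magic does not depend on reach, so it is computed to fixpoint first.
    Returns (answers for the query, all derived reach tuples)."""
    magic = {root}
    while True:
        new = {z for (x, z) in edge if x in magic} - magic
        if not new:
            break
        magic |= new
    reach = set()
    while True:
        new = {(x, y) for (x, y) in edge if x in magic}
        new |= {(x, y) for (x, z) in edge if x in magic
                for (z2, y) in reach if z == z2}
        new -= reach
        if not new:
            break
        reach |= new
    return {y for (x, y) in reach if x == root}, reach
```

Consider a graph with two disconnected components: `0 → 1 → 2 → 3`, plus a chain from 100 to 140. Both programs answer `{1, 2, 3}` for root 0. However, `full_reach` derives 826 tuples across both components, while the rewritten program derives only the 6 `reach` tuples whose first argument is reachable from node 0.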