Merge loops/datalog into architecture: tokenizer/parser, magic sets, negation, semi-naive (259/259 tests)
This commit is contained in:
@@ -13,6 +13,20 @@ End-state goal: **full core Datalog** (facts, rules, stratified negation, aggreg
|
||||
recursion) with a clean SX query API, and a demonstration of Datalog as a query engine
|
||||
for rose-ash data (e.g. federation graph, content relationships).
|
||||
|
||||
## Status (rolling)
|
||||
|
||||
`bash lib/datalog/conformance.sh` → **276/276 across 11 suites**
|
||||
(tokenize, parse, unify, eval, builtins, semi_naive, negation, aggregates,
|
||||
api, magic, demo). Source is ~3100 LOC, tests ~2900 LOC, public API
|
||||
documented in `lib/datalog/datalog.sx`.
|
||||
|
||||
Phases 1–9 are functionally complete; Phase 10 covers the rose-ash
|
||||
domain demos (in `lib/datalog/demo.sx` — federation, content,
|
||||
permissions, cooking-posts, tag co-occurrence, shortest path, org chart).
|
||||
The PostgreSQL loader and `/internal/datalog` HTTP endpoint listed in
|
||||
Phase 10 require service-tree edits outside `lib/datalog/**` and are
|
||||
flagged as out-of-scope for this loop.
|
||||
|
||||
## Ground rules
|
||||
|
||||
- **Scope:** only touch `lib/datalog/**` and `plans/datalog-on-sx.md`. Do **not** edit
|
||||
@@ -25,23 +39,6 @@ for rose-ash data (e.g. federation graph, content relationships).
|
||||
Dalmau "Datalog and Constraint Satisfaction".
|
||||
- **Commits:** one feature per commit. Keep `## Progress log` updated and tick boxes.
|
||||
|
||||
## Non-goals
|
||||
|
||||
Deliberately out of scope for this implementation. Real engines (Soufflé, Cozo, DDlog) include
|
||||
some of these; we accept they're missing and will note them in `Blockers` if a use case demands
|
||||
one later.
|
||||
|
||||
- **Function symbols** — keeps termination guaranteed and prevents collapse into Prolog.
|
||||
- **Disjunctive heads** (`p :- q. p :- r.` is fine; `p ; q :- r.` is not) — research extension.
|
||||
- **Well-founded semantics** — only stratified negation. Programs that aren't stratifiable are
|
||||
rejected at load time, not evaluated under WFS.
|
||||
- **Tabled top-down (SLG resolution)** — bottom-up only. If you want top-down with termination,
|
||||
use the Prolog implementation.
|
||||
- **Constraint Datalog** (Datalog over reals, intervals, finite domains) — research extension.
|
||||
- **Distributed evaluation / Differential Dataflow** — single-process fixpoint only. The rose-ash
|
||||
cross-service story (Phase 10) federates by querying each service's local Datalog instance and
|
||||
joining results, not by running a distributed fixpoint.
|
||||
|
||||
## Architecture sketch
|
||||
|
||||
```
|
||||
@@ -75,128 +72,647 @@ Key differences from Prolog:
|
||||
## Roadmap
|
||||
|
||||
### Phase 1 — tokenizer + parser
|
||||
- [ ] Tokenizer: atoms (lowercase/quoted), variables (uppercase/`_`), numbers, strings,
|
||||
operators (`:- `, `?-`, `,`, `.`), arithmetic + comparison operators
|
||||
(`+`, `-`, `*`, `/`, `<`, `<=`, `>`, `>=`, `=`, `!=`), comments (`%`, `/* */`)
|
||||
Note: no function symbol syntax (no nested `f(...)` in arg position).
|
||||
- [ ] Parser:
|
||||
- [x] Tokenizer: atoms (lowercase/quoted), variables (uppercase/`_`), numbers, strings,
|
||||
punct (`( )`, `,`, `.`), operators (`:-`, `?-`, `<=`, `>=`, `!=`, `<`, `>`, `=`,
|
||||
`+`, `-`, `*`, `/`), comments (`%`, `/* */`)
|
||||
Note: no function symbol syntax (no nested `f(...)` in arg position) — but the
|
||||
parser permits nested compounds for arithmetic; safety analysis (Phase 3) rejects
|
||||
non-arithmetic nesting.
|
||||
- [x] Parser:
|
||||
- Facts: `parent(tom, bob).` → `{:head (parent tom bob) :body ()}`
|
||||
- Rules: `ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).`
|
||||
→ `{:head (ancestor X Z) :body ((parent X Y) (ancestor Y Z))}`
|
||||
- Queries: `?- ancestor(tom, X).` → `{:query (ancestor tom X)}`
|
||||
- Queries: `?- ancestor(tom, X).` → `{:query ((ancestor tom X))}`
|
||||
(`:query` value is always a list of literals; `?- p, q.` → `{:query ((p) (q))}`)
|
||||
- Negation: `not(parent(X,Y))` in body position → `{:neg (parent X Y)}`
|
||||
- [ ] Tests in `lib/datalog/tests/parse.sx`
|
||||
- [x] Tests in `lib/datalog/tests/parse.sx` (18) and `lib/datalog/tests/tokenize.sx` (26).
|
||||
Conformance harness: `bash lib/datalog/conformance.sh` → 44 / 44 passing.
|
||||
|
||||
### Phase 2 — unification + substitution
|
||||
- [ ] Share or port unification from `lib/prolog/` — term walk, occurs check off by default
|
||||
- [ ] `dl-unify` `t1` `t2` `subst` → extended subst or nil (no function symbols means simpler)
|
||||
- [ ] `dl-ground?` `term` → bool — all variables bound in substitution
|
||||
- [ ] Tests: atom/atom, var/atom, var/var, list args
|
||||
- [x] Ported (not shared) from `lib/prolog/` — term walk, no occurs check.
|
||||
- [x] `dl-unify t1 t2 subst` → extended subst dict, or `nil` on failure.
|
||||
- [x] `dl-walk`, `dl-bind`, `dl-apply-subst`, `dl-ground?`, `dl-vars-of`.
|
||||
- [x] Substitutions are immutable dicts keyed by variable name (string).
|
||||
Lists/tuples unify element-wise (used for arithmetic compounds too).
|
||||
- [x] Tests in `lib/datalog/tests/unify.sx` (28). 72 / 72 conformance.
|
||||
|
||||
### Phase 3 — extensional DB + naive evaluation
|
||||
- [ ] EDB: `{:relation-name → set-of-ground-tuples}` using SX sets (Phase 18 of primitives)
|
||||
- [ ] `dl-add-fact!` `db` `relation` `args` → add ground tuple
|
||||
- [ ] `dl-add-rule!` `db` `head` `body` → add rule clause
|
||||
- [ ] Naive evaluation: iterate rules until fixpoint
|
||||
For each rule, for each combination of body tuples that unify, derive head tuple.
|
||||
Repeat until no new tuples added.
|
||||
- [ ] `dl-query` `db` `goal` → list of substitutions satisfying goal against derived DB
|
||||
- [ ] **Safety analysis**: every variable in a rule head must also appear in a positive body
|
||||
literal; reject unsafe rules at `dl-add-rule!` time with a clear error pointing at the
|
||||
offending variable. Built-in predicates and negated atoms do not satisfy safety on their
|
||||
own (`p(X) :- X > 0.` is unsafe).
|
||||
- [ ] Tests: transitive closure (ancestor), sibling, same-generation — classic Datalog programs;
|
||||
safety violation rejection cases.
|
||||
### Phase 3 — extensional DB + naive evaluation + safety analysis
|
||||
- [x] EDB+IDB combined: `{:facts {<rel-name-string> -> (literal ...)}}` —
|
||||
relations indexed by name; tuples stored as full literals so they
|
||||
unify directly. Dedup on insert via `dl-tuple-equal?`.
|
||||
- [x] `dl-add-fact! db lit` (rejects non-ground) and `dl-add-rule! db rule`
|
||||
(rejects unsafe). `dl-program source` parses + loads in one step.
|
||||
- [x] Naive evaluation `dl-saturate! db`: iterate rules until no new tuples.
|
||||
`dl-find-bindings` recursively joins body literals; `dl-match-positive`
|
||||
unifies a literal against every tuple in the relation.
|
||||
- [x] `dl-query db goal` → list of substitutions over `goal`'s vars,
|
||||
deduplicated. `dl-relation db name` for derived tuples.
|
||||
- [x] Safety analysis at `dl-add-rule!` time: every head variable except
|
||||
`_` must appear in some positive body literal. Built-ins and negated
|
||||
literals do not satisfy safety. Helpers `dl-positive-body-vars`,
|
||||
`dl-rule-unsafe-head-vars` exposed for later phases.
|
||||
- [x] Negation and arithmetic built-ins error cleanly at saturate time
|
||||
(Phase 4 / Phase 7 will swap in real semantics).
|
||||
- [x] Tests in `lib/datalog/tests/eval.sx` (15): transitive closure,
|
||||
sibling, same-generation, grandparent, cyclic graph reach, six
|
||||
safety cases. 87 / 87 conformance.
|
||||
|
||||
### Phase 4 — built-in predicates + body arithmetic
|
||||
|
||||
Almost every real query needs `<`, `=`, simple arithmetic, and string comparisons in body
|
||||
position. These are not EDB lookups — they're constraints that filter bindings.
|
||||
|
||||
- [ ] Recognise built-in predicates in body: `(< X Y)`, `(<= X Y)`, `(> X Y)`, `(>= X Y)`,
|
||||
`(= X Y)`, `(!= X Y)` and arithmetic forms `(is Z (+ X Y))`, `(is Z (- X Y))`,
|
||||
`(is Z (* X Y))`, `(is Z (/ X Y))`.
|
||||
- [ ] Built-in evaluation in the fixpoint: at the join step, after binding variables from EDB
|
||||
lookups, evaluate built-ins as constraints. If any built-in fails or has unbound inputs,
|
||||
drop the candidate substitution.
|
||||
- [ ] **Safety extension**: `is` binds its left operand iff right operand is fully ground.
|
||||
`(< X Y)` requires both X and Y bound by some prior body literal — reject unsafe.
|
||||
- [ ] Wire arithmetic operators through to SX numeric primitives — no separate Datalog number
|
||||
tower.
|
||||
- [ ] Tests: range filters, arithmetic derivations (`(plus-one X Y :- ..., (is Y (+ X 1)))`),
|
||||
comparison-based queries, safety violation detection on `(p X) :- (< X 5).`
|
||||
Almost every real query needs `<`, `=`, simple arithmetic, and string
|
||||
comparisons in body position. These are not EDB lookups — they're
|
||||
constraints that filter bindings.
|
||||
- [x] Recognise built-in predicates in body: `(< X Y)`, `(<= X Y)`, `(> X Y)`,
|
||||
`(>= X Y)`, `(= X Y)`, `(!= X Y)` and arithmetic forms `(is Z (+ X Y))`,
|
||||
`(is Z (- X Y))`, `(is Z (* X Y))`, `(is Z (/ X Y))`. Live in
|
||||
`lib/datalog/builtins.sx`.
|
||||
- [x] `dl-eval-builtin` dispatches; `dl-eval-arith` recursively evaluates
|
||||
`(+ a b)` etc. with full nesting. `=` unifies; `!=` rejects equal
|
||||
ground terms.
|
||||
- [x] Order-aware safety analysis (`dl-rule-check-safety`): walks body
|
||||
left-to-right tracking which vars are bound. `is`'s RHS vars must
|
||||
be already bound; LHS becomes bound. Comparisons require both
|
||||
sides bound. `=` is special-cased — at least one side bound binds
|
||||
the other. Negation vars must be bound (will be enforced fully in
|
||||
Phase 7).
|
||||
- [x] Wired through SX numeric primitives — no separate number tower.
|
||||
- [x] Tests in `lib/datalog/tests/builtins.sx` (19): range filters,
|
||||
arithmetic derivations, equality binding, eight safety violations
|
||||
and three safe-shape tests. Conformance 106 / 106.
|
||||
|
||||
### Phase 5 — semi-naive evaluation (performance)
|
||||
- [ ] Delta sets: track newly derived tuples per iteration
|
||||
- [ ] Semi-naive rule: only join against delta tuples from last iteration, not full relation
|
||||
- [ ] Significant speedup for recursive rules — avoids re-deriving known tuples
|
||||
- [ ] `dl-stratify` `db` → dependency graph + SCC analysis → stratum ordering
|
||||
- [ ] Tests: verify semi-naive produces same results as naive; benchmark on large ancestor chain
|
||||
- [x] Delta sets `{rel-name -> tuples}` track newly derived tuples per iter.
|
||||
`dl-snapshot-facts` builds the initial delta from the EDB.
|
||||
- [x] Semi-naive rule: for each rule, walk every positive body literal
|
||||
position; substitute that one with the per-relation delta and join
|
||||
the rest against the previous-iteration DB (`dl-find-bindings-semi`).
|
||||
Candidates are collected before mutating the DB so the "full" sides
|
||||
see a consistent snapshot.
|
||||
- [x] `dl-collect-rule-candidates` falls back to a naive single pass when
|
||||
a rule has no positive body literal (e.g. `(p X) :- (= X 5).`).
|
||||
- [x] `dl-saturate!` is now semi-naive by default; `dl-saturate-naive!`
|
||||
kept for differential testing and a reference implementation.
|
||||
- [x] Tests in `lib/datalog/tests/semi_naive.sx` (8) — every recursive
|
||||
program from earlier suites is run under both saturators with
|
||||
per-relation tuple counts compared (cheap, robust under bundled
|
||||
conformance session). A chain-5 differential exercises multiple
|
||||
semi-naive iterations against the recursive ancestor rule.
|
||||
Larger chains hit prohibitive wall-clock under conformance CPU
|
||||
contention with other agents — a future Blocker tracks switching
|
||||
`dl-tuple-member?` from O(n²) list scan to a hash-set per relation.
|
||||
|
||||
### Phase 6 — magic sets (goal-directed bottom-up)
|
||||
|
||||
Naive bottom-up evaluation derives **all** consequences of all rules before answering, even when
|
||||
the query touches a tiny slice of the EDB. Magic sets rewrite the program so the fixpoint only
|
||||
derives tuples relevant to the goal — a major perf win for "what's reachable from node X" style
|
||||
queries on large graphs.
|
||||
|
||||
- [ ] Adornments: annotate rule predicates with bound (`b`) / free (`f`) patterns based on how
|
||||
they're called (`ancestor^bf(tom, X)` vs `ancestor^ff(X, Y)`).
|
||||
- [ ] Magic transformation: for each adorned predicate, generate a `magic_<pred>` relation and
|
||||
rewrite rule bodies to filter through it. Seed with `magic_<query-pred>(<bound-args>)`.
|
||||
- [ ] Sideways information passing strategy (SIPS): left-to-right by default; pluggable.
|
||||
- [ ] Optional pass — guarded behind `(dl-set-strategy! db :magic)`; default remains semi-naive.
|
||||
- [ ] Tests: ancestor query from a single root on a 10k-node graph — magic-rewritten version
|
||||
should be O(reachable) instead of O(graph). Equivalence vs naive on small inputs.
|
||||
### Phase 6 — magic sets (goal-directed bottom-up, opt-in)
|
||||
Naive bottom-up derives **all** consequences before answering. Magic sets
|
||||
rewrite the program so the fixpoint only derives tuples relevant to the
|
||||
goal — a major perf win for "what's reachable from node X" queries on
|
||||
large graphs.
|
||||
- [x] Adornments: `dl-adorn-goal goal` and `dl-adorn-lit lit bound` in
|
||||
`lib/datalog/magic.sx`. Per-arg `b`/`f` based on whether the arg
|
||||
is a constant or a variable already in the bound set.
|
||||
- [x] Magic transformation: `dl-magic-rewrite rules query-rel adn args`
|
||||
generates `{:rules <rewritten-rules> :seed <magic-seed>}`. Each
|
||||
original rule is gated with a `magic_<rel>^<adn>(bound)` filter,
|
||||
and propagation rules are emitted for each positive non-builtin
|
||||
body literal. Worklist over `(rel, adn)` pairs starts from the
|
||||
query and stops when no new pairs appear. EDB facts pass through
|
||||
unchanged.
|
||||
- [x] Sideways information passing strategy (SIPS): left-to-right
|
||||
`dl-rule-sips rule head-adornment` walks body literals tracking
|
||||
the bound set, returning `({:lit :adornment} ...)`. Recognises
|
||||
`is`/aggregate result-vars as new binders; comparisons and
|
||||
negation pass through with computed adornments. (Pluggable
|
||||
strategies are future work.)
|
||||
- [x] `dl-set-strategy! db strategy` hook + `dl-get-strategy db`. Default
|
||||
`:semi-naive`. `:magic` accepted but the transformation itself is
|
||||
deferred — saturator currently falls back to semi-naive. Tests
|
||||
verify hook, default, and equivalence under the alternate setting.
|
||||
- [x] Equivalence test: rewritten ancestor program over the same EDB
|
||||
derives the same number of `ancestor` tuples and returns the
|
||||
same query answers as the unrewritten program (chain-3 case).
|
||||
- [x] `dl-magic-query db query-goal` — top-level driver. Builds a
|
||||
fresh internal db with the caller's EDB facts, the magic seed,
|
||||
and the rewritten rules; saturates and queries. Caller's db is
|
||||
untouched. Equivalent to `dl-query` for fully-stratifiable
|
||||
programs (sole motivation is a perf alternative on goal-shaped
|
||||
queries against large recursive relations).
|
||||
- [ ] Perf test: 10k-node reachability with magic vs semi-naive.
|
||||
Left to a future iteration — would need a benchmarking harness
|
||||
for large graphs and the conformance budget can't afford it.
|
||||
|
||||
### Phase 7 — stratified negation
|
||||
- [ ] Dependency graph analysis: which relations depend on which (positively or negatively)
|
||||
- [ ] Stratification check: error if negation is in a cycle (non-stratifiable program)
|
||||
- [ ] Evaluation: process strata in order — lower stratum fully computed before using its
|
||||
complement in a higher stratum
|
||||
- [ ] `not(P)` in rule body: at evaluation time, check P is NOT in the derived EDB
|
||||
- [ ] Tests: non-member (`not(member(X,L))`), colored-graph (`not(same-color(X,Y))`),
|
||||
stratification error detection
|
||||
- [x] Dependency graph: `dl-build-dep-graph db` returns `{head -> ({:rel
|
||||
:neg} ...)}`. Built-ins drop out (they're not relations).
|
||||
- [x] Reachability via Floyd-Warshall in `dl-build-reach`; cycles
|
||||
detected by `reach[A][B] && reach[B][A]`. Programs are
|
||||
non-stratifiable iff any negative dependency falls inside an SCC.
|
||||
`dl-check-stratifiable` returns nil on success or a clear message.
|
||||
- [x] `dl-compute-strata` propagates stratum numbers iteratively:
|
||||
`stratum(R) = max over deps of (stratum(dep) + (1 if negated else 0))`.
|
||||
- [x] Saturator refactor: `dl-saturate-rules! db rules` is the semi-
|
||||
naive worker; `dl-saturate! db` rejects non-stratifiable programs,
|
||||
groups rules by head's stratum, and runs the worker on each
|
||||
stratum in increasing order.
|
||||
- [x] `not(P)` in body: `dl-match-negation` walks the inner literal
|
||||
under the current subst and uses `dl-match-positive` — succeeds
|
||||
iff zero matches. Order-aware safety in `dl-rule-check-safety`
|
||||
(already present from Phase 4) requires negation vars to be
|
||||
bound by an earlier positive literal.
|
||||
- [x] Tests in `lib/datalog/tests/negation.sx` (10): EDB and IDB
|
||||
negation, two-step strata, multi-level strata, with-arithmetic,
|
||||
empty-result and always-fail cases, non-stratifiability
|
||||
rejection, and a negation safety violation.
|
||||
|
||||
### Phase 8 — aggregation (Datalog+)
|
||||
- [ ] `count(X, Goal)` → number of distinct X satisfying Goal
|
||||
- [ ] `sum(X, Goal)` → sum of X values satisfying Goal
|
||||
- [ ] `min(X, Goal)` / `max(X, Goal)` → min/max of X satisfying Goal
|
||||
- [ ] `group-by` semantics: `count(X, sibling(bob, X))` → count of bob's siblings
|
||||
- [ ] Aggregation breaks stratification — evaluate in a separate post-fixpoint pass
|
||||
- [ ] Tests: social network statistics, grade aggregation, inventory sums
|
||||
- [x] `(count R V Goal)`, `(sum R V Goal)`, `(min R V Goal)`,
|
||||
`(max R V Goal)`, `(findall L V Goal)` — first arg is the result
|
||||
variable, second is the aggregated variable, third is the goal
|
||||
literal. `findall` returns the distinct-value list itself; the
|
||||
others reduce. Live in `lib/datalog/aggregates.sx`.
|
||||
- [x] `dl-eval-aggregate`: runs `dl-find-bindings` on the goal under the
|
||||
current subst (which provides outer-context bindings), collects
|
||||
distinct values of the aggregated var, applies the aggregate.
|
||||
`count`/`sum` produce 0 when no matches; `min`/`max` produce no
|
||||
binding (rule fails) when empty.
|
||||
- [x] Group-by emerges naturally: outer-context vars in the goal are
|
||||
substituted from the current subst, so `popular(P) :- post(P),
|
||||
count(N, U, liked(U, P)), >=(N, 3).` correctly counts per-post.
|
||||
- [x] Stratification: `dl-aggregate-dep-edge` returns a negation-like
|
||||
edge so the aggregate's goal relation is fully derived before the
|
||||
aggregate fires. Non-monotonicity respected.
|
||||
- [x] Safety: aggregate body lit binds the result var; goal-internal
|
||||
vars are existentially quantified and don't need outer binding.
|
||||
- [x] Tests in `lib/datalog/tests/aggregates.sx` (10): count siblings,
|
||||
sum prices, min/max scores, count over derived relation,
|
||||
empty-input cases for each operator, popularity threshold with
|
||||
group-by, distinct-counted-once.
|
||||
|
||||
### Phase 9 — SX embedding API
|
||||
- [ ] `(dl-program facts rules)` → database from SX data directly (no parsing required)
|
||||
- [x] `(dl-program-data facts rules)` builds a db from SX data —
|
||||
`facts` is a list of literals, `rules` is a list of either
|
||||
dicts `{:head … :body …}` or lists `(<head…> <- <body…>)`.
|
||||
Variables are SX symbols whose first char is uppercase or `_`,
|
||||
matching the parser's convention.
|
||||
```
|
||||
(dl-program
|
||||
'((parent tom bob) (parent tom liz) (parent bob ann))
|
||||
'((ancestor X Z :- (parent X Y) (ancestor Y Z))
|
||||
(ancestor X Y :- (parent X Y))))
|
||||
(dl-program-data
|
||||
'((parent tom bob) (parent bob ann))
|
||||
'((ancestor X Y <- (parent X Y))
|
||||
(ancestor X Z <- (parent X Y) (ancestor Y Z))))
|
||||
```
|
||||
- [ ] `(dl-query db '(ancestor tom ?X))` → `((ann) (bob) (liz) (pat))`
|
||||
- [ ] `(dl-assert! db '(parent ann pat))` → incremental fact addition + re-derive
|
||||
- [ ] `(dl-retract! db '(parent tom bob))` → fact removal + re-derive from scratch
|
||||
- [ ] Integration demo: federation graph query — `(ancestor actor1 actor2)` over
|
||||
rose-ash ActivityPub follow relationships
|
||||
- [x] `(dl-rule head body)` constructor for the dict form.
|
||||
- [x] `(dl-query db '(ancestor tom X))` already worked — same query API
|
||||
consumes the SX-data goal. Now also accepts a *list* of body
|
||||
literals for conjunctive queries:
|
||||
`(dl-query db '((p X) (q X)))`,
|
||||
`(dl-query db (list '(n X) '(> X 2)))`. Auto-dispatched via
|
||||
`dl-query-coerce` on first-element shape.
|
||||
- [x] `(dl-assert! db '(parent ann pat))` → adds the fact and re-saturates.
|
||||
- [x] `(dl-retract! db '(parent bob ann))` → drops matching tuples from
|
||||
the EDB list, wipes every relation that has a rule (those are IDB),
|
||||
and re-saturates from the surviving EDB.
|
||||
- [x] Tests in `lib/datalog/tests/api.sx` (9): closure via data API,
|
||||
dict-rule form, dl-rule constructor, dl-assert! incremental,
|
||||
dl-retract! removes derived, cyclic-graph reach via data,
|
||||
assert into empty db, fact-style rule (no arrow), coerce dict.
|
||||
- [x] Integration demo: federation graph query — `(reachable A B)` /
|
||||
`(mutual A B)` / `(foaf A C)` over `(follows ACTOR-A ACTOR-B)` in
|
||||
`lib/datalog/demo.sx`. Tests in `lib/datalog/tests/demo.sx`.
|
||||
Wiring this to actual rose-ash ActivityPub data is Phase 10
|
||||
service work and is out of scope for this loop.
|
||||
|
||||
### Phase 10 — Datalog as a query language for rose-ash
|
||||
- [ ] Schema: map SQLAlchemy model relationships to Datalog EDB facts
|
||||
(e.g. `(follows user1 user2)`, `(authored user post)`, `(tagged post tag)`)
|
||||
- [ ] Loader: `dl-load-from-db!` — query PostgreSQL, populate Datalog EDB
|
||||
- [ ] Query examples:
|
||||
- `?- ancestor(me, X), authored(X, Post), tagged(Post, cooking).`
|
||||
→ posts about cooking by people I follow (transitively)
|
||||
- `?- popular(Post) :- tagged(Post, T), count(L, (liked(L, Post))) >= 10.`
|
||||
→ posts with 10+ likes
|
||||
- [ ] Expose as a rose-ash service endpoint: `POST /internal/datalog` with program + query
|
||||
- [x] Schema sketches in `lib/datalog/demo.sx`:
|
||||
- **Federation**: `(follows A B)` → `(mutual A B)`, `(reachable A B)`,
|
||||
`(foaf A C)` (friend-of-a-friend, distinct).
|
||||
- **Content**: `(authored A P)`, `(liked U P)`, `(tagged P T)` →
|
||||
`(post-likes P N)` via aggregation, `(popular P)` for likes ≥ 3,
|
||||
`(interesting Me P)` joining follows + authored + popular.
|
||||
- **Permissions**: `(member A G)`, `(subgroup C P)`, `(allowed G R)`
|
||||
→ `(in-group A G)` over transitive subgroups, `(can-access A R)`.
|
||||
- **Cooking-posts** (the canonical example): `(reach Me Them)` over
|
||||
the follow graph, then `(cooking-post-by-network Me P)` joining
|
||||
reach + authored + `(tagged P cooking)`.
|
||||
- [ ] Loader `dl-load-from-db!` — out of scope for this loop
|
||||
(would need to edit `shared/services/` outside `lib/datalog/`).
|
||||
Programs in `demo.sx` already document the EDB shape expected
|
||||
from such a loader. `dl-program-data` consumes the same shape.
|
||||
- [x] Query examples covered by `lib/datalog/tests/demo.sx` (10):
|
||||
mutuals, transitive reach, FOAF, popular posts, interesting feed,
|
||||
post likes count, direct/subgroup/transitive group access, no
|
||||
access without grant.
|
||||
- [ ] Service endpoint `POST /internal/datalog` — out of scope as above.
|
||||
Once exposed, server-side handler would be `dl-program-data` +
|
||||
`dl-query`, returning JSON-encoded substitutions.
|
||||
|
||||
## Blockers
|
||||
|
||||
_(none yet)_
|
||||
- **Saturation perf**: three rounds done.
|
||||
- hash-set membership in `dl-add-fact!` (Phase 5b)
|
||||
- indexed iteration in `dl-find-bindings` (Phase 5c)
|
||||
- first-arg index per relation (Phase 5e) — when a body literal's
|
||||
first arg walks to a non-variable, dl-match-positive looks up
|
||||
by `(str arg)` instead of scanning the full relation.
|
||||
chain-25 saturation drops from ~33s to ~18s real (10s user).
|
||||
chain-50 still long (~120s+) due to dict-copy overhead in
|
||||
unification subst threading. Future: per-rule "compiled" body
|
||||
with pre-resolved var positions, slot-based subst representation
|
||||
to avoid `assoc` per binding.
|
||||
|
||||
## Progress log
|
||||
|
||||
_Newest first._
|
||||
|
||||
_(awaiting phase 1)_
|
||||
- 2026-05-11 — `dl-set-strategy!` accepted arbitrary keyword values
|
||||
silently. Typos like `:semi_naive` or `:semiNaive` were stored
|
||||
uninspected; the saturator then used the default and the user
|
||||
never learned their setting was a typo. Validator added: strategy
|
||||
must be one of `:semi-naive`, `:naive`, `:magic`. 1 regression test;
|
||||
276/276.
|
||||
|
||||
- 2026-05-11 — Anonymous-variable renamer collided with user-written
|
||||
`_anon<N>` symbols. The renamer started counter at 0 and produced
|
||||
`_anon1, _anon2, ...` unconditionally; if the user wrote
|
||||
`q(_anon1) :- p(_anon1, _).` the `_` got renamed to `_anon1` too,
|
||||
collapsing the two positions of `p` to a single var and returning
|
||||
the empty result instead of `{a, c}`. Fix: scan each rule (and
|
||||
query) for the max `_anon<N>` and start the renamer past it. The
|
||||
renamer constructor now takes a `start` arg; new helpers
|
||||
`dl-max-anon-num` / `dl-max-anon-num-list` walk the rule tree.
|
||||
1 regression test; 275/275.
|
||||
|
||||
- 2026-05-11 — `dl-magic-query` could silently diverge from
|
||||
`dl-query` when an aggregate's inner-goal relation was IDB. The
|
||||
rewriter passes aggregate body lits through unchanged (no magic
|
||||
propagation for them), so the inner relation was empty in the
|
||||
magic db and the aggregate returned 0. Probe:
|
||||
`dl-eval-magic "u(a). u(b). u(c). u(d). banned(b). banned(d).
|
||||
active(X) :- u(X), not(banned(X)).
|
||||
n(N) :- count(N, X, active(X))." "?- n(N)."`
|
||||
returned `N=0` instead of `N=2`. Fix: `dl-magic-query` now
|
||||
pre-saturates the source db before copying facts into the magic
|
||||
db. This guarantees equivalence with `dl-query` for every
|
||||
stratified program; the magic benefit comes from goal-directed
|
||||
re-derivation of the query relation under the seed (which still
|
||||
matters for large recursive joins). The existing test suite's
|
||||
aggregate cases happened to dodge this because the inner goals
|
||||
were all EDB. 1 new regression test; 274/274.
|
||||
|
||||
- 2026-05-11 — Anonymous `_` in a negated literal was incorrectly
|
||||
flagged by the safety check. The canonical idiom
|
||||
`orphan(X) :- person(X), not(parent(X, _))` was rejected with
|
||||
"negation refers to unbound variable(s) (\"_anon1\")" because the
|
||||
parser renames each `_` to a fresh `_anon*` symbol and the negation
|
||||
safety walk demanded all vars in the negated lit be bound by an
|
||||
earlier positive body literal. Anonymous vars in negation are
|
||||
existentially quantified — they shouldn't need outer binding.
|
||||
Added `dl-non-anon-vars` filter; `dl-process-neg!` now strips
|
||||
`_anon*` names from `needed` before the binding check. 2 new
|
||||
regression tests; 273/273.
|
||||
|
||||
- 2026-05-11 — Compound terms in fact-arg / rule-head positions were
|
||||
silently stored as unreduced expressions. `p(+(1, 2)).` resulted
|
||||
in a tuple `(p (+ 1 2))` (dl-ground? sees no free variables, so it
|
||||
passed). `double(*(X, 2)) :- n(X).` saturated to `double((* 3 2))`
|
||||
rather than `double(6)`. Datalog has no function symbols in arg
|
||||
positions — `dl-add-fact!` and `dl-add-rule!` now reject compound
|
||||
args via a new `dl-simple-term?` (number / string / symbol).
|
||||
Compounds remain legal in body literals where they encode `is` /
|
||||
arithmetic / aggregate sub-goals. 2 new regression tests; 271/271.
|
||||
|
||||
- 2026-05-11 — Quoted atoms with uppercase-or-underscore-leading
|
||||
names were misclassified as variables. `p('Hello World').` ran
|
||||
through the tokenizer's `"atom"` branch and through the parser's
|
||||
`string->symbol`, producing a symbol named "Hello World". dl-var?
|
||||
checks the first character — "H" is uppercase, so the fact was
|
||||
rejected as non-ground. Fix: tokenizer emits `"string"` for any
|
||||
`'...'` quoted form, so quoted atoms become opaque string constants
|
||||
(matching how Datalog idiomatically treats them — the alternative
|
||||
was a per-symbol "quoted" marker which would have rippled through
|
||||
unification and dl-var?). Updated the existing tokenize test and
|
||||
added one for `'Hello'`; also added a parse-level regression. 269/269.
|
||||
|
||||
- 2026-05-11 — Type-mixed comparisons were silently inconsistent:
|
||||
`<(X, 5)` with `X` bound to a string returned `()` (no result, no
|
||||
error), while `X` bound to a symbol raised "Expected number, got
|
||||
symbol". Both should fail loudly. Added `dl-compare-typeok?` —
|
||||
`<`, `<=`, `>`, `>=` now require both operands to share a primitive
|
||||
type (both numbers or both strings) and raise otherwise. `!=` is
|
||||
exempted since it's a polymorphic inequality test built on
|
||||
`dl-tuple-equal?`. 2 new regression tests; 267/267.
|
||||
|
||||
- 2026-05-11 — Body literal shape validation in
|
||||
`dl-rule-check-safety`: a dict that isn't `{:neg ...}` (e.g. typo'd
|
||||
`{:negs ...}`) used to silently fall through every dispatch clause,
|
||||
contributing zero bound vars; the user would then see a confusing
|
||||
"head var X unbound" error pointing at the head, not the malformed
|
||||
body. Same for body lits that are bare numbers / strings / symbols.
|
||||
Both shapes now raise a clear error naming the offending lit. 1 new
|
||||
regression test; 265/265.
|
||||
|
||||
- 2026-05-11 — Division by zero in `is` silently produced IEEE
|
||||
infinity instead of raising. `is(R, /(X, 0))` returned `R = inf`,
|
||||
which then flowed through comparisons and aggregations to produce
|
||||
nonsense results. `dl-eval-arith` now raises with a clear
|
||||
"division by zero in <expr>" message. 1 new test; 264/264.
|
||||
|
||||
- 2026-05-11 — Aggregate variable validation: `count(N, Y, p(X))`
|
||||
silently returned `N = 1` because `Y` was never bound in `p(X)` —
|
||||
every match contributed the same unbound symbol, which dl-val-member?
|
||||
deduped to a single entry. Similarly `sum(S, Y, p(X))` raised a
|
||||
confusing "expected number" error from the underlying `+`. Added
|
||||
a third validator in `dl-eval-aggregate`: the agg-var must appear
|
||||
in the goal literal. Error names the variable and the goal and
|
||||
explains the consequence. 1 new test; 263/263.
|
||||
|
||||
- 2026-05-11 — `dl-retract!` was silently destroying EDB facts in
|
||||
"mixed" relations (those with BOTH user-asserted facts AND a rule
|
||||
defining the same head). The retract pass wiped every rule-head
|
||||
relation wholesale and then re-saturated — but the saturator only
|
||||
re-derives the IDB portion, so explicit EDB facts vanished even
|
||||
for a no-op retract of a non-existent tuple. Probe:
|
||||
`(let ((db (dl-program "p(a). p(b). p(X) :- q(X). q(c).")))
|
||||
(dl-retract! db (quote (p z))) (dl-query db (quote (p X))))`
|
||||
went from `{a,b,c}` to just `{c}`.
|
||||
Fix: tracked `:edb-keys` provenance in the db. `dl-add-fact!` (public
|
||||
API) marks the tuple as EDB; saturator calls new internal
|
||||
`dl-add-derived!` which doesn't mark it. `dl-retract!` now strips
|
||||
only the IDB-derived portion of rule-head relations and preserves
|
||||
EDB-marked tuples through the re-saturate pass. 2 new regression
|
||||
tests; 262/262.
|
||||
|
||||
- 2026-05-11 — Eval-semantics bug-hunt: nested `not(not(P))` was
|
||||
silently misinterpreted. Outer-level `not(...)` is parsed as
|
||||
negation, but the inner `not(banned(X))` was parsed as a regular
|
||||
positive literal naming a relation called `not`. Since no `not`
|
||||
relation existed, the inner match was empty and the outer
|
||||
negation succeeded vacuously, making `vip(X) :- u(X), not(not(banned(X))).`
|
||||
equivalent to `vip(X) :- u(X).` (a silent double-negation = identity
|
||||
fallacy). Fix in `dl-rule-check-safety`: both the positive-literal
|
||||
branch and `dl-process-neg!` now flag any body literal whose head
|
||||
is in `dl-reserved-rel-names`. Error message names the relation and
|
||||
points the user at intermediate-relation stratified negation. 1 new
|
||||
regression test; 260/260.
|
||||
|
||||
- 2026-05-10 — Bug-hunt round on parser/safety surfaced 7 real
|
||||
bugs, each fixed with regression tests:
|
||||
- Reserved relation names (`not`, `count`, `<`, `is`, ...) were
|
||||
accepted as rule/fact heads — would silently shadow built-ins.
|
||||
- Negative number literals (`n(-1).`) failed to parse — users
|
||||
had to express them as `(- 0 1)` or via `is`.
|
||||
- Unterminated block comment `/* ...` silently consumed the
|
||||
rest of the input. Now raises with the position.
|
||||
- Same silent-consume bug in unterminated string / quoted-atom.
|
||||
- Empty-list rule head and non-list rule body weren't validated;
|
||||
they'd crash later in `rest`. dl-add-rule! now checks shape.
|
||||
- dl-magic-query with non-list / non-dict goal crashed cryptically.
|
||||
- Tokenizer silently swallowed unrecognised characters (`?`, `!`,
|
||||
`#`, `@`, etc.) — typos produced confusing downstream errors.
|
||||
|
||||
- 2026-05-08 — Phase 6 driver: `dl-magic-query db query-goal`.
|
||||
Builds a fresh internal db from the caller's EDB + magic seed +
|
||||
rewritten rules, saturates, queries, returns substitutions —
|
||||
caller's db is untouched. Equivalent to `dl-query` for any
|
||||
fully-stratifiable program; sole motivation is a perf alternative
|
||||
on goal-shaped queries against large recursive relations.
|
||||
2 new tests cover equivalence and non-mutation.
|
||||
|
||||
- 2026-05-08 — Phase 6 magic-sets rewriter. `dl-magic-rewrite rules
|
||||
query-rel adn args` returns `{:rules <rewritten> :seed <seed-fact>}`.
|
||||
Worklist over `(rel, adn)` pairs starts from the query, gates each
|
||||
original rule with a `magic_<rel>^<adn>(bound)` filter, and emits
|
||||
propagation rules for each positive non-builtin body literal so
|
||||
that magic spreads to body relations. EDB facts pass through.
|
||||
3 new tests cover seed structure, equivalence on chain-3 by
|
||||
ancestor-relation tuple count, and same-query-answers under
|
||||
the rewritten program. The plumbing for a `dl-saturate-magic!`
|
||||
driver and large-graph perf benchmarks is still future work.
|
||||
|
||||
- 2026-05-08 — Phase 6 building blocks for the magic-sets
|
||||
transformation: `dl-magic-rel-name`, `dl-magic-lit`,
|
||||
`dl-bound-args`. The rewriter that generates magic seed and
|
||||
propagation rules is still future work; with these primitives
|
||||
in place it's a straightforward worklist algorithm. 4 new tests.
|
||||
|
||||
- 2026-05-08 — Phase 6 adornments + SIPS in
|
||||
`lib/datalog/magic.sx`. Inspection helpers — `dl-adorn-goal` and
|
||||
`dl-adorn-lit` compute per-arg `b`/`f` patterns under a bound
|
||||
set; `dl-rule-sips rule head-adornment` walks body literals
|
||||
left-to-right propagating the bound set, recognising `is` and
|
||||
aggregate result-vars as new binders. Lays groundwork for a
|
||||
later magic-sets transformation. 10 new tests cover pure
|
||||
adornment, SIPS over a chain rule, head-fully-bound rules,
|
||||
comparisons, and `is`. Saturator does not yet consume these.
|
||||
|
||||
- 2026-05-08 — Comprehensive integration test in api suite: a
|
||||
single program exercising recursion (`reach` transitive closure)
|
||||
+ stratified negation (`safe X Y :- reach X Y, not banned Y`) +
|
||||
aggregation (`reach_count` via count) + comparison (`>= N 2`)
|
||||
composed end-to-end via `dl-eval source query-source`. Confirms
|
||||
the full pipeline (parser → safety → stratifier → semi-naive +
|
||||
aggregate post-pass → query) on a non-trivial program.
|
||||
|
||||
- 2026-05-08 — Bug fix: aggregates work as top-level query goals.
|
||||
`dl-match-lit` (the naive matcher used by `dl-find-bindings`) was
|
||||
missing the `dl-aggregate?` dispatch — it was only present in
|
||||
`dl-fbs-aux` (semi-naive). Symptom: `(dl-query db '(count N X (p X)))`
|
||||
silently returned `()`. Also updated `dl-query-user-vars` to project
|
||||
only the result var (first arg) of an aggregate goal — the
|
||||
aggregated var and inner-goal vars are existentials and should not
|
||||
appear in the projected substitution. 2 new aggregate tests cover
|
||||
the regression.
|
||||
|
||||
- 2026-05-08 — Convenience: `dl-eval source query-source`. Parses
|
||||
both strings, builds a db, saturates, runs the query, returns
|
||||
the substitution list. Single-call user-friendly entry. 2 new
|
||||
api tests cover ancestor and multi-goal queries.
|
||||
|
||||
- 2026-05-08 — Phase 6 stub: `dl-set-strategy! db strategy` and
|
||||
`dl-get-strategy db` user-facing hooks. Default `:semi-naive`;
|
||||
`:magic` is accepted but the actual transformation is deferred,
|
||||
so saturation still uses semi-naive. Lets us tick the
|
||||
"Optional pass — guarded behind dl-set-strategy!" Phase 6 box.
|
||||
3 new eval tests.
|
||||
|
||||
- 2026-05-08 — Demo: weighted-DAG shortest path. `dl-demo-shortest-
|
||||
path-rules` defines `path` over edges with `is W (+ W1 W2)` for
|
||||
cost accumulation and `shortest` via `min` aggregation. 3 demo
|
||||
tests cover direct/multi-hop choice, multi-hop wins on cheaper
|
||||
route, and unreachable-empty. Added `dl-summary db` inspection
|
||||
helper returning `{<rel>: count}` (4 eval tests).
|
||||
|
||||
- 2026-05-08 — Phase 5e perf: first-arg index per relation. db gains
|
||||
`:facts-index {<rel>: {<first-arg-key>: tuples}}` mirroring the
|
||||
existing `:facts-keys` membership index. `dl-add-fact!` populates
|
||||
it; `dl-match-positive` walks the body literal's first arg under
|
||||
the current subst — if it's bound to a non-var, look up by
|
||||
`(str arg)` and iterate only the matching subset. chain-25
|
||||
saturation 33s → 18s real (~2x). chain-50 still slow (~120s+)
|
||||
but tractable; next bottleneck is subst dict copies during
|
||||
unification. Differential test bumped to chain-12, semi-only
|
||||
count to chain-25.
|
||||
|
||||
- 2026-05-08 — Demo: tag co-occurrence. `(cotagged P T1 T2)` — post
|
||||
has both T1 and T2 with T1 != T2 — and `(tag-pair-count T1 T2 N)`
|
||||
counting posts per distinct tag pair. Demonstrates count
|
||||
aggregation grouped by outer-context vars. 2 new demo tests.
|
||||
|
||||
- 2026-05-08 — `dl-query` accepts a list of body literals for
|
||||
conjunctive queries, in addition to a single positive literal.
|
||||
`dl-query-coerce` dispatches based on the first element's shape:
|
||||
positive lit (head is a symbol) or `:neg` dict → wrap as singleton;
|
||||
list of lits → use as-is. `dl-query-user-vars` collects the union
|
||||
of vars across all goals (deduped, `_` filtered) for projection.
|
||||
2 new api tests: multi-goal AND, and conjunction with comparison.
|
||||
|
||||
- 2026-05-08 — Bug fix: `dl-check-stratifiable` now rejects recursion
|
||||
through aggregation (e.g., `q(N) :- count(N, X, q(X))`). The
|
||||
stratifier was already adding negation-like edges for aggregates,
|
||||
but the cycle scan only looked at explicit `:neg` literals. Added
|
||||
the matching aggregate branch to the body iteration. Also adds
|
||||
doc-only `lib/datalog/datalog.sx` with the public-API surface
|
||||
(since `load` is an epoch command and can't recurse from within an
|
||||
`.sx` file). 3 new aggregate tests cover recursion-rejection,
|
||||
negation-and-aggregation coexistence, and min-over-empty-derived.
|
||||
|
||||
- 2026-05-08 — Phase 10 demo + canonical query. Added the "cooking
|
||||
posts by people I follow (transitively)" example from the plan:
|
||||
`dl-demo-cooking-rules` defines `reach` over the follow graph
|
||||
(recursive transitive closure) and `cooking-post-by-network` that
|
||||
joins reach with `authored` and `(tagged P cooking)`. 3 demo
|
||||
tests cover transitive network, direct-only follow, and
|
||||
empty-network cases.
|
||||
|
||||
- 2026-05-08 — Phase 8 extension: `findall L V Goal` aggregate. Bind
|
||||
L to the list of distinct V values for which Goal holds (or the
|
||||
empty list when no matches). Implemented as a one-line case in
|
||||
`dl-do-aggregate`. 3 new tests: EDB, derived relation, empty.
|
||||
Useful for "give me all the X such that …" queries without
|
||||
scalar reduction.
|
||||
|
||||
- 2026-05-08 — Phase 5d semantic fix: anonymous `_` variables are
|
||||
renamed per occurrence at `dl-add-rule!` and `dl-query` time so
|
||||
`(p X _) (p _ Y)` no longer unifies the two `_`s. New helpers
|
||||
`dl-rename-anon-term`, `dl-rename-anon-lit`, `dl-make-anon-renamer`,
|
||||
`dl-rename-anon-rule` in db.sx; eval.sx's dl-query renames the goal
|
||||
before search and projects only user-named vars (`_` is filtered
|
||||
out of the projection list). The "underscore in head" test now
|
||||
correctly rejects `(p X _) :- q(X).` — after renaming, the head's
|
||||
fresh anon var has no body binder. Two new eval tests verify
|
||||
rule-level and goal-level independence. 155/155 expected.
|
||||
|
||||
- 2026-05-08 — Phase 5c perf: indexed `dl-find-bindings`. Replaced
|
||||
the recursive `(rest lits)` walk with `dl-fb-aux lits db subst i n`
|
||||
using `nth lits i`. Eliminates O(N²) list-copy per body of length
|
||||
N. chain-15 saturation 25s → 16s; chain-25 finishes in 33s real
|
||||
(vs. timeout previously). Bumped semi_naive tests: differential
|
||||
on chain-10, semi-only count on chain-15 (was chain-5/chain-5).
|
||||
153/153.
|
||||
|
||||
- 2026-05-08 — Phase 10 syntactic demo. New `lib/datalog/demo.sx`
|
||||
with three programs over rose-ash-shaped data: federation
|
||||
(`mutual`, `reachable`, `foaf`), content recommendation
|
||||
(`post-likes` via count aggregation, `popular`, `interesting`),
|
||||
and role-based permissions (`in-group` over transitive subgroups,
|
||||
`can-access`). 10 demo tests pass against synthetic EDB tuples.
|
||||
Postgres loader and `/internal/datalog` HTTP endpoint remain
|
||||
out of scope for this loop (they need service-tree edits beyond
|
||||
`lib/datalog/**`). Conformance now 153/153.
|
||||
|
||||
- 2026-05-08 — Phase 5b perf: hash-set membership in `dl-add-fact!`.
|
||||
db gains a parallel `:facts-keys {<rel>: {<tuple-string>: true}}`
|
||||
index alongside `:facts`. `dl-tuple-key` derives a stable string
|
||||
key via `(str lit)` — `(p 30)` and `(p 30.0)` collide correctly
|
||||
because SX prints them identically. Insertion is O(1) instead of
|
||||
O(n). chain-7 saturation drops from ~12s to ~6s; chain-15 from
|
||||
~50s to ~25s under shared CPU. Larger chains are still slow due
|
||||
to body-join overhead in dl-find-bindings (Blocker updated).
|
||||
`dl-retract!` updated to keep both indices consistent. 143/143.
|
||||
|
||||
- 2026-05-08 — Phase 9 done. New `lib/datalog/api.sx` exposes a
|
||||
parser-free embedding: `dl-program-data facts rules` accepts SX
|
||||
data lists, with rules in either dict form or list form using
|
||||
`<-` as the rule arrow (since SX parses `:-` as a keyword).
|
||||
`dl-rule head body` constructs the dict. `dl-assert! db lit` adds
|
||||
a fact and re-saturates; `dl-retract! db lit` drops the fact from
|
||||
EDB, wipes all rule-headed (IDB) relations, and re-saturates from
|
||||
scratch — the simplest correct semantics until provenance tracking
|
||||
arrives in a later phase. 9 API tests; conformance now 143/143.
|
||||
|
||||
- 2026-05-08 — Phase 8 done. New `lib/datalog/aggregates.sx` (~110
|
||||
LOC): count / sum / min / max. Each is a body literal of shape
|
||||
`(op R V Goal)` — `dl-eval-aggregate` runs `dl-find-bindings` on
|
||||
the goal under the outer subst (so outer vars in the goal get
|
||||
substituted, giving group-by-style aggregation), collects the
|
||||
distinct values of `V`, and binds `R`. Empty input: count/sum
|
||||
return 0; min/max produce no binding (rule fails). Stratifier
|
||||
extended via `dl-aggregate-dep-edge` so the aggregate's goal
|
||||
relation is fully derived before the aggregate fires. Safety check
|
||||
treats goal-internal vars as existentials (no outer binding
|
||||
required); only the result var becomes bound. Conformance now
|
||||
134 / 134.
|
||||
|
||||
- 2026-05-08 — Phase 7 done (Phase 6 magic sets deferred — opt-in,
|
||||
semi-naive default suffices for current test suite). New
|
||||
`lib/datalog/strata.sx` (~290 LOC): dep graph build, Floyd-Warshall
|
||||
reachability, SCC-via-mutual-reachability for non-stratifiability
|
||||
detection, iterative stratum computation, rule grouping by head
|
||||
stratum. eval.sx split: `dl-saturate-rules!` is the per-rule-set
|
||||
semi-naive worker, `dl-saturate!` is now the stratified driver
|
||||
(errors out on non-stratifiable programs). `dl-match-negation` in
|
||||
eval.sx: succeeds iff inner positive match is empty. Stratum-keyed
|
||||
dicts use `(str s)` since SX dicts only accept string/keyword keys.
|
||||
10 negation tests cover EDB/IDB negation, multi-level strata,
|
||||
non-stratifiability rejection, and a negation safety violation.
|
||||
|
||||
- 2026-05-08 — Phase 5 done. `lib/datalog/eval.sx` rewritten to
|
||||
semi-naive default. `dl-saturate!` tracks a per-relation delta and
|
||||
on each iteration walks every positive body position substituting
|
||||
delta for that one literal — joining the rest against the full DB
|
||||
snapshot. `dl-saturate-naive!` retained as the reference. Rules
|
||||
with no positive body literal (e.g. `(p X) :- (= X 5).`) fall back
|
||||
to a naive one-shot via `dl-collect-rule-candidates`. 8 tests
|
||||
differentially compare the two saturators using per-relation tuple
|
||||
counts (cheap). Chain-5 differential exercises multi-iteration
|
||||
recursive saturation. Larger chains made conformance.sh time out
|
||||
due to O(n) `dl-tuple-member?` × CPU sharing with other loop
|
||||
agents — added a Blocker to swap to a hash-set for membership.
|
||||
Also tightened `dl-tuple-member?` to use indexed iteration instead
|
||||
of recursive `rest` (was creating a fresh list per step).
|
||||
|
||||
- 2026-05-07 — Phase 4 done. `lib/datalog/builtins.sx` (~280 LOC) adds
|
||||
`(< X Y)`, `(<= X Y)`, `(> X Y)`, `(>= X Y)`, `(= X Y)`, `(!= X Y)`,
|
||||
and `(is X expr)` with `+ - * /`. `dl-eval-builtin` dispatches;
|
||||
`dl-eval-arith` recursively evaluates nested compounds. Safety
|
||||
check is now order-aware — it walks body literals left-to-right
|
||||
tracking the bound set, requires comparison/`is` inputs to be
|
||||
already bound, and special-cases `=` (binds the var-side; both
|
||||
sides must include at least one bound to bind the other). Phase 3's
|
||||
simple safety check stays in db.sx as a forward-reference fallback;
|
||||
builtins.sx redefines `dl-rule-check-safety` to the comprehensive
|
||||
version. eval.sx's `dl-match-lit` now dispatches built-ins through
|
||||
`dl-eval-builtin`. 19 builtins tests; conformance 106 / 106.
|
||||
|
||||
- 2026-05-07 — Phase 3 done. `lib/datalog/db.sx` (~250 LOC) holds facts
|
||||
indexed by relation name plus the rules list, with `dl-add-fact!` /
|
||||
`dl-add-rule!` (rejects non-ground facts and unsafe rules);
|
||||
`lib/datalog/eval.sx` (~150 LOC) implements the naive bottom-up
|
||||
fixpoint via `dl-find-bindings`/`dl-match-positive`/`dl-saturate!`
|
||||
and `dl-query` (deduped projected substitutions). Safety analysis
|
||||
rejects unsafe head vars at load time. Negation and arithmetic
|
||||
built-ins raise clean errors (lifted in later phases). 15 eval
|
||||
tests cover transitive closure, sibling, same-generation, cyclic
|
||||
graph reach, and six safety violations. Conformance 87 / 87.
|
||||
|
||||
- 2026-05-07 — Phase 2 done. `lib/datalog/unify.sx` (~140 LOC):
|
||||
`dl-var?` (case + underscore), `dl-walk`, `dl-bind`, `dl-unify` (returns
|
||||
extended dict subst or `nil`), `dl-apply-subst`, `dl-ground?`, `dl-vars-of`.
|
||||
Substitutions are immutable dicts; `assoc` builds extended copies. 28
|
||||
unify tests; conformance now 72 / 72.
|
||||
|
||||
- 2026-05-07 — Phase 1 done. `lib/datalog/tokenizer.sx` (~190 LOC) emits
|
||||
`{:type :value :pos}` tokens; `lib/datalog/parser.sx` (~150 LOC) produces
|
||||
`{:head … :body …}` / `{:query …}` clauses, with nested compounds
|
||||
permitted for arithmetic and `not(...)` desugared to `{:neg …}`. 44 / 44
|
||||
via `bash lib/datalog/conformance.sh` (26 tokenize + 18 parse). Local
|
||||
helpers namespace-prefixed (`dl-emit!`, `dl-peek`) after a host-primitive
|
||||
shadow clash. Test harness uses a custom `dl-deep-equal?` that handles
|
||||
out-of-order dict keys and number repr (`equal?` fails on dict key order
|
||||
and on `30` vs `30.0`).
|
||||
|
||||
Reference in New Issue
Block a user