# Datalog-on-SX: Datalog on the CEK/VM

Datalog is a declarative query language: a restricted subset of Prolog with no function
symbols, only relations. Programs are sets of facts and rules; queries ask what follows.
Evaluation is bottom-up (fixpoint iteration) rather than Prolog's top-down DFS — which
means no infinite loops, guaranteed termination, and efficient incremental updates.

The unique angle: Datalog is a natural companion to the Prolog implementation already in
progress (`lib/prolog/`). The parser and term representation can share infrastructure;
the evaluator is an entirely different fixpoint engine rather than a DFS solver.

End-state goal: **full core Datalog** (facts, rules, stratified negation, aggregation,
recursion) with a clean SX query API, and a demonstration of Datalog as a query engine
for rose-ash data (e.g. federation graph, content relationships).

## Status (rolling)

`bash lib/datalog/conformance.sh` → **224/224 across 11 suites**
(tokenize, parse, unify, eval, builtins, semi_naive, negation, aggregates,
api, magic, demo). Source is ~3100 LOC, tests ~2900 LOC, public API
documented in `lib/datalog/datalog.sx`.

Phases 1–9 are functionally complete; Phase 10 covers the rose-ash
domain demos (in `lib/datalog/demo.sx` — federation, content,
permissions, cooking-posts, tag co-occurrence, shortest path, org chart).
The PostgreSQL loader and `/internal/datalog` HTTP endpoint listed in
Phase 10 require service-tree edits outside `lib/datalog/**` and are
flagged as out-of-scope for this loop.

## Ground rules

- **Scope:** only touch `lib/datalog/**` and `plans/datalog-on-sx.md`. Do **not** edit
  `spec/`, `hosts/`, `shared/`, `lib/prolog/**`, or other `lib/<lang>/`.
- **Shared-file issues** go under "Blockers" below with a minimal repro; do not fix here.
- **SX files:** use `sx-tree` MCP tools only.
- **Architecture:** Datalog source → term AST → fixpoint evaluator. No transpiler to SX AST —
  the evaluator is written in SX and works directly on term structures.
- **Reference:** Ramakrishnan & Ullman "A Survey of Deductive Database Systems";
  Dalmau "Datalog and Constraint Satisfaction".
- **Commits:** one feature per commit. Keep `## Progress log` updated and tick boxes.

## Architecture sketch

```
Datalog source text
    │
    ▼
lib/datalog/tokenizer.sx  — atoms, variables, numbers, strings, punct (?- :- , . ( ) [ ])
    │
    ▼
lib/datalog/parser.sx     — facts: atom(args). rules: head :- body. queries: ?- goal.
    │                       No function symbols (only constants and variables in args).
    ▼
lib/datalog/db.sx         — extensional DB (EDB): ground facts; IDB: derived relations;
    │                       clause index by relation name/arity
    ▼
lib/datalog/eval.sx       — bottom-up fixpoint: semi-naive evaluation with delta sets;
    │                       stratification for negation; incremental update API
    ▼
lib/datalog/query.sx      — query API: (datalog-query db goal) → list of substitutions;
                            SX embedding: define facts/rules as SX data directly
```

Key differences from Prolog:

- **No function symbols** — args are atoms, numbers, strings, or variables only. No `f(a,b)`.
- **No cuts** — no procedural control.
- **Bottom-up** — derive all consequences of all rules before answering; no search tree.
- **Termination guaranteed** — no infinite derivation chains (no function symbols → finite Herbrand base).
- **Stratified negation** — `not(P)` legal iff P does not recursively depend on its own negation.
- **Aggregation** — `count`, `sum`, `min`, `max` over derived tuples (Datalog+).

## Roadmap

### Phase 1 — tokenizer + parser

- [x] Tokenizer: atoms (lowercase/quoted), variables (uppercase/`_`), numbers, strings,
  punct (`( )`, `,`, `.`), operators (`:-`, `?-`, `<=`, `>=`, `!=`, `<`, `>`, `=`,
  `+`, `-`, `*`, `/`), comments (`%`, `/* */`).
  Note: no function symbol syntax (no nested `f(...)` in arg position) — but the
  parser permits nested compounds for arithmetic; safety analysis (Phase 3) rejects
  non-arithmetic nesting.
- [x] Parser:
  - Facts: `parent(tom, bob).` → `{:head (parent tom bob) :body ()}`
  - Rules: `ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).`
    → `{:head (ancestor X Z) :body ((parent X Y) (ancestor Y Z))}`
  - Queries: `?- ancestor(tom, X).` → `{:query ((ancestor tom X))}`
    (`:query` value is always a list of literals; `?- p, q.` → `{:query ((p) (q))}`)
  - Negation: `not(parent(X,Y))` in body position → `{:neg (parent X Y)}`
- [x] Tests in `lib/datalog/tests/parse.sx` (18) and `lib/datalog/tests/tokenize.sx` (26).
  Conformance harness: `bash lib/datalog/conformance.sh` → 44 / 44 passing.

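The token classes listed above can be illustrated with a small regex-driven tokenizer. This is a Python sketch, not the SX code in `lib/datalog/tokenizer.sx`. The regexes, the token-kind names and the `tokenize` function are assumptions chosen to match the listed classes. The `{type, value, pos}` dicts only mirror the `{:type :value :pos}` token shape recorded in the Progress log.

```python
import re

# Token kinds, tried in order. Python's regex alternation takes the first
# branch that matches, so two-character operators precede their prefixes.
TOKEN_SPEC = [
    ("ws", r"\s+"),
    ("comment", r"%[^\n]*|/\*.*?\*/"),
    ("number", r"\d+(?:\.\d+)?"),
    ("string", r'"(?:[^"\\]|\\.)*"'),
    ("var", r"[A-Z_][A-Za-z0-9_]*"),
    ("atom", r"[a-z][A-Za-z0-9_]*|'(?:[^'\\]|\\.)*'"),
    ("op", r":-|\?-|<=|>=|!=|<|>|=|\+|-|\*|/"),
    ("punct", r"[(),.]"),
]
# DOTALL lets /* ... */ comments span several lines.
MASTER = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC), re.DOTALL)


def tokenize(src):
    """Return {"type", "value", "pos"} dicts, skipping whitespace and comments."""
    tokens, pos = [], 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup not in ("ws", "comment"):
            tokens.append({"type": m.lastgroup, "value": m.group(), "pos": pos})
        pos = m.end()
    return tokens
```

A trailing `.` after a number (as in `p(5).`) is not swallowed by the number pattern, because the fractional part requires at least one digit after the dot.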
### Phase 2 — unification + substitution

- [x] Ported (not shared) from `lib/prolog/` — term walk, no occurs check.
- [x] `dl-unify t1 t2 subst` → extended subst dict, or `nil` on failure.
- [x] `dl-walk`, `dl-bind`, `dl-apply-subst`, `dl-ground?`, `dl-vars-of`.
- [x] Substitutions are immutable dicts keyed by variable name (string).
  Lists/tuples unify element-wise (used for arithmetic compounds too).
- [x] Tests in `lib/datalog/tests/unify.sx` (28). 72 / 72 conformance.

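Here is a minimal Python sketch of the walk/bind/unify trio, under a few assumptions. Terms are Python tuples. Constants are lowercase strings or numbers. Variables are strings starting with an uppercase letter or `_`, the same convention as `dl-var?`. The function names echo the SX ones but are hypothetical. As in the port, there is no occurs check.

```python
def is_var(t):
    # Variables start with an uppercase letter or "_", like dl-var?.
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def walk(t, subst):
    # Follow variable bindings until reaching a constant, tuple, or unbound var.
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def bind(var, value, subst):
    # Substitutions are immutable: return an extended copy, never mutate.
    extended = dict(subst)
    extended[var] = value
    return extended


def unify(a, b, subst):
    """Return an extended substitution, or None if a and b cannot be unified."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return bind(a, b, subst)
    if is_var(b):
        return bind(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        # Tuples (literals and arithmetic compounds) unify element-wise.
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


def apply_subst(t, subst):
    # Replace every bound variable in t by its recursively resolved value.
    t = walk(t, subst)
    if isinstance(t, tuple):
        return tuple(apply_subst(x, subst) for x in t)
    return t
```

Copying the dict on every binding keeps substitutions safe to share between branches of a join. The Blockers section notes that this same per-binding copy becomes the bottleneck on long chains.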
### Phase 3 — extensional DB + naive evaluation + safety analysis

- [x] EDB+IDB combined: `{:facts {<rel-name-string> -> (literal ...)}}` —
  relations indexed by name; tuples stored as full literals so they
  unify directly. Dedup on insert via `dl-tuple-equal?`.
- [x] `dl-add-fact! db lit` (rejects non-ground) and `dl-add-rule! db rule`
  (rejects unsafe). `dl-program source` parses + loads in one step.
- [x] Naive evaluation `dl-saturate! db`: iterate rules until no new tuples.
  `dl-find-bindings` recursively joins body literals; `dl-match-positive`
  unifies a literal against every tuple in the relation.
- [x] `dl-query db goal` → list of substitutions over `goal`'s vars,
  deduplicated. `dl-relation db name` for derived tuples.
- [x] Safety analysis at `dl-add-rule!` time: every head variable except
  `_` must appear in some positive body literal. Built-ins and negated
  literals do not satisfy safety. Helpers `dl-positive-body-vars`,
  `dl-rule-unsafe-head-vars` exposed for later phases. Phase 5d later
  renamed each `_` to a fresh variable, so a `_` in the head is now
  rejected as well.
- [x] Negation and arithmetic built-ins error cleanly at saturate time
  (Phase 4 and Phase 7 later replaced these with real semantics).
- [x] Tests in `lib/datalog/tests/eval.sx` (15): transitive closure,
  sibling, same-generation, grandparent, cyclic graph reach, six
  safety cases. 87 / 87 conformance.

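The naive loop and the Phase 3 safety rule can be sketched in a few lines of Python. The sketch makes several assumptions. Relations live in a dict from relation name to a set of ground tuples. Rules are `(head, body)` pairs whose bodies contain only positive literals. The unification helpers are compact restatements of the Phase 2 sketch. None of these names are the SX API.

```python
def is_var(t):
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def resolve(t, s):
    t = walk(t, s)
    return tuple(resolve(x, s) for x in t) if isinstance(t, tuple) else t


def vars_of(t):
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(vars_of(x) for x in t))
    return set()


def unsafe_head_vars(head, body):
    # Every head variable except "_" must occur in some (positive) body literal.
    body_vars = set().union(*(vars_of(lit) for lit in body))
    return sorted(v for v in vars_of(head) - body_vars if v != "_")


def find_bindings(body, facts, s):
    # Join body literals left to right, yielding each consistent substitution.
    if not body:
        yield s
        return
    for fact in facts.get(body[0][0], ()):
        s2 = unify(body[0], fact, s)
        if s2 is not None:
            yield from find_bindings(body[1:], facts, s2)


def saturate_naive(facts, rules):
    # Re-run every rule against the whole DB until a pass adds nothing new.
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            derived = {resolve(head, s) for s in find_bindings(body, facts, {})}
            for lit in derived:
                rel = facts.setdefault(lit[0], set())
                if lit not in rel:
                    rel.add(lit)
                    changed = True
    return facts
```

Each rule's derived tuples are fully collected before any are inserted, so a join never iterates over a set it is also growing. The `_` exemption in `unsafe_head_vars` follows the Phase 3 wording above.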
### Phase 4 — built-in predicates + body arithmetic

Almost every real query needs `<`, `=`, simple arithmetic, and string
comparisons in body position. These are not EDB lookups — they're
constraints that filter bindings.

- [x] Recognise built-in predicates in body: `(< X Y)`, `(<= X Y)`, `(> X Y)`,
  `(>= X Y)`, `(= X Y)`, `(!= X Y)` and arithmetic forms `(is Z (+ X Y))`,
  `(is Z (- X Y))`, `(is Z (* X Y))`, `(is Z (/ X Y))`. Live in
  `lib/datalog/builtins.sx`.
- [x] `dl-eval-builtin` dispatches; `dl-eval-arith` recursively evaluates
  `(+ a b)` etc. with full nesting. `=` unifies; `!=` rejects equal
  ground terms.
- [x] Order-aware safety analysis (`dl-rule-check-safety`): walks body
  left-to-right tracking which vars are bound. `is`'s RHS vars must
  be already bound; LHS becomes bound. Comparisons require both
  sides bound. `=` is special-cased: if at least one side is bound,
  the other becomes bound. Negation vars must be bound (enforced
  fully once negation lands in Phase 7).
- [x] Wired through SX numeric primitives — no separate number tower.
- [x] Tests in `lib/datalog/tests/builtins.sx` (19): range filters,
  arithmetic derivations, equality binding, eight safety violations
  and three safe-shape tests. Conformance 106 / 106.

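The order-aware check can be sketched as one left-to-right pass over the body that grows a set of bound variables. This Python version is illustrative only. The literal encodings (`("is", Z, expr)`, `("not", lit)` for negation, and plain tuples for relation literals) and the error strings are assumptions. They are not the shapes `dl-rule-check-safety` uses.

```python
COMPARISONS = {"<", "<=", ">", ">=", "!="}


def is_var(t):
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def vars_of(t):
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(vars_of(x) for x in t))
    return set()


def check_safety(head, body):
    """Return a message for the first unsafe literal, or None when the rule is safe."""
    bound = set()
    for lit in body:
        op = lit[0]
        if op in COMPARISONS:
            # Comparisons only filter: every variable must already be bound.
            missing = vars_of(lit[1:]) - bound
            if missing:
                return f"{op}: unbound {sorted(missing)}"
        elif op == "is":
            # The RHS must be bound; the LHS variable becomes bound.
            missing = vars_of(lit[2]) - bound
            if missing:
                return f"is: unbound {sorted(missing)}"
            bound |= vars_of(lit[1])
        elif op == "=":
            # If either side is fully bound, unification binds the other.
            left, right = vars_of(lit[1]), vars_of(lit[2])
            if not (left <= bound or right <= bound):
                return f"=: unbound {sorted((left | right) - bound)}"
            bound |= left | right
        elif op == "not":
            # Negation never binds; its variables must be bound earlier.
            missing = vars_of(lit[1]) - bound
            if missing:
                return f"not: unbound {sorted(missing)}"
        else:
            # A positive relation literal binds all of its variables.
            bound |= vars_of(lit[1:])
    missing = vars_of(head[1:]) - bound
    return f"head: unbound {sorted(missing)}" if missing else None
```

Because the walk is ordered, moving a comparison ahead of the literal that binds its inputs turns a safe rule into an unsafe one. This is the behaviour the eight safety-violation tests above exercise.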
### Phase 5 — semi-naive evaluation (performance)

- [x] Delta sets `{rel-name -> tuples}` track newly derived tuples per iter.
  `dl-snapshot-facts` builds the initial delta from the EDB.
- [x] Semi-naive rule: for each rule, walk every positive body literal
  position; substitute that one with the per-relation delta and join
  the rest against the previous-iteration DB (`dl-find-bindings-semi`).
  Candidates are collected before mutating the DB so the "full" sides
  see a consistent snapshot.
- [x] `dl-collect-rule-candidates` falls back to a naive single pass when
  a rule has no positive body literal (e.g. `(p X) :- (= X 5).`).
- [x] `dl-saturate!` is now semi-naive by default; `dl-saturate-naive!`
  kept for differential testing and a reference implementation.
- [x] Tests in `lib/datalog/tests/semi_naive.sx` (8) — every recursive
  program from earlier suites is run under both saturators with
  per-relation tuple counts compared (cheap, robust under bundled
  conformance session). A chain-5 differential exercises multiple
  semi-naive iterations against the recursive ancestor rule.
  Larger chains hit prohibitive wall-clock under conformance CPU
  contention with other agents, so a Blocker tracked switching
  `dl-tuple-member?` from a linear list scan per insert (quadratic
  overall) to a hash-set per relation. That landed as Phase 5b, and
  the differential was later raised to chain-12 (see Progress log).

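The delta-substitution step reads more easily as code. Below is a Python sketch of the textbook semi-naive loop, using the same assumptions as the naive sketch: a dict from relation name to a set of tuples, and positive bodies only. It omits the no-positive-literal fallback and the stratification wrapper, and its names are hypothetical. In each round, every rule runs once per body position. That position reads only the previous round's delta while the other positions read the full database, and all candidates are collected before the database is extended.

```python
def is_var(t):
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def resolve(t, s):
    t = walk(t, s)
    return tuple(resolve(x, s) for x in t) if isinstance(t, tuple) else t


def match(body, sources, s):
    # sources[i] is the fact store (full DB or delta) that body literal i reads.
    if not body:
        yield s
        return
    for fact in sources[0].get(body[0][0], ()):
        s2 = unify(body[0], fact, s)
        if s2 is not None:
            yield from match(body[1:], sources[1:], s2)


def saturate_semi_naive(edb, rules):
    full = {rel: set(ts) for rel, ts in edb.items()}
    delta = {rel: set(ts) for rel, ts in edb.items()}  # initial delta = EDB snapshot
    while any(delta.values()):
        candidates = set()
        for head, body in rules:
            for i in range(len(body)):
                sources = [delta if j == i else full for j in range(len(body))]
                candidates |= {resolve(head, s) for s in match(body, sources, {})}
        # Only genuinely new tuples feed the next round.
        new_delta = {}
        for lit in candidates:
            if lit not in full.get(lit[0], set()):
                new_delta.setdefault(lit[0], set()).add(lit)
        for rel, ts in new_delta.items():
            full.setdefault(rel, set()).update(ts)
        delta = new_delta
    return full
```

The loop is complete because any derivation that is new in round k must use at least one tuple first derived in round k-1. A parent chain of six nodes yields 15 `ancestor` tuples, one for each ordered pair along the chain. A two-node cycle terminates with all four `reach` pairs.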
### Phase 6 — magic sets (goal-directed bottom-up, opt-in)

Naive bottom-up derives **all** consequences before answering. Magic sets
rewrite the program so the fixpoint only derives tuples relevant to the
goal — a major perf win for "what's reachable from node X" queries on
large graphs.

- [x] Adornments: `dl-adorn-goal goal` and `dl-adorn-lit lit bound` in
  `lib/datalog/magic.sx`. Per-arg `b`/`f` based on whether the arg
  is a constant or a variable already in the bound set.
- [x] Magic transformation: `dl-magic-rewrite rules query-rel adn args`
  generates `{:rules <rewritten-rules> :seed <magic-seed>}`. Each
  original rule is gated with a `magic_<rel>^<adn>(bound)` filter,
  and propagation rules are emitted for each positive non-builtin
  body literal. Worklist over `(rel, adn)` pairs starts from the
  query and stops when no new pairs appear. EDB facts pass through
  unchanged.
- [x] Sideways information passing strategy (SIPS): left-to-right
  `dl-rule-sips rule head-adornment` walks body literals tracking
  the bound set, returning `({:lit :adornment} ...)`. Recognises
  `is`/aggregate result-vars as new binders; comparisons and
  negation pass through with computed adornments. (Pluggable
  strategies are future work.)
- [x] `dl-set-strategy! db strategy` hook + `dl-get-strategy db`. Default
  `:semi-naive`. `:magic` is accepted, but the saturator does not yet
  apply the rewrite and falls back to semi-naive. Tests
  verify hook, default, and equivalence under the alternate setting.
- [x] Equivalence test: rewritten ancestor program over the same EDB
  derives the same number of `ancestor` tuples and returns the
  same query answers as the unrewritten program (chain-3 case).
- [x] `dl-magic-query db query-goal` — top-level driver. Builds a
  fresh internal db with the caller's EDB facts, the magic seed,
  and the rewritten rules; saturates and queries. Caller's db is
  untouched. Equivalent to `dl-query` for fully-stratifiable
  programs (sole motivation is a perf alternative on goal-shaped
  queries against large recursive relations).
- [ ] Perf test: 10k-node reachability with magic vs semi-naive.
  Left to a future iteration — would need a benchmarking harness
  for large graphs and the conformance budget can't afford it.

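Adornments and left-to-right SIPS are small enough to sketch directly. This Python illustration uses hypothetical names and handles positive literals only, so the `is` and aggregate binder cases are left out. An argument is `b` when it is a constant or an already-bound variable. Each body literal's variables become bound for the literals after it. `magic_lit` builds the `magic_<rel>^<adn>` filter literal from the bound arguments, following the naming given above.

```python
def is_var(t):
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def adorn_lit(lit, bound):
    # "b" for constants and already-bound variables, "f" otherwise.
    return "".join("b" if not is_var(a) or a in bound else "f" for a in lit[1:])


def magic_lit(lit, adornment):
    # Keep only the bound arguments: magic_<rel>^<adn>(bound...).
    bound_args = tuple(a for a, flag in zip(lit[1:], adornment) if flag == "b")
    return (f"magic_{lit[0]}^{adornment}",) + bound_args


def rule_sips(head, body, head_adornment):
    """Adorn each body literal left to right, starting from the head's bound args."""
    bound = {a for a, flag in zip(head[1:], head_adornment) if flag == "b" and is_var(a)}
    plan = []
    for lit in body:
        plan.append({"lit": lit, "adornment": adorn_lit(lit, bound)})
        bound |= {a for a in lit[1:] if is_var(a)}  # a positive literal binds its vars
    return plan
```

Take the query `ancestor(tom, X)` (adornment `bf`). The recursive `ancestor(Y, Z)` literal in the chain rule is adorned `bf` again, so for this program the worklist of `(rel, adn)` pairs closes after a single `ancestor` pair.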
### Phase 7 — stratified negation

- [x] Dependency graph: `dl-build-dep-graph db` returns
  `{head -> ({:rel :neg} ...)}`. Built-ins drop out (they're not relations).
- [x] Reachability via Floyd-Warshall in `dl-build-reach`; cycles
  detected by `reach[A][B] && reach[B][A]`. Programs are
  non-stratifiable iff any negative dependency falls inside an SCC.
  `dl-check-stratifiable` returns nil on success or a clear message.
- [x] `dl-compute-strata` propagates stratum numbers iteratively:
  `stratum(R) = max over deps of (stratum(dep) + (1 if negated else 0))`.
- [x] Saturator refactor: `dl-saturate-rules! db rules` is the semi-naive
  worker; `dl-saturate! db` rejects non-stratifiable programs,
  groups rules by head's stratum, and runs the worker on each
  stratum in increasing order.
- [x] `not(P)` in body: `dl-match-negation` walks the inner literal
  under the current subst and uses `dl-match-positive` — succeeds
  iff zero matches. Order-aware safety in `dl-rule-check-safety`
  (already present from Phase 4) requires negation vars to be
  bound by an earlier positive literal.
- [x] Tests in `lib/datalog/tests/negation.sx` (10): EDB and IDB
  negation, two-step strata, multi-level strata, with-arithmetic,
  empty-result and always-fail cases, non-stratifiability
  rejection, and a negation safety violation.

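The stratifiability check and the stratum recurrence can be sketched as follows. This Python version assumes the dependency graph is a dict from head relation to `(dependency, negated)` pairs, roughly mirroring the `{head -> ({:rel :neg} ...)}` shape. It computes Warshall's transitive closure (the Floyd-Warshall loop order) and then iterates the `max` recurrence to a fixpoint. The function names are hypothetical.

```python
def nodes_of(graph):
    return set(graph) | {dep for deps in graph.values() for dep, _ in deps}


def build_reach(graph):
    nodes = nodes_of(graph)
    reach = {a: {b: False for b in nodes} for a in nodes}
    for head, deps in graph.items():
        for dep, _ in deps:
            reach[head][dep] = True
    for k in nodes:  # Warshall: allow k as an intermediate node
        for i in nodes:
            if reach[i][k]:
                for j in nodes:
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def check_stratifiable(graph):
    """Return None if stratifiable, else a message naming the offending edge."""
    reach = build_reach(graph)
    for head, deps in graph.items():
        for dep, negated in deps:
            # head -> dep is an edge; dep reaching head back closes a cycle.
            if negated and reach[dep][head]:
                return f"not stratifiable: {head} negatively depends on {dep} inside a cycle"
    return None


def compute_strata(graph):
    # stratum(R) = max over deps of stratum(dep) + (1 if negated else 0)
    strata = {n: 0 for n in nodes_of(graph)}
    changed = True
    while changed:  # terminates only for stratifiable graphs, so check first
        changed = False
        for head, deps in graph.items():
            for dep, negated in deps:
                want = strata[dep] + (1 if negated else 0)
                if want > strata[head]:
                    strata[head] = want
                    changed = True
    return strata
```

Positive self-loops such as the recursive `reach` rule leave the stratum unchanged, because they add 0. Per Phase 8, an aggregate's goal relation would enter this graph as a negation-like edge.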
### Phase 8 — aggregation (Datalog+)

- [x] `(count R V Goal)`, `(sum R V Goal)`, `(min R V Goal)`,
  `(max R V Goal)`, `(findall L V Goal)` — first arg is the result
  variable, second is the aggregated variable, third is the goal
  literal. `findall` returns the distinct-value list itself; the
  others reduce. Live in `lib/datalog/aggregates.sx`.
- [x] `dl-eval-aggregate`: runs `dl-find-bindings` on the goal under the
  current subst (which provides outer-context bindings), collects
  distinct values of the aggregated var, applies the aggregate.
  `count`/`sum` produce 0 when no matches; `min`/`max` produce no
  binding (rule fails) when empty.
- [x] Group-by emerges naturally: outer-context vars in the goal are
  substituted from the current subst, so
  `popular(P) :- post(P), count(N, U, liked(U, P)), >=(N, 3).`
  correctly counts per-post.
- [x] Stratification: `dl-aggregate-dep-edge` returns a negation-like
  edge so the aggregate's goal relation is fully derived before the
  aggregate fires. Non-monotonicity respected.
- [x] Safety: aggregate body lit binds the result var; goal-internal
  vars are existentially quantified and don't need outer binding.
- [x] Tests in `lib/datalog/tests/aggregates.sx` (10): count siblings,
  sum prices, min/max scores, count over derived relation,
  empty-input cases for each operator, popularity threshold with
  group-by, distinct-counted-once.

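Here is a Python sketch of the aggregate step as described. It runs the goal under the outer substitution, collects the distinct values of the aggregated variable, then reduces them. The goal is restricted to a single EDB literal and the helper names are hypothetical. As in the description above, `count` and `sum` of nothing give 0, while `min` and `max` of nothing give no binding (an empty result list).

```python
def is_var(t):
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def eval_aggregate(op, result_var, agg_var, goal, facts, subst):
    """Return [extended subst] or [] (no binding), as a body literal would."""
    values = set()
    for fact in facts.get(goal[0], ()):
        s = unify(goal, fact, subst)  # outer bindings in subst give group-by
        if s is not None:
            values.add(walk(agg_var, s))
    if op == "count":
        result = len(values)
    elif op == "sum":
        result = sum(values)
    elif op in ("min", "max"):
        if not values:
            return []  # rule fails: nothing to minimise or maximise
        result = min(values) if op == "min" else max(values)
    else:
        raise ValueError(f"unknown aggregate {op}")
    return [{**subst, result_var: result}]
```

Calling this once per `post(P)` binding with `{"P": p}` as the outer substitution gives the per-post counts behind `popular`. For `sum`, deduplicating values means two items priced 10 contribute 10 only once, which follows from the "distinct values" wording above.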
### Phase 9 — SX embedding API

- [x] `(dl-program-data facts rules)` builds a db from SX data —
  `facts` is a list of literals, `rules` is a list of either
  dicts `{:head … :body …}` or lists `(<head…> <- <body…>)`.
  Variables are SX symbols whose first char is uppercase or `_`,
  matching the parser's convention.
  ```
  (dl-program-data
    '((parent tom bob) (parent bob ann))
    '((ancestor X Y <- (parent X Y))
      (ancestor X Z <- (parent X Y) (ancestor Y Z))))
  ```
- [x] `(dl-rule head body)` constructor for the dict form.
- [x] `(dl-query db '(ancestor tom X))` already worked — same query API
  consumes the SX-data goal. Now also accepts a *list* of body
  literals for conjunctive queries:
  `(dl-query db '((p X) (q X)))`,
  `(dl-query db (list '(n X) '(> X 2)))`. Auto-dispatched via
  `dl-query-coerce` on first-element shape.
- [x] `(dl-assert! db '(parent ann pat))` → adds the fact and re-saturates.
- [x] `(dl-retract! db '(parent bob ann))` → drops matching tuples from
  the EDB list, wipes every relation that has a rule (those are IDB),
  and re-saturates from the surviving EDB.
- [x] Tests in `lib/datalog/tests/api.sx` (9): closure via data API,
  dict-rule form, dl-rule constructor, dl-assert! incremental,
  dl-retract! removes derived, cyclic-graph reach via data,
  assert into empty db, fact-style rule (no arrow), coerce dict.
- [x] Integration demo: federation graph query — `(reachable A B)` /
  `(mutual A B)` / `(foaf A C)` over `(follows ACTOR-A ACTOR-B)` in
  `lib/datalog/demo.sx`. Tests in `lib/datalog/tests/demo.sx`.
  Wiring this to actual rose-ash ActivityPub data is Phase 10
  service work and is out of scope for this loop.

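The list form of a rule is a head, the `<-` marker, and the body literals, so normalising it to the dict form amounts to a split on the arrow. The Python sketch below is a hypothetical analogue of that step in `dl-program-data`, with symbols modelled as strings and literals as tuples. An arrow-less list is treated as a fact-style rule with an empty body, matching the "fact-style rule (no arrow)" test case.

```python
def is_var(sym):
    # Same convention as the parser: uppercase first char or "_".
    return isinstance(sym, str) and (sym[:1].isupper() or sym[:1] == "_")


def rule_from_list(form):
    """Turn [head..., "<-", body...] into {"head": tuple, "body": [tuple, ...]}."""
    form = list(form)
    if "<-" not in form:
        return {"head": tuple(form), "body": []}  # fact-style rule, no arrow
    arrow = form.index("<-")
    return {"head": tuple(form[:arrow]), "body": [tuple(lit) for lit in form[arrow + 1:]]}


def rule_vars(rule):
    # Variables mentioned anywhere in the rule, e.g. as input to a safety check.
    syms = list(rule["head"][1:]) + [a for lit in rule["body"] for a in lit[1:]]
    return sorted({s for s in syms if is_var(s)})
```

Using `<-` rather than `:-` avoids the keyword clash noted in the Phase 9 log entry, because SX reads `:-` as a keyword.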
### Phase 10 — Datalog as a query language for rose-ash

- [x] Schema sketches in `lib/datalog/demo.sx`:
  - **Federation**: `(follows A B)` → `(mutual A B)`, `(reachable A B)`,
    `(foaf A C)` (friend-of-a-friend, distinct).
  - **Content**: `(authored A P)`, `(liked U P)`, `(tagged P T)` →
    `(post-likes P N)` via aggregation, `(popular P)` for likes ≥ 3,
    `(interesting Me P)` joining follows + authored + popular.
  - **Permissions**: `(member A G)`, `(subgroup C P)`, `(allowed G R)`
    → `(in-group A G)` over transitive subgroups, `(can-access A R)`.
  - **Cooking-posts** (the canonical example): `(reach Me Them)` over
    the follow graph, then `(cooking-post-by-network Me P)` joining
    reach + authored + `(tagged P cooking)`.
- [ ] Loader `dl-load-from-db!` — out of scope for this loop
  (would need to edit `shared/services/` outside `lib/datalog/`).
  Programs in `demo.sx` already document the EDB shape expected
  from such a loader. `dl-program-data` consumes the same shape.
- [x] Query examples covered by `lib/datalog/tests/demo.sx` (10):
  mutuals, transitive reach, FOAF, popular posts, interesting feed,
  post likes count, direct/subgroup/transitive group access, no
  access without grant.
- [ ] Service endpoint `POST /internal/datalog` — out of scope as above.
  Once exposed, server-side handler would be `dl-program-data` +
  `dl-query`, returning JSON-encoded substitutions.

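To make the canonical query concrete, here is a Python sketch that evaluates it over plain sets of pairs. The two `reach` rules and the join in the comments are an assumed reading of the description above; the actual rules live in `dl-demo-cooking-rules`. The function and data are illustrative only.

```python
def cooking_posts_by_network(follows, authored, tagged, me):
    # Assumed rules, written in the plan's surface syntax:
    #   reach(Me, Them) :- follows(Me, Them).
    #   reach(Me, Them) :- follows(Me, X), reach(X, Them).
    reach = set(follows)
    while True:
        step = {(a, c) for (a, b) in follows for (b2, c) in reach if b == b2}
        new = step - reach
        if not new:  # fixpoint: no new reach pairs
            break
        reach |= new
    #   cooking-post-by-network(Me, P) :-
    #       reach(Me, Them), authored(Them, P), tagged(P, cooking).
    return sorted({p for (m, them) in reach if m == me
                   for (author, p) in authored if author == them
                   if (p, "cooking") in tagged})
```

The three demo test shapes map directly onto calls to this function: a transitive network, a direct-only follow, and an empty network.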
## Blockers

- **Saturation perf**: three rounds done.
  - hash-set membership in `dl-add-fact!` (Phase 5b)
  - indexed iteration in `dl-find-bindings` (Phase 5c)
  - first-arg index per relation (Phase 5e) — when a body literal's
    first arg walks to a non-variable, `dl-match-positive` looks up
    by `(str arg)` instead of scanning the full relation.

  chain-25 saturation drops from ~33s to ~18s real (10s user).
  chain-50 still long (~120s+) due to dict-copy overhead in
  unification subst threading. Future: per-rule "compiled" body
  with pre-resolved var positions, slot-based subst representation
  to avoid `assoc` per binding.

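The two indexes from Phases 5b and 5e fit together as a per-relation store. A membership set gives O(1) dedup on insert, and a dict maps each first-argument value to the tuples that start with it. This Python class is a hypothetical illustration, not the SX db layout. It keys on the tuple itself where the SX code uses `(str lit)`, and it assumes every tuple has at least one argument.

```python
def is_var(t):
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


class Relation:
    """Tuples of one relation plus a membership index and a first-argument index."""

    def __init__(self):
        self.tuples = []    # insertion order, like the :facts list
        self.keys = set()   # membership, like :facts-keys
        self.by_first = {}  # first-arg value -> tuples, like :facts-index

    def add(self, lit):
        # Returns False for duplicates; 30 and 30.0 compare and hash alike in Python.
        if lit in self.keys:
            return False
        self.keys.add(lit)
        self.tuples.append(lit)
        self.by_first.setdefault(lit[1], []).append(lit)
        return True

    def candidates(self, first_arg):
        # A bound (non-variable) first argument narrows the scan to one bucket.
        if is_var(first_arg):
            return self.tuples
        return self.by_first.get(first_arg, [])
```

The index only helps when the first argument is already bound at match time. That is why goal-shaped lookups such as `ancestor(tom, X)` benefit, while literals with a free first argument still scan the whole relation.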
## Progress log

_Newest first._

- 2026-05-08 — Phase 6 driver: `dl-magic-query db query-goal`.
  Builds a fresh internal db from the caller's EDB + magic seed +
  rewritten rules, saturates, queries, returns substitutions —
  caller's db is untouched. Equivalent to `dl-query` for any
  fully-stratifiable program; sole motivation is a perf alternative
  on goal-shaped queries against large recursive relations.
  2 new tests cover equivalence and non-mutation.

- 2026-05-08 — Phase 6 magic-sets rewriter.
  `dl-magic-rewrite rules query-rel adn args` returns
  `{:rules <rewritten> :seed <seed-fact>}`.
  Worklist over `(rel, adn)` pairs starts from the query, gates each
  original rule with a `magic_<rel>^<adn>(bound)` filter, and emits
  propagation rules for each positive non-builtin body literal so
  that magic spreads to body relations. EDB facts pass through.
  3 new tests cover seed structure, equivalence on chain-3 by
  ancestor-relation tuple count, and same-query-answers under
  the rewritten program. The plumbing for a `dl-saturate-magic!`
  driver and large-graph perf benchmarks is still future work.

- 2026-05-08 — Phase 6 building blocks for the magic-sets
  transformation: `dl-magic-rel-name`, `dl-magic-lit`,
  `dl-bound-args`. The rewriter that generates magic seed and
  propagation rules is still future work; with these primitives
  in place it's a straightforward worklist algorithm. 4 new tests.

- 2026-05-08 — Phase 6 adornments + SIPS in
  `lib/datalog/magic.sx`. Inspection helpers — `dl-adorn-goal` and
  `dl-adorn-lit` compute per-arg `b`/`f` patterns under a bound
  set; `dl-rule-sips rule head-adornment` walks body literals
  left-to-right propagating the bound set, recognising `is` and
  aggregate result-vars as new binders. Lays groundwork for a
  later magic-sets transformation. 10 new tests cover pure
  adornment, SIPS over a chain rule, head-fully-bound rules,
  comparisons, and `is`. Saturator does not yet consume these.

- 2026-05-08 — Comprehensive integration test in api suite: a
  single program exercising recursion (`reach` transitive closure) +
  stratified negation (`safe X Y :- reach X Y, not banned Y`) +
  aggregation (`reach_count` via count) + comparison (`>= N 2`)
  composed end-to-end via `dl-eval source query-source`. Confirms
  the full pipeline (parser → safety → stratifier → semi-naive +
  aggregate post-pass → query) on a non-trivial program.

- 2026-05-08 — Bug fix: aggregates work as top-level query goals.
  `dl-match-lit` (the naive matcher used by `dl-find-bindings`) was
  missing the `dl-aggregate?` dispatch — it was only present in
  `dl-fbs-aux` (semi-naive). Symptom: `(dl-query db '(count N X (p X)))`
  silently returned `()`. Also updated `dl-query-user-vars` to project
  only the result var (first arg) of an aggregate goal — the
  aggregated var and inner-goal vars are existentials and should not
  appear in the projected substitution. 2 new aggregate tests cover
  the regression.

- 2026-05-08 — Convenience: `dl-eval source query-source`. Parses
  both strings, builds a db, saturates, runs the query, returns
  the substitution list. Single-call user-friendly entry. 2 new
  api tests cover ancestor and multi-goal queries.

- 2026-05-08 — Phase 6 stub: `dl-set-strategy! db strategy` and
  `dl-get-strategy db` user-facing hooks. Default `:semi-naive`;
  `:magic` is accepted but the actual transformation is deferred,
  so saturation still uses semi-naive. Lets us tick the
  "Optional pass — guarded behind dl-set-strategy!" Phase 6 box.
  3 new eval tests.

- 2026-05-08 — Demo: weighted-DAG shortest path.
  `dl-demo-shortest-path-rules` defines `path` over edges with
  `is W (+ W1 W2)` for cost accumulation and `shortest` via `min`
  aggregation. 3 demo tests cover direct/multi-hop choice, multi-hop
  wins on cheaper route, and unreachable-empty. Added `dl-summary db`
  inspection helper returning `{<rel>: count}` (4 eval tests).

- 2026-05-08 — Phase 5e perf: first-arg index per relation. db gains
  `:facts-index {<rel>: {<first-arg-key>: tuples}}` mirroring the
  existing `:facts-keys` membership index. `dl-add-fact!` populates
  it; `dl-match-positive` walks the body literal's first arg under
  the current subst — if it's bound to a non-var, look up by
  `(str arg)` and iterate only the matching subset. chain-25
  saturation 33s → 18s real (~2x). chain-50 still slow (~120s+)
  but tractable; next bottleneck is subst dict copies during
  unification. Differential test bumped to chain-12, semi-only
  count to chain-25.

- 2026-05-08 — Demo: tag co-occurrence. `(cotagged P T1 T2)` — post
  has both T1 and T2 with T1 != T2 — and `(tag-pair-count T1 T2 N)`
  counting posts per distinct tag pair. Demonstrates count
  aggregation grouped by outer-context vars. 2 new demo tests.

- 2026-05-08 — `dl-query` accepts a list of body literals for
  conjunctive queries, in addition to a single positive literal.
  `dl-query-coerce` dispatches based on the first element's shape:
  positive lit (head is a symbol) or `:neg` dict → wrap as singleton;
  list of lits → use as-is. `dl-query-user-vars` collects the union
  of vars across all goals (deduped, `_` filtered) for projection.
  2 new api tests: multi-goal AND, and conjunction with comparison.

- 2026-05-08 — Bug fix: `dl-check-stratifiable` now rejects recursion
  through aggregation (e.g., `q(N) :- count(N, X, q(X))`). The
  stratifier was already adding negation-like edges for aggregates,
  but the cycle scan only looked at explicit `:neg` literals. Added
  the matching aggregate branch to the body iteration. Also adds
  doc-only `lib/datalog/datalog.sx` with the public-API surface
  (since `load` is an epoch command and can't recurse from within an
  `.sx` file). 3 new aggregate tests cover recursion-rejection,
  negation-and-aggregation coexistence, and min-over-empty-derived.

- 2026-05-08 — Phase 10 demo + canonical query. Added the "cooking
  posts by people I follow (transitively)" example from the plan:
  `dl-demo-cooking-rules` defines `reach` over the follow graph
  (recursive transitive closure) and `cooking-post-by-network` that
  joins reach with `authored` and `(tagged P cooking)`. 3 demo
  tests cover transitive network, direct-only follow, and
  empty-network cases.

- 2026-05-08 — Phase 8 extension: `findall L V Goal` aggregate. Bind
  L to the list of distinct V values for which Goal holds (or the
  empty list when no matches). Implemented as a one-line case in
  `dl-do-aggregate`. 3 new tests: EDB, derived relation, empty.
  Useful for "give me all the X such that …" queries without
  scalar reduction.

- 2026-05-08 — Phase 5d semantic fix: anonymous `_` variables are
  renamed per occurrence at `dl-add-rule!` and `dl-query` time so
  `(p X _) (p _ Y)` no longer unifies the two `_`s. New helpers
  `dl-rename-anon-term`, `dl-rename-anon-lit`, `dl-make-anon-renamer`,
  `dl-rename-anon-rule` in db.sx; eval.sx's dl-query renames the goal
  before search and projects only user-named vars (`_` is filtered
  out of the projection list). The "underscore in head" test now
  correctly rejects `(p X _) :- q(X).` — after renaming, the head's
  fresh anon var has no body binder. Two new eval tests verify
  rule-level and goal-level independence. 155/155 expected.

- 2026-05-08 — Phase 5c perf: indexed `dl-find-bindings`. Replaced
  the recursive `(rest lits)` walk with `dl-fb-aux lits db subst i n`
  using `nth lits i`. Eliminates O(N²) list-copy per body of length
  N. chain-15 saturation 25s → 16s; chain-25 finishes in 33s real
  (vs. timeout previously). Bumped semi_naive tests: differential
  on chain-10, semi-only count on chain-15 (was chain-5/chain-5).
  153/153.

- 2026-05-08 — Phase 10 syntactic demo. New `lib/datalog/demo.sx`
  with three programs over rose-ash-shaped data: federation
  (`mutual`, `reachable`, `foaf`), content recommendation
  (`post-likes` via count aggregation, `popular`, `interesting`),
  and role-based permissions (`in-group` over transitive subgroups,
  `can-access`). 10 demo tests pass against synthetic EDB tuples.
  Postgres loader and `/internal/datalog` HTTP endpoint remain
  out of scope for this loop (they need service-tree edits beyond
  `lib/datalog/**`). Conformance now 153/153.

- 2026-05-08 — Phase 5b perf: hash-set membership in `dl-add-fact!`.
  db gains a parallel `:facts-keys {<rel>: {<tuple-string>: true}}`
  index alongside `:facts`. `dl-tuple-key` derives a stable string
  key via `(str lit)` — `(p 30)` and `(p 30.0)` collide correctly
  because SX prints them identically. Insertion is O(1) instead of
  O(n). chain-7 saturation drops from ~12s to ~6s; chain-15 from
  ~50s to ~25s under shared CPU. Larger chains are still slow due
  to body-join overhead in dl-find-bindings (Blocker updated).
  `dl-retract!` updated to keep both indices consistent. 143/143.

- 2026-05-08 — Phase 9 done. New `lib/datalog/api.sx` exposes a
  parser-free embedding: `dl-program-data facts rules` accepts SX
  data lists, with rules in either dict form or list form using
  `<-` as the rule arrow (since SX parses `:-` as a keyword).
  `dl-rule head body` constructs the dict. `dl-assert! db lit` adds
  a fact and re-saturates; `dl-retract! db lit` drops the fact from
  EDB, wipes all rule-headed (IDB) relations, and re-saturates from
  scratch — the simplest correct semantics until provenance tracking
  arrives in a later phase. 9 API tests; conformance now 143/143.

- 2026-05-08 — Phase 8 done. New `lib/datalog/aggregates.sx`
  (~110 LOC): count / sum / min / max. Each is a body literal of shape
  `(op R V Goal)` — `dl-eval-aggregate` runs `dl-find-bindings` on
  the goal under the outer subst (so outer vars in the goal get
  substituted, giving group-by-style aggregation), collects the
  distinct values of `V`, and binds `R`. Empty input: count/sum
  return 0; min/max produce no binding (rule fails). Stratifier
  extended via `dl-aggregate-dep-edge` so the aggregate's goal
  relation is fully derived before the aggregate fires. Safety check
  treats goal-internal vars as existentials (no outer binding
  required); only the result var becomes bound. Conformance now
  134 / 134.

- 2026-05-08 — Phase 7 done (Phase 6 magic sets deferred — opt-in,
  semi-naive default suffices for current test suite). New
  `lib/datalog/strata.sx` (~290 LOC): dep graph build, Floyd-Warshall
  reachability, SCC-via-mutual-reachability for non-stratifiability
  detection, iterative stratum computation, rule grouping by head
  stratum. eval.sx split: `dl-saturate-rules!` is the per-rule-set
  semi-naive worker, `dl-saturate!` is now the stratified driver
  (errors out on non-stratifiable programs). `dl-match-negation` in
  eval.sx: succeeds iff inner positive match is empty. Stratum-keyed
  dicts use `(str s)` since SX dicts only accept string/keyword keys.
  10 negation tests cover EDB/IDB negation, multi-level strata,
  non-stratifiability rejection, and a negation safety violation.

- 2026-05-08 — Phase 5 done. `lib/datalog/eval.sx` rewritten to
  semi-naive default. `dl-saturate!` tracks a per-relation delta and
  on each iteration walks every positive body position substituting
  delta for that one literal — joining the rest against the full DB
  snapshot. `dl-saturate-naive!` retained as the reference. Rules
  with no positive body literal (e.g. `(p X) :- (= X 5).`) fall back
  to a naive one-shot via `dl-collect-rule-candidates`. 8 tests
  differentially compare the two saturators using per-relation tuple
  counts (cheap). Chain-5 differential exercises multi-iteration
  recursive saturation. Larger chains made conformance.sh time out
  due to O(n) `dl-tuple-member?` × CPU sharing with other loop
  agents — added a Blocker to swap to a hash-set for membership.
  Also tightened `dl-tuple-member?` to use indexed iteration instead
  of recursive `rest` (was creating a fresh list per step).

- 2026-05-07 — Phase 4 done. `lib/datalog/builtins.sx` (~280 LOC) adds
  `(< X Y)`, `(<= X Y)`, `(> X Y)`, `(>= X Y)`, `(= X Y)`, `(!= X Y)`,
  and `(is X expr)` with `+ - * /`. `dl-eval-builtin` dispatches;
  `dl-eval-arith` recursively evaluates nested compounds. Safety
  check is now order-aware — it walks body literals left-to-right
  tracking the bound set, requires comparison/`is` inputs to be
  already bound, and special-cases `=` (binds the var-side; both
  sides must include at least one bound to bind the other). Phase 3's
  simple safety check stays in db.sx as a forward-reference fallback;
  builtins.sx redefines `dl-rule-check-safety` to the comprehensive
  version. eval.sx's `dl-match-lit` now dispatches built-ins through
  `dl-eval-builtin`. 19 builtins tests; conformance 106 / 106.

- 2026-05-07 — Phase 3 done. `lib/datalog/db.sx` (~250 LOC) holds facts
  indexed by relation name plus the rules list, with `dl-add-fact!` /
  `dl-add-rule!` (rejects non-ground facts and unsafe rules);
  `lib/datalog/eval.sx` (~150 LOC) implements the naive bottom-up
  fixpoint via `dl-find-bindings`/`dl-match-positive`/`dl-saturate!`
  and `dl-query` (deduped projected substitutions). Safety analysis
  rejects unsafe head vars at load time. Negation and arithmetic
  built-ins raise clean errors (lifted in later phases). 15 eval
  tests cover transitive closure, sibling, same-generation, cyclic
  graph reach, and six safety violations. Conformance 87 / 87.

- 2026-05-07 — Phase 2 done. `lib/datalog/unify.sx` (~140 LOC):
  `dl-var?` (case + underscore), `dl-walk`, `dl-bind`, `dl-unify` (returns
  extended dict subst or `nil`), `dl-apply-subst`, `dl-ground?`, `dl-vars-of`.
  Substitutions are immutable dicts; `assoc` builds extended copies. 28
  unify tests; conformance now 72 / 72.

- 2026-05-07 — Phase 1 done. `lib/datalog/tokenizer.sx` (~190 LOC) emits
  `{:type :value :pos}` tokens; `lib/datalog/parser.sx` (~150 LOC) produces
  `{:head … :body …}` / `{:query …}` clauses, with nested compounds
  permitted for arithmetic and `not(...)` desugared to `{:neg …}`. 44 / 44
  via `bash lib/datalog/conformance.sh` (26 tokenize + 18 parse). Local
  helpers namespace-prefixed (`dl-emit!`, `dl-peek`) after a host-primitive
  shadow clash. Test harness uses a custom `dl-deep-equal?` that handles
  out-of-order dict keys and number repr (`equal?` fails on dict key order
  and on `30` vs `30.0`).