dl-magic-rewrite rules query-rel adn args returns:
{:rules <rewritten-rules> :seed <magic-seed-fact>}
Worklist over (rel, adn) pairs starts from the query and stops
when no new pairs appear. For each rule with head matching a
worklist pair:
- Adorned rule: head :- magic_<rel>^<adn>(bound), body...
- Propagation rules: for each positive non-builtin body lit
at position i:
magic_<lit-rel>^<lit-adn>(bound-of-lit) :-
magic_<rel>^<adn>(bound-of-head),
body[0..i-1]
- Add (lit-rel, lit-adn) to the worklist.
Built-ins, negation, and aggregates pass through without
generating propagation rules. EDB facts are unchanged.
3 new tests cover seed structure, equivalence on chain-3 (full
closure, 6 ancestor tuples — magic helps only when the EDB has
nodes outside the seed's transitive cone), and same-query-answers
under the rewritten program. Total 202/202.
Wiring up a `dl-saturate-magic!` driver and large-graph perf
benchmarks is left for a future iteration.
Adds the primitives a future magic-sets rewriter will compose:
dl-magic-rel-name rel adornment → "magic_<rel>^<adornment>"
dl-magic-lit rel adn bound-args → magic literal as SX list
dl-bound-args lit adornment → bound-position arg values
Rewriter algorithm (worklist over (rel, adornment) pairs,
generating seed, propagation, and adorned-rule outputs) is still
TODO — these helpers are inspection-only for now.
4 new magic tests cover naming, lit construction, and bound-args
extraction (mixed/free).
New lib/datalog/magic.sx — first piece of magic-sets:
dl-adorn-arg arg bound → "b" or "f"
dl-adorn-args args bound → adornment string
dl-adorn-goal goal → adornment under empty bound set
dl-adorn-lit lit bound → adornment of any literal
dl-vars-bound-by-lit lit bound → free vars this lit will bind
dl-init-head-bound head adn → bound set seeded from head adornment
dl-rule-sips rule head-adn → ({:lit :adornment} ...) per body lit
SIPS walks left-to-right tracking the bound set; recognises `is` and
aggregate result-vars as new binders, lets comparisons and negation
pass through with computed adornments.
Inspection-only — saturator doesn't yet consume these. Lays
groundwork for a future magic-sets transformation.
10 new tests cover pure adornment, SIPS over a chain rule,
head-fully-bound rules, comparisons, and `is`. Total 194/194.
Bug: dl-match-lit (the naive matcher used by dl-find-bindings)
was missing dl-aggregate? dispatch — it was only present in
dl-fbs-aux (semi-naive). Symptom:
(dl-query db '(count N X (p X)))
silently returned ().
Two fixes:
- Add aggregate branch to dl-match-lit before the positive case.
- dl-query-user-vars now projects only the result var (first arg)
of an aggregate goal — the aggregated var and inner-goal vars
are existentials and should not leak into substitutions.
2 new aggregate tests cover count and findall as direct query goals.
Single-call entry: dl-eval source-string query-string parses
both, builds a db via dl-program, saturates implicitly, runs
the query (extracted from the parsed `?- ...` clause), and
returns the substitution list.
Most user-friendly path:
(dl-eval "parent(a, b). ..." "?- ancestor(a, X).")
2 new api tests cover ancestor and multi-goal usage.
Adds a user-facing strategy hook: dl-set-strategy! db strategy and
dl-get-strategy db. Default :semi-naive; :magic is accepted but
the actual transformation is deferred — the saturator currently
falls back to semi-naive regardless. Lets us tick the Phase 6
"Optional pass — guarded behind dl-set-strategy!" checkbox while
keeping the equivalence/perf tests pending future work.
3 new eval tests.
dl-demo-shortest-path-rules: path enumerates X→Z with cost
W = sum of edge weights via is/+; shortest filters to the
minimum cost path per (X, Y) pair via min aggregation.
3 demo tests cover direct/multi-hop choice, multi-hop wins on
cheaper route, and unreachable-empty.
Note: cycles produce infinite distance values without a depth
filter; the rule docstring flags this and suggests adding
(<, D, MAX) for graphs that may cycle.
Returns {<rel-name>: tuple-count} for relations with any tuples
or that are rule-headed (so empty IDB shows as :rel 0 rather than
disappearing). Skips placeholder entries from internal
dl-ensure-rel! calls. 4 tests cover basic, empty IDB, mixed
EDB+IDB, and empty-db cases.
db gains :facts-index {<rel>: {<first-arg-key>: tuples}} mirroring
the membership :facts-keys index. dl-add-fact! populates the index;
dl-match-positive walks the body literal's first arg under the
current subst — when it's bound to a non-var, look up by (str arg)
instead of scanning the full relation.
For chain-style recursive rules (parent X Y), (ancestor Y Z) the
inner Y has at most one parent, so the inner lookup returns 0–1
tuples instead of N. chain-25 saturation drops from ~33s to ~18s
real (~2x). chain-50 still long but tractable; next bottleneck is
subst dict copies during unification.
dl-retract! refreshed to keep the new index consistent: kept-index
rebuilt during EDB filter, IDB wipes clear all three slots.
Differential semi-naive test bumped to chain-12, semi-only count
test to chain-25.
Adds (cotagged P T1 T2) — P has both T1 and T2 with T1 != T2 — and
(tag-pair-count T1 T2 N) which counts posts cotagged with each
distinct (T1, T2) pair. Demonstrates count aggregation against a
recursive-then-aggregated stream of derived tuples.
2 new demo tests: cooking + vegetarian co-occurrence on a small
data set, and a count-of-co-occurrences query.
dl-query now auto-dispatches on the first element's shape:
- positive literal (head is a symbol) or {:neg ...} dict → wrap
- list of literals → conjunctive query
dl-query-coerce normalizes; dl-query-user-vars collects the union
of user-named vars (deduped, '_' filtered) for projection. Old
single-literal callers unchanged.
(dl-query db '(p X)) ; single
(dl-query db '((p X) (q X))) ; conjunction
(dl-query db (list '(n X) '(> X 2))) ; with comparison
2 new api tests cover multi-goal AND and conjunction with comparison.
Bug: dl-check-stratifiable iterated body literals looking only for
explicit :neg literals, missing aggregate cycles. Now also walks
aggregates via dl-aggregate-dep-edge — q(N) :- count(N, X, q(X))
correctly errors out at saturation time.
3 new tests cover:
- recursion-through-aggregation rejected
- negation + aggregation coexist when in different strata
- min over empty derived relation produces no result
Adds the canonical Phase 10 example from the plan: "Posts about
cooking by people I follow (transitively)." dl-demo-cooking-rules
defines reach over the follow graph (recursive transitive closure)
and cooking-post-by-network joining reach + authored + (tagged P
cooking). 3 new demo tests cover transitive network, direct-only
follow, and empty-network cases.
(findall L V Goal) — bind L to the distinct V values for which Goal
holds, or the empty list when none. One-line addition to
dl-do-aggregate that returns the unreduced list. Tests cover EDB,
derived relation, and empty cases.
Useful for "give me all the X such that ..." queries without
scalar reduction.
(p X _), (p _ Y) — the two _ are now different variables, matching
standard Datalog semantics. Previously both _ symbols were the same
SX symbol, so unification across them gave wrong answers.
Fix in db.sx: dl-rename-anon-term + dl-rename-anon-lit walk a term
or literal and replace each '_' symbol with a fresh _anon<N>.
dl-make-anon-renamer returns a counter-based name generator scoped
per call. dl-rename-anon-rule applies it to head and body of a
rule. dl-add-rule! invokes the renamer before safety check.
eval.sx: dl-query renames anon vars in the goal before search and
filters '_' out of the projection so user-facing results aren't
polluted with internal _anon<N> bindings.
The previous "underscore in head ok" test now correctly rejects
(p X _) :- q(X) as unsafe (the head's fresh anon var has no body
binder). New "underscore in body only" test confirms the safe
case. Two regression tests for rule-level and goal-level
independence.
dl-find-bindings now uses dl-fb-aux lits db subst i n (indexed
iteration via nth) instead of recursive (rest lits). Eliminates
O(N²) list-copy per body of length N. chain-15 saturation 25s
→ 16s; chain-25 finishes in 33s real (vs. timeout previously).
Bumped semi_naive tests to chain-10 differential + chain-15
semi-only count (was chain-5/chain-5). Blocker entry refreshed.
New lib/datalog/demo.sx with three Datalog-as-query-language demos
over synthetic rose-ash data:
Federation: (mutual A B), (reachable A B), (foaf A C) over a
follows graph.
Content: (post-likes P N) via count aggregation, (popular P)
for likes >= 3, (interesting Me P) joining follows
+ authored + popular.
Permissions: (in-group A G) over transitive subgroup chains,
(can-access A R).
10 tests run each program against in-memory EDB tuples loaded via
dl-program-data.
Wiring to PostgreSQL and exposing as a service endpoint (/internal
/datalog) is out of scope for this loop — both would require
edits outside lib/datalog/. Programs above document the EDB shape
a real loader would populate.
New lib/datalog/api.sx: dl-program-data facts rules takes SX data
lists. Rules accept either dict form or list form using <- as the
rule arrow (since SX parses :- as a keyword). dl-rule constructor
for the dict shape. dl-assert! adds a fact and re-saturates;
dl-retract! drops EDB matches, wipes all rule-headed IDB
relations, and re-saturates from scratch — simplest correct
semantics until provenance tracking arrives.
9 API tests cover ancestor closure via data, dict-rule form,
dl-rule constructor, incremental assert/retract, cyclic-graph
reach, assert into empty, fact-style rule (no arrow), dict
passthrough.
New lib/datalog/aggregates.sx: (count R V Goal), (sum R V Goal),
(min R V Goal), (max R V Goal). dl-eval-aggregate runs
dl-find-bindings on the goal under the outer subst, collects
distinct values of V, applies the operator, binds R. Empty input:
count/sum return 0; min/max produce no binding (rule fails).
Group-by emerges naturally from outer-subst substitution into the
goal — `popular(P) :- post(P), count(N, U, liked(U, P)), >=(N, 3).`
counts per-post.
Stratifier extended: dl-aggregate-dep-edge contributes a
negation-like edge so the aggregate's goal relation is fully
derived before the aggregate fires (non-monotonicity respected).
Safety relaxed for aggregates: goal-internal vars are existentials,
only the result var becomes bound.
New lib/datalog/strata.sx: dl-build-dep-graph (relation -> deps with
:neg flag), Floyd-Warshall reachability, SCC-via-mutual-reach for
non-stratifiability detection, iterative dl-compute-strata, and
dl-group-rules-by-stratum.
eval.sx refactor:
- dl-saturate-rules! db rules — semi-naive worker over a rule subset
- dl-saturate! db — stratified driver. Rejects non-stratifiable
programs at saturation time, then iterates strata in order
- dl-match-negation — succeeds iff inner positive match is empty
Order-aware safety in dl-rule-check-safety (Phase 4) already
required negation vars to be bound by a prior positive literal.
Stratum dict keys are strings (SX dicts don't accept ints).
Phase 6 magic sets deferred — opt-in path, semi-naive default
suffices for current workloads.
dl-saturate! is now semi-naive: tracks a per-relation delta dict,
and on each iteration walks every positive body-literal position,
substituting the delta of its relation while joining the rest
against the previous-iteration DB. Candidates are collected before
mutating the DB so the "full" sides see a consistent snapshot.
Rules with no positive body literal (e.g. (p X) :- (= X 5).)
fall back to a one-shot naive pass via dl-collect-rule-candidates.
dl-saturate-naive! retained as the reference implementation; 8
differential tests compare per-relation tuple counts on every
recursive program. Switched dl-tuple-member? to indexed iteration
instead of recursive rest (eliminates per-step list copy). Larger
chains under bundled conformance trip O(n) membership × CPU
sharing — added a Blocker to swap relations to hash-set membership.
New lib/datalog/builtins.sx: (< <= > >= = !=) and (is X expr) with
+ - * /. dl-eval-arith recursively evaluates nested compounds.
Safety analysis now walks body left-to-right tracking the bound
set: comparisons require all args bound, is RHS vars must be bound
(LHS becomes bound), = special-cases the var/non-var combos.
db.sx keeps the simple safety check as a forward-reference
fallback; builtins.sx redefines dl-rule-check-safety to the
comprehensive version. eval.sx dispatches built-ins through
dl-eval-builtin instead of erroring. 19 new tests.
Tokens → list of {:head :body} / {:query} clauses. SX symbols for
constants and variables (case-distinguished). not(literal) in body
desugars to {:neg literal}. Nested compounds permitted in arg
position for arithmetic; safety analysis (Phase 3) will gate them.
Conformance harness wraps lib/guest/conformance.sh; produces
lib/datalog/scoreboard.{json,md}.