datalog: aggregation count/sum/min/max (Phase 8, 134/134)
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 50s

New lib/datalog/aggregates.sx: (count R V Goal), (sum R V Goal),
(min R V Goal), (max R V Goal). dl-eval-aggregate runs
dl-find-bindings on the goal under the outer subst, collects
distinct values of V, applies the operator, binds R. Empty input:
count/sum return 0; min/max produce no binding (rule fails).

Group-by emerges naturally from outer-subst substitution into the
goal — `popular(P) :- post(P), count(N, U, liked(U, P)), >=(N, 3).`
counts per-post.

Stratifier extended: dl-aggregate-dep-edge contributes a
negation-like edge so the aggregate's goal relation is fully
derived before the aggregate fires (non-monotonicity respected).
Safety relaxed for aggregates: goal-internal vars are existentials,
only the result var becomes bound.
This commit is contained in:
2026-05-08 08:28:45 +00:00
parent caec05eb27
commit 6d04cf7bf2
9 changed files with 447 additions and 27 deletions

View File

@@ -185,12 +185,27 @@ large graphs.
rejection, and a negation safety violation.
### Phase 8 — aggregation (Datalog+)
- [ ] `count(X, Goal)` → number of distinct X satisfying Goal
- [ ] `sum(X, Goal)` → sum of X values satisfying Goal
- [ ] `min(X, Goal)` / `max(X, Goal)` → min/max of X satisfying Goal
- [ ] `group-by` semantics: `count(X, sibling(bob, X))` → count of bob's siblings
- [ ] Aggregation breaks stratification — evaluate in a separate post-fixpoint pass
- [ ] Tests: social network statistics, grade aggregation, inventory sums
- [x] `(count R V Goal)`, `(sum R V Goal)`, `(min R V Goal)`,
`(max R V Goal)` — first arg is the result variable, second is the
aggregated variable, third is the goal literal. Live in
`lib/datalog/aggregates.sx`.
- [x] `dl-eval-aggregate`: runs `dl-find-bindings` on the goal under the
current subst (which provides outer-context bindings), collects
distinct values of the aggregated var, applies the aggregate.
`count`/`sum` produce 0 when no matches; `min`/`max` produce no
binding (rule fails) when empty.
- [x] Group-by emerges naturally: outer-context vars in the goal are
substituted from the current subst, so `popular(P) :- post(P),
count(N, U, liked(U, P)), >=(N, 3).` correctly counts per-post.
- [x] Stratification: `dl-aggregate-dep-edge` returns a negation-like
edge so the aggregate's goal relation is fully derived before the
aggregate fires. Non-monotonicity respected.
- [x] Safety: aggregate body lit binds the result var; goal-internal
vars are existentially quantified and don't need outer binding.
- [x] Tests in `lib/datalog/tests/aggregates.sx` (10): count siblings,
sum prices, min/max scores, count over derived relation,
empty-input cases for each operator, popularity threshold with
group-by, distinct-counted-once.
### Phase 9 — SX embedding API
- [ ] `(dl-program facts rules)` → database from SX data directly (no parsing required)
@@ -230,6 +245,19 @@ large graphs.
_Newest first._
- 2026-05-08 — Phase 8 done. New `lib/datalog/aggregates.sx` (~110
LOC): count / sum / min / max. Each is a body literal of shape
`(op R V Goal)` — `dl-eval-aggregate` runs `dl-find-bindings` on
the goal under the outer subst (so outer vars in the goal get
substituted, giving group-by-style aggregation), collects the
distinct values of `V`, and binds `R`. Empty input: count/sum
return 0; min/max produce no binding (rule fails). Stratifier
extended via `dl-aggregate-dep-edge` so the aggregate's goal
relation is fully derived before the aggregate fires. Safety check
treats goal-internal vars as existentials (no outer binding
required); only the result var becomes bound. Conformance now
134 / 134.
- 2026-05-08 — Phase 7 done (Phase 6 magic sets deferred — opt-in,
semi-naive default suffices for current test suite). New
`lib/datalog/strata.sx` (~290 LOC): dep graph build, Floyd-Warshall