datalog: semi-naive saturator + delta sets (Phase 5, 114/114)
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 56s

dl-saturate! is now semi-naive: tracks a per-relation delta dict,
and on each iteration walks every positive body-literal position,
substituting the delta of its relation while joining the rest
against the previous-iteration DB. Candidates are collected before
mutating the DB so the "full" sides see a consistent snapshot.
Rules with no positive body literal (e.g. (p X) :- (= X 5).)
fall back to a one-shot naive pass via dl-collect-rule-candidates.

dl-saturate-naive! retained as the reference implementation; 8
differential tests compare per-relation tuple counts on every
recursive program. Switched dl-tuple-member? to indexed iteration
instead of recursive rest (eliminates per-step list copy). Larger
chains under bundled conformance trip O(n) membership × CPU
sharing — added a Blocker to swap relations to hash-set membership.
This commit is contained in:
2026-05-08 08:13:07 +00:00
parent 7ce723f732
commit d964f58c48
7 changed files with 425 additions and 24 deletions

View File

@@ -126,10 +126,25 @@ constraints that filter bindings.
and three safe-shape tests. Conformance 106 / 106.
### Phase 5 — semi-naive evaluation (performance)
- [ ] Delta sets: track newly derived tuples per iteration
- [ ] Semi-naive rule: only join against delta tuples from last iteration, not full relation
- [ ] Significant speedup for recursive rules — avoids re-deriving known tuples
- [ ] Tests: verify semi-naive produces same results as naive; benchmark on large ancestor chain
- [x] Delta sets `{rel-name -> tuples}` track newly derived tuples per iter.
`dl-snapshot-facts` builds the initial delta from the EDB.
- [x] Semi-naive rule: for each rule, walk every positive body literal
position; substitute that one with the per-relation delta and join
the rest against the previous-iteration DB (`dl-find-bindings-semi`).
Candidates are collected before mutating the DB so the "full" sides
see a consistent snapshot.
- [x] `dl-collect-rule-candidates` falls back to a naive single pass when
a rule has no positive body literal (e.g. `(p X) :- (= X 5).`).
- [x] `dl-saturate!` is now semi-naive by default; `dl-saturate-naive!`
kept for differential testing and a reference implementation.
- [x] Tests in `lib/datalog/tests/semi_naive.sx` (8) — every recursive
program from earlier suites is run under both saturators with
per-relation tuple counts compared (cheap, robust under bundled
conformance session). A chain-5 differential exercises multiple
semi-naive iterations against the recursive ancestor rule.
Larger chains hit prohibitive wall-clock under conformance CPU
contention with other agents — a future Blocker tracks switching
`dl-tuple-member?` from O(n²) list scan to a hash-set per relation.
### Phase 6 — magic sets (goal-directed bottom-up, opt-in)
Naive bottom-up derives **all** consequences before answering. Magic sets
@@ -194,12 +209,32 @@ large graphs.
## Blockers
_(none yet)_
- **Hash-set membership for relations.** `dl-tuple-member?` uses a linear
list scan; insert is O(n) and saturating chain-N pushes O(n²) → O(n³)
total. Under bundled conformance (CPU shared with other loop agents)
even chain-15 hits multi-minute wall-clock. Tests scoped to chain-5
for now. Fix: maintain a `{tuple-key → true}` dict per relation
alongside the list; key tuples by their serialized form.
## Progress log
_Newest first._
- 2026-05-08 — Phase 5 done. `lib/datalog/eval.sx` rewritten to
semi-naive default. `dl-saturate!` tracks a per-relation delta and
on each iteration walks every positive body position substituting
delta for that one literal — joining the rest against the full DB
snapshot. `dl-saturate-naive!` retained as the reference. Rules
with no positive body literal (e.g. `(p X) :- (= X 5).`) fall back
to a naive one-shot via `dl-collect-rule-candidates`. 8 tests
differentially compare the two saturators using per-relation tuple
counts (cheap). Chain-5 differential exercises multi-iteration
recursive saturation. Larger chains made conformance.sh time out
due to O(n) `dl-tuple-member?` × CPU sharing with other loop
agents — added a Blocker to swap to a hash-set for membership.
Also tightened `dl-tuple-member?` to use indexed iteration instead
of recursive `rest` (was creating a fresh list per step).
- 2026-05-07 — Phase 4 done. `lib/datalog/builtins.sx` (~280 LOC) adds
`(< X Y)`, `(<= X Y)`, `(> X Y)`, `(>= X Y)`, `(= X Y)`, `(!= X Y)`,
and `(is X expr)` with `+ - * /`. `dl-eval-builtin` dispatches;