datalog: first-arg index per relation (Phase 5e perf, 169/169)
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 33s
The db gains :facts-index {<rel>: {<first-arg-key>: tuples}}, mirroring
the membership :facts-keys index. dl-add-fact! populates the index;
dl-match-positive walks the body literal's first arg under the
current subst and, when it resolves to a non-var, looks up by
(str arg) instead of scanning the full relation.
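
A minimal Python sketch of the idea, not the project's actual code. The names `make_db`, `add_fact`, `candidates`, `walk`, and `is_var` are hypothetical stand-ins, and two things are assumed purely for illustration: variables are strings starting with an uppercase letter, and every relation has arity of at least one.

```python
# Hypothetical model of a first-arg index; the real logic lives in
# dl-add-fact! and dl-match-positive, whose internals are not shown here.

def is_var(term):
    # Assumption: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def walk(term, subst):
    # Follow bindings until reaching a non-variable or an unbound variable.
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def make_db():
    # Three slots per relation: the tuples, a membership set, a first-arg index.
    return {"facts": {}, "facts_keys": {}, "facts_index": {}}

def add_fact(db, rel, tup):
    key = str(tup)
    keys = db["facts_keys"].setdefault(rel, set())
    if key in keys:
        return False  # duplicate, nothing to add
    keys.add(key)
    db["facts"].setdefault(rel, []).append(tup)
    # Index the tuple under the string form of its first argument.
    db["facts_index"].setdefault(rel, {}).setdefault(str(tup[0]), []).append(tup)
    return True

def candidates(db, rel, args, subst):
    first = walk(args[0], subst)
    if is_var(first):
        # First arg still unbound: fall back to scanning the whole relation.
        return db["facts"].get(rel, [])
    # First arg is ground: only tuples sharing that first arg can match.
    return db["facts_index"].get(rel, {}).get(str(first), [])

db = make_db()
add_fact(db, "parent", ("a", "b"))
add_fact(db, "parent", ("b", "c"))
bound = candidates(db, "parent", ("X", "Y"), {"X": "b"})   # uses the index
unbound = candidates(db, "parent", ("X", "Y"), {})         # full scan
```

The candidates returned by the index still have to go through unification; the index only narrows which tuples are tried.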

For chain-style recursive rules (parent X Y), (ancestor Y Z) the
inner Y has at most one parent, so the inner lookup returns 0–1
tuples instead of N. chain-25 saturation drops from ~33s to ~18s
real (~2x). chain-50 is still slow but tractable; the next bottleneck
is subst dict copies during unification.
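
To make the 0–1 versus N claim concrete, here is a small hypothetical Python illustration. It assumes that "chain-25" means 25 `parent` edges in a single line (n0 → n1 → … → n25), which the commit does not spell out, and `build_chain` is an invented helper.

```python
def build_chain(n):
    # Assumed shape of a chain-n workload: parent edges n0 -> n1 -> ... -> n{n}.
    facts = [(f"n{i}", f"n{i + 1}") for i in range(n)]
    index = {}
    for tup in facts:
        # Key each tuple by its first argument, as the first-arg index does.
        index.setdefault(tup[0], []).append(tup)
    return facts, index

facts, index = build_chain(25)

scanned = len(facts)                   # tuples tried by a full relation scan
looked_up = len(index.get("n7", []))   # tuples tried when the first arg is bound to n7
missing = len(index.get("n25", []))    # the last node has no outgoing edge
```

Under these assumptions a bound first argument narrows the candidate set from 25 tuples to at most one per lookup.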

dl-retract! updated to keep the new index consistent: kept-index is
rebuilt during the EDB filter, and IDB wipes clear all three slots.
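
A hedged Python sketch of what keeping the index consistent on retraction could look like. It assumes the "three slots" are the tuple list, the membership keys, and the first-arg index; `retract_fact` and `wipe_derived` are hypothetical names, and the caller is assumed to know which relations are derived (IDB).

```python
def retract_fact(db, rel, tup):
    # Filter the relation and rebuild its membership set and first-arg
    # index from the tuples that are kept.
    kept, kept_keys, kept_index = [], set(), {}
    for t in db["facts"].get(rel, []):
        if t == tup:
            continue
        kept.append(t)
        kept_keys.add(str(t))
        kept_index.setdefault(str(t[0]), []).append(t)
    db["facts"][rel] = kept
    db["facts_keys"][rel] = kept_keys
    db["facts_index"][rel] = kept_index

def wipe_derived(db, idb_rels):
    # Derived relations are dropped from all three slots so they can be
    # re-saturated from the remaining base facts.
    for rel in idb_rels:
        for slot in ("facts", "facts_keys", "facts_index"):
            db[slot].pop(rel, None)

db = {
    "facts": {"parent": [("a", "b"), ("b", "c")], "ancestor": [("a", "b")]},
    "facts_keys": {
        "parent": {str(("a", "b")), str(("b", "c"))},
        "ancestor": {str(("a", "b"))},
    },
    "facts_index": {
        "parent": {"a": [("a", "b")], "b": [("b", "c")]},
        "ancestor": {"a": [("a", "b")]},
    },
}
retract_fact(db, "parent", ("a", "b"))
wipe_derived(db, ["ancestor"])
```

If any one of the three slots were left stale, a later lookup through the index could return a tuple that membership checks no longer recognize, which is why the sketch updates them together.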

Differential semi-naive test bumped to chain-12; semi-only count
test bumped to chain-25.
@@ -264,19 +264,33 @@ large graphs.

 ## Blockers

-- **Saturation perf** improving but not free. Resolved hash-set
-  membership in `dl-add-fact!` and replaced recursive `(rest lits)` in
-  `dl-find-bindings` with indexed iteration. chain-15 drops from ~25s
-  to ~16s and chain-25 saturates in ~33s real / 11s user — most CPU
-  now in unification (assoc-based subst dict copies) and dict
-  lookups during walks. Future: a per-rule "compiled" body that
-  pre-resolves arg positions and intern variable indices, then
-  unification can use array slots instead of dict assoc.
+- **Saturation perf**: three rounds done.
+  - hash-set membership in `dl-add-fact!` (Phase 5b)
+  - indexed iteration in `dl-find-bindings` (Phase 5c)
+  - first-arg index per relation (Phase 5e) — when a body literal's
+    first arg walks to a non-variable, dl-match-positive looks up
+    by `(str arg)` instead of scanning the full relation.
+  chain-25 saturation drops from ~33s to ~18s real (10s user).
+  chain-50 still long (~120s+) due to dict-copy overhead in
+  unification subst threading. Future: per-rule "compiled" body
+  with pre-resolved var positions, slot-based subst representation
+  to avoid `assoc` per binding.

 ## Progress log

 _Newest first._

+- 2026-05-08 — Phase 5e perf: first-arg index per relation. db gains
+  `:facts-index {<rel>: {<first-arg-key>: tuples}}` mirroring the
+  existing `:facts-keys` membership index. `dl-add-fact!` populates
+  it; `dl-match-positive` walks the body literal's first arg under
+  the current subst — if it's bound to a non-var, look up by
+  `(str arg)` and iterate only the matching subset. chain-25
+  saturation 33s → 18s real (~2x). chain-50 still slow (~120s+)
+  but tractable; next bottleneck is subst dict copies during
+  unification. Differential test bumped to chain-12, semi-only
+  count to chain-25.
+
 - 2026-05-08 — Demo: tag co-occurrence. `(cotagged P T1 T2)` — post
   has both T1 and T2 with T1 != T2 — and `(tag-pair-count T1 T2 N)`
   counting posts per distinct tag pair. Demonstrates count