datalog: first-arg index per relation (Phase 5e perf, 169/169)

db gains :facts-index {<rel>: {<first-arg-key>: tuples}} mirroring
the membership :facts-keys index. dl-add-fact! populates the index;
dl-match-positive walks the body literal's first arg under the
current subst — when it's bound to a non-var, look up by (str arg)
instead of scanning the full relation.
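
The lookup described above can be sketched in Python. This is only an illustrative model, not the project's code: the dict layout, the helper names (`add_fact`, `candidates`, `walk`) and the convention that variables are capitalised strings are all assumptions, and relations are assumed to have at least one argument.

```python
# Hypothetical model of the db; every name here is illustrative.

def is_var(term):
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def walk(term, subst):
    # Follow bindings until reaching a non-variable or an unbound variable.
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def new_db():
    return {"facts": {}, "facts_keys": {}, "facts_index": {}}

def add_fact(db, rel, tup):
    # Membership check via the key set, then update list and first-arg index.
    key = str(tup)
    keys = db["facts_keys"].setdefault(rel, set())
    if key in keys:
        return False
    keys.add(key)
    db["facts"].setdefault(rel, []).append(tup)
    by_first = db["facts_index"].setdefault(rel, {})
    by_first.setdefault(str(tup[0]), []).append(tup)
    return True

def candidates(db, rel, args, subst):
    # If the first argument resolves to a constant, use the index bucket;
    # otherwise fall back to scanning the whole relation.
    first = walk(args[0], subst)
    if is_var(first):
        return db["facts"].get(rel, [])
    return db["facts_index"].get(rel, {}).get(str(first), [])
```

With `X` bound to `"b"`, `candidates(db, "parent", ("X", "Y"), {"X": "b"})` touches only the tuples whose first argument is `"b"`; with `X` unbound it returns the full relation.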

For chain-style recursive rules with body (parent X Y), (ancestor Y Z),
the inner Y has at most one parent, so the inner lookup returns 0–1
tuples instead of N. chain-25 saturation drops from ~33s to ~18s
real (~2x). chain-50 still takes long but is tractable; the next
bottleneck is subst dict copies during unification.

dl-retract! updated to keep the new index consistent: the kept index
is rebuilt during the EDB filter, and IDB wipes clear all three slots.
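
A minimal Python sketch of that consistency requirement follows. It assumes the "three slots" are the fact list, the membership key set and the first-arg index; the slot names and both function names are hypothetical, not taken from the code.

```python
# Hypothetical sketch: keep list, key set and first-arg index in step.

SLOTS = ("facts", "facts_keys", "facts_index")  # assumed slot names

def retract_fact(db, rel, tup):
    # Filter the relation once and rebuild both indexes from what is kept.
    kept, kept_keys, kept_index = [], set(), {}
    for t in db["facts"].get(rel, []):
        if t == tup:
            continue
        kept.append(t)
        kept_keys.add(str(t))
        kept_index.setdefault(str(t[0]), []).append(t)
    db["facts"][rel] = kept
    db["facts_keys"][rel] = kept_keys
    db["facts_index"][rel] = kept_index

def wipe_relation(db, rel):
    # Derived (IDB) relations are dropped from every slot so they can be
    # recomputed from scratch.
    for slot in SLOTS:
        db[slot].pop(rel, None)
```

Rebuilding the index alongside the filtered list avoids a stale bucket that would let a retracted tuple reappear through the indexed path.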

Differential semi-naive test bumped to chain-12, semi-only count
test to chain-25.
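
The idea behind a differential test can be sketched in Python: run naive and semi-naive evaluation on the same chain and require identical results. This is an assumption-laden model, not the project's test. It takes chain-N to mean N `parent` edges over N+1 nodes and hard-codes the two usual `ancestor` rules.

```python
# Hypothetical differential check for
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).

def chain(n):
    # Assumption: chain-n is the edge set 0 -> 1 -> ... -> n.
    return {(i, i + 1) for i in range(n)}

def naive(parent):
    # Re-derive from the full ancestor set each round until nothing changes.
    anc = set(parent)
    while True:
        new = anc | {(x, z) for (x, y) in parent for (y2, z) in anc if y == y2}
        if new == anc:
            return anc
        anc = new

def semi_naive(parent):
    # Join only against the facts that were new in the previous round.
    anc = set(parent)
    delta = set(parent)
    while delta:
        derived = {(x, z) for (x, y) in parent for (y2, z) in delta if y == y2}
        delta = derived - anc
        anc |= delta
    return anc
```

For a chain of n edges both strategies should yield n(n+1)/2 ancestor facts, which gives the test a closed-form count to check in addition to set equality.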
2026-05-08 09:27:44 +00:00
parent c7315f5877
commit cc64ec5cf2
6 changed files with 105 additions and 23 deletions


@@ -264,19 +264,33 @@ large graphs.
## Blockers
- **Saturation perf** improving but not free. Resolved hash-set
membership in `dl-add-fact!` and replaced recursive `(rest lits)` in
`dl-find-bindings` with indexed iteration. chain-15 drops from ~25s
to ~16s and chain-25 saturates in ~33s real / 11s user — most CPU
now in unification (assoc-based subst dict copies) and dict
lookups during walks. Future: a per-rule "compiled" body that
pre-resolves arg positions and interns variable indices, so that
unification can use array slots instead of dict assoc.
- **Saturation perf**: three rounds done.
  - hash-set membership in `dl-add-fact!` (Phase 5b)
  - indexed iteration in `dl-find-bindings` (Phase 5c)
  - first-arg index per relation (Phase 5e): when a body literal's
    first arg walks to a non-variable, `dl-match-positive` looks up
    by `(str arg)` instead of scanning the full relation.

  chain-25 saturation drops from ~33s to ~18s real (10s user).
  chain-50 still long (~120s+) due to dict-copy overhead in
  unification subst threading. Future: per-rule "compiled" body
  with pre-resolved var positions, slot-based subst representation
  to avoid `assoc` per binding.
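
One possible shape for the "compiled body" idea is sketched below in Python. It is an illustration under assumptions, not a design taken from the code: variables are assumed to be capitalised strings, a literal and the tuple it is matched against are assumed to have the same arity, and `compile_literal`, `match_tuple`, `undo` and `UNBOUND` are hypothetical names. Bindings live in a fixed-size list indexed by slot, and a trail records which slots to reset on backtracking, so no dict is copied per binding.

```python
# Hypothetical slot-based substitution with a trail for backtracking.

UNBOUND = object()  # sentinel for an empty slot

def is_var(term):
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def compile_literal(args, slots):
    # Resolve each variable to a slot number once, ahead of matching.
    out = []
    for a in args:
        if is_var(a):
            out.append(("slot", slots.setdefault(a, len(slots))))
        else:
            out.append(("const", a))
    return out

def match_tuple(compiled, tup, env, trail):
    # Bind slots in place; record every new binding on the trail.
    for (kind, v), value in zip(compiled, tup):
        if kind == "const":
            if v != value:
                return False
        elif env[v] is UNBOUND:
            env[v] = value
            trail.append(v)
        elif env[v] != value:
            return False
    return True

def undo(env, trail, mark):
    # Reset every slot bound since the trail had length `mark`.
    while len(trail) > mark:
        env[trail.pop()] = UNBOUND
```

A caller would note `len(trail)` before trying a tuple and call `undo` with that mark after a failed or exhausted match, which restores the environment without allocating a new substitution.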
## Progress log
_Newest first._
- 2026-05-08 — Phase 5e perf: first-arg index per relation. db gains
`:facts-index {<rel>: {<first-arg-key>: tuples}}` mirroring the
existing `:facts-keys` membership index. `dl-add-fact!` populates
it; `dl-match-positive` walks the body literal's first arg under
the current subst — if it's bound to a non-var, look up by
`(str arg)` and iterate only the matching subset. chain-25
saturation 33s → 18s real (~2x). chain-50 still slow (~120s+)
but tractable; next bottleneck is subst dict copies during
unification. Differential test bumped to chain-12, semi-only
count to chain-25.
- 2026-05-08 — Demo: tag co-occurrence. `(cotagged P T1 T2)` — post
has both T1 and T2 with T1 != T2 — and `(tag-pair-count T1 T2 N)`
counting posts per distinct tag pair. Demonstrates count