datalog: hash-set membership for facts (Phase 5b perf)
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 52s

db gains a parallel :facts-keys {<rel>: {<tuple-string>: true}}
index alongside :facts. dl-tuple-key derives a stable string via
(str lit); (p 30) and (p 30.0) collide correctly because SX
prints them identically. dl-add-fact! membership is now an O(1)
lookup instead of an O(n) list scan, so inserting N facts into a
relation costs O(N) total instead of O(N²).
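The membership change can be pictured with a small Python sketch. Python stands in for SX here. `FactDB`, `add_fact`, and `tuple_key` are hypothetical names. The rule that integral floats print like integers is an assumption chosen to mimic the (p 30)/(p 30.0) behavior described above. Before the change, inserting N distinct facts compared each new fact against every existing one: 0 + 1 + ... + (N-1) = N(N-1)/2 comparisons, hence O(N²). With a key dict, each insert is one average-case O(1) lookup.

```python
from typing import Any


def tuple_key(lit: tuple) -> str:
    """Serialize a fact tuple to a stable string key (hypothetical helper).

    Assumption: integral floats print like ints, mimicking how SX is said
    to print (p 30) and (p 30.0) identically.
    """
    def fmt(x: Any) -> str:
        if isinstance(x, float) and x.is_integer():
            return str(int(x))
        return repr(x)
    return "(" + " ".join(fmt(x) for x in lit) + ")"


class FactDB:
    """Hypothetical analogue of the db: a fact list per relation (:facts)
    plus a parallel {key: True} dict per relation (:facts-keys)."""

    def __init__(self) -> None:
        self.facts: dict[str, list[tuple]] = {}
        self.facts_keys: dict[str, dict[str, bool]] = {}

    def add_fact(self, lit: tuple) -> bool:
        """Insert lit; return True if it was new, False if a duplicate."""
        rel = lit[0]  # assumption: first element is the relation name
        keys = self.facts_keys.setdefault(rel, {})
        k = tuple_key(lit)
        if k in keys:  # average-case O(1) dict lookup, no list scan
            return False
        keys[k] = True
        self.facts.setdefault(rel, []).append(lit)
        return True


db = FactDB()
db.add_fact(("p", 30))    # True: new fact
db.add_fact(("p", 30.0))  # False: same key "('p' 30)"
```

A Python `set` would do the same job. The dict-of-`True` shape only mirrors `:facts-keys`.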

Wall-clock time for chain-7 saturation halves (~12s → ~6s), and
chain-15 roughly halves (~50s → ~25s) under shared CPU. Larger
chains are still slow due to body-join overhead in
dl-find-bindings; the Blocker entry is refreshed with proposed
follow-ups.

dl-retract! keeps both indices consistent: kept-keys is rebuilt
during the EDB filter, and IDB wipes clear both the fact lists and
the key dicts.
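Here is a minimal sketch of the retraction bookkeeping, again in Python with hypothetical names (`retract`, `tuple_key`, and plain dicts standing in for `:facts` and `:facts-keys`). It assumes three things: facts are tuples whose first element is the relation name, the retracted fact belongs to an EDB relation, and IDB relations are simply wiped so saturation can re-derive them. The key dict is rebuilt in the same filtering pass as the list, so the two indices cannot drift apart.

```python
def tuple_key(lit: tuple) -> str:
    # Hypothetical serializer; integral floats print like ints (assumed SX-like).
    parts = [str(int(x)) if isinstance(x, float) and x.is_integer() else repr(x)
             for x in lit]
    return "(" + " ".join(parts) + ")"


def retract(facts: dict, facts_keys: dict, lit: tuple, idb_rels: list) -> None:
    """Remove lit from its EDB relation and wipe the given IDB relations,
    keeping the fact lists and the key dicts in sync (hypothetical API)."""
    rel = lit[0]
    target = tuple_key(lit)
    kept: list = []
    kept_keys: dict = {}
    for fact in facts.get(rel, []):
        k = tuple_key(fact)
        if k != target:  # rebuild the key dict during the same filter pass
            kept.append(fact)
            kept_keys[k] = True
    facts[rel] = kept
    facts_keys[rel] = kept_keys
    for r in idb_rels:  # derived facts are cleared from both indices
        facts[r] = []
        facts_keys[r] = {}
```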
2026-05-08 08:42:10 +00:00
parent ce603e9879
commit 3cc760082c
3 changed files with 47 additions and 15 deletions

@@ -246,17 +246,29 @@ large graphs.
## Blockers
- **Hash-set membership for relations.** `dl-tuple-member?` uses a linear
list scan; insert is O(n) and saturating chain-N pushes O(n²) → O(n³)
total. Under bundled conformance (CPU shared with other loop agents)
even chain-15 hits multi-minute wall-clock. Tests scoped to chain-5
for now. Fix: maintain a `{tuple-key → true}` dict per relation
alongside the list; key tuples by their serialized form.
- **Saturation perf on long chains.** Resolved one bottleneck (hash-set
membership in `dl-add-fact!`) but `dl-saturate!` still spends
significant time per iteration on rule body joins — chain-15 takes
~25s real / 3s user under contention even after the membership fix.
Two follow-ups to consider: (a) replace `(rest lits)` in
`dl-find-bindings`/`dl-fbs-aux` with indexed iteration, as the
membership fix did; (b) memoize the per-rule body shape so `(len lits)`
and accessor calls don't re-walk the list on each step.
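Follow-ups (a) and (b) can be sketched as a nested-loop body join that walks the rule body by index and computes its length once. This is Python, not SX. `find_bindings`, `unify`, `is_var`, and the convention that variables are strings starting with an uppercase letter are assumptions for illustration, not the `dl-find-bindings` API. In Python, recursing on `body[1:]` would copy the remaining literals at every step, which is the kind of per-step overhead the entry proposes removing.

```python
def is_var(t) -> bool:
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def unify(lit: tuple, fact: tuple, subst: dict):
    """Match a body literal against a ground fact; return the extended
    substitution, or None if they conflict."""
    if len(lit) != len(fact) or lit[0] != fact[0]:
        return None
    out = dict(subst)
    for j in range(1, len(lit)):  # index over argument positions
        t, v = lit[j], fact[j]
        if is_var(t):
            if t in out and out[t] != v:
                return None
            out[t] = v
        elif t != v:
            return None
    return out


def find_bindings(body: list, facts: dict) -> list:
    """All substitutions satisfying every literal in body (hypothetical).

    Recurses on an index i instead of on body[1:], and computes len(body)
    once up front rather than on every step.
    """
    n = len(body)
    results: list = []

    def aux(i: int, subst: dict) -> None:
        if i == n:
            results.append(subst)
            return
        lit = body[i]
        for fact in facts.get(lit[0], []):
            s = unify(lit, fact, subst)
            if s is not None:
                aux(i + 1, s)

    aux(0, {})
    return results


chain = {"edge": [("edge", 1, 2), ("edge", 2, 3), ("edge", 3, 4)]}
print(find_bindings([("edge", "X", "Y"), ("edge", "Y", "Z")], chain))
# [{'X': 1, 'Y': 2, 'Z': 3}, {'X': 2, 'Y': 3, 'Z': 4}]
```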
## Progress log
_Newest first._
- 2026-05-08 — Phase 5b perf: hash-set membership in `dl-add-fact!`.
db gains a parallel `:facts-keys {<rel>: {<tuple-string>: true}}`
index alongside `:facts`. `dl-tuple-key` derives a stable string
key via `(str lit)` — `(p 30)` and `(p 30.0)` collide correctly
because SX prints them identically. Insertion is O(1) instead of
O(n). chain-7 saturation drops from ~12s to ~6s; chain-15 from
~50s to ~25s under shared CPU. Larger chains are still slow due
to body-join overhead in dl-find-bindings (Blocker updated).
`dl-retract!` updated to keep both indices consistent. 143/143.
- 2026-05-08 — Phase 9 done. New `lib/datalog/api.sx` exposes a
parser-free embedding: `dl-program-data facts rules` accepts SX
data lists, with rules in either dict form or list form using