datalog: hash-set membership for facts (Phase 5b perf)
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 52s
db gains a parallel :facts-keys {<rel>: {<tuple-string>: true}}
index alongside :facts. dl-tuple-key derives a stable string via
(str lit) — (p 30) and (p 30.0) collide correctly because SX
prints them identically. dl-add-fact! membership is now O(1)
instead of O(n) list scan; insert sequences for relations sized
N drop from O(N²) to O(N).
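
The membership change described above can be modeled outside SX. The following is a minimal Python sketch, not the SX implementation: `make_db`, `tuple_key`, `ensure_rel`, and `add_fact` are hypothetical stand-ins for `dl-make-db`, `dl-tuple-key`, `dl-ensure-rel!`, and `dl-add-fact!`, and facts are assumed to be tuples whose first field is the relation name. Python prints `30` and `30.0` differently, so the key function normalizes whole-number floats to imitate the SX printing behavior the message relies on.

```python
def make_db():
    # Mirrors the shape {:facts {} :facts-keys {} :rules (list)}.
    return {"facts": {}, "facts_keys": {}, "rules": []}


def tuple_key(lit):
    # Stand-in for (str lit). Whole-number floats are printed as ints so
    # that ("p", 30) and ("p", 30.0) map to the same key (an assumption
    # made here to imitate the SX behavior; Python's str() would differ).
    parts = []
    for x in lit:
        if isinstance(x, float) and x.is_integer():
            x = int(x)
        parts.append(str(x))
    return "(" + " ".join(parts) + ")"


def ensure_rel(db, rel):
    # Create the tuple list and the key dict for a relation if missing.
    db["facts"].setdefault(rel, [])
    db["facts_keys"].setdefault(rel, {})
    return db["facts"][rel]


def add_fact(db, lit):
    # Returns True if the fact was new, False if it was already present.
    rel = lit[0]
    tuples = ensure_rel(db, rel)
    keys = db["facts_keys"][rel]
    key = tuple_key(lit)
    if key in keys:          # expected O(1) dict lookup, no list scan
        return False
    keys[key] = True         # update the index and the list together
    tuples.append(lit)
    return True
```

With a linear scan, inserting N distinct facts costs about N²/2 tuple comparisons in total; with the key dict each insert is one expected-constant-time lookup, which is where the O(N²) to O(N) claim comes from.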
Wall-clock time for chain-7 saturation halves (~12s → ~6s); chain-15
roughly halves (~50s → ~25s) under shared CPU. Larger chains are
still slow due to body-join overhead in dl-find-bindings; the
Blocker entry is refreshed with proposed follow-ups.
dl-retract! keeps both indices consistent: kept-keys is rebuilt
during the EDB filter, and IDB wipes clear both the tuple lists and
the key dicts.
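
The retract invariant can be sketched the same way. This is a hedged Python model rather than the SX code: `retract` and `tuple_key` are hypothetical names, `rule_heads` is passed in explicitly (the source derives it from `dl-rule-head-rels`), and the re-saturation step that follows in the real function is omitted.

```python
def tuple_key(lit):
    # Simplified stand-in for (str lit): fields joined by spaces.
    return "(" + " ".join(str(x) for x in lit) + ")"


def retract(db, lit, rule_heads):
    rel = lit[0]
    if rel in db["facts"]:
        kept, kept_keys = [], {}
        for t in db["facts"][rel]:
            if t != lit:
                # Rebuild the list and its key index in the same pass so
                # they cannot drift apart.
                kept.append(t)
                kept_keys[tuple_key(t)] = True
        db["facts"][rel] = kept
        db["facts_keys"][rel] = kept_keys
    # Derived (IDB) relations are wiped in both structures; the real
    # code then re-saturates from the surviving EDB facts.
    for head in rule_heads:
        db["facts"][head] = []
        db["facts_keys"][head] = {}
    return db
```

If only the list were wiped, a stale key would make a later insert of the same derived fact look like a duplicate, so the saturator would never regenerate it.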
@@ -111,21 +111,29 @@
       (has-key? (get db :facts) rel-key)
       (let
         ((existing (get (get db :facts) rel-key))
-          (kept (list)))
+          (kept (list))
+          (kept-keys {}))
         (do
           (for-each
             (fn
               (t)
               (when
                 (not (dl-tuple-equal? t lit))
-                (append! kept t)))
+                (do
+                  (append! kept t)
+                  (dict-set! kept-keys (dl-tuple-key t) true))))
             existing)
-          (dict-set! (get db :facts) rel-key kept))))
+          (dict-set! (get db :facts) rel-key kept)
+          (dict-set! (get db :facts-keys) rel-key kept-keys))))
     ;; Wipe all relations that have a rule (these are IDB) so the
     ;; saturator regenerates them from the surviving EDB.
     (let ((rule-heads (dl-rule-head-rels db)))
       (for-each
-        (fn (k) (dict-set! (get db :facts) k (list)))
+        (fn
+          (k)
+          (do
+            (dict-set! (get db :facts) k (list))
+            (dict-set! (get db :facts-keys) k {})))
         rule-heads))
     (dl-saturate! db)
     db))))
@@ -12,7 +12,7 @@
 ;; lib/datalog/builtins.sx) swaps in the real `dl-rule-check-safety`,
 ;; which is order-aware and understands built-in predicates.
 
-(define dl-make-db (fn () {:facts {} :rules (list)}))
+(define dl-make-db (fn () {:facts {} :facts-keys {} :rules (list)}))
 
 (define
   dl-rel-name
@@ -97,13 +97,19 @@
   (fn
     (db rel-key)
     (let
-      ((facts (get db :facts)))
+      ((facts (get db :facts))
+        (fk (get db :facts-keys)))
       (do
         (when
           (not (has-key? facts rel-key))
           (dict-set! facts rel-key (list)))
+        (when
+          (not (has-key? fk rel-key))
+          (dict-set! fk rel-key {}))
         (get facts rel-key)))))
 
+(define dl-tuple-key (fn (lit) (str lit)))
+
 (define
   dl-rel-tuples
   (fn
@@ -125,10 +131,16 @@
         (let
           ((rel-key (dl-rel-name lit)))
           (let
-            ((tuples (dl-ensure-rel! db rel-key)))
+            ((tuples (dl-ensure-rel! db rel-key))
+              (key-dict (get (get db :facts-keys) rel-key))
+              (tk (dl-tuple-key lit)))
             (cond
-              ((dl-tuple-member? lit tuples) false)
-              (else (do (append! tuples lit) true)))))))))
+              ((has-key? key-dict tk) false)
+              (else
+                (do
+                  (dict-set! key-dict tk true)
+                  (append! tuples lit)
+                  true)))))))))
 
 ;; The full safety check lives in builtins.sx (it has to know which
 ;; predicates are built-ins). dl-add-rule! calls it via forward
@@ -246,17 +246,29 @@ large graphs.
 
 ## Blockers
 
-- **Hash-set membership for relations.** `dl-tuple-member?` uses a linear
-  list scan; insert is O(n) and saturating chain-N pushes O(n²) → O(n³)
-  total. Under bundled conformance (CPU shared with other loop agents)
-  even chain-15 hits multi-minute wall-clock. Tests scoped to chain-5
-  for now. Fix: maintain a `{tuple-key → true}` dict per relation
-  alongside the list; key tuples by their serialized form.
+- **Saturation perf on long chains.** Resolved one bottleneck (hash-set
+  membership in `dl-add-fact!`) but `dl-saturate!` still spends
+  significant time per iteration on rule body joins — chain-15 takes
+  ~25s real / 3s user under contention even after the membership fix.
+  Two follow-ups to consider: (a) avoid `(rest lits)` in
+  `dl-find-bindings`/`dl-fbs-aux` (uses indexed iteration like the
+  membership fix), (b) memoize the per-rule body shape so `(len lits)`
+  and accessor calls don't re-walk the list each step.
 
 ## Progress log
 
 _Newest first._
 
+- 2026-05-08 — Phase 5b perf: hash-set membership in `dl-add-fact!`.
+  db gains a parallel `:facts-keys {<rel>: {<tuple-string>: true}}`
+  index alongside `:facts`. `dl-tuple-key` derives a stable string
+  key via `(str lit)` — `(p 30)` and `(p 30.0)` collide correctly
+  because SX prints them identically. Insertion is O(1) instead of
+  O(n). chain-7 saturation drops from ~12s to ~6s; chain-15 from
+  ~50s to ~25s under shared CPU. Larger chains are still slow due
+  to body-join overhead in dl-find-bindings (Blocker updated).
+  `dl-retract!` updated to keep both indices consistent. 143/143.
+
 - 2026-05-08 — Phase 9 done. New `lib/datalog/api.sx` exposes a
   parser-free embedding: `dl-program-data facts rules` accepts SX
   data lists, with rules in either dict form or list form using