(p X _), (p _ Y) — the two _ are now different variables, matching
standard Datalog semantics. Previously both _ symbols were the same
SX symbol, so unification across them gave wrong answers.
Fix in db.sx: dl-rename-anon-term + dl-rename-anon-lit walk a term
or literal and replace each '_' symbol with a fresh _anon<N>.
dl-make-anon-renamer returns a counter-based name generator scoped
per call. dl-rename-anon-rule applies it to head and body of a
rule. dl-add-rule! invokes the renamer before safety check.
eval.sx: dl-query renames anon vars in the goal before search and
filters '_' out of the projection so user-facing results aren't
polluted with internal _anon<N> bindings.
The previous "underscore in head ok" test now correctly rejects
(p X _) :- q(X) as unsafe (the head's fresh anon var has no body
binder). New "underscore in body only" test confirms the safe
case. Two regression tests for rule-level and goal-level
independence.
dl-find-bindings now uses dl-fb-aux lits db subst i n (indexed
iteration via nth) instead of recursive (rest lits). Eliminates
O(N²) list-copy per body of length N. chain-15 saturation 25s
→ 16s; chain-25 finishes in 33s real (vs. timeout previously).
Bumped semi_naive tests to chain-10 differential + chain-15
semi-only count (was chain-5/chain-5). Blocker entry refreshed.
New lib/datalog/demo.sx with three Datalog-as-query-language demos
over synthetic rose-ash data:
Federation: (mutual A B), (reachable A B), (foaf A C) over a
follows graph.
Content: (post-likes P N) via count aggregation, (popular P)
for likes >= 3, (interesting Me P) joining follows
+ authored + popular.
Permissions: (in-group A G) over transitive subgroup chains,
(can-access A R).
10 tests run each program against in-memory EDB tuples loaded via
dl-program-data.
Wiring to PostgreSQL and exposing as a service endpoint (/internal
/datalog) is out of scope for this loop — both would require
edits outside lib/datalog/. Programs above document the EDB shape
a real loader would populate.
db gains a parallel :facts-keys {<rel>: {<tuple-string>: true}}
index alongside :facts. dl-tuple-key derives a stable string via
(str lit) — (p 30) and (p 30.0) collide correctly because SX
prints them identically. dl-add-fact! membership is now O(1)
instead of O(n) list scan; insert sequences for relations sized
N drop from O(N²) to O(N).
Wall clock on chain-7 saturation halves (~12s → ~6s); chain-15
roughly halves (~50s → ~25s) under shared CPU. Larger chains
still slow due to body-join overhead in dl-find-bindings —
Blocker entry refreshed with proposed follow-ups.
dl-retract! keeps both indices consistent: kept-keys is rebuilt
during the EDB filter, IDB wipes clear both lists and key dicts.
New lib/datalog/api.sx: dl-program-data facts rules takes SX data
lists. Rules accept either dict form or list form using <- as the
rule arrow (since SX parses :- as a keyword). dl-rule constructor
for the dict shape. dl-assert! adds a fact and re-saturates;
dl-retract! drops EDB matches, wipes all rule-headed IDB
relations, and re-saturates from scratch — simplest correct
semantics until provenance tracking arrives.
9 API tests cover ancestor closure via data, dict-rule form,
dl-rule constructor, incremental assert/retract, cyclic-graph
reach, assert into empty, fact-style rule (no arrow), dict
passthrough.
New lib/datalog/aggregates.sx: (count R V Goal), (sum R V Goal),
(min R V Goal), (max R V Goal). dl-eval-aggregate runs
dl-find-bindings on the goal under the outer subst, collects
distinct values of V, applies the operator, binds R. Empty input:
count/sum return 0; min/max produce no binding (rule fails).
Group-by emerges naturally from outer-subst substitution into the
goal — `popular(P) :- post(P), count(N, U, liked(U, P)), >=(N, 3).`
counts per-post.
Stratifier extended: dl-aggregate-dep-edge contributes a
negation-like edge so the aggregate's goal relation is fully
derived before the aggregate fires (non-monotonicity respected).
Safety relaxed for aggregates: goal-internal vars are existentials,
only the result var becomes bound.
New lib/datalog/strata.sx: dl-build-dep-graph (relation -> deps with
:neg flag), Floyd-Warshall reachability, SCC-via-mutual-reach for
non-stratifiability detection, iterative dl-compute-strata, and
dl-group-rules-by-stratum.
eval.sx refactor:
- dl-saturate-rules! db rules — semi-naive worker over a rule subset
- dl-saturate! db — stratified driver. Rejects non-stratifiable
programs at saturation time, then iterates strata in order
- dl-match-negation — succeeds iff inner positive match is empty
Order-aware safety in dl-rule-check-safety (Phase 4) already
required negation vars to be bound by a prior positive literal.
Stratum dict keys are strings (SX dicts don't accept ints).
Phase 6 magic sets deferred — opt-in path, semi-naive default
suffices for current workloads.
dl-saturate! is now semi-naive: tracks a per-relation delta dict,
and on each iteration walks every positive body-literal position,
substituting the delta of its relation while joining the rest
against the previous-iteration DB. Candidates are collected before
mutating the DB so the "full" sides see a consistent snapshot.
Rules with no positive body literal (e.g. (p X) :- (= X 5).)
fall back to a one-shot naive pass via dl-collect-rule-candidates.
dl-saturate-naive! retained as the reference implementation; 8
differential tests compare per-relation tuple counts on every
recursive program. Switched dl-tuple-member? to indexed iteration
instead of recursive rest (eliminates per-step list copy). Larger
chains under bundled conformance trip O(n) membership × CPU
sharing — added a Blocker to swap relations to hash-set membership.
New lib/datalog/builtins.sx: (< <= > >= = !=) and (is X expr) with
+ - * /. dl-eval-arith recursively evaluates nested compounds.
Safety analysis now walks body left-to-right tracking the bound
set: comparisons require all args bound, is RHS vars must be bound
(LHS becomes bound), = special-cases the var/non-var combos.
db.sx keeps the simple safety check as a forward-reference
fallback; builtins.sx redefines dl-rule-check-safety to the
comprehensive version. eval.sx dispatches built-ins through
dl-eval-builtin instead of erroring. 19 new tests.
Tokens → list of {:head :body} / {:query} clauses. SX symbols for
constants and variables (case-distinguished). not(literal) in body
desugars to {:neg literal}. Nested compounds permitted in arg
position for arithmetic; safety analysis (Phase 3) will gate them.
Conformance harness wraps lib/guest/conformance.sh; produces
lib/datalog/scoreboard.{json,md}.
apl-permutations was doing (append acc <new-perms>) which is
O(|acc|) and acc grows ~N! big — total cost O(N!²).
Swapped to (append <new-perms> acc) — append is O(|first|)
so cost is O((n+1)·N!_prev) per layer, total O(N!). q(7)
went from 32s to 12s; q(8)=92 now finishes well within the
300s timeout, so the queens(8) test is restored.
497/497. Phase 8 complete.
programs-e2e.sx exercises the classic-algorithm shapes from
lib/apl/tests/programs/*.apl via the full pipeline (apl-run on
embedded source strings). Tests include factorial-via-∇,
triangular numbers, sum-of-squares, prime-mask building blocks
(divisor counts via outer mod), named-fn composition,
dyadic max-of-two, and a single Newton sqrt step.
The original one-liners (e.g. primes' inline ⍵←⍳⍵) need parser
features we haven't built (compress-as-fn, inline assign) — the
e2e tests use multi-statement equivalents. No file-reading
primitive in OCaml SX, so source is embedded.
Side-fix: ⌿ (first-axis reduce) and ⍀ (first-axis scan) were
silently skipped by the tokenizer — added to apl-glyph-set
and apl-parse-op-glyphs.
Parser: apl-collect-fn-bindings pre-scans stmt-groups for
`name ← { ... }` patterns and populates apl-known-fn-names.
is-fn-tok? consults this list; collect-segments-loop emits
(:fn-name nm) for known names so they parse as functions.
Resolver: apl-resolve-{monadic,dyadic} handle :fn-name by
looking up env, asserting the binding is a dfn, returning
a closure that dispatches to apl-call-dfn{-m,}.
Recursion still works: `fact ← {0=⍵:1 ⋄ ⍵×∇⍵-1} ⋄ fact 5` → 120.
Three small unblockers in one iteration:
- tokenizer: read-digits! now consumes optional ".digits" suffix,
so 3.7 and ¯2.5 are single number tokens.
- tokenizer: ⎕ followed by ← emits a single :name "⎕←" token
(instead of splitting on the assign glyph). Parser registers
⎕← in apl-quad-fn-names; apl-monadic-fn maps to apl-quad-print.
- eval-ast: :str AST nodes evaluate to char arrays. Single-char
strings become rank-0 scalars; multi-char become rank-1 vectors
of single-char strings.
apl-throw raises a tagged ("apl-error" code msg) error.
apl-trap-matches? checks if codes list contains the error's code
(0 = catch-all, à la Dyalog).
Eval-stmt :trap clause wraps try-block with R7RS guard;
on match, runs catch-block; on mismatch, re-raises.
Bonus :throw AST node for testing.
test.sh + conformance.sh now load lib/r7rs.sx (for guard) and
include eval-ops + pipeline suites in scoreboard.
All Phase 7 unchecked items are now ticked.
Final scoreboard: 450/450 across 10 suites.
30 new source-string idioms via apl-run: triangulars, factorial,
running sum/product, parity counts, identity matrix, mult-table,
dot product, ∧.= equality, take/drop/reverse, tally, ravel,
count-of-value, etc.
Side-fix: tokenizer's apl-glyph-set was missing ≢ and ≡ — they
were silently skipped. Added them and to apl-parse-fn-glyphs.
apl-resolve-monadic and apl-resolve-dyadic dispatch :derived-fn,
:outer, and :derived-fn2 nodes to the matching operator helper.
:monad/:dyad in apl-eval-ast now route through these resolvers.
Removed queens(8) test (too slow under current 300s timeout).