Design + ops scaffolding for the next phase of work, none of it touching
substrate or guest code.

lib-guest.md: rewrites Architectural framing as a 5-layer stack
(substrate → lib/guest → languages → shared/ → applications),
recursive dependency-direction rule, scaled two-consumer rule. Adds
Phase B (long-running stratification) with sub-layer matrix
(core/typed/relational/effects/layout/lazy/oo), language profiles, and
the long-running-discipline section. Preserves existing Phase A
progress log and rules.

ocaml-on-sx.md: scope reduced to substrate validation + HM + reference
oracle. Phases 1-5 + minimal stdlib slice + vendored testsuite slice.
Dream carved out into dream-on-sx.md; Phase 8 (ReasonML) deferred.
Records lib-guest sequencing dependency.

datalog-on-sx.md: adds Phase 4 built-in predicates + body arithmetic,
Phase 6 magic sets, safety analysis in Phase 3, Non-goals section.

New chisel plans (forward-looking, not yet launchable):
  kernel-on-sx.md: first-class everything, env-as-value endgame
  idris-on-sx.md: dependent types, evidence chisel
  probabilistic-on-sx.md: weighted nondeterminism + traces
  maude-on-sx.md: rewriting as primitive
  linear-on-sx.md: resource model, artdag-relevant

Loop briefings (4 active, 1 cold):
  minikanren-loop.md, ocaml-loop.md, datalog-loop.md, elm-loop.md, koka-loop.md

Restore scripts mirror the loop pattern:
  restore-{minikanren,ocaml,datalog,jit-perf,lib-guest}.sh
Each captures worktree state, plan progress, MCP health, tmux status.
Includes the .mcp.json absolute-path patch instruction (fresh worktrees
have no _build/, so the relative mcp_tree path fails on first launch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
203 lines
11 KiB
Markdown
# Datalog-on-SX: Datalog on the CEK/VM

Datalog is a declarative query language: a restricted subset of Prolog with no function
symbols, only relations. Programs are sets of facts and rules; queries ask what follows.
Evaluation is bottom-up (fixpoint iteration) rather than Prolog's top-down depth-first search.
Combined with the absence of function symbols, this guarantees termination (no infinite
left-recursion loops) and enables efficient incremental updates.

The unique angle: Datalog is a natural companion to the Prolog implementation already in
progress (`lib/prolog/`). The parser and term representation can share infrastructure;
the evaluator is an entirely different fixpoint engine rather than a DFS solver.

End-state goal: **full core Datalog** (facts, rules, stratified negation, aggregation,
recursion) with a clean SX query API, and a demonstration of Datalog as a query engine
for rose-ash data (e.g. federation graph, content relationships).

## Ground rules

- **Scope:** only touch `lib/datalog/**` and `plans/datalog-on-sx.md`. Do **not** edit
  `spec/`, `hosts/`, `shared/`, `lib/prolog/**`, or other `lib/<lang>/`.
- **Shared-file issues** go under "Blockers" below with a minimal repro; do not fix here.
- **SX files:** use `sx-tree` MCP tools only.
- **Architecture:** Datalog source → term AST → fixpoint evaluator. No transpiler to SX AST;
  the evaluator is written in SX and works directly on term structures.
- **Reference:** Ramakrishnan & Ullman, "A Survey of Deductive Database Systems";
  Dalmau, "Datalog and Constraint Satisfaction".
- **Commits:** one feature per commit. Keep `## Progress log` updated and tick boxes.

## Non-goals

Deliberately out of scope for this implementation. Real engines (Soufflé, Cozo, DDlog) include
some of these; we accept they're missing and will note them in `Blockers` if a use case demands
one later.

- **Function symbols**: leaving them out keeps termination guaranteed and prevents collapse into Prolog.
- **Disjunctive heads** (`p :- q. p :- r.` is fine; `p ; q :- r.` is not): research extension.
- **Well-founded semantics**: only stratified negation. Programs that aren't stratifiable are
  rejected at load time, not evaluated under WFS.
- **Tabled top-down (SLG resolution)**: bottom-up only. For top-down evaluation, use the Prolog
  implementation, bearing in mind that its plain depth-first search does not guarantee termination.
- **Constraint Datalog** (Datalog over reals, intervals, finite domains): research extension.
- **Distributed evaluation / Differential Dataflow**: single-process fixpoint only. The rose-ash
  cross-service story (Phase 10) federates by querying each service's local Datalog instance and
  joining results, not by running a distributed fixpoint.

## Architecture sketch

```
Datalog source text
    │
    ▼
lib/datalog/tokenizer.sx  -- atoms, variables, numbers, strings, punct (?- :- , . ( ) [ ])
    │
    ▼
lib/datalog/parser.sx     -- facts: atom(args). rules: head :- body. queries: ?- goal.
    │                        No function symbols (only constants and variables in args).
    ▼
lib/datalog/db.sx         -- extensional DB (EDB): ground facts; IDB: derived relations;
    │                        clause index by relation name/arity
    ▼
lib/datalog/eval.sx       -- bottom-up fixpoint: semi-naive evaluation with delta sets;
    │                        stratification for negation; incremental update API
    ▼
lib/datalog/query.sx      -- query API: (dl-query db goal) → list of substitutions;
                             SX embedding: define facts/rules as SX data directly
```

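Key differences from Prolog:
- **No function symbols**: args are atoms, numbers, strings, or variables only. No `f(a,b)`.
- **No cuts**: no procedural control.
- **Bottom-up**: by default, derive all consequences of all rules before answering; no search
  tree (Phase 6 magic sets restrict derivation to goal-relevant tuples).
- **Termination guaranteed**: no infinite derivation chains (no function symbols → finite
  Herbrand base). This holds for the pure fragment: `is` arithmetic (Phase 4) can mint new
  constants, so a recursive rule such as `(nat Y) :- (nat X), (is Y (+ X 1)).` seeded with
  `(nat 0)` never reaches a fixpoint.
- **Stratified negation**: `not(P)` in the body of a rule for `Q` is legal iff `P` is not `Q`
  and does not depend (directly or transitively) on `Q`, i.e. no dependency cycle passes
  through a negation.
- **Aggregation**: `count`, `sum`, `min`, `max` over derived tuples (Datalog+).
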
## Roadmap

### Phase 1: tokenizer + parser
- [ ] Tokenizer: atoms (lowercase/quoted), variables (uppercase/`_`), numbers, strings,
  punctuation and operators (`:-`, `?-`, `,`, `.`), arithmetic + comparison operators
  (`+`, `-`, `*`, `/`, `<`, `<=`, `>`, `>=`, `=`, `!=`), comments (`%`, `/* */`).
  Note: no function symbol syntax (no nested `f(...)` in arg position).
- [ ] Parser:
  - Facts: `parent(tom, bob).` → `{:head (parent tom bob) :body ()}`
  - Rules: `ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).`
    → `{:head (ancestor X Z) :body ((parent X Y) (ancestor Y Z))}`
  - Queries: `?- ancestor(tom, X).` → `{:query (ancestor tom X)}`
  - Negation: `not(parent(X,Y))` in body position → `{:neg (parent X Y)}`
- [ ] Tests in `lib/datalog/tests/parse.sx`

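The token classes listed above can be sketched in Python as one ordered regular expression. This illustrates the lexing rules only and is not the SX implementation. The token kind names (`ATOM`, `VAR`, `OP`, ...) and the `tokenize` function are hypothetical, and quoted atoms and strings use simplified escape rules.

```python
import re

# Hypothetical token kinds. Order matters: COMMENT must precede OP so "/*" is not read as "/".
TOKEN_SPEC = [
    ("COMMENT", r"%[^\n]*|/\*.*?\*/"),
    ("WS", r"\s+"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r'"(?:[^"\\]|\\.)*"'),
    ("VAR", r"[A-Z_][A-Za-z0-9_]*"),
    ("ATOM", r"[a-z][A-Za-z0-9_]*|'(?:[^'\\]|\\.)*'"),
    ("OP", r":-|\?-|<=|>=|!=|[<>=+\-*/]"),
    ("PUNCT", r"[(),.\[\]]"),
]
# DOTALL lets /* ... */ comments span several lines.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC), re.DOTALL)


def tokenize(src):
    """Return a list of (kind, text) pairs, skipping whitespace and comments."""
    tokens, pos = [], 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        if m.lastgroup not in ("WS", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Because a function-symbol check belongs to the parser, the tokenizer stays permissive. `f(a)` lexes fine and is rejected only when it appears in argument position.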
### Phase 2: unification + substitution
- [ ] Share or port unification from `lib/prolog/`: term walk; the occurs check can be dropped
  entirely, since without function symbols a variable is only ever bound to a constant or
  another variable
- [ ] `(dl-unify t1 t2 subst)` → extended subst or nil (simpler than Prolog: no compound terms
  to recurse into)
- [ ] `(dl-ground? term subst)` → bool: true iff every variable in `term` is bound in `subst`
- [ ] Tests: atom/atom, var/atom, var/var, list args

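Here is a Python sketch of the Phase 2 contract. It assumes atoms are tuples `(relation, arg, ...)` and variables are strings starting with an uppercase letter or `_`, which stands in for whatever term representation `lib/prolog/` actually uses. `dl_unify` and `dl_ground` are hypothetical mirrors of `dl-unify` and `dl-ground?`, returning a new dict or `None` in place of nil. Arguments are never compound, so unification is a flat loop with a variable-chain walk and no occurs check.

```python
def is_var(t):
    # Assumed convention: uppercase or "_" prefix marks a variable
    # (renaming each anonymous "_" apart is left out of this sketch).
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def walk(t, subst):
    """Follow variable-to-variable bindings until a constant or an unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def dl_unify(t1, t2, subst):
    """Return subst extended so t1 and t2 are equal, or None if they clash."""
    if isinstance(t1, tuple) or isinstance(t2, tuple):
        # Atoms: same relation name, same arity, pairwise-unifiable arguments.
        if not (isinstance(t1, tuple) and isinstance(t2, tuple)) or len(t1) != len(t2):
            return None
        for a, b in zip(t1, t2):
            subst = dl_unify(a, b, subst)
            if subst is None:
                return None
        return subst
    a, b = walk(t1, subst), walk(t2, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}  # copy: the caller's substitution is left untouched
    if is_var(b):
        return {**subst, b: a}
    return None  # two distinct constants


def dl_ground(term, subst):
    """True iff every variable in term resolves (transitively) to a constant."""
    args = term if isinstance(term, tuple) else (term,)
    return all(not is_var(walk(a, subst)) for a in args)
```

Returning a fresh dict on every binding keeps substitutions persistent, which is what a bottom-up join wants: one partial substitution can be extended against many candidate facts without undo logic.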
### Phase 3: extensional DB + naive evaluation
- [ ] EDB: `{:relation-name → set-of-ground-tuples}` using SX sets (Phase 18 of primitives)
- [ ] `(dl-add-fact! db relation args)` → add ground tuple
- [ ] `(dl-add-rule! db head body)` → add rule clause
- [ ] Naive evaluation: iterate rules until fixpoint.
  For each rule, for each combination of body tuples that unify, derive the head tuple.
  Repeat until no new tuples are added.
- [ ] `(dl-query db goal)` → list of substitutions satisfying goal against the derived DB
- [ ] **Safety analysis**: every variable in a rule head must also appear in a positive body
  literal; reject unsafe rules at `dl-add-rule!` time with a clear error pointing at the
  offending variable. Every variable in a negated atom must likewise appear in a positive
  body literal. Built-in predicates and negated atoms do not satisfy safety on their
  own (`p(X) :- X > 0.` is unsafe).
- [ ] Tests: transitive closure (ancestor), sibling, same-generation (classic Datalog programs);
  safety violation rejection cases.

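The naive loop fits in a few lines of Python. This sketch assumes facts are tuples such as `("parent", "tom", "bob")`, variables are capitalised strings, and each rule is a `(head, body)` pair with positive atoms only, so there are no built-ins or negation. Every pass re-joins each rule body against the whole database and stops when a pass adds nothing. `match_fact`, `substitute` and `naive_eval` are illustrative names, not the planned SX API.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match_fact(atom, fact, subst):
    """Extend subst so that atom equals the ground fact, or return None."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    subst = dict(subst)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if a not in subst:
                subst[a] = f  # first occurrence binds the variable
                continue
            a = subst[a]
        if a != f:
            return None  # constant or earlier binding disagrees
    return subst


def substitute(atom, subst):
    return tuple(subst.get(t, t) if is_var(t) else t for t in atom)


def naive_eval(facts, rules):
    db = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            substs = [{}]
            for atom in body:  # join body atoms left to right over the whole DB
                substs = [s2 for s in substs for f in db
                          if (s2 := match_fact(atom, f, s)) is not None]
            derived.update(substitute(head, s) for s in substs)
        if derived <= db:  # fixpoint: this pass produced nothing new
            return db
        db |= derived


# Example program: transitive closure of parent.
ANCESTOR = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
```

Every pass redoes all earlier work, which is exactly the cost semi-naive evaluation (Phase 5) removes.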
### Phase 4: built-in predicates + body arithmetic

Almost every real query needs `<`, `=`, simple arithmetic, and string comparisons in body
position. These are not EDB lookups; they are constraints that filter bindings.

- [ ] Recognise built-in predicates in body: `(< X Y)`, `(<= X Y)`, `(> X Y)`, `(>= X Y)`,
  `(= X Y)`, `(!= X Y)` and arithmetic forms `(is Z (+ X Y))`, `(is Z (- X Y))`,
  `(is Z (* X Y))`, `(is Z (/ X Y))`.
- [ ] Built-in evaluation in the fixpoint: at the join step, after binding variables from EDB
  lookups, evaluate built-ins as constraints. If any built-in fails or has unbound inputs,
  drop the candidate substitution.
- [ ] **Safety extension**: `is` binds its left operand iff its right operand is fully ground.
  `(< X Y)` requires both X and Y to be bound by some prior body literal; reject unsafe rules.
- [ ] Wire arithmetic operators through to SX numeric primitives; no separate Datalog number
  tower.
- [ ] Tests: range filters, arithmetic derivations (`(plus-one X Y :- ..., (is Y (+ X 1)))`),
  comparison-based queries, safety violation detection on `(p X) :- (< X 5).`

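Built-in evaluation and the extended safety check can be sketched in Python as follows. The encoding is an assumption made for illustration: built-ins are tuples such as `("<", "X", 5)` or `("is", "Y", ("+", "X", 1))`, variables are capitalised strings, and Python's `operator` functions stand in for the SX numeric primitives. As a result `/` here is true division, which may not match SX. `apply_builtin` drops a candidate by returning `None`. `check_safety` walks the body left to right, as the safety extension requires. Both names are hypothetical.

```python
import operator

COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           ">=": operator.ge, "=": operator.eq, "!=": operator.ne}
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def value(t, subst):
    """Evaluate a term or (op, lhs, rhs) expression; None means an input is unbound."""
    if isinstance(t, tuple):
        lhs, rhs = value(t[1], subst), value(t[2], subst)
        return None if lhs is None or rhs is None else ARITH[t[0]](lhs, rhs)
    return subst.get(t) if is_var(t) else t


def apply_builtin(lit, subst):
    """Filter (or, for `is`, extend) one candidate substitution; None drops it."""
    if lit[0] == "is":
        rhs = value(lit[2], subst)
        if rhs is None:
            return None  # right operand not ground: drop
        lhs = value(lit[1], subst)
        if lhs is None:
            return {**subst, lit[1]: rhs}  # bind the left operand
        return subst if lhs == rhs else None
    a, b = value(lit[1], subst), value(lit[2], subst)
    if a is None or b is None:
        return None  # unbound input: drop
    return subst if COMPARE[lit[0]](a, b) else None


def vars_in(t):
    if isinstance(t, tuple):  # skip position 0: relation or operator name
        return set().union(*(vars_in(x) for x in t[1:]))
    return {t} if is_var(t) else set()


def check_safety(head, body):
    """Raise ValueError naming a variable that no earlier body literal binds."""
    bound = set()
    for lit in body:
        if lit[0] == "is":
            needed, binds = vars_in(lit[2]), vars_in(lit[1])
        elif lit[0] in COMPARE:
            needed, binds = vars_in(lit), set()
        else:  # positive relational atom (negated literals are out of scope here)
            needed, binds = set(), vars_in(lit)
        missing = needed - bound
        if missing:
            raise ValueError(f"unsafe: {sorted(missing)[0]} is unbound at {lit}")
        bound |= binds
    missing = vars_in(head) - bound
    if missing:
        raise ValueError(f"unsafe: head variable {sorted(missing)[0]} is never bound")
```

Checking safety statically at rule-add time means the "unbound input" branch in `apply_builtin` should never fire for accepted rules. It is kept as a defensive fallback.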
### Phase 5: semi-naive evaluation (performance)
- [ ] Delta sets: track newly derived tuples per iteration
- [ ] Semi-naive rule: each rule evaluation joins at least one body atom against the previous
  iteration's delta instead of joining every atom against the full relation
- [ ] Significant speedup for recursive rules: avoids re-deriving known tuples
- [ ] `(dl-stratify db)` → dependency graph + SCC analysis → stratum ordering
- [ ] Tests: verify semi-naive produces the same results as naive; benchmark on a large ancestor chain

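Here is a Python sketch of semi-naive evaluation. It assumes tuple facts, capitalised variables, `(head, body)` rules with positive atoms only, and hypothetical helper names. For each rule it runs one join per body position, reading that position from `delta` and every other position from `total`. Any tuple that is new in round k+1 must use at least one tuple first derived in round k, so nothing is missed. This simple variant can still re-derive some known tuples; a common refinement joins the positions before the delta atom against the pre-delta relation only.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match_fact(atom, fact, subst):
    """Extend subst so that atom equals the ground fact, or return None."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    subst = dict(subst)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if a not in subst:
                subst[a] = f
                continue
            a = subst[a]
        if a != f:
            return None
    return subst


def substitute(atom, subst):
    return tuple(subst.get(t, t) if is_var(t) else t for t in atom)


def semi_naive(facts, rules):
    total = set(facts)
    delta = set(facts)  # round 0: every fact counts as new
    while delta:
        derived = set()
        for head, body in rules:
            for i in range(len(body)):  # body position that reads from delta
                substs = [{}]
                for j, atom in enumerate(body):
                    source = delta if j == i else total
                    substs = [s2 for s in substs for f in source
                              if (s2 := match_fact(atom, f, s)) is not None]
                derived.update(substitute(head, s) for s in substs)
        delta = derived - total  # only genuinely new tuples drive the next round
        total |= delta
    return total
```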
### Phase 6: magic sets (goal-directed bottom-up)

Plain bottom-up evaluation (naive or semi-naive) derives **all** consequences of all rules
before answering, even when the query touches a tiny slice of the EDB. Magic sets rewrite the
program so the fixpoint only derives tuples relevant to the goal: a major performance win for
"what's reachable from node X" style queries on large graphs.

- [ ] Adornments: annotate rule predicates with bound (`b`) / free (`f`) patterns based on how
  they're called (`ancestor^bf(tom, X)` vs `ancestor^ff(X, Y)`).
- [ ] Magic transformation: for each adorned predicate, generate a `magic_<pred>` relation and
  rewrite rule bodies to filter through it. Seed with `magic_<query-pred>(<bound-args>)`.
- [ ] Sideways information passing strategy (SIPS): left-to-right by default; pluggable.
- [ ] Optional pass, guarded behind `(dl-set-strategy! db :magic)`; default remains semi-naive.
- [ ] Tests: ancestor query from a single root on a 10k-node graph; the magic-rewritten version
  should be O(reachable) instead of O(graph). Equivalence vs naive on small inputs.

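To make the rewrite concrete, here is the magic transformation applied by hand, in Python, to the two `ancestor` rules for a goal of the form `ancestor^bf(root, X)`, using left-to-right SIPS. The `magic_ancestor_bf` relation name, the seed rule and the small naive evaluator are assumptions for this sketch. An automated pass would derive equivalent rules from the adornments. With two disconnected chains as input, the rewritten program should give the same answers for the root while never deriving `ancestor` tuples for the unreachable chain.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match_fact(atom, fact, subst):
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    subst = dict(subst)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if a not in subst:
                subst[a] = f
                continue
            a = subst[a]
        if a != f:
            return None
    return subst


def substitute(atom, subst):
    return tuple(subst.get(t, t) if is_var(t) else t for t in atom)


def evaluate(facts, rules):
    """Naive fixpoint; a rule with an empty body contributes its head as a fact."""
    db = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            substs = [{}]
            for atom in body:
                substs = [s2 for s in substs for f in db
                          if (s2 := match_fact(atom, f, s)) is not None]
            derived.update(substitute(head, s) for s in substs)
        if derived <= db:
            return db
        db |= derived


ANCESTOR = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]


def magic_ancestor_bf(root):
    """ANCESTOR rewritten for the goal ancestor(root, X), first argument bound."""
    m = "magic_ancestor_bf"
    return [
        ((m, root), []),  # seed: the query's bound argument
        ((m, "Y"), [(m, "X"), ("parent", "X", "Y")]),  # ancestor(Y, Z) is called with Y bound
        (("ancestor", "X", "Y"), [(m, "X"), ("parent", "X", "Y")]),
        (("ancestor", "X", "Z"), [(m, "X"), ("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
    ]
```

The magic relation here ends up holding every node reachable from the root. `ancestor` tuples are then derived only for those nodes, which is where the O(reachable) behaviour comes from.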
### Phase 7: stratified negation
- [ ] Dependency graph analysis: which relations depend on which (positively or negatively)
- [ ] Stratification check: error if a negative dependency lies on a cycle (non-stratifiable program)
- [ ] Evaluation: process strata in order; a lower stratum is fully computed before its
  complement is used in a higher stratum
- [ ] `not(P)` in rule body: at evaluation time, check that the ground instance of P is absent
  from the database (EDB plus the IDB already derived in lower strata)
- [ ] Tests: non-member (`not(member(X,L))`), colored-graph (`not(same-color(X,Y))`),
  stratification error detection

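One way to implement the stratification check is the classic iterative stratum assignment, sketched here in Python. Each predicate sits at least as high as its positive dependencies and strictly above its negated ones. This stands in for, and is not, the SCC-based `dl-stratify` planned in Phase 5; both reject exactly the programs with a negative dependency on a cycle. Negated body literals are encoded as `("not", atom)`, which is an assumption of this sketch, and `stratify` is a hypothetical name. Evaluating strata in increasing order then guarantees every relation under `not` is complete before it is consulted.

```python
def stratify(rules):
    """Return {predicate: stratum}; raise ValueError if negation sits on a cycle."""
    def dep(lit):
        # ("not", atom) marks a negated literal; returns (predicate, is_negated).
        return (lit[1][0], True) if lit[0] == "not" else (lit[0], False)

    preds = {head[0] for head, _ in rules} | {dep(lit)[0] for _, body in rules for lit in body}
    stratum = dict.fromkeys(preds, 0)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for lit in body:
                pred, negated = dep(lit)
                need = stratum[pred] + (1 if negated else 0)
                if need > stratum[head[0]]:
                    if need > len(preds):
                        # Strata only keep climbing when a cycle passes through `not`.
                        raise ValueError(f"not stratifiable: negative cycle through {head[0]}")
                    stratum[head[0]] = need
                    changed = True
    return stratum
```

In a stratifiable program no stratum can exceed the number of predicates, so passing that bound is a sound way to detect a negative cycle without building the graph explicitly.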
### Phase 8: aggregation (Datalog+)
- [ ] `count(X, Goal)` → number of distinct X satisfying Goal
- [ ] `sum(X, Goal)` → sum of X values satisfying Goal
- [ ] `min(X, Goal)` / `max(X, Goal)` → min/max of X satisfying Goal
- [ ] `group-by` semantics: the other free variables of Goal group the result, e.g.
  `count(X, sibling(P, X))` yields one count per P; `count(X, sibling(bob, X))` → count of bob's siblings
- [ ] Aggregation is non-monotonic, like negation: aggregate only over fully computed relations
  (lower strata), in a separate post-fixpoint pass
- [ ] Tests: social network statistics, grade aggregation, inventory sums

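Here is a Python sketch of a post-fixpoint aggregation pass over already-derived tuples. It assumes tuples shaped like `("grade", student, course, score)` and selects group-by and value columns by position. That is an illustrative simplification of the `count(X, Goal)` form, and `aggregate` is a hypothetical helper. One design choice is worth noting: `count` counts distinct values, as specified above, while `sum`/`min`/`max` fold one value per distinct derived tuple. So two equal scores in different courses both contribute to a sum.

```python
from collections import defaultdict

FOLDS = {
    "count": lambda vals: len(set(vals)),  # number of distinct values
    "sum": sum,
    "min": min,
    "max": max,
}


def aggregate(tuples, group_cols, value_col, op):
    """Group tuples by the columns in group_cols and fold value_col with op."""
    groups = defaultdict(list)
    for t in set(tuples):  # set semantics: each derived tuple counts once
        groups[tuple(t[i] for i in group_cols)].append(t[value_col])
    return {key: FOLDS[op](vals) for key, vals in groups.items()}
```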
### Phase 9: SX embedding API
- [ ] `(dl-program facts rules)` → database from SX data directly (no parsing required)
  ```
  (dl-program
    '((parent tom bob) (parent tom liz) (parent bob ann))
    '((ancestor X Z :- (parent X Y) (ancestor Y Z))
      (ancestor X Y :- (parent X Y))))
  ```
- [ ] `(dl-query db '(ancestor tom X))` → `((ann) (bob) (liz))`
- [ ] `(dl-assert! db '(parent ann pat))` → incremental fact addition + re-derive; the query
  above then also returns `(pat)`
- [ ] `(dl-retract! db '(parent tom bob))` → fact removal + re-derive from scratch
- [ ] Integration demo: federation graph query, `(ancestor actor1 actor2)` over
  rose-ash ActivityPub follow relationships

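For insertions into a positive program (no negation or aggregation), `dl-assert!` need not recompute from scratch. Seeding the semi-naive loop with the single new fact derives exactly its consequences. The Python sketch below mirrors the SX example above using tuple facts and capitalised variables; `assert_fact` and its helpers are hypothetical names. Retraction, and insertion under negation (where a new fact can invalidate earlier conclusions), are outside what it handles. That is one reason re-deriving from scratch is the simpler choice for `dl-retract!`.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match_fact(atom, fact, subst):
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    subst = dict(subst)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if a not in subst:
                subst[a] = f
                continue
            a = subst[a]
        if a != f:
            return None
    return subst


def substitute(atom, subst):
    return tuple(subst.get(t, t) if is_var(t) else t for t in atom)


def assert_fact(total, rules, fact):
    """Return a new set: total plus fact and every consequence of it (positive rules only)."""
    if fact in total:
        return total
    total = total | {fact}  # copy, so the caller's database is not mutated
    delta = {fact}
    while delta:
        derived = set()
        for head, body in rules:
            for i in range(len(body)):  # at least one body atom must use a new tuple
                substs = [{}]
                for j, atom in enumerate(body):
                    source = delta if j == i else total
                    substs = [s2 for s in substs for f in source
                              if (s2 := match_fact(atom, f, s)) is not None]
                derived.update(substitute(head, s) for s in substs)
        delta = derived - total
        total |= delta
    return total
```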
### Phase 10: Datalog as a query language for rose-ash
- [ ] Schema: map SQLAlchemy model relationships to Datalog EDB facts
  (e.g. `(follows user1 user2)`, `(authored user post)`, `(tagged post tag)`)
- [ ] Loader: `dl-load-from-db!` queries PostgreSQL and populates the Datalog EDB
- [ ] Query examples:
  - `?- ancestor(me, X), authored(X, Post), tagged(Post, cooking).`
    → posts about cooking by people I follow (transitively), with `ancestor` defined over `follows`
  - `popular(Post) :- tagged(Post, _), count(L, liked(L, Post)) >= 10.` then `?- popular(Post).`
    → posts with 10+ likes
- [ ] Expose as a rose-ash service endpoint: `POST /internal/datalog` with program + query

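The loader can be pictured as a minimal Python sketch, with the standard library's `sqlite3` standing in for PostgreSQL/SQLAlchemy so it runs anywhere. The table names, column names and the `SCHEMA` mapping are hypothetical, not rose-ash's real models. The idea is only that each mapped table becomes one EDB relation whose tuples are the projected columns. Table and column names are interpolated into SQL, so they must come from this trusted mapping and never from a request.

```python
import sqlite3

# Hypothetical mapping: table -> (relation name, columns projected into the tuple, in order).
SCHEMA = {
    "follows": ("follows", ["follower_id", "followee_id"]),
    "posts": ("authored", ["author_id", "id"]),
    "post_tags": ("tagged", ["post_id", "tag"]),
}


def load_edb(conn, schema):
    """Read every mapped table and return the EDB as a set of (relation, *columns) tuples."""
    facts = set()
    for table, (relation, cols) in schema.items():
        # Identifiers come from the trusted SCHEMA map, so string formatting is acceptable here.
        for row in conn.execute(f"SELECT {', '.join(cols)} FROM {table}"):
            facts.add((relation, *row))
    return facts
```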
## Blockers

_(none yet)_

## Progress log

_Newest first._

_(awaiting phase 1)_