Design + ops scaffolding for the next phase of work, none of it touching
substrate or guest code.
lib-guest.md: rewrites Architectural framing as a 5-layer stack
(substrate → lib/guest → languages → shared/ → applications),
recursive dependency-direction rule, scaled two-consumer rule. Adds
Phase B (long-running stratification) with sub-layer matrix
(core/typed/relational/effects/layout/lazy/oo), language profiles, and
the long-running-discipline section. Preserves existing Phase A
progress log and rules.
ocaml-on-sx.md: scope reduced to substrate validation + HM + reference
oracle. Phases 1-5 + minimal stdlib slice + vendored testsuite slice.
Dream carved out into dream-on-sx.md; Phase 8 (ReasonML) deferred.
Records lib-guest sequencing dependency.
datalog-on-sx.md: adds Phase 4 built-in predicates + body arithmetic,
Phase 6 magic sets, safety analysis in Phase 3, Non-goals section.
New chisel plans (forward-looking, not yet launchable):
kernel-on-sx.md — first-class everything, env-as-value endgame
idris-on-sx.md — dependent types, evidence chisel
probabilistic-on-sx.md — weighted nondeterminism + traces
maude-on-sx.md — rewriting as primitive
linear-on-sx.md — resource model, artdag-relevant
Loop briefings (4 active, 1 cold):
minikanren-loop.md, ocaml-loop.md, datalog-loop.md, elm-loop.md, koka-loop.md
Restore scripts mirror the loop pattern:
restore-{minikanren,ocaml,datalog,jit-perf,lib-guest}.sh
Each captures worktree state, plan progress, MCP health, tmux status.
Includes the .mcp.json absolute-path patch instruction (fresh worktrees
have no _build/, so the relative mcp_tree path fails on first launch).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Datalog-on-SX: Datalog on the CEK/VM
Datalog is a declarative query language: a restricted subset of Prolog with no function symbols, only relations. Programs are sets of facts and rules; queries ask what follows. Evaluation is bottom-up (fixpoint iteration) rather than Prolog's top-down DFS. Because there are no function symbols, the set of derivable facts is finite, so the fixpoint always terminates; bottom-up evaluation also lends itself to efficient incremental updates.
The unique angle: Datalog is a natural companion to the Prolog implementation already in
progress (lib/prolog/). The parser and term representation can share infrastructure;
the evaluator is an entirely different fixpoint engine rather than a DFS solver.
End-state goal: full core Datalog (facts, rules, stratified negation, aggregation, recursion) with a clean SX query API, and a demonstration of Datalog as a query engine for rose-ash data (e.g. federation graph, content relationships).
Ground rules
- Scope: only touch `lib/datalog/**` and `plans/datalog-on-sx.md`. Do not edit `spec/`, `hosts/`, `shared/`, `lib/prolog/**`, or other `lib/<lang>/`.
- Shared-file issues go under "Blockers" below with a minimal repro; do not fix here.
- SX files: use `sx-tree` MCP tools only.
- Architecture: Datalog source → term AST → fixpoint evaluator. No transpiler to SX AST; the evaluator is written in SX and works directly on term structures.
- Reference: Ramakrishnan & Ullman "A Survey of Deductive Database Systems"; Dalmau "Datalog and Constraint Satisfaction".
- Commits: one feature per commit. Keep `## Progress log` updated and tick boxes.
Non-goals
Deliberately out of scope for this implementation. Real engines (Soufflé, Cozo, DDlog) include
some of these; we accept they're missing and will note them in Blockers if a use case demands
one later.
- Function symbols: excluding them keeps termination guaranteed and prevents collapse into Prolog.
- Disjunctive heads (`p :- q. p :- r.` is fine; `p ; q :- r.` is not): research extension.
- Well-founded semantics: only stratified negation is supported. Programs that aren't stratifiable are rejected at load time, not evaluated under WFS.
- Tabled top-down (SLG resolution): bottom-up only. If you want top-down with termination, use the Prolog implementation.
- Constraint Datalog (Datalog over reals, intervals, finite domains): research extension.
- Distributed evaluation / Differential Dataflow: single-process fixpoint only. The rose-ash cross-service story (Phase 10) federates by querying each service's local Datalog instance and joining results, not by running a distributed fixpoint.
Architecture sketch
Datalog source text
│
▼
lib/datalog/tokenizer.sx — atoms, variables, numbers, strings, punct (?- :- , . ( ) [ ])
│
▼
lib/datalog/parser.sx — facts: atom(args). rules: head :- body. queries: ?- goal.
│ No function symbols (only constants and variables in args).
▼
lib/datalog/db.sx — extensional DB (EDB): ground facts; IDB: derived relations;
│ clause index by relation name/arity
▼
lib/datalog/eval.sx — bottom-up fixpoint: semi-naive evaluation with delta sets;
│ stratification for negation; incremental update API
▼
lib/datalog/query.sx — query API: (datalog-query db goal) → list of substitutions;
SX embedding: define facts/rules as SX data directly
Key differences from Prolog:
- No function symbols: args are atoms, numbers, strings, or variables only. No `f(a,b)`.
- No cuts: no procedural control.
- Bottom-up: derive all consequences of all rules before answering; no search tree.
- Termination guaranteed: no infinite derivation chains (no function symbols → finite Herbrand base).
- Stratified negation: `not(P)` is legal iff P does not recursively depend on its own negation.
- Aggregation: `count`, `sum`, `min`, `max` over derived tuples (Datalog+).
Roadmap
Phase 1 — tokenizer + parser
- Tokenizer: atoms (lowercase/quoted), variables (uppercase/`_`), numbers, strings, operators (`:-`, `?-`, `,`, `.`), arithmetic + comparison operators (`+`, `-`, `*`, `/`, `<`, `<=`, `>`, `>=`, `=`, `!=`), comments (`%`, `/* */`). Note: no function symbol syntax (no nested `f(...)` in arg position).
- Parser:
  - Facts: `parent(tom, bob).` → `{:head (parent tom bob) :body ()}`
  - Rules: `ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).` → `{:head (ancestor X Z) :body ((parent X Y) (ancestor Y Z))}`
  - Queries: `?- ancestor(tom, X).` → `{:query (ancestor tom X)}`
  - Negation: `not(parent(X,Y))` in body position → `{:neg (parent X Y)}`
- Tests in `lib/datalog/tests/parse.sx`
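The clause shapes listed above can be illustrated with a small Python sketch. This is not the planned SX tokenizer or parser: the function names (`tokenize`, `parse_atom`, `parse_literal`, `parse_clause`) are hypothetical, Python dicts and tuples stand in for the SX maps and lists, and strings, quoted atoms, comments, and arithmetic operators are left out for brevity.

```python
import re

# One token per match: punctuation/operators, identifiers, or integers.
TOKEN = re.compile(r"\s*(?:(\?-|:-|[(),.])|([A-Za-z_][A-Za-z0-9_]*)|(\d+))")

def tokenize(src):
    src, pos, tokens = src.rstrip(), 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(m.lastindex))
        pos = m.end()
    return tokens

def expect(tokens, symbol):
    if not tokens or tokens.pop(0) != symbol:
        raise SyntaxError(f"expected {symbol!r}")

def parse_atom(tokens):
    """name(arg, ...) where each arg is a constant or variable, never a nested term."""
    atom = [tokens.pop(0)]
    expect(tokens, "(")
    while True:
        tok = tokens.pop(0)
        atom.append(int(tok) if tok.isdigit() else tok)
        sep = tokens.pop(0)
        if sep == ")":
            return tuple(atom)
        if sep != ",":
            # A "(" here would mean a nested term such as p(f(a)).
            raise SyntaxError(f"expected ',' or ')', got {sep!r}")

def parse_literal(tokens):
    if tokens[:1] == ["not"]:
        tokens.pop(0)
        expect(tokens, "(")
        atom = parse_atom(tokens)
        expect(tokens, ")")
        return {"neg": atom}
    return parse_atom(tokens)

def parse_clause(src):
    tokens = tokenize(src)
    if tokens[:1] == ["?-"]:
        tokens.pop(0)
        goal = parse_atom(tokens)
        expect(tokens, ".")
        return {"query": goal}
    head, body = parse_atom(tokens), []
    if tokens[:1] == [":-"]:
        tokens.pop(0)
        body.append(parse_literal(tokens))
        while tokens[:1] == [","]:
            tokens.pop(0)
            body.append(parse_literal(tokens))
    expect(tokens, ".")
    return {"head": head, "body": body}

print(parse_clause("ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)."))
```

Rejecting a `(` in argument position is what enforces the "no function symbols" rule at parse time in this sketch.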
Phase 2 — unification + substitution
- Share or port unification from `lib/prolog/`: term walk, occurs check off by default
- `dl-unify t1 t2 subst` → extended subst or nil (no function symbols means simpler)
- `dl-ground? term` → bool: all variables bound in substitution
- Tests: atom/atom, var/atom, var/var, list args
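As a rough illustration of what `dl-unify` and `dl-ground?` are meant to do, here is a Python sketch. It assumes atoms are tuples of a relation name plus arguments, variables are strings starting with an uppercase letter or `_`, and substitutions are plain dicts; the function names are hypothetical and the real versions would be written in SX.

```python
def is_var(term):
    """Assumed convention: variables start with an uppercase letter or '_'."""
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")

def walk(term, subst):
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(t1, t2, subst):
    """Return an extended copy of subst, or None if t1 and t2 cannot be unified."""
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        # Atoms: unify relation name and arguments position by position.
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

def is_ground(term, subst):
    """True when every variable in term is bound by subst."""
    term = walk(term, subst)
    if isinstance(term, tuple):
        return all(is_ground(arg, subst) for arg in term)
    return not is_var(term)

print(unify(("parent", "X", "bob"), ("parent", "tom", "Y"), {}))
```

Because arguments are never nested terms, a variable can only be bound to a constant or to another variable, which is why the occurs check can stay off in this setting.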
Phase 3 — extensional DB + naive evaluation
- EDB: `{:relation-name → set-of-ground-tuples}` using SX sets (Phase 18 of primitives)
- `dl-add-fact! db relation args` → add ground tuple
- `dl-add-rule! db head body` → add rule clause
- Naive evaluation: iterate rules until fixpoint. For each rule, for each combination of body tuples that unify, derive head tuple. Repeat until no new tuples added.
- `dl-query db goal` → list of substitutions satisfying goal against derived DB
- Safety analysis: every variable in a rule head must also appear in a positive body literal; reject unsafe rules at `dl-add-rule!` time with a clear error pointing at the offending variable. Built-in predicates and negated atoms do not satisfy safety on their own (`p(X) :- X > 0.` is unsafe).
- Tests: transitive closure (ancestor), sibling, same-generation (classic Datalog programs); safety violation rejection cases.
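The naive fixpoint and the head-variable safety check can be sketched in Python as follows. This is an illustration under assumptions, not the SX implementation: atoms are tuples, the database is one flat set of ground tuples, rules contain only positive relation atoms, and the names `match`, `substitute`, `check_safe`, and `naive_eval` are hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")

def match(atom, fact, subst):
    """Match a body atom against a ground fact; return an extended subst or None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for arg, value in zip(atom[1:], fact[1:]):
        if is_var(arg):
            if subst.setdefault(arg, value) != value:
                return None
        elif arg != value:
            return None
    return subst

def substitute(atom, subst):
    return (atom[0],) + tuple(subst.get(arg, arg) for arg in atom[1:])

def check_safe(head, body):
    """Every head variable must occur in some (positive) body atom."""
    body_vars = {arg for atom in body for arg in atom[1:] if is_var(arg)}
    for arg in head[1:]:
        if is_var(arg) and arg not in body_vars:
            raise ValueError(f"unsafe rule: head variable {arg} is not bound in the body")

def naive_eval(facts, rules):
    for head, body in rules:
        check_safe(head, body)
    db = set(facts)
    while True:
        new = set()
        for head, body in rules:
            substs = [{}]
            for atom in body:  # join body atoms left to right against the whole db
                substs = [s2 for s in substs for fact in db
                          if (s2 := match(atom, fact, s)) is not None]
            new |= {substitute(head, s) for s in substs} - db
        if not new:  # fixpoint: a full pass derived nothing new
            return db
        db |= new

facts = {("parent", "tom", "bob"), ("parent", "bob", "ann"), ("parent", "tom", "liz")}
rules = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
db = naive_eval(facts, rules)
print(sorted(t for t in db if t[0] == "ancestor"))
```

Every pass re-joins each rule against the whole database, which is the redundancy that semi-naive evaluation removes.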
Phase 4 — built-in predicates + body arithmetic
Almost every real query needs <, =, simple arithmetic, and string comparisons in body
position. These are not EDB lookups — they're constraints that filter bindings.
- Recognise built-in predicates in body: `(< X Y)`, `(<= X Y)`, `(> X Y)`, `(>= X Y)`, `(= X Y)`, `(!= X Y)` and arithmetic forms `(is Z (+ X Y))`, `(is Z (- X Y))`, `(is Z (* X Y))`, `(is Z (/ X Y))`.
- Built-in evaluation in the fixpoint: at the join step, after binding variables from positive body literals (EDB or IDB lookups), evaluate built-ins as constraints. If any built-in fails or has unbound inputs, drop the candidate substitution.
- Safety extension: `is` binds its left operand iff right operand is fully ground. `(< X Y)` requires both X and Y bound by some prior body literal; reject unsafe rules.
- Wire arithmetic operators through to SX numeric primitives; no separate Datalog number tower.
- Tests: range filters, arithmetic derivations (`(plus-one X Y :- ..., (is Y (+ X 1)))`), comparison-based queries, safety violation detection on `(p X) :- (< X 5).`
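The "built-ins as constraints" step can be sketched in Python like this. It is a simplified illustration, not the planned SX code: `eval_builtin` and `eval_body` are hypothetical names, atoms are tuples, and only numeric operands are considered.

```python
import operator

COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           ">=": operator.ge, "=": operator.eq, "!=": operator.ne}
ARITH = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}

def is_var(term):
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")

def match(atom, fact, subst):
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for arg, value in zip(atom[1:], fact[1:]):
        if is_var(arg):
            if subst.setdefault(arg, value) != value:
                return None
        elif arg != value:
            return None
    return subst

def eval_builtin(atom, subst):
    """Return the (possibly extended) subst if the built-in holds, else None."""
    op = atom[0]
    if op in COMPARE:
        left, right = (subst.get(arg, arg) for arg in atom[1:])
        if is_var(left) or is_var(right):
            return None  # unbound input: drop the candidate
        return subst if COMPARE[op](left, right) else None
    # ("is", Target, (fn, X, Y)): bind Target once the right side is ground.
    target, expr = atom[1], atom[2]
    values = [subst.get(arg, arg) for arg in expr[1:]]
    if any(is_var(v) for v in values):
        return None
    result = ARITH[expr[0]](*values)
    bound = subst.get(target, target)
    if is_var(bound):
        return {**subst, target: result}
    return subst if bound == result else None

def eval_body(body, db):
    substs = [{}]
    for atom in body:
        if atom[0] in COMPARE or atom[0] == "is":
            substs = [s2 for s in substs
                      if (s2 := eval_builtin(atom, s)) is not None]
        else:
            substs = [s2 for s in substs for fact in db
                      if (s2 := match(atom, fact, s)) is not None]
    return substs

db = {("age", "tom", 40), ("age", "ann", 12)}
body = [("age", "P", "A"), (">=", "A", 18), ("is", "N", ("+", "A", 1))]
print(eval_body(body, db))
```

Here an unbound comparison simply yields no substitutions at run time; the plan's safety extension would instead reject such a rule when it is added.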
Phase 5 — semi-naive evaluation (performance)
- Delta sets: track newly derived tuples per iteration
- Semi-naive rule: each rule firing must use at least one delta tuple from the last iteration; the remaining body literals still join against the full relation
- Significant speedup for recursive rules — avoids re-deriving known tuples
- `dl-stratify db` → dependency graph + SCC analysis → stratum ordering
- Tests: verify semi-naive produces same results as naive; benchmark on large ancestor chain
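A Python sketch of the delta-set idea, assuming positive rules only and a flat set of tuples as the database. The names `eval_rule` and `semi_naive_eval` are hypothetical, and this simple variant may derive the same tuple through more than one delta position, which is harmless because results are sets.

```python
def is_var(term):
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")

def match(atom, fact, subst):
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for arg, value in zip(atom[1:], fact[1:]):
        if is_var(arg):
            if subst.setdefault(arg, value) != value:
                return None
        elif arg != value:
            return None
    return subst

def substitute(atom, subst):
    return (atom[0],) + tuple(subst.get(arg, arg) for arg in atom[1:])

def eval_rule(head, body, db, delta):
    """Derive head tuples whose derivation uses at least one delta fact."""
    out = set()
    for i in range(len(body)):  # body position i is restricted to the delta
        substs = [{}]
        for j, atom in enumerate(body):
            source = delta if j == i else db
            substs = [s2 for s in substs for fact in source
                      if (s2 := match(atom, fact, s)) is not None]
        out |= {substitute(head, s) for s in substs}
    return out

def semi_naive_eval(facts, rules):
    db = set(facts)
    delta = set(facts)  # first round: every EDB fact counts as new
    while delta:
        derived = set()
        for head, body in rules:
            derived |= eval_rule(head, body, db, delta)
        delta = derived - db  # keep only tuples not seen before
        db |= delta
    return db

edges = {("edge", 1, 2), ("edge", 2, 3), ("edge", 3, 4)}
rules = [
    (("path", "X", "Y"), [("edge", "X", "Y")]),
    (("path", "X", "Z"), [("edge", "X", "Y"), ("path", "Y", "Z")]),
]
db = semi_naive_eval(edges, rules)
print(sorted(t for t in db if t[0] == "path"))
```

The loop stops as soon as an iteration produces an empty delta, so work per round is proportional to what is new rather than to the whole database.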
Phase 6 — magic sets (goal-directed bottom-up)
Plain bottom-up evaluation (naive or semi-naive) derives all consequences of all rules before answering, even when the query touches a tiny slice of the EDB. Magic sets rewrite the program so the fixpoint only derives tuples relevant to the goal, which is a major performance win for "what's reachable from node X" style queries on large graphs.
- Adornments: annotate rule predicates with bound (`b`) / free (`f`) patterns based on how they're called (`ancestor^bf(tom, X)` vs `ancestor^ff(X, Y)`).
- Magic transformation: for each adorned predicate, generate a `magic_<pred>` relation and rewrite rule bodies to filter through it. Seed with `magic_<query-pred>(<bound-args>)`.
- Sideways information passing strategy (SIPS): left-to-right by default; pluggable.
- Optional pass: guarded behind `(dl-set-strategy! db :magic)`; default remains semi-naive.
- Tests: ancestor query from a single root on a 10k-node graph; the magic-rewritten version should be O(reachable) instead of O(graph). Equivalence vs naive on small inputs.
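To make the effect of the rewrite concrete, the Python sketch below evaluates the ancestor program twice: once as written, and once in a hand-rewritten magic form for the query `ancestor^bf(tom, X)`. The rewritten rules were produced by hand for this one query, not by a transformation pass, and the small evaluator (`fixpoint`) is a hypothetical stand-in for the planned SX engine.

```python
def is_var(term):
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")

def match(atom, fact, subst):
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for arg, value in zip(atom[1:], fact[1:]):
        if is_var(arg):
            if subst.setdefault(arg, value) != value:
                return None
        elif arg != value:
            return None
    return subst

def substitute(atom, subst):
    return (atom[0],) + tuple(subst.get(arg, arg) for arg in atom[1:])

def fixpoint(facts, rules):
    db = set(facts)
    while True:
        new = set()
        for head, body in rules:
            substs = [{}]
            for atom in body:
                substs = [s2 for s in substs for fact in db
                          if (s2 := match(atom, fact, s)) is not None]
            new |= {substitute(head, s) for s in substs} - db
        if not new:
            return db
        db |= new

# Two unrelated families; the query only concerns tom's.
parents = {("parent", "tom", "bob"), ("parent", "bob", "ann"),
           ("parent", "zed", "yan"), ("parent", "yan", "xia")}

original = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]

# Hand-rewritten for ancestor^bf(tom, X): magic_ancestor holds the bound
# first arguments that the query can actually reach.
seed = ("magic_ancestor", "tom")
rewritten = [
    (("magic_ancestor", "Y"), [("magic_ancestor", "X"), ("parent", "X", "Y")]),
    (("ancestor", "X", "Y"), [("magic_ancestor", "X"), ("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("magic_ancestor", "X"), ("parent", "X", "Y"),
                              ("ancestor", "Y", "Z")]),
]

def answers(db):
    return {t[2] for t in db if t[0] == "ancestor" and t[1] == "tom"}

def ancestor_count(db):
    return sum(1 for t in db if t[0] == "ancestor")

plain = fixpoint(parents, original)
magic = fixpoint(parents | {seed}, rewritten)
print(answers(plain) == answers(magic), ancestor_count(plain), ancestor_count(magic))
```

Both runs give the same answers for tom, but the rewritten program never derives `ancestor` tuples for the unrelated family, which is the behaviour the 10k-node test is meant to check at scale.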
Phase 7 — stratified negation
- Dependency graph analysis: which relations depend on which (positively or negatively)
- Stratification check: error if negation is in a cycle (non-stratifiable program)
- Evaluation: process strata in order — lower stratum fully computed before using its complement in a higher stratum
- `not(P)` in rule body: at evaluation time, check that P is NOT in the relations already computed (EDB plus fully evaluated lower strata)
- Tests: non-member (`not(member(X,L))`), colored-graph (`not(same-color(X,Y))`), stratification error detection
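The stratification check can be sketched in Python as a simple relaxation over predicate dependencies. This is an assumption-laden illustration: the plan mentions SCC analysis, whereas this sketch uses an iterative stratum assignment that reaches the same accept/reject decision, and `stratify` with its input shape is hypothetical.

```python
def stratify(rules):
    """rules: list of (head_pred, [(body_pred, negated), ...]).

    Returns {pred: stratum}; raises ValueError when negation occurs
    inside a recursive cycle (the program is not stratifiable).
    """
    preds = {head for head, _ in rules} | {p for _, body in rules for p, _ in body}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, negated in body:
                # Positive dependency: same stratum or higher.
                # Negative dependency: strictly higher stratum.
                needed = stratum[pred] + (1 if negated else 0)
                if stratum[head] < needed:
                    if needed >= len(preds):
                        raise ValueError("not stratifiable: negation inside a recursive cycle")
                    stratum[head] = needed
                    changed = True
    return stratum

program = [
    ("reachable", [("edge", False)]),
    ("reachable", [("edge", False), ("reachable", False)]),
    ("unreachable", [("node", False), ("reachable", True)]),
]
print(stratify(program))
```

A stratifiable program with n predicates never needs a stratum above n - 1, so hitting that bound signals a cycle through negation.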
Phase 8 — aggregation (Datalog+)
- `count(X, Goal)` → number of distinct X satisfying Goal
- `sum(X, Goal)` → sum of X values satisfying Goal
- `min(X, Goal)` / `max(X, Goal)` → min/max of X satisfying Goal
- `group-by` semantics: `count(X, sibling(bob, X))` → count of bob's siblings
- Aggregation is non-monotonic, so it cannot run inside the fixpoint loop; evaluate it in a separate post-fixpoint pass
- Tests: social network statistics, grade aggregation, inventory sums
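A post-fixpoint aggregation pass can be sketched in Python as a group-by over the tuples of one derived relation. The helper `aggregate`, its positional arguments, and the sample relations are hypothetical; the point is only that aggregates are computed after the relation is complete.

```python
from collections import defaultdict

def aggregate(tuples, group_by, value, fn):
    """Group distinct tuples by one column and fold another column with fn.

    Column 0 is the relation name, so argument columns start at index 1.
    """
    groups = defaultdict(list)
    for t in set(tuples):  # each distinct tuple contributes exactly once
        groups[t[group_by]].append(t[value])
    return {key: fn(values) for key, values in groups.items()}

def count_distinct(values):
    return len(set(values))

liked = {("liked", "ann", "post1"), ("liked", "bob", "post1"),
         ("liked", "ann", "post2")}
stock = {("stock", "apple", "shelf_a", 3), ("stock", "apple", "shelf_b", 3)}

likes_per_post = aggregate(liked, group_by=2, value=1, fn=count_distinct)
units_per_item = aggregate(stock, group_by=1, value=3, fn=sum)
print(likes_per_post, units_per_item)
```

Summing over distinct tuples rather than distinct values matters here: both shelves hold 3 apples, and the total should be 6, not 3.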
Phase 9 — SX embedding API
- `(dl-program facts rules)` → database from SX data directly (no parsing required)
  - Example: `(dl-program '((parent tom bob) (parent tom liz) (parent bob ann)) '((ancestor X Z :- (parent X Y) (ancestor Y Z)) (ancestor X Y :- (parent X Y))))`
- `(dl-query db '(ancestor tom ?X))` → `((ann) (bob) (liz))`
- `(dl-assert! db '(parent ann pat))` → incremental fact addition + re-derive (the query above then also returns `(pat)`)
- `(dl-retract! db '(parent tom bob))` → fact removal + re-derive from scratch
- Integration demo: federation graph query, `(ancestor actor1 actor2)` over rose-ash ActivityPub follow relationships
Phase 10 — Datalog as a query language for rose-ash
- Schema: map SQLAlchemy model relationships to Datalog EDB facts (e.g. `(follows user1 user2)`, `(authored user post)`, `(tagged post tag)`)
- Loader: `dl-load-from-db!` queries PostgreSQL and populates the Datalog EDB
- Query examples:
  - `?- ancestor(me, X), authored(X, Post), tagged(Post, cooking).` → posts about cooking by people I follow (transitively)
  - `popular(Post) :- tagged(Post, T), count(L, (liked(L, Post))) >= 10.` → rule defining posts with 10+ likes
- Expose as a rose-ash service endpoint: `POST /internal/datalog` with program + query
Blockers
(none yet)
Progress log
Newest first.
(awaiting phase 1)