
giles 9dd9fb9c37 plans: layered-stack framing + chisel sequence + loop scaffolding

Design + ops scaffolding for the next phase of work, none of it touching
substrate or guest code.

lib-guest.md: rewrites Architectural framing as a 5-layer stack
  (substrate → lib/guest → languages → shared/ → applications),
  recursive dependency-direction rule, scaled two-consumer rule. Adds
  Phase B (long-running stratification) with sub-layer matrix
  (core/typed/relational/effects/layout/lazy/oo), language profiles, and
  the long-running-discipline section. Preserves existing Phase A
  progress log and rules.

ocaml-on-sx.md: scope reduced to substrate validation + HM + reference
  oracle. Phases 1-5 + minimal stdlib slice + vendored testsuite slice.
  Dream carved out into dream-on-sx.md; Phase 8 (ReasonML) deferred.
  Records lib-guest sequencing dependency.

datalog-on-sx.md: adds Phase 4 built-in predicates + body arithmetic,
  Phase 6 magic sets, safety analysis in Phase 3, Non-goals section.

New chisel plans (forward-looking, not yet launchable):
  kernel-on-sx.md       — first-class everything, env-as-value endgame
  idris-on-sx.md        — dependent types, evidence chisel
  probabilistic-on-sx.md — weighted nondeterminism + traces
  maude-on-sx.md        — rewriting as primitive
  linear-on-sx.md       — resource model, artdag-relevant

Loop briefings (4 active, 1 cold):
  minikanren-loop.md, ocaml-loop.md, datalog-loop.md, elm-loop.md, koka-loop.md

Restore scripts mirror the loop pattern:
  restore-{minikanren,ocaml,datalog,jit-perf,lib-guest}.sh
  Each captures worktree state, plan progress, MCP health, tmux status.
  Includes the .mcp.json absolute-path patch instruction (fresh worktrees
  have no _build/, so the relative mcp_tree path fails on first launch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-08 22:27:50 +00:00

11 KiB


Datalog-on-SX: Datalog on the CEK/VM

Datalog is a declarative query language: a restricted subset of Prolog with no function symbols, only relations. Programs are sets of facts and rules; queries ask what follows. Evaluation is bottom-up (fixpoint iteration) rather than Prolog's top-down DFS, so pure Datalog always terminates (left-recursive rules cannot loop) and is well suited to incremental updates.

The unique angle: Datalog is a natural companion to the Prolog implementation already in progress (lib/prolog/). The parser and term representation can share infrastructure; the evaluator is an entirely different fixpoint engine rather than a DFS solver.

End-state goal: full core Datalog (facts, rules, stratified negation, aggregation, recursion) with a clean SX query API, and a demonstration of Datalog as a query engine for rose-ash data (e.g. federation graph, content relationships).

Ground rules

Scope: only touch lib/datalog/** and plans/datalog-on-sx.md. Do not edit spec/, hosts/, shared/, lib/prolog/**, or other lib/<lang>/.
Shared-file issues go under "Blockers" below with a minimal repro; do not fix here.
SX files: use sx-tree MCP tools only.
Architecture: Datalog source → term AST → fixpoint evaluator. No transpiler to SX AST — the evaluator is written in SX and works directly on term structures.
Reference: Ramakrishnan & Ullman "A Survey of Deductive Database Systems"; Dalmau "Datalog and Constraint Satisfaction".
Commits: one feature per commit. Keep ## Progress log updated and tick boxes.

Non-goals

Deliberately out of scope for this implementation. Real engines (Soufflé, Cozo, DDlog) include some of these; we accept they're missing and will note them in Blockers if a use case demands one later.

Function symbols — keeps termination guaranteed and prevents collapse into Prolog.
Disjunctive heads (p :- q. p :- r. is fine; p ; q :- r. is not) — research extension.
Well-founded semantics — only stratified negation. Programs that aren't stratifiable are rejected at load time, not evaluated under WFS.
Tabled top-down (SLG resolution) — bottom-up only. If you want top-down with termination, use the Prolog implementation.
Constraint Datalog (Datalog over reals, intervals, finite domains) — research extension.
Distributed evaluation / Differential Dataflow — single-process fixpoint only. The rose-ash cross-service story (Phase 10) federates by querying each service's local Datalog instance and joining results, not by running a distributed fixpoint.

Architecture sketch

Datalog source text
    │
    ▼
lib/datalog/tokenizer.sx   — atoms, variables, numbers, strings, punct (?- :- , . ( ) [ ])
    │
    ▼
lib/datalog/parser.sx      — facts: atom(args). rules: head :- body. queries: ?- goal.
    │                        No function symbols (only constants and variables in args).
    ▼
lib/datalog/db.sx          — extensional DB (EDB): ground facts; IDB: derived relations;
    │                        clause index by relation name/arity
    ▼
lib/datalog/eval.sx        — bottom-up fixpoint: semi-naive evaluation with delta sets;
    │                        stratification for negation; incremental update API
    ▼
lib/datalog/query.sx       — query API: (datalog-query db goal) → list of substitutions;
                             SX embedding: define facts/rules as SX data directly

Key differences from Prolog:

No function symbols — args are atoms, numbers, strings, or variables only. No f(a,b).
No cuts — no procedural control.
Bottom-up — derive all consequences of all rules before answering; no search tree.
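Termination guaranteed: no infinite derivation chains (no function symbols → finite Herbrand base). Arithmetic via is (Phase 4) can create new constants, so the guarantee covers only rules that do not generate values.
Stratified negation: not(P) is legal iff no relation depends on its own negation through a cycle of rules, i.e. the dependency graph has no cycle containing a negative edge.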
Aggregation — count, sum, min, max over derived tuples (Datalog+).

Roadmap

Phase 1 — tokenizer + parser

Tokenizer: atoms (lowercase/quoted), variables (uppercase/_), numbers, strings, operators (:-, ?-, ,, .), arithmetic + comparison operators (+, -, *, /, <, <=, >, >=, =, !=), comments (%, /* */). Note: no function symbol syntax (no nested f(...) in arg position).
Parser:
  - Facts: parent(tom, bob). → {:head (parent tom bob) :body ()}
  - Rules: ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z). → {:head (ancestor X Z) :body ((parent X Y) (ancestor Y Z))}
  - Queries: ?- ancestor(tom, X). → {:query (ancestor tom X)}
  - Negation: not(parent(X,Y)) in body position → {:neg (parent X Y)}
Tests in lib/datalog/tests/parse.sx
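The clause shapes above can be sketched as a small recursive-descent parser. This is an illustration in Python, standing in for the eventual SX code; the function names (tokenize, parse_atom, parse_literal, parse_clause) are hypothetical, Python dicts and tuples stand in for the {:head … :body …} maps, and quoted atoms, strings, comments and operators are left out.

```python
import re

# One token per match: ":-", "?-", identifiers, integers, or punctuation.
TOKEN = re.compile(r"\s*(:-|\?-|[A-Za-z_]\w*|\d+|[(),.])")

def tokenize(src):
    src, tokens, pos = src.strip(), [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}: {src[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_atom(toks, i):
    """Parse name(arg, ...) starting at toks[i]; return (atom_tuple, next_index)."""
    name = toks[i]
    if not re.fullmatch(r"[a-z]\w*", name) or toks[i + 1] != "(":
        raise SyntaxError(f"expected relation name and '(', got {name!r}")
    i += 2
    args = []
    while True:
        arg = toks[i]
        if not re.fullmatch(r"\w+", arg):
            raise SyntaxError(f"expected constant or variable, got {arg!r}")
        if toks[i + 1] == "(":
            raise SyntaxError(f"function symbols are not allowed: {arg}(...)")
        args.append(int(arg) if arg.isdigit() else arg)
        i += 1
        if toks[i] == ")":
            return (name, *args), i + 1
        if toks[i] != ",":
            raise SyntaxError("expected ',' or ')'")
        i += 1

def parse_literal(toks, i):
    """A body literal is an atom or not(atom)."""
    if toks[i] == "not" and toks[i + 1] == "(":
        atom, i = parse_atom(toks, i + 2)
        if toks[i] != ")":
            raise SyntaxError("expected ')' after negated atom")
        return {"neg": atom}, i + 1
    return parse_atom(toks, i)

def parse_clause(src):
    toks = tokenize(src)
    if toks[0] == "?-":
        goal, i = parse_atom(toks, 1)
        if toks[i] != ".":
            raise SyntaxError("expected '.'")
        return {"query": goal}
    head, i = parse_atom(toks, 0)
    body = []
    if toks[i] == ":-":
        i += 1
        while True:
            lit, i = parse_literal(toks, i)
            body.append(lit)
            if toks[i] == ".":
                break
            if toks[i] != ",":
                raise SyntaxError("expected ',' or '.'")
            i += 1
    elif toks[i] != ".":
        raise SyntaxError("expected ':-' or '.'")
    return {"head": head, "body": body}
```

Truncated input surfaces as an IndexError in this sketch; a real parser would report a position instead.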

Phase 2 — unification + substitution

Share or port unification from lib/prolog/ — term walk, occurs check off by default
dl-unify t1 t2 subst → extended subst or nil (no function symbols means simpler)
dl-ground? term → bool — all variables bound in substitution
Tests: atom/atom, var/atom, var/var, list args
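A minimal Python sketch of the two operations listed above, assuming variables are strings that start with an uppercase letter or underscore and that atoms are tuples. The names dl_unify and dl_ground mirror dl-unify and dl-ground?; the helpers is_var and walk are hypothetical, and passing subst to dl_ground is an assumption of this sketch. There is no occurs check, matching the "off by default" note.

```python
def is_var(term):
    """Variables are strings starting with an uppercase letter or underscore."""
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")

def walk(term, subst):
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def dl_unify(t1, t2, subst):
    """Return an extended substitution, or None if t1 and t2 do not unify."""
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        # Atoms are flat tuples (relation, arg, ...): unify position by position.
        for a, b in zip(t1, t2):
            subst = dl_unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

def dl_ground(term, subst):
    """True when every variable in term is bound to a constant under subst."""
    term = walk(term, subst)
    if isinstance(term, tuple):
        return all(dl_ground(arg, subst) for arg in term)
    return not is_var(term)
```

The substitution is never mutated; each extension returns a new dict, so a failed branch of a join cannot leak bindings into a sibling branch.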

Phase 3 — extensional DB + naive evaluation

EDB: {:relation-name → set-of-ground-tuples} using SX sets (Phase 18 of primitives)
dl-add-fact! db relation args → add ground tuple
dl-add-rule! db head body → add rule clause
Naive evaluation: iterate rules until fixpoint. For each rule, for each combination of body tuples that unify, derive the head tuple. Repeat until no new tuples are added.
dl-query db goal → list of substitutions satisfying goal against derived DB
Safety analysis: every variable in a rule head must also appear in a positive body literal; reject unsafe rules at dl-add-rule! time with a clear error pointing at the offending variable. Built-in predicates and negated atoms do not satisfy safety on their own (p(X) :- X > 0. is unsafe).
Tests: transitive closure (ancestor), sibling, same-generation — classic Datalog programs; safety violation rejection cases.
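The naive loop and the head-variable safety check described above might look like the following Python sketch. It is illustrative only: facts are tuples in a single set rather than a per-relation map, rules are (head, body) pairs with positive body atoms only, and the names check_safe, match, join and naive_eval are hypothetical.

```python
def is_var(term):
    """Variables are strings starting with an uppercase letter or underscore."""
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")

def check_safe(head, body):
    """Reject rules whose head variables do not all occur in a positive body atom."""
    body_vars = {arg for atom in body for arg in atom[1:] if is_var(arg)}
    for arg in head[1:]:
        if is_var(arg) and arg not in body_vars:
            raise ValueError(f"unsafe rule for {head[0]}: variable {arg} is not bound in the body")

def match(atom, fact, subst):
    """Extend subst so that atom equals the ground fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for arg, value in zip(atom[1:], fact[1:]):
        if is_var(arg):
            if subst.setdefault(arg, value) != value:
                return None
        elif arg != value:
            return None
    return subst

def join(body, facts, subst):
    """Yield every substitution that satisfies all body atoms against facts."""
    if not body:
        yield subst
        return
    for fact in facts:
        extended = match(body[0], fact, subst)
        if extended is not None:
            yield from join(body[1:], facts, extended)

def naive_eval(edb, rules):
    """Re-run every rule over all known facts until a round adds nothing new."""
    for head, body in rules:
        check_safe(head, body)
    facts = set(edb)
    while True:
        new = set()
        for head, body in rules:
            for subst in join(body, facts, {}):
                derived = tuple(subst.get(term, term) for term in head)
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new

EDB = {("parent", "tom", "bob"), ("parent", "bob", "ann")}
RULES = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
```

Calling naive_eval(EDB, RULES) returns the two parent facts plus the three ancestor facts they imply. The join scans every fact for every body atom; the clause index by relation name/arity mentioned in the architecture sketch is what would avoid that scan.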

Phase 4 — built-in predicates + body arithmetic

Almost every real query needs <, =, simple arithmetic, and string comparisons in body position. These are not EDB lookups — they're constraints that filter bindings.

Recognise built-in predicates in body: (< X Y), (<= X Y), (> X Y), (>= X Y), (= X Y), (!= X Y) and arithmetic forms (is Z (+ X Y)), (is Z (- X Y)), (is Z (* X Y)), (is Z (/ X Y)).
Built-in evaluation in the fixpoint: at the join step, after binding variables from EDB lookups, evaluate built-ins as constraints. If any built-in fails or has unbound inputs, drop the candidate substitution.
Safety extension: is binds its left operand iff right operand is fully ground. (< X Y) requires both X and Y bound by some prior body literal — reject unsafe.
Wire arithmetic operators through to SX numeric primitives — no separate Datalog number tower.
Tests: range filters, arithmetic derivations ((plus-one X Y :- ..., (is Y (+ X 1)))), comparison-based queries, safety violation detection on (p X) :- (< X 5).
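Treating built-ins as constraints on an already-bound substitution could be sketched as follows. This is Python for illustration; eval_builtin and resolve are hypothetical names, literals are tuples such as ("<", "X", 5) or ("is", "Y", ("+", "X", 1)), and Python's operators stand in for the SX numeric primitives.

```python
import operator

COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           ">=": operator.ge, "=": operator.eq, "!=": operator.ne}
ARITH = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}

def is_var(term):
    """Variables are strings starting with an uppercase letter or underscore."""
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")

def resolve(term, subst):
    """Replace a bound variable by its value; leave constants and unbound variables."""
    return subst.get(term, term) if is_var(term) else term

def eval_builtin(literal, subst):
    """Return subst (possibly extended by `is`) if the built-in holds, else None."""
    op = literal[0]
    if op in COMPARE:
        left, right = resolve(literal[1], subst), resolve(literal[2], subst)
        if is_var(left) or is_var(right):
            return None  # unbound input: drop the candidate substitution
        return subst if COMPARE[op](left, right) else None
    if op == "is":
        target, (arith_op, a, b) = literal[1], literal[2]
        a, b = resolve(a, subst), resolve(b, subst)
        if is_var(a) or is_var(b):
            return None  # right-hand side must be fully ground
        value = ARITH[arith_op](a, b)
        bound = resolve(target, subst)
        if is_var(bound):
            return {**subst, target: value}  # `is` binds its left operand
        return subst if bound == value else None  # already bound: acts as a test
    raise ValueError(f"not a built-in predicate: {op}")
```

In a join, this would be called after the relation atoms to its left have bound their variables, for example eval_builtin(("is", "Y", ("+", "X", 1)), {"X": 4}) extends the substitution with Y = 5. Division by zero and comparisons between mismatched types raise Python exceptions here; how the SX version reports them is not specified in this plan.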

Phase 5 — semi-naive evaluation (performance)

Delta sets: track newly derived tuples per iteration
Semi-naive rule: each rule firing must use at least one delta tuple from the last iteration in some body position; the remaining body positions join against the full relations
Significant speedup for recursive rules — avoids re-deriving known tuples
dl-stratify db → dependency graph + SCC analysis → stratum ordering
Tests: verify semi-naive produces same results as naive; benchmark on large ancestor chain
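A compact Python sketch of the delta-set idea, with a naive evaluator alongside it for the equivalence test. The names match, join, naive_eval and seminaive_eval are hypothetical, and rules are (head, body) pairs over tuple atoms with positive bodies only. For each rule and each body position, that position reads from the delta while the others read from the full fact set, so a combination made only of old tuples is never re-joined.

```python
def is_var(term):
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")

def match(atom, fact, subst):
    """Extend subst so that atom equals the ground fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for arg, value in zip(atom[1:], fact[1:]):
        if is_var(arg):
            if subst.setdefault(arg, value) != value:
                return None
        elif arg != value:
            return None
    return subst

def join(body, sources, subst):
    """Join body[k] against sources[k], yielding satisfying substitutions."""
    if not body:
        yield subst
        return
    for fact in sources[0]:
        extended = match(body[0], fact, subst)
        if extended is not None:
            yield from join(body[1:], sources[1:], extended)

def naive_eval(edb, rules):
    facts = set(edb)
    while True:
        new = {tuple(s.get(t, t) for t in head)
               for head, body in rules
               for s in join(body, [facts] * len(body), {})} - facts
        if not new:
            return facts
        facts |= new

def seminaive_eval(edb, rules):
    facts = set(edb)
    delta = set(edb)  # first round: every fact counts as new
    while delta:
        new = set()
        for head, body in rules:
            for i in range(len(body)):
                # Position i must be a tuple derived in the previous round.
                sources = [delta if j == i else facts for j in range(len(body))]
                for s in join(body, sources, {}):
                    derived = tuple(s.get(t, t) for t in head)
                    if derived not in facts:
                        new.add(derived)
        facts |= new
        delta = new
    return facts

RULES = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
CHAIN = {("parent", f"n{i}", f"n{i + 1}") for i in range(5)}
```

On the six-node chain both evaluators reach the same fact set. This variant lets non-delta positions read the full relation, which can derive the same tuple through more than one position; stricter formulations use the pre-delta relation for positions left of i to avoid that.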

Phase 6 — magic sets (goal-directed bottom-up)

Plain bottom-up evaluation (naive or semi-naive) derives all consequences of all rules before answering, even when the query touches a tiny slice of the EDB. Magic sets rewrite the program so the fixpoint only derives tuples relevant to the goal, a major perf win for "what's reachable from node X" style queries on large graphs.

Adornments: annotate rule predicates with bound (b) / free (f) patterns based on how they're called (ancestor^bf(tom, X) vs ancestor^ff(X, Y)).
Magic transformation: for each adorned predicate, generate a magic_<pred> relation and rewrite rule bodies to filter through it. Seed with magic_<query-pred>(<bound-args>).
Sideways information passing strategy (SIPS): left-to-right by default; pluggable.
Optional pass — guarded behind (dl-set-strategy! db :magic); default remains semi-naive.
Tests: ancestor query from a single root on a 10k-node graph — magic-rewritten version should be O(reachable) instead of O(graph). Equivalence vs naive on small inputs.
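To make the effect of the rewrite concrete, the Python sketch below evaluates the ancestor program twice: once as written, and once in a hand-rewritten magic form for the adorned goal ancestor^bf(tom, X). It does not implement the transformation itself; the MAGIC rules were written out by hand following the left-to-right SIPS, and the evaluator is a throwaway naive loop with hypothetical names.

```python
def is_var(term):
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")

def match(atom, fact, subst):
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for arg, value in zip(atom[1:], fact[1:]):
        if is_var(arg):
            if subst.setdefault(arg, value) != value:
                return None
        elif arg != value:
            return None
    return subst

def join(body, facts, subst):
    if not body:
        yield subst
        return
    for fact in facts:
        extended = match(body[0], fact, subst)
        if extended is not None:
            yield from join(body[1:], facts, extended)

def naive_eval(edb, rules):
    facts = set(edb)
    while True:
        new = {tuple(s.get(t, t) for t in head)
               for head, body in rules
               for s in join(body, facts, {})} - facts
        if not new:
            return facts
        facts |= new

# Two disconnected family trees; the query only concerns tom's.
PARENT = {("parent", "tom", "bob"), ("parent", "bob", "ann"),
          ("parent", "zed", "yan"), ("parent", "yan", "xia")}

ORIGINAL = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]

# Hand-applied magic rewrite for ancestor^bf: magic_ancestor(X) means
# "ancestor is being asked with X as its bound first argument".
MAGIC = [
    (("magic_ancestor", "Y"), [("magic_ancestor", "X"), ("parent", "X", "Y")]),
    (("ancestor", "X", "Y"), [("magic_ancestor", "X"), ("parent", "X", "Y")]),
    (("ancestor", "X", "Z"),
     [("magic_ancestor", "X"), ("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
SEED = {("magic_ancestor", "tom")}  # from the goal ?- ancestor(tom, X).

def ancestors(facts):
    return {f for f in facts if f[0] == "ancestor"}

full = ancestors(naive_eval(PARENT, ORIGINAL))
goal_directed = ancestors(naive_eval(PARENT | SEED, MAGIC))
```

Here full holds ancestor tuples for both trees, while goal_directed holds only those under tom (including sub-goal answers such as ancestor(bob, ann)); the answers to the original query are the same in both.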

Phase 7 — stratified negation

Dependency graph analysis: which relations depend on which (positively or negatively)
Stratification check: error if negation is in a cycle (non-stratifiable program)
Evaluation: process strata in order — lower stratum fully computed before using its complement in a higher stratum
not(P) in rule body: at evaluation time, check that P is absent from the fully computed relation for P (EDB facts or IDB tuples derived in a lower stratum)
Tests: non-member (not(member(X,L))), colored-graph (not(same-color(X,Y))), stratification error detection
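The stratification check can be sketched at the level of relation names alone. The Python below is an assumption-laden illustration: rules are reduced to (head_relation, [(body_relation, negated), ...]) pairs, stratify is a hypothetical name, and it uses iterative relaxation rather than the SCC analysis mentioned for dl-stratify. A head must sit at least as high as each positive dependency and strictly higher than each negated one; if a stratum number reaches the number of relations, negation must occur inside a cycle.

```python
def stratify(rules):
    """Map each relation to a stratum number, or raise if not stratifiable.

    rules: list of (head_relation, [(body_relation, negated), ...]).
    """
    relations = {head for head, _ in rules}
    relations |= {rel for _, body in rules for rel, _ in body}
    stratum = {rel: 0 for rel in relations}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for rel, negated in body:
                needed = stratum[rel] + (1 if negated else 0)
                if stratum[head] < needed:
                    # A stratifiable program never needs more strata than relations.
                    if needed >= len(relations):
                        raise ValueError(
                            f"not stratifiable: negation in a dependency cycle through {head}")
                    stratum[head] = needed
                    changed = True
    return stratum

GRAPH_RULES = [
    ("reachable", [("edge", False)]),
    ("reachable", [("edge", False), ("reachable", False)]),
    ("unreachable", [("node", False), ("reachable", True)]),
]
```

For GRAPH_RULES, unreachable lands one stratum above reachable, so reachable is fully computed before its complement is consulted. A pair such as p :- not q and q :- not p raises instead of looping.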

Phase 8 — aggregation (Datalog+)

count(X, Goal) → number of distinct X satisfying Goal
sum(X, Goal) → sum of X values satisfying Goal
min(X, Goal) / max(X, Goal) → min/max of X satisfying Goal
group-by semantics: count(X, sibling(bob, X)) → count of bob's siblings
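Aggregation is non-monotonic, like negation, so it cannot run inside the fixpoint of the relations it ranges over: evaluate each aggregate in a separate pass after those relations have reached fixpoint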
Tests: social network statistics, grade aggregation, inventory sums
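A post-fixpoint aggregation pass with group-by could look like the Python sketch below. The function name aggregate and its positional-index interface are hypothetical; facts are tuples whose index 0 is the relation name. Following the list above, count is over distinct values, while sum, min and max run over one value per distinct tuple, which is an interpretation of this sketch rather than something the plan pins down.

```python
from collections import defaultdict

AGGREGATES = {
    "count": lambda values: len(set(values)),  # number of distinct values
    "sum": sum,
    "min": min,
    "max": max,
}

def aggregate(facts, relation, group_idx, value_idx, op):
    """Group the tuples of `relation` by one column and aggregate another.

    Indices count from the relation name at position 0, so the first
    argument of liked(L, Post) is index 1.
    """
    groups = defaultdict(list)
    for fact in facts:
        if fact[0] == relation:
            groups[fact[group_idx]].append(fact[value_idx])
    return {key: AGGREGATES[op](values) for key, values in groups.items()}

FACTS = {
    ("liked", "l1", "p1"), ("liked", "l2", "p1"), ("liked", "l1", "p2"),
    ("grade", "alice", "math", 90), ("grade", "alice", "art", 90),
    ("grade", "bob", "math", 70),
}

likes_per_post = aggregate(FACTS, "liked", group_idx=2, value_idx=1, op="count")
total_per_student = aggregate(FACTS, "grade", group_idx=1, value_idx=3, op="sum")
```

Because facts is a set, each tuple contributes once; alice's two grades of 90 are distinct tuples and both count toward her sum.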

Phase 9 — SX embedding API

(dl-program facts rules) → database from SX data directly (no parsing required)
    (dl-program
      '((parent tom bob) (parent tom liz) (parent bob ann))
      '((ancestor X Z :- (parent X Y) (ancestor Y Z))
        (ancestor X Y :- (parent X Y))))
(dl-query db '(ancestor tom ?X)) → ((ann) (bob) (liz)) for the program above
(dl-assert! db '(parent ann pat)) → incremental fact addition + re-derive; the same query then also returns (pat)
(dl-retract! db '(parent tom bob)) → fact removal + re-derive from scratch
Integration demo: federation graph query — (ancestor actor1 actor2) over rose-ash ActivityPub follow relationships

Phase 10 — Datalog as a query language for rose-ash

Schema: map SQLAlchemy model relationships to Datalog EDB facts (e.g. (follows user1 user2), (authored user post), (tagged post tag))
Loader: dl-load-from-db! — query PostgreSQL, populate Datalog EDB
Query examples:
  - ?- ancestor(me, X), authored(X, Post), tagged(Post, cooking). → posts about cooking by people I follow (transitively)
  - popular(Post) :- tagged(Post, T), count(L, (liked(L, Post))) >= 10. → posts with 10+ likes (this is a rule, so it takes no ?- prefix; ask it with ?- popular(Post).)
Expose as a rose-ash service endpoint: POST /internal/datalog with program + query

Blockers

(none yet)

Progress log

Newest first.

(awaiting phase 1)
