plans: layered-stack framing + chisel sequence + loop scaffolding

Design + ops scaffolding for the next phase of work, none of it touching substrate or guest code. lib-guest.md: rewrites Architectural framing as a 5-layer stack (substrate → lib/guest → languages → shared/ → applications), recursive dependency-direction rule, scaled two-consumer rule. Adds Phase B (long-running stratification) with sub-layer matrix (core/typed/relational/effects/layout/lazy/oo), language profiles, and the long-running-discipline section. Preserves existing Phase A progress log and rules. ocaml-on-sx.md: scope reduced to substrate validation + HM + reference oracle. Phases 1-5 + minimal stdlib slice + vendored testsuite slice. Dream carved out into dream-on-sx.md; Phase 8 (ReasonML) deferred. Records lib-guest sequencing dependency. datalog-on-sx.md: adds Phase 4 built-in predicates + body arithmetic, Phase 6 magic sets, safety analysis in Phase 3, Non-goals section. New chisel plans (forward-looking, not yet launchable): kernel-on-sx.md — first-class everything, env-as-value endgame idris-on-sx.md — dependent types, evidence chisel probabilistic-on-sx.md — weighted nondeterminism + traces maude-on-sx.md — rewriting as primitive linear-on-sx.md — resource model, artdag-relevant Loop briefings (4 active, 1 cold): minikanren-loop.md, ocaml-loop.md, datalog-loop.md, elm-loop.md, koka-loop.md Restore scripts mirror the loop pattern: restore-{minikanren,ocaml,datalog,jit-perf,lib-guest}.sh Each captures worktree state, plan progress, MCP health, tmux status. Includes the .mcp.json absolute-path patch instruction (fresh worktrees have no _build/, so the relative mcp_tree path fails on first launch). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:27:50 +00:00
parent e8246340fc
commit 9dd9fb9c37
19 changed files with 2283 additions and 183 deletions
--- a/plans/datalog-on-sx.md
+++ b/plans/datalog-on-sx.md
@@ -25,6 +25,23 @@ for rose-ash data (e.g. federation graph, content relationships).
  Dalmau "Datalog and Constraint Satisfaction".
 - **Commits:** one feature per commit. Keep `## Progress log` updated and tick boxes.

+## Non-goals
+
+Deliberately out of scope for this implementation. Real engines (Soufflé, Cozo, DDlog) include
+some of these; we accept they're missing and will note them in `Blockers` if a use case demands
+one later.
+
+- **Function symbols** — keeps termination guaranteed and prevents collapse into Prolog.
+- **Disjunctive heads** (`p :- q. p :- r.` is fine; `p ; q :- r.` is not) — research extension.
+- **Well-founded semantics** — only stratified negation. Programs that aren't stratifiable are
+  rejected at load time, not evaluated under WFS.
+- **Tabled top-down (SLG resolution)** — bottom-up only. If you want top-down with termination,
+  use the Prolog implementation.
+- **Constraint Datalog** (Datalog over reals, intervals, finite domains) — research extension.
+- **Distributed evaluation / Differential Dataflow** — single-process fixpoint only. The rose-ash
+  cross-service story (Phase 10) federates by querying each service's local Datalog instance and
+  joining results, not by running a distributed fixpoint.
+
 ## Architecture sketch

 ```
@@ -59,7 +76,8 @@ Key differences from Prolog:

 ### Phase 1 — tokenizer + parser
 - [ ] Tokenizer: atoms (lowercase/quoted), variables (uppercase/`_`), numbers, strings,
-      operators (`:- `, `?-`, `,`, `.`), comments (`%`, `/* */`)
+      operators (`:- `, `?-`, `,`, `.`), arithmetic + comparison operators
+      (`+`, `-`, `*`, `/`, `<`, `<=`, `>`, `>=`, `=`, `!=`), comments (`%`, `/* */`)
      Note: no function symbol syntax (no nested `f(...)` in arg position).
 - [ ] Parser:
      - Facts: `parent(tom, bob).` → `{:head (parent tom bob) :body ()}`
@@ -83,16 +101,55 @@ Key differences from Prolog:
      For each rule, for each combination of body tuples that unify, derive head tuple.
      Repeat until no new tuples added.
 - [ ] `dl-query` `db` `goal` → list of substitutions satisfying goal against derived DB
- [ ] Tests: transitive closure (ancestor), sibling, same-generation — classic Datalog programs
+- [ ] **Safety analysis**: every variable in a rule head must also appear in a positive body
+      literal; reject unsafe rules at `dl-add-rule!` time with a clear error pointing at the
+      offending variable. Built-in predicates and negated atoms do not satisfy safety on their
+      own (`p(X) :- X > 0.` is unsafe).
+- [ ] Tests: transitive closure (ancestor), sibling, same-generation — classic Datalog programs;
+      safety violation rejection cases.

-### Phase 4 — semi-naive evaluation (performance)
+### Phase 4 — built-in predicates + body arithmetic
+
+Almost every real query needs `<`, `=`, simple arithmetic, and string comparisons in body
+position. These are not EDB lookups — they're constraints that filter bindings.
+
+- [ ] Recognise built-in predicates in body: `(< X Y)`, `(<= X Y)`, `(> X Y)`, `(>= X Y)`,
+      `(= X Y)`, `(!= X Y)` and arithmetic forms `(is Z (+ X Y))`, `(is Z (- X Y))`,
+      `(is Z (* X Y))`, `(is Z (/ X Y))`.
+- [ ] Built-in evaluation in the fixpoint: at the join step, after binding variables from EDB
+      lookups, evaluate built-ins as constraints. If any built-in fails or has unbound inputs,
+      drop the candidate substitution.
+- [ ] **Safety extension**: `is` binds its left operand iff right operand is fully ground.
+      `(< X Y)` requires both X and Y bound by some prior body literal — reject unsafe.
+- [ ] Wire arithmetic operators through to SX numeric primitives — no separate Datalog number
+      tower.
+- [ ] Tests: range filters, arithmetic derivations (`(plus-one X Y :- ..., (is Y (+ X 1)))`),
+      comparison-based queries, safety violation detection on `(p X) :- (< X 5).`
+
+### Phase 5 — semi-naive evaluation (performance)
 - [ ] Delta sets: track newly derived tuples per iteration
 - [ ] Semi-naive rule: only join against delta tuples from last iteration, not full relation
 - [ ] Significant speedup for recursive rules — avoids re-deriving known tuples
 - [ ] `dl-stratify` `db` → dependency graph + SCC analysis → stratum ordering
 - [ ] Tests: verify semi-naive produces same results as naive; benchmark on large ancestor chain

-### Phase 5 — stratified negation
+### Phase 6 — magic sets (goal-directed bottom-up)
+
+Naive bottom-up evaluation derives **all** consequences of all rules before answering, even when
+the query touches a tiny slice of the EDB. Magic sets rewrite the program so the fixpoint only
+derives tuples relevant to the goal — a major perf win for "what's reachable from node X" style
+queries on large graphs.
+
+- [ ] Adornments: annotate rule predicates with bound (`b`) / free (`f`) patterns based on how
+      they're called (`ancestor^bf(tom, X)` vs `ancestor^ff(X, Y)`).
+- [ ] Magic transformation: for each adorned predicate, generate a `magic_<pred>` relation and
+      rewrite rule bodies to filter through it. Seed with `magic_<query-pred>(<bound-args>)`.
+- [ ] Sideways information passing strategy (SIPS): left-to-right by default; pluggable.
+- [ ] Optional pass — guarded behind `(dl-set-strategy! db :magic)`; default remains semi-naive.
+- [ ] Tests: ancestor query from a single root on a 10k-node graph — magic-rewritten version
+      should be O(reachable) instead of O(graph). Equivalence vs naive on small inputs.
+
+### Phase 7 — stratified negation
 - [ ] Dependency graph analysis: which relations depend on which (positively or negatively)
 - [ ] Stratification check: error if negation is in a cycle (non-stratifiable program)
 - [ ] Evaluation: process strata in order — lower stratum fully computed before using its
@@ -101,7 +158,7 @@ Key differences from Prolog:
 - [ ] Tests: non-member (`not(member(X,L))`), colored-graph (`not(same-color(X,Y))`),
      stratification error detection

-### Phase 6 — aggregation (Datalog+)
+### Phase 8 — aggregation (Datalog+)
 - [ ] `count(X, Goal)` → number of distinct X satisfying Goal
 - [ ] `sum(X, Goal)` → sum of X values satisfying Goal
 - [ ] `min(X, Goal)` / `max(X, Goal)` → min/max of X satisfying Goal
@@ -109,7 +166,7 @@ Key differences from Prolog:
 - [ ] Aggregation breaks stratification — evaluate in a separate post-fixpoint pass
 - [ ] Tests: social network statistics, grade aggregation, inventory sums

-### Phase 7 — SX embedding API
+### Phase 9 — SX embedding API
 - [ ] `(dl-program facts rules)` → database from SX data directly (no parsing required)
      ```
      (dl-program
@@ -123,7 +180,7 @@ Key differences from Prolog:
 - [ ] Integration demo: federation graph query — `(ancestor actor1 actor2)` over
      rose-ash ActivityPub follow relationships

-### Phase 8 — Datalog as a query language for rose-ash
+### Phase 10 — Datalog as a query language for rose-ash
 - [ ] Schema: map SQLAlchemy model relationships to Datalog EDB facts
      (e.g. `(follows user1 user2)`, `(authored user post)`, `(tagged post tag)`)
 - [ ] Loader: `dl-load-from-db!` — query PostgreSQL, populate Datalog EDB