rose-ash/plans/artdag-on-sx.md
giles e4a8dff9ba
artdag: Phase 1 DAG model + structural content addressing + 20 tests
Content-addressed node = {:op :inputs :params :commutative}; content-id is a
deterministic canonical serialization (sorted param keys; commutative ops sort
inputs). artdag/build validates dangling/cycles, topo-sorts, dedups identical
subgraphs to one id shared across DAGs. conformance.sh + scoreboard (dag 20/20).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 11:49:43 +00:00


artdag-on-sx: Content-addressed dataflow DAG engine

art-dag is rose-ash's media-processing engine: a content-addressed DAG of effects, executed in three phases — Analyze → Plan → Execute. Today it's a separate Python stack (FastAPI + Celery + JAX + IPFS). Its engine logic — dependency analysis, scheduling, content-addressed memoization, incremental recompute, composable s-expression effects — is exactly the kind of declarative, substrate-shaped work SX excels at, and art-dag already speaks s-expressions (its sexp_effects).

This subsystem rebuilds the engine on SX (not the pixel-pushing): the DAG model, the three-phase pipeline, and the incremental/memoized executor. Media ops themselves (JAX kernels, IPFS pins) stay opaque — modelled as abstract node functions in tests, delegated to injected adapters in production. The win is that the same SX substrates already serve the phases:

  • Analyze (deps, reachability, dirtiness) → Datalog (recursive reachability — the acl/relations shape).
  • Plan (schedule under constraints) → topological batching now; miniKanren for constraint-based scheduling later (optional).
  • Execute (composable effects + content-addressed memo) → SX's own perform/cek-resume + a persist-backed content-addressed result cache; incremental recompute drops the cost of re-rendering to the dirty subgraph.
  • Optimize (fuse/dedup effect pipelines) → term rewriting (a later, optional consumer of maude-on-sx's engine — see plans/maude-on-sx.md).

End-state: a content-addressed dataflow engine in lib/artdag/ with analyze, plan, incremental execute, effect-pipeline optimization, and a shared-cache federation extension — the SX heart of art-dag, with media kernels and storage injected at the edges.

Status (rolling)

bash lib/artdag/conformance.sh → 20/20 (1 suite: dag)

Ground rules

  • Scope: only lib/artdag/** and plans/artdag-on-sx.md. Do not edit spec/, hosts/, shared/, lib/datalog/**, lib/persist/**, or other lib/<lang>/. You may import the public APIs of lib/datalog/ (analyze) and lib/persist/ (memo cache / result store).
  • Design lineage, not code reuse. The existing Python engine lives in the repo's top-level artdag/ (core/ engine, sexp_effects/, l1/ tasks). Read it for design lineage (the 3-phase model, the effect language, content addressing) — do not import or port its code; this is a fresh SX implementation.
  • Media ops are opaque. A node's op is an abstract SX function over its inputs in tests (e.g. (fn (a b) …)); real JAX/IPFS kernels are injected adapters behind an interface. The engine is about scheduling/memo/incremental, never pixels. Determinism: content ids and tests use only the node spec, never a clock.
  • Content addressing is structural. A node's id is a deterministic digest of (op, input-ids, params), with param keys sorted and input-ids sorted only for commutative ops, so identical subgraphs share an id and a cache slot; this is the core property. Use a structural digest helper; if a real SHA-256/CID is needed, it is an injected host primitive (log it under Blockers if absent), not hand-rolled.
  • Shared-file issues → "Blockers" with a minimal repro; do not fix here.
  • SX files: sx-tree MCP tools only; sx_validate after every edit.
  • Commits: one feature per commit. Keep Progress log updated and tick boxes.
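
The "content addressing is structural" rule can be illustrated with a small Python sketch. This is an illustration only, not the SX implementation: the function name `content_id`, the JSON serialization, and the keyword `commutative` are assumptions made for this example. It follows the behaviour recorded in the Phase 1 progress log (a "node:" prefix, sorted param keys, inputs sorted only for commutative ops).

```python
import json


def content_id(op, input_ids, params, commutative=False):
    """Hypothetical structural id: a canonical serialization of the node spec.

    Param keys are sorted, so param order never matters. Input ids are
    sorted only for commutative ops; other ops keep their input order.
    """
    inputs = sorted(input_ids) if commutative else list(input_ids)
    spec = {"op": op, "inputs": inputs, "params": params}
    # sort_keys gives a deterministic key order; no clock or randomness is used.
    return "node:" + json.dumps(spec, sort_keys=True, separators=(",", ":"))


# Same spec written with a different param order: same id.
same_params = (
    content_id("blur", ["a"], {"radius": 2, "sigma": 1})
    == content_id("blur", ["a"], {"sigma": 1, "radius": 2})
)
# Commutative op: input order is irrelevant.
same_commutative = (
    content_id("add", ["a", "b"], {}, commutative=True)
    == content_id("add", ["b", "a"], {}, commutative=True)
)
# Non-commutative op: input order is significant.
differs_ordered = (
    content_id("sub", ["a", "b"], {}) != content_id("sub", ["b", "a"], {})
)
```

Because the id depends only on the spec, two structurally equal nodes map to one cache slot wherever they are built.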

Architecture sketch

DAG spec (nodes + edges)                 rendered results
        │                                       ▲
        ▼                                       │
lib/artdag/dag.sx                       lib/artdag/execute.sx
  — node = {op, inputs, params}           — effect interp (perform per node)
  — content-id = digest(spec)             — content-addressed memo (persist)
  — topo order, validate                  — incremental: only dirty nodes
        │                                       ▲
        ▼                                       │
lib/artdag/analyze.sx                   lib/artdag/plan.sx
  — Datalog: deps/dependents/reach        — schedule: topo batches, parallelism
  — dirty propagation (dirty closure)     — (miniKanren constraints, later/opt)
        │                                       ▲
        ▼                                       │
lib/artdag/optimize.sx                  lib/artdag/federation.sx
  — fuse adjacent ops, dead-node elim,     — shared cache by content-id (L2-style)
    CSE (free from content-addressing)       result import/export + provenance/trust

Phase 1 — DAG model + content addressing

  • lib/artdag/dag.sx — node {:op :inputs :params}; structural content-id = digest of (op, sorted input-ids, params); build/validate a DAG (no dangling inputs, no accidental cycles); topological order
  • identical-subgraph sharing: two structurally-equal nodes get the same id
  • lib/artdag/tests/dag.sx — id determinism, subgraph sharing, cycle/dangling rejection, topo order
  • lib/artdag/conformance.sh + scoreboard
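
As a rough picture of what the Phase 1 build step does, here is a minimal Python sketch, assuming entries of the form (name, op, input-names, params, commutative) and a result shaped like {:ok :nodes :names :order} as described in the progress log. The names `build` and `content_id` and the error strings are hypothetical; the real code lives in lib/artdag/dag.sx.

```python
import json


def content_id(op, input_ids, params, commutative=False):
    inputs = sorted(input_ids) if commutative else list(input_ids)
    spec = {"op": op, "inputs": inputs, "params": params}
    return "node:" + json.dumps(spec, sort_keys=True, separators=(",", ":"))


def build(entries):
    """Validate named entries, resolve names to content ids, dedup, topo-sort."""
    specs = {name: (op, ins, params, comm) for name, op, ins, params, comm in entries}
    # Reject references to names that were never defined.
    for name, (_, ins, _, _) in specs.items():
        for dep in ins:
            if dep not in specs:
                return {"ok": False, "error": f"dangling input {dep!r} in {name!r}"}
    names, nodes, order = {}, {}, []
    pending = list(specs)
    # Fixpoint topo sort: repeatedly resolve every entry whose inputs are resolved.
    while pending:
        ready = [n for n in pending if all(d in names for d in specs[n][1])]
        if not ready:  # nothing can progress, so the rest form a cycle
            return {"ok": False, "error": "cycle among " + ", ".join(sorted(pending))}
        for n in ready:
            op, ins, params, comm = specs[n]
            input_ids = [names[d] for d in ins]
            cid = content_id(op, input_ids, params, comm)
            names[n] = cid
            if cid not in nodes:  # identical subgraphs collapse to one node
                nodes[cid] = {"op": op, "inputs": input_ids,
                              "params": params, "commutative": comm}
                order.append(cid)
        pending = [n for n in pending if n not in names]
    return {"ok": True, "nodes": nodes, "names": names, "order": order}


dag = build([
    ("src", "load", [], {"path": "a"}, False),
    ("x", "blur", ["src"], {"r": 2}, False),
    ("y", "blur", ["src"], {"r": 2}, False),  # structurally equal to "x"
    ("out", "add", ["x", "y"], {}, True),
])
```

In this example the names "x" and "y" resolve to the same content id, so the built DAG holds three nodes for four named entries.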

Phase 2 — Analyze (Datalog)

  • lib/artdag/analyze.sx — project edges to Datalog; deps-of, dependents-of, transitive reachable (the recursive-reachability shape)
  • dirty propagation: given a set of changed nodes, compute the transitive set of dependents that must recompute (dirty-closure)
  • lib/artdag/tests/analyze.sx — deep chains, diamonds, dirty closure correctness, unaffected nodes stay clean
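
The dirty closure is the usual recursive-reachability query: dirty(X) holds if X changed, and dirty(Y) holds if Y depends on some dirty X. The Python sketch below evaluates that rule as a frontier-based fixpoint. It only illustrates the expected result; the plan is to express this in Datalog, and the function name and the `deps` shape are assumptions.

```python
def dirty_closure(deps, changed):
    """Return every node that must recompute after `changed` nodes change.

    deps maps each node to the list of nodes it depends on.
    """
    # Invert the edges: for each node, who depends on it directly?
    dependents = {}
    for node, inputs in deps.items():
        for dep in inputs:
            dependents.setdefault(dep, set()).add(node)

    dirty = set(changed)
    frontier = set(changed)
    # Each round only expands nodes found in the previous round.
    while frontier:
        reached = set()
        for node in frontier:
            reached |= dependents.get(node, set())
        frontier = reached - dirty
        dirty |= frontier
    return dirty


# Diamond a -> (b, c) -> d, plus an unrelated node e.
diamond = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": []}
from_b = dirty_closure(diamond, {"b"})
from_a = dirty_closure(diamond, {"a"})
```

Changing "b" dirties only "b" and "d"; changing "a" dirties the whole diamond, and "e" stays clean in both cases.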

Phase 3 — Plan

  • lib/artdag/plan.sx — schedule into topological batches (each batch's nodes have all deps satisfied → run in parallel); respect a max-parallelism limit
  • plan over the dirty subset only (incremental plan)
  • lib/artdag/tests/plan.sx — batch correctness, parallelism cap, dirty-only plan
  • (optional/later) miniKanren constraint scheduling — flag, don't block on it
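
A sketch of topological batching under a parallelism cap, in Python for illustration. The signature `plan(deps, max_parallel, only=None)` is hypothetical; `only` stands for the dirty subset, and nodes outside it are treated as already satisfied (their results are assumed to be cached).

```python
def plan(deps, max_parallel, only=None):
    """Split nodes into batches whose dependencies are all satisfied earlier.

    deps maps each node to the nodes it depends on. If `only` is given,
    just those nodes are scheduled; every other node counts as done.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    targets = set(deps) if only is None else set(only)
    done = set(deps) - targets
    remaining = [n for n in deps if n in targets]  # keeps insertion order
    batches = []
    while remaining:
        ready = [n for n in remaining if all(d in done for d in deps[n])]
        if not ready:
            raise ValueError("cycle or missing dependency")
        batch = ready[:max_parallel]  # respect the parallelism cap
        batches.append(batch)
        done.update(batch)
        remaining = [n for n in remaining if n not in done]
    return batches


diamond = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": []}
full = plan(diamond, max_parallel=2)
serial = plan(diamond, max_parallel=1)
incremental = plan(diamond, max_parallel=2, only={"b", "d"})
```

Nodes that are ready but do not fit under the cap simply roll over to the next batch, so a cap of 1 degrades to a plain topological order.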

Phase 4 — Execute (incremental + memoized)

  • lib/artdag/execute.sx — interpret a plan: each node op runs via perform (mocked op in tests); results keyed by content-id
  • content-addressed memo cache backed by lib/persist/: a node whose content-id already has a stored result is skipped (cache hit)
  • incremental execute: re-running after a leaf change recomputes only the dirty closure; everything else is a cache hit
  • lib/artdag/tests/execute.sx — full run, cache-hit on re-run, incremental recompute touches only dirty nodes (assert recompute count)
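
The interaction between content ids and the memo cache can be sketched as follows. This is a Python illustration under several assumptions: a plain dict stands in for the lib/persist/ store, ops are ordinary functions rather than perform effects, and `node`, `make_dag`, `OPS`, and `execute` are hypothetical names.

```python
import json


def node(op, inputs, params):
    """Return (content_id, spec) for a mock node."""
    spec = {"op": op, "inputs": inputs, "params": params}
    return "node:" + json.dumps(spec, sort_keys=True), spec


def make_dag(leaf_value):
    """Four nodes: two constants, their sum, and double of the second constant."""
    k1, k1_spec = node("const", [], {"value": leaf_value})
    k2, k2_spec = node("const", [], {"value": 10})
    total, total_spec = node("add", [k1, k2], {})
    twice, twice_spec = node("double", [k2], {})
    nodes = {k1: k1_spec, k2: k2_spec, total: total_spec, twice: twice_spec}
    return nodes, [k1, k2, total, twice], total


OPS = {
    "const": lambda value: value,
    "add": lambda a, b: a + b,
    "double": lambda a: a * 2,
}


def execute(nodes, order, ops, cache):
    """Run nodes in topological order; return how many ops actually ran."""
    ran = 0
    for node_id in order:
        if node_id in cache:
            continue  # cache hit: a result for this content id already exists
        spec = nodes[node_id]
        args = [cache[i] for i in spec["inputs"]]
        cache[node_id] = ops[spec["op"]](*args, **spec["params"])
        ran += 1
    return ran


cache = {}
nodes, order, out = make_dag(1)
first = execute(nodes, order, OPS, cache)   # cold run
again = execute(nodes, order, OPS, cache)   # identical DAG, warm cache
nodes2, order2, out2 = make_dag(5)          # one leaf changed
after_change = execute(nodes2, order2, OPS, cache)
```

In this sketch the cold run executes all four ops, the re-run executes none, and the run after the leaf change executes only the changed constant and the add node that depends on it. The dirty closure does not need to be passed in here: a changed leaf gets a new content id, which changes the ids of its dependents, so only those miss the cache.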

Phase 5 — Effect-pipeline optimization

  • lib/artdag/optimize.sx — rewrite the DAG before execution: dead-node elimination (unreachable from outputs), common-subexpression sharing (free from content ids), adjacent-op fusion
  • optimizations are content-id-preserving where semantically identical; assert the optimized DAG produces identical results
  • lib/artdag/tests/optimize.sx — DCE, CSE dedup, fusion equivalence
  • (optional/later) rule-based optimization via maude-on-sx's rewriting engine — flag the integration point, don't block on it
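
Dead-node elimination is the simplest of the three rewrites: keep only what the outputs can reach. A minimal Python sketch, assuming nodes are keyed by content id with an "inputs" list and that `order` is an existing topological order (the function name is hypothetical):

```python
def eliminate_dead(nodes, order, outputs):
    """Drop nodes that no output depends on, preserving topological order."""
    live = set()
    stack = list(outputs)
    # Walk backwards from the outputs along input edges.
    while stack:
        node_id = stack.pop()
        if node_id in live:
            continue
        live.add(node_id)
        stack.extend(nodes[node_id]["inputs"])
    kept_order = [n for n in order if n in live]
    kept_nodes = {n: nodes[n] for n in kept_order}
    return kept_nodes, kept_order


nodes = {
    "a": {"op": "load", "inputs": []},
    "b": {"op": "blur", "inputs": ["a"]},
    "c": {"op": "invert", "inputs": ["a"]},  # not needed by any output
}
kept_nodes, kept_order = eliminate_dead(nodes, ["a", "b", "c"], outputs=["b"])
```

Surviving nodes are untouched, so their content ids and cached results remain valid. Common-subexpression sharing needs no separate pass in this model, since structurally equal nodes already share one id.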

Phase 6 — Federation (shared content-addressed cache)

  • a result computed on one instance is reusable on another by content-id (the L2-registry analog): export/import {content-id → result} with provenance
  • trust gating — accept a remote result only from a trusted peer (mirror the fed trust shape; mock the transport in tests)
  • revocation/invalidation — drop a remote result if its provenance is withdrawn
  • lib/artdag/tests/fed.sx — remote cache hit, trust gating, invalidation
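
One possible shape for trust-gated import and provenance-based invalidation, sketched in Python with the transport mocked as a list of records. The record fields ("content_id", "result", "peer"), the function names, and the policy that a local result is never overwritten are all assumptions for illustration, not a settled design.

```python
def import_results(cache, provenance, exported, trusted_peers):
    """Accept remote results only from trusted peers; return how many were taken."""
    accepted = 0
    for record in exported:
        if record["peer"] not in trusted_peers:
            continue  # trust gate: ignore results from unknown peers
        if record["content_id"] in cache:
            continue  # assumed policy: never overwrite an existing result
        cache[record["content_id"]] = record["result"]
        provenance[record["content_id"]] = record["peer"]
        accepted += 1
    return accepted


def revoke_peer(cache, provenance, peer):
    """Drop every cached result whose provenance is the withdrawn peer."""
    dropped = [cid for cid, source in provenance.items() if source == peer]
    for cid in dropped:
        del cache[cid]
        del provenance[cid]
    return dropped


cache, provenance = {"node:local": 1}, {}
exported = [
    {"content_id": "node:x", "result": 42, "peer": "peer-a"},
    {"content_id": "node:y", "result": 7, "peer": "peer-evil"},
    {"content_id": "node:local", "result": 99, "peer": "peer-a"},
]
accepted = import_results(cache, provenance, exported, trusted_peers={"peer-a"})
hit = cache.get("node:x")
dropped = revoke_peer(cache, provenance, "peer-a")
```

Locally computed results carry no provenance entry in this sketch, so revoking a peer cannot remove them.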

Progress log

  • Phase 1 — DAG model + content addressing (dag suite 20/20). lib/artdag/dag.sx: node {:op :inputs :params :commutative}; artdag/content-id = "node:" + a deterministic canonical serialization of (op, inputs, params) with dict keys sorted (param order-insensitive) and commutative ops' inputs sorted (input order-insensitive); non-commutative inputs ordered. artdag/build takes named entries (name op (input-names) params [commutative?]), validates (dangling refs, cycles via fixpoint topo), resolves input-names→content-ids, dedups identical subgraphs to one node + one id (shared across DAGs), returns {:ok :nodes :names :order}. No host sort/string<? — hand-rolled artdag/str<? over char-codes. Gotcha logged: SX equal? is representation-sensitive (cons-built vs vector lists compare unequal even when identical); = is true structural equality — conformance harness compares with =. lib/artdag/conformance.sh + scoreboard.

Blockers

(none)