Content-addressed node = {:op :inputs :params :commutative}; content-id is a
deterministic canonical serialization (sorted param keys; commutative ops sort
inputs). artdag/build validates dangling/cycles, topo-sorts, dedups identical
subgraphs to one id shared across DAGs. conformance.sh + scoreboard (dag 20/20).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
artdag-on-sx: Content-addressed dataflow DAG engine
art-dag is rose-ash's media-processing engine: a content-addressed DAG of effects,
executed in three phases — Analyze → Plan → Execute. Today it's a separate
Python stack (FastAPI + Celery + JAX + IPFS). Its engine logic — dependency
analysis, scheduling, content-addressed memoization, incremental recompute,
composable s-expression effects — is exactly the kind of declarative, substrate-shaped
work SX excels at, and art-dag already speaks s-expressions (its sexp_effects).
This subsystem rebuilds the engine on SX (not the pixel-pushing): the DAG model, the three-phase pipeline, and the incremental/memoized executor. Media ops themselves (JAX kernels, IPFS pins) stay opaque — modelled as abstract node functions in tests, delegated to injected adapters in production. The win is that the same SX substrates already serve the phases:
- Analyze (deps, reachability, dirtiness) → Datalog (recursive reachability — the acl/relations shape).
- Plan (schedule under constraints) → topological batching now; miniKanren for constraint-based scheduling later (optional).
- Execute (composable effects + content-addressed memo) → SX's own perform/cek-resume + a persist-backed content-addressed result cache; incremental recompute drops the cost of re-rendering to the dirty subgraph.
- Optimize (fuse/dedup effect pipelines) → term rewriting (a later, optional consumer of maude-on-sx's engine; see plans/maude-on-sx.md).
End-state: a content-addressed dataflow engine in lib/artdag/ with analyze, plan,
incremental execute, effect-pipeline optimization, and a shared-cache federation
extension — the SX heart of art-dag, with media kernels and storage injected at the
edges.
Status (rolling)
bash lib/artdag/conformance.sh → 20/20 (1 suite: dag)
Ground rules
- Scope: only lib/artdag/** and plans/artdag-on-sx.md. Do not edit spec/, hosts/, shared/, lib/datalog/**, lib/persist/**, or other lib/<lang>/. You may import the public APIs of lib/datalog/ (analyze) and lib/persist/ (memo cache / result store).
- Design lineage, not code reuse. The existing Python engine lives in the repo's top-level artdag/ (core/ engine, sexp_effects/, l1/ tasks). Read it for design lineage (the 3-phase model, the effect language, content addressing); do not import or port its code. This is a fresh SX implementation.
- Media ops are opaque. A node's op is an abstract SX function over its inputs in tests (e.g. (fn (a b) …)); real JAX/IPFS kernels are injected adapters behind an interface. The engine is about scheduling/memo/incremental, never pixels. Determinism: content ids and tests use only the node spec, never a clock.
- Content addressing is structural. A node's id is a deterministic digest of (op, sorted input-ids, params) so identical subgraphs share an id and a cache slot: the core property. Use a structural digest helper; if a real SHA-256/CID is needed it's an injected host primitive (Blockers if absent), not hand-rolled.
- Shared-file issues → "Blockers" with a minimal repro; do not fix here.
- SX files: sx-tree MCP tools only; sx_validate after every edit.
- Commits: one feature per commit. Keep Progress log updated and tick boxes.
Architecture sketch
DAG spec (nodes + edges)                      rendered results
        │                                            ▲
        ▼                                            │
lib/artdag/dag.sx                             lib/artdag/execute.sx
 — node = {op, inputs, params}                 — effect interp (perform per node)
 — content-id = digest(spec)                   — content-addressed memo (persist)
 — topo order, validate                        — incremental: only dirty nodes
        │                                            ▲
        ▼                                            │
lib/artdag/analyze.sx                         lib/artdag/plan.sx
 — Datalog: deps/dependents/reach              — schedule: topo batches, parallelism
 — dirty propagation (dirty closure)           — (miniKanren constraints, later/opt)
        │                                            ▲
        ▼                                            │
lib/artdag/optimize.sx                        lib/artdag/federation.sx
 — fuse adjacent ops, dead-node elim,          — shared cache by content-id (L2-style)
   CSE (free from content-addressing)            result import/export + provenance/trust
Phase 1 — DAG model + content addressing
- lib/artdag/dag.sx: node {:op :inputs :params}; structural content-id = digest of (op, sorted input-ids, params); build/validate a DAG (no dangling inputs, no accidental cycles); topological order
- identical-subgraph sharing: two structurally-equal nodes get the same id
- lib/artdag/tests/dag.sx: id determinism, subgraph sharing, cycle/dangling rejection, topo order
- lib/artdag/conformance.sh + scoreboard
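The id rule in this phase can be illustrated with a minimal Python sketch. It is a model for discussion, not the SX implementation: the function name `content_id`, the op names (`load`, `blend`, `over`), and the choice of JSON as the canonical form are all assumptions. Following the Progress log, inputs are sorted only when the op is flagged commutative.

```python
import json


def content_id(op, input_ids, params, commutative=False):
    """Return a deterministic structural id for a node spec (illustrative)."""
    # Commutative ops ignore input order; every other op keeps it.
    inputs = sorted(input_ids) if commutative else list(input_ids)
    # sort_keys makes the serialization insensitive to param insertion order.
    canonical = json.dumps([op, inputs, params], sort_keys=True,
                           separators=(",", ":"))
    return "node:" + canonical


# Hypothetical ops, used only to exercise the rule.
a = content_id("load", [], {"path": "in.png"})
b = content_id("load", [], {"path": "in.png"})

blend1 = content_id("blend", [a, "node:x"], {"alpha": 0.5, "mode": "add"},
                    commutative=True)
blend2 = content_id("blend", ["node:x", a], {"mode": "add", "alpha": 0.5},
                    commutative=True)

over1 = content_id("over", [a, "node:x"], {})
over2 = content_id("over", ["node:x", a], {})
```

Here `a` and `b` are equal, so two structurally identical nodes would share one cache slot; `blend1` equals `blend2` despite the reordered inputs and params; `over1` and `over2` differ because input order is significant for a non-commutative op.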
Phase 2 — Analyze (Datalog)
- lib/artdag/analyze.sx: project edges to Datalog; deps-of, dependents-of, transitive reachable (the recursive-reachability shape)
- dirty propagation: given a set of changed nodes, compute the transitive set of dependents that must recompute (dirty-closure)
- lib/artdag/tests/analyze.sx: deep chains, diamonds, dirty closure correctness, unaffected nodes stay clean
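Dirty propagation is a reachability fixpoint over the dependents relation. The sketch below computes the same closure with a plain worklist in Python rather than through Datalog, so it shows the expected result, not the planned mechanism. The name `dirty_closure`, the edge encoding, and the choice to include the changed nodes themselves in the result are assumptions.

```python
from collections import deque


def dirty_closure(edges, changed):
    """edges: (node, dep) pairs meaning `node` depends on `dep`.

    Returns the changed nodes plus every transitive dependent.
    """
    dependents = {}
    for node, dep in edges:
        dependents.setdefault(dep, set()).add(node)

    dirty = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, ()):
            if node not in dirty:  # visit each node once, even in diamonds
                dirty.add(node)
                queue.append(node)
    return dirty


# Diamond a -> {b, c} -> d, plus an unrelated chain e -> f.
edges = [("b", "a"), ("c", "a"), ("d", "b"), ("d", "c"), ("f", "e")]
from_leaf = dirty_closure(edges, ["a"])
from_mid = dirty_closure(edges, ["b"])
```

Changing `a` dirties the whole diamond while `e` and `f` stay clean; changing only `b` dirties `b` and `d`.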
Phase 3 — Plan
- lib/artdag/plan.sx: schedule into topological batches (each batch's nodes have all deps satisfied → run in parallel); respect a max-parallelism limit
- plan over the dirty subset only (incremental plan)
- lib/artdag/tests/plan.sx: batch correctness, parallelism cap, dirty-only plan
- (optional/later) miniKanren constraint scheduling: flag, don't block on it
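A minimal Python sketch of topological batching with a parallelism cap and a dirty-only mode. The name `plan_batches`, the `only` parameter, and the policy of splitting each ready level into chunks of at most `max_parallel` nodes are assumptions; a real planner could pack batches differently.

```python
def plan_batches(deps, max_parallel, only=None):
    """deps: {node: [dependency, ...]} for a validated DAG.

    Returns a list of batches; every node in a batch has all of its
    dependencies in earlier batches (or outside the planned subset).
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    # With `only`, dependencies outside the subset count as already
    # satisfied (they are clean, so their results are cached).
    todo = set(deps) if only is None else set(only)
    batches = []
    while todo:
        ready = sorted(n for n in todo
                       if not any(d in todo for d in deps[n]))
        if not ready:
            raise ValueError("cycle among: %s" % sorted(todo))
        # Split one ready level into chunks that respect the cap.
        for i in range(0, len(ready), max_parallel):
            batches.append(ready[i:i + max_parallel])
        todo -= set(ready)
    return batches


deps = {"a": [], "b": [], "c": [], "d": ["a", "b"], "e": ["d", "c"]}
full_plan = plan_batches(deps, max_parallel=2)
dirty_plan = plan_batches(deps, max_parallel=2, only={"d", "e"})
```

With a cap of 2, the three roots are split into two batches before `d` and `e` run; the dirty-only plan schedules just `d` then `e`.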
Phase 4 — Execute (incremental + memoized)
- lib/artdag/execute.sx: interpret a plan: each node op runs via perform (mocked op in tests); results keyed by content-id
- content-addressed memo cache backed by lib/persist/: a node whose content-id already has a stored result is skipped (cache hit)
- incremental execute: re-running after a leaf change recomputes only the dirty closure; everything else is a cache hit
- lib/artdag/tests/execute.sx: full run, cache-hit on re-run, incremental recompute touches only dirty nodes (assert recompute count)
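The memoized, incremental behaviour can be modelled in a few lines of Python. This is a sketch under stated assumptions: a plain dict stands in for the lib/persist/ store, ops are ordinary functions instead of being run via perform, and the names `run`, `OPS`, and the ops `const`, `add`, `scale` are hypothetical.

```python
import json


def content_id(op, input_ids, params):
    # Illustrative structural id; inputs are kept in order here.
    return "node:" + json.dumps([op, input_ids, params], sort_keys=True)


def run(entries, ops, cache):
    """entries: (name, op, input_names, params) in topological order.

    Returns ({name: result}, number_of_nodes_actually_recomputed).
    """
    ids, results, recomputed = {}, {}, 0
    for name, op, input_names, params in entries:
        cid = content_id(op, [ids[i] for i in input_names], params)
        ids[name] = cid
        if cid not in cache:  # cache miss: run the op and store the result
            args = [results[i] for i in input_names]
            cache[cid] = ops[op](*args, **params)
            recomputed += 1
        results[name] = cache[cid]
    return results, recomputed


OPS = {
    "const": lambda value: value,
    "add": lambda a, b: a + b,
    "scale": lambda a, factor: a * factor,
}
entries = [
    ("x", "const", [], {"value": 2}),
    ("y", "const", [], {"value": 3}),
    ("sum", "add", ["x", "y"], {}),
    ("out", "scale", ["sum"], {"factor": 10}),
]

cache = {}
first, n_first = run(entries, OPS, cache)    # cold cache: every node runs
again, n_again = run(entries, OPS, cache)    # unchanged spec: all hits
edited = [("y", "const", [], {"value": 4}) if e[0] == "y" else e
          for e in entries]
third, n_third = run(edited, OPS, cache)     # only y, sum, out recompute
```

Because a node's id includes its inputs' ids, editing the leaf `y` changes the ids of `sum` and `out` as well, so exactly the dirty closure misses the cache while `x` is still a hit.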
Phase 5 — Effect-pipeline optimization
- lib/artdag/optimize.sx: rewrite the DAG before execution: dead-node elimination (unreachable from outputs), common-subexpression sharing (free from content ids), adjacent-op fusion
- optimizations are content-id-preserving where semantically identical; assert the optimized DAG produces identical results
- lib/artdag/tests/optimize.sx: DCE, CSE dedup, fusion equivalence
- (optional/later) rule-based optimization via maude-on-sx's rewriting engine: flag the integration point, don't block on it
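Of the three rewrites, dead-node elimination is the simplest to pin down: keep only what is reachable backwards from the outputs. A minimal Python sketch follows; the name `eliminate_dead_nodes` and the node names are hypothetical, and fusion is not modelled.

```python
def eliminate_dead_nodes(deps, outputs):
    """deps: {node: [dependency, ...]}; outputs: nodes that must be kept.

    Returns a new mapping containing only nodes reachable from outputs.
    """
    live = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node not in live:
            live.add(node)
            stack.extend(deps[node])  # walk towards the sources
    return {n: ds for n, ds in deps.items() if n in live}


deps = {
    "src": [],
    "blur": ["src"],
    "thumb": ["src"],   # never consumed by an output
    "out": ["blur"],
}
pruned = eliminate_dead_nodes(deps, ["out"])
```

Here `thumb` is dropped while `src`, `blur`, and `out` survive unchanged, so their content ids (and cached results) would be unaffected by the rewrite.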
Phase 6 — Federation (shared content-addressed cache)
- a result computed on one instance is reusable on another by content-id (the L2-registry analog): export/import {content-id → result} with provenance
- trust gating: accept a remote result only from a trusted peer (mirror the fed trust shape; mock the transport in tests)
- revocation/invalidation: drop a remote result if its provenance is withdrawn
- lib/artdag/tests/fed.sx: remote cache hit, trust gating, invalidation
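One possible shape for the trust-gated shared cache, sketched in Python with no transport at all. Everything here is an assumption for illustration: the class name `RemoteCache`, its methods, and modelling provenance as nothing more than the name of the peer that supplied the result.

```python
class RemoteCache:
    """Results imported from peers, keyed by content-id (illustrative)."""

    def __init__(self, trusted_peers):
        self.trusted = set(trusted_peers)
        self.entries = {}  # content-id -> (result, peer that supplied it)

    def import_result(self, cid, result, peer):
        # Trust gating: silently refuse results from unknown peers.
        if peer not in self.trusted:
            return False
        self.entries[cid] = (result, peer)
        return True

    def lookup(self, cid):
        # Returns None on a miss (so None cannot be a stored result here).
        entry = self.entries.get(cid)
        return None if entry is None else entry[0]

    def revoke_peer(self, peer):
        # Invalidation: withdraw trust and drop everything that peer supplied.
        self.trusted.discard(peer)
        self.entries = {c: e for c, e in self.entries.items()
                        if e[1] != peer}


fed = RemoteCache(trusted_peers=["peer-a"])
accepted = fed.import_result("node:1", 42, "peer-a")
rejected = fed.import_result("node:2", 7, "peer-b")
hit_before = fed.lookup("node:1")
fed.revoke_peer("peer-a")
hit_after = fed.lookup("node:1")
```

The three test themes map onto this directly: a remote cache hit (`hit_before`), trust gating (`rejected` is false), and invalidation (`hit_after` is a miss once the peer is revoked).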
Progress log
- Phase 1 — DAG model + content addressing (dag suite 20/20).
  lib/artdag/dag.sx: node {:op :inputs :params :commutative}; artdag/content-id = "node:" + a deterministic canonical serialization of (op, inputs, params) with dict keys sorted (param order-insensitive) and commutative ops' inputs sorted (input order-insensitive); non-commutative inputs ordered.
  artdag/build takes named entries (name op (input-names) params [commutative?]), validates (dangling refs, cycles via fixpoint topo), resolves input-names → content-ids, dedups identical subgraphs to one node + one id (shared across DAGs), returns {:ok :nodes :names :order}.
  No host sort/string<?: hand-rolled artdag/str<? over char-codes.
  Gotcha logged: SX equal? is representation-sensitive (cons-built vs vector lists compare unequal even when identical); = is true structural equality; the conformance harness compares with =.
  lib/artdag/conformance.sh + scoreboard.
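The validation step described in this entry (dangling refs, cycles via fixpoint topo) can be modelled as follows. This Python sketch is a simplified stand-in for artdag/build, not a port of it: it skips content ids and dedup, and the function name `build`, the error tuples, and the op names are assumptions.

```python
def build(entries):
    """entries: list of (name, op, input_names).

    Returns {"ok": True, "order": [...]} or {"ok": False, "error": (...)}.
    """
    by_name = {name: (op, inputs) for name, op, inputs in entries}

    # Dangling check: every input must name a declared entry.
    for name, (_op, inputs) in by_name.items():
        for ref in inputs:
            if ref not in by_name:
                return {"ok": False, "error": ("dangling", name, ref)}

    # Fixpoint topo: keep placing nodes whose inputs are all placed.
    # If a pass places nothing and nodes remain, they sit on a cycle.
    order, placed = [], set()
    progressed = True
    while progressed:
        progressed = False
        for name, (_op, inputs) in by_name.items():
            if name not in placed and all(i in placed for i in inputs):
                order.append(name)
                placed.add(name)
                progressed = True
    if len(order) < len(by_name):
        stuck = sorted(set(by_name) - placed)
        return {"ok": False, "error": ("cycle", stuck)}
    return {"ok": True, "order": order}


good = build([("b", "blur", ["a"]), ("a", "load", [])])
cyclic = build([("a", "f", ["b"]), ("b", "g", ["a"])])
dangling = build([("a", "f", ["missing"])])
```

The fixpoint loop doubles as both the topological sort and the cycle detector, which is why no separate graph traversal is needed.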
Blockers
(none)