# artdag-on-sx: Content-addressed dataflow DAG engine

art-dag is rose-ash's media-processing engine: a content-addressed DAG of effects, executed in three phases, **Analyze → Plan → Execute**. Today it's a separate Python stack (FastAPI + Celery + JAX + IPFS). Its *engine logic* (dependency analysis, scheduling, content-addressed memoization, incremental recompute, composable s-expression effects) is exactly the kind of declarative, substrate-shaped work SX excels at, and art-dag already speaks s-expressions (its `sexp_effects`).

This subsystem rebuilds the **engine** on SX (not the pixel-pushing): the DAG model, the three-phase pipeline, and the incremental/memoized executor. Media ops themselves (JAX kernels, IPFS pins) stay opaque: modelled as abstract node functions in tests, delegated to injected adapters in production.

The win is that the same SX substrates already serve the phases:

- **Analyze** (deps, reachability, dirtiness) → **Datalog** (recursive reachability, the acl/relations shape).
- **Plan** (schedule under constraints) → topological batching now; **miniKanren** for constraint-based scheduling later (optional).
- **Execute** (composable effects + content-addressed memo) → SX's own `perform`/`cek-resume` + a **persist**-backed content-addressed result cache; incremental recompute drops the cost of re-rendering to the dirty subgraph.
- **Optimize** (fuse/dedup effect pipelines) → term rewriting (a later, optional consumer of `maude-on-sx`'s engine; see `plans/maude-on-sx.md`).

End-state: a content-addressed dataflow engine in `lib/artdag/` with analyze, plan, incremental execute, effect-pipeline optimization, and a shared-cache federation extension: the SX heart of art-dag, with media kernels and storage injected at the edges.

## Status (rolling)

`bash lib/artdag/conformance.sh` → **158/158** (10 suites: dag, analyze, plan, execute, optimize, fed, cost, serialize, stats, fault)

Base roadmap (Phases 1–6) COMPLETE. Now extending.
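As a rough illustration of the Plan and Execute phases named in the overview, here is a minimal Python stand-in. It is not the SX implementation: `topo_batches`, `run`, the `ops` table, and the dict-based cache are all hypothetical names chosen for this sketch, and `hashlib`/`json` substitute for the engine's structural digest and `lib/persist/` store. Dirtiness is not computed by a separate Analyze step here; it simply falls out of ids changing.

```python
import hashlib
import json


def topo_batches(dag):
    """Plan: group nodes into batches whose deps are all in earlier batches."""
    done, batches = set(), []
    while len(done) < len(dag):
        ready = sorted(
            name for name in dag
            if name not in done and all(dep in done for dep in dag[name][1])
        )
        if not ready:
            raise ValueError("cycle or dangling input")
        batches.append(ready)
        done.update(ready)
    return batches


def run(dag, ops, cache):
    """Execute: compute each node unless its content id is already cached."""
    ids, results, recomputed = {}, {}, []
    for batch in topo_batches(dag):
        for name in batch:
            op, inputs, params = dag[name]
            # The id covers the op, the *ids* of the inputs, and the params,
            # so a changed leaf changes the id of everything downstream.
            spec = json.dumps([op, [ids[i] for i in inputs], params], sort_keys=True)
            cid = hashlib.sha256(spec.encode("utf-8")).hexdigest()
            ids[name] = cid
            if cid not in cache:
                cache[cid] = ops[op](params, [results[i] for i in inputs])
                recomputed.append(name)
            results[name] = cache[cid]
    return results, recomputed


# Opaque ops, modelled as plain functions (illustrative only).
ops = {
    "const": lambda params, inputs: params["v"],
    "add": lambda params, inputs: sum(inputs),
    "scale": lambda params, inputs: inputs[0] * params["k"],
}
dag = {
    "a": ("const", [], {"v": 1}),
    "b": ("const", [], {"v": 2}),
    "sum": ("add", ["a", "b"], {}),
    "out": ("scale", ["sum"], {"k": 2}),
    "side": ("scale", ["b"], {"k": 3}),
}

cache = {}
_, cold = run(dag, ops, cache)      # every node computed
_, warm = run(dag, ops, cache)      # every node is a cache hit
dag["a"] = ("const", [], {"v": 10})
results, incremental = run(dag, ops, cache)
print(len(cold), len(warm), incremental)
```

In this toy DAG, changing leaf `a` recomputes only `a`, `sum`, and `out`, while `b` and `side` are served from the cache.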
## Ground rules

- **Scope:** only `lib/artdag/**` and `plans/artdag-on-sx.md`. Do **not** edit `spec/`, `hosts/`, `shared/`, `lib/datalog/**`, `lib/persist/**`, or any other directory under `lib/`. You may **import** the public APIs of `lib/datalog/` (analyze) and `lib/persist/` (memo cache / result store).
- **Design lineage, not code reuse.** The existing Python engine lives in the repo's top-level `artdag/` (core/ engine, `sexp_effects/`, l1/ tasks). **Read it for design lineage** (the 3-phase model, the effect language, content addressing), but do **not** import or port its code; this is a fresh SX implementation.
- **Media ops are opaque.** A node's op is an abstract SX function over its inputs in tests (e.g. `(fn (a b) …)`); real JAX/IPFS kernels are injected adapters behind an interface. The engine is about *scheduling/memo/incremental*, never pixels. Determinism: content ids and tests use only the node spec, never a clock.
- **Content addressing is structural.** A node's id is a deterministic digest of `(op, sorted input-ids, params)`, so identical subgraphs share an id and a cache slot. This is the core property. Use a structural digest helper; if a real SHA-256/CID is needed, it's an injected host primitive (log it under Blockers if absent), not hand-rolled.
- **Shared-file issues** → "Blockers" with a minimal repro; do not fix here.
- **SX files:** `sx-tree` MCP tools only; `sx_validate` after every edit.
- **Commits:** one feature per commit. Keep the Progress log updated and tick boxes.
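The structural content-addressing rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SX helper: `content_id` is a hypothetical name, `hashlib.sha256` stands in for the injected host digest, and canonical JSON stands in for the engine's canonical serialization. The `commutative` flag and the `"node:"` prefix follow the Phase 1 entry in the Progress log, which sorts inputs only for commutative ops.

```python
import hashlib
import json


def content_id(op, input_ids, params, commutative=False):
    """Deterministic id derived only from the node spec (no clock, no names)."""
    # Input order is part of the spec unless the op is commutative.
    inputs = sorted(input_ids) if commutative else list(input_ids)
    # sort_keys makes the id insensitive to the order params were written in.
    canonical = json.dumps([op, inputs, params], sort_keys=True, separators=(",", ":"))
    return "node:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = content_id("const", [], {"v": 1})
b = content_id("const", [], {"v": 2})

# Structurally equal nodes share an id, and therefore a cache slot.
blend_1 = content_id("blend", [a, b], {"alpha": 0.5, "mode": "over"}, commutative=True)
blend_2 = content_id("blend", [b, a], {"mode": "over", "alpha": 0.5}, commutative=True)
assert blend_1 == blend_2

# For a non-commutative op, swapping inputs yields a different node.
assert content_id("sub", [a, b], {}) != content_id("sub", [b, a], {})
```

Because each id embeds the ids of its inputs, two DAGs that contain the same subgraph produce the same ids for it, which is what lets a memo cache be shared across DAGs.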
## Architecture sketch

```
DAG spec (nodes + edges)                rendered results
     │                                       ▲
     ▼                                       │
lib/artdag/dag.sx                       lib/artdag/execute.sx
- node = {op, inputs, params}           - effect interp (perform per node)
- content-id = digest(spec)             - content-addressed memo (persist)
- topo order, validate                  - incremental: only dirty nodes
     │                                       ▲
     ▼                                       │
lib/artdag/analyze.sx                   lib/artdag/plan.sx
- Datalog: deps/dependents/reach        - schedule: topo batches, parallelism
- dirty propagation (dirty closure)     - (miniKanren constraints, later/opt)
     │                                       ▲
     ▼                                       │
lib/artdag/optimize.sx                  lib/artdag/federation.sx
- fuse adjacent ops, dead-node elim,    - shared cache by content-id (L2-style)
  CSE (free from content-addressing)      result import/export + provenance/trust
```

## Phase 1 — DAG model + content addressing

- [x] `lib/artdag/dag.sx`: node `{:op :inputs :params}`; structural content-id = digest of `(op, sorted input-ids, params)`; build/validate a DAG (no dangling inputs, no accidental cycles); topological order
- [x] identical-subgraph sharing: two structurally-equal nodes get the same id
- [x] `lib/artdag/tests/dag.sx`: id determinism, subgraph sharing, cycle/dangling rejection, topo order
- [x] `lib/artdag/conformance.sh` + scoreboard

## Phase 2 — Analyze (Datalog)

- [x] `lib/artdag/analyze.sx`: project edges to Datalog; `deps-of`, `dependents-of`, transitive `reachable` (the recursive-reachability shape)
- [x] **dirty propagation:** given a set of changed nodes, compute the transitive set of dependents that must recompute (`dirty-closure`)
- [x] `lib/artdag/tests/analyze.sx`: deep chains, diamonds, dirty closure correctness, unaffected nodes stay clean

## Phase 3 — Plan

- [x] `lib/artdag/plan.sx`: schedule into topological **batches** (each batch's nodes have all deps satisfied → run in parallel); respect a max-parallelism limit
- [x] plan over the *dirty* subset only (incremental plan)
- [x] `lib/artdag/tests/plan.sx`: batch correctness, parallelism cap, dirty-only plan
- [ ] (optional/later) miniKanren constraint scheduling: flag, don't block on it

## Phase 4 — Execute (incremental + memoized)

- [x] `lib/artdag/execute.sx`: interpret a plan: each node op runs via `perform` (mocked op in tests); results keyed by content-id
- [x] **content-addressed memo cache** backed by `lib/persist/`: a node whose content-id already has a stored result is skipped (cache hit)
- [x] **incremental execute:** re-running after a leaf change recomputes only the dirty closure; everything else is a cache hit
- [x] `lib/artdag/tests/execute.sx`: full run, cache-hit on re-run, incremental recompute touches only dirty nodes (assert recompute count)

## Phase 5 — Effect-pipeline optimization

- [x] `lib/artdag/optimize.sx`: rewrite the DAG before execution: dead-node elimination (unreachable from outputs), common-subexpression sharing (free from content ids), adjacent-op fusion
- [x] optimizations are content-id-preserving where semantically identical; assert the optimized DAG produces identical results
- [x] `lib/artdag/tests/optimize.sx`: DCE, CSE dedup, fusion equivalence
- [ ] (optional/later) rule-based optimization via `maude-on-sx`'s rewriting engine: flag the integration point, don't block on it

## Phase 6 — Federation (shared content-addressed cache)

- [x] a result computed on one instance is reusable on another by content-id (the L2-registry analog): export/import `{content-id → result}` with provenance
- [x] trust gating: accept a remote result only from a trusted peer (mirror the fed trust shape; mock the transport in tests)
- [x] revocation/invalidation: drop a remote result if its provenance is withdrawn
- [x] `lib/artdag/tests/fed.sx`: remote cache hit, trust gating, invalidation

## Progress log

- **Ext: public API facade** (`lib/artdag/api.sx`, total 158/158 unchanged). Reference index matching the datalog/persist convention: canonical load order + the full public surface across all 10 modules + `artdag/version`.
- **Ext: fault-tolerant execution** (fault suite 14/14, total 158/158). `lib/artdag/fault.sx`: a node op may fail via `(artdag/fail reason)`; `run-safe` confines the failure to that node + its transitive dependents (independent branches still compute) and NEVER caches a failed result, so a later run with the fault fixed recomputes only the failed closure and cache-hits the good nodes. `failed?`/`fail` markers, `failed-nodes`/`failure-count`/`all-ok?`.
- **Ext: execution stats / cache analytics** (stats suite 12/12, total 144/144). `lib/artdag/stats.sx` over an exec record: `hit-ratio`, `work-recomputed`/`work-saved` (cost-weighted via the cost model), `savings-ratio`, and `exec-summary`. Cold run = 0 hit ratio / all work ran; warm rerun = ratio 1 / all work saved; incremental = saved work counts unchanged nodes, ran work counts the dirty closure.
- **Ext: optimize composition pass** (optimize suite 22/22, total 132/132). `artdag/optimize entries outputs fusible?` fuses the entry list, then DCEs against the output names (sinks survive fusion since they're never absorbed): fewer nodes, identical results. Verified: dead branch dropped + chain fused (4→2), an output that is itself "dead" is retained, no-fusible-set still DCEs.
- **Ext: DAG wire serialization** (serialize suite 13/13, total 128/128). `lib/artdag/serialize.sx`: `dag->wire` emits a topo-ordered list of `(id op inputs params commutative)` records, plain lists with keyword-keyed param dicts, which survive `write-to-string`/`read` (string-keyed node dicts do NOT; and `()` reads back as nil, so `wire->dag` normalizes empty inputs). `wire->dag` reconstructs a runnable dag by content-id (author names dropped); executes identically to the original. `wire-verify` recomputes each record's content-id and rejects tampered ids or mutated params under a stale id (self-verifying transport). `dag->string`/`string->dag` for text transport.
  Gotcha logged: `sx-parse` primitive is unbound in the server env; use `(read (open-input-string s))`.
- **Ext: cost-based scheduling** (cost suite 13/13, total 115/115). `lib/artdag/cost.sx`: an injected `cost-fn (op params)` keeps media-op costs opaque (`const-cost`, `op-cost table`). `critical-path` = longest weighted path (finish-time fold over topo order) = min makespan with unlimited workers. `makespan dag plan cost-fn` sums each batch's slowest node: full plan (cap 0) makespan == critical path, serial (cap 1) == `total-work`. `speedup` = total-work / makespan. Verified weighted paths follow heavy ops and capped makespan never dips below the critical path.
- **Phase 6 — Federation (shared content-addressed cache)** (fed suite 15/15, total 102/102). `lib/artdag/federation.sx`: an instance = `{:cache :prov {cid->origin-peer}}`. `fed-export` dumps the whole cache as `{:cid :result :peer}` records tagged with the exporter's id; `fed-import` accepts only records from trusted peers (trust gating) and records provenance; `fed-pull` imports via an injected `fetch-fn(peer-id)` transport (mocked in tests). Because content-ids are global, a trusted import makes the importer's run a pure cache hit (recompute 0), the L2-registry analog. `fed-invalidate peer` drops every result provenanced to a peer from cache + prov (trust withdrawn → recompute), peer-scoped (other peers' results survive) and leaving locally-computed (un-provenanced) results untouched. ALL 6 PHASES COMPLETE.
- **Phase 5 — Effect-pipeline optimization** (optimize suite 18/18, total 87/87). `lib/artdag/optimize.sx`: `artdag/dce dag outputs` keeps only the outputs plus their transitive ancestors (via analyze), preserving surviving content-ids. `artdag/cse` == build: structural sharing is inherent to content addressing, so identical subexpressions collapse to one node/id and execute once (verified).
  `artdag/fuse entries fusible?` rewrites entries: a maximal 1-to-1 chain of fusible unary ops (predecessor used only by its single consumer, both fusible) collapses into one `artdag/pipeline` node carrying ordered `{:op :params}` stages, fed by the chain head's external input; leaves, fan-out nodes, and non-fusible ops never fuse. `artdag/fusing-runner` wraps a base runner to replay pipeline stages, giving output equivalent to the unfused DAG (asserted). Note: CSE auto-dedup means test fixtures intended as distinct nodes must use distinct op/params.
- **Phase 4 — Execute (incremental + memoized)** (execute suite 15/15, total 69/69). `lib/artdag/execute.sx`: `artdag/execute` folds a plan, computing each node via an injected `runner (op params input-results)` (production = `perform` to JAX/IPFS adapter; tests = a pure op-table) and memoizing the result in a `lib/persist/` kv backend keyed by **content-id**. A node whose content-id is already cached is a hit (skipped). The keystone falls out of content addressing: changing a leaf changes the ids of its whole dirty closure, so re-running the full plan against a warm cache recomputes exactly those nodes and hits the rest, verified by recompute/hit counts (5 cold → 0 on rerun → 3 after one leaf change, sibling reused). Cross-DAG sharing verified: a different DAG containing a shared subgraph cache-hits it. `run`/`run-dirty` helpers; `result-of`/`recompute-count`/`hit-count`/`recomputed` inspection.
- **Phase 3 — Plan** (plan suite 18/18, total 54/54). `lib/artdag/plan.sx`: `artdag/plan` schedules a dag into Kahn-wave topological batches: each batch's nodes have all in-scope deps satisfied by earlier batches, so they run in parallel. A `cap` (>0) splits any wave wider than the cap into consecutive sub-batches; `cap<=0` is unlimited. `artdag/plan-dirty` schedules only the dirty closure: deps outside the scheduled set (clean cache hits) count as already satisfied, so a mid-node change yields just `[[changed]…[downstream]]`.
  Inspection helpers `plan-batches`/`plan-width`/`plan-size`/`plan-flatten`.
- **Phase 2 — Analyze on Datalog** (analyze suite 16/16, total 36/36). `lib/artdag/analyze.sx`: `artdag/edge-facts` projects each `(input-id, node-id)` pair to an `(edge ...)` fact; `artdag/analyze` builds a `dl-program-data` db with the recursive rules `reachable(X,Y) :- edge(X,Y)` and `reachable(X,Z) :- edge(X,Y), reachable(Y,Z)` (the acl/relations reachability shape). Query helpers `deps-of`/`dependents-of` (direct), `reachable-from` (transitive downstream), `ancestors-of` (transitive upstream), all returning sorted id lists. `dirty-closure` builds a db with `dirty(Y) :- edge(X,Y), dirty(X)` seeded by changed-node facts and returns the transitive forward closure. The keystone test confirms that changing a mid node dirties only it + downstream, leaving siblings/upstream clean. Content-ids work as opaque Datalog string constants.
- **Phase 1 — DAG model + content addressing** (dag suite 20/20). `lib/artdag/dag.sx`: node `{:op :inputs :params :commutative}`; `artdag/content-id` = `"node:"` + a deterministic canonical serialization of `(op, inputs, params)` with dict keys sorted (param order-insensitive) and commutative ops' inputs sorted (input order-insensitive); non-commutative inputs keep their given order. `artdag/build` takes named entries `(name op (input-names) params [commutative?])`, validates (dangling refs, cycles via fixpoint topo), resolves input-names→content-ids, dedups identical subgraphs to one node + one id (shared across DAGs), returns `{:ok :nodes :names :order}`. No host `sort`/`string