Files
rose-ash/plans/artdag-on-sx.md
giles 298621e2be
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m11s
artdag: log api facade in plan progress
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:34:30 +00:00

16 KiB
Raw Blame History

artdag-on-sx: Content-addressed dataflow DAG engine

art-dag is rose-ash's media-processing engine: a content-addressed DAG of effects, executed in three phases — Analyze → Plan → Execute. Today it's a separate Python stack (FastAPI + Celery + JAX + IPFS). Its engine logic — dependency analysis, scheduling, content-addressed memoization, incremental recompute, composable s-expression effects — is exactly the kind of declarative, substrate-shaped work SX excels at, and art-dag already speaks s-expressions (its sexp_effects).

This subsystem rebuilds the engine on SX (not the pixel-pushing): the DAG model, the three-phase pipeline, and the incremental/memoized executor. Media ops themselves (JAX kernels, IPFS pins) stay opaque — modelled as abstract node functions in tests, delegated to injected adapters in production. The win is that the same SX substrates already serve the phases:

  • Analyze (deps, reachability, dirtiness) → Datalog (recursive reachability — the acl/relations shape).
  • Plan (schedule under constraints) → topological batching now; miniKanren for constraint-based scheduling later (optional).
  • Execute (composable effects + content-addressed memo) → SX's own perform/cek-resume + a persist-backed content-addressed result cache; incremental recompute drops the cost of re-rendering to the dirty subgraph.
  • Optimize (fuse/dedup effect pipelines) → term rewriting (a later, optional consumer of maude-on-sx's engine — see plans/maude-on-sx.md).

End-state: a content-addressed dataflow engine in lib/artdag/ with analyze, plan, incremental execute, effect-pipeline optimization, and a shared-cache federation extension — the SX heart of art-dag, with media kernels and storage injected at the edges.

Status (rolling)

bash lib/artdag/conformance.sh158/158 (10 suites: dag, analyze, plan, execute, optimize, fed, cost, serialize, stats, fault)

Base roadmap (Phases 16) COMPLETE. Now extending.

Ground rules

  • Scope: only lib/artdag/** and plans/artdag-on-sx.md. Do not edit spec/, hosts/, shared/, lib/datalog/**, lib/persist/**, or other lib/<lang>/. You may import the public APIs of lib/datalog/ (analyze) and lib/persist/ (memo cache / result store).
  • Design lineage, not code reuse. The existing Python engine lives in the repo's top-level artdag/ (core/ engine, sexp_effects/, l1/ tasks). Read it for design lineage (the 3-phase model, the effect language, content addressing) — do not import or port its code; this is a fresh SX implementation.
  • Media ops are opaque. A node's op is an abstract SX function over its inputs in tests (e.g. (fn (a b) …)); real JAX/IPFS kernels are injected adapters behind an interface. The engine is about scheduling/memo/incremental, never pixels. Determinism: content ids and tests use only the node spec, never a clock.
  • Content addressing is structural. A node's id is a deterministic digest of (op, sorted input-ids, params) so identical subgraphs share an id and a cache slot — the core property. Use a structural digest helper; if a real SHA-256/CID is needed it's an injected host primitive (Blockers if absent), not hand-rolled.
  • Shared-file issues → "Blockers" with a minimal repro; do not fix here.
  • SX files: sx-tree MCP tools only; sx_validate after every edit.
  • Commits: one feature per commit. Keep Progress log updated and tick boxes.

Architecture sketch

DAG spec (nodes + edges)                 rendered results
        │                                       ▲
        ▼                                       │
lib/artdag/dag.sx                       lib/artdag/execute.sx
  — node = {op, inputs, params}           — effect interp (perform per node)
  — content-id = digest(spec)             — content-addressed memo (persist)
  — topo order, validate                  — incremental: only dirty nodes
        │                                       ▲
        ▼                                       │
lib/artdag/analyze.sx                   lib/artdag/plan.sx
  — Datalog: deps/dependents/reach        — schedule: topo batches, parallelism
  — dirty propagation (dirty closure)     — (miniKanren constraints, later/opt)
        │                                       ▲
        ▼                                       │
lib/artdag/optimize.sx                  lib/artdag/federation.sx
  — fuse adjacent ops, dead-node elim,     — shared cache by content-id (L2-style)
    CSE (free from content-addressing)       result import/export + provenance/trust

Phase 1 — DAG model + content addressing

  • lib/artdag/dag.sx — node {:op :inputs :params}; structural content-id = digest of (op, sorted input-ids, params); build/validate a DAG (no dangling inputs, no accidental cycles); topological order
  • identical-subgraph sharing: two structurally-equal nodes get the same id
  • lib/artdag/tests/dag.sx — id determinism, subgraph sharing, cycle/dangling rejection, topo order
  • lib/artdag/conformance.sh + scoreboard

Phase 2 — Analyze (Datalog)

  • lib/artdag/analyze.sx — project edges to Datalog; deps-of, dependents-of, transitive reachable (the recursive-reachability shape)
  • dirty propagation: given a set of changed nodes, compute the transitive set of dependents that must recompute (dirty-closure)
  • lib/artdag/tests/analyze.sx — deep chains, diamonds, dirty closure correctness, unaffected nodes stay clean

Phase 3 — Plan

  • lib/artdag/plan.sx — schedule into topological batches (each batch's nodes have all deps satisfied → run in parallel); respect a max-parallelism limit
  • plan over the dirty subset only (incremental plan)
  • lib/artdag/tests/plan.sx — batch correctness, parallelism cap, dirty-only plan
  • (optional/later) miniKanren constraint scheduling — flag, don't block on it

Phase 4 — Execute (incremental + memoized)

  • lib/artdag/execute.sx — interpret a plan: each node op runs via perform (mocked op in tests); results keyed by content-id
  • content-addressed memo cache backed by lib/persist/: a node whose content-id already has a stored result is skipped (cache hit)
  • incremental execute: re-running after a leaf change recomputes only the dirty closure; everything else is a cache hit
  • lib/artdag/tests/execute.sx — full run, cache-hit on re-run, incremental recompute touches only dirty nodes (assert recompute count)

Phase 5 — Effect-pipeline optimization

  • lib/artdag/optimize.sx — rewrite the DAG before execution: dead-node elimination (unreachable from outputs), common-subexpression sharing (free from content ids), adjacent-op fusion
  • optimizations are content-id-preserving where semantically identical; assert the optimized DAG produces identical results
  • lib/artdag/tests/optimize.sx — DCE, CSE dedup, fusion equivalence
  • (optional/later) rule-based optimization via maude-on-sx's rewriting engine — flag the integration point, don't block on it

Phase 6 — Federation (shared content-addressed cache)

  • a result computed on one instance is reusable on another by content-id (the L2-registry analog): export/import {content-id → result} with provenance
  • trust gating — accept a remote result only from a trusted peer (mirror the fed trust shape; mock the transport in tests)
  • revocation/invalidation — drop a remote result if its provenance is withdrawn
  • lib/artdag/tests/fed.sx — remote cache hit, trust gating, invalidation

Progress log

  • Ext: public API facade (lib/artdag/api.sx, total 158/158 unchanged). Reference index matching the datalog/persist convention: canonical load order + the full public surface across all 10 modules + artdag/version.

  • Ext: fault-tolerant execution (fault suite 14/14, total 158/158). lib/artdag/fault.sx: a node op may fail via (artdag/fail reason); run-safe confines the failure to that node + its transitive dependents (independent branches still compute) and NEVER caches a failed result, so a later run with the fault fixed recomputes only the failed closure and cache-hits the good nodes. failed?/fail markers, failed-nodes/failure-count/all-ok?.

  • Ext: execution stats / cache analytics (stats suite 12/12, total 144/144). lib/artdag/stats.sx over an exec record: hit-ratio, work-recomputed/work-saved (cost-weighted via the cost model), savings-ratio, and exec-summary. Cold run = 0 hit ratio / all work ran; warm rerun = ratio 1 / all work saved; incremental = saved work counts unchanged nodes, ran work counts the dirty closure.

  • Ext: optimize composition pass (optimize suite 22/22, total 132/132). artdag/optimize entries outputs fusible? fuses the entry list then DCEs against the output names (sinks survive fusion since they're never absorbed) — fewer nodes, identical results. Verified: dead branch dropped + chain fused (4→2), an output that is itself "dead" is retained, no-fusible-set still DCEs.

  • Ext: DAG wire serialization (serialize suite 13/13, total 128/128). lib/artdag/serialize.sx: dag->wire emits a topo-ordered list of (id op inputs params commutative) records — plain lists with keyword-keyed param dicts, which survive write-to-string/read (string-keyed node dicts do NOT; and () reads back as nil, so wire->dag normalizes empty inputs). wire->dag reconstructs a runnable dag by content-id (author names dropped); executes identically to the original. wire-verify recomputes each record's content-id and rejects tampered ids or mutated params under a stale id (self-verifying transport). dag->string/string->dag for text transport. Gotcha logged: sx-parse primitive is unbound in the server env — use (read (open-input-string s)).

  • Ext: cost-based scheduling (cost suite 13/13, total 115/115). lib/artdag/cost.sx: an injected cost-fn (op params) keeps media-op costs opaque (const-cost, op-cost table). critical-path = longest weighted path (finish-time fold over topo order) = min makespan with unlimited workers. makespan dag plan cost-fn sums each batch's slowest node — full plan (cap 0) makespan == critical path, serial (cap 1) == total-work. speedup = total-work / makespan. Verified weighted paths follow heavy ops and capped makespan never dips below the critical path.

  • Phase 6 — Federation (shared content-addressed cache) (fed suite 15/15, total 102/102). lib/artdag/federation.sx: an instance = {:cache <persist kv> :prov {cid->origin-peer}}. fed-export dumps the whole cache as {:cid :result :peer} records tagged with the exporter's id; fed-import accepts only records from trusted peers (trust gating) and records provenance; fed-pull imports via an injected fetch-fn(peer-id) transport (mocked in tests). Because content-ids are global, a trusted import makes the importer's run a pure cache hit (recompute 0) — the L2-registry analog. fed-invalidate peer drops every result provenanced to a peer from cache + prov (trust withdrawn → recompute), peer-scoped (other peers' results survive) and leaving locally-computed (un-provenanced) results untouched. ALL 6 PHASES COMPLETE.

  • Phase 5 — Effect-pipeline optimization (optimize suite 18/18, total 87/87). lib/artdag/optimize.sx: artdag/dce dag outputs keeps only the outputs plus their transitive ancestors (via analyze), preserving surviving content-ids. artdag/cse == build — structural sharing is inherent to content addressing, so identical subexpressions collapse to one node/id and execute once (verified). artdag/fuse entries fusible? rewrites entries: a maximal 1-to-1 chain of fusible unary ops (predecessor used only by its single consumer, both fusible) collapses into one artdag/pipeline node carrying ordered {:op :params} stages, fed by the chain head's external input; leaves, fan-out nodes, and non-fusible ops never fuse. artdag/fusing-runner wraps a base runner to replay pipeline stages — output equivalent to the unfused DAG (asserted). Note: CSE auto-dedup means test fixtures intended as distinct nodes must use distinct op/params.

  • Phase 4 — Execute (incremental + memoized) (execute suite 15/15, total 69/69). lib/artdag/execute.sx: artdag/execute folds a plan, computing each node via an injected runner (op params input-results) (production = perform to JAX/IPFS adapter; tests = a pure op-table) and memoizing the result in a lib/persist/ kv backend keyed by content-id. A node whose content-id is already cached is a hit (skipped). The keystone falls out of content addressing: changing a leaf changes the ids of its whole dirty closure, so re-running the full plan against a warm cache recomputes exactly those nodes and hits the rest — verified by recompute/hit counts (5 cold → 0 on rerun → 3 after one leaf change, sibling reused). Cross-DAG sharing verified: a different DAG containing a shared subgraph cache-hits it. run/run-dirty helpers; result-of/recompute-count/hit-count/recomputed inspection.

  • Phase 3 — Plan (plan suite 18/18, total 54/54). lib/artdag/plan.sx: artdag/plan schedules a dag into Kahn-wave topological batches — each batch's nodes have all in-scope deps satisfied by earlier batches, so they run in parallel. A cap (>0) splits any wave wider than the cap into consecutive sub-batches; cap<=0 is unlimited. artdag/plan-dirty schedules only the dirty closure: deps outside the scheduled set (clean cache hits) count as already satisfied, so a mid-node change yields just [[changed]…[downstream]]. Inspection helpers plan-batches/plan-width/plan-size/plan-flatten.

  • Phase 2 — Analyze on Datalog (analyze suite 16/16, total 36/36). lib/artdag/analyze.sx: artdag/edge-facts projects each (input-id, node-id) pair to an (edge ...) fact; artdag/analyze builds a dl-program-data db with recursive reachable(X,Y) :- edge(X,Y); edge(X,Y),reachable(Y,Z) (the acl/relations reachability shape). Query helpers deps-of/dependents-of (direct), reachable-from (transitive downstream), ancestors-of (transitive upstream), all returning sorted id lists. dirty-closure builds a db with dirty(Y) :- edge(X,Y), dirty(X) seeded by changed-node facts and returns the transitive forward closure — keystone test confirms changing a mid node dirties only it + downstream, leaving siblings/upstream clean. Content-ids work as opaque Datalog string constants.

  • Phase 1 — DAG model + content addressing (dag suite 20/20). lib/artdag/dag.sx: node {:op :inputs :params :commutative}; artdag/content-id = "node:" + a deterministic canonical serialization of (op, inputs, params) with dict keys sorted (param order-insensitive) and commutative ops' inputs sorted (input order-insensitive); non-commutative inputs ordered. artdag/build takes named entries (name op (input-names) params [commutative?]), validates (dangling refs, cycles via fixpoint topo), resolves input-names→content-ids, dedups identical subgraphs to one node + one id (shared across DAGs), returns {:ok :nodes :names :order}. No host sort/string<? — hand-rolled artdag/str<? over char-codes. Gotcha logged: SX equal? is representation-sensitive (cons-built vs vector lists compare unequal even when identical); = is true structural equality — conformance harness compares with =. lib/artdag/conformance.sh + scoreboard.

Blockers

(none)