artdag-on-sx: Content-addressed dataflow DAG engine
art-dag is rose-ash's media-processing engine: a content-addressed DAG of effects,
executed in three phases — Analyze → Plan → Execute. Today it's a separate
Python stack (FastAPI + Celery + JAX + IPFS). Its engine logic — dependency
analysis, scheduling, content-addressed memoization, incremental recompute,
composable s-expression effects — is exactly the kind of declarative, substrate-shaped
work SX excels at, and art-dag already speaks s-expressions (its sexp_effects).
This subsystem rebuilds the engine on SX (not the pixel-pushing): the DAG model, the three-phase pipeline, and the incremental/memoized executor. Media ops themselves (JAX kernels, IPFS pins) stay opaque — modelled as abstract node functions in tests, delegated to injected adapters in production. The win is that the same SX substrates already serve the phases:
- Analyze (deps, reachability, dirtiness) → Datalog (recursive reachability — the acl/relations shape).
- Plan (schedule under constraints) → topological batching now; miniKanren for constraint-based scheduling later (optional).
- Execute (composable effects + content-addressed memo) → SX's own perform/cek-resume + a persist-backed content-addressed result cache; incremental recompute drops the cost of re-rendering to the dirty subgraph.
- Optimize (fuse/dedup effect pipelines) → term rewriting (a later, optional consumer of maude-on-sx's engine; see plans/maude-on-sx.md).
End-state: a content-addressed dataflow engine in lib/artdag/ with analyze, plan,
incremental execute, effect-pipeline optimization, and a shared-cache federation
extension — the SX heart of art-dag, with media kernels and storage injected at the
edges.
Status (rolling)
bash lib/artdag/conformance.sh → 0/0 (not yet started)
Ground rules
- Scope: only lib/artdag/** and plans/artdag-on-sx.md. Do not edit spec/, hosts/, shared/, lib/datalog/**, lib/persist/**, or other lib/<lang>/. You may import the public APIs of lib/datalog/ (analyze) and lib/persist/ (memo cache / result store).
- Design lineage, not code reuse. The existing Python engine lives in the repo's top-level artdag/ (core/ engine, sexp_effects/, l1/ tasks). Read it for design lineage (the 3-phase model, the effect language, content addressing); do not import or port its code. This is a fresh SX implementation.
- Media ops are opaque. A node's op is an abstract SX function over its inputs in tests (e.g. (fn (a b) …)); real JAX/IPFS kernels are injected adapters behind an interface. The engine is about scheduling/memo/incremental, never pixels. Determinism: content ids and tests use only the node spec, never a clock.
- Content addressing is structural. A node's id is a deterministic digest of (op, sorted input-ids, params), so identical subgraphs share an id and a cache slot. This is the core property. Use a structural digest helper; if a real SHA-256/CID is needed it's an injected host primitive (Blockers if absent), not hand-rolled.
- Shared-file issues → "Blockers" with a minimal repro; do not fix here.
- SX files: sx-tree MCP tools only; sx_validate after every edit.
- Commits: one feature per commit. Keep Progress log updated and tick boxes.
Architecture sketch
DAG spec (nodes + edges)                   rendered results
        │                                          ▲
        ▼                                          │
lib/artdag/dag.sx                          lib/artdag/execute.sx
 - node = {op, inputs, params}              - effect interp (perform per node)
 - content-id = digest(spec)                - content-addressed memo (persist)
 - topo order, validate                     - incremental: only dirty nodes
        │                                          ▲
        ▼                                          │
lib/artdag/analyze.sx                      lib/artdag/plan.sx
 - Datalog: deps/dependents/reach           - schedule: topo batches, parallelism
 - dirty propagation (dirty closure)        - (miniKanren constraints, later/opt)
        │                                          ▲
        ▼                                          │
lib/artdag/optimize.sx                     lib/artdag/federation.sx
 - fuse adjacent ops, dead-node elim,       - shared cache by content-id (L2-style)
   CSE (free from content-addressing)         result import/export + provenance/trust
Phase 1 — DAG model + content addressing
- lib/artdag/dag.sx: node {:op :inputs :params}; structural content-id = digest of (op, sorted input-ids, params); build/validate a DAG (no dangling inputs, no accidental cycles); topological order
- identical-subgraph sharing: two structurally-equal nodes get the same id
- lib/artdag/tests/dag.sx: id determinism, subgraph sharing, cycle/dangling rejection, topo order
- lib/artdag/conformance.sh + scoreboard
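The structural content-id can be illustrated with a minimal sketch. It is written in Python rather than SX, purely to show the idea; the function name `content_id`, the JSON encoding, and the use of `hashlib` are assumptions for illustration (the plan itself calls for a structural digest helper or an injected host primitive), and the op names and params are made up.

```python
import hashlib
import json


def content_id(op, input_ids, params):
    """Digest of (op, sorted input-ids, params); hypothetical helper."""
    spec = [op, sorted(input_ids), params]
    # sort_keys makes the encoding independent of dict insertion order,
    # so structurally equal params always serialize to the same bytes.
    encoded = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


# Hypothetical nodes: one source and three blur nodes built on top of it.
src = content_id("load", [], {"path": "a.png"})
blur_a = content_id("blur", [src], {"radius": 2, "mode": "gauss"})
blur_b = content_id("blur", [src], {"mode": "gauss", "radius": 2})
blur_c = content_id("blur", [src], {"radius": 3, "mode": "gauss"})
```

Here `blur_a` and `blur_b` describe the same node and so share an id (and would share a cache slot), while `blur_c` differs in one param and gets a different id. Because input ids are sorted, the id does not depend on input order.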
Phase 2 — Analyze (Datalog)
- lib/artdag/analyze.sx: project edges to Datalog; deps-of, dependents-of, transitive reachable (the recursive-reachability shape)
- dirty propagation: given a set of changed nodes, compute the transitive set of dependents that must recompute (dirty-closure)
- lib/artdag/tests/analyze.sx: deep chains, diamonds, dirty closure correctness, unaffected nodes stay clean
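Dirty propagation is a recursive reachability query over the reversed dependency edges. The sketch below computes the same set with a plain worklist in Python instead of Datalog; the name `dirty_closure`, the edge representation, and the choice to include the changed nodes themselves in the result are assumptions for illustration.

```python
from collections import deque


def dirty_closure(edges, changed):
    """edges: (node, dep) pairs meaning `node` depends on `dep`.

    Roughly equivalent Datalog rules:
        dirty(X) :- changed(X).
        dirty(X) :- depends(X, Y), dirty(Y).
    """
    dependents = {}
    for node, dep in edges:
        dependents.setdefault(dep, set()).add(node)

    dirty = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in dirty:
                dirty.add(dependent)
                queue.append(dependent)
    return dirty


# Diamond a <- b, c <- d, plus an unrelated chain e <- f.
edges = [("b", "a"), ("c", "a"), ("d", "b"), ("d", "c"), ("f", "e")]
from_b = dirty_closure(edges, {"b"})
from_a = dirty_closure(edges, {"a"})
```

Changing `b` dirties only `b` and `d`; changing `a` dirties the whole diamond. In both cases `e` and `f` stay clean, which is the "unaffected nodes stay clean" property the tests check.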
Phase 3 — Plan
- lib/artdag/plan.sx: schedule into topological batches (each batch's nodes have all deps satisfied → run in parallel); respect a max-parallelism limit
- plan over the dirty subset only (incremental plan)
- lib/artdag/tests/plan.sx: batch correctness, parallelism cap, dirty-only plan
- (optional/later) miniKanren constraint scheduling: flag, don't block on it
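Topological batching with a parallelism cap can be sketched as repeated extraction of the ready layer, split into chunks. This is an illustrative Python version, not the SX implementation; `plan_batches`, its `satisfied` parameter (standing in for clean nodes whose results already exist), and the alphabetical tie-break are assumptions.

```python
def plan_batches(deps, max_parallel, satisfied=()):
    """deps: {node: set of dependency nodes}. Returns a list of batches."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")

    remaining = {node: set(d) for node, d in deps.items()}
    done = set(satisfied)
    batches = []
    while remaining:
        # A node is ready once every dependency is already done.
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cycle or dangling input among %s" % sorted(remaining))
        # Split the ready layer so no batch exceeds the parallelism cap.
        for i in range(0, len(ready), max_parallel):
            batches.append(ready[i:i + max_parallel])
        done.update(ready)
        for node in ready:
            del remaining[node]
    return batches


deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}, "e": set()}
full_plan = plan_batches(deps, 2)
serial_plan = plan_batches(deps, 1)
# Dirty-only plan: only b and d must rerun; a and c are clean cache hits.
dirty_plan = plan_batches({"b": {"a"}, "d": {"b", "c"}}, 2, satisfied={"a", "c"})
```

With a cap of 2 the full plan is `[["a", "e"], ["b", "c"], ["d"]]`; the dirty-only plan is `[["b"], ["d"]]` because the clean dependencies count as already satisfied.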
Phase 4 — Execute (incremental + memoized)
- lib/artdag/execute.sx: interpret a plan: each node op runs via perform (mocked op in tests); results keyed by content-id
- content-addressed memo cache backed by lib/persist/: a node whose content-id already has a stored result is skipped (cache hit)
- incremental execute: re-running after a leaf change recomputes only the dirty closure; everything else is a cache hit
- lib/artdag/tests/execute.sx: full run, cache-hit on re-run, incremental recompute touches only dirty nodes (assert recompute count)
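The interaction between content ids and the memo cache can be shown in a small Python sketch. A plain dict stands in for the lib/persist/ store and ordinary function calls stand in for perform; `execute`, `content_id`, and the mock ops are hypothetical names, not the SX API. Since input ids are sorted before hashing, the mock ops are chosen to be insensitive to input order.

```python
import hashlib
import json


def content_id(op, input_ids, params):
    spec = [op, sorted(input_ids), params]
    encoded = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def execute(nodes, order, ops, cache):
    """nodes: {name: (op, [input names], params)}; order: topological list.

    Returns ({name: result}, [names actually recomputed]).
    """
    ids, results, recomputed = {}, {}, []
    for name in order:
        op, inputs, params = nodes[name]
        cid = content_id(op, [ids[i] for i in inputs], params)
        ids[name] = cid
        if cid not in cache:  # cache miss: run the (mock) op
            cache[cid] = ops[op](*[results[i] for i in inputs], **params)
            recomputed.append(name)
        results[name] = cache[cid]
    return results, recomputed


# Mock ops standing in for opaque media kernels.
ops = {
    "const": lambda value: value,
    "add": lambda a, b: a + b,
    "scale": lambda x, factor: x * factor,
}
nodes = {
    "x": ("const", [], {"value": 2}),
    "y": ("const", [], {"value": 3}),
    "sum": ("add", ["x", "y"], {}),
    "out": ("scale", ["sum"], {"factor": 10}),
    "side": ("scale", ["y"], {"factor": 2}),
}
order = ["x", "y", "sum", "out", "side"]
cache = {}

first, ran_first = execute(nodes, order, ops, cache)
second, ran_second = execute(nodes, order, ops, cache)
# Change leaf x: only x, sum and out get new content ids.
changed = dict(nodes, x=("const", [], {"value": 4}))
third, ran_third = execute(changed, order, ops, cache)
```

In this sketch the dirty closure falls out of the ids themselves: changing `x` changes the ids of `x`, `sum`, and `out`, so only those three miss the cache, while `y` and `side` are hits.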
Phase 5 — Effect-pipeline optimization
- lib/artdag/optimize.sx: rewrite the DAG before execution: dead-node elimination (unreachable from outputs), common-subexpression sharing (free from content ids), adjacent-op fusion
- optimizations are content-id-preserving where semantically identical; assert the optimized DAG produces identical results
- lib/artdag/tests/optimize.sx: DCE, CSE dedup, fusion equivalence
- (optional/later) rule-based optimization via maude-on-sx's rewriting engine: flag the integration point, don't block on it
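Of the rewrites listed above, dead-node elimination is the simplest to illustrate: keep only what is reachable backwards from the requested outputs. The Python sketch below covers that one pass only (fusion and CSE are not shown); `eliminate_dead_nodes`, the node layout, and the op names are hypothetical.

```python
def eliminate_dead_nodes(nodes, outputs):
    """nodes: {name: (op, [input names], params)}.

    Returns the sub-DAG reachable backwards from `outputs`.
    """
    live = set()
    stack = list(outputs)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        stack.extend(nodes[name][1])  # walk into this node's inputs
    return {name: spec for name, spec in nodes.items() if name in live}


# Hypothetical pipeline: "thumb" is computed but feeds no output.
nodes = {
    "src": ("load", [], {}),
    "blur": ("blur", ["src"], {"radius": 2}),
    "thumb": ("resize", ["src"], {"width": 64}),
    "out": ("overlay", ["blur", "src"], {}),
}
optimized = eliminate_dead_nodes(nodes, ["out"])
```

The surviving node specs are returned unchanged, so their content ids (and any cached results) are preserved, which is what "content-id-preserving" asks of this pass.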
Phase 6 — Federation (shared content-addressed cache)
- a result computed on one instance is reusable on another by content-id (the L2-registry analog): export/import {content-id → result} with provenance
- trust gating: accept a remote result only from a trusted peer (mirror the fed trust shape; mock the transport in tests)
- revocation/invalidation: drop a remote result if its provenance is withdrawn
- lib/artdag/tests/fed.sx: remote cache hit, trust gating, invalidation
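The three federation behaviours (remote cache hit, trust gating, invalidation) can be sketched with an in-memory cache that records where each result came from. This is a Python illustration under stated assumptions: the class `FederatedCache`, its method names, and the reading of "provenance withdrawn" as "the peer is revoked" are all hypothetical, and no transport is modelled.

```python
class FederatedCache:
    """Content-addressed result cache with per-entry provenance."""

    def __init__(self, trusted_peers):
        self.trusted = set(trusted_peers)
        self.results = {}     # content-id -> result
        self.provenance = {}  # content-id -> peer name, or None if local

    def put_local(self, cid, result):
        self.results[cid] = result
        self.provenance[cid] = None

    def import_remote(self, peer, entries):
        """Accept {content-id: result} only from a trusted peer."""
        if peer not in self.trusted:
            return 0
        accepted = 0
        for cid, result in entries.items():
            if cid not in self.results:  # never overwrite an existing result
                self.results[cid] = result
                self.provenance[cid] = peer
                accepted += 1
        return accepted

    def revoke_peer(self, peer):
        """Withdraw trust and drop every result that came from `peer`."""
        self.trusted.discard(peer)
        for cid in [c for c, p in self.provenance.items() if p == peer]:
            del self.results[cid]
            del self.provenance[cid]

    def get(self, cid):
        return self.results.get(cid)


cache = FederatedCache(trusted_peers={"peer-a"})
cache.put_local("id-local", "L")
accepted_trusted = cache.import_remote("peer-a", {"id-1": "A1", "id-local": "X"})
accepted_untrusted = cache.import_remote("peer-z", {"id-2": "Z2"})
hit_before = cache.get("id-1")
cache.revoke_peer("peer-a")
hit_after = cache.get("id-1")
```

Local results survive revocation because their provenance is local; only entries imported from the revoked peer are dropped.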
Progress log
(loop fills this in)
Blockers
(loop fills this in)