rose-ash/plans/artdag-on-sx.md
giles e3932237bd plans: briefings for 5 language chisels + host/relations/artdag/dream
Language-chisel briefings (plans already existed): elixir, idris, linear, maude,
probabilistic. host-on-sx briefing (native server now, Dream framework layer next).
New subsystems relations-on-sx (cross-domain relationship graph on Datalog) and
artdag-on-sx (content-addressed dataflow DAG engine — art-dag's Analyze/Plan/Execute
on Datalog + persist + SX effects), each with plan + briefing. Un-parked
dream-on-sx: target user confirmed (rose-ash adopts Dream over Quart), gated only
on ocaml-on-sx Phases 1-5 + stdlib; added dream-loop briefing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 09:57:46 +00:00


# artdag-on-sx: Content-addressed dataflow DAG engine
art-dag is rose-ash's media-processing engine: a content-addressed DAG of effects,
executed in three phases — **Analyze → Plan → Execute**. Today it's a separate
Python stack (FastAPI + Celery + JAX + IPFS). Its *engine logic* — dependency
analysis, scheduling, content-addressed memoization, incremental recompute,
composable s-expression effects — is exactly the kind of declarative, substrate-shaped
work SX excels at, and art-dag already speaks s-expressions (its `sexp_effects`).
This subsystem rebuilds the **engine** on SX (not the pixel-pushing): the DAG model,
the three-phase pipeline, and the incremental/memoized executor. Media ops
themselves (JAX kernels, IPFS pins) stay opaque — modelled as abstract node
functions in tests, delegated to injected adapters in production. The win is that
existing SX substrates already cover each phase:
- **Analyze** (deps, reachability, dirtiness) → **Datalog** (recursive reachability —
the acl/relations shape).
- **Plan** (schedule under constraints) → topological batching now; **miniKanren**
for constraint-based scheduling later (optional).
- **Execute** (composable effects + content-addressed memo) → SX's own
`perform`/`cek-resume` + a **persist**-backed content-addressed result cache;
incremental recompute drops the cost of re-rendering to the dirty subgraph.
- **Optimize** (fuse/dedup effect pipelines) → term rewriting (a later, optional
consumer of `maude-on-sx`'s engine — see `plans/maude-on-sx.md`).
End-state: a content-addressed dataflow engine in `lib/artdag/` with analyze, plan,
incremental execute, effect-pipeline optimization, and a shared-cache federation
extension — the SX heart of art-dag, with media kernels and storage injected at the
edges.
## Status (rolling)
`bash lib/artdag/conformance.sh` → **0/0** (not yet started)
## Ground rules
- **Scope:** only `lib/artdag/**` and `plans/artdag-on-sx.md`. Do **not** edit
`spec/`, `hosts/`, `shared/`, `lib/datalog/**`, `lib/persist/**`, or other
`lib/<lang>/`. You may **import** the public APIs of `lib/datalog/` (analyze) and
`lib/persist/` (memo cache / result store).
- **Design lineage, not code reuse.** The existing Python engine lives in the
repo's top-level `artdag/` (core/ engine, `sexp_effects/`, l1/ tasks). **Read it
for design lineage** (the 3-phase model, the effect language, content addressing)
— do **not** import or port its code; this is a fresh SX implementation.
- **Media ops are opaque.** A node's op is an abstract SX function over its inputs
in tests (e.g. `(fn (a b) …)`); real JAX/IPFS kernels are injected adapters
behind an interface. The engine is about *scheduling/memo/incremental*, never
pixels. Determinism: content ids and tests use only the node spec, never a clock.
- **Content addressing is structural.** A node's id is a deterministic digest of
`(op, sorted input-ids, params)` so identical subgraphs share an id and a cache
slot, which is the core property. Use a structural digest helper; if a real SHA-256/CID
is needed, it is an injected host primitive (record it under Blockers if absent), not hand-rolled.
- **Shared-file issues** → "Blockers" with a minimal repro; do not fix here.
- **SX files:** `sx-tree` MCP tools only; `sx_validate` after every edit.
- **Commits:** one feature per commit. Keep Progress log updated and tick boxes.
## Architecture sketch
```
DAG spec (nodes + edges)                    rendered results
        │                                           ▲
        ▼                                           │
lib/artdag/dag.sx                           lib/artdag/execute.sx
 — node = {op, inputs, params}               — effect interp (perform per node)
 — content-id = digest(spec)                 — content-addressed memo (persist)
 — topo order, validate                      — incremental: only dirty nodes
        │                                           ▲
        ▼                                           │
lib/artdag/analyze.sx                       lib/artdag/plan.sx
 — Datalog: deps/dependents/reach            — schedule: topo batches, parallelism
 — dirty propagation (dirty closure)         — (miniKanren constraints, later/opt)
        │                                           ▲
        ▼                                           │
lib/artdag/optimize.sx                      lib/artdag/federation.sx
 — fuse adjacent ops, dead-node elim,        — shared cache by content-id (L2-style)
   CSE (free from content-addressing)          result import/export + provenance/trust
```
## Phase 1 — DAG model + content addressing
- [ ] `lib/artdag/dag.sx` — node `{:op :inputs :params}`; structural content-id =
digest of `(op, sorted input-ids, params)`; build/validate a DAG (no dangling
inputs, no accidental cycles); topological order
- [ ] identical-subgraph sharing: two structurally-equal nodes get the same id
- [ ] `lib/artdag/tests/dag.sx` — id determinism, subgraph sharing, cycle/dangling
rejection, topo order
- [ ] `lib/artdag/conformance.sh` + scoreboard
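
The Phase 1 items can be illustrated with a minimal Python sketch. This is not the SX implementation: `content_id`, `topo_order`, and `assign_ids` are hypothetical names, nodes are plain dicts keyed by an illustrative node name, and `hashlib.sha256` over canonical JSON stands in for the structural digest helper. Sorting the input ids follows the plan's wording, so ids in this sketch are insensitive to input order.

```python
import hashlib
import json


def content_id(op, input_ids, params):
    """Digest of (op, sorted input-ids, params); hypothetical helper."""
    spec = [op, sorted(input_ids), params]
    blob = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]


def topo_order(nodes):
    """Kahn-style ordering; rejects dangling inputs and cycles."""
    for name, node in nodes.items():
        for dep in node["inputs"]:
            if dep not in nodes:
                raise ValueError(f"dangling input {dep!r} on node {name!r}")
    pending = {name: set(node["inputs"]) for name, node in nodes.items()}
    order = []
    while pending:
        ready = sorted(n for n, deps in pending.items() if not deps)
        if not ready:
            raise ValueError("cycle detected")
        for n in ready:
            order.append(n)
            del pending[n]
        for deps in pending.values():
            deps.difference_update(ready)
    return order


def assign_ids(nodes):
    """Compute content ids bottom-up so each id covers its whole subgraph."""
    ids = {}
    for name in topo_order(nodes):
        node = nodes[name]
        ids[name] = content_id(
            node["op"], [ids[d] for d in node["inputs"]], node["params"]
        )
    return ids


# Two structurally equal subgraphs (a -> c and b -> d) share ids.
nodes = {
    "a": {"op": "load", "inputs": [], "params": {"src": "x"}},
    "b": {"op": "load", "inputs": [], "params": {"src": "x"}},
    "c": {"op": "blur", "inputs": ["a"], "params": {"r": 2}},
    "d": {"op": "blur", "inputs": ["b"], "params": {"r": 2}},
}
ids = assign_ids(nodes)
```

Because each id is built from the ids of its inputs, structural equality propagates upward: `a` and `b` share an id, and so do `c` and `d`.
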
## Phase 2 — Analyze (Datalog)
- [ ] `lib/artdag/analyze.sx` — project edges to Datalog; `deps-of`, `dependents-of`,
transitive `reachable` (the recursive-reachability shape)
- [ ] **dirty propagation:** given a set of changed nodes, compute the transitive
set of dependents that must recompute (`dirty-closure`)
- [ ] `lib/artdag/tests/analyze.sx` — deep chains, diamonds, dirty closure
correctness, unaffected nodes stay clean
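
As a rough illustration of dirty propagation, the sketch below computes the closure with a plain Python worklist. It stands in for the Datalog rules the plan calls for (roughly `dirty(X) :- changed(X).` and `dirty(Y) :- dirty(X), edge(X, Y).`); it does not use `lib/datalog/`, and the function name and the edge representation are assumptions made for the example.

```python
from collections import defaultdict, deque


def dirty_closure(edges, changed):
    """Return the changed nodes plus all of their transitive dependents.

    edges: iterable of (dep, dependent) pairs, meaning `dependent` reads `dep`.
    """
    dependents = defaultdict(set)
    for dep, dependent in edges:
        dependents[dep].add(dependent)

    dirty = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for nxt in dependents[node]:
            if nxt not in dirty:
                dirty.add(nxt)
                queue.append(nxt)
    return dirty


# Diamond a -> {b, c} -> d, plus an unrelated chain e -> f.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("e", "f")]
closure = dirty_closure(edges, {"b"})
```

Changing `b` dirties `b` and `d` only; `a`, `c`, `e`, and `f` stay clean, which is the property the analyze tests are meant to check.
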
## Phase 3 — Plan
- [ ] `lib/artdag/plan.sx` — schedule into topological **batches** (each batch's
nodes have all deps satisfied → run in parallel); respect a max-parallelism limit
- [ ] plan over the *dirty* subset only (incremental plan)
- [ ] `lib/artdag/tests/plan.sx` — batch correctness, parallelism cap, dirty-only plan
- [ ] (optional/later) miniKanren constraint scheduling — flag, don't block on it
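
One possible shape for batch scheduling, sketched in Python under assumptions: `plan_batches` is a hypothetical name, dependencies are given as a dict of sets, and any dependency outside the dirty subset is treated as already satisfied by the cache. Ties are broken by sorting names so the plan is deterministic.

```python
def plan_batches(deps, max_parallel, dirty=None):
    """Split nodes into ordered batches of at most max_parallel nodes.

    deps: node -> set of nodes it depends on.
    dirty: optional subset to plan over; other nodes count as satisfied.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    todo = set(deps) if dirty is None else set(dirty)
    # Only dependencies that are themselves scheduled can block a node.
    remaining = {n: {d for d in deps[n] if d in todo} for n in todo}

    batches = []
    while remaining:
        ready = sorted(n for n, ds in remaining.items() if not ds)
        if not ready:
            raise ValueError("cycle detected")
        batch = ready[:max_parallel]  # enforce the parallelism cap
        batches.append(batch)
        for n in batch:
            del remaining[n]
        for ds in remaining.values():
            ds.difference_update(batch)
    return batches


deps = {"a": set(), "b": set(), "c": set(), "d": {"a", "b"}, "e": {"c", "d"}}
full_plan = plan_batches(deps, max_parallel=2)
dirty_plan = plan_batches(deps, max_parallel=2, dirty={"d", "e"})
```

With a cap of 2, the three independent roots cannot all run in the first batch, so `c` slides into the second batch alongside `d`. The dirty-only plan schedules just `d` and then `e`.
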
## Phase 4 — Execute (incremental + memoized)
- [ ] `lib/artdag/execute.sx` — interpret a plan: each node op runs via `perform`
(mocked op in tests); results keyed by content-id
- [ ] **content-addressed memo cache** backed by `lib/persist/`: a node whose
content-id already has a stored result is skipped (cache hit)
- [ ] **incremental execute:** re-running after a leaf change recomputes only the
dirty closure; everything else is a cache hit
- [ ] `lib/artdag/tests/execute.sx` — full run, cache-hit on re-run, incremental
recompute touches only dirty nodes (assert recompute count)
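
The memoized, incremental behaviour can be sketched in Python as follows. Everything here is an assumption for illustration: a plain dict stands in for the `lib/persist/` store, a direct function call stands in for `perform`, and `content_id` and `execute` are hypothetical names. In this sketch the dirty closure is not computed explicitly; a changed leaf gets a new content id, its dependents therefore get new ids, and those are exactly the cache misses.

```python
import hashlib
import json


def content_id(op, input_ids, params):
    """Digest of (op, sorted input-ids, params); hypothetical helper."""
    blob = json.dumps([op, sorted(input_ids), params], sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]


def execute(nodes, order, ops, cache):
    """Run nodes in dependency order, skipping any whose id is cached.

    Returns (results by node name, names that were actually recomputed).
    """
    ids, results, recomputed = {}, {}, []
    for name in order:
        node = nodes[name]
        cid = content_id(
            node["op"], [ids[d] for d in node["inputs"]], node["params"]
        )
        ids[name] = cid
        if cid not in cache:  # cache miss: run the (mock) op
            args = [results[d] for d in node["inputs"]]
            cache[cid] = ops[node["op"]](*args, **node["params"])
            recomputed.append(name)
        results[name] = cache[cid]
    return results, recomputed


# Mock ops, as the plan suggests for tests.
ops = {
    "const": lambda value: value,
    "add": lambda a, b: a + b,
    "scale": lambda a, k: a * k,
}
nodes = {
    "x": {"op": "const", "inputs": [], "params": {"value": 2}},
    "y": {"op": "const", "inputs": [], "params": {"value": 3}},
    "s": {"op": "add", "inputs": ["x", "y"], "params": {}},
    "t": {"op": "scale", "inputs": ["s"], "params": {"k": 10}},
}
order = ["x", "y", "s", "t"]
cache = {}

first, ran_first = execute(nodes, order, ops, cache)
second, ran_second = execute(nodes, order, ops, cache)

# Change one leaf and re-run against the same cache.
nodes["y"] = {"op": "const", "inputs": [], "params": {"value": 4}}
third, ran_third = execute(nodes, order, ops, cache)
```

The first run computes all four nodes, the second computes none, and the third recomputes `y`, `s`, and `t` while `x` remains a cache hit.
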
## Phase 5 — Effect-pipeline optimization
- [ ] `lib/artdag/optimize.sx` — rewrite the DAG before execution: dead-node
elimination (unreachable from outputs), common-subexpression sharing (free from
content ids), adjacent-op fusion
- [ ] optimizations are content-id-preserving where semantically identical; assert
the optimized DAG produces identical results
- [ ] `lib/artdag/tests/optimize.sx` — DCE, CSE dedup, fusion equivalence
- [ ] (optional/later) rule-based optimization via `maude-on-sx`'s rewriting engine —
flag the integration point, don't block on it
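
Dead-node elimination and common-subexpression sharing can be sketched in Python as below; adjacent-op fusion is left out because it depends on op semantics. The names `eliminate_dead`, `share_common`, and `content_id` are hypothetical, and the dict-based node format is an assumption for the example.

```python
import hashlib
import json


def content_id(op, input_ids, params):
    """Digest of (op, sorted input-ids, params); hypothetical helper."""
    blob = json.dumps([op, sorted(input_ids), params], sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]


def eliminate_dead(nodes, outputs):
    """Keep only nodes reachable (via inputs) from the requested outputs."""
    live, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name not in live:
            live.add(name)
            stack.extend(nodes[name]["inputs"])
    return {n: spec for n, spec in nodes.items() if n in live}


def share_common(nodes, order):
    """Merge nodes with equal content ids; `order` must be topological.

    Returns (deduplicated nodes, map from old name to surviving name).
    """
    canon, rename, ids, out = {}, {}, {}, {}
    for name in order:
        spec = nodes[name]
        inputs = [rename[d] for d in spec["inputs"]]  # point at survivors
        cid = content_id(spec["op"], [ids[d] for d in inputs], spec["params"])
        if cid in canon:
            rename[name] = canon[cid]  # duplicate: reuse the first node
        else:
            canon[cid] = name
            rename[name] = name
            ids[name] = cid
            out[name] = {"op": spec["op"], "inputs": inputs,
                         "params": spec["params"]}
    return out, rename


nodes = {
    "a1": {"op": "load", "inputs": [], "params": {"src": "p"}},
    "a2": {"op": "load", "inputs": [], "params": {"src": "p"}},
    "b": {"op": "blur", "inputs": ["a1"], "params": {"r": 2}},
    "c": {"op": "blur", "inputs": ["a2"], "params": {"r": 2}},
    "m": {"op": "mix", "inputs": ["b", "c"], "params": {}},
    "junk": {"op": "sharpen", "inputs": ["a1"], "params": {}},
}
live = eliminate_dead(nodes, outputs=["m"])
deduped, rename = share_common(live, ["a1", "a2", "b", "c", "m"])
```

Here `junk` is dropped because no output reads it, `a2` collapses into `a1`, and `c` then collapses into `b`, leaving `m` with both inputs pointing at `b`.
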
## Phase 6 — Federation (shared content-addressed cache)
- [ ] a result computed on one instance is reusable on another by content-id (the
L2-registry analog): export/import `{content-id → result}` with provenance
- [ ] trust gating — accept a remote result only from a trusted peer (mirror the
fed trust shape; mock the transport in tests)
- [ ] revocation/invalidation — drop a remote result if its provenance is withdrawn
- [ ] `lib/artdag/tests/fed.sx` — remote cache hit, trust gating, invalidation
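
A minimal Python sketch of the export/import, trust-gating, and revocation items follows. The record layout, the `origin` field used as provenance, and the function names are all assumptions for illustration; the transport is omitted entirely, as it would be mocked in tests.

```python
def export_results(cache, origin):
    """Package cached results as records tagged with their origin peer."""
    return [
        {"content_id": cid, "result": cache[cid], "origin": origin}
        for cid in sorted(cache)
    ]


def import_results(cache, provenance, records, trusted):
    """Accept records from trusted peers only; return accepted content ids."""
    accepted = []
    for rec in records:
        if rec["origin"] not in trusted:
            continue  # trust gate: ignore results from unknown peers
        cid = rec["content_id"]
        if cid in cache:
            continue  # already have a result for this id
        cache[cid] = rec["result"]
        provenance[cid] = rec["origin"]
        accepted.append(cid)
    return accepted


def revoke_origin(cache, provenance, origin):
    """Drop every imported result whose provenance is the given peer."""
    dropped = [cid for cid, src in provenance.items() if src == origin]
    for cid in dropped:
        del cache[cid]
        del provenance[cid]
    return dropped


remote_cache = {"id1": 10, "id2": 20}
records = export_results(remote_cache, origin="peer-a")

local_cache, provenance = {}, {}
rejected = import_results(local_cache, provenance, records, trusted={"peer-b"})
accepted = import_results(local_cache, provenance, records, trusted={"peer-a"})
```

Tracking provenance separately from the cache means locally computed results are never affected by `revoke_origin`; only entries that were imported from the withdrawn peer are dropped.
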
## Progress log
(loop fills this in)
## Blockers
(loop fills this in)