rose-ash/plans/artdag-on-sx.md
giles 298621e2be
artdag: log api facade in plan progress
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:34:30 +00:00

# artdag-on-sx: Content-addressed dataflow DAG engine
art-dag is rose-ash's media-processing engine: a content-addressed DAG of effects,
executed in three phases — **Analyze → Plan → Execute**. Today it's a separate
Python stack (FastAPI + Celery + JAX + IPFS). Its *engine logic* — dependency
analysis, scheduling, content-addressed memoization, incremental recompute,
composable s-expression effects — is exactly the kind of declarative, substrate-shaped
work SX excels at, and art-dag already speaks s-expressions (its `sexp_effects`).
This subsystem rebuilds the **engine** on SX (not the pixel-pushing): the DAG model,
the three-phase pipeline, and the incremental/memoized executor. Media ops
themselves (JAX kernels, IPFS pins) stay opaque — modelled as abstract node
functions in tests, delegated to injected adapters in production. The win is that
the same SX substrates already serve the phases:
- **Analyze** (deps, reachability, dirtiness) → **Datalog** (recursive reachability —
the acl/relations shape).
- **Plan** (schedule under constraints) → topological batching now; **miniKanren**
for constraint-based scheduling later (optional).
- **Execute** (composable effects + content-addressed memo) → SX's own
`perform`/`cek-resume` + a **persist**-backed content-addressed result cache;
incremental recompute drops the cost of re-rendering to the dirty subgraph.
- **Optimize** (fuse/dedup effect pipelines) → term rewriting (a later, optional
consumer of `maude-on-sx`'s engine — see `plans/maude-on-sx.md`).
End-state: a content-addressed dataflow engine in `lib/artdag/` with analyze, plan,
incremental execute, effect-pipeline optimization, and a shared-cache federation
extension — the SX heart of art-dag, with media kernels and storage injected at the
edges.
## Status (rolling)
`bash lib/artdag/conformance.sh` → **158/158** (10 suites: dag, analyze, plan, execute, optimize, fed, cost, serialize, stats, fault)

Base roadmap (Phases 1–6) COMPLETE. Now extending.
## Ground rules
- **Scope:** only `lib/artdag/**` and `plans/artdag-on-sx.md`. Do **not** edit
`spec/`, `hosts/`, `shared/`, `lib/datalog/**`, `lib/persist/**`, or other
`lib/<lang>/`. You may **import** the public APIs of `lib/datalog/` (analyze) and
`lib/persist/` (memo cache / result store).
- **Design lineage, not code reuse.** The existing Python engine lives in the
repo's top-level `artdag/` (core/ engine, `sexp_effects/`, l1/ tasks). **Read it
for design lineage** (the 3-phase model, the effect language, content addressing)
— do **not** import or port its code; this is a fresh SX implementation.
- **Media ops are opaque.** A node's op is an abstract SX function over its inputs
in tests (e.g. `(fn (a b) …)`); real JAX/IPFS kernels are injected adapters
behind an interface. The engine is about *scheduling/memo/incremental*, never
pixels. Determinism: content ids and tests use only the node spec, never a clock.
- **Content addressing is structural.** A node's id is a deterministic digest of
`(op, sorted input-ids, params)` so identical subgraphs share an id and a cache
slot — the core property. Use a structural digest helper; if a real SHA-256/CID
is needed it's an injected host primitive (Blockers if absent), not hand-rolled.
- **Shared-file issues** → "Blockers" with a minimal repro; do not fix here.
- **SX files:** `sx-tree` MCP tools only; `sx_validate` after every edit.
- **Commits:** one feature per commit. Keep Progress log updated and tick boxes.
## Architecture sketch
```
DAG spec (nodes + edges)                rendered results
        │                                       ▲
        ▼                                       │
lib/artdag/dag.sx                       lib/artdag/execute.sx
- node = {op, inputs, params}           - effect interp (perform per node)
- content-id = digest(spec)             - content-addressed memo (persist)
- topo order, validate                  - incremental: only dirty nodes
        │                                       ▲
        ▼                                       │
lib/artdag/analyze.sx                   lib/artdag/plan.sx
- Datalog: deps/dependents/reach        - schedule: topo batches, parallelism
- dirty propagation (dirty closure)     - (miniKanren constraints, later/opt)
        │                                       ▲
        ▼                                       │
lib/artdag/optimize.sx                  lib/artdag/federation.sx
- fuse adjacent ops, dead-node elim,    - shared cache by content-id (L2-style)
  CSE (free from content-addressing)      result import/export + provenance/trust
```
## Phase 1 — DAG model + content addressing
- [x] `lib/artdag/dag.sx` — node `{:op :inputs :params}`; structural content-id =
digest of `(op, sorted input-ids, params)`; build/validate a DAG (no dangling
inputs, no accidental cycles); topological order
- [x] identical-subgraph sharing: two structurally-equal nodes get the same id
- [x] `lib/artdag/tests/dag.sx` — id determinism, subgraph sharing, cycle/dangling
rejection, topo order
- [x] `lib/artdag/conformance.sh` + scoreboard
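
The Phase 1 behaviour can be modelled in a few lines of Python. This is an illustrative sketch, not the SX code in `lib/artdag/dag.sx`: the names `content_id` and `build` are hypothetical, and `json.dumps(..., sort_keys=True)` is an assumed stand-in for the engine's canonical serialization. It shows the three properties the tests check: param-order insensitivity, sharing of structurally equal nodes, and rejection of dangling inputs and cycles.

```python
import json

def content_id(op, input_ids, params, commutative=False):
    """Structural id: canonical serialization of (op, inputs, params)."""
    # Only commutative ops are insensitive to input order.
    inputs = sorted(input_ids) if commutative else list(input_ids)
    # sort_keys makes the id insensitive to param insertion order.
    return "node:" + json.dumps([op, inputs, params], sort_keys=True)

def build(entries):
    """entries: list of (name, op, input_names, params).
    Returns (nodes by content id, name -> content id, topological order)."""
    known = {e[0] for e in entries}
    for name, _, input_names, _ in entries:
        for dep in input_names:
            if dep not in known:
                raise ValueError(f"dangling input {dep!r} in {name!r}")
    names, nodes, order = {}, {}, []
    pending = list(entries)
    while pending:
        # A node is ready once every input already has a content id.
        ready = [e for e in pending if all(d in names for d in e[2])]
        if not ready:
            raise ValueError("cycle detected")
        for name, op, input_names, params in ready:
            input_ids = [names[d] for d in input_names]
            cid = content_id(op, input_ids, params)
            names[name] = cid
            if cid not in nodes:  # identical subgraphs share one node
                nodes[cid] = {"op": op, "inputs": input_ids, "params": params}
                order.append(cid)
        pending = [e for e in pending if e[0] not in names]
    return nodes, names, order
```

Two entries with the same op, inputs and params resolve to the same id, so `nodes` holds a single record for both names.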
## Phase 2 — Analyze (Datalog)
- [x] `lib/artdag/analyze.sx` — project edges to Datalog; `deps-of`, `dependents-of`,
transitive `reachable` (the recursive-reachability shape)
- [x] **dirty propagation:** given a set of changed nodes, compute the transitive
set of dependents that must recompute (`dirty-closure`)
- [x] `lib/artdag/tests/analyze.sx` — deep chains, diamonds, dirty closure
correctness, unaffected nodes stay clean
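
As a rough Python model of dirty propagation (the real implementation expresses this as a Datalog rule, `dirty(Y) :- edge(X,Y), dirty(X)`), the closure is a forward worklist over the edge relation. The function name and the edge-pair representation below are assumptions made for illustration.

```python
from collections import defaultdict

def dirty_closure(edges, changed):
    """edges: iterable of (input_id, node_id) pairs.
    Returns the sorted ids that must recompute after `changed` nodes change."""
    dependents = defaultdict(set)
    for src, dst in edges:
        dependents[src].add(dst)
    dirty = set(changed)
    frontier = list(changed)
    while frontier:
        node = frontier.pop()
        for nxt in dependents[node]:
            if nxt not in dirty:  # visit each dependent once
                dirty.add(nxt)
                frontier.append(nxt)
    return sorted(dirty)

# Diamond a -> {b, c} -> d: changing b dirties b and d only.
diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(dirty_closure(diamond, ["b"]))  # ['b', 'd']
```

Upstream `a` and sibling `c` stay clean, which is the property the analyze suite's keystone test asserts.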
## Phase 3 — Plan
- [x] `lib/artdag/plan.sx` — schedule into topological **batches** (each batch's
nodes have all deps satisfied → run in parallel); respect a max-parallelism limit
- [x] plan over the *dirty* subset only (incremental plan)
- [x] `lib/artdag/tests/plan.sx` — batch correctness, parallelism cap, dirty-only plan
- [ ] (optional/later) miniKanren constraint scheduling — flag, don't block on it
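
The batching rule above can be sketched in Python as Kahn-style waves. This is a model under stated assumptions, not the `lib/artdag/plan.sx` code: `plan`, `cap` and `scope` are illustrative names, and a dependency outside `scope` is treated as already satisfied, which is how the dirty-only plan is described.

```python
def plan(deps, cap=0, scope=None):
    """deps: {node: [dependency, ...]}. Returns a list of batches.
    cap <= 0 means unlimited width; scope restricts scheduling to a subset."""
    scope = set(deps) if scope is None else set(scope)
    done, batches = set(), []
    remaining = sorted(scope)  # sorted for deterministic output
    while remaining:
        # A node joins the wave once all in-scope deps ran in earlier waves.
        wave = [n for n in remaining
                if all(d in done or d not in scope for d in deps[n])]
        if not wave:
            raise ValueError("cycle in scheduled set")
        step = cap if cap > 0 else len(wave)
        for i in range(0, len(wave), step):  # split waves wider than cap
            batches.append(wave[i:i + step])
        done.update(wave)
        remaining = [n for n in remaining if n not in done]
    return batches

deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
print(plan(deps))                     # [['a', 'b'], ['c'], ['d']]
print(plan(deps, cap=1))              # [['a'], ['b'], ['c'], ['d']]
print(plan(deps, scope={"c", "d"}))   # [['c'], ['d']]
```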
## Phase 4 — Execute (incremental + memoized)
- [x] `lib/artdag/execute.sx` — interpret a plan: each node op runs via `perform`
(mocked op in tests); results keyed by content-id
- [x] **content-addressed memo cache** backed by `lib/persist/`: a node whose
content-id already has a stored result is skipped (cache hit)
- [x] **incremental execute:** re-running after a leaf change recomputes only the
dirty closure; everything else is a cache hit
- [x] `lib/artdag/tests/execute.sx` — full run, cache-hit on re-run, incremental
recompute touches only dirty nodes (assert recompute count)
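
A minimal Python model of the memoized executor shows why incremental recompute needs no separate bookkeeping: a changed leaf changes the content ids of everything downstream, so those ids simply miss the cache. The names here (`cid`, `execute`, the dict used as a cache) are illustrative assumptions; the real engine stores results in a `lib/persist/` backend and runs ops through an injected runner.

```python
import json

def cid(op, input_ids, params):
    # Assumed stand-in for the engine's structural digest.
    return "node:" + json.dumps([op, list(input_ids), params], sort_keys=True)

def execute(entries, runner, cache):
    """entries: topo-ordered (name, op, input_names, params).
    Returns ({name: result}, [names recomputed in this run])."""
    ids, recomputed = {}, []
    for name, op, input_names, params in entries:
        node_id = cid(op, [ids[i] for i in input_names], params)
        ids[name] = node_id
        if node_id not in cache:  # cache miss: run the opaque op
            args = [cache[ids[i]] for i in input_names]
            cache[node_id] = runner(op, params, args)
            recomputed.append(name)
    return {n: cache[i] for n, i in ids.items()}, recomputed

def toy_runner(op, params, args):
    # Pure op table standing in for real media kernels.
    if op == "const":
        return params["v"]
    if op == "add":
        return sum(args)
    if op == "neg":
        return -args[0]
    raise KeyError(op)

def dag(a_value):
    return [("a", "const", [], {"v": a_value}), ("b", "const", [], {"v": 2}),
            ("s", "add", ["a", "b"], {}), ("n", "neg", ["s"], {}),
            ("m", "neg", ["b"], {})]

cache = {}
_, cold = execute(dag(1), toy_runner, cache)    # all five nodes run
_, warm = execute(dag(1), toy_runner, cache)    # nothing runs
_, incr = execute(dag(10), toy_runner, cache)   # only a, s, n run
```

In this toy DAG the sibling branch `b -> m` is reused after `a` changes, mirroring the recompute counts asserted by the execute suite.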
## Phase 5 — Effect-pipeline optimization
- [x] `lib/artdag/optimize.sx` — rewrite the DAG before execution: dead-node
elimination (unreachable from outputs), common-subexpression sharing (free from
content ids), adjacent-op fusion
- [x] optimizations are content-id-preserving where semantically identical; assert
the optimized DAG produces identical results
- [x] `lib/artdag/tests/optimize.sx` — DCE, CSE dedup, fusion equivalence
- [ ] (optional/later) rule-based optimization via `maude-on-sx`'s rewriting engine —
flag the integration point, don't block on it
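
The two rewrites can be modelled in Python as follows. This is a sketch under assumptions, not the SX implementation: entries are a hypothetical `{name: (op, inputs, params)}` dict, the fused node is tagged `"pipeline"` with ordered stages, and `fusing_runner` replays those stages through a base runner so the fused and unfused DAGs can be compared.

```python
def dce(entries, outputs):
    """Keep only the outputs and their transitive ancestors."""
    keep, stack = set(), list(outputs)
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(entries[n][1])
    return {n: e for n, e in entries.items() if n in keep}

def fuse(entries, fusible):
    """Collapse 1-to-1 chains of fusible unary ops into one pipeline node."""
    uses = {}
    for _, inputs, _ in entries.values():
        for i in inputs:
            uses[i] = uses.get(i, 0) + 1

    def unary_fusible(n):
        op, inputs, _ = entries[n]
        return fusible(op) and len(inputs) == 1

    # A predecessor is absorbed when its only consumer is also fusible.
    absorbed = {
        entries[n][1][0] for n in entries
        if unary_fusible(n)
        and unary_fusible(entries[n][1][0])
        and uses[entries[n][1][0]] == 1
    }
    fused = {}
    for n, (op, inputs, params) in entries.items():
        if n in absorbed:
            continue
        if not (unary_fusible(n) and inputs[0] in absorbed):
            fused[n] = (op, inputs, params)  # leaves, fan-out, non-fusible
            continue
        stages, cur = [(op, params)], inputs[0]
        while True:  # walk back to the head of the chain
            cur_op, cur_inputs, cur_params = entries[cur]
            stages.append((cur_op, cur_params))
            if cur_inputs[0] not in absorbed:
                break
            cur = cur_inputs[0]
        fused[n] = ("pipeline", [cur_inputs[0]], {"stages": stages[::-1]})
    return fused

def fusing_runner(base):
    """Wrap a base runner so pipeline nodes replay their stages in order."""
    def run(op, params, args):
        if op != "pipeline":
            return base(op, params, args)
        value = args[0]
        for stage_op, stage_params in params["stages"]:
            value = base(stage_op, stage_params, [value])
        return value
    return run

def evaluate(entries, name, run):
    op, inputs, params = entries[name]
    return run(op, params, [evaluate(entries, i, run) for i in inputs])
```

Running `dce(fuse(entries, fusible), outputs)` follows the fuse-then-DCE order used by the composition pass recorded in the progress log; the result should evaluate to the same values as the original entries.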
## Phase 6 — Federation (shared content-addressed cache)
- [x] a result computed on one instance is reusable on another by content-id (the
L2-registry analog): export/import `{content-id → result}` with provenance
- [x] trust gating — accept a remote result only from a trusted peer (mirror the
fed trust shape; mock the transport in tests)
- [x] revocation/invalidation — drop a remote result if its provenance is withdrawn
- [x] `lib/artdag/tests/fed.sx` — remote cache hit, trust gating, invalidation
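
A small Python model of the federation flow, assuming an instance is a dict with a `cache` (content id to result) and a `prov` map (content id to origin peer). The function names mirror the ones in the progress log, but the signatures are illustrative guesses, and the choice to skip records the importer already holds is an assumption of this sketch.

```python
def fed_export(instance, peer_id):
    """Dump the cache as records tagged with the exporter's id."""
    return [{"cid": c, "result": r, "peer": peer_id}
            for c, r in instance["cache"].items()]

def fed_import(instance, records, trusted):
    """Accept records only from trusted peers; remember where they came from."""
    accepted = 0
    for rec in records:
        # Assumption: results already held locally are left untouched.
        if rec["peer"] in trusted and rec["cid"] not in instance["cache"]:
            instance["cache"][rec["cid"]] = rec["result"]
            instance["prov"][rec["cid"]] = rec["peer"]
            accepted += 1
    return accepted

def fed_invalidate(instance, peer_id):
    """Drop every result whose provenance is the given peer."""
    dropped = [c for c, p in instance["prov"].items() if p == peer_id]
    for c in dropped:
        del instance["cache"][c]
        del instance["prov"][c]
    return dropped

alice = {"cache": {"node:x": 1, "node:y": 2}, "prov": {}}
bob = {"cache": {"node:local": 9}, "prov": {}}
fed_import(bob, fed_export(alice, "alice"), trusted={"alice"})
fed_invalidate(bob, "alice")  # bob keeps only his locally computed result
```

Because provenance is recorded only for imported results, invalidating a peer leaves locally computed entries in place.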
## Progress log
- **Ext: public API facade** (`lib/artdag/api.sx`, total 158/158 unchanged).
Reference index matching the datalog/persist convention: canonical load order +
the full public surface across all 10 modules + `artdag/version`.
- **Ext: fault-tolerant execution** (fault suite 14/14, total 158/158).
`lib/artdag/fault.sx`: a node op may fail via `(artdag/fail reason)`; `run-safe`
confines the failure to that node + its transitive dependents (independent branches
still compute) and NEVER caches a failed result, so a later run with the fault fixed
recomputes only the failed closure and cache-hits the good nodes. `failed?`/`fail`
markers, `failed-nodes`/`failure-count`/`all-ok?`.
- **Ext: execution stats / cache analytics** (stats suite 12/12, total 144/144).
`lib/artdag/stats.sx` over an exec record: `hit-ratio`, `work-recomputed`/`work-saved`
(cost-weighted via the cost model), `savings-ratio`, and `exec-summary`. Cold run =
0 hit ratio / all work ran; warm rerun = ratio 1 / all work saved; incremental = saved
work counts unchanged nodes, ran work counts the dirty closure.
- **Ext: optimize composition pass** (optimize suite 22/22, total 132/132).
`artdag/optimize entries outputs fusible?` fuses the entry list then DCEs against
the output names (sinks survive fusion since they're never absorbed) — fewer nodes,
identical results. Verified: dead branch dropped + chain fused (4→2), an output that
is itself "dead" is retained, no-fusible-set still DCEs.
- **Ext: DAG wire serialization** (serialize suite 13/13, total 128/128).
`lib/artdag/serialize.sx`: `dag->wire` emits a topo-ordered list of
`(id op inputs params commutative)` records — plain lists with keyword-keyed param
dicts, which survive `write-to-string`/`read` (string-keyed node dicts do NOT; and
`()` reads back as nil, so `wire->dag` normalizes empty inputs). `wire->dag`
reconstructs a runnable dag by content-id (author names dropped); executes
identically to the original. `wire-verify` recomputes each record's content-id and
rejects tampered ids or mutated params under a stale id (self-verifying transport).
`dag->string`/`string->dag` for text transport. Gotcha logged: `sx-parse` primitive
is unbound in the server env — use `(read (open-input-string s))`.
- **Ext: cost-based scheduling** (cost suite 13/13, total 115/115).
`lib/artdag/cost.sx`: an injected `cost-fn (op params)` keeps media-op costs opaque
(`const-cost`, `op-cost table`). `critical-path` = longest weighted path (finish-time
fold over topo order) = min makespan with unlimited workers. `makespan dag plan
cost-fn` sums each batch's slowest node — full plan (cap 0) makespan == critical
path, serial (cap 1) == `total-work`. `speedup` = total-work / makespan. Verified
weighted paths follow heavy ops and capped makespan never dips below the critical
path.
- **Phase 6 — Federation (shared content-addressed cache)** (fed suite 15/15, total
102/102). `lib/artdag/federation.sx`: an instance = `{:cache <persist kv> :prov
{cid->origin-peer}}`. `fed-export` dumps the whole cache as `{:cid :result :peer}`
records tagged with the exporter's id; `fed-import` accepts only records from
trusted peers (trust gating) and records provenance; `fed-pull` imports via an
injected `fetch-fn(peer-id)` transport (mocked in tests). Because content-ids are
global, a trusted import makes the importer's run a pure cache hit (recompute 0) —
the L2-registry analog. `fed-invalidate peer` drops every result provenanced to a
peer from cache + prov (trust withdrawn → recompute), peer-scoped (other peers'
results survive) and leaving locally-computed (un-provenanced) results untouched.
ALL 6 PHASES COMPLETE.
- **Phase 5 — Effect-pipeline optimization** (optimize suite 18/18, total 87/87).
`lib/artdag/optimize.sx`: `artdag/dce dag outputs` keeps only the outputs plus
their transitive ancestors (via analyze), preserving surviving content-ids.
`artdag/cse` == build — structural sharing is inherent to content addressing, so
identical subexpressions collapse to one node/id and execute once (verified).
`artdag/fuse entries fusible?` rewrites entries: a maximal 1-to-1 chain of fusible
unary ops (predecessor used only by its single consumer, both fusible) collapses
into one `artdag/pipeline` node carrying ordered `{:op :params}` stages, fed by the
chain head's external input; leaves, fan-out nodes, and non-fusible ops never fuse.
`artdag/fusing-runner` wraps a base runner to replay pipeline stages — output
equivalent to the unfused DAG (asserted). Note: CSE auto-dedup means test fixtures
intended as distinct nodes must use distinct op/params.
- **Phase 4 — Execute (incremental + memoized)** (execute suite 15/15, total 69/69).
`lib/artdag/execute.sx`: `artdag/execute` folds a plan, computing each node via an
injected `runner (op params input-results)` (production = `perform` to JAX/IPFS
adapter; tests = a pure op-table) and memoizing the result in a `lib/persist/` kv
backend keyed by **content-id**. A node whose content-id is already cached is a hit
(skipped). The keystone falls out of content addressing: changing a leaf changes the
ids of its whole dirty closure, so re-running the full plan against a warm cache
recomputes exactly those nodes and hits the rest — verified by recompute/hit counts
(5 cold → 0 on rerun → 3 after one leaf change, sibling reused). Cross-DAG sharing
verified: a different DAG containing a shared subgraph cache-hits it. `run`/`run-dirty`
helpers; `result-of`/`recompute-count`/`hit-count`/`recomputed` inspection.
- **Phase 3 — Plan** (plan suite 18/18, total 54/54). `lib/artdag/plan.sx`:
`artdag/plan` schedules a dag into Kahn-wave topological batches — each batch's
nodes have all in-scope deps satisfied by earlier batches, so they run in parallel.
A `cap` (>0) splits any wave wider than the cap into consecutive sub-batches;
`cap<=0` is unlimited. `artdag/plan-dirty` schedules only the dirty closure: deps
outside the scheduled set (clean cache hits) count as already satisfied, so a
mid-node change yields just `[[changed]…[downstream]]`. Inspection helpers
`plan-batches`/`plan-width`/`plan-size`/`plan-flatten`.
- **Phase 2 — Analyze on Datalog** (analyze suite 16/16, total 36/36).
`lib/artdag/analyze.sx`: `artdag/edge-facts` projects each `(input-id, node-id)`
pair to an `(edge ...)` fact; `artdag/analyze` builds a `dl-program-data` db with
recursive `reachable(X,Y) :- edge(X,Y)` and `reachable(X,Z) :- edge(X,Y), reachable(Y,Z)` (the acl/relations
reachability shape). Query helpers `deps-of`/`dependents-of` (direct),
`reachable-from` (transitive downstream), `ancestors-of` (transitive upstream), all
returning sorted id lists. `dirty-closure` builds a db with `dirty(Y) :- edge(X,Y),
dirty(X)` seeded by changed-node facts and returns the transitive forward closure —
keystone test confirms changing a mid node dirties only it + downstream, leaving
siblings/upstream clean. Content-ids work as opaque Datalog string constants.
- **Phase 1 — DAG model + content addressing** (dag suite 20/20). `lib/artdag/dag.sx`:
node `{:op :inputs :params :commutative}`; `artdag/content-id` = `"node:"` + a
deterministic canonical serialization of `(op, inputs, params)` with dict keys
sorted (param order-insensitive) and commutative ops' inputs sorted (input
order-insensitive); non-commutative inputs ordered. `artdag/build` takes named
entries `(name op (input-names) params [commutative?])`, validates (dangling refs,
cycles via fixpoint topo), resolves input-names→content-ids, dedups identical
subgraphs to one node + one id (shared across DAGs), returns `{:ok :nodes :names
:order}`. No host `sort`/`string<?` — hand-rolled `artdag/str<?` over char-codes.
Gotcha logged: SX `equal?` is representation-sensitive (cons-built vs vector lists
compare unequal even when identical); `=` is true structural equality — conformance
harness compares with `=`. `lib/artdag/conformance.sh` + scoreboard.
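
The cost-based scheduling entry in the log defines `critical-path` as a finish-time fold over topological order and `makespan` as the sum of each batch's slowest node. A minimal Python model of those two definitions, with a plain dict standing in for the injected `cost-fn` and hypothetical function names:

```python
def critical_path(deps, order, cost):
    """Longest weighted path: finish(n) = cost(n) + max finish of its deps."""
    finish = {}
    for n in order:  # order must be topological
        finish[n] = cost[n] + max((finish[d] for d in deps[n]), default=0)
    return max(finish.values(), default=0)

def makespan(batches, cost):
    """Each batch takes as long as its slowest node; batches run in sequence."""
    return sum(max(cost[n] for n in batch) for batch in batches)

deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
cost = {"a": 5, "b": 1, "c": 2, "d": 1}
order = ["a", "b", "c", "d"]

total_work = sum(cost.values())                            # 9
cp = critical_path(deps, order, cost)                      # 8, via a -> c -> d
wide = makespan([["a", "b"], ["c"], ["d"]], cost)          # 8
serial = makespan([["a"], ["b"], ["c"], ["d"]], cost)      # 9
speedup = total_work / wide                                # 1.125
```

In this example the uncapped plan's makespan equals the critical path and the serial plan's makespan equals the total work, matching the relationships recorded in the log entry.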
## Blockers
(none)