# artdag-on-sx: Content-addressed dataflow DAG engine

art-dag is rose-ash's media-processing engine: a content-addressed DAG of effects,
executed in three phases — **Analyze → Plan → Execute**. Today it's a separate
Python stack (FastAPI + Celery + JAX + IPFS). Its *engine logic* (dependency
analysis, scheduling, content-addressed memoization, incremental recompute,
composable s-expression effects) is exactly the kind of declarative, substrate-shaped
work SX excels at, and art-dag already speaks s-expressions (its `sexp_effects`).

This subsystem rebuilds the **engine** on SX (not the pixel-pushing): the DAG model,
the three-phase pipeline, and the incremental/memoized executor. Media ops
themselves (JAX kernels, IPFS pins) stay opaque. They are modelled as abstract node
functions in tests and delegated to injected adapters in production. The win is that
the same SX substrates already serve the phases:

- **Analyze** (deps, reachability, dirtiness) → **Datalog** (recursive reachability,
  the acl/relations shape).
- **Plan** (schedule under constraints) → topological batching now; **miniKanren**
  for constraint-based scheduling later (optional).
- **Execute** (composable effects + content-addressed memo) → SX's own
  `perform`/`cek-resume` + a **persist**-backed content-addressed result cache;
  incremental recompute limits the cost of re-rendering to the dirty subgraph.
- **Optimize** (fuse/dedup effect pipelines) → term rewriting, the declarative
  consumer of `maude-on-sx`'s engine, delivered as Phase 7 (lib/maude is on
  this branch; fit proven in `lib/maude/tests/effects.sx`).

End-state: a content-addressed dataflow engine in `lib/artdag/` with analyze, plan,
incremental execute, effect-pipeline optimization, and a shared-cache federation
extension. This is the SX heart of art-dag, with media kernels and storage injected at the
edges.

## Status (rolling)

`bash lib/artdag/conformance.sh` → **198/198** (11 suites: dag, analyze, plan, execute, optimize, fed, cost, serialize, stats, fault, maude-optimize)

Base roadmap (Phases 1–6) COMPLETE + Phase 7 (maude rule-based optimization) COMPLETE
(only optional miniKanren scheduling remains). Now hardening only.

## Integration / merge status (2026-06-28)

**READY TO MERGE `loops/artdag` → `architecture`.** `origin/architecture`'s `lib/artdag/`
is stale: it predates the maude-bridge, so it is missing ALL of Phase 7
(`maude-bridge.sx` + `optimize-rules.sx` both absent). `loops/artdag` is 9 commits ahead
of `origin/architecture` (the Phase 7 chain `657d8061..4a02a9c4` + the architecture-merge
`7f7957ba` that pulled in `lib/maude`). Merge facts:

- **Dependency satisfied on target:** Phase 7 consumes `lib/maude` (incl. `confluence.sx`),
  already on architecture (`0963aa51`); this branch re-absorbed it at `7f7957ba`.
- **Conflict-free for artdag files:** `origin/architecture` is an ancestor of HEAD and the
  branch already absorbed architecture, so `lib/artdag/**` + this plan are purely additive.
- **Target is LOCAL architecture** (146 ahead of `origin/architecture`, which is kept
  deliberately stale), matching how other loops landed.
- **Pushed (2026-06-28):** `loops/artdag` is now on `origin` through `cd2ad707` (credential
  restored). It is 10 commits ahead of `origin/architecture`. A maintainer does the merge
  (the loop agent must NOT touch `architecture` itself).

## Ground rules

- **Scope:** only `lib/artdag/**` and `plans/artdag-on-sx.md`. Do **not** edit
  `spec/`, `hosts/`, `shared/`, `lib/datalog/**`, `lib/persist/**`, or other
  `lib/<lang>/`. You may **import** the public APIs of `lib/datalog/` (analyze) and
  `lib/persist/` (memo cache / result store), plus `lib/maude/` for Phase 7 (read-only).
- **Design lineage, not code reuse.** The existing Python engine lives in the
  repo's top-level `artdag/` (core/ engine, `sexp_effects/`, l1/ tasks). **Read it
  for design lineage** (the 3-phase model, the effect language, content addressing),
  but do **not** import or port its code; this is a fresh SX implementation.
- **Media ops are opaque.** A node's op is an abstract SX function over its inputs
  in tests (e.g. `(fn (a b) …)`); real JAX/IPFS kernels are injected adapters
  behind an interface. The engine is about *scheduling/memo/incremental*, never
  pixels. Determinism: content ids and tests depend only on the node spec, never on a clock.
- **Content addressing is structural.** A node's id is a deterministic digest of
  `(op, input-ids, params)` (input-ids sorted for commutative ops), so identical
  subgraphs share an id and a cache slot: the core property. Use a structural digest
  helper; if a real SHA-256/CID is needed, it is an injected host primitive (log it under
  Blockers if absent), not hand-rolled.
- **Shared-file issues** → "Blockers" with a minimal repro; do not fix here.
- **SX files:** `sx-tree` MCP tools only; `sx_validate` after every edit.
- **Commits:** one feature per commit. Keep the Progress log updated and tick boxes.

## Architecture sketch

```
DAG spec (nodes + edges)                    rendered results
        │                                           ▲
        ▼                                           │
lib/artdag/dag.sx                           lib/artdag/execute.sx
  — node = {op, inputs, params}               — effect interp (perform per node)
  — content-id = digest(spec)                 — content-addressed memo (persist)
  — topo order, validate                      — incremental: only dirty nodes
        │                                           ▲
        ▼                                           │
lib/artdag/analyze.sx                       lib/artdag/plan.sx
  — Datalog: deps/dependents/reach            — schedule: topo batches, parallelism
  — dirty propagation (dirty closure)         — (miniKanren constraints, later/opt)
        │                                           ▲
        ▼                                           │
lib/artdag/optimize.sx                      lib/artdag/federation.sx
  — fuse adjacent ops, dead-node elim,        — shared cache by content-id (L2-style)
    CSE (free from content-addressing)          result import/export + provenance/trust
```

## Phase 1 — DAG model + content addressing

- [x] `lib/artdag/dag.sx` — node `{:op :inputs :params}`; structural content-id =
  digest of `(op, input-ids, params)` (input-ids sorted for commutative ops);
  build/validate a DAG (no dangling inputs, no accidental cycles); topological order
- [x] identical-subgraph sharing: two structurally-equal nodes get the same id
- [x] `lib/artdag/tests/dag.sx` — id determinism, subgraph sharing, cycle/dangling
  rejection, topo order
- [x] `lib/artdag/conformance.sh` + scoreboard
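
For readers who want the Phase 1 idea without the SX, here is a minimal Python sketch. It is not `lib/artdag/dag.sx`. It assumes plain tuples for entries and dicts for nodes, and `content_id`/`build` are hypothetical names. The id follows the Phase 1 log in a few respects: a `"node:"` prefix, param keys sorted, and inputs sorted only for commutative ops. The log describes a canonical serialization; this sketch additionally hashes it with SHA-256 (an assumption) so that nested ids stay short.

```python
import hashlib
import json


def content_id(op, input_ids, params, commutative=False):
    """Structural id from (op, input ids, params); no clock or name is involved."""
    # Commutative ops ignore input order; all other ops keep it.
    inputs = sorted(input_ids) if commutative else list(input_ids)
    # sort_keys=True makes the id insensitive to param key order.
    canon = json.dumps([op, inputs, params], sort_keys=True, separators=(",", ":"))
    return "node:" + hashlib.sha256(canon.encode("utf-8")).hexdigest()


def build(entries):
    """entries: (name, op, input_names, params, commutative) tuples in any order.

    Returns {"nodes": {cid: node}, "names": {name: cid}, "order": [cid, ...]}
    with "order" topological. Raises ValueError on dangling inputs or cycles.
    """
    known = {e[0] for e in entries}
    for name, _op, inputs, _params, _comm in entries:
        for dep in inputs:
            if dep not in known:
                raise ValueError(f"dangling input {dep!r} in {name!r}")
    names, nodes, order = {}, {}, []
    pending = list(entries)
    while pending:
        # Fixpoint topo: resolve every entry whose inputs already have ids.
        ready = [e for e in pending if all(d in names for d in e[2])]
        if not ready:
            raise ValueError("cycle among " + ", ".join(e[0] for e in pending))
        for name, op, inputs, params, comm in ready:
            ids = [names[d] for d in inputs]
            cid = content_id(op, ids, params, comm)
            names[name] = cid
            if cid not in nodes:  # identical subgraphs collapse to one node/id
                nodes[cid] = {"op": op, "inputs": ids, "params": params,
                              "commutative": comm}
                order.append(cid)
        pending = [e for e in pending if e[0] not in names]
    return {"nodes": nodes, "names": names, "order": order}
```

For a commutative op, `over(a, b)` and `over(b, a)` resolve to a single node. A non-commutative `sub(a, b)` and `sub(b, a)` keep distinct ids.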

## Phase 2 — Analyze (Datalog)

- [x] `lib/artdag/analyze.sx` — project edges to Datalog; `deps-of`, `dependents-of`,
  transitive `reachable` (the recursive-reachability shape)
- [x] **dirty propagation:** given a set of changed nodes, compute the transitive
  set of dependents that must recompute (`dirty-closure`)
- [x] `lib/artdag/tests/analyze.sx` — deep chains, diamonds, dirty closure
  correctness, unaffected nodes stay clean
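
The two recursive relations behind this phase are `reachable(X,Y) :- edge(X,Y).` / `reachable(X,Z) :- edge(X,Y), reachable(Y,Z).` and `dirty(Y) :- edge(X,Y), dirty(X).` Both evaluate to least fixpoints. The Python sketch below is a stand-in for what the Datalog engine computes. It is not the `lib/datalog` API, and `reachable`/`dirty_closure` are hypothetical names. Edges are `(input_id, node_id)` pairs, as produced by `artdag/edge-facts`.

```python
from collections import defaultdict


def reachable(edges):
    """Least fixpoint of
         reachable(X,Y) :- edge(X,Y).
         reachable(X,Z) :- edge(X,Y), reachable(Y,Z).
    computed semi-naively: each round joins edges only with last round's new facts."""
    reach = set(edges)
    delta = set(edges)
    while delta:
        delta_from = defaultdict(set)
        for y, z in delta:
            delta_from[y].add(z)
        new = {(x, z) for x, y in edges for z in delta_from.get(y, ())} - reach
        reach |= new
        delta = new
    return reach


def dirty_closure(edges, changed):
    """Least fixpoint of dirty(Y) :- edge(X,Y), dirty(X), seeded with `changed`.
    Returns the changed nodes plus everything downstream, as a sorted list."""
    dependents = defaultdict(set)
    for x, y in edges:
        dependents[x].add(y)
    dirty = set(changed)
    frontier = list(changed)
    while frontier:
        x = frontier.pop()
        for y in dependents.get(x, ()):
            if y not in dirty:
                dirty.add(y)
                frontier.append(y)
    return sorted(dirty)
```

Take a diamond `a → {b, d} → c`. Changing `b` dirties only `b` and `c`; the sibling `d` and the upstream `a` stay clean, which is the Phase 2 keystone property.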

## Phase 3 — Plan

- [x] `lib/artdag/plan.sx` — schedule into topological **batches** (each batch's
  nodes have all deps satisfied → run in parallel); respect a max-parallelism limit
- [x] plan over the *dirty* subset only (incremental plan)
- [x] `lib/artdag/tests/plan.sx` — batch correctness, parallelism cap, dirty-only plan
- [ ] (optional/later) miniKanren constraint scheduling — flag, don't block on it
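
Kahn-wave batching with a parallelism cap and a dirty-only scope fits in a few lines. The sketch below is Python, not `lib/artdag/plan.sx`. It assumes `deps` maps each node name to its input names, and `plan` is a hypothetical name. As in the Phase 3 log, inputs outside the scheduled scope count as already satisfied.

```python
def plan(deps, scope=None, cap=0):
    """Kahn-wave batches over deps = {node: [input nodes]}.

    scope: nodes to schedule (default: all). Inputs outside the scope count as
    already satisfied (clean cache hits), which yields the dirty-only plan.
    cap > 0 splits any wave wider than cap into consecutive sub-batches;
    cap <= 0 means unlimited parallelism.
    """
    todo = set(deps if scope is None else scope)
    batches = []
    while todo:
        # A node is ready once none of its inputs is still waiting to run.
        wave = sorted(n for n in todo if not any(d in todo for d in deps[n]))
        if not wave:
            raise ValueError("cycle in scheduled set")
        if cap > 0:
            batches.extend(wave[i:i + cap] for i in range(0, len(wave), cap))
        else:
            batches.append(wave)
        todo.difference_update(wave)
    return batches
```

Splitting a wide wave into sub-batches is always safe. Nodes within one wave never depend on each other, so any partition of the wave preserves dependency order.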

## Phase 4 — Execute (incremental + memoized)

- [x] `lib/artdag/execute.sx` — interpret a plan: each node op runs via `perform`
  (mocked op in tests); results keyed by content-id
- [x] **content-addressed memo cache** backed by `lib/persist/`: a node whose
  content-id already has a stored result is skipped (cache hit)
- [x] **incremental execute:** re-running after a leaf change recomputes only the
  dirty closure; everything else is a cache hit
- [x] `lib/artdag/tests/execute.sx` — full run, cache-hit on re-run, incremental
  recompute touches only dirty nodes (assert recompute count)
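
Incremental execution needs no explicit dirty tracking when the cache is keyed by content-id. A changed leaf changes the ids of everything downstream, so those nodes simply miss the cache. The sketch below is Python under stated assumptions, not `lib/artdag/execute.sx`. A plain dict stands in for the `lib/persist/` backend, and `demo_runner` is a hypothetical op table in place of the `perform`-based adapter. The id encoding (canonical JSON hashed with SHA-256) is also an assumption.

```python
import hashlib
import json


def node_id(op, input_ids, params):
    """Structural content id (the canonical-JSON + SHA-256 encoding is an assumption)."""
    canon = json.dumps([op, list(input_ids), params], sort_keys=True)
    return "node:" + hashlib.sha256(canon.encode("utf-8")).hexdigest()


def execute(entries, runner, cache):
    """Run topo-ordered (name, op, input_names, params) entries.

    runner(op, params, input_results) computes one node; `cache` maps
    content-id -> result and persists across calls. Returns
    (results_by_name, recompute_count, hit_count)."""
    ids, results = {}, {}
    recomputed = hits = 0
    for name, op, inputs, params in entries:
        cid = node_id(op, [ids[i] for i in inputs], params)
        ids[name] = cid
        if cid in cache:
            hits += 1  # same spec seen before: skip the op entirely
        else:
            cache[cid] = runner(op, params, [results[i] for i in inputs])
            recomputed += 1
        results[name] = cache[cid]
    return results, recomputed, hits


def demo_runner(op, params, args):
    """Hypothetical op table standing in for the injected media adapter."""
    if op == "const":
        return params["v"]
    if op == "inc":
        return args[0] + 1
    if op == "add":
        return sum(args)
    raise ValueError(f"unknown op {op!r}")
```

Consider a hypothetical five-node graph, `out = add(inc(x), inc(y))`. Against one shared cache, a cold run recomputes all 5 nodes and an identical rerun recomputes 0. After changing only `x`, 3 nodes are recomputed (`x`, `inc(x)`, `out`) while the `y` branch is a cache hit. This matches the 5 → 0 → 3 pattern reported in the Phase 4 log.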

## Phase 5 — Effect-pipeline optimization

- [x] `lib/artdag/optimize.sx` — rewrite the DAG before execution: dead-node
  elimination (unreachable from outputs), common-subexpression sharing (free from
  content ids), adjacent-op fusion
- [x] optimizations are content-id-preserving where semantically identical; assert
  the optimized DAG produces identical results
- [x] `lib/artdag/tests/optimize.sx` — DCE, CSE dedup, fusion equivalence
- [x] (superseded by Phase 7) integration point flagged
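
The Phase 5 log describes two passes: DCE keeps the outputs plus their ancestors, and fusion collapses a maximal 1-to-1 chain of fusible unary ops into one pipeline node. Below is a Python sketch of both. `dce`, `fuse`, `fusing_runner` and `evaluate` are hypothetical counterparts of `artdag/dce`, `artdag/fuse` and `artdag/fusing-runner`, not their code. Entries are assumed to be topo-ordered `(name, op, input_names, params)` tuples.

```python
def dce(entries, outputs):
    """Keep the outputs and their transitive ancestors; entries stay in topo order."""
    by_name = {e[0]: e for e in entries}
    keep, stack = set(), list(outputs)
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(by_name[n][2])
    return [e for e in entries if e[0] in keep]


def fuse(entries, fusible):
    """Collapse maximal 1-to-1 chains of fusible unary ops into "pipeline" nodes."""
    by_name = {e[0]: e for e in entries}
    uses, consumer = {}, {}
    for name, _op, inputs, _params in entries:
        for i in inputs:
            uses[i] = uses.get(i, 0) + 1
            consumer[i] = name

    def unary_fusible(e):
        return e[1] in fusible and len(e[2]) == 1

    # Absorbed = fusible, used exactly once, and its single consumer is fusible.
    # Sinks are never used, so they are never absorbed and survive fusion.
    absorbed = {n for n, e in by_name.items()
                if unary_fusible(e) and uses.get(n, 0) == 1
                and unary_fusible(by_name[consumer[n]])}
    out = []
    for e in entries:
        name, op, inputs, params = e
        if name in absorbed:
            continue
        if not (unary_fusible(e) and inputs[0] in absorbed):
            out.append(e)  # leaf, fan-out, non-fusible, or lone fusible node
            continue
        stages, head = [(op, params)], inputs[0]
        while head in absorbed:  # walk back to the chain head's external input
            h = by_name[head]
            stages.append((h[1], h[3]))
            head = h[2][0]
        stages.reverse()
        out.append((name, "pipeline", [head], {"stages": stages}))
    return out


def fusing_runner(base):
    """Wrap a base runner so "pipeline" nodes replay their stages in order."""
    def run(op, params, args):
        if op != "pipeline":
            return base(op, params, args)
        value = args[0]
        for stage_op, stage_params in params["stages"]:
            value = base(stage_op, stage_params, [value])
        return value
    return run


def evaluate(entries, runner):
    """Compute every node in topo order (no cache), returning {name: result}."""
    values = {}
    for name, op, inputs, params in entries:
        values[name] = runner(op, params, [values[i] for i in inputs])
    return values
```

As in the composition pass from the log, fusion runs first and DCE second. On a chain `s → add → mul → add` with a dead `mul` branch off `s`, the result is two nodes: `s` and one pipeline carrying three stages. Run through `fusing_runner`, it produces the same output as the unfused DAG.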

## Phase 6 — Federation (shared content-addressed cache)

- [x] a result computed on one instance is reusable on another by content-id (the
  L2-registry analog): export/import `{content-id → result}` with provenance
- [x] trust gating — accept a remote result only from a trusted peer (mirror the
  fed trust shape; mock the transport in tests)
- [x] revocation/invalidation — drop a remote result if its provenance is withdrawn
- [x] `lib/artdag/tests/fed.sx` — remote cache hit, trust gating, invalidation
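
Here is a Python sketch of the federation shape described in the Phase 6 log. An instance holds a cache plus a provenance map; export tags records with the exporter's id; import is trust-gated; and invalidation is scoped to one peer. The class and method names are hypothetical, and the transport is a plain callable standing in for the injected `fetch-fn(peer-id)`. One behaviour is an assumption not stated in the log: on import, the sketch keeps any entry that is already cached rather than overwriting it.

```python
class Instance:
    """One engine instance: content-addressed cache + provenance of imported results."""

    def __init__(self, peer_id, trusted=()):
        self.peer_id = peer_id
        self.trusted = set(trusted)
        self.cache = {}  # content-id -> result
        self.prov = {}   # content-id -> origin peer (absent = computed locally)

    def export(self):
        """Dump the whole cache as records tagged with this instance's id."""
        return [{"cid": c, "result": r, "peer": self.peer_id}
                for c, r in self.cache.items()]

    def import_records(self, records):
        """Accept only records from trusted peers; returns how many were added."""
        accepted = 0
        for rec in records:
            if rec["peer"] not in self.trusted:
                continue  # trust gating
            if rec["cid"] in self.cache:
                continue  # assumption: keep what is already cached
            self.cache[rec["cid"]] = rec["result"]
            self.prov[rec["cid"]] = rec["peer"]
            accepted += 1
        return accepted

    def pull(self, peer, fetch):
        """Import via an injected transport: fetch(peer) -> list of records."""
        return self.import_records(fetch(peer))

    def invalidate(self, peer):
        """Drop every result provenanced to `peer`; local results are untouched."""
        dropped = [c for c, p in self.prov.items() if p == peer]
        for c in dropped:
            del self.cache[c]
            del self.prov[c]
        return len(dropped)
```

Content ids are global, so an accepted record lets the importer skip that node entirely on its next run. After invalidation, the same node simply misses the cache and recomputes.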

## Phase 7 — Rule-based optimization via maude-on-sx (COMPLETE)

`lib/maude/` is now present on this branch (term rewriting modulo assoc/comm/id;
262 tests). The fit is PROVEN (see `lib/maude/tests/effects.sx`): artdag's
optimise passes (fusion, no-op/dead-op elim, identity elim, CSE/idempotent
dedup) are expressed as equations, where the optimised pipeline IS the normal form
and confluence ⇒ a stable content id. Reimplement Phase-5 optimisation
declaratively and prove it equivalent to the hand-written `optimize.sx`.

- [x] `lib/artdag/maude-bridge.sx` — adapter between an effect DAG node and a
  maude term: `(op, sorted-input-ids, params)` ⟷ `(mau/app op (args...))`.
  Params become constant subterms; for commutative ops use a maude AC operator
  so input order is irrelevant (mirror the content-id's order-insensitivity).
  Round-trip `dag→term→dag` must be identity on canonical form.
- [x] `lib/artdag/optimize-rules.sx` — the optimisation laws as a maude module
  (fusion / identity / no-op / dedup), one `eq` per law; `mau/creduce` the term,
  then bridge the normal form back to a DAG (`artdag/opt-reduce`: dag→opt-term→creduce→
  decode→`artdag/build`).
- [x] Equivalence: the maude-optimised DAG is **result-preserving**. It executes
  (via `artdag/run` + an op-table runner) to the same result as the original, with
  fewer nodes. NB: maude's fusion uses a *smaller* normal form (`blur(I, M+N)`) than
  `optimize.sx`'s `artdag/pipeline` stage nodes, so structural identity with
  `optimize.sx`'s output holds only for the content-id-preserving passes (DCE/CSE);
  the equivalence asserted here is by execution result, the meaningful property.
- [x] Confluence / CID-stability check: **consume `mau/confluent?` from
  `lib/maude/confluence.sx`**; do NOT build your own checker. Assert the
  optimisation rule module is confluent (`(mau/confluent? rules-module)` is
  true) so different rewrite orders reach the same normal form and the optimised
  pipeline's content id is stable. On failure, `mau/non-joinable-pairs` +
  `mau/cp->str` name the offending critical pair (fix the rule set, e.g. add the
  joining law; see the `f(a)=b,a=c` → add `f(c)=b` pattern). It is a syntactic
  critical-pair checker (exact for free/constructor ops; AC overlaps are
  under-approximated but joined via canonical form), which matches what this
  optimiser needs. Other API: `mau/critical-pairs`, `mau/joinable?`.
- [x] `lib/artdag/tests/maude-optimize.sx` — bridge round-trip, each law,
  result-preserving equivalence, `(mau/confluent? rules-module)` holds (40 tests).
- [x] cost-directed: `artdag/opt-improvement`/`opt-cheaper?` compare the optimised
  cone vs the original cone under an injected `cost-fn`. Optimisation is never a
  pessimisation (fewer nodes + fused ops ⇒ total-work and critical-path never
  increase, under const and radius-weighted costs). Still open (optional, not
  pursued): miniKanren scheduling.
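
The laws themselves are small enough to see working outside maude. The sketch below is Python, not a maude module. It hard-codes an innermost rewriting strategy for the `ARTDAGOPT` laws named in the progress log, and it does not check confluence. Python ints stand in for the unary radius algebra `_+_ [assoc comm id: 0]`, terms are leaf strings or `(op, *args)` tuples, and `normalize`/`nodes` are hypothetical names.

```python
def normalize(t):
    """Innermost rewriting with the Phase 7 laws:
         id(I) = I, blur(I,0) = I, bright(I,0) = I,
         blur(blur(I,M),N) = blur(I,M+N) (same for bright), over(I,I) = I.
    Terms are leaf strings or tuples (op, *args); radii are non-negative ints."""
    if isinstance(t, str):
        return t
    op, *args = t
    if op == "id":
        return normalize(args[0])
    if op in ("blur", "bright"):
        inner, n = normalize(args[0]), args[1]
        if n == 0:
            return inner
        if isinstance(inner, tuple) and inner[0] == op:
            return (op, inner[1], inner[2] + n)  # adjacent fusion
        return (op, inner, n)
    if op == "over":
        # Commutative: put both arguments in a canonical order before dedup.
        a, b = sorted((normalize(args[0]), normalize(args[1])), key=repr)
        return a if a == b else ("over", a, b)
    return (op,) + tuple(normalize(a) if isinstance(a, (str, tuple)) else a
                         for a in args)


def nodes(t, seen=None):
    """Distinct subterms = DAG nodes after CSE (radius ints are params, not nodes)."""
    seen = set() if seen is None else seen
    if t not in seen:
        seen.add(t)
        if isinstance(t, tuple):
            for a in t[1:]:
                if isinstance(a, (str, tuple)):
                    nodes(a, seen)
    return seen
```

With these rules, the log's 5-node chain `bright(id(blur(blur(src,1),2)),0)` normalizes to `blur(src,3)` (2 nodes), and `over(I,I)` with `I = blur(src,1)` dedups from 3 nodes to 2. In maude, the guarantee that every rewrite order reaches the same normal form comes from `mau/confluent?`. This sketch fixes one order and proves nothing about the others.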

maude is a READ-ONLY consumed substrate (like datalog/persist): load it, don't
edit it. Entry points: `mau/parse-module`, `mau/creduce`/`mau/creduce->str`,
`mau/canon`/`mau/ac-equal?`, `mau/term->maude`, `mau/app`/`mau/const`/`mau/var`
+ accessors. Load order: see `lib/maude/conformance.conf` PRELOADS.

Gotchas (from building it):

- `id:` affects matching/canon only, not auto-reduction: write explicit identity
  `eq`s or read `mau/canon`.
- `mau/match-all`/`search` enumerate ALL matches (exponential on many identical AC
  args, so keep rule sets small + confluent); single-step rewriting short-circuits
  and is fast.
- Juxtaposition `__` and multi-`_` mixfix are not parsed (use explicit infix ops).
- `.` can't be an op token.

## Progress log

- **2026-06-19 Phase 7 — confluence gate is non-vacuous** (maude-optimize 40/40, total
  198/198). Added a regression proving `mau/confluent?` actually discriminates: the
  Peano-arithmetic variant of the same laws (`0 + N = N`, `s M + N = s(M+N)` instead of
  `_+_ [assoc comm id: 0]`) is asserted **non-confluent** with named non-joinable pairs,
  so the green "opt module is confluent" is real evidence, not a checker that rubber-stamps
  everything. Documents the exact AC-vs-Peano design choice as an executable contrast.

- **2026-06-19 Phase 7 — cost-directed: optimisation is never a pessimisation**
  (maude-optimize 38/38, total 196/196). `artdag/opt-improvement dag id cost-fn` compares
  the original output cone (`artdag/dce` to `id`) against the maude-reduced DAG under an
  injected `cost-fn (op params)` and returns `{:before :after :before-path :after-path
  :optimized}` (total-work + critical-path each side). `artdag/opt-cheaper?` asserts
  `after <= before`. Under monotone per-node costs the optimised DAG never costs more:
  the 5-node chain drops to 2 (const work 5→2, critical path 5→2) and stays cheaper under
  a radius-weighted cost (5→3: one `blur(M+N)` costs the same as the two it replaces);
  the `over` dedup and an untouched DAG are both `opt-cheaper?`. Consumes `cost.sx`'s
  `total-work`/`critical-path`. Phase 7 base + the "(later)" cost box now done; only the
  optional miniKanren scheduling remains.

- **2026-06-19 Phase 7 — opt-reduce: bridge normal form back to a DAG** (maude-optimize
  33/33, total 191/191). `artdag/opt-reduce dag id`: encode the DAG cone at `id` into an
  opt-term (`artdag/dag->opt-term`: leaves→nullary const, `:radius` nodes→`op(inputs…,
  unary-Num)`, `over`→the comm op), `mau/creduce` it against `artdag/opt-module`, decode
  the normal form back to build-entries (`artdag/opt-term->entries`, counting `1`s for
  the radius) and `artdag/build`; content-ids are recomputed and shared subterms re-collapse.
  Proven **result-preserving**: a 5-node chain (blur;blur;id;bright0) collapses to 2 nodes
  and an `over(I,I)` dedup goes 3→2, both executing (via `artdag/run` + a numeric op-table
  runner) to the same result as the original; a non-optimisable DAG round-trips its radius
  faithfully (unary `1+1+1`→3). Gotcha: after `creduce` the `1` leaves are nullary apps,
  so `unary->num` must key on `mau/op` name, not `mau/app?`. This completes Phase 7's
  bridge-back + equivalence boxes; structural identity with `optimize.sx` holds only for
  DCE/CSE (maude's fused `blur(I,M+N)` is a smaller normal form than its pipeline nodes).

- **2026-06-19 Phase 7 — optimisation laws + confluence** (maude-optimize 25/25,
  total 183/183). `lib/artdag/optimize-rules.sx`: the effect-pipeline optimisation
  passes as a maude module `ARTDAGOPT`: `id(I)=I`, `blur(I,0)=I`, `bright(I,0)=I`,
  adjacent fusion `blur(blur(I,M),N)=blur(I,M+N)` (+bright), idempotent
  `over(I,I)=I`. Key result: the radius algebra is `_+_ [assoc comm id: 0]` (unary
  `1`s), NOT Peano successor rules. The Peano version is non-confluent (6
  non-joinable critical pairs: `M+0` sticks, `(A+B)+C` vs `A+(B+C)` doesn't join),
  whereas AC+id makes `(mau/confluent? artdag/opt-module)` true (0 non-joinable
  pairs) by joining those overlaps via canonical form. So the optimised pipeline's
  normal form, and hence its content id, is stable under any rewrite order.
  `artdag/opt-normal-form`/`opt-reduce-term`/`opt-canon` reduce a surface pipeline;
  `opt-same-form?` decides content-id equality; `opt-confluent?`/`opt-non-joinable`/
  `opt-non-joinable->strs` consume `lib/maude/confluence.sx` (loaded into the
  maude-optimize suite). Tests: confluence holds, every law fires, fusion is
  rewrite-order stable, laws compose, dedup vs no-dedup, distinct pipelines stay
  distinct. Gotcha: compare reduced *strings* (`mau/creduce->str`), because canon term
  objects compare unequal under `=` even when the printed normal form is identical.
  Remaining Phase 7: bridge the maude normal form back to a runnable DAG +
  equivalence-with-`optimize.sx`.

- **2026-06-07 Phase 7 — maude-bridge** (maude-optimize suite 14/14, total 172/172).
  `lib/artdag/maude-bridge.sx`: lossless adapter between an artdag effect DAG and a
  maude term. `artdag/dag->term dag id` walks from a sink, emitting `(mau/app op
  input-terms ++ meta)` per node; the trailing `artdag:meta` subterm carries params
  (a `write-to-string` token) + a commutativity flag, so no side table is needed.
  `artdag/term->dag` synthesizes build-entries (names `mb0…`) and re-runs
  `artdag/build`, which recomputes content-ids (name-independent) and re-collapses
  shared subterms, so `artdag/mb-roundtrip` is the identity on canonical form:
  sink id, op, params, and the full node survive; commutative ids stay
  order-insensitive; a diamond's shared node de-duplicates back to one (4→4).
  Maude entry points used: `mau/app`/`mau/const`/`mau/op`/`mau/args`/`mau/app?`.
  `conformance.sh` now gates the maude PRELOADS + bridge load to the maude-optimize
  suite (other suites stay maude-free). Gotcha: `write-to-string`/`read` round-trips
  keyword dicts but may reorder keys; that is fine, since dicts compare structurally with `=`.

- **Ext: public API facade** (`lib/artdag/api.sx`, total 158/158 unchanged).
  Reference index matching the datalog/persist convention: canonical load order +
  the full public surface across all 10 modules + `artdag/version`.

- **Ext: fault-tolerant execution** (fault suite 14/14, total 158/158).
  `lib/artdag/fault.sx`: a node op may fail via `(artdag/fail reason)`; `run-safe`
  confines the failure to that node + its transitive dependents (independent branches
  still compute) and NEVER caches a failed result, so a later run with the fault fixed
  recomputes only the failed closure and cache-hits the good nodes. `failed?`/`fail`
  markers, `failed-nodes`/`failure-count`/`all-ok?`.

- **Ext: execution stats / cache analytics** (stats suite 12/12, total 144/144).
  `lib/artdag/stats.sx` over an exec record: `hit-ratio`, `work-recomputed`/`work-saved`
  (cost-weighted via the cost model), `savings-ratio`, and `exec-summary`. Cold run =
  0 hit ratio / all work ran; warm rerun = ratio 1 / all work saved; incremental = saved
  work counts unchanged nodes, ran work counts the dirty closure.

- **Ext: optimize composition pass** (optimize suite 22/22, total 132/132).
  `artdag/optimize entries outputs fusible?` fuses the entry list then DCEs against
  the output names (sinks survive fusion since they're never absorbed): fewer nodes,
  identical results. Verified: dead branch dropped + chain fused (4→2), an output that
  is itself "dead" is retained, no-fusible-set still DCEs.

- **Ext: DAG wire serialization** (serialize suite 13/13, total 128/128).
  `lib/artdag/serialize.sx`: `dag->wire` emits a topo-ordered list of
  `(id op inputs params commutative)` records: plain lists with keyword-keyed param
  dicts, which survive `write-to-string`/`read` (string-keyed node dicts do NOT, and
  `()` reads back as nil, so `wire->dag` normalizes empty inputs). `wire->dag`
  reconstructs a runnable dag by content-id (author names dropped), which executes
  identically to the original. `wire-verify` recomputes each record's content-id and
  rejects tampered ids or mutated params under a stale id (self-verifying transport).
  `dag->string`/`string->dag` for text transport. Gotcha logged: the `sx-parse` primitive
  is unbound in the server env; use `(read (open-input-string s))`.

- **Ext: cost-based scheduling** (cost suite 13/13, total 115/115).
  `lib/artdag/cost.sx`: an injected `cost-fn (op params)` keeps media-op costs opaque
  (`const-cost`, `op-cost table`). `critical-path` = longest weighted path (finish-time
  fold over topo order) = min makespan with unlimited workers. `makespan dag plan
  cost-fn` sums each batch's slowest node. On the tested DAGs the full plan (cap 0)
  makespan == critical path (in general, per-batch barriers can push it above the
  critical path); serial (cap 1) == `total-work`. `speedup` = total-work / makespan.
  Verified that weighted paths follow heavy ops and capped makespan never dips below
  the critical path.

- **Phase 6 — Federation (shared content-addressed cache)** (fed suite 15/15, total
  102/102). `lib/artdag/federation.sx`: an instance = `{:cache <persist kv> :prov
  {cid->origin-peer}}`. `fed-export` dumps the whole cache as `{:cid :result :peer}`
  records tagged with the exporter's id; `fed-import` accepts only records from
  trusted peers (trust gating) and records provenance; `fed-pull` imports via an
  injected `fetch-fn(peer-id)` transport (mocked in tests). Because content-ids are
  global, a trusted import makes the importer's run a pure cache hit (recompute 0),
  the L2-registry analog. `fed-invalidate peer` drops every result provenanced to a
  peer from cache + prov (trust withdrawn → recompute); it is peer-scoped (other peers'
  results survive) and leaves locally computed (un-provenanced) results untouched.
  ALL 6 PHASES COMPLETE.

- **Phase 5 — Effect-pipeline optimization** (optimize suite 18/18, total 87/87).
  `lib/artdag/optimize.sx`: `artdag/dce dag outputs` keeps only the outputs plus
  their transitive ancestors (via analyze), preserving surviving content-ids.
  `artdag/cse` == build: structural sharing is inherent to content addressing, so
  identical subexpressions collapse to one node/id and execute once (verified).
  `artdag/fuse entries fusible?` rewrites entries: a maximal 1-to-1 chain of fusible
  unary ops (predecessor used only by its single consumer, both fusible) collapses
  into one `artdag/pipeline` node carrying ordered `{:op :params}` stages, fed by the
  chain head's external input; leaves, fan-out nodes, and non-fusible ops never fuse.
  `artdag/fusing-runner` wraps a base runner to replay pipeline stages, with output
  equivalent to the unfused DAG (asserted). Note: CSE auto-dedup means test fixtures
  intended as distinct nodes must use distinct op/params.

- **Phase 4 — Execute (incremental + memoized)** (execute suite 15/15, total 69/69).
  `lib/artdag/execute.sx`: `artdag/execute` folds a plan, computing each node via an
  injected `runner (op params input-results)` (production = `perform` to JAX/IPFS
  adapter; tests = a pure op-table) and memoizing the result in a `lib/persist/` kv
  backend keyed by **content-id**. A node whose content-id is already cached is a hit
  (skipped). The keystone falls out of content addressing: changing a leaf changes the
  ids of its whole dirty closure, so re-running the full plan against a warm cache
  recomputes exactly those nodes and hits the rest. This is verified by recompute/hit
  counts (5 cold → 0 on rerun → 3 after one leaf change, sibling reused). Cross-DAG
  sharing verified: a different DAG containing a shared subgraph cache-hits it.
  `run`/`run-dirty` helpers; `result-of`/`recompute-count`/`hit-count`/`recomputed`
  inspection.

- **Phase 3 — Plan** (plan suite 18/18, total 54/54). `lib/artdag/plan.sx`:
  `artdag/plan` schedules a dag into Kahn-wave topological batches: each batch's
  nodes have all in-scope deps satisfied by earlier batches, so they run in parallel.
  A `cap` (>0) splits any wave wider than the cap into consecutive sub-batches;
  `cap<=0` is unlimited. `artdag/plan-dirty` schedules only the dirty closure: deps
  outside the scheduled set (clean cache hits) count as already satisfied, so a
  mid-node change yields just `[[changed]…[downstream]]`. Inspection helpers
  `plan-batches`/`plan-width`/`plan-size`/`plan-flatten`.

- **Phase 2 — Analyze on Datalog** (analyze suite 16/16, total 36/36).
  `lib/artdag/analyze.sx`: `artdag/edge-facts` projects each `(input-id, node-id)`
  pair to an `(edge ...)` fact; `artdag/analyze` builds a `dl-program-data` db with
  the recursive rules `reachable(X,Y) :- edge(X,Y).` and
  `reachable(X,Z) :- edge(X,Y), reachable(Y,Z).` (the acl/relations reachability
  shape). Query helpers `deps-of`/`dependents-of` (direct), `reachable-from`
  (transitive downstream), `ancestors-of` (transitive upstream), all returning sorted
  id lists. `dirty-closure` builds a db with `dirty(Y) :- edge(X,Y), dirty(X)` seeded
  by changed-node facts and returns the transitive forward closure. The keystone test
  confirms that changing a mid node dirties only it + downstream, leaving
  siblings/upstream clean. Content-ids work as opaque Datalog string constants.

- **Phase 1 — DAG model + content addressing** (dag suite 20/20). `lib/artdag/dag.sx`:
  node `{:op :inputs :params :commutative}`; `artdag/content-id` = `"node:"` + a
  deterministic canonical serialization of `(op, inputs, params)` with dict keys
  sorted (param order-insensitive) and commutative ops' inputs sorted (input
  order-insensitive); non-commutative inputs keep their order. `artdag/build` takes
  named entries `(name op (input-names) params [commutative?])`, validates (dangling
  refs, cycles via fixpoint topo), resolves input-names→content-ids, dedups identical
  subgraphs to one node + one id (shared across DAGs), and returns `{:ok :nodes :names
  :order}`. No host `sort`/`string<?` is used; ordering relies on a hand-rolled
  `artdag/str<?` over char-codes. Gotcha logged: SX `equal?` is representation-sensitive
  (cons-built vs vector lists compare unequal even when identical), while `=` is true
  structural equality, so the conformance harness compares with `=`.
  `lib/artdag/conformance.sh` + scoreboard.
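
The cost model from the cost-based scheduling entry can be sketched in Python. This is an illustration under stated assumptions, not `lib/artdag/cost.sx`. Nodes are a dict of `(op, params, inputs)`, batches are plain lists, and `weight_cost` is a hypothetical stand-in for the injected `cost-fn (op params)`. The example shows why the critical path is only a lower bound on a batch-synchronised plan's makespan.

```python
def total_work(nodes, cost):
    """Sum of every node's cost: the makespan with a single worker."""
    return sum(cost(op, params) for op, params, _inputs in nodes.values())


def critical_path(nodes, order, cost):
    """Longest weighted path, as a finish-time fold over a topological order."""
    finish = {}
    for n in order:
        op, params, inputs = nodes[n]
        finish[n] = cost(op, params) + max((finish[i] for i in inputs), default=0)
    return max(finish.values(), default=0)


def makespan(nodes, batches, cost):
    """Barrier-synchronised batches: each batch takes as long as its slowest node."""
    return sum(max(cost(nodes[n][0], nodes[n][1]) for n in batch) for batch in batches)


def weight_cost(op, params):
    """Hypothetical cost-fn: an explicit "w" param, else a constant cost of 1."""
    return params.get("w", 1)
```

Take a skewed DAG: `a` (cost 1) feeds `b` (cost 10), and `c` (cost 10) is independent. Its waves are `[[a, c], [b]]`, giving a makespan of 20, while the critical path is 11 and total work is 21. On a uniform chain, the waves, the critical path and the makespan all coincide.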

## Blockers

- ~~**Push to `origin/loops/artdag` unavailable**~~: RESOLVED 2026-06-28. The credential
  was restored in `~/.git-credentials`, push works, and `loops/artdag` is pushed through
  `cd2ad707`.
- **sx-tree edit tools raise yojson `"Expected string, got null"` in this worktree**
  (`sx_insert_near`/`sx_replace_*`/`sx_insert_child`). Only `sx_write_file` works for
  writes; assemble the full file and write it with `sx_write_file` (or `cp` a prepared
  file + `sx_validate`). Same gotcha logged in the content/dream loops.