artdag: fault-tolerant execution — confined failure, cache never poisoned + 14 tests
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m4s

fault.sx run-safe: a node op may return (artdag/fail reason); the failure is confined
to that node and its downstream dependents while independent branches still compute.
Failed results are never cached, so a retry after a fix recomputes only the failed
closure and cache-hits the good nodes. fault 14/14, total 158/158.
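
A minimal Python sketch of the behaviour described above, for illustration only; it is not the fault.sx implementation. The names `run_safe`, `Fail`, `failed` and `failed_nodes` echo the commit's vocabulary, but their signatures here are assumptions, and the cache is keyed by node name purely to keep the example short.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Fail:
    """Failure marker, standing in for (artdag/fail reason)."""
    reason: str


def failed(value):
    return isinstance(value, Fail)


def run_safe(deps, ops, cache):
    """Run a DAG, confining failures and never caching them.

    deps:  node -> list of input nodes (every node must appear as a key)
    ops:   node -> callable taking the input values in order
    cache: dict of node -> value, mutated in place (hypothetical cache shape)
    Returns (results, computed), where computed lists the nodes whose op ran.
    """
    results, computed = {}, []
    for node in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps.get(node, [])]
        bad = [d for d in deps.get(node, []) if failed(results[d])]
        if bad:
            # A failed input poisons this node, but it is not run or cached.
            results[node] = Fail(f"upstream failed: {bad[0]}")
            continue
        if node in cache:
            results[node] = cache[node]  # cache hit, op is not run
            continue
        value = ops[node](*inputs)
        computed.append(node)
        if not failed(value):
            cache[node] = value  # only successful results are cached
        results[node] = value
    return results, computed


def failed_nodes(results):
    return sorted(n for n, v in results.items() if failed(v))


# a feeds b and c; d depends on c; e depends on b (independent of c).
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["c"], "e": ["b"]}
ops = {
    "a": lambda: 1,
    "b": lambda a: a + 1,
    "c": lambda a: Fail("boom"),  # the faulty op
    "d": lambda c: c * 2,
    "e": lambda b: b * 10,
}
cache = {}
results1, computed1 = run_safe(deps, ops, cache)

# Fix the fault and retry against the same cache.
ops["c"] = lambda a: a * 3
results2, computed2 = run_safe(deps, ops, cache)
```

In the first run `c` fails and `d` is marked failed without running, while `e` on the independent branch still computes. In the retry only `c` and `d` are executed; `a`, `b` and `e` come from the cache.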

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:32:14 +00:00
parent f29d8c047b
commit 28fed7c799
6 changed files with 216 additions and 6 deletions

@@ -30,7 +30,7 @@ edges.
## Status (rolling)
-`bash lib/artdag/conformance.sh` **144/144** (9 suites: dag, analyze, plan, execute, optimize, fed, cost, serialize, stats)
+`bash lib/artdag/conformance.sh` **158/158** (10 suites: dag, analyze, plan, execute, optimize, fed, cost, serialize, stats, fault)
Base roadmap (Phases 16) COMPLETE. Now extending.
@@ -138,6 +138,13 @@ lib/artdag/optimize.sx lib/artdag/federation.sx
## Progress log
- **Ext: fault-tolerant execution** (fault suite 14/14, total 158/158).
`lib/artdag/fault.sx`: a node op may fail via `(artdag/fail reason)`; `run-safe`
confines the failure to that node + its transitive dependents (independent branches
still compute) and NEVER caches a failed result, so a later run with the fault fixed
recomputes only the failed closure and cache-hits the good nodes. `failed?`/`fail`
markers, `failed-nodes`/`failure-count`/`all-ok?`.
- **Ext: execution stats / cache analytics** (stats suite 12/12, total 144/144).
`lib/artdag/stats.sx` over an exec record: `hit-ratio`, `work-recomputed`/`work-saved`
(cost-weighted via the cost model), `savings-ratio`, and `exec-summary`. Cold run =