Files
rose-ash/plans/flow-on-sx.md
giles f8722b3b08
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 45s
flow: remote-failover — try peers in order, fall through to local + 6 tests
(remote-failover addrs fn local) tries fn on each peer in order, moves to the next
on any raised error, and runs the local node if every peer fails. Threads input,
composes in sequences.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 17:44:04 +00:00

145 lines
7.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# flow-on-sx: Durable DAG Workflows on Scheme
rose-ash needs workflows that survive restarts: content pipelines (write → review →
publish → federate), scheduled jobs (digest emails), multi-step user flows (signup,
confirm, onboard). art-dag is the precedent — DAG-of-tasks with pause/resume at IO
boundaries.
Scheme's `call/cc` + delimited continuations make pause/resume natural: a `suspend`
captures the continuation, serializes it as part of the flow record, and `resume`
re-enters at exactly that point. No state-machine bookkeeping by hand. R7RS-small is
already at 2644/2644 (see kernel/architecture status).
End-state: a Scheme-on-SX layer over the existing scheme runtime, with combinators
for sequence/parallel/branch/retry/timeout/suspend, persistent flow store, and a
federation extension via fed-sx for remote-node execution.
## Status (rolling)
`bash lib/flow/conformance.sh`**87/87** (Phases 1-3 done; Phase 4 in progress: remote-node + failover done)
## Ground rules
- **Scope:** only touch `lib/flow/**` and `plans/flow-on-sx.md`. Do **not** edit
`spec/`, `hosts/`, `shared/`, `lib/scheme/**`, or other `lib/<lang>/`. You may
**import** from `lib/scheme/` (public API via `lib/scheme/scheme.sx`); do **not**
modify Scheme.
- **Shared-file issues** go under "Blockers" with a minimal repro; do not fix here.
- **SX files:** use `sx-tree` MCP tools only.
- **Architecture:** flow combinators are Scheme macros + procedures. Runtime is a
driver loop that walks the flow graph and invokes `call/cc` at `suspend` points.
Persistence layer serializes the continuation + open file/socket placeholders are
forbidden (continuations must be resumable across process restart).
- **art-dag awareness:** read `plans/art-dag*` if it exists for design lineage; do not
import code.
- **Commits:** one feature per commit. Keep Progress log updated and tick boxes.
## Architecture sketch
```
(defflow publish
(sequence
(write-content)
(parallel
(review)
(spell-check))
(cond approved?
(sequence (publish) (federate))
(notify-author))))
lib/flow/spec.sx lib/flow/runtime.sx lib/flow/store.sx
— defflow — driver loop — append-only flow log
— sequence/parallel — node dispatch — checkpoint serialize
— cond/retry/timeout — call/cc at suspend — restart loader
— suspend/resume │ │
▼ ▼
lib/flow/api.sx lib/flow/remote.sx
— (flow/start name args) — fed-sx adapter
— (flow/resume id value) — node-on-peer execution
— (flow/cancel id) — failure handling
```
## Phase 1 — Declarative DAG + sequential execution
- [x] `lib/flow/spec.sx``defflow` macro, `sequence` combinator
- [x] node = Scheme procedure of one arg (upstream value threaded in); output
threads to next node (data flow). A node ignoring its arg is a thunk.
- [x] `parallel` combinator (sequential semantics for now — TRUE parallelism in Phase 3)
- [x] runtime executes a flow synchronously, returns final value
- [x] `lib/flow/api.sx``(flow/start flow input)` entry point
- [x] `lib/flow/tests/basic.sx` — 18 cases: single nodes, linear/nested sequence,
data flow between nodes, parallel-with-join, publish-shaped flow
- [x] `lib/flow/scoreboard.{json,md}`
- [x] `lib/flow/conformance.sh`
## Phase 2 — Control flow + error handling
- [x] `cond` combinator — predicate selects branch (named `branch`; `cond` is a
Scheme special form). `(branch pred then else)` — 6 tests.
- [x] `retry n` — re-runs node up to n attempts on a raised exception; last
exception propagates. Only raised exceptions are retried — `(fail ...)` values
pass through. 6 tests. (Backoff deferred: no wall clock in pure SX.)
- [x] `timeout budget` — bounds node execution via a **cooperative step budget**
(deterministic; no scheduler/clock in pure SX). Nodes opt in via `(tick)`;
`budget` ticks allowed, the next raises `flow-timeout`. Non-ticking nodes are
unbounded; budgets nest. 7 tests.
- [x] `try-catch` — exception handler with reified error: `(try-catch node handler)`
runs node; on raise, calls `(handler error)` and returns its value. 6 tests.
- [x] error model — exceptions vs explicit `(fail reason)` results: `fail`/`failed?`/
`fail-reason` produce/inspect failure values that flow downstream as data
(distinct from raised exceptions caught by retry/try-catch). 6 tests.
- [x] `lib/flow/tests/control.sx` — 31 cases: branch, error model, try-catch,
retry, timeout + compositions
## Phase 3 — Suspend / resume (the showcase)
- [x] `(suspend tag)` — guest call/cc is ESCAPE-ONLY (re-entry hangs), so resume
uses **deterministic replay**: suspend escapes to the driver as `(flow-suspended
tag)`; resume re-runs the flow, replaying resolved suspends from a `(tag value)`
log. No live continuation is ever serialized — the log is plain data.
- [x] `lib/flow/store.sx` — flow store: id→record `(flow input log status payload)`;
`flow-drive` runs a flow against a replay log.
- [x] `(flow/resume id value)` — append `(tag value)` to the log, re-drive; raw
result on completion, `(flow-suspended id tag)` on a further suspend.
- [x] `(flow/cancel id)` — mark cancelled; a later resume is rejected (stale replay
cannot wake a cancelled flow).
- [x] crash recovery — `flow-store-export` (procs nulled → plain data),
`flow-store-import!`, `flow-resumable-ids`. Records are name-keyed; resume
re-resolves the proc by name (defflow registers names), so a flow survives a
wiped store. `tests/recovery.sx`, 8 cases (export/wipe/import, resumable scan,
restart-at-every-step, replay-log survival).
- [x] `lib/flow/tests/suspend.sx` — 17 cases: start/resume/cancel, multi-step,
replay determinism, lifecycle guards, suspend-in-branch
- Harness: `flow-run` now reuses one env with a per-test reset (building the full
standard env 66× was too slow) — see `api.sx`.
## Phase 4 — Distributed nodes via fed-sx
- [x] `(remote-node addr fn)` — execute a node on a federation peer. Transport is
the fed-sx boundary, MOCKED via a peer registry (`flow-peer-register!`); raises
`flow-remote-unreachable` / `flow-remote-no-fn`. Composes with sequence, suspend,
retry. `tests/distributed.sx`, 7 cases.
- [x] failure semantics — `(remote-failover addrs fn local)` tries each peer in
order, moves to the next on any raised error, and runs the `local` node if every
peer fails. 6 tests.
- [ ] persistence across instances — flow state replicates via fed-sx
- [ ] handoff — flow started here can resume on a peer if the local instance is down
- [ ] `lib/flow/tests/distributed.sx` — federated flow scenarios (mock fed-sx in tests)
## Progress log
- **Phase 1 (combinators + sequential runtime).** Flow built as a Scheme prelude
loaded onto `scheme-standard-env`: a flow is a Scheme procedure `input -> output`,
so the whole flow runs inside the interpreter (sets up Phase 3 call/cc suspend).
Combinators `flow-node`/`flow-id`/`flow-const`/`sequence`/`parallel`/`defflow` in
`spec.sx`; `flow/start` + SX helpers (`flow-make-env`/`flow-run`) in `api.sx`.
18/18 in `tests/basic.sx`. Substrate constraints found: dotted rest params
`(a . rest)` and named `let` are unsupported in `lib/scheme/eval.sx`, so
combinators use `(lambda args ...)` variadics + top-level recursion. Scheme
strings come back boxed as `{:scm-string "..."}` — unwrap with `(get s :scm-string)`.
## Blockers
(none)