flow-on-sx: Durable DAG Workflows on Scheme
rose-ash needs workflows that survive restarts: content pipelines (write → review → publish → federate), scheduled jobs (digest emails), multi-step user flows (signup, confirm, onboard). art-dag is the precedent — DAG-of-tasks with pause/resume at IO boundaries.
Scheme's call/cc + delimited continuations make pause/resume natural: a suspend
captures the continuation, serializes it as part of the flow record, and resume
re-enters at exactly that point. No state-machine bookkeeping by hand. R7RS-small is
already at 2644/2644 (see kernel/architecture status).
End-state: a Scheme-on-SX layer over the existing scheme runtime, with combinators for sequence/parallel/branch/retry/timeout/suspend, persistent flow store, and a federation extension via fed-sx for remote-node execution.
Status (rolling)
bash lib/flow/conformance.sh → 87/87 (Phases 1-3 done; Phase 4 in progress: remote-node + failover done)
Ground rules
- Scope: only touch lib/flow/** and plans/flow-on-sx.md. Do not edit spec/, hosts/, shared/, lib/scheme/**, or other lib/<lang>/. You may import from lib/scheme/ (public API via lib/scheme/scheme.sx); do not modify Scheme.
- Shared-file issues go under "Blockers" with a minimal repro; do not fix here.
- SX files: use sx-tree MCP tools only.
- Architecture: flow combinators are Scheme macros + procedures. Runtime is a driver loop that walks the flow graph and invokes call/cc at suspend points. The persistence layer serializes the continuation; open file/socket placeholders are forbidden (continuations must be resumable across process restart).
- art-dag awareness: read plans/art-dag* if it exists for design lineage; do not import code.
- Commits: one feature per commit. Keep Progress log updated and tick boxes.
Architecture sketch
(defflow publish
  (sequence
    (write-content)
    (parallel
      (review)
      (spell-check))
    (cond approved?
      (sequence (publish) (federate))
      (notify-author))))
        │
        ▼
lib/flow/spec.sx          lib/flow/runtime.sx         lib/flow/store.sx
- defflow                 - driver loop               - append-only flow log
- sequence/parallel       - node dispatch             - checkpoint serialize
- cond/retry/timeout      - call/cc at suspend        - restart loader
- suspend/resume                  │                           │
                                  ▼                           ▼
                          lib/flow/api.sx             lib/flow/remote.sx
                          - (flow/start name args)    - fed-sx adapter
                          - (flow/resume id value)    - node-on-peer execution
                          - (flow/cancel id)          - failure handling
Phase 1 — Declarative DAG + sequential execution
- lib/flow/spec.sx: defflow macro, sequence combinator
- node = Scheme procedure of one arg (upstream value threaded in); output threads to next node (data flow). A node ignoring its arg is a thunk.
- parallel combinator (sequential semantics for now; TRUE parallelism in Phase 3)
- runtime executes a flow synchronously, returns final value
- lib/flow/api.sx: (flow/start flow input) entry point
- lib/flow/tests/basic.sx: 18 cases (single nodes, linear/nested sequence, data flow between nodes, parallel-with-join, publish-shaped flow)
- lib/flow/scoreboard.{json,md}
- lib/flow/conformance.sh
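The node model above (a node is a one-argument procedure, and each output becomes the next input) can be illustrated with a minimal sketch in plain R7RS Scheme. This is not the code in lib/flow/spec.sx: the names seq-sketch, par-sketch, run-nodes and run-branches are hypothetical, and the assumption that the parallel join is a list of branch results is ours. The sketch avoids named let and dotted rest parameters, in line with the substrate constraints recorded in the Progress log.

```scheme
(import (scheme base))

;; Thread a value through a list of nodes, left to right.
;; Top-level recursion is used instead of a named let.
(define (run-nodes nodes value)
  (if (null? nodes)
      value
      (run-nodes (cdr nodes) ((car nodes) value))))

;; (seq-sketch n1 n2 ...) builds a node whose output is the last node's output.
(define seq-sketch
  (lambda nodes
    (lambda (input) (run-nodes nodes input))))

;; Give every branch the same input and collect the results in order.
;; let* forces the first branch to run before the rest (sequential semantics).
(define (run-branches nodes input)
  (if (null? nodes)
      '()
      (let* ((node (car nodes))
             (result (node input)))
        (cons result (run-branches (cdr nodes) input)))))

;; (par-sketch n1 n2 ...) builds a node that joins branch results into a list
;; (assumed join shape).
(define par-sketch
  (lambda nodes
    (lambda (input) (run-branches nodes input))))

;; 10 -> 11, fan out to (22 121), then the join node sums the list.
(define example-flow
  (seq-sketch
   (lambda (x) (+ x 1))
   (par-sketch (lambda (x) (* x 2))
               (lambda (x) (* x x)))
   (lambda (results) (apply + results))))

(example-flow 10)   ; => 143
```

Because a composed flow is itself a one-argument procedure, it can be nested inside another seq-sketch or par-sketch without any adapter.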
Phase 2 — Control flow + error handling
- cond combinator: predicate selects branch (named branch; cond is a Scheme special form). (branch pred then else). 6 tests.
- retry n: re-runs node up to n attempts on a raised exception; last exception propagates. Only raised exceptions are retried; (fail ...) values pass through. 6 tests. (Backoff deferred: no wall clock in pure SX.)
- timeout budget: bounds node execution via a cooperative step budget (deterministic; no scheduler/clock in pure SX). Nodes opt in via (tick); budget ticks allowed, the next raises flow-timeout. Non-ticking nodes are unbounded; budgets nest. 7 tests.
- try-catch: exception handler with reified error. (try-catch node handler) runs node; on raise, calls (handler error) and returns its value. 6 tests.
- error model: exceptions vs explicit (fail reason) results. fail/failed?/fail-reason produce/inspect failure values that flow downstream as data (distinct from raised exceptions caught by retry/try-catch). 6 tests.
- lib/flow/tests/control.sx: 31 cases (branch, error model, try-catch, retry, timeout + compositions)
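The split between raised exceptions and failure values, and the tick budget, can be sketched as follows in plain R7RS Scheme. This is an illustration under stated assumptions, not the contents of lib/flow: every name ending in -sketch, plus attempt and current-budget, is hypothetical; the representation of failure values as a tagged list is assumed; and the budget here is held in a parameter object, so only the innermost budget is charged, which may differ from how the real combinator nests budgets.

```scheme
(import (scheme base))

;; Failure values are tagged data; they are returned, never raised.
(define (fail-sketch reason) (list 'flow-failed reason))
(define (failed-sketch? v) (and (pair? v) (eq? (car v) 'flow-failed)))

;; Retry: at most n attempts. Only a raised exception triggers another attempt.
(define (attempt node input left)
  (if (<= left 1)
      (node input)                    ; last attempt: any exception propagates
      (guard (e (#t (attempt node input (- left 1))))
        (node input))))

(define (retry-sketch n node)
  (lambda (input) (attempt node input n)))

;; try-catch: on raise, the handler receives the raised object as data.
(define (try-catch-sketch node handler)
  (lambda (input)
    (guard (e (#t (handler e)))
      (node input))))

;; Cooperative budget: #f means no budget is in force (unbounded).
(define current-budget (make-parameter #f))

(define (tick)
  (let ((cell (current-budget)))
    (cond ((not cell) #t)
          ((<= (vector-ref cell 0) 0) (raise 'flow-timeout))
          (else (vector-set! cell 0 (- (vector-ref cell 0) 1))))))

(define (timeout-sketch budget node)
  (lambda (input)
    (parameterize ((current-budget (vector budget)))
      (node input))))

;; A node that raises on its first two calls, then succeeds.
(define calls 0)
(define (flaky x)
  (set! calls (+ calls 1))
  (if (< calls 3) (raise 'transient) (* x 2)))
(define retried ((retry-sketch 3 flaky) 21))            ; => 42, calls is 3

;; A failure value is ordinary data, so retry passes it on after one call.
(define fail-calls 0)
(define (soft-fail x)
  (set! fail-calls (+ fail-calls 1))
  (fail-sketch 'rejected))
(define soft ((retry-sketch 3 soft-fail) 0))            ; fail-calls is 1

;; Budget of 2: the third tick raises flow-timeout, which try-catch reifies.
(define (three-ticks x) (tick) (tick) (tick) x)
(define timed-out
  ((try-catch-sketch (timeout-sketch 2 three-ticks)
                     (lambda (e) (list 'caught e)))
   'in))                                                ; => (caught flow-timeout)
```

Counting ticks rather than measuring time keeps the outcome identical on every run, which matters once flows are replayed from a log.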
Phase 3 — Suspend / resume (the showcase)
- (suspend tag): guest call/cc is ESCAPE-ONLY (re-entry hangs), so resume uses deterministic replay: suspend escapes to the driver as (flow-suspended tag); resume re-runs the flow, replaying resolved suspends from a (tag value) log. No live continuation is ever serialized; the log is plain data.
- lib/flow/store.sx: flow store, id → record (flow input log status payload); flow-drive runs a flow against a replay log.
- (flow/resume id value): append (tag value) to the log, re-drive; raw result on completion, (flow-suspended id tag) on a further suspend.
- (flow/cancel id): mark cancelled; a later resume is rejected (stale replay cannot wake a cancelled flow).
- crash recovery: flow-store-export (procs nulled → plain data), flow-store-import!, flow-resumable-ids. Records are name-keyed; resume re-resolves the proc by name (defflow registers names), so a flow survives a wiped store. tests/recovery.sx, 8 cases (export/wipe/import, resumable scan, restart-at-every-step, replay-log survival).
- lib/flow/tests/suspend.sx: 17 cases (start/resume/cancel, multi-step, replay determinism, lifecycle guards, suspend-in-branch)
- Harness: flow-run now reuses one env with a per-test reset (building the full standard env 66× was too slow); see api.sx.
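Deterministic replay as described above can be sketched in a few lines of plain R7RS Scheme. The names suspend-sketch, drive-sketch, current-log and current-escape are hypothetical and the store record is left out; the sketch also assumes each tag is suspended on at most once per flow. The continuation is used only to escape to the driver and is never re-entered or stored.

```scheme
(import (scheme base))

(define current-log '())     ; resolved suspends, a list of (tag value) entries
(define current-escape #f)   ; escape continuation back into the driver

;; If the tag is already in the log, return its value and keep running.
;; Otherwise jump out to the driver with a suspension marker.
(define (suspend-sketch tag)
  (let ((entry (assq tag current-log)))
    (if entry
        (cadr entry)
        (current-escape (list 'flow-suspended tag)))))

;; Run the flow from the top against a replay log. The result is either the
;; flow's final value or (flow-suspended tag).
(define (drive-sketch flow input log)
  (call/cc
   (lambda (k)
     (set! current-log log)
     (set! current-escape k)
     (flow input))))

;; A flow that waits for two external answers.
(define (approval-flow draft)
  (let* ((verdict (suspend-sketch 'review))
         (slot (suspend-sketch 'schedule)))
    (list draft verdict slot)))

;; Resume = append (tag value) to the log and drive again from the start.
(define log0 '())
(define r0 (drive-sketch approval-flow "post" log0))  ; => (flow-suspended review)
(define log1 (append log0 '((review approved))))
(define r1 (drive-sketch approval-flow "post" log1))  ; => (flow-suspended schedule)
(define log2 (append log1 '((schedule monday))))
(define r2 (drive-sketch approval-flow "post" log2))  ; => ("post" approved monday)
```

Since every resume re-executes the flow from its first node, any node that runs before a suspend point runs again on each replay, so in this model such nodes need to be deterministic and safe to repeat.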
Phase 4 — Distributed nodes via fed-sx
- (remote-node addr fn): execute a node on a federation peer. Transport is the fed-sx boundary, MOCKED via a peer registry (flow-peer-register!); raises flow-remote-unreachable/flow-remote-no-fn. Composes with sequence, suspend, retry. tests/distributed.sx, 7 cases.
- failure semantics: (remote-failover addrs fn local) tries each peer in order, moves to the next on any raised error, and runs the local node if every peer fails. 6 tests.
- persistence across instances: flow state replicates via fed-sx
- handoff: flow started here can resume on a peer if the local instance is down
- lib/flow/tests/distributed.sx: federated flow scenarios (mock fed-sx in tests)
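The failover order described above can be sketched against a mocked peer registry in plain R7RS Scheme. This is an illustration, not the code in lib/flow/remote.sx: the -sketch names, peers and try-peers are hypothetical, the registry is assumed to be an association list, fn is assumed to be a symbol looked up on the peer, and the shape of the raised error objects is invented for the example.

```scheme
(import (scheme base))

;; Mock registry: alist of (addr . functions), where functions is itself an
;; alist of (fn-name . procedure).
(define peers '())

(define (peer-register-sketch! addr fns)
  (set! peers (cons (cons addr fns) peers)))

;; Run fn-name on the peer at addr; raise if the peer or function is missing.
(define (remote-node-sketch addr fn-name)
  (lambda (input)
    (let ((peer (assq addr peers)))
      (if (not peer)
          (raise (list 'flow-remote-unreachable addr))
          (let ((fn (assq fn-name (cdr peer))))
            (if (not fn)
                (raise (list 'flow-remote-no-fn addr fn-name))
                ((cdr fn) input)))))))

;; Try each peer in order. Any raised error moves on to the next peer;
;; the local node runs only when the peer list is exhausted.
(define (try-peers addrs fn-name local input)
  (if (null? addrs)
      (local input)
      (guard (e (#t (try-peers (cdr addrs) fn-name local input)))
        ((remote-node-sketch (car addrs) fn-name) input))))

(define (remote-failover-sketch addrs fn-name local)
  (lambda (input) (try-peers addrs fn-name local input)))

;; Only peer-b is registered, so peer-a is skipped as unreachable.
(peer-register-sketch! 'peer-b
                       (list (cons 'render (lambda (x) (list 'peer-b x)))))

(define render-anywhere
  (remote-failover-sketch '(peer-a peer-b) 'render
                          (lambda (x) (list 'local x))))
(define via-peer (render-anywhere 'doc))     ; => (peer-b doc)

(define render-local
  (remote-failover-sketch '(peer-a) 'render
                          (lambda (x) (list 'local x))))
(define via-local (render-local 'doc))       ; => (local doc)
```

In this sketch an error raised by the local node is not caught, because the fallback call happens in the guard's handler clause rather than in its body, so the caller sees the final failure.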
Progress log
- Phase 1 (combinators + sequential runtime). Flow built as a Scheme prelude loaded onto scheme-standard-env: a flow is a Scheme procedure input -> output, so the whole flow runs inside the interpreter (sets up Phase 3 call/cc suspend). Combinators flow-node/flow-id/flow-const/sequence/parallel/defflow in spec.sx; flow/start + SX helpers (flow-make-env/flow-run) in api.sx. 18/18 in tests/basic.sx. Substrate constraints found: dotted rest params (a . rest) and named let are unsupported in lib/scheme/eval.sx, so combinators use (lambda args ...) variadics + top-level recursion. Scheme strings come back boxed as {:scm-string "..."}; unwrap with (get s :scm-string).
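The two workarounds named in the Phase 1 entry can be shown side by side. The following is a small sketch in plain R7RS Scheme with hypothetical names (tagged, count-items); the forms being replaced appear only in comments.

```scheme
(import (scheme base))

;; Replaces the named-let form: (let loop ((xs items) (n 0)) ...)
;; A top-level helper carries the loop state in its parameters.
(define (count-items xs n)
  (if (null? xs)
      n
      (count-items (cdr xs) (+ n 1))))

;; Replaces the dotted form: (define (tagged tag . items) ...)
;; The whole argument list is bound to args and taken apart by hand.
(define tagged
  (lambda args
    (let ((tag (car args))
          (items (cdr args)))
      (cons tag (count-items items 0)))))

(tagged 'batch 'a 'b 'c)   ; => (batch . 3)
```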
Blockers
(none)