flow-on-sx: Durable DAG Workflows on Scheme
rose-ash needs workflows that survive restarts: content pipelines (write → review → publish → federate), scheduled jobs (digest emails), multi-step user flows (signup, confirm, onboard). art-dag is the precedent — DAG-of-tasks with pause/resume at IO boundaries.
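Scheme's call/cc + delimited continuations make pause/resume look natural: in the
original design a suspend captures the continuation, serializes it as part of the
flow record, and resume re-enters at exactly that point. Guest call/cc turned out to
be escape-only (see Phase 3), so resume instead re-runs the flow and replays resolved
suspends from a log. Either way, no state-machine bookkeeping by hand. R7RS-small is
already at 2644/2644 (see kernel/architecture status).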
End-state: a Scheme-on-SX layer over the existing scheme runtime, with combinators for sequence/parallel/branch/retry/timeout/suspend, persistent flow store, and a federation extension via fed-sx for remote-node execution.
Status (rolling)
bash lib/flow/conformance.sh → 66/66 (Phases 1-2 done; Phase 3 suspend/resume/cancel done, crash-recovery next)
Ground rules
- Scope: only touch lib/flow/** and plans/flow-on-sx.md. Do not edit spec/, hosts/, shared/, lib/scheme/**, or other lib/<lang>/. You may import from lib/scheme/ (public API via lib/scheme/scheme.sx); do not modify Scheme.
- Shared-file issues go under "Blockers" with a minimal repro; do not fix here.
- SX files: use sx-tree MCP tools only.
- Architecture: flow combinators are Scheme macros + procedures. Runtime is a driver loop that walks the flow graph and invokes call/cc at suspend points. The persistence layer serializes the continuation; open file/socket placeholders are forbidden (continuations must be resumable across process restart).
- art-dag awareness: read plans/art-dag* if it exists for design lineage; do not import code.
- Commits: one feature per commit. Keep Progress log updated and tick boxes.
Architecture sketch
    (defflow publish
      (sequence
        (write-content)
        (parallel
          (review)
          (spell-check))
        (cond approved?
          (sequence (publish) (federate))
          (notify-author))))
            │
            ▼
    lib/flow/spec.sx         lib/flow/runtime.sx      lib/flow/store.sx
    - defflow                - driver loop            - append-only flow log
    - sequence/parallel      - node dispatch          - checkpoint serialize
    - cond/retry/timeout     - call/cc at suspend     - restart loader
    - suspend/resume                 │                        │
                                     ▼                        ▼
                             lib/flow/api.sx          lib/flow/remote.sx
                             - (flow/start name args) - fed-sx adapter
                             - (flow/resume id value) - node-on-peer execution
                             - (flow/cancel id)       - failure handling
Phase 1 — Declarative DAG + sequential execution
- lib/flow/spec.sx: defflow macro, sequence combinator
- node = Scheme procedure of one arg (upstream value threaded in); output threads to next node (data flow). A node ignoring its arg is a thunk.
- parallel combinator (sequential semantics for now; TRUE parallelism in Phase 3)
- runtime executes a flow synchronously, returns final value
- lib/flow/api.sx: (flow/start flow input) entry point
- lib/flow/tests/basic.sx: 18 cases covering single nodes, linear/nested sequence, data flow between nodes, parallel-with-join, publish-shaped flow
- lib/flow/scoreboard.{json,md}
- lib/flow/conformance.sh
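A minimal sketch of how the Phase 1 combinators could look as plain procedures, in portable Scheme. It is not the code in lib/flow/spec.sx: the helpers run-nodes and map-nodes, and the choice to join parallel results into a list, are assumptions made for illustration. It uses only (lambda args ...) variadics and top-level recursion, in line with the substrate constraints recorded in the Progress log.

```scheme
;; Sketch only: hypothetical definitions, not the contents of lib/flow/spec.sx.
;; A node is a one-argument procedure; a combinator returns another node.

;; Thread a value through a list of nodes, left to right.
(define (run-nodes nodes value)
  (if (null? nodes)
      value
      (run-nodes (cdr nodes) ((car nodes) value))))

;; sequence: the output of each node becomes the input of the next.
(define sequence
  (lambda nodes
    (lambda (input) (run-nodes nodes input))))

;; Run every node on the same input, in order, collecting results.
;; let* pins the evaluation order so side effects happen left to right.
(define (map-nodes nodes input)
  (if (null? nodes)
      '()
      (let* ((head (  (car nodes) input))
             (tail (map-nodes (cdr nodes) input)))
        (cons head tail))))

;; parallel with sequential semantics; the join is assumed to be a list.
(define parallel
  (lambda nodes
    (lambda (input) (map-nodes nodes input))))

;; Usage: 3 + 1 = 4 fans out to double (8) and square (16); the join sums them.
(define demo
  (sequence (lambda (x) (+ x 1))
            (parallel (lambda (x) (* x 2))
                      (lambda (x) (* x x)))
            (lambda (results) (apply + results))))

(demo 3) ; => 24
```

Because a flow is itself a one-argument procedure, sequence and parallel nest freely without any special case for sub-flows.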
Phase 2 — Control flow + error handling
- cond combinator: predicate selects branch (named branch; cond is a Scheme special form). (branch pred then else). 6 tests.
- retry n: re-runs node up to n attempts on a raised exception; last exception propagates. Only raised exceptions are retried; (fail ...) values pass through. 6 tests. (Backoff deferred: no wall clock in pure SX.)
- timeout budget: bounds node execution via a cooperative step budget (deterministic; no scheduler/clock in pure SX). Nodes opt in via (tick); budget ticks allowed, the next raises flow-timeout. Non-ticking nodes are unbounded; budgets nest. 7 tests.
- try-catch: exception handler with reified error. (try-catch node handler) runs node; on raise, calls (handler error) and returns its value. 6 tests.
- error model: exceptions vs explicit (fail reason) results. fail/failed?/fail-reason produce/inspect failure values that flow downstream as data (distinct from raised exceptions caught by retry/try-catch). 6 tests.
- lib/flow/tests/control.sx: 31 cases covering branch, error model, try-catch, retry, timeout + compositions
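The split between raised exceptions and (fail reason) values can be sketched as follows. This is an illustration under assumptions, not the code in lib/flow: the flow-fail tag, the argument order (retry n node), the retry-loop helper, and the availability of R7RS guard in the guest are all assumed here.

```scheme
;; Sketch only: hypothetical definitions illustrating the Phase 2 error model.

;; Failure values are plain data; the flow-fail tag is an assumption.
(define (fail reason) (list 'flow-fail reason))
(define (failed? v) (and (pair? v) (eq? (car v) 'flow-fail)))
(define (fail-reason v) (cadr v))

;; try-catch: run node; on a raised exception, return (handler error).
(define (try-catch node handler)
  (lambda (input)
    (guard (e (#t (handler e)))
      (node input))))

;; retry: up to n attempts. The final attempt runs outside any guard,
;; so its exception propagates to the caller.
(define (retry-loop node input attempts-left)
  (if (<= attempts-left 1)
      (node input)
      (guard (e (#t (retry-loop node input (- attempts-left 1))))
        (node input))))

(define (retry n node)
  (lambda (input) (retry-loop node input n)))

;; Usage: a node that raises on its first two calls, then succeeds.
(define calls 0)
(define (flaky x)
  (set! calls (+ calls 1))
  (if (< calls 3) (raise 'boom) (* x 10)))

((retry 3 flaky) 4)                                   ; => 40, after three calls
((retry 3 (lambda (x) (fail 'rejected))) 4)           ; => (flow-fail rejected), one call
((try-catch (lambda (x) (raise 'oops)) fail) 0)       ; => (flow-fail oops)
```

The last line shows one way to bridge the two models: a try-catch handler can reify a raised exception into a failure value that downstream nodes inspect with failed?.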
Phase 3 — Suspend / resume (the showcase)
- (suspend tag): guest call/cc is ESCAPE-ONLY (re-entry hangs), so resume uses deterministic replay: suspend escapes to the driver as (flow-suspended tag); resume re-runs the flow, replaying resolved suspends from a (tag value) log. No live continuation is ever serialized; the log is plain data.
- lib/flow/store.sx: flow store, id → record (flow input log status payload); flow-drive runs a flow against a replay log.
- (flow/resume id value): append (tag value) to the log, re-drive; raw result on completion, (flow-suspended id tag) on a further suspend.
- (flow/cancel id): mark cancelled; a later resume is rejected (stale replay cannot wake a cancelled flow).
- crash recovery: on restart, scan store for paused flows, mark resumable
- lib/flow/tests/suspend.sx: 17 cases covering start/resume/cancel, multi-step, replay determinism, lifecycle guards, suspend-in-branch
- Harness: flow-run now reuses one env with a per-test reset (building the full standard env 66× was too slow); see api.sx.
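Deterministic replay with an escape-only continuation can be sketched in a few lines of portable Scheme. This is a simplified model, not the code in lib/flow/store.sx: the names replay-log and escape-to-driver, the (flow-drive flow input log) signature, and the error raised on a tag mismatch are assumptions, and the id-keyed store is left out entirely.

```scheme
;; Sketch only: a store-less model of suspend + replay.

(define replay-log '())       ; (tag value) entries not yet consumed in this run
(define escape-to-driver #f)  ; continuation used only to escape upward

;; suspend: replay a resolved entry if one exists, otherwise escape.
(define (suspend tag)
  (cond ((null? replay-log)
         (escape-to-driver (list 'flow-suspended tag)))
        ((equal? (car (car replay-log)) tag)
         (let ((value (cadr (car replay-log))))
           (set! replay-log (cdr replay-log))
           value))
        (else
         (error "replay log out of step with flow" tag))))

;; flow-drive: run the flow from the top against a replay log.
;; The continuation k is never re-entered after flow-drive returns.
(define (flow-drive flow input log)
  (call-with-current-continuation
    (lambda (k)
      (set! replay-log log)
      (set! escape-to-driver k)
      (flow input))))

;; Usage: a flow with two suspend points, the second inside a branch.
(define (demo-flow input)
  (let* ((draft (list 'draft input))
         (verdict (suspend 'review)))
    (if (eq? verdict 'approved)
        (list 'published draft (suspend 'schedule))
        (list 'rejected draft))))

(flow-drive demo-flow 'post '())
;; => (flow-suspended review)
(flow-drive demo-flow 'post '((review approved)))
;; => (flow-suspended schedule)
(flow-drive demo-flow 'post '((review approved) (schedule monday)))
;; => (published (draft post) monday)
```

Each resume is a fresh run from the top with a longer log, so the only state that needs to persist is the flow's identity, its input, and the log. The price is that everything before a suspend point re-executes on every resume, which is why the flow must behave deterministically up to each suspend.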
Phase 4 — Distributed nodes via fed-sx
- (remote-node addr fn args): execute node on a federation peer
- failure semantics: retry on different peer, fall through to local
- persistence across instances: flow state replicates via fed-sx
- handoff: flow started here can resume on a peer if the local instance is down
- lib/flow/tests/distributed.sx: federated flow scenarios (mock fed-sx in tests)
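Phase 4 is not implemented yet, so the following is only one possible reading of "retry on different peer, fall through to local". Everything in it is hypothetical: try-peers, the send transport procedure and its (send addr fn args) shape, and mock-send stand in for whatever the fed-sx adapter ends up exposing.

```scheme
;; Sketch only: peer failover with a local fallback, using a mock transport.

;; Try each peer in order; if every peer raises, run fn locally.
(define (try-peers send peers fn args)
  (if (null? peers)
      (apply fn args)
      (guard (e (#t (try-peers send (cdr peers) fn args)))
        (send (car peers) fn args))))

;; Mock transport: peer-a is down, any other peer answers and tags its result.
(define (mock-send addr fn args)
  (if (eq? addr 'peer-a)
      (raise 'peer-down)
      (list addr (apply fn args))))

(try-peers mock-send '(peer-a peer-b) + '(1 2)) ; => (peer-b 3)
(try-peers mock-send '(peer-a) + '(1 2))        ; => 3, computed locally
```

Passing the transport in as an argument keeps the failover logic testable against a mock, which matches the plan to mock fed-sx in tests/distributed.sx.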
Progress log
- Phase 1 (combinators + sequential runtime). Flow built as a Scheme prelude loaded onto scheme-standard-env: a flow is a Scheme procedure input -> output, so the whole flow runs inside the interpreter (sets up Phase 3 call/cc suspend). Combinators flow-node/flow-id/flow-const/sequence/parallel/defflow in spec.sx; flow/start + SX helpers (flow-make-env/flow-run) in api.sx. 18/18 in tests/basic.sx. Substrate constraints found: dotted rest params (a . rest) and named let are unsupported in lib/scheme/eval.sx, so combinators use (lambda args ...) variadics + top-level recursion. Scheme strings come back boxed as {:scm-string "..."}; unwrap with (get s :scm-string).
Blockers
(none)