Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 45s
(remote-failover addrs fn local) tries fn on each peer in order, moves to the next on any raised error, and runs the local node if every peer fails. Threads input, composes in sequences. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
145 lines
7.6 KiB
Markdown
145 lines
7.6 KiB
Markdown
# flow-on-sx: Durable DAG Workflows on Scheme
|
||
|
||
rose-ash needs workflows that survive restarts: content pipelines (write → review →
|
||
publish → federate), scheduled jobs (digest emails), multi-step user flows (signup,
|
||
confirm, onboard). art-dag is the precedent — DAG-of-tasks with pause/resume at IO
|
||
boundaries.
|
||
|
||
Scheme's `call/cc` + delimited continuations make pause/resume natural: a `suspend`
|
||
captures the continuation, serializes it as part of the flow record, and `resume`
|
||
re-enters at exactly that point. No state-machine bookkeeping by hand. R7RS-small is
|
||
already at 2644/2644 (see kernel/architecture status).
|
||
|
||
End-state: a Scheme-on-SX layer over the existing scheme runtime, with combinators
|
||
for sequence/parallel/branch/retry/timeout/suspend, persistent flow store, and a
|
||
federation extension via fed-sx for remote-node execution.
|
||
|
||
## Status (rolling)
|
||
|
||
`bash lib/flow/conformance.sh` → **87/87** (Phases 1-3 done; Phase 4 in progress: remote-node + failover done)
|
||
|
||
## Ground rules
|
||
|
||
- **Scope:** only touch `lib/flow/**` and `plans/flow-on-sx.md`. Do **not** edit
|
||
`spec/`, `hosts/`, `shared/`, `lib/scheme/**`, or other `lib/<lang>/`. You may
|
||
**import** from `lib/scheme/` (public API via `lib/scheme/scheme.sx`); do **not**
|
||
modify Scheme.
|
||
- **Shared-file issues** go under "Blockers" with a minimal repro; do not fix here.
|
||
- **SX files:** use `sx-tree` MCP tools only.
|
||
- **Architecture:** flow combinators are Scheme macros + procedures. Runtime is a
|
||
driver loop that walks the flow graph and invokes `call/cc` at `suspend` points.
|
||
Persistence layer serializes the continuation + open file/socket placeholders are
|
||
forbidden (continuations must be resumable across process restart).
|
||
- **art-dag awareness:** read `plans/art-dag*` if it exists for design lineage; do not
|
||
import code.
|
||
- **Commits:** one feature per commit. Keep Progress log updated and tick boxes.
|
||
|
||
## Architecture sketch
|
||
|
||
```
|
||
(defflow publish
|
||
(sequence
|
||
(write-content)
|
||
(parallel
|
||
(review)
|
||
(spell-check))
|
||
(cond approved?
|
||
(sequence (publish) (federate))
|
||
(notify-author))))
|
||
│
|
||
▼
|
||
lib/flow/spec.sx lib/flow/runtime.sx lib/flow/store.sx
|
||
— defflow — driver loop — append-only flow log
|
||
— sequence/parallel — node dispatch — checkpoint serialize
|
||
— cond/retry/timeout — call/cc at suspend — restart loader
|
||
— suspend/resume │ │
|
||
▼ ▼
|
||
lib/flow/api.sx lib/flow/remote.sx
|
||
— (flow/start name args) — fed-sx adapter
|
||
— (flow/resume id value) — node-on-peer execution
|
||
— (flow/cancel id) — failure handling
|
||
```
|
||
|
||
## Phase 1 — Declarative DAG + sequential execution
|
||
|
||
- [x] `lib/flow/spec.sx` — `defflow` macro, `sequence` combinator
|
||
- [x] node = Scheme procedure of one arg (upstream value threaded in); output
|
||
threads to next node (data flow). A node ignoring its arg is a thunk.
|
||
- [x] `parallel` combinator (sequential semantics for now — TRUE parallelism in Phase 3)
|
||
- [x] runtime executes a flow synchronously, returns final value
|
||
- [x] `lib/flow/api.sx` — `(flow/start flow input)` entry point
|
||
- [x] `lib/flow/tests/basic.sx` — 18 cases: single nodes, linear/nested sequence,
|
||
data flow between nodes, parallel-with-join, publish-shaped flow
|
||
- [x] `lib/flow/scoreboard.{json,md}`
|
||
- [x] `lib/flow/conformance.sh`
|
||
|
||
## Phase 2 — Control flow + error handling
|
||
|
||
- [x] `cond` combinator — predicate selects branch (named `branch`; `cond` is a
|
||
Scheme special form). `(branch pred then else)` — 6 tests.
|
||
- [x] `retry n` — re-runs node up to n attempts on a raised exception; last
|
||
exception propagates. Only raised exceptions are retried — `(fail ...)` values
|
||
pass through. 6 tests. (Backoff deferred: no wall clock in pure SX.)
|
||
- [x] `timeout budget` — bounds node execution via a **cooperative step budget**
|
||
(deterministic; no scheduler/clock in pure SX). Nodes opt in via `(tick)`;
|
||
`budget` ticks allowed, the next raises `flow-timeout`. Non-ticking nodes are
|
||
unbounded; budgets nest. 7 tests.
|
||
- [x] `try-catch` — exception handler with reified error: `(try-catch node handler)`
|
||
runs node; on raise, calls `(handler error)` and returns its value. 6 tests.
|
||
- [x] error model — exceptions vs explicit `(fail reason)` results: `fail`/`failed?`/
|
||
`fail-reason` produce/inspect failure values that flow downstream as data
|
||
(distinct from raised exceptions caught by retry/try-catch). 6 tests.
|
||
- [x] `lib/flow/tests/control.sx` — 31 cases: branch, error model, try-catch,
|
||
retry, timeout + compositions
|
||
|
||
## Phase 3 — Suspend / resume (the showcase)
|
||
|
||
- [x] `(suspend tag)` — guest call/cc is ESCAPE-ONLY (re-entry hangs), so resume
|
||
uses **deterministic replay**: suspend escapes to the driver as `(flow-suspended
|
||
tag)`; resume re-runs the flow, replaying resolved suspends from a `(tag value)`
|
||
log. No live continuation is ever serialized — the log is plain data.
|
||
- [x] `lib/flow/store.sx` — flow store: id→record `(flow input log status payload)`;
|
||
`flow-drive` runs a flow against a replay log.
|
||
- [x] `(flow/resume id value)` — append `(tag value)` to the log, re-drive; raw
|
||
result on completion, `(flow-suspended id tag)` on a further suspend.
|
||
- [x] `(flow/cancel id)` — mark cancelled; a later resume is rejected (stale replay
|
||
cannot wake a cancelled flow).
|
||
- [x] crash recovery — `flow-store-export` (procs nulled → plain data),
|
||
`flow-store-import!`, `flow-resumable-ids`. Records are name-keyed; resume
|
||
re-resolves the proc by name (defflow registers names), so a flow survives a
|
||
wiped store. `tests/recovery.sx`, 8 cases (export/wipe/import, resumable scan,
|
||
restart-at-every-step, replay-log survival).
|
||
- [x] `lib/flow/tests/suspend.sx` — 17 cases: start/resume/cancel, multi-step,
|
||
replay determinism, lifecycle guards, suspend-in-branch
|
||
- Harness: `flow-run` now reuses one env with a per-test reset (building the full
|
||
standard env 66× was too slow) — see `api.sx`.
|
||
|
||
## Phase 4 — Distributed nodes via fed-sx
|
||
|
||
- [x] `(remote-node addr fn)` — execute a node on a federation peer. Transport is
|
||
the fed-sx boundary, MOCKED via a peer registry (`flow-peer-register!`); raises
|
||
`flow-remote-unreachable` / `flow-remote-no-fn`. Composes with sequence, suspend,
|
||
retry. `tests/distributed.sx`, 7 cases.
|
||
- [x] failure semantics — `(remote-failover addrs fn local)` tries each peer in
|
||
order, moves to the next on any raised error, and runs the `local` node if every
|
||
peer fails. 6 tests.
|
||
- [ ] persistence across instances — flow state replicates via fed-sx
|
||
- [ ] handoff — flow started here can resume on a peer if the local instance is down
|
||
- [ ] `lib/flow/tests/distributed.sx` — federated flow scenarios (mock fed-sx in tests)
|
||
|
||
## Progress log
|
||
|
||
- **Phase 1 (combinators + sequential runtime).** Flow built as a Scheme prelude
|
||
loaded onto `scheme-standard-env`: a flow is a Scheme procedure `input -> output`,
|
||
so the whole flow runs inside the interpreter (sets up Phase 3 call/cc suspend).
|
||
Combinators `flow-node`/`flow-id`/`flow-const`/`sequence`/`parallel`/`defflow` in
|
||
`spec.sx`; `flow/start` + SX helpers (`flow-make-env`/`flow-run`) in `api.sx`.
|
||
18/18 in `tests/basic.sx`. Substrate constraints found: dotted rest params
|
||
`(a . rest)` and named `let` are unsupported in `lib/scheme/eval.sx`, so
|
||
combinators use `(lambda args ...)` variadics + top-level recursion. Scheme
|
||
strings come back boxed as `{:scm-string "..."}` — unwrap with `(get s :scm-string)`.
|
||
|
||
## Blockers
|
||
|
||
(none)
|