Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m4s
(remote-node addr fn) runs a node on a federation peer. Transport is the fed-sx boundary, mocked by a peer registry (flow-peer-register!); raises flow-remote-unreachable / flow-remote-no-fn. Composes with sequence/suspend/retry. Also fixes conformance.sh to load remote.sx before api.sx. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
143 lines
7.5 KiB
Markdown
143 lines
7.5 KiB
Markdown
# flow-on-sx: Durable DAG Workflows on Scheme
|
||
|
||
rose-ash needs workflows that survive restarts: content pipelines (write → review →
|
||
publish → federate), scheduled jobs (digest emails), multi-step user flows (signup,
|
||
confirm, onboard). art-dag is the precedent — DAG-of-tasks with pause/resume at IO
|
||
boundaries.
|
||
|
||
Scheme's `call/cc` + delimited continuations make pause/resume natural: a `suspend`
|
||
captures the continuation, serializes it as part of the flow record, and `resume`
|
||
re-enters at exactly that point. No state-machine bookkeeping by hand. R7RS-small is
|
||
already at 2644/2644 (see kernel/architecture status).
|
||
|
||
End-state: a Scheme-on-SX layer over the existing scheme runtime, with combinators
|
||
for sequence/parallel/branch/retry/timeout/suspend, persistent flow store, and a
|
||
federation extension via fed-sx for remote-node execution.
|
||
|
||
## Status (rolling)
|
||
|
||
`bash lib/flow/conformance.sh` → **81/81** (Phases 1-3 done; Phase 4 in progress: remote-node done)
|
||
|
||
## Ground rules
|
||
|
||
- **Scope:** only touch `lib/flow/**` and `plans/flow-on-sx.md`. Do **not** edit
|
||
`spec/`, `hosts/`, `shared/`, `lib/scheme/**`, or other `lib/<lang>/`. You may
|
||
**import** from `lib/scheme/` (public API via `lib/scheme/scheme.sx`); do **not**
|
||
modify Scheme.
|
||
- **Shared-file issues** go under "Blockers" with a minimal repro; do not fix here.
|
||
- **SX files:** use `sx-tree` MCP tools only.
|
||
- **Architecture:** flow combinators are Scheme macros + procedures. Runtime is a
|
||
driver loop that walks the flow graph and invokes `call/cc` at `suspend` points.
|
||
Persistence layer serializes the continuation + open file/socket placeholders are
|
||
forbidden (continuations must be resumable across process restart).
|
||
- **art-dag awareness:** read `plans/art-dag*` if it exists for design lineage; do not
|
||
import code.
|
||
- **Commits:** one feature per commit. Keep Progress log updated and tick boxes.
|
||
|
||
## Architecture sketch
|
||
|
||
```
|
||
(defflow publish
|
||
(sequence
|
||
(write-content)
|
||
(parallel
|
||
(review)
|
||
(spell-check))
|
||
(cond approved?
|
||
(sequence (publish) (federate))
|
||
(notify-author))))
|
||
│
|
||
▼
|
||
lib/flow/spec.sx lib/flow/runtime.sx lib/flow/store.sx
|
||
— defflow — driver loop — append-only flow log
|
||
— sequence/parallel — node dispatch — checkpoint serialize
|
||
— cond/retry/timeout — call/cc at suspend — restart loader
|
||
— suspend/resume │ │
|
||
▼ ▼
|
||
lib/flow/api.sx lib/flow/remote.sx
|
||
— (flow/start name args) — fed-sx adapter
|
||
— (flow/resume id value) — node-on-peer execution
|
||
— (flow/cancel id) — failure handling
|
||
```
|
||
|
||
## Phase 1 — Declarative DAG + sequential execution
|
||
|
||
- [x] `lib/flow/spec.sx` — `defflow` macro, `sequence` combinator
|
||
- [x] node = Scheme procedure of one arg (upstream value threaded in); output
|
||
threads to next node (data flow). A node ignoring its arg is a thunk.
|
||
- [x] `parallel` combinator (sequential semantics for now — TRUE parallelism in Phase 3)
|
||
- [x] runtime executes a flow synchronously, returns final value
|
||
- [x] `lib/flow/api.sx` — `(flow/start flow input)` entry point
|
||
- [x] `lib/flow/tests/basic.sx` — 18 cases: single nodes, linear/nested sequence,
|
||
data flow between nodes, parallel-with-join, publish-shaped flow
|
||
- [x] `lib/flow/scoreboard.{json,md}`
|
||
- [x] `lib/flow/conformance.sh`
|
||
|
||
## Phase 2 — Control flow + error handling
|
||
|
||
- [x] `cond` combinator — predicate selects branch (named `branch`; `cond` is a
|
||
Scheme special form). `(branch pred then else)` — 6 tests.
|
||
- [x] `retry n` — re-runs node up to n attempts on a raised exception; last
|
||
exception propagates. Only raised exceptions are retried — `(fail ...)` values
|
||
pass through. 6 tests. (Backoff deferred: no wall clock in pure SX.)
|
||
- [x] `timeout budget` — bounds node execution via a **cooperative step budget**
|
||
(deterministic; no scheduler/clock in pure SX). Nodes opt in via `(tick)`;
|
||
`budget` ticks allowed, the next raises `flow-timeout`. Non-ticking nodes are
|
||
unbounded; budgets nest. 7 tests.
|
||
- [x] `try-catch` — exception handler with reified error: `(try-catch node handler)`
|
||
runs node; on raise, calls `(handler error)` and returns its value. 6 tests.
|
||
- [x] error model — exceptions vs explicit `(fail reason)` results: `fail`/`failed?`/
|
||
`fail-reason` produce/inspect failure values that flow downstream as data
|
||
(distinct from raised exceptions caught by retry/try-catch). 6 tests.
|
||
- [x] `lib/flow/tests/control.sx` — 31 cases: branch, error model, try-catch,
|
||
retry, timeout + compositions
|
||
|
||
## Phase 3 — Suspend / resume (the showcase)
|
||
|
||
- [x] `(suspend tag)` — guest call/cc is ESCAPE-ONLY (re-entry hangs), so resume
|
||
uses **deterministic replay**: suspend escapes to the driver as `(flow-suspended
|
||
tag)`; resume re-runs the flow, replaying resolved suspends from a `(tag value)`
|
||
log. No live continuation is ever serialized — the log is plain data.
|
||
- [x] `lib/flow/store.sx` — flow store: id→record `(flow input log status payload)`;
|
||
`flow-drive` runs a flow against a replay log.
|
||
- [x] `(flow/resume id value)` — append `(tag value)` to the log, re-drive; raw
|
||
result on completion, `(flow-suspended id tag)` on a further suspend.
|
||
- [x] `(flow/cancel id)` — mark cancelled; a later resume is rejected (stale replay
|
||
cannot wake a cancelled flow).
|
||
- [x] crash recovery — `flow-store-export` (procs nulled → plain data),
|
||
`flow-store-import!`, `flow-resumable-ids`. Records are name-keyed; resume
|
||
re-resolves the proc by name (defflow registers names), so a flow survives a
|
||
wiped store. `tests/recovery.sx`, 8 cases (export/wipe/import, resumable scan,
|
||
restart-at-every-step, replay-log survival).
|
||
- [x] `lib/flow/tests/suspend.sx` — 17 cases: start/resume/cancel, multi-step,
|
||
replay determinism, lifecycle guards, suspend-in-branch
|
||
- Harness: `flow-run` now reuses one env with a per-test reset (building the full
|
||
standard env 66× was too slow) — see `api.sx`.
|
||
|
||
## Phase 4 — Distributed nodes via fed-sx
|
||
|
||
- [x] `(remote-node addr fn)` — execute a node on a federation peer. Transport is
|
||
the fed-sx boundary, MOCKED via a peer registry (`flow-peer-register!`); raises
|
||
`flow-remote-unreachable` / `flow-remote-no-fn`. Composes with sequence, suspend,
|
||
retry. `tests/distributed.sx`, 7 cases.
|
||
- [ ] failure semantics — retry on different peer, fall through to local
|
||
- [ ] persistence across instances — flow state replicates via fed-sx
|
||
- [ ] handoff — flow started here can resume on a peer if the local instance is down
|
||
- [ ] `lib/flow/tests/distributed.sx` — federated flow scenarios (mock fed-sx in tests)
|
||
|
||
## Progress log
|
||
|
||
- **Phase 1 (combinators + sequential runtime).** Flow built as a Scheme prelude
|
||
loaded onto `scheme-standard-env`: a flow is a Scheme procedure `input -> output`,
|
||
so the whole flow runs inside the interpreter (sets up Phase 3 call/cc suspend).
|
||
Combinators `flow-node`/`flow-id`/`flow-const`/`sequence`/`parallel`/`defflow` in
|
||
`spec.sx`; `flow/start` + SX helpers (`flow-make-env`/`flow-run`) in `api.sx`.
|
||
18/18 in `tests/basic.sx`. Substrate constraints found: dotted rest params
|
||
`(a . rest)` and named `let` are unsupported in `lib/scheme/eval.sx`, so
|
||
combinators use `(lambda args ...)` variadics + top-level recursion. Scheme
|
||
strings come back boxed as `{:scm-string "..."}` — unwrap with `(get s :scm-string)`.
|
||
|
||
## Blockers
|
||
|
||
(none)
|