rose-ash/plans/flow-on-sx.md
giles f8722b3b08
2026-06-06 17:44:04 +00:00


flow-on-sx: Durable DAG Workflows on Scheme

rose-ash needs workflows that survive restarts: content pipelines (write → review → publish → federate), scheduled jobs (digest emails), multi-step user flows (signup, confirm, onboard). art-dag is the precedent — DAG-of-tasks with pause/resume at IO boundaries.

Scheme's call/cc + delimited continuations make pause/resume natural: a suspend captures the continuation, serializes it as part of the flow record, and resume re-enters at exactly that point. No state-machine bookkeeping by hand. R7RS-small is already at 2644/2644 (see kernel/architecture status).

End-state: a Scheme-on-SX layer over the existing scheme runtime, with combinators for sequence/parallel/branch/retry/timeout/suspend, persistent flow store, and a federation extension via fed-sx for remote-node execution.

Status (rolling)

bash lib/flow/conformance.sh → 87/87 (Phases 1-3 done; Phase 4 in progress: remote-node + failover done)

Ground rules

  • Scope: only touch lib/flow/** and plans/flow-on-sx.md. Do not edit spec/, hosts/, shared/, lib/scheme/**, or other lib/<lang>/. You may import from lib/scheme/ (public API via lib/scheme/scheme.sx); do not modify Scheme.
  • Shared-file issues go under "Blockers" with a minimal repro; do not fix here.
  • SX files: use sx-tree MCP tools only.
  • Architecture: flow combinators are Scheme macros + procedures. The runtime is a driver loop that walks the flow graph and invokes call/cc at suspend points. The persistence layer serializes the continuation; open file/socket placeholders are forbidden in it, because continuations must be resumable across a process restart. (Phase 3 ended up persisting a plain-data replay log instead of a live continuation.)
  • art-dag awareness: read plans/art-dag* if it exists for design lineage; do not import code.
  • Commits: one feature per commit. Keep Progress log updated and tick boxes.

Architecture sketch

(defflow publish
  (sequence
    (write-content)
    (parallel
      (review)
      (spell-check))
    (cond approved?
      (sequence (publish) (federate))
      (notify-author))))
        │
        ▼
lib/flow/spec.sx           lib/flow/runtime.sx          lib/flow/store.sx
  — defflow                  — driver loop                 — append-only flow log
  — sequence/parallel        — node dispatch               — checkpoint serialize
  — cond/retry/timeout       — call/cc at suspend          — restart loader
  — suspend/resume                  │                            │
                                    ▼                            ▼
                            lib/flow/api.sx              lib/flow/remote.sx
                              — (flow/start name args)     — fed-sx adapter
                              — (flow/resume id value)     — node-on-peer execution
                              — (flow/cancel id)           — failure handling

Phase 1 — Declarative DAG + sequential execution

  • lib/flow/spec.sx: defflow macro, sequence combinator
  • node = Scheme procedure of one arg (upstream value threaded in); output threads to next node (data flow). A node ignoring its arg is a thunk.
  • parallel combinator (sequential semantics for now — TRUE parallelism in Phase 3)
  • runtime executes a flow synchronously, returns final value
  • lib/flow/api.sx: (flow/start flow input) entry point
  • lib/flow/tests/basic.sx — 18 cases: single nodes, linear/nested sequence, data flow between nodes, parallel-with-join, publish-shaped flow
  • lib/flow/scoreboard.{json,md}
  • lib/flow/conformance.sh
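
The Phase 1 data-flow rule (each node is a procedure of one argument, and its output becomes the next node's input) can be modelled in a few lines of portable R7RS Scheme. This is a sketch under stated assumptions, not the contents of lib/flow/spec.sx: run-sequence and run-parallel are hypothetical helper names, and the choice to join parallel results as a list is a guess. It avoids named let and dotted rest parameters, matching the substrate constraints noted in the Progress log.

```scheme
(import (scheme base))

;; Illustrative stand-ins, not the code in lib/flow/spec.sx.
;; A node is a procedure of one argument; combinators return nodes.

;; Thread value through nodes left to right (top-level recursion,
;; since named let is unavailable on the substrate).
(define (run-sequence nodes value)
  (if (null? nodes)
      value
      (run-sequence (cdr nodes) ((car nodes) value))))

;; (sequence n1 n2 ...) => node. Uses (lambda args ...) rather than a
;; dotted rest parameter.
(define sequence
  (lambda nodes
    (lambda (input) (run-sequence nodes input))))

;; Run each branch on the same input, strictly left to right, and
;; collect the results. ASSUMPTION: the join is a list of results.
(define (run-parallel nodes input)
  (if (null? nodes)
      '()
      (let ((result ((car nodes) input)))
        (cons result (run-parallel (cdr nodes) input)))))

;; (parallel n1 n2 ...) => node with Phase 1's sequential semantics.
(define parallel
  (lambda nodes
    (lambda (input) (run-parallel nodes input))))

;; Usage: add 1, fan out to two branches, join by summing.
(define (add1 x) (+ x 1))
(define (double x) (* x 2))
(define demo
  (sequence add1
            (parallel double add1)
            (lambda (results) (apply + results))))

(demo 3) ; => 13: add1 gives 4, the branches give (8 5), the sum is 13
```

run-parallel uses explicit recursion rather than map because R7RS leaves map's application order unspecified, and the sequential semantics depend on that order.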

Phase 2 — Control flow + error handling

  • cond combinator — a predicate selects the branch. It is named branch because cond is already a Scheme special form: (branch pred then else). 6 tests.
  • retry n — re-runs node up to n attempts on a raised exception; last exception propagates. Only raised exceptions are retried — (fail ...) values pass through. 6 tests. (Backoff deferred: no wall clock in pure SX.)
  • timeout budget — bounds node execution via a cooperative step budget (deterministic; no scheduler/clock in pure SX). Nodes opt in by calling (tick): the first budget ticks are allowed, and the next one raises flow-timeout. Non-ticking nodes are unbounded; budgets nest. 7 tests.
  • try-catch — exception handler with reified error: (try-catch node handler) runs node; on raise, calls (handler error) and returns its value. 6 tests.
  • error model — exceptions vs explicit (fail reason) results: fail/failed?/fail-reason produce/inspect failure values that flow downstream as data (distinct from raised exceptions caught by retry/try-catch). 6 tests.
  • lib/flow/tests/control.sx — 31 cases: branch, error model, try-catch, retry, timeout + compositions
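
The Phase 2 split between raised exceptions and failure values, and the tick-based budget, can be illustrated in plain R7RS Scheme. This is a sketch under stated assumptions rather than the lib/flow code: the tagged-list representation of failures, the raised symbols, the argument order of retry and timeout, and the helpers attempt, spend and active-budgets are all hypothetical.

```scheme
(import (scheme base))

;; Illustrative stand-ins, not the code in lib/flow/. Representations
;; (tagged list for failures, raised symbols) are ASSUMPTIONS.

;; Failure values are plain data that flow downstream.
(define (fail reason) (list 'flow-fail reason))
(define (failed? v) (and (pair? v) (eq? (car v) 'flow-fail)))
(define (fail-reason v) (cadr v))

;; (try-catch node handler): on a raise, return (handler error).
(define (try-catch node handler)
  (lambda (input)
    (guard (e (#t (handler e)))
      (node input))))

;; Run node with `remaining` attempts left; the last attempt runs
;; unguarded so its exception propagates.
(define (attempt node input remaining)
  (if (<= remaining 1)
      (node input)
      (guard (e (#t (attempt node input (- remaining 1))))
        (node input))))

;; (retry n node): ASSUMED argument order.
(define (retry n node)
  (lambda (input) (attempt node input n)))

;; Step budgets: innermost first, each a one-slot vector of ticks left.
(define active-budgets '())

(define (spend budgets)
  (if (null? budgets)
      #t
      (let ((cell (car budgets)))
        (if (<= (vector-ref cell 0) 0)
            (raise 'flow-timeout)
            (begin
              (vector-set! cell 0 (- (vector-ref cell 0) 1))
              (spend (cdr budgets)))))))

;; Nodes opt in by calling (tick); it charges every enclosing budget.
(define (tick) (spend active-budgets))

;; (timeout budget node): ASSUMED argument order.
(define (timeout budget node)
  (lambda (input)
    (let ((saved active-budgets))
      (dynamic-wind
        (lambda () (set! active-budgets (cons (vector budget) saved)))
        (lambda () (node input))
        (lambda () (set! active-budgets saved))))))

;; Usage
(define calls 0)
(define (flaky input)
  (set! calls (+ calls 1))
  (if (< calls 3) (raise 'transient) (* input 10)))
(define (three-ticks input) (tick) (tick) (tick) input)

((retry 3 flaky) 4)                                       ; => 40
(failed? ((try-catch (lambda (x) (raise 'boom)) fail) 1)) ; => #t
((timeout 3 three-ticks) 'ok)                             ; => ok
((try-catch (timeout 2 three-ticks) (lambda (e) e)) 'x)   ; => flow-timeout
```

A failure value passed to retry would be returned on the first attempt, since only a raise triggers another try.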

Phase 3 — Suspend / resume (the showcase)

  • (suspend tag) — guest call/cc is ESCAPE-ONLY (re-entry hangs), so resume uses deterministic replay: suspend escapes to the driver as (flow-suspended tag); resume re-runs the flow, replaying resolved suspends from a (tag value) log. No live continuation is ever serialized — the log is plain data.
  • lib/flow/store.sx — flow store: id→record (flow input log status payload); flow-drive runs a flow against a replay log.
  • (flow/resume id value) — append (tag value) to the log, re-drive; raw result on completion, (flow-suspended id tag) on a further suspend.
  • (flow/cancel id) — mark cancelled; a later resume is rejected (stale replay cannot wake a cancelled flow).
  • crash recovery — flow-store-export (procs nulled → plain data), flow-store-import!, flow-resumable-ids. Records are name-keyed; resume re-resolves the proc by name (defflow registers names), so a flow survives a wiped store. tests/recovery.sx, 8 cases (export/wipe/import, resumable scan, restart-at-every-step, replay-log survival).
  • lib/flow/tests/suspend.sx — 17 cases: start/resume/cancel, multi-step, replay determinism, lifecycle guards, suspend-in-branch
  • Harness: flow-run now reuses one env with a per-test reset (building the full standard env 66× was too slow) — see api.sx.
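
Deterministic replay, as described in the Phase 3 items above, can be sketched in portable R7RS Scheme. The code below is an illustration under stated assumptions, not lib/flow/store.sx: drive, replay-log and escape-to-driver are hypothetical names, and it assumes log entries are consumed in the order the flow reaches its suspends. call/cc is used only to escape to the driver, which is all the guest call/cc supports.

```scheme
(import (scheme base))

;; Illustrative stand-ins, not the code in lib/flow/store.sx.

(define replay-log '())       ; unconsumed (tag value) entries for this drive
(define escape-to-driver #f)  ; escape-only continuation installed by drive

;; (suspend tag): answer from the log if the next entry matches,
;; otherwise escape to the driver. ASSUMPTION: entries are consumed in
;; the order the flow reaches its suspends.
(define (suspend tag)
  (if (and (pair? replay-log) (eq? (car (car replay-log)) tag))
      (let ((value (cadr (car replay-log))))
        (set! replay-log (cdr replay-log))
        value)
      (escape-to-driver (list 'flow-suspended tag))))

;; Run flow from the start against log. call/cc is used only to jump
;; out; the continuation is never re-entered or stored.
(define (drive flow input log)
  (call-with-current-continuation
    (lambda (k)
      (set! replay-log log)
      (set! escape-to-driver k)
      (flow input))))

;; A two-step flow that waits for outside input twice.
(define (signup email)
  (let* ((code (suspend 'confirm-email))
         (name (suspend 'pick-name)))
    (list email code name)))

;; Each resume appends one (tag value) entry and re-drives from scratch.
(define log0 '())
(define log1 (append log0 (list (list 'confirm-email 1234))))
(define log2 (append log1 (list (list 'pick-name "ada"))))

(drive signup "a@example.org" log0) ; => (flow-suspended confirm-email)
(drive signup "a@example.org" log1) ; => (flow-suspended pick-name)
(drive signup "a@example.org" log2) ; => ("a@example.org" 1234 "ada")
```

Because the only persisted state is the input and the log, both plain data, a record can be written out and re-driven after a restart as long as the flow procedure can be found again by name. The cost is that every node before a suspend runs again on each resume, so such nodes need to be deterministic.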

Phase 4 — Distributed nodes via fed-sx

  • (remote-node addr fn) — execute a node on a federation peer. Transport is the fed-sx boundary, MOCKED via a peer registry (flow-peer-register!); raises flow-remote-unreachable / flow-remote-no-fn. Composes with sequence, suspend, retry. tests/distributed.sx, 7 cases.
  • failure semantics — (remote-failover addrs fn local) tries each peer in order, moves to the next on any raised error, and runs the local node if every peer fails. 6 tests.
  • persistence across instances — flow state replicates via fed-sx
  • handoff — flow started here can resume on a peer if the local instance is down
  • lib/flow/tests/distributed.sx — federated flow scenarios (mock fed-sx in tests)
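
The failover rule (try each peer in order, fall through to the local node) can be sketched against a mock peer registry in portable R7RS Scheme. This is an illustration under stated assumptions, not lib/flow/remote.sx: the argument shape of flow-peer-register! (an address plus an association list of named procedures), the use of raised symbols for the two error conditions, and the helper try-peers are all hypothetical.

```scheme
(import (scheme base))

;; Illustrative stand-ins, not the code in lib/flow/remote.sx.

;; Mock transport: address -> association list of (fn-name . procedure).
(define peers '())

;; ASSUMED signature; the plan names flow-peer-register! but not its arguments.
(define (flow-peer-register! addr fns)
  (set! peers (cons (cons addr fns) peers)))

;; (remote-node addr fn) => node that runs fn on the peer at addr.
(define (remote-node addr fn)
  (lambda (input)
    (let ((peer (assoc addr peers)))
      (if (not peer)
          (raise 'flow-remote-unreachable)
          (let ((entry (assq fn (cdr peer))))
            (if (not entry)
                (raise 'flow-remote-no-fn)
                ((cdr entry) input)))))))

;; Try peers in order; any raise moves on to the next. The local node
;; runs outside every guard, so its own errors propagate to the caller.
(define (try-peers addrs fn local input)
  (if (null? addrs)
      (local input)
      (guard (e (#t (try-peers (cdr addrs) fn local input)))
        ((remote-node (car addrs) fn) input))))

;; (remote-failover addrs fn local) => node
(define (remote-failover addrs fn local)
  (lambda (input) (try-peers addrs fn local input)))

;; Usage: peer-a is never registered, so it counts as unreachable.
(flow-peer-register!
  "peer-b"
  (list (cons 'render (lambda (x) (string-append "b:" x)))))

(define (render-locally x) (string-append "local:" x))

((remote-failover (list "peer-a" "peer-b") 'render render-locally) "post")
; => "b:post"
((remote-failover (list "peer-a") 'render render-locally) "post")
; => "local:post"
```

Since a failover node is an ordinary one-argument procedure, it threads its input like any other node and can sit inside a sequence.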

Progress log

  • Phase 1 (combinators + sequential runtime). Flow built as a Scheme prelude loaded onto scheme-standard-env: a flow is a Scheme procedure input -> output, so the whole flow runs inside the interpreter (sets up Phase 3 call/cc suspend). Combinators flow-node/flow-id/flow-const/sequence/parallel/defflow in spec.sx; flow/start + SX helpers (flow-make-env/flow-run) in api.sx. 18/18 in tests/basic.sx. Substrate constraints found: dotted rest params (a . rest) and named let are unsupported in lib/scheme/eval.sx, so combinators use (lambda args ...) variadics + top-level recursion. Scheme strings come back boxed as {:scm-string "..."} — unwrap with (get s :scm-string).

Blockers

(none)