Files
rose-ash/plans/flow-on-sx.md
giles 97c7623743
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 58s
flow: crash recovery — store export/import + resumable scan + 8 tests (Phase 3 complete)
Records are name-keyed (defflow registers names); flow-store-export nulls live
procs to plain data, flow-store-import! restores, flow-resumable-ids scans for
paused flows. Resume re-resolves the proc by name, so a flow survives a wiped
store (simulated restart). The whole durable model persists only plain data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 17:25:47 +00:00

7.3 KiB
Raw Blame History

flow-on-sx: Durable DAG Workflows on Scheme

rose-ash needs workflows that survive restarts: content pipelines (write → review → publish → federate), scheduled jobs (digest emails), multi-step user flows (signup, confirm, onboard). art-dag is the precedent — DAG-of-tasks with pause/resume at IO boundaries.

Scheme's call/cc + delimited continuations make pause/resume natural: a suspend captures the continuation, serializes it as part of the flow record, and resume re-enters at exactly that point. No state-machine bookkeeping by hand. R7RS-small is already at 2644/2644 (see kernel/architecture status).

End-state: a Scheme-on-SX layer over the existing scheme runtime, with combinators for sequence/parallel/branch/retry/timeout/suspend, persistent flow store, and a federation extension via fed-sx for remote-node execution.

Status (rolling)

bash lib/flow/conformance.sh74/74 (Phases 1-3 done; Phase 4 fed-sx next)

Ground rules

  • Scope: only touch lib/flow/** and plans/flow-on-sx.md. Do not edit spec/, hosts/, shared/, lib/scheme/**, or other lib/<lang>/. You may import from lib/scheme/ (public API via lib/scheme/scheme.sx); do not modify Scheme.
  • Shared-file issues go under "Blockers" with a minimal repro; do not fix here.
  • SX files: use sx-tree MCP tools only.
  • Architecture: flow combinators are Scheme macros + procedures. Runtime is a driver loop that walks the flow graph and invokes call/cc at suspend points. Persistence layer serializes the continuation + open file/socket placeholders are forbidden (continuations must be resumable across process restart).
  • art-dag awareness: read plans/art-dag* if it exists for design lineage; do not import code.
  • Commits: one feature per commit. Keep Progress log updated and tick boxes.

Architecture sketch

(defflow publish
  (sequence
    (write-content)
    (parallel
      (review)
      (spell-check))
    (cond approved?
      (sequence (publish) (federate))
      (notify-author))))
        │
        ▼
lib/flow/spec.sx           lib/flow/runtime.sx          lib/flow/store.sx
  — defflow                  — driver loop                 — append-only flow log
  — sequence/parallel        — node dispatch               — checkpoint serialize
  — cond/retry/timeout       — call/cc at suspend          — restart loader
  — suspend/resume                  │                            │
                                    ▼                            ▼
                            lib/flow/api.sx              lib/flow/remote.sx
                              — (flow/start name args)     — fed-sx adapter
                              — (flow/resume id value)     — node-on-peer execution
                              — (flow/cancel id)           — failure handling

Phase 1 — Declarative DAG + sequential execution

  • lib/flow/spec.sxdefflow macro, sequence combinator
  • node = Scheme procedure of one arg (upstream value threaded in); output threads to next node (data flow). A node ignoring its arg is a thunk.
  • parallel combinator (sequential semantics for now — TRUE parallelism in Phase 3)
  • runtime executes a flow synchronously, returns final value
  • lib/flow/api.sx(flow/start flow input) entry point
  • lib/flow/tests/basic.sx — 18 cases: single nodes, linear/nested sequence, data flow between nodes, parallel-with-join, publish-shaped flow
  • lib/flow/scoreboard.{json,md}
  • lib/flow/conformance.sh

Phase 2 — Control flow + error handling

  • cond combinator — predicate selects branch (named branch; cond is a Scheme special form). (branch pred then else) — 6 tests.
  • retry n — re-runs node up to n attempts on a raised exception; last exception propagates. Only raised exceptions are retried — (fail ...) values pass through. 6 tests. (Backoff deferred: no wall clock in pure SX.)
  • timeout budget — bounds node execution via a cooperative step budget (deterministic; no scheduler/clock in pure SX). Nodes opt in via (tick); budget ticks allowed, the next raises flow-timeout. Non-ticking nodes are unbounded; budgets nest. 7 tests.
  • try-catch — exception handler with reified error: (try-catch node handler) runs node; on raise, calls (handler error) and returns its value. 6 tests.
  • error model — exceptions vs explicit (fail reason) results: fail/failed?/ fail-reason produce/inspect failure values that flow downstream as data (distinct from raised exceptions caught by retry/try-catch). 6 tests.
  • lib/flow/tests/control.sx — 31 cases: branch, error model, try-catch, retry, timeout + compositions

Phase 3 — Suspend / resume (the showcase)

  • (suspend tag) — guest call/cc is ESCAPE-ONLY (re-entry hangs), so resume uses deterministic replay: suspend escapes to the driver as (flow-suspended tag); resume re-runs the flow, replaying resolved suspends from a (tag value) log. No live continuation is ever serialized — the log is plain data.
  • lib/flow/store.sx — flow store: id→record (flow input log status payload); flow-drive runs a flow against a replay log.
  • (flow/resume id value) — append (tag value) to the log, re-drive; raw result on completion, (flow-suspended id tag) on a further suspend.
  • (flow/cancel id) — mark cancelled; a later resume is rejected (stale replay cannot wake a cancelled flow).
  • crash recovery — flow-store-export (procs nulled → plain data), flow-store-import!, flow-resumable-ids. Records are name-keyed; resume re-resolves the proc by name (defflow registers names), so a flow survives a wiped store. tests/recovery.sx, 8 cases (export/wipe/import, resumable scan, restart-at-every-step, replay-log survival).
  • lib/flow/tests/suspend.sx — 17 cases: start/resume/cancel, multi-step, replay determinism, lifecycle guards, suspend-in-branch
  • Harness: flow-run now reuses one env with a per-test reset (building the full standard env 66× was too slow) — see api.sx.

Phase 4 — Distributed nodes via fed-sx

  • (remote-node addr fn args) — execute node on a federation peer
  • failure semantics — retry on different peer, fall through to local
  • persistence across instances — flow state replicates via fed-sx
  • handoff — flow started here can resume on a peer if the local instance is down
  • lib/flow/tests/distributed.sx — federated flow scenarios (mock fed-sx in tests)

Progress log

  • Phase 1 (combinators + sequential runtime). Flow built as a Scheme prelude loaded onto scheme-standard-env: a flow is a Scheme procedure input -> output, so the whole flow runs inside the interpreter (sets up Phase 3 call/cc suspend). Combinators flow-node/flow-id/flow-const/sequence/parallel/defflow in spec.sx; flow/start + SX helpers (flow-make-env/flow-run) in api.sx. 18/18 in tests/basic.sx. Substrate constraints found: dotted rest params (a . rest) and named let are unsupported in lib/scheme/eval.sx, so combinators use (lambda args ...) variadics + top-level recursion. Scheme strings come back boxed as {:scm-string "..."} — unwrap with (get s :scm-string).

Blockers

(none)