spike: PERSISTENT next/ kernel is viable — unblocks RA-live + TA-live

The shared prerequisite for both live steps was: does a next/ kernel process hold gen_server state
(flow_store) across HTTP requests? Confirmed yes. plans/ra_kernel.erl is a minimal kernel
(flow_store + register the publish-digest flow, then a blocking http:listen that keeps the
er-scheduler + gen_server alive); plans/ra-kernel-spike.sh boots it as a background sx_server and
drives it with two SEPARATE curls: GET /start suspends instance 1, GET /resume resumes that SAME
live instance → done. So durable suspend→resume across requests works on a persistent kernel.

Design decision (per the discussion): chose the persistent-kernel path (B) over host-side replay-log
(A). B serves BOTH durability (RA) and federation (TA) on one fed-sx-native substrate and exposes the
full next/ kernel (projections, outbox, actor model); A only solves flow durability and mixes Erlang
into the host process. The er-scheduler-context bug (which kills an in-process kernel, option C) does
NOT bite a separate-process kernel — er-bif-http-listen spawns each handler in-scheduler, so
gen_server:call completes. Gotchas recorded: a blocking listener hangs any in-process
erlang-eval-ast (the kernel must be a dedicated TCP-driven process), and binary =:= is buggy (always
true) so routes must pattern-match paths as byte-list binaries.

RA-live + TA-live are now BUILD work (a real kernel service + the host as HTTP client + the actor
model), not research — the prerequisite is proven.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 17:00:32 +00:00
parent 39e5f906f2
commit 836d32474f
3 changed files with 132 additions and 5 deletions


@@ -264,11 +264,20 @@ the flow instance Id is the resume handle.
spawned processes across separate erlang-eval-ast invocations"). So a boot-per-call proves the
module (done), but TRUE async (suspend → return the request → resume LATER in another call) needs a
PERSISTENT next/ kernel PROCESS holding flow_store — the async boundary (DEBT #3) is deeper than
"off the request path". REMAINING: (a) stand up a long-lived next/ kernel (nx_kernel/http_server
already run persistently for TCP) that RA talks to; (b) wire a DURABLE behavior binding ({:erl-flow
"blog_digest" :needs (effect branch suspend)}) into the live publish engine, routed to RA via
select-runner; (c) the resumed completion re-enters via the transport inbound + behavior/pump.
The runner + marshalling + suspend/resume mechanics are all proven; this is process lifecycle + wiring.
"off the request path".
**PERSISTENT-KERNEL SPIKE PASSED 2026-07-02 (plans/ra-kernel-spike.sh + ra_kernel.erl).** A
background sx_server running `ra_kernel:start` (flow_store + a blocking http:listen keeps the
er-scheduler + gen_server alive) survives across HTTP requests: GET /start suspends instance 1, a
SEPARATE GET /resume resumes that SAME live instance → done. So a persistent kernel process IS
viable, and the er-scheduler-context fear does NOT bite (er-bif-http-listen spawns each handler
IN-scheduler, so gen_server:call completes). Gotchas: a blocking http:listen hangs any
in-process erlang-eval-ast (so the kernel is a DEDICATED process, driven over TCP, not epoch cmds);
binary =:= is buggy (always true) → dispatch paths by PATTERN (byte-list binaries), not =:=.
REMAINING for RA-live: (a) a real kernel module (flow + inbox/outbox routes) run as a persistent
service (its own container/placement); (b) the host's RA runner POSTs activities to it (start) +
the completion re-enters via the transport inbound + behavior/pump (resume); (c) a durable behavior
binding ({:erl-flow "blog_digest" :needs (effect branch suspend)}) routed to RA via select-runner.
The prerequisite is PROVEN; this is now build (kernel service + host HTTP client), not research.
## TA — the FED-SX TRANSPORT adapter ← federation proper
- [x] **TA TRANSPORT BUILT + the federation LOOP PROVEN 2026-07-02.** lib/host/ta.sx — a seam
@@ -316,6 +325,15 @@ covers everything until a DAG's cost/latency/placement forces the substrate.
activities), so business logic can change state, which federates, which triggers more flows.
## Progress log (newest first)
- 2026-07-02 — PERSISTENT-KERNEL SPIKE PASSED (plans/ra-kernel-spike.sh + ra_kernel.erl). The shared
prerequisite for RA-live + TA-live is REACHABLE: a background sx_server (flow_store + blocking
http:listen) holds gen_server state across HTTP requests — /start suspends instance 1, a separate
/resume resumes the SAME live instance → done. The er-scheduler-context fear doesn't bite (handlers
spawn in-scheduler). Chose the persistent-kernel path (B) over host-side replay-log (A) — it serves
BOTH durability + federation on one fed-sx-native substrate + gives the full next/ kernel. Gotchas:
a blocking listener hangs in-process erlang-eval-ast (kernel = a dedicated TCP-driven process);
binary =:= buggy → pattern-match paths. RA-live/TA-live are now BUILD (kernel service + host HTTP
client + actor model), not research. NEXT: build the real kernel service + wire the host as client.
- 2026-07-02 — TA TRANSPORT built + the federation LOOP proven (lib/host/ta.sx, ta 5/5). A seam
transport over a directional wire (serialization boundary; activities cross as SX-source). Proven
in-memory: A emits → wire → B pump → B's engine fires ITS behavior on A's activity (directional, no

plans/ra-kernel-spike.sh Executable file

@@ -0,0 +1,68 @@
#!/usr/bin/env bash
# plans/ra-kernel-spike.sh — does a PERSISTENT next/ kernel hold flow_store across HTTP requests?
# Boots a background sx_server running ra_kernel:start (flow_store + a blocking http:listen), then
# drives it with TWO separate curls: /start (suspend instance 1) then /resume (resume instance 1).
# If /resume returns done, the gen_server persisted across requests → RA-live + TA-live are unblocked.
set -uo pipefail
cd "$(git rev-parse --show-toplevel)"
SX_SERVER="${SX_SERVER:-hosts/ocaml/_build/default/bin/sx_server.exe}"
[ -x "$SX_SERVER" ] || SX_SERVER="/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe"
PORT=51877
EPOCH_FILE=$(mktemp); LOG_FILE=$(mktemp)
cleanup() {
[ -n "${SXPID:-}" ] && { kill -KILL "$SXPID" 2>/dev/null || true; wait "$SXPID" 2>/dev/null || true; }
[ -n "${HOLDPID:-}" ] && { kill -KILL "$HOLDPID" 2>/dev/null || true; wait "$HOLDPID" 2>/dev/null || true; }
rm -f "$EPOCH_FILE" "$LOG_FILE"
}
trap cleanup EXIT
cat > "$EPOCH_FILE" <<EPOCHS
(epoch 1)
(load "lib/erlang/tokenizer.sx")
(load "lib/erlang/parser.sx")
(load "lib/erlang/parser-core.sx")
(load "lib/erlang/parser-expr.sx")
(load "lib/erlang/parser-module.sx")
(load "lib/erlang/transpile.sx")
(load "lib/erlang/runtime.sx")
(load "lib/erlang/vm/dispatcher.sx")
(epoch 2)
(eval "(er-load-gen-server!)")
(eval "(get (erlang-load-module (file-read \"next/kernel/envelope.erl\")) :name)")
(eval "(get (erlang-load-module (file-read \"next/flow/flow.erl\")) :name)")
(eval "(get (erlang-load-module (file-read \"next/flow/flow_spec.erl\")) :name)")
(eval "(get (erlang-load-module (file-read \"next/flow/flow_store.erl\")) :name)")
(eval "(get (erlang-load-module (file-read \"next/flow/flows/blog_publish_digest.erl\")) :name)")
(eval "(get (erlang-load-module (file-read \"plans/ra_kernel.erl\")) :name)")
(epoch 3)
(eval "(erlang-eval-ast \"ra_kernel:start($PORT)\")")
EPOCHS
FIFO=$(mktemp -u); mkfifo "$FIFO"
( cat "$EPOCH_FILE"; sleep 120 ) > "$FIFO" &
HOLDPID=$!
"$SX_SERVER" < "$FIFO" > "$LOG_FILE" 2>&1 &
SXPID=$!
rm -f "$FIFO"
echo "── waiting for the kernel to bind :$PORT ──"
BOUND=""
for i in $(seq 1 240); do
if (exec 3<>/dev/tcp/127.0.0.1/$PORT) 2>/dev/null; then BOUND=1; exec 3>&- 3<&-; echo "bound (iter $i)"; break; fi
sleep 1
done
if [ -z "$BOUND" ]; then echo "FAIL: never bound"; echo "--- log ---"; tail -20 "$LOG_FILE"; exit 1; fi
echo "── request 1: GET /start (creates instance 1, suspends) ──"
R1=$(curl -s -m 8 "http://127.0.0.1:$PORT/start")
echo " /start → $R1"
echo "── request 2 (SEPARATE): GET /resume (must hit the SAME live instance 1) ──"
R2=$(curl -s -m 8 "http://127.0.0.1:$PORT/resume")
echo " /resume → $R2"
echo "─────────────────────────────────────────────────────"
if echo "$R1" | grep -q "start:suspended" && echo "$R2" | grep -q "resume:done"; then
echo "PASS — flow_store PERSISTED across requests. Persistent kernel is VIABLE."
else
echo "FAIL — R1='$R1' R2='$R2'"; echo "--- log tail ---"; tail -20 "$LOG_FILE"
exit 1
fi

plans/ra_kernel.erl Normal file

@@ -0,0 +1,41 @@
%% plans/ra_kernel.erl — RA/TA persistent-kernel SPIKE.
%% A minimal long-lived next/ kernel: starts flow_store + registers the publish-digest flow, then
%% blocks in http:listen — keeping the er-scheduler (and flow_store's gen_server) alive across
%% requests. Two routes drive flow_store over HTTP: GET /start (start a newsletter flow → suspend)
%% and GET /resume (resume instance 1 → done). If /resume completes in a SEPARATE request from
%% /start, the gen_server persisted across requests — the persistent-kernel prerequisite for
%% RA-live + TA-live holds.
-module(ra_kernel).
-export([start/1]).
start(Port) ->
    flow_store:start_link(),
    FF = fun (_) -> [f1, f2, f3] end,
    flow_store:register_flow(bd, blog_publish_digest:build([{fetch_followers, FF}])),
    http:listen(Port, fun (Req) -> route(Req) end).
route(Req) ->
    [{status, 200}, {headers, []}, {body, respond(field(path, Req))}].
%% /start (bytes 47,115,116,97,114,116)
respond(<<47,115,116,97,114,116>>) ->
    Env = [{activity, [{type, create}, {actor, alice}, {id, <<110,49>>},
                       {object, [{type, article}, {category, newsletter}]}]},
           {actor, alice}],
    case flow_store:start(bd, Env) of
        {ok, _Id, {flow_suspended, _}} -> <<"start:suspended">>;
        {ok, _Id, {flow_done, _}} -> <<"start:done">>;
        _ -> <<"start:other">>
    end;
%% /resume (bytes 47,114,101,115,117,109,101)
respond(<<47,114,101,115,117,109,101>>) ->
    case flow_store:resume(1, morning_ts) of
        {ok, {flow_done, _}} -> <<"resume:done">>;
        {flow_done, _} -> <<"resume:done">>;
        _ -> <<"resume:other">>
    end;
respond(_) -> <<"path:unknown">>.
field(K, [{K, V} | _]) -> V;
field(K, [_ | Rest]) -> field(K, Rest);
field(_, []) -> nil.
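
Because binary =:= is unreliable here, the route clauses in ra_kernel.erl spell each path as a byte-list binary pattern, which is tedious to write by hand once a real kernel service has more routes. The sketch below is a hypothetical helper, not part of this commit: the function name `path_to_byte_binary` is an assumption, and it only handles plain ASCII paths. It prints the pattern for a given path so it can be pasted into a `respond/1` clause head.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print the byte-list binary
# pattern for an ASCII route path, in the form used by respond/1 clause heads.
path_to_byte_binary() {
  local path="$1" out="" i code
  for (( i = 0; i < ${#path}; i++ )); do
    # printf '%d' with a leading-quote argument yields the character's numeric code.
    printf -v code '%d' "'${path:i:1}"
    # Append the code, with a comma separator after the first element.
    out+="${out:+,}${code}"
  done
  printf '<<%s>>\n' "$out"
}

path_to_byte_binary /start    # prints <<47,115,116,97,114,116>>
path_to_byte_binary /resume   # prints <<47,114,101,115,117,109,101>>
```

Both outputs match the patterns already used for the /start and /resume clauses, so the helper can serve as a quick check when adding further routes.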