fed-sx-m2: resolve Blockers #4 — kernel routes now work over real HTTP
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m6s

Substrate fix: two small changes to lib/erlang/runtime.sx that let
http-listen handler routes call gen_server:call without deadlocking.

  1. er-sched-step-alive!: pass :pending-args (when set) to the
     initial-fun call instead of always passing an empty list.
     Default behavior (no field) stays (list) — drop-in safe.

  2. er-bif-http-listen sx-handler: instead of er-apply-fun handler
     inline (which blows up on receive's er-suspend-marker because
     the connection thread has no scheduler step on its stack),
     create a real er-process with :initial-fun = handler and
     :pending-args = (list req-pl), then er-sched-run-all! to drain.
     Any receive (e.g. gen_server:call) suspends + resumes inside
     the SX scheduler frame the process owns. Read :exit-result
     for the response proplist; marshal back to SX dict.
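
The two changes above can be modelled outside the SX runtime. The following is a minimal Python sketch, not the runtime.sx code: generators stand in for er-processes, `SUSPEND` stands in for er-suspend-marker, and every name here (`Scheduler`, `call_inline`, `serve_request`, `kernel`, `make_handler`) is hypothetical. It assumes a single-threaded cooperative scheduler in which a process that blocks in `receive` hands control back to the scheduler loop.

```python
from collections import deque

SUSPEND = object()  # stand-in for er-suspend-marker ("blocked in receive")


class Scheduler:
    """Toy cooperative scheduler; process bodies are Python generators."""

    def __init__(self):
        self.procs = {}
        self.runnable = deque()
        self.current = None  # pid of the process being stepped

    def spawn(self, initial_fun, pending_args=None):
        pid = len(self.procs)
        self.procs[pid] = {
            "initial_fun": initial_fun,
            "pending_args": pending_args,  # None models "field not set"
            "k": None,                     # continuation, created on first step
            "mailbox": deque(),
            "waiting": False,
            "exit_result": None,
        }
        self.runnable.append(pid)
        return pid

    def send(self, pid, msg):
        proc = self.procs[pid]
        proc["mailbox"].append(msg)
        if proc["waiting"]:  # wake a process parked in receive
            proc["waiting"] = False
            self.runnable.append(pid)

    def step(self, pid):
        proc = self.procs[pid]
        self.current = pid
        try:
            if proc["k"] is None:
                # First step: pass the pending args if set, else an empty list.
                pending = proc["pending_args"]
                args = pending if pending is not None else []
                proc["k"] = proc["initial_fun"](self, *args)
                next(proc["k"])
            else:
                # Resume the parked receive with the next queued message.
                proc["k"].send(proc["mailbox"].popleft())
        except StopIteration as done:
            proc["exit_result"] = done.value
            return
        # The body yielded (suspended): requeue if a message is already
        # waiting, otherwise park until send() wakes it.
        if proc["mailbox"]:
            self.runnable.append(pid)
        else:
            proc["waiting"] = True

    def run_all(self):
        while self.runnable:
            self.step(self.runnable.popleft())


def kernel(sched):
    """Stand-in for a gen_server: answer one call with a fixed tip."""
    sender = yield SUSPEND
    sched.send(sender, 0)


def make_handler(kernel_pid):
    def handler(sched, req):
        sched.send(kernel_pid, sched.current)  # the gen_server:call request
        tip = yield SUSPEND                    # receive the reply
        return {"status": 200, "body": f"outbox: {req['actor']} tip: {tip}"}
    return handler


def call_inline(sched, fun, args):
    """Old shape: run the body once with no scheduler loop around it."""
    k = fun(sched, *args)
    try:
        return next(k)  # a receive leaks SUSPEND to the caller
    except StopIteration as done:
        return done.value


def serve_request(sched, handler, req):
    """New shape: run the handler as a scheduled process, then read its result."""
    pid = sched.spawn(handler, pending_args=[req])
    sched.run_all()
    return sched.procs[pid]["exit_result"]


old = Scheduler()
inline_result = call_inline(old, make_handler(old.spawn(kernel)), [{"actor": "alice"}])

new = Scheduler()
response = serve_request(new, make_handler(new.spawn(kernel)), {"actor": "alice"})
```

Under these assumptions the inline call hands the suspend marker back to its caller instead of a response, while the scheduled call runs both processes to completion and yields a full response dict. The kernel process is spawned without `pending_args`, which exercises the empty-list default.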

Investigation arc (see plans/fed-sx-milestone-2.md Blockers #4 +
Progress log):
  - loops/fed-prims bf8d0bf2 diagnosed it as an Erlang-substrate
    problem, not an OCaml mutex issue (Pattern A wrong, Pattern B
    right but sketchy).
  - First Pattern B attempt failed: tried er-spawn-fun on a raw SX
    lambda and hit the (er-fun? fv) gate. A connection-thread bisect
    pinpointed the exact line.
  - Real fix: use the existing er-fun (user's handler) directly,
    but feed it via :pending-args so step-alive's hardcoded
    (list) doesn't drop the request arg.

Acceptance:
  - new next/tests/smoke_kernel_route.sh: 6/6 over real HTTP via
    http_server:start(P, [{kernel, nx_kernel}]): welcome /,
    /actors/alice, /actors/alice/outbox (two checks: body and
    gen_server-backed tip), /actors/alice/inbox, and an
    unknown-actor outbox.
  - next/tests/http_server_tcp.sh: 5/5 (bumped wait_bound from
    30s to 180s — cold boot is slow under sibling-loop CPU load
    and the per-handler scheduler ramp adds a small margin).
  - Erlang conformance: 761/761.
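
The listener wait that both test scripts rely on amounts to polling a TCP port until a deadline. A minimal Python sketch of that idea follows; it is an illustration, not the scripts' code (they use bash and /dev/tcp), and `wait_for_port` and its parameters are hypothetical names.

```python
import socket
import time


def wait_for_port(host, port, deadline_s, interval_s=0.5):
    """Return True once a TCP connect succeeds, False if deadline_s elapses first."""
    give_up_at = time.monotonic() + deadline_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass  # refused or timed out: not bound yet
        if time.monotonic() >= give_up_at:
            return False
        time.sleep(interval_s)


# Demo against a throwaway local listener on an ephemeral port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
bound = wait_for_port("127.0.0.1", port, deadline_s=5)
listener.close()
```

Measuring against a monotonic clock keeps the bound meaningful even when each probe is slow, which is the situation the 180s bump is meant to absorb.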

Step 12's two-instance smoke test is now unblocked — its full
Follow / Accept / Note flow can layer on top of this kernel-route
surface. m2 plan updated.

Pre-existing httpc_request.sh flakiness ("Undefined symbol:
http-request" on the live-call epochs) reproduces WITHOUT this
change — see git stash A/B in the investigation. Unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 20:04:19 +00:00
parent 600d292ba2
commit 03c32cda5f
4 changed files with 183 additions and 14 deletions


@@ -731,7 +731,10 @@
 0
 (if
 (= prev-k nil)
-(er-apply-fun (er-proc-field pid :initial-fun) (list))
+(er-apply-fun
+(er-proc-field pid :initial-fun)
+(let ((args (er-proc-field pid :pending-args)))
+(cond (= args nil) (list) :else args)))
 (do (er-proc-set! pid :continuation nil) (prev-k nil)))))
 (let
 ((r (nth result-ref 0)))
@@ -1612,11 +1615,31 @@
 ;; 78eae9ef deleted them as dead because the BIF body
 ;; still referenced them — Blockers #1. This rewrite
 ;; threads through the live marshallers instead.)
+;; Run the handler as a SCHEDULED er-process so any
+;; `receive` (e.g. gen_server:call inside a kernel-aware
+;; route) suspends and resumes inside the SX scheduler.
+;; Without this, native http-listen invokes the handler
+;; closure on a fresh OCaml thread that has no scheduler
+;; frame, so the receive's er-suspend-marker propagates
+;; out and the connection writes nothing — the Blockers
+;; #4 deadlock the m2 loop observed.
+;;
+;; er-spawn-fun requires an er-fun (Erlang-AST-shaped
+;; dict); handler IS one (created by user `fun (Req) ->
+;; route(Req, Cfg) end`). To feed req-pl as the call
+;; argument we stash it on the process record's
+;; :pending-args field — er-sched-step-alive! reads it
+;; on first step (the alternative was a host-closure-to-
+;; er-fun wrapper, which needs AST construction).
 ((sx-handler
 (fn (req-dict)
 (let ((req-pl (er-request-dict-to-proplist req-dict)))
-(let ((resp-pl (er-apply-fun handler (list req-pl))))
-(er-proplist-to-dict resp-pl))))))
+(let ((proc (er-proc-new! (er-env-new))))
+(dict-set! proc :initial-fun handler)
+(dict-set! proc :pending-args (list req-pl))
+(er-sched-run-all!)
+(let ((resp-pl (er-proc-field (get proc :pid) :exit-result)))
+(er-proplist-to-dict resp-pl)))))))
 (http-listen port sx-handler))))))
 ;; httpc:request/4(Url, Method, Headers, Body) - BRIEFING-EXCEPTION:


@@ -72,9 +72,11 @@ HOLDPID=$!
 SXPID=$!
 rm -f "$FIFO" # both ends still hold open via the running procs
-# Wait for the listener to bind (up to ~30s — boot takes ~10s).
+# Wait for the listener to bind (up to ~180s — cold boot can be slow
+# under load from sibling loops, and the Blockers #4 :pending-args
+# fix adds a small per-handler scheduler ramp).
 BOUND=""
-for i in $(seq 1 60); do
+for i in $(seq 1 360); do
 if (exec 3<>/dev/tcp/127.0.0.1/$PORT) 2>/dev/null; then
 exec 3<&-; exec 3>&-
 BOUND="yes"

next/tests/smoke_kernel_route.sh Executable file

@@ -0,0 +1,121 @@
#!/usr/bin/env bash
# next/tests/smoke_kernel_route.sh — m2 Blockers #4 unblock test.
#
# Proves a real HTTP listener over http:listen + http_server:start
# CAN now serve kernel-aware routes (the surface Blockers #4 made
# unreachable). Spins up a single sx_server instance, bootstraps an
# actor, starts http_server with {kernel, nx_kernel} in Cfg, and
# curls a route that fans through nx_kernel via gen_server:call.
#
# This is the kernel-route portion of Step 12's two-instance smoke
# test. The full two-instance flow (Follow + auto-accept + Note
# delivery) layers on top of this surface; this test is the
# load-bearing proof point that the underlying wiring works.
set -uo pipefail
cd "$(git rev-parse --show-toplevel)"
SX_SERVER="${SX_SERVER:-hosts/ocaml/_build/default/bin/sx_server.exe}"
if [ ! -x "$SX_SERVER" ]; then
SX_SERVER="/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe"
fi
if [ ! -x "$SX_SERVER" ]; then
echo "ERROR: sx_server.exe not found." >&2
exit 1
fi
VERBOSE="${1:-}"
PASS=0; FAIL=0; ERRORS=""
PORT=$(python3 -c 'import socket;s=socket.socket();s.bind(("127.0.0.1",0));print(s.getsockname()[1]);s.close()')
EF=$(mktemp); LOG=$(mktemp); FIFO=$(mktemp -u); mkfifo "$FIFO"
cleanup() {
for pid in ${SXP:-} ${HOLDP:-}; do
kill -KILL "$pid" 2>/dev/null || true
wait "$pid" 2>/dev/null || true
done
rm -f "$EF" "$LOG" "$FIFO"
}
trap cleanup EXIT
cat > "$EF" <<EPOCHS
(epoch 1)
(load "lib/erlang/tokenizer.sx")
(load "lib/erlang/parser.sx")
(load "lib/erlang/parser-core.sx")
(load "lib/erlang/parser-expr.sx")
(load "lib/erlang/parser-module.sx")
(load "lib/erlang/transpile.sx")
(load "lib/erlang/runtime.sx")
(load "lib/erlang/vm/dispatcher.sx")
(epoch 2)
(eval "(er-load-gen-server!)")
(eval "(get (erlang-load-module (file-read \"next/kernel/envelope.erl\")) :name)")
(eval "(get (erlang-load-module (file-read \"next/kernel/log.erl\")) :name)")
(eval "(get (erlang-load-module (file-read \"next/kernel/pipeline.erl\")) :name)")
(eval "(get (erlang-load-module (file-read \"next/kernel/term_codec.erl\")) :name)")
(eval "(get (erlang-load-module (file-read \"next/kernel/outbox.erl\")) :name)")
(eval "(get (erlang-load-module (file-read \"next/kernel/nx_kernel.erl\")) :name)")
(eval "(get (erlang-load-module (file-read \"next/kernel/http_server.erl\")) :name)")
(epoch 3)
(eval "(erlang-eval-ast \"AK = <<1,1,1,1>>, AKS = [{key_id,k1},{algorithm,ed25519},{value,AK}], AAS = [{public_keys,[[{id,k1},{created,0},{value,AK}]]}], nx_kernel:start_link(alice, AKS, AAS), http_server:start(${PORT}, [{kernel, nx_kernel}])\")")
EPOCHS
( cat "$EF"; sleep 900 ) > "$FIFO" &
HOLDP=$!
"$SX_SERVER" < "$FIFO" > "$LOG" 2>&1 &
SXP=$!
rm -f "$FIFO"
START=$(date +%s)
BOUND=
while [ $(($(date +%s) - START)) -lt 300 ]; do
if (exec 3<>/dev/tcp/127.0.0.1/$PORT) 2>/dev/null; then
exec 3<&-; exec 3>&-
BOUND="yes after $(($(date +%s) - START))s"
break
fi
sleep 1
done
if [ -z "$BOUND" ]; then
echo "FAIL: listener never bound on port $PORT"
echo "--- log tail ---"
tail -20 "$LOG"
exit 1
fi
[ "$VERBOSE" = "-v" ] && echo " ok listener up ($BOUND)"
check() {
local desc="$1" path="$2" needle="$3"
local resp
resp=$(curl -s --max-time 10 "http://127.0.0.1:$PORT$path" 2>/dev/null || echo "<curl-failed>")
if echo "$resp" | grep -qF -- "$needle"; then
PASS=$((PASS+1))
[ "$VERBOSE" = "-v" ] && echo " ok $desc"
else
FAIL=$((FAIL+1))
ERRORS+=" FAIL [$desc] expected '$needle' in resp: $(echo "$resp" | head -c 100)
"
fi
}
check "non-kernel welcome /" "/" "fed-sx kernel m1"
check "kernel-aware /actors/alice" "/actors/alice" "actor: alice"
check "kernel-aware /actors/alice/outbox" "/actors/alice/outbox" "outbox: alice"
check "kernel-aware /actors/alice/outbox tip" "/actors/alice/outbox" "tip: 0"
check "kernel-aware /actors/alice/inbox" "/actors/alice/inbox" "inbox: alice"
check "unknown actor /actors/zzz/outbox" "/actors/zzz/outbox" "outbox: zzz"
TOTAL=$((PASS+FAIL))
if [ $FAIL -eq 0 ]; then
echo "ok $PASS/$TOTAL next/tests/smoke_kernel_route.sh passed (port $PORT)"
else
echo "FAIL $PASS/$TOTAL passed, $FAIL failed:"
echo "$ERRORS"
if [ "$VERBOSE" = "-v" ]; then
echo "--- log tail ---"; tail -20 "$LOG"
fi
fi
[ $FAIL -eq 0 ]


@@ -851,14 +851,23 @@ re-broadcast another actor's content to their own followers.
 ## Step 12 — Two-instance smoke test
-**GATED on Blockers #4** (http-listen handler holds the SX runtime
-mutex, deadlocking any `gen_server:call` from inside a route — see
-Blockers section for verification + fix patterns). Without this,
-the only request shapes that survive over real HTTP are the static /
-capabilities / static-stub paths; every kernel-aware route hangs
-indefinitely. The smoke test framework is sketched out (see the
-withdrawn `smoke_federate.sh` in this loop's history at commit
-`8d33d02f`'s tree state) but cannot exit 0 until Blockers #4 lifts.
+**Blockers #4 RESOLVED 2026-06-07.** The substrate fix turned out
+to be a two-line change in `lib/erlang/runtime.sx`: extend
+`er-sched-step-alive!` to read `:pending-args` when present (was
+hardcoded to `(list)`), and have `er-bif-http-listen`'s sx-handler
+spawn the user handler as a real er-process with `:pending-args
+(list req-pl)` instead of calling it inline. With this in place
+any `receive` inside a kernel-aware route (e.g. `gen_server:call`)
+suspends and resumes correctly inside the SX scheduler instead of
+propagating out of the connection thread.
+Verified by `next/tests/smoke_kernel_route.sh` (6/6, single-instance):
+welcome `/`, `/actors/alice`, `/actors/alice/outbox` (gen_server-
+backed, with `tip:` from kernel state), `/actors/alice/inbox`,
+unknown-actor outbox — all serve over real HTTP through
+`http_server:start` with `Cfg = [{kernel, nx_kernel}]`. The
+full two-instance Follow / Accept / Note flow can layer on top
+of this surface.
 **The proof point.** `next/tests/smoke_federate.sh` spins up two kernel
 instances on distinct ports, walks them through the full federation
@@ -1087,7 +1096,21 @@ proceed.
 4. **`http-listen` handler holds the SX runtime mutex →
 `gen_server:call` from inside an HTTP route deadlocks.** —
-discovered during Step 12 prep. The native `http-listen`
+~~discovered during Step 12 prep~~ **RESOLVED 2026-06-07**
+by a two-line `lib/erlang/runtime.sx` change: extend
+`er-sched-step-alive!` to read `:pending-args` when present
+(was hardcoded to `(list)`), and rewrite
+`er-bif-http-listen`'s sx-handler to spawn the user handler
+as a real er-process with `:pending-args (list req-pl)`
+instead of `er-apply-fun handler` inline. Any `receive`
+inside a kernel-aware route now suspends + resumes inside
+the SX scheduler. Verified via the new
+`next/tests/smoke_kernel_route.sh` (6/6, single-instance
+`http_server:start(P, [{kernel, nx_kernel}])` serves
+welcome + `/actors/alice/outbox` with kernel-backed `tip:`
+etc.). The full Pattern A vs Pattern B analysis below is
+preserved for the audit trail. The original native
+`http-listen`
 primitive in `bin/sx_server.ml:735+` serialises handler calls
 with `Mutex.lock mtx` / `Mutex.unlock mtx` so the SX runtime
 isn't re-entered concurrently. The wrapped Erlang handler