fed-sx-m2: resolve Blockers #4 — kernel routes now work over real HTTP
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m6s
Substrate fix: a small two-part change to lib/erlang/runtime.sx that
lets http-listen handler routes call gen_server:call without deadlocking.
1. er-sched-step-alive!: pass :pending-args (when set) to the
   initial-fun call instead of always passing an empty list.
   Default behavior (field unset) stays (list), so the change is
   drop-in safe.
2. er-bif-http-listen sx-handler: instead of calling er-apply-fun on
   the handler inline (which fails on receive's er-suspend-marker
   because the connection thread has no scheduler step on its stack),
   create a real er-process with :initial-fun = handler and
   :pending-args = (list req-pl), then call er-sched-run-all! to
   drain it. Any receive (e.g. gen_server:call) suspends and resumes
   inside the SX scheduler frame the process owns. Read :exit-result
   for the response proplist and marshal it back to an SX dict.
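To make the two changes concrete, here is a toy cooperative scheduler in Python. It is a sketch under stated assumptions, not the SX runtime: `Scheduler`, `Proc`, `RECEIVE`, `make_sx_handler`, `kernel_loop` and `outbox_route` are hypothetical names, Python generators stand in for er-processes, and `yield RECEIVE` plays the role of a `receive` that may have to suspend. The first step passes pending args when they are set and an empty list otherwise; the wrapper spawns the handler, drains the run queue, and reads the exit result.

```python
from collections import deque

RECEIVE = object()  # yielded by a process that wants its next message


class Proc:
    """Hypothetical stand-in for an er-process record."""

    def __init__(self, pid, initial_fun, pending_args):
        self.pid = pid
        self.initial_fun = initial_fun    # like :initial-fun
        self.pending_args = pending_args  # like :pending-args (None = unset)
        self.mailbox = deque()
        self.gen = None                   # started lazily on the first step
        self.done = False
        self.exit_result = None           # like :exit-result


class Scheduler:
    def __init__(self):
        self.procs = {}
        self.runq = deque()

    def spawn(self, fun, pending_args=None):
        pid = len(self.procs)
        proc = Proc(pid, fun, pending_args)
        self.procs[pid] = proc
        self.runq.append(proc)
        return pid

    def send(self, pid, msg):
        proc = self.procs[pid]
        proc.mailbox.append(msg)
        if not proc.done and proc not in self.runq:
            self.runq.append(proc)  # wake a receiver parked on RECEIVE

    def step(self, proc):
        if proc.gen is None:
            # First step: pass pending args when set, else an empty list
            # (the old behaviour was to always pass the empty list).
            args = [] if proc.pending_args is None else proc.pending_args
            proc.gen = proc.initial_fun(self, proc.pid, *args)
            value = None
        elif proc.mailbox:
            value = proc.mailbox.popleft()
        else:
            return
        try:
            while True:
                proc.gen.send(value)  # run until the next `yield RECEIVE`
                if not proc.mailbox:
                    return  # suspended inside the scheduler; send() requeues it
                value = proc.mailbox.popleft()
        except StopIteration as stop:
            proc.done = True
            proc.exit_result = stop.value

    def run_all(self):
        while self.runq:
            self.step(self.runq.popleft())


def make_sx_handler(sched, handler):
    """Spawn the handler as a process, drain the scheduler, return its result."""
    def sx_handler(req):
        pid = sched.spawn(handler, pending_args=[req])
        sched.run_all()
        return sched.procs[pid].exit_result
    return sx_handler


def kernel_loop(sched, me, state):
    # gen_server-like loop: reply to ("call", From, Key) with state[Key].
    while True:
        _tag, caller, key = yield RECEIVE
        sched.send(caller, state[key])


sched = Scheduler()
kernel = sched.spawn(kernel_loop, pending_args=[{"tip": 0}])
sched.run_all()  # the kernel runs to its first receive and parks


def outbox_route(sched, me, req):
    sched.send(kernel, ("call", me, "tip"))  # like gen_server:call
    tip = yield RECEIVE                       # suspends, then resumes with the reply
    return f"outbox: {req['actor']}\ntip: {tip}"


serve = make_sx_handler(sched, outbox_route)
print(serve({"actor": "alice"}))  # prints "outbox: alice" then "tip: 0"
```

Calling `outbox_route(...)` directly, outside the scheduler, only returns an unstarted generator object rather than a response, which is this toy's counterpart of er-suspend-marker escaping the connection thread.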
Investigation arc (see plans/fed-sx-milestone-2.md, Blockers #4 and
the Progress log):
- loops/fed-prims bf8d0bf2 diagnosed it as an Erlang-substrate
  problem, not the OCaml mutex (Pattern A wrong; Pattern B right but
  sketchy).
- The first Pattern B attempt failed: it tried er-spawn-fun on a raw
  SX lambda and hit the (er-fun? fv) gate. A connection-thread bisect
  pinpointed the exact line.
- Real fix: use the existing er-fun (the user's handler) directly,
  but feed it via :pending-args so step-alive's hardcoded (list)
  doesn't drop the request argument.
Acceptance:
- New next/tests/smoke_kernel_route.sh: 6/6 over real HTTP via
  http_server:start(P, [{kernel, nx_kernel}]) (welcome /,
  /actors/alice, /actors/alice/outbox with gen_server-backed tip,
  /actors/alice/inbox, unknown-actor outbox).
- next/tests/http_server_tcp.sh: 5/5 (bumped wait_bound from 30s to
  180s: cold boot is slow under sibling-loop CPU load, and the
  per-handler scheduler ramp adds a little extra time).
- Erlang conformance: 761/761.
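The wait_bound bump is a polling deadline. A minimal Python sketch of the same wait, assuming only that the server accepts TCP connections once it is bound (the function name and the 0.5 s poll interval are illustrative; the diff shows 60 to 360 iterations against a 30 s to 180 s bound, which implies about 0.5 s per iteration). It measures against a monotonic deadline, as smoke_kernel_route.sh does with `date +%s`, so slow individual probes cannot stretch the total wait.

```python
import socket
import time


def wait_for_listener(host, port, timeout_s, poll_s=0.5):
    """Poll host:port until it accepts a TCP connection.

    Returns the seconds waited on success, or None once timeout_s elapses.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=poll_s):
                return time.monotonic() - start
        except OSError:
            pass  # not bound yet: refused or timed out
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)


# Demo against a throwaway listener on an OS-assigned port.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen()
port = srv.getsockname()[1]
print(wait_for_listener("127.0.0.1", port, timeout_s=5) is not None)  # True
srv.close()
```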
Step 12's two-instance smoke test is now unblocked — its full
Follow / Accept / Note flow can layer on top of this kernel-route
surface. m2 plan updated.
Pre-existing httpc_request.sh flakiness ("Undefined symbol:
http-request" on the live-call epochs) reproduces WITHOUT this
change — see git stash A/B in the investigation. Unrelated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lib/erlang/runtime.sx
@@ -731,7 +731,10 @@
 0
 (if
 (= prev-k nil)
-(er-apply-fun (er-proc-field pid :initial-fun) (list))
+(er-apply-fun
+(er-proc-field pid :initial-fun)
+(let ((args (er-proc-field pid :pending-args)))
+(cond (= args nil) (list) :else args)))
 (do (er-proc-set! pid :continuation nil) (prev-k nil)))))
 (let
 ((r (nth result-ref 0)))
@@ -1612,11 +1615,31 @@
 ;; 78eae9ef deleted them as dead because the BIF body
 ;; still referenced them — Blockers #1. This rewrite
 ;; threads through the live marshallers instead.)
+;; Run the handler as a SCHEDULED er-process so any
+;; `receive` (e.g. gen_server:call inside a kernel-aware
+;; route) suspends and resumes inside the SX scheduler.
+;; Without this, native http-listen invokes the handler
+;; closure on a fresh OCaml thread that has no scheduler
+;; frame, so the receive's er-suspend-marker propagates
+;; out and the connection writes nothing — the Blockers
+;; #4 deadlock the m2 loop observed.
+;;
+;; er-spawn-fun requires an er-fun (Erlang-AST-shaped
+;; dict); handler IS one (created by user `fun (Req) ->
+;; route(Req, Cfg) end`). To feed req-pl as the call
+;; argument we stash it on the process record's
+;; :pending-args field — er-sched-step-alive! reads it
+;; on first step (the alternative was a host-closure-to-
+;; er-fun wrapper, which needs AST construction).
 ((sx-handler
 (fn (req-dict)
 (let ((req-pl (er-request-dict-to-proplist req-dict)))
-(let ((resp-pl (er-apply-fun handler (list req-pl))))
-(er-proplist-to-dict resp-pl))))))
+(let ((proc (er-proc-new! (er-env-new))))
+(dict-set! proc :initial-fun handler)
+(dict-set! proc :pending-args (list req-pl))
+(er-sched-run-all!)
+(let ((resp-pl (er-proc-field (get proc :pid) :exit-result)))
+(er-proplist-to-dict resp-pl)))))))
 (http-listen port sx-handler))))))
 
 ;; httpc:request/4(Url, Method, Headers, Body) - BRIEFING-EXCEPTION:
next/tests/http_server_tcp.sh
@@ -72,9 +72,11 @@ HOLDPID=$!
 SXPID=$!
 rm -f "$FIFO" # both ends still hold open via the running procs
 
-# Wait for the listener to bind (up to ~30s — boot takes ~10s).
+# Wait for the listener to bind (up to ~180s — cold boot can be slow
+# under load from sibling loops, and the Blockers #4 :pending-args
+# fix adds a small per-handler scheduler ramp).
 BOUND=""
-for i in $(seq 1 60); do
+for i in $(seq 1 360); do
   if (exec 3<>/dev/tcp/127.0.0.1/$PORT) 2>/dev/null; then
     exec 3<&-; exec 3>&-
     BOUND="yes"
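The FIFO juggling in these scripts keeps sx_server's stdin open after the epoch commands are written: `( cat "$EF"; sleep 900 ) > "$FIFO" &` writes the commands and then holds the write end. The sketch below shows the same idea with Python's `subprocess` module. It assumes, as the scripts appear to, that the server reads commands line by line and stops at end of input; a tiny Python child (hypothetical, not sx_server) stands in for the server.

```python
import subprocess
import sys

# Hypothetical stand-in for sx_server: acknowledges each line, exits at EOF.
CHILD = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    print('ack ' + line.strip(), flush=True)\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# Write the "epochs", then keep the write end open (the role of `sleep 900`).
proc.stdin.write("(epoch 1)\n")
proc.stdin.flush()
print(proc.stdout.readline().strip())  # ack (epoch 1)
print(proc.poll() is None)             # True: still running, no EOF yet

# Closing stdin delivers EOF, which ends the child's read loop.
proc.stdin.close()
print(proc.wait(timeout=10))           # 0
proc.stdout.close()
```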
next/tests/smoke_kernel_route.sh
Executable file, +121 lines
@@ -0,0 +1,121 @@
+#!/usr/bin/env bash
+# next/tests/smoke_kernel_route.sh — m2 Blockers #4 unblock test.
+#
+# Proves a real HTTP listener over http:listen + http_server:start
+# CAN now serve kernel-aware routes (the surface Blockers #4 made
+# unreachable). Spins up a single sx_server instance, bootstraps an
+# actor, starts http_server with {kernel, nx_kernel} in Cfg, and
+# curls a route that fans through nx_kernel via gen_server:call.
+#
+# This is the kernel-route portion of Step 12's two-instance smoke
+# test. The full two-instance flow (Follow + auto-accept + Note
+# delivery) layers on top of this surface; this test is the
+# load-bearing proof point that the underlying wiring works.
+
+set -uo pipefail
+cd "$(git rev-parse --show-toplevel)"
+
+SX_SERVER="${SX_SERVER:-hosts/ocaml/_build/default/bin/sx_server.exe}"
+if [ ! -x "$SX_SERVER" ]; then
+  SX_SERVER="/root/rose-ash/hosts/ocaml/_build/default/bin/sx_server.exe"
+fi
+if [ ! -x "$SX_SERVER" ]; then
+  echo "ERROR: sx_server.exe not found." >&2
+  exit 1
+fi
+
+VERBOSE="${1:-}"
+PASS=0; FAIL=0; ERRORS=""
+
+PORT=$(python3 -c 'import socket;s=socket.socket();s.bind(("127.0.0.1",0));print(s.getsockname()[1]);s.close()')
+EF=$(mktemp); LOG=$(mktemp); FIFO=$(mktemp -u); mkfifo "$FIFO"
+cleanup() {
+  for pid in ${SXP:-} ${HOLDP:-}; do
+    kill -KILL "$pid" 2>/dev/null || true
+    wait "$pid" 2>/dev/null || true
+  done
+  rm -f "$EF" "$LOG" "$FIFO"
+}
+trap cleanup EXIT
+
+cat > "$EF" <<EPOCHS
+(epoch 1)
+(load "lib/erlang/tokenizer.sx")
+(load "lib/erlang/parser.sx")
+(load "lib/erlang/parser-core.sx")
+(load "lib/erlang/parser-expr.sx")
+(load "lib/erlang/parser-module.sx")
+(load "lib/erlang/transpile.sx")
+(load "lib/erlang/runtime.sx")
+(load "lib/erlang/vm/dispatcher.sx")
+(epoch 2)
+(eval "(er-load-gen-server!)")
+(eval "(get (erlang-load-module (file-read \"next/kernel/envelope.erl\")) :name)")
+(eval "(get (erlang-load-module (file-read \"next/kernel/log.erl\")) :name)")
+(eval "(get (erlang-load-module (file-read \"next/kernel/pipeline.erl\")) :name)")
+(eval "(get (erlang-load-module (file-read \"next/kernel/term_codec.erl\")) :name)")
+(eval "(get (erlang-load-module (file-read \"next/kernel/outbox.erl\")) :name)")
+(eval "(get (erlang-load-module (file-read \"next/kernel/nx_kernel.erl\")) :name)")
+(eval "(get (erlang-load-module (file-read \"next/kernel/http_server.erl\")) :name)")
+(epoch 3)
+(eval "(erlang-eval-ast \"AK = <<1,1,1,1>>, AKS = [{key_id,k1},{algorithm,ed25519},{value,AK}], AAS = [{public_keys,[[{id,k1},{created,0},{value,AK}]]}], nx_kernel:start_link(alice, AKS, AAS), http_server:start(${PORT}, [{kernel, nx_kernel}])\")")
+EPOCHS
+
+( cat "$EF"; sleep 900 ) > "$FIFO" &
+HOLDP=$!
+"$SX_SERVER" < "$FIFO" > "$LOG" 2>&1 &
+SXP=$!
+rm -f "$FIFO"
+
+START=$(date +%s)
+BOUND=
+while [ $(($(date +%s) - START)) -lt 300 ]; do
+  if (exec 3<>/dev/tcp/127.0.0.1/$PORT) 2>/dev/null; then
+    exec 3<&-; exec 3>&-
+    BOUND="yes after $(($(date +%s) - START))s"
+    break
+  fi
+  sleep 1
+done
+
+if [ -z "$BOUND" ]; then
+  echo "FAIL: listener never bound on port $PORT"
+  echo "--- log tail ---"
+  tail -20 "$LOG"
+  exit 1
+fi
+
+[ "$VERBOSE" = "-v" ] && echo " ok listener up ($BOUND)"
+
+check() {
+  local desc="$1" path="$2" needle="$3"
+  local resp
+  resp=$(curl -s --max-time 10 "http://127.0.0.1:$PORT$path" 2>/dev/null || echo "<curl-failed>")
+  if echo "$resp" | grep -qF -- "$needle"; then
+    PASS=$((PASS+1))
+    [ "$VERBOSE" = "-v" ] && echo " ok $desc"
+  else
+    FAIL=$((FAIL+1))
+    ERRORS+=" FAIL [$desc] expected '$needle' in resp: $(echo "$resp" | head -c 100)
+"
+  fi
+}
+
+check "non-kernel welcome /" "/" "fed-sx kernel m1"
+check "kernel-aware /actors/alice" "/actors/alice" "actor: alice"
+check "kernel-aware /actors/alice/outbox" "/actors/alice/outbox" "outbox: alice"
+check "kernel-aware /actors/alice/outbox tip" "/actors/alice/outbox" "tip: 0"
+check "kernel-aware /actors/alice/inbox" "/actors/alice/inbox" "inbox: alice"
+check "unknown actor /actors/zzz/outbox" "/actors/zzz/outbox" "outbox: zzz"
+
+TOTAL=$((PASS+FAIL))
+if [ $FAIL -eq 0 ]; then
+  echo "ok $PASS/$TOTAL next/tests/smoke_kernel_route.sh passed (port $PORT)"
+else
+  echo "FAIL $PASS/$TOTAL passed, $FAIL failed:"
+  echo "$ERRORS"
+  if [ "$VERBOSE" = "-v" ]; then
+    echo "--- log tail ---"; tail -20 "$LOG"
+  fi
+fi
+[ $FAIL -eq 0 ]
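The `check` helper above is a substring assertion over an HTTP GET. For readers porting the smoke test, a rough Python equivalent follows, using only the standard library. The stub server, its `ROUTES` table and this `check` signature are hypothetical stand-ins for sx_server's http_server; like `curl -s` without `-f`, the sketch keeps the response body even for non-2xx statuses.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical canned bodies standing in for the real kernel-aware routes.
ROUTES = {
    "/": "fed-sx kernel m1",
    "/actors/alice/outbox": "outbox: alice\ntip: 0",
}

# Bypass any http_proxy settings so requests go straight to 127.0.0.1.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = ROUTES.get(self.path, "not found").encode()
        self.send_response(200 if self.path in ROUTES else 404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def check(base_url, desc, path, needle, failures):
    """GET base_url+path and pass iff needle occurs in the body."""
    try:
        with OPENER.open(base_url + path, timeout=10) as resp:
            body = resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:  # non-2xx: keep the body, like curl -s
        body = err.read().decode("utf-8", "replace")
    except OSError:  # refused, timed out, ...
        body = "<curl-failed>"
    if needle in body:
        return True
    failures.append(f"FAIL [{desc}] expected {needle!r} in resp: {body[:100]!r}")
    return False


srv = HTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{srv.server_address[1]}"

failures = []
results = [
    check(base, "welcome /", "/", "fed-sx kernel m1", failures),
    check(base, "outbox tip", "/actors/alice/outbox", "tip: 0", failures),
    check(base, "unknown route", "/nope", "tip: 0", failures),
]
print(f"{sum(results)}/{len(results)} passed")  # 2/3 passed
print(failures[0])
srv.shutdown()
srv.server_close()
```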
plans/fed-sx-milestone-2.md
@@ -851,14 +851,23 @@ re-broadcast another actor's content to their own followers.
 
 ## Step 12 — Two-instance smoke test
 
-**GATED on Blockers #4** (http-listen handler holds the SX runtime
-mutex, deadlocking any `gen_server:call` from inside a route — see
-Blockers section for verification + fix patterns). Without this,
-the only request shapes that survive over real HTTP are the static /
-capabilities / static-stub paths; every kernel-aware route hangs
-indefinitely. The smoke test framework is sketched out (see the
-withdrawn `smoke_federate.sh` in this loop's history at commit
-`8d33d02f`'s tree state) but cannot exit 0 until Blockers #4 lifts.
+**Blockers #4 RESOLVED 2026-06-07.** The substrate fix turned out
+to be a two-line change in `lib/erlang/runtime.sx`: extend
+`er-sched-step-alive!` to read `:pending-args` when present (was
+hardcoded to `(list)`), and have `er-bif-http-listen`'s sx-handler
+spawn the user handler as a real er-process with `:pending-args
+(list req-pl)` instead of calling it inline. With this in place
+any `receive` inside a kernel-aware route (e.g. `gen_server:call`)
+suspends and resumes correctly inside the SX scheduler instead of
+propagating out of the connection thread.
+
+Verified by `next/tests/smoke_kernel_route.sh` (6/6, single-instance):
+welcome `/`, `/actors/alice`, `/actors/alice/outbox` (gen_server-
+backed, with `tip:` from kernel state), `/actors/alice/inbox`,
+unknown-actor outbox — all serve over real HTTP through
+`http_server:start` with `Cfg = [{kernel, nx_kernel}]`. The
+full two-instance Follow / Accept / Note flow can layer on top
+of this surface.
 
 **The proof point.** `next/tests/smoke_federate.sh` spins up two kernel
 instances on distinct ports, walks them through the full federation
@@ -1087,7 +1096,21 @@ proceed.
 
 4. **`http-listen` handler holds the SX runtime mutex →
 `gen_server:call` from inside an HTTP route deadlocks.** —
-discovered during Step 12 prep. The native `http-listen`
+~~discovered during Step 12 prep~~ **RESOLVED 2026-06-07**
+by a two-line `lib/erlang/runtime.sx` change: extend
+`er-sched-step-alive!` to read `:pending-args` when present
+(was hardcoded to `(list)`), and rewrite
+`er-bif-http-listen`'s sx-handler to spawn the user handler
+as a real er-process with `:pending-args (list req-pl)`
+instead of `er-apply-fun handler` inline. Any `receive`
+inside a kernel-aware route now suspends + resumes inside
+the SX scheduler. Verified via the new
+`next/tests/smoke_kernel_route.sh` (6/6, single-instance
+`http_server:start(P, [{kernel, nx_kernel}])` serves
+welcome + `/actors/alice/outbox` with kernel-backed `tip:`
+etc.). The full Pattern A vs Pattern B analysis below is
+preserved for the audit trail. The original native
+`http-listen`
 primitive in `bin/sx_server.ml:735+` serialises handler calls
 with `Mutex.lock mtx` / `Mutex.unlock mtx` so the SX runtime
 isn't re-entered concurrently. The wrapped Erlang handler
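The serialisation the native primitive performs can be pictured with a short Python sketch: each connection runs on its own thread, and every call into the runtime takes one global lock, so at most one handler executes at a time. The lock, `call_into_runtime`, `slow_handler` and the timings are assumptions for illustration; the OCaml code itself is not shown in this diff. The lock only prevents re-entry. It does not give a handler the scheduler frame that a suspending `receive` needs, which matches the diagnosis above that the mutex was not the root cause.

```python
import threading
import time

runtime_lock = threading.Lock()  # plays the role of `mtx` in the native primitive
state = {"active": 0, "max_active": 0}


def call_into_runtime(handler, req):
    """Run handler(req) while holding the runtime lock (lock ... unlock)."""
    with runtime_lock:
        state["active"] += 1
        state["max_active"] = max(state["max_active"], state["active"])
        try:
            return handler(req)
        finally:
            state["active"] -= 1


def slow_handler(req):
    time.sleep(0.01)  # pretend to do some SX work
    return {"status": 200, "body": f"req {req}"}


responses = {}


def connection_thread(req):
    # One thread per accepted connection, as with the native listener.
    responses[req] = call_into_runtime(slow_handler, req)


threads = [threading.Thread(target=connection_thread, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(responses), state["max_active"])  # 8 1
```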