fed-sx-m2: Step 12 gated on new Blockers #4 (handler mutex deadlock)
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 22s
Step 12 prep tried to build the two-instance smoke test on top of the now-resolved Blockers #1 fix (http-listen marshaller bridge). Both sx_server instances boot and bind, and GET / returns the welcome body, but every request that touches the kernel hangs past curl's --max-time.

Root cause (verified): the native `http-listen` primitive in bin/sx_server.ml serialises handler calls with Mutex.lock / Mutex.unlock so the SX runtime isn't re-entered concurrently. The wrapped Erlang handler eventually does gen_server:call(nx_kernel, ...) for any kernel-aware route (actor_doc_response_for/3, actor_outbox_response_for/3, handle_inbox_post, etc.); the gen_server reply needs the scheduler to run, which needs the SX runtime, which is locked by the calling handler. Deadlock.

Verification: an sx_server with http_server:start(P, []) serves GET / and welcome routes fine; the same instance with http_server:start(P, [{kernel, nx_kernel}]) hangs on the first GET /actors/<id>/outbox.

Blockers #4 entry added. Two fix patterns documented (release the mutex around gen_server:call's reply wait; OR run the handler in a fresh er-spawn'd process). Belongs on loops/erlang or loops/fed-prims: substrate-level, not m2.

Step 12 header updated to flag the gate. Withdrew the in-flight smoke_federate.sh: its framework was correct (two instances boot, sequential GET / proves the listener survives more than one request), but Step 12's actual proof point (Follow → Accept → Note fan-out) requires kernel-touching routes on every request.

m2's other 11 steps stay individually proven by their per-step suites; this loop has reached its substrate ceiling and the autonomous pace is dialled down accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -851,6 +851,15 @@ re-broadcast another actor's content to their own followers.
## Step 12 — Two-instance smoke test

**GATED on Blockers #4** (the `http-listen` handler holds the SX runtime
mutex, deadlocking any `gen_server:call` from inside a route; see the
Blockers section for verification and fix patterns). Until that is fixed,
the only request shapes that survive over real HTTP are the static /
capabilities / static-stub paths; every kernel-aware route hangs
indefinitely. The smoke test framework is sketched out (see the
withdrawn `smoke_federate.sh` in this loop's history, in the tree state
of commit `8d33d02f`) but cannot exit 0 until Blockers #4 lifts.

**The proof point.** `next/tests/smoke_federate.sh` spins up two kernel
instances on distinct ports, walks them through the full federation
flow, and exits 0.
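
The gate above comes down to telling "the route answered" apart from "the route hung until the client gave up". A minimal sketch of such a check, written in Python rather than the shell of the withdrawn `smoke_federate.sh`: `probe` and its return strings are hypothetical names for this illustration, not part of the plan or the test suite.

```python
import socket
import urllib.error
import urllib.request


def probe(url, accept=None, timeout=2.0):
    """Issue one GET and classify it as 'ok:<status>', 'hang', or 'error:<reason>'."""
    headers = {"Accept": accept} if accept else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()
            return f"ok:{response.status}"
    except urllib.error.HTTPError as exc:
        # The server did answer, just not with a 2xx, so this is not a hang.
        return f"ok:{exc.code}"
    except (TimeoutError, socket.timeout):
        # No response within the deadline: the symptom described in the gate note.
        return "hang"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (TimeoutError, socket.timeout)):
            return "hang"
        return f"error:{exc.reason}"
    except OSError as exc:
        return f"error:{exc}"
```

Against a locally running instance this might be called as `probe("http://127.0.0.1:8001/actors/alice/outbox", timeout=5.0)`, where port 8001 is an assumption. Per the gate note, a kernel-aware route like this one would be expected to report `hang` until Blockers #4 is resolved.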
@@ -1076,12 +1085,72 @@ proceed.
retry semantics pure-functionally in 8b-pure so 8b-timer
becomes a 1-shot wiring when the primitive lands.

4. **`http-listen` handler holds the SX runtime mutex →
   `gen_server:call` from inside an HTTP route deadlocks.**
   Discovered during Step 12 prep. The native `http-listen`
   primitive in `bin/sx_server.ml:735+` serialises handler calls
   with `Mutex.lock mtx` / `Mutex.unlock mtx` so the SX runtime
   isn't re-entered concurrently. The wrapped Erlang handler
   eventually does `gen_server:call(nx_kernel, ...)` (for
   kernel-aware routes like `actor_doc_response_for/3`,
   `actor_outbox_response_for/3`, `handle_inbox_post`,
   `nx_kernel:state_for/1`, etc.); the gen_server reply needs the
   scheduler to run, which needs the SX runtime, which is locked
   by the calling handler. Deadlock: curl hangs until the test
   `--max-time` fires.

   Verification: an sx_server with `http_server:start(P, [])` (no
   Cfg, no kernel routes) serves GET / and welcome paths fine;
   the same instance with `Cfg = [{kernel, nx_kernel}]` hangs on
   the first GET /actors/<id>/outbox (or any /actors/<id> with
   `Accept: application/vnd.fed-sx.actor-doc`).

   Belongs on `loops/erlang` or `loops/fed-prims`. Two fix
   patterns:
   - Release the mutex around the `gen_server:call` reply wait
     (substrate change in http-listen's handler-call code).
   - Run the handler in a fresh er-spawn'd process so the
     gen_server runs on a different scheduler frame.

   Step 12's two-instance smoke test gates on this; until it is
   fixed, the only request shapes that survive over real HTTP are
   the static / capabilities / static-stub paths.

   The in-flight `smoke_federate.sh` test was withdrawn during this
   tick after the deadlock surfaced (it boots both instances
   successfully but every kernel-touching request hangs); the
   plan's Step 12 acceptance criterion stays open pending
   Blockers #4 resolution. m2's other 11 steps are fully
   landed and individually proven by their per-step suites.
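
The lock cycle described in Blockers #4, and the first of its two fix patterns, can be modelled in a few lines. This is only a toy sketch in Python threads, not the OCaml code in `bin/sx_server.ml` or the Erlang runtime: `runtime_lock` stands in for the SX runtime mutex, `kernel_loop` for the `nx_kernel` gen_server, and every name here is hypothetical. The timeouts exist only so the sketch terminates; the blocker entry describes a wait that never returns on its own. Calling `run_demo()` runs both handler shapes against the same toy kernel.

```python
import queue
import threading

runtime_lock = threading.Lock()  # stands in for the SX runtime mutex
kernel_inbox = queue.Queue()     # stands in for the nx_kernel mailbox


def kernel_loop():
    """Toy gen_server: it can only make progress while it holds the runtime."""
    while True:
        message = kernel_inbox.get()
        if message is None:  # shutdown sentinel
            return
        route, reply_box = message
        with runtime_lock:
            reply_box.put(f"state for {route}")


def call_kernel(route, timeout):
    """Toy gen_server:call: send a request, then block waiting for the reply."""
    reply_box = queue.Queue(maxsize=1)
    kernel_inbox.put((route, reply_box))
    try:
        return reply_box.get(timeout=timeout)
    except queue.Empty:
        return None  # no reply arrived: the hang seen from curl


def handler_holding_lock(route, timeout):
    """Blocked shape: the lock is held across the whole handler call."""
    with runtime_lock:
        return call_kernel(route, timeout)


def handler_releasing_lock(route, timeout):
    """First fix pattern: drop the lock only around the reply wait."""
    runtime_lock.acquire()
    try:
        # Route matching would run here, under the lock.
        runtime_lock.release()      # let the kernel run while we wait
        try:
            reply = call_kernel(route, timeout)
        finally:
            runtime_lock.acquire()  # re-enter the runtime to build the response
        return reply
    finally:
        runtime_lock.release()


def run_demo(route="/actors/alice/outbox", timeout=0.2):
    """Run both handler shapes against one toy kernel and return both results."""
    kernel = threading.Thread(target=kernel_loop, daemon=True)
    kernel.start()
    blocked = handler_holding_lock(route, timeout)
    fixed = handler_releasing_lock(route, timeout)
    kernel_inbox.put(None)
    kernel.join(timeout=1.0)
    return blocked, fixed
```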
---

## Progress log

Newest first.

- **2026-06-07** — Step 12 prep discovered Blockers #4
  (the `http-listen` handler holds the SX runtime mutex; any
  `gen_server:call` from inside an HTTP route deadlocks
  because the scheduler that delivers the gen_server reply needs
  the SX runtime that the calling handler is holding). Verified by
  spinning up a single `http_server:start(P, [{kernel, nx_kernel}])`
  instance: GET / works, GET /actors/alice (text) works
  (no gen_server touch), but GET /actors/alice/outbox and
  GET /actors/alice with
  `Accept: application/vnd.fed-sx.actor-doc` both hang past
  curl's --max-time. m2's Step 12 acceptance gates on this: its
  proof point is the two-instance smoke test, which walks the full
  Follow → Accept → Note fan-out path, and every step touches the
  kernel via gen_server. The in-flight `smoke_federate.sh`
  was withdrawn (it boots both instances and serves welcome
  routes successfully, but every kernel-aware request hangs);
  the Blockers #4 entry documents the substrate-level fix
  patterns. m2's other 11 steps remain individually proven
  by their per-step suites. Pivot: pacing the autonomous
  loop down, since substrate work is owed to `loops/erlang` or
  `loops/fed-prims`, not m2.

- **2026-06-07** — Blockers #1 RESOLVED. The
`er-bif-http-listen` sx-handler in `lib/erlang/runtime.sx`
referenced `er-http-resp-to-sx` / `er-http-req-of-sx` —