diff --git a/plans/fed-sx-milestone-2.md b/plans/fed-sx-milestone-2.md
index 7a130cb4..a945ada4 100644
--- a/plans/fed-sx-milestone-2.md
+++ b/plans/fed-sx-milestone-2.md
@@ -851,6 +851,15 @@ re-broadcast another actor's content to their own followers.
 
 ## Step 12 — Two-instance smoke test
 
+**GATED on Blockers #4** (http-listen handler holds the SX runtime
+mutex, deadlocking any `gen_server:call` from inside a route — see
+Blockers section for verification + fix patterns). Without this,
+the only request shapes that survive over real HTTP are the static /
+capabilities / static-stub paths; every kernel-aware route hangs
+indefinitely. The smoke test framework is sketched out (see the
+withdrawn `smoke_federate.sh` in this loop's history at commit
+`8d33d02f`'s tree state) but cannot exit 0 until Blockers #4 lifts.
+
 **The proof point.** `next/tests/smoke_federate.sh` spins up two
 kernel instances on distinct ports, walks them through the full
 federation flow, and exits 0.
@@ -1076,12 +1085,72 @@ proceed.
    retry semantics pure-functionally in 8b-pure so 8b-timer becomes
    a 1-shot wiring when the primitive lands.
 
+4. **`http-listen` handler holds the SX runtime mutex →
+   `gen_server:call` from inside an HTTP route deadlocks.** —
+   discovered during Step 12 prep. The native `http-listen`
+   primitive in `bin/sx_server.ml:735+` serialises handler calls
+   with `Mutex.lock mtx` / `Mutex.unlock mtx` so the SX runtime
+   isn't re-entered concurrently. The wrapped Erlang handler
+   eventually does `gen_server:call(nx_kernel, ...)` (for kernel-
+   aware routes like `actor_doc_response_for/3`,
+   `actor_outbox_response_for/3`, `handle_inbox_post`,
+   `nx_kernel:state_for/1`, etc.); the gen_server reply needs the
+   scheduler to run, which needs the SX runtime, which is locked
+   by the calling handler. Deadlock — curl hangs until the test
+   `--max-time` fires.
+
+   Verification: a sx_server with `http_server:start(P, [])` (no
+   Cfg, no kernel routes) serves GET / and welcome paths fine;
+   the same instance with `Cfg = [{kernel, nx_kernel}]` hangs on
+   the first GET /actors//outbox (or any /actors/ with
+   `Accept: application/vnd.fed-sx.actor-doc`).
+
+   Belongs on `loops/erlang` or `loops/fed-prims`. Two fix
+   patterns:
+   - Release the mutex around the `gen_server:call` reply wait
+     (substrate change in http-listen's handler-call code).
+   - Run the handler in a fresh er-spawn'd process so the
+     gen_server runs on a different scheduler frame.
+
+   Step 12's two-instance smoke test gates on this — without
+   it, the only request shapes that survive over real HTTP are
+   the static / capabilities / static-stub paths.
+
+   In-flight `smoke_federate.sh` test was withdrawn during this
+   tick after the deadlock surfaced (it boots both instances
+   successfully but every kernel-touching request hangs); the
+   plan's Step 12 acceptance criterion stays open pending
+   Blockers #4 resolution. m2's other 11 steps are fully
+   landed and individually proven by their per-step suites.
+
 ---
 
 ## Progress log
 
 Newest first.
 
+- **2026-06-07** — Step 12 prep discovered Blockers #4
+  (http-listen handler holds the SX runtime mutex; any
+  `gen_server:call` from inside an HTTP route deadlocks
+  because the gen_server reply scheduler needs the SX runtime
+  the calling handler is sitting on). Verified by spinning
+  up a single `http_server:start(P, [{kernel, nx_kernel}])`
+  instance: GET / works, GET /actors/alice (text) works
+  (no gen_server touch), but GET /actors/alice/outbox or
+  GET /actors/alice with `Accept: application/vnd.fed-sx.
+  actor-doc` both hang past curl's --max-time. m2's Step 12
+  acceptance gates on this — its proof-point is the
+  two-instance smoke test which walks the full Follow →
+  Accept → Note fan-out path, and every step touches the
+  kernel via gen_server. The in-flight `smoke_federate.sh`
+  was withdrawn (boots both instances + serves welcome
+  routes successfully, but every kernel-aware request hangs);
+  Blockers #4 entry documents the substrate-level fix
+  patterns. m2's other 11 steps remain individually proven
+  by their per-step suites. Pivot: pacing the autonomous
+  loop down — substrate work is owed to `loops/erlang` or
+  `loops/fed-prims`, not m2.
+
 - **2026-06-07** — Blockers #1 RESOLVED. The
   `er-bif-http-listen` sx-handler in `lib/erlang/runtime.sx`
   referenced `er-http-resp-to-sx` / `er-http-req-of-sx` —