Files
rose-ash/plans
giles eafb687b53
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 22s
fed-sx-m2: Step 12 gated on new Blockers #4 (handler mutex deadlock)
Step 12 prep tried to build the two-instance smoke test on top of
the now-resolved Blockers #1 fix (http-listen marshaller bridge).
Both sx_server instances boot and bind, GET / returns the welcome
body, but every request that touches the kernel hangs past curl's
--max-time.

Root cause (verified): the native `http-listen` primitive in
bin/sx_server.ml serialises handler calls with Mutex.lock /
Mutex.unlock so the SX runtime isn't re-entered concurrently. The
wrapped Erlang handler eventually does gen_server:call(nx_kernel,
...) for any kernel-aware route (actor_doc_response_for/3,
actor_outbox_response_for/3, handle_inbox_post, etc.); the
gen_server reply needs the scheduler to run, which needs the SX
runtime, which is locked by the calling handler. Deadlock.

Verification: a sx_server with
  http_server:start(P, [])
serves GET / and welcome routes fine; the same instance with
  http_server:start(P, [{kernel, nx_kernel}])
hangs on the first GET /actors/<id>/outbox.

Blockers #4 entry added. Two fix patterns documented (release the
mutex around gen_server:call's reply wait; OR run the handler in a
fresh er-spawn'd process). Belongs on loops/erlang or
loops/fed-prims — substrate-level, not m2.

Step 12 header updated to flag the gate. Withdrew the in-flight
smoke_federate.sh — its framework was correct (two instances
boot, sequential GET / proves the listener survives more than one
request) but Step 12's actual proof point — Follow → Accept → Note
fan-out — requires kernel-touching routes on every request.

m2's other 11 steps stay individually proven by their per-step
suites; this loop has reached its substrate ceiling and the
autonomous pace is dialled down accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 14:03:37 +00:00
..