fed-sx-m2: resolve Blockers #4 — kernel routes now work over real HTTP
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m6s

Substrate fix: two-line change to lib/erlang/runtime.sx that lets
http-listen handler routes call gen_server:call without deadlocking.

1. er-sched-step-alive!: pass :pending-args (when set) to the
   initial-fun call instead of always passing an empty list.
   Default behavior (no field) stays (list), so it is drop-in safe.
2. er-bif-http-listen sx-handler: instead of calling er-apply-fun
   handler inline (which blows up on receive's er-suspend-marker
   because the connection thread has no scheduler step on its
   stack), create a real er-process with :initial-fun = handler and
   :pending-args = (list req-pl), then call er-sched-run-all! to
   drain. Any receive (e.g. gen_server:call) suspends and resumes
   inside the SX scheduler frame the process owns. Read :exit-result
   for the response proplist; marshal back to SX dict.
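
The two changes above can be modelled outside the runtime. The following is a minimal Python sketch, not the SX code: generators stand in for er-processes, a yielded `Suspend` stands in for er-suspend-marker, and every name in it (`Process`, `step_alive`, `run_until_exit`, `apply_inline`, the toy `kernel_server`) is a hypothetical stand-in chosen for illustration. It shows why an inline call has nothing to resume a suspended receive, while a scheduled process that is fed its request through pending args completes.

```python
from collections import deque

class Suspend:
    """Marker yielded when a process blocks in receive (models er-suspend-marker)."""

class Process:
    """Toy er-process: an initial function plus optional pending args."""
    def __init__(self, initial_fun, pending_args=None):
        self.initial_fun = initial_fun
        self.pending_args = pending_args
        self.gen = None
        self.done = False
        self.exit_result = None

def step_alive(proc):
    """Advance one process until it suspends or exits."""
    if proc.gen is None:
        # Change 1 in the list above: use pending args when set, else an empty list.
        args = proc.pending_args if proc.pending_args is not None else []
        proc.gen = proc.initial_fun(*args)
    try:
        next(proc.gen)
    except StopIteration as stop:
        proc.done = True
        proc.exit_result = stop.value

def run_until_exit(procs, target, max_steps=1000):
    """Round-robin scheduler: step every live process until target exits."""
    for _ in range(max_steps):
        if target.done:
            return target.exit_result
        for proc in procs:
            if not proc.done:
                step_alive(proc)
    raise RuntimeError("handler did not finish")

def apply_inline(fun, args):
    """Old behaviour: call the handler directly, with no scheduler to resume it."""
    gen = fun(*args)
    try:
        next(gen)
    except StopIteration as stop:
        return stop.value
    raise RuntimeError("receive suspended with no scheduler frame to resume it")

def kernel_server(mailbox):
    """Toy gen_server: answers every request with a fixed tip."""
    while True:
        while not mailbox:
            yield Suspend
        reply_to, _request = mailbox.popleft()
        reply_to.append({"tip": 42})

def make_handler(server_mailbox):
    """Build a route handler that performs a blocking call to the toy server."""
    def handler(req):
        inbox = deque()
        server_mailbox.append((inbox, req["path"]))
        while not inbox:          # models receive inside gen_server:call
            yield Suspend
        state = inbox.popleft()
        return {"status": 200, "body": f"tip: {state['tip']}"}
    return handler

if __name__ == "__main__":
    server_mailbox = deque()
    kernel = Process(kernel_server, pending_args=[server_mailbox])
    # Change 2 in the list above: run the handler as its own process.
    request = Process(make_handler(server_mailbox),
                      pending_args=[{"path": "/actors/alice/outbox"}])
    print(run_until_exit([request, kernel], request))
```

In this model the scheduler reads the handler's return value from `exit_result`, which mirrors reading :exit-result for the response; the marshalling back to an SX dict is not modelled.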

Investigation arc (see plans/fed-sx-milestone-2.md Blockers #4 +
Progress log):

- loops/fed-prims bf8d0bf2 diagnosed it as Erlang-substrate, not
  OCaml mutex (Pattern A wrong, Pattern B right but sketchy).
- First Pattern B attempt failed: tried er-spawn-fun on a raw SX
  lambda, hit (er-fun? fv) gate. Connection-thread bisect
  pinpointed the exact line.
- Real fix: use the existing er-fun (user's handler) directly,
  but feed it via :pending-args so step-alive's hardcoded
  (list) doesn't drop the request arg.

Acceptance:

- new next/tests/smoke_kernel_route.sh: 6/6 over real HTTP
  (welcome /, /actors/alice, /actors/alice/outbox with
  gen_server-backed tip, /actors/alice/inbox, unknown-actor,
  via http_server:start(P, [{kernel, nx_kernel}])).
- next/tests/http_server_tcp.sh: 5/5 (bumped wait_bound from
  30s to 180s: cold boot is slow under sibling-loop CPU load
  and the per-handler scheduler ramp adds a small margin).
- Erlang conformance: 761/761.
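
The wait_bound bump is easiest to picture as a poll-until-listening loop with a deadline. The following Python sketch is an assumption about what such a wait looks like, not the contents of http_server_tcp.sh: the function name reuses `wait_bound` from the note above, while the signature, the polling interval, and the connect-based probe are all hypothetical.

```python
import socket
import time

def wait_bound(host, port, timeout_s=180.0, interval_s=0.25):
    """Poll until something accepts TCP connections on host:port.

    Returns True once a connection succeeds, False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means the listener is bound and listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A larger deadline only costs time when the server never comes up; a fast boot still returns as soon as the first probe connects.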

Step 12's two-instance smoke test is now unblocked: its full
Follow / Accept / Note flow can layer on top of this kernel-route
surface. m2 plan updated.

Pre-existing httpc_request.sh flakiness ("Undefined symbol:
http-request" on the live-call epochs) reproduces WITHOUT this
change; see git stash A/B in the investigation. Unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@@ -851,14 +851,23 @@ re-broadcast another actor's content to their own followers.
 
 ## Step 12 — Two-instance smoke test
 
-**GATED on Blockers #4** (http-listen handler holds the SX runtime
-mutex, deadlocking any `gen_server:call` from inside a route — see
-Blockers section for verification + fix patterns). Without this,
-the only request shapes that survive over real HTTP are the static /
-capabilities / static-stub paths; every kernel-aware route hangs
-indefinitely. The smoke test framework is sketched out (see the
-withdrawn `smoke_federate.sh` in this loop's history at commit
-`8d33d02f`'s tree state) but cannot exit 0 until Blockers #4 lifts.
+**Blockers #4 RESOLVED 2026-06-07.** The substrate fix turned out
+to be a two-line change in `lib/erlang/runtime.sx`: extend
+`er-sched-step-alive!` to read `:pending-args` when present (was
+hardcoded to `(list)`), and have `er-bif-http-listen`'s sx-handler
+spawn the user handler as a real er-process with `:pending-args
+(list req-pl)` instead of calling it inline. With this in place
+any `receive` inside a kernel-aware route (e.g. `gen_server:call`)
+suspends and resumes correctly inside the SX scheduler instead of
+propagating out of the connection thread.
+
+Verified by `next/tests/smoke_kernel_route.sh` (6/6, single-instance):
+welcome `/`, `/actors/alice`, `/actors/alice/outbox` (gen_server-
+backed, with `tip:` from kernel state), `/actors/alice/inbox`,
+unknown-actor outbox — all serve over real HTTP through
+`http_server:start` with `Cfg = [{kernel, nx_kernel}]`. The
+full two-instance Follow / Accept / Note flow can layer on top
+of this surface.
 
 **The proof point.** `next/tests/smoke_federate.sh` spins up two kernel
 instances on distinct ports, walks them through the full federation
@@ -1087,7 +1096,21 @@ proceed.
 
 4. **`http-listen` handler holds the SX runtime mutex →
 `gen_server:call` from inside an HTTP route deadlocks.** —
-discovered during Step 12 prep. The native `http-listen`
+~~discovered during Step 12 prep~~ **RESOLVED 2026-06-07**
+by a two-line `lib/erlang/runtime.sx` change: extend
+`er-sched-step-alive!` to read `:pending-args` when present
+(was hardcoded to `(list)`), and rewrite
+`er-bif-http-listen`'s sx-handler to spawn the user handler
+as a real er-process with `:pending-args (list req-pl)`
+instead of `er-apply-fun handler` inline. Any `receive`
+inside a kernel-aware route now suspends + resumes inside
+the SX scheduler. Verified via the new
+`next/tests/smoke_kernel_route.sh` (6/6, single-instance
+`http_server:start(P, [{kernel, nx_kernel}])` serves
+welcome + `/actors/alice/outbox` with kernel-backed `tip:`
+etc.). The full Pattern A vs Pattern B analysis below is
+preserved for the audit trail. The original native
+`http-listen`
 primitive in `bin/sx_server.ml:735+` serialises handler calls
 with `Mutex.lock mtx` / `Mutex.unlock mtx` so the SX runtime
 isn't re-entered concurrently. The wrapped Erlang handler