fed-sx-m2: resolve Blockers #4 — kernel routes now work over real HTTP
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m6s

Substrate fix: two-line change to lib/erlang/runtime.sx that lets
http-listen handler routes call gen_server:call without deadlocking.

  1. er-sched-step-alive!: pass :pending-args (when set) to the
     initial-fun call instead of always passing an empty list.
     Default behavior (no field) stays (list) — drop-in safe.

  2. er-bif-http-listen sx-handler: instead of calling er-apply-fun
     on the handler inline (which blows up on receive's
     er-suspend-marker because the connection thread has no
     scheduler step on its stack), create a real er-process with
     :initial-fun = handler and :pending-args = (list req-pl), then
     run er-sched-run-all! to drain the scheduler. Any receive
     (e.g. gen_server:call) suspends and resumes inside the SX
     scheduler frame the process owns. Read :exit-result for the
     response proplist and marshal it back to an SX dict.

Investigation arc (see plans/fed-sx-milestone-2.md Blockers #4 +
Progress log):
  - loops/fed-prims bf8d0bf2 diagnosed it as Erlang-substrate, not
    OCaml mutex (Pattern A wrong, Pattern B right but sketchy).
  - First Pattern B attempt failed: tried er-spawn-fun on a raw SX
    lambda, hit (er-fun? fv) gate. Connection-thread bisect
    pinpointed the exact line.
  - Real fix: use the existing er-fun (user's handler) directly,
    but feed it via :pending-args so step-alive's hardcoded
    (list) doesn't drop the request arg.

Acceptance:
  - new next/tests/smoke_kernel_route.sh: 6/6 over real HTTP via
    http_server:start(P, [{kernel, nx_kernel}]): welcome /,
    /actors/alice, /actors/alice/outbox with gen_server-backed
    tip, /actors/alice/inbox, and unknown-actor.
  - next/tests/http_server_tcp.sh: 5/5 (bumped wait_bound from
    30s to 180s — cold boot is slow under sibling-loop CPU load
    and the per-handler scheduler ramp adds a small margin).
  - Erlang conformance: 761/761.

Step 12's two-instance smoke test is now unblocked — its full
Follow / Accept / Note flow can layer on top of this kernel-route
surface. m2 plan updated.

Pre-existing httpc_request.sh flakiness ("Undefined symbol:
http-request" on the live-call epochs) reproduces WITHOUT this
change — see git stash A/B in the investigation. Unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 20:04:19 +00:00
parent 600d292ba2
commit 03c32cda5f
4 changed files with 183 additions and 14 deletions

@@ -851,14 +851,23 @@ re-broadcast another actor's content to their own followers.
## Step 12 — Two-instance smoke test
-**GATED on Blockers #4** (http-listen handler holds the SX runtime
-mutex, deadlocking any `gen_server:call` from inside a route — see
-Blockers section for verification + fix patterns). Without this,
-the only request shapes that survive over real HTTP are the static /
-capabilities / static-stub paths; every kernel-aware route hangs
-indefinitely. The smoke test framework is sketched out (see the
-withdrawn `smoke_federate.sh` in this loop's history at commit
-`8d33d02f`'s tree state) but cannot exit 0 until Blockers #4 lifts.
+**Blockers #4 RESOLVED 2026-06-07.** The substrate fix turned out
+to be a two-line change in `lib/erlang/runtime.sx`: extend
+`er-sched-step-alive!` to read `:pending-args` when present (was
+hardcoded to `(list)`), and have `er-bif-http-listen`'s sx-handler
+spawn the user handler as a real er-process with `:pending-args
+(list req-pl)` instead of calling it inline. With this in place
+any `receive` inside a kernel-aware route (e.g. `gen_server:call`)
+suspends and resumes correctly inside the SX scheduler instead of
+propagating out of the connection thread.
+Verified by `next/tests/smoke_kernel_route.sh` (6/6, single-instance):
+welcome `/`, `/actors/alice`, `/actors/alice/outbox` (gen_server-
+backed, with `tip:` from kernel state), `/actors/alice/inbox`,
+unknown-actor outbox — all serve over real HTTP through
+`http_server:start` with `Cfg = [{kernel, nx_kernel}]`. The
+full two-instance Follow / Accept / Note flow can layer on top
+of this surface.
**The proof point.** `next/tests/smoke_federate.sh` spins up two kernel
instances on distinct ports, walks them through the full federation
@@ -1087,7 +1096,21 @@ proceed.
4. **`http-listen` handler holds the SX runtime mutex →
`gen_server:call` from inside an HTTP route deadlocks.** —
-discovered during Step 12 prep. The native `http-listen`
+~~discovered during Step 12 prep~~ **RESOLVED 2026-06-07**
+by a two-line `lib/erlang/runtime.sx` change: extend
+`er-sched-step-alive!` to read `:pending-args` when present
+(was hardcoded to `(list)`), and rewrite
+`er-bif-http-listen`'s sx-handler to spawn the user handler
+as a real er-process with `:pending-args (list req-pl)`
+instead of `er-apply-fun handler` inline. Any `receive`
+inside a kernel-aware route now suspends + resumes inside
+the SX scheduler. Verified via the new
+`next/tests/smoke_kernel_route.sh` (6/6, single-instance
+`http_server:start(P, [{kernel, nx_kernel}])` serves
+welcome + `/actors/alice/outbox` with kernel-backed `tip:`
+etc.). The full Pattern A vs Pattern B analysis below is
+preserved for the audit trail. The original native
+`http-listen`
primitive in `bin/sx_server.ml:735+` serialises handler calls
with `Mutex.lock mtx` / `Mutex.unlock mtx` so the SX runtime
isn't re-entered concurrently. The wrapped Erlang handler