Compare commits: `loops/maud`...`loops/fed-` (1 commit)
| Author | SHA1 | Date | |
|---|---|---|---|
| | bf8d0bf245 | | |
@@ -264,6 +264,25 @@ should leave `httpc`/`sqlite` BIFs blocked with that note.

_Newest first._

- 2026-06-07 — Investigated fed-sx-m2 Blockers #4 ("handler-mutex
  deadlock") per `plans/agent-briefings/fed-prims-mutex-fix.md`.
  **Outcome: not a mutex bug; no OCaml change — handed back to m2.**
  Reproduced deterministically (single kernel-route request fails with
  empty reply while `/` returns 200; also a 3-line minimal echo
  gen_server reproduces it). Root cause: native `http-listen` runs the
  handler on a fresh `Thread.create` outside the Erlang scheduler, so
  `gen_server:call` → `receive` (which `raise`s `er-suspend-marker`
  expecting an enclosing `er-sched-step-alive!` guard + `er-sched-run-all!`
  pump) can never complete. Pattern A is inapplicable (single-request
  failure ⇒ no contention; the mutex is required and must stay) and
  `Sx_runtime.sx_call` is fully synchronous; no OCaml symbol can reach
  the SX-level scheduler. Correct fix is Pattern B done purely in
  `er-bif-http-listen` (`lib/erlang/runtime.sx`): spawn the handler as an
  er-process and `er-sched-run-all!` to completion, returning the
  process's `:exit-result`. That file is m2 / `loops/erlang` scope, so
  this loop made no code change. Full diagnosis + a concrete patch
  sketch recorded under Blockers below. `bin/sx_server.ml` unchanged;
  builds untouched.
- 2026-05-26 — Phase J: `http-request` primitive in `bin/sx_server.ml`
  (NATIVE ONLY — `Unix.gethostbyname` + `Unix.connect`; HTTP/1.1 with
  inline `http://` URL parser; sends Connection: close + Host +
@@ -339,4 +358,73 @@ _Newest first._

## Blockers

- _(none yet)_
- 2026-06-07 — **fed-sx-m2 Blockers #4 (handler-mutex deadlock) is NOT a
  mutex bug — root cause is in the Erlang substrate, so the fix is m2
  scope, not OCaml.** Investigated per
  `plans/agent-briefings/fed-prims-mutex-fix.md`. Reproduced
  deterministically (m2 worktree
  binary + `next/kernel/*.erl`, port 51920): a **single** request — no
  concurrency, no prior request — to `/actors/alice/outbox` returns an
  empty reply (curl exit 52) while the non-kernel control route `/`
  returns 200 `fed-sx kernel m1`. Also reproduced with a 3-line minimal
  echo gen_server + a handler that does `gen_server:call(echo, ping)`
  (no kernel needed; boots in ~20s vs ~7min for the full kernel here).

  Diagnosis: native `http-listen` (`bin/sx_server.ml:743-840`) runs each
  connection's handler on a fresh `Thread.create` **outside any Erlang
  scheduler step**. The handler closure (`er-bif-http-listen`'s
  `sx-handler`, `lib/erlang/runtime.sx`) calls `er-apply-fun handler`
  directly, so when the route reaches `gen_server:call` →
  `receive` (`lib/erlang/transpile.sx:1132`), the `receive` captures a
  `call/cc` and `raise`s `er-suspend-marker` expecting an enclosing
  `er-sched-step-alive!` guard **and** a scheduler pump
  (`er-sched-run-all!`). On the native handler thread neither is on the
  stack: with no guard the suspend either propagates out (→ empty reply,
  minimal case) or is caught by an Erlang `try`/guard in the route and
  the request stalls (→ "hang" the m2 loop observed). The kernel
  gen_server can never be stepped because the only scheduler driver
  (the boot thread that ran `erlang-eval-ast`) is parked forever in the
  native `Unix.accept` loop.

  Why Pattern A (release/rescope the runtime mutex) does NOT apply: the
  failure reproduces on a **single request with zero contention**, so it
  is not a mutex-contention deadlock. Releasing the mutex cannot help and
  would be actively harmful — the mutex is *required* to serialise the
  shared single-threaded SX runtime / scheduler across handler threads.
  `Sx_runtime.sx_call` (`lib/sx_runtime.ml:102`) is fully synchronous
  (it just dispatches into the CEK evaluator), which is exactly the
  briefing's stated condition for falling back from Pattern A to
  Pattern B. There is also no OCaml-only fix: `grep` confirms nothing in
  `hosts/ocaml/{lib,bin}` references `er-sched*`/the Erlang scheduler —
  `er-sched-run-all!` is a pure-SX symbol in `lib/erlang/runtime.sx`, so
  OCaml cannot pump it. Running the handler synchronously on the accept
  thread (no `Thread.create`) does not help either: the `er-suspend-marker`
  `raise` would unwind the native `handle` frame that writes the HTTP
  response, losing the response across the suspension.

  Recommended fix (Pattern B, **m2 / `loops/erlang` scope — entirely in
  `er-bif-http-listen`, no OCaml change**): have `sx-handler` run the
  handler as a scheduled er-process and pump the scheduler to completion,
  e.g.

  ```
  (sx-handler
    (fn (req-dict)
      (let ((req-pl (er-request-dict-to-proplist req-dict)))
        (let ((pid (er-spawn-fun
                     (fn () (er-apply-fun handler (list req-pl))))))
          (er-sched-run-all!)                     ; drains: handler →
                                                  ;   kernel reply → handler
          (er-proplist-to-dict
            (er-proc-field pid :exit-result)))))) ; handler's return value
  ```

  This keeps every suspend/resume inside the SX scheduler; the native
  side only ever sees the final response dict. The existing native
  per-connection `Thread.create` + `Mutex` stay as-is and remain correct
  (they serialise the single pump across concurrent connections — the
  mutex must NOT be removed). Verified by reasoning through the full
  step trace (handler suspends on `receive` → kernel `handle_call`
  replies → handler resumes → dies with `:exit-result`); the m2 loop
  should implement + run `next/tests/http_server_tcp.sh` plus a
  kernel-route smoke. No OCaml or `bin/sx_server.ml` change was made or
  is needed.
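
The step trace described in the Blockers entry above (handler suspends on `receive`, the gen_server replies, the handler resumes and exits with a result) can be illustrated with a toy model. This is a sketch in Python, not SX or OCaml, and every name in it (`Scheduler`, `serve_direct`, `serve_scheduled`, `RECEIVE`) is hypothetical. A generator `yield` stands in for the `call/cc` + `er-suspend-marker` suspension, and `run_all` stands in for `er-sched-run-all!`. It only shows why a handler that is run outside the scheduler cannot finish a call/reply round trip, while the same handler spawned as a process and pumped to completion can.

```python
from collections import deque

RECEIVE = object()  # a process yields this to wait for its next message


class Scheduler:
    """Toy cooperative scheduler; a stand-in for the SX-level Erlang scheduler."""

    def __init__(self):
        self.procs = {}        # pid -> generator
        self.mail = {}         # pid -> queued messages
        self.waiting = set()   # pids suspended in receive
        self.ready = deque()   # (pid, value to resume with)
        self.exit_result = {}  # pid -> return value of a finished process
        self.next_pid = 0

    def spawn(self, gen_fn, *args):
        pid = self.next_pid
        self.next_pid += 1
        self.procs[pid] = gen_fn(self, pid, *args)
        self.mail[pid] = deque()
        self.ready.append((pid, None))
        return pid

    def send(self, pid, msg):
        self.mail[pid].append(msg)
        if pid in self.waiting:
            self.waiting.discard(pid)
            self.ready.append((pid, self.mail[pid].popleft()))

    def run_all(self):
        """Step processes until none is runnable (the 'pump')."""
        while self.ready:
            pid, value = self.ready.popleft()
            try:
                self.procs[pid].send(value)
            except StopIteration as stop:
                self.exit_result[pid] = stop.value
                continue
            # The process yielded RECEIVE: resume it later with a message.
            if self.mail[pid]:
                self.ready.append((pid, self.mail[pid].popleft()))
            else:
                self.waiting.add(pid)


def echo_server(sched, self_pid):
    while True:
        sender, msg = yield RECEIVE
        sched.send(sender, ("reply", msg))


def handler(sched, self_pid, echo_pid, request):
    sched.send(echo_pid, (self_pid, request))  # the "call" half
    _tag, value = yield RECEIVE                # suspend until the reply
    return {"status": 200, "body": value}


def serve_direct(request):
    """Broken shape: the handler runs outside the scheduler, nothing pumps."""
    sched = Scheduler()
    echo = sched.spawn(echo_server)
    gen = handler(sched, -1, echo, request)
    try:
        next(gen)  # runs up to the first receive, then stops
    except StopIteration as stop:
        return stop.value
    return None    # suspended with nobody to resume it: empty reply


def serve_scheduled(request):
    """Pattern B shape: spawn the handler, pump, read its exit result."""
    sched = Scheduler()
    echo = sched.spawn(echo_server)
    pid = sched.spawn(handler, echo, request)
    sched.run_all()
    return sched.exit_result.get(pid)
```

In this model `serve_direct("ping")` yields no response because the echo process is never stepped, whereas `serve_scheduled("ping")` returns the handler's result once the pump has drained. The real fix still has to be made in `er-bif-http-listen` as the entry describes; the model makes no claim about how the SX scheduler is implemented internally.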