diff --git a/plans/fed-sx-milestone-2.md b/plans/fed-sx-milestone-2.md
index c8f79d6e..065f9f03 100644
--- a/plans/fed-sx-milestone-2.md
+++ b/plans/fed-sx-milestone-2.md
@@ -1112,28 +1112,54 @@ proceed.
  wrap the handler call in `er-spawn-fun` + `er-sched-run-all!`
  and read the process's `:exit-result`. m2 tried this patch on
  `lib/erlang/runtime.sx` and **it did not work**: the listener
- binds, the connection thread enters `sx-handler`, but the
- spawned process's response never reaches the wire — even the
- non-kernel welcome route returns `HTTP 000` (empty reply).
+ binds, but every kernel-aware request returns HTTP 000.
  Reproducer: spin up `http_server:start(P, [])` with the
  Pattern B `sx-handler`; `curl http://127.0.0.1:P/` returns 000.
 
- Why it fails (working hypothesis, m2 worktree): the
- `http_server:start` spawn itself ran inside the outer
- `erlang-eval-ast` scheduler pump and is **parked inside the
- native `Unix.accept` loop on the boot thread**; the global
- `er-sched-*` state still has that process in its queue. When
- the connection thread calls `er-sched-run-all!` from inside
- `sx-handler`, it re-enters the SAME global scheduler that
- the boot thread is already pumping (the boot thread's
- `er-sched-step!` of the http:listen process is blocked
- forever inside the native primitive). The connection thread
- spawns its handler process fine but `er-sched-run-all!`
- either races against the boot thread's parked pump or
- otherwise fails to drive the handler to completion before
- the native handler returns. Reverted on m2 — `lib/erlang/
- runtime.sx` stays at the Blockers #1 marshaller-bridge fix,
- which is correct.
+ **Concrete reason (verified by isolated tests in the
+ connection thread, m2 worktree):** `er-spawn-fun` raises
+ `"Erlang: spawn/1: not a fun"` when called with the
+ raw SX lambda `(fn () (er-apply-fun handler (list req-pl)))`
+ because it gates on `(not (er-fun? fv))` and `er-fun?`
+ checks for the `{:tag "fun"}` Erlang-AST shape, not a host
+ Lambda. The user-supplied `handler` IS an `er-fun` (built
+ by the user's `fun (Req) -> route(Req, Cfg) end` form), but
+ we need a 0-arity wrapper to feed it `req-pl` — and
+ `er-sched-step-alive!` hardcodes `(er-apply-fun
+ (er-proc-field pid :initial-fun) (list))`, so the
+ wrapper must be 0-arity.
+ Verified piece-by-piece from the connection thread:
+ `er-pid-new!` → ok, `er-proc-new!` → ok, but
+ `er-spawn-fun (fn () 42)` → empty reply (the `error` raise
+ propagates through `Sx_runtime.sx_call` and gets caught by
+ the native http-listen `(try ... with _ -> ())` at
+ `sx_server.ml:852` so the connection writes nothing and
+ closes).
+
+ To make Pattern B actually work in pure SX you need a way
+ to construct an `er-fun` programmatically from a raw SX
+ closure (so the wrapper-with-captured-req-pl can be
+ spawned). The existing `er-mk-fun` takes Erlang AST
+ clauses, not host closures — building one inline either
+ needs an AST-constructor helper or a small parser call.
+ This is a one-helper substrate addition, not a redesign,
+ but it does need to live in `lib/erlang/transpile.sx` or
+ `runtime.sx` and probably wants an additive test.
+
+ Also: even with that helper, the original "race against
+ the parked boot-thread pump" concern is unverified.
+ Solo-piece tests inside the connection thread showed the
+ global `er-sched-*` state IS accessible there
+ (`er-sched-process-count` returned 2 — the boot main +
+ the spawned http:listen process). Once an `er-fun`
+ wrapper exists, the spawn + drain should at least
+ smoke-execute; what happens next under live load is the
+ next unknown.
+
+ Reverted on m2 — `lib/erlang/runtime.sx` stays at the
+ Blockers #1 marshaller-bridge fix, which is correct for
+ the non-kernel surface (welcome / capabilities / 404 /
+ 401 over real HTTP).
 
  The real fix likely needs ONE of:
  - Native http-listen registers the listener and returns
@@ -1170,36 +1196,54 @@ proceed.
 
 Newest first.
 
+- **2026-06-07** — Re-investigated Pattern B with proper
+ instrumentation; **concrete failure root cause identified**.
+ Built each step of the spawn pipeline as its own minimal
+ `sx-handler` (hardcoded reply dict) and curled it:
+ hardcoded dict → 200 ✓, `er-sched-process-count` →
+ `procs=2` ✓ (boot main + http:listen process; global
+ scheduler IS accessible from the connection thread),
+ `er-pid-new!` → 204 ✓, `er-proc-new!` → 205 ✓ — all the
+ way up to `er-spawn-fun (fn () 42)` → HTTP 000. The break
+ is `er-spawn-fun`'s `(not (er-fun? fv))` gate raising
+ `"Erlang: spawn/1: not a fun"` because the raw SX lambda
+ isn't an Erlang-fun-shaped `{:tag "fun"}` dict. The
+ `error` raise propagates through `Sx_runtime.sx_call` and
+ is swallowed by the native http-listen
+ `(try ... with _ -> ())` at `sx_server.ml:852`; connection
+ writes nothing and closes.
+
+ Was previously waving at "race against parked boot-thread
+ pump" as the hypothesis — that part wasn't reproduced.
+ The global scheduler IS shared and the connection thread
+ reads it fine; the breakage is the strict `er-fun?` shape
+ check, not concurrency.
+
+ Path forward for Pattern B (still substrate scope): need a
+ way to construct an `er-fun` from a host SX closure so the
+ 0-arity wrapper-with-captured-req-pl can be fed to
+ `er-spawn-fun`. Either a new `er-mk-host-fun` helper in
+ `lib/erlang/runtime.sx`, or a small AST-constructor in
+ `transpile.sx`. One-helper substrate addition, not a
+ redesign. Blockers #4 updated; once that helper lands the
+ spawn + drain should at least smoke-execute (whatever
+ concurrency issue surfaces next is the next unknown).
+ Reverted runtime.sx to the Blockers #1 marshaller-bridge
+ fix.
+
 - **2026-06-07** — Tried `loops/fed-prims` `bf8d0bf2`'s Pattern B
  patch sketch on `lib/erlang/runtime.sx`'s `er-bif-http-listen`:
  wrap the handler call in `er-spawn-fun` + `er-sched-run-all!` and
  read the spawned process's `:exit-result`. **It did not work** —
  listener binds, but even the non-kernel welcome route now
  returns HTTP 000 (the spawned handler's response never
- reaches the wire). The simple `sx-handler` (direct
- `er-apply-fun handler`) is preserved on m2 because it at least
- serves welcome / capabilities / 404 / 401 correctly when no
- kernel routes are touched. Reverted; runtime.sx stays at the
- Blockers #1 marshaller-bridge fix.
-
- Working hypothesis for why Pattern B fails on m2's
- reproducer: the `http_server:start` spawn is itself parked
- inside the native `Unix.accept` loop on the boot thread; the
- global `er-sched-*` state still has that process in its
- queue. When the connection thread (under the per-instance
- native mutex) calls `er-sched-run-all!`, it re-enters the
- SAME global scheduler — the boot thread's `er-sched-step!`
- of the http:listen process is blocked forever inside the
- native primitive, so the connection-thread pump either
- races against that parked frame or otherwise fails to drive
- the new handler process to completion before the connection
- thread returns from `sx-handler`. The fed-prims diagnosis
- was correct that the bug is Erlang-substrate scope and that
- Pattern A (the mutex) doesn't apply, but the Pattern B
- sketch assumed a fresh / private scheduler context that
- doesn't exist in the current substrate. Blockers #4
- updated to capture this + sketch the three substrate fixes
- that would actually work; loop pacing dialled back down.
+ reaches the wire). Reverted; runtime.sx stays at the
+ Blockers #1 marshaller-bridge fix. Initially hypothesised the
+ failure was a scheduler-re-entry race (parked Unix.accept
+ pump on the boot thread vs. connection-thread pump); the
+ follow-up tick above narrowed the root cause to the
+ `er-fun?` shape gate — see that entry for the verified
+ diagnosis.
 
 - **2026-06-07** — Step 12 prep discovered Blockers #4
  (http-listen handler holds the SX runtime mutex; any