Ring Benchmark Results

Generated by lib/erlang/bench_ring.sh against sx_server.exe on the synchronous Erlang-on-SX scheduler.

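| N (processes) | Hops | Wall-clock | Throughput |
|---------------|------|------------|------------|
| 10            | 10   | 907ms      | 11 hops/s  |
| 50            | 50   | 2107ms     | 24 hops/s  |
| 100           | 100  | 3827ms     | 26 hops/s  |
| 500           | 500  | 17004ms    | 29 hops/s  |
| 1000          | 1000 | 29832ms    | 34 hops/s  |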

(Each row spawns N processes connected in a ring and passes a single token N hops in total, i.e. the token completes one full lap.)
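
The throughput column is hops divided by wall-clock seconds, rounded to the nearest integer. A minimal Python sketch that recomputes it from the table rows (the function names are illustrative and not part of bench_ring.sh):

```python
# Rows copied from the table above: (processes, hops, wall-clock in ms).
ROWS = [
    (10, 10, 907),
    (50, 50, 2107),
    (100, 100, 3827),
    (500, 500, 17004),
    (1000, 1000, 29832),
]


def hops_per_second(hops: int, wall_ms: int) -> float:
    """Throughput as hops divided by wall-clock seconds."""
    return hops / (wall_ms / 1000.0)


def throughput_column(rows):
    """Return the rounded throughput for each row, in table order."""
    return [round(hops_per_second(hops, ms)) for _, hops, ms in rows]


if __name__ == "__main__":
    for (n, hops, ms), tput in zip(ROWS, throughput_column(ROWS)):
        print(f"N={n:5d}  hops={hops:5d}  {ms:6d}ms  {tput:3d} hops/s")
```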

Status of the 1M-process target

Phase 3's stretch goal in plans/erlang-on-sx.md is a million-process ring benchmark. That target is not met in the current synchronous scheduler; extrapolating from the table above (about 34 hops/s at N = 1000), 1M hops would take ~30 000 s, or roughly 8 hours. Correctness is fine (the program runs at every measured size), but throughput is bound by per-hop overhead.
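
The extrapolation is a straight-line projection from the N = 1000 row. The sketch below assumes the per-hop cost stays constant up to 1M hops, which the measurements do not establish; it also ignores any memory limits at that size.

```python
def extrapolate_seconds(target_hops: int, measured_hops: int, measured_ms: int) -> float:
    """Project wall-clock seconds for target_hops at the measured per-hop rate."""
    rate = measured_hops / (measured_ms / 1000.0)  # hops per second
    return target_hops / rate


# N = 1000 row from the table: 1000 hops in 29832 ms.
seconds = extrapolate_seconds(1_000_000, 1000, 29832)
hours = seconds / 3600.0

if __name__ == "__main__":
    print(f"{seconds:.0f} s, about {hours:.1f} h")
```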

Per-hop cost is dominated by:

  • er-env-copy per fun clause attempt (whole-dict copy each time)
  • call/cc capture + raise/guard unwind on every receive
  • er-q-delete-at! rebuilds the mailbox backing list on every match
  • dict-set!/dict-has? lookups in the global processes table

Reaching 1M-process throughput in this architecture would need at least: persistent (path-copying) envs, an inline scheduler that doesn't call/cc on the common path (msg-already-in-mailbox), and a linked-list mailbox. None of those are in scope for the Phase 3 checkbox; they are captured here as the floor we're starting from.
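
As an illustration of what a path-copying env buys, here is a minimal persistent binary search tree in Python. Adding a binding copies only the nodes on the path from the root to the insertion point and shares everything else, so a failed clause attempt can discard its env without the original ever having been copied. This is a sketch under simplifying assumptions: the tree is unbalanced, and all names are hypothetical rather than taken from the er-env-* code.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    key: str
    value: Any
    left: Optional["Node"]
    right: Optional["Node"]


def insert(node: Optional[Node], key: str, value: Any) -> Node:
    """Return a new tree with key bound; the input tree is left untouched."""
    if node is None:
        return Node(key, value, None, None)
    if key < node.key:
        return node._replace(left=insert(node.left, key, value))
    if key > node.key:
        return node._replace(right=insert(node.right, key, value))
    return node._replace(value=value)


def lookup(node: Optional[Node], key: str) -> Any:
    """Find the value bound to key, raising KeyError if it is unbound."""
    while node is not None:
        if key < node.key:
            node = node.left
        elif key > node.key:
            node = node.right
        else:
            return node.value
    raise KeyError(key)


# Base env with three bindings, then one clause attempt that binds "A".
base = insert(insert(insert(None, "M", 1), "C", 2), "X", 3)
attempt = insert(base, "A", 9)
```

After the insert, `attempt.right is base.right` holds: the untouched subtree is shared, not copied.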

Phase 9 status (2026-05-14)

Specialized opcodes 9b through 9f landed as stub dispatchers in lib/erlang/vm/dispatcher.sx: OP_PATTERN_TUPLE/LIST/BINARY, OP_PERFORM/HANDLE, OP_RECEIVE_SCAN, OP_SPAWN/SEND, and ten OP_BIF_* hot dispatch entries. Each opcode's handler is a thin wrapper over the existing er-match-* / er-bif-* / runtime impls, so the perf numbers above are unchanged: same per-hop cost, same scheduler. The stubs exist to nail down opcode IDs, operand contracts, and tests against er-match! parity before 9a (the OCaml opcode-extension mechanism in hosts/ocaml/evaluator/) lands.
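
The thin-wrapper idea and the parity tests can be sketched in Python as a dispatch table whose entries delegate straight to the existing matchers. The opcode ids, the matcher functions, and the wildcard convention below are all invented for illustration; none of them reflect the contents of dispatcher.sx.

```python
# Hypothetical opcode ids, chosen only for this sketch.
OP_PATTERN_TUPLE = 1
OP_PATTERN_LIST = 2

WILDCARD = object()  # pattern element that matches anything


def match_seq(pattern, value):
    """Stand-in matcher: equal length, each element equal or a wildcard."""
    return len(pattern) == len(value) and all(
        p is WILDCARD or p == v for p, v in zip(pattern, value)
    )


def match_tuple(pattern, value):
    return isinstance(value, tuple) and match_seq(pattern, value)


def match_list(pattern, value):
    return isinstance(value, list) and match_seq(pattern, value)


# Each handler only forwards its operands to the existing implementation.
HANDLERS = {
    OP_PATTERN_TUPLE: lambda pattern, value: match_tuple(pattern, value),
    OP_PATTERN_LIST: lambda pattern, value: match_list(pattern, value),
}


def dispatch(opcode, *operands):
    """Look up the handler for opcode and apply it to the operands."""
    handler = HANDLERS.get(opcode)
    if handler is None:
        raise ValueError(f"unknown opcode: {opcode}")
    return handler(*operands)


def parity_holds(opcode, direct, cases):
    """True if the opcode path agrees with the direct call on every case."""
    return all(dispatch(opcode, p, v) == direct(p, v) for p, v in cases)
```

Because each handler adds only one table lookup and one call, a parity suite of this shape pins down the operand contract without changing behaviour or performance.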

When 9a integrates and the bytecode compiler can emit these opcodes at hot call sites, the targeted speedups (~3000× ring throughput, ~1000× spawn) come into reach. Until then this file documents the pre-integration baseline. 72 vm-suite tests guard the stub correctness; full conformance is 709/709 with the stub infrastructure loaded.

Phase 9g — post-integration bench (2026-05-15)

9a (vm-ext mechanism), 9h (erlang_ext.ml registering erlang.OP_* ids 222-239), and 9i (SX dispatcher consulting extension-opcode-id) are now integrated and built into hosts/ocaml/_build/default/bin/sx_server.exe. Re-ran the ring ladder on that binary:

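| N (processes) | Hops | Wall-clock | Throughput |
|---------------|------|------------|------------|
| 10            | 10   | 938ms      | 11 hops/s  |
| 100           | 100  | 2772ms     | 36 hops/s  |
| 500           | 500  | 14190ms    | 35 hops/s  |
| 1000          | 1000 | 31814ms    | 31 hops/s  |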

Numbers are effectively unchanged from the pre-integration baseline: wall-clock differs by between +7% and -28% across the four shared sizes, with no consistent direction. That is the expected, correct result. The opcode handlers (both the SX stub dispatcher and the OCaml erlang_ext module) wrap the existing er-match-* / er-bif-* / scheduler implementations 1-to-1, and the bytecode compiler does not yet emit erlang.OP_* opcodes, so every hop still goes through the general CEK path exactly as before. The numbers therefore double as a no-regression check: the full extension wiring (cherry-picked vm-ext A-E + force-link + erlang_ext + SX bridge) added no measurable per-hop cost. Conformance 715/715 on this binary.
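
The comparison against the baseline can be recomputed directly from the two tables. A small Python sketch (the dictionary and function names are illustrative):

```python
# Wall-clock in ms, keyed by N, copied from the two tables in this file.
BASELINE_MS = {10: 907, 50: 2107, 100: 3827, 500: 17004, 1000: 29832}
INTEGRATED_MS = {10: 938, 100: 2772, 500: 14190, 1000: 31814}


def wall_clock_delta_pct(before, after):
    """Percent change in wall-clock for every N measured in both runs."""
    shared = sorted(before.keys() & after.keys())
    return {
        n: round(100.0 * (after[n] - before[n]) / before[n], 1)
        for n in shared
    }


deltas = wall_clock_delta_pct(BASELINE_MS, INTEGRATED_MS)

if __name__ == "__main__":
    for n, pct in deltas.items():
        print(f"N={n:5d}  {pct:+.1f}%")
```

The deltas go in both directions (slower at N = 10 and 1000, faster at N = 100 and 500). With a single run per size, that pattern does not indicate a systematic change in per-hop cost.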

The ~3000×/~1000× targets remain gated on a future phase (Phase 10, bytecode emission): teach lib/compiler.sx (or the Erlang transpiler) to emit erlang.OP_PATTERN_TUPLE etc. at hot call sites, then give erlang_ext.ml real register-machine handlers in place of the current explicit not-wired raise. That is a substantial standalone phase, tracked in plans/erlang-on-sx.md. 9g's deliverable (honest measurement plus recorded numbers on the integrated binary) is complete.