Files
rose-ash/lib/erlang/bench_ring_results.md

87 lines
4.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Ring Benchmark Results
Generated by `lib/erlang/bench_ring.sh` against `sx_server.exe` on the
synchronous Erlang-on-SX scheduler.
| N (processes) | Hops | Wall-clock | Throughput |
|---|---|---|---|
| 10 | 10 | 907ms | 11 hops/s |
| 50 | 50 | 2107ms | 24 hops/s |
| 100 | 100 | 3827ms | 26 hops/s |
| 500 | 500 | 17004ms | 29 hops/s |
| 1000 | 1000 | 29832ms | 34 hops/s |
(Each `Nm` row spawns N processes connected in a ring and passes a
single token N hops total — i.e. the token completes one full lap.)
## Status of the 1M-process target
Phase 3's stretch goal in `plans/erlang-on-sx.md` is a million-process
ring benchmark. **That target is not met** in the current synchronous
scheduler; extrapolating from the table above, 1M hops would take
~30 000 s. Correctness is fine — the program runs at every measured
size — but throughput is bound by per-hop overhead.
Per-hop cost is dominated by:
- `er-env-copy` per fun clause attempt (whole-dict copy each time)
- `call/cc` capture + `raise`/`guard` unwind on every `receive`
- `er-q-delete-at!` rebuilds the mailbox backing list on every match
- `dict-set!`/`dict-has?` lookups in the global processes table
To reach 1M-process throughput in this architecture would need at
least: persistent (path-copying) envs, an inline scheduler that
doesn't call/cc on the common path (msg-already-in-mailbox), and a
linked-list mailbox. None of those are in scope for the Phase 3
checkbox — captured here as the floor we're starting from.
## Phase 9 status (2026-05-14)
Specialized opcodes 9b9f landed as **stub dispatchers** in
`lib/erlang/vm/dispatcher.sx`: `OP_PATTERN_TUPLE/LIST/BINARY`,
`OP_PERFORM/HANDLE`, `OP_RECEIVE_SCAN`, `OP_SPAWN/SEND`, and ten
`OP_BIF_*` hot dispatch entries. Each opcode's handler is a thin
wrapper over the existing `er-match-*` / `er-bif-*` / runtime impls,
so **the perf numbers above are unchanged** — same per-hop cost, same
scheduler. The stubs exist to nail down opcode IDs, operand contracts,
and tests against `er-match!` parity *before* 9a (the OCaml
opcode-extension mechanism in `hosts/ocaml/evaluator/`) lands.
When 9a integrates and the bytecode compiler can emit these opcodes
at hot call sites, the real speedup story (~3000× ring throughput,
~1000× spawn) starts. Until then this file documents the
pre-integration ceiling. 72 vm-suite tests guard the stub correctness;
full conformance is **709/709** with the stub infrastructure loaded.
## Phase 9g — post-integration bench (2026-05-15)
9a (vm-ext mechanism), 9h (`erlang_ext.ml` registering `erlang.OP_*`
ids 222-239), and 9i (SX dispatcher consulting `extension-opcode-id`)
are now integrated and built into `hosts/ocaml/_build/default/bin/sx_server.exe`.
Re-ran the ring ladder on that binary:
| N (processes) | Hops | Wall-clock | Throughput |
|---|---|---|---|
| 10 | 10 | 938ms | 11 hops/s |
| 100 | 100 | 2772ms | 36 hops/s |
| 500 | 500 | 14190ms | 35 hops/s |
| 1000 | 1000 | 31814ms | 31 hops/s |
**Numbers are unchanged from the pre-integration baseline** — and that
is the expected, correct result. The opcode handlers (both the SX stub
dispatcher and the OCaml `erlang_ext` module) wrap the existing
`er-match-*` / `er-bif-*` / scheduler implementations 1-to-1, and the
**bytecode compiler does not yet emit `erlang.OP_*` opcodes**, so every
hop still goes through the general CEK path exactly as before. The
unchanged numbers therefore double as a no-regression check: the full
extension wiring (cherry-picked vm-ext A-E + force-link + erlang_ext +
SX bridge) added zero per-hop cost. Conformance **715/715** on this
binary.
The ~3000×/~1000× targets remain gated on a **future phase (Phase 10 —
bytecode emission)**: teach `lib/compiler.sx` (or the Erlang
transpiler) to emit `erlang.OP_PATTERN_TUPLE` etc. at hot call sites,
then give `erlang_ext.ml` real register-machine handlers instead of the
current honest not-wired raise. That is a substantial standalone phase,
tracked in `plans/erlang-on-sx.md`. 9g's deliverable — *honest
measurement + recorded numbers on the integrated binary* — is complete.