# Ring Benchmark Results

Generated by `lib/erlang/bench_ring.sh` against `sx_server.exe` on the
synchronous Erlang-on-SX scheduler.

| N (processes) | Hops | Wall-clock | Throughput |
|---|---|---|---|
| 10 | 10 | 907ms | 11 hops/s |
| 50 | 50 | 2107ms | 24 hops/s |
| 100 | 100 | 3827ms | 26 hops/s |
| 500 | 500 | 17004ms | 29 hops/s |
| 1000 | 1000 | 29832ms | 34 hops/s |

(Each row spawns N processes connected in a ring and passes a
single token N hops in total, i.e. the token completes one full lap.)

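The Throughput column can be reproduced from the other two, assuming it is simply hops divided by wall-clock seconds and rounded to the nearest integer (the source does not state the formula, so treat that as an assumption). A minimal sketch, with the rows copied from the table above and hypothetical helper names:

```python
# Rows copied from the table above: (N processes, hops, wall-clock in ms).
ROWS = [
    (10, 10, 907),
    (50, 50, 2107),
    (100, 100, 3827),
    (500, 500, 17004),
    (1000, 1000, 29832),
]


def hops_per_second(hops: int, wall_ms: int) -> float:
    """Throughput, assumed to be hops divided by elapsed seconds."""
    return hops / (wall_ms / 1000.0)


def ms_per_hop(hops: int, wall_ms: int) -> float:
    """Average wall-clock time per hop, including whatever the run measures."""
    return wall_ms / hops


if __name__ == "__main__":
    for n, hops, wall_ms in ROWS:
        print(
            f"N={n:>4}  {hops_per_second(hops, wall_ms):5.1f} hops/s  "
            f"{ms_per_hop(hops, wall_ms):5.1f} ms/hop"
        )
```

Read this way, the per-hop figure falls from about 91 ms at N = 10 to about 30 ms at N = 1000, which is why the hops/s column rises with N.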
## Status of the 1M-process target

Phase 3's stretch goal in `plans/erlang-on-sx.md` is a million-process
ring benchmark. **That target is not met** in the current synchronous
scheduler; extrapolating from the best rate in the table above
(34 hops/s at N = 1000), 1M hops would take ~30 000 s, roughly eight
hours. Correctness is fine (the program runs at every measured
size), but throughput is bound by per-hop overhead.

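The extrapolation is plain division. The sketch below assumes throughput stays flat all the way to 1M hops, which the measurements neither confirm nor rule out, and uses the best and worst rates from the table as bounds. The function name is hypothetical.

```python
TARGET_HOPS = 1_000_000
BEST_RATE = 34.0   # hops/s, from the N = 1000 row
WORST_RATE = 11.0  # hops/s, from the N = 10 row


def seconds_for(hops: int, hops_per_s: float) -> float:
    """Linear extrapolation: assumes a constant per-hop cost at any scale."""
    return hops / hops_per_s


best_s = seconds_for(TARGET_HOPS, BEST_RATE)
worst_s = seconds_for(TARGET_HOPS, WORST_RATE)

if __name__ == "__main__":
    print(f"at {BEST_RATE:.0f} hops/s: {best_s:,.0f} s ({best_s / 3600:.1f} h)")
    print(f"at {WORST_RATE:.0f} hops/s: {worst_s:,.0f} s ({worst_s / 3600:.1f} h)")
```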
Per-hop cost is dominated by:

- `er-env-copy` per fun clause attempt (whole-dict copy each time)
- `call/cc` capture + `raise`/`guard` unwind on every `receive`
- `er-q-delete-at!` rebuilds the mailbox backing list on every match
- `dict-set!`/`dict-has?` lookups in the global processes table

Reaching 1M-process throughput in this architecture would need at
least: persistent (path-copying) envs, an inline scheduler that
doesn't call/cc on the common path (msg-already-in-mailbox), and a
linked-list mailbox. None of those is in scope for the Phase 3
checkbox; the numbers above are captured here as the floor we're
starting from.

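To make two of those costs concrete, here is a toy Python model. It is not the SX runtime, and every name in it is hypothetical. The first half contrasts a whole-dict copy per clause attempt with a layered scope that shares the parent env; the layered scope is a simpler stand-in for path-copying persistent maps, not an implementation of them. The second half shows a singly linked mailbox where removing a matched message relinks one pointer instead of rebuilding a backing list.

```python
from collections import ChainMap


def try_clause_copying(env: dict, bindings: dict) -> dict:
    """Whole-dict copy per attempt: cost grows with the size of env."""
    scratch = dict(env)  # copies every existing binding
    scratch.update(bindings)
    return scratch


def try_clause_layered(env, bindings: dict) -> ChainMap:
    """Layered scope: cost grows with len(bindings); env is shared, not copied."""
    return ChainMap(dict(bindings), env)


NO_MATCH = object()  # sentinel, so that None remains a legal message


class _Node:
    __slots__ = ("msg", "next")

    def __init__(self, msg):
        self.msg = msg
        self.next = None


class LinkedMailbox:
    """FIFO mailbox with selective receive over a singly linked list."""

    def __init__(self):
        self.head = None
        self.tail = None

    def push(self, msg) -> None:
        node = _Node(msg)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def take_first(self, pred):
        """Unlink and return the oldest message satisfying pred, else NO_MATCH."""
        prev, cur = None, self.head
        while cur is not None:
            if pred(cur.msg):
                if prev is None:
                    self.head = cur.next
                else:
                    prev.next = cur.next
                if cur is self.tail:
                    self.tail = prev
                return cur.msg
            prev, cur = cur, cur.next
        return NO_MATCH
```

In this model a failed clause attempt just drops the child scope, and a successful `receive` leaves the rest of the mailbox untouched. Whether the same shapes fit the SX data structures is an open question that this sketch does not answer.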
## Phase 9 status (2026-05-14)

Specialized opcodes 9b–9f landed as **stub dispatchers** in
`lib/erlang/vm/dispatcher.sx`: `OP_PATTERN_TUPLE/LIST/BINARY`,
`OP_PERFORM/HANDLE`, `OP_RECEIVE_SCAN`, `OP_SPAWN/SEND`, and ten
`OP_BIF_*` hot dispatch entries. Each opcode's handler is a thin
wrapper over the existing `er-match-*` / `er-bif-*` / runtime impls,
so **the perf numbers above are unchanged**: same per-hop cost, same
scheduler. The stubs exist to nail down opcode IDs, operand contracts,
and tests against `er-match!` parity *before* 9a (the OCaml
opcode-extension mechanism in `hosts/ocaml/evaluator/`) lands.

When 9a integrates and the bytecode compiler can emit these opcodes
at hot call sites, the real speedup story (~3000× ring throughput,
~1000× spawn) starts. Until then this file documents the
pre-integration baseline. 72 vm-suite tests guard stub correctness;
full conformance is **709/709** with the stub infrastructure loaded.

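The stub-dispatcher idea can be illustrated with a small Python model. Everything here is hypothetical: the opcode id is a placeholder, `match_tuple` stands in for an existing general matcher and is not `er-match-*`, and the operand layout is invented for the example. The point is only that a handler which forwards to the existing implementation gives identical results at essentially the same cost, which is what a parity test checks.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

Bindings = Dict[str, Any]


@dataclass(frozen=True)
class Var:
    """A pattern variable (hypothetical representation)."""
    name: str


def match_tuple(pattern: Tuple[Any, ...], value: Any, env: Bindings) -> Optional[Bindings]:
    """Stand-in for an existing matcher: returns new bindings or None."""
    if not isinstance(value, tuple) or len(value) != len(pattern):
        return None
    out = dict(env)
    for p, v in zip(pattern, value):
        if isinstance(p, Var):
            if p.name in out and out[p.name] != v:
                return None  # variable already bound to a different value
            out[p.name] = v
        elif p != v:
            return None  # literal mismatch
    return out


OP_PATTERN_TUPLE = 1  # placeholder id, not the real opcode number


def op_pattern_tuple(operands: Tuple[Any, Any], env: Bindings) -> Optional[Bindings]:
    """Stub handler: unpacks operands and forwards to the existing matcher."""
    pattern, value = operands
    return match_tuple(pattern, value, env)


DISPATCH: Dict[int, Callable[[Any, Bindings], Any]] = {
    OP_PATTERN_TUPLE: op_pattern_tuple,
}


def dispatch(opcode: int, operands: Any, env: Bindings) -> Any:
    handler = DISPATCH.get(opcode)
    if handler is None:
        raise KeyError(f"unknown opcode {opcode}")
    return handler(operands, env)
```

A parity test in this model asserts that `dispatch(OP_PATTERN_TUPLE, (pattern, value), env)` equals `match_tuple(pattern, value, env)` for every case, which pins down the operand contract without changing any behaviour.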
## Phase 9g — post-integration bench (2026-05-15)

9a (vm-ext mechanism), 9h (`erlang_ext.ml` registering `erlang.OP_*`
ids 222-239), and 9i (SX dispatcher consulting `extension-opcode-id`)
are now integrated and built into `hosts/ocaml/_build/default/bin/sx_server.exe`.
Re-ran the ring ladder on that binary:

| N (processes) | Hops | Wall-clock | Throughput |
|---|---|---|---|
| 10 | 10 | 938ms | 11 hops/s |
| 100 | 100 | 2772ms | 36 hops/s |
| 500 | 500 | 14190ms | 35 hops/s |
| 1000 | 1000 | 31814ms | 31 hops/s |

**Throughput is in the same range as the pre-integration baseline**
(11–36 hops/s versus 11–34 hops/s), and that is the expected, correct
result. The opcode handlers (both the SX stub
dispatcher and the OCaml `erlang_ext` module) wrap the existing
`er-match-*` / `er-bif-*` / scheduler implementations 1-to-1, and the
**bytecode compiler does not yet emit `erlang.OP_*` opcodes**, so every
hop still goes through the general CEK path exactly as before. The
comparable numbers therefore double as a no-regression check: the full
extension wiring (cherry-picked vm-ext A-E + force-link + erlang_ext +
SX bridge) added no discernible per-hop cost. Conformance **715/715** on this
binary.

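The no-regression reading can be checked row by row. The sketch below compares the two tables for the sizes they share (the post-integration run has no N = 50 row). Each figure is a single run and the source reports no variance, so the per-row differences should not be over-interpreted. The helper names are hypothetical.

```python
# Wall-clock in ms, copied from the two tables in this file.
BASELINE_MS = {10: 907, 50: 2107, 100: 3827, 500: 17004, 1000: 29832}
POST_MS = {10: 938, 100: 2772, 500: 14190, 1000: 31814}


def relative_change(before_ms: int, after_ms: int) -> float:
    """Signed fractional change in wall-clock; negative means faster."""
    return (after_ms - before_ms) / before_ms


def speedup(before_ms: int, after_ms: int) -> float:
    """Ratio before/after; values above 1.0 mean the later run was faster."""
    return before_ms / after_ms


COMMON_SIZES = sorted(set(BASELINE_MS) & set(POST_MS))
MAX_SPEEDUP = max(speedup(BASELINE_MS[n], POST_MS[n]) for n in COMMON_SIZES)

if __name__ == "__main__":
    for n in COMMON_SIZES:
        change = relative_change(BASELINE_MS[n], POST_MS[n])
        print(f"N={n:>4}  {change:+.1%}")
    print(f"largest speedup: {MAX_SPEEDUP:.2f}x (target is ~3000x)")
```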
The ~3000×/~1000× targets remain gated on a **future phase (Phase 10:
bytecode emission)**: teach `lib/compiler.sx` (or the Erlang
transpiler) to emit `erlang.OP_PATTERN_TUPLE` etc. at hot call sites,
then give `erlang_ext.ml` real register-machine handlers instead of the
current honest not-wired raise. That is a substantial standalone phase,
tracked in `plans/erlang-on-sx.md`. 9g's deliverable (*honest
measurement + recorded numbers on the integrated binary*) is complete.