fed-sx-m2: draft milestone-2 plan — multi-actor + federation (12 steps, two-instance smoke test)
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 19s

2026-06-06 08:26:45 +00:00
parent 78eae9ef12
commit 7ea9d04564

652
plans/fed-sx-milestone-2.md Normal file

@@ -0,0 +1,652 @@
# fed-sx Milestone 2 — Multi-actor + Federation
Real federation between two fed-sx instances. Per-actor state, signed
inbox delivery, Follow lifecycle, audience-resolving outbound queue, and
the rich verbs (Note, Announce, Endorse) needed for federated propagation.
Reference: `plans/fed-sx-design.md` (especially §9 identity, §13 federation,
§16 HTTP endpoints). Builds on Milestone 1 (see `plans/fed-sx-milestone-1.md`).
## Goal
Two cooperating fed-sx instances `A` and `B`, each hosting one or more
actors, can:
1. **Discover** each other's actors via webfinger + actor docs.
2. **Follow** across instances (`Follow` → `Accept` → state).
3. **Publish** a `Note` on `B` and have it land in every follower's
`actor-state` projection on `A` via signed inbox delivery.
4. **Announce** a peer's activity, propagating it to followers of the
announcer.
5. **Rotate keys** on either side without breaking historical sig
verification (per §9.6).
Acceptance: the Step 12 smoke test (`smoke_federate.sh`) drives all of the
above against two locally-running kernel instances on distinct ports, no
human-in-the-loop, and exits 0.
## Non-goals (what milestone 2 deliberately does NOT do)
- **Real WAN federation.** Both instances run on `localhost:PortA` and
`localhost:PortB`. Cross-instance HTTP is unencrypted plaintext.
TLS, NAT traversal, and signed HTTP-message headers (per RFC 9421)
are v3.
- **ActivityPub Mastodon interop.** No HTTP-signatures-2018 compat layer,
no Linked-Data-Signatures, no JSON-LD canonicalisation. Cross-fed-sx
only.
- **IPFS / S3 storage backends.** Still local files only.
- **Browser client + operator dashboard.** Curl-shaped API only.
- **Capability tokens / delegation.** Multi-actor means multi-user, not
multi-device for a single actor. Capability tokens (per §9.5) defer.
- **Cross-host conformance.** Only the OCaml/Erlang-on-SX host runs
fed-sx in v2.
- **Performance work.** Functional correctness first.
- **Spam/abuse infrastructure.** Per §13.6 the layers are designed; v2
implements signature verification + replay defense; reputation,
rate-limiting, instance allowlists / blocklists are v3.
- **Operator quarantine UX.** Logs only.
## Architecture summary
```
               Instance A                         Instance B
              (port 9999)                        (port 9998)
 Outbox   ┌──────────────────┐               ┌──────────────────┐
────────▶ │  HTTP server     │               │  HTTP server     │
          │  POST /activity  │               │  POST /activity  │
          │  POST /inbox     │               │  POST /inbox     │
          │  GET /actors/..  │               │  GET /actors/..  │
          │  GET /.well-     │               │  GET /.well-     │
          │      known/*     │               │      known/*     │
          └────────┬─────────┘               └────────┬─────────┘
                   │                                  │
          ┌────────▼─────────┐               ┌────────▼─────────┐
          │  nx_kernel       │  ◀  HTTP  ▶   │  nx_kernel       │
          │  multi-actor     │  deliveries   │  multi-actor     │
          │  bucket map      │   (signed)    │  bucket map      │
          │  ActorA -> {…}   │               │  ActorB -> {…}   │
          │  ActorC -> {…}   │               │                  │
          └────────┬─────────┘               └────────┬─────────┘
                   │                                  │
          ┌────────▼─────────┐               ┌────────▼─────────┐
          │  Delivery queue  │               │  Delivery queue  │
          │  (one worker per │               │  (one worker per │
          │   peer instance) │               │   peer instance) │
          └──────────────────┘               └──────────────────┘
                   │ HTTP POST /inbox to peer
            (peer instance)
```
The federation transport is plain HTTP POST of canonical-bytes-signed
activities to each follower's actor inbox. Delivery is push (§13.1); pull
+ relay deferred to v3.
## Build order
Twelve steps in dependency order.
| Step | Title | Depends on |
|------|----------------------------------------------------|-----------------------|
| **1** | Per-actor state buckets in nx_kernel | M1 closeout |
| **2** | Actor lifecycle activities (Person/Service/Group) | Step 1 |
| **3** | Key rotation via Update + actor-state projection | Step 2, M1 §9.6 |
| **4** | Multi-actor HTTP routing (per-actor outbox/inbox) | Step 1, M1 8b-start |
| **5** | POST /inbox: peer signature verify + ingestion | Steps 3, 4 |
| **6** | Follow lifecycle (Follow / Accept / Reject / Undo) | Step 5 |
| **7** | Audience-resolving delivery set computation | Step 6 |
| **8** | Outbound delivery queue + retry/backoff | Step 7 |
| **9** | Backfill modes on Follow accept | Steps 6, 8 |
| **10** | Discovery: webfinger + actor doc fetch | Step 4 |
| **11** | Rich verbs as runtime artifacts (Note, Announce, Endorse) | Step 8 |
| **12** | Two-instance smoke test (`smoke_federate.sh`) | Steps 1-11 |
Steps 1-3 are the multi-actor foundation. Steps 4-10 are the federation
core. Steps 11-12 close the proof points.
---
## Step 1 — Per-actor state buckets
Today `nx_kernel` holds one actor's state at the top of its property list.
Make it bucketed by ActorId so a single kernel can host any number of
actors.
**Deliverables:**
```erlang
%% nx_kernel state shape becomes:
%% [{actors, [{ActorId, ActorBucket}, ...]},
%% {next_actor_seq, NextN}]
%%
%% ActorBucket = [{key_spec, KS}, {actor_state, AS},
%% {log, LogState}, {projections, [Name]},
%% {next_published, NextSeq}]
-export([new/0, add_actor/4, has_actor/2,
publish/2, publish/3, %% /2 = first actor only
actor_log_tip/2, actor_state/2, ...]).
new() -> [{actors, []}, {next_actor_seq, 1}].
add_actor(ActorId, KeySpec, AS, State) -> {ok, NewState}.
publish(ActorId, Request, State) -> ... %% per-actor
```
`bootstrap:start/3` continues to work — it adds one actor named `alice`
to a fresh kernel — preserving every M1 test that uses the
single-actor entry point.
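As a rough illustration of the bucketed state shape above, the following sketch implements `new/0`, `add_actor/4` and `has_actor/2` over plain property lists. It is not the planned `nx_kernel` code: the module name `nx_kernel_sketch`, the `{error, already_exists}` return, the empty-list placeholder for `LogState`, and the helper `bump_published/2` are all assumptions made for this example, and it has only been reasoned about against standard Erlang/OTP, not the Erlang-on-SX port.

```erlang
-module(nx_kernel_sketch).
-export([new/0, add_actor/4, has_actor/2, bump_published/2]).

%% Fresh kernel state: no actors yet.
new() -> [{actors, []}, {next_actor_seq, 1}].

has_actor(ActorId, State) ->
    {actors, Actors} = lists:keyfind(actors, 1, State),
    lists:keymember(ActorId, 1, Actors).

%% Adds an empty bucket for ActorId. Rejecting duplicates is an assumption.
add_actor(ActorId, KeySpec, AS, State) ->
    case has_actor(ActorId, State) of
        true ->
            {error, already_exists};
        false ->
            {actors, Actors} = lists:keyfind(actors, 1, State),
            {next_actor_seq, N} = lists:keyfind(next_actor_seq, 1, State),
            Bucket = [{key_spec, KeySpec}, {actor_state, AS},
                      {log, []},            %% placeholder for LogState
                      {projections, []}, {next_published, 1}],
            {ok, [{actors, Actors ++ [{ActorId, Bucket}]},
                  {next_actor_seq, N + 1}]}
    end.

%% Hypothetical helper: hands out the named actor's next sequence number
%% and advances only that actor's bucket.
bump_published(ActorId, State) ->
    {actors, Actors} = lists:keyfind(actors, 1, State),
    case lists:keyfind(ActorId, 1, Actors) of
        false ->
            {error, unknown_actor};
        {ActorId, Bucket} ->
            {next_published, Seq} = lists:keyfind(next_published, 1, Bucket),
            NewBucket = lists:keystore(next_published, 1, Bucket,
                                       {next_published, Seq + 1}),
            NewActors = lists:keystore(ActorId, 1, Actors,
                                       {ActorId, NewBucket}),
            {ok, Seq, lists:keystore(actors, 1, State, {actors, NewActors})}
    end.
```

Because each bucket carries its own `next_published`, bumping `alice` twice and then `bob` once yields sequence numbers 1, 2 for `alice` and 1 for `bob`, which is the independence property the tests below ask for.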
**Tests:**
- New kernel has no actors.
- add_actor + has_actor round-trip.
- Two actors maintain independent logs + sequences.
- publish/3 advances only the named actor's bucket.
- Concurrent gen_server-mediated publishes for different actors don't
serialise.
**Acceptance:** `bash next/tests/nx_kernel_multi.sh` passes 12+ cases.
---
## Step 2 — Actor lifecycle activities
Per design §9.1, an actor is a Person, Service, or Group object,
created by `Create{Person{...}}`. The kernel needs to fold this into
an actor-state projection that downstream code can read for keys,
publicKey rotation history, profile fields, follower counts, etc.
**Deliverables:**
- Genesis additions: `DefineObject{Person}` / `DefineObject{Service}` /
`DefineObject{Group}` — three object-type SX files.
- Actor-state projection fold (Erlang-fun stand-in, mirrors Step 5d-pure):
- On `Create{Person|Service|Group}`: register the actor's profile.
- On `Update{Person, patch}`: apply patch.
- On `Move`: record `:movedTo` pointer.
- `nx_kernel:bootstrap_actor/4(ActorId, Profile, KeySpec, State)`
publishes `Create{Person{...}}` as the actor's first activity,
bootstrapping their own log.
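The fold described above could look roughly like the sketch below. The activity representation (maps with `type`, `object`, `patch` and `target` keys) and the `kind`, `profile` and `moved_to` fields of the resulting state are assumptions chosen for the example; the design document's actual envelope shape may differ.

```erlang
-module(actor_state_fold).
-export([fold/2]).

%% Create{Person|Service|Group}: register the profile.
fold(#{type := create, object := #{type := T} = Obj}, AS)
  when T =:= person; T =:= service; T =:= group ->
    AS#{kind => T, profile => maps:remove(type, Obj)};
%% Update{..., patch}: shallow-merge the patch into the existing profile.
fold(#{type := update, patch := Patch}, #{profile := P} = AS)
  when is_map(Patch) ->
    AS#{profile => maps:merge(P, Patch)};
%% Move: record where the actor went.
fold(#{type := move, target := Target}, AS) ->
    AS#{moved_to => Target};
%% Anything else leaves actor-state untouched.
fold(_Other, AS) ->
    AS.
```

The function has the `(Element, Accumulator)` argument order that `lists:foldl/3` expects, so replaying a log is `lists:foldl(fun actor_state_fold:fold/2, #{}, Activities)`.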
**Tests:**
- `Create{Person}` registers the actor.
- Two actors created via lifecycle activities have independent state.
- Profile updates apply.
**Acceptance:** `bash next/tests/actor_lifecycle.sh` passes 10+ cases.
---
## Step 3 — Key rotation via Update + actor-state
Per §9.2: rotation is itself an activity. The actor-state projection
keeps the full key history (with `created` / `superseded_at`) so
`envelope:verify_signature/2` continues to find historical keys when
verifying activities published before the rotation.
**Deliverables:**
- Update fold extension: `Update{Person, patch: {add_publicKey: K, supersede: {OldId, NewId}}}`.
- A `key-history` view on actor-state.
- `envelope:verify_signature/2` already does time-aware lookup (M1
§Step 2c); confirm it works against the projection-driven actor-state.
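To make the time-aware lookup concrete, here is a minimal sketch of a key history with `created` / `superseded_at` stamps. The module name `key_history`, the `rotate/4` and `key_at/2` functions, the use of integer timestamps, and `undefined` as the marker for "still active" are assumptions for illustration; this is not how `envelope:verify_signature/2` is known to work internally.

```erlang
-module(key_history).
-export([rotate/4, key_at/2]).

%% History is a list of maps, oldest first:
%%   #{id, key, created, superseded_at}   (superseded_at = undefined if active)

%% Close every currently active key at Now and append the new one.
rotate(NewId, NewKey, Now, History) ->
    Closed = [case K of
                  #{superseded_at := undefined} -> K#{superseded_at := Now};
                  _ -> K
              end || K <- History],
    Closed ++ [#{id => NewId, key => NewKey,
                 created => Now, superseded_at => undefined}].

%% Find the key that was active at time T (created =< T < superseded_at).
key_at(T, History) ->
    Live = [K || #{created := C, superseded_at := S} = K <- History,
                 C =< T, (S =:= undefined orelse T < S)],
    case Live of
        [Key | _] -> {ok, Key};
        []        -> error
    end.
```

With keys rotated at times 100 and 200, an activity published at 150 resolves to the first key and one published at 200 resolves to the second, which matches the pre-rotation and post-rotation test cases.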
**Tests:**
- Rotation publishes a new key; old key marked superseded.
- Pre-rotation activities verify against the old key.
- Post-rotation activities verify against the new key.
- A rotation activity must itself be signed by an active key with
appropriate purpose (`sign-activity` or `rotate-key`).
**Acceptance:** `bash next/tests/key_rotation.sh` passes 12+ cases.
---
## Step 4 — Multi-actor HTTP routing
Per-actor URLs per design §16.1:
```
GET /actors/<id> # actor doc
GET /actors/<id>/outbox # OrderedCollection
GET /actors/<id>/outbox?page=N # page
POST /actors/<id>/inbox # peer delivery to this actor
GET /actors/<id>/followers # follower list
GET /actors/<id>/following # following list
POST /activity # authenticated publisher API (existing)
```
`POST /activity` still picks the publishing actor from the bearer
token; the token now maps to an `:actor_id` rather than a fixed `alice`.
**Deliverables:**
- New route prefixes: `/actors/<id>/inbox`, `/actors/<id>/followers`,
`/actors/<id>/following`.
- `http_server:route/3` (Cfg → Cfg+Kernel) so handlers can look up
actor state.
- Cfg's `:publish_token` becomes `:tokens => #{Token => ActorId}` map.
- `cid_response_for/2` already format-aware; per-actor outbox listing
uses the same machinery.
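A sketch of the two lookups this step introduces: mapping a bearer token to an actor through the `:tokens` map, and dispatching the per-actor URL patterns. The module name `route_sketch`, the result atoms, and the choice of binaries for headers and paths are assumptions; the real `http_server:route/3` signature takes different arguments, and query strings are not handled here.

```erlang
-module(route_sketch).
-export([actor_for_token/2, route/2]).

%% Resolve "Authorization: Bearer <token>" against Cfg's tokens map.
actor_for_token(<<"Bearer ", Token/binary>>, #{tokens := Tokens}) ->
    case maps:find(Token, Tokens) of
        {ok, ActorId} -> {ok, ActorId};
        error         -> {error, unauthorized}
    end;
actor_for_token(_Header, _Cfg) ->
    {error, unauthorized}.

%% Dispatch on method + path segments (path only, no query string).
route(Method, Path) ->
    Segments = binary:split(Path, <<"/">>, [global, trim_all]),
    case {Method, Segments} of
        {get,  [<<"actors">>, Id]}                  -> {actor_doc, Id};
        {get,  [<<"actors">>, Id, <<"outbox">>]}    -> {outbox, Id};
        {post, [<<"actors">>, Id, <<"inbox">>]}     -> {inbox, Id};
        {get,  [<<"actors">>, Id, <<"followers">>]} -> {followers, Id};
        {get,  [<<"actors">>, Id, <<"following">>]} -> {following, Id};
        {post, [<<"activity">>]}                    -> publish;
        _                                           -> not_found
    end.
```

Only the routes listed in this step are covered; whether an actor id such as `<<"unknown">>` exists is a separate kernel lookup that would turn `{actor_doc, Id}` into a 404.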
**Tests:**
- GET /actors/alice → 200 with actor doc.
- GET /actors/unknown → 404.
- POST /activity with alice's token publishes to alice.
- POST /activity with bob's token publishes to bob.
- Two actors' outboxes are independent.
**Acceptance:** `bash next/tests/http_multi_actor.sh` passes 14+ cases.
---
## Step 5 — POST /inbox: signature verify + ingestion
The receiving side of federation. A peer instance POSTs a signed activity
to `/actors/<id>/inbox`; the kernel verifies the signature, runs the
inbound validation pipeline, appends to the receiving actor's log
(separate from outbox — the inbox is its own log for activities the
actor *received*), and broadcasts to projections.
**Deliverables:**
- New per-actor log: `actor_inbox`. Same shape as outbox; activities
marked `:received_from => PeerActorId`.
- Inbound pipeline: `stage_envelope` → `stage_signature` (against
peer's actor-state, not local) → `stage_replay`.
- Peer signature verification needs `:public_keys` from the peer's
actor-state. v2 fetches the peer's actor doc lazily on first
contact, caches it in a `peer-actors` projection. Stale-key
invalidation deferred to v3.
- HTTP handler: `POST /actors/<id>/inbox` returns 202 on accept,
401 on bad sig, 422 on replay or validation failure.
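The staged pipeline and its status mapping can be sketched as a list of stage functions run in order, stopping at the first error. The module name `inbox_pipeline`, the `Stage(Envelope, Ctx)` calling convention, the error atoms, and the `seen` list used for replay detection are assumptions for this example; only the stage order and the 202 / 401 / 404 / 422 codes come from the plan.

```erlang
-module(inbox_pipeline).
-export([run/3, status/1, stage_replay/2]).

%% Run each stage in order; the first {error, Reason} short-circuits.
run([], _Envelope, _Ctx) ->
    ok;
run([Stage | Rest], Envelope, Ctx) ->
    case Stage(Envelope, Ctx) of
        ok              -> run(Rest, Envelope, Ctx);
        {error, Reason} -> {error, Reason}
    end.

%% Example stage: reject an activity whose CID was already ingested.
stage_replay(#{id := Cid}, #{seen := Seen}) ->
    case lists:member(Cid, Seen) of
        true  -> {error, replay};
        false -> ok
    end.

%% Map the pipeline outcome to the HTTP status codes listed above.
status(ok)                     -> 202;
status({error, bad_signature}) -> 401;
status({error, unknown_actor}) -> 404;
status({error, _Other})        -> 422.
```

Keeping `stage_signature` ahead of `stage_replay` means a tampered envelope is reported as 401 even if its CID has been seen before.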
**Tests:**
- POST /inbox with valid signed activity → 202, activity in inbox log.
- POST /inbox with tampered envelope → 401.
- POST /inbox with unknown actor target → 404.
- POST /inbox with replay → 422.
- Activity broadcast to receiving actor's projections.
**Acceptance:** `bash next/tests/inbox.sh` passes 16+ cases.
---
## Step 6 — Follow lifecycle
Per §13.2:
```sx
(activity 'Follow ;; from A → B
:object actor-id-B
:to (list actor-id-B))
```
B responds with `Accept` (or `Reject`); A's follower-graph projection
tracks the state. `Undo{Follow}` reverses it.
**Deliverables:**
- New activity-types (runtime via DefineActivity, ideally):
Follow, Accept, Reject, Undo.
- Follower-graph projection (Erlang-fun stand-in): tracks
`{ActorId => #{following => [PeerId], followers => [PeerId],
pending_outbound => [PeerId], pending_inbound => [PeerId]}}`.
- Accept-handling fold logic: when A receives `Accept{Follow A→B}`,
move B from `pending_outbound` to `following`.
- Reciprocal: when B receives `Follow A→B`, automatically queue an
outbound `Accept` (auto-accept policy; manual moderation v3).
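The state transitions above can be sketched for a single actor's graph entry as follows. The module name `follow_graph`, the function names, and the `{queue, accept, Peer}` instruction returned by `got_follow/2` are hypothetical. Self-follow rejection needs the local actor id and is left out, and `pending_inbound` stays empty because the v2 policy auto-accepts.

```erlang
-module(follow_graph).
-export([new/0, sent_follow/2, got_accept/2, got_reject/2,
         got_follow/2, undo_follow/2]).

new() ->
    #{following => [], followers => [],
      pending_outbound => [], pending_inbound => []}.

%% Local actor published Follow -> Peer.
sent_follow(Peer, #{pending_outbound := P} = G) ->
    G#{pending_outbound := add(Peer, P)}.

%% Peer accepted: promote from pending_outbound to following.
got_accept(Peer, #{pending_outbound := P, following := F} = G) ->
    case lists:member(Peer, P) of
        true  -> G#{pending_outbound := lists:delete(Peer, P),
                    following := add(Peer, F)};
        false -> G    %% Accept without a matching Follow: ignore
    end.

%% Peer rejected: drop the pending entry, no relationship remains.
got_reject(Peer, #{pending_outbound := P} = G) ->
    G#{pending_outbound := lists:delete(Peer, P)}.

%% Inbound Follow under auto-accept: record the follower and tell the
%% caller to queue an outbound Accept.
got_follow(Peer, #{followers := Fs} = G) ->
    {G#{followers := add(Peer, Fs)}, {queue, accept, Peer}}.

%% Undo{Follow}: remove the relationship in whichever state it is.
undo_follow(Peer, #{following := F, pending_outbound := P} = G) ->
    G#{following := lists:delete(Peer, F),
       pending_outbound := lists:delete(Peer, P)}.

add(X, L) ->
    case lists:member(X, L) of
        true  -> L;
        false -> L ++ [X]
    end.
```

Ignoring an `Accept` that has no matching pending `Follow` is a defensive choice in this sketch; the plan does not say how unsolicited Accepts should be handled.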
**Tests:**
- Follow → 202; sender's pending_outbound includes target.
- Auto-Accept on receiving Follow; both sides' graphs update.
- Reject leaves no following relationship.
- Undo{Follow} removes the following.
- Self-follow rejected.
**Acceptance:** `bash next/tests/follow_lifecycle.sh` passes 14+ cases.
---
## Step 7 — Audience-resolving delivery set
For each outbound activity, compute the set of inbox URLs to POST to.
Sources: explicit `:to` + `:cc` recipients, plus `Public` / `Followers`
expansion via the audience predicates from M1's genesis bundle.
**Deliverables:**
- `outbox:delivery_set/2(Activity, KernelState) -> [InboxUrl]`.
- Public expansion: every known peer instance's shared inbox (or every
follower of the publishing actor — both modes supported).
- Followers expansion: follower-graph lookup.
- Self-delivery suppression (don't POST to your own inbox).
- Returns a list of `{PeerInstanceUrl, ActorId}` tuples.
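A minimal sketch of the delivery-set computation, using the `{PeerInstanceUrl, ActorId}` tuple form from the last bullet. The module name, the three-argument shape (the follower list and own instance URL are passed in directly rather than read from `KernelState`), the `followers` / `public` atoms, and treating `Public` as "every follower" are assumptions. Self-delivery suppression is approximated by dropping every recipient on the local instance.

```erlang
-module(delivery_set_sketch).
-export([delivery_set/3]).

%% Activity:  map with optional `to` and `cc` recipient lists.
%% Followers: [{InstanceUrl, ActorId}] for the publishing actor.
%% Self:      this instance's base URL.
delivery_set(Activity, Followers, Self) ->
    Recipients = maps:get(to, Activity, []) ++ maps:get(cc, Activity, []),
    Expanded = lists:flatmap(fun(R) -> expand(R, Followers) end, Recipients),
    %% Drop local recipients, then de-duplicate (usort also sorts).
    lists:usort([D || {Inst, _Actor} = D <- Expanded, Inst =/= Self]).

expand(followers, Followers)        -> Followers;
expand(public, Followers)           -> Followers;  %% "every follower" mode
expand({_Inst, _Actor} = R, _Fs)    -> [R].
```

Returning a sorted, de-duplicated list means an actor named in both `:to` and the followers expansion receives one delivery rather than two.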
**Tests:**
- Activity with `:to: [bob]` → delivery set is bob's inbox.
- Activity with `:to: [Followers]` → set is current followers' inboxes.
- Activity with `:to: [Public]` → set is public reach.
- Self-deliveries excluded.
- Empty audience → empty set.
**Acceptance:** `bash next/tests/delivery_set.sh` passes 12+ cases.
---
## Step 8 — Outbound delivery queue
Per §13.4: every queued delivery has retry semantics. v2 uses one
gen_server-per-peer-instance worker holding a small queue. Failures
back off exponentially; permanent failures (HTTP 410, bad TLS) move to
a dead-letter list visible via `/admin/dead-letter`.
**Deliverables:**
- `delivery_worker.erl`: gen_server per-peer queue with `enqueue/2`
and a private retry loop.
- Backoff schedule: 30s / 5m / 30m / 6h / 24h then dead-letter.
- Delivery state stored as a projection (`delivery-state`) so it
survives kernel restarts.
- `outbox:publish/2` augmented: after `log:append`, dispatch to the
delivery worker for each delivery-set entry.
- HTTP client: extend the existing native httpc primitive to
carry signed envelope bytes + the right Content-Type.
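The backoff schedule can be written down as a small pure function, which keeps the retry policy testable apart from the gen_server. The module name `backoff_sketch`, the `{retry_after, Seconds}` / `dead_letter` return values, and the `{http, Status}` failure tuple are assumptions. The sketch also assumes the schedule is indexed by the number of failed attempts so far, so the sixth failure dead-letters.

```erlang
-module(backoff_sketch).
-export([next_delay/1, on_failure/2]).

%% 30s / 5m / 30m / 6h / 24h, in seconds.
schedule() -> [30, 300, 1800, 21600, 86400].

%% Attempt = number of failed attempts so far (1-based).
next_delay(Attempt) when is_integer(Attempt), Attempt >= 1 ->
    Schedule = schedule(),
    case Attempt =< length(Schedule) of
        true  -> {retry_after, lists:nth(Attempt, Schedule)};
        false -> dead_letter
    end.

%% Permanent failures skip the schedule entirely.
on_failure({http, 410}, _Attempt) -> dead_letter;
on_failure(_Transient, Attempt)   -> next_delay(Attempt).
```

For example, `backoff_sketch:on_failure({http, 503}, 2)` returns `{retry_after, 300}`, while any failure with status 410 goes straight to `dead_letter`.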
**Tests:**
- Successful delivery → worker queue empties.
- Failed delivery → backoff schedule respected.
- Dead-letter after max attempts.
- Cross-restart: queue restored from delivery-state projection.
- Concurrent deliveries to multiple peers don't serialise.
**Acceptance:** `bash next/tests/delivery_queue.sh` passes 16+ cases.
---
## Step 9 — Backfill on Follow accept
Per §13.3: A wants B's history when A first follows B. Four modes:
| Mode | Behavior |
|-----------|---------------------------------------------|
| `none` | New follower sees only forward-going content |
| `last-N` | Backfill last N activities |
| `last-T` | Backfill last T duration of activities |
| `full` | Backfill entire outbox |
**Deliverables:**
- Follow activity may carry `:backfill {:mode :last-N :limit 100}`.
- On Accept, B's outbox is GET-paged with appropriate filters.
- `GET /actors/<id>/outbox?since=Cid&limit=N` returns a paged response.
- Backfill bodies wrap the original activities in `:backfilled true`
so projections can decide whether to re-fold or skip.
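The four modes reduce to a selection over the outbox, sketched below. The module name, the `{last_n, N}` / `{last_t, T}` mode tuples, integer `published` timestamps, an oldest-first outbox list, and marking by adding a `backfilled` key (rather than a separate wrapper object) are assumptions. The `last_n` case returns the N most recent activities in oldest-first order, which is one possible reading of open question 2 in Appendix A.

```erlang
-module(backfill_sketch).
-export([select/3]).

%% Outbox: [#{id, published, ...}], oldest first. Now and T share one unit.
select(Mode, Outbox, Now) ->
    [A#{backfilled => true} || A <- pick(Mode, Outbox, Now)].

pick(none, _Outbox, _Now) ->
    [];
pick(full, Outbox, _Now) ->
    Outbox;
pick({last_n, N}, Outbox, _Now) when is_integer(N), N >= 0 ->
    %% Skip everything except the final N entries.
    lists:nthtail(max(0, length(Outbox) - N), Outbox);
pick({last_t, T}, Outbox, Now) ->
    [A || #{published := P} = A <- Outbox, P >= Now - T].
```

Because the map update only adds a key, each activity's original `id` is preserved, in line with the "preserve original `:id` (CID)" test.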
**Tests:**
- `last-N` mode delivers exactly N most-recent activities.
- `last-T` mode delivers everything published since `now - T`.
- `full` mode delivers everything, page by page.
- `none` mode delivers nothing.
- Backfilled activities preserve original `:id` (CID).
**Acceptance:** `bash next/tests/backfill.sh` passes 12+ cases.
---
## Step 10 — Discovery
Per §13.7: webfinger plus actor doc fetch.
**Deliverables:**
- `GET /.well-known/webfinger?resource=acct:alice@<host>` returns the
actor URL.
- `GET /actors/<id>` returns the actor doc (already exists from
M1 Step 8c-actors).
- Peer-actor cache: when verifying a peer's signature for the first
time, fetch their actor doc, store in `peer-actors` projection.
- `discovery:resolve/1("acct:alice@host:port")` returns the actor URL.
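The first half of `discovery:resolve/1` is turning an `acct:` resource into the peer's webfinger URL. A sketch under stated assumptions: the module and function names are hypothetical, the scheme is plain `http://` because TLS is out of scope for this milestone, and the resource is not URL-encoded. Fetching the URL and reading the actor link out of the response are not shown.

```erlang
-module(discovery_sketch).
-export([webfinger_url/1]).

%% <<"acct:user@host:port">> -> {ok, WebfingerUrl} | {error, bad_acct}
webfinger_url(<<"acct:", Rest/binary>> = Resource) ->
    case binary:split(Rest, <<"@">>) of
        [User, Host] when User =/= <<>>, Host =/= <<>> ->
            {ok, <<"http://", Host/binary,
                   "/.well-known/webfinger?resource=", Resource/binary>>};
        _ ->
            {error, bad_acct}
    end;
webfinger_url(_Other) ->
    {error, bad_acct}.
```

For `<<"acct:bob@localhost:9998">>` this yields `http://localhost:9998/.well-known/webfinger?resource=acct:bob@localhost:9998`.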
**Tests:**
- Webfinger for known actor → 200 with `links[].href`.
- Webfinger for unknown → 404.
- Cross-instance: A resolves an acct on B → fetch succeeds.
- Actor-doc fetch caches the result.
- Cache invalidation on key rotation (v3 — for now, no TTL).
**Acceptance:** `bash next/tests/discovery.sh` passes 12+ cases.
---
## Step 11 — Rich verbs as runtime artifacts
Per the verb-extensibility proof point (M1 §9a), new verbs land as
`DefineActivity` artifacts published into the genesis-equivalent boot
log, not as kernel code changes. v2 adds:
| Verb | Object shape | Use case |
|---------|---------------------------------------|---------------------------------------|
| `Note` | `{content, tags?}` | Short authored message |
| `Announce` | `{object: <ActivityCid>}` | Propagate a peer's activity to followers |
| `Endorse` | `{object: <Cid>, kind: like\|share}` | Cross-actor signaling |
Announce is the critical one for federation — it lets one actor
re-broadcast another actor's content to their own followers.
**Deliverables:**
- Three new SX files in a `next/genesis/runtime-verbs/` directory.
- Each is shipped to a fresh instance via a bootstrap manifest entry
*or* published as the first activity on the actor's outbox; either
works because of the verb-extensibility mechanism.
- Announce-specific delivery: the announced activity's CID is included
in the Announce; followers can re-fetch the referenced activity from
the original instance if their projection wants to fold the body.
**Tests:**
- Define + publish Note works end-to-end.
- Define + publish Announce wraps another activity by CID.
- Announce delivery: A announces B's Note; A's followers see the
Announce; their `feed` projection optionally fetches the wrapped Note.
- Endorse increments an endorsement counter on the target Activity.
- Verb registration is observable in the `define-registry` projection.
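The endorsement-counter test implies a projection fold along these lines. This is an Erlang-fun stand-in in the same spirit as the earlier steps, not the SX artifact itself; the module name `endorse_fold` and the activity map shape are assumptions.

```erlang
-module(endorse_fold).
-export([fold/2]).

%% Counts :: #{TargetCid => NumberOfEndorsements}
fold(#{type := endorse, object := Cid}, Counts) ->
    %% Increment an existing counter, or start it at 1.
    maps:update_with(Cid, fun(N) -> N + 1 end, 1, Counts);
fold(_Other, Counts) ->
    Counts.
```

Replaying a log with `lists:foldl(fun endorse_fold:fold/2, #{}, Activities)` gives one counter per endorsed CID and ignores every other verb.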
**Acceptance:** `bash next/tests/rich_verbs.sh` passes 14+ cases.
---
## Step 12 — Two-instance smoke test
**The proof point.** `next/tests/smoke_federate.sh` spins up two kernel
instances on distinct ports, walks them through the full federation
flow, and exits 0.
**Test outline:**
```bash
# Base URLs for the two local instances (plain HTTP; TLS is out of scope).
A=http://localhost:9999
B=http://localhost:9998
# 0. Start two instances: A on 9999, B on 9998
./next/scripts/start_pair.sh
# 1. Bootstrap two actors: alice@A, bob@B
curl -X POST "$A/activity" \
  -H "Authorization: Bearer $TOKEN_A" \
  -d '{"type":"Create","object":{"type":"Person","name":"alice"}}'
curl -X POST "$B/activity" \
  -H "Authorization: Bearer $TOKEN_B" \
  -d '{"type":"Create","object":{"type":"Person","name":"bob"}}'
# 2. alice@A discovers bob@B via webfinger (URL quoted so the shell leaves ? alone)
curl "$A/.well-known/webfinger?resource=acct:bob@localhost:9998"
# 3. alice follows bob (POST /activity picks the actor from the bearer token)
curl -X POST "$A/activity" \
  -H "Authorization: Bearer $TOKEN_A" \
  -d '{"type":"Follow","object":"http://localhost:9998/actors/bob"}'
# 4. Expect alice's follower-graph: pending_outbound includes bob
#    (not checked via /following, which only lists accepted follows)
# 5. Expect bob auto-accepts; alice's pending_outbound clears
sleep 1
curl "$A/actors/alice/following" | jq -e '.[] | select(.id == "bob")'
# 6. bob publishes a Note
curl -X POST "$B/activity" \
  -H "Authorization: Bearer $TOKEN_B" \
  -d '{"type":"Create","object":{"type":"Note","content":"hi"}}'
# 7. alice's inbox receives the Note (any/2 so one matching item is enough)
sleep 1
curl "$A/actors/alice/inbox?page=true" \
  | jq -e 'any(.orderedItems[]; .type == "Create" and .object.type == "Note")'
# 8. alice's actor-state projection has the new Note
curl "$A/projections/feed" | jq -e ". | length > 0"
# 9. Key rotation: bob rotates keys
curl -X POST "$B/activity" \
  -H "Authorization: Bearer $TOKEN_B" \
  -d '{"type":"Update","object":"bob","patch":{...}}'
# 10. alice still verifies older Notes against the old key
# (via actor-state's key history)
# 11. Announce: alice announces bob's Note
curl -X POST "$A/activity" \
  -H "Authorization: Bearer $TOKEN_A" \
  -d '{"type":"Announce","object":"<bob-note-cid>"}'
# 12. Verify Announce delivers to alice's followers (alice has zero
# followers here, but the activity should be in alice's outbox)
# 13. Shutdown both instances; restart; verify state survives
./next/scripts/stop_pair.sh
./next/scripts/start_pair.sh
curl "$A/actors/alice/following" | jq -e '.[] | select(.id == "bob")'
```
**Acceptance for Step 12:** `smoke_federate.sh` exits 0. The full flow
runs without any human-in-the-loop coordination, both instances'
projections converge, and a restart preserves all federation state.
---
## Acceptance criteria for milestone 2
All of:
1. **Each step's test suite passes** (`bash next/tests/<step>.sh`).
2. **The federation smoke test passes** (`bash next/tests/smoke_federate.sh`).
3. **Milestone 1 baseline preserved** — the entire M1 test suite still
passes (~560 assertions across 50 suites).
4. **Erlang-on-SX conformance** — adding multi-actor + federation kernel
code in `next/kernel/*.erl` doesn't break Phase 1-8 conformance
(currently 761/761).
5. **Restart durability** — kill both instances mid-delivery, restart,
queues resume, projections converge, no log corruption.
6. **Manual real Mastodon poke** — point a Mastodon account at
`https://next-A.rose-ash.com/actors/alice` and verify the actor
doc fetches. (Read-only AP interop only; Mastodon Follow is v3,
gated on HTTP-Signatures-2018 compat.)
## What lands when
Steps 1-3 are sequential (multi-actor foundation). Steps 4-10 are
mostly sequential within the federation core but some can parallelise:
4-6 are sequential; 7-9 can interleave after 6 lands.
```
M1 closeout (HEAD) ──┐
┌─── Step 1 ──┬─── Step 2 ──┬─── Step 3
│ │ │
└─────────────┼─── Step 4 ──┘
└─── Step 5 ────┐
Step 6 ───┤
Step 7 ───┤
Step 8 ───┤
Step 9 ───┤
Step 10 ──┤
Step 11 ──┤
Step 12 ──┘
```
Estimated effort: ~40-60 commits across all 12 steps. A focused agent
loop (`loops/fed-sx-m2`) should be able to land this with the same
discipline as M1.
## What's deferred to milestone 3
- **rose-ash port** (the headline of M3). Blog, market, events,
federation hub, account, orders — all delivered as fed-sx
applications. Each existing rose-ash domain becomes
`DefineApplication{...}` artifacts.
- **TLS / HTTP-Signatures-2018 / RFC 9421**. Real Mastodon interop.
- **Multi-instance over real WAN.** Cross-instance over TLS, NAT
traversal, peer instance allowlists.
- **IPFS / S3 storage backends** as `DefineStorage` entries.
- **Browser client + operator dashboard.** Probably Elm-on-SX.
- **Cross-host conformance** — Python / JS / Haskell hosts running
fed-sx with the same conformance corpus.
- **OpenTimestamps proofs** as `DefineProof` entries.
- **Reputation, allowlists, rate-limiting** — full §13.6 abuse
posture.
- **Performance work** — JIT-compiled folds, snapshot acceleration,
federation batching, mailbox prioritisation.
- **Capability tokens / delegation** — multi-device for a single
actor.
---
## Appendix A: open questions for milestone 2
Things still under-specified; resolve as work begins.
1. **Inbox-side stage_signature key fetching.** When A receives a
POST /inbox from peer instance B for the first time, A needs B's
actor doc to verify the signature. Synchronous fetch vs. queue-
and-retry? Synchronous is simpler but blocks the inbox handler;
queue-and-retry needs deferred validation state. Probably
synchronous with a 5s timeout for v2.
2. **Backfill granularity for `last-N`.** N counts forward (oldest
first) or backward (newest first)? Forward matches projection-fold
semantics; backward matches user expectation. Probably forward
for v2, document the choice.
3. **Auto-Accept policy on Follow.** v2 ships open-world: every
Follow is auto-accepted. Manual moderation (held in a `pending`
list, accepted via /admin/) is v3 with the operator dashboard.
4. **Delivery worker per peer instance vs. per peer actor.** Per
instance is simpler (one HTTP connection pool) but throttles
inter-actor bandwidth on busy peers. v2 starts with per-instance;
per-actor sharding is a perf tweak in §15.
5. **Two-instance test harness.** How do we start a pair of kernels
in one bash test? Probably `bootstrap:start/3` twice with different
ActorIds + ports + base paths. Need to confirm `nx_kernel` can be
started under different registered atoms (`nx_kernel_a`, `nx_kernel_b`)
for the test. Process registration in this port supports arbitrary
atom names (verified in M1).
6. **Multi-host conformance.** Adding cross-host tests for federation
requires Python/JS hosts to implement the v2 spec corpus too.
Deferred to v3; v2 conformance is one-host only.
7. **Storage of received activities.** When A receives a Note from B
via /inbox, does A keep B's signed envelope verbatim (for re-broadcast
on Announce), or does A re-construct + re-sign with A's own key?
AP-canon: keep verbatim. Confirm at Step 5.