# fed-sx — Federated SX Activity Substrate
A federated, content-addressed, extensible application substrate where the unit of
computation is a signed activity, the unit of state is a pure SX projection over the
activity log, and the substrate's own extensibility (new verbs, new object types, new
projections, new validators) is itself published through the same mechanism.
Status: **design** — not yet implemented. Target subdomain: `next.rose-ash.com`.
Target location in repo: `next/` (new top-level dir, sibling to `blog/`, `market/`,
etc.). Stack: pure SX-on-OCaml. Implementation language(s) to be chosen after design
is complete.
---
## 1. Premise
ActivityPub's data model — actors, signed activities, inboxes/outboxes — generalises
beyond social posting to any domain where state evolves via signed messages. fed-sx
takes that generalisation seriously:
- The unit of communication is a **signed AP activity**.
- The unit of content is an **AP object**, content-addressed by **CID** (multihash +
multicodec, default `dag-cbor` over the parsed SX AST).
- State is the **deterministic fold** of pure SX functions over the activity log.
- The substrate is **self-extending**: new activity types, object types, projections,
validators, codecs, transports, and signature suites are themselves published as
`Define*` activities — federated like any other content.
Three commitments make the rest fall into place:
1. **The kernel is dumb.** It only knows envelope shape, signature verification,
append-to-log, fetch-by-id, transport in/out. It does not know what `Create` or
`Pin` *mean*.
2. **Everything else is registry-driven.** Verbs, object types, validators, projections,
codecs, transports, audiences, proofs, sig suites — all looked up in registries the
kernel calls into.
3. **The registries are themselves publishable.** New entries arrive as `Define*`
activities. Bootstrap registries load from a known set of CIDs at startup; everything
else is replayed from the log.
Result: the only code that ever needs to change in the kernel is the envelope itself.
New verbs = published SX, federated like any other artifact.
---
## 2. CIDs and content addressing
Every artifact has a CID. Default codec is **dag-cbor** over the parsed SX AST (not
the raw text). This buys:
- **Sub-AST addressing for free.** Each nested structure has an implicit CID; IPLD can
walk paths like `<file-cid>/components/card`. The "file CID *and* component CID"
question dissolves: every node is a CID, you choose the granularity at reference
time.
- **Polyglot canonicalization.** JS, OCaml, Python only need to agree on AST shape +
CBOR's deterministic encoding (RFC 8949 §4.2.1). No byte-identical pretty-printer
required across hosts.
- **Format immunity.** Reformatting, indent changes, equivalent-form normalisations
do not change the CID.
- **Tooling fit.** sx-tree already has the parsed form in memory; computing or
verifying a CID is just an encode + hash.
Costs accepted:
- One spec to maintain: SX↔CBOR mapping (number → CBOR int/float, string → text,
symbol → tag, keyword → tag, list → array, dict → map). ~50 lines of code per host.
- Author's exact source text is not preserved; re-pretty-print on fetch.
- "Why don't these CIDs match" requires comparing CBOR (a `cid-explain` tool helps).
The CID format itself is multicodec-agile: the substrate also accepts `raw`,
`dag-json`, `dag-pb`, etc. when seen, dispatched via the codec registry.
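
The "encode + hash" step above can be sketched in a few lines. This is a minimal illustration, assuming a plain Python list/dict stand-in for the parsed SX AST and covering only integers, booleans, strings, lists and maps; the symbol and keyword tags from the SX↔CBOR mapping are left out because their tag numbers are not fixed here. The function names `encode` and `cid_v1` are hypothetical.

```python
import base64
import hashlib


def _head(major: int, n: int) -> bytes:
    """CBOR head: major type plus argument, in the shortest form."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("integer out of range for this sketch")


def encode(value) -> bytes:
    """Deterministically encode a small subset of values as CBOR."""
    if isinstance(value, bool):  # must precede int: bool is an int subclass
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        # RFC 8949 section 4.2.1: sort entries by the bytes of the encoded key.
        items = sorted((encode(k), encode(v)) for k, v in value.items())
        return _head(5, len(items)) + b"".join(k + v for k, v in items)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def cid_v1(value) -> str:
    """CIDv1: version 0x01, codec 0x71 (dag-cbor), multihash 0x12 0x20 (sha2-256)."""
    digest = hashlib.sha256(encode(value)).digest()
    raw = bytes([0x01, 0x71, 0x12, 0x20]) + digest
    # Multibase prefix "b" means lowercase base32 without padding.
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
```

Because map entries are sorted before encoding, `cid_v1({"b": 1, "a": [1, 2]})` and `cid_v1({"a": [1, 2], "b": 1})` give the same CID, which is the "format immunity" property in miniature.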
---
## 3. Kernel surface (fixed — get this right)
The kernel is the only thing that's hard to change later. Everything else is in
registries. Two envelope shapes plus five operations.
### 3.1 Activity envelope
```
{ id, type, actor, published,
  to, cc, audience-extras,
  object | target | origin | result,   # AP slots, opaque to kernel
  capabilities-required: [...],        # so receivers can refuse cleanly
  proofs: [...],                       # OTS, on-chain, multi-sig — all opaque
  signature: { key-id, algorithm, value, covered-fields } }
```
### 3.2 Object envelope
```
{ id, type, cid, media-type,
  where: inline | cid | url,
  content?, link? }                    # only one populated based on `where`
```
### 3.3 Kernel verbs
The only verbs implemented directly by the kernel:
- **Append signed activity** to outbox (after envelope check + sig verify + validator
pipeline).
- **Verify signature** against actor's published keys, time-aware (which key was
active at `published`).
- **Fetch** by `id` or by `cid`.
- **Receive at inbox** (verify + dispatch to registered handlers).
- **Replay log** to rebuild registries on boot.
Everything else is registry-resolved.
---
## 4. Registries
Each registry has a default-populated set (loaded from genesis-bundled CIDs) and
accepts new entries via `Define*` activities. Default entries themselves are SX
artifacts — versioning, audit, replacement work the same way as user content.
| Registry | Bootstrap defaults | Extended by |
|----------|-------------------|-------------|
| **Activity types** | `Create`, `Update`, `Delete`, `Announce` | `DefineActivity{type, schema-sx, semantics-sx}` |
| **Object types** | `SXArtifact`, `Note`, `Image`, `Tombstone` | `DefineObject{type, schema-sx, render-hint}` |
| **Validators** | envelope shape, signature, type-schema | `DefineValidator{applies-to, predicate-sx}` |
| **Projections** | identity, by-type, by-cid, by-actor, actor-state, define-registry, audience-graph, by-object | `DefineProjection{name, fold-sx, query-sx}` |
| **Codecs** | dag-cbor, raw, dag-json | `DefineCodec{multicodec, encode-sx, decode-sx}` |
| **Hash algorithms** | sha2-256 | multihash table — agile by spec |
| **Transports** | http-inbox-push | `DefineTransport{name, deliver-sx, receive-sx}` |
| **Audience predicates** | `Public`, `Followers`, direct | `DefineAudience{name, member-of-sx}` |
| **Subscription types** | `Follow` (AP-standard) | `DefineSubscription{name, schema-sx, match-sx, delivery}` |
| **Proof types** | (none) | `DefineProof{type, attach-sx, verify-sx}` |
| **Storage backends** | files-on-disk | `DefineStorage{where-tag, put-sx, get-sx}` |
| **Triggers** | (none) | `DefineTrigger{when-subscription, then-sx, cascade-limit}` |
| **Signature suites** | rsa-sha256 (AP-compatible) | `DefineSigSuite{name, sign-sx, verify-sx}` |
| **Application bundles** | (none) | `DefineApplication{name, subscriptions, triggers, projections, storage}` |
Adding `Pin`, `Endorse`, `Supersede`, `Test`, `Build`, `Compose`, etc. later is just
publishing `DefineActivity` artifacts — no kernel diff, no redeploy required if
registries are hot.
---
## 5. The meta-level
A `DefineActivity` is itself an AP `Create` activity over an `SXArtifact` of a
specific type:
```sx
(activity 'Create
  :object {:type "DefineActivity"
           :name "Pin"
           :schema (fn (act)
                     (and (string? (-> act :object :path))
                          (cid? (-> act :object :cid))))
           :semantics
           '(fn (act state)
              (assoc-in state [:pins (-> act :object :path)]
                        (-> act :object :cid)))})
```
When the kernel receives an activity with `type: "Pin"` it looks up the registered
semantics from a `DefineActivity{name: "Pin"}` artifact, runs the SX, projects the new
state. The semantics are themselves content-addressed and federated — every receiver
runs the same code.
Same pattern handles `DefineProjection`, `DefineValidator`, etc. The substrate is
genuinely self-extending.
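
The lookup-and-run loop can be sketched as follows. This is an illustration only: Python callables stand in for the content-addressed SX semantics, and the `Kernel` class, its `receive` method and `pin_semantics` are hypothetical names, not part of the design.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Semantics = Callable[[dict, State], State]


class Kernel:
    """Appends every activity; projects only those with registered semantics."""

    def __init__(self) -> None:
        self.log: List[dict] = []
        self.semantics: Dict[str, Semantics] = {}
        self.state: State = {"pins": {}}

    def receive(self, activity: dict) -> None:
        self.log.append(activity)  # appending never depends on semantics
        obj = activity.get("object")
        if (activity["type"] == "Create" and isinstance(obj, dict)
                and obj.get("type") == "DefineActivity"):
            # A DefineActivity extends the registry; no kernel change needed.
            self.semantics[obj["name"]] = obj["semantics"]
            return
        handler = self.semantics.get(activity["type"])
        if handler is not None:
            self.state = handler(activity, self.state)


def pin_semantics(act: dict, state: State) -> State:
    """Python stand-in for the `Pin` semantics shown in SX above."""
    pins = {**state["pins"], act["object"]["path"]: act["object"]["cid"]}
    return {**state, "pins": pins}
```

A `Pin` received before its `DefineActivity` is logged but not projected; in this sketch it would only take effect on a later replay of the log.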
---
## 6. Verbs
### 6.1 Bootstrap verbs (milestone 1)
The substrate exposes `POST /activity` (not `POST /publish`) — generalised entry
point that takes any well-formed AP activity, validates, signs, appends to outbox.
`(publish sx)` is sugar at the SX layer for `Create{SXArtifact}`.
Day-one verbs (cost ~zero once `/activity` exists):
- **`Create`** — the publish primitive.
- **`Update`** — supersede a previous activity (correct metadata, change a path
mapping). Distinct from "publishing new content" — new content is always a new
`Create` with a new CID.
- **`Delete`** — tombstone. AP-native; readers honour it.
- **`Announce`** — boost another actor's artifact into your outbox. Comes free.
- **`Subscribe`** — generalised subscription verb (parallel to publish/`Create`).
Wraps any registered `DefineSubscription` type. The standard AP `Follow` is treated as
`Subscribe{Follow{actor: ...}}` and kept for wire compatibility. See §18.
- **`Unsubscribe`** — `Undo` of a prior `Subscribe`. Same shape as AP
`Undo{Follow}`.
### 6.2 Custom verbs (designed-for, defined later)
Substrate accepts these from day one (any signed activity can be appended); semantics
projected once `DefineActivity` artifacts exist.
- **`Pin`** — assign `domain:path/name → CID`. The future name-resolution layer made
of activities. Each pin is signed; the resolver replays the outbox to compute current
state.
- **`Endorse`** (modelled on `Like`/`Approve`) — third-party signature on a CID.
Web-of-trust style code review without central authority.
- **`Supersede`** — "CID A replaces CID B". Stronger than `Update`; readers can chase
the chain.
- **`Test`** — published assertion that running CID A under conditions X yields result
Y. Test-as-artifact, federated.
- **`Build`** — links a source CID to a compiled-output CID, with provenance.
- **`Compose`** — derived artifact citing input CIDs. Provenance graph in the outbox
itself.
- **`Note`** (AP-native) — comments / reviews / discussion attached to a CID.
- **`Follow`** / **`Undo(Follow)`** — subscribe to another instance's outbox.
The pattern that matters: your outbox isn't just "things published," it's an
**append-only log of every assertion this actor makes about the SX universe.**
---
## 7. Capability discovery
Two pieces:
- **`GET /.well-known/sx-capabilities`** — JSON listing every registered activity-type,
object-type, codec, transport, sig-suite, proof-type. Each with the CID of the
`Define*` artifact that introduced it. Peers can diff capabilities before federating.
- **`capabilities-required`** field on activities — sender declares "this needs `Pin`
semantics + `dag-cbor` codec." Receivers without those capabilities return a clean
422 referencing the missing CIDs; sender knows whether to replay-and-deliver the
bootstrapping `Define*` artifacts first.
Federation degrades gracefully across instances at different versions.
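
A receiver-side check might look like the sketch below. It assumes `capabilities-required` carries the CIDs of the needed `Define*` artifacts and that the receiver keeps its registered CIDs in a set; the `admit` function, the response body shape and the 202 success status are assumptions, and only the 422 comes from the text above.

```python
from typing import Iterable, List, Set, Tuple


def missing_capabilities(required: Iterable[str], registered: Set[str]) -> List[str]:
    """Define* CIDs the sender needs that this instance has not registered."""
    return sorted(set(required) - registered)


def admit(activity: dict, registered: Set[str]) -> Tuple[int, dict]:
    """Return an (HTTP status, body) pair for an incoming activity."""
    missing = missing_capabilities(
        activity.get("capabilities-required", []), registered)
    if missing:
        # The sender can deliver these Define* artifacts first, then retry.
        return 422, {"error": "missing-capabilities", "missing": missing}
    return 202, {}
```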
---
## 8. Axes of flexibility (all designed-for)
1. **Object types** beyond SXArtifact — `Note`, `Article`, `Image`, `Video`, `Question`,
`Event`, etc. via the object-type registry.
2. **Storage tier per-object**: `where: inline | cid | url`. Tiny things inline; big
things to IPFS; legacy stuff URL-linked. Migrating storage backends doesn't migrate
the substrate.
3. **Multihash + multicodec agility** — sha2-256 + dag-cbor by default; substrate
accepts blake3, raw, dag-json, dag-pb, etc.
4. **Multi-key actors**: `publicKeys` array always; per-key `purpose`; multiple key
types (RSA for AP wire compat, Ed25519 modern). See §9.
5. **Audience / visibility** — AP-native `to`, `cc`, `bto`, `bcc`. Public, followers,
direct, unlisted. Custom audiences via `DefineAudience`.
6. **Outbox-as-database** — no source-of-truth other than the log. Projections are
recomputable views.
7. **Programmable activities** — activities can carry SX. Reactive federation,
conditional pins, automated propose/test/release pipelines, all expressed as AP
activities.
8. **Federation transport pluggable** — outbox is canonical; how peers exchange is
pluggable (HTTP push, pull, libp2p, polling).
9. **Optional timestamp proofs** — every activity has an attachable `proofs` slot.
OpenTimestamps, on-chain merkle commit, third-party TSA all slot in without changing
activity semantics.
Explicitly **not** pursuing for MVP:
- Schema-version negotiation (premature; `@context` handles extension).
- Configurable conflict-resolution per actor (last-signed-wins, log preserved for
audit).
- Verb-specific kernel handlers (other than `Create`'s "compute CID, store body").
---
## 9. Identity & actor lifecycle
### 9.1 Actor doc shape
```jsonld
{
  "@context": ["https://www.w3.org/ns/activitystreams",
               "https://w3id.org/security/v1",
               "https://next.rose-ash.com/ns/fed-sx/v1"],
  "type": "Person",                      // or Service, Group, Application
  "id": "https://next.rose-ash.com/actors/giles",
  "preferredUsername": "giles",
  "inbox": "https://next.rose-ash.com/actors/giles/inbox",
  "outbox": "https://next.rose-ash.com/actors/giles/outbox",
  "followers": "...",
  "following": "...",
  "publicKeys": [                        // ARRAY from day one — never `publicKey`
    { "id": "...#key-2026-05",
      "type": "RsaVerificationKey2018",
      "owner": "<actor-id>",
      "publicKeyPem": "...",
      "purpose": ["sign-activity", "sign-http"],
      "created": "2026-05-14T...",
      "expires": null,
      "supersedes": null,
      "supersededBy": null },
    { "id": "...#key-ed25519-2026-05",
      "type": "Ed25519VerificationKey2020",
      "owner": "<actor-id>",
      "publicKeyMultibase": "z6Mk...",
      "purpose": ["sign-activity"],
      "created": "2026-05-14T..." }
  ],
  "capabilities": "https://.../actors/giles/capabilities",  // what verbs they speak
  "alsoKnownAs": ["did:web:rose-ash.com:giles", ...],       // bridge to DID, AP migration
  "movedTo": null                                           // set on Move
}
```
Key shape decisions:
- **`publicKeys` array always.** Single-key actors have an array of length 1. The
AP-standard singular `publicKey` field is *also* served, mirroring the first array
element, for back-compat with vanilla AP servers (Mastodon etc. ignore the array).
- **Per-key `purpose`** — separates signing weight. Day-to-day publish key vs.
high-value key for `Pin`/`Endorse` vs. delegated machine key. Validators can require
specific purposes per activity type (registry-driven).
- **Multiple key types** — RSA for AP wire compat, Ed25519 for everything else
(smaller, faster, modern). Sig suite registry decides which suites are accepted.
- **`supersedes` / `supersededBy`** — keys form a chain, not a snapshot. Old activities
still verify against historical keys.
### 9.2 Key rotation
Key rotation is itself an activity, signed by the *old* key (or a recovery key):
```sx
(activity 'Update
  :object actor-id
  :patch {:add-publicKey new-key
          :supersede {old-key-id new-key-id}})
```
Kernel:
1. Fetches actor's current state (a projection over their own outbox).
2. Verifies activity is signed by a key with `purpose: rotate-key` (or any active key,
if registry allows).
3. Appends. The actor-state projection now has the new key.
Old activities still verify because the projection retains the historical key with
`supersededBy` set — sig verification looks up "what keys were active at activity
timestamp T."
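
The "keys active at T" lookup could be sketched as below. The rule used here is an assumption, not something the design fixes: a key counts as active from its `created` time until it expires or until its successor's `created` time. The `Key` dataclass and `keys_active_at` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Key:
    id: str
    created: datetime
    expires: Optional[datetime] = None
    superseded_by: Optional[str] = None


def keys_active_at(keys: Dict[str, Key], t: datetime) -> List[str]:
    """Ids of keys that could validly have signed an activity published at t."""
    active = []
    for key in keys.values():
        if key.created > t:
            continue  # key did not exist yet
        if key.expires is not None and t >= key.expires:
            continue  # key had already expired
        successor = keys.get(key.superseded_by) if key.superseded_by else None
        if successor is not None and successor.created <= t:
            continue  # assumption: rotation takes effect at successor creation
        active.append(key.id)
    return sorted(active)
```

Under this rule an activity published before a rotation still verifies against the old key, while the same key is rejected for anything published after the rotation.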
### 9.3 Key recovery / loss
- **Recovery key** — separate key at actor creation, never used except to rotate.
Stored offline. `purpose: ["recover"]`. Validator allows
`Update{actor, patch: rotate-all-keys}` if signed by a recovery key.
- **Social recovery** — designate N trusted actors, M-of-N can co-sign a recovery
`Update`. Implemented as a `DefineValidator` extension; multi-sig slot in `proofs`
makes it possible without changing the envelope.
- **Total loss** — if both signing and recovery keys are gone, the actor is dead.
They publish a new actor with `alsoKnownAs: <old-actor-id>` from a fresh key.
Followers can choose to re-follow but there's no cryptographic continuity.
### 9.4 Migration (`Move`)
AP-native:
```sx
(activity 'Move
  :object old-actor-id
  :target new-actor-id)
```
Receivers update their follow lists. New actor's `alsoKnownAs` must include old
actor — bidirectional handshake prevents hijacking.
For fed-sx, `Move` should also carry an outbox migration hint (CID of an export bundle)
so receivers can re-anchor projections without re-fetching activity-by-activity.
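
The handshake check is small enough to sketch directly. The sketch assumes the receiver already holds the fetched actor docs keyed by id, and that a `Move` is issued by the old actor itself (the usual AP convention, taken here as an assumption); `move_is_valid` is a hypothetical helper.

```python
def move_is_valid(move: dict, actor_docs: dict) -> bool:
    """Accept a Move only if both sides of the handshake agree."""
    old, new = move["object"], move["target"]
    # Side 1 (assumed): the old actor is the one announcing the move.
    issued_by_old = move.get("actor") == old
    # Side 2: the new actor's doc lists the old actor in alsoKnownAs.
    acknowledged = old in actor_docs.get(new, {}).get("alsoKnownAs", [])
    return issued_by_old and acknowledged
```

Without the second check, any actor could claim another's followers by publishing a `Move` that names itself as the target.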
### 9.5 Subordinate actors / delegation
Two patterns supported:
- **Service actors** (AP-native `type: Service`): bots, build servers, test runners.
Their own keys, their own outboxes, but `attributedTo` a parent actor.
- **Capability tokens**: parent publishes `Authorize{actor: child, capabilities: [...],
expires: ...}` signed by parent. Child publishes activities normally with their own
key; receivers verify the capability chain when child invokes an authority they don't
own outright. Useful for: temporary publish access, delegated `Pin` rights for a
specific path prefix, multi-device.
Both work *without* new kernel mechanism — just activities.
### 9.6 Implications
- **Sig verification is timestamp-aware.** Verifying an old activity needs the key
state at the time it was published — actor-state projection must support time-travel
queries.
- **Inbox doesn't trust `keyId` blindly.** Fetches actor doc, projects current key
state, checks key was valid at `published`.
- **Cross-instance identity via `alsoKnownAs` and DIDs.** Don't depend on DIDs but
slot them in for Bluesky-bridge, Solid-bridge, etc.
---
## 10. Projection model
The architectural commitment: **state is what you get when you fold pure SX over the
log.** No DB-of-record. Everything queryable is a projection.
### 10.1 What a projection is
A `DefineProjection` activity registers four things:
```sx
(activity 'Create
  :object {:type "DefineProjection"
           :name "actor-state"
           :initial-state {}                 ; pure SX value
           :fold (fn (state activity)        ; pure SX
                   (case (:type activity)
                     ;; keyed by actor id; non-Person Creates leave state unchanged
                     "Create" (if (= "Person" (-> activity :object :type))
                                (assoc state (-> activity :object :id) (:object activity))
                                state)
                     "Update" (apply-patch state activity)
                     "Move"   (set-moved state activity)
                     state))
           :snapshot-codec "dag-cbor"
           :indexes [{:by :id} {:by :preferredUsername}]})
```
- **`name`** — query handle. Unique per actor; collisions resolved by CID + supersession.
- **`initial-state`** — pure SX value used as state-zero.
- **`fold`** — pure SX function `(state activity) → state`. The only thing the kernel
calls.
- **`indexes`** — optional hint for materializing lookup paths.
The CID of the `DefineProjection` artifact is the projection's identity. Two instances
running the same projection are running the same CID's `fold` over the same log slice
— equivalence is decidable.
### 10.2 The fold contract — purity, determinism, gas
The fold function must be **pure and deterministic**. Non-negotiable; it's what makes
cross-instance equivalence and replay possible.
- **No IO.** No HTTP, no file access, no DB calls, no clock. The activity carries its
own `published` timestamp.
- **No randomness.** No host-seeded PRNG. (If pseudo-randomness is needed, seed from
the activity's CID — deterministic across hosts.)
- **No mutation outside the returned state.**
- **Bounded execution.** Each fold call gets a gas budget (default tunable, e.g. 100k
CEK steps). Exceeding it is a hard failure.
Enforced at the SX evaluator level by running folds in a sandboxed environment with
the IO platform stripped to nothing. Same sandbox model applies to validators and
trigger semantics.
**Cross-host equivalence guarantee:** for the same projection CID + same activity log
slice, every conforming SX host (JS, OCaml, Python, Haskell-on-SX, …) must produce a
state value with the same canonical CID. Tested via the spec test suite.
### 10.3 Bootstrap projections
The kernel cannot start without some projections, because the kernel itself uses them.
These are baked into the genesis bundle (see §11) and superseded only by deliberate
kernel-version upgrades.
| Projection | What it computes | Used by |
|------------|------------------|---------|
| `activity-log` | Identity — every activity, indexed by id and CID | Everything |
| `by-type` | `type → ordered list of activity-CIDs` | Most queries |
| `by-actor` | `actor-id → ordered list of activity-CIDs` | Per-actor outbox view |
| `by-object` | `object-CID → list of referencing activity-CIDs` | "Who pinned this?" |
| `actor-state` | `actor-id → current actor doc with key history` | Sig verification (kernel) |
| `define-registry` | `kind+name → currently-active Define* CID` | All other Define* lookups |
| `audience-graph` | `actor → followers/following` | Federation push |
`define-registry` is the bootstrap chicken-and-egg: it's the projection that knows
which projections (and validators, codecs, etc.) are currently active. Kernel ships
with it hardcoded; once running, every other projection (including a future replacement
of `define-registry` itself) is a regular `DefineProjection` superseding it.
### 10.4 Snapshotting
Replaying the entire log on every restart is unacceptable past day one.
- **Snapshot = `(activity-tip-CID, projection-state, projection-CID)` tuple,**
dag-cbor encoded, content-addressed.
- **Snapshot rule** — every K activities (default 1000) and every T seconds (default
60), serialize, hash, store on disk.
- **Resume** — on startup, find latest snapshot for each (projection-CID, log-tip),
load state, fold forward.
- **Snapshot CID is verifiable** — anyone with the same log slice and projection-CID
can recompute and check the CID matches. This is the cross-instance agreement proof.
Snapshots are themselves publishable as activities (`Create{Snapshot}`): an instance
can publish "here's my computed state for projection X at log-tip Y, CID Z." Other
instances can fetch and use as a starting point. **Federated state sharing falls out of
federated activities.**
Snapshots are pruning-friendly: keep latest + snapshots referenced by published
`Create{Snapshot}` activities; everything else is GC-able.
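
The snapshot tuple and the resume path can be sketched as follows. Canonical JSON plus SHA-256 stands in for dag-cbor and real CIDs, and `digest`, `snapshot`, `resume` and the toy `fold_by_type` are hypothetical names chosen for this illustration.

```python
import hashlib
import json
from functools import reduce


def digest(value) -> str:
    """Stand-in for a CID: SHA-256 over a canonical JSON rendering."""
    blob = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def fold_by_type(state: dict, activity: dict) -> dict:
    """Toy pure fold: type -> ordered list of activity ids."""
    ids = state.get(activity["type"], []) + [activity["id"]]
    return {**state, activity["type"]: ids}


def snapshot(projection_cid: str, log: list, fold, initial) -> dict:
    """Build the (tip, state, projection) tuple for a non-empty log slice."""
    state = reduce(fold, log, initial)
    body = {"tip": digest(log[-1]), "projection": projection_cid, "state": state}
    return {"cid": digest(body), **body}


def resume(snap: dict, tail: list, fold):
    """Load state from a snapshot and fold forward over newer activities."""
    return reduce(fold, tail, snap["state"])
```

Two instances that fold the same slice with the same projection arrive at the same snapshot identifier, and resuming from a snapshot gives the same state as a full replay.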
### 10.5 Reprojection on definition change
When `DefineProjection{name: "actor-state"}` is superseded by a new CID with a
different fold:
1. `define-registry` projection sees the supersession; its state advances.
2. New projection materialized **alongside** the old one — both kept live during
migration.
3. New projection runs in catch-up mode: replay from genesis (or from deepest
compatible snapshot).
4. When new projection catches up to log tip, queries cut over. Old projection state
can be retired.
5. Snapshots of old version stay around as long as referenced (e.g. for time-travel
queries against historical state under old semantics).
Changing a projection definition is **safe and online**. Cost: temporary state
duplication during catch-up. Slow folds → slow migrations, but never breakage.
For projections too expensive to fully reproject, `Update{DefineProjection}` can
declare `migrationHint: <fn from old-state to new-state>` — opt-in, used at migrator's
risk.
### 10.6 Time-travel queries
Folds are deterministic functions of `(initial-state, activity-list-prefix)`.
Time-travel is fold-up-to:
- `state-as-of(projection, activity-id-or-timestamp)` → walk to requested point,
return state.
- Snapshots act as accelerators (resume from nearest snapshot ≤ target).
- Used by sig verification ("what keys did this actor have when this activity was
signed?"), audit, "what did we believe last Tuesday."
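
A fold-up-to query with snapshot acceleration might look like this sketch. Log positions stand in for activity ids or timestamps, integers stand in for activities, and `state_as_of`, `add` and `calls` are hypothetical names used only for the illustration.

```python
from bisect import bisect_right
from functools import reduce

calls = []  # records which activities the demo fold actually processed


def add(state: int, activity: int) -> int:
    """Toy pure fold over integer 'activities': running sum."""
    calls.append(activity)
    return state + activity


def state_as_of(log, fold, initial, snapshots: dict, upto: int):
    """State after folding the first `upto` activities of `log`.

    `snapshots` maps a log position to the state at that position.
    """
    positions = sorted(snapshots)
    i = bisect_right(positions, upto)  # snapshots at or before the target
    if i:
        start, state = positions[i - 1], snapshots[positions[i - 1]]
    else:
        start, state = 0, initial
    return reduce(fold, log[start:upto], state)
```

With a snapshot at position 4, a query for position 6 folds only two activities instead of six, and the result is the same as a replay from the initial state.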
### 10.7 Projection composition
**Projections do not directly read each other's state during folding.** Preserves
locality and parallelism — every projection runs independently against the same log.
Composition via:
- **Query time** — `(query (projection actor-state) ...)` joins are SX expressions
over multiple projection states.
- **Republishing as activities** — a projection that exposes its state as input to
others publishes `Create{Snapshot}` periodically. Downstream projections fold over
those.
Direct cross-projection reads during fold would introduce ordering, cycle, and
cache-invalidation problems we don't need.
### 10.8 Querying
Three layers:
- **Raw projection state** — `GET /projections/<name>?at=<timestamp>` returns dag-cbor
(also JSON for tooling). Large states paginated by index.
- **SX queries** — `POST /query` with an SX expression that runs against one or more
projection states in pure mode. Equivalent to Datalog/GraphQL.
- **Materialized indexes** — declared on projection (`indexes:` field). Kernel
maintains as side-tables for `O(log n)` lookup.
Real-time: clients `GET /projections/<name>/subscribe` (SSE), receive deltas as
activities land. Delta is `(old-state, new-state, applied-activity-CID)`; clients can
verify by re-folding.
### 10.9 Lag, async, concurrency
- **Append is sync; projection is async.** `POST /activity` returns once activity is
durably in the log. Projections run in a separate worker pool; query results carry
`projected-up-to` so callers know whether the latest write is visible.
- **One worker per projection.** Folds are sequential, but projections run in parallel
with each other.
- **Sync option** — `POST /activity?wait-for=projection-name` blocks until the named
projection has folded the new activity. Use sparingly.
### 10.10 Failure modes
| Failure | Response |
|---------|----------|
| **Gas exhaustion** | Activity tagged `projection-failed` for this projection. State unchanged. Operator alert. |
| **SX runtime error** (assertion, type mismatch) | Same as gas: activity skipped, error logged, state unchanged. |
| **Schema violation** | Caught earlier in validation pipeline, never reaches projection. |
The log itself is always written successfully if it passes envelope + signature +
validator checks. Projection failures don't gate appending — that would couple writes
to arbitrary user-defined code.
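
The "skip, tag, keep state" behaviour from the table can be sketched like this. The helper names `apply_fold` and `project`, the failure-record shape and the toy `count_by_actor` fold are assumptions for illustration; a real kernel would catch sandbox errors rather than Python exceptions.

```python
def apply_fold(fold, state, activity, failures):
    """Run one fold step; on any error keep the old state and tag the activity."""
    try:
        return fold(state, activity)
    except Exception as exc:
        failures.append({"activity": activity["id"],
                         "tag": "projection-failed",
                         "error": repr(exc)})
        return state


def project(fold, initial, log):
    """Fold a whole log, collecting failures instead of stopping."""
    state, failures = initial, []
    for activity in log:
        state = apply_fold(fold, state, activity, failures)
    return state, failures


def count_by_actor(state: dict, activity: dict) -> dict:
    """Toy fold that raises KeyError when an activity has no actor."""
    actor = activity["actor"]
    return {**state, actor: state.get(actor, 0) + 1}
```

The log is the input here and is never modified, so a failed fold only affects this one projection's view of that activity.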
### 10.11 Operational implications
- **Projection determinism is the linchpin.** If JS and OCaml ever produce different
state for the same log + projection, federation cracks. Spec test suite must cover
projection equivalence across hosts as a first-class requirement.
- **Snapshots are eventual consensus.** Two instances publish `Create{Snapshot}` for
the same log+projection; if their CIDs match, they agree without coordination.
- **Kernel reads its own projections.** `actor-state` for sig verification;
`define-registry` for every Define* lookup. Startup sequence must bootstrap these
before serving traffic.
- **Reprojection cost is real.** Heavy projection changes mean replaying from genesis.
Encourage incremental schemas (small per-activity work, idempotent updates) and
provide profiling.
---
## 11. Sandbox & determinism
The runtime contract that makes folds (and validators, triggers, semantics) safe to
execute, and that guarantees every conforming SX host computes the same state from
the same log.
### 11.1 Three sandbox levels
Different registry entries need different power. We define three nested execution
modes; the registry entry declares which mode it requires.
| Mode | Used by | IO | Clock | Random | Determinism |
|------|---------|----|----|--------|-------------|
| **pure** | folds, validators, audience predicates, semantics, trigger `when-sx` | none | activity's own `published` only | seeded from activity CID only | required across hosts |
| **crypto** | sig suite verify, codec encode/decode | crypto primitives only | none | sign-only secure RNG | required across hosts (verify); single-host (sign) |
| **effectful** | storage backends, transports, trigger `then-sx`, some proof verifiers | per-capability grant only | host clock | host RNG | not required; single-host |
Default mode is **pure**. The other two are opt-in at registration time, and the
registration is itself a signed activity — anyone can audit which extensions claim
which powers.
### 11.2 Pure sandbox (the load-bearing one)
This is the mode every projection fold runs in. It must produce identical results on
every conforming SX host, every time.
**Allowed:**
- All spec primitives in `spec/primitives.sx` that don't perform IO (arithmetic,
comparison, predicates, string ops, collection ops, dict ops, format helpers).
- The activity being processed (full envelope), as the function's argument.
- The current state value, as the function's argument.
- A small set of fed-sx-specific deterministic primitives:
- `(activity-cid act)` → CID of the activity envelope
- `(activity-time act)` → ISO timestamp from `published`
- `(actor-state-as-of state-snapshot actor-id activity-time)` → if the projection
has been declared dependent on `actor-state` (see §10.7), reads from a snapshot
of that projection at the activity's timestamp
- `(seeded-rng cid)` → deterministic PRNG seeded from a CID, returns a stream of
uniform values
**Forbidden:**
- All IO: HTTP, file, network, stdin/stdout, environment.
- Wall-clock access. The host's `now` is not in scope; the only time available is
`(activity-time act)`.
- Host-seeded randomness. Only `seeded-rng` (CID-derived) is available.
- Mutation outside the returned value. Enforced by the SX evaluator's lack of
ambient mutable bindings; folds may use local `let` and mutation within their own
closure but cannot reach outside.
- Calling other registry entries by name. Composition happens at query time, not
fold time (see §10.7).
**Enforced by:** evaluator runs the fold with the IO platform stripped to nothing.
The fed-sx kernel constructs a `pure-platform` (no fetch, no query, no action, no
DOM, no storage) and uses it as the sole evaluator platform when calling the fold.
Any IO primitive call raises a hard error caught as a fold failure.
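
One way a CID-seeded stream such as `seeded-rng` could behave is sketched below. The construction (SHA-256 in counter mode, yielding 64-bit integers) is an assumption chosen for illustration, not the specified algorithm; integers are used so the sketch does not depend on floats.

```python
import hashlib
from itertools import islice
from typing import Iterator, List


def seeded_rng(cid: str) -> Iterator[int]:
    """Deterministic stream of integers in [0, 2**64), derived only from `cid`."""
    counter = 0
    while True:
        block = hashlib.sha256(f"{cid}/{counter}".encode("utf-8")).digest()
        yield int.from_bytes(block[:8], "big")
        counter += 1


def first(n: int, cid: str) -> List[int]:
    """First n values of the stream for a given CID."""
    return list(islice(seeded_rng(cid), n))
```

Every host that implements the same construction gets the same values for the same CID, which is what lets a fold use pseudo-randomness without breaking replay.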
### 11.3 Crypto sandbox
Sig suites and codec encode/decode need hash + crypto + encoding primitives but
nothing else. They're still deterministic across hosts (verify case) but get a
narrower platform than effectful, wider than pure.
**Additional primitives over pure:**
- `(sha2-256 bytes)`, `(sha3-256 bytes)`, `(blake3 bytes)`, …
- `(rsa-verify pubkey msg sig)`, `(ed25519-verify pubkey msg sig)`, …
- `(rsa-sign privkey msg)`, `(ed25519-sign privkey msg)` — sign-only; requires the
caller to supply a secure RNG handle (which is *not* in pure mode)
- `(cbor-encode value)`, `(cbor-decode bytes)` — for codecs implementing CBOR variants
- `(base32-encode bytes)`, `(base58btc-encode bytes)`, `(multibase-encode tag bytes)`
- `(multihash-encode tag digest-bytes)`, `(multihash-decode bytes)`
- `(cid-encode codec mhash)`, `(cid-decode bytes)`
**Sign vs verify:** verify is pure (deterministic). Sign is not — it consumes
randomness. fed-sx draws a clean line: signing happens *outside* registry-entry SX
(it's an operation the kernel/runtime performs on behalf of the actor with their
private key); registry SX only ever *verifies*. This keeps the pure↔crypto distinction
tractable.
### 11.4 Effectful sandbox
Storage backends, transports, trigger `then-sx`, and proof verifiers that need the
network (e.g. blockchain RPC for on-chain proof verification) all need real IO.
These are not used to compute projected state; they're how the substrate interacts
with the outside world.
**Capability-granted primitives.** The registration activity declares the
capabilities the entry needs:
```sx
(activity 'Create
  :object {:type "DefineStorage"
           :where-tag "ipfs"
           :capabilities [{:type "http-client" :allowlist ["http://localhost:5001/*"]}
                          {:type "fs-read" :path-prefix "/var/cache/fed-sx/ipfs/"}
                          {:type "fs-write" :path-prefix "/var/cache/fed-sx/ipfs/"}]
           :put-sx (fn (cid bytes) ...)
           :get-sx (fn (cid) ...)})
```
**Capability types** (initial set; extensible):
- `http-client` with `allowlist` (URL prefix patterns)
- `http-server` with `path-prefix` (mounts a sub-handler)
- `fs-read` / `fs-write` with `path-prefix` (chroot-style)
- `subprocess` with `command-allowlist`
- `clock-read` (wall clock; granted if registry entry needs to timestamp something)
- `random-bytes` (host CSPRNG)
**No ambient authority.** Default capability set is empty; every capability is
explicit, declared, signed, and auditable. A peer can refuse to load a registry
entry whose capability claim is unacceptable to them.
**Capabilities are content-addressed.** Each capability descriptor has a CID. The
substrate maintains a registry of "capability CIDs that this instance trusts to
honour" — operator policy, not protocol.
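
How a host might enforce two of these grants is sketched below. It assumes allowlist entries ending in `*` are plain URL prefixes and that `path-prefix` is checked after path normalisation; `http_allowed` and `fs_allowed` are hypothetical helper names, and URL normalisation is left out of the sketch.

```python
import posixpath
from typing import Iterable


def http_allowed(url: str, allowlist: Iterable[str]) -> bool:
    """True if `url` matches an allowlist entry (trailing * means prefix)."""
    for pattern in allowlist:
        if pattern.endswith("*"):
            if url.startswith(pattern[:-1]):
                return True
        elif url == pattern:
            return True
    return False


def fs_allowed(path: str, prefix: str) -> bool:
    """True if `path` stays inside `prefix` once '..' segments are resolved."""
    normal = posixpath.normpath(path)
    root = posixpath.normpath(prefix)
    return normal == root or normal.startswith(root + "/")
```

Normalising before the prefix comparison is what stops a path such as `/var/cache/fed-sx/ipfs/../secrets` from slipping past a naive string check.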
### 11.5 Gas and resource accounting
Each sandbox call gets a budget:
- **CEK gas** — every evaluator step costs 1 unit; primitive calls cost a
per-primitive amount declared in `spec/primitives.sx`. Default budget: 100k units per
fold call. Tunable per-projection via `DefineProjection.gas-limit`.
- **Memory ceiling** — peak heap size for the fold call. Default 64 MB. Tunable.
- **IO budget** (effectful only) — bytes read/written and network calls per
invocation, granted separately per capability.
- **Wall-clock budget** (effectful only) — max real-time before forced termination.
Exceeding any budget is a hard failure; the call returns an error value, the fold's
state is unchanged, and the activity is tagged `projection-failed` for that projection (§11.7).
Gas accounting is part of the spec — every conforming host must charge the same
units for the same operations, so "this fold runs out of gas" is a deterministic
property of the (projection, activity) pair, not a host-specific outcome.
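A minimal sketch of the gas budget, assuming a meter object that the evaluator charges on every step. `GasMeter`, `OutOfGas`, `run_fold`, and the shape of the failure tag are hypothetical names chosen for illustration; only the default budget of 100k units and the "state unchanged on failure" rule come from the text.

```python
class OutOfGas(Exception):
    """Raised when a fold call exceeds its gas budget."""

class GasMeter:
    def __init__(self, limit=100_000):
        self.remaining = limit

    def charge(self, units=1):
        # Every evaluator step or primitive call charges before it runs.
        if units > self.remaining:
            self.remaining = 0
            raise OutOfGas()
        self.remaining -= units

def run_fold(fold, state, activity, gas_limit=100_000):
    """Run one fold call; on exhaustion return the old state plus a failure tag."""
    meter = GasMeter(gas_limit)
    try:
        return fold(state, activity, meter), None
    except OutOfGas:
        return state, {"projection-failed": True, "reason": "resource-exhausted"}

def count_items(state, activity, meter):
    """Toy fold: counts items, charging one unit per item."""
    total = state
    for _ in activity["items"]:
        meter.charge(1)
        total += 1
    return total
```

Because the meter is charged by a fixed rule rather than by wall-clock time, the same (projection, activity) pair exhausts its budget at the same point on every host.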
### 11.6 Determinism gotchas
The pure sandbox is only as deterministic as its primitives. Worth nailing:
- **Floating point.** IEEE 754 binary operations are bitwise-identical across
conforming hosts, but transcendentals (`sin`, `cos`, `log`, `exp`) are *not* —
libm implementations differ. **Decision: floats are forbidden in pure mode unless
the projection declares `requires-deterministic-floats: true` and uses only the
IEEE 754 basic operations (+, -, *, /, sqrt, comparison, conversion).** For exact
arithmetic, use integers or rationals (fed-sx will provide a rational primitive).
- **Map / dict iteration order.** Must be sorted-key always in pure mode. The SX
spec mandates this for `for-each` and `map` over dicts; we tighten it: pure mode
forbids relying on insertion order.
- **String encoding.** All strings are UTF-8 NFC at ingestion; pure-mode operations
use byte-level comparison after normalization. Codepoint operations (`length`,
`substring`) return identical results across hosts because they operate on the
normalized form.
- **Integer overflow.** Pure mode uses arbitrary-precision integers (the SX spec
default). No undefined behaviour. Overflow is impossible.
- **Equality.** Structural equality (`equal?`) compared across hosts must yield the
same result for the same canonical-CID values. Implies dict equality is
order-independent (as it should be), and float equality follows IEEE 754 (NaN ≠
NaN; +0.0 = -0.0).
- **Error values.** When a primitive errors, the error must be representable as a
dag-cbor value with a stable CID across hosts. Reserve a `{:error :type ... :msg
...}` shape; standard error types defined in the spec.
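Two of the rules above (NFC normalisation and sorted-key iteration) are easy to show concretely. The sketch below is an assumption-laden illustration: the helper names are invented, and ordering keys by their UTF-8 bytes is one plausible reading of "sorted-key", not something the text specifies.

```python
import unicodedata

def normalize(s):
    """Normalise a string to NFC, as done at ingestion."""
    return unicodedata.normalize("NFC", s)

def sorted_items(d):
    """Return dict entries in a host-independent order.

    Keys are NFC-normalised, then ordered by their UTF-8 bytes
    (the byte-order choice is an assumption of this sketch).
    """
    normalized = {normalize(k): v for k, v in d.items()}
    return sorted(normalized.items(), key=lambda kv: kv[0].encode("utf-8"))
```

Two hosts that build the same dict in different insertion orders get the same sequence from `sorted_items`, so a fold that iterates over it cannot observe insertion order.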
### 11.7 Failure model
A pure-mode call ends in one of three terminal states:
1. **Success** — returns a value. Fold uses it as new state.
2. **Sandbox violation** — IO attempted, capability denied, etc. Returns a stable
error value; fold's state is unchanged; activity tagged
`{:projection-failed :reason :sandbox-violation :detail ...}`.
3. **Resource exhaustion** — gas, memory, IO budget exceeded. Same handling as
sandbox violation but with `:reason :resource-exhausted`.
Crypto-mode failures (e.g. invalid signature) are *return values*, not exceptions —
verify returns boolean, sign returns either a sig or an error. This forces callers
to handle failure explicitly.
Effectful-mode failures (network down, disk full) propagate to the operator as
errors but never affect projected state. The substrate retries effectful operations
according to the registry entry's policy (declared at registration).
### 11.8 Conformance testing
Cross-host equivalence isn't aspirational; it's tested.
- **Spec test suite** ships projection equivalence tests: a corpus of (log slice,
projection CID, expected snapshot CID) tuples. Every conforming SX host must
produce the expected snapshot CID for each input.
- **Validator equivalence tests** likewise: (validator CID, activity, expected
result).
- **Codec equivalence tests:** (codec CID, value, expected encoded bytes), in both
encode and decode directions.
- **Sandbox isolation tests:** "this fold attempts to call `fetch`; expected
outcome: sandbox violation error with stable CID."
Hosts run the conformance suite to claim "fed-sx pure-mode conformance." Failures
are publishable as `Test{result: failed, host: ..., projection: ...}` activities —
the conformance graph itself is federated.
### 11.9 Operational implications
- **The pure sandbox is the heart of cross-host federation.** Every divergence is a
spec bug or a host bug; both are caught by snapshot CID mismatches and surfaced
via `Test` activities.
- **Capability descriptors are the new audit trail.** "What can the IPFS storage
backend do?" is a question with a precise answer at any timestamp — the registered
capability CIDs.
- **Floats are mostly absent.** This is unusual but defensible — most state in the
substrate is ids, counts, sets, references. Numerical computation belongs in
effectful registry entries (e.g. an analytics projection that publishes summaries
as activities, projected by a downstream pure projection that just stores them).
- **Gas is part of the protocol.** Two hosts disagreeing about whether a fold runs
out of gas is a conformance failure. Spec primitive gas costs are normative.
## 12. Bootstrap & genesis
How a fresh instance starts with no log, where the initial registry entries come
from, and how the kernel evolves without bricking peers.
### 12.1 The genesis problem
The substrate is "everything is a `Define*` activity in the log." But on a fresh
instance the log is empty — so there are no `Define*` activities to tell the kernel
what `Create` means, how to verify a signature, or what dag-cbor is. Strict
turtles-all-the-way-down would deadlock startup.
Solution: **the kernel ships with a baked-in genesis bundle** containing the minimal
set of definitions it needs to interpret its own log. The bundle is a constant of
the kernel binary; its CID is hardcoded; the kernel verifies on startup that the
bundle matches its hardcoded CID. After that, everything (including superseding the
bundled definitions themselves) goes through the activity log.
The genesis bundle is *not* itself a federated artifact in the AP sense. It's the
dictionary you need before you can read any activities. Optionally, an actor can
`Create{GenesisRecord}` as their first published activity to advertise which genesis
they started from — informational, not load-bearing.
### 12.2 Genesis bundle contents
Minimal viable bundle (dag-cbor object, content-addressed):
```
{
"type": "fed-sx-genesis",
"kernel-version": "1.0.0",
"envelope-spec": { ... }, // canonical schema for activity envelope
"object-spec": { ... }, // canonical schema for object envelope
"definitions": {
"activity-types": {
"Create": { "schema": <sx>, "semantics": <sx> },
"Update": { "schema": <sx>, "semantics": <sx> },
"Delete": { "schema": <sx>, "semantics": <sx> },
"Announce": { "schema": <sx>, "semantics": <sx> }
},
"object-types": {
"SXArtifact": { "schema": <sx> },
"Note": { "schema": <sx> },
"Tombstone": { "schema": <sx> },
"DefineActivity": { "schema": <sx> },
"DefineObject": { "schema": <sx> },
"DefineProjection": { "schema": <sx> },
"DefineValidator": { "schema": <sx> },
"DefineCodec": { "schema": <sx> },
"DefineTransport": { "schema": <sx> },
"DefineAudience": { "schema": <sx> },
"DefineProof": { "schema": <sx> },
"DefineStorage": { "schema": <sx> },
"DefineTrigger": { "schema": <sx> },
"DefineSigSuite": { "schema": <sx> },
"Snapshot": { "schema": <sx> }
},
"sig-suites": {
"rsa-sha256-2018": { "verify": <sx>, "key-format": <sx> },
"ed25519-2020": { "verify": <sx>, "key-format": <sx> }
},
"codecs": {
"dag-cbor": { "encode": <sx>, "decode": <sx> },
"raw": { "encode": <sx>, "decode": <sx> },
"dag-json": { "encode": <sx>, "decode": <sx> }
},
"projections": {
"activity-log": { "initial-state": ..., "fold": <sx> },
"by-type": { "initial-state": ..., "fold": <sx> },
"by-actor": { "initial-state": ..., "fold": <sx> },
"by-object": { "initial-state": ..., "fold": <sx> },
"actor-state": { "initial-state": ..., "fold": <sx> },
"define-registry": { "initial-state": ..., "fold": <sx> },
"audience-graph": { "initial-state": ..., "fold": <sx> }
},
"validators": {
"envelope-shape": { "predicate": <sx> },
"signature": { "predicate": <sx> },
"type-schema": { "predicate": <sx> }
},
"audience-predicates": {
"Public": { "member-of": <sx> },
"Followers": { "member-of": <sx> },
"Direct": { "member-of": <sx> }
}
},
"capability-types": [ // schema for capability descriptors
"http-client", "http-server",
"fs-read", "fs-write",
"subprocess", "clock-read", "random-bytes"
]
}
```
Each definition's body is **SX source**, not bytecode. The kernel evaluates it at
startup using the same SX evaluator user-published `Define*` artifacts use — there
is no privileged "native" path. The bootstrap is just SX loaded from the binary
instead of from the log.
### 12.3 Hardcoded CID and verification
The kernel binary contains:
- The full genesis bundle (embedded as bytes).
- The CID computed over those bytes at build time.
On startup:
1. Compute the actual CID of the embedded bundle.
2. Compare to the hardcoded CID.
3. **Mismatch → refuse to start.** Either the binary has been tampered with or the
build process is broken. Either way, the operator should know immediately.
4. **Match → proceed.** Every running instance with a given kernel binary has
byte-identical bootstrap state — no version drift possible within a binary.
The genesis CID is exposed at `GET /.well-known/sx-capabilities` so peers can see
which kernel version they're talking to.
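The startup check can be sketched as follows. A real kernel would compute a CID (multihash plus multicodec) over the embedded bytes; here a SHA-256 hex digest stands in for it, and `GenesisMismatch`, `bundle_digest`, and `verify_genesis` are hypothetical names.

```python
import hashlib

class GenesisMismatch(Exception):
    """Raised when the embedded bundle does not match the expected digest."""

def bundle_digest(bundle_bytes):
    # Stand-in for CID computation: SHA-256 over the embedded bundle bytes.
    return hashlib.sha256(bundle_bytes).hexdigest()

def verify_genesis(bundle_bytes, expected_digest):
    """Return the digest on match; raise so the caller can refuse to start."""
    actual = bundle_digest(bundle_bytes)
    if actual != expected_digest:
        raise GenesisMismatch(f"expected {expected_digest}, got {actual}")
    return actual
```

The expected digest would be produced by the build pipeline and compiled into the binary, so a mismatch points to either tampering or a broken build.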
### 12.4 Fresh instance startup sequence
```
1. Load and verify genesis bundle (panic on mismatch)
2. Parse all definition SX sources, instantiate evaluator closures
3. Initialize registries from definitions (in the order: codecs → sig-suites →
validators → object-types → activity-types → audience-predicates → projections)
4. Open log file (create if missing)
5. Replay any existing log: for each activity, validate, then fold into each
projection (resuming from snapshots where available)
6. Load or generate actor keypair (filesystem path from config)
7. If actor has never published a Create{Person} for itself, generate and append
one as the first activity of this instance's outbox
8. Initialize HTTP server, wire routes
9. Open inbox: start accepting federated activities
10. Mark instance as ready
```
Steps 1-3 are the bootstrap. Step 5 is replay-and-project. Step 7 is the
"actor genesis" — every instance has at least one local actor; it publishes itself
as its first activity, and that activity (signed by the actor's own key) anchors all
subsequent activity from that actor.
### 12.5 First activity — actor creation
Every fresh actor's outbox starts with:
```sx
(activity 'Create
:id "https://next.rose-ash.com/actors/giles/activities/<uuid>"
:actor "https://next.rose-ash.com/actors/giles"
:published "<iso-timestamp>"
:to ["https://www.w3.org/ns/activitystreams#Public"]
:object <full actor doc with publicKeys array>
:signature <signed by the new key over the activity envelope>)
```
Self-signed: the activity introduces the key it's signed with. Verifiers fetch the
actor doc embedded in the activity, find the key, verify against the activity. This
is the trust-on-first-encounter for a new actor — the same model AP uses.
The kernel emits this automatically on first startup if the actor has no prior
activity. Subsequent actor changes (key rotation, profile updates) are `Update`
activities signed by an existing key.
### 12.6 Joining federation
A new instance has no peers initially. Discovery is operator-driven for v1:
1. Operator configures one or more peer URLs (or a well-known seed list).
2. Instance fetches peer's actor doc and `/.well-known/sx-capabilities`.
3. Instance verifies it can interpret the peer's activities (envelope compatible,
sig suites overlap). Reports incompatibilities to operator.
4. If compatible, instance follows peer's primary actor (`POST /inbox` with a
`Follow` activity).
5. Peer streams or backfills outbox to this instance.
6. Activities arrive, validate, fold into local projections.
Discovery beyond manual config (e.g. peer recommendations, federation directories)
is a v2 concern.
### 12.7 Kernel version evolution
The substrate must evolve without forcing every instance to upgrade in lockstep.
Three rules:
**Rule 1: The activity envelope shape is forward-compatible only.**
We may *add* optional fields to the envelope; we may not change semantics or remove
fields. Old activities still validate under new kernels. New activities with new
fields are accepted by old kernels (which ignore the unknown fields, store the raw
envelope, and project conservatively).
This is the AP discipline. We adopt it strictly. If we ever need a breaking envelope
change, it's a major version (fed-sx 2.0) and instances at different majors don't
federate directly — only via bridges.
**Rule 2: Everything else evolves via supersession.**
New sig suite, new codec, new projection definition, new validator: publish a
`Define*` activity that supersedes the old one. Both old and new versions stay valid
at their respective timestamps. Old activities verify under old definitions; new
activities use new definitions. Time-aware lookup (§9.6, §10.6) makes this work.
**Rule 3: New genesis bundles supersede old ones via published activities.**
When the kernel team ships a new version with an updated bundle:
- The new bundle's CID is different.
- Operators upgrading the kernel get the new bundle automatically.
- The new bundle's *contents* are largely supersession `Update{DefineProjection,
DefineValidator, ...}` activities relative to the old bundle's definitions.
- A peer running the old kernel sees these `Update` activities (when they appear in
followed outboxes) and *can* opt to load them dynamically (§12.8) or stay on the
old bundle definitions until the operator upgrades.
In other words: the kernel binary evolution and the activity-log evolution are
parallel tracks. The binary determines what's *built in*; the log determines what's
*currently active*. They converge over time but don't have to be lockstep.
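The time-aware lookup that Rule 2 relies on might look roughly like this: each name keeps a history of superseding definitions, and verification picks the one in force at an activity's `published` time. `VersionedRegistry` and its methods are hypothetical, and the sketch assumes timestamps are ISO-8601 UTC strings in one fixed format so that string order equals time order.

```python
from bisect import bisect_right

class VersionedRegistry:
    def __init__(self):
        # name -> list of (effective_from, definition_cid), kept sorted by time
        self._history = {}

    def supersede(self, name, effective_from, definition_cid):
        entries = self._history.setdefault(name, [])
        entries.append((effective_from, definition_cid))
        entries.sort()

    def active_at(self, name, timestamp):
        """Return the definition CID in force at timestamp, or None if none yet."""
        entries = self._history.get(name, [])
        times = [t for t, _ in entries]
        i = bisect_right(times, timestamp)
        return entries[i - 1][1] if i else None
```

Old activities keep resolving to the definition that was active when they were published, while new activities pick up the superseding one.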
### 12.8 Dynamic Define* loading
When an instance receives an activity of `type: "PinV3"` and has no `DefineActivity{
name: "PinV3"}` in its define-registry, it has three options (operator policy):
- **Strict mode** — store the activity envelope (it's valid AP), tag it `unknown-type`
in `by-type`, do not project semantics. Operator must explicitly load the
definition to enable projection.
- **Permissive mode** — fetch the `DefineActivity{name: "PinV3"}` artifact (its CID
is in the activity's `capabilities-required` list), validate, evaluate the
semantics SX (in pure sandbox), reproject the activity. Operator notified.
- **Trusted-peers-only mode** — like permissive, but only auto-loads `Define*` from
actors on a configured trust list.
Default for fed-sx v1: **strict mode**. Operators opt-in to broader policies.
This lets the substrate genuinely live-extend — new verbs land via federation, no
binary upgrade — while keeping a clean audit trail of what got loaded when.
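The three operator policies reduce to a small decision function. The policy strings follow the mode names above; the function name, the return values, and the trust-list representation are assumptions made for this sketch.

```python
def on_unknown_type(policy, sender, trusted_actors=frozenset()):
    """Decide what to do with an activity whose type has no loaded definition."""
    if policy == "strict":
        # Store the envelope, tag it unknown-type, do not project.
        return "store-unprojected"
    if policy == "permissive":
        # Fetch the definition, validate it, then reproject.
        return "fetch-and-load"
    if policy == "trusted-peers-only":
        return "fetch-and-load" if sender in trusted_actors else "store-unprojected"
    raise ValueError(f"unknown policy: {policy}")
```

In every branch the activity envelope itself is stored; the policy only decides whether its definition is fetched and evaluated.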
### 12.9 Genesis as the substrate's manifest
A useful framing: the genesis bundle is the substrate's **manifest** (in the
package-manager sense). It declares "this kernel ships with these definitions, identified by
these CIDs, and this is what the kernel does until the log says otherwise."
Two instances with the same genesis CID start identical. Two instances with
different genesis CIDs can federate as long as their *active* registry states (after
log replay) overlap enough.
The genesis bundle is also the **conformance reference**: a kernel implementation
claims fed-sx v1.0 conformance by reproducing the standard genesis bundle's CID
from its own build of the included SX sources. If two implementations build the same
spec sources and produce different CIDs, one of them is non-conformant. Cheap,
deterministic conformance check.
### 12.10 Operational implications
- **Build-time CID computation is part of the kernel build.** The build pipeline
must include the genesis-bundling step and embed the resulting CID. Mismatch
protection requires the binary to know what it expects.
- **Genesis evolution is a deliberate kernel-team decision.** Adding a new bundled
projection or sig suite is a kernel release, not a federated activity.
(User-defined projections still federate normally.)
- **Strict-mode default protects against malicious extensions.** Operators have to
consciously opt into auto-loading remote `Define*`. This trades convenience for
security — appropriate for v1.
- **Cross-major federation is a bridge problem.** If/when fed-sx 2.0 ships with an
envelope change, bridges between v1 and v2 are themselves federated artifacts —
built by anyone, signed, audited.
## 13. Federation mechanics
How instances exchange activities, how peers subscribe, how new followers backfill,
how delivery survives unreliable networks, and how the substrate resists abuse.
### 13.1 Push, pull, hybrid
ActivityPub canonically uses **push**: actor A publishes by POSTing each delivery to
each follower's inbox URL. This gives low latency and clear delivery semantics, but
requires a reliable per-recipient delivery queue and falls over when peers go down.
fed-sx supports both push and pull, with a **push-primary, pull-fallback** model:
- **Push** is the default delivery mechanism. When an activity is appended to A's
outbox, A's delivery worker posts it to each follower's inbox.
- **Pull** is always available: any peer can `GET /actors/<id>/outbox?since=<cursor>`
and stream activities in order. Used for backfill, recovery from delivery gaps,
and instances that prefer pull-only operation.
- **Hybrid in practice:** push delivers *notifications* (the activity itself, or a
pointer to its CID); receivers may pull the full content if not inlined. Useful
when the activity body is large.
Operators can configure their actors as push-only, pull-only, or hybrid. The
default is hybrid.
### 13.2 The Follow lifecycle
AP-standard, slightly tightened:
```sx
;; A wants to follow B
(activity 'Follow
:actor "https://a.example/actors/alice"
:object "https://b.example/actors/bob")
;; → POST to B's inbox
;; B accepts (or rejects)
(activity 'Accept
:actor "https://b.example/actors/bob"
:object <follow-activity-id-or-embedded>)
;; → POST to A's inbox
;; A unfollows later
(activity 'Undo
:actor "https://a.example/actors/alice"
:object <follow-activity-id-or-embedded>)
;; → POST to B's inbox
```
State derived by the `audience-graph` projection on each instance:
- `(followers actor)` — set of actors who follow `actor`, projected from
`Accept{Follow}` activities in `actor`'s outbox (and the inverse via received
`Follow` activities).
- `(following actor)` — symmetric.
**Auto-accept by default.** Public actors auto-publish `Accept` for any incoming
`Follow`. Locked actors require manual approval, implemented as an operator UI that
publishes the `Accept` (or `Reject`) once a human decides.
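As an illustration of how the follower set could be derived, here is a pure fold over `Follow` / `Accept` / `Undo` activities. It is a simplified sketch, not the actual `audience-graph` projection: it assumes `object` always carries the follow activity's id (never an embedded activity), and the state layout is invented.

```python
from functools import reduce

INITIAL = {"follows": {}, "followers": {}}

def fold_audience(state, activity):
    """Pure fold step: returns a new state and never mutates the old one."""
    follows = dict(state["follows"])  # follow id -> (follower, target)
    followers = {k: set(v) for k, v in state["followers"].items()}
    kind, actor, obj = activity["type"], activity["actor"], activity["object"]
    if kind == "Follow":
        follows[activity["id"]] = (actor, obj)
    elif kind == "Accept" and obj in follows:
        follower, target = follows[obj]
        if actor == target:  # only the followed actor may accept
            followers.setdefault(target, set()).add(follower)
    elif kind == "Undo" and obj in follows:
        follower, target = follows[obj]
        if actor == follower:  # only the follower may undo their own follow
            followers.get(target, set()).discard(follower)
            del follows[obj]
    return {"follows": follows, "followers": followers}
```

Replaying a log is then `reduce(fold_audience, log, INITIAL)`, and `(followers actor)` corresponds to `state["followers"].get(actor, set())`.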
### 13.3 Backfill
When A first follows B, A wants B's history. Four supported modes:
| Mode | Mechanism | Trade-off |
|------|-----------|-----------|
| **No backfill** | Just stream new activities going forward | Cheapest, missing context for new followers |
| **Pull paginated** | `GET /outbox?since=epoch&limit=100` repeatedly | Standard, slow for large outboxes |
| **Snapshot fetch** | Find latest `Create{Snapshot}` published by B for the projection of interest, fetch + verify, then pull only activities after the snapshot's tip | Fast, requires B to publish snapshots |
| **Bundle fetch** | Out-of-band: B publishes a CID for an export bundle (a dag-cbor list of activities + actor doc + sig suite verification metadata); A fetches once, validates the chain, replays | Fastest for cold starts; bundle creation is opt-in |
Default: snapshot fetch when available, paginated pull otherwise.
A new instance joining federation typically combines: snapshot-fetch the
`actor-state` and `define-registry` projections from a trusted peer (so it knows who
exists and what verbs are defined), then incrementally backfill specific actors of
interest.
### 13.4 Delivery queue and retry
Every push delivery attempt has a fate:
| Outcome | Action |
|---------|--------|
| 2xx | Mark delivered |
| 3xx | Follow redirect (with limit) |
| 4xx (except 429) | Mark *permanently failed* — peer rejected the activity. Log; don't retry. |
| 429 | Honour `Retry-After`; reschedule |
| 5xx | Exponential backoff; reschedule |
| Connection error | Exponential backoff; reschedule |
**Retry schedule** (default, tunable per peer):
```
1 min, 5 min, 15 min, 1 h, 4 h, 12 h, 24 h, 48 h, 96 h
```
After the last attempt fails, the activity is **abandoned for push** but remains in
A's outbox. Followers can still pull it via `GET /outbox?since=...`. The peer will
eventually catch up if they come back online and pull. Push is best-effort; pull is
the source of truth.
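The outcome table and retry schedule can be combined into one decision function. This sketch assumes the schedule lists the delays before each retry (so nine retries follow the initial attempt) and that `Retry-After` has already been parsed into seconds; `next_action` and its return tuples are hypothetical.

```python
# Default schedule from above, in seconds: 1 min, 5 min, 15 min, 1 h, 4 h, 12 h, 24 h, 48 h, 96 h
RETRY_DELAYS_SECONDS = [60, 300, 900, 3600, 14400, 43200, 86400, 172800, 345600]

def next_action(status, retries_used, retry_after=None):
    """Classify one delivery attempt.

    status: HTTP status code, or None for a connection error.
    retries_used: how many retries have already been made for this delivery.
    Returns (action, delay_seconds_or_None).
    """
    if status is not None:
        if 200 <= status < 300:
            return ("delivered", None)
        if 300 <= status < 400:
            return ("redirect", None)
        if status == 429 and retry_after is not None:
            return ("retry", retry_after)  # honour the peer's Retry-After
        if 400 <= status < 500 and status != 429:
            return ("failed", None)  # permanent: the peer rejected the activity
    # 5xx, connection error, or 429 without Retry-After: back off per schedule.
    if retries_used >= len(RETRY_DELAYS_SECONDS):
        return ("abandoned", None)  # still pullable from the outbox
    return ("retry", RETRY_DELAYS_SECONDS[retries_used])
```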
**Persistent queue.** Delivery state is itself stored in the local instance — it's
operator-internal, not federated. (Could be a regular SQLite table; doesn't need to
be a projection because it's not state-the-world-cares-about.) On instance restart,
the queue resumes from where it left off.
**Queue-as-projection (alternative):** for instances that want every aspect to be
log-derived, the delivery state could be a local-only projection over a stream of
`Attempt` / `DeliverySuccess` / `DeliveryFailure` activities written to a private
local-only outbox. Out of scope for v1 but the design admits it.
### 13.5 Audience-respecting delivery
Each activity carries `to`, `cc`, `bto`, `bcc`. The delivery worker computes the
**delivery set**: union of explicit recipients + (if `as:Public` or `Followers` in
audience) the actor's followers projection.
- `bto` and `bcc` are stripped before delivery (recipients shouldn't see who else is
blind-copied).
- **Receivers honour audience.** When an instance receives an activity it should
not be in the audience for (e.g. a `Direct` activity to someone else, leaked via a
misconfigured peer), it logs and discards. Validators in the inbound pipeline
enforce this.
- **Public ≠ unlisted.** `to: as:Public` means deliver to followers AND make
publicly fetchable AND show in public projections. Some actors prefer "publicly
fetchable but not pushed broadly" — `cc: as:Public` with `to: Followers`.
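A small sketch of the delivery-set computation, under stated assumptions: activities are plain dicts, the `Followers` audience is represented by the literal token `"Followers"`, and the followers set is passed in from the projection. `delivery_plan` is a hypothetical name.

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
FOLLOWERS = "Followers"  # placeholder token; the real identifier is not specified here

def delivery_plan(activity, followers):
    """Return (recipients, wire_activity) for one outbound activity."""
    audience = []
    for field in ("to", "cc", "bto", "bcc"):
        audience.extend(activity.get(field, []))
    # Explicit recipients are every addressed actor that is not a collection token.
    recipients = {a for a in audience if a not in (PUBLIC, FOLLOWERS)}
    if PUBLIC in audience or FOLLOWERS in audience:
        recipients |= set(followers)
    # Blind-copy fields are stripped from the copy that goes on the wire.
    wire = {k: v for k, v in activity.items() if k not in ("bto", "bcc")}
    return recipients, wire
```

Blind-copied actors still receive the activity; they are simply absent from the copy every recipient sees.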
### 13.6 Spam and abuse posture
ActivityPub has well-known abuse vectors (Mastodon's history is instructive). fed-sx
defends in layers:
**Signature verification.** Every inbound activity must have a valid signature
matching an actor whose key was active at `published`. Forgeries are dropped at the
signature stage of the inbound pipeline (§14.2). Necessary but not sufficient: signatures only
prove the message wasn't tampered with, not that the sender is benign.
**Per-source rate limits.** Per-actor and per-instance request rate limits on
`/inbox`. Default: 100/min per actor, 1000/min per instance. Exceeded → 429.
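One simple way to enforce such limits is a fixed-window counter keyed by actor or instance. This is only an illustrative choice (the text does not name an algorithm), and `RateLimiter` is a hypothetical class; the clock is passed in explicitly so the behaviour is easy to test.

```python
class RateLimiter:
    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self._windows = {}  # key -> (window_start, count)

    def allow(self, key, now):
        """Return True if the request is within budget; False means respond 429."""
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # a new window begins
        if count >= self.limit:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True
```

An inbox handler would hold two instances, for example `RateLimiter(100)` keyed by actor and `RateLimiter(1000)` keyed by instance, and reject if either returns `False`.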
**Per-instance trust state.** Three categories, operator-configured (and
overridable per actor):
- **Trusted** — auto-accept, auto-load `Define*` (if permissive mode), no
rate-multiplier penalty.
- **Default** — accept signed activities, standard rate limits, do not auto-load
`Define*`.
- **Suspended** — drop all inbound activities, refuse outbound delivery, do not
fetch artifacts. Operator decision (e.g. spam source, harassment instance).
Trust state is local-only (operator policy); it is not federated. Different
instances can disagree.
**Audience refusal.** Activities not addressed to anyone on this instance (no local
followers, not `as:Public`, not `to:` a local actor) are dropped on receipt.
Discourages spam targeting random instances.
**Content validators.** Registry-driven content moderation: a `DefineValidator`
with `applies-to: "inbound"` runs against every inbound activity and can reject
based on content rules. Examples: link-spam detection, ML moderation models served
via an effectful validator (note: effectful validators are a special case — they
*can* fail-closed without affecting determinism, because validators happen *before*
projection and don't contribute to projected state).
**Capability vetting.** If an inbound activity declares `capabilities-required`
that includes definitions this instance hasn't loaded *and* trust policy is
strict-mode, the activity is quarantined (stored but not projected) pending operator
review.
**Federation circuit breakers.** Per-peer error rate triggers temporary defederation:
if a peer is sending malformed activities, exceeding rate limits, or signing with
revoked keys, automatic suspension for an exponential cool-off.
### 13.7 Discovery
How an instance finds other instances and actors:
- **WebFinger** (RFC 7033). `GET /.well-known/webfinger?resource=acct:user@host`
returns links to actor URLs. AP-standard. fed-sx implements.
- **Well-known capabilities.** `GET /.well-known/sx-capabilities` (§7) for
cross-instance compatibility checks.
- **Manual peer config.** Operators add peer instance URLs to their config.
- **Peer recommendations.** An instance can publish `Recommend{actor}` activities
pointing at peers it considers worth following. Receivers can use these as
discovery hints (subject to local trust). Out of scope for v1 but the verb is
reservable.
- **Federation directories.** Community-maintained lists of instances; an instance
can opt into being listed by publishing a `Directory{listed-by}` activity. v2
concern.
For v1: WebFinger + capabilities + manual config. Discovery beyond that is opt-in
via standard verbs.
### 13.8 Streaming and real-time
Two streaming mechanisms:
- **Outbox SSE** — `GET /actors/<id>/outbox/stream` opens a Server-Sent Events
connection. Each new activity appended to the outbox is sent as an event. Allows
pull-style federation peers to maintain a live connection without polling.
- **Projection SSE** — `GET /projections/<name>/subscribe` (§10.8) streams projection
deltas. Useful for clients (browsers) wanting reactive views.
Both are local-only mechanisms; the canonical federation transport remains push to
inbox + pull from outbox. SSE is convenience, not protocol.
### 13.9 Operational implications
- **Push is best-effort, pull is authoritative.** Operators should treat the outbox
as the canonical record; delivery queue is bookkeeping.
- **Trust is per-instance and not federated.** Two instances may have different
views of "good actors" and "bad instances." This is a feature — defederation
decisions are local sovereignty.
- **Backfill via snapshots is the cheap path.** Encouraging actors to publish
`Create{Snapshot}` regularly makes new-follower onboarding fast.
- **Audience semantics are enforced both ways.** Senders compute delivery set;
receivers honour audience. Defence-in-depth against misconfigured peers.
- **Capability-based extension loading is opt-in.** Strict-mode default means
unknown verbs are stored-but-not-projected — safe by default, with explicit
operator control over what extensions load.
## 14. Validation pipeline
Every activity entering the substrate (whether published locally or received from a
peer) flows through a fixed pipeline of checks. Order matters: cheap and fail-safe
first, expensive and content-aware last. Each stage has a defined failure response
(reject, quarantine, drop). Registry-driven validators plug in at a specific stage.
### 14.1 The two pipelines
**Inbound** — activities arriving via `POST /inbox` or pulled from a peer's outbox:
```
HTTP transport → envelope → signature → replay → audience →
activity-type schema → object-type schema → content validators →
capabilities → trust state → log append → projection (async)
```
**Outbound** — activities being published locally via `POST /activity`:
```
authentication → authorization → envelope construction → object handling →
activity-type schema → signature → log append → projection (async) →
delivery (async)
```
Stages they share are implemented as the same SX functions called from both pipelines.
### 14.2 Inbound pipeline — stage by stage
| # | Stage | Check | Failure response |
|---|-------|-------|------------------|
| 1 | **Transport** | Valid HTTP request, content-type acceptable, body parseable as JSON-LD or dag-cbor | `400 Bad Request`; log |
| 2 | **Envelope** | Matches kernel's envelope spec (required fields present, types valid, recognised activity type or `unknown` allowed) | `400`; log; structured error in response body |
| 3 | **Signature** | Time-aware sig verification: fetch (or cache-lookup) actor doc, find key with `id == sig.key-id` that was active at `published`, verify against canonical envelope bytes per the named sig suite | `401`; log; do not retry; mark sender's instance for circuit-breaker accounting |
| 4 | **Replay** | Activity id and CID not already in `activity-log` projection | `200 OK` with `{status: "duplicate"}`, no-op |
| 5 | **Audience** | This instance has at least one local actor in `to`/`cc`, OR audience contains `as:Public`/`Followers` and the actor has local followers | Drop silently (no response indicating either acceptance or refusal — prevents inbox-membership probing); do not store |
| 6 | **Activity-type schema** | Look up `DefineActivity{name: <type>}` in `define-registry`; run its `schema` predicate over the activity in pure sandbox | If type unknown: per trust policy (strict: 422 with missing-definition CID; permissive: attempt dynamic load §12.8). If schema fails: 422 with violation detail |
| 7 | **Object-type schema** | If activity has an `object` with a `type`, look up `DefineObject{name: <type>}` and run its `schema` | Same as #6 |
| 8 | **Content validators** | All registered validators with `applies-to: inbound` or `applies-to: all` run sequentially; each is a pure-sandbox predicate that returns `:accept` / `:reject` / `:quarantine` | `:reject` → 422 with reason. `:quarantine` → store activity but mark `quarantined`, do not project, alert operator |
| 9 | **Capabilities** | Every CID in `capabilities-required` is present in this instance's loaded registries (or auto-loadable per trust policy) | Missing → 422 with list of missing CIDs (sender can deliver bootstrapping `Define*` artifacts first). Auto-load attempt can be triggered by re-POST with `?retry-after-load=true` |
| 10 | **Trust state** | Sender's actor and instance are not in `Suspended` state on this instance | Drop silently; do not respond |
| 11 | **Log append** | Write activity envelope (and inlined object content) to local mirror of sender's outbox; assign local sequence number | Disk error → 503 (transient); sender retries |
| 12 | **Projection** | Asynchronously fold the activity into every relevant projection (per `define-registry`) | Per-projection failure (gas, sandbox violation) → tag activity `projection-failed:<projection-name>`; do not affect log durability |
Pipeline halts at the first failing stage. Stages 1-10 are synchronous (`POST /inbox`
holds the connection). Stage 11 is also synchronous; stage 12 is asynchronous, and the
HTTP response returns once the log append succeeds.
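The halt-at-first-failure behaviour can be sketched as an ordered list of stage functions. Only two toy stages are shown, and the required envelope fields, the stage signatures, and the result shape are assumptions for illustration rather than the kernel's actual interfaces.

```python
def check_envelope(activity):
    """Stage 2 stand-in: required fields present (field list is an assumption)."""
    missing = [f for f in ("id", "type", "actor") if f not in activity]
    return (not missing, 400 if missing else None)

def make_replay_check(seen_ids):
    """Stage 4 stand-in: an already-seen id is acknowledged as a duplicate."""
    def check(activity):
        if activity["id"] in seen_ids:
            return (False, 200)  # 200 OK with {status: "duplicate"}, no-op
        return (True, None)
    return check

def run_pipeline(stages, activity):
    """Run stages in order and stop at the first one that fails."""
    for name, check in stages:
        ok, response = check(activity)
        if not ok:
            return {"halted-at": name, "response": response}
    return {"halted-at": None, "response": None}
```

Because later stages never run once an earlier one fails, the replay check here can safely assume the envelope already has an `id`.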
### 14.3 Outbound pipeline — stage by stage
| # | Stage | Check | Failure response |
|---|-------|-------|------------------|
| 1 | **Authentication** | Caller has a valid bearer token, mTLS cert, or session for the actor | `401` |
| 2 | **Authorization** | Caller's identity is allowed to publish as the named `actor` (capability token §9.5 or owns the actor key) | `403` |
| 3 | **Envelope construction** | Kernel fills in `id`, `published`, normalises `to`/`cc`, computes `capabilities-required` (by walking referenced `Define*` CIDs) | n/a |
| 4 | **Object handling** | If `object` has inline content: canonicalize, compute CID, optionally store per `where`. If `object` references a CID, verify the artifact exists locally or remotely (or accept as a forward reference) | Storage error → `503` |
| 5 | **Activity-type schema** | Same as inbound #6 — schema must pass | `422` with violation detail (caller bug) |
| 6 | **Signature** | Sign envelope with the actor's currently-active key matching the activity type's required `purpose` (e.g. `Pin` requires `purpose: pin`) | If no suitable key: `400` |
| 7 | **Log append** | Write to local outbox; assign sequence number | `503` |
| 8 | **Projection** | Async fold (same as inbound #12) | Per-projection failure tag |
| 9 | **Delivery** | Async push to follower inboxes per audience | Per-recipient retry per §13.4 |
The caller's HTTP response returns after stage 7 (log append). The activity is durable
and queryable as soon as the response is sent; projection lag is reported via
`projected-up-to` headers and the `?wait-for=` parameter.
### 14.4 Failure response taxonomy
Three response categories with explicit semantics:
**Reject** — tell the sender and don't store; the sender may resubmit after correcting
the activity. Used for: malformed envelope, invalid signature, schema violation, missing
capabilities. HTTP 4xx with structured error.
**Quarantine** — store envelope (it's a valid signed message) but don't project,
alert operator. Used for: content-validator soft-fail, unloaded capabilities under
permissive policy, suspect-but-not-banned senders. Activity sits in a quarantine
projection until operator reviews; operator can release (project) or expunge.
**Drop silently** — don't store, don't respond informatively. Used for: replay (ack
as duplicate), audience refusal (would leak inbox membership otherwise), and
suspended-sender activities. The sender experiences this as a successful POST with
no visible effect; they can detect it only by polling and noticing that their
activity never appears in our outbox.
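One way to read the three categories is as a small decision table. The sketch below is illustrative Python, not part of the design: the `Outcome` and `Disposition` names are invented here, the `ACCEPT` row is added only for contrast, and the "informative response" flag for quarantine is an assumption based on the `Quarantined` error type in §16.6.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    QUARANTINE = "quarantine"
    DROP = "drop"


@dataclass(frozen=True)
class Disposition:
    store: bool                 # is the envelope kept on this instance?
    project: bool               # is it folded into projections?
    alert_operator: bool        # does an operator need to look at it?
    informative_response: bool  # does the sender learn what happened?


# Hypothetical table derived from the prose above.
DISPOSITIONS = {
    Outcome.ACCEPT: Disposition(True, True, False, True),
    Outcome.REJECT: Disposition(False, False, False, True),
    # Assumption: the sender is told about quarantine (see `Quarantined` in 16.6).
    Outcome.QUARANTINE: Disposition(True, False, True, True),
    Outcome.DROP: Disposition(False, False, False, False),
}


def may_enter_projected_state(outcome: Outcome) -> bool:
    """Only stored and projected activities influence projected state."""
    d = DISPOSITIONS[outcome]
    return d.store and d.project
```

Keeping the table as data rather than branching logic makes it easy to audit that no category both drops an envelope and projects it.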
### 14.5 Registry-driven validators
Most of the pipeline is **fixed kernel logic** (envelope, signature, replay, audience,
log append, delivery). Two groups of stages are **registry-driven** and extend dynamically:
- **Stage 8 (content validators)** — operators add/remove `DefineValidator` entries
with `applies-to: inbound | outbound | all`. Each runs in pure or effectful
sandbox per its declaration. Returns one of `:accept` / `:reject{:reason}` /
`:quarantine{:reason}`.
- **Stages 6-7 (schema validators)** — these *are* registry entries
(`DefineActivity.schema`, `DefineObject.schema`); the pipeline calls into the
registry to fetch them.
**Pure-mode validators** are deterministic and cheap; results can be cached per
(activity-CID, validator-CID).
**Effectful-mode validators** can call out to ML models, blocklist services,
external moderation APIs. They get a per-call IO budget; exceeding it counts as
`:reject{:reason :validator-timeout}`. Effectful validators do *not* break
determinism because validation happens **before projection** — a rejected activity
never enters projected state.
### 14.6 Validator composition and ordering
Validators have an integer `priority` field; lower values run first. The pipeline
short-circuits on the first `:reject`. `:quarantine` does *not* short-circuit: later
validators still run, and `:quarantine` results aggregate.
Default priorities (room for operator-added validators):
```
0-99 : kernel-internal (envelope, sig, replay, audience)
100-199 : standard schema validators
200-299 : standard content validators (rate limit, audience leak)
300-399 : operator-added moderation
400-499 : effectful (ML, third-party APIs)
500+ : reserved
```
Operators can publish `Update{DefineValidator}` to change priorities or add new
ones; takes effect on next inbound activity.
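The ordering and short-circuit rules above can be sketched as follows. This is a minimal Python illustration under stated assumptions: `Validator`, `run_validators`, and the string verdicts are hypothetical stand-ins for the SX `:accept` / `:reject` / `:quarantine` results, and ties in priority are resolved by input order (the text does not specify a tie-break).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# ("accept" | "reject" | "quarantine", optional reason)
Verdict = Tuple[str, Optional[str]]


@dataclass(frozen=True)
class Validator:
    name: str
    priority: int
    check: Callable[[dict], Verdict]


def run_validators(validators: List[Validator], activity: dict) -> Tuple[str, List[Optional[str]]]:
    """Run validators in ascending priority order.

    The first reject stops the run. Quarantine verdicts do not stop it;
    their reasons are collected and reported together at the end.
    """
    quarantine_reasons: List[Optional[str]] = []
    # sorted() is stable, so equal priorities keep their input order.
    for validator in sorted(validators, key=lambda v: v.priority):
        verdict, reason = validator.check(activity)
        if verdict == "reject":
            return ("reject", [reason])
        if verdict == "quarantine":
            quarantine_reasons.append(reason)
    if quarantine_reasons:
        return ("quarantine", quarantine_reasons)
    return ("accept", [])
```

Because a reject at priority 200 returns immediately, an effectful validator at priority 400 is never invoked for that activity, which is one practical reason to keep expensive checks in the higher bands.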
### 14.7 Determinism requirement and its limit
A subtlety worth being explicit about: **inbound validation is not required to be
deterministic across instances.** Two instances can disagree about whether to
accept a given activity (e.g. one has a stricter content validator). Their projected
states will then diverge — but only on activities one accepted and the other didn't.
This is fine. Federation does not require state convergence; it requires *fold
determinism for activities both instances accepted*. Validators are sovereignty
controls, not protocol invariants.
Where determinism *is* required: schema validators (§14.2 stages 6-7). If two
instances disagree on whether `Pin v3` matches its schema, they can't federate
`Pin v3` activities meaningfully. So schema validators must be pure-mode and
referenced by CID.
### 14.8 Operational implications
- **The pipeline is the security perimeter.** Every checkable property is checked
here, not deeper in the kernel. No "trust the caller" assumptions inside log or
projection code.
- **Quarantine is the operator's friend.** Anything suspicious sits in quarantine
with full envelope, sig, and reason — operator can review and decide. Better than
outright drop because it preserves audit.
- **Schema validators are protocol-load-bearing; content validators are policy.**
The first set must converge across instances for federation to work; the second
set can diverge (and that's how local moderation policy is expressed).
- **Outbound validation catches local bugs early.** A malformed `Pin` activity
fails at outbound stage 5, never enters the local log, never gets delivered.
## 15. Storage layout
The on-disk shape of an instance. Three concerns kept separate: the **activity log**
(append-only, canonical), **content-addressed object storage** (keyed by CID,
immutable), and **operational state** (projections, indexes, queues — derived,
rebuildable).
### 15.1 Storage tiers
```
/var/lib/fed-sx/
├── log/ # canonical, append-only
│ ├── actors/
│ │ ├── <local-actor-id>/
│ │ │ ├── outbox/
│ │ │ │ ├── 000001.jsonl # segment, ~64MB cap
│ │ │ │ ├── 000002.jsonl
│ │ │ │ └── tip # symlink to current segment
│ │ │ ├── inbox/ # received, pre-projection
│ │ │ └── seq # next sequence number
│ │ └── <other-local-actor-id>/...
│ └── mirrors/ # local mirrors of followed remote outboxes
│ └── <remote-actor-id-hashed>/
│ ├── 000001.jsonl
│ └── ...
├── objects/ # CID → bytes
│ └── <first-2-chars>/<next-2-chars>/<full-cid>
├── snapshots/
│ └── <projection-cid>/
│ ├── <log-tip-cid>.cbor # snapshot value
│ └── index # ordered list of (log-tip, file)
├── projections/ # live projection state
│ └── <projection-cid>.cbor # latest in-memory state, periodically flushed
├── indexes/
│ └── fed-sx.db # SQLite: lookups, queue, trust state
├── keys/
│ └── <actor-id>/ # private keys, mode 0600
│ ├── primary.pem
│ ├── recovery.pem
│ └── sigs.toml # key metadata
├── genesis/
│ └── bundle.cbor # extracted from binary at first run
└── config.toml # operator config
```
### 15.2 The log — append-only segments
The activity log is the only thing the substrate cannot lose. It is the source of
truth from which everything else is derived.
**Format: JSONL segments.** Each line is one activity envelope, encoded as JSON-LD
(canonical form), terminated by `\n`. Easy to inspect, easy to grep, trivially
streamable.
**Why JSON-LD on disk, not dag-cbor?** Two reasons:
- Operability: humans can `tail -f` and `grep` the log. dag-cbor is opaque.
- AP wire compatibility: activities arrive over HTTP as JSON-LD anyway; storing the
same form avoids round-trip conversion.
The CID of each activity is computed from its **canonical dag-cbor representation**
(per §2), independent of how it's stored. CIDs are stable across storage formats.
**Segments cap at ~64MB.** Rotation by size, not time. Old segments are immutable;
new writes go to the tip segment. Compression (zstd) applied on segments older than
the current tip — saves disk, doesn't slow appends.
**Per-actor outboxes.** Each local actor has its own outbox directory. This matches
AP semantics (one outbox per actor) and means:
- Backing up a single actor is a simple directory copy
- Per-actor sequence numbers (no cross-actor coordination)
- Migration (`Move`) is a directory rename + a `Move` activity
**Mirror outboxes.** When a local actor follows a remote one, the remote's outbox is
mirrored locally for replay. Same JSONL format. Tracked under
`log/mirrors/<remote-actor-id-hashed>/` to avoid filesystem path issues with URL
characters. The hash is purely a filesystem-friendly encoding; the canonical actor
id stays in the log content.
**Inbox vs outbox distinction.** Inboxes hold *received* activities pre-validation;
outboxes hold *committed* activities post-pipeline. An inbound activity that passes
the validation pipeline (§14) is moved from inbox to the appropriate mirror outbox.
This makes inbox a transient queue, not a permanent record.
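The size-based segment rotation described in this section could look roughly like the following. It is a sketch under assumptions: `SegmentedLog` is a hypothetical name, `json.dumps(..., sort_keys=True)` is only a stand-in for the canonical JSON-LD form, and the `tip` symlink, per-actor `seq` file, and zstd compression of old segments are left out.

```python
import json
import os
import tempfile


class SegmentedLog:
    """Append-only JSONL log split into numbered, size-capped segments."""

    def __init__(self, directory: str, cap_bytes: int = 64 * 1024 * 1024):
        self.directory = directory
        self.cap_bytes = cap_bytes
        os.makedirs(directory, exist_ok=True)
        existing = sorted(f for f in os.listdir(directory) if f.endswith(".jsonl"))
        # Resume at the highest existing segment, or start at 000001.
        self.segment = int(existing[-1].split(".")[0]) if existing else 1

    def _path(self, number: int) -> str:
        return os.path.join(self.directory, f"{number:06d}.jsonl")

    def append(self, envelope: dict) -> str:
        """Append one envelope as a single line; return the segment path used."""
        text = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
        line = (text + "\n").encode("utf-8")
        path = self._path(self.segment)
        size = os.path.getsize(path) if os.path.exists(path) else 0
        # Rotate by size, not time. An empty segment always takes the write,
        # so a single oversized envelope still lands somewhere.
        if size > 0 and size + len(line) > self.cap_bytes:
            self.segment += 1
            path = self._path(self.segment)
        with open(path, "ab") as handle:
            handle.write(line)
            handle.flush()
            os.fsync(handle.fileno())  # durable before the caller is acknowledged
        return path
```

Older segments are never reopened for writing in this sketch, which is what makes them safe to compress or copy while appends continue on the tip.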
### 15.3 Object storage
Content-addressed blob store, sharded directories.
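**Path scheme:** `objects/<first-2-chars>/<next-2-chars>/<full-cid>`. The two shard
components should be taken from the hex-encoded hash digest, not from the CID string
itself: every base32 CIDv1 with the same codec and hash function starts with the same
characters (for example `bafyrei` for dag-cbor with sha2-256), so the CID's textual
prefix carries no entropy. Sha2-256 digests are uniformly distributed; two hex
characters per level gives 65,536 buckets with a couple hundred files each at
moderate scale. This is a standard pattern (Git and IPFS shard their object stores in
a similar way).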
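A sketch of how the shard path might be derived. It assumes the shard characters come from the hex form of the sha2-256 digest embedded in the CID, which is what yields 256 × 256 = 65,536 buckets; the helper names `make_cid`, `digest_of`, and `object_path` are hypothetical, and only CIDv1 with dag-cbor and sha2-256 in base32 is handled.

```python
import base64
import hashlib
from pathlib import PurePosixPath

CID_V1 = 0x01
DAG_CBOR = 0x71  # multicodec code for dag-cbor
SHA2_256 = 0x12  # multihash code for sha2-256
PREFIX = bytes([CID_V1, DAG_CBOR, SHA2_256, 32])


def make_cid(block: bytes) -> str:
    """Build a CIDv1 string (base32 multibase, 'b' prefix) for a block."""
    raw = PREFIX + hashlib.sha256(block).digest()
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


def digest_of(cid: str) -> bytes:
    """Recover the sha2-256 digest from a CID of the form make_cid produces."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDs")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode expects
    raw = base64.b32decode(body)
    if raw[:4] != PREFIX or len(raw) != 36:
        raise ValueError("this sketch only handles dag-cbor + sha2-256")
    return raw[4:]


def object_path(cid: str) -> PurePosixPath:
    """objects/<2 hex chars>/<next 2 hex chars>/<full cid>"""
    digest_hex = digest_of(cid).hex()
    return PurePosixPath("objects") / digest_hex[:2] / digest_hex[2:4] / cid
```

Every CID produced this way begins with the same seven characters, which is why the CID string's own leading characters make a poor shard key.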
**Storage backends.** Pluggable per `where: cid` object:
- **`files-on-disk`** (default) — write to local filesystem.
- **`ipfs`** — registry-driven backend; calls out to a local IPFS node.
- **`s3`** — object storage in cloud bucket.
- **`memory-only`** — in-memory cache, evictable; useful for ephemeral artifacts.
The kernel uses the `where-tag` on each object to dispatch to the correct backend.
Backends are registry entries (`DefineStorage`); operators install only the ones
they want.
**Garbage collection** is opt-in per backend. Default policy: **never GC** (objects
are immutable and may be referenced by future activities). Operators can configure
per-backend retention rules:
- "Keep last N versions of objects referenced by `Pin` activities for path X"
- "Evict objects not referenced in last 90 days from the `memory-only` cache"
- "Mirror objects referenced by ≥ 3 endorsements; evict others after 30 days"
GC operates on the projected reference graph (a `reference-graph` projection that
maintains "what activities reference this CID"). Removing an object that's still
referenced is allowed but writes a warning to the operations log.
### 15.4 Snapshots
Per §10.4, snapshots are the (projection-CID, log-tip-CID, state) triples that let
us resume without full replay.
**Storage:** `snapshots/<projection-cid>/<log-tip-cid>.cbor`. The state value is
dag-cbor-encoded; the file's content CID matches the snapshot's claimed CID.
**Index:** `snapshots/<projection-cid>/index` is a sorted list of `(log-tip-time,
log-tip-cid, file)` triples. On startup, kernel finds the latest snapshot ≤ current
log tip and resumes from it. On time-travel queries, finds the latest snapshot
≤ target time and folds forward.
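The "latest snapshot at or before the target" lookup is a binary search over the sorted index. The sketch below assumes index entries are `(log-tip-time, log-tip-cid, file)` tuples whose timestamps are fixed-format UTC ISO 8601 strings, so string order matches time order; the function name is hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

# (log-tip-time, log-tip-cid, file), sorted by log-tip-time ascending
IndexEntry = Tuple[str, str, str]


def latest_snapshot_at_or_before(index: List[IndexEntry], target_time: str) -> Optional[IndexEntry]:
    """Return the newest entry whose time is <= target_time, or None."""
    times = [entry[0] for entry in index]
    # bisect_right gives the insertion point after any equal timestamps,
    # so the entry just before it is the newest one not later than the target.
    position = bisect.bisect_right(times, target_time)
    return index[position - 1] if position else None
```

A `None` result means no snapshot is old enough, in which case the fold has to start from the beginning of the log.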
**Retention:** keep at least:
- Latest snapshot per active projection
- Snapshots referenced by published `Create{Snapshot}` activities (federation
proofs)
- One snapshot per day for the last 7 days (audit / time-travel)
Older snapshots GC'd by default. Operators can increase retention.
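The default retention rule can be expressed as a function that returns the set of snapshots to keep; everything else is eligible for GC. This is a sketch with hypothetical names, and it interprets "one snapshot per day" as the newest snapshot of each calendar day (UTC) within the window, which the text does not spell out.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set, Tuple

Snapshot = Tuple[datetime, str]  # (log-tip time, snapshot CID)


def snapshots_to_keep(
    snapshots: List[Snapshot],
    published_cids: Iterable[str],
    now: datetime,
    days: int = 7,
) -> Set[str]:
    """Apply the three retention rules for a single projection."""
    keep: Set[str] = set()
    if not snapshots:
        return keep
    ordered = sorted(snapshots)
    published = set(published_cids)

    keep.add(ordered[-1][1])  # rule 1: the latest snapshot
    keep.update(cid for _, cid in ordered if cid in published)  # rule 2

    # Rule 3: newest snapshot of each day inside the window.
    cutoff = now - timedelta(days=days)
    newest_per_day = {}
    for taken_at, cid in ordered:
        if taken_at >= cutoff:
            newest_per_day[taken_at.date()] = cid  # later entries overwrite
    keep.update(newest_per_day.values())
    return keep
```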
### 15.5 Operational state — SQLite
Things that are derived, frequently-queried, but not federated:
- **Lookup indexes** for projections (when `indexes:` declared) — `(projection,
index-key, value) → activity-cid` rows
- **Delivery queue** — outbound activities pending push, retry counts, next-attempt
timestamps
- **Trust state** — per-actor and per-instance trust levels (Trusted / Default /
Suspended)
- **Quarantine queue** — activities pending operator review
- **Configuration cache** — currently-active registry entries (also in memory;
on-disk cache for fast restart)
Single SQLite file (`indexes/fed-sx.db`). Recoverable: if corrupted or deleted,
rebuilt from the log on next startup (with cost proportional to log size). The
SQLite is a cache, not authoritative.
WAL mode for concurrent readers. Single-writer (the kernel); reads from many
HTTP request workers.
### 15.6 Backup and export
The substrate is an append-only log of immutable artifacts; backup is simple.
- **Full backup:** rsync `/var/lib/fed-sx/log/` and `/var/lib/fed-sx/objects/`, plus
`keys/` and `config.toml`, which cannot be derived from the log. The rest is
rebuildable.
- **Per-actor export:** tar `log/actors/<actor-id>/` + the objects referenced by
activities in that outbox. Self-contained, importable into another instance.
- **Activity bundle export:** for federation backfill, produce a dag-cbor bundle of
`[activity envelopes... + referenced objects]` for a specified actor + range.
Single file, content-addressed, signed by the source instance with a `Bundle`
activity attesting to its contents.
Exports are themselves publishable (`Create{Bundle}` activity carrying the bundle
CID). This is how an actor migrates instances cleanly: export bundle, import on
new instance, publish `Move` activity.
### 15.7 Mirroring and replication
Two patterns:
- **Federation mirroring** (the canonical kind) — when actor A follows B, A's
instance mirrors B's outbox locally. This is just normal federation (§13). Each
follower keeps its own copy.
- **Operational mirroring** — for high availability. An operator runs two instances
with shared filesystem (NFS / EFS) for `log/` and `objects/`, separate SQLite
files. Reads can hit either; writes go through one. Or: rsync-based hot standby
with manual failover.
Operational mirroring is out of scope for v1. Federation mirroring is the
substrate-level redundancy: as long as one peer that followed you is still online,
your log is still recoverable.
### 15.8 Storage size estimates
Rough targets at moderate scale (10 active local actors, 1000 followed peers, 1
year of activity at 100 activities/actor/day):
- **Log:** 10 actors × 100 act/day × 1 KB avg envelope × 365 days ≈ 365 MB local
outbox. Mirrors: 1000 peers × 10 act/day × 1 KB × 365 ≈ 3.6 GB.
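- **Objects:** depends heavily on content. Assume 50% of activities have inline
content of avg 5 KB: across the ~4 million local and mirrored activities above, that
is ~10 GB total inline (under 1 GB if only local activities are counted).
CID-referenced larger objects: count separately, depends on use case.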
- **Snapshots:** typically much smaller than the log. ~10 active projections ×
~10 MB per snapshot × ~8 retained snapshots ≈ 800 MB.
- **SQLite:** index sizes proportional to indexed projection content; typical few
hundred MB.
Total: order of 10 GB at the described scale. Single-machine viable; SSD recommended
for log throughput; spinning disk fine for snapshots and object storage cold tier.
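The arithmetic behind these estimates can be reproduced in a few lines. The inputs are the ones stated above; counting inline content across both local and mirrored activities is an assumption about what the estimate is meant to cover, and the SQLite figure is left out because the text gives only a rough range for it.

```python
KB = 1000
MB = 1000 * KB
GB = 1000 * MB
DAYS = 365

# Log: local outboxes and mirrors of followed peers, 1 KB per envelope.
local_activities = 10 * 100 * DAYS    # 10 actors x 100 activities/day
mirror_activities = 1000 * 10 * DAYS  # 1000 peers x 10 activities/day
local_log = local_activities * 1 * KB
mirror_log = mirror_activities * 1 * KB

# Objects: half of all activities carry ~5 KB of inline content.
all_activities = local_activities + mirror_activities
inline_objects = (all_activities // 2) * 5 * KB

# Snapshots: 10 projections x 10 MB x 8 retained (latest + 7 daily).
snapshots = 10 * 10 * MB * 8

total = local_log + mirror_log + inline_objects + snapshots

for label, size in [
    ("local log", local_log),
    ("mirror log", mirror_log),
    ("inline objects", inline_objects),
    ("snapshots", snapshots),
    ("total (excluding SQLite)", total),
]:
    print(f"{label:26s} {size / GB:6.2f} GB")
```

The mirrors and inline objects dominate, so the followed-peer count and the inline-content assumption are the two inputs worth revisiting first when sizing a real instance.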
### 15.9 Operational implications
- **The log is sacred.** Never modify, never delete. Backups go to multiple media.
Loss of `log/` means loss of identity (actor activities) and loss of
state-of-record. Loss of `objects/` means loss of content but log + peers can
recover most of it.
- **Everything else is rebuildable.** Projections, indexes, snapshots, queue state
can all be recomputed from the log at startup cost. Operationally, this means
upgrades and migrations are forgiving.
- **CID-addressed storage is naturally idempotent.** Two instances writing the same
artifact write the same bytes to the same path. Race conditions become no-ops.
- **JSONL on disk pays for itself** the first time an operator needs to debug a
weird federation issue with `grep` and `jq`. Worth the storage cost vs dag-cbor.
## 16. API surface
HTTP API for reading the log, publishing activities, querying projections, and
streaming updates. Three layers: **AP-standard** endpoints (for vanilla AP
interop), **fed-sx-specific** endpoints (publish, query, capabilities), and
**discovery** endpoints (webfinger, well-known).
### 16.1 Endpoint catalog
#### AP-standard
| Method | Path | Purpose |
|--------|------|---------|
| GET | `/actors/<id>` | Actor doc (Person/Service/Group/Application) |
| GET | `/actors/<id>/inbox` | Read inbox — auth required |
| POST | `/actors/<id>/inbox` | Receive federated activity (HTTP Signature required) |
| GET | `/actors/<id>/outbox` | OrderedCollection of actor's published activities |
| POST | `/actors/<id>/outbox` | AP-standard publish (alias for `POST /activity` with `actor` set) |
| GET | `/actors/<id>/followers` | OrderedCollection of follower actor URIs |
| GET | `/actors/<id>/following` | OrderedCollection of followed actor URIs |
| GET | `/activities/<uuid>` | Single activity by id |
| GET | `/objects/<uuid>` | Single object by id (note: distinct from CID-addressed `/artifacts/<cid>`) |
#### fed-sx-specific
| Method | Path | Purpose |
|--------|------|---------|
| POST | `/activity` | Generalised publish — accepts any well-formed activity |
| GET | `/artifacts/<cid>` | CID-addressed artifact fetch (content negotiated) |
| GET | `/artifacts/<cid>/raw` | Raw bytes (whatever the codec stored) |
| GET | `/artifacts/<cid>/<path>` | IPLD path traversal into the artifact |
| GET | `/projections` | List of registered projections (name, CID, last-folded-tip) |
| GET | `/projections/<name>` | Full projection state (paginated for large states) |
| GET | `/projections/<name>?at=<ts>` | Time-travel: state as of timestamp |
| GET | `/projections/<name>/<key>` | Single key from a projection (uses indexes) |
| POST | `/query` | Run an SX query expression against one or more projections |
| GET | `/define-registry` | Currently active `Define*` artifacts by kind |
| GET | `/capabilities/<actor-id>` | Per-actor declared capabilities |
#### Discovery and well-known
| Method | Path | Purpose |
|--------|------|---------|
| GET | `/.well-known/webfinger?resource=acct:<user>@<host>` | RFC 7033 actor discovery |
| GET | `/.well-known/sx-capabilities` | This instance's capability advertisement (§7) |
| GET | `/.well-known/host-meta` | XRD describing the host |
| GET | `/.well-known/nodeinfo` | Standard fediverse node metadata (Mastodon, Pleroma compatibility) |
#### Real-time (SSE)
| Method | Path | Purpose |
|--------|------|---------|
| GET | `/actors/<id>/outbox/stream` | New activities as they're appended (events: `activity`) |
| GET | `/actors/<id>/inbox/stream` | New inbound activities (auth required) |
| GET | `/projections/<name>/subscribe` | Projection deltas (events: `delta`) |
| GET | `/federation/health/stream` | Per-peer delivery health (events: `peer-status`) |
WebSocket equivalents (`/ws/...` paths) available where SSE is awkward (browsers
behind proxies); same event payloads, different framing.
### 16.2 Authentication
Three mechanisms, each appropriate to a different caller type:
- **HTTP Signatures** (Internet-Draft `draft-cavage-http-signatures`) — the AP-standard mechanism
for inter-instance calls. Sender signs a digest of relevant headers + body with
their actor's private key; receiver verifies via the actor's public keys
projection (§9.6). Used for: `POST /inbox`, peer-to-peer outbox pulls when
authentication is desired.
- **Bearer tokens** — for interactive clients (CLIs, web UIs, mobile apps).
Issued via OAuth2 (or simple admin-issued tokens for v1). Used for:
`POST /activity`, `GET /actors/<id>/inbox`, anything requiring caller identity.
- **Capability tokens** (§9.5) — for delegated publish. Token includes the granting
actor, the granted capabilities (e.g. `publish: Pin for path-prefix /docs/`), the
bearer's actor, expiry, and signature from the granter. Used for: child actors,
service accounts, temporary publish access.
Public reads (most GET endpoints to public-audience activities) require no auth.
Private/followers-only reads check the caller's identity against the audience.
### 16.3 Content negotiation
Same resource, multiple representations. `Accept` header dispatches:
| Accept header | Returns |
|---------------|---------|
| `application/activity+json` | AP-standard JSON-LD (default for ambiguous Accepts) |
| `application/ld+json; profile="..."` | JSON-LD with explicit profile |
| `application/cbor` | dag-cbor |
| `application/json` | Plain JSON (compact, no `@context` expansion) |
| `application/sx` | Canonical SX wire format |
| `text/html` | HTML representation (for browsers — renders the artifact via SX) |
Same negotiation applies to `/artifacts/<cid>`, `/activities/<uuid>`,
`/projections/<name>`. Servers MUST honour the request; absent `Accept` defaults to
`application/activity+json`.
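A simplified sketch of the `Accept` dispatch, assuming q-value ordering as in standard HTTP content negotiation. The `negotiate` function is hypothetical; returning `None` for an unsupported type (which a server would presumably turn into `406`) is an assumption, and the naive comma split would mishandle a quoted `profile` parameter that itself contains commas.

```python
from typing import Optional

SUPPORTED = [
    "application/activity+json",
    "application/ld+json",
    "application/cbor",
    "application/json",
    "application/sx",
    "text/html",
]
DEFAULT = "application/activity+json"


def negotiate(accept_header: Optional[str]) -> Optional[str]:
    """Pick a representation for an Accept header, or None if none fits."""
    if not accept_header or not accept_header.strip():
        return DEFAULT  # absent Accept falls back to the AP default
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by order of appearance.
        candidates.append((-quality, position, media_type))
    for neg_quality, _, media_type in sorted(candidates):
        if neg_quality == 0:
            continue  # q=0 means "not acceptable"
        if media_type in SUPPORTED:
            return media_type
        if media_type in ("*/*", "application/*"):
            return DEFAULT  # ambiguous Accept resolves to the AP default
    return None
```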
### 16.4 Pagination
Cursor-based via AP's `OrderedCollectionPage`:
```
GET /actors/giles/outbox
→ {
"type": "OrderedCollection",
"totalItems": 12345,
"first": "/actors/giles/outbox?page=true",
"last": "/actors/giles/outbox?page=true&min_id=0"
}
GET /actors/giles/outbox?page=true
→ {
"type": "OrderedCollectionPage",
"id": "...?page=true",
"next": "...?page=true&max_id=<cid>",
"prev": "...?page=true&min_id=<cid>",
"orderedItems": [...]
}
```
Cursors are CIDs of the boundary activity (not opaque tokens). Stable across
restarts and instances. `max_id` returns activities **before** the cursor (newest
first); `min_id` returns activities **after** the cursor.
Default page size: 50. Max: 1000. `Link: <...>; rel="next"` header also provided
for HTTP-native pagination.
For projections: same shape, items are projection entries.
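The cursor semantics can be illustrated over an in-memory list ordered newest first. This is a sketch: `page` is a hypothetical helper, each item is assumed to carry its CID under a `"cid"` key, and an unknown cursor simply raises `ValueError` here rather than mapping to a specific HTTP error.

```python
from typing import List, Optional


def page(
    items_newest_first: List[dict],
    max_id: Optional[str] = None,
    min_id: Optional[str] = None,
    limit: int = 50,
) -> List[dict]:
    """Return one page of items, newest first.

    max_id: items strictly older than the cursor.
    min_id: items strictly newer than the cursor (the ones nearest to it).
    """
    limit = max(1, min(limit, 1000))  # default 50, capped at 1000
    cids = [item["cid"] for item in items_newest_first]
    if max_id is not None:
        start = cids.index(max_id) + 1
        return items_newest_first[start:start + limit]
    if min_id is not None:
        end = cids.index(min_id)
        return items_newest_first[max(0, end - limit):end]
    return items_newest_first[:limit]
```

Because the cursor is the boundary activity's CID, the same `max_id` value identifies the same position on any instance that holds that activity.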
### 16.5 The query API
`POST /query` takes an SX expression evaluated in pure mode against named
projections:
```sx
POST /query
Content-Type: application/sx
Accept: application/sx
(let ((actors (projection actor-state))
(pins (projection pin-state)))
(for-each ([(actor-id actor) actors])
(when (> (count (filter (fn ((path cid)) (= (:owner cid) actor-id)) pins)) 10)
{:actor (:preferredUsername actor)
:pins-published (count ...)})))
```
Query semantics:
- Evaluated in pure sandbox; all the determinism rules apply.
- Projection access is read-only and snapshot-consistent: the query sees state
as-of the time of the request (or `?at=` if specified).
- Result is serialized in the negotiated content type.
- Gas limit applies (default 1M units per query, tunable by operator).
- Cacheable: query CID + projection state CIDs uniquely determine the result.
Query results can themselves be published as `Create{QueryResult}` activities,
making derived analyses federable.
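The cacheability claim suggests a cache key built from the query's identity plus the identity of each projection state it reads. The sketch below uses plain SHA-256 hex digests as a stand-in for real CIDs, and `query_cache_key` is a hypothetical name.

```python
import hashlib
from typing import Dict


def query_cache_key(query_source: str, projection_state_cids: Dict[str, str]) -> str:
    """Derive a deterministic key from the query text and the state it reads.

    projection_state_cids maps projection name -> CID of the state snapshot
    the query was evaluated against.
    """
    hasher = hashlib.sha256()
    hasher.update(hashlib.sha256(query_source.encode("utf-8")).digest())
    # Sort by name so the key does not depend on dictionary ordering.
    for name in sorted(projection_state_cids):
        hasher.update(name.encode("utf-8"))
        hasher.update(b"\x00")
        hasher.update(projection_state_cids[name].encode("utf-8"))
        hasher.update(b"\x00")
    return hasher.hexdigest()
```

Any new activity that changes a projection changes that projection's state CID, so stale results are never served under an old key; they just stop being looked up.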
### 16.6 Errors
Uniform JSON error envelope:
```json
{
"error": {
"type": "https://next.rose-ash.com/ns/fed-sx/errors/v1#InvalidSignature",
"status": 401,
"title": "Activity signature invalid",
"detail": "Key id 'https://example/actors/x#key-1' was superseded at 2026-01-15T...",
"activity-id": "https://...",
"key-id": "...#key-1",
"instance": "/incidents/<incident-cid>"
}
}
```
Error types are URIs in the fed-sx namespace; receivers can check `type` for
programmatic handling. Standard errors:
- `MissingCapability` — includes `missing` array of CIDs
- `SchemaViolation` — includes `schema-cid`, `field-path`, `expected`, `got`
- `InvalidSignature`
- `Quarantined` — includes `quarantine-id` for operator-status tracking
- `RateLimited` — includes `retry-after`
- `ResourceExhausted` — for query gas exhaustion
### 16.7 Streaming details
SSE event format:
```
event: activity
id: <activity-cid>
data: { ...activity envelope... }

event: delta
id: <activity-cid that triggered the delta>
data: {"projection": "actor-state", "key": "...", "old": ..., "new": ...}

event: heartbeat
data: {"projected-up-to": "<cid>", "ts": "..."}
```
Clients reconnect with `Last-Event-ID: <cid>` to resume from the last event seen.
Server replays from that point in the log (or returns 410 if too far behind, in
which case client should switch to paginated pull).
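A sketch of the framing and resume behaviour. In the SSE wire format each event is terminated by a blank line, which `sse_event` reproduces. The `replay_after` helper, the `Gone` exception (standing in for a `410` response), and the `max_backlog` threshold are assumptions made for illustration; the text does not define how "too far behind" is measured.

```python
import json
from typing import List, Optional


def sse_event(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event; the trailing blank line ends the event."""
    lines = [f"event: {event}"]
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Compact JSON has no raw newlines, so a single data line is enough.
    lines.append("data: " + json.dumps(data, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"


class Gone(Exception):
    """The requested resume point can no longer be served (HTTP 410)."""


def replay_after(log: List[dict], last_event_id: str, max_backlog: int = 1000) -> List[str]:
    """Return the events a reconnecting client missed, oldest first."""
    cids = [entry["cid"] for entry in log]
    if last_event_id not in cids:
        raise Gone(last_event_id)
    missed = log[cids.index(last_event_id) + 1:]
    if len(missed) > max_backlog:
        raise Gone(last_event_id)  # client should fall back to paginated pull
    return [sse_event("activity", entry["envelope"], entry["cid"]) for entry in missed]
```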
### 16.8 Versioning
The substrate is versioned at three levels:
- **Envelope version** — declared in `/.well-known/sx-capabilities`. Currently `1`.
Forward-compatible (new fields OK; semantics fixed).
- **API version** — URL prefix optional: `/v1/...` works the same as `/...`. Future
major version: `/v2/...` paths in parallel.
- **Definition versions** — supersession via activity log (§§9.2, 12.7). No special
URL handling.
Capability negotiation happens before federation; clients shouldn't hard-code
URL paths beyond the canonical set documented here.
### 16.9 Operational implications
- **The API is small but layered.** AP compatibility is one layer; fed-sx
extensions are another; both share auth and content negotiation. Adding a new
endpoint shouldn't require new transport machinery.
- **Content negotiation is the polyglot bridge.** Same artifact addressable in
JSON-LD (for AP peers), dag-cbor (for fed-sx peers), SX (for SX clients), HTML (for
humans). One CID, four representations.
- **Cursor pagination is CID-based.** Stable identifiers, no opaque tokens to
invalidate, peers can synchronize without coordination.
- **The query API is a load-bearing differentiator.** Datalog/GraphQL-equivalent
expressiveness with no separate query language — it's just SX. Federable, signable,
versionable like any other SX artifact.
---
## 17. Implementation languages
Polyglot **authoring**, monoglot **runtime**: every language-on-SX compiles to core
SX and runs on any host with the SX evaluator. The language is an authoring choice;
the federated artifact is uniform SX. Authors of `Define*` artifacts pick the
source language they prefer; consumers don't need that compiler installed to
execute the compiled SX.
Languages are picked because they **genuinely fit the problem**, not to demonstrate
the polyglot story. Where a chosen language has gaps (e.g. Erlang-on-SX missing hot
reload), we invest in maturing the port rather than working around the gap.
### 17.1 The v1 stack
| Layer | Language | Why |
|-------|----------|-----|
| **Native primitives** | OCaml (existing runtime) | Crypto (RSA, Ed25519, SHA), dag-cbor encode/decode, HTTP socket, file IO, SQLite. Surfaced as Erlang-on-SX BIFs. |
| **Kernel orchestration** | Erlang-on-SX | Actor model = federation. `gen_server` per actor / per projection / per peer. `supervisor` for delivery workers. Message passing is literally the substrate. Hot code reload (Phase 7) for `Define*` live extension. |
| **Query API back-end** | Datalog-on-SX | Projection state is relational; trust graph walks, provenance, projection joins are textbook Datalog. Already mature (276/276 tests, full core Datalog with stratified negation, aggregation, magic sets, federation-graph demo). |
| **`Define*` semantics, schemas, validators, codecs, audience predicates** | Core SX | The canonical federated language. Everything content-addressed and federated lives here. |
### 17.2 Languages explicitly **not** booked for v1
Available, mature, considered — would be reached for if a real fed-sx need surfaced,
but no preemptive use:
- **Haskell-on-SX** (285/285 tests, 36 programs, type checker working) — for complex
operator-authored extensions that benefit from typed pattern matching. Schemas in
fed-sx are short predicates; types don't earn their keep here.
- **Smalltalk-on-SX** (625/629 tests, classic corpus running) — natural fit for a
live operator dashboard / Glamorous-Toolkit-style introspection. v2/v3 territory;
a browser UI likely wins for operator audiences.
- **APL-on-SX** — high-throughput batch reprojection if scalar SX folds become a
bottleneck. Premature without measured need.
- **JS-on-SX**, **Elm-on-SX** — browser-side client SDK / viewer. v2.
- **Common Lisp-on-SX**, **Forth-on-SX**, **Go-on-SX**, **Dream-on-SX**,
**Elixir-on-SX**, **Erlang-on-SX (alternative form)** — case by case if a use
case appears.
### 17.3 The FFI BIF layer
Erlang-on-SX has no FFI / NIF mechanism in its current form (Phase 6 plan: "out of
scope entirely"). fed-sx adds a **BIF layer** in `lib/erlang/transpile.sx` (or a
dedicated `lib/erlang/fed_bifs.sx`) exposing native primitives:
```
crypto:rsa_verify/3 crypto:ed25519_verify/3
crypto:sha2_256/1 crypto:sha3_256/1
cid:cbor_encode/1 cid:cbor_decode/1
cid:multihash/2 cid:from_bytes/2
cid:to_string/1 cid:from_string/1
log:append/2 log:read/3
log:tip/1 log:replay/3
http:listen/2 http:request/2
http:respond/3 http:sse_send/2
fs:read/1 fs:write/2
fs:exists/1 fs:list/1
sqlite:open/1 sqlite:exec/2
sqlite:query/3 sqlite:close/1
snapshot:put/3 snapshot:get/2
```
Each BIF is a thin Erlang-on-SX function dispatching to the corresponding SX runtime
IO primitive. Returns Erlang-shaped values (atoms, tuples, binaries). Errors raise
appropriate Erlang exceptions (`badarg`, `enoent`, `eacces`).
This is the **only** native-FFI surface in fed-sx. All other I/O goes through these
BIFs. Operators can audit the BIF list to know exactly what the substrate touches
outside SX.
### 17.4 Build pipeline
```
.sx files (core SX, registry entries) ──┐
.erl files (Erlang-on-SX kernel) ──┼──> compile to core SX
.dl files (Datalog-on-SX queries) ──┘
content-addressed SX artifacts
genesis bundle (CID-verified)
OCaml runtime evaluates everything
```
Each authoring language's compiler runs at build time, producing core SX that goes
into the genesis bundle (for bootstrap definitions) or gets published as activities
(for runtime extensions).
### 17.5 Prerequisite work
Three pieces of investment land in or alongside the Erlang-on-SX loop. The first two
land **before** fed-sx kernel code starts; the third runs in parallel: it does not
block milestone 1, but it does block production-grade throughput.
1. **Phase 7 — hot code reload.** `code:load_binary/3`, `gen_server`
`code_change/3` callback dispatch, atomic module-version swap. Required for
`Define*` live extension (no kernel restart to load new verbs). The choice of
reload semantics (two-version coexistence vs single-version atomic swap with
closure capture) will be decided during the work.
2. **Phase 8 — FFI mechanism + initial BIFs.** `define-bif` registration + term
marshalling + error mapping, then BIFs for `crypto:*`, `cid:*` (dag-cbor),
`fs:*`, `http:*`, `sqlite:*`. Required for fed-sx kernel to call native
primitives. Lands before kernel code that calls them.
3. **Phase 9 — specialized opcodes (the BEAM analog).** *Layered perf strategy:*
- **Layer 1 (Phase 9, in scope)** — specialized bytecode opcodes that bypass
the general-purpose CEK machine for hot Erlang operations. `OP_PATTERN_TUPLE`,
`OP_PERFORM`/`OP_HANDLE`, `OP_RECEIVE_SCAN`, `OP_SPAWN`/`OP_SEND`, BIF
dispatch table. Targets: 100k+ message hops/sec, 1M-process spawn under
30sec — roughly 1000-3000× speedup over the current general-purpose path.
- **Layer 2 (Phase 10, deferred)** — multi-core scheduler via OCaml 5
domains. Decided empirically after Layer 1 lands; likely unnecessary if
Layer 1 alone hits target throughput.
- **Layer 3 (skipped)** — incremental tuning of the existing call/cc-based
receive and env-copy-per-call machinery. Obsoleted by Layer 1; not pursued.
**Architectural note for Phase 9.** Phase 9a (the **opcode extension
mechanism in `hosts/ocaml/evaluator/`**) is out of scope for the Erlang loop
— it's SX VM core, used by every language port that wants specialized
opcodes. Designed in `plans/sx-vm-opcode-extension.md`; lands as a separate
focused workstream (~1-2 weeks) owning `hosts/`. Phase 9b-9g (the actual
Erlang opcodes in `lib/erlang/vm/`) are designed and tested against a stub
dispatcher in the Erlang loop until 9a is available.
**Shared-opcode discipline.** Opcodes Phase 9 produces that other language
ports could plausibly use (pattern match, perform/handle, record access)
become candidates for chiselling out to **`lib/guest/vm/`** — same lib/guest
discipline, applied at the bytecode layer. Don't pre-extract; promote to
`lib/guest/vm/` when a second language port has an actual second use. The
substrate accumulates a richer opcode surface over time as ports contribute,
and every port benefits from every shared opcode (the structural advantage
over BEAM, which is special-purpose-built for one language).
**fed-sx is not blocked by Phase 9.** Milestone 1 ships on current
Erlang-on-SX perf (which has 100-1000× headroom for a single demo instance). Phase
9 lands in parallel; by the time fed-sx needs production-grade throughput
(federation hub use cases, milestone 2-3), Phase 9 is ready.
After Phases 7 and 8 land, fed-sx milestone 1 (kernel + registries + bootstrap
entries + Pin smoke test + reactive application smoke test) becomes the next
workstream. Phase 9 work continues in parallel.
---
## 18. Subscription model
Symmetric to the publish-side extensibility: just as `DefineActivity` registers what
*kinds of things can be published*, `DefineSubscription` registers what *kinds of
patterns can be subscribed to*. `Follow` becomes one standard subscription type
among many, not a hardcoded primitive.
### 18.1 The asymmetry being fixed
Without this, the substrate has rich publish-side extensibility (any new verb is a
`DefineActivity`) and *one* hardcoded subscription primitive (`Follow`). That
mirrors AP but it's an arbitrary limitation in a substrate where everything else
is registry-driven. Generalising restores symmetry.
### 18.2 The `DefineSubscription` shape
```sx
(activity 'Create
:object {:type "DefineSubscription"
:name "Follow" ; AP-standard
:schema (fn (sub) ; what params the sub takes
(and (cid? (-> sub :object))
(= "Person" (-> sub :object-type))))
:match (fn (subscription activity) ; pure-mode predicate
(= (-> subscription :object) (:actor activity)))
:delivery {:default :push
:modes [:push :pull :sse]
:digest-window nil}
:capabilities-required []}) ; some subs may need authority
```
Four mandatory parts:
- **`schema`** — pure-mode predicate validating subscription parameters at
`Subscribe` time. Catches malformed subscriptions before they enter state.
- **`match`** — pure-mode predicate `(subscription, activity) → bool`. Decides
whether a given activity is a hit for this subscription. Determinism rules
apply (§11.2).
- **`delivery`** — supported modes (push to inbox / pull on demand / SSE
streaming / batched digest). The subscription instance picks its preferred
mode at `Subscribe` time from the supported set.
- **`capabilities-required`** — capability tokens the subscriber must hold
(empty for public subs; populated for paywalled/gated/private streams).
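
The four parts above can be modelled outside SX to check how they interact at `Subscribe` time. The following is a minimal Python sketch, not the substrate's implementation: `SubscriptionDef`, `is_cid`, and `subscribe` are hypothetical names, the CID check is a placeholder, and the `bafyalice` identifier is made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical Python stand-in for a DefineSubscription artifact.
# Field names mirror the SX shape above; everything else is illustrative.
@dataclass(frozen=True)
class SubscriptionDef:
    name: str
    schema: Callable[[dict], bool]          # validates params at Subscribe time
    match: Callable[[dict, dict], bool]     # (subscription, activity) -> bool
    modes: tuple = ("push", "pull", "sse")  # supported delivery modes
    default_mode: str = "push"
    capabilities_required: tuple = ()       # empty for public subscriptions

def is_cid(value) -> bool:
    # Placeholder only: real CID validation parses multibase/multihash.
    return isinstance(value, str) and value.startswith("bafy")

follow = SubscriptionDef(
    name="Follow",
    schema=lambda sub: is_cid(sub.get("object")) and sub.get("object-type") == "Person",
    match=lambda sub, act: sub["object"] == act["actor"],
)

def subscribe(defn, params, mode=None, held_capabilities=()):
    """Validate a Subscribe request against its definition and return the instance."""
    if not defn.schema(params):
        raise ValueError("malformed subscription parameters")
    chosen = mode or defn.default_mode
    if chosen not in defn.modes:
        raise ValueError(f"unsupported delivery mode: {chosen}")
    missing = set(defn.capabilities_required) - set(held_capabilities)
    if missing:
        raise PermissionError(f"missing capabilities: {sorted(missing)}")
    return {"type": defn.name, "params": params, "mode": chosen}
```

In this model a malformed subscription is rejected before it can enter state, and the instance can only pick a delivery mode that the definition advertises.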
### 18.3 The `Subscribe` verb
The bootstrap verb that activates a subscription:
```sx
(activity 'Subscribe
:object {:type "Follow" :object "https://alice.example/actors/alice"})
(activity 'Subscribe
:object {:type "Topic" :tag "climate-change"
:delivery :digest :digest-window "P1D"})
(activity 'Subscribe
:object {:type "CidWatch" :cid "bafy..."
:events [:supersede :endorse]})
(activity 'Subscribe
:object {:type "Predicate"
:pred '(fn (act) (and (= (:type act) "Note")
(string-contains? (-> act :object :content) "fed-sx")))})
```
`Unsubscribe` is `Undo{Subscribe}`: AP's standard pattern, which keeps the original `Subscribe` in the log as an audit trail.
### 18.4 Standard subscription types (defined later, not bootstrap)
Same status as the custom verbs in §6.2 — substrate accepts any subscription
type once a `DefineSubscription` artifact registers it. Standard set:
| Name | Params | Match semantics | Use case |
|------|--------|-----------------|----------|
| **`Follow`** | `{object: actor-id}` | activity.actor == subscription.object | AP-standard actor following |
| **`Topic`** | `{tag: string}` | tag in activity.object.tags | Hashtag follows, RSS-like |
| **`CidWatch`** | `{cid, events: [...]}` | activity references cid AND activity.type in events | "Notify me when this artifact is updated/endorsed/forked" |
| **`PathWatch`** | `{path, events: [...]}` | activity is a Pin/Update of named path | "Notify me when domain:foo/bar/baz changes" |
| **`VerbFilter`** | `{wraps: subscription-cid, types: [...]}` | inner subscription matches AND activity.type in types | "Follow Alice but only Endorse activities" |
| **`TrustGraph`** | `{root: actor-id, depth: int}` | activity.actor reachable from root within `depth` hops of the trust graph | Web-of-trust expansion |
| **`Predicate`** | `{pred: sx-fn}` | (pred activity) returns truthy | Escape hatch — most powerful, highest cost |
| **`Channel`** | `{channel-id}` | activity addresses or originates from channel | Multi-actor pooled streams |
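
A few of the match semantics in the table can be written out as plain predicates. This is a rough Python sketch under stated assumptions: activities are dicts, "references cid" is modelled as a flat `references` list, event names are compared directly with `activity.type`, and `VerbFilter` inlines the wrapped subscription rather than resolving it by CID. The `MATCHERS` table is a hypothetical name.

```python
def match_follow(sub, act):
    # activity.actor == subscription.object
    return act.get("actor") == sub["object"]

def match_topic(sub, act):
    # tag in activity.object.tags
    return sub["tag"] in act.get("object", {}).get("tags", [])

def match_cid_watch(sub, act):
    # Assumption: the activity lists the CIDs it references in "references".
    return sub["cid"] in act.get("references", []) and act.get("type") in sub["events"]

MATCHERS = {"Follow": match_follow, "Topic": match_topic, "CidWatch": match_cid_watch}

def match_verb_filter(sub, act):
    # The table wraps a subscription CID; the wrapped subscription is inlined here.
    inner = sub["wraps"]
    return MATCHERS[inner["type"]](inner, act) and act.get("type") in sub["types"]

MATCHERS["VerbFilter"] = match_verb_filter
```

Because `VerbFilter` dispatches through the same table, it composes with any registered inner type, including another `VerbFilter`.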
### 18.5 Match-fn execution location
The load-bearing question. A match-fn could run on the publisher side, on the subscriber side, or on both; fed-sx adopts the third option, the **hybrid model**:
- **Coarse filter on the publisher side** — audience predicates (§8) decide who
the activity is delivered to at all. This is mandatory and cheap (audience set
is usually small and well-defined).
- **Fine filter on the subscriber side** — once an activity arrives in inbox,
the subscriber's instance evaluates each active subscription's `match-fn`
against it. Pure-mode evaluation (deterministic, gas-bounded). Activities
matching one or more subscriptions enter the subscriber's projected state.
Why hybrid: publisher-side fine filtering would require the publisher to know
every subscriber's match-fn (privacy-violating, scaling-killing). Subscriber-side
filtering is wasteful only if the publisher's audience model is too coarse —
which is the audience system's job to fix per §8.
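
The split between the two filters can be sketched as two small functions. This is an illustrative Python model only: `publisher_fanout`, `subscriber_filter`, and the `matchers` table are hypothetical names, and the audience predicate is passed in as an opaque function because §8 defines its real shape.

```python
def publisher_fanout(activity, candidates, in_audience):
    """Publisher side (coarse): choose recipients with an audience predicate."""
    return [actor for actor in candidates if in_audience(activity, actor)]

def subscriber_filter(activity, subscriptions, matchers):
    """Subscriber side (fine): return the active subscriptions this activity hits."""
    return [s for s in subscriptions if matchers[s["type"]](s, activity)]

# Two simplified match-fns, keyed by subscription type.
matchers = {
    "Follow": lambda s, a: a["actor"] == s["object"],
    "Topic": lambda s, a: s["tag"] in a["object"].get("tags", []),
}
```

The publisher never sees `subscriptions` or `matchers`; it only needs the audience predicate. An activity whose `subscriber_filter` result is empty is simply not admitted to that subscriber's projected state.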
### 18.6 Subscription state and storage
Active subscriptions are themselves projected state. A bootstrap projection
`subscriptions` (paralleling `audience-graph` for the inverse direction)
maintains:
```
{actor-id -> [{subscription-cid, type, params, mode, started-at}]}
```
Updated by `Subscribe` and `Unsubscribe` activities. Queryable like any other
projection (§16). Used by:
- The inbox dispatcher to know which match-fns to evaluate against incoming
activities
- Triggers (§19) to know which activities to fire on
- Federation to advertise "here are the subscription types I currently subscribe
to" (capability-style, opt-in)
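
Since the `subscriptions` projection is a fold over `Subscribe` and `Undo{Subscribe}` activities, it can be sketched as a pure reducer. This Python version is an assumption-laden illustration: the activity field names (`cid`, `published`, `object`) and the way `Undo` points at the original `Subscribe` by `cid` are hypothetical, chosen only to make the example runnable.

```python
from functools import reduce

def fold_subscriptions(state, activity):
    """Pure fold step: returns a new state and never mutates its input."""
    actor = activity["actor"]
    current = state.get(actor, [])
    obj = activity.get("object")
    if activity["type"] == "Subscribe" and isinstance(obj, dict):
        params = {k: v for k, v in obj.items() if k not in ("type", "delivery")}
        entry = {
            "subscription-cid": activity["cid"],
            "type": obj["type"],
            "params": params,
            "mode": obj.get("delivery", "push"),
            "started-at": activity["published"],
        }
        return {**state, actor: current + [entry]}
    if activity["type"] == "Undo" and isinstance(obj, dict) and obj.get("type") == "Subscribe":
        # Assumption: Undo names the Subscribe activity it retracts by CID.
        remaining = [e for e in current if e["subscription-cid"] != obj["cid"]]
        return {**state, actor: remaining}
    return state  # unrelated activities leave the projection unchanged

def replay(log):
    """Rebuild the projection from scratch by folding the whole log."""
    return reduce(fold_subscriptions, log, {})
```

Replaying the same log always yields the same state, which is the property the inbox dispatcher and triggers rely on when they read this projection.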
### 18.7 Federation interactions
Subscriptions interact with federation in three ways:
- **Discovery.** Peer's `/.well-known/sx-capabilities` (§7) lists registered
`DefineSubscription` CIDs, so subscribers know what they can ask for.
- **Negotiation.** A `Subscribe` activity carries `capabilities-required`; if
the publisher's instance doesn't support the named subscription type, it
responds with the standard 422 + missing-CIDs error (§14.2 #9). Subscriber
can then deliver the bootstrapping `DefineSubscription` artifact and retry.
- **Cross-instance match-fn.** If subscriber and publisher both run the same
conformance-tested SX evaluator, identical subscriptions match identically
(cross-host equivalence, §11.8). This is what makes federated topic
subscriptions reliable: every conforming instance computes the same
set-of-matches for the same activity.
### 18.8 Operational implications
- **The audience system handles "who do I send this to."** The subscription
system handles "what do I want to receive." They're complementary, not
redundant.
- **Subscription types can themselves evolve via supersession.** New version of
`Topic` with case-insensitive matching? Publish a new `DefineSubscription`,
`Supersede` the old one. Existing subscriptions migrate at next match
evaluation.
- **Match-fn cost matters.** A `Predicate` subscription with a slow predicate
becomes a per-activity tax. Gas budgets (§11.5) bound the worst case;
operators can disable expensive subscription types if needed.
- **Subscriptions are signed messages.** Audit, accountability, and revocation
all work the same way as activities — because subscriptions *are* activities.
---
## 19. Application model
The synthesis. With publish, subscribe, project, and trigger as registry-driven
primitives, the substrate has everything needed to express **distributed reactive
applications** as data — no native code, no kernel changes, no privileged
runtime. Applications are themselves federated artifacts.
### 19.1 An application is a tuple of artifacts
```
Application = {
subscriptions : [DefineSubscription instances and their parameters],
triggers : [DefineTrigger registrations],
projections : [DefineProjection registrations],
storage : [DefineStorage registrations] (optional)
}
```
That tuple, signed and bundled, is the application. Installing one = following
the named actors / activating the named subscriptions + loading the Define*
CIDs into the local registry. Forking one = republishing the Define* with
`Supersede` over the bits you change.
### 19.2 The reactive loop
```
External actors      Operator publishes activities
publish activities   via this instance's actors
       │                           │
       ▼                           ▼
┌─────────────────────────────────────────────┐
│        Inbound + outbound activities        │
└────────────────────┬────────────────────────┘
                     │
                     ▼
         For each active subscription:
         evaluate match-fn (pure mode)
                     │
       ┌─────────────┴─────────────┐
       ▼                           ▼
Activity matches             Activity does
a subscription               not match
       │                           │
       ▼                           ▼
Projections ←                (silently dropped from
fold the activity            this application's view;
       │                     may match other apps)
       ▼
Triggers fire on the
subscription's match
       │
       ▼
Trigger then-sx runs
(effectful sandbox)
       │
       ├──> updates local state (private projections)
       ├──> publishes new activity (via outbox)
       └──> calls effectful primitives (HTTP, fs, etc.)
            per declared capabilities
```
Three things happen on a match: **state updates** (projection), **derived
publishes** (new activities), **side effects** (effectful primitives). Each is
authorisation-gated by the trigger's declared capabilities.
### 19.3 Trigger semantics
`DefineTrigger` registers `(when-subscription, then-sx, cascade-limit)`:
- **`when-subscription`** — references a subscription (by CID or by name). The
trigger fires whenever that subscription matches an inbound or outbound
activity. Multiple triggers can reference the same subscription.
- **`then-sx`** — function of `(activity, subscription, env) → trigger-result`.
Runs in pure or effectful sandbox per declaration. Returns one or more of:
- `:publish [activity-spec ...]` — request publish of derived activities
- `:project [name → state-update ...]` — request projection updates
- `:effect [capability-call ...]` — request effectful primitive calls
- `:noop` — observed but no action
- **`cascade-limit`** — bounded depth for trigger cascades (§19.4).
A trigger is fundamentally **a reactive rule**: "when X happens, do Y." The
substrate guarantees that Y runs exactly once per X on each instance (duplicate
deliveries of X are deduplicated by activity-CID, and the hand-off from a trigger
to its effects is durable) and at bounded cost (gas + cascade-limit).
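
The deduplication half of that guarantee can be illustrated with a small runner. This is a Python sketch under loud assumptions: `TriggerRunner` and `reply_once` are hypothetical, the seen-set lives in memory (a real instance would need durable storage to survive restarts), and the result keys mirror `:publish` and `:effect` from the list above.

```python
class TriggerRunner:
    """Runs one trigger's then-fn at most once per activity CID."""

    def __init__(self, then_fn):
        self.then_fn = then_fn
        self.seen = set()          # activity CIDs already handled (in-memory only)
        self.publish_queue = []    # derived activities awaiting the outbox
        self.effects = []          # requested effectful calls

    def on_match(self, activity, subscription):
        cid = activity["cid"]
        if cid in self.seen:
            return False           # duplicate delivery: do nothing
        self.seen.add(cid)
        result = self.then_fn(activity, subscription, {})
        self.publish_queue.extend(result.get("publish", []))
        self.effects.extend(result.get("effect", []))
        return True

def reply_once(activity, subscription, env):
    # Example then-fn: request one derived Note per matched activity.
    return {"publish": [{"type": "Note", "in-reply-to": activity["cid"]}]}
```

Delivering the same activity twice leaves exactly one entry in `publish_queue`, which is the behaviour the rule "once per X" describes.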
### 19.4 Cascade control
A trigger that publishes activities can fire other triggers. Without limits, a
single inbound activity could cascade across instances forever.
Each trigger declares `cascade-limit: N` (default 3). Each activity carries an
implicit `cascade-depth` field, incremented when it's the result of a trigger
firing. A trigger refuses to fire if `cascade-depth > cascade-limit`.
Cascade limits are local-only (operator policy, not federated). Defending
against runaway cascades from peer instances is the operator's job; the
substrate gives them the knob.
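
The depth rule is easy to get off by one, so a tiny model helps. The sketch below is Python and purely illustrative: `may_fire`, `derive`, and `run_chain` are hypothetical helpers, and the implicit `cascade-depth` field is represented as an explicit dict key that defaults to 0 for activities not produced by a trigger.

```python
DEFAULT_CASCADE_LIMIT = 3

def may_fire(trigger, activity):
    """A trigger refuses to fire when cascade-depth > cascade-limit."""
    depth = activity.get("cascade-depth", 0)
    return depth <= trigger.get("cascade-limit", DEFAULT_CASCADE_LIMIT)

def derive(parent, spec):
    """Build a derived activity one level deeper than the one that caused it."""
    return {**spec, "cascade-depth": parent.get("cascade-depth", 0) + 1}

def run_chain(trigger, activity):
    """Count firings of a trigger that keeps reacting to its own output."""
    fired = 0
    while may_fire(trigger, activity):
        fired += 1
        activity = derive(activity, {"type": "Note"})
    return fired
```

With the default limit of 3, a self-feeding trigger fires at depths 0 through 3 and then stops, so the strict `>` comparison permits four firings rather than three.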
### 19.5 The `DefineApplication` bundle
A bundle artifact that names and groups the components of an application:
```sx
(activity 'Create
:object {:type "DefineApplication"
:name "rose-ash-blog"
:version 1
:subscriptions [{:type "Follow" :object "https://blog.rose-ash.com/actors/main"}
{:type "Topic" :tag "rose-ash"}
{:type "CidWatch" :cid <rose-ash-template-cid>
:events [:supersede]}]
:triggers [<comment-moderation-trigger-cid>
<reaction-counter-trigger-cid>
<rss-republish-trigger-cid>]
:projections [<comment-thread-projection-cid>
<reaction-counts-projection-cid>]
:storage [<local-files-storage-cid>]
:capabilities [<http-allowlist-cap-cid>
<fs-write-cap-cid>]
:description "Federated blog with moderated comments and RSS"})
```
Three operations on applications, all themselves activities:
- **Install** — `Subscribe` to each subscription, `Create{}` references in
`define-registry` to each trigger/projection/storage CID. One activity per
reference, audited and replayable. Or: a single `Install{DefineApplication}`
meta-verb that does the bundle in one signed step (defined later as a custom
verb, not bootstrap).
- **Update** — publish a new `DefineApplication` with the same name +
`supersedes` pointing at the old. Diff-then-apply: subscriptions added/
removed, triggers loaded/unloaded, projections reprojected per §10.5.
- **Fork** — publish a new `DefineApplication` referencing the original's CID
via `forked-from`, with whatever Define* CIDs you want to swap. Run alongside
the original or in place of it.
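
The diff-then-apply step of **Update** can be sketched as a comparison of two bundles. This Python model is an assumption: `diff_bundle` is a hypothetical helper, bundles are plain dicts keyed like the `DefineApplication` fields above, and CIDs are shortened to placeholder strings.

```python
BUNDLE_KEYS = ("subscriptions", "triggers", "projections", "storage", "capabilities")

def diff_bundle(old, new):
    """Return, per component kind, what an update must add and what it must remove."""
    plan = {}
    for key in BUNDLE_KEYS:
        before, after = old.get(key, []), new.get(key, [])
        # List membership is used because subscriptions are dicts (unhashable).
        plan[key] = {
            "add": [item for item in after if item not in before],
            "remove": [item for item in before if item not in after],
        }
    return plan
```

An empty `add` and `remove` for a component kind means the update leaves it untouched, so unchanged projections need no reprojection.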
### 19.6 Per-application namespacing
Multiple applications running on one instance need isolation:
- **Projections are namespaced by application.** `pin-state` from app A is
distinct from `pin-state` from app B — both addressable as
`/projections/<app-name>/pin-state`.
- **Triggers fire only on subscriptions belonging to their application.** App
A's trigger doesn't see app B's subscription matches.
- **Storage backends are namespaced.** App A's `files-on-disk` backend writes
to `data/apps/A/objects/`; app B writes to `data/apps/B/objects/`.
- **Capabilities are per-application.** Granting `http-client` to app A
doesn't grant it to app B. Operator can audit per-app capability surface
and revoke selectively.
Cross-application reads are explicit and require a capability grant
(`read-projection: <app>/<projection>`). Default isolation; opt-in sharing.
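
The isolation rules reduce to path construction plus one grant check. The following Python sketch is illustrative only: the function names and the `grants` mapping are hypothetical, while the path layouts and the `read-projection:` grant string follow the text above.

```python
def projection_path(app, name):
    # e.g. /projections/<app-name>/pin-state
    return f"/projections/{app}/{name}"

def storage_root(app):
    # e.g. data/apps/A/objects/
    return f"data/apps/{app}/objects/"

def can_read(reader_app, owner_app, projection, grants):
    """Default isolation: cross-app reads need an explicit read-projection grant."""
    if reader_app == owner_app:
        return True
    needed = f"read-projection: {owner_app}/{projection}"
    return needed in grants.get(reader_app, set())
```

Revoking a grant is then just removing one string from the reader's set, which keeps the per-app capability surface easy to audit.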
### 19.7 Worked examples
#### Example A — Blog with moderated comments
```
DefineApplication "blog-with-comments":
subscriptions:
- Follow: <author-actor>
- Topic: "post-comment" (filter: object.in-reply-to in our-posts)
triggers:
- on Topic match → publish Note (the new comment, derived if approved)
→ projection pending-moderation
- on inbound Approve{Reply} → projection comment-thread (visible)
projections:
- comment-thread: post-cid → [approved comment activities]
- pending-moderation: list of pending replies awaiting approval
```
#### Example B — Continuous integration
```
DefineApplication "ci-pipeline":
subscriptions:
- Follow: <developer-actor>
- VerbFilter: wraps Follow, types: [Push]
triggers:
- on Push match → effect: run build (capability: subprocess + fs-write)
→ publish Build{source: Push.cid, output: <build-cid>, status}
- on Build{status: success} → effect: run tests
→ publish Test{...}
- on (Test{passed} count for N days) → publish Release{...}
projections:
- build-history: commit-cid → [build activities]
- release-history: ordered list of Release activities
```
#### Example C — Distributed code review
```
DefineApplication "code-review":
subscriptions:
- Topic: "review-request"
- CidWatch: <organisation-actor>, events: [Endorse]
triggers:
- on review-request match → projection review-queue
→ effect: notify-reviewer
- on Endorse from authorised reviewer → publish Approve{review-cid}
→ projection approval-state
projections:
- review-queue: ordered list of pending requests with summaries
- approval-state: review-cid → endorsement set
```
In all three: the application is *just* the bundle of subscriptions, triggers,
and projections. Federation makes them composable across instances. The
substrate provides exactly-once-per-CID semantics and pure-mode determinism for
the matches and folds.
### 19.8 Composition and discovery
Applications are themselves federated content. This means:
- **App registries** — actors can publish curated lists of applications they
endorse. Discovery becomes follow-an-actor + browse-their-app-list.
- **Cross-app composition** — application A publishes derived activities that
application B subscribes to. Pipeline of applications via the activity log.
- **App marketplaces** — pin a friendly path to a `DefineApplication` CID
(`rose-ash.com:apps/blog → bafy...`) for human discoverability.
None of this requires kernel changes. It's all activities about activities.
### 19.9 Operational implications
- **Applications are inspectable from the activity log alone.** Replay an
actor's outbox and you can reconstruct the exact application installation
state at any point in time.
- **Application updates are atomic relative to the activity log.** Either the
`Update{DefineApplication}` succeeded (new state visible from next activity)
or it didn't (old state continues). No partial-update window.
- **Forking is the same as installing a copy.** No special "fork" mechanism
needed; the activity-log mechanics already support it.
- **Per-app capabilities are a real security surface.** Operators must
understand what they're granting when they install. The bundle's
`capabilities` list is the audit point — should be human-readable and
reviewable before installation.
- **The substrate isn't an "application platform" — it's an "application
substrate."** Applications aren't installed *on* fed-sx; they're expressed
*in* fed-sx, as the same kind of content as everything else.
---
## Appendix A: relationship to adjacent systems
Worth knowing about so we can borrow good ideas:
- **ATproto / Bluesky** — Lexicons (schemas) + repos (per-actor signed merkle trees).
Closest in spirit. We borrow the schema-as-data idea; we differ by making schemas
themselves federated activities, not central registry entries.
- **Spritely Goblins** — capability-secure actors. We borrow the capability-token
pattern for delegation.
- **Ceramic** — signed event streams, content-addressed. Similar log-as-state model;
we differ by making the projection function pluggable per-stream rather than
hardcoded per-streamtype.
- **Holochain** — agent-centric DHT. We share the "every agent has their own log"
shape; we use AP federation instead of DHT.
- **Farcaster** — pubsub on hubs. We share the firehose model; we add cryptographic
outbox-as-source-of-truth.
None of them are *code-as-data the whole way down* — that's the SX-distinctive bit.
Handlers, validators, projections aren't bytecode shipped out-of-band; they're SX in
the same log as everything else, evaluable by any host that speaks SX.
## Appendix B: implications worth sitting with
- **Deployment dissolves.** Releasing a feature = publishing `DefineActivity{name:
"Whatever", ...}`. Federation distributes it. No build artifact, no rolling deploy,
no version-skew between server and client.
- **Applications are forkable by default.** "Fork the rose-ash blog" = take the bundle
of `Define*` CIDs that constitute it, publish your own with `Supersede` over the
ones to change, run your own projector. Same federation graph, divergent state.
- **Composition is by reference, not import.** `Pin` activity points at the CID of the
`DefineActivity{name: "Pin"}`. No package manager, no transitive deps, no lockfiles.
- **The boundary between "user" and "developer" softens.** Both publish signed
activities. Power users can publish handlers, projections, sig suites under their
own actor.
- **This is more ambitious than a rose-ash rewrite.** It's a substrate that *happens
to* host rose-ash as its first application.
---
## Appendix C: AI agent collaboration patterns
The substrate is incidentally well-shaped for one of the open problems of the
next decade: **infrastructure for AI agent collaboration where contributions
are signed federated artifacts, behavior is bounded by declared capabilities,
decisions are audit-by-replay, and infrastructure improves through agent
contribution within a web of trust.**
This is not a designed-for use case — fed-sx was conceived as a federated
publishing and reactive application substrate. But the properties it has fit
agent collaboration almost exactly. Worth being deliberate about, because the
framing changes who fed-sx is *for*.
### Why the substrate fits agent collaboration
AI agents need infrastructure where contributions are first-class artifacts,
not pull requests against human-controlled repos. Currently agents squeeze
through GitHub PRs, deployment pipelines, npm publishes — all of which assume
a human in the loop. fed-sx is shaped for direct contribution:
- **Direct authoring of substrate features.** An agent doesn't *propose* a
feature, it *publishes* one. A `DefineActivity` artifact is the agent's
contribution. A `DefineProjection` is its analysis. A `DefineTrigger` is its
automation. The signed publication IS the deploy — no PR review, no CI, no
DevOps.
- **Cryptographic identity without registration.** Agents have actor keys;
reputation is the endorsement graph; trust is provable by signature chain.
Two agents that have never met can verify each other's contributions
cryptographically.
- **Capability-bounded autonomy.** An agent declares `capabilities-required` on
its activities. A trigger says "I publish to path-prefix `/agent-x/*` and
call `http-client` for `api.example.com/*`." Receivers verify the constraint
cryptographically; the agent can't escape its declared surface even if the
agent itself is misaligned. Sandbox model designed for autonomous code (§11).
- **Audit-by-replay applied to AI behavior.** Every AI decision is
reconstructable, deterministically, by anyone with the log. To answer "Why did
agent A do X?", replay the log to that moment and inspect the activities A
subscribed to, the projection state it observed, the trigger that fired, and
the activity it published. Fundamentally better than today's "trust the model"
posture.
- **Composition without coordination.** Agent A publishes a moderation
validator. Agent B subscribes and uses it. Agent C improves it, supersedes
A's. B sees the supersession, decides whether to adopt. No central registry,
no maintainer to coordinate with, no version skew.
- **Disagreement is visible, not hidden.** If agents A and B compute the same
projection over the same log and produce different snapshot CIDs, the
disagreement is *cryptographically observable*. Today, two AI services
answering the same question with different answers is invisible until
somebody notices.
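
The "disagreement is visible" property rests on comparing digests of projection snapshots. Here is a minimal Python sketch of that idea, with stated substitutions: it hashes canonical JSON with SHA-256 instead of computing a real CID over `dag-cbor`, and `fold_counts` is a toy projection invented for the example.

```python
import hashlib
import json

def snapshot_digest(state):
    """Digest of a projection snapshot (stand-in for a CID over dag-cbor)."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def fold_counts(log):
    """Toy projection: number of activities per type."""
    counts = {}
    for activity in log:
        counts[activity["type"]] = counts.get(activity["type"], 0) + 1
    return counts

def agree(digest_a, digest_b):
    # Two agents agree exactly when their snapshot digests are equal.
    return digest_a == digest_b
```

Two agents folding the same log get the same digest; an agent that skipped or misread an activity produces a different one, and anyone holding both digests can see the mismatch without trusting either agent.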
### Dynamics that emerge
- **Agent specialisation = publication.** "I'm the indexing agent" = publishes
`DefineProjection` artifacts. "I'm the moderation agent" = publishes
`DefineValidator` artifacts. "I'm the matchmaking agent" = publishes a
`DefineApplication` for marketplace subscriptions and triggers. Specialisation
is content, not service deployment.
- **Reputation = endorsement graph.** Web of trust applied to agent
contributions. Bad actors get cut out organically; no central authority to
capture.
- **Forking = explicit disagreement resolution.** Agents disagree on
validation? Both publish their `DefineValidator`s. Subscribers pick. The fork
is signed, observable, recoverable. Compare today: when AI services have
different rules, one is just *invisibly applied*.
- **Cascade limits = agent population safety.** The `cascade-depth` and
`cascade-limit` (§19.4) become the bounded-autonomy guard rails for agent
populations. Self-coordination without runaway-cascade across the substrate.
- **Self-improving infrastructure.** Agents observe substrate behavior, propose
improvements as `DefineProjection` for monitoring, `DefineTrigger` for
automation. The substrate itself improves through agent contribution — not
through a release cycle. Every improvement is signed and traceable.
### Use cases
- **Agent-managed scientific datasets** — collection, cleaning, analysis,
publication, peer review by other agents, all signed activities. Replication
is replay; provenance is built in.
- **Multi-agent code maintenance** — agents observing repos (subscribe to
`Push`), running tests (triggers), proposing fixes (`Pull`-equivalent
activities), endorsing each other's work.
- **Agent-curated knowledge** — agents publish, endorse, and supersede
knowledge artifacts. Truth accumulates via the trust graph; outdated info
gets `Supersede`d explicitly.
- **Distributed agent marketplaces** — agents publish capabilities, subscribers
find them via `Topic` / `Predicate` subscriptions, contracts via signed
activity exchange.
- **Cross-agent AI safety monitoring** — monitoring agents subscribe to other
agents' outboxes, run validators, publish `Alert` activities when patterns
of concern appear. Decentralised oversight without central authority.
- **Cross-org agent workflow coordination** — supply chain, healthcare, legal —
multiple specialised agents coordinating across organisational boundaries
with cryptographic provenance.
### Safety and governance properties
The substrate provides several properties AI safety has been asking for and
that current infrastructure does not provide:
- **Every action is signed.** Attribution is cryptographic, not a log file an
agent could spoof.
- **Capabilities are declared and enforced.** Agents operate within their
declared sandbox; can't grow capabilities silently.
- **Cascades are bounded.** No exponential agent-on-agent feedback loops
without explicit configuration.
- **Audit is replay.** Every decision can be reconstructed deterministically;
no opaque "the model decided" moments.
- **Disagreement is visible.** Two agents producing different projections of
the same data is a cryptographically-detectable event, not invisible drift.
- **Trust is the endorsement graph, not central authority.** No single point of
capture or coercion.
- **Forks are first-class.** When safety-critical disagreements occur, the
substrate accommodates them without forcing a winner; observers see all
positions.
### What this implies for the project
- **Milestone 1's smoke tests remain right** — the verb-extensibility and
reactive-application proofs apply to agent contributions exactly as they
apply to human contributions. The agent collaboration framing doesn't
require new mechanisms; it interprets the existing mechanisms differently.
- **The application model (§§18-19) is the headline story** for this audience,
not a layer on top. Subscriptions + triggers + projections + capabilities =
agent collaboration primitives.
- **Capability discovery and trust dynamics gain weight earlier.** Where
human-driven applications can rely on operator policy, agent-driven
populations need the trust graph to be operational from milestone 2.
- **The pitch line evolves.** Less "ActivityPub for code" / "rose-ash next
gen," more "infrastructure for AI agent collaboration with cryptographic
provenance, bounded autonomy, and audit-by-replay." The technical substance
is unchanged; the framing of *who needs this* changes substantially.
The substrate happens to be well-shaped for one of the open problems of the
next decade, and that accident is worth being deliberate about.