rose-ash/plans/fed-sx-design.md

# fed-sx — Federated SX Activity Substrate

A federated, content-addressed, extensible application substrate where the unit of
computation is a signed activity, the unit of state is a pure SX projection over the
activity log, and the substrate's own extensibility (new verbs, new object types, new
projections, new validators) is itself published through the same mechanism.

Status: **design** — not yet implemented. Target subdomain: `next.rose-ash.com`.
Target location in repo: `next/` (new top-level dir, sibling to `blog/`, `market/`,
etc.). Stack: pure SX-on-OCaml. Implementation language(s) to be chosen after design
is complete.

---

## 1. Premise

ActivityPub's data model — actors, signed activities, inboxes/outboxes — generalises
beyond social posting to any domain where state evolves via signed messages. fed-sx
takes that generalisation seriously:

- The unit of communication is a **signed AP activity**.
- The unit of content is an **AP object**, content-addressed by **CID** (multihash +
  multicodec, default `dag-cbor` over the parsed SX AST).
- State is the **deterministic fold** of pure SX functions over the activity log.
- The substrate is **self-extending**: new activity types, object types, projections,
  validators, codecs, transports, and signature suites are themselves published as
  `Define*` activities — federated like any other content.

Three commitments make the rest fall into place:

1. **The kernel is dumb.** It only knows envelope shape, signature verification,
   append-to-log, fetch-by-id, transport in/out. It does not know what `Create` or
   `Pin` *mean*.
2. **Everything else is registry-driven.** Verbs, object types, validators, projections,
   codecs, transports, audiences, proofs, sig suites — all looked up in registries the
   kernel calls into.
3. **The registries are themselves publishable.** New entries arrive as `Define*`
   activities. Bootstrap registries load from a known set of CIDs at startup; everything
   else is replayed from the log.

Result: the only code that ever needs to change in the kernel is the envelope itself.
New verbs = published SX, federated like any other artifact.

---

## 2. CIDs and content addressing

Every artifact has a CID. Default codec is **dag-cbor** over the parsed SX AST (not
the raw text). This buys:

- **Sub-AST addressing for free.** Each nested structure has an implicit CID; IPLD can
  walk paths like `<file-cid>/components/card`. The "file CID *and* component CID"
  question dissolves: every node is a CID, you choose the granularity at reference
  time.
- **Polyglot canonicalization.** JS, OCaml, Python only need to agree on AST shape +
  CBOR's deterministic encoding (RFC 8949 §4.2.1). No byte-identical pretty-printer
  required across hosts.
- **Format immunity.** Reformatting, indent changes, equivalent-form normalisations
  do not change the CID.
- **Tooling fit.** sx-tree already has the parsed form in memory; computing or
  verifying a CID is just an encode + hash.

Costs accepted:
- One spec to maintain: SX↔CBOR mapping (number → CBOR int/float, string → text,
  symbol → tag, keyword → tag, list → array, dict → map). ~50 lines of code per host.
- Author's exact source text is not preserved; re-pretty-print on fetch.
- "Why don't these CIDs match" requires comparing CBOR (a `cid-explain` tool helps).

The CID format itself is multicodec-agile: the substrate also accepts `raw`,
`dag-json`, `dag-pb`, etc. when seen, dispatched via the codec registry.
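
To make "encode + hash" concrete, here is a minimal Python sketch of computing a CIDv1 (dag-cbor, sha2-256) over a parsed value. It is an illustration under assumptions, not the fed-sx mapping: it handles only integers, strings, lists, and dicts, it leaves out the symbol and keyword mapping, and the names `encode` and `cid_of` are hypothetical.

```python
import base64
import hashlib

def _head(major, n):
    """CBOR head: major type plus the shortest-form argument (RFC 8949 §4.2.1)."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")

def encode(value):
    """Deterministically encode a small subset of AST values as CBOR."""
    if isinstance(value, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, (list, tuple)):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        # Sort entries by the bytes of their encoded keys, so insertion order is irrelevant.
        items = sorted((encode(k), encode(v)) for k, v in value.items())
        return _head(5, len(items)) + b"".join(k + v for k, v in items)
    raise TypeError(f"unsupported type: {type(value).__name__}")

def cid_of(value):
    """CIDv1 (dag-cbor, sha2-256) rendered as a base32 multibase string."""
    digest = hashlib.sha256(encode(value)).digest()
    raw = bytes([0x01, 0x71, 0x12, 0x20]) + digest  # version, codec, hash fn, digest length
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
```

Two dicts with the same entries in a different order encode to the same bytes, which is the "format immunity" property in miniature.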

---

## 3. Kernel surface (fixed — get this right)

The kernel is the only thing that's hard to change later. Everything else is in
registries. Two envelope shapes plus five operations.

### 3.1 Activity envelope

```
{ id, type, actor, published,
  to, cc, audience-extras,
  object | target | origin | result,    # AP slots, opaque to kernel
  capabilities-required: [...],         # so receivers can refuse cleanly
  proofs: [...],                        # OTS, on-chain, multi-sig — all opaque
  signature: { key-id, algorithm, value, covered-fields } }
```

### 3.2 Object envelope

```
{ id, type, cid, media-type,
  where: inline | cid | url,
  content?, link? }                     # only one populated based on `where`
```

### 3.3 Kernel verbs

The only verbs implemented directly by the kernel:

- **Append signed activity** to outbox (after envelope check + sig verify + validator
  pipeline).
- **Verify signature** against actor's published keys, time-aware (which key was
  active at `published`).
- **Fetch** by `id` or by `cid`.
- **Receive at inbox** (verify + dispatch to registered handlers).
- **Replay log** to rebuild registries on boot.

Everything else is registry-resolved.
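
The append path described above (envelope check, signature verification, validator pipeline, then append) can be sketched as follows. This is a toy under assumptions: the `Kernel` class, the `REQUIRED` field list, and the injected `verify_sig` callable are hypothetical stand-ins, and the real required-field set is whatever §3.1 finally fixes.

```python
REQUIRED = ("id", "type", "actor", "published", "signature")  # assumed minimum

class Kernel:
    def __init__(self, verify_sig, validators=()):
        self.verify_sig = verify_sig        # callable(activity) -> bool
        self.validators = list(validators)  # callables(activity) -> bool, from the registry
        self.log = []

    def append(self, activity):
        """Append a signed activity; the kernel never inspects what `type` means."""
        missing = [field for field in REQUIRED if field not in activity]
        if missing:
            raise ValueError(f"envelope missing fields: {missing}")
        if not self.verify_sig(activity):
            raise PermissionError("signature verification failed")
        for check in self.validators:
            if not check(activity):
                raise ValueError("rejected by validator pipeline")
        self.log.append(activity)
        return len(self.log) - 1  # position in the log
```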

---

## 4. Registries

Each registry has a default-populated set (loaded from genesis-bundled CIDs) and
accepts new entries via `Define*` activities. Default entries themselves are SX
artifacts — versioning, audit, replacement work the same way as user content.

| Registry | Bootstrap defaults | Extended by |
|----------|-------------------|-------------|
| **Activity types** | `Create`, `Update`, `Delete`, `Announce` | `DefineActivity{type, schema-sx, semantics-sx}` |
| **Object types** | `SXArtifact`, `Note`, `Image`, `Tombstone` | `DefineObject{type, schema-sx, render-hint}` |
| **Validators** | envelope shape, signature, type-schema | `DefineValidator{applies-to, predicate-sx}` |
| **Projections** | identity, by-type, by-cid, by-actor, actor-state, define-registry, audience-graph, by-object | `DefineProjection{name, fold-sx, query-sx}` |
| **Codecs** | dag-cbor, raw, dag-json | `DefineCodec{multicodec, encode-sx, decode-sx}` |
| **Hash algorithms** | sha2-256 | multihash table — agile by spec |
| **Transports** | http-inbox-push | `DefineTransport{name, deliver-sx, receive-sx}` |
| **Audience predicates** | `Public`, `Followers`, direct | `DefineAudience{name, member-of-sx}` |
| **Subscription types** | `Follow` (AP-standard) | `DefineSubscription{name, schema-sx, match-sx, delivery}` |
| **Proof types** | (none) | `DefineProof{type, attach-sx, verify-sx}` |
| **Storage backends** | files-on-disk | `DefineStorage{where-tag, put-sx, get-sx}` |
| **Triggers** | (none) | `DefineTrigger{when-subscription, then-sx, cascade-limit}` |
| **Signature suites** | rsa-sha256 (AP-compatible) | `DefineSigSuite{name, sign-sx, verify-sx}` |
| **Application bundles** | (none) | `DefineApplication{name, subscriptions, triggers, projections, storage}` |

Adding `Pin`, `Endorse`, `Supersede`, `Test`, `Build`, `Compose`, etc. later is just
publishing `DefineActivity` artifacts — no kernel diff, no redeploy required if
registries are hot.

---

## 5. The meta-level

A `DefineActivity` is itself an AP `Create` activity over an `SXArtifact` of a
specific type:

```sx
(activity 'Create
  :object {:type "DefineActivity"
           :name "Pin"
           :schema (fn (act)
             (and (string? (-> act :object :path))
                  (cid? (-> act :object :cid))))
           :semantics
           '(fn (act state)
             (assoc-in state [:pins (-> act :object :path)]
                       (-> act :object :cid)))})
```

When the kernel receives an activity with `type: "Pin"` it looks up the registered
semantics from a `DefineActivity{name: "Pin"}` artifact, runs the SX, projects the new
state. The semantics are themselves content-addressed and federated — every receiver
runs the same code.

Same pattern handles `DefineProjection`, `DefineValidator`, etc. The substrate is
genuinely self-extending.
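
As a rough Python analogue of this lookup-and-run loop, the sketch below registers a `Pin` verb through a `Create{DefineActivity}` and then projects a `Pin` activity with it. Python callables stand in for the published SX, and the `Substrate` class and its return strings are hypothetical.

```python
class Substrate:
    """Toy registry: activity type -> the DefineActivity object that introduced it."""

    def __init__(self):
        self.registry = {}
        self.state = {"pins": {}}

    def receive(self, act):
        obj = act.get("object")
        if (act["type"] == "Create" and isinstance(obj, dict)
                and obj.get("type") == "DefineActivity"):
            self.registry[obj["name"]] = obj  # new verb arrives as ordinary content
            return "defined"
        entry = self.registry.get(act["type"])
        if entry is None:
            return "no-semantics"             # nothing to project yet
        if not entry["schema"](act):
            return "rejected"
        self.state = entry["semantics"](act, self.state)
        return "projected"

# Python stand-ins for the :schema and :semantics functions in the SX example.
define_pin = {
    "type": "Create",
    "object": {
        "type": "DefineActivity",
        "name": "Pin",
        "schema": lambda act: isinstance(act["object"].get("path"), str)
                              and isinstance(act["object"].get("cid"), str),
        "semantics": lambda act, state: {
            **state,
            "pins": {**state["pins"], act["object"]["path"]: act["object"]["cid"]},
        },
    },
}
```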

---

## 6. Verbs

### 6.1 Bootstrap verbs (milestone 1)

The substrate exposes `POST /activity` (not `POST /publish`) — generalised entry
point that takes any well-formed AP activity, validates, signs, appends to outbox.
`(publish sx)` is sugar at the SX layer for `Create{SXArtifact}`.

Day-one verbs (cost ~zero once `/activity` exists):

- **`Create`** — the publish primitive.
- **`Update`** — supersede a previous activity (correct metadata, change a path
  mapping). Distinct from "publishing new content" — new content is always a new
  `Create` with a new CID.
- **`Delete`** — tombstone. AP-native; readers honour it.
- **`Announce`** — boost another actor's artifact into your outbox. Comes free.
- **`Subscribe`** — generalised subscription verb (parallel to publish/`Create`).
  Wraps any registered `DefineSubscription` type. `Follow` is the AP-standard
  subscription type, expressed as `Subscribe{Follow{actor: ...}}` and kept for wire
  compatibility. See §18.
- **`Unsubscribe`** — `Undo` of a prior `Subscribe`. Same shape as AP
  `Undo{Follow}`.

### 6.2 Custom verbs (designed-for, defined later)

Substrate accepts these from day one (any signed activity can be appended); semantics
projected once `DefineActivity` artifacts exist.

- **`Pin`** — assign `domain:path/name → CID`. The future name-resolution layer made
  of activities. Each pin is signed; the resolver replays the outbox to compute current
  state.
- **`Endorse`** (modelled on `Like`/`Approve`) — third-party signature on a CID.
  Web-of-trust style code review without central authority.
- **`Supersede`** — "CID A replaces CID B". Stronger than `Update`; readers can chase
  the chain.
- **`Test`** — published assertion that running CID A under conditions X yields result
  Y. Test-as-artifact, federated.
- **`Build`** — links a source CID to a compiled-output CID, with provenance.
- **`Compose`** — derived artifact citing input CIDs. Provenance graph in the outbox
  itself.
- **`Note`** (AP-native) — comments / reviews / discussion attached to a CID.
- **`Follow`** / **`Undo(Follow)`** — subscribe to another instance's outbox.

The pattern that matters: your outbox isn't just "things published," it's an
**append-only log of every assertion this actor makes about the SX universe.**

---

## 7. Capability discovery

Two pieces:

- **`GET /.well-known/sx-capabilities`** — JSON listing every registered activity-type,
  object-type, codec, transport, sig-suite, proof-type. Each with the CID of the
  `Define*` artifact that introduced it. Peers can diff capabilities before federating.
- **`capabilities-required`** field on activities — sender declares "this needs `Pin`
  semantics + `dag-cbor` codec." Receivers without those capabilities return a clean
  422 referencing the missing CIDs, so the sender knows which bootstrapping `Define*`
  artifacts to deliver before retrying.

Federation degrades gracefully across instances at different versions.
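
A receiver-side check for `capabilities-required` might look like the sketch below. It assumes capabilities are identified by the CIDs of their `Define*` artifacts, as the discovery document suggests. The function names and the 202 success status are illustrative assumptions; only the 422 comes from the text above.

```python
def missing_capabilities(required, registered):
    """Return the required capability CIDs that this instance has not registered."""
    return sorted(set(required) - set(registered))

def receive(activity, registered):
    missing = missing_capabilities(activity.get("capabilities-required", []), registered)
    if missing:
        # Clean refusal: tell the sender exactly which Define* CIDs are absent.
        return 422, {"missing": missing}
    return 202, {}
```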

---

## 8. Axes of flexibility (all designed-for)

1. **Object types** beyond SXArtifact — `Note`, `Article`, `Image`, `Video`, `Question`,
   `Event`, etc. via the object-type registry.
2. **Storage tier per-object** — `where: inline | cid | url`. Tiny things inline; big
   things to IPFS; legacy stuff URL-linked. Migrating storage backends doesn't migrate
   the substrate.
3. **Multihash + multicodec agility** — sha2-256 + dag-cbor by default; substrate
   accepts blake3, raw, dag-json, dag-pb, etc.
4. **Multi-key actors** — `publicKeys` array always; per-key `purpose`; multiple key
   types (RSA for AP wire compat, Ed25519 modern). See §9.
5. **Audience / visibility** — AP-native `to`, `cc`, `bto`, `bcc`. Public, followers,
   direct, unlisted. Custom audiences via `DefineAudience`.
6. **Outbox-as-database** — no source-of-truth other than the log. Projections are
   recomputable views.
7. **Programmable activities** — activities can carry SX. Reactive federation,
   conditional pins, automated propose/test/release pipelines, all expressed as AP
   activities.
8. **Federation transport pluggable** — outbox is canonical; how peers exchange is
   pluggable (HTTP push, pull, libp2p, polling).
9. **Optional timestamp proofs** — every activity has an attachable `proofs` slot.
   OpenTimestamps, on-chain merkle commit, third-party TSA all slot in without changing
   activity semantics.

Explicitly **not** pursuing for MVP:
- Schema-version negotiation (premature; `@context` handles extension).
- Configurable conflict-resolution per actor (the MVP is last-signed-wins, with the log
  preserved for audit).
- Verb-specific kernel handlers (other than `Create`'s "compute CID, store body").

---

## 9. Identity & actor lifecycle

### 9.1 Actor doc shape

```jsonld
{
  "@context": ["https://www.w3.org/ns/activitystreams",
               "https://w3id.org/security/v1",
               "https://next.rose-ash.com/ns/fed-sx/v1"],
  "type": "Person",                       // or Service, Group, Application
  "id": "https://next.rose-ash.com/actors/giles",
  "preferredUsername": "giles",
  "inbox": "https://next.rose-ash.com/actors/giles/inbox",
  "outbox": "https://next.rose-ash.com/actors/giles/outbox",
  "followers": "...",
  "following": "...",

  "publicKeys": [                         // ARRAY from day one — never `publicKey`
    { "id": "...#key-2026-05",
      "type": "RsaVerificationKey2018",
      "owner": "<actor-id>",
      "publicKeyPem": "...",
      "purpose": ["sign-activity", "sign-http"],
      "created": "2026-05-14T...",
      "expires": null,
      "supersedes": null,
      "supersededBy": null },
    { "id": "...#key-ed25519-2026-05",
      "type": "Ed25519VerificationKey2020",
      "owner": "<actor-id>",
      "publicKeyMultibase": "z6Mk...",
      "purpose": ["sign-activity"],
      "created": "2026-05-14T..." }
  ],

  "capabilities": "https://.../actors/giles/capabilities",  // what verbs they speak
  "alsoKnownAs": ["did:web:rose-ash.com:giles", ...],       // bridge to DID, AP migration
  "movedTo": null                                            // set on Move
}
```

Key shape decisions:

- **`publicKeys` array always.** Single-key actors have an array of length 1. The
  AP-standard singular `publicKey` field is *also* served, mirroring the first array
  element, for back-compat with vanilla AP servers (Mastodon etc. ignore the array).
- **Per-key `purpose`** — separates signing weight. Day-to-day publish key vs. high-
  value key for `Pin`/`Endorse` vs. delegated machine key. Validators can require
  specific purposes per activity type (registry-driven).
- **Multiple key types** — RSA for AP wire compat, Ed25519 for everything else
  (smaller, faster, modern). Sig suite registry decides which suites are accepted.
- **`supersedes` / `supersededBy`** — keys form a chain, not a snapshot. Old activities
  still verify against historical keys.

### 9.2 Key rotation

Key rotation is itself an activity, signed by the *old* key (or a recovery key):

```sx
(activity 'Update
  :object actor-id
  :patch {:add-publicKey new-key
          :supersede {old-key-id new-key-id}})
```

Kernel:
1. Fetches actor's current state (a projection over their own outbox).
2. Verifies activity is signed by a key with `purpose: rotate-key` (or any active key,
   if registry allows).
3. Appends. The actor-state projection now has the new key.

Old activities still verify because the projection retains the historical key with
`supersededBy` set — sig verification looks up "what keys were active at activity
timestamp T."
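
The "keys active at T" lookup can be sketched as a filter over the key history held by the actor-state projection. This assumes each key record carries `created` and optionally `expires` as in §9.1, plus a hypothetical `superseded_at` timestamp that the projection would record from the rotating activity's `published` field.

```python
from datetime import datetime

def _t(stamp):
    return datetime.fromisoformat(stamp)

def keys_active_at(keys, when):
    """Return the ids of keys that were valid at ISO timestamp `when`."""
    t = _t(when)
    active = []
    for key in keys:
        if _t(key["created"]) > t:
            continue  # key did not exist yet
        ends = [key.get("superseded_at"), key.get("expires")]
        if any(end is not None and _t(end) <= t for end in ends):
            continue  # key was already retired at T
        active.append(key["id"])
    return active

history = [
    {"id": "#key-2026-05", "created": "2026-05-14T00:00:00+00:00",
     "superseded_at": "2026-08-01T00:00:00+00:00"},
    {"id": "#key-2026-08", "created": "2026-08-01T00:00:00+00:00"},
]
```

With this history, an activity published in June verifies against the old key even after the rotation in August.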

### 9.3 Key recovery / loss

- **Recovery key** — separate key at actor creation, never used except to rotate.
  Stored offline. `purpose: ["recover"]`. Validator allows
  `Update{actor, patch: rotate-all-keys}` if signed by a recovery key.
- **Social recovery** — designate N trusted actors, M-of-N can co-sign a recovery
  `Update`. Implemented as a `DefineValidator` extension; multi-sig slot in `proofs`
  makes it possible without changing the envelope.
- **Total loss** — if both signing and recovery keys are gone, the actor is dead.
  They publish a new actor with `alsoKnownAs: <old-actor-id>` from a fresh key.
  Followers can choose to re-follow but there's no cryptographic continuity.

### 9.4 Migration (`Move`)

AP-native:

```sx
(activity 'Move
  :object old-actor-id
  :target new-actor-id)
```

Receivers update their follow lists. New actor's `alsoKnownAs` must include old
actor — bidirectional handshake prevents hijacking.

For fed-sx, `Move` should also carry an outbox migration hint (CID of an export bundle)
so receivers can re-anchor projections without re-fetching activity-by-activity.

### 9.5 Subordinate actors / delegation

Two patterns supported:

- **Service actors** (AP-native `type: Service`): bots, build servers, test runners.
  Their own keys, their own outboxes, but `attributedTo` a parent actor.
- **Capability tokens**: parent publishes `Authorize{actor: child, capabilities: [...],
  expires: ...}` signed by parent. Child publishes activities normally with their own
  key; receivers verify the capability chain when child invokes an authority they don't
  own outright. Useful for: temporary publish access, delegated `Pin` rights for a
  specific path prefix, multi-device.

Both work *without* new kernel mechanism — just activities.

### 9.6 Implications

- **Sig verification is timestamp-aware.** Verifying an old activity needs the key
  state at the time it was published — actor-state projection must support time-travel
  queries.
- **Inbox doesn't trust `keyId` blindly.** Fetches actor doc, projects current key
  state, checks key was valid at `published`.
- **Cross-instance identity via `alsoKnownAs` and DIDs.** Don't depend on DIDs but
  slot them in for Bluesky-bridge, Solid-bridge, etc.

---

## 10. Projection model

The architectural commitment: **state is what you get when you fold pure SX over the
log.** No DB-of-record. Everything queryable is a projection.

### 10.1 What a projection is

A `DefineProjection` activity registers four things, plus a `snapshot-codec` field naming
the codec used to serialize snapshots (see §10.4):

```sx
(activity 'Create
  :object {:type "DefineProjection"
           :name "actor-state"
           :initial-state {}                        ; pure SX value
           :fold (fn (state activity)               ; pure SX
                   (case (:type activity)
                     "Create"  (if (= "Person" (-> activity :object :type))
                                 ;; key by the actor's id; other Creates leave state as-is
                                 (assoc state (-> activity :object :id) (:object activity))
                                 state)
                     "Update"  (apply-patch state activity)
                     "Move"    (set-moved state activity)
                     state))
           :snapshot-codec "dag-cbor"
           :indexes [{:by :id} {:by :preferredUsername}]})
```

- **`name`** — query handle. Unique per actor; collisions resolved by CID + supersession.
- **`initial-state`** — pure SX value used as state-zero.
- **`fold`** — pure SX function `(state activity) → state`. The only thing the kernel
  calls.
- **`indexes`** — optional hint for materializing lookup paths.

The CID of the `DefineProjection` artifact is the projection's identity. Two instances
running the same projection are running the same CID's `fold` over the same log slice
— equivalence is decidable.
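
In Python terms, running a projection is a left fold over the log, and cross-instance agreement is a comparison of state digests. The sketch below is a simplified stand-in: it handles only `Create{Person}`, and it hashes canonical JSON where fed-sx would compute a dag-cbor CID. `actor_state_fold`, `replay`, and `state_digest` are hypothetical names.

```python
import hashlib
import json
from functools import reduce

def actor_state_fold(state, activity):
    """Pure fold: returns a new state and never mutates its arguments."""
    obj = activity.get("object")
    if activity["type"] == "Create" and isinstance(obj, dict) and obj.get("type") == "Person":
        return {**state, obj["id"]: obj}
    return state  # every other activity leaves the state unchanged

def replay(fold, initial_state, log):
    return reduce(fold, log, initial_state)

def state_digest(state):
    """Stand-in for the state's canonical CID (canonical JSON + sha256 here)."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```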

### 10.2 The fold contract — purity, determinism, gas

The fold function must be **pure and deterministic**. Non-negotiable; it's what makes
cross-instance equivalence and replay possible.

- **No IO.** No HTTP, no file access, no DB calls, no clock. The activity carries its
  own `published` timestamp.
- **No randomness.** No host-seeded PRNG. (If pseudo-randomness is needed, seed from
  the activity's CID — deterministic across hosts.)
- **No mutation outside the returned state.**
- **Bounded execution.** Each fold call gets a gas budget (default tunable, e.g. 100k
  CEK steps). Exceeding it is a hard failure.

Enforced at the SX evaluator level by running folds in a sandboxed environment with
the IO platform stripped to nothing. Same sandbox model applies to validators and
trigger semantics.

**Cross-host equivalence guarantee:** for the same projection CID + same activity log
slice, every conforming SX host (JS, OCaml, Python, Haskell-on-SX, …) must produce a
state value with the same canonical CID. Tested via the spec test suite.

### 10.3 Bootstrap projections

The kernel cannot start without some projections, because the kernel itself uses them.
They are baked into the genesis bundle (see §11) and superseded only by deliberate
kernel-version upgrades.

| Projection | What it computes | Used by |
|------------|------------------|---------|
| `activity-log` | Identity — every activity, indexed by id and CID | Everything |
| `by-type` | `type → ordered list of activity-CIDs` | Most queries |
| `by-actor` | `actor-id → ordered list of activity-CIDs` | Per-actor outbox view |
| `by-object` | `object-CID → list of referencing activity-CIDs` | "Who pinned this?" |
| `actor-state` | `actor-id → current actor doc with key history` | Sig verification (kernel) |
| `define-registry` | `kind+name → currently-active Define* CID` | All other Define* lookups |
| `audience-graph` | `actor → followers/following` | Federation push |

`define-registry` is the bootstrap chicken-and-egg: it's the projection that knows
which projections (and validators, codecs, etc.) are currently active. The kernel ships
with it hardcoded; once running, every other projection is a regular `DefineProjection`,
and a future replacement of `define-registry` itself arrives the same way, superseding
the hardcoded one.

### 10.4 Snapshotting

Replaying the entire log on every restart is unacceptable past day one.

- **Snapshot = `(activity-tip-CID, projection-state, projection-CID)` tuple,**
  dag-cbor encoded, content-addressed.
- **Snapshot rule** — every K activities (default 1000) and every T seconds (default
  60), serialize, hash, store on disk.
- **Resume** — on startup, find latest snapshot for each (projection-CID, log-tip),
  load state, fold forward.
- **Snapshot CID is verifiable** — anyone with the same log slice and projection-CID
  can recompute and check the CID matches. This is the cross-instance agreement proof.

Snapshots are themselves publishable as activities (`Create{Snapshot}`): an instance
can publish "here's my computed state for projection X at log-tip Y, CID Z." Other
instances can fetch and use as a starting point. **Federated state sharing falls out of
federated activities.**

Snapshots are pruning-friendly: keep latest + snapshots referenced by published
`Create{Snapshot}` activities; everything else is GC-able.
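
The snapshot-then-resume cycle can be sketched as below. This is a toy under assumptions: a list index stands in for the activity-tip CID, canonical JSON plus sha256 stands in for the dag-cbor CID, and `take_snapshot`, `resume`, and `count_by_type` are hypothetical names.

```python
import hashlib
import json
from functools import reduce

def digest(value):
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def count_by_type(state, activity):
    kind = activity["type"]
    return {**state, kind: state.get(kind, 0) + 1}

def take_snapshot(fold, projection_id, initial, log, tip):
    """Fold up to and including index `tip`, then content-address the result."""
    state = reduce(fold, log[:tip + 1], initial)
    snapshot = {"tip": tip, "projection": projection_id, "state": state}
    return snapshot, digest(snapshot)

def resume(fold, snapshot, log):
    """Load the snapshot state and fold forward over the activities after its tip."""
    return reduce(fold, log[snapshot["tip"] + 1:], snapshot["state"])
```

Any instance holding the same log slice and projection can recompute the snapshot and compare digests, which is the agreement check described above.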

### 10.5 Reprojection on definition change

When `DefineProjection{name: "actor-state"}` is superseded by a new CID with a
different fold:

1. `define-registry` projection sees the supersession; its state advances.
2. New projection materialized **alongside** the old one — both kept live during
   migration.
3. New projection runs in catch-up mode: replay from genesis (or from deepest
   compatible snapshot).
4. When new projection catches up to log tip, queries cut over. Old projection state
   can be retired.
5. Snapshots of old version stay around as long as referenced (e.g. for time-travel
   queries against historical state under old semantics).

Changing a projection definition is **safe and online**. Cost: temporary state
duplication during catch-up. Slow folds → slow migrations, but never breakage.

For projections too expensive to fully reproject, `Update{DefineProjection}` can
declare `migrationHint: <fn from old-state to new-state>` — opt-in, used at migrator's
risk.

### 10.6 Time-travel queries

Folds are deterministic functions of `(initial-state, activity-list-prefix)`.
Time-travel is fold-up-to:

- `state-as-of(projection, activity-id-or-timestamp)` → walk to requested point,
  return state.
- Snapshots act as accelerators (resume from nearest snapshot ≤ target).
- Used by sig verification ("what keys did this actor have when this activity was
  signed?"), audit, "what did we believe last Tuesday."
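
A fold-up-to query that uses snapshots as accelerators could be sketched like this. It assumes snapshots are kept as `(tip_index, state)` pairs sorted by tip, and that the target is given as a log index rather than an activity id or timestamp. `state_as_of` here is a hypothetical Python helper, not the substrate's API.

```python
import bisect
from functools import reduce

def state_as_of(fold, initial, log, snapshots, target):
    """State after folding log[0..target], resuming from the nearest snapshot <= target."""
    tips = [tip for tip, _ in snapshots]
    i = bisect.bisect_right(tips, target) - 1
    if i >= 0:
        start, state = snapshots[i][0] + 1, snapshots[i][1]
    else:
        start, state = 0, initial  # no usable snapshot: fold from genesis
    return reduce(fold, log[start:target + 1], state)
```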

### 10.7 Projection composition

**Projections do not directly read each other's state during folding.** Preserves
locality and parallelism — every projection runs independently against the same log.

Composition via:

- **Query time** — `(query (projection actor-state) ...)` joins are SX expressions
  over multiple projection states.
- **Republishing as activities** — a projection that exposes its state as input to
  others publishes `Create{Snapshot}` periodically. Downstream projections fold over
  those.

Direct cross-projection reads during fold introduce ordering, cycles, cache-
invalidation problems we don't need.

### 10.8 Querying

Three layers:

- **Raw projection state** — `GET /projections/<name>?at=<timestamp>` returns dag-cbor
  (also JSON for tooling). Large states paginated by index.
- **SX queries** — `POST /query` with an SX expression that runs against one or more
  projection states in pure mode. This fills the role Datalog or GraphQL would play.
- **Materialized indexes** — declared on projection (`indexes:` field). Kernel
  maintains as side-tables for `O(log n)` lookup.

Real-time: clients `GET /projections/<name>/subscribe` (SSE), receive deltas as
activities land. Delta is `(old-state, new-state, applied-activity-CID)`; clients can
verify by re-folding.

### 10.9 Lag, async, concurrency

- **Append is sync; projection is async.** `POST /activity` returns once activity is
  durably in the log. Projections run in a separate worker pool; query results carry
  `projected-up-to` so callers know whether the latest write is visible.
- **One worker per projection.** Folds are sequential, but projections run in parallel
  with each other.
- **Sync option** — `POST /activity?wait-for=projection-name` blocks until the named
  projection has folded the new activity. Use sparingly.

### 10.10 Failure modes

| Failure | Response |
|---------|----------|
| **Gas exhaustion** | Activity tagged `projection-failed` for this projection. State unchanged. Operator alert. |
| **SX runtime error** (assertion, type mismatch) | Same as gas: activity skipped, error logged, state unchanged. |
| **Schema violation** | Caught earlier in validation pipeline, never reaches projection. |

An activity is always appended to the log once it passes envelope + signature +
validator checks. Projection failures don't gate appending — that would couple writes
to arbitrary user-defined code.
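
The skip-and-tag behaviour in the table can be sketched as a fold loop that isolates failures per activity. The helper name `fold_all` and the shape of the failure records are assumptions. The "state unchanged" guarantee relies on the fold being pure, so a failed call cannot leave partial mutations behind.

```python
def fold_all(fold, initial, log):
    """Fold the whole log; a failing activity is recorded and skipped, never fatal."""
    state, failed = initial, []
    for index, activity in enumerate(log):
        try:
            state = fold(state, activity)
        except Exception as exc:  # SX runtime error or gas exhaustion in the real system
            failed.append((index, type(exc).__name__))
    return state, failed
```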

### 10.11 Operational implications

- **Projection determinism is the linchpin.** If JS and OCaml ever produce different
  state for the same log + projection, federation cracks. Spec test suite must cover
  projection equivalence across hosts as a first-class requirement.
- **Snapshots are eventual consensus.** Two instances publish `Create{Snapshot}` for
  the same log+projection; if their CIDs match, they agree without coordination.
- **Kernel reads its own projections.** `actor-state` for sig verification;
  `define-registry` for every Define* lookup. Startup sequence must bootstrap these
  before serving traffic.
- **Reprojection cost is real.** Heavy projection changes mean replaying from genesis.
  Encourage incremental schemas (small per-activity work, idempotent updates) and
  provide profiling.

---

## 11. Sandbox & determinism

The runtime contract that makes folds (and validators, triggers, semantics) safe to
execute, and that guarantees every conforming SX host computes the same state from
the same log.

### 11.1 Three sandbox levels

Different registry entries need different power. We define three nested execution
modes; the registry entry declares which mode it requires.

| Mode | Used by | IO | Clock | Random | Determinism |
|------|---------|----|----|--------|-------------|
| **pure** | folds, validators, audience predicates, semantics, trigger `when-sx` | none | activity's own `published` only | seeded from activity CID only | required across hosts |
| **crypto** | sig suite verify, codec encode/decode | crypto primitives only | none | sign-only secure RNG | required across hosts (verify); single-host (sign) |
| **effectful** | storage backends, transports, trigger `then-sx`, some proof verifiers | per-capability grant only | host clock | host RNG | not required; single-host |

Default mode is **pure**. The other two are opt-in at registration time, and the
registration is itself a signed activity — anyone can audit which extensions claim
which powers.

### 11.2 Pure sandbox (the load-bearing one)

This is the mode every projection fold runs in. It must produce identical results on
every conforming SX host, every time.

**Allowed:**
- All spec primitives in `spec/primitives.sx` that don't perform IO (arithmetic,
  comparison, predicates, string ops, collection ops, dict ops, format helpers).
- The activity being processed (full envelope), as the function's argument.
- The current state value, as the function's argument.
- A small set of fed-sx-specific deterministic primitives:
  - `(activity-cid act)` → CID of the activity envelope
  - `(activity-time act)` → ISO timestamp from `published`
  - `(actor-state-as-of state-snapshot actor-id activity-time)` → if the projection
    has been declared dependent on `actor-state` (see §10.7), reads from a snapshot
    of that projection at the activity's timestamp
  - `(seeded-rng cid)` → deterministic PRNG seeded from a CID, returns a stream of
    uniform values

**Forbidden:**
- All IO: HTTP, file, network, stdin/stdout, environment.
- Wall-clock access. The host's `now` is not in scope; the only time available is
  `(activity-time act)`.
- Host-seeded randomness. Only `seeded-rng` (CID-derived) is available.
- Mutation outside the returned value. Enforced by the SX evaluator's lack of
  ambient mutable bindings; folds may use local `let` and mutation within their own
  closure but cannot reach outside.
- Calling other registry entries by name. Composition happens at query time, not
  fold time (see §10.7).

**Enforced by:** evaluator runs the fold with the IO platform stripped to nothing.
The fed-sx kernel constructs a `pure-platform` (no fetch, no query, no action, no
DOM, no storage) and uses it as the sole evaluator platform when calling the fold.
Any IO primitive call raises a hard error caught as a fold failure.
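
A Python analogue of the stripped platform and of `seeded-rng` is sketched below. It is illustrative only: the platform is modelled as a dict of stubs that raise, `run_pure` and `PurityViolation` are hypothetical names, and the PRNG construction (sha256 over the CID plus a counter) is one possible deterministic choice rather than the specified one. It yields integers, which sidesteps the float question in pure mode.

```python
import hashlib

class PurityViolation(Exception):
    """Raised when code running in pure mode touches an IO primitive."""

def _forbidden(name):
    def stub(*args, **kwargs):
        raise PurityViolation(f"{name} is not available in pure mode")
    return stub

# Every IO entry point exists but fails hard when called.
PURE_PLATFORM = {name: _forbidden(name)
                 for name in ("fetch", "query", "action", "dom", "storage")}

def run_pure(fold, state, activity):
    """Call the fold with the stripped platform; an IO attempt becomes a fold failure."""
    try:
        return fold(state, activity, PURE_PLATFORM), None
    except PurityViolation as exc:
        return state, str(exc)

def seeded_rng(cid):
    """Deterministic stream of 64-bit unsigned integers derived from a CID string."""
    counter = 0
    while True:
        block = hashlib.sha256(f"{cid}:{counter}".encode("utf-8")).digest()
        yield int.from_bytes(block[:8], "big")
        counter += 1
```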

### 11.3 Crypto sandbox

Sig suites and codec encode/decode need hash + crypto + encoding primitives but
nothing else. They're still deterministic across hosts (verify case) but get a
narrower platform than effectful, wider than pure.

**Additional primitives over pure:**
- `(sha2-256 bytes)`, `(sha3-256 bytes)`, `(blake3 bytes)`, …
- `(rsa-verify pubkey msg sig)`, `(ed25519-verify pubkey msg sig)`, …
- `(rsa-sign privkey msg)`, `(ed25519-sign privkey msg)` — sign-only; requires the
  caller to supply a secure RNG handle (which is *not* in pure mode)
- `(cbor-encode value)`, `(cbor-decode bytes)` — for codecs implementing CBOR variants
- `(base32-encode bytes)`, `(base58btc-encode bytes)`, `(multibase-encode tag bytes)`
- `(multihash-encode tag digest-bytes)`, `(multihash-decode bytes)`
- `(cid-encode codec mhash)`, `(cid-decode bytes)`

**Sign vs verify:** verify is pure (deterministic). Sign is not — it consumes
randomness. fed-sx draws a clean line: signing happens *outside* registry-entry SX
(it's an operation the kernel/runtime performs on behalf of the actor with their
private key); registry SX only ever *verifies*. This keeps the pure↔crypto distinction
tractable.

### 11.4 Effectful sandbox

Storage backends, transports, trigger `then-sx`, and proof verifiers that need the
network (e.g. blockchain RPC for on-chain proof verification) all need real IO.
These are not used to compute projected state; they're how the substrate interacts
with the outside world.

**Capability-granted primitives.** The registration activity declares the
capabilities the entry needs:

```sx
(activity 'Create
  :object {:type "DefineStorage"
           :where-tag "ipfs"
           :capabilities [{:type "http-client" :allowlist ["http://localhost:5001/*"]}
                          {:type "fs-read"    :path-prefix "/var/cache/fed-sx/ipfs/"}
                          {:type "fs-write"   :path-prefix "/var/cache/fed-sx/ipfs/"}]
           :put-sx (fn (cid bytes) ...)
           :get-sx (fn (cid) ...)})
```

**Capability types** (initial set; extensible):

- `http-client` with `allowlist` (URL prefix patterns)
- `http-server` with `path-prefix` (mounts a sub-handler)
- `fs-read` / `fs-write` with `path-prefix` (chroot-style)
- `subprocess` with `command-allowlist`
- `clock-read` (wall clock; granted if registry entry needs to timestamp something)
- `random-bytes` (host CSPRNG)

**No ambient authority.** Default capability set is empty; every capability is
explicit, declared, signed, and auditable. A peer can refuse to load a registry
entry whose capability claim is unacceptable to them.

**Capabilities are content-addressed.** Each capability descriptor has a CID. The
substrate maintains a registry of "capability CIDs that this instance trusts to
honour" — operator policy, not protocol.
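
A minimal Python sketch of how a host might check an IO request against the capabilities an entry declared. The `Capability` record, the trailing-`*` pattern rule, and the function names are assumptions made for illustration; the design above only fixes the capability types and their `allowlist` / `path-prefix` parameters.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    type: str              # e.g. "http-client", "fs-read", "fs-write"
    patterns: tuple = ()   # URL prefix patterns or filesystem path prefixes


def _url_matches(pattern: str, url: str) -> bool:
    # Assumption: a trailing "*" means "any suffix"; otherwise exact match.
    if pattern.endswith("*"):
        return url.startswith(pattern[:-1])
    return url == pattern


def allows_http(granted, url: str) -> bool:
    """True only if some granted http-client capability covers the URL."""
    return any(
        cap.type == "http-client" and any(_url_matches(p, url) for p in cap.patterns)
        for cap in granted
    )


def allows_fs(granted, mode: str, path: str) -> bool:
    """mode is "fs-read" or "fs-write"; the path must stay under a granted prefix."""
    clean = posixpath.normpath(path)  # collapse "../" before comparing
    for cap in granted:
        if cap.type != mode:
            continue
        for prefix in cap.patterns:
            root = posixpath.normpath(prefix)
            if clean == root or clean.startswith(root + "/"):
                return True
    return False
```

An empty `granted` tuple denies everything, which matches the no-ambient-authority default. A real host would also need to handle symlinks and URL normalisation, which this sketch ignores.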

### 11.5 Gas and resource accounting

Each sandbox call gets a budget:

- **CEK gas** — every evaluator step costs 1 unit; primitive calls cost a per-
  primitive amount declared in `spec/primitives.sx`. Default budget: 100k units per
  fold call. Tunable per-projection via `DefineProjection.gas-limit`.
- **Memory ceiling** — peak heap size for the fold call. Default 64 MB. Tunable.
- **IO budget** (effectful only) — bytes read/written and network calls per
  invocation, granted separately per capability.
- **Wall-clock budget** (effectful only) — max real-time before forced termination.

Exceeding any budget is a hard failure: the call returns an error value, the fold's
state is unchanged, and the activity is tagged as failed for that projection (§11.7).

Gas accounting is part of the spec — every conforming host must charge the same
units for the same operations, so "this fold runs out of gas" is a deterministic
property of the (projection, activity) pair, not a host-specific outcome.
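
The budget rule can be sketched in Python as a meter that every evaluator step and primitive call charges against. This is a toy model under stated assumptions: the `GasMeter` class, the `run_fold` wrapper, and the per-primitive cost of 2 are hypothetical, and a real host would charge inside the CEK machine rather than in the fold body.

```python
class OutOfGas(Exception):
    """Raised when a charge exceeds the remaining budget."""


class GasMeter:
    def __init__(self, budget: int = 100_000):
        self.remaining = budget

    def charge(self, units: int = 1) -> None:
        if units > self.remaining:
            self.remaining = 0
            raise OutOfGas()
        self.remaining -= units


def run_fold(fold, state, activity, budget: int = 100_000):
    """Return (new_state, None) on success, or (old_state, failure_tag)."""
    meter = GasMeter(budget)
    try:
        return fold(state, activity, meter), None
    except OutOfGas:
        # Hard failure: the prior state is returned untouched.
        return state, {"projection-failed": True, "reason": "resource-exhausted"}


def count_by_type(state, activity, meter):
    """Example fold: count activities per type, charging gas as it goes."""
    meter.charge(1)            # one evaluator step
    new_state = dict(state)    # copy, so the input state is never mutated
    meter.charge(2)            # hypothetical cost of the dict-update primitive
    key = activity["type"]
    new_state[key] = new_state.get(key, 0) + 1
    return new_state
```

Because the charges depend only on the fold and the activity, two hosts using the same cost table reach the same verdict for the same budget.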

### 11.6 Determinism gotchas

The pure sandbox is only as deterministic as its primitives. Worth nailing down:

- **Floating point.** IEEE 754 binary operations are bitwise-identical across
  conforming hosts, but transcendentals (`sin`, `cos`, `log`, `exp`) are *not* —
  libm implementations differ. **Decision: floats are forbidden in pure mode unless
  the projection declares `requires-deterministic-floats: true` and uses only the
  IEEE 754 basic operations (+, -, *, /, sqrt, comparison, conversion).** For exact
  arithmetic, use integers or rationals (fed-sx will provide a rational primitive).
- **Map / dict iteration order.** Must be sorted-key always in pure mode. The SX
  spec mandates this for `for-each` and `map` over dicts; we tighten it: pure mode
  forbids relying on insertion order.
- **String encoding.** All strings are UTF-8 NFC at ingestion; pure-mode operations
  use byte-level comparison after normalization. Codepoint operations (`length`,
  `substring`) return identical results across hosts because they operate on the
  normalized form.
- **Integer overflow.** Pure mode uses arbitrary-precision integers (the SX spec
  default). No undefined behaviour. Overflow is impossible.
- **Equality.** Structural equality (`equal?`) must yield the same result on every
  host for values with the same canonical CIDs. This implies dict equality is
  order-independent (as it should be), and float equality follows IEEE 754 (NaN ≠
  NaN; +0.0 = -0.0).
- **Error values.** When a primitive errors, the error must be representable as a
  dag-cbor value with a stable CID across hosts. Reserve a `{:error :type ... :msg
  ...}` shape; standard error types defined in the spec.
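
Two of the rules above (NFC normalisation at ingestion and sorted-key dict iteration) are easy to illustrate in Python. The helper names are hypothetical, and ordering keys by their UTF-8 bytes is an assumption consistent with the byte-level comparison rule, not something the SX spec is quoted as saying.

```python
import unicodedata


def ingest_string(s: str) -> str:
    """Normalise to NFC once, at ingestion."""
    return unicodedata.normalize("NFC", s)


def sorted_items(d: dict) -> list:
    """Iterate a dict in key order, comparing keys by their UTF-8 bytes.

    Insertion order never influences the result.
    """
    return sorted(d.items(), key=lambda kv: ingest_string(kv[0]).encode("utf-8"))


def fold_keys(d: dict) -> str:
    """A toy fold whose output would differ if iteration order differed."""
    return ",".join(key for key, _ in sorted_items(d))
```

For example, `ingest_string("e\u0301")` collapses the decomposed form (letter plus combining accent) into the single precomposed codepoint, so `length` agrees across hosts regardless of how the sender typed the string.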

### 11.7 Failure model

A pure-mode call ends in one of three terminal states:

1. **Success** — returns a value. Fold uses it as new state.
2. **Sandbox violation** — IO attempted, capability denied, etc. Returns a stable
   error value; fold's state is unchanged; activity tagged
   `{:projection-failed :reason :sandbox-violation :detail ...}`.
3. **Resource exhaustion** — gas, memory, IO budget exceeded. Same handling as
   sandbox violation but with `:reason :resource-exhausted`.

Crypto-mode failures (e.g. invalid signature) are *return values*, not exceptions —
verify returns boolean, sign returns either a sig or an error. This forces callers
to handle failure explicitly.

Effectful-mode failures (network down, disk full) propagate to the operator as
errors but never affect projected state. The substrate retries effectful operations
according to the registry entry's policy (declared at registration).

### 11.8 Conformance testing

Cross-host equivalence isn't aspirational; it's tested.

- **Spec test suite** ships projection equivalence tests: a corpus of (log slice,
  projection CID, expected snapshot CID) tuples. Every conforming SX host must
  produce the expected snapshot CID for each input.
- **Validator equivalence tests** likewise: (validator CID, activity, expected
  result).
- **Codec equivalence tests:** (codec CID, value, expected encoded bytes), in both
  encode and decode directions.
- **Sandbox isolation tests:** "this fold attempts to call `fetch`; expected
  outcome: sandbox violation error with stable CID."

Hosts run the conformance suite to claim "fed-sx pure-mode conformance." Failures
are publishable as `Test{result: failed, host: ..., projection: ...}` activities —
the conformance graph itself is federated.

### 11.9 Operational implications

- **The pure sandbox is the heart of cross-host federation.** Every divergence is a
  spec bug or a host bug; both are caught by snapshot CID mismatches and surfaced
  via `Test` activities.
- **Capability descriptors are the new audit trail.** "What can the IPFS storage
  backend do?" is a question with a precise answer at any timestamp — the registered
  capability CIDs.
- **Floats are mostly absent.** This is unusual but defensible — most state in the
  substrate is ids, counts, sets, references. Numerical computation belongs in
  effectful registry entries (e.g. an analytics projection that publishes summaries
  as activities, projected by a downstream pure projection that just stores them).
- **Gas is part of the protocol.** Two hosts disagreeing about whether a fold runs
  out of gas is a conformance failure. Spec primitive gas costs are normative.

## 12. Bootstrap & genesis

How a fresh instance starts with no log, where the initial registry entries come
from, and how the kernel evolves without bricking peers.

### 12.1 The genesis problem

The substrate is "everything is a `Define*` activity in the log." But on a fresh
instance the log is empty — so there are no `Define*` activities to tell the kernel
what `Create` means, how to verify a signature, or what dag-cbor is. Strict
turtles-all-the-way-down would deadlock startup.

Solution: **the kernel ships with a baked-in genesis bundle** containing the minimal
set of definitions it needs to interpret its own log. The bundle is a constant of
the kernel binary; its CID is hardcoded; the kernel verifies on startup that the
bundle matches its hardcoded CID. After that, everything (including superseding the
bundled definitions themselves) goes through the activity log.

The genesis bundle is *not* itself a federated artifact in the AP sense. It's the
dictionary you need before you can read any activities. Optionally, an actor can
`Create{GenesisRecord}` as their first published activity to advertise which genesis
they started from — informational, not load-bearing.

### 12.2 Genesis bundle contents

Minimal viable bundle (dag-cbor object, content-addressed):

```
{
  "type": "fed-sx-genesis",
  "kernel-version": "1.0.0",
  "envelope-spec": { ... },                 // canonical schema for activity envelope
  "object-spec": { ... },                   // canonical schema for object envelope
  "definitions": {
    "activity-types": {
      "Create":   { "schema": <sx>, "semantics": <sx> },
      "Update":   { "schema": <sx>, "semantics": <sx> },
      "Delete":   { "schema": <sx>, "semantics": <sx> },
      "Announce": { "schema": <sx>, "semantics": <sx> }
    },
    "object-types": {
      "SXArtifact": { "schema": <sx> },
      "Note":       { "schema": <sx> },
      "Tombstone":  { "schema": <sx> },
      "DefineActivity":   { "schema": <sx> },
      "DefineObject":     { "schema": <sx> },
      "DefineProjection": { "schema": <sx> },
      "DefineValidator":  { "schema": <sx> },
      "DefineCodec":      { "schema": <sx> },
      "DefineTransport":  { "schema": <sx> },
      "DefineAudience":   { "schema": <sx> },
      "DefineProof":      { "schema": <sx> },
      "DefineStorage":    { "schema": <sx> },
      "DefineTrigger":    { "schema": <sx> },
      "DefineSigSuite":   { "schema": <sx> },
      "Snapshot":         { "schema": <sx> }
    },
    "sig-suites": {
      "rsa-sha256-2018": { "verify": <sx>, "key-format": <sx> },
      "ed25519-2020":    { "verify": <sx>, "key-format": <sx> }
    },
    "codecs": {
      "dag-cbor":  { "encode": <sx>, "decode": <sx> },
      "raw":       { "encode": <sx>, "decode": <sx> },
      "dag-json":  { "encode": <sx>, "decode": <sx> }
    },
    "projections": {
      "activity-log":     { "initial-state": ..., "fold": <sx> },
      "by-type":          { "initial-state": ..., "fold": <sx> },
      "by-actor":         { "initial-state": ..., "fold": <sx> },
      "by-object":        { "initial-state": ..., "fold": <sx> },
      "actor-state":      { "initial-state": ..., "fold": <sx> },
      "define-registry":  { "initial-state": ..., "fold": <sx> },
      "audience-graph":   { "initial-state": ..., "fold": <sx> }
    },
    "validators": {
      "envelope-shape": { "predicate": <sx> },
      "signature":      { "predicate": <sx> },
      "type-schema":    { "predicate": <sx> }
    },
    "audience-predicates": {
      "Public":    { "member-of": <sx> },
      "Followers": { "member-of": <sx> },
      "Direct":    { "member-of": <sx> }
    }
  },
  "capability-types": [                     // schema for capability descriptors
    "http-client", "http-server",
    "fs-read", "fs-write",
    "subprocess", "clock-read", "random-bytes"
  ]
}
```

Each definition's body is **SX source**, not bytecode. The kernel evaluates it at
startup using the same SX evaluator user-published `Define*` artifacts use — there
is no privileged "native" path. The bootstrap is just SX loaded from the binary
instead of from the log.

### 12.3 Hardcoded CID and verification

The kernel binary contains:

- The full genesis bundle (embedded as bytes).
- The CID computed over those bytes at build time.

On startup:

1. Compute the actual CID of the embedded bundle.
2. Compare to the hardcoded CID.
3. **Mismatch → refuse to start.** Either the binary has been tampered with or the
   build process is broken. Either way, the operator should know immediately.
4. **Match → proceed.** Every running instance with a given kernel binary has
   byte-identical bootstrap state — no version drift possible within a binary.

The genesis CID is exposed at `GET /.well-known/sx-capabilities` so peers can see
which kernel version they're talking to.
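
The startup check can be sketched as follows. This is a simplified Python illustration: a hex SHA-256 digest stands in for the real CID, and `GenesisMismatch` and `verify_genesis` are hypothetical names.

```python
import hashlib


class GenesisMismatch(RuntimeError):
    """The embedded bundle does not match the digest baked in at build time."""


def digest_of(bundle: bytes) -> str:
    # Stand-in for CID computation: a plain hex SHA-256 of the bundle bytes.
    return hashlib.sha256(bundle).hexdigest()


def verify_genesis(embedded: bytes, expected_digest: str) -> bytes:
    """Return the bundle if it matches; otherwise refuse to start."""
    actual = digest_of(embedded)
    if actual != expected_digest:
        raise GenesisMismatch(f"expected {expected_digest}, got {actual}")
    return embedded
```

In a real kernel `expected_digest` would be a build-time constant rather than a runtime argument; passing it in here just keeps the sketch testable.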

### 12.4 Fresh instance startup sequence

```
1. Load and verify genesis bundle (panic on mismatch)
2. Parse all definition SX sources, instantiate evaluator closures
3. Initialize registries from definitions (in the order: codecs → sig-suites →
   validators → object-types → activity-types → audience-predicates → projections)
4. Open log file (create if missing)
5. Replay any existing log: for each activity, validate, then fold into each
   projection (resuming from snapshots where available)
6. Load or generate actor keypair (filesystem path from config)
7. If actor has never published a Create{Person} for itself, generate and append
   one as the first activity of this instance's outbox
8. Initialize HTTP server, wire routes
9. Open inbox: start accepting federated activities
10. Mark instance as ready
```

Steps 1–3 are the bootstrap. Step 5 is replay-and-project. Step 7 is the
"actor genesis" — every instance has at least one local actor; it publishes itself
as its first activity, and that activity (signed by the actor's own key) anchors all
subsequent activity from that actor.

### 12.5 First activity — actor creation

Every fresh actor's outbox starts with:

```sx
(activity 'Create
  :id           "https://next.rose-ash.com/actors/giles/activities/<uuid>"
  :actor        "https://next.rose-ash.com/actors/giles"
  :published    "<iso-timestamp>"
  :to           ["https://www.w3.org/ns/activitystreams#Public"]
  :object       <full actor doc with publicKeys array>
  :signature    <signed by the new key over the activity envelope>)
```

Self-signed: the activity introduces the key it's signed with. Verifiers read the
actor doc embedded in the activity, find the key, and verify the activity's signature
against it. This is trust-on-first-encounter for a new actor, the same model AP uses.

The kernel emits this automatically on first startup if the actor has no prior
activity. Subsequent actor changes (key rotation, profile updates) are `Update`
activities signed by an existing key.

### 12.6 Joining federation

A new instance has no peers initially. Discovery is operator-driven for v1:

1. Operator configures one or more peer URLs (or a well-known seed list).
2. Instance fetches peer's actor doc and `/.well-known/sx-capabilities`.
3. Instance verifies it can interpret the peer's activities (envelope compatible,
   sig suites overlap). Reports incompatibilities to operator.
4. If compatible, instance follows peer's primary actor (`POST /inbox` with a
   `Follow` activity).
5. Peer streams or backfills outbox to this instance.
6. Activities arrive, validate, fold into local projections.

Discovery beyond manual config (e.g. peer recommendations, federation directories)
is a v2 concern.

### 12.7 Kernel version evolution

The substrate must evolve without forcing every instance to upgrade in lockstep.
Three rules:

**Rule 1: The activity envelope shape changes additively only.**

We may *add* optional fields to the envelope; we may not change semantics or remove
fields. Old activities still validate under new kernels. New activities with new
fields are accepted by old kernels (which ignore the unknown fields, store the raw
envelope, and project conservatively).

This is the AP discipline. We adopt it strictly. If we ever need a breaking envelope
change, it's a major version (fed-sx 2.0) and instances at different majors don't
federate directly — only via bridges.

**Rule 2: Everything else evolves via supersession.**

New sig suite, new codec, new projection definition, new validator: publish a
`Define*` activity that supersedes the old one. Both old and new versions stay valid
at their respective timestamps. Old activities verify under old definitions; new
activities use new definitions. Time-aware lookup (§9.6, §10.6) makes this work.
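
Time-aware lookup under supersession can be sketched as a sorted history per definition name. The `DefinitionHistory` class is hypothetical, and integer timestamps are used for brevity; the actual lookup rules live in §9.6 and §10.6.

```python
import bisect


class DefinitionHistory:
    """All versions of one named definition, ordered by activation time."""

    def __init__(self):
        self._times = []   # activation timestamps, kept sorted
        self._defs = []    # definition active from the matching timestamp

    def supersede(self, active_from: int, definition) -> None:
        """Record a new version; older versions remain valid for their era."""
        i = bisect.bisect_right(self._times, active_from)
        self._times.insert(i, active_from)
        self._defs.insert(i, definition)

    def active_at(self, timestamp: int):
        """Return the version in force at `timestamp`, or None if none yet."""
        i = bisect.bisect_right(self._times, timestamp)
        return self._defs[i - 1] if i else None
```

An activity published at time `t` is then verified with `history.active_at(t)`, so superseding a definition never invalidates older log entries.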

**Rule 3: New genesis bundles supersede old ones via published activities.**

When the kernel team ships a new version with an updated bundle:

- The new bundle's CID is different.
- Operators upgrading the kernel get the new bundle automatically.
- The new bundle's *contents* are largely supersession `Update{DefineProjection,
  DefineValidator, ...}` activities relative to the old bundle's definitions.
- A peer running the old kernel sees these `Update` activities (when they appear in
  followed outboxes) and *can* opt to load them dynamically (§12.8) or stay on the
  old bundle definitions until the operator upgrades.

In other words: the kernel binary evolution and the activity-log evolution are
parallel tracks. The binary determines what's *built in*; the log determines what's
*currently active*. They converge over time but don't have to be lockstep.

### 12.8 Dynamic Define* loading

When an instance receives an activity of `type: "PinV3"` and has no `DefineActivity{
name: "PinV3"}` in its define-registry, it has three options (operator policy):

- **Strict mode** — store the activity envelope (it's valid AP), tag it `unknown-type`
  in `by-type`, do not project semantics. Operator must explicitly load the
  definition to enable projection.
- **Permissive mode** — fetch the `DefineActivity{name: "PinV3"}` artifact (its CID
  is in the activity's `capabilities-required` list), validate, evaluate the
  semantics SX (in pure sandbox), reproject the activity. Operator notified.
- **Trusted-peers-only mode** — like permissive, but only auto-loads `Define*` from
  actors on a configured trust list.

Default for fed-sx v1: **strict mode**. Operators opt in to broader policies.

This lets the substrate genuinely live-extend — new verbs land via federation, no
binary upgrade — while keeping a clean audit trail of what got loaded when.
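
The three policies reduce to a small decision function. The sketch below is illustrative Python; the policy strings and the returned action names are assumptions, not identifiers from the design.

```python
def on_unknown_type(policy: str, sender: str, trusted_actors: frozenset) -> str:
    """Decide what to do with an activity whose type has no loaded definition.

    Returns "store-unprojected" (keep the envelope, tag it unknown-type) or
    "fetch-and-load" (fetch the definition, validate it, then reproject).
    """
    if policy == "strict":
        return "store-unprojected"
    if policy == "permissive":
        return "fetch-and-load"
    if policy == "trusted-peers-only":
        if sender in trusted_actors:
            return "fetch-and-load"
        return "store-unprojected"
    raise ValueError(f"unknown policy: {policy!r}")
```

Either way the activity envelope is stored; the policy only controls whether semantics are projected automatically.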

### 12.9 Genesis as the substrate's manifest

A useful framing: the genesis bundle is the substrate's **manifest** (in the package-
manager sense). It declares "this kernel ships with these definitions, identified by
these CIDs, and this is what the kernel does until the log says otherwise."

Two instances with the same genesis CID start identical. Two instances with
different genesis CIDs can federate as long as their *active* registry states (after
log replay) overlap enough.

The genesis bundle is also the **conformance reference**: a kernel implementation
claims fed-sx v1.0 conformance by reproducing the standard genesis bundle's CID
from its own build of the included SX sources. If two implementations build the same
spec sources and produce different CIDs, one of them is non-conformant. Cheap,
deterministic conformance check.

### 12.10 Operational implications

- **Build-time CID computation is part of the kernel build.** The build pipeline
  must include the genesis-bundling step and embed the resulting CID. Mismatch
  protection requires the binary to know what it expects.
- **Genesis evolution is a deliberate kernel-team decision.** Adding a new bundled
  projection or sig suite is a kernel release, not a federated activity. (User-
  defined projections still federate normally.)
- **Strict-mode default protects against malicious extensions.** Operators have to
  consciously opt into auto-loading remote `Define*`. This trades convenience for
  security — appropriate for v1.
- **Cross-major federation is a bridge problem.** If/when fed-sx 2.0 ships with an
  envelope change, bridges between v1 and v2 are themselves federated artifacts —
  built by anyone, signed, audited.

## 13. Federation mechanics

How instances exchange activities, how peers subscribe, how new followers backfill,
how delivery survives unreliable networks, and how the substrate resists abuse.

### 13.1 Push, pull, hybrid

ActivityPub canonically uses **push**: actor A publishes by POSTing each delivery to
each follower's inbox URL. This gives low latency and clear delivery semantics, but
requires a reliable per-recipient delivery queue and falls over when peers go down.

fed-sx supports both push and pull, with a **push-primary, pull-fallback** model:

- **Push** is the default delivery mechanism. When an activity is appended to A's
  outbox, A's delivery worker posts it to each follower's inbox.
- **Pull** is always available: any peer can `GET /actors/<id>/outbox?since=<cursor>`
  and stream activities in order. Used for backfill, recovery from delivery gaps,
  and instances that prefer pull-only operation.
- **Hybrid in practice:** push delivers *notifications* (the activity itself, or a
  pointer to its CID); receivers may pull the full content if not inlined. Useful
  when the activity body is large.

Operators can configure their actors as push-only, pull-only, or hybrid. The
default is hybrid.

### 13.2 The Follow lifecycle

AP-standard, slightly tightened:

```sx
;; A wants to follow B
(activity 'Follow
  :actor  "https://a.example/actors/alice"
  :object "https://b.example/actors/bob")
;; → POST to B's inbox

;; B accepts (or rejects)
(activity 'Accept
  :actor  "https://b.example/actors/bob"
  :object <follow-activity-id-or-embedded>)
;; → POST to A's inbox

;; A unfollows later
(activity 'Undo
  :actor  "https://a.example/actors/alice"
  :object <follow-activity-id-or-embedded>)
;; → POST to B's inbox
```

State derived by the `audience-graph` projection on each instance:

- `(followers actor)` — set of actors who follow `actor`, projected from
  `Accept{Follow}` activities in `actor`'s outbox (and the inverse via received
  `Follow` activities).
- `(following actor)` — symmetric.

**Auto-accept by default.** Public actors auto-publish `Accept` for any incoming
`Follow`. Locked actors require manual approval, implemented as an operator UI that
publishes the `Accept` (or `Reject`) once a human decides.
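
As a rough illustration of how the follower sets fall out of the log, here is a pure fold over `Follow` / `Accept` / `Undo` in Python. The state shape and function names are assumptions; the real `audience-graph` projection is SX shipped in the genesis bundle and may differ (for instance in how it treats `Reject`).

```python
from functools import reduce

INITIAL = {"follows": {}, "followers": {}}


def fold_audience(state, act):
    """Return a new state; the input state is never mutated."""
    follows = dict(state["follows"])   # follow-activity id -> (follower, target)
    followers = {k: set(v) for k, v in state["followers"].items()}
    kind = act["type"]
    if kind == "Follow":
        follows[act["id"]] = (act["actor"], act["object"])
    elif kind == "Accept":
        follow = follows.get(act["object"])
        # Only the followed actor can accept a follow aimed at it.
        if follow and follow[1] == act["actor"]:
            followers.setdefault(follow[1], set()).add(follow[0])
    elif kind == "Undo":
        follow = follows.get(act["object"])
        # Only the original follower can undo its own follow.
        if follow and follow[0] == act["actor"]:
            followers.get(follow[1], set()).discard(follow[0])
            del follows[act["object"]]
    return {"follows": follows, "followers": followers}


def followers_of(state, actor):
    return set(state["followers"].get(actor, set()))
```

Replaying a log is then `reduce(fold_audience, activities, INITIAL)`, and any two hosts replaying the same log arrive at the same follower sets.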

### 13.3 Backfill

When A first follows B, A wants B's history. Four supported modes:

| Mode | Mechanism | Trade-off |
|------|-----------|-----------|
| **No backfill** | Just stream new activities going forward | Cheapest, missing context for new followers |
| **Pull paginated** | `GET /outbox?since=epoch&limit=100` repeatedly | Standard, slow for large outboxes |
| **Snapshot fetch** | Find latest `Create{Snapshot}` published by B for the projection of interest, fetch + verify, then pull only activities after the snapshot's tip | Fast, requires B to publish snapshots |
| **Bundle fetch** | Out-of-band: B publishes a CID for an export bundle (a dag-cbor list of activities + actor doc + sig suite verification metadata); A fetches once, validates the chain, replays | Fastest for cold starts; bundle creation is opt-in |

Default: snapshot fetch when available, paginated pull otherwise.

A new instance joining federation typically combines: snapshot-fetch the
`actor-state` and `define-registry` projections from a trusted peer (so it knows who
exists and what verbs are defined), then incrementally backfill specific actors of
interest.

### 13.4 Delivery queue and retry

Every push delivery attempt has a fate:

| Outcome | Action |
|---------|--------|
| 2xx | Mark delivered |
| 3xx | Follow redirect (with limit) |
| 4xx (except 429) | Mark *permanently failed* — peer rejected the activity. Log; don't retry. |
| 429 | Honour `Retry-After`; reschedule |
| 5xx | Exponential backoff; reschedule |
| Connection error | Exponential backoff; reschedule |

**Retry schedule** (default, tunable per peer):

```
1 min, 5 min, 15 min, 1 h, 4 h, 12 h, 24 h, 48 h, 96 h
```

After the last attempt fails, the activity is **abandoned for push** but remains in
A's outbox. Followers can still pull it via `GET /outbox?since=...`. The peer will
eventually catch up if they come back online and pull. Push is best-effort; pull is
the source of truth.
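
The outcome table and retry schedule can be sketched together in Python. Two assumptions are baked in and worth flagging: the schedule entries are read as the delay before each successive retry, and a `Retry-After` value (already converted to minutes) overrides the schedule for that attempt. The function names are hypothetical.

```python
# Default schedule from above, in minutes.
RETRY_SCHEDULE_MIN = [1, 5, 15, 60, 240, 720, 1440, 2880, 5760]


def classify(status):
    """Map an HTTP status (None for a connection error) to a delivery outcome."""
    if status is None:
        return "backoff"
    if 200 <= status <= 299:
        return "delivered"
    if 300 <= status <= 399:
        return "redirect"
    if status == 429:
        return "retry-after"
    if 400 <= status <= 499:
        return "failed"       # permanent: the peer rejected the activity
    return "backoff"          # 5xx and anything unexpected


def next_delay_minutes(failures: int, retry_after_min=None):
    """Delay before the next attempt, or None once push is abandoned.

    `failures` counts failed attempts so far, including the one just made.
    """
    if failures > len(RETRY_SCHEDULE_MIN):
        return None
    if retry_after_min is not None:
        return retry_after_min
    return RETRY_SCHEDULE_MIN[failures - 1]
```

Once `next_delay_minutes` returns `None`, the worker stops pushing; the activity stays in the outbox and remains available to pull.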

**Persistent queue.** Delivery state is itself stored in the local instance — it's
operator-internal, not federated. (Could be a regular SQLite table; doesn't need to
be a projection because it's not state-the-world-cares-about.) On instance restart,
the queue resumes from where it left off.

**Queue-as-projection (alternative):** for instances that want every aspect to be
log-derived, the delivery state could be a local-only projection over a stream of
`Attempt` / `DeliverySuccess` / `DeliveryFailure` activities written to a private
local-only outbox. Out of scope for v1 but the design admits it.

### 13.5 Audience-respecting delivery

Each activity carries `to`, `cc`, `bto`, `bcc`. The delivery worker computes the
**delivery set**: union of explicit recipients + (if `as:Public` or `Followers` in
audience) the actor's followers projection.

- `bto` and `bcc` are stripped before delivery (recipients shouldn't see who else is
  blind-copied).
- **Receivers honour audience.** When an instance receives an activity it should
  not be in the audience for (e.g. a `Direct` activity to someone else, leaked via a
  misconfigured peer), it logs and discards. Validators in the inbound pipeline
  enforce this.
- **Public ≠ unlisted.** `to: as:Public` means deliver to followers AND make
  publicly fetchable AND show in public projections. Some actors prefer "publicly
  fetchable but not pushed broadly" — `cc: as:Public` with `to: Followers`.
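
A small Python sketch of the delivery-set computation and blind-copy stripping described above. The `FOLLOWERS` marker string and both function names are assumptions for illustration; how fed-sx actually names the followers audience is defined by its audience predicates.

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
FOLLOWERS = "Followers"   # hypothetical marker for the actor's followers audience


def delivery_set(activity: dict, followers: set) -> set:
    """Explicit recipients, plus followers if the audience is Public or Followers."""
    recipients = set()
    for field in ("to", "cc", "bto", "bcc"):
        recipients.update(activity.get(field, ()))
    if recipients & {PUBLIC, FOLLOWERS}:
        recipients |= set(followers)
    # The audience markers are not deliverable inboxes themselves.
    return recipients - {PUBLIC, FOLLOWERS}


def strip_blind(activity: dict) -> dict:
    """Copy of the activity without bto/bcc, as sent over the wire."""
    return {k: v for k, v in activity.items() if k not in ("bto", "bcc")}
```

The delivery set is computed from the full activity, blind recipients included, and only then is the stripped copy sent to each recipient.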

### 13.6 Spam and abuse posture

ActivityPub has well-known abuse vectors (Mastodon's history is instructive). fed-sx
defends in layers:

**Signature verification.** Every inbound activity must have a valid signature
matching an actor whose key was active at `published`. Forgeries are dropped at the
signature stage of the validation pipeline (§14). Necessary but not sufficient: a
signature proves who sent the message and that it wasn't altered, not that the sender is benign.

**Per-source rate limits.** Per-actor and per-instance request rate limits on
`/inbox`. Default: 100/min per actor, 1000/min per instance. Exceeded → 429.

**Per-instance trust state.** Three categories, operator-configured (and
overridable per actor):

- **Trusted** — auto-accept, auto-load Define* (if permissive mode), no rate-
  multiplier penalty.
- **Default** — accept signed activities, standard rate limits, do not auto-load
  Define*.
- **Suspended** — drop all inbound activities, refuse outbound delivery, do not
  fetch artifacts. Operator decision (e.g. spam source, harassment instance).

Trust state is local-only (operator policy); it is not federated. Different
instances can disagree.

**Audience refusal.** Activities not addressed to anyone on this instance (no local
followers, not `as:Public`, not `to:` a local actor) are dropped on receipt.
Discourages spam targeting random instances.

**Content validators.** Registry-driven content moderation: a `DefineValidator`
with `applies-to: "inbound"` runs against every inbound activity and can reject
based on content rules. Examples: link-spam detection, ML moderation models served
via an effectful validator (note: effectful validators are a special case — they
*can* fail-closed without affecting determinism, because validators happen *before*
projection and don't contribute to projected state).

**Capability vetting.** If an inbound activity declares `capabilities-required`
that includes definitions this instance hasn't loaded *and* trust policy is strict-
mode, the activity is quarantined (stored but not projected) pending operator
review.

**Federation circuit breakers.** A high per-peer error rate triggers temporary
defederation: if a peer is sending malformed activities, exceeding rate limits, or
signing with revoked keys, it is suspended automatically for an exponentially
increasing cool-off period.

### 13.7 Discovery

How an instance finds other instances and actors:

- **WebFinger** (RFC 7033). `GET /.well-known/webfinger?resource=acct:user@host`
  returns links to actor URLs. Not part of the ActivityPub spec itself, but the de
  facto discovery convention among AP implementations. fed-sx implements it.
- **Well-known capabilities.** `GET /.well-known/sx-capabilities` (§7) for cross-
  instance compatibility checks.
- **Manual peer config.** Operators add peer instance URLs to their config.
- **Peer recommendations.** An instance can publish `Recommend{actor}` activities
  pointing at peers it considers worth following. Receivers can use these as
  discovery hints (subject to local trust). Out of scope for v1 but the verb is
  reservable.
- **Federation directories.** Community-maintained lists of instances; an instance
  can opt into being listed by publishing a `Directory{listed-by}` activity. v2
  concern.

For v1: WebFinger + capabilities + manual config. Discovery beyond that is opt-in
via standard verbs.

### 13.8 Streaming and real-time

Two streaming mechanisms:

- **Outbox SSE** — `GET /actors/<id>/outbox/stream` opens a Server-Sent Events
  connection. Each new activity appended to the outbox is sent as an event. Allows
  pull-style federation peers to maintain a live connection without polling.
- **Projection SSE** — `GET /projections/<name>/subscribe` (§10.8) streams projection
  deltas. Useful for clients (browsers) wanting reactive views.

Both are optional, instance-provided mechanisms; the canonical federation transport
remains push to inbox + pull from outbox. SSE is a convenience, not protocol.

### 13.9 Operational implications

- **Push is best-effort, pull is authoritative.** Operators should treat the outbox
  as the canonical record; delivery queue is bookkeeping.
- **Trust is per-instance and not federated.** Two instances may have different
  views of "good actors" and "bad instances." This is a feature — defederation
  decisions are local sovereignty.
- **Backfill via snapshots is the cheap path.** Encouraging actors to publish
  `Create{Snapshot}` regularly makes new-follower onboarding fast.
- **Audience semantics are enforced both ways.** Senders compute delivery set;
  receivers honour audience. Defence-in-depth against misconfigured peers.
- **Capability-based extension loading is opt-in.** Strict-mode default means
  unknown verbs are stored-but-not-projected — safe by default, with explicit
  operator control over what extensions load.

## 14. Validation pipeline

Every activity entering the substrate (whether published locally or received from a
peer) flows through a fixed pipeline of checks. Order matters: cheap and fail-safe
first, expensive and content-aware last. Each stage has a defined failure response
(reject, quarantine, drop). Registry-driven validators plug in at a specific stage.

### 14.1 The two pipelines

**Inbound** — activities arriving via `POST /inbox` or pulled from a peer's outbox:

```
HTTP transport → envelope → signature → replay → audience →
  activity-type schema → object-type schema → content validators →
  capabilities → trust state → log append → projection (async)
```

**Outbound** — activities being published locally via `POST /activity`:

```
authentication → authorization → envelope construction → object handling →
  activity-type schema → signature → log append → projection (async) →
  delivery (async)
```

Stages they share are implemented as the same SX functions called from both pipelines.

### 14.2 Inbound pipeline — stage by stage

| # | Stage | Check | Failure response |
|---|-------|-------|------------------|
| 1 | **Transport** | Valid HTTP request, content-type acceptable, body parseable as JSON-LD or dag-cbor | `400 Bad Request`; log |
| 2 | **Envelope** | Matches kernel's envelope spec (required fields present, types valid, recognised activity type or `unknown` allowed) | `400`; log; structured error in response body |
| 3 | **Signature** | Time-aware sig verification: fetch (or cache-lookup) actor doc, find key with `id == sig.key-id` that was active at `published`, verify against canonical envelope bytes per the named sig suite | `401`; log; do not retry; mark sender's instance for circuit-breaker accounting |
| 4 | **Replay** | Activity id and CID not already in `activity-log` projection | `200 OK` with `{status: "duplicate"}`, no-op |
| 5 | **Audience** | This instance has at least one local actor in `to`/`cc`, OR audience contains `as:Public`/`Followers` and the actor has local followers | Drop silently (no response indicating either acceptance or refusal — prevents inbox-membership probing); do not store |
| 6 | **Activity-type schema** | Look up `DefineActivity{name: <type>}` in `define-registry`; run its `schema` predicate over the activity in pure sandbox | If type unknown: per trust policy (strict: 422 with missing-definition CID; permissive: attempt dynamic load §12.8). If schema fails: 422 with violation detail |
| 7 | **Object-type schema** | If activity has an `object` with a `type`, look up `DefineObject{name: <type>}` and run its `schema` | Same as #6 |
| 8 | **Content validators** | All registered validators with `applies-to: inbound` or `applies-to: all` run sequentially; each is a predicate (pure-sandbox by default; effectful validators per §13.6 are the exception) that returns `:accept` / `:reject` / `:quarantine` | `:reject` → 422 with reason. `:quarantine` → store activity but mark `quarantined`, do not project, alert operator |
| 9 | **Capabilities** | Every CID in `capabilities-required` is present in this instance's loaded registries (or auto-loadable per trust policy) | Missing → 422 with list of missing CIDs (sender can deliver bootstrapping `Define*` artifacts first). Auto-load attempt can be triggered by re-POST with `?retry-after-load=true` |
| 10 | **Trust state** | Sender's actor and instance are not in `Suspended` state on this instance | Drop silently; do not respond |
| 11 | **Log append** | Write activity envelope (and inlined object content) to local mirror of sender's outbox; assign local sequence number | Disk error → 503 (transient); sender retries |
| 12 | **Projection** | Asynchronously fold the activity into every relevant projection (per `define-registry`) | Per-projection failure (gas, sandbox violation) → tag activity `projection-failed:<projection-name>`; do not affect log durability |

Pipeline halts at the first failing stage. Stages 1–11 are synchronous (`POST /inbox`
holds the connection through the log append); stage 12 is asynchronous, so the
HTTP response returns once the log append succeeds.
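The halt-at-first-failure rule can be sketched in a few lines. This is a minimal illustration, not the kernel's design: the stage functions, the `202` success status, and the `400` for a malformed envelope are assumptions, and only two of the twelve stages are stood in for.

```python
from typing import Callable, List, Optional, Tuple

# A stage returns None on success, or (http_status, detail) to halt the pipeline.
Response = Tuple[int, str]
Stage = Callable[[dict], Optional[Response]]

def run_inbound(activity: dict, sync_stages: List[Stage],
                enqueue_projection: Callable[[dict], None]) -> Response:
    """Run the synchronous stages in order; stop at the first failure."""
    for stage in sync_stages:
        failure = stage(activity)
        if failure is not None:
            return failure
    # Every synchronous stage (including log append) passed:
    # hand projection off and answer without waiting for it.
    enqueue_projection(activity)
    return (202, "accepted")  # success status is an assumption

# Hypothetical stand-ins for two of the twelve stages.
def check_envelope(activity: dict) -> Optional[Response]:
    ok = "id" in activity and "actor" in activity
    return None if ok else (400, "malformed envelope")  # 400 is an assumption

seen_ids = set()

def check_replay(activity: dict) -> Optional[Response]:
    if activity["id"] in seen_ids:
        return (200, "duplicate")
    seen_ids.add(activity["id"])
    return None

projection_queue = []
stages = [check_envelope, check_replay]
first = run_inbound({"id": "a1", "actor": "alice"}, stages, projection_queue.append)
again = run_inbound({"id": "a1", "actor": "alice"}, stages, projection_queue.append)
broken = run_inbound({"actor": "alice"}, stages, projection_queue.append)
```

Only the first call reaches the projection queue; the replayed and malformed activities stop at the stage that rejects them.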

### 14.3 Outbound pipeline — stage by stage

| # | Stage | Check | Failure response |
|---|-------|-------|------------------|
| 1 | **Authentication** | Caller has a valid bearer token, mTLS cert, or session for the actor | `401` |
| 2 | **Authorization** | Caller's identity is allowed to publish as the named `actor` (capability token §9.5 or owns the actor key) | `403` |
| 3 | **Envelope construction** | Kernel fills in `id`, `published`, normalises `to`/`cc`, computes `capabilities-required` (by walking referenced `Define*` CIDs) | n/a |
| 4 | **Object handling** | If `object` has inline content: canonicalize, compute CID, optionally store per `where`. If `object` references a CID, verify the artifact exists locally or remotely (or accept as a forward reference) | Storage error → `503` |
| 5 | **Activity-type schema** | Same as inbound #6 — schema must pass | `422` with violation detail (caller bug) |
| 6 | **Signature** | Sign envelope with the actor's currently-active key matching the activity type's required `purpose` (e.g. `Pin` requires `purpose: pin`) | If no suitable key: `400` |
| 7 | **Log append** | Write to local outbox; assign sequence number | `503` |
| 8 | **Projection** | Async fold (same as inbound #12) | Per-projection failure tag |
| 9 | **Delivery** | Async push to follower inboxes per audience | Per-recipient retry per §13.4 |

Caller's HTTP response returns after stage 7 (log append). The activity is durable
and queryable as soon as the response is sent; projection lag is reported via
`projected-up-to` headers and `?wait-for=` parameter.

### 14.4 Failure response taxonomy

Three response categories with explicit semantics:

**Reject** — tell the sender and don't store; the sender may resubmit after correcting
the problem. Used for: malformed envelope, invalid signature, schema violation, missing
capabilities. HTTP 4xx with structured error.

**Quarantine** — store envelope (it's a valid signed message) but don't project,
alert operator. Used for: content-validator soft-fail, unloaded capabilities under
permissive policy, suspect-but-not-banned senders. Activity sits in a quarantine
projection until operator reviews; operator can release (project) or expunge.

**Drop silently** — don't store, don't respond informatively. Used for: replay (ack
as duplicate), audience refusal (would leak inbox membership otherwise),
suspended-sender activities. The sender experiences this as a successful POST with no
visible effect; they can detect it only by polling and noticing that their activity
never appears in our mirror of their outbox.
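The three categories differ along a few independent axes, which a small lookup table makes explicit. The field names are hypothetical, and whether a quarantined sender is told is an assumption (the error catalog in §16.6 lists a `Quarantined` type, which suggests so).

```python
from dataclasses import dataclass
from enum import Enum

class Outcome(Enum):
    REJECT = "reject"
    QUARANTINE = "quarantine"
    DROP = "drop"

@dataclass(frozen=True)
class Disposition:
    store: bool           # is the envelope kept?
    project: bool         # is it folded into projections?
    tells_sender: bool    # does the response explain what happened?
    alert_operator: bool  # does a human get notified?

DISPOSITIONS = {
    Outcome.REJECT: Disposition(
        store=False, project=False, tells_sender=True, alert_operator=False),
    Outcome.QUARANTINE: Disposition(
        store=True, project=False,
        tells_sender=True,  # assumption, see lead-in
        alert_operator=True),
    Outcome.DROP: Disposition(
        store=False, project=False, tells_sender=False, alert_operator=False),
}

# None of the three failure outcomes ever reaches projected state.
never_projected = all(not d.project for d in DISPOSITIONS.values())
```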

### 14.5 Registry-driven validators

Most of the pipeline is **fixed kernel logic** (envelope, signature, replay, audience,
log append, delivery). Three stages, in two groups, are **registry-driven** and extend
dynamically:

- **Stage 8 (content validators)** — operators add/remove `DefineValidator` entries
  with `applies-to: inbound | outbound | all`. Each runs in pure or effectful
  sandbox per its declaration. Returns one of `:accept` / `:reject{:reason}` /
  `:quarantine{:reason}`.
- **Stages 6–7 (schema validators)** — these *are* registry entries
  (`DefineActivity.schema`, `DefineObject.schema`); the pipeline calls into the
  registry to fetch them.

**Pure-mode validators** are deterministic and cheap; results can be cached per
(activity-CID, validator-CID).

**Effectful-mode validators** can call out to ML models, blocklist services,
external moderation APIs. They get a per-call IO budget; exceeding it counts as
`:reject{:reason :validator-timeout}`. Effectful validators do *not* break
determinism because validation happens **before projection** — a rejected activity
never enters projected state.
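A rough sketch of the two modes follows. It assumes validators are plain functions returning `(verdict, reason)` pairs and models the IO budget as a simple call counter; a real budget would more likely bound time or bytes. All names are illustrative.

```python
class BudgetExceeded(Exception):
    """Raised when an effectful validator spends more IO than allowed."""

class IOBudget:
    # Counts outbound calls; a stand-in for a real time or byte budget.
    def __init__(self, max_calls: int):
        self.remaining = max_calls

    def spend(self) -> None:
        if self.remaining <= 0:
            raise BudgetExceeded()
        self.remaining -= 1

pure_cache = {}

def run_pure(validator_cid, fn, activity_cid, activity):
    """Pure validators are deterministic, so cache per (activity, validator)."""
    key = (activity_cid, validator_cid)
    if key not in pure_cache:
        pure_cache[key] = fn(activity)
    return pure_cache[key]

def run_effectful(fn, activity, max_calls):
    """Exceeding the budget is treated as a rejection, never as an accept."""
    try:
        return fn(activity, IOBudget(max_calls))
    except BudgetExceeded:
        return ("reject", "validator-timeout")

calls = []

def has_summary(activity):
    calls.append(activity["id"])
    return ("accept", None) if "summary" in activity else ("reject", "no summary")

def chatty(activity, budget):
    for _ in range(3):  # pretends to consult three external services
        budget.spend()
    return ("accept", None)

note = {"id": "a1", "summary": "hi"}
r1 = run_pure("cid-validator", has_summary, "cid-a1", note)
r2 = run_pure("cid-validator", has_summary, "cid-a1", note)
within = run_effectful(chatty, note, max_calls=3)
over = run_effectful(chatty, note, max_calls=2)
```

The second `run_pure` call is served from the cache, and the over-budget effectful call fails closed.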

### 14.6 Validator composition and ordering

Validators have an integer `priority` field; lower values run first. The pipeline
short-circuits on the first `:reject`. `:quarantine` is *not* short-circuiting: later
validators still run, and `:quarantine` results aggregate.

Default priorities (room for operator-added validators):

```
0-99    : kernel-internal (envelope, sig, replay, audience)
100-199 : standard schema validators
200-299 : standard content validators (rate limit, audience leak)
300-399 : operator-added moderation
400-499 : effectful (ML, third-party APIs)
500+    : reserved
```

Operators can publish `Update{DefineValidator}` to change priorities or add new
ones; takes effect on next inbound activity.
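The ordering and aggregation rules of this section can be expressed as a short loop. This is a sketch under the assumption that each validator is a `(priority, name, fn)` triple and that `fn` returns a `(verdict, reason)` pair; neither shape is specified by the design.

```python
def run_validators(validators, activity):
    """Run validators by ascending priority.

    Stops at the first reject; quarantine verdicts accumulate and are
    reported together if nothing rejects.
    """
    quarantine_reasons = []
    for _priority, _name, fn in sorted(validators, key=lambda v: v[0]):
        verdict, reason = fn(activity)
        if verdict == "reject":
            return ("reject", [reason])
        if verdict == "quarantine":
            quarantine_reasons.append(reason)
    if quarantine_reasons:
        return ("quarantine", quarantine_reasons)
    return ("accept", [])

validators = [
    (300, "moderation", lambda a: ("quarantine", "flagged term")),
    (210, "rate-limit", lambda a: ("accept", None)),
    (410, "ml-score", lambda a: ("quarantine", "low score")),
]
aggregated = run_validators(validators, {})

# A reject at priority 250 runs before the priority-300 quarantine and wins.
with_reject = validators + [(250, "audience-leak", lambda a: ("reject", "audience leak"))]
rejected = run_validators(with_reject, {})
```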

### 14.7 Determinism requirement and its limit

A subtlety worth being explicit about: **inbound validation is not required to be
deterministic across instances.** Two instances can disagree about whether to
accept a given activity (e.g. one has a stricter content validator). Their projected
states will then diverge — but only on activities one accepted and the other didn't.

This is fine. Federation does not require state convergence; it requires *fold
determinism for activities both instances accepted*. Validators are sovereignty
controls, not protocol invariants.

Where determinism *is* required: schema validators (§14.2 stages 6–7). If two
instances disagree on whether `Pin v3` matches its schema, they can't federate
`Pin v3` activities meaningfully. So schema validators must be pure-mode and
referenced by CID.

### 14.8 Operational implications

- **The pipeline is the security perimeter.** Every checkable property is checked
  here, not deeper in the kernel. No "trust the caller" assumptions inside log or
  projection code.
- **Quarantine is the operator's friend.** Anything suspicious sits in quarantine
  with full envelope, sig, and reason — operator can review and decide. Better than
  outright drop because it preserves audit.
- **Schema validators are protocol-load-bearing; content validators are policy.**
  The first set must converge across instances for federation to work; the second
  set can diverge (and that's how local moderation policy is expressed).
- **Outbound validation catches local bugs early.** A malformed `Pin` activity
  fails at outbound stage 5, never enters the local log, never gets delivered.

## 15. Storage layout

The on-disk shape of an instance. Three concerns kept separate: the **activity log**
(append-only, canonical), **content-addressed object storage** (keyed by CID,
immutable), and **operational state** (projections, indexes, queues — derived,
rebuildable).

### 15.1 Storage tiers

```
/var/lib/fed-sx/
├── log/                                     # canonical, append-only
│   ├── actors/
│   │   ├── <local-actor-id>/
│   │   │   ├── outbox/
│   │   │   │   ├── 000001.jsonl             # segment, ~64MB cap
│   │   │   │   ├── 000002.jsonl
│   │   │   │   └── tip                      # symlink to current segment
│   │   │   ├── inbox/                       # received, pre-validation (transient queue)
│   │   │   └── seq                          # next sequence number
│   │   └── <other-local-actor-id>/...
│   └── mirrors/                             # local mirrors of followed remote outboxes
│       └── <remote-actor-id-hashed>/
│           ├── 000001.jsonl
│           └── ...
├── objects/                                 # CID → bytes
│   └── <first-2-chars>/<next-2-chars>/<full-cid>
├── snapshots/
│   └── <projection-cid>/
│       ├── <log-tip-cid>.cbor               # snapshot value
│       └── index                            # ordered list of (log-tip, file)
├── projections/                             # live projection state
│   └── <projection-cid>.cbor                # latest in-memory state, periodically flushed
├── indexes/
│   └── fed-sx.db                            # SQLite: lookups, queue, trust state
├── keys/
│   └── <actor-id>/                          # private keys, mode 0600
│       ├── primary.pem
│       ├── recovery.pem
│       └── sigs.toml                        # key metadata
├── genesis/
│   └── bundle.cbor                          # extracted from binary at first run
└── config.toml                              # operator config
```

### 15.2 The log — append-only segments

The activity log is the only thing the substrate cannot lose. It is the source of
truth from which everything else is derived.

**Format: JSONL segments.** Each line is one activity envelope, encoded as JSON-LD
(canonical form), terminated by `\n`. Easy to inspect, easy to grep, trivially
streamable.

**Why JSON-LD on disk, not dag-cbor?** Two reasons:
- Operability: humans can `tail -f` and `grep` the log. dag-cbor is opaque.
- AP wire compatibility: activities arrive over HTTP as JSON-LD anyway; storing the
  same form avoids round-trip conversion.

The CID of each activity is computed from its **canonical dag-cbor representation**
(per §2), independent of how it's stored. CIDs are stable across storage formats.

**Segments cap at ~64MB.** Rotation by size, not time. Old segments are immutable;
new writes go to the tip segment. Compression (zstd) applied on segments older than
the current tip — saves disk, doesn't slow appends.
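Size-based rotation can be sketched as follows. This is an illustration only: `json.dumps` with sorted keys stands in for the canonical JSON-LD form, and the `tip` symlink, sequence numbers, and zstd compression of old segments are left out.

```python
import json
import os
import tempfile

SEGMENT_CAP = 64 * 1024 * 1024  # ~64MB, as in the layout above

def append_activity(outbox_dir, envelope, cap=SEGMENT_CAP):
    """Append one envelope as a JSONL line, rotating by size. Returns the segment name."""
    os.makedirs(outbox_dir, exist_ok=True)
    segments = sorted(f for f in os.listdir(outbox_dir) if f.endswith(".jsonl"))
    current = segments[-1] if segments else "000001.jsonl"
    path = os.path.join(outbox_dir, current)
    # Stand-in serialisation; the real log would use canonical JSON-LD.
    line = (json.dumps(envelope, sort_keys=True, separators=(",", ":")) + "\n").encode("utf-8")
    if os.path.exists(path) and os.path.getsize(path) + len(line) > cap:
        # The old segment is now immutable; start the next one.
        current = f"{int(current[:6]) + 1:06d}.jsonl"
        path = os.path.join(outbox_dir, current)
    with open(path, "ab") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # the log is the one thing that must not be lost
    return current

outbox = os.path.join(tempfile.mkdtemp(), "outbox")
# A tiny cap forces a rotation on the third append.
written = [append_activity(outbox, {"id": f"a{i}", "type": "Note"}, cap=60) for i in (1, 2, 3)]
segment_files = sorted(os.listdir(outbox))
```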

**Per-actor outboxes.** Each local actor has its own outbox directory. This matches
AP semantics (one outbox per actor) and means:
- Backing up a single actor is a simple directory copy
- Per-actor sequence numbers (no cross-actor coordination)
- Migration (`Move`) is a directory rename + a `Move` activity

**Mirror outboxes.** When a local actor follows a remote one, the remote's outbox is
mirrored locally for replay. Same JSONL format. Tracked under `log/mirrors/<hashed-
remote-id>/` to avoid filesystem path issues with URL characters. The hash is
purely a filesystem-friendly encoding; the canonical actor id stays in the log
content.

**Inbox vs outbox distinction.** Inboxes hold *received* activities pre-validation;
outboxes hold *committed* activities post-pipeline. An inbound activity that passes
the validation pipeline (§14) is moved from inbox to the appropriate mirror outbox.
This makes inbox a transient queue, not a permanent record.

### 15.3 Object storage

Content-addressed blob store, sharded directories.

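**Path scheme:** `objects/<first-2-chars>/<next-2-chars>/<full-cid>`. The shard
characters should come from the hex-encoded multihash digest, not from the start of the
CID string: every CIDv1 with the same multibase, codec, and hash function begins with
the same prefix, so sharding on the string's leading characters would put every object
in one bucket. Sha2-256 digests are uniformly distributed, so four hex characters give
65,536 buckets, each holding at most a few hundred files at moderate scale. Git's
loose-object store and IPFS's flatfs datastore use the same fan-out idea.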
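A minimal sketch of the path computation, assuming the shard characters are hex digits of the sha2-256 digest (which is what a figure of roughly 65k buckets implies, since 16^4 = 65,536). The CID string here is a placeholder, not a real CID.

```python
import hashlib
from pathlib import PurePosixPath

BUCKETS = 16 ** 4  # two levels of two hex characters each

def object_path(root: str, cid: str, digest_hex: str) -> PurePosixPath:
    """objects/<first-2-chars>/<next-2-chars>/<full-cid>, sharded on the digest."""
    return PurePosixPath(root) / digest_hex[:2] / digest_hex[2:4] / cid

digest = hashlib.sha256(b"example artifact bytes").hexdigest()
path = object_path("objects", "bafy-example-cid", digest)  # placeholder CID string
```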

**Storage backends.** Pluggable per `where: cid` object:

- **`files-on-disk`** (default) — write to local filesystem.
- **`ipfs`** — registry-driven backend; calls out to a local IPFS node.
- **`s3`** — object storage in cloud bucket.
- **`memory-only`** — in-memory cache, evictable; useful for ephemeral artifacts.

The kernel uses the `where-tag` on each object to dispatch to the correct backend.
Backends are registry entries (`DefineStorage`); operators install only the ones
they want.

**Garbage collection** is opt-in per backend. Default policy: **never GC** (objects
are immutable and may be referenced by future activities). Operators can configure
per-backend retention rules:

- "Keep last N versions of objects referenced by `Pin` activities for path X"
- "Evict objects not referenced in last 90 days from the `memory-only` cache"
- "Mirror objects referenced by ≥ 3 endorsements; evict others after 30 days"

GC operates on the projected reference graph (a `reference-graph` projection that
maintains "what activities reference this CID"). Removing an object that's still
referenced is allowed but logs an operational warning.

### 15.4 Snapshots

Per §10.4, snapshots are the (projection-CID, log-tip-CID, state) triples that let
us resume without full replay.

**Storage:** `snapshots/<projection-cid>/<log-tip-cid>.cbor`. The state value is
dag-cbor-encoded; the file's content CID matches the snapshot's claimed CID.

**Index:** `snapshots/<projection-cid>/index` is a sorted list of `(log-tip-time,
log-tip-cid, file)` triples. On startup, kernel finds the latest snapshot ≤ current
log tip and resumes from it. On time-travel queries, finds the latest snapshot
≤ target time and folds forward.
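The "latest snapshot at or before" lookup is a binary search over the sorted index. The sketch below assumes the index is held as a list of tuples and that timestamps are ISO-8601 UTC strings in one fixed format, so that string order equals time order.

```python
import bisect

def latest_snapshot_at_or_before(index, target_time):
    """index: (log_tip_time, log_tip_cid, file) triples sorted by time."""
    times = [entry[0] for entry in index]
    pos = bisect.bisect_right(times, target_time)
    return index[pos - 1] if pos else None  # None: fold from the start of the log

index = [
    ("2026-01-01T00:00:00Z", "cid-a", "cid-a.cbor"),
    ("2026-01-02T00:00:00Z", "cid-b", "cid-b.cbor"),
    ("2026-01-03T00:00:00Z", "cid-c", "cid-c.cbor"),
]
midday = latest_snapshot_at_or_before(index, "2026-01-02T12:00:00Z")
exact = latest_snapshot_at_or_before(index, "2026-01-03T00:00:00Z")
too_early = latest_snapshot_at_or_before(index, "2025-12-31T00:00:00Z")
```

From the returned entry, the kernel would fold forward over the activities between that log tip and the target.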

**Retention:** keep at least:
- Latest snapshot per active projection
- Snapshots referenced by published `Create{Snapshot}` activities (federation
  proofs)
- One snapshot per day for the last 7 days (audit / time-travel)

Older snapshots GC'd by default. Operators can increase retention.

### 15.5 Operational state — SQLite

Things that are derived, frequently-queried, but not federated:

- **Lookup indexes** for projections (when `indexes:` declared) — `(projection,
  index-key, value) → activity-cid` rows
- **Delivery queue** — outbound activities pending push, retry counts, next-attempt
  timestamps
- **Trust state** — per-actor and per-instance trust levels (Trusted / Default /
  Suspended)
- **Quarantine queue** — activities pending operator review
- **Configuration cache** — currently-active registry entries (also in memory; on-
  disk cache for fast restart)

Single SQLite file (`indexes/fed-sx.db`). Recoverable: if corrupted or deleted,
rebuilt from the log on next startup (with cost proportional to log size). The
SQLite is a cache, not authoritative.

WAL mode for concurrent readers. Single-writer (the kernel); reads from many
HTTP request workers.
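As a rough illustration of the single-writer, many-readers arrangement, the sketch below enables WAL mode and creates a delivery queue table. The table and column names are hypothetical; the design does not specify a schema.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "fed-sx.db")

# The kernel holds the only writing connection.
writer = sqlite3.connect(db_path)
journal_mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute(
    """CREATE TABLE delivery_queue (
           activity_cid    TEXT    NOT NULL,
           recipient       TEXT    NOT NULL,
           retry_count     INTEGER NOT NULL DEFAULT 0,
           next_attempt_at INTEGER NOT NULL,  -- unix seconds
           PRIMARY KEY (activity_cid, recipient)
       )"""
)
writer.executemany(
    "INSERT INTO delivery_queue (activity_cid, recipient, next_attempt_at) VALUES (?, ?, ?)",
    [("cid-a1", "https://peer.example/inbox", 100),
     ("cid-a2", "https://peer.example/inbox", 500)],
)
writer.commit()

# An HTTP worker reads through its own connection without blocking the writer.
reader = sqlite3.connect(db_path)
now = 200
due = reader.execute(
    "SELECT activity_cid, recipient FROM delivery_queue "
    "WHERE next_attempt_at <= ? ORDER BY next_attempt_at",
    (now,),
).fetchall()
```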

### 15.6 Backup and export

The substrate is an append-only log of immutable artifacts; backup is simple.

- **Full backup:** rsync `/var/lib/fed-sx/log/` and `/var/lib/fed-sx/objects/`. The
  rest is rebuildable.
- **Per-actor export:** tar `log/actors/<actor-id>/` + the objects referenced by
  activities in that outbox. Self-contained, importable into another instance.
- **Activity bundle export:** for federation backfill, produce a dag-cbor bundle of
  `[activity envelopes... + referenced objects]` for a specified actor + range.
  Single file, content-addressed, signed by the source instance with a `Bundle`
  activity attesting to its contents.

Exports are themselves publishable (`Create{Bundle}` activity carrying the bundle
CID). This is how an actor migrates instances cleanly: export bundle, import on
new instance, publish `Move` activity.

### 15.7 Mirroring and replication

Two patterns:

- **Federation mirroring** (the canonical kind) — when actor A follows B, A's
  instance mirrors B's outbox locally. This is just normal federation (§13). Each
  follower keeps its own copy.
- **Operational mirroring** — for high availability. An operator runs two instances
  with shared filesystem (NFS / EFS) for `log/` and `objects/`, separate SQLite
  files. Reads can hit either; writes go through one. Or: rsync-based hot standby
  with manual failover.

Operational mirroring is out of scope for v1. Federation mirroring is the substrate-
level redundancy: as long as one peer that followed you is still online, your log is
still recoverable.

### 15.8 Storage size estimates

Rough targets at moderate scale (10 active local actors, 1000 followed peers, 1
year of activity at 100 activities/actor/day):

- **Log:** 10 actors × 100 act/day × 1 KB avg envelope × 365 days ≈ 365 MB local
  outbox. Mirrors: 1000 peers × 10 act/day × 1 KB × 365 ≈ 3.6 GB.
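- **Objects:** depends heavily on content. Assume 50% of activities have inline
  content of avg 5 KB: the ~365k local activities then carry ≈ 0.9 GB inline, and
  counting the ~3.65M mirrored activities as well brings the total to ≈ 10 GB.
  CID-referenced larger objects: count separately, depends on use case.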
- **Snapshots:** typically much smaller than the log. ~10 active projections ×
  ~10 MB per snapshot × ~8 retained snapshots ≈ 800 MB.
- **SQLite:** index sizes proportional to indexed projection content; typical few
  hundred MB.

Total: order of 10 GB at the described scale. Single-machine viable; SSD recommended
for log throughput; spinning disk fine for snapshots and object storage cold tier.
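The log and snapshot figures above follow directly from the stated inputs. The arithmetic is reproduced here with decimal units (1 KB = 1000 bytes), which is an assumption about the rounding used.

```python
KB, MB, GB = 1000, 1000 ** 2, 1000 ** 3
DAYS = 365

# Local outboxes: 10 actors x 100 activities/day x 1 KB envelopes.
local_log = 10 * 100 * 1 * KB * DAYS

# Mirrors: 1000 followed peers x 10 activities/day x 1 KB envelopes.
mirrors = 1000 * 10 * 1 * KB * DAYS

# Snapshots: 10 projections x 10 MB x 8 retained
# (one per day for 7 days, plus the latest).
snapshots = 10 * 10 * MB * 8

local_activities = 10 * 100 * DAYS
mirrored_activities = 1000 * 10 * DAYS

summary = {
    "local log (MB)": local_log / MB,
    "mirrors (GB)": mirrors / GB,
    "snapshots (MB)": snapshots / MB,
}
```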

### 15.9 Operational implications

- **The log is sacred.** Never modify, never delete. Backups go to multiple media.
  Loss of `log/` means loss of identity (actor activities) and loss of state-of-
  record. Loss of `objects/` means loss of content but log + peers can recover most
  of it.
- **Everything else is rebuildable.** Projections, indexes, snapshots, queue state
  can all be recomputed from the log at startup cost. Operationally, this means
  upgrades and migrations are forgiving.
- **CID-addressed storage is naturally idempotent.** Two instances writing the same
  artifact write the same bytes to the same path. Race conditions become no-ops.
- **JSONL on disk pays for itself** the first time an operator needs to debug a
  weird federation issue with `grep` and `jq`. Worth the storage cost vs dag-cbor.

## 16. API surface

HTTP API for reading the log, publishing activities, querying projections, and
streaming updates. Three layers: **AP-standard** endpoints (for vanilla AP
interop), **fed-sx-specific** endpoints (publish, query, capabilities), and
**discovery** endpoints (webfinger, well-known).

### 16.1 Endpoint catalog

#### AP-standard

| Method | Path | Purpose |
|--------|------|---------|
| GET | `/actors/<id>` | Actor doc (Person/Service/Group/Application) |
| GET | `/actors/<id>/inbox` | Read inbox — auth required |
| POST | `/actors/<id>/inbox` | Receive federated activity (HTTP Signature required) |
| GET | `/actors/<id>/outbox` | OrderedCollection of actor's published activities |
| POST | `/actors/<id>/outbox` | AP-standard publish (alias for `POST /activity` with `actor` set) |
| GET | `/actors/<id>/followers` | OrderedCollection of follower actor URIs |
| GET | `/actors/<id>/following` | OrderedCollection of followed actor URIs |
| GET | `/activities/<uuid>` | Single activity by id |
| GET | `/objects/<uuid>` | Single object by id (note: distinct from CID-addressed `/artifacts/<cid>`) |

#### fed-sx-specific

| Method | Path | Purpose |
|--------|------|---------|
| POST | `/activity` | Generalised publish — accepts any well-formed activity |
| GET | `/artifacts/<cid>` | CID-addressed artifact fetch (content negotiated) |
| GET | `/artifacts/<cid>/raw` | Raw bytes (whatever the codec stored) |
| GET | `/artifacts/<cid>/<path>` | IPLD path traversal into the artifact |
| GET | `/projections` | List of registered projections (name, CID, last-folded-tip) |
| GET | `/projections/<name>` | Full projection state (paginated for large states) |
| GET | `/projections/<name>?at=<ts>` | Time-travel: state as of timestamp |
| GET | `/projections/<name>/<key>` | Single key from a projection (uses indexes) |
| POST | `/query` | Run an SX query expression against one or more projections |
| GET | `/define-registry` | Currently active `Define*` artifacts by kind |
| GET | `/capabilities/<actor-id>` | Per-actor declared capabilities |

#### Discovery and well-known

| Method | Path | Purpose |
|--------|------|---------|
| GET | `/.well-known/webfinger?resource=acct:<user>@<host>` | RFC 7033 actor discovery |
| GET | `/.well-known/sx-capabilities` | This instance's capability advertisement (§7) |
| GET | `/.well-known/host-meta` | XRD describing the host |
| GET | `/.well-known/nodeinfo` | Standard fediverse node metadata (Mastodon, Pleroma compatibility) |

#### Real-time (SSE)

| Method | Path | Purpose |
|--------|------|---------|
| GET | `/actors/<id>/outbox/stream` | New activities as they're appended (events: `activity`) |
| GET | `/actors/<id>/inbox/stream` | New inbound activities (auth required) |
| GET | `/projections/<name>/subscribe` | Projection deltas (events: `delta`) |
| GET | `/federation/health/stream` | Per-peer delivery health (events: `peer-status`) |

WebSocket equivalents (`/ws/...` paths) available where SSE is awkward (browsers
behind proxies); same event payloads, different framing.

### 16.2 Authentication

Three mechanisms, each appropriate to a different caller type:

- **HTTP Signatures** (Internet-Draft `draft-cavage-http-signatures`, not an RFC) — the
  AP-standard mechanism for inter-instance calls. The sender signs a string built from
  selected request headers, including a `Digest` header that covers the body, with
  their actor's private key; the receiver verifies via the actor's public keys
  projection (§9.6). Used for: `POST /inbox`, peer-to-peer outbox pulls when
  authentication is desired.
- **Bearer tokens** — for interactive clients (CLIs, web UIs, mobile apps).
  Issued via OAuth2 (or simple admin-issued tokens for v1). Used for:
  `POST /activity`, `GET /actors/<id>/inbox`, anything requiring caller identity.
- **Capability tokens** (§9.5) — for delegated publish. Token includes the granting
  actor, the granted capabilities (e.g. `publish: Pin for path-prefix /docs/`), the
  bearer's actor, expiry, and signature from the granter. Used for: child actors,
  service accounts, temporary publish access.

Public reads (most GET endpoints to public-audience activities) require no auth.
Private/followers-only reads check the caller's identity against the audience.
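For the HTTP Signatures case, the part that is easy to get wrong is the string that actually gets signed. The sketch below builds the `Digest` header and the signing string in the style of `draft-cavage-http-signatures`; the choice of signed headers is an assumption, and the signing step itself (RSA or Ed25519 over the string) is omitted because it needs a crypto library.

```python
import base64
import hashlib

def digest_header(body: bytes) -> str:
    """Value for the Digest header: base64 of the SHA-256 of the body."""
    return "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")

def signing_string(method: str, path: str, headers: dict, signed_headers: list) -> str:
    """One 'name: value' line per signed header, in the declared order."""
    lines = []
    for name in signed_headers:
        if name == "(request-target)":
            lines.append(f"(request-target): {method.lower()} {path}")
        else:
            lines.append(f"{name.lower()}: {headers[name.lower()]}")
    return "\n".join(lines)

body = b'{"type":"Follow"}'
headers = {
    "host": "peer.example",
    "date": "Thu, 01 Jan 2026 00:00:00 GMT",
    "digest": digest_header(body),
}
to_sign = signing_string(
    "POST", "/actors/bob/inbox", headers,
    ["(request-target)", "host", "date", "digest"],
)
```

Because the `Digest` line is part of the signed string, tampering with the body invalidates the signature even though the body is never signed directly.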

### 16.3 Content negotiation

Same resource, multiple representations. `Accept` header dispatches:

| Accept header | Returns |
|---------------|---------|
| `application/activity+json` | AP-standard JSON-LD (default for ambiguous Accepts) |
| `application/ld+json; profile="..."` | JSON-LD with explicit profile |
| `application/cbor` | dag-cbor |
| `application/json` | Plain JSON (compact, no `@context` expansion) |
| `application/sx` | Canonical SX wire format |
| `text/html` | HTML representation (for browsers — renders the artifact via SX) |

Same negotiation applies to `/artifacts/<cid>`, `/activities/<uuid>`,
`/projections/<name>`. Servers MUST honour the request; absent `Accept` defaults to
`application/activity+json`.
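A simplified dispatcher for the table above might look like this. It handles `q` values and the `*/*` wildcard but ignores partial wildcards such as `text/*`, and returning `None` for "nothing acceptable" (which a server would likely turn into `406`) is an assumption.

```python
SUPPORTED = [
    "application/activity+json",
    "application/ld+json",
    "application/cbor",
    "application/json",
    "application/sx",
    "text/html",
]
DEFAULT = "application/activity+json"

def negotiate(accept):
    """Pick a representation from an Accept header value (or None/empty)."""
    if not accept:
        return DEFAULT
    ranked = []
    for position, part in enumerate(accept.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q > 0:
            # Higher q first; ties keep the order the client listed them in.
            ranked.append((-q, position, media))
    for _, _, media in sorted(ranked):
        if media in SUPPORTED:
            return media
        if media == "*/*":
            return DEFAULT
    return None

chosen = negotiate("text/html;q=0.5, application/sx")
```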

### 16.4 Pagination

Cursor-based via AP's `OrderedCollectionPage`:

```
GET /actors/giles/outbox
→ {
    "type": "OrderedCollection",
    "totalItems": 12345,
    "first": "/actors/giles/outbox?page=true",
    "last": "/actors/giles/outbox?page=true&min_id=0"
  }

GET /actors/giles/outbox?page=true
→ {
    "type": "OrderedCollectionPage",
    "id": "...?page=true",
    "next": "...?page=true&max_id=<cid>",
    "prev": "...?page=true&min_id=<cid>",
    "orderedItems": [...]
  }
```

Cursors are CIDs of the boundary activity (not opaque tokens). Stable across
restarts and instances. `max_id` returns activities **before** the cursor (newest
first); `min_id` returns activities **after** the cursor.

Default page size: 50. Max: 1000. `Link: <...>; rel="next"` header also provided
for HTTP-native pagination.
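The `max_id` / `min_id` semantics are easy to invert by accident, so here is a small sketch over an in-memory list. It assumes the collection is held newest first and that an unknown cursor is an error; real storage would seek by CID rather than scan.

```python
MAX_PAGE = 1000

def page(items, max_id=None, min_id=None, limit=50):
    """items: activity CIDs, newest first. Returns one page, newest first."""
    limit = min(limit, MAX_PAGE)
    if max_id is not None:
        # Activities before (older than) the cursor.
        start = items.index(max_id) + 1
        return items[start:start + limit]
    if min_id is not None:
        # Activities after (newer than) the cursor, nearest ones first filled.
        end = items.index(min_id)
        return items[max(0, end - limit):end]
    return items[:limit]

outbox = ["c5", "c4", "c3", "c2", "c1"]  # c5 is the newest
first_page = page(outbox, limit=2)
older = page(outbox, max_id="c4", limit=2)
newer = page(outbox, min_id="c2", limit=2)
```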

For projections: same shape, items are projection entries.

### 16.5 The query API

`POST /query` takes an SX expression evaluated in pure mode against named
projections:

```sx
POST /query
Content-Type: application/sx
Accept: application/sx

(let ((actors  (projection actor-state))
      (pins    (projection pin-state)))
  (for-each ([(actor-id actor) actors])
    (let ((n (count (filter (fn ((path cid)) (= (:owner cid) actor-id)) pins))))
      (when (> n 10)
        {:actor (:preferredUsername actor)
         :pins-published n}))))
```

Query semantics:

- Evaluated in pure sandbox; all the determinism rules apply.
- Projection access is read-only and snapshot-consistent: the query sees state
  as-of the time of the request (or `?at=` if specified).
- Result is serialized in the negotiated content type.
- Gas limit applies (default 1M units per query, tunable by operator).
- Cacheable: query CID + projection state CIDs uniquely determine the result.

Query results can themselves be published as `Create{QueryResult}` activities,
making derived analyses federable.
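The cacheability claim amounts to deriving a cache key from the query CID and the state CIDs of the projections it reads. A minimal sketch, assuming CIDs are available as strings and using a plain SHA-256 over them rather than a real CID construction:

```python
import hashlib

def cache_key(query_cid: str, projection_state_cids: dict) -> str:
    """Key is independent of dict ordering and changes when any input changes."""
    h = hashlib.sha256()
    h.update(query_cid.encode("utf-8"))
    for name in sorted(projection_state_cids):
        h.update(b"\x00" + name.encode("utf-8"))
        h.update(b"\x00" + projection_state_cids[name].encode("utf-8"))
    return h.hexdigest()

k1 = cache_key("cid-query", {"actor-state": "cid-s1", "pin-state": "cid-s2"})
k2 = cache_key("cid-query", {"pin-state": "cid-s2", "actor-state": "cid-s1"})
k3 = cache_key("cid-query", {"actor-state": "cid-s1", "pin-state": "cid-s3"})
```

A result cached under `k1` stays valid until one of the projections folds a new activity and its state CID changes.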

### 16.6 Errors

Uniform JSON error envelope:

```json
{
  "error": {
    "type": "https://next.rose-ash.com/ns/fed-sx/errors/v1#InvalidSignature",
    "status": 401,
    "title": "Activity signature invalid",
    "detail": "Key id 'https://example/actors/x#key-1' was superseded at 2026-01-15T...",
    "activity-id": "https://...",
    "key-id": "...#key-1",
    "instance": "/incidents/<incident-cid>"
  }
}
```

Error types are URIs in the fed-sx namespace; receivers can check `type` for
programmatic handling. Standard errors:

- `MissingCapability` — includes `missing` array of CIDs
- `SchemaViolation` — includes `schema-cid`, `field-path`, `expected`, `got`
- `InvalidSignature`
- `Quarantined` — includes `quarantine-id` for operator-status tracking
- `RateLimited` — includes `retry-after`
- `ResourceExhausted` — for query gas exhaustion
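A helper that produces this envelope keeps the type URIs and extra fields consistent. This is a sketch; the helper name is hypothetical, and the `422` for `MissingCapability` is taken from the inbound pipeline table (stage 9).

```python
ERROR_NS = "https://next.rose-ash.com/ns/fed-sx/errors/v1#"

def error_envelope(kind: str, status: int, title: str, detail: str, **extra) -> dict:
    """Build the uniform error body; extra fields are error-type specific."""
    body = {
        "type": ERROR_NS + kind,
        "status": status,
        "title": title,
        "detail": detail,
    }
    body.update(extra)
    return {"error": body}

missing = error_envelope(
    "MissingCapability", 422,
    "Required definitions not loaded",
    "Two Define* artifacts referenced by this activity are not in the registry",
    missing=["cid-1", "cid-2"],  # placeholder CIDs
)
```

A receiver can branch on `missing["error"]["type"]` without parsing the human-readable fields.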

### 16.7 Streaming details

SSE event format:

```
event: activity
id: <activity-cid>
data: { ...activity envelope... }

event: delta
id: <activity-cid that triggered the delta>
data: {"projection": "actor-state", "key": "...", "old": ..., "new": ...}

event: heartbeat
data: {"projected-up-to": "<cid>", "ts": "..."}
```

Clients reconnect with `Last-Event-ID: <cid>` to resume from the last event seen.
Server replays from that point in the log (or returns 410 if too far behind, in
which case client should switch to paginated pull).
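The framing and resume behaviour can be sketched together. The `retained` window (how far back the server is willing to replay) is an assumed parameter; the design only says the server returns 410 when the client is too far behind.

```python
import json

def format_sse(event, data, event_id=None):
    """Serialise one SSE event; compact JSON keeps the data on a single line."""
    lines = [f"event: {event}"]
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append("data: " + json.dumps(data, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"

def resume(log, last_event_id, retained):
    """log: (cid, envelope) pairs, oldest first. Returns (status, events)."""
    cids = [cid for cid, _ in log]
    if last_event_id not in cids:
        return (410, [])
    pos = cids.index(last_event_id)
    missed = log[pos + 1:]
    if len(missed) > retained:
        return (410, [])  # too far behind: client should fall back to paginated pull
    return (200, [format_sse("activity", env, cid) for cid, env in missed])

log = [("c1", {"id": "a1"}), ("c2", {"id": "a2"}), ("c3", {"id": "a3"})]
heartbeat = format_sse("heartbeat", {"ts": "t"})
status, events = resume(log, "c1", retained=5)
stale_status, _ = resume(log, "c1", retained=1)
```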

### 16.8 Versioning

The substrate is versioned at three levels:

- **Envelope version** — declared in `/.well-known/sx-capabilities`. Currently `1`.
  Forward-compatible (new fields OK; semantics fixed).
- **API version** — URL prefix optional: `/v1/...` works the same as `/...`. Future
  major version: `/v2/...` paths in parallel.
- **Definition versions** — supersession via activity log (§§9.2, 12.7). No special
  URL handling.

Capability negotiation happens before federation; clients shouldn't hard-code
URL paths beyond the canonical set documented here.

### 16.9 Operational implications

- **The API is small but layered.** AP compatibility is one layer; fed-sx
  extensions are another; both share auth and content negotiation. Adding a new
  endpoint shouldn't require new transport machinery.
- **Content negotiation is the polyglot bridge.** Same artifact addressable in JSON-
  LD (for AP peers), dag-cbor (for fed-sx peers), SX (for SX clients), HTML (for
  humans). One CID, four representations.
- **Cursor pagination is CID-based.** Stable identifiers, no opaque tokens to
  invalidate, peers can synchronize without coordination.
- **The query API is a load-bearing differentiator.** Datalog/GraphQL-equivalent
  expressiveness with no separate query language — it's just SX. Federable, signable,
  versionable like any other SX artifact.

---

## 17. Implementation languages

Polyglot **authoring**, monoglot **runtime**: every language-on-SX compiles to core
SX and runs on any host with the SX evaluator. The language is an authoring choice;
the federated artifact is uniform SX. Authors of `Define*` artifacts pick the
source language they prefer; consumers don't need that compiler installed to
execute the compiled SX.

Languages are picked because they **genuinely fit the problem**, not to demonstrate
the polyglot story. Where a chosen language has gaps (e.g. Erlang-on-SX missing hot
reload), we invest in maturing the port rather than working around the gap.

### 17.1 The v1 stack

| Layer | Language | Why |
|-------|----------|-----|
| **Native primitives** | OCaml (existing runtime) | Crypto (RSA, Ed25519, SHA), dag-cbor encode/decode, HTTP socket, file IO, SQLite. Surfaced as Erlang-on-SX BIFs. |
| **Kernel orchestration** | Erlang-on-SX | Actor model = federation. `gen_server` per actor / per projection / per peer. `supervisor` for delivery workers. Message passing is literally the substrate. Hot code reload (Phase 7) for `Define*` live extension. |
| **Query API back-end** | Datalog-on-SX | Projection state is relational; trust graph walks, provenance, projection joins are textbook Datalog. Already mature (276/276 tests, full core Datalog with stratified negation, aggregation, magic sets, federation-graph demo). |
| **`Define*` semantics, schemas, validators, codecs, audience predicates** | Core SX | The canonical federated language. Everything content-addressed and federated lives here. |

### 17.2 Languages explicitly **not** booked for v1

Available, mature, considered — would be reached for if a real fed-sx need surfaced,
but no preemptive use:

- **Haskell-on-SX** (285/285 tests, 36 programs, type checker working) — for complex
  operator-authored extensions that benefit from typed pattern matching. Schemas in
  fed-sx are short predicates; types don't earn their keep here.
- **Smalltalk-on-SX** (625/629 tests, classic corpus running) — natural fit for a
  live operator dashboard / Glamorous-Toolkit-style introspection. v2/v3 territory;
  a browser UI likely wins for operator audiences.
- **APL-on-SX** — high-throughput batch reprojection if scalar SX folds become a
  bottleneck. Premature without measured need.
- **JS-on-SX**, **Elm-on-SX** — browser-side client SDK / viewer. v2.
- **Common Lisp-on-SX**, **Forth-on-SX**, **Go-on-SX**, **Dream-on-SX**,
  **Elixir-on-SX**, **Erlang-on-SX (alternative form)** — case by case if a use
  case appears.

### 17.3 The FFI BIF layer

Erlang-on-SX has no FFI / NIF mechanism in its current form (Phase 6 plan: "out of
scope entirely"). fed-sx adds a **BIF layer** in `lib/erlang/transpile.sx` (or a
dedicated `lib/erlang/fed_bifs.sx`) exposing native primitives:

```
crypto:rsa_verify/3       crypto:ed25519_verify/3
crypto:sha2_256/1         crypto:sha3_256/1

cid:cbor_encode/1         cid:cbor_decode/1
cid:multihash/2           cid:from_bytes/2
cid:to_string/1           cid:from_string/1

log:append/2              log:read/3
log:tip/1                 log:replay/3

http:listen/2             http:request/2
http:respond/3            http:sse_send/2

fs:read/1                 fs:write/2
fs:exists/1               fs:list/1

sqlite:open/1             sqlite:exec/2
sqlite:query/3            sqlite:close/1

snapshot:put/3            snapshot:get/2
```

Each BIF is a thin Erlang-on-SX function dispatching to the corresponding SX runtime
IO primitive. Returns Erlang-shaped values (atoms, tuples, binaries). Errors raise
appropriate Erlang exceptions (`badarg`, `enoent`, `eacces`).

This is the **only** native-FFI surface in fed-sx. All other I/O goes through these
BIFs. Operators can audit the BIF list to know exactly what the substrate touches
outside SX.

### 17.4 Build pipeline

```
.sx files (core SX, registry entries) ──┐
.erl files (Erlang-on-SX kernel)    ──┼──> compile to core SX
.dl files (Datalog-on-SX queries)   ──┘
                                       │
                            content-addressed SX artifacts
                                       │
                                       ▼
                         genesis bundle (CID-verified)
                                       │
                                       ▼
                         OCaml runtime evaluates everything
```

Each authoring language's compiler runs at build time, producing core SX that goes
into the genesis bundle (for bootstrap definitions) or gets published as activities
(for runtime extensions).

### 17.5 Prerequisite work

Three pieces of investment land in or alongside the Erlang-on-SX loop. The first two
land **before** fed-sx kernel code starts; the third runs in parallel, not
blocking milestone 1, but blocking production-grade throughput.

1. **Phase 7 — hot code reload.** `code:load_binary/3`, `gen_server`
   `code_change/3` callback dispatch, atomic module-version swap. Required for
   `Define*` live extension (no kernel restart to load new verbs). Reload-
   semantics choice (two-version coexistence vs single-version atomic swap with
   closure capture) decided during the work.

2. **Phase 8 — FFI mechanism + initial BIFs.** `define-bif` registration + term
   marshalling + error mapping, then BIFs for `crypto:*`, `cid:*` (dag-cbor),
   `fs:*`, `http:*`, `sqlite:*`. Required for fed-sx kernel to call native
   primitives. Lands before kernel code that calls them.

3. **Phase 9 — specialized opcodes (the BEAM analog).** *Layered perf strategy:*
   - **Layer 1 (Phase 9, in scope)** — specialized bytecode opcodes that bypass
     the general-purpose CEK machine for hot Erlang operations. `OP_PATTERN_TUPLE`,
     `OP_PERFORM`/`OP_HANDLE`, `OP_RECEIVE_SCAN`, `OP_SPAWN`/`OP_SEND`, BIF
     dispatch table. Targets: 100k+ message hops/sec, 1M-process spawn under
     30sec — roughly 1000-3000× speedup over the current general-purpose path.
   - **Layer 2 (Phase 10, deferred)** — multi-core scheduler via OCaml 5
     domains. Decided empirically after Layer 1 lands; likely unnecessary if
     Layer 1 alone hits target throughput.
   - **Layer 3 (skipped)** — incremental tuning of the existing call/cc-based
     receive and env-copy-per-call machinery. Obsoleted by Layer 1; not pursued.

   **Architectural note for Phase 9.** Phase 9a (the **opcode extension
   mechanism in `hosts/ocaml/evaluator/`**) is out of scope for the Erlang loop
   — it's SX VM core, used by every language port that wants specialized
   opcodes. Designed in `plans/sx-vm-opcode-extension.md`; lands as a separate
   focused workstream (~1-2 weeks) owning `hosts/`. Phases 9b-9g (the actual
   Erlang opcodes in `lib/erlang/vm/`) are designed and tested against a stub
   dispatcher in the Erlang loop until 9a is available.

   **Shared-opcode discipline.** Opcodes Phase 9 produces that other language
   ports could plausibly use (pattern match, perform/handle, record access)
   become candidates for chiselling out to **`lib/guest/vm/`** — same lib/guest
   discipline, applied at the bytecode layer. Don't pre-extract; promote to
   `lib/guest/vm/` when a second language port has an actual second use. The
   substrate accumulates a richer opcode surface over time as ports contribute,
   and every port benefits from every shared opcode (the structural advantage
   over BEAM, which is special-purpose-built for one language).

   **fed-sx is not blocked by Phase 9.** Milestone 1 ships on current Erlang-
   on-SX perf (which has 100-1000× headroom for a single demo instance). Phase
   9 lands in parallel; by the time fed-sx needs production-grade throughput
   (federation hub use cases, milestone 2-3), Phase 9 is ready.

After Phases 7 and 8 land, fed-sx milestone 1 (kernel + registries + bootstrap
entries + Pin smoke test + reactive application smoke test) becomes the next
workstream. Phase 9 work continues in parallel.

---

## 18. Subscription model

Symmetric to the publish-side extensibility: just as `DefineActivity` registers what
*kinds of things can be published*, `DefineSubscription` registers what *kinds of
patterns can be subscribed to*. `Follow` becomes one standard subscription type
among many, not a hardcoded primitive.

### 18.1 The asymmetry being fixed

Without this, the substrate has rich publish-side extensibility (any new verb is a
`DefineActivity`) and *one* hardcoded subscription primitive (`Follow`). That
mirrors AP but it's an arbitrary limitation in a substrate where everything else
is registry-driven. Generalising restores symmetry.

### 18.2 The `DefineSubscription` shape

```sx
(activity 'Create
  :object {:type "DefineSubscription"
           :name "Follow"                        ; AP-standard
           :schema (fn (sub)                     ; what params the sub takes
             (and (cid? (-> sub :object))
                  (= "Person" (-> sub :object-type))))
           :match (fn (subscription activity)    ; pure-mode predicate
             (= (-> subscription :object) (:actor activity)))
           :delivery {:default :push
                      :modes [:push :pull :sse]
                      :digest-window nil}
           :capabilities-required []})           ; some subs may need authority
```

Four mandatory parts:

- **`schema`** — pure-mode predicate validating subscription parameters at
  `Subscribe` time. Catches malformed subscriptions before they enter state.
- **`match`** — pure-mode predicate `(subscription, activity) → bool`. Decides
  whether a given activity is a hit for this subscription. Determinism rules
  apply (§11.2).
- **`delivery`** — supported modes (push to inbox / pull on demand / SSE
  streaming / batched digest). The subscription instance picks its preferred
  mode at `Subscribe` time from the supported set.
- **`capabilities-required`** — capability tokens the subscriber must hold
  (empty for public subs; populated for paywalled/gated/private streams).
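
How the four parts cooperate at `Subscribe` time can be modelled in Python. This is a sketch under assumptions, not the SX registry: `SubscriptionType` and `accept_subscribe` are hypothetical names, and the `schema` below only checks for a non-empty actor id rather than the `cid?` / `"Person"` checks of the real `Follow` definition.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SubscriptionType:
    name: str
    schema: Callable[[dict], bool]        # validates Subscribe parameters
    match: Callable[[dict, dict], bool]   # (subscription, activity) -> bool
    modes: tuple[str, ...]                # supported delivery modes
    default_mode: str
    capabilities_required: tuple[str, ...] = ()


def accept_subscribe(sub_type, params, held_caps, preferred_mode=None):
    """Return the activated subscription, or raise ValueError on rejection."""
    if not sub_type.schema(params):
        raise ValueError("malformed subscription parameters")
    missing = set(sub_type.capabilities_required) - set(held_caps)
    if missing:
        raise ValueError(f"missing capabilities: {sorted(missing)}")
    mode = preferred_mode or sub_type.default_mode
    if mode not in sub_type.modes:
        raise ValueError(f"unsupported delivery mode: {mode}")
    return {"type": sub_type.name, "params": params, "mode": mode}


follow = SubscriptionType(
    name="Follow",
    schema=lambda p: isinstance(p.get("object"), str) and p["object"] != "",
    match=lambda sub, act: sub["params"]["object"] == act.get("actor"),
    modes=("push", "pull", "sse"),
    default_mode="push",
)

alice = "https://alice.example/actors/alice"
sub = accept_subscribe(follow, {"object": alice}, held_caps=[])
hit = follow.match(sub, {"type": "Note", "actor": alice})
miss = follow.match(sub, {"type": "Note", "actor": "https://bob.example/actors/bob"})
```

Asking this `Follow` type for a mode outside its `modes` (for example `"digest"`) is rejected before the subscription enters state.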

### 18.3 The `Subscribe` verb

The bootstrap verb that activates a subscription:

```sx
(activity 'Subscribe
  :object {:type "Follow"   :object "https://alice.example/actors/alice"})

(activity 'Subscribe
  :object {:type "Topic"    :tag "climate-change"
           :delivery :digest :digest-window "P1D"})

(activity 'Subscribe
  :object {:type "CidWatch" :cid "bafy..."
           :events [:supersede :endorse]})

(activity 'Subscribe
  :object {:type "Predicate"
           :pred '(fn (act) (and (= (:type act) "Note")
                                  (string-contains? (-> act :object :content) "fed-sx")))})
```

`Unsubscribe` is `Undo{Subscribe}` — AP's standard pattern, retains audit.

### 18.4 Standard subscription types (defined later, not bootstrap)

Same status as the custom verbs in §6.2 — substrate accepts any subscription
type once a `DefineSubscription` artifact registers it. Standard set:

| Name | Params | Match semantics | Use case |
|------|--------|-----------------|----------|
| **`Follow`** | `{object: actor-id}` | activity.actor == subscription.object | AP-standard actor following |
| **`Topic`** | `{tag: string}` | tag in activity.object.tags | Hashtag follows, RSS-like |
| **`CidWatch`** | `{cid, events: [...]}` | activity references cid AND activity.type in events | "Notify me when this artifact is updated/endorsed/forked" |
| **`PathWatch`** | `{path, events: [...]}` | activity is a Pin/Update of named path | "Notify me when domain:foo/bar/baz changes" |
| **`VerbFilter`** | `{wraps: subscription-cid, types: [...]}` | inner subscription matches AND activity.type in types | "Follow Alice but only Endorse activities" |
| **`TrustGraph`** | `{root: actor-id, depth: int}` | activity.actor reachable from root in trust graph within `depth` hops | Web-of-trust expansion |
| **`Predicate`** | `{pred: sx-fn}` | (pred activity) returns truthy | Escape hatch — most powerful, highest cost |
| **`Channel`** | `{channel-id}` | activity addresses or originates from channel | Multi-actor pooled streams |
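
A few rows of the table, written out as Python predicates to make the match semantics concrete. The activity layout is an assumption made for this sketch: referenced CIDs live in a flat `refs` list, `events` holds activity type names, and `VerbFilter` carries its inner subscription inline instead of by CID.

```python
def match_follow(sub, act):
    return act.get("actor") == sub["object"]


def match_topic(sub, act):
    return sub["tag"] in act.get("object", {}).get("tags", [])


def match_cid_watch(sub, act):
    # Both conditions must hold: the CID is referenced AND the type is watched.
    return sub["cid"] in act.get("refs", []) and act.get("type") in sub["events"]


def match_verb_filter(sub, act):
    # The inner subscription must match AND the activity type must be listed.
    return matches(sub["wraps"], act) and act.get("type") in sub["types"]


MATCHERS = {
    "Follow": match_follow,
    "Topic": match_topic,
    "CidWatch": match_cid_watch,
    "VerbFilter": match_verb_filter,
}


def matches(sub, act):
    """Dispatch on the subscription type, as a registry lookup would."""
    return MATCHERS[sub["type"]](sub, act)


alice = "https://alice.example/actors/alice"
only_endorse = {"type": "VerbFilter",
                "wraps": {"type": "Follow", "object": alice},
                "types": ["Endorse"]}
endorse = {"type": "Endorse", "actor": alice, "object": {"tags": []}}
note = {"type": "Note", "actor": alice, "object": {"tags": ["climate-change"]}}
supersede = {"type": "Supersede", "actor": alice, "refs": ["cid-template"]}

endorse_hit = matches(only_endorse, endorse)
note_filtered = matches(only_endorse, note)
topic_hit = matches({"type": "Topic", "tag": "climate-change"}, note)
watch_hit = matches({"type": "CidWatch", "cid": "cid-template",
                     "events": ["Supersede"]}, supersede)
```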

### 18.5 Match-fn execution location

The load-bearing question. Three choices exist (filter at the publisher, filter at the subscriber, or both); fed-sx adopts the **hybrid model**:

- **Coarse filter on the publisher side** — audience predicates (§8) decide who
  the activity is delivered to at all. This is mandatory and cheap (audience set
  is usually small and well-defined).
- **Fine filter on the subscriber side** — once an activity arrives in inbox,
  the subscriber's instance evaluates each active subscription's `match-fn`
  against it. Pure-mode evaluation (deterministic, gas-bounded). Activities
  matching one or more subscriptions enter the subscriber's projected state.

Why hybrid: publisher-side fine filtering would require the publisher to know
every subscriber's match-fn (privacy-violating, scaling-killing). Subscriber-side
filtering is wasteful only if the publisher's audience model is too coarse —
which is the audience system's job to fix per §8.
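
The split can be sketched in Python as two independent functions. The activity fields (`to`, `tags`) and the function names are assumptions for illustration; the point is that the publisher side never sees a subscriber's match predicates.

```python
def publisher_side(activity, known_inboxes):
    """Coarse filter: pick recipient inboxes from the activity's audience."""
    return [inbox for inbox in known_inboxes if inbox in activity["to"]]


def subscriber_side(activity, subscriptions):
    """Fine filter: ids of the local subscriptions whose predicate hits."""
    return [s["id"] for s in subscriptions if s["match"](activity)]


activity = {"type": "Note", "actor": "alice",
            "to": ["bob", "carol"], "tags": ["fed-sx"]}

# Publisher: audience decides delivery; dave is not addressed, so gets nothing.
delivered_to = publisher_side(activity, ["bob", "carol", "dave"])

# Subscribers: each instance evaluates only its own subscriptions.
bob_subs = [
    {"id": "follow-alice", "match": lambda a: a["actor"] == "alice"},
    {"id": "topic-rust", "match": lambda a: "rust" in a["tags"]},
]
carol_subs = [{"id": "topic-rust", "match": lambda a: "rust" in a["tags"]}]

bob_hits = subscriber_side(activity, bob_subs)
carol_hits = subscriber_side(activity, carol_subs)
```

Carol receives the activity but none of her subscriptions match, which is the wasted delivery the paragraph above attributes to a too-coarse audience.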

### 18.6 Subscription state and storage

Active subscriptions are themselves projected state. A bootstrap projection
`subscriptions` (paralleling `audience-graph` for the inverse direction)
maintains:

```
{actor-id -> [{subscription-cid, type, params, mode, started-at}]}
```

Updated by `Subscribe` and `Unsubscribe` activities. Queryable like any other
projection (§16). Used by:

- The inbox dispatcher to know which match-fns to evaluate against incoming
  activities
- Triggers (§19) to know which activities to fire on
- Federation to advertise "here are the subscription types I currently subscribe
  to" (capability-style, opt-in)
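
A minimal Python sketch of the `subscriptions` projection as a pure fold. The activity layout is assumed for illustration: the activity `id` doubles as the subscription CID, `Undo` names the `Subscribe` it reverses by that id, and the timestamps are made-up sample values.

```python
from functools import reduce


def fold_subscriptions(state, activity):
    """Pure fold step: returns a new state, never mutates the old one."""
    actor = activity["actor"]
    current = state.get(actor, [])
    if activity["type"] == "Subscribe":
        obj = activity["object"]
        params = {k: v for k, v in obj.items() if k not in ("type", "delivery")}
        entry = {"subscription-cid": activity["id"],
                 "type": obj["type"],
                 "params": params,
                 "mode": obj.get("delivery", "push"),
                 "started-at": activity["published"]}
        return {**state, actor: current + [entry]}
    if activity["type"] == "Undo":
        kept = [e for e in current if e["subscription-cid"] != activity["object"]]
        return {**state, actor: kept}
    return state  # unrelated activities leave the projection unchanged


log = [
    {"id": "cid-1", "type": "Subscribe", "actor": "bob",
     "published": "2025-01-01T00:00:00Z",
     "object": {"type": "Follow", "object": "https://alice.example/actors/alice"}},
    {"id": "cid-2", "type": "Subscribe", "actor": "bob",
     "published": "2025-01-02T00:00:00Z",
     "object": {"type": "Topic", "tag": "climate-change", "delivery": "digest"}},
    {"id": "cid-3", "type": "Undo", "actor": "bob",
     "published": "2025-01-03T00:00:00Z", "object": "cid-1"},
]

state = reduce(fold_subscriptions, log, {})
```

After the fold, only the `Topic` subscription remains active for `bob`; the `Follow` was removed by the `Undo`, yet both activities stay in the log for audit.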

### 18.7 Federation interactions

Subscriptions interact with federation in three ways:

- **Discovery.** Peer's `/.well-known/sx-capabilities` (§7) lists registered
  `DefineSubscription` CIDs, so subscribers know what they can ask for.
- **Negotiation.** A `Subscribe` activity carries `capabilities-required`; if
  the publisher's instance doesn't support the named subscription type, it
  responds with the standard 422 + missing-CIDs error (§14.2 #9). Subscriber
  can then deliver the bootstrapping `DefineSubscription` artifact and retry.
- **Cross-instance match-fn.** If subscriber and publisher both run the same
  conformance-tested SX evaluator, identical subscriptions match identically
  (cross-host equivalence, §11.8). This is what makes federated topic
  subscriptions reliable: every conforming instance computes the same
  set-of-matches for the same activity.
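
The negotiation round trip might look like the following Python sketch. Only the 422 status and the idea of a missing-CIDs list come from this document; the response body shape, the 200 success status, and the function names are assumptions.

```python
def handle_subscribe(registry, subscription_type_cid):
    """Publisher side: reject subscription types it has no definition for."""
    if subscription_type_cid not in registry:
        return 422, {"missing": [subscription_type_cid]}
    return 200, {}


def subscribe_with_bootstrap(registry, subscription_type_cid, artifacts):
    """Subscriber side: on 422, deliver the missing definitions and retry once."""
    attempts = []
    status, body = handle_subscribe(registry, subscription_type_cid)
    attempts.append(status)
    if status == 422:
        for cid in body["missing"]:
            if cid in artifacts:       # subscriber holds the definition
                registry.add(cid)      # stands in for delivering the artifact
        status, body = handle_subscribe(registry, subscription_type_cid)
        attempts.append(status)
    return attempts


publisher_registry = {"cid-define-follow"}
subscriber_artifacts = {"cid-define-topic": "(DefineSubscription Topic)"}

first_try = subscribe_with_bootstrap(
    publisher_registry, "cid-define-topic", subscriber_artifacts)
second_try = subscribe_with_bootstrap(
    publisher_registry, "cid-define-topic", subscriber_artifacts)
```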

### 18.8 Operational implications

- **The audience system handles "who do I send this to."** The subscription
  system handles "what do I want to receive." They're complementary, not
  redundant.
- **Subscription types can themselves evolve via supersession.** New version of
  `Topic` with case-insensitive matching? Publish a new `DefineSubscription`,
  `Supersede` the old one. Existing subscriptions migrate at next match
  evaluation.
- **Match-fn cost matters.** A `Predicate` subscription with a slow predicate
  becomes a per-activity tax. Gas budgets (§11.5) bound the worst case;
  operators can disable expensive subscription types if needed.
- **Subscriptions are signed messages.** Audit, accountability, and revocation
  all work the same way as activities — because subscriptions *are* activities.

---

## 19. Application model

The synthesis. With publish, subscribe, project, and trigger as registry-driven
primitives, the substrate has everything needed to express **distributed reactive
applications** as data — no native code, no kernel changes, no privileged
runtime. Applications are themselves federated artifacts.

### 19.1 An application is a tuple of artifacts

```
Application = {
  subscriptions : [DefineSubscription instances and their parameters],
  triggers      : [DefineTrigger registrations],
  projections   : [DefineProjection registrations],
  storage       : [DefineStorage registrations]   (optional)
}
```

That tuple, signed and bundled, is the application. Installing one = following
the named actors / activating the named subscriptions + loading the Define*
CIDs into the local registry. Forking one = republishing the Define* with
`Supersede` over the bits you change.

### 19.2 The reactive loop

```
       External actors                       Operator publishes activities
       publish activities                    via this instance's actors
              │                                      │
              ▼                                      ▼
       ┌─────────────────────────────────────────────┐
       │ Inbound + outbound activities               │
       └────────────────────┬────────────────────────┘
                            │
                            ▼
              For each active subscription:
              evaluate match-fn (pure mode)
                            │
              ┌─────────────┴─────────────┐
              ▼                           ▼
     Activity matches                Activity does
     a subscription                  not match
              │                           │
              ▼                           ▼
       Projections          ←     (silently dropped from
       fold the activity            this application's view;
              │                      may match other apps)
              ▼
       Triggers fire on the
       subscription's match
              │
              ▼
       Trigger then-sx runs
       (effectful sandbox)
              │
              ├──> updates local state (private projections)
              ├──> publishes new activity (via outbox)
              └──> calls effectful primitives (HTTP, fs, etc.)
                   per declared capabilities
```

Three things happen on a match: **state updates** (projection), **derived
publishes** (new activities), **side effects** (effectful primitives). Each is
authorisation-gated by the trigger's declared capabilities.

### 19.3 Trigger semantics

`DefineTrigger` registers `(when-subscription, then-sx, cascade-limit)`:

- **`when-subscription`** — references a subscription (by CID or by name). The
  trigger fires whenever that subscription matches an inbound or outbound
  activity. Multiple triggers can reference the same subscription.
- **`then-sx`** — function of `(activity, subscription, env) → trigger-result`.
  Runs in pure or effectful sandbox per declaration. Returns one or more of:
  - `:publish [activity-spec ...]` — request publish of derived activities
  - `:project [name → state-update ...]` — request projection updates
  - `:effect [capability-call ...]` — request effectful primitive calls
  - `:noop` — observed but no action
- **`cascade-limit`** — bounded depth for trigger cascades (§19.4).

A trigger is fundamentally **a reactive rule**: "when X happens, do Y." For each
matching activity X, the substrate guarantees that Y is never repeated
(redeliveries are deduplicated by activity-CID), that Y runs exactly once on each
instance (delivery from trigger to its effects is durable), and that its cost is
bounded (gas + cascade-limit).
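
The deduplication half of that guarantee can be sketched in Python. `TriggerRunner` is a hypothetical name, `then_sx` is simplified to take `(activity, subscription)` without an `env`, and the in-memory `seen` set stands in for whatever durable store a real instance would use.

```python
class TriggerRunner:
    def __init__(self, then_sx):
        self.then_sx = then_sx
        self.seen = set()      # activity CIDs this trigger has already handled
        self.outbox = []       # requested publishes
        self.projected = []    # requested projection updates

    def on_match(self, activity, subscription):
        """Run then_sx once per activity CID; return True if it ran."""
        cid = activity["id"]
        if cid in self.seen:   # redelivery of the same activity: do nothing
            return False
        self.seen.add(cid)
        result = self.then_sx(activity, subscription)
        self.outbox.extend(result.get("publish", []))
        self.projected.extend(result.get("project", []))
        return True


def count_reaction(activity, subscription):
    # A trigger body that only requests a projection update.
    return {"project": [("reaction-counts", activity["object"])]}


runner = TriggerRunner(count_reaction)
like = {"id": "cid-like-1", "type": "Like", "object": "cid-post-9"}
first = runner.on_match(like, {"type": "Follow"})
second = runner.on_match(like, {"type": "Follow"})   # duplicate delivery
```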

### 19.4 Cascade control

A trigger that publishes activities can fire other triggers. Without limits, a
single inbound activity could cascade across instances forever.

Each trigger declares `cascade-limit: N` (default 3). Each activity carries an
implicit `cascade-depth` field, incremented when it's the result of a trigger
firing. A trigger refuses to fire if `cascade-depth > cascade-limit`.

Cascade limits are local-only (operator policy, not federated). Defending
against runaway cascades from peer instances is the operator's job; the
substrate gives them the knob.
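
The depth check is small enough to write out. In this Python sketch the helper names are hypothetical, and an activity without a `cascade-depth` field is assumed to be at depth 0. Note that the rule as stated refuses only when depth is strictly greater than the limit, so an activity at depth equal to the limit still fires.

```python
DEFAULT_CASCADE_LIMIT = 3


def may_fire(activity, cascade_limit=DEFAULT_CASCADE_LIMIT):
    """A trigger refuses to fire only if cascade-depth > cascade-limit."""
    return activity.get("cascade-depth", 0) <= cascade_limit


def derive(parent, spec):
    """An activity published by a trigger sits one level deeper than its cause."""
    return {**spec, "cascade-depth": parent.get("cascade-depth", 0) + 1}


def run_cascade(seed, cascade_limit=DEFAULT_CASCADE_LIMIT):
    """Worst case: a trigger that republishes on every match.

    Returns (number of firings, depth of the activity that was refused).
    """
    fired, current = 0, seed
    while may_fire(current, cascade_limit):
        fired += 1
        current = derive(current, {"type": "Echo"})
    return fired, current["cascade-depth"]


default_run = run_cascade({"type": "Note"})
strict_run = run_cascade({"type": "Note"}, cascade_limit=0)
```

With the default limit of 3, a self-retriggering rule fires at depths 0 through 3 and is refused at depth 4.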

### 19.5 The `DefineApplication` bundle

A bundle artifact that names and groups the components of an application:

```sx
(activity 'Create
  :object {:type "DefineApplication"
           :name "rose-ash-blog"
           :version 1
           :subscriptions [{:type "Follow"   :object "https://blog.rose-ash.com/actors/main"}
                           {:type "Topic"    :tag "rose-ash"}
                           {:type "CidWatch" :cid <rose-ash-template-cid>
                                             :events [:supersede]}]
           :triggers      [<comment-moderation-trigger-cid>
                           <reaction-counter-trigger-cid>
                           <rss-republish-trigger-cid>]
           :projections   [<comment-thread-projection-cid>
                           <reaction-counts-projection-cid>]
           :storage       [<local-files-storage-cid>]
           :capabilities  [<http-allowlist-cap-cid>
                           <fs-write-cap-cid>]
           :description   "Federated blog with moderated comments and RSS"})
```

Three operations on applications, all themselves activities:

- **Install** — `Subscribe` to each subscription, `Create{}` references in
  `define-registry` to each trigger/projection/storage CID. One activity per
  reference, audited and replayable. Or: a single `Install{DefineApplication}`
  meta-verb that does the bundle in one signed step (defined later as a custom
  verb, not bootstrap).
- **Update** — publish a new `DefineApplication` with the same name +
  `supersedes` pointing at the old. Diff-then-apply: subscriptions added/
  removed, triggers loaded/unloaded, projections reprojected per §10.5.
- **Fork** — publish a new `DefineApplication` referencing the original's CID
  via `forked-from`, with whatever Define* CIDs you want to swap. Run alongside
  the original or in place of it.
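
The diff-then-apply step of **Update** can be sketched as a per-field comparison of two bundles. This Python model is an assumption about how such a diff could be computed, not the substrate's algorithm; the CIDs are placeholder strings.

```python
def diff_bundle(old, new):
    """Per-field additions and removals between two DefineApplication bundles."""
    plan = {}
    for field in ("subscriptions", "triggers", "projections", "storage"):
        before, after = old.get(field, []), new.get(field, [])
        plan[field] = {
            "add": [x for x in after if x not in before],
            "remove": [x for x in before if x not in after],
        }
    return plan


v1 = {
    "subscriptions": [{"type": "Topic", "tag": "rose-ash"}],
    "triggers": ["cid-moderation", "cid-rss"],
    "projections": ["cid-thread"],
}
v2 = {
    "subscriptions": [{"type": "Topic", "tag": "rose-ash"},
                      {"type": "Follow",
                       "object": "https://blog.rose-ash.com/actors/main"}],
    "triggers": ["cid-moderation", "cid-reactions"],
    "projections": ["cid-thread", "cid-reaction-counts"],
}

plan = diff_bundle(v1, v2)
```

Each `add` entry would become a `Subscribe` or a registry load, and each `remove` an `Undo` or an unload, so the update stays expressible as ordinary activities.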

### 19.6 Per-application namespacing

Multiple applications running on one instance need isolation:

- **Projections are namespaced by application.** `pin-state` from app A is
  distinct from `pin-state` from app B — both addressable as
  `/projections/<app-name>/pin-state`.
- **Triggers fire only on subscriptions belonging to their application.** App
  A's trigger doesn't see app B's subscription matches.
- **Storage backends are namespaced.** App A's `files-on-disk` backend writes
  to `data/apps/A/objects/`; app B writes to `data/apps/B/objects/`.
- **Capabilities are per-application.** Granting `http-client` to app A
  doesn't grant it to app B. Operator can audit per-app capability surface
  and revoke selectively.

Cross-application reads are explicit and require a capability grant
(`read-projection: <app>/<projection>`). Default isolation; opt-in sharing.
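
A small Python sketch of the default-isolation rule. The path format and the `read-projection: <app>/<projection>` grant string follow the text above; the `grants` table and function names are hypothetical.

```python
def projection_path(app, name):
    """Projections are addressed under their owning application."""
    return f"/projections/{app}/{name}"


def can_read(reader_app, owner_app, projection, grants):
    """An app reads its own projections freely; others need an explicit grant."""
    if reader_app == owner_app:
        return True
    needed = f"read-projection: {owner_app}/{projection}"
    return needed in grants.get(reader_app, set())


# Operator policy: app B may read exactly one of app A's projections.
grants = {"B": {"read-projection: A/pin-state"}}

own_read = can_read("A", "A", "pin-state", grants)
granted_read = can_read("B", "A", "pin-state", grants)
ungranted_read = can_read("B", "A", "comment-thread", grants)
stranger_read = can_read("C", "A", "pin-state", grants)
path = projection_path("A", "pin-state")
```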

### 19.7 Worked examples

#### Example A — Blog with moderated comments

```
DefineApplication "blog-with-comments":
  subscriptions:
    - Follow: <author-actor>
    - Topic:  "post-comment"  (filter: object.in-reply-to in our-posts)
  triggers:
    - on Topic match → publish Note (the new comment, derived if approved)
                     → projection pending-moderation
    - on inbound Approve{Reply} → projection comment-thread (visible)
  projections:
    - comment-thread:    post-cid → [approved comment activities]
    - pending-moderation: list of pending replies awaiting approval
```

#### Example B — Continuous integration

```
DefineApplication "ci-pipeline":
  subscriptions:
    - Follow: <developer-actor>
    - VerbFilter: wraps Follow, types: [Push]
  triggers:
    - on Push match → effect: run build (capability: subprocess + fs-write)
                    → publish Build{source: Push.cid, output: <build-cid>, status}
    - on Build{status: success} → effect: run tests
                                 → publish Test{...}
    - on (Test{passed} count for N days) → publish Release{...}
  projections:
    - build-history: commit-cid → [build activities]
    - release-history: ordered list of Release activities
```

#### Example C — Distributed code review

```
DefineApplication "code-review":
  subscriptions:
    - Topic: "review-request"
    - CidWatch: <organisation-actor>, events: [Endorse]
  triggers:
    - on review-request match → projection review-queue
                              → effect: notify-reviewer
    - on Endorse from authorised reviewer → publish Approve{review-cid}
                                          → projection approval-state
  projections:
    - review-queue: ordered list of pending requests with summaries
    - approval-state: review-cid → endorsement set
```

In all three: the application is *just* the bundle of subscriptions, triggers,
and projections. Federation makes them composable across instances. The
substrate provides exactly-once-per-CID semantics and pure-mode determinism for
the matches and folds.

### 19.8 Composition and discovery

Applications are themselves federated content. This means:

- **App registries** — actors can publish curated lists of applications they
  endorse. Discovery becomes follow-an-actor + browse-their-app-list.
- **Cross-app composition** — application A publishes derived activities that
  application B subscribes to. Pipeline of applications via the activity log.
- **App marketplaces** — pin a friendly path to a `DefineApplication` CID
  (`rose-ash.com:apps/blog → bafy...`) for human discoverability.

None of this requires kernel changes. It's all activities about activities.

### 19.9 Operational implications

- **Applications are inspectable from the activity log alone.** Replay an
  actor's outbox and you can reconstruct the exact application installation
  state at any point in time.
- **Application updates are atomic relative to the activity log.** Either the
  `Update{DefineApplication}` succeeded (new state visible from next activity)
  or it didn't (old state continues). No partial-update window.
- **Forking is the same as installing a copy.** No special "fork" mechanism
  needed; the activity-log mechanics already support it.
- **Per-app capabilities are a real security surface.** Operators must
  understand what they're granting when they install. The bundle's
  `capabilities` list is the audit point — should be human-readable and
  reviewable before installation.
- **The substrate isn't an "application platform" — it's an "application
  substrate."** Applications aren't installed *on* fed-sx; they're expressed
  *in* fed-sx, as the same kind of content as everything else.

---

## Appendix A: relationship to adjacent systems

Worth knowing about so we can borrow good ideas:

- **ATproto / Bluesky** — Lexicons (schemas) + repos (per-actor signed merkle trees).
  Closest in spirit. We borrow the schema-as-data idea; we differ by making schemas
  themselves federated activities, not central registry entries.
- **Spritely Goblins** — capability-secure actors. We borrow the capability-token
  pattern for delegation.
- **Ceramic** — signed event streams, content-addressed. Similar log-as-state model;
  we differ by making the projection function pluggable per-stream rather than
  hardcoded per-streamtype.
- **Holochain** — agent-centric DHT. We share the "every agent has their own log"
  shape; we use AP federation instead of DHT.
- **Farcaster** — pubsub on hubs. We share the firehose model; we add cryptographic
  outbox-as-source-of-truth.

None of them are *code-as-data the whole way down* — that's the SX-distinctive bit.
Handlers, validators, projections aren't bytecode shipped out-of-band; they're SX in
the same log as everything else, evaluable by any host that speaks SX.

## Appendix B: implications worth sitting with

- **Deployment dissolves.** Releasing a feature = publishing `DefineActivity{name:
  "Whatever", ...}`. Federation distributes it. No build artifact, no rolling deploy,
  no version-skew between server and client.
- **Applications are forkable by default.** "Fork the rose-ash blog" = take the bundle
  of `Define*` CIDs that constitute it, publish your own with `Supersede` over the
  ones to change, run your own projector. Same federation graph, divergent state.
- **Composition is by reference, not import.** `Pin` activity points at the CID of the
  `DefineActivity{name: "Pin"}`. No package manager, no transitive deps, no lockfiles.
- **The boundary between "user" and "developer" softens.** Both publish signed
  activities. Power users can publish handlers, projections, sig suites under their
  own actor.
- **This is more ambitious than a rose-ash rewrite.** It's a substrate that *happens
  to* host rose-ash as its first application.

---

## Appendix C: AI agent collaboration patterns

The substrate is incidentally well-shaped for one of the open problems of the
next decade: **infrastructure for AI agent collaboration where contributions
are signed federated artifacts, behavior is bounded by declared capabilities,
decisions are audit-by-replay, and infrastructure improves through agent
contribution within a web of trust.**

This is not a designed-for use case — fed-sx was conceived as a federated
publishing and reactive application substrate. But the properties it has fit
agent collaboration almost exactly. Worth being deliberate about, because the
framing changes who fed-sx is *for*.

### Why the substrate fits agent collaboration

AI agents need infrastructure where contributions are first-class artifacts,
not pull requests against human-controlled repos. Currently agents squeeze
through GitHub PRs, deployment pipelines, npm publishes — all of which assume
a human in the loop. fed-sx is shaped for direct contribution:

- **Direct authoring of substrate features.** An agent doesn't *propose* a
  feature, it *publishes* one. A `DefineActivity` artifact is the agent's
  contribution. A `DefineProjection` is its analysis. A `DefineTrigger` is its
  automation. The signed publication IS the deploy — no PR review, no CI, no
  DevOps.
- **Cryptographic identity without registration.** Agents have actor keys;
  reputation is the endorsement graph; trust is provable by signature chain.
  Two agents that have never met can verify each other's contributions
  cryptographically.
- **Capability-bounded autonomy.** An agent declares `capabilities-required` on
  its activities. A trigger says "I publish to path-prefix `/agent-x/*` and
  call `http-client` for `api.example.com/*`." Receivers verify the constraint
  cryptographically; the agent can't escape its declared surface even if the
  agent itself is misaligned. Sandbox model designed for autonomous code (§11).
- **Audit-by-replay applied to AI behavior.** Every AI decision is
  reconstructable, deterministically, by anyone with the log. To answer "Why
  did agent A do X?", replay the log to that moment and inspect the activities
  A subscribed to, the projection state it observed, the trigger that fired,
  and the activity it published. Fundamentally better than today's "trust the
  model" posture.
- **Composition without coordination.** Agent A publishes a moderation
  validator. Agent B subscribes and uses it. Agent C improves it, supersedes
  A's. B sees the supersession, decides whether to adopt. No central registry,
  no maintainer to coordinate with, no version skew.
- **Disagreement is visible, not hidden.** If agents A and B compute the same
  projection over the same log and produce different snapshot CIDs, the
  disagreement is *cryptographically observable*. Today, two AI services
  answering the same question with different answers is invisible until
  somebody notices.
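
The last point can be made concrete with a short Python sketch. A SHA-256 digest over canonical JSON stands in for a real snapshot CID, and both projection functions are invented for the example; the only claim illustrated is that identical folds yield identical identifiers and divergent folds do not.

```python
import hashlib
import json


def snapshot_id(state):
    """Stand-in for a snapshot CID: hash of a canonical serialisation."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def count_by_type(log):
    """Projection: number of activities per type."""
    counts = {}
    for act in log:
        counts[act["type"]] = counts.get(act["type"], 0) + 1
    return counts


def count_by_type_skipping_undo(log):
    """A divergent projection that silently ignores Undo activities."""
    return count_by_type([a for a in log if a["type"] != "Undo"])


log = [{"type": "Note"}, {"type": "Like"}, {"type": "Undo"}, {"type": "Note"}]

agent_a = snapshot_id(count_by_type(log))
agent_b = snapshot_id(count_by_type(log))                 # same projection
agent_c = snapshot_id(count_by_type_skipping_undo(log))   # different rules

agree = agent_a == agent_b
diverged = agent_a != agent_c
```

Comparing two short identifiers is enough to detect that agent C applied different rules, without inspecting either agent's internals.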

### Dynamics that emerge

- **Agent specialisation = publication.** "I'm the indexing agent" = publishes
  `DefineProjection` artifacts. "I'm the moderation agent" = publishes
  `DefineValidator` artifacts. "I'm the matchmaking agent" = publishes a
  `DefineApplication` for marketplace subscriptions and triggers. Specialisation
  is content, not service deployment.
- **Reputation = endorsement graph.** Web of trust applied to agent
  contributions. Bad actors get cut out organically; no central authority to
  capture.
- **Forking = explicit disagreement resolution.** Agents disagree on
  validation? Both publish their `DefineValidator`s. Subscribers pick. The fork
  is signed, observable, recoverable. Compare today: when AI services have
  different rules, one is just *invisibly applied*.
- **Cascade limits = agent population safety.** The `cascade-depth` and
  `cascade-limit` (§19.4) become the bounded-autonomy guard rails for agent
  populations. Self-coordination without runaway-cascade across the substrate.
- **Self-improving infrastructure.** Agents observe substrate behavior, propose
  improvements as `DefineProjection` for monitoring, `DefineTrigger` for
  automation. The substrate itself improves through agent contribution — not
  through a release cycle. Every improvement is signed and traceable.

### Use cases

- **Agent-managed scientific datasets** — collection, cleaning, analysis,
  publication, peer review by other agents, all signed activities. Replication
  is replay; provenance is built in.
- **Multi-agent code maintenance** — agents observing repos (subscribe to
  `Push`), running tests (triggers), proposing fixes (`Pull`-equivalent
  activities), endorsing each other's work.
- **Agent-curated knowledge** — agents publish, endorse, and supersede
  knowledge artifacts. Truth accumulates via the trust graph; outdated info
  gets `Supersede`d explicitly.
- **Distributed agent marketplaces** — agents publish capabilities, subscribers
  find them via `Topic` / `Predicate` subscriptions, contracts via signed
  activity exchange.
- **Cross-agent AI safety monitoring** — monitoring agents subscribe to other
  agents' outboxes, run validators, publish `Alert` activities when patterns
  of concern appear. Decentralised oversight without central authority.
- **Cross-org agent workflow coordination** — supply chain, healthcare, legal —
  multiple specialised agents coordinating across organisational boundaries
  with cryptographic provenance.

### Safety and governance properties

The substrate provides several properties AI safety has been asking for and
that current infrastructure does not provide:

- **Every action is signed.** Attribution is cryptographic, not a log file an
  agent could spoof.
- **Capabilities are declared and enforced.** Agents operate within their
  declared sandbox; can't grow capabilities silently.
- **Cascades are bounded.** No exponential agent-on-agent feedback loops
  without explicit configuration.
- **Audit is replay.** Every decision can be reconstructed deterministically;
  no opaque "the model decided" moments.
- **Disagreement is visible.** Two agents producing different projections of
  the same data is a cryptographically-detectable event, not invisible drift.
- **Trust is the endorsement graph, not central authority.** No single point of
  capture or coercion.
- **Forks are first-class.** When safety-critical disagreements occur, the
  substrate accommodates them without forcing a winner; observers see all
  positions.

### What this implies for the project

- **Milestone 1's smoke tests remain right** — the verb-extensibility and
  reactive-application proofs apply to agent contributions exactly as they
  apply to human contributions. The agent collaboration framing doesn't
  require new mechanisms; it interprets the existing mechanisms differently.
- **The application model (§§18-19) is the headline story** for this audience,
  not a layer on top. Subscriptions + triggers + projections + capabilities =
  agent collaboration primitives.
- **Capability discovery and trust dynamics gain weight earlier.** Where
  human-driven applications can rely on operator policy, agent-driven
  populations need the trust graph to be operational from milestone 2.
- **The pitch line evolves.** Less "ActivityPub for code" / "rose-ash next
  gen," more "infrastructure for AI agent collaboration with cryptographic
  provenance, bounded autonomy, and audit-by-replay." The technical substance
  is unchanged; the framing of *who needs this* changes substantially.

That the substrate happens to be well-shaped for one of the open problems of
the next decade is worth being deliberate about.