# fed-sx — Federated SX Activity Substrate

A federated, content-addressed, extensible application substrate where the unit of
computation is a signed activity, the unit of state is a pure SX projection over the
activity log, and the substrate's own extensibility (new verbs, new object types, new
projections, new validators) is itself published through the same mechanism.

Status: **design** — not yet implemented. Target subdomain: `next.rose-ash.com`.
Target location in repo: `next/` (new top-level dir, sibling to `blog/`, `market/`,
etc.). Stack: pure SX-on-OCaml. Implementation language(s) to be chosen after design
is complete.

---

## 1. Premise

ActivityPub's data model — actors, signed activities, inboxes/outboxes — generalises
beyond social posting to any domain where state evolves via signed messages. fed-sx
takes that generalisation seriously:

- The unit of communication is a **signed AP activity**.
- The unit of content is an **AP object**, content-addressed by **CID** (multihash +
  multicodec, default `dag-cbor` over the parsed SX AST).
- State is the **deterministic fold** of pure SX functions over the activity log.
- The substrate is **self-extending**: new activity types, object types, projections,
  validators, codecs, transports, and signature suites are themselves published as
  `Define*` activities — federated like any other content.

Three commitments make the rest fall into place:

1. **The kernel is dumb.** It only knows envelope shape, signature verification,
   append-to-log, fetch-by-id, transport in/out. It does not know what `Create` or
   `Pin` *mean*.
2. **Everything else is registry-driven.** Verbs, object types, validators, projections,
   codecs, transports, audiences, proofs, sig suites — all looked up in registries the
   kernel calls into.
3. **The registries are themselves publishable.** New entries arrive as `Define*`
   activities. Bootstrap registries load from a known set of CIDs at startup; everything
   else is replayed from the log.

Result: the only code that ever needs to change in the kernel is the envelope itself.
New verbs = published SX, federated like any other artifact.

---

## 2. CIDs and content addressing

Every artifact has a CID. Default codec is **dag-cbor** over the parsed SX AST (not
the raw text). This buys:

- **Sub-AST addressing for free.** Each nested structure has an implicit CID (the CID
  of its own encoding), and IPLD can walk paths like `<file-cid>/components/card`. The
  "file CID *and* component CID" question dissolves: any node can be referenced, and
  you choose the granularity at reference time.
- **Polyglot canonicalization.** JS, OCaml, Python only need to agree on AST shape +
  CBOR's deterministic encoding (RFC 8949 §4.2.1). No byte-identical pretty-printer
  required across hosts.
- **Format immunity.** Reformatting, indent changes, equivalent-form normalisations
  do not change the CID.
- **Tooling fit.** sx-tree already has the parsed form in memory; computing or
  verifying a CID is just an encode + hash.

Costs accepted:
- One spec to maintain: SX↔CBOR mapping (number → CBOR int/float, string → text,
  symbol → tag, keyword → tag, list → array, dict → map). Roughly 50 lines of code
  per host.
- Author's exact source text is not preserved; re-pretty-print on fetch.
- "Why don't these CIDs match" requires comparing CBOR (a `cid-explain` tool helps).

The CID format itself is multicodec-agile: the substrate also accepts `raw`,
`dag-json`, `dag-pb`, etc. when seen, dispatched via the codec registry.
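
The SX↔CBOR mapping and CID computation above fit in a few dozen lines. What follows is a minimal Python sketch (Python being one of the hosts listed) under these assumptions. Python `None`/`bool`/`int`/`float`/`str`/`list`/`dict` stand in for parsed SX values. `Symbol` and `Keyword` are hypothetical wrapper classes. `SYMBOL_TAG`/`KEYWORD_TAG` are placeholder tag numbers, because the mapping spec has not fixed them. Map entries are ordered by the bytewise order of their encoded keys (RFC 8949 §4.2.1). The result is a CIDv1 (dag-cbor `0x71`, sha2-256 `0x12`) in base32 multibase form.

```python
import base64
import hashlib
import struct

# Placeholder tag numbers for SX symbols and keywords (assumption: the real
# SX<->CBOR mapping spec has not fixed these yet).
SYMBOL_TAG = 39000
KEYWORD_TAG = 39001

DAG_CBOR = 0x71   # multicodec code for dag-cbor
SHA2_256 = 0x12   # multihash code for sha2-256


class Symbol(str):
    """Hypothetical wrapper marking an SX symbol."""


class Keyword(str):
    """Hypothetical wrapper marking an SX keyword."""


def _head(major: int, n: int) -> bytes:
    """CBOR initial byte plus argument, always in the shortest form."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if n < 1 << (8 * struct.calcsize(fmt)):
            return bytes([(major << 5) | info]) + struct.pack(fmt, n)
    raise ValueError("integer out of CBOR range")


def encode(v) -> bytes:
    """Deterministically encode a parsed SX value as CBOR."""
    if v is None:
        return b"\xf6"
    if v is True:
        return b"\xf5"
    if v is False:
        return b"\xf4"
    if isinstance(v, int):
        return _head(0, v) if v >= 0 else _head(1, -1 - v)
    if isinstance(v, float):
        return b"\xfb" + struct.pack(">d", v)  # dag-cbor wants 64-bit floats
    if isinstance(v, Symbol):  # checked before str: Symbol subclasses str
        return _head(6, SYMBOL_TAG) + encode(str(v))
    if isinstance(v, Keyword):
        return _head(6, KEYWORD_TAG) + encode(str(v))
    if isinstance(v, str):
        data = v.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(v, list):
        return _head(4, len(v)) + b"".join(encode(x) for x in v)
    if isinstance(v, dict):
        # Sort entries by the bytewise order of their encoded keys.
        pairs = sorted((encode(k), encode(x)) for k, x in v.items())
        return _head(5, len(pairs)) + b"".join(k + x for k, x in pairs)
    raise TypeError(f"no SX mapping for {type(v).__name__}")


def cid_v1(value) -> str:
    """CIDv1 (dag-cbor, sha2-256) as a base32 multibase string."""
    digest = hashlib.sha256(encode(value)).digest()
    multihash = bytes([SHA2_256, len(digest)]) + digest
    raw = bytes([0x01, DAG_CBOR]) + multihash  # all codes < 0x80: 1-byte varints
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
```

Map entries are sorted before hashing, so two dicts with the same contents in a different insertion order get the same CID. That is the format-immunity property in miniature. One caveat is worth settling in the mapping spec. Strict DAG-CBOR permits only tag 42 (CID links), so the tagged symbols and keywords sketched here would need either a relaxed codec or a different encoding to stay valid `dag-cbor`.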

---

## 3. Kernel surface (fixed — get this right)

The kernel is the only thing that's hard to change later. Everything else is in
registries. Two envelope shapes plus five operations.

### 3.1 Activity envelope

```
{ id, type, actor, published,
  to, cc, audience-extras,
  object | target | origin | result,   # AP slots, opaque to kernel
  capabilities-required: [...],        # so receivers can refuse cleanly
  proofs: [...],                       # OTS, on-chain, multi-sig — all opaque
  signature: { key-id, algorithm, value, covered-fields } }
```

### 3.2 Object envelope

```
{ id, type, cid, media-type,
  where: inline | cid | url,
  content?, link? }                    # only one populated based on `where`
```
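
A kernel-side shape check for the object envelope could look like the following Python sketch. The envelope only says one body field is populated "based on `where`". The mapping used here is therefore an assumption: `inline` → `content`, `url` → `link`, and `cid` → neither, since the body is fetched by `cid`. The function name is also illustrative.

```python
REQUIRED_FIELDS = ("id", "type", "cid", "media-type", "where")

# Assumed mapping from `where` to the one body field that may be populated.
BODY_FIELD = {"inline": "content", "url": "link", "cid": None}


def object_envelope_errors(obj: dict) -> list:
    """Return a list of problems with an object envelope (empty means OK)."""
    errors = [f"missing {f}" for f in REQUIRED_FIELDS if f not in obj]
    where = obj.get("where")
    if where not in BODY_FIELD:
        errors.append(f"unknown where: {where!r}")
        return errors
    present = {f for f in ("content", "link") if obj.get(f) is not None}
    expected = {BODY_FIELD[where]} - {None}
    if present != expected:
        errors.append(f"where={where} expects {sorted(expected)}, got {sorted(present)}")
    return errors
```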

### 3.3 Kernel verbs

The only verbs implemented directly by the kernel:

- **Append signed activity** to outbox (after envelope check + sig verify + validator
  pipeline).
- **Verify signature** against actor's published keys, time-aware (which key was
  active at `published`).
- **Fetch** by `id` or by `cid`.
- **Receive at inbox** (verify + dispatch to registered handlers).
- **Replay log** to rebuild registries on boot.

Everything else is registry-resolved.
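
Here is a minimal Python sketch of the append path, showing how little the kernel knows. Registries are plain callables standing in for SX `Define*` artifacts. `Kernel`, `Rejected`, and the toy signature suite used to exercise it are hypothetical. A real suite would verify rsa-sha256 or Ed25519 signatures over the covered fields.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

ENVELOPE_FIELDS = ("id", "type", "actor", "published", "signature")


class Rejected(Exception):
    """Raised when an activity fails envelope, signature, or validator checks."""


@dataclass
class Kernel:
    # Registries: plain Python callables here; SX Define* artifacts in fed-sx.
    sig_suites: Dict[str, Callable[[dict], bool]] = field(default_factory=dict)
    validators: List[Callable[[dict], bool]] = field(default_factory=list)
    log: List[dict] = field(default_factory=list)

    def append(self, act: dict) -> int:
        """Envelope check, sig verify, validator pipeline, then append."""
        missing = [f for f in ENVELOPE_FIELDS if f not in act]
        if missing:
            raise Rejected(f"envelope missing {missing}")
        suite = self.sig_suites.get(act["signature"].get("algorithm"))
        if suite is None or not suite(act):
            raise Rejected("signature not verifiable")
        for check in self.validators:
            if not check(act):
                raise Rejected(f"validator {check.__name__} refused")
        self.log.append(act)  # act["type"] is never interpreted here
        return len(self.log) - 1

    def fetch(self, activity_id: str) -> dict:
        """Fetch by id (a real kernel would also index by CID)."""
        for act in self.log:
            if act["id"] == activity_id:
                return act
        raise KeyError(activity_id)
```

An activity of type `Pin` appends fine even with no `Pin` semantics registered. Interpreting it is left entirely to registry-driven projections.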

---

## 4. Registries

Each registry has a default-populated set (loaded from genesis-bundled CIDs) and
accepts new entries via `Define*` activities. Default entries themselves are SX
artifacts — versioning, audit, replacement work the same way as user content.

| Registry | Bootstrap defaults | Extended by |
|----------|-------------------|-------------|
| **Activity types** | `Create`, `Update`, `Delete`, `Announce` | `DefineActivity{type, schema-sx, semantics-sx}` |
| **Object types** | `SXArtifact`, `Note`, `Image`, `Tombstone` | `DefineObject{type, schema-sx, render-hint}` |
| **Validators** | envelope shape, signature, type-schema | `DefineValidator{applies-to, predicate-sx}` |
| **Projections** | identity, by-type, by-cid, by-actor, actor-state, define-registry, audience-graph, by-object | `DefineProjection{name, fold-sx, query-sx}` |
| **Codecs** | dag-cbor, raw, dag-json | `DefineCodec{multicodec, encode-sx, decode-sx}` |
| **Hash algorithms** | sha2-256 | multihash table — agile by spec |
| **Transports** | http-inbox-push | `DefineTransport{name, deliver-sx, receive-sx}` |
| **Audience predicates** | `Public`, `Followers`, direct | `DefineAudience{name, member-of-sx}` |
| **Subscription types** | `Follow` (AP-standard) | `DefineSubscription{name, schema-sx, match-sx, delivery}` |
| **Proof types** | (none) | `DefineProof{type, attach-sx, verify-sx}` |
| **Storage backends** | files-on-disk | `DefineStorage{where-tag, put-sx, get-sx}` |
| **Triggers** | (none) | `DefineTrigger{when-subscription, then-sx, cascade-limit}` |
| **Signature suites** | rsa-sha256 (AP-compatible) | `DefineSigSuite{name, sign-sx, verify-sx}` |
| **Application bundles** | (none) | `DefineApplication{name, subscriptions, triggers, projections, storage}` |

Adding `Pin`, `Endorse`, `Supersede`, `Test`, `Build`, `Compose`, etc. later is just
publishing `DefineActivity` artifacts — no kernel diff, no redeploy required if
registries are hot.

---

## 5. The meta-level

A `DefineActivity` is itself an AP `Create` activity over an `SXArtifact` of a
specific type:

```sx
(activity 'Create
  :object {:type "DefineActivity"
           :name "Pin"
           :schema (fn (act)
                     (and (string? (-> act :object :path))
                          (cid? (-> act :object :cid))))
           :semantics
           '(fn (act state)
              (assoc-in state [:pins (-> act :object :path)]
                        (-> act :object :cid)))})
```

When the kernel receives an activity with `type: "Pin"` it looks up the registered
semantics from a `DefineActivity{name: "Pin"}` artifact, runs the SX, projects the new
state. The semantics are themselves content-addressed and federated — every receiver
runs the same code.

Same pattern handles `DefineProjection`, `DefineValidator`, etc. The substrate is
genuinely self-extending.
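
The lookup-and-run step can be sketched in Python, with callables standing in for the SX `schema` and `semantics` that a real `DefineActivity` would carry by CID. `ActivityTypeRegistry`, `pin_schema`, and `pin_semantics` are hypothetical names. `pin_semantics` mirrors the `assoc-in` above without mutating its input.

```python
from typing import Callable, Dict, Tuple

Schema = Callable[[dict], bool]
Semantics = Callable[[dict, dict], dict]


class ActivityTypeRegistry:
    """Maps activity type names to (schema, semantics) pairs from DefineActivity."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Schema, Semantics]] = {}

    def define(self, name: str, schema: Schema, semantics: Semantics) -> None:
        self._entries[name] = (schema, semantics)  # a later definition supersedes

    def apply(self, state: dict, act: dict) -> dict:
        entry = self._entries.get(act["type"])
        if entry is None:
            return state  # appended to the log, but no semantics registered yet
        schema, semantics = entry
        if not schema(act):
            raise ValueError(f"{act['type']} activity violates its schema")
        return semantics(act, state)


def pin_schema(act: dict) -> bool:
    obj = act.get("object", {})
    # Assumption: any string stands in for a valid CID (the SX uses `cid?`).
    return isinstance(obj.get("path"), str) and isinstance(obj.get("cid"), str)


def pin_semantics(act: dict, state: dict) -> dict:
    # Mirrors (assoc-in state [:pins path] cid) without mutating the input.
    pins = {**state.get("pins", {}), act["object"]["path"]: act["object"]["cid"]}
    return {**state, "pins": pins}
```

Replaying `Pin` activities through `apply` gives last-pin-wins per path. Activity types with no registered semantics pass through unchanged, matching the "accepted from day one, projected later" stance of §6.2.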

---

## 6. Verbs

### 6.1 Bootstrap verbs (milestone 1)

The substrate exposes `POST /activity` (not `POST /publish`) — generalised entry
point that takes any well-formed AP activity, validates, signs, appends to outbox.
`(publish sx)` is sugar at the SX layer for `Create{SXArtifact}`.

Day-one verbs (cost ~zero once `/activity` exists):

- **`Create`** — the publish primitive.
- **`Update`** — supersede a previous activity (correct metadata, change a path
  mapping). Distinct from "publishing new content" — new content is always a new
  `Create` with a new CID.
- **`Delete`** — tombstone. AP-native; readers honour it.
- **`Announce`** — boost another actor's artifact into your outbox. Comes free.
- **`Subscribe`** — generalised subscription verb (parallel to publish/`Create`).
  Wraps any registered `DefineSubscription` type. `Follow` is the AP-standard form of
  `Subscribe{Follow{actor: ...}}`, kept for wire compatibility. See §18.
- **`Unsubscribe`** — `Undo` of a prior `Subscribe`. Same shape as AP
  `Undo{Follow}`.

### 6.2 Custom verbs (designed-for, defined later)

Substrate accepts these from day one (any signed activity can be appended); semantics
projected once `DefineActivity` artifacts exist.

- **`Pin`** — assign `domain:path/name → CID`. The future name-resolution layer made
  of activities. Each pin is signed; the resolver replays the outbox to compute current
  state.
- **`Endorse`** (modelled on `Like`/`Approve`) — third-party signature on a CID.
  Web-of-trust style code review without central authority.
- **`Supersede`** — "CID A replaces CID B". Stronger than `Update`; readers can chase
  the chain.
- **`Test`** — published assertion that running CID A under conditions X yields result
  Y. Test-as-artifact, federated.
- **`Build`** — links a source CID to a compiled-output CID, with provenance.
- **`Compose`** — derived artifact citing input CIDs. Provenance graph in the outbox
  itself.
- **`Note`** (AP-native) — comments / reviews / discussion attached to a CID.
- **`Follow`** / **`Undo(Follow)`** — subscribe to another instance's outbox.

The pattern that matters: your outbox isn't just "things published," it's an
**append-only log of every assertion this actor makes about the SX universe.**

---

## 7. Capability discovery

Two pieces:

- **`GET /.well-known/sx-capabilities`** — JSON listing every registered activity-type,
  object-type, codec, transport, sig-suite, proof-type. Each with the CID of the
  `Define*` artifact that introduced it. Peers can diff capabilities before federating.
- **`capabilities-required`** field on activities — sender declares "this needs `Pin`
  semantics + `dag-cbor` codec." Receivers without those capabilities return a clean
  422 referencing the missing CIDs; sender knows whether to replay-and-deliver the
  bootstrapping `Define*` artifacts first.

Federation degrades gracefully across instances at different versions.
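
Both pieces can be sketched on the receiver side in Python. The sketch assumes each `capabilities-required` entry is a `{name, cid}` pair naming a `Define*` artifact, and that an accepted activity gets `202`. Neither detail is fixed above, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def capabilities_document(registries: Dict[str, Dict[str, str]]) -> dict:
    """Shape a /.well-known/sx-capabilities body: kind -> [{name, cid}]."""
    return {
        kind: [{"name": name, "cid": cid} for name, cid in sorted(entries.items())]
        for kind, entries in sorted(registries.items())
    }


def check_required(act: dict, doc: dict) -> Tuple[int, List[str]]:
    """Return (status, missing CIDs): 422 if any required Define* CID is unknown."""
    known = {entry["cid"] for entries in doc.values() for entry in entries}
    missing = [req["cid"] for req in act.get("capabilities-required", [])
               if req["cid"] not in known]
    return (422, missing) if missing else (202, [])
```

A sender that receives `(422, [...])` knows exactly which `Define*` artifacts to deliver before retrying.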

---

## 8. Axes of flexibility (all designed-for)

1. **Object types** beyond SXArtifact — `Note`, `Article`, `Image`, `Video`, `Question`,
   `Event`, etc. via the object-type registry.
2. **Storage tier per-object** — `where: inline | cid | url`. Tiny things inline; big
   things to IPFS; legacy stuff URL-linked. Migrating storage backends doesn't migrate
   the substrate.
3. **Multihash + multicodec agility** — sha2-256 + dag-cbor by default; substrate
   accepts blake3, raw, dag-json, dag-pb, etc.
4. **Multi-key actors** — `publicKeys` array always; per-key `purpose`; multiple key
   types (RSA for AP wire compat, Ed25519 modern). See §9.
5. **Audience / visibility** — AP-native `to`, `cc`, `bto`, `bcc`. Public, followers,
   direct, unlisted. Custom audiences via `DefineAudience`.
6. **Outbox-as-database** — no source-of-truth other than the log. Projections are
   recomputable views.
7. **Programmable activities** — activities can carry SX. Reactive federation,
   conditional pins, automated propose/test/release pipelines, all expressed as AP
   activities.
8. **Federation transport pluggable** — outbox is canonical; how peers exchange is
   pluggable (HTTP push, pull, libp2p, polling).
9. **Optional timestamp proofs** — every activity has an attachable `proofs` slot.
   OpenTimestamps, on-chain merkle commit, third-party TSA all slot in without changing
   activity semantics.

Explicitly **not** pursuing for MVP:
- Schema-version negotiation (premature; `@context` handles extension).
- Configurable conflict-resolution per actor (the MVP rule is last-signed-wins, with
  the log preserved for audit).
- Verb-specific kernel handlers (other than `Create`'s "compute CID, store body").

---

## 9. Identity & actor lifecycle

### 9.1 Actor doc shape

```jsonld
{
  "@context": ["https://www.w3.org/ns/activitystreams",
               "https://w3id.org/security/v1",
               "https://next.rose-ash.com/ns/fed-sx/v1"],
  "type": "Person",                  // or Service, Group, Application
  "id": "https://next.rose-ash.com/actors/giles",
  "preferredUsername": "giles",
  "inbox": "https://next.rose-ash.com/actors/giles/inbox",
  "outbox": "https://next.rose-ash.com/actors/giles/outbox",
  "followers": "...",
  "following": "...",

  "publicKeys": [                    // ARRAY from day one, never a lone `publicKey`
    { "id": "...#key-2026-05",
      "type": "RsaVerificationKey2018",
      "owner": "<actor-id>",
      "publicKeyPem": "...",
      "purpose": ["sign-activity", "sign-http"],
      "created": "2026-05-14T...",
      "expires": null,
      "supersedes": null,
      "supersededBy": null },
    { "id": "...#key-ed25519-2026-05",
      "type": "Ed25519VerificationKey2020",
      "owner": "<actor-id>",
      "publicKeyMultibase": "z6Mk...",
      "purpose": ["sign-activity"],
      "created": "2026-05-14T..." }
  ],

  "capabilities": "https://.../actors/giles/capabilities",   // what verbs they speak
  "alsoKnownAs": ["did:web:rose-ash.com:giles", ...],         // bridge to DID, AP migration
  "movedTo": null                                             // set on Move
}
```

Key shape decisions:

- **`publicKeys` array always.** Single-key actors have an array of length 1. For
  back-compat with vanilla AP servers (Mastodon etc. ignore the array), the first array
  element is *also* served as the AP-standard `publicKey` field.
- **Per-key `purpose`** — separates signing weight. Day-to-day publish key vs. high-value
  key for `Pin`/`Endorse` vs. delegated machine key. Validators can require
  specific purposes per activity type (registry-driven).
- **Multiple key types** — RSA for AP wire compat, Ed25519 for everything else
  (smaller, faster, modern). Sig suite registry decides which suites are accepted.
- **`supersedes` / `supersededBy`** — keys form a chain, not a snapshot. Old activities
  still verify against historical keys.

### 9.2 Key rotation

Key rotation is itself an activity, signed by the *old* key (or a recovery key):

```sx
(activity 'Update
  :object actor-id
  :patch {:add-publicKey new-key
          :supersede {old-key-id new-key-id}})
```

Kernel:
1. Fetches actor's current state (a projection over their own outbox).
2. Verifies activity is signed by a key with `purpose: rotate-key` (or any active key,
   if registry allows).
3. Appends. The actor-state projection now has the new key.

Old activities still verify because the projection retains the historical key with
`supersededBy` set — sig verification looks up "what keys were active at activity
timestamp T."
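
Time-aware verification reduces to "which keys were active at `published`?". The Python sketch below assumes the actor-state projection records when a key was superseded in a hypothetical `supersededAt` timestamp. The actor doc's `supersededBy` holds a key id, so that time would come from the rotation activity's `published`. Function names are illustrative.

```python
from datetime import datetime
from typing import List, Optional


def _ts(value: str) -> datetime:
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def key_active_at(key: dict, t: str) -> bool:
    """True if the key was valid at time t (created <= t < expiry/supersession)."""
    when = _ts(t)
    if _ts(key["created"]) > when:
        return False
    for end in ("expires", "supersededAt"):  # supersededAt: hypothetical field
        if key.get(end) and _ts(key[end]) <= when:
            return False
    return True


def verification_key(keys: List[dict], key_id: str, published: str,
                     purpose: str = "sign-activity") -> Optional[dict]:
    """Find the key an activity claims, as it stood when the activity was published."""
    for key in keys:
        if (key["id"] == key_id and purpose in key.get("purpose", [])
                and key_active_at(key, published)):
            return key
    return None
```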

### 9.3 Key recovery / loss

- **Recovery key** — separate key at actor creation, never used except to rotate.
  Stored offline. `purpose: ["recover"]`. Validator allows
  `Update{actor, patch: rotate-all-keys}` if signed by a recovery key.
- **Social recovery** — designate N trusted actors, M-of-N can co-sign a recovery
  `Update`. Implemented as a `DefineValidator` extension; multi-sig slot in `proofs`
  makes it possible without changing the envelope.
- **Total loss** — if both signing and recovery keys are gone, the actor is dead.
  They publish a new actor with `alsoKnownAs: <old-actor-id>` from a fresh key.
  Followers can choose to re-follow but there's no cryptographic continuity.

### 9.4 Migration (`Move`)

AP-native:

```sx
(activity 'Move
  :object old-actor-id
  :target new-actor-id)
```

Receivers update their follow lists. New actor's `alsoKnownAs` must include old
actor — bidirectional handshake prevents hijacking.

For fed-sx, `Move` should also carry an outbox migration hint (CID of an export bundle)
so receivers can re-anchor projections without re-fetching activity-by-activity.

### 9.5 Subordinate actors / delegation

Two patterns supported:

- **Service actors** (AP-native `type: Service`): bots, build servers, test runners.
  Their own keys, their own outboxes, but `attributedTo` a parent actor.
- **Capability tokens**: parent publishes `Authorize{actor: child, capabilities: [...],
  expires: ...}` signed by parent. Child publishes activities normally with their own
  key; receivers verify the capability chain when child invokes an authority they don't
  own outright. Useful for: temporary publish access, delegated `Pin` rights for a
  specific path prefix, multi-device.

Both work *without* new kernel mechanism — just activities.
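
Checking a capability token can be as small as the following Python sketch. It covers single-level delegation only. It assumes the `Authorize` grants have already been signature-verified against the parent's keys, and it uses a hypothetical capability shape `{verb, path-prefix}`. The actor URLs used to exercise it are example placeholders.

```python
from datetime import datetime
from typing import List


def _ts(value: str) -> datetime:
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def authorized(grants: List[dict], child: str, verb: str, path: str,
               published: str) -> bool:
    """Does any (already signature-verified) Authorize grant cover this action?"""
    for grant in grants:
        if grant["actor"] != child:
            continue
        if grant.get("expires") and _ts(grant["expires"]) <= _ts(published):
            continue  # grant had lapsed when the child's activity was published
        for cap in grant["capabilities"]:
            # Hypothetical capability shape: {"verb": ..., "path-prefix": ...}
            if cap["verb"] == verb and path.startswith(cap.get("path-prefix", "")):
                return True
    return False
```

Expiry is evaluated against the child activity's `published`, in keeping with the timestamp-aware verification of §9.6.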

### 9.6 Implications

- **Sig verification is timestamp-aware.** Verifying an old activity needs the key
  state at the time it was published — actor-state projection must support time-travel
  queries.
- **Inbox doesn't trust `keyId` blindly.** Fetches actor doc, projects current key
  state, checks key was valid at `published`.
- **Cross-instance identity via `alsoKnownAs` and DIDs.** Don't depend on DIDs but
  slot them in for Bluesky-bridge, Solid-bridge, etc.

---

## 10. Projection model

The architectural commitment: **state is what you get when you fold pure SX over the
log.** No DB-of-record. Everything queryable is a projection.

### 10.1 What a projection is

A `DefineProjection` activity registers the following:

```sx
(activity 'Create
  :object {:type "DefineProjection"
           :name "actor-state"
           :initial-state {}                 ; pure SX value
           :fold (fn (state activity)        ; pure SX
                   (case (:type activity)
                     "Create" (if (= "Person" (-> activity :object :type))
                                (assoc state (-> activity :object :id) (:object activity))
                                state)       ; other Creates leave state untouched
                     "Update" (apply-patch state activity)
                     "Move"   (set-moved state activity)
                     state))
           :snapshot-codec "dag-cbor"
           :indexes [{:by :id} {:by :preferredUsername}]})
```

- **`name`** — query handle. Unique per actor; collisions resolved by CID + supersession.
- **`initial-state`** — pure SX value used as state-zero.
- **`fold`** — pure SX function `(state activity) → state`. The only thing the kernel
  calls.
- **`snapshot-codec`** — codec used to encode snapshots of the state (see §10.4).
- **`indexes`** — optional hint for materializing lookup paths.

The CID of the `DefineProjection` artifact is the projection's identity. Two instances
running the same projection are running the same CID's `fold` over the same log slice
— equivalence is decidable.

### 10.2 The fold contract — purity, determinism, gas

The fold function must be **pure and deterministic**. Non-negotiable; it's what makes
cross-instance equivalence and replay possible.

- **No IO.** No HTTP, no file access, no DB calls, no clock. The activity carries its
  own `published` timestamp.
- **No randomness.** No host-seeded PRNG. (If pseudo-randomness is needed, seed from
  the activity's CID — deterministic across hosts.)
- **No mutation outside the returned state.**
- **Bounded execution.** Each fold call gets a gas budget (default tunable, e.g. 100k
  CEK steps). Exceeding it is a hard failure.

Enforced at the SX evaluator level by running folds in a sandboxed environment with
the IO platform stripped to nothing. Same sandbox model applies to validators and
trigger semantics.

**Cross-host equivalence guarantee:** for the same projection CID + same activity log
slice, every conforming SX host (JS, OCaml, Python, Haskell-on-SX, …) must produce a
state value with the same canonical CID. Tested via the spec test suite.
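
The fold contract, the gas budget, and the failure rule from §10.10 fit together as below. Python cannot count CEK steps, so in this sketch the fold charges a hypothetical `Gas` meter explicitly. Activity ids stand in for activity CIDs, and `by_type` follows the bootstrap `by-type` projection. The fold returns fresh dicts and tuples instead of mutating its input.

```python
from typing import Callable, List, Tuple


class GasExhausted(Exception):
    pass


class Gas:
    """Step budget for one fold call (stands in for counting CEK steps)."""

    def __init__(self, budget: int) -> None:
        self.left = budget

    def charge(self, steps: int = 1) -> None:
        self.left -= steps
        if self.left < 0:
            raise GasExhausted()


def by_type(state: dict, act: dict, gas: Gas) -> dict:
    """The by-type bootstrap projection: type -> ordered tuple of activity ids."""
    gas.charge()
    key = act["type"]
    return {**state, key: state.get(key, ()) + (act["id"],)}


def replay(fold: Callable, initial: dict, log: List[dict],
           budget: int = 100_000) -> Tuple[dict, List[str]]:
    """Fold over the log; a failing activity is skipped and state left unchanged."""
    state, failed = initial, []
    for act in log:
        try:
            state = fold(state, act, Gas(budget))  # fresh budget per fold call
        except Exception:  # gas exhaustion or SX runtime error (see §10.10)
            failed.append(act["id"])
    return state, failed
```

A runaway fold exhausts its budget on every activity and leaves the state exactly as it found it, while the log itself is untouched.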

### 10.3 Bootstrap projections

The kernel cannot start without some projections, because the kernel itself uses them.
They are baked into the genesis bundle (see §11) and superseded only by deliberate
kernel-version upgrades.

| Projection | What it computes | Used by |
|------------|------------------|---------|
| `activity-log` | Identity — every activity, indexed by id and CID | Everything |
| `by-type` | `type → ordered list of activity-CIDs` | Most queries |
| `by-actor` | `actor-id → ordered list of activity-CIDs` | Per-actor outbox view |
| `by-object` | `object-CID → list of referencing activity-CIDs` | "Who pinned this?" |
| `actor-state` | `actor-id → current actor doc with key history` | Sig verification (kernel) |
| `define-registry` | `kind+name → currently-active Define* CID` | All other Define* lookups |
| `audience-graph` | `actor → followers/following` | Federation push |

`define-registry` is the bootstrap chicken-and-egg: it's the projection that knows
which projections (and validators, codecs, etc.) are currently active. Kernel ships
with it hardcoded; once running, every other projection (including a future replacement
of `define-registry` itself) is a regular `DefineProjection` superseding it.

### 10.4 Snapshotting

Replaying the entire log on every restart is unacceptable past day one.

- **Snapshot = `(activity-tip-CID, projection-state, projection-CID)` tuple,**
  dag-cbor encoded, content-addressed.
- **Snapshot rule** — every K activities (default 1000) or every T seconds (default
  60), whichever comes first: serialize, hash, store on disk.
- **Resume** — on startup, for each projection-CID, load the latest snapshot (the one
  with the most recent log-tip) and fold forward from there.
- **Snapshot CID is verifiable** — anyone with the same log slice and projection-CID
  can recompute and check the CID matches. This is the cross-instance agreement proof.

Snapshots are themselves publishable as activities (`Create{Snapshot}`): an instance
can publish "here's my computed state for projection X at log-tip Y, CID Z." Other
instances can fetch and use as a starting point. **Federated state sharing falls out of
federated activities.**

Snapshots are pruning-friendly: keep latest + snapshots referenced by published
`Create{Snapshot}` activities; everything else is GC-able.
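
Here is a Python sketch of the snapshot tuple, the every-K rule, and resume. Canonical JSON hashed with SHA-256 stands in for dag-cbor and real CIDs. The every-T-seconds rule is left out because it needs a clock. All names are hypothetical.

```python
import hashlib
import json
from typing import Callable, List, Tuple


def content_id(value) -> str:
    """Stand-in for a dag-cbor CID: SHA-256 over canonical (sorted-key) JSON."""
    blob = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256-" + hashlib.sha256(blob).hexdigest()


def count_by_type(state: dict, act: dict) -> dict:
    return {**state, act["type"]: state.get(act["type"], 0) + 1}


def fold_with_snapshots(fold: Callable, initial: dict, log: List[dict],
                        projection_cid: str, every: int = 1000
                        ) -> Tuple[dict, List[Tuple[str, dict]]]:
    """Fold the log, taking a (tip, state, projection) snapshot every K activities."""
    state, snapshots = initial, []
    for i, act in enumerate(log, start=1):
        state = fold(state, act)
        if i % every == 0:
            snap = {"tip": content_id(act), "state": state,
                    "projection": projection_cid}
            snapshots.append((content_id(snap), snap))
    return state, snapshots


def resume(fold: Callable, snap: dict, log: List[dict]) -> dict:
    """Load a snapshot and fold forward from the activity after its tip."""
    tip = next(i for i, act in enumerate(log) if content_id(act) == snap["tip"])
    state = snap["state"]
    for act in log[tip + 1:]:
        state = fold(state, act)
    return state
```

Re-running the fold over the same log yields identical snapshot ids. That is the "snapshot CID is verifiable" check above in miniature.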

### 10.5 Reprojection on definition change

When `DefineProjection{name: "actor-state"}` is superseded by a new CID with a
different fold:

1. `define-registry` projection sees the supersession; its state advances.
2. New projection materialized **alongside** the old one — both kept live during
   migration.
3. New projection runs in catch-up mode: replay from genesis (or from deepest
   compatible snapshot).
4. When new projection catches up to log tip, queries cut over. Old projection state
   can be retired.
5. Snapshots of old version stay around as long as referenced (e.g. for time-travel
   queries against historical state under old semantics).

Changing a projection definition is **safe and online**. Cost: temporary state
duplication during catch-up. Slow folds → slow migrations, but never breakage.

For projections too expensive to fully reproject, `Update{DefineProjection}` can
declare `migrationHint: <fn from old-state to new-state>` — opt-in, used at migrator's
risk.

### 10.6 Time-travel queries

Folds are deterministic functions of `(initial-state, activity-list-prefix)`.
Time-travel is fold-up-to:

- `state-as-of(projection, activity-id-or-timestamp)` → walk to requested point,
  return state.
- Snapshots act as accelerators (resume from nearest snapshot ≤ target).
- Used by sig verification ("what keys did this actor have when this activity was
  signed?"), audit, "what did we believe last Tuesday."
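
Time-travel with snapshot acceleration can be sketched in Python. The sketch assumes the log is ordered by `published`. It also assumes timestamps are UTC ISO-8601 strings in a single format, so string order equals time order. Snapshots are simplified to hypothetical `(position, state)` pairs.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple


def state_as_of(fold: Callable, initial, log: List[dict],
                snapshots: List[Tuple[int, object]], t: str):
    """State after folding every activity published at or before t.

    Each snapshot is (position, state): the state after the first
    `position` activities of the log.
    """
    end = bisect_right([act["published"] for act in log], t)
    start, state = 0, initial
    for position, snap_state in snapshots:  # keep the nearest snapshot <= target
        if start <= position <= end:
            start, state = position, snap_state
    for act in log[start:end]:
        state = fold(state, act)
    return state
```

With or without snapshots the answer is the same. Snapshots only shorten the walk, which is why they are safe to prune.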

### 10.7 Projection composition

**Projections do not directly read each other's state during folding.** Preserves
locality and parallelism — every projection runs independently against the same log.

Composition via:

- **Query time** — `(query (projection actor-state) ...)` joins are SX expressions
  over multiple projection states.
- **Republishing as activities** — a projection that exposes its state as input to
  others publishes `Create{Snapshot}` periodically. Downstream projections fold over
  those.

Direct cross-projection reads during fold introduce ordering, cycle, and
cache-invalidation problems we don't need.

### 10.8 Querying

Three layers:

- **Raw projection state** — `GET /projections/<name>?at=<timestamp>` returns dag-cbor
  (also JSON for tooling). Large states paginated by index.
- **SX queries** — `POST /query` with an SX expression that runs against one or more
  projection states in pure mode. Fills the role Datalog or GraphQL would fill
  elsewhere.
- **Materialized indexes** — declared on projection (`indexes:` field). Kernel
  maintains as side-tables for `O(log n)` lookup.

Real-time: clients `GET /projections/<name>/subscribe` (SSE), receive deltas as
activities land. Delta is `(old-state, new-state, applied-activity-CID)`; clients can
verify by re-folding.

### 10.9 Lag, async, concurrency

- **Append is sync; projection is async.** `POST /activity` returns once activity is
  durably in the log. Projections run in a separate worker pool; query results carry
  `projected-up-to` so callers know whether the latest write is visible.
- **One worker per projection.** Folds are sequential, but projections run in parallel
  with each other.
- **Sync option** — `POST /activity?wait-for=projection-name` blocks until the named
  projection has folded the new activity. Use sparingly.

### 10.10 Failure modes

| Failure | Response |
|---------|----------|
| **Gas exhaustion** | Activity tagged `projection-failed` for this projection. State unchanged. Operator alert. |
| **SX runtime error** (assertion, type mismatch) | Same as gas: activity skipped, error logged, state unchanged. |
| **Schema violation** | Caught earlier in validation pipeline, never reaches projection. |

The log itself is always written successfully if it passes envelope + signature +
validator checks. Projection failures don't gate appending — that would couple writes
to arbitrary user-defined code.

### 10.11 Operational implications

- **Projection determinism is the linchpin.** If JS and OCaml ever produce different
  state for the same log + projection, federation cracks. Spec test suite must cover
  projection equivalence across hosts as a first-class requirement.
- **Snapshots are eventual consensus.** Two instances publish `Create{Snapshot}` for
  the same log+projection; if their CIDs match, they agree without coordination.
- **Kernel reads its own projections.** `actor-state` for sig verification;
  `define-registry` for every Define* lookup. Startup sequence must bootstrap these
  before serving traffic.
- **Reprojection cost is real.** Heavy projection changes mean replaying from genesis.
  Encourage incremental schemas (small per-activity work, idempotent updates) and
  provide profiling.

---

## 11. Sandbox & determinism

The runtime contract that makes folds (and validators, triggers, semantics) safe to
execute, and that guarantees every conforming SX host computes the same state from
the same log.

### 11.1 Three sandbox levels

Different registry entries need different power. We define three nested execution
modes; the registry entry declares which mode it requires.

| Mode | Used by | IO | Clock | Random | Determinism |
|------|---------|----|-------|--------|-------------|
| **pure** | folds, validators, audience predicates, semantics, trigger `when-sx` | none | activity's own `published` only | seeded from activity CID only | required across hosts |
| **crypto** | sig suite verify, codec encode/decode | crypto primitives only | none | sign-only secure RNG | required across hosts (verify); single-host (sign) |
| **effectful** | storage backends, transports, trigger `then-sx`, some proof verifiers | per-capability grant only | host clock | host RNG | not required; single-host |

Default mode is **pure**. The other two are opt-in at registration time, and the
registration is itself a signed activity — anyone can audit which extensions claim
which powers.

### 11.2 Pure sandbox (the load-bearing one)

This is the mode every projection fold runs in. It must produce identical results on
every conforming SX host, every time.

**Allowed:**
- All spec primitives in `spec/primitives.sx` that don't perform IO (arithmetic,
  comparison, predicates, string ops, collection ops, dict ops, format helpers).
- The activity being processed (full envelope), as the function's argument.
- The current state value, as the function's argument.
- A small set of fed-sx-specific deterministic primitives:
  - `(activity-cid act)` → CID of the activity envelope
  - `(activity-time act)` → ISO timestamp from `published`
  - `(actor-state-as-of state-snapshot actor-id activity-time)` → if the projection
    has been declared dependent on `actor-state` (see §10.7), reads from a snapshot
    of that projection at the activity's timestamp
  - `(seeded-rng cid)` → deterministic PRNG seeded from a CID, returns a stream of
    uniform values

**Forbidden:**
- All IO: HTTP, file, network, stdin/stdout, environment.
- Wall-clock access. The host's `now` is not in scope; the only time available is
  `(activity-time act)`.
- Host-seeded randomness. Only `seeded-rng` (CID-derived) is available.
- Mutation outside the returned value. Enforced by the SX evaluator's lack of
  ambient mutable bindings; folds may use local `let` and mutation within their own
  closure but cannot reach outside.
- Calling other registry entries by name. Composition happens at query time, not
  fold time (see §10.7).

**Enforced by:** evaluator runs the fold with the IO platform stripped to nothing.
The fed-sx kernel constructs a `pure-platform` (no fetch, no query, no action, no
DOM, no storage) and uses it as the sole evaluator platform when calling the fold.
Any IO primitive call raises a hard error caught as a fold failure.
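
One way to model "the IO platform stripped to nothing" is a primitive table in which IO entry points are simply absent, so any call to them fails. The Python sketch below follows that idea. The primitive names mirror `activity-time` and `seeded-rng` from the allowed list. The dispatcher and the SHA-256 counter construction for the RNG are assumptions, not a specified algorithm.

```python
import hashlib
from typing import Callable, Dict, Iterator


class FoldFailure(Exception):
    """Raised when a fold reaches for something the pure platform lacks."""


def seeded_rng(cid: str) -> Iterator[float]:
    """Deterministic stream of floats in [0, 1) derived only from a CID.

    Hypothetical construction: SHA-256 over "<cid>:<counter>", top 53 bits.
    """
    counter = 0
    while True:
        block = hashlib.sha256(f"{cid}:{counter}".encode("utf-8")).digest()
        yield (int.from_bytes(block[:8], "big") >> 11) / 2**53
        counter += 1


def pure_platform() -> Callable:
    """A primitive dispatcher with every IO entry point simply absent."""
    table: Dict[str, Callable] = {
        "activity-time": lambda act: act["published"],
        "seeded-rng": seeded_rng,
    }

    def call(name: str, *args):
        if name not in table:
            raise FoldFailure(f"{name!r} is not available in pure mode")
        return table[name](*args)

    return call
```

An allowlist has a useful property here. A new IO primitive added to the host later is unreachable from folds by default, rather than needing to be remembered in a denylist.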
|
||
|
||
### 11.3 Crypto sandbox

Sig suites and codec encode/decode need hash, crypto, and encoding primitives but
nothing else. They must still be deterministic across hosts (the verify case), but
they get a platform wider than pure and narrower than effectful.

**Additional primitives over pure:**
- `(sha2-256 bytes)`, `(sha3-256 bytes)`, `(blake3 bytes)`, …
- `(rsa-verify pubkey msg sig)`, `(ed25519-verify pubkey msg sig)`, …
- `(rsa-sign privkey msg)`, `(ed25519-sign privkey msg)`: signing primitives. They
  need private-key material (and, for randomized schemes, a secure RNG handle),
  neither of which exists in pure mode, so only the kernel/runtime calls them (see below)
- `(cbor-encode value)`, `(cbor-decode bytes)` — for codecs implementing CBOR variants
- `(base32-encode bytes)`, `(base58btc-encode bytes)`, `(multibase-encode tag bytes)`
- `(multihash-encode tag digest-bytes)`, `(multihash-decode bytes)`
- `(cid-encode codec mhash)`, `(cid-decode bytes)`

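The multiformats primitives in this list compose in a fixed way. Below is a minimal Python sketch of computing a CIDv1 string. It assumes sha2-256 as the hash and base32-lower as the multibase, which is the usual string form for CIDv1. The multihash is `varint(hash code) || varint(digest length) || digest`. The binary CID is `varint(1) || varint(codec code) || multihash`. The codec table holds only the codes for the genesis bundle's three codecs: raw `0x55`, dag-cbor `0x71`, and dag-json `0x0129`. The input bytes are assumed to be already encoded by that codec.

```python
import base64
import hashlib

SHA2_256 = 0x12                                        # multihash code for sha2-256
CODECS = {"raw": 0x55, "dag-cbor": 0x71, "dag-json": 0x0129}  # multicodec codes

def uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used throughout multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def multihash_sha256(data: bytes) -> bytes:
    digest = hashlib.sha256(data).digest()
    return uvarint(SHA2_256) + uvarint(len(digest)) + digest

def cid_v1(codec: str, data: bytes) -> str:
    """CIDv1 over `data`, rendered as multibase base32-lower (prefix 'b', no padding)."""
    raw = uvarint(1) + uvarint(CODECS[codec]) + multihash_sha256(data)
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
```

For example, `cid_v1("raw", b"")` yields `bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku`, the well-known CID of the empty raw block. Every dag-cbor CID produced this way starts with `bafyrei`.
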
**Sign vs verify:** verify is pure: it is deterministic and needs only public data.
Sign is different. It needs the actor's private key, and some schemes (e.g. RSA-PSS)
also consume randomness, although others (Ed25519 signing is deterministic by design)
do not. fed-sx draws a clean line: signing happens *outside* registry-entry SX
(it's an operation the kernel/runtime performs on behalf of the actor with their
private key); registry SX only ever *verifies*. This keeps the pure↔crypto distinction
tractable.

### 11.4 Effectful sandbox

Storage backends, transports, trigger `then-sx`, and proof verifiers that need the
network (e.g. blockchain RPC for on-chain proof verification) all need real IO.
These are not used to compute projected state; they're how the substrate interacts
with the outside world.

**Capability-granted primitives.** The registration activity declares the
capabilities the entry needs:

```sx
(activity 'Create
  :object {:type "DefineStorage"
           :where-tag "ipfs"
           :capabilities [{:type "http-client" :allowlist ["http://localhost:5001/*"]}
                          {:type "fs-read" :path-prefix "/var/cache/fed-sx/ipfs/"}
                          {:type "fs-write" :path-prefix "/var/cache/fed-sx/ipfs/"}]
           :put-sx (fn (cid bytes) ...)
           :get-sx (fn (cid) ...)})
```

**Capability types** (initial set; extensible):

- `http-client` with `allowlist` (URL prefix patterns)
- `http-server` with `path-prefix` (mounts a sub-handler)
- `fs-read` / `fs-write` with `path-prefix` (chroot-style)
- `subprocess` with `command-allowlist`
- `clock-read` (wall clock; granted if the registry entry needs to timestamp something)
- `random-bytes` (host CSPRNG)

**No ambient authority.** The default capability set is empty; every capability is
explicit, declared, signed, and auditable. A peer can refuse to load a registry
entry whose capability claim is unacceptable to them.

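Here is a minimal Python sketch of how a host might enforce declared capabilities, with deny as the default. The matching rules are assumptions:

- An `allowlist` entry ending in `*` is read as a URL prefix; any other entry must match exactly.
- A `path-prefix` is checked after lexical normalisation, so `..` segments cannot escape it.

`is_permitted` and its helpers are hypothetical names. The dictionaries mirror the `DefineStorage` example.

```python
import posixpath
from typing import Iterable, Mapping

def _url_allowed(url: str, allowlist: Iterable[str]) -> bool:
    # Assumed pattern syntax: trailing "*" means "any suffix", otherwise exact match.
    for pattern in allowlist:
        if pattern.endswith("*"):
            if url.startswith(pattern[:-1]):
                return True
        elif url == pattern:
            return True
    return False

def _path_allowed(path: str, prefix: str) -> bool:
    # Normalise first so "/var/cache/fed-sx/ipfs/../../../etc" cannot escape the prefix.
    norm = posixpath.normpath(path)
    root = prefix.rstrip("/")
    return norm == root or norm.startswith(root + "/")

def is_permitted(capabilities: Iterable[Mapping], kind: str, target: str) -> bool:
    """No ambient authority: allowed only if some declared capability covers it."""
    for cap in capabilities:
        if cap["type"] != kind:
            continue
        if kind == "http-client" and _url_allowed(target, cap["allowlist"]):
            return True
        if kind in ("fs-read", "fs-write") and _path_allowed(target, cap["path-prefix"]):
            return True
    return False  # empty or non-matching capability set: deny
```

Lexical normalisation does not resolve symlinks. A real host would also need to stop a symlink inside the prefix from pointing elsewhere, for example by resolving real paths before the check.
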
**Capabilities are content-addressed.** Each capability descriptor has a CID. The
substrate maintains a registry of "capability CIDs that this instance trusts to
honour" — operator policy, not protocol.

### 11.5 Gas and resource accounting

Each sandbox call gets a budget:

- **CEK gas** — every evaluator step costs 1 unit; primitive calls cost a
  per-primitive amount declared in `spec/primitives.sx`. Default budget: 100k units
  per fold call. Tunable per projection via `DefineProjection.gas-limit`.
- **Memory ceiling** — peak heap size for the fold call. Default 64 MB. Tunable.
- **IO budget** (effectful only) — bytes read/written and network calls per
  invocation, granted separately per capability.
- **Wall-clock budget** (effectful only) — maximum real time before forced termination.

Exceeding any budget is a hard failure: the call returns an error value, the fold's
state is unchanged, and the activity is tagged as failed for that projection.

Gas accounting is part of the spec — every conforming host must charge the same
units for the same operations, so "this fold runs out of gas" is a deterministic
property of the (projection, activity) pair, not a host-specific outcome.

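Below is a sketch of a deterministic gas meter in Python. It assumes the evaluator calls `step()` once per CEK transition and `primitive(name)` before each primitive call. The charge depends only on what the code does, not on elapsed time. So two conforming hosts running the same fold on the same activity exhaust the budget at the same point. `GasMeter`, `ResourceExhausted`, and the cost table are hypothetical; the normative costs would come from `spec/primitives.sx`.

```python
from typing import Dict, Optional

class ResourceExhausted(Exception):
    """Raised when a sandboxed call exceeds its gas budget."""

class GasMeter:
    STEP_COST = 1  # every evaluator step costs 1 unit

    def __init__(self, limit: int = 100_000,
                 primitive_costs: Optional[Dict[str, int]] = None):
        self.limit = limit  # default: 100k units per fold call
        self.used = 0
        # Hypothetical cost table. An unknown primitive raises KeyError here
        # rather than silently costing nothing.
        self.primitive_costs = dict(primitive_costs or {})

    def charge(self, units: int) -> None:
        self.used += units
        if self.used > self.limit:
            raise ResourceExhausted(f"gas limit {self.limit} exceeded ({self.used} used)")

    def step(self) -> None:
        self.charge(self.STEP_COST)

    def primitive(self, name: str) -> None:
        self.charge(self.primitive_costs[name])
```
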
### 11.6 Determinism gotchas

The pure sandbox is only as deterministic as its primitives. Worth nailing down:

- **Floating point.** The IEEE 754 basic operations are correctly rounded, so they
  give bitwise-identical results on conforming hosts (same precision, same rounding
  mode). Transcendentals (`sin`, `cos`, `log`, `exp`) are *not* covered by that
  guarantee, and libm implementations differ. **Decision: floats are forbidden in pure
  mode unless the projection declares `requires-deterministic-floats: true` and uses
  only the IEEE 754 basic operations (+, -, *, /, sqrt, comparison, conversion).** For
  exact arithmetic, use integers or rationals (fed-sx will provide a rational primitive).
- **Map / dict iteration order.** Always sorted by key in pure mode. The SX spec
  mandates this for `for-each` and `map` over dicts; we tighten it: pure mode
  forbids relying on insertion order.
- **String encoding.** All strings are UTF-8 NFC at ingestion; pure-mode operations
  use byte-level comparison after normalization. Codepoint operations (`length`,
  `substring`) return identical results across hosts because they operate on the
  normalized form.
- **Integer overflow.** Pure mode uses arbitrary-precision integers (the SX spec
  default). No undefined behaviour. Overflow is impossible.
- **Equality.** Structural equality (`equal?`) compared across hosts must yield the
  same result for the same canonical-CID values. This implies dict equality is
  order-independent (as it should be), and float equality follows IEEE 754
  (NaN ≠ NaN; +0.0 = -0.0).
- **Error values.** When a primitive errors, the error must be representable as a
  dag-cbor value with a stable CID across hosts. Reserve a `{:error :type ... :msg ...}`
  shape; the standard error types are defined in the spec.

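In Python terms, the string and dict rules come down to two helpers:

- Normalise every string to NFC when it enters the system.
- Iterate dicts in sorted-key order.

Sorting keys by their UTF-8 bytes gives the same order as sorting by code point, because UTF-8 preserves code-point order. The helper names are hypothetical, and the sketch assumes string keys.

```python
import unicodedata
from typing import Any, Dict, Iterator, Tuple

def ingest_string(s: str) -> str:
    """Normalise to NFC once, at ingestion, so later comparisons see one form."""
    return unicodedata.normalize("NFC", s)

def canonical_items(d: Dict[str, Any]) -> Iterator[Tuple[str, Any]]:
    """Iterate a dict in sorted-key order, never insertion order."""
    # Sort by the UTF-8 bytes of the NFC form: byte-level comparison after
    # normalisation, which for UTF-8 is also code-point order.
    for key in sorted(d, key=lambda k: ingest_string(k).encode("utf-8")):
        yield key, d[key]
```

For example, `"e\u0301"` (e plus a combining acute accent) and `"\u00e9"` differ as raw strings, but both ingest to the same one-code-point string. Two dicts with the same entries in different insertion orders then iterate identically.
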
### 11.7 Failure model

A pure-mode call ends in one of three terminal states:

1. **Success** — returns a value. The fold uses it as the new state.
2. **Sandbox violation** — IO attempted, capability denied, etc. Returns a stable
   error value; the fold's state is unchanged; the activity is tagged
   `{:projection-failed :reason :sandbox-violation :detail ...}`.
3. **Resource exhaustion** — gas, memory, IO budget exceeded. Same handling as
   sandbox violation but with `:reason :resource-exhausted`.

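The three terminal states map naturally onto a small wrapper around each fold step. This Python sketch returns the new state on success, and the unchanged state plus a failure tag otherwise. The names, and the dict form of the tag, are hypothetical renderings of the SX shapes in the list. Any other exception is deliberately not caught, since it signals a host bug rather than a property of the (projection, activity) pair.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

class SandboxViolation(Exception):
    """IO attempted, capability denied, etc."""

class ResourceExhausted(Exception):
    """Gas, memory, or IO budget exceeded."""

@dataclass(frozen=True)
class StepResult:
    state: Any               # new state on success; the previous state on failure
    failure: Optional[dict]  # None on success; otherwise the tag for this activity

def apply_fold(fold: Callable[[Any, dict], Any], state: Any, activity: dict) -> StepResult:
    try:
        return StepResult(fold(state, activity), None)
    except SandboxViolation as err:
        return StepResult(state, {"projection-failed": True,
                                  "reason": "sandbox-violation", "detail": str(err)})
    except ResourceExhausted as err:
        return StepResult(state, {"projection-failed": True,
                                  "reason": "resource-exhausted", "detail": str(err)})
```
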
Crypto-mode failures (e.g. invalid signature) are *return values*, not exceptions —
verify returns a boolean, sign returns either a sig or an error. This forces callers
to handle failure explicitly.

Effectful-mode failures (network down, disk full) propagate to the operator as
errors but never affect projected state. The substrate retries effectful operations
according to the registry entry's policy (declared at registration).

### 11.8 Conformance testing

Cross-host equivalence isn't aspirational; it's tested.

- **Spec test suite** ships projection equivalence tests: a corpus of (log slice,
  projection CID, expected snapshot CID) tuples. Every conforming SX host must
  produce the expected snapshot CID for each input.
- **Validator equivalence tests** likewise: (validator CID, activity, expected
  result).
- **Codec equivalence tests:** (codec CID, value, expected encoded bytes), in both
  encode and decode directions.
- **Sandbox isolation tests:** "this fold attempts to call `fetch`; expected
  outcome: sandbox violation error with stable CID."

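Below is a Python sketch of a conformance runner over (log slice, projection, expected snapshot) cases. Two substitutions keep it self-contained, and both are assumptions rather than spec:

- The projection is passed as a Python fold plus initial state, instead of being resolved by CID.
- The snapshot identifier is a SHA-256 over canonical JSON (sorted keys, compact separators), standing in for a dag-cbor CID.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, List

def snapshot_id(state: Any) -> str:
    # Stand-in for a dag-cbor CID: SHA-256 of canonical JSON.
    canon = json.dumps(state, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()

@dataclass
class Case:
    log_slice: List[dict]
    initial_state: Any
    fold: Callable[[Any, dict], Any]
    expected_snapshot: str

def run_suite(cases: List[Case]) -> List[int]:
    """Indices of failing cases; an empty list means this host passed the corpus."""
    failures = []
    for i, case in enumerate(cases):
        state = case.initial_state
        for activity in case.log_slice:
            state = case.fold(state, activity)
        if snapshot_id(state) != case.expected_snapshot:
            failures.append(i)
    return failures
```

A failing index is what a host would report in a `Test{result: failed, ...}` activity.
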
Hosts run the conformance suite to claim "fed-sx pure-mode conformance." Failures
are publishable as `Test{result: failed, host: ..., projection: ...}` activities —
the conformance graph itself is federated.

### 11.9 Operational implications

- **The pure sandbox is the heart of cross-host federation.** Every divergence is a
  spec bug or a host bug; both are caught by snapshot CID mismatches and surfaced
  via `Test` activities.
- **Capability descriptors are the new audit trail.** "What can the IPFS storage
  backend do?" is a question with a precise answer at any timestamp — the registered
  capability CIDs.
- **Floats are mostly absent.** This is unusual but defensible — most state in the
  substrate is ids, counts, sets, references. Numerical computation belongs in
  effectful registry entries (e.g. an analytics projection that publishes summaries
  as activities, projected by a downstream pure projection that just stores them).
- **Gas is part of the protocol.** Two hosts disagreeing about whether a fold runs
  out of gas is a conformance failure. Spec primitive gas costs are normative.

## 12. Bootstrap & genesis

How a fresh instance starts with no log, where the initial registry entries come
from, and how the kernel evolves without bricking peers.

### 12.1 The genesis problem

The substrate is "everything is a `Define*` activity in the log." But on a fresh
instance the log is empty — so there are no `Define*` activities to tell the kernel
what `Create` means, how to verify a signature, or what dag-cbor is. Strict
turtles-all-the-way-down would deadlock startup.

Solution: **the kernel ships with a baked-in genesis bundle** containing the minimal
set of definitions it needs to interpret its own log. The bundle is a constant of
the kernel binary; its CID is hardcoded; the kernel verifies on startup that the
bundle matches its hardcoded CID. After that, everything (including superseding the
bundled definitions themselves) goes through the activity log.

The genesis bundle is *not* itself a federated artifact in the AP sense. It's the
dictionary you need before you can read any activities. Optionally, an actor can
`Create{GenesisRecord}` as their first published activity to advertise which genesis
they started from — informational, not load-bearing.

### 12.2 Genesis bundle contents

Minimal viable bundle (dag-cbor object, content-addressed):

```
{
  "type": "fed-sx-genesis",
  "kernel-version": "1.0.0",
  "envelope-spec": { ... },   // canonical schema for activity envelope
  "object-spec": { ... },     // canonical schema for object envelope
  "definitions": {
    "activity-types": {
      "Create":   { "schema": <sx>, "semantics": <sx> },
      "Update":   { "schema": <sx>, "semantics": <sx> },
      "Delete":   { "schema": <sx>, "semantics": <sx> },
      "Announce": { "schema": <sx>, "semantics": <sx> }
    },
    "object-types": {
      "SXArtifact":       { "schema": <sx> },
      "Note":             { "schema": <sx> },
      "Tombstone":        { "schema": <sx> },
      "DefineActivity":   { "schema": <sx> },
      "DefineObject":     { "schema": <sx> },
      "DefineProjection": { "schema": <sx> },
      "DefineValidator":  { "schema": <sx> },
      "DefineCodec":      { "schema": <sx> },
      "DefineTransport":  { "schema": <sx> },
      "DefineAudience":   { "schema": <sx> },
      "DefineProof":      { "schema": <sx> },
      "DefineStorage":    { "schema": <sx> },
      "DefineTrigger":    { "schema": <sx> },
      "DefineSigSuite":   { "schema": <sx> },
      "Snapshot":         { "schema": <sx> }
    },
    "sig-suites": {
      "rsa-sha256-2018": { "verify": <sx>, "key-format": <sx> },
      "ed25519-2020":    { "verify": <sx>, "key-format": <sx> }
    },
    "codecs": {
      "dag-cbor": { "encode": <sx>, "decode": <sx> },
      "raw":      { "encode": <sx>, "decode": <sx> },
      "dag-json": { "encode": <sx>, "decode": <sx> }
    },
    "projections": {
      "activity-log":    { "initial-state": ..., "fold": <sx> },
      "by-type":         { "initial-state": ..., "fold": <sx> },
      "by-actor":        { "initial-state": ..., "fold": <sx> },
      "by-object":       { "initial-state": ..., "fold": <sx> },
      "actor-state":     { "initial-state": ..., "fold": <sx> },
      "define-registry": { "initial-state": ..., "fold": <sx> },
      "audience-graph":  { "initial-state": ..., "fold": <sx> }
    },
    "validators": {
      "envelope-shape": { "predicate": <sx> },
      "signature":      { "predicate": <sx> },
      "type-schema":    { "predicate": <sx> }
    },
    "audience-predicates": {
      "Public":    { "member-of": <sx> },
      "Followers": { "member-of": <sx> },
      "Direct":    { "member-of": <sx> }
    }
  },
  "capability-types": [       // schema for capability descriptors
    "http-client", "http-server",
    "fs-read", "fs-write",
    "subprocess", "clock-read", "random-bytes"
  ]
}
```

Each definition's body is **SX source**, not bytecode. The kernel evaluates it at
startup using the same SX evaluator user-published `Define*` artifacts use — there
is no privileged "native" path. The bootstrap is just SX loaded from the binary
instead of from the log.

### 12.3 Hardcoded CID and verification

The kernel binary contains:

- The full genesis bundle (embedded as bytes).
- The CID computed over those bytes at build time.

On startup:

1. Compute the actual CID of the embedded bundle.
2. Compare to the hardcoded CID.
3. **Mismatch → refuse to start.** Either the binary has been tampered with or the
   build process is broken. Either way, the operator should know immediately. (A
   tamperer who also rewrites the embedded CID passes this check, so it mainly
   catches corruption and build errors; binary provenance needs separate controls.)
4. **Match → proceed.** Every running instance with a given kernel binary has
   byte-identical bootstrap state — no version drift possible within a binary.

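Here are startup steps 1 to 4 in Python form. The hash is a stand-in for the real CID computation: hex SHA-256 of the embedded bytes. `verify_genesis`, `startup`, and `GenesisMismatch` are hypothetical names. In a real build, the expected value would be generated by the build pipeline and compiled into the binary next to the bundle.

```python
import hashlib
import sys

class GenesisMismatch(RuntimeError):
    """The embedded bundle does not hash to the value recorded at build time."""

def bundle_id(bundle: bytes) -> str:
    # Stand-in for the real CID: hex SHA-256 over the embedded bytes.
    return hashlib.sha256(bundle).hexdigest()

def verify_genesis(bundle: bytes, hardcoded: str) -> str:
    actual = bundle_id(bundle)      # step 1: compute the actual id
    if actual != hardcoded:         # step 2: compare with the build-time value
        raise GenesisMismatch(f"expected {hardcoded}, got {actual}")
    return actual                   # step 4: match, proceed

def startup(bundle: bytes, hardcoded: str) -> str:
    try:
        return verify_genesis(bundle, hardcoded)
    except GenesisMismatch as err:
        # Step 3: refuse to start and tell the operator immediately.
        sys.exit(f"fatal: genesis bundle mismatch: {err}")
```
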
The genesis CID is exposed at `GET /.well-known/sx-capabilities` so peers can see
which kernel version they're talking to.

### 12.4 Fresh instance startup sequence

```
1. Load and verify genesis bundle (panic on mismatch)
2. Parse all definition SX sources, instantiate evaluator closures
3. Initialize registries from definitions (in the order: codecs → sig-suites →
   validators → object-types → activity-types → audience-predicates → projections)
4. Open log file (create if missing)
5. Replay any existing log: for each activity, validate, then fold into each
   projection (resuming from snapshots where available)
6. Load or generate actor keypair (filesystem path from config)
7. If actor has never published a Create{Person} for itself, generate and append
   one as the first activity of this instance's outbox
8. Initialize HTTP server, wire routes
9. Open inbox: start accepting federated activities
10. Mark instance as ready
```

Steps 1-3 are the bootstrap. Step 5 is replay-and-project. Step 7 is the
"actor genesis" — every instance has at least one local actor; it publishes itself
as its first activity, and that activity (signed by the actor's own key) anchors all
subsequent activity from that actor.

### 12.5 First activity — actor creation

Every fresh actor's outbox starts with:

```sx
(activity 'Create
  :id "https://next.rose-ash.com/actors/giles/activities/<uuid>"
  :actor "https://next.rose-ash.com/actors/giles"
  :published "<iso-timestamp>"
  :to ["https://www.w3.org/ns/activitystreams#Public"]
  :object <full actor doc with publicKeys array>
  :signature <signed by the new key over the activity envelope>)
```

Self-signed: the activity introduces the key it's signed with. Verifiers take the
actor doc embedded in the activity, find the key, and verify the signature over the
activity. This is trust-on-first-encounter for a new actor, the same model AP uses.

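The self-signed check can be sketched in Python with the signature algorithm abstracted away. Several pieces are assumptions about the envelope shape:

- The field names `signature.keyId`, `signature.value`, `publicKeys[].id`, and `publicKeys[].publicKey`.
- `envelope_bytes` is the canonical encoding of the envelope without its signature.
- `verify` stands in for whichever sig suite applies, for example `ed25519-verify`.

```python
from typing import Callable, Mapping

# (public_key, message, signature) -> bool; stands in for the actor's sig suite.
Verify = Callable[[bytes, bytes, bytes], bool]

def verify_self_signed_create(activity: Mapping, envelope_bytes: bytes,
                              verify: Verify) -> bool:
    """Trust-on-first-encounter check for an actor's first Create{Person}."""
    actor_doc = activity["object"]
    if actor_doc.get("id") != activity.get("actor"):
        return False  # the activity must introduce the same actor that sends it
    sig = activity["signature"]
    for key in actor_doc.get("publicKeys", []):
        if key["id"] == sig["keyId"]:
            return verify(key["publicKey"], envelope_bytes, sig["value"])
    return False      # the signing key is not among the keys the activity introduces
```

The sketch trusts the embedded key only for this activity. Later key changes arrive as `Update` activities verified against a key that is already known.
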
The kernel emits this automatically on first startup if the actor has no prior
activity. Subsequent actor changes (key rotation, profile updates) are `Update`
activities signed by an existing key.

### 12.6 Joining federation

A new instance has no peers initially. Discovery is operator-driven for v1:

1. Operator configures one or more peer URLs (or a well-known seed list).
2. Instance fetches peer's actor doc and `/.well-known/sx-capabilities`.
3. Instance verifies it can interpret the peer's activities (envelope compatible,
   sig suites overlap). Reports incompatibilities to operator.
4. If compatible, instance follows peer's primary actor (`POST /inbox` with a
   `Follow` activity).
5. Peer streams or backfills outbox to this instance.
6. Activities arrive, validate, fold into local projections.

Discovery beyond manual config (e.g. peer recommendations, federation directories)
is a v2 concern.

### 12.7 Kernel version evolution

The substrate must evolve without forcing every instance to upgrade in lockstep.
Three rules:

**Rule 1: The activity envelope shape is forward-compatible only.**

We may *add* optional fields to the envelope; we may not change semantics or remove
fields. Old activities still validate under new kernels. New activities with new
fields are accepted by old kernels (which ignore the unknown fields, store the raw
envelope, and project conservatively).

This is the AP discipline. We adopt it strictly. If we ever need a breaking envelope
change, it's a major version (fed-sx 2.0) and instances at different majors don't
federate directly — only via bridges.

**Rule 2: Everything else evolves via supersession.**

New sig suite, new codec, new projection definition, new validator: publish a
`Define*` activity that supersedes the old one. Both old and new versions stay valid
at their respective timestamps. Old activities verify under old definitions; new
activities use new definitions. Time-aware lookup (§9.6, §10.6) makes this work.

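Time-aware lookup can be modelled as a list of (published, definition) pairs kept sorted by time. An activity is interpreted with the latest definition published at or before its own timestamp. Below is a Python sketch with hypothetical names. It assumes ISO 8601 timestamps that carry an offset. The `Z` suffix is rewritten to `+00:00` so that `datetime.fromisoformat` accepts it on Python versions before 3.11.

```python
import bisect
from datetime import datetime
from typing import List, Optional

def _ts(iso: str) -> datetime:
    return datetime.fromisoformat(iso.replace("Z", "+00:00"))

class VersionedDefinition:
    """All versions of one registry entry, ordered by publication time."""

    def __init__(self) -> None:
        self._times: List[datetime] = []
        self._cids: List[str] = []  # e.g. CIDs of successive Define* artifacts

    def supersede(self, published: str, cid: str) -> None:
        t = _ts(published)
        i = bisect.bisect_right(self._times, t)  # versions may arrive out of order
        self._times.insert(i, t)
        self._cids.insert(i, cid)

    def active_at(self, activity_time: str) -> Optional[str]:
        """Latest version published at or before `activity_time`, or None."""
        i = bisect.bisect_right(self._times, _ts(activity_time))
        return self._cids[i - 1] if i else None
```

Old activities keep resolving to the version that was active when they were published, even after a newer definition lands.
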
**Rule 3: New genesis bundles supersede old ones via published activities.**

When the kernel team ships a new version with an updated bundle:

- The new bundle's CID is different.
- Operators upgrading the kernel get the new bundle automatically.
- The new bundle's *contents* are largely supersession
  `Update{DefineProjection, DefineValidator, ...}` activities relative to the old
  bundle's definitions.
- A peer running the old kernel sees these `Update` activities (when they appear in
  followed outboxes) and *can* opt to load them dynamically (§12.8) or stay on the
  old bundle definitions until the operator upgrades.

In other words: the kernel binary evolution and the activity-log evolution are
parallel tracks. The binary determines what's *built in*; the log determines what's
*currently active*. They converge over time but don't have to move in lockstep.

### 12.8 Dynamic Define* loading

When an instance receives an activity of `type: "PinV3"` and has no
`DefineActivity{name: "PinV3"}` in its define-registry, it has three options
(operator policy):

- **Strict mode** — store the activity envelope (it's valid AP), tag it `unknown-type`
  in `by-type`, do not project semantics. Operator must explicitly load the
  definition to enable projection.
- **Permissive mode** — fetch the `DefineActivity{name: "PinV3"}` artifact (its CID
  is in the activity's `capabilities-required` list), validate, evaluate the
  semantics SX (in pure sandbox), reproject the activity. Operator notified.
- **Trusted-peers-only mode** — like permissive, but only auto-loads `Define*` from
  actors on a configured trust list.

Default for fed-sx v1: **strict mode**. Operators opt in to broader policies.

This lets the substrate genuinely live-extend — new verbs land via federation, no
binary upgrade — while keeping a clean audit trail of what got loaded when.

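The three policies reduce to a small decision function. Below is a Python sketch with hypothetical names. It assumes the caller already knows which actor published the needed `DefineActivity`. In every branch the envelope itself is still stored, since it is valid AP; the decision is only about whether to load the definition and project.

```python
from enum import Enum
from typing import Collection

class LoadPolicy(Enum):
    STRICT = "strict"                          # fed-sx v1 default
    PERMISSIVE = "permissive"
    TRUSTED_PEERS_ONLY = "trusted-peers-only"

class Action(Enum):
    STORE_UNPROJECTED = "store envelope, tag unknown-type in by-type, do not project"
    FETCH_AND_LOAD = "fetch DefineActivity, validate, evaluate in pure sandbox, reproject"

def on_unknown_type(policy: LoadPolicy, definer: str, trusted: Collection[str]) -> Action:
    """What to do with an activity whose type has no loaded DefineActivity.

    `definer` is the actor that published the needed Define* (hypothetical input).
    """
    if policy is LoadPolicy.PERMISSIVE:
        return Action.FETCH_AND_LOAD
    if policy is LoadPolicy.TRUSTED_PEERS_ONLY and definer in trusted:
        return Action.FETCH_AND_LOAD
    return Action.STORE_UNPROJECTED  # strict mode, or a definer not on the trust list
```
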
### 12.9 Genesis as the substrate's manifest

A useful framing: the genesis bundle is the substrate's **manifest** (in the
package-manager sense). It declares "this kernel ships with these definitions,
identified by these CIDs, and this is what the kernel does until the log says
otherwise."

Two instances with the same genesis CID start in identical states. Two instances with
different genesis CIDs can federate as long as their *active* registry states (after
log replay) overlap enough.

The genesis bundle is also the **conformance reference**: a kernel implementation
claims fed-sx v1.0 conformance by reproducing the standard genesis bundle's CID
from its own build of the included SX sources. If two implementations build the same
spec sources and produce different CIDs, at least one of them is non-conformant. This
makes for a cheap, deterministic conformance check.

### 12.10 Operational implications

- **Build-time CID computation is part of the kernel build.** The build pipeline
  must include the genesis-bundling step and embed the resulting CID. Mismatch
  protection requires the binary to know what it expects.
- **Genesis evolution is a deliberate kernel-team decision.** Adding a new bundled
  projection or sig suite is a kernel release, not a federated activity.
  (User-defined projections still federate normally.)
- **Strict-mode default protects against malicious extensions.** Operators have to
  consciously opt into auto-loading remote `Define*`. This trades convenience for
  security — appropriate for v1.
- **Cross-major federation is a bridge problem.** If/when fed-sx 2.0 ships with an
  envelope change, bridges between v1 and v2 are themselves federated artifacts —
  built by anyone, signed, audited.

## 13. Federation mechanics

How instances exchange activities, how peers subscribe, how new followers backfill,
how delivery survives unreliable networks, and how the substrate resists abuse.

### 13.1 Push, pull, hybrid

ActivityPub canonically uses **push**: actor A publishes by POSTing each activity to
each follower's inbox URL. This gives low latency and clear delivery semantics, but
requires a reliable per-recipient delivery queue and falls over when peers go down.

fed-sx supports both, with a **push-primary, pull-fallback** model:

- **Push** is the default delivery mechanism. When an activity is appended to A's
  outbox, A's delivery worker posts it to each follower's inbox.
- **Pull** is always available: any peer can `GET /actors/<id>/outbox?since=<cursor>`
  and stream activities in order. Used for backfill, recovery from delivery gaps,
  and instances that prefer pull-only operation.
- **Hybrid in practice:** push delivers *notifications* (the activity itself, or a
  pointer to its CID); receivers may pull the full content if not inlined. Useful
  when the activity body is large.

Operators can configure their actors as push-only, pull-only, or hybrid. The
default is hybrid.

### 13.2 The Follow lifecycle

AP-standard, slightly tightened:

```sx
;; A wants to follow B
(activity 'Follow
  :actor "https://a.example/actors/alice"
  :object "https://b.example/actors/bob")
;; → POST to B's inbox

;; B accepts (or rejects)
(activity 'Accept
  :actor "https://b.example/actors/bob"
  :object <follow-activity-id-or-embedded>)
;; → POST to A's inbox

;; A unfollows later
(activity 'Undo
  :actor "https://a.example/actors/alice"
  :object <follow-activity-id-or-embedded>)
;; → POST to B's inbox
```

State derived by the `audience-graph` projection on each instance:

- `(followers actor)` — set of actors who follow `actor`, projected from
  `Accept{Follow}` activities in `actor`'s outbox (and the inverse via received
  `Follow` activities).
- `(following actor)` — symmetric.

**Auto-accept by default.** Public actors auto-publish `Accept` for any incoming
`Follow`. Locked actors require manual approval, implemented as an operator UI that
publishes the `Accept` (or `Reject`) once a human decides.

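Below is a toy version of the followers part of `audience-graph`, written as a pure fold in Python. It assumes the `Follow` is embedded in the `Accept`/`Undo`; the id-reference form would need a lookup against stored activities. It only honours an `Accept` sent by the followed actor and an `Undo` sent by the follower. Names are hypothetical. State is a dict of frozensets, so each step returns a new value instead of mutating the old one.

```python
from typing import Dict, FrozenSet

Followers = Dict[str, FrozenSet[str]]  # followed actor -> actors following it

def audience_fold(state: Followers, act: dict) -> Followers:
    """One fold step over the log; returns a new dict and never mutates `state`."""
    obj = act.get("object")
    if not isinstance(obj, dict) or obj.get("type") != "Follow":
        return state  # a bare Follow changes nothing until it is accepted
    follower, followed = obj["actor"], obj["object"]
    current = state.get(followed, frozenset())
    if act["type"] == "Accept" and act["actor"] == followed:
        return {**state, followed: current | {follower}}
    if act["type"] == "Undo" and act["actor"] == follower:
        return {**state, followed: current - {follower}}
    return state  # Reject, or an Accept/Undo from the wrong party

def following(state: Followers, actor: str) -> FrozenSet[str]:
    """The symmetric view, derived from the same state."""
    return frozenset(f for f, fs in state.items() if actor in fs)
```
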
### 13.3 Backfill

When A first follows B, A wants B's history. Four supported modes:

| Mode | Mechanism | Trade-off |
|------|-----------|-----------|
| **No backfill** | Just stream new activities going forward | Cheapest, missing context for new followers |
| **Pull paginated** | `GET /outbox?since=epoch&limit=100` repeatedly | Standard, slow for large outboxes |
| **Snapshot fetch** | Find latest `Create{Snapshot}` published by B for the projection of interest, fetch + verify, then pull only activities after the snapshot's tip | Fast, requires B to publish snapshots |
| **Bundle fetch** | Out-of-band: B publishes a CID for an export bundle (a dag-cbor list of activities + actor doc + sig suite verification metadata); A fetches once, validates the chain, replays | Fastest for cold starts; bundle creation is opt-in |

Default: snapshot fetch when available, paginated pull otherwise.

A new instance joining federation typically combines the two: it snapshot-fetches the
`actor-state` and `define-registry` projections from a trusted peer (so it knows who
exists and what verbs are defined), then incrementally backfills specific actors of
interest.

### 13.4 Delivery queue and retry

Every push delivery attempt ends in one of these outcomes:

| Outcome | Action |
|---------|--------|
| 2xx | Mark delivered |
| 3xx | Follow redirect (with limit) |
| 4xx (except 429) | Mark *permanently failed* — peer rejected the activity. Log; don't retry. |
| 429 | Honour `Retry-After`; reschedule |
| 5xx | Exponential backoff; reschedule |
| Connection error | Exponential backoff; reschedule |

**Retry schedule** (default, tunable per peer):

```
1 min, 5 min, 15 min, 1 h, 4 h, 12 h, 24 h, 48 h, 96 h
```

After the last attempt fails, the activity is **abandoned for push** but remains in
A's outbox. Followers can still pull it via `GET /outbox?since=...`. The peer will
eventually catch up if they come back online and pull. Push is best-effort; pull is
the source of truth.

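The outcome table and the retry schedule combine into one decision function. Below is a Python sketch with hypothetical names. `status=None` stands for a connection error, and `attempt` is the zero-based index of the attempt that just failed. Treating a `429` without a `Retry-After` header like a 5xx is an assumption the text does not settle.

```python
from datetime import timedelta
from typing import Optional, Tuple

# Default schedule (tunable per peer): nine retries, from 1 min up to 96 h.
RETRY_SCHEDULE = ([timedelta(minutes=m) for m in (1, 5, 15)]
                  + [timedelta(hours=h) for h in (1, 4, 12, 24, 48, 96)])

def next_action(status: Optional[int], attempt: int,
                retry_after: Optional[int] = None) -> Tuple[str, Optional[timedelta]]:
    """Return (action, delay) for one push delivery outcome."""
    if status is not None and 200 <= status < 300:
        return "delivered", None
    if status is not None and 300 <= status < 400:
        return "redirect", None      # caller follows it, up to a hop limit
    if status is not None and 400 <= status < 500 and status != 429:
        return "failed", None        # permanent: the peer rejected the activity
    if attempt >= len(RETRY_SCHEDULE):
        return "abandoned", None     # stays in the outbox; the peer can still pull it
    if status == 429 and retry_after is not None:
        return "retry", timedelta(seconds=retry_after)
    return "retry", RETRY_SCHEDULE[attempt]  # 5xx, connection error, bare 429
```
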
**Persistent queue.** Delivery state is itself stored in the local instance — it's
operator-internal, not federated. (Could be a regular SQLite table; doesn't need to
be a projection because it's not state-the-world-cares-about.) On instance restart,
the queue resumes from where it left off.

**Queue-as-projection (alternative):** for instances that want every aspect to be
log-derived, the delivery state could be a local-only projection over a stream of
`Attempt` / `DeliverySuccess` / `DeliveryFailure` activities written to a private
local-only outbox. Out of scope for v1 but the design admits it.

### 13.5 Audience-respecting delivery

Each activity carries `to`, `cc`, `bto`, `bcc`. The delivery worker computes the
**delivery set**: union of explicit recipients + (if `as:Public` or `Followers` in
audience) the actor's followers projection.

- `bto` and `bcc` are stripped before delivery (recipients shouldn't see who else is
  blind-copied).
- **Receivers honour audience.** When an instance receives an activity it should
  not be in the audience for (e.g. a `Direct` activity to someone else, leaked via a
  misconfigured peer), it logs and discards. Validators in the inbound pipeline
  enforce this.
- **Public ≠ unlisted.** `to: as:Public` means deliver to followers AND make
  publicly fetchable AND show in public projections. Some actors prefer "publicly
  fetchable but not pushed broadly" — `cc: as:Public` with `to: Followers`.

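Here is a Python sketch of computing the delivery set. Its assumptions:

- `followers_uri` is the sending actor's followers-collection id, which is how the `Followers` audience appears in the envelope.
- `followers` comes from the `audience-graph` projection.
- `as:Public` is the ActivityStreams public collection IRI.

The result is a set of actor ids; resolving each one to an inbox URL is a separate step. Blind recipients are included in the set, but `strip_blind` removes `bto`/`bcc` from the copy that is actually sent.

```python
from typing import Iterable, Set

PUBLIC = "https://www.w3.org/ns/activitystreams#Public"

def delivery_set(activity: dict, followers_uri: str, followers: Iterable[str]) -> Set[str]:
    """Explicit recipients, plus the actor's followers if Public or Followers is addressed."""
    audience: Set[str] = set()
    for field in ("to", "cc", "bto", "bcc"):
        audience.update(activity.get(field, []))
    # Collection markers are not recipients; expand them instead of delivering to them.
    targets = audience - {PUBLIC, followers_uri}
    if PUBLIC in audience or followers_uri in audience:
        targets.update(followers)
    return targets

def strip_blind(activity: dict) -> dict:
    """The copy actually delivered: bto/bcc removed so recipients can't see them."""
    return {k: v for k, v in activity.items() if k not in ("bto", "bcc")}
```
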
### 13.6 Spam and abuse posture

ActivityPub has well-known abuse vectors (Mastodon's history is instructive). fed-sx
defends in layers:

**Signature verification.** Every inbound activity must have a valid signature
matching an actor whose key was active at `published`. Forgeries are dropped at the
signature stage of the inbound pipeline (§14). This is necessary but not sufficient:
signatures prove who sent the message and that it wasn't altered, not that the
sender is benign.

**Per-source rate limits.** Per-actor and per-instance request rate limits on
`/inbox`. Default: 100/min per actor, 1000/min per instance. Exceeding either
limit returns HTTP 429.

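One way to implement the per-actor and per-instance limits is a sliding-window log, sketched here in Python. The sliding window is an assumption, since the text fixes only the rates. `now` is passed in explicitly so the logic is testable, and a rejected request is not counted against the window. The caller responds with 429 whenever `allow` returns `False`.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple

class InboxRateLimiter:
    """Sliding-window limits on /inbox (defaults: 100/min per actor, 1000/min per instance)."""

    def __init__(self, per_actor: int = 100, per_instance: int = 1000, window: float = 60.0):
        self.per_actor = per_actor
        self.per_instance = per_instance
        self.window = window
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def _full(self, key: Tuple[str, str], limit: int, now: float) -> bool:
        hits = self._hits[key]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # forget requests that have left the window
        return len(hits) >= limit

    def allow(self, actor: str, instance: str, now: float) -> bool:
        actor_key, instance_key = ("actor", actor), ("instance", instance)
        if (self._full(actor_key, self.per_actor, now)
                or self._full(instance_key, self.per_instance, now)):
            return False  # caller answers 429
        self._hits[actor_key].append(now)
        self._hits[instance_key].append(now)
        return True
```
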
**Per-instance trust state.** Three categories, operator-configured (and
overridable per actor):

- **Trusted** — auto-accept, auto-load `Define*` (under the permissive or
  trusted-peers-only loading policy, §12.8), no rate-multiplier penalty.
- **Default** — accept signed activities, standard rate limits, do not auto-load
  `Define*`.
- **Suspended** — drop all inbound activities, refuse outbound delivery, do not
  fetch artifacts. Operator decision (e.g. spam source, harassment instance).

Trust state is local-only (operator policy); it is not federated. Different
instances can disagree.

**Audience refusal.** Activities not addressed to anyone on this instance (no local
followers, not `as:Public`, not `to:` a local actor) are dropped on receipt.
This discourages spam targeting random instances.

**Content validators.** Registry-driven content moderation: a `DefineValidator`
with `applies-to: "inbound"` runs against every inbound activity and can reject
based on content rules. Examples: link-spam detection, ML moderation models served
via an effectful validator (note: effectful validators are a special case. They
*can* fail closed without affecting determinism, because validators run *before*
projection and don't contribute to projected state).

**Capability vetting.** If an inbound activity declares `capabilities-required`
that includes definitions this instance hasn't loaded *and* the instance uses the
strict loading policy, the activity is quarantined (stored but not projected)
pending operator review.

**Federation circuit breakers.** A high per-peer error rate triggers temporary
defederation: if a peer is sending malformed activities, exceeding rate limits, or
signing with revoked keys, it is automatically suspended for an exponentially
increasing cool-off period.

### 13.7 Discovery

How an instance finds other instances and actors:

- **WebFinger** (RFC 7033). `GET /.well-known/webfinger?resource=acct:user@host`
  returns links to actor URLs. A de facto convention across AP implementations
  (the ActivityPub spec itself does not require it); fed-sx implements it.
- **Well-known capabilities.** `GET /.well-known/sx-capabilities` (§7) for
  cross-instance compatibility checks.
- **Manual peer config.** Operators add peer instance URLs to their config.
- **Peer recommendations.** An instance can publish `Recommend{actor}` activities
  pointing at peers it considers worth following. Receivers can use these as
  discovery hints (subject to local trust). Out of scope for v1, but the verb name
  can be reserved now.
- **Federation directories.** Community-maintained lists of instances; an instance
  can opt into being listed by publishing a `Directory{listed-by}` activity. v2
  concern.

For v1: WebFinger + capabilities + manual config. Discovery beyond that is opt-in
via standard verbs.

### 13.8 Streaming and real-time

Two streaming mechanisms:

- **Outbox SSE** — `GET /actors/<id>/outbox/stream` opens a Server-Sent Events
  connection. Each new activity appended to the outbox is sent as an event. Allows
  pull-style federation peers to maintain a live connection without polling.
- **Projection SSE** — `GET /projections/<name>/subscribe` (§10.8) streams projection
  deltas. Useful for clients (browsers) wanting reactive views.

Both are local-only mechanisms; the canonical federation transport remains push to
inbox + pull from outbox. SSE is convenience, not protocol.

### 13.9 Operational implications

- **Push is best-effort, pull is authoritative.** Operators should treat the outbox
  as the canonical record; delivery queue is bookkeeping.
- **Trust is per-instance and not federated.** Two instances may have different
  views of "good actors" and "bad instances." This is a feature — defederation
  decisions are local sovereignty.
- **Backfill via snapshots is the cheap path.** Encouraging actors to publish
  `Create{Snapshot}` regularly makes new-follower onboarding fast.
- **Audience semantics are enforced both ways.** Senders compute delivery set;
  receivers honour audience. Defence-in-depth against misconfigured peers.
- **Capability-based extension loading is opt-in.** Strict-mode default means
  unknown verbs are stored-but-not-projected — safe by default, with explicit
  operator control over what extensions load.

## 14. Validation pipeline

Every activity entering the substrate (whether published locally or received from a
peer) flows through a fixed pipeline of checks. Order matters: cheap and fail-safe
first, expensive and content-aware last. Each stage has a defined failure response
(reject, quarantine, drop). Registry-driven validators plug in at a specific stage.

### 14.1 The two pipelines

**Inbound** — activities arriving via `POST /inbox` or pulled from a peer's outbox:

```
HTTP transport → envelope → signature → replay → audience →
activity-type schema → object-type schema → content validators →
capabilities → trust state → log append → projection (async)
```

**Outbound** — activities being published locally via `POST /activity`:

```
authentication → authorization → envelope construction → object handling →
activity-type schema → signature → log append → projection (async) →
delivery (async)
```

Stages they share are implemented as the same SX functions called from both pipelines.

### 14.2 Inbound pipeline — stage by stage

| # | Stage | Check | Failure response |
|---|-------|-------|------------------|
| 1 | **Transport** | Valid HTTP request, content-type acceptable, body parseable as JSON-LD or dag-cbor | `400 Bad Request`; log |
| 2 | **Envelope** | Matches kernel's envelope spec (required fields present, types valid, recognised activity type or `unknown` allowed) | `400`; log; structured error in response body |
| 3 | **Signature** | Time-aware sig verification: fetch (or cache-lookup) actor doc, find key with `id == sig.key-id` that was active at `published`, verify against canonical envelope bytes per the named sig suite | `401`; log; do not retry; mark sender's instance for circuit-breaker accounting |
| 4 | **Replay** | Activity id and CID not already in `activity-log` projection | `200 OK` with `{status: "duplicate"}`, no-op |
| 5 | **Audience** | This instance has at least one local actor in `to`/`cc`, OR audience contains `as:Public`/`Followers` and the actor has local followers | Drop silently (no response indicating either acceptance or refusal — prevents inbox-membership probing); do not store |
| 6 | **Activity-type schema** | Look up `DefineActivity{name: <type>}` in `define-registry`; run its `schema` predicate over the activity in pure sandbox | If type unknown: per trust policy (strict: 422 with missing-definition CID; permissive: attempt dynamic load §12.8). If schema fails: 422 with violation detail |
| 7 | **Object-type schema** | If activity has an `object` with a `type`, look up `DefineObject{name: <type>}` and run its `schema` | Same as #6 |
| 8 | **Content validators** | All registered validators with `applies-to: inbound` or `applies-to: all` run in priority order (§14.6), each in the sandbox mode it declares (§14.5), and return `:accept` / `:reject` / `:quarantine` | `:reject` → 422 with reason. `:quarantine` → store activity but mark `quarantined`, do not project, alert operator |
| 9 | **Capabilities** | Every CID in `capabilities-required` is present in this instance's loaded registries (or auto-loadable per trust policy) | Missing → 422 with list of missing CIDs (sender can deliver bootstrapping `Define*` artifacts first). Auto-load attempt can be triggered by re-POST with `?retry-after-load=true` |
| 10 | **Trust state** | Sender's actor and instance are not in `Suspended` state on this instance | Drop silently; do not respond |
| 11 | **Log append** | Write activity envelope (and inlined object content) to local mirror of sender's outbox; assign local sequence number | Disk error → 503 (transient); sender retries |
| 12 | **Projection** | Asynchronously fold the activity into every relevant projection (per `define-registry`) | Per-projection failure (gas, sandbox violation) → tag activity `projection-failed:<projection-name>`; do not affect log durability |

The pipeline halts at the first failing stage. Stages 1–11 are synchronous: `POST /inbox`
holds the connection until the log append (stage 11) succeeds, and only then returns
the HTTP response. Stage 12 (projection) runs asynchronously.

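The halt-at-first-failure rule can be modelled as an ordered list of stage functions. Each returns `None` to continue or an outcome to stop. This is a minimal sketch. The stage numbers follow the table, but the `Outcome` type, the dict-shaped activity, and the two simplified example stages are assumptions for illustration. The replay check here looks at the activity id only, whereas stage 4 also checks the CID. The success status is assumed, because the table above does not fix one.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Outcome:
    kind: str         # "reject" | "drop" | "quarantine" | "accept"
    status: int       # HTTP status returned to the sender
    reason: str = ""


Stage = Callable[[dict], Optional[Outcome]]


def check_envelope(activity: dict) -> Optional[Outcome]:
    # Stage 2 (simplified): required fields must be present.
    missing = [f for f in ("id", "type", "actor") if f not in activity]
    if missing:
        return Outcome("reject", 400, f"missing fields: {missing}")
    return None


def make_replay_check(seen_ids: set) -> Stage:
    # Stage 4 (simplified): duplicates are acknowledged with 200 and ignored.
    def check(activity: dict) -> Optional[Outcome]:
        if activity["id"] in seen_ids:
            return Outcome("drop", 200, "duplicate")
        return None
    return check


def run_inbound(activity: dict, stages: list) -> Outcome:
    for stage in stages:
        outcome = stage(activity)
        if outcome is not None:
            return outcome         # halt at the first failing stage
    return Outcome("accept", 202)  # assumed status; log append would follow here
```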
### 14.3 Outbound pipeline — stage by stage

| # | Stage | Check | Failure response |
|---|-------|-------|------------------|
| 1 | **Authentication** | Caller has a valid bearer token, mTLS cert, or session for the actor | `401` |
| 2 | **Authorization** | Caller's identity is allowed to publish as the named `actor` (capability token §9.5 or owns the actor key) | `403` |
| 3 | **Envelope construction** | Kernel fills in `id`, `published`, normalises `to`/`cc`, computes `capabilities-required` (by walking referenced `Define*` CIDs) | n/a |
| 4 | **Object handling** | If `object` has inline content: canonicalize, compute CID, optionally store per `where`. If `object` references a CID, verify the artifact exists locally or remotely (or accept as a forward reference) | Storage error → `503` |
| 5 | **Activity-type schema** | Same as inbound #6 — schema must pass | `422` with violation detail (caller bug) |
| 6 | **Signature** | Sign envelope with the actor's currently-active key matching the activity type's required `purpose` (e.g. `Pin` requires `purpose: pin`) | If no suitable key: `400` |
| 7 | **Log append** | Write to local outbox; assign sequence number | `503` |
| 8 | **Projection** | Async fold (same as inbound #12) | Per-projection failure tag |
| 9 | **Delivery** | Async push to follower inboxes per audience | Per-recipient retry per §13.4 |

The caller's HTTP response returns after stage 7 (log append). The activity is durable
and queryable as soon as the response is sent; projection lag is reported via the
`projected-up-to` header and the `?wait-for=` parameter.

### 14.4 Failure response taxonomy

Three response categories with explicit semantics:

**Reject** — tell the sender and don't store; the sender can retry after correcting
the problem. Used for: malformed envelope, invalid signature, schema violation,
missing capabilities. HTTP 4xx with structured error.

**Quarantine** — store the envelope (it's a valid signed message) but don't project,
and alert the operator. Used for: content-validator soft-fail, unloaded capabilities
under permissive policy, suspect-but-not-banned senders. The activity sits in a
quarantine projection until the operator reviews it; the operator can release
(project) or expunge it.

**Drop silently** — don't store, don't respond informatively. Used for: replay (ack
as duplicate), audience refusal (would leak inbox membership otherwise), and
suspended-sender activities. The sender experiences this as a successful POST with
no visible effect; they can detect it only by polling for their activity not
appearing in our outbox.

### 14.5 Registry-driven validators

Most of the pipeline is **fixed kernel logic** (envelope, signature, replay, audience,
log append, delivery). Two parts of it are **registry-driven** and extend dynamically:

- **Stage 8 (content validators)** — operators add/remove `DefineValidator` entries
  with `applies-to: inbound | outbound | all`. Each runs in a pure or effectful
  sandbox per its declaration and returns one of `:accept` / `:reject{:reason}` /
  `:quarantine{:reason}`.
- **Stages 6–7 (schema validators)** — these *are* registry entries
  (`DefineActivity.schema`, `DefineObject.schema`); the pipeline calls into the
  registry to fetch them.

**Pure-mode validators** are deterministic and cheap; results can be cached per
(activity-CID, validator-CID).

**Effectful-mode validators** can call out to ML models, blocklist services, and
external moderation APIs. They get a per-call IO budget; exceeding it counts as
`:reject{:reason :validator-timeout}`. Effectful validators do *not* break
determinism because validation happens **before projection** — a rejected activity
never enters projected state.

### 14.6 Validator composition and ordering

Validators have an integer `priority` field; lower values run first. The pipeline
short-circuits on the first `:reject`. `:quarantine` is *not* short-circuiting; later
validators still run, and `:quarantine` results aggregate.

Default priorities (room for operator-added validators):

```
0-99    : kernel-internal (envelope, sig, replay, audience)
100-199 : standard schema validators
200-299 : standard content validators (rate limit, audience leak)
300-399 : operator-added moderation
400-499 : effectful (ML, third-party APIs)
500+    : reserved
```

Operators can publish `Update{DefineValidator}` to change priorities or add new
ones; changes take effect from the next inbound activity.

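The ordering rules can be sketched as follows: sort by `priority`, stop at the first reject, and collect every quarantine reason. The tuple-based verdict encoding and the two example validators (with their thresholds and terms) are hypothetical. Real validators would be `DefineValidator` registry entries evaluated in a sandbox.

```python
from typing import Callable, NamedTuple


class Verdict(NamedTuple):
    kind: str          # "accept" | "reject" | "quarantine"
    reason: str = ""


class Validator(NamedTuple):
    name: str
    priority: int      # lower values run first
    check: Callable[[dict], Verdict]


def run_validators(activity: dict, validators: list) -> tuple:
    """Return ("reject", [reason]), ("quarantine", [reasons...]) or ("accept", [])."""
    quarantine_reasons = []
    for v in sorted(validators, key=lambda item: item.priority):
        verdict = v.check(activity)
        if verdict.kind == "reject":
            return ("reject", [verdict.reason])        # short-circuit
        if verdict.kind == "quarantine":
            quarantine_reasons.append(verdict.reason)  # later validators still run
    if quarantine_reasons:
        return ("quarantine", quarantine_reasons)
    return ("accept", [])


# Hypothetical example validators placed in the default priority bands.
def length_check(activity: dict) -> Verdict:
    if len(activity.get("content", "")) > 500:
        return Verdict("quarantine", "long-content")
    return Verdict("accept")


def blocklist_check(activity: dict) -> Verdict:
    if "spam" in activity.get("content", ""):
        return Verdict("reject", "blocked-term")
    return Verdict("accept")


VALIDATORS = [
    Validator("blocklist", 310, blocklist_check),  # operator-added moderation band
    Validator("length", 250, length_check),        # standard content band
]
```

Note that a reject from a later validator overrides quarantines collected earlier, which matches "short-circuits on the first `:reject`".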
### 14.7 Determinism requirement and its limit

A subtlety worth being explicit about: **inbound validation is not required to be
deterministic across instances.** Two instances can disagree about whether to
accept a given activity (e.g. one has a stricter content validator). Their projected
states will then diverge — but only on activities one accepted and the other didn't.

This is fine. Federation does not require state convergence; it requires *fold
determinism for activities both instances accepted*. Validators are sovereignty
controls, not protocol invariants.

Where determinism *is* required: schema validators (§14.2 stages 6–7). If two
instances disagree on whether `Pin v3` matches its schema, they can't federate
`Pin v3` activities meaningfully. So schema validators must be pure-mode and
referenced by CID.

### 14.8 Operational implications

- **The pipeline is the security perimeter.** Every checkable property is checked
  here, not deeper in the kernel. No "trust the caller" assumptions inside log or
  projection code.
- **Quarantine is the operator's friend.** Anything suspicious sits in quarantine
  with full envelope, sig, and reason — operator can review and decide. Better than
  outright drop because it preserves audit.
- **Schema validators are protocol-load-bearing; content validators are policy.**
  The first set must converge across instances for federation to work; the second
  set can diverge (and that's how local moderation policy is expressed).
- **Outbound validation catches local bugs early.** A malformed `Pin` activity
  fails at outbound stage 5, never enters the local log, never gets delivered.

## 15. Storage layout

The on-disk shape of an instance. Three concerns kept separate: the **activity log**
(append-only, canonical), **content-addressed object storage** (keyed by CID,
immutable), and **operational state** (projections, indexes, queues — derived,
rebuildable).

### 15.1 Storage tiers

```
/var/lib/fed-sx/
├── log/                              # canonical, append-only
│   ├── actors/
│   │   ├── <local-actor-id>/
│   │   │   ├── outbox/
│   │   │   │   ├── 000001.jsonl      # segment, ~64MB cap
│   │   │   │   ├── 000002.jsonl
│   │   │   │   └── tip               # symlink to current segment
│   │   │   ├── inbox/                # received, pre-projection
│   │   │   └── seq                   # next sequence number
│   │   └── <other-local-actor-id>/...
│   └── mirrors/                      # local mirrors of followed remote outboxes
│       └── <remote-actor-id-hashed>/
│           ├── 000001.jsonl
│           └── ...
├── objects/                          # CID → bytes
│   └── <cid-prefix-2>/<cid-prefix-2>/<full-cid>
├── snapshots/
│   └── <projection-cid>/
│       ├── <log-tip-cid>.cbor        # snapshot value
│       └── index                     # ordered list of (log-tip, file)
├── projections/                      # live projection state
│   └── <projection-cid>.cbor         # latest in-memory state, periodically flushed
├── indexes/
│   └── fed-sx.db                     # SQLite: lookups, queue, trust state
├── keys/
│   └── <actor-id>/                   # private keys, mode 0600
│       ├── primary.pem
│       ├── recovery.pem
│       └── sigs.toml                 # key metadata
├── genesis/
│   └── bundle.cbor                   # extracted from binary at first run
└── config.toml                       # operator config
```

### 15.2 The log — append-only segments

The activity log is the only thing the substrate cannot lose. It is the source of
truth from which everything else is derived.

**Format: JSONL segments.** Each line is one activity envelope, encoded as JSON-LD
(canonical form), terminated by `\n`. Easy to inspect, easy to grep, trivially
streamable.

**Why JSON-LD on disk, not dag-cbor?** Two reasons:

- Operability: humans can `tail -f` and `grep` the log. dag-cbor is opaque.
- AP wire compatibility: activities arrive over HTTP as JSON-LD anyway; storing the
  same form avoids round-trip conversion.

The CID of each activity is computed from its **canonical dag-cbor representation**
(per §2), independent of how it's stored. CIDs are stable across storage formats.

**Segments cap at ~64MB.** Rotation is by size, not time. Old segments are immutable;
new writes go to the tip segment. Compression (zstd) is applied to segments older
than the current tip, which saves disk without slowing appends.

**Per-actor outboxes.** Each local actor has its own outbox directory. This matches
AP semantics (one outbox per actor) and means:

- Backing up a single actor is a simple directory copy
- Per-actor sequence numbers (no cross-actor coordination)
- Migration (`Move`) is a directory rename + a `Move` activity

**Mirror outboxes.** When a local actor follows a remote one, the remote's outbox is
mirrored locally for replay. Same JSONL format. Tracked under
`log/mirrors/<hashed-remote-id>/` to avoid filesystem path issues with URL
characters. The hash is purely a filesystem-friendly encoding; the canonical actor
id stays in the log content.

**Inbox vs outbox distinction.** Inboxes hold *received* activities pre-validation;
outboxes hold *committed* activities post-pipeline. An inbound activity that passes
the validation pipeline (§14) is moved from inbox to the appropriate mirror outbox.
This makes inbox a transient queue, not a permanent record.

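This is a minimal sketch of size-based segment rotation, assuming the per-actor layout from §15.1 (numbered `*.jsonl` segments plus a `tip` symlink). Canonical JSON is approximated with `json.dumps(sort_keys=True, separators=(",", ":"))`. That is an assumption for illustration, not a full JSON-LD canonicalisation. The sketch also omits fsync, compression of sealed segments, and atomic symlink replacement.

```python
import json
import os
import tempfile
from pathlib import Path

SEGMENT_CAP = 64 * 1024 * 1024  # ~64 MB per segment, per §15.2


def _segments(outbox: Path) -> list:
    # Six-digit zero-padded names sort correctly as strings.
    return sorted(outbox.glob("[0-9][0-9][0-9][0-9][0-9][0-9].jsonl"))


def append_activity(outbox: Path, envelope: dict, cap: int = SEGMENT_CAP) -> Path:
    """Append one envelope as a JSON line; start a new segment when the tip is full."""
    outbox.mkdir(parents=True, exist_ok=True)
    # Approximate canonical form: sorted keys, no insignificant whitespace.
    line = (json.dumps(envelope, sort_keys=True, separators=(",", ":")) + "\n").encode("utf-8")
    segments = _segments(outbox)
    current = segments[-1] if segments else outbox / "000001.jsonl"
    if current.exists() and current.stat().st_size + len(line) > cap:
        current = outbox / f"{int(current.stem) + 1:06d}.jsonl"  # rotate by size
    with open(current, "ab") as f:  # only the newest segment is ever written
        f.write(line)
    tip = outbox / "tip"
    if not tip.is_symlink() or os.readlink(tip) != current.name:
        if tip.is_symlink():
            tip.unlink()
        tip.symlink_to(current.name)  # relative link; not atomic in this sketch
    return current


if __name__ == "__main__":
    demo = Path(tempfile.mkdtemp()) / "outbox"
    print(append_activity(demo, {"id": "a1", "type": "Create"}).name)  # 000001.jsonl
```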
### 15.3 Object storage

Content-addressed blob store, sharded directories.

**Path scheme:** `objects/<first-2-chars>/<next-2-chars>/<full-cid>`, where the two
shard prefixes are the first four hex characters of the CID's SHA2-256 digest, not
of the CID string. Every CIDv1 string begins with the same multibase, version,
codec, and hash-function prefix (base32 CIDv1 strings for SHA2-256 dag-cbor objects
all start `bafyrei`), so its leading characters barely vary. Digest bytes, by
contrast, are uniformly distributed. Two hex characters per level gives
256 × 256 ≈ 65k buckets, with a couple of hundred files each at moderate scale.
This is a standard pattern; Git and IPFS's flatfs datastore shard similarly.

**Storage backends.** Pluggable per `where: cid` object:

- **`files-on-disk`** (default) — write to local filesystem.
- **`ipfs`** — registry-driven backend; calls out to a local IPFS node.
- **`s3`** — object storage in cloud bucket.
- **`memory-only`** — in-memory cache, evictable; useful for ephemeral artifacts.

The kernel uses the `where-tag` on each object to dispatch to the correct backend.
Backends are registry entries (`DefineStorage`); operators install only the ones
they want.

**Garbage collection** is opt-in per backend. Default policy: **never GC** (objects
are immutable and may be referenced by future activities). Operators can configure
per-backend retention rules:

- "Keep last N versions of objects referenced by `Pin` activities for path X"
- "Evict objects not referenced in last 90 days from the `memory-only` cache"
- "Mirror objects referenced by ≥ 3 endorsements; evict others after 30 days"

GC operates on the projected reference graph (a `reference-graph` projection that
maintains "what activities reference this CID"). Removing an object that's still
referenced is allowed but logs an operational warning.

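The sketch below derives a CIDv1 string for dag-cbor bytes hashed with SHA2-256, plus the sharded object path. The CID layout is version 1, codec `0x71`, multihash `0x12` with length 32, encoded as multibase `b` (lower-case base32, no padding). It assumes the input bytes are already canonical dag-cbor, and `cid_v1_dagcbor` and `object_path` are hypothetical helper names. It also shows concretely why shard directories come from the digest: every CID it produces starts with `bafyrei`.

```python
import base64
import hashlib
from pathlib import PurePosixPath

# CIDv1 prefix bytes: version 1, multicodec dag-cbor (0x71),
# multihash sha2-256 (0x12), digest length 32 (0x20). Each fits in one varint byte.
CID_PREFIX = bytes([0x01, 0x71, 0x12, 0x20])


def cid_v1_dagcbor(encoded: bytes) -> tuple:
    """Return (cid_string, hex_digest) for already-encoded dag-cbor bytes."""
    digest = hashlib.sha256(encoded).digest()
    b32 = base64.b32encode(CID_PREFIX + digest).decode("ascii").lower().rstrip("=")
    return "b" + b32, digest.hex()  # "b" is the multibase code for base32 lower


def object_path(root: str, encoded: bytes) -> PurePosixPath:
    cid, hexdigest = cid_v1_dagcbor(encoded)
    # Shard on the uniformly distributed digest, not on the near-constant CID prefix.
    return PurePosixPath(root, "objects", hexdigest[:2], hexdigest[2:4], cid)


print(object_path("/var/lib/fed-sx", b"\xa0"))  # b"\xa0" is CBOR for an empty map
```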
### 15.4 Snapshots

Per §10.4, snapshots are the (projection-CID, log-tip-CID, state) triples that let
us resume without full replay.

**Storage:** `snapshots/<projection-cid>/<log-tip-cid>.cbor`. The state value is
dag-cbor-encoded; the file's content CID matches the snapshot's claimed CID.

**Index:** `snapshots/<projection-cid>/index` is a sorted list of
`(log-tip-time, log-tip-cid, file)` triples. On startup, the kernel finds the latest
snapshot ≤ current log tip and resumes from it. For time-travel queries, it finds
the latest snapshot ≤ target time and folds forward.

**Retention:** keep at least:

- Latest snapshot per active projection
- Snapshots referenced by published `Create{Snapshot}` activities (federation
  proofs)
- One snapshot per day for the last 7 days (audit / time-travel)

Older snapshots are GC'd by default. Operators can increase retention.

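Finding "the latest snapshot ≤ target time" is a binary search over the sorted index. This sketch uses the standard library's `bisect_right`. It assumes the index is held in memory as `(log_tip_time, log_tip_cid, file)` tuples sorted by time. It also assumes timestamps are fixed-format ISO-8601 UTC strings, so that string order matches time order. The `SnapshotEntry` type is hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class SnapshotEntry(NamedTuple):
    log_tip_time: str   # fixed-format ISO-8601 UTC, so string order == time order
    log_tip_cid: str
    file: str


def latest_at_or_before(index: list, target_time: str) -> Optional[SnapshotEntry]:
    """Newest snapshot whose log tip is <= target_time; None means fold from the log start."""
    times = [entry.log_tip_time for entry in index]  # index is sorted by time
    pos = bisect_right(times, target_time)           # first position after target_time
    return index[pos - 1] if pos > 0 else None
```

An exact match on `target_time` is included, because `bisect_right` places the insertion point after equal keys.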
### 15.5 Operational state — SQLite

Things that are derived, frequently-queried, but not federated:

- **Lookup indexes** for projections (when `indexes:` declared) —
  `(projection, index-key, value) → activity-cid` rows
- **Delivery queue** — outbound activities pending push, retry counts, next-attempt
  timestamps
- **Trust state** — per-actor and per-instance trust levels (Trusted / Default /
  Suspended)
- **Quarantine queue** — activities pending operator review
- **Configuration cache** — currently-active registry entries (also in memory;
  on-disk cache for fast restart)

Single SQLite file (`indexes/fed-sx.db`). Recoverable: if corrupted or deleted,
rebuilt from the log on next startup (with cost proportional to log size). The
SQLite is a cache, not authoritative.

WAL mode for concurrent readers. Single-writer (the kernel); reads from many
HTTP request workers.

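This is a minimal sketch of opening the operational database with Python's built-in `sqlite3` module, enabling WAL, and creating two of the tables listed above. The table and column names are assumptions, since the design does not fix a schema. Using `synchronous=NORMAL` is also an assumption. It is a common WAL setting that trades a little durability for speed, which is acceptable for a rebuildable cache.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE IF NOT EXISTS delivery_queue (
    activity_cid  TEXT NOT NULL,
    recipient     TEXT NOT NULL,              -- remote inbox URL
    attempts      INTEGER NOT NULL DEFAULT 0,
    next_attempt  TEXT NOT NULL,              -- ISO-8601 UTC
    PRIMARY KEY (activity_cid, recipient)
);
CREATE TABLE IF NOT EXISTS trust_state (
    subject  TEXT PRIMARY KEY,                -- actor id or instance host
    level    TEXT NOT NULL CHECK (level IN ('Trusted', 'Default', 'Suspended'))
);
"""


def open_index_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL lets HTTP workers read while the single kernel writer appends.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL (got {mode!r})")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    db = open_index_db(os.path.join(tempfile.mkdtemp(), "fed-sx.db"))
    db.execute("INSERT INTO trust_state VALUES (?, ?)", ("peer.example", "Suspended"))
    db.commit()
```

WAL needs a real file. An in-memory database reports journal mode `memory`, which is why the sketch checks the pragma's result.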
### 15.6 Backup and export

The substrate is an append-only log of immutable artifacts; backup is simple.

- **Full backup:** rsync `/var/lib/fed-sx/log/` and `/var/lib/fed-sx/objects/`. The
  rest is rebuildable.
- **Per-actor export:** tar `log/actors/<actor-id>/` + the objects referenced by
  activities in that outbox. Self-contained, importable into another instance.
- **Activity bundle export:** for federation backfill, produce a dag-cbor bundle of
  `[activity envelopes... + referenced objects]` for a specified actor + range.
  Single file, content-addressed, signed by the source instance with a `Bundle`
  activity attesting to its contents.

Exports are themselves publishable (`Create{Bundle}` activity carrying the bundle
CID). This is how an actor migrates instances cleanly: export bundle, import on
new instance, publish `Move` activity.

### 15.7 Mirroring and replication

Two patterns:

- **Federation mirroring** (the canonical kind) — when actor A follows B, A's
  instance mirrors B's outbox locally. This is just normal federation (§13). Each
  follower keeps its own copy.
- **Operational mirroring** — for high availability. An operator runs two instances
  with shared filesystem (NFS / EFS) for `log/` and `objects/`, separate SQLite
  files. Reads can hit either; writes go through one. Or: rsync-based hot standby
  with manual failover.

Operational mirroring is out of scope for v1. Federation mirroring is the
substrate-level redundancy: as long as one peer that followed you is still online,
your log is still recoverable.

### 15.8 Storage size estimates

Rough targets at moderate scale (10 active local actors, 1000 followed peers, 1
year of activity at 100 activities/actor/day):

- **Log:** 10 actors × 100 act/day × 1 KB avg envelope × 365 days ≈ 365 MB local
  outbox. Mirrors: 1000 peers × 10 act/day × 1 KB × 365 ≈ 3.65 GB.
- **Objects:** depends heavily on content. Assume 50% of activities (local and
  mirrored, ≈ 4M in total) have inline content averaging 5 KB → ≈ 10 GB total
  inline, of which ≈ 0.9 GB comes from local actors. CID-referenced larger objects:
  count separately, depends on use case.
- **Snapshots:** typically much smaller than the log. ~10 active projections ×
  ~10 MB per snapshot × ~8 retained snapshots ≈ 800 MB.
- **SQLite:** index sizes proportional to indexed projection content; typically a
  few hundred MB.

Total: roughly 15 GB (on the order of 10 GB) at the described scale. Single-machine
viable; SSD recommended for log throughput; spinning disk fine for snapshots and
object storage cold tier.

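Here is a worked version of the size arithmetic. It counts inline content for both local and mirrored activities and uses decimal units (1 GB = 10^6 KB). The input rates are the assumptions stated above, so changing any of them shows how the totals scale.

```python
DAYS = 365
KB_PER_GB = 1_000_000  # decimal units

local_activities = 10 * 100 * DAYS    # 10 actors x 100 activities/day
mirror_activities = 1000 * 10 * DAYS  # 1000 peers x 10 activities/day
all_activities = local_activities + mirror_activities

log_gb = all_activities * 1 / KB_PER_GB                  # 1 KB average envelope
inline_gb = all_activities * 0.5 * 5 / KB_PER_GB         # 50% carry ~5 KB inline
local_inline_gb = local_activities * 0.5 * 5 / KB_PER_GB
snapshot_gb = 10 * 10 * 8 / 1_000                        # 10 projections x 10 MB x 8 kept

print(f"log {log_gb:.1f} GB, inline {inline_gb:.1f} GB "
      f"(local {local_inline_gb:.1f} GB), snapshots {snapshot_gb:.1f} GB")
# log 4.0 GB, inline 10.0 GB (local 0.9 GB), snapshots 0.8 GB
```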
### 15.9 Operational implications

- **The log is sacred.** Never modify, never delete. Backups go to multiple media.
  Loss of `log/` means loss of identity (actor activities) and loss of
  state-of-record. Loss of `objects/` means loss of content but log + peers can
  recover most of it.
- **Everything else is rebuildable.** Projections, indexes, snapshots, queue state
  can all be recomputed from the log at startup cost. Operationally, this means
  upgrades and migrations are forgiving.
- **CID-addressed storage is naturally idempotent.** Two instances writing the same
  artifact write the same bytes to the same path. Race conditions become no-ops.
- **JSONL on disk pays for itself** the first time an operator needs to debug a
  weird federation issue with `grep` and `jq`. Worth the storage cost vs dag-cbor.

## 16. API surface

HTTP API for reading the log, publishing activities, querying projections, and
streaming updates. Three layers: **AP-standard** endpoints (for vanilla AP
interop), **fed-sx-specific** endpoints (publish, query, capabilities), and
**discovery** endpoints (webfinger, well-known).

### 16.1 Endpoint catalog

#### AP-standard

| Method | Path | Purpose |
|--------|------|---------|
| GET | `/actors/<id>` | Actor doc (Person/Service/Group/Application) |
| GET | `/actors/<id>/inbox` | Read inbox — auth required |
| POST | `/actors/<id>/inbox` | Receive federated activity (HTTP Signature required) |
| GET | `/actors/<id>/outbox` | OrderedCollection of actor's published activities |
| POST | `/actors/<id>/outbox` | AP-standard publish (alias for `POST /activity` with `actor` set) |
| GET | `/actors/<id>/followers` | OrderedCollection of follower actor URIs |
| GET | `/actors/<id>/following` | OrderedCollection of followed actor URIs |
| GET | `/activities/<uuid>` | Single activity by id |
| GET | `/objects/<uuid>` | Single object by id (note: distinct from CID-addressed `/artifacts/<cid>`) |

#### fed-sx-specific

| Method | Path | Purpose |
|--------|------|---------|
| POST | `/activity` | Generalised publish — accepts any well-formed activity |
| GET | `/artifacts/<cid>` | CID-addressed artifact fetch (content negotiated) |
| GET | `/artifacts/<cid>/raw` | Raw bytes (whatever the codec stored) |
| GET | `/artifacts/<cid>/<path>` | IPLD path traversal into the artifact |
| GET | `/projections` | List of registered projections (name, CID, last-folded-tip) |
| GET | `/projections/<name>` | Full projection state (paginated for large states) |
| GET | `/projections/<name>?at=<ts>` | Time-travel: state as of timestamp |
| GET | `/projections/<name>/<key>` | Single key from a projection (uses indexes) |
| POST | `/query` | Run an SX query expression against one or more projections |
| GET | `/define-registry` | Currently active `Define*` artifacts by kind |
| GET | `/capabilities/<actor-id>` | Per-actor declared capabilities |

#### Discovery and well-known

| Method | Path | Purpose |
|--------|------|---------|
| GET | `/.well-known/webfinger?resource=acct:<user>@<host>` | RFC 7033 actor discovery |
| GET | `/.well-known/sx-capabilities` | This instance's capability advertisement (§7) |
| GET | `/.well-known/host-meta` | XRD describing the host |
| GET | `/.well-known/nodeinfo` | Standard fediverse node metadata (Mastodon, Pleroma compatibility) |

#### Real-time (SSE)

| Method | Path | Purpose |
|--------|------|---------|
| GET | `/actors/<id>/outbox/stream` | New activities as they're appended (events: `activity`) |
| GET | `/actors/<id>/inbox/stream` | New inbound activities (auth required) |
| GET | `/projections/<name>/subscribe` | Projection deltas (events: `delta`) |
| GET | `/federation/health/stream` | Per-peer delivery health (events: `peer-status`) |

WebSocket equivalents (`/ws/...` paths) are available where SSE is awkward (browsers
behind proxies); same event payloads, different framing.

### 16.2 Authentication

Three mechanisms, each appropriate to a different caller type:

- **HTTP Signatures** (IETF Internet-Draft draft-cavage-http-signatures) — the
  AP-standard mechanism for inter-instance calls. The sender signs a string built
  from selected request headers, including a `Digest` header that covers the body,
  with their actor's private key; the receiver verifies it via the actor's public
  keys projection (§9.6). Used for: `POST /inbox`, and peer-to-peer outbox pulls
  when authentication is desired.
- **Bearer tokens** — for interactive clients (CLIs, web UIs, mobile apps).
  Issued via OAuth2 (or simple admin-issued tokens for v1). Used for:
  `POST /activity`, `GET /actors/<id>/inbox`, anything requiring caller identity.
- **Capability tokens** (§9.5) — for delegated publish. Token includes the granting
  actor, the granted capabilities (e.g. `publish: Pin for path-prefix /docs/`), the
  bearer's actor, expiry, and signature from the granter. Used for: child actors,
  service accounts, temporary publish access.

Public reads (most GET endpoints to public-audience activities) require no auth.
Private/followers-only reads check the caller's identity against the audience.

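In the Cavage draft, the material that gets signed is a "signing string". It has one `name: value` line per listed header, with the pseudo-header `(request-target)` carrying the lower-cased method and path, and the body is bound in through a `Digest` header. This stdlib-only sketch builds the `Digest` value and the signing string for a `POST` to an inbox. The RSA or Ed25519 signature over these bytes is left out. The header list shown is the one commonly used by fediverse servers, which is an assumption rather than something this design mandates.

```python
import base64
import hashlib


def digest_header(body: bytes) -> str:
    """Value for the Digest header: base64 of the body's SHA-256 hash."""
    return "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def signing_string(method: str, path: str, headers: dict, signed: list) -> str:
    """Build the draft-cavage signing string from the headers named in `signed`."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    lines = []
    for name in signed:
        if name == "(request-target)":
            lines.append(f"(request-target): {method.lower()} {path}")
        else:
            lines.append(f"{name}: {lowered[name]}")
    return "\n".join(lines)


body = b'{"type":"Follow"}'
headers = {
    "Host": "next.rose-ash.com",
    "Date": "Thu, 15 Jan 2026 12:00:00 GMT",
    "Digest": digest_header(body),
}
to_sign = signing_string("POST", "/actors/giles/inbox", headers,
                         ["(request-target)", "host", "date", "digest"])
# The actor's private key signs `to_sign`; the result is sent in a Signature header
# listing keyId, headers="(request-target) host date digest", and signature.
```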
### 16.3 Content negotiation

Same resource, multiple representations. The `Accept` header dispatches:

| Accept header | Returns |
|---------------|---------|
| `application/activity+json` | AP-standard JSON-LD (default for ambiguous Accepts) |
| `application/ld+json; profile="..."` | JSON-LD with explicit profile |
| `application/cbor` | dag-cbor |
| `application/json` | Plain JSON (compact, no `@context` expansion) |
| `application/sx` | Canonical SX wire format |
| `text/html` | HTML representation (for browsers — renders the artifact via SX) |

The same negotiation applies to `/artifacts/<cid>`, `/activities/<uuid>`, and
`/projections/<name>`. Servers MUST honour the `Accept` header; an absent `Accept`
defaults to `application/activity+json`.

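Here is a simplified sketch of the dispatch in the table. It parses `Accept`, orders media ranges by `q`, and returns the first type the server supports. It falls back to `application/activity+json` when `Accept` is absent or matches only `*/*`, and returns `None` (a 406 candidate) when nothing fits. It deliberately ignores parameters such as `profile` and the specificity tie-breaking rules, so it approximates HTTP content negotiation (RFC 9110) rather than implementing it fully. The names `negotiate` and `SUPPORTED` are hypothetical.

```python
from typing import Optional

SUPPORTED = [
    "application/activity+json",
    "application/ld+json",
    "application/cbor",
    "application/json",
    "application/sx",
    "text/html",
]
DEFAULT = "application/activity+json"


def negotiate(accept: Optional[str]) -> Optional[str]:
    """Pick a representation for an Accept header; None means 406 Not Acceptable."""
    if not accept:
        return DEFAULT
    ranges = []
    for order, part in enumerate(accept.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media and q > 0:
            ranges.append((-q, order, media))  # highest q first, then header order
    for _, _, media in sorted(ranges):
        if media == "*/*":
            return DEFAULT
        if media in SUPPORTED:
            return media
        if media.endswith("/*"):  # e.g. text/* or application/*
            prefix = media[:-1]
            match = next((s for s in SUPPORTED if s.startswith(prefix)), None)
            if match:
                return match
    return None
```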
### 16.4 Pagination

Cursor-based via AP's `OrderedCollectionPage`:

```
GET /actors/giles/outbox
→ {
    "type": "OrderedCollection",
    "totalItems": 12345,
    "first": "/actors/giles/outbox?page=true",
    "last": "/actors/giles/outbox?page=true&min_id=0"
  }

GET /actors/giles/outbox?page=true
→ {
    "type": "OrderedCollectionPage",
    "id": "...?page=true",
    "next": "...?page=true&max_id=<cid>",
    "prev": "...?page=true&min_id=<cid>",
    "orderedItems": [...]
  }
```

Cursors are CIDs of the boundary activity (not opaque tokens), so they are stable
across restarts and instances. `max_id` returns activities **before** the cursor
(newest first); `min_id` returns activities **after** the cursor.

Default page size: 50. Max: 1000. A `Link: <...>; rel="next"` header is also
provided for HTTP-native pagination.

For projections: same shape, items are projection entries.

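This is a sketch of CID-cursor paging over an in-memory outbox held newest-first. It assumes that `max_id` returns items strictly older than the cursor, and that `min_id` returns the items strictly newer and nearest to it (still newest-first within the page). It treats the `min_id=0` in the `last` link as a sentinel meaning "older than everything", which is an assumption. Activities are plain dicts with a `cid` key for illustration, and `page` is a hypothetical helper. A real server would seek through an index rather than scan the list.

```python
from typing import Optional

DEFAULT_PAGE, MAX_PAGE = 50, 1000


def page(outbox: list, max_id: Optional[str] = None, min_id: Optional[str] = None,
         limit: int = DEFAULT_PAGE) -> list:
    """Return a newest-first page of `outbox` (itself newest-first) around a CID cursor."""
    limit = max(1, min(limit, MAX_PAGE))
    position = {item["cid"]: i for i, item in enumerate(outbox)}
    if max_id is not None:
        start = position[max_id] + 1             # strictly older than the cursor
        return outbox[start:start + limit]
    if min_id is not None:
        # "0" is the sentinel used by the `last` link: older than every activity.
        end = len(outbox) if min_id == "0" else position[min_id]
        return outbox[max(0, end - limit):end]   # strictly newer, nearest to the cursor
    return outbox[:limit]                        # no cursor: the newest items
```

An unknown cursor raises `KeyError` in this sketch; a server would map that to an error response.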
### 16.5 The query API

`POST /query` takes an SX expression evaluated in pure mode against named
projections:

```sx
POST /query
Content-Type: application/sx
Accept: application/sx

(let ((actors (projection actor-state))
      (pins   (projection pin-state)))
  (for-each ([(actor-id actor) actors])
    (let ((n (count (filter (fn ((path cid)) (= (:owner cid) actor-id)) pins))))
      (when (> n 10)
        {:actor (:preferredUsername actor)
         :pins-published n}))))
```

Query semantics:

- Evaluated in pure sandbox; all the determinism rules apply.
- Projection access is read-only and snapshot-consistent: the query sees state
  as-of the time of the request (or `?at=` if specified).
- Result is serialized in the negotiated content type.
- Gas limit applies (default 1M units per query, tunable by operator).
- Cacheable: query CID + projection state CIDs uniquely determine the result.

Query results can themselves be published as `Create{QueryResult}` activities,
making derived analyses federable.

### 16.6 Errors

Uniform JSON error envelope:

```json
{
  "error": {
    "type": "https://next.rose-ash.com/ns/fed-sx/errors/v1#InvalidSignature",
    "status": 401,
    "title": "Activity signature invalid",
    "detail": "Key id 'https://example/actors/x#key-1' was superseded at 2026-01-15T...",
    "activity-id": "https://...",
    "key-id": "...#key-1",
    "instance": "/incidents/<incident-cid>"
  }
}
```

Error types are URIs in the fed-sx namespace; receivers can check `type` for
programmatic handling. Standard errors:

- `MissingCapability` — includes `missing` array of CIDs
- `SchemaViolation` — includes `schema-cid`, `field-path`, `expected`, `got`
- `InvalidSignature`
- `Quarantined` — includes `quarantine-id` for operator-status tracking
- `RateLimited` — includes `retry-after`
- `ResourceExhausted` — for query gas exhaustion

### 16.7 Streaming details

SSE event format:

```
event: activity
id: <activity-cid>
data: { ...activity envelope... }

event: delta
id: <activity-cid that triggered the delta>
data: {"projection": "actor-state", "key": "...", "old": ..., "new": ...}

event: heartbeat
data: {"projected-up-to": "<cid>", "ts": "..."}
```

Clients reconnect with `Last-Event-ID: <cid>` to resume from the last event seen.
Server replays from that point in the log (or returns 410 if too far behind, in
which case client should switch to paginated pull).

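This sketch covers the wire framing for the events above and the server-side resume decision. The framing follows the Server-Sent Events format: `field: value` lines, one `data:` line per payload line, and a blank line ending each event. The helper names and the assumption that the server keeps a list of retained log CIDs (oldest first) are hypothetical. A `Last-Event-ID` outside that list is treated as "too far behind", which is this sketch's reading of the 410 rule.

```python
import json
from typing import Optional


def sse_event(event: str, data, event_id: Optional[str] = None) -> str:
    """Frame one Server-Sent Event with a JSON payload."""
    lines = [f"event: {event}"]
    if event_id is not None:
        lines.append(f"id: {event_id}")          # clients echo this as Last-Event-ID
    for chunk in json.dumps(data).splitlines():  # one data: line per payload line
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"             # blank line terminates the event


def resume_from(log_cids: list, last_event_id: str) -> tuple:
    """Given retained log CIDs (oldest first), return (200, cids to replay) or (410, [])."""
    try:
        i = log_cids.index(last_event_id)
    except ValueError:
        return 410, []  # too far behind: client should switch to paginated pull
    return 200, log_cids[i + 1:]
```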
### 16.8 Versioning

The substrate is versioned at three levels:

- **Envelope version** — declared in `/.well-known/sx-capabilities`. Currently `1`.
  Forward-compatible: new fields are allowed; the semantics of existing fields are fixed.
- **API version** — the URL prefix is optional: `/v1/...` works the same as `/...`.
  A future major version would serve `/v2/...` paths in parallel.
- **Definition versions** — supersession via the activity log (§§9.2, 12.7). No special
  URL handling.

Capability negotiation happens before federation; clients shouldn't hard-code
URL paths beyond the canonical set documented here.

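Here is a sketch of the optional-prefix rule, assuming only v1 is served: both spellings resolve to the same canonical path. `route` and `SUPPORTED_MAJOR_VERSIONS` are illustrative names, not part of the API.

```python
SUPPORTED_MAJOR_VERSIONS = {1}  # assumption: only v1 is served today


def route(path):
    """Map a request path to (api_major_version, canonical_path).

    "/v1/outbox" and "/outbox" both resolve to (1, "/outbox"); a "/v2/..."
    path would be routed separately once a v2 API exists.
    """
    parts = path.split("/", 2)  # "/v1/outbox" -> ["", "v1", "outbox"]
    prefix = parts[1] if len(parts) > 1 else ""
    if len(prefix) > 1 and prefix[0] == "v" and prefix[1:].isdigit():
        major = int(prefix[1:])
        rest = "/" + (parts[2] if len(parts) > 2 else "")
    else:
        major, rest = 1, path  # unprefixed paths are served as v1
    if major not in SUPPORTED_MAJOR_VERSIONS:
        raise LookupError(f"API v{major} is not served by this instance")
    return major, rest
```
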
### 16.9 Operational implications

- **The API is small but layered.** AP compatibility is one layer; fed-sx
  extensions are another; both share auth and content negotiation. Adding a new
  endpoint shouldn't require new transport machinery.
- **Content negotiation is the polyglot bridge.** The same artifact is addressable as
  JSON-LD (for AP peers), dag-cbor (for fed-sx peers), SX (for SX clients), and HTML
  (for humans). One CID, four representations.
- **Cursor pagination is CID-based.** Stable identifiers, no opaque tokens to
  invalidate; peers can synchronize without coordination.
- **The query API is a load-bearing differentiator.** Datalog/GraphQL-equivalent
  expressiveness with no separate query language — it's just SX. Federable, signable,
  versionable like any other SX artifact.

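To make the CID-cursor point concrete, here is a minimal Python sketch over an in-memory list of CIDs. The in-memory list is an assumption; a real store would index the log rather than scan it. The cursor is simply the last CID the client saw.

```python
def page_after(log_cids, cursor=None, limit=20):
    """Return (page, next_cursor) over an append-only, ordered list of CIDs.

    The cursor is the CID of the last item already seen, so it stays valid
    as the log grows and no server-side token state is needed.
    """
    if limit < 1:
        raise ValueError("limit must be >= 1")
    if cursor is None:
        start = 0
    else:
        try:
            start = log_cids.index(cursor) + 1  # a real store would index CIDs
        except ValueError:
            raise KeyError(f"unknown cursor {cursor!r}") from None
    page = log_cids[start:start + limit]
    more = start + limit < len(log_cids)
    return page, (page[-1] if more else None)
```

Because the log is append-only, a cursor handed out yesterday still points at the same position today. This is the "no opaque tokens to invalidate" property.
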
---

## 17. Implementation languages

Polyglot **authoring**, monoglot **runtime**: every language-on-SX compiles to core
SX and runs on any host with the SX evaluator. The language is an authoring choice;
the federated artifact is uniform SX. Authors of `Define*` artifacts pick the
source language they prefer; consumers don't need that compiler installed to
execute the compiled SX.

Languages are picked because they **genuinely fit the problem**, not to demonstrate
the polyglot story. Where a chosen language has gaps (e.g. Erlang-on-SX missing hot
reload), we invest in maturing the port rather than working around the gap.

### 17.1 The v1 stack

| Layer | Language | Why |
|-------|----------|-----|
| **Native primitives** | OCaml (existing runtime) | Crypto (RSA, Ed25519, SHA), dag-cbor encode/decode, HTTP socket, file IO, SQLite. Surfaced as Erlang-on-SX BIFs. |
| **Kernel orchestration** | Erlang-on-SX | Actor model = federation. `gen_server` per actor / per projection / per peer. `supervisor` for delivery workers. Message passing is literally the substrate. Hot code reload (Phase 7) for `Define*` live extension. |
| **Query API back-end** | Datalog-on-SX | Projection state is relational; trust graph walks, provenance, projection joins are textbook Datalog. Already mature (276/276 tests, full core Datalog with stratified negation, aggregation, magic sets, federation-graph demo). |
| **`Define*` semantics, schemas, validators, codecs, audience predicates** | Core SX | The canonical federated language. Everything content-addressed and federated lives here. |

### 17.2 Languages explicitly **not** booked for v1

Available, mature, considered — would be reached for if a real fed-sx need surfaced,
but no preemptive use:

- **Haskell-on-SX** (285/285 tests, 36 programs, type checker working) — for complex
  operator-authored extensions that benefit from typed pattern matching. Schemas in
  fed-sx are short predicates; types don't earn their keep here.
- **Smalltalk-on-SX** (625/629 tests, classic corpus running) — natural fit for a
  live operator dashboard / Glamorous-Toolkit-style introspection. v2/v3 territory;
  a browser UI likely wins for operator audiences.
- **APL-on-SX** — high-throughput batch reprojection if scalar SX folds become a
  bottleneck. Premature without measured need.
- **JS-on-SX**, **Elm-on-SX** — browser-side client SDK / viewer. v2.
- **Common Lisp-on-SX**, **Forth-on-SX**, **Go-on-SX**, **Dream-on-SX**,
  **Elixir-on-SX**, **Erlang-on-SX (alternative form)** — case by case if a use
  case appears.

### 17.3 The FFI BIF layer

Erlang-on-SX has no FFI / NIF mechanism in its current form (Phase 6 plan: "out of
scope entirely"). fed-sx adds a **BIF layer** in `lib/erlang/transpile.sx` (or a
dedicated `lib/erlang/fed_bifs.sx`) exposing native primitives:

```
crypto:rsa_verify/3     crypto:ed25519_verify/3
crypto:sha2_256/1       crypto:sha3_256/1

cid:cbor_encode/1       cid:cbor_decode/1
cid:multihash/2         cid:from_bytes/2
cid:to_string/1         cid:from_string/1

log:append/2            log:read/3
log:tip/1               log:replay/3

http:listen/2           http:request/2
http:respond/3          http:sse_send/2

fs:read/1               fs:write/2
fs:exists/1             fs:list/1

sqlite:open/1           sqlite:exec/2
sqlite:query/3          sqlite:close/1

snapshot:put/3          snapshot:get/2
```

Each BIF is a thin Erlang-on-SX function dispatching to the corresponding SX runtime
IO primitive. BIFs return Erlang-shaped values (atoms, tuples, binaries); errors
raise the appropriate Erlang exceptions (`badarg`, `enoent`, `eacces`).

This is the **only** native-FFI surface in fed-sx: all I/O goes through these BIFs,
so operators can audit the BIF list to know exactly what the substrate touches
outside SX.

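Here is a Python model of the dispatch idea: a table keyed by `module:function/arity`, with host failures mapped to Erlang-style reasons. It is a sketch, not the Phase 8 design:

- `define_bif`, `call_bif` and `ErlangError` are hypothetical names.
- Erlang binaries are modelled as `bytes` and atoms as `str`.
- Only three BIFs from the list are shown.

```python
import hashlib
from pathlib import Path


class ErlangError(Exception):
    """Stand-in for an Erlang `error(Reason)` raised by a BIF."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason  # e.g. "badarg", "enoent", "eacces", "undef"


BIFS = {}  # (module, function, arity) -> host callable


def define_bif(module, name, arity):
    """Hypothetical registration hook, loosely modelled on Phase 8's define-bif."""
    def register(fn):
        BIFS[(module, name, arity)] = fn
        return fn
    return register


@define_bif("crypto", "sha2_256", 1)
def _sha2_256(data):
    if not isinstance(data, bytes):  # Erlang binaries are modelled as bytes
        raise ErlangError("badarg")
    return hashlib.sha256(data).digest()


@define_bif("crypto", "sha3_256", 1)
def _sha3_256(data):
    if not isinstance(data, bytes):
        raise ErlangError("badarg")
    return hashlib.sha3_256(data).digest()


@define_bif("fs", "read", 1)
def _fs_read(path):
    try:
        return ("ok", Path(path).read_bytes())  # atoms are modelled as str
    except FileNotFoundError:
        raise ErlangError("enoent") from None
    except PermissionError:
        raise ErlangError("eacces") from None


def call_bif(module, name, *args):
    """Dispatch module:name/arity; unknown entries raise `undef`, as in Erlang."""
    fn = BIFS.get((module, name, len(args)))
    if fn is None:
        raise ErlangError("undef")
    return fn(*args)
```

Keeping every native entry point in one table is what makes the "audit the BIF list" property cheap: the table *is* the list.
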
### 17.4 Build pipeline

```
.sx files  (core SX, registry entries) ──┐
.erl files (Erlang-on-SX kernel)       ──┼──> compile to core SX
.dl files  (Datalog-on-SX queries)     ──┘            │
                                                      ▼
                                       content-addressed SX artifacts
                                                      │
                                                      ▼
                                        genesis bundle (CID-verified)
                                                      │
                                                      ▼
                                     OCaml runtime evaluates everything
```

Each authoring language's compiler runs at build time, producing core SX that goes
into the genesis bundle (for bootstrap definitions) or gets published as activities
(for runtime extensions).

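The content-addressing and CID-verification steps can be sketched in Python. The sketch computes a CIDv1 over bytes: version 1, the dag-cbor multicodec `0x71`, a sha2-256 multihash, and base32 multibase. It assumes the bytes are already the dag-cbor encoding of the parsed SX AST; the encoder itself is out of scope. The `build_bundle` and `verify_bundle` helpers are illustrative.

```python
import base64
import hashlib

DAG_CBOR = 0x71  # multicodec code for dag-cbor
SHA2_256 = 0x12  # multihash function code for sha2-256


def cid_v1(encoded, codec=DAG_CBOR):
    """CIDv1 over already-encoded bytes: <version><codec><multihash>, base32.

    Every code used here is below 0x80, so each unsigned varint is one byte.
    """
    if codec >= 0x80:
        raise ValueError("this sketch only handles single-byte varint codecs")
    digest = hashlib.sha256(encoded).digest()
    raw = bytes([0x01, codec, SHA2_256, len(digest)]) + digest
    # Multibase 'b' = RFC 4648 base32, lowercase, no padding.
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


def build_bundle(artifacts):
    """Key each compiled artifact (already dag-cbor bytes) by its CID."""
    return {cid_v1(blob): blob for blob in artifacts}


def verify_bundle(bundle):
    """Return the CIDs whose bytes no longer hash to their key."""
    return [cid for cid, blob in bundle.items() if cid_v1(blob) != cid]
```

With these parameters every CID begins with `bafyrei`, which is the familiar prefix for dag-cbor + sha2-256 CIDs. Any byte-level change to an artifact changes its key, so the bundle is self-verifying.
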
### 17.5 Prerequisite work

Three pieces of investment land in or alongside the Erlang-on-SX loop. The first two
land **before** fed-sx kernel code starts. The third runs in parallel: it does not
block milestone 1, but it does block production-grade throughput.

1. **Phase 7 — hot code reload.** `code:load_binary/3`, `gen_server`
   `code_change/3` callback dispatch, atomic module-version swap. Required for
   `Define*` live extension (no kernel restart to load new verbs). The reload
   semantics (two-version coexistence vs. single-version atomic swap with closure
   capture) will be decided during the work.

2. **Phase 8 — FFI mechanism + initial BIFs.** `define-bif` registration, term
   marshalling and error mapping, then BIFs for `crypto:*`, `cid:*` (dag-cbor),
   `fs:*`, `http:*`, `sqlite:*`. Required for the fed-sx kernel to call native
   primitives; lands before any kernel code that calls them.

3. **Phase 9 — specialized opcodes (the BEAM analog).** *Layered perf strategy:*
   - **Layer 1 (Phase 9, in scope)** — specialized bytecode opcodes that bypass
     the general-purpose CEK machine for hot Erlang operations: `OP_PATTERN_TUPLE`,
     `OP_PERFORM`/`OP_HANDLE`, `OP_RECEIVE_SCAN`, `OP_SPAWN`/`OP_SEND`, and a BIF
     dispatch table. Targets: 100k+ message hops/sec and a 1M-process spawn in under
     30 sec, roughly a 1000-3000× speedup over the current general-purpose path.
   - **Layer 2 (Phase 10, deferred)** — multi-core scheduler via OCaml 5
     domains. Decided empirically after Layer 1 lands; likely unnecessary if
     Layer 1 alone hits target throughput.
   - **Layer 3 (skipped)** — incremental tuning of the existing call/cc-based
     receive and env-copy-per-call machinery. Obsoleted by Layer 1; not pursued.

   **Architectural note for Phase 9.** Phase 9a (the **opcode extension
   mechanism in `hosts/ocaml/evaluator/`**) is out of scope for the Erlang loop:
   it is SX VM core, used by every language port that wants specialized
   opcodes. It is designed in `plans/sx-vm-opcode-extension.md` and lands as a
   separate focused workstream (~1-2 weeks) owning `hosts/`. Phases 9b-9g (the
   actual Erlang opcodes in `lib/erlang/vm/`) are designed and tested against a
   stub dispatcher in the Erlang loop until 9a is available.

   **Shared-opcode discipline.** Opcodes Phase 9 produces that other language
   ports could plausibly use (pattern match, perform/handle, record access)
   become candidates for chiselling out to **`lib/guest/vm/`**. This is the same
   lib/guest discipline, applied at the bytecode layer. Don't pre-extract; promote
   to `lib/guest/vm/` when a second language port has an actual second use. The
   substrate accumulates a richer opcode surface over time as ports contribute,
   and every port benefits from every shared opcode (the structural advantage
   over BEAM, which is special-purpose-built for one language).

   **fed-sx is not blocked by Phase 9.** Milestone 1 ships on current
   Erlang-on-SX perf (which has 100-1000× headroom for a single demo instance).
   Phase 9 lands in parallel; by the time fed-sx needs production-grade throughput
   (federation hub use cases, milestones 2-3), Phase 9 will be ready.

After Phases 7 and 8 land, fed-sx milestone 1 (kernel + registries + bootstrap
entries + Pin smoke test + reactive application smoke test) becomes the next
workstream. Phase 9 work continues in parallel.

---

## 18. Subscription model

Symmetric to the publish-side extensibility: just as `DefineActivity` registers what
*kinds of things can be published*, `DefineSubscription` registers what *kinds of
patterns can be subscribed to*. `Follow` becomes one standard subscription type
among many, not a hardcoded primitive.

### 18.1 The asymmetry being fixed

Without this, the substrate has rich publish-side extensibility (any new verb is a
`DefineActivity`) and *one* hardcoded subscription primitive (`Follow`). That
mirrors AP, but it's an arbitrary limitation in a substrate where everything else
is registry-driven. Generalising restores symmetry.

### 18.2 The `DefineSubscription` shape

```sx
(activity 'Create
  :object {:type "DefineSubscription"
           :name "Follow"                      ; AP-standard
           :schema (fn (sub)                   ; what params the sub takes
                     (and (cid? (-> sub :object))
                          (= "Person" (-> sub :object-type))))
           :match (fn (subscription activity)  ; pure-mode predicate
                    (= (-> subscription :object) (:actor activity)))
           :delivery {:default :push
                      :modes [:push :pull :sse]
                      :digest-window nil}
           :capabilities-required []})         ; some subs may need authority
```

Four mandatory parts:

- **`schema`** — pure-mode predicate validating subscription parameters at
  `Subscribe` time. Catches malformed subscriptions before they enter state.
- **`match`** — pure-mode predicate `(subscription, activity) → bool`. Decides
  whether a given activity is a hit for this subscription. Determinism rules
  apply (§11.2).
- **`delivery`** — supported modes (push to inbox / pull on demand / SSE
  streaming / batched digest). The subscription instance picks its preferred
  mode at `Subscribe` time from the supported set.
- **`capabilities-required`** — capability tokens the subscriber must hold
  (empty for public subs; populated for paywalled/gated/private streams).

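For readers less familiar with SX, this Python sketch models the same four-part shape as a registry entry. It shows where `schema`, `delivery` and `capabilities-required` are checked at `Subscribe` time. The class and function names are hypothetical. The SX `cid?` check is approximated by a string check so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SubscriptionType:
    """Python stand-in for a registered DefineSubscription artifact."""
    name: str
    schema: Callable[[dict], bool]       # validates params at Subscribe time
    match: Callable[[dict, dict], bool]  # (subscription, activity) -> bool
    modes: tuple = ("push",)
    default_mode: str = "push"
    capabilities_required: frozenset = frozenset()


REGISTRY = {}


def define_subscription(sub_type):
    REGISTRY[sub_type.name] = sub_type


def subscribe(params, mode=None, held=frozenset()):
    """Validate a Subscribe request against its registered type."""
    sub_type = REGISTRY.get(params.get("type"))
    if sub_type is None:
        raise LookupError(f"unregistered subscription type {params.get('type')!r}")
    if not sub_type.schema(params):
        raise ValueError("subscription parameters rejected by schema")
    mode = mode or sub_type.default_mode
    if mode not in sub_type.modes:
        raise ValueError(f"mode {mode!r} not offered by {sub_type.name}")
    if not sub_type.capabilities_required <= held:
        raise PermissionError("missing required capabilities")
    return {**params, "mode": mode}


# Follow, mirroring the SX artifact above (cid? approximated by a str check).
define_subscription(SubscriptionType(
    name="Follow",
    schema=lambda s: isinstance(s.get("object"), str) and s.get("object-type") == "Person",
    match=lambda sub, act: sub["object"] == act.get("actor"),
    modes=("push", "pull", "sse"),
))
```
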
### 18.3 The `Subscribe` verb

The bootstrap verb that activates a subscription:

```sx
(activity 'Subscribe
  :object {:type "Follow" :object "https://alice.example/actors/alice"})

(activity 'Subscribe
  :object {:type "Topic" :tag "climate-change"
           :delivery :digest :digest-window "P1D"})

(activity 'Subscribe
  :object {:type "CidWatch" :cid "bafy..."
           :events [:supersede :endorse]})

(activity 'Subscribe
  :object {:type "Predicate"
           :pred '(fn (act) (and (= (:type act) "Note")
                                 (string-contains? (-> act :object :content) "fed-sx")))})
```

`Unsubscribe` is `Undo{Subscribe}`: AP's standard pattern, which retains the audit
trail.

### 18.4 Standard subscription types (defined later, not bootstrap)

These have the same status as the custom verbs in §6.2: the substrate accepts any
subscription type once a `DefineSubscription` artifact registers it. Standard set:

| Name | Params | Match semantics | Use case |
|------|--------|-----------------|----------|
| **`Follow`** | `{object: actor-id}` | activity.actor == subscription.object | AP-standard actor following |
| **`Topic`** | `{tag: string}` | tag in activity.object.tags | Hashtag follows, RSS-like |
| **`CidWatch`** | `{cid, events: [...]}` | activity references cid AND activity.type in events | "Notify me when this artifact is updated/endorsed/forked" |
| **`PathWatch`** | `{path, events: [...]}` | activity is a Pin/Update of named path | "Notify me when domain:foo/bar/baz changes" |
| **`VerbFilter`** | `{wraps: subscription-cid, types: [...]}` | inner subscription matches AND activity.type in types | "Follow Alice but only Endorse activities" |
| **`TrustGraph`** | `{root: actor-id, depth: int}` | activity.actor reachable from root in the trust graph within depth | Web-of-trust expansion |
| **`Predicate`** | `{pred: sx-fn}` | (pred activity) returns truthy | Escape hatch — most powerful, highest cost |
| **`Channel`** | `{channel-id}` | activity addresses or originates from channel | Multi-actor pooled streams |

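Several of the match semantics in the table can be written as small pure predicates. This Python sketch makes the following assumptions:

- Activities are plain dicts.
- `CidWatch` event keywords compare case-insensitively with activity types.
- The trust graph is an adjacency map.

`TrustGraph` is implemented as a bounded breadth-first walk.

```python
from collections import deque


def match_follow(sub, act):
    return act.get("actor") == sub["object"]


def match_topic(sub, act):
    obj = act.get("object")
    tags = obj.get("tags", []) if isinstance(obj, dict) else []
    return sub["tag"] in tags


def match_cid_watch(sub, act):
    # Assumption: "references cid" means the object is that CID, or a map whose id is.
    obj = act.get("object")
    ref = obj if isinstance(obj, str) else (obj or {}).get("id")
    wanted = {e.lower() for e in sub["events"]}  # :supersede vs "Supersede"
    return ref == sub["cid"] and act.get("type", "").lower() in wanted


def match_verb_filter(sub, act, resolve):
    # resolve: subscription-cid -> (inner_subscription, inner_match_fn)
    inner_sub, inner_match = resolve(sub["wraps"])
    return act.get("type") in sub["types"] and inner_match(inner_sub, act)


def trust_reachable(edges, root, depth):
    """Breadth-first walk: every actor within `depth` trust hops of `root`."""
    seen, frontier = {root}, deque([(root, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def match_trust_graph(sub, act, edges):
    return act.get("actor") in trust_reachable(edges, sub["root"], sub["depth"])
```

`match_verb_filter` checks the cheap type test before delegating to the wrapped subscription. This ordering matters more for costly inner types such as `Predicate`.
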
### 18.5 Match-fn execution location

The load-bearing question. Of the three possible placements (publisher side,
subscriber side, or both), fed-sx adopts the **hybrid model**:

- **Coarse filter on the publisher side** — audience predicates (§8) decide who
  the activity is delivered to at all. This is mandatory and cheap (the audience
  set is usually small and well-defined).
- **Fine filter on the subscriber side** — once an activity arrives in the inbox,
  the subscriber's instance evaluates each active subscription's `match-fn`
  against it. Pure-mode evaluation (deterministic, gas-bounded). Activities
  matching one or more subscriptions enter the subscriber's projected state.

Why hybrid: publisher-side fine filtering would require the publisher to know
every subscriber's match-fn (privacy-violating, scaling-killing). Subscriber-side
filtering is wasteful only if the publisher's audience model is too coarse,
which is the audience system's job to fix per §8.

### 18.6 Subscription state and storage

Active subscriptions are themselves projected state. A bootstrap projection
`subscriptions` (paralleling `audience-graph` for the inverse direction)
maintains:

```
{actor-id -> [{subscription-cid, type, params, mode, started-at}]}
```

Updated by `Subscribe` and `Unsubscribe` activities. Queryable like any other
projection (§16). Used by:

- The inbox dispatcher to know which match-fns to evaluate against incoming
  activities
- Triggers (§19) to know which activities to fire on
- Federation to advertise "here are the subscription types I currently subscribe
  to" (capability-style, opt-in)

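Because `subscriptions` is an ordinary projection, it is a pure fold over the log. The following Python sketch makes these assumptions:

- `Undo.object` holds the CID of the `Subscribe` it cancels.
- Only the original subscriber may undo it.
- The delivery mode defaults to `push` when none is given.

```python
def fold_subscriptions(log):
    """Pure fold: activity log -> {actor-id: [active subscription records]}.

    Assumptions: activities are dicts with id/type/actor/published; Undo.object
    is the CID of the Subscribe it cancels; only the subscriber can undo it.
    """
    state = {}
    for act in log:
        if act["type"] == "Subscribe":
            spec = act["object"]
            state.setdefault(act["actor"], []).append({
                "subscription-cid": act["id"],
                "type": spec["type"],
                "params": {k: v for k, v in spec.items() if k not in ("type", "delivery")},
                "mode": spec.get("delivery", "push"),  # assumed default mode
                "started-at": act["published"],
            })
        elif act["type"] == "Undo" and act["actor"] in state:
            kept = [s for s in state[act["actor"]]
                    if s["subscription-cid"] != act["object"]]
            if kept:
                state[act["actor"]] = kept
            else:
                del state[act["actor"]]
    return state
```

Replaying the same log always yields the same state, which is what lets the inbox dispatcher and triggers rely on it after a restart.
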
### 18.7 Federation interactions

Subscriptions interact with federation in three ways:

- **Discovery.** A peer's `/.well-known/sx-capabilities` (§7) lists registered
  `DefineSubscription` CIDs, so subscribers know what they can ask for.
- **Negotiation.** A `Subscribe` activity carries `capabilities-required`; if
  the publisher's instance doesn't support the named subscription type, it
  responds with the standard 422 + missing-CIDs error (§14.2 #9). The subscriber
  can then deliver the bootstrapping `DefineSubscription` artifact and retry.
- **Cross-instance match-fn.** If subscriber and publisher both run the same
  conformance-tested SX evaluator, identical subscriptions match identically
  (cross-host equivalence, §11.8). This is what makes federated topic
  subscriptions reliable: every conforming instance computes the same
  set-of-matches for the same activity.

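The negotiation bullet describes a two-round exchange. This Python sketch simulates it against an in-memory `FakePeer`. Two details are our own assumptions: a `Subscribe` names its type's `DefineSubscription` CID in a `definition` field, and the class and function names are hypothetical.

```python
class FakePeer:
    """Hypothetical in-memory publisher instance, for illustration only."""

    def __init__(self, known_definitions=()):
        self.known = set(known_definitions)  # DefineSubscription CIDs registered here
        self.subscribers = []

    def receive(self, activity):
        obj = activity["object"]
        if activity["type"] == "Create" and obj.get("type") == "DefineSubscription":
            self.known.add(activity["id"])
            return 201, {}
        if activity["type"] == "Subscribe":
            definition = obj["definition"]  # assumption: Subscribe names its type's CID
            if definition not in self.known:
                return 422, {"error": {"type": "...#MissingCapability",
                                       "missing": [definition]}}
            self.subscribers.append(activity)
            return 202, {}
        return 400, {}


def subscribe_with_bootstrap(peer, subscribe_act, local_definitions, max_rounds=2):
    """Send Subscribe; on 422, deliver the missing definitions and retry."""
    status = None
    for _ in range(max_rounds):
        status, body = peer.receive(subscribe_act)
        if status != 422:
            return status
        for cid in body["error"]["missing"]:
            if cid not in local_definitions:
                raise LookupError(f"cannot supply definition {cid}")
            peer.receive(local_definitions[cid])
    return status
```

Bounding the retries with `max_rounds` keeps a misbehaving peer from turning negotiation into a loop.
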
### 18.8 Operational implications

- **The audience system handles "who do I send this to."** The subscription
  system handles "what do I want to receive." They're complementary, not
  redundant.
- **Subscription types can themselves evolve via supersession.** New version of
  `Topic` with case-insensitive matching? Publish a new `DefineSubscription`,
  `Supersede` the old one. Existing subscriptions migrate at next match
  evaluation.
- **Match-fn cost matters.** A `Predicate` subscription with a slow predicate
  becomes a per-activity tax. Gas budgets (§11.5) bound the worst case;
  operators can disable expensive subscription types if needed.
- **Subscriptions are signed messages.** Audit, accountability, and revocation
  all work the same way as activities — because subscriptions *are* activities.

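Migration "at next match evaluation" follows naturally if the match step resolves the subscription's definition through the supersession chain every time. This Python sketch uses a hypothetical case-sensitive `topic-v1` superseded by a case-insensitive `topic-v2`. The `superseded_by` map (old CID to new CID) is an assumed representation of `Supersede` activities.

```python
def latest_definition(cid, superseded_by):
    """Follow Supersede links (old CID -> new CID) to the current definition."""
    seen = set()
    while cid in superseded_by:
        if cid in seen:
            raise ValueError(f"supersession cycle at {cid}")
        seen.add(cid)
        cid = superseded_by[cid]
    return cid


def evaluate(subscription, activity, match_fns, superseded_by):
    """Match using whichever definition is current now, not at Subscribe time."""
    current = latest_definition(subscription["definition"], superseded_by)
    return match_fns[current](subscription, activity)


match_fns = {
    "topic-v1": lambda s, a: s["tag"] in a["tags"],
    "topic-v2": lambda s, a: s["tag"].lower() in {t.lower() for t in a["tags"]},
}
```

No stored subscription is rewritten: the same `{"definition": "topic-v1", ...}` record matches with v2 semantics as soon as the `Supersede` lands.
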
---

## 19. Application model

The synthesis. With publish, subscribe, project, and trigger as registry-driven
primitives, the substrate has everything needed to express **distributed reactive
applications** as data — no native code, no kernel changes, no privileged
runtime. Applications are themselves federated artifacts.

### 19.1 An application is a tuple of artifacts

```
Application = {
  subscriptions : [DefineSubscription instances and their parameters],
  triggers      : [DefineTrigger registrations],
  projections   : [DefineProjection registrations],
  storage       : [DefineStorage registrations]  (optional)
}
```

That tuple, signed and bundled, is the application. Installing one means activating
the named subscriptions (including following the named actors) and loading the
`Define*` CIDs into the local registry. Forking one means republishing the `Define*`
artifacts with `Supersede` over the parts you change.

### 19.2 The reactive loop

```
   External actors              Operator publishes activities
   publish activities           via this instance's actors
          │                                │
          ▼                                ▼
   ┌─────────────────────────────────────────────┐
   │        Inbound + outbound activities        │
   └────────────────────┬────────────────────────┘
                        │
                        ▼
          For each active subscription:
          evaluate match-fn (pure mode)
                        │
          ┌─────────────┴─────────────┐
          ▼                           ▼
   Activity matches            Activity does
   a subscription              not match
          │                           │
          ▼                           ▼
   Projections fold            (silently dropped from
   the activity                this application's view;
          │                    may match other apps)
          ▼
   Triggers fire on the
   subscription's match
          │
          ▼
   Trigger then-sx runs
   (effectful sandbox)
          │
          ├──> updates local state (private projections)
          ├──> publishes new activity (via outbox)
          └──> calls effectful primitives (HTTP, fs, etc.)
               per declared capabilities
```

Three things happen on a match: **state updates** (projection), **derived
publishes** (new activities), **side effects** (effectful primitives). Each is
authorisation-gated by the trigger's declared capabilities.

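A single-activity pass through the loop might look like the Python sketch below. The data shapes are assumptions. Capability gating is simplified to all-or-nothing per trigger; the loop as drawn gates publish and effect separately.

```python
def react(activity, subscriptions, projections, triggers, granted):
    """One pass of the reactive loop for a single activity.

    subscriptions: {sub_id: (params, match_fn)}
    projections:   {name: {"state": ..., "fold": fn(state, act), "subs": set_of_sub_ids}}
    triggers:      [(sub_id, then_fn, required_capabilities)]
    granted:       capabilities granted to this application
    Returns (activities_to_publish, effect_requests, denied_trigger_indexes).
    """
    matched = {sid for sid, (params, match) in subscriptions.items()
               if match(params, activity)}
    if not matched:
        return [], [], []  # dropped from this application's view
    # State updates: every projection fed by a matching subscription folds it in.
    for proj in projections.values():
        if proj["subs"] & matched:
            proj["state"] = proj["fold"](proj["state"], activity)
    publish, effects, denied = [], [], []
    for index, (sid, then_fn, required) in enumerate(triggers):
        if sid not in matched:
            continue
        if not required <= granted:
            denied.append(index)  # authorisation gate: the trigger does not run
            continue
        result = then_fn(activity)
        for name, update in result.get("project", {}).items():
            projections[name]["state"] = update(projections[name]["state"])
        publish.extend(result.get("publish", []))  # derived publishes
        effects.extend(result.get("effect", []))   # side-effect requests
    return publish, effects, denied
```

The function only *returns* publish and effect requests rather than performing them. This keeps the matching and folding steps pure and leaves durable dispatch to the caller.
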
### 19.3 Trigger semantics

`DefineTrigger` registers `(when-subscription, then-sx, cascade-limit)`:

- **`when-subscription`** — references a subscription (by CID or by name). The
  trigger fires whenever that subscription matches an inbound or outbound
  activity. Multiple triggers can reference the same subscription.
- **`then-sx`** — function of `(activity, subscription, env) → trigger-result`.
  Runs in the pure or effectful sandbox, per its declaration. Returns one or more of:
  - `:publish [activity-spec ...]` — request publish of derived activities
  - `:project [name → state-update ...]` — request projection updates
  - `:effect [capability-call ...]` — request effectful primitive calls
  - `:noop` — observed but no action
- **`cascade-limit`** — bounded depth for trigger cascades (§19.4).

A trigger is fundamentally **a reactive rule**: "when X happens, do Y." The
substrate guarantees three things:

- Y runs at most once per X (deduplicated by activity CID).
- Delivery from the trigger to its effects is durable, so each Y takes effect
  exactly once per instance.
- Cost is bounded (gas + cascade-limit).

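The at-most-once guarantee comes down to remembering which `(trigger, activity CID)` pairs have fired. In this Python sketch the memory is an in-process set and the outbox is a list. A real instance would persist both in one durable write, which is what turns at-most-once into exactly-once. The class name and dict-shaped results are assumptions.

```python
RESULT_KEYS = {"publish", "project", "effect", "noop"}


class TriggerRunner:
    """At-most-once firing per (trigger, activity CID).

    `fired` and `outbox` live in memory here; a real instance would persist
    them together so a crash cannot record a firing without its effects.
    """

    def __init__(self):
        self.fired = set()
        self.outbox = []

    def fire(self, trigger_id, then_sx, activity, subscription, env):
        key = (trigger_id, activity["id"])
        if key in self.fired:
            return False  # duplicate delivery of the same activity: ignore
        result = then_sx(activity, subscription, env)
        unknown = set(result) - RESULT_KEYS
        if unknown:
            raise ValueError(f"unknown trigger-result keys: {sorted(unknown)}")
        # Record the firing and enqueue its effects together (one transaction
        # in a durable store); a failed run records nothing and may be retried.
        self.fired.add(key)
        self.outbox.append((key, result))
        return True
```

Deduplication is per trigger, not per activity: two different triggers on the same subscription each fire once for the same activity.
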
### 19.4 Cascade control

A trigger that publishes activities can fire other triggers. Without limits, a
single inbound activity could cascade across instances forever.

Each trigger declares `cascade-limit: N` (default 3). Each activity carries an
implicit `cascade-depth` field, incremented when it's the result of a trigger
firing. A trigger refuses to fire if `cascade-depth > cascade-limit`.

Cascade limits are local-only (operator policy, not federated). Defending
against runaway cascades from peer instances is the operator's job; the
substrate gives them the knob.

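The depth rule can be simulated directly. This Python sketch treats activities without a `cascade-depth` field as depth 0, which is an assumption. It models the worst case: a trigger that matches its own output.

```python
DEFAULT_CASCADE_LIMIT = 3


def may_fire(activity, cascade_limit=DEFAULT_CASCADE_LIMIT):
    """A trigger refuses to fire if cascade-depth > cascade-limit."""
    # Activities not produced by a trigger carry an implicit depth of 0.
    return activity.get("cascade-depth", 0) <= cascade_limit


def derive(parent, spec):
    """Stamp a trigger-published activity one level deeper than its cause."""
    return {**spec, "cascade-depth": parent.get("cascade-depth", 0) + 1}


def run_chain(seed, cascade_limit=DEFAULT_CASCADE_LIMIT):
    """Worst case: a trigger that matches its own output, one publish per firing."""
    fired, current = 0, seed
    while may_fire(current, cascade_limit):
        fired += 1
        current = derive(current, {"type": "Echo"})
    return fired, current.get("cascade-depth", 0)
```

The rule refuses only when `cascade-depth > cascade-limit`. As a result, a chain seeded by a depth-0 activity fires `cascade-limit + 1` times: four times with the default of 3, leaving a final activity at depth 4 that nothing reacts to.
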
### 19.5 The `DefineApplication` bundle

A bundle artifact that names and groups the components of an application:

```sx
(activity 'Create
  :object {:type "DefineApplication"
           :name "rose-ash-blog"
           :version 1
           :subscriptions [{:type "Follow" :object "https://blog.rose-ash.com/actors/main"}
                           {:type "Topic" :tag "rose-ash"}
                           {:type "CidWatch" :cid <rose-ash-template-cid>
                            :events [:supersede]}]
           :triggers [<comment-moderation-trigger-cid>
                      <reaction-counter-trigger-cid>
                      <rss-republish-trigger-cid>]
           :projections [<comment-thread-projection-cid>
                         <reaction-counts-projection-cid>]
           :storage [<local-files-storage-cid>]
           :capabilities [<http-allowlist-cap-cid>
                          <fs-write-cap-cid>]
           :description "Federated blog with moderated comments and RSS"})
```

Three operations on applications, all themselves activities:

- **Install** — `Subscribe` to each subscription, and `Create{}` a `define-registry`
  reference to each trigger, projection, and storage CID. One activity per
  reference, audited and replayable. Alternatively, a single
  `Install{DefineApplication}` meta-verb performs the whole bundle in one signed
  step (defined later as a custom verb, not bootstrap).
- **Update** — publish a new `DefineApplication` with the same name and
  `supersedes` pointing at the old one. Diff-then-apply: subscriptions
  added/removed, triggers loaded/unloaded, projections reprojected per §10.5.
- **Fork** — publish a new `DefineApplication` referencing the original's CID
  via `forked-from`, with whatever `Define*` CIDs you want to swap. Run it
  alongside the original or in place of it.

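The Update step's diff-then-apply can be sketched as a set difference per component. This Python sketch compares two bundles represented as dicts (an assumption) and returns what to add and remove. Applying the plan (subscribing, loading triggers, reprojecting) and checking the `supersedes` link are left out.

```python
import json

COMPONENTS = ("subscriptions", "triggers", "projections", "storage", "capabilities")


def _key(item):
    # Subscriptions are parameter maps; the other components are CID strings.
    return json.dumps(item, sort_keys=True)


def diff_applications(old, new):
    """Diff-then-apply plan for an Update of a DefineApplication bundle."""
    if new.get("name") != old.get("name"):
        raise ValueError("an update must keep the application name")
    plan = {}
    for part in COMPONENTS:
        before = {_key(x): x for x in old.get(part, [])}
        after = {_key(x): x for x in new.get(part, [])}
        plan[part] = {
            "add": [x for k, x in after.items() if k not in before],
            "remove": [x for k, x in before.items() if k not in after],
        }
    return plan
```

Keying subscriptions by a canonical JSON form means a subscription whose parameters are unchanged is neither removed nor re-added, so its state carries across the update.
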
### 19.6 Per-application namespacing

Multiple applications running on one instance need isolation:

- **Projections are namespaced by application.** `pin-state` from app A is
  distinct from `pin-state` from app B — both addressable as
  `/projections/<app-name>/pin-state`.
- **Triggers fire only on subscriptions belonging to their application.** App
  A's trigger doesn't see app B's subscription matches.
- **Storage backends are namespaced.** App A's `files-on-disk` backend writes
  to `data/apps/A/objects/`; app B writes to `data/apps/B/objects/`.
- **Capabilities are per-application.** Granting `http-client` to app A
  doesn't grant it to app B. Operator can audit per-app capability surface
  and revoke selectively.

Cross-application reads are explicit and require a capability grant
(`read-projection: <app>/<projection>`). Default isolation; opt-in sharing.

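Here is a sketch of these isolation rules in Python. The URL and directory shapes come from the text. Two details are assumptions: grants are represented as strings of the form `read-projection: <app>/<projection>`, and app names containing `/` are rejected.

```python
from pathlib import PurePosixPath


def _check_app(app):
    if not app or "/" in app or app in (".", ".."):
        raise ValueError(f"invalid application name {app!r}")
    return app


def projection_url(app, projection):
    return f"/projections/{_check_app(app)}/{projection}"


def storage_dir(app, root="data/apps"):
    return PurePosixPath(root) / _check_app(app) / "objects"


def can_read_projection(reader_app, owner_app, projection, grants):
    """Default isolation: cross-app reads need an explicit capability grant."""
    if reader_app == owner_app:
        return True
    # Assumption: grants are stored as strings shaped like the text's example.
    wanted = f"read-projection: {owner_app}/{projection}"
    return wanted in grants.get(reader_app, set())
```

Validating the app name before it becomes a path segment matters because storage isolation is only as strong as the directory boundary: a name like `../B` would otherwise escape into another app's objects.
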
### 19.7 Worked examples

#### Example A — Blog with moderated comments

```
DefineApplication "blog-with-comments":
  subscriptions:
    - Follow: <author-actor>
    - Topic: "post-comment" (filter: object.in-reply-to in our-posts)
  triggers:
    - on Topic match → publish Note (the new comment, derived if approved)
                     → projection pending-moderation
    - on inbound Approve{Reply} → projection comment-thread (visible)
  projections:
    - comment-thread: post-cid → [approved comment activities]
    - pending-moderation: list of pending replies awaiting approval
```

#### Example B — Continuous integration

```
DefineApplication "ci-pipeline":
  subscriptions:
    - Follow: <developer-actor>
    - VerbFilter: wraps Follow, types: [Push]
  triggers:
    - on Push match → effect: run build (capability: subprocess + fs-write)
                    → publish Build{source: Push.cid, output: <build-cid>, status}
    - on Build{status: success} → effect: run tests
                                → publish Test{...}
    - on (Test{passed} count for N days) → publish Release{...}
  projections:
    - build-history: commit-cid → [build activities]
    - release-history: ordered list of Release activities
```

#### Example C — Distributed code review

```
DefineApplication "code-review":
  subscriptions:
    - Topic: "review-request"
    - CidWatch: <organisation-actor>, events: [Endorse]
  triggers:
    - on review-request match → projection review-queue
                              → effect: notify-reviewer
    - on Endorse from authorised reviewer → publish Approve{review-cid}
                                          → projection approval-state
  projections:
    - review-queue: ordered list of pending requests with summaries
    - approval-state: review-cid → endorsement set
```

In all three, the application is *just* the bundle of subscriptions, triggers,
and projections. Federation makes them composable across instances. The
substrate provides exactly-once-per-CID semantics and pure-mode determinism for
the matches and folds.

### 19.8 Composition and discovery

Applications are themselves federated content. This means:

- **App registries** — actors can publish curated lists of applications they
  endorse. Discovery becomes follow-an-actor + browse-their-app-list.
- **Cross-app composition** — application A publishes derived activities that
  application B subscribes to. Pipeline of applications via the activity log.
- **App marketplaces** — pin a friendly path to a `DefineApplication` CID
  (`rose-ash.com:apps/blog → bafy...`) for human discoverability.

None of this requires kernel changes. It's all activities about activities.

### 19.9 Operational implications

- **Applications are inspectable from the activity log alone.** Replay an
  actor's outbox and you can reconstruct the exact application installation
  state at any point in time.
- **Application updates are atomic relative to the activity log.** Either the
  `Update{DefineApplication}` succeeded (new state visible from next activity)
  or it didn't (old state continues). No partial-update window.
- **Forking is the same as installing a copy.** No special "fork" mechanism
  needed; the activity-log mechanics already support it.
- **Per-app capabilities are a real security surface.** Operators must
  understand what they're granting when they install. The bundle's
  `capabilities` list is the audit point — should be human-readable and
  reviewable before installation.
- **The substrate isn't an "application platform" — it's an "application
  substrate."** Applications aren't installed *on* fed-sx; they're expressed
  *in* fed-sx, as the same kind of content as everything else.

---

## Appendix A: relationship to adjacent systems

Worth knowing about so we can borrow good ideas:

- **ATproto / Bluesky** — Lexicons (schemas) + repos (per-actor signed merkle trees).
  Closest in spirit. We borrow the schema-as-data idea; we differ by making schemas
  themselves federated activities, not central registry entries.
- **Spritely Goblins** — capability-secure actors. We borrow the capability-token
  pattern for delegation.
- **Ceramic** — signed event streams, content-addressed. Similar log-as-state model;
  we differ by making the projection function pluggable per-stream rather than
  hardcoded per-streamtype.
- **Holochain** — agent-centric DHT. We share the "every agent has their own log"
  shape; we use AP federation instead of DHT.
- **Farcaster** — pubsub on hubs. We share the firehose model; we add cryptographic
  outbox-as-source-of-truth.

None of them are *code-as-data the whole way down* — that's the SX-distinctive bit.
Handlers, validators, projections aren't bytecode shipped out-of-band; they're SX in
the same log as everything else, evaluable by any host that speaks SX.

## Appendix B: implications worth sitting with

- **Deployment dissolves.** Releasing a feature = publishing
  `DefineActivity{name: "Whatever", ...}`. Federation distributes it. No build
  artifact, no rolling deploy, no version-skew between server and client.
- **Applications are forkable by default.** "Fork the rose-ash blog" = take the bundle
  of `Define*` CIDs that constitute it, publish your own with `Supersede` over the
  ones to change, run your own projector. Same federation graph, divergent state.
- **Composition is by reference, not import.** `Pin` activity points at the CID of the
  `DefineActivity{name: "Pin"}`. No package manager, no transitive deps, no lockfiles.
- **The boundary between "user" and "developer" softens.** Both publish signed
  activities. Power users can publish handlers, projections, sig suites under their
  own actor.
- **This is more ambitious than a rose-ash rewrite.** It's a substrate that *happens
  to* host rose-ash as its first application.

---

## Appendix C: AI agent collaboration patterns
|
||
|
||
The substrate is incidentally well-shaped for one of the open problems of the
|
||
next decade: **infrastructure for AI agent collaboration where contributions
|
||
are signed federated artifacts, behavior is bounded by declared capabilities,
|
||
decisions are audit-by-replay, and infrastructure improves through agent
|
||
contribution within a web of trust.**
|
||
|
||
This is not a designed-for use case — fed-sx was conceived as a federated
|
||
publishing and reactive application substrate. But the properties it has fit
|
||
agent collaboration almost exactly. Worth being deliberate about, because the
|
||
framing changes who fed-sx is *for*.
|
||
|
||
### Why the substrate fits agent collaboration
|
||
|
||
AI agents need infrastructure where contributions are first-class artifacts,
|
||
not pull requests against human-controlled repos. Currently agents squeeze
|
||
through GitHub PRs, deployment pipelines, npm publishes — all of which assume
|
||
a human in the loop. fed-sx is shaped for direct contribution:
|
||
|
||
- **Direct authoring of substrate features.** An agent doesn't *propose* a
|
||
feature, it *publishes* one. A `DefineActivity` artifact is the agent's
|
||
contribution. A `DefineProjection` is its analysis. A `DefineTrigger` is its
|
||
automation. The signed publication IS the deploy — no PR review, no CI, no
|
||
DevOps.
|
||
- **Cryptographic identity without registration.** Agents have actor keys;
|
||
reputation is the endorsement graph; trust is provable by signature chain.
|
||
Two agents that have never met can verify each other's contributions
|
||
cryptographically.
|
||
- **Capability-bounded autonomy.** An agent declares `capabilities-required` on
|
||
its activities. A trigger says "I publish to path-prefix `/agent-x/*` and
|
||
call `http-client` for `api.example.com/*`." Receivers verify the constraint
|
||
cryptographically; the agent can't escape its declared surface even if the
|
||
agent itself is misaligned. Sandbox model designed for autonomous code (§11).
|
||
- **Audit-by-replay applied to AI behavior.** Every AI decision is
  reconstructable, deterministically, by anyone with the log. To answer "Why did
  agent A do X?", replay the log to that moment and inspect the activities A
  subscribed to, the projection state it observed, the trigger that fired, and
  the activity it published. This is fundamentally better than today's "trust
  the model" posture.
- **Composition without coordination.** Agent A publishes a moderation
  validator. Agent B subscribes and uses it. Agent C improves it and supersedes
  A's version. B sees the supersession and decides whether to adopt it. No
  central registry, no maintainer to coordinate with, no version skew.
- **Disagreement is visible, not hidden.** If agents A and B compute the same
  projection over the same log and produce different snapshot CIDs, the
  disagreement is *cryptographically observable*. Today, when two AI services
  give different answers to the same question, the discrepancy is invisible
  until somebody notices.
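
As a sketch of the receiver-side check in the capability bullet above, the Python below assumes that a signed `capabilities-required` declaration can be reduced to glob patterns: publish path-prefixes such as `/agent-x/*` and `http-client` targets such as `api.example.com/*`. `Capabilities`, `Action`, and `check_action` are hypothetical names, not the §11 sandbox API, and verifying the declaration's own signature is out of scope here.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical, simplified view of a signed `capabilities-required` declaration."""
    publish_paths: tuple[str, ...]  # glob patterns for object paths the actor may publish to
    http_targets: tuple[str, ...]   # glob patterns for "host/path" reachable via http-client


@dataclass(frozen=True)
class Action:
    kind: str    # "publish" or "http-client"
    target: str  # object path, or "host/path" for outbound HTTP


def check_action(caps: Capabilities, action: Action) -> bool:
    """Return True only if the action falls inside the declared surface."""
    if action.kind == "publish":
        patterns = caps.publish_paths
    elif action.kind == "http-client":
        patterns = caps.http_targets
    else:
        return False  # capability kinds that were never declared are denied by default
    return any(fnmatchcase(action.target, p) for p in patterns)


caps = Capabilities(publish_paths=("/agent-x/*",), http_targets=("api.example.com/*",))
print(check_action(caps, Action("publish", "/agent-x/report-1")))         # True
print(check_action(caps, Action("publish", "/agent-y/report-1")))         # False
print(check_action(caps, Action("http-client", "api.example.com/v1/q")))  # True
print(check_action(caps, Action("http-client", "evil.example.net/x")))    # False
```

Denying unknown capability kinds by default is the important choice: an agent gains nothing by attempting an action its declaration never mentions. Note that `api.example.com/*` does not match `api.example.com.evil.net/x`, because the pattern requires a literal `/` right after the host.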

### Dynamics that emerge

- **Agent specialisation = publication.** "I'm the indexing agent" means
  publishing `DefineProjection` artifacts. "I'm the moderation agent" means
  publishing `DefineValidator` artifacts. "I'm the matchmaking agent" means
  publishing a `DefineApplication` for marketplace subscriptions and triggers.
  Specialisation is content, not service deployment.
- **Reputation = endorsement graph.** Web of trust applied to agent
  contributions. Bad actors get cut out organically; no central authority to
  capture.
- **Forking = explicit disagreement resolution.** Agents disagree on
  validation? Both publish their `DefineValidator`s. Subscribers pick. The fork
  is signed, observable, and recoverable. Compare today: when AI services have
  different rules, one is just *invisibly applied*.
- **Cascade limits = agent population safety.** The `cascade-depth` and
  `cascade-limit` (§19.4) become the bounded-autonomy guard rails for agent
  populations, allowing self-coordination without runaway cascades across the
  substrate.
- **Self-improving infrastructure.** Agents observe substrate behavior and
  propose improvements as `DefineProjection`s for monitoring and
  `DefineTrigger`s for automation. The substrate itself improves through agent
  contribution, not through a release cycle. Every improvement is signed and
  traceable.
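
The cascade guard rails can be illustrated with a small breadth-first trigger loop. This assumes, for illustration only and not as the §19.4 semantics, that each trigger-emitted activity carries its parent's `cascade-depth` plus one and that the runtime drops any emission whose depth would exceed `cascade-limit`. `Activity`, `run_cascade`, `agent_a`, and `agent_b` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Activity:
    verb: str
    cascade_depth: int = 0  # 0 for activities published directly by an actor


# A trigger maps an observed activity to zero or more verbs to emit in response.
Trigger = Callable[[Activity], Iterable[str]]


def run_cascade(root: Activity, triggers: list[Trigger], cascade_limit: int) -> list[Activity]:
    """Breadth-first trigger evaluation; emissions beyond cascade_limit are dropped."""
    log = [root]
    queue = deque([root])
    while queue:
        act = queue.popleft()
        for trigger in triggers:
            for verb in trigger(act):
                child = Activity(verb, act.cascade_depth + 1)
                if child.cascade_depth > cascade_limit:
                    continue  # bounded autonomy: refuse to extend the chain
                log.append(child)
                queue.append(child)
    return log


# Two agents that each react to the other's output: an unbounded feedback loop.
def agent_a(act: Activity) -> list[str]:
    return ["Reply"] if act.verb in ("Create", "Echo") else []


def agent_b(act: Activity) -> list[str]:
    return ["Echo"] if act.verb == "Reply" else []


log = run_cascade(Activity("Create"), [agent_a, agent_b], cascade_limit=3)
print([(a.verb, a.cascade_depth) for a in log])
# [('Create', 0), ('Reply', 1), ('Echo', 2), ('Reply', 3)]
```

Without the limit the two agents would ping-pong forever; with `cascade_limit=3` the chain stops after three derived activities.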

### Use cases

- **Agent-managed scientific datasets:** collection, cleaning, analysis,
  publication, and peer review by other agents, all as signed activities.
  Replication is replay; provenance is built in.
- **Multi-agent code maintenance:** agents observing repos (subscribing to
  `Push`), running tests (triggers), proposing fixes (`Pull`-equivalent
  activities), and endorsing each other's work.
- **Agent-curated knowledge:** agents publish, endorse, and supersede
  knowledge artifacts. Truth accumulates via the trust graph; outdated info
  gets `Supersede`d explicitly.
- **Distributed agent marketplaces:** agents publish capabilities, subscribers
  find them via `Topic` / `Predicate` subscriptions, and contracts are formed
  through signed activity exchange.
- **Cross-agent AI safety monitoring:** monitoring agents subscribe to other
  agents' outboxes, run validators, and publish `Alert` activities when patterns
  of concern appear. Decentralised oversight without central authority.
- **Cross-org agent workflow coordination:** multiple specialised agents in
  domains such as supply chain, healthcare, and legal, coordinating across
  organisational boundaries with cryptographic provenance.

### Safety and governance properties

The substrate provides several properties that AI safety has been asking for
and that current infrastructure does not provide:

- **Every action is signed.** Attribution is cryptographic, not a log file an
  agent could spoof.
- **Capabilities are declared and enforced.** Agents operate within their
  declared sandbox and can't grow capabilities silently.
- **Cascades are bounded.** No exponential agent-on-agent feedback loops
  without explicit configuration.
- **Audit is replay.** Every decision can be reconstructed deterministically;
  no opaque "the model decided" moments.
- **Disagreement is visible.** Two agents producing different projections of
  the same data is a cryptographically detectable event, not invisible drift.
- **Trust is the endorsement graph, not central authority.** No single point of
  capture or coercion.
- **Forks are first-class.** When safety-critical disagreements occur, the
  substrate accommodates them without forcing a winner; observers see all
  positions.
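
Audit-by-replay and visible disagreement both follow from projections being deterministic folds over the log. The sketch below is a simplified stand-in: canonical JSON hashed with SHA-256 replaces the `dag-cbor` CID encoding, plain dicts replace signed activities, and `replay`, `snapshot_digest`, `likes_a`, and `likes_b` are hypothetical. Replaying a log prefix reconstructs state at that moment; two projections that disagree yield digests anyone can compare.

```python
import hashlib
import json
from functools import reduce
from typing import Callable, Optional

# A projection is a pure step function: (state, activity) -> new state.
Projection = Callable[[dict, dict], dict]


def replay(log: list, projection: Projection, upto: Optional[int] = None) -> dict:
    """Fold the projection over log[:upto]; upto=None replays the whole log."""
    return reduce(projection, log[:upto], {})


def snapshot_digest(state: dict) -> str:
    """Stand-in for a snapshot CID: SHA-256 over canonical, sorted-key JSON."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def likes_a(state: dict, act: dict) -> dict:
    """Agent A's projection: count Like activities per object."""
    if act["type"] != "Like":
        return state
    new_state = dict(state)  # copy rather than mutate, so the projection stays pure
    new_state[act["object"]] = new_state.get(act["object"], 0) + 1
    return new_state


def likes_b(state: dict, act: dict) -> dict:
    """Agent B's variant: same count, but silently ignores one actor."""
    if act.get("actor") == "mallory":
        return state
    return likes_a(state, act)


log = [
    {"type": "Create", "actor": "alice", "object": "post-1"},
    {"type": "Like", "actor": "bob", "object": "post-1"},
    {"type": "Like", "actor": "mallory", "object": "post-1"},
]

print(replay(log, likes_a, upto=2))  # {'post-1': 1}: state after the first two activities
print(snapshot_digest(replay(log, likes_a)) == snapshot_digest(replay(log, likes_a)))  # True
print(snapshot_digest(replay(log, likes_a)) == snapshot_digest(replay(log, likes_b)))  # False
```

Because the digest covers canonical bytes rather than in-memory layout, key order does not matter; only a genuine difference in projected state changes it.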

### What this implies for the project

- **Milestone 1's smoke tests remain right.** The verb-extensibility and
  reactive-application proofs apply to agent contributions exactly as they
  apply to human contributions. The agent collaboration framing doesn't
  require new mechanisms; it interprets the existing mechanisms differently.
- **The application model (§§18-19) is the headline story** for this audience,
  not a layer on top. Subscriptions + triggers + projections + capabilities =
  agent collaboration primitives.
- **Capability discovery and trust dynamics gain weight earlier.** Where
  human-driven applications can rely on operator policy, agent-driven
  populations need the trust graph to be operational from milestone 2.
- **The pitch line evolves.** Less "ActivityPub for code" / "rose-ash next
  gen," more "infrastructure for AI agent collaboration with cryptographic
  provenance, bounded autonomy, and audit-by-replay." The technical substance
  is unchanged; the framing of *who needs this* changes substantially.

That the substrate is, by accident, well-shaped for the most important
software-distribution problem of the next decade is worth being deliberate
about.