fed-sx — Federated SX Activity Substrate

A federated, content-addressed, extensible application substrate where the unit of computation is a signed activity, the unit of state is a pure SX projection over the activity log, and the substrate's own extensibility (new verbs, new object types, new projections, new validators) is itself published through the same mechanism.

Status: design — not yet implemented.
Target subdomain: next.rose-ash.com.
Target location in repo: next/ (new top-level dir, sibling to blog/, market/, etc.).
Stack: pure SX-on-OCaml.
Implementation language(s) to be chosen after design is complete.


1. Premise

ActivityPub's data model — actors, signed activities, inboxes/outboxes — generalises beyond social posting to any domain where state evolves via signed messages. fed-sx takes that generalisation seriously:

  • The unit of communication is a signed AP activity.
  • The unit of content is an AP object, content-addressed by CID (multihash + multicodec, default dag-cbor over the parsed SX AST).
  • State is the deterministic fold of pure SX functions over the activity log.
  • The substrate is self-extending: new activity types, object types, projections, validators, codecs, transports, and signature suites are themselves published as Define* activities — federated like any other content.

Three commitments make the rest fall into place:

  1. The kernel is dumb. It only knows envelope shape, signature verification, append-to-log, fetch-by-id, transport in/out. It does not know what Create or Pin mean.
  2. Everything else is registry-driven. Verbs, object types, validators, projections, codecs, transports, audiences, proofs, sig suites — all looked up in registries the kernel calls into.
  3. The registries are themselves publishable. New entries arrive as Define* activities. Bootstrap registries load from a known set of CIDs at startup; everything else is replayed from the log.

Result: the only code that ever needs to change in the kernel is the envelope itself. New verbs = published SX, federated like any other artifact.


2. CIDs and content addressing

Every artifact has a CID. Default codec is dag-cbor over the parsed SX AST (not the raw text). This buys:

  • Sub-AST addressing for free. Each nested structure has an implicit CID; IPLD can walk paths like <file-cid>/components/card. The "file CID and component CID" question dissolves: every node is a CID, you choose the granularity at reference time.
  • Polyglot canonicalization. JS, OCaml, Python only need to agree on AST shape + CBOR's deterministic encoding (RFC 8949 §4.2.1). No byte-identical pretty-printer required across hosts.
  • Format immunity. Reformatting, indent changes, equivalent-form normalisations do not change the CID.
  • Tooling fit. sx-tree already has the parsed form in memory; computing or verifying a CID is just an encode + hash.

Costs accepted:

  • One spec to maintain: SX↔CBOR mapping (number → CBOR int/float, string → text, symbol → tag, keyword → tag, list → array, dict → map). ~50 lines of code per host.
  • Author's exact source text is not preserved; re-pretty-print on fetch.
  • "Why don't these CIDs match" requires comparing CBOR (a cid-explain tool helps).

The CID format itself is multicodec-agile: the substrate also accepts raw, dag-json, dag-pb, etc. when seen, dispatched via the codec registry.
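
A minimal Python sketch of "encode + hash", under stated assumptions: the AST is modelled as plain ints, strings, lists and string-keyed dicts (symbol and keyword tags are left out because this document does not fix their tag numbers), and the function names are hypothetical. It uses the RFC 8949 §4.2.1 key ordering, the registered multicodec codes for dag-cbor (0x71) and sha2-256 (0x12), and base32 multibase.

```python
import base64
import hashlib


def cbor_head(major: int, n: int) -> bytes:
    """Encode a CBOR head (major type + argument) in its shortest form."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("integer too large for this sketch")


def cbor_encode(value) -> bytes:
    """Deterministic CBOR for ints, strings, lists and string-keyed dicts."""
    if isinstance(value, bool):
        raise TypeError("booleans are out of scope for this sketch")
    if isinstance(value, int):
        return cbor_head(0, value) if value >= 0 else cbor_head(1, -1 - value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return cbor_head(3, len(data)) + data
    if isinstance(value, list):
        return cbor_head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        # RFC 8949 section 4.2.1: sort entries by the bytes of the encoded key.
        items = sorted((cbor_encode(k), cbor_encode(v)) for k, v in value.items())
        return cbor_head(5, len(items)) + b"".join(k + v for k, v in items)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def cid_v1(value) -> str:
    """CIDv1 string: base32 multibase over version, codec and multihash."""
    digest = hashlib.sha256(cbor_encode(value)).digest()
    multihash = bytes([0x12, 0x20]) + digest   # sha2-256, 32-byte digest
    raw = bytes([0x01, 0x71]) + multihash      # CIDv1, dag-cbor codec
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
```

Two dicts that differ only in key order encode to the same bytes and therefore the same CID, which is the format-immunity property in miniature.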


3. Kernel surface (fixed — get this right)

The kernel is the only thing that's hard to change later. Everything else is in registries. Two envelope shapes plus five operations.

3.1 Activity envelope

{ id, type, actor, published,
  to, cc, audience-extras,
  object | target | origin | result,    # AP slots, opaque to kernel
  capabilities-required: [...],         # so receivers can refuse cleanly
  proofs: [...],                        # OTS, on-chain, multi-sig — all opaque
  signature: { key-id, algorithm, value, covered-fields } }

3.2 Object envelope

{ id, type, cid, media-type,
  where: inline | cid | url,
  content?, link? }                     # only one populated based on `where`

3.3 Kernel verbs

The only verbs implemented directly by the kernel:

  • Append signed activity to outbox (after envelope check + sig verify + validator pipeline).
  • Verify signature against actor's published keys, time-aware (which key was active at published).
  • Fetch by id or by cid.
  • Receive at inbox (verify + dispatch to registered handlers).
  • Replay log to rebuild registries on boot.

Everything else is registry-resolved.


4. Registries

Each registry has a default-populated set (loaded from genesis-bundled CIDs) and accepts new entries via Define* activities. Default entries themselves are SX artifacts — versioning, audit, replacement work the same way as user content.

| Registry | Bootstrap defaults | Extended by |
|---|---|---|
| Activity types | Create, Update, Delete, Announce | DefineActivity{type, schema-sx, semantics-sx} |
| Object types | SXArtifact, Note, Image, Tombstone | DefineObject{type, schema-sx, render-hint} |
| Validators | envelope shape, signature, type-schema | DefineValidator{applies-to, predicate-sx} |
| Projections | identity, by-type, by-cid, by-actor, actor-state, define-registry, audience-graph, by-object | DefineProjection{name, fold-sx, query-sx} |
| Codecs | dag-cbor, raw, dag-json | DefineCodec{multicodec, encode-sx, decode-sx} |
| Hash algorithms | sha2-256 | multihash table — agile by spec |
| Transports | http-inbox-push | DefineTransport{name, deliver-sx, receive-sx} |
| Audience predicates | Public, Followers, direct | DefineAudience{name, member-of-sx} |
| Subscription types | Follow (AP-standard) | DefineSubscription{name, schema-sx, match-sx, delivery} |
| Proof types | (none) | DefineProof{type, attach-sx, verify-sx} |
| Storage backends | files-on-disk | DefineStorage{where-tag, put-sx, get-sx} |
| Triggers | (none) | DefineTrigger{when-subscription, then-sx, cascade-limit} |
| Signature suites | rsa-sha256 (AP-compatible) | DefineSigSuite{name, sign-sx, verify-sx} |
| Application bundles | (none) | DefineApplication{name, subscriptions, triggers, projections, storage} |

Adding Pin, Endorse, Supersede, Test, Build, Compose, etc. later is just publishing DefineActivity artifacts — no kernel diff, no redeploy required if registries are hot.


5. The meta-level

A DefineActivity is itself an AP Create activity over an SXArtifact of a specific type:

(activity 'Create
  :object {:type "DefineActivity"
           :name "Pin"
           :schema (fn (act)
             (and (string? (-> act :object :path))
                  (cid? (-> act :object :cid))))
           :semantics
           '(fn (act state)
             (assoc-in state [:pins (-> act :object :path)]
                       (-> act :object :cid)))})

When the kernel receives an activity with type: "Pin" it looks up the registered semantics from a DefineActivity{name: "Pin"} artifact, runs the SX, projects the new state. The semantics are themselves content-addressed and federated — every receiver runs the same code.

Same pattern handles DefineProjection, DefineValidator, etc. The substrate is genuinely self-extending.
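
The lookup-then-run step can be sketched in Python. This is illustrative only: Python callables stand in for federated SX, the registry is a plain dict rather than the define-registry projection, and `apply_activity` and `pin_semantics` are hypothetical names.

```python
from typing import Callable, Dict

State = dict
Semantics = Callable[[dict, State], State]


def apply_activity(registry: Dict[str, Semantics], state: State, act: dict) -> State:
    """Kernel-side dispatch: look up semantics by type; unknown types are no-ops."""
    obj = act.get("object")
    if act["type"] == "Create" and isinstance(obj, dict) and obj.get("type") == "DefineActivity":
        registry[obj["name"]] = obj["semantics"]   # registration is just another activity
        return state
    semantics = registry.get(act["type"])
    return semantics(act, state) if semantics else state


def pin_semantics(act: dict, state: State) -> State:
    """Python stand-in for the Pin semantics shown above."""
    pins = {**state.get("pins", {}), act["object"]["path"]: act["object"]["cid"]}
    return {**state, "pins": pins}
```

A Pin that arrives before its DefineActivity leaves state untouched; the same Pin replayed after the definition updates the pins map. The dispatcher itself never mentions Pin.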


6. Verbs

6.1 Bootstrap verbs (milestone 1)

The substrate exposes POST /activity (not POST /publish) — generalised entry point that takes any well-formed AP activity, validates, signs, appends to outbox. (publish sx) is sugar at the SX layer for Create{SXArtifact}.

Day-one verbs (cost ~zero once /activity exists):

  • Create — the publish primitive.
  • Update — supersede a previous activity (correct metadata, change a path mapping). Distinct from "publishing new content" — new content is always a new Create with a new CID.
  • Delete — tombstone. AP-native; readers honour it.
  • Announce — boost another actor's artifact into your outbox. Comes free.
  • Subscribe — generalised subscription verb (parallel to publish/Create). Wraps any registered DefineSubscription type. Follow is the standard AP Subscribe{Follow{actor: ...}} for wire compatibility. See §18.
  • Unsubscribe — Undo of a prior Subscribe. Same shape as AP Undo{Follow}.

6.2 Custom verbs (designed-for, defined later)

Substrate accepts these from day one (any signed activity can be appended); semantics projected once DefineActivity artifacts exist.

  • Pin — assign domain:path/name → CID. The future name-resolution layer made of activities. Each pin is signed; the resolver replays the outbox to compute current state.
  • Endorse (modelled on Like/Approve) — third-party signature on a CID. Web-of-trust style code review without central authority.
  • Supersede — "CID A replaces CID B". Stronger than Update; readers can chase the chain.
  • Test — published assertion that running CID A under conditions X yields result Y. Test-as-artifact, federated.
  • Build — links a source CID to a compiled-output CID, with provenance.
  • Compose — derived artifact citing input CIDs. Provenance graph in the outbox itself.
  • Note (AP-native) — comments / reviews / discussion attached to a CID.
  • Follow / Undo(Follow) — subscribe to another instance's outbox.

The pattern that matters: your outbox isn't just "things published," it's an append-only log of every assertion this actor makes about the SX universe.


7. Capability discovery

Two pieces:

  • GET /.well-known/sx-capabilities — JSON listing every registered activity-type, object-type, codec, transport, sig-suite, proof-type. Each with the CID of the Define* artifact that introduced it. Peers can diff capabilities before federating.
  • capabilities-required field on activities — sender declares "this needs Pin semantics + dag-cbor codec." Receivers without those capabilities return a clean 422 referencing the missing CIDs; sender knows whether to replay-and-deliver the bootstrapping Define* artifacts first.

Federation degrades gracefully across instances at different versions.
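
A sketch of the receiver-side check, assuming capabilities-required carries the CIDs of the Define* artifacts an activity depends on. The function name, the response body shape and the 202 success status are illustrative assumptions; only the 422 comes from the text above.

```python
from typing import Iterable, Tuple


def admit(local_define_cids: Iterable[str], activity: dict) -> Tuple[int, dict]:
    """Return (status, body): 422 with the missing CIDs, or 202 if all are known."""
    known = set(local_define_cids)
    required = activity.get("capabilities-required", [])
    missing = [cid for cid in required if cid not in known]
    if missing:
        return 422, {"error": "missing-capabilities", "missing": missing}
    return 202, {}
```

The sender can use the missing list to decide which Define* artifacts to deliver before retrying.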


8. Axes of flexibility (all designed-for)

  1. Object types beyond SXArtifact — Note, Article, Image, Video, Question, Event, etc. via the object-type registry.
  2. Storage tier per-object — where: inline | cid | url. Tiny things inline; big things to IPFS; legacy stuff URL-linked. Migrating storage backends doesn't migrate the substrate.
  3. Multihash + multicodec agility — sha2-256 + dag-cbor by default; substrate accepts blake3, raw, dag-json, dag-pb, etc.
  4. Multi-key actors — publicKeys array always; per-key purpose; multiple key types (RSA for AP wire compat, Ed25519 modern). See §9.
  5. Audience / visibility — AP-native to, cc, bto, bcc. Public, followers, direct, unlisted. Custom audiences via DefineAudience.
  6. Outbox-as-database — no source-of-truth other than the log. Projections are recomputable views.
  7. Programmable activities — activities can carry SX. Reactive federation, conditional pins, automated propose/test/release pipelines, all expressed as AP activities.
  8. Federation transport pluggable — outbox is canonical; how peers exchange is pluggable (HTTP push, pull, libp2p, polling).
  9. Optional timestamp proofs — every activity has an attachable proofs slot. OpenTimestamps, on-chain merkle commit, third-party TSA all slot in without changing activity semantics.

Explicitly not pursuing for MVP:

  • Schema-version negotiation (premature; @context handles extension).
  • Configurable conflict-resolution per actor (last-signed-wins, log preserved for audit).
  • Verb-specific kernel handlers (other than Create's "compute CID, store body").

9. Identity & actor lifecycle

9.1 Actor doc shape

{
  "@context": ["https://www.w3.org/ns/activitystreams",
               "https://w3id.org/security/v1",
               "https://next.rose-ash.com/ns/fed-sx/v1"],
  "type": "Person",                       // or Service, Group, Application
  "id": "https://next.rose-ash.com/actors/giles",
  "preferredUsername": "giles",
  "inbox": "https://next.rose-ash.com/actors/giles/inbox",
  "outbox": "https://next.rose-ash.com/actors/giles/outbox",
  "followers": "...",
  "following": "...",

  "publicKeys": [                         // ARRAY from day one — never `publicKey`
    { "id": "...#key-2026-05",
      "type": "RsaVerificationKey2018",
      "owner": "<actor-id>",
      "publicKeyPem": "...",
      "purpose": ["sign-activity", "sign-http"],
      "created": "2026-05-14T...",
      "expires": null,
      "supersedes": null,
      "supersededBy": null },
    { "id": "...#key-ed25519-2026-05",
      "type": "Ed25519VerificationKey2020",
      "owner": "<actor-id>",
      "publicKeyMultibase": "z6Mk...",
      "purpose": ["sign-activity"],
      "created": "2026-05-14T..." }
  ],

  "capabilities": "https://.../actors/giles/capabilities",  // what verbs they speak
  "alsoKnownAs": ["did:web:rose-ash.com:giles", ...],       // bridge to DID, AP migration
  "movedTo": null                                            // set on Move
}

Key shape decisions:

  • publicKeys array always. Single-key actors have an array of length 1. The AP-standard publicKey field is also served, mirroring the first array element, for back-compat with vanilla AP servers (Mastodon etc. ignore the array).
  • Per-key purpose — separates signing weight. Day-to-day publish key vs. high-value key for Pin/Endorse vs. delegated machine key. Validators can require specific purposes per activity type (registry-driven).
  • Multiple key types — RSA for AP wire compat, Ed25519 for everything else (smaller, faster, modern). Sig suite registry decides which suites are accepted.
  • supersedes / supersededBy — keys form a chain, not a snapshot. Old activities still verify against historical keys.

9.2 Key rotation

Key rotation is itself an activity, signed by the old key (or a recovery key):

(activity 'Update
  :object actor-id
  :patch {:add-publicKey new-key
          :supersede {old-key-id new-key-id}})

Kernel:

  1. Fetches actor's current state (a projection over their own outbox).
  2. Verifies activity is signed by a key with purpose: rotate-key (or any active key, if registry allows).
  3. Appends. The actor-state projection now has the new key.

Old activities still verify because the projection retains the historical key with supersededBy set — sig verification looks up "what keys were active at activity timestamp T."
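
The "keys active at T" lookup might look like the sketch below. It assumes a key stops being active at the created time of the key named in its supersededBy field; the document does not pin down that rule, and `keys_active_at` is a hypothetical name. Timestamps use explicit UTC offsets so that `datetime.fromisoformat` accepts them on older Python versions.

```python
from datetime import datetime
from typing import List


def keys_active_at(keys: List[dict], published: str) -> List[str]:
    """Ids of the keys that were valid when an activity was published."""
    t = datetime.fromisoformat(published)
    by_id = {k["id"]: k for k in keys}
    active = []
    for key in keys:
        if datetime.fromisoformat(key["created"]) > t:
            continue                      # not yet created
        expires = key.get("expires")
        if expires and datetime.fromisoformat(expires) <= t:
            continue                      # already expired
        successor = by_id.get(key.get("supersededBy"))
        if successor and datetime.fromisoformat(successor["created"]) <= t:
            continue                      # already replaced by its successor
        active.append(key["id"])
    return active
```

An activity published between the two creation times verifies only against the old key; one published after the rotation verifies only against the new key.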

9.3 Key recovery / loss

  • Recovery key — separate key at actor creation, never used except to rotate. Stored offline. purpose: ["recover"]. Validator allows Update{actor, patch: rotate-all-keys} if signed by a recovery key.
  • Social recovery — designate N trusted actors, M-of-N can co-sign a recovery Update. Implemented as a DefineValidator extension; multi-sig slot in proofs makes it possible without changing the envelope.
  • Total loss — if both signing and recovery keys are gone, the actor is dead. They publish a new actor with alsoKnownAs: <old-actor-id> from a fresh key. Followers can choose to re-follow but there's no cryptographic continuity.

9.4 Migration (Move)

AP-native:

(activity 'Move
  :object old-actor-id
  :target new-actor-id)

Receivers update their follow lists. New actor's alsoKnownAs must include old actor — bidirectional handshake prevents hijacking.

For fed-sx, Move should also carry an outbox migration hint (CID of an export bundle) so receivers can re-anchor projections without re-fetching activity-by-activity.
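
The bidirectional handshake reduces to a small predicate. A sketch, assuming the Move's signature has already been verified so that its actor field can be trusted; `move_is_acceptable` is a hypothetical name.

```python
def move_is_acceptable(move: dict, new_actor_doc: dict) -> bool:
    """Accept a Move only if the old actor issued it and the new actor links back."""
    old_id, new_id = move["object"], move["target"]
    return (
        move["type"] == "Move"
        and move["actor"] == old_id                          # old side: issued by the actor being moved
        and new_actor_doc["id"] == new_id
        and old_id in new_actor_doc.get("alsoKnownAs", [])   # new side: points back at the old actor
    )
```

Without the alsoKnownAs check, any actor could claim to be the migration target of someone else's followers.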

9.5 Subordinate actors / delegation

Two patterns supported:

  • Service actors (AP-native type: Service): bots, build servers, test runners. Their own keys, their own outboxes, but attributedTo a parent actor.
  • Capability tokens: parent publishes Authorize{actor: child, capabilities: [...], expires: ...} signed by parent. Child publishes activities normally with their own key; receivers verify the capability chain when child invokes an authority they don't own outright. Useful for: temporary publish access, delegated Pin rights for a specific path prefix, multi-device.

Both work without new kernel mechanism — just activities.

9.6 Implications

  • Sig verification is timestamp-aware. Verifying an old activity needs the key state at the time it was published — actor-state projection must support time-travel queries.
  • Inbox doesn't trust keyId blindly. Fetches actor doc, projects current key state, checks key was valid at published.
  • Cross-instance identity via alsoKnownAs and DIDs. Don't depend on DIDs but slot them in for Bluesky-bridge, Solid-bridge, etc.

10. Projection model

The architectural commitment: state is what you get when you fold pure SX over the log. No DB-of-record. Everything queryable is a projection.

10.1 What a projection is

A DefineProjection activity registers four things:

(activity 'Create
  :object {:type "DefineProjection"
           :name "actor-state"
           :initial-state {}                        ; pure SX value
           :fold (fn (state activity)               ; pure SX
                   (case (:type activity)
                     "Create"  (if (= "Person" (-> activity :object :type))
                                 (assoc state (:id activity) (:object activity))
                                 state)          ; other object types leave state untouched
                     "Update"  (apply-patch state activity)
                     "Move"    (set-moved state activity)
                     state))
           :snapshot-codec "dag-cbor"
           :indexes [{:by :id} {:by :preferredUsername}]})
  • name — query handle. Unique per actor; collisions resolved by CID + supersession.
  • initial-state — pure SX value used as state-zero.
  • fold — pure SX function (state activity) → state. The only thing the kernel calls.
  • indexes — optional hint for materializing lookup paths.

The CID of the DefineProjection artifact is the projection's identity. Two instances running the same projection are running the same CID's fold over the same log slice — equivalence is decidable.

10.2 The fold contract — purity, determinism, gas

The fold function must be pure and deterministic. Non-negotiable; it's what makes cross-instance equivalence and replay possible.

  • No IO. No HTTP, no file access, no DB calls, no clock. The activity carries its own published timestamp.
  • No randomness. No host-seeded PRNG. (If pseudo-randomness is needed, seed from the activity's CID — deterministic across hosts.)
  • No mutation outside the returned state.
  • Bounded execution. Each fold call gets a gas budget (default tunable, e.g. 100k CEK steps). Exceeding it is a hard failure.

Enforced at the SX evaluator level by running folds in a sandboxed environment with the IO platform stripped to nothing. Same sandbox model applies to validators and trigger semantics.

Cross-host equivalence guarantee: for the same projection CID + same activity log slice, every conforming SX host (JS, OCaml, Python, Haskell-on-SX, …) must produce a state value with the same canonical CID. Tested via the spec test suite.

10.3 Bootstrap projections

The kernel cannot start without some projections, because the kernel itself uses them. Baked into the genesis bundle (see §11), superseded only by deliberate kernel-version upgrades.

| Projection | What it computes | Used by |
|---|---|---|
| activity-log | Identity — every activity, indexed by id and CID | Everything |
| by-type | type → ordered list of activity-CIDs | Most queries |
| by-actor | actor-id → ordered list of activity-CIDs | Per-actor outbox view |
| by-object | object-CID → list of referencing activity-CIDs | "Who pinned this?" |
| actor-state | actor-id → current actor doc with key history | Sig verification (kernel) |
| define-registry | kind+name → currently-active Define* CID | All other Define* lookups |
| audience-graph | actor → followers/following | Federation push |

define-registry is the bootstrap chicken-and-egg: it's the projection that knows which projections (and validators, codecs, etc.) are currently active. Kernel ships with it hardcoded; once running, every other projection (including a future replacement of define-registry itself) is a regular DefineProjection superseding it.

10.4 Snapshotting

Replaying the entire log on every restart is unacceptable past day one.

  • Snapshot = (activity-tip-CID, projection-state, projection-CID) tuple, dag-cbor encoded, content-addressed.
  • Snapshot rule — every K activities (default 1000) and every T seconds (default 60), serialize, hash, store on disk.
  • Resume — on startup, find latest snapshot for each (projection-CID, log-tip), load state, fold forward.
  • Snapshot CID is verifiable — anyone with the same log slice and projection-CID can recompute and check the CID matches. This is the cross-instance agreement proof.

Snapshots are themselves publishable as activities (Create{Snapshot}): an instance can publish "here's my computed state for projection X at log-tip Y, CID Z." Other instances can fetch and use as a starting point. Federated state sharing falls out of federated activities.

Snapshots are pruning-friendly: keep latest + snapshots referenced by published Create{Snapshot} activities; everything else is GC-able.
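
A toy version of snapshot and resume, to make the verifiability claim concrete. Assumptions: SHA-256 over canonical JSON stands in for a dag-cbor CID, activity ids stand in for activity CIDs, and all function names are hypothetical.

```python
import hashlib
import json
from functools import reduce


def digest(value) -> str:
    """Stand-in for a dag-cbor CID: SHA-256 over canonical JSON."""
    blob = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def by_type(state: dict, act: dict) -> dict:
    """Toy fold: type -> ordered list of activity ids."""
    return {**state, act["type"]: state.get(act["type"], []) + [act["id"]]}


def make_snapshot(projection_cid: str, fold, log: list, upto: int) -> dict:
    """Fold the first `upto` activities (upto >= 1) and content-address the result."""
    state = reduce(fold, log[:upto], {})
    body = {"tip": log[upto - 1]["id"], "projection": projection_cid, "state": state}
    return {**body, "cid": digest(body)}


def resume(fold, log: list, snapshot: dict) -> dict:
    """Load snapshot state, then fold only the activities after its tip."""
    tip_index = next(i for i, a in enumerate(log) if a["id"] == snapshot["tip"])
    return reduce(fold, log[tip_index + 1:], snapshot["state"])
```

Resuming from a snapshot gives the same state as folding from genesis, and anyone holding the same log slice can recompute the snapshot and compare digests.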

10.5 Reprojection on definition change

When DefineProjection{name: "actor-state"} is superseded by a new CID with a different fold:

  1. define-registry projection sees the supersession; its state advances.
  2. New projection materialized alongside the old one — both kept live during migration.
  3. New projection runs in catch-up mode: replay from genesis (or from deepest compatible snapshot).
  4. When new projection catches up to log tip, queries cut over. Old projection state can be retired.
  5. Snapshots of old version stay around as long as referenced (e.g. for time-travel queries against historical state under old semantics).

Changing a projection definition is safe and online. Cost: temporary state duplication during catch-up. Slow folds → slow migrations, but never breakage.

For projections too expensive to fully reproject, Update{DefineProjection} can declare migrationHint: <fn from old-state to new-state> — opt-in, used at migrator's risk.

10.6 Time-travel queries

Folds are deterministic functions of (initial-state, activity-list-prefix). Time-travel is fold-up-to:

  • state-as-of(projection, activity-id-or-timestamp) → walk to requested point, return state.
  • Snapshots act as accelerators (resume from nearest snapshot ≤ target).
  • Used by sig verification ("what keys did this actor have when this activity was signed?"), audit, "what did we believe last Tuesday."
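
A sketch of fold-up-to with snapshot acceleration. It addresses the target by log position only (a timestamp would first be mapped to a position), and it assumes snapshots are keyed by the number of activities already folded; `state_as_of` mirrors the name used above but its signature is hypothetical.

```python
from bisect import bisect_right
from functools import reduce
from typing import Callable, Dict, List


def state_as_of(fold: Callable, initial, log: List[dict],
                snapshots: Dict[int, object], target: int):
    """State after folding log[:target], resuming from the nearest snapshot <= target."""
    positions = sorted(snapshots)
    i = bisect_right(positions, target)
    if i:
        start, state = positions[i - 1], snapshots[positions[i - 1]]
    else:
        start, state = 0, initial
    return reduce(fold, log[start:target], state)
```

Because the fold is deterministic, the answer is the same whether or not a snapshot was used; the snapshot only shortens the replay.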

10.7 Projection composition

Projections do not directly read each other's state during folding. Preserves locality and parallelism — every projection runs independently against the same log.

Composition via:

  • Query time — (query (projection actor-state) ...) joins are SX expressions over multiple projection states.
  • Republishing as activities — a projection that exposes its state as input to others publishes Create{Snapshot} periodically. Downstream projections fold over those.

Direct cross-projection reads during fold introduce ordering, cycle, and cache-invalidation problems we don't need.

10.8 Querying

Three layers:

  • Raw projection state — GET /projections/<name>?at=<timestamp> returns dag-cbor (also JSON for tooling). Large states paginated by index.
  • SX queries — POST /query with an SX expression that runs against one or more projection states in pure mode. Equivalent to Datalog/GraphQL.
  • Materialized indexes — declared on projection (indexes: field). Kernel maintains as side-tables for O(log n) lookup.

Real-time: clients GET /projections/<name>/subscribe (SSE), receive deltas as activities land. Delta is (old-state, new-state, applied-activity-CID); clients can verify by re-folding.

10.9 Lag, async, concurrency

  • Append is sync; projection is async. POST /activity returns once activity is durably in the log. Projections run in a separate worker pool; query results carry projected-up-to so callers know whether the latest write is visible.
  • One worker per projection. Folds are sequential, but projections run in parallel with each other.
  • Sync option — POST /activity?wait-for=projection-name blocks until the named projection has folded the new activity. Use sparingly.

10.10 Failure modes

| Failure | Response |
|---|---|
| Gas exhaustion | Activity tagged projection-failed for this projection. State unchanged. Operator alert. |
| SX runtime error (assertion, type mismatch) | Same as gas: activity skipped, error logged, state unchanged. |
| Schema violation | Caught earlier in validation pipeline, never reaches projection. |

The log itself is always written successfully if it passes envelope + signature + validator checks. Projection failures don't gate appending — that would couple writes to arbitrary user-defined code.

10.11 Operational implications

  • Projection determinism is the linchpin. If JS and OCaml ever produce different state for the same log + projection, federation cracks. Spec test suite must cover projection equivalence across hosts as a first-class requirement.
  • Snapshots are eventual consensus. Two instances publish Create{Snapshot} for the same log+projection; if their CIDs match, they agree without coordination.
  • Kernel reads its own projections. actor-state for sig verification; define-registry for every Define* lookup. Startup sequence must bootstrap these before serving traffic.
  • Reprojection cost is real. Heavy projection changes mean replaying from genesis. Encourage incremental schemas (small per-activity work, idempotent updates) and provide profiling.

11. Sandbox & determinism

The runtime contract that makes folds (and validators, triggers, semantics) safe to execute, and that guarantees every conforming SX host computes the same state from the same log.

11.1 Three sandbox levels

Different registry entries need different power. We define three nested execution modes; the registry entry declares which mode it requires.

| Mode | Used by | IO | Clock | Random | Determinism |
|---|---|---|---|---|---|
| pure | folds, validators, audience predicates, semantics, trigger when-sx | none | activity's own published only | seeded from activity CID only | required across hosts |
| crypto | sig suite verify, codec encode/decode | crypto primitives only | none | sign-only secure RNG | required across hosts (verify); single-host (sign) |
| effectful | storage backends, transports, trigger then-sx, some proof verifiers | per-capability grant only | host clock | host RNG | not required; single-host |

Default mode is pure. The other two are opt-in at registration time, and the registration is itself a signed activity — anyone can audit which extensions claim which powers.

11.2 Pure sandbox (the load-bearing one)

This is the mode every projection fold runs in. It must produce identical results on every conforming SX host, every time.

Allowed:

  • All spec primitives in spec/primitives.sx that don't perform IO (arithmetic, comparison, predicates, string ops, collection ops, dict ops, format helpers).
  • The activity being processed (full envelope), as the function's argument.
  • The current state value, as the function's argument.
  • A small set of fed-sx-specific deterministic primitives:
    • (activity-cid act) → CID of the activity envelope
    • (activity-time act) → ISO timestamp from published
    • (actor-state-as-of state-snapshot actor-id activity-time) → if the projection has been declared dependent on actor-state (see §10.7), reads from a snapshot of that projection at the activity's timestamp
    • (seeded-rng cid) → deterministic PRNG seeded from a CID, returns a stream of uniform values
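
One possible shape for seeded-rng, sketched in Python. The construction (SHA-256 over the CID plus a counter) is an assumption, not something this document specifies; a real spec would have to fix the algorithm so every host yields the same stream. It returns integers rather than floats so that the stream does not depend on floating-point behaviour.

```python
import hashlib
from itertools import islice
from typing import Iterator, List


def seeded_rng(cid: str) -> Iterator[int]:
    """Yield 64-bit unsigned integers derived only from the CID."""
    counter = 0
    while True:
        block = hashlib.sha256(cid.encode("utf-8") + counter.to_bytes(8, "big")).digest()
        yield int.from_bytes(block[:8], "big")
        counter += 1


def first_values(cid: str, n: int) -> List[int]:
    """Convenience helper: the first n values of the stream for a CID."""
    return list(islice(seeded_rng(cid), n))
```

Two hosts calling `first_values` with the same CID get the same list; a different CID gives a different stream.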

Forbidden:

  • All IO: HTTP, file, network, stdin/stdout, environment.
  • Wall-clock access. The host's now is not in scope; the only time available is (activity-time act).
  • Host-seeded randomness. Only seeded-rng (CID-derived) is available.
  • Mutation outside the returned value. Enforced by the SX evaluator's lack of ambient mutable bindings; folds may use local let and mutation within their own closure but cannot reach outside.
  • Calling other registry entries by name. Composition happens at query time, not fold time (see §10.7).

Enforced by: evaluator runs the fold with the IO platform stripped to nothing. The fed-sx kernel constructs a pure-platform (no fetch, no query, no action, no DOM, no storage) and uses it as the sole evaluator platform when calling the fold. Any IO primitive call raises a hard error caught as a fold failure.
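
The stripped-platform idea, modelled in Python. This only illustrates the dispatch: Python itself cannot stop a function from importing modules, so real enforcement lives in the SX evaluator. The primitive table and all names are hypothetical.

```python
class SandboxViolation(Exception):
    """Raised when a fold asks for a primitive the pure platform does not carry."""


# Hypothetical primitive table; the real set would come from spec/primitives.sx.
PURE_PRIMITIVES = {
    "+": lambda a, b: a + b,
    "assoc": lambda d, k, v: {**d, k: v},
}


class PurePlatform:
    """The only platform a fold sees: pure primitives and nothing else."""

    def call(self, name, *args):
        if name not in PURE_PRIMITIVES:      # fetch, query, action, storage, ...
            raise SandboxViolation(name)
        return PURE_PRIMITIVES[name](*args)


def run_fold(fold, state, activity):
    """Run one fold call; a violation leaves state untouched and tags the activity."""
    try:
        return fold(PurePlatform(), state, activity), None
    except SandboxViolation as exc:
        tag = {"tag": "projection-failed", "reason": "sandbox-violation", "detail": str(exc)}
        return state, tag
```

A fold that only uses assoc succeeds; one that reaches for fetch gets the old state back plus a failure tag.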

11.3 Crypto sandbox

Sig suites and codec encode/decode need hash + crypto + encoding primitives but nothing else. They're still deterministic across hosts (verify case) but get a narrower platform than effectful, wider than pure.

Additional primitives over pure:

  • (sha2-256 bytes), (sha3-256 bytes), (blake3 bytes), …
  • (rsa-verify pubkey msg sig), (ed25519-verify pubkey msg sig), …
  • (rsa-sign privkey msg), (ed25519-sign privkey msg) — sign-only; requires the caller to supply a secure RNG handle (which is not in pure mode)
  • (cbor-encode value), (cbor-decode bytes) — for codecs implementing CBOR variants
  • (base32-encode bytes), (base58btc-encode bytes), (multibase-encode tag bytes)
  • (multihash-encode tag digest-bytes), (multihash-decode bytes)
  • (cid-encode codec mhash), (cid-decode bytes)

Sign vs verify: verify is pure (deterministic). Sign is not — it consumes randomness. fed-sx draws a clean line: signing happens outside registry-entry SX (it's an operation the kernel/runtime performs on behalf of the actor with their private key); registry SX only ever verifies. This keeps the pure↔crypto distinction tractable.

11.4 Effectful sandbox

Storage backends, transports, trigger then-sx, and proof verifiers that need the network (e.g. blockchain RPC for on-chain proof verification) all need real IO. These are not used to compute projected state; they're how the substrate interacts with the outside world.

Capability-granted primitives. The registration activity declares the capabilities the entry needs:

(activity 'Create
  :object {:type "DefineStorage"
           :where-tag "ipfs"
           :capabilities [{:type "http-client" :allowlist ["http://localhost:5001/*"]}
                          {:type "fs-read"    :path-prefix "/var/cache/fed-sx/ipfs/"}
                          {:type "fs-write"   :path-prefix "/var/cache/fed-sx/ipfs/"}]
           :put-sx (fn (cid bytes) ...)
           :get-sx (fn (cid) ...)})

Capability types (initial set; extensible):

  • http-client with allowlist (URL prefix patterns)
  • http-server with path-prefix (mounts a sub-handler)
  • fs-read / fs-write with path-prefix (chroot-style)
  • subprocess with command-allowlist
  • clock-read (wall clock; granted if registry entry needs to timestamp something)
  • random-bytes (host CSPRNG)

No ambient authority. Default capability set is empty; every capability is explicit, declared, signed, and auditable. A peer can refuse to load a registry entry whose capability claim is unacceptable to them.

Capabilities are content-addressed. Each capability descriptor has a CID. The substrate maintains a registry of "capability CIDs that this instance trusts to honour" — operator policy, not protocol.
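
A sketch of how a host might check a requested operation against declared capabilities. Assumptions: a trailing * in an allowlist entry means "prefix match", path checks are purely lexical (symlinks are not resolved), and the function names are hypothetical. Only http-client and fs-read / fs-write are covered.

```python
import posixpath
from typing import List


def http_allowed(cap: dict, url: str) -> bool:
    """True if the URL matches an allowlist entry (trailing * means prefix)."""
    for pattern in cap["allowlist"]:
        if pattern.endswith("*"):
            if url.startswith(pattern[:-1]):
                return True
        elif url == pattern:
            return True
    return False


def fs_allowed(cap: dict, path: str) -> bool:
    """True if the normalised path stays inside the declared prefix."""
    prefix = posixpath.normpath(cap["path-prefix"])
    resolved = posixpath.normpath(path)      # collapses ".." before comparing
    return resolved == prefix or resolved.startswith(prefix + "/")


CHECKS = {"http-client": http_allowed, "fs-read": fs_allowed, "fs-write": fs_allowed}


def authorize(capabilities: List[dict], kind: str, target: str) -> bool:
    """No ambient authority: deny unless some declared capability allows it."""
    check = CHECKS.get(kind)
    if check is None:
        return False
    return any(check(cap, target) for cap in capabilities if cap["type"] == kind)
```

An entry that declared only fs-read cannot write, and an empty capability list denies everything.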

11.5 Gas and resource accounting

Each sandbox call gets a budget:

  • CEK gas — every evaluator step costs 1 unit; primitive calls cost a per-primitive amount declared in spec/primitives.sx. Default budget: 100k units per fold call. Tunable per-projection via DefineProjection.gas-limit.
  • Memory ceiling — peak heap size for the fold call. Default 64 MB. Tunable.
  • IO budget (effectful only) — bytes read/written and network calls per invocation, granted separately per capability.
  • Wall-clock budget (effectful only) — max real-time before forced termination.

Exceeding any budget is a hard failure; the call returns an error value, the fold's state is unchanged, and the activity is tagged projection-failed for that projection.

Gas accounting is part of the spec — every conforming host must charge the same units for the same operations, so "this fold runs out of gas" is a deterministic property of the (projection, activity) pair, not a host-specific outcome.
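
Gas metering in miniature, as a Python sketch. Expressions are nested tuples, every evaluator step costs 1 unit, and the per-primitive costs are invented for illustration (the real costs would be declared in spec/primitives.sx). All names are hypothetical.

```python
class OutOfGas(Exception):
    """Raised when a call exceeds its gas budget."""


class Meter:
    def __init__(self, budget: int):
        self.remaining = budget

    def charge(self, units: int = 1) -> None:
        self.remaining -= units
        if self.remaining < 0:
            raise OutOfGas()


# Hypothetical cost table and primitive implementations.
PRIMITIVE_COST = {"+": 1, "*": 2}
PRIMITIVES = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def evaluate(expr, meter: Meter):
    """Evaluate an int literal or an (op, arg, arg) tuple, charging gas as it goes."""
    meter.charge(1)                          # every evaluator step costs 1 unit
    if isinstance(expr, int):
        return expr
    op, *args = expr
    values = [evaluate(arg, meter) for arg in args]
    meter.charge(PRIMITIVE_COST[op])         # plus the declared primitive cost
    return PRIMITIVES[op](*values)


def metered_call(state, expr, budget: int):
    """Return (new_state, None) on success, (old_state, tag) on exhaustion."""
    try:
        return evaluate(expr, Meter(budget)), None
    except OutOfGas:
        return state, {"tag": "projection-failed", "reason": "resource-exhausted"}
```

With these costs, `("+", 1, ("*", 2, 3))` needs exactly 8 units: five evaluator steps, 2 for `*` and 1 for `+`. A budget of 8 succeeds and a budget of 7 fails on every host, which is the determinism property described above.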

11.6 Determinism gotchas

The pure sandbox is only as deterministic as its primitives. Worth nailing:

  • Floating point. IEEE 754 binary operations are bitwise-identical across conforming hosts, but transcendentals (sin, cos, log, exp) are not — libm implementations differ. Decision: floats are forbidden in pure mode unless the projection declares requires-deterministic-floats: true and uses only the IEEE 754 basic operations (+, -, *, /, sqrt, comparison, conversion). For exact arithmetic, use integers or rationals (fed-sx will provide a rational primitive).
  • Map / dict iteration order. Must be sorted-key always in pure mode. The SX spec mandates this for for-each and map over dicts; we tighten it: pure mode forbids relying on insertion order.
  • String encoding. All strings are UTF-8 NFC at ingestion; pure-mode operations use byte-level comparison after normalization. Codepoint operations (length, substring) return identical results across hosts because they operate on the normalized form.
  • Integer overflow. Pure mode uses arbitrary-precision integers (the SX spec default). No undefined behaviour. Overflow is impossible.
  • Equality. Structural equality (equal?) compared across hosts must yield the same result for the same canonical-CID values. Implies dict equality is order-independent (as it should be), and float equality follows IEEE 754 (NaN ≠ NaN; +0.0 = -0.0).
  • Error values. When a primitive errors, the error must be representable as a dag-cbor value with a stable CID across hosts. Reserve a {:error :type ... :msg ...} shape; standard error types defined in the spec.
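Two of the rules above (NFC normalization at ingestion and sorted-key dict iteration) are easy to illustrate with the Python standard library. The helper names `ingest_string` and `sorted_items` are hypothetical, and ordering keys by their UTF-8 bytes is an assumption about what "sorted-key" means here.

```python
import unicodedata


def ingest_string(s: str) -> str:
    """Normalize to NFC so that equivalent spellings compare equal byte-for-byte."""
    return unicodedata.normalize("NFC", s)


def sorted_items(d: dict) -> list:
    """Iterate a dict in key order, independent of insertion order.

    Assumption: keys are strings and are ordered by their UTF-8 bytes
    after NFC normalization.
    """
    normalized = ((ingest_string(k), v) for k, v in d.items())
    return sorted(normalized, key=lambda kv: kv[0].encode("utf-8"))


# "e" + combining acute accent and the precomposed "é" become identical after NFC.
decomposed = "e\u0301"
precomposed = "\u00e9"
same_after_nfc = ingest_string(decomposed) == ingest_string(precomposed)

# Insertion order does not leak into iteration order.
order_a = sorted_items({"b": 1, "a": 2})
order_b = sorted_items({"a": 2, "b": 1})
```

Python integers are arbitrary-precision and its dict equality is order-independent, so those two rules need no helper in this sketch; a host language without those properties would have to supply them.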

11.7 Failure model

A pure-mode call ends in one of three terminal states:

  1. Success — returns a value. Fold uses it as new state.
  2. Sandbox violation — IO attempted, capability denied, etc. Returns a stable error value; fold's state is unchanged; activity tagged {:projection-failed :reason :sandbox-violation :detail ...}.
  3. Resource exhaustion — gas, memory, IO budget exceeded. Same handling as sandbox violation but with :reason :resource-exhausted.

Crypto-mode failures (e.g. invalid signature) are return values, not exceptions — verify returns boolean, sign returns either a sig or an error. This forces callers to handle failure explicitly.

Effectful-mode failures (network down, disk full) propagate to the operator as errors but never affect projected state. The substrate retries effectful operations according to the registry entry's policy (declared at registration).

11.8 Conformance testing

Cross-host equivalence isn't aspirational; it's tested.

  • Spec test suite ships projection equivalence tests: a corpus of (log slice, projection CID, expected snapshot CID) tuples. Every conforming SX host must produce the expected snapshot CID for each input.
  • Validator equivalence tests likewise: (validator CID, activity, expected result).
  • Codec equivalence tests: (codec CID, value, expected encoded bytes), in both encode and decode directions.
  • Sandbox isolation tests: "this fold attempts to call fetch; expected outcome: sandbox violation error with stable CID."

Hosts run the conformance suite to claim "fed-sx pure-mode conformance." Failures are publishable as Test{result: failed, host: ..., projection: ...} activities — the conformance graph itself is federated.
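A projection equivalence case can be sketched as a tiny runner. This is an assumption-heavy sketch: `snapshot_id` hashes canonical JSON with SHA-256 as a stand-in for a real CID over dag-cbor, and `by_type_fold`, `run_projection`, and `check_case` are hypothetical names.

```python
import hashlib
import json


def snapshot_id(state) -> str:
    # Stand-in for a real CID: SHA-256 over canonical JSON (sorted keys, no spaces).
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_projection(fold, initial_state, log_slice):
    state = initial_state
    for activity in log_slice:
        state = fold(state, activity)
    return state


def check_case(fold, initial_state, log_slice, expected_id):
    """Run one (log slice, projection, expected snapshot id) conformance case."""
    actual = snapshot_id(run_projection(fold, initial_state, log_slice))
    result = "passed" if actual == expected_id else "failed"
    return {"result": result, "expected": expected_id, "actual": actual}


def by_type_fold(state, activity):
    # Hypothetical projection under test: count activities per type.
    new_state = dict(state)
    new_state[activity["type"]] = new_state.get(activity["type"], 0) + 1
    return new_state
```

A failing case yields a record with both identifiers, which is roughly the information a published Test{result: failed, ...} activity would need to carry.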

11.9 Operational implications

  • The pure sandbox is the heart of cross-host federation. Every divergence is a spec bug or a host bug; both are caught by snapshot CID mismatches and surfaced via Test activities.
  • Capability descriptors are the new audit trail. "What can the IPFS storage backend do?" is a question with a precise answer at any timestamp — the registered capability CIDs.
  • Floats are mostly absent. This is unusual but defensible — most state in the substrate is ids, counts, sets, references. Numerical computation belongs in effectful registry entries (e.g. an analytics projection that publishes summaries as activities, projected by a downstream pure projection that just stores them).
  • Gas is part of the protocol. Two hosts disagreeing about whether a fold runs out of gas is a conformance failure. Spec primitive gas costs are normative.

12. Bootstrap & genesis

How a fresh instance starts with no log, where the initial registry entries come from, and how the kernel evolves without bricking peers.

12.1 The genesis problem

The substrate is "everything is a Define* activity in the log." But on a fresh instance the log is empty — so there are no Define* activities to tell the kernel what Create means, how to verify a signature, or what dag-cbor is. Strict turtles-all-the-way-down would deadlock startup.

Solution: the kernel ships with a baked-in genesis bundle containing the minimal set of definitions it needs to interpret its own log. The bundle is a constant of the kernel binary; its CID is hardcoded; the kernel verifies on startup that the bundle matches its hardcoded CID. After that, everything (including superseding the bundled definitions themselves) goes through the activity log.

The genesis bundle is not itself a federated artifact in the AP sense. It's the dictionary you need before you can read any activities. Optionally, an actor can Create{GenesisRecord} as their first published activity to advertise which genesis they started from — informational, not load-bearing.

12.2 Genesis bundle contents

Minimal viable bundle (dag-cbor object, content-addressed):

{
  "type": "fed-sx-genesis",
  "kernel-version": "1.0.0",
  "envelope-spec": { ... },                 // canonical schema for activity envelope
  "object-spec": { ... },                   // canonical schema for object envelope
  "definitions": {
    "activity-types": {
      "Create":   { "schema": <sx>, "semantics": <sx> },
      "Update":   { "schema": <sx>, "semantics": <sx> },
      "Delete":   { "schema": <sx>, "semantics": <sx> },
      "Announce": { "schema": <sx>, "semantics": <sx> }
    },
    "object-types": {
      "SXArtifact": { "schema": <sx> },
      "Note":       { "schema": <sx> },
      "Tombstone":  { "schema": <sx> },
      "DefineActivity":   { "schema": <sx> },
      "DefineObject":     { "schema": <sx> },
      "DefineProjection": { "schema": <sx> },
      "DefineValidator":  { "schema": <sx> },
      "DefineCodec":      { "schema": <sx> },
      "DefineTransport":  { "schema": <sx> },
      "DefineAudience":   { "schema": <sx> },
      "DefineProof":      { "schema": <sx> },
      "DefineStorage":    { "schema": <sx> },
      "DefineTrigger":    { "schema": <sx> },
      "DefineSigSuite":   { "schema": <sx> },
      "Snapshot":         { "schema": <sx> }
    },
    "sig-suites": {
      "rsa-sha256-2018": { "verify": <sx>, "key-format": <sx> },
      "ed25519-2020":    { "verify": <sx>, "key-format": <sx> }
    },
    "codecs": {
      "dag-cbor":  { "encode": <sx>, "decode": <sx> },
      "raw":       { "encode": <sx>, "decode": <sx> },
      "dag-json":  { "encode": <sx>, "decode": <sx> }
    },
    "projections": {
      "activity-log":     { "initial-state": ..., "fold": <sx> },
      "by-type":          { "initial-state": ..., "fold": <sx> },
      "by-actor":         { "initial-state": ..., "fold": <sx> },
      "by-object":        { "initial-state": ..., "fold": <sx> },
      "actor-state":      { "initial-state": ..., "fold": <sx> },
      "define-registry":  { "initial-state": ..., "fold": <sx> },
      "audience-graph":   { "initial-state": ..., "fold": <sx> }
    },
    "validators": {
      "envelope-shape": { "predicate": <sx> },
      "signature":      { "predicate": <sx> },
      "type-schema":    { "predicate": <sx> }
    },
    "audience-predicates": {
      "Public":    { "member-of": <sx> },
      "Followers": { "member-of": <sx> },
      "Direct":    { "member-of": <sx> }
    }
  },
  "capability-types": [                     // schema for capability descriptors
    "http-client", "http-server",
    "fs-read", "fs-write",
    "subprocess", "clock-read", "random-bytes"
  ]
}

Each definition's body is SX source, not bytecode. The kernel evaluates it at startup using the same SX evaluator user-published Define* artifacts use — there is no privileged "native" path. The bootstrap is just SX loaded from the binary instead of from the log.

12.3 Hardcoded CID and verification

The kernel binary contains:

  • The full genesis bundle (embedded as bytes).
  • The CID computed over those bytes at build time.

On startup:

  1. Compute the actual CID of the embedded bundle.
  2. Compare to the hardcoded CID.
  3. Mismatch → refuse to start. Either the binary has been tampered with or the build process is broken. Either way, the operator should know immediately.
  4. Match → proceed. Every running instance with a given kernel binary has byte-identical bootstrap state — no version drift possible within a binary.
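The startup check above amounts to a hash comparison. The sketch below uses a plain SHA-256 hex digest as a stand-in for a real CID (which would also carry multihash and multicodec prefixes); `bundle_digest`, `verify_genesis`, and `GenesisMismatch` are hypothetical names.

```python
import hashlib


class GenesisMismatch(RuntimeError):
    """The embedded bundle does not match the identifier baked in at build time."""


def bundle_digest(bundle_bytes: bytes) -> str:
    # Stand-in for CID computation: SHA-256 over the embedded bytes.
    return hashlib.sha256(bundle_bytes).hexdigest()


def verify_genesis(bundle_bytes: bytes, expected_digest: str) -> str:
    """Return the digest on match; raise (refuse to start) on mismatch."""
    actual = bundle_digest(bundle_bytes)
    if actual != expected_digest:
        raise GenesisMismatch(f"expected {expected_digest}, got {actual}")
    return actual


# Build time: the pipeline computes the digest and embeds it next to the bundle.
EMBEDDED_BUNDLE = b'{"type": "fed-sx-genesis"}'   # placeholder bytes, not a real bundle
HARDCODED_DIGEST = bundle_digest(EMBEDDED_BUNDLE)

# Startup: verify before doing anything else.
verify_genesis(EMBEDDED_BUNDLE, HARDCODED_DIGEST)
```

The check only detects divergence between the two embedded values; it does not protect against an attacker who replaces both the bundle and the hardcoded identifier.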

The genesis CID is exposed at GET /.well-known/sx-capabilities so peers can see which kernel version they're talking to.

12.4 Fresh instance startup sequence

1. Load and verify genesis bundle (panic on mismatch)
2. Parse all definition SX sources, instantiate evaluator closures
3. Initialize registries from definitions (in the order: codecs → sig-suites →
   validators → object-types → activity-types → audience-predicates → projections)
4. Open log file (create if missing)
5. Replay any existing log: for each activity, validate, then fold into each
   projection (resuming from snapshots where available)
6. Load or generate actor keypair (filesystem path from config)
7. If actor has never published a Create{Person} for itself, generate and append
   one as the first activity of this instance's outbox
8. Initialize HTTP server, wire routes
9. Open inbox: start accepting federated activities
10. Mark instance as ready

Steps 1-3 are the bootstrap. Step 5 is replay-and-project. Step 7 is the "actor genesis" — every instance has at least one local actor; it publishes itself as its first activity, and that activity (signed by the actor's own key) anchors all subsequent activity from that actor.

12.5 First activity — actor creation

Every fresh actor's outbox starts with:

(activity 'Create
  :id           "https://next.rose-ash.com/actors/giles/activities/<uuid>"
  :actor        "https://next.rose-ash.com/actors/giles"
  :published    "<iso-timestamp>"
  :to           ["https://www.w3.org/ns/activitystreams#Public"]
  :object       <full actor doc with publicKeys array>
  :signature    <signed by the new key over the activity envelope>)

Self-signed: the activity introduces the key it's signed with. Verifiers read the actor doc embedded in the activity, find the key, and verify the signature against the activity envelope. This is trust-on-first-encounter for a new actor, the same model AP uses.

The kernel emits this automatically on first startup if the actor has no prior activity. Subsequent actor changes (key rotation, profile updates) are Update activities signed by an existing key.

12.6 Joining federation

A new instance has no peers initially. Discovery is operator-driven for v1:

  1. Operator configures one or more peer URLs (or a well-known seed list).
  2. Instance fetches peer's actor doc and /.well-known/sx-capabilities.
  3. Instance verifies it can interpret the peer's activities (envelope compatible, sig suites overlap). Reports incompatibilities to operator.
  4. If compatible, instance follows peer's primary actor (POST /inbox with a Follow activity).
  5. Peer streams or backfills outbox to this instance.
  6. Activities arrive, validate, fold into local projections.

Discovery beyond manual config (e.g. peer recommendations, federation directories) is a v2 concern.

12.7 Kernel version evolution

The substrate must evolve without forcing every instance to upgrade in lockstep. Three rules:

Rule 1: The activity envelope shape is forward-compatible only.

We may add optional fields to the envelope; we may not change semantics or remove fields. Old activities still validate under new kernels. New activities with new fields are accepted by old kernels (which ignore the unknown fields, store the raw envelope, and project conservatively).

This is the AP discipline. We adopt it strictly. If we ever need a breaking envelope change, it's a major version (fed-sx 2.0) and instances at different majors don't federate directly — only via bridges.

Rule 2: Everything else evolves via supersession.

New sig suite, new codec, new projection definition, new validator: publish a Define* activity that supersedes the old one. Both old and new versions stay valid at their respective timestamps. Old activities verify under old definitions; new activities use new definitions. Time-aware lookup (§9.6, §10.6) makes this work.
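Time-aware lookup under supersession can be sketched as "find the latest definition whose effective time is at or before the activity's timestamp". The class and method names below are hypothetical, and timestamps are plain integers (for example Unix seconds) to keep the comparison unambiguous.

```python
import bisect


class TimeAwareRegistry:
    """Keeps every version of a definition; lookups are relative to a timestamp."""

    def __init__(self):
        self._versions = {}   # name -> sorted list of (effective_from, cid)

    def define(self, name: str, effective_from: int, cid: str) -> None:
        # A superseding Define* adds a version; older versions are never removed.
        bisect.insort(self._versions.setdefault(name, []), (effective_from, cid))

    def lookup(self, name: str, at: int):
        """Return the CID active at time `at`, or None if nothing was defined yet."""
        versions = self._versions.get(name, [])
        times = [t for t, _ in versions]
        i = bisect.bisect_right(times, at)   # count of versions with effective_from <= at
        return versions[i - 1][1] if i > 0 else None


registry = TimeAwareRegistry()
registry.define("ed25519-2020", 100, "cid-v1")   # placeholder CIDs
registry.define("ed25519-2020", 200, "cid-v2")   # supersedes v1 from t=200 onward
```

An activity published at t=150 resolves to the first version, one published at t=200 or later resolves to the second, and both stay verifiable indefinitely.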

Rule 3: New genesis bundles supersede old ones via published activities.

When the kernel team ships a new version with an updated bundle:

  • The new bundle's CID is different.
  • Operators upgrading the kernel get the new bundle automatically.
  • The new bundle's contents are largely supersession Update{DefineProjection, DefineValidator, ...} activities relative to the old bundle's definitions.
  • A peer running the old kernel sees these Update activities (when they appear in followed outboxes) and can opt to load them dynamically (§12.8) or stay on the old bundle definitions until the operator upgrades.

In other words: the kernel binary evolution and the activity-log evolution are parallel tracks. The binary determines what's built in; the log determines what's currently active. They converge over time but don't have to be lockstep.

12.8 Dynamic Define* loading

When an instance receives an activity of type: "PinV3" and has no DefineActivity{name: "PinV3"} in its define-registry, it has three options (operator policy):

  • Strict mode — store the activity envelope (it's valid AP), tag it unknown-type in by-type, do not project semantics. Operator must explicitly load the definition to enable projection.
  • Permissive mode — fetch the DefineActivity{name: "PinV3"} artifact (its CID is in the activity's capabilities-required list), validate, evaluate the semantics SX (in pure sandbox), reproject the activity. Operator notified.
  • Trusted-peers-only mode — like permissive, but only auto-loads Define* from actors on a configured trust list.

Default for fed-sx v1: strict mode. Operators opt-in to broader policies.
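The three policies reduce to a small decision function. The function name, the policy strings, and the two outcome labels below are hypothetical; only the behaviour per mode is taken from the list above.

```python
STORE_UNPROJECTED = "store-unprojected"   # keep envelope, tag unknown-type, no semantics
FETCH_AND_LOAD = "fetch-and-load"         # fetch the definition, validate, reproject


def decide_unknown_type(sender: str, policy: str = "strict", trusted_actors=frozenset()):
    """Decide what to do with an activity whose type has no loaded definition."""
    if policy == "strict":
        return STORE_UNPROJECTED
    if policy == "permissive":
        return FETCH_AND_LOAD
    if policy == "trusted-peers-only":
        return FETCH_AND_LOAD if sender in trusted_actors else STORE_UNPROJECTED
    raise ValueError(f"unknown policy: {policy}")
```

Making "strict" the default argument mirrors the v1 default: an operator has to pass a different policy explicitly before any remote definition is loaded.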

This lets the substrate genuinely live-extend — new verbs land via federation, no binary upgrade — while keeping a clean audit trail of what got loaded when.

12.9 Genesis as the substrate's manifest

A useful framing: the genesis bundle is the substrate's manifest (in the package-manager sense). It declares "this kernel ships with these definitions, identified by these CIDs, and this is what the kernel does until the log says otherwise."

Two instances with the same genesis CID start identical. Two instances with different genesis CIDs can federate as long as their active registry states (after log replay) overlap enough.

The genesis bundle is also the conformance reference: a kernel implementation claims fed-sx v1.0 conformance by reproducing the standard genesis bundle's CID from its own build of the included SX sources. If two implementations build the same spec sources and produce different CIDs, one of them is non-conformant. Cheap, deterministic conformance check.

12.10 Operational implications

  • Build-time CID computation is part of the kernel build. The build pipeline must include the genesis-bundling step and embed the resulting CID. Mismatch protection requires the binary to know what it expects.
  • Genesis evolution is a deliberate kernel-team decision. Adding a new bundled projection or sig suite is a kernel release, not a federated activity. (User-defined projections still federate normally.)
  • Strict-mode default protects against malicious extensions. Operators have to consciously opt into auto-loading remote Define*. This trades convenience for security — appropriate for v1.
  • Cross-major federation is a bridge problem. If/when fed-sx 2.0 ships with an envelope change, bridges between v1 and v2 are themselves federated artifacts — built by anyone, signed, audited.

13. Federation mechanics

How instances exchange activities, how peers subscribe, how new followers backfill, how delivery survives unreliable networks, and how the substrate resists abuse.

13.1 Push, pull, hybrid

ActivityPub canonically uses push: actor A publishes by POSTing each delivery to each follower's inbox URL. This gives low latency and clear delivery semantics, but requires a reliable per-recipient delivery queue and falls over when peers go down.

fed-sx supports both, with a push-primary, pull-fallback model:

  • Push is the default delivery mechanism. When an activity is appended to A's outbox, A's delivery worker posts it to each follower's inbox.
  • Pull is always available: any peer can GET /actors/<id>/outbox?since=<cursor> and stream activities in order. Used for backfill, recovery from delivery gaps, and instances that prefer pull-only operation.
  • Hybrid in practice: push delivers notifications (the activity itself, or a pointer to its CID); receivers may pull the full content if not inlined. Useful when the activity body is large.

Operators can configure their actors as push-only, pull-only, or hybrid. The default is hybrid.

13.2 The Follow lifecycle

AP-standard, slightly tightened:

;; A wants to follow B
(activity 'Follow
  :actor  "https://a.example/actors/alice"
  :object "https://b.example/actors/bob")
;; → POST to B's inbox

;; B accepts (or rejects)
(activity 'Accept
  :actor  "https://b.example/actors/bob"
  :object <follow-activity-id-or-embedded>)
;; → POST to A's inbox

;; A unfollows later
(activity 'Undo
  :actor  "https://a.example/actors/alice"
  :object <follow-activity-id-or-embedded>)
;; → POST to B's inbox

State derived by the audience-graph projection on each instance:

  • (followers actor) — set of actors who follow actor, projected from Accept{Follow} activities in actor's outbox (and the inverse via received Follow activities).
  • (following actor) — symmetric.

Auto-accept by default. Public actors auto-publish Accept for any incoming Follow. Locked actors require manual approval, implemented as an operator UI that publishes the Accept (or Reject) once a human decides.
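The follower state can be sketched as a pure fold over Follow, Accept, and Undo. This sketch assumes that Accept and Undo reference the Follow by its id (the embedded form is not handled), and every name in it is hypothetical.

```python
EMPTY_GRAPH = {"follows": {}, "followers": {}}


def audience_fold(state, activity):
    """Pure fold: returns a new state and never mutates the old one."""
    follows = dict(state["follows"])   # follow id -> (follower, followee)
    followers = {k: set(v) for k, v in state["followers"].items()}
    kind = activity["type"]
    if kind == "Follow":
        follows[activity["id"]] = (activity["actor"], activity["object"])
    elif kind == "Accept":
        pair = follows.get(activity["object"])
        # Only the followee may accept a Follow aimed at them.
        if pair and pair[1] == activity["actor"]:
            followers.setdefault(pair[1], set()).add(pair[0])
    elif kind == "Undo":
        pair = follows.get(activity["object"])
        # Only the original follower may undo their own Follow.
        if pair and pair[0] == activity["actor"]:
            followers.get(pair[1], set()).discard(pair[0])
            del follows[activity["object"]]
    return {"follows": follows, "followers": followers}


def following(state, actor):
    """Inverse view: everyone `actor` follows."""
    return {followee for followee, fs in state["followers"].items() if actor in fs}
```

A Follow alone changes nothing in the followers set; the relationship only becomes visible once the matching Accept is folded, which is what lets locked actors delay approval.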

13.3 Backfill

When A first follows B, A wants B's history. Four supported modes:

| Mode | Mechanism | Trade-off |
| --- | --- | --- |
| No backfill | Just stream new activities going forward | Cheapest, missing context for new followers |
| Pull paginated | GET /outbox?since=epoch&limit=100 repeatedly | Standard, slow for large outboxes |
| Snapshot fetch | Find latest Create{Snapshot} published by B for the projection of interest, fetch + verify, then pull only activities after the snapshot's tip | Fast, requires B to publish snapshots |
| Bundle fetch | Out-of-band: B publishes a CID for an export bundle (a dag-cbor list of activities + actor doc + sig suite verification metadata); A fetches once, validates the chain, replays | Fastest for cold starts; bundle creation is opt-in |

Default: snapshot fetch when available, paginated pull otherwise.

A new instance joining federation typically combines: snapshot-fetch the actor-state and define-registry projections from a trusted peer (so it knows who exists and what verbs are defined), then incrementally backfill specific actors of interest.

13.4 Delivery queue and retry

Every push delivery attempt has a fate:

| Outcome | Action |
| --- | --- |
| 2xx | Mark delivered |
| 3xx | Follow redirect (with limit) |
| 4xx (except 429) | Mark permanently failed: peer rejected the activity. Log; don't retry. |
| 429 | Honour Retry-After; reschedule |
| 5xx | Exponential backoff; reschedule |
| Connection error | Exponential backoff; reschedule |

Retry schedule (default, tunable per peer):

1 min, 5 min, 15 min, 1 h, 4 h, 12 h, 24 h, 48 h, 96 h

After the last attempt fails, the activity is abandoned for push but remains in A's outbox. Followers can still pull it via GET /outbox?since=.... The peer will eventually catch up if they come back online and pull. Push is best-effort; pull is the source of truth.
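The outcome table and retry schedule combine into one small decision function. The sketch assumes the nine schedule entries are the delays before each successive retry, and that `attempt` counts failed attempts so far; `next_action` and its return labels are hypothetical.

```python
# Default schedule from the text, in minutes: 1 min ... 96 h.
RETRY_SCHEDULE_MIN = [1, 5, 15, 60, 240, 720, 1440, 2880, 5760]


def next_action(status, attempt, retry_after_s=None):
    """Map a delivery outcome to (action, delay_seconds).

    status: HTTP status code, or None for a connection error.
    attempt: number of failed attempts so far (0 for the first failure).
    """
    if status is not None and 200 <= status < 300:
        return ("delivered", None)
    if status is not None and 300 <= status < 400:
        return ("follow-redirect", None)
    if status == 429 and retry_after_s is not None:
        return ("retry", retry_after_s)          # honour Retry-After
    if status is not None and 400 <= status < 500 and status != 429:
        return ("failed", None)                  # peer rejected; do not retry
    # 5xx, connection errors, and 429 without Retry-After fall back to the schedule.
    if attempt < len(RETRY_SCHEDULE_MIN):
        return ("retry", RETRY_SCHEDULE_MIN[attempt] * 60)
    return ("abandoned", None)                   # still available via pull
```

Falling back to the schedule for a 429 that carries no Retry-After header is a choice made for this sketch; the text does not say what should happen in that case.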

Persistent queue. Delivery state is itself stored in the local instance — it's operator-internal, not federated. (Could be a regular SQLite table; doesn't need to be a projection because it's not state-the-world-cares-about.) On instance restart, the queue resumes from where it left off.

Queue-as-projection (alternative): for instances that want every aspect to be log-derived, the delivery state could be a local-only projection over a stream of Attempt / DeliverySuccess / DeliveryFailure activities written to a private local-only outbox. Out of scope for v1 but the design admits it.

13.5 Audience-respecting delivery

Each activity carries to, cc, bto, bcc. The delivery worker computes the delivery set: union of explicit recipients + (if as:Public or Followers in audience) the actor's followers projection.

  • bto and bcc are stripped before delivery (recipients shouldn't see who else is blind-copied).
  • Receivers honour audience. When an instance receives an activity it should not be in the audience for (e.g. a Direct activity to someone else, leaked via a misconfigured peer), it logs and discards. Validators in the inbound pipeline enforce this.
  • Public ≠ unlisted. to: as:Public means deliver to followers AND make publicly fetchable AND show in public projections. Some actors prefer "publicly fetchable but not pushed broadly" — cc: as:Public with to: Followers.
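The delivery-set computation and blind-copy stripping can be sketched as two small functions. How the Followers audience is addressed is not specified in this section, so the `followers_addr` parameter is an assumption, as are the function names.

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
AUDIENCE_FIELDS = ("to", "cc", "bto", "bcc")


def delivery_set(activity, followers, followers_addr=None):
    """Explicit recipients, plus the followers projection if Public/Followers is addressed."""
    recipients = set()
    expand_followers = False
    for field in AUDIENCE_FIELDS:
        for addr in activity.get(field, []):
            if addr == PUBLIC or (followers_addr is not None and addr == followers_addr):
                expand_followers = True
            else:
                recipients.add(addr)
    if expand_followers:
        recipients |= set(followers)
    return recipients


def strip_blind(activity):
    """Copy of the activity without bto/bcc, as sent over the wire."""
    return {k: v for k, v in activity.items() if k not in ("bto", "bcc")}
```

The delivery set has to be computed before stripping; otherwise blind-copied recipients would never receive the activity.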

13.6 Spam and abuse posture

ActivityPub has well-known abuse vectors (Mastodon's history is instructive). fed-sx defends in layers:

Signature verification. Every inbound activity must have a valid signature matching an actor whose key was active at published. Forgeries are dropped at the envelope-validation stage (§14). Necessary but not sufficient — signatures only prove the message wasn't tampered with, not that the sender is benign.

Per-source rate limits. Per-actor and per-instance request rate limits on /inbox. Default: 100/min per actor, 1000/min per instance. Exceeded → 429.
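A per-source limit like the one above can be sketched as a sliding-window counter keyed by actor or instance. The sliding-window choice is an assumption (the text does not name an algorithm), and the class name is hypothetical; the clock is passed in so the behaviour is testable.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within the last `window_s` seconds."""

    def __init__(self, limit: int, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)   # key -> timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Forget requests that have left the window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False                  # caller should answer 429
        hits.append(now)
        return True


per_actor = SlidingWindowLimiter(limit=100)       # default: 100/min per actor
per_instance = SlidingWindowLimiter(limit=1000)   # default: 1000/min per instance
```

An inbox handler would consult both limiters and respond 429 if either refuses.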

Per-instance trust state. Three categories, operator-configured (and overridable per actor):

  • Trusted: auto-accept, auto-load Define* (if permissive mode), no rate-multiplier penalty.
  • Default: accept signed activities, standard rate limits, do not auto-load Define*.
  • Suspended: drop all inbound activities, refuse outbound delivery, do not fetch artifacts. Operator decision (e.g. spam source, harassment instance).

Trust state is local-only (operator policy); it is not federated. Different instances can disagree.

Audience refusal. Activities not addressed to anyone on this instance (no local followers, not as:Public, not to: a local actor) are dropped on receipt. Discourages spam targeting random instances.

Content validators. Registry-driven content moderation: a DefineValidator with applies-to: "inbound" runs against every inbound activity and can reject based on content rules. Examples: link-spam detection, ML moderation models served via an effectful validator (note: effectful validators are a special case — they can fail-closed without affecting determinism, because validators happen before projection and don't contribute to projected state).

Capability vetting. If an inbound activity declares capabilities-required that includes definitions this instance hasn't loaded and trust policy is strict mode, the activity is quarantined (stored but not projected) pending operator review.

Federation circuit breakers. Per-peer error rate triggers temporary defederation: if a peer is sending malformed activities, exceeding rate limits, or signing with revoked keys, automatic suspension for an exponential cool-off.

13.7 Discovery

How an instance finds other instances and actors:

  • WebFinger (RFC 7033). GET /.well-known/webfinger?resource=acct:user@host returns links to actor URLs. AP-standard. fed-sx implements.
  • Well-known capabilities. GET /.well-known/sx-capabilities (§7) for cross-instance compatibility checks.
  • Manual peer config. Operators add peer instance URLs to their config.
  • Peer recommendations. An instance can publish Recommend{actor} activities pointing at peers it considers worth following. Receivers can use these as discovery hints (subject to local trust). Out of scope for v1 but the verb is reservable.
  • Federation directories. Community-maintained lists of instances; an instance can opt into being listed by publishing a Directory{listed-by} activity. v2 concern.

For v1: WebFinger + capabilities + manual config. Discovery beyond that is opt-in via standard verbs.

13.8 Streaming and real-time

Two streaming mechanisms:

  • Outbox SSE: GET /actors/<id>/outbox/stream opens a Server-Sent Events connection. Each new activity appended to the outbox is sent as an event. Allows pull-style federation peers to maintain a live connection without polling.
  • Projection SSE: GET /projections/<name>/subscribe (§10.8) streams projection deltas. Useful for clients (browsers) wanting reactive views.

Both are local-only mechanisms; the canonical federation transport remains push to inbox + pull from outbox. SSE is convenience, not protocol.

13.9 Operational implications

  • Push is best-effort, pull is authoritative. Operators should treat the outbox as the canonical record; delivery queue is bookkeeping.
  • Trust is per-instance and not federated. Two instances may have different views of "good actors" and "bad instances." This is a feature — defederation decisions are local sovereignty.
  • Backfill via snapshots is the cheap path. Encouraging actors to publish Create{Snapshot} regularly makes new-follower onboarding fast.
  • Audience semantics are enforced both ways. Senders compute delivery set; receivers honour audience. Defence-in-depth against misconfigured peers.
  • Capability-based extension loading is opt-in. Strict-mode default means unknown verbs are stored-but-not-projected — safe by default, with explicit operator control over what extensions load.

14. Validation pipeline

Every activity entering the substrate (whether published locally or received from a peer) flows through a fixed pipeline of checks. Order matters: cheap and fail-safe first, expensive and content-aware last. Each stage has a defined failure response (reject, quarantine, drop). Registry-driven validators plug in at a specific stage.

14.1 The two pipelines

Inbound — activities arriving via POST /inbox or pulled from a peer's outbox:

HTTP transport → envelope → signature → replay → audience →
  activity-type schema → object-type schema → content validators →
  capabilities → trust state → log append → projection (async)

Outbound — activities being published locally via POST /activity:

authentication → authorization → envelope construction → object handling →
  activity-type schema → signature → log append → projection (async) →
  delivery (async)

Stages they share are implemented as the same SX functions called from both pipelines.

14.2 Inbound pipeline — stage by stage

| # | Stage | Check | Failure response |
| --- | --- | --- | --- |
| 1 | Transport | Valid HTTP request, content-type acceptable, body parseable as JSON-LD or dag-cbor | 400 Bad Request; log |
| 2 | Envelope | Matches kernel's envelope spec (required fields present, types valid, recognised activity type or unknown allowed) | 400; log; structured error in response body |
| 3 | Signature | Time-aware sig verification: fetch (or cache-lookup) actor doc, find key with id == sig.key-id that was active at published, verify against canonical envelope bytes per the named sig suite | 401; log; do not retry; mark sender's instance for circuit-breaker accounting |
| 4 | Replay | Activity id and CID not already in activity-log projection | 200 OK with {status: "duplicate"}, no-op |
| 5 | Audience | This instance has at least one local actor in to/cc, OR audience contains as:Public/Followers and the actor has local followers | Drop silently (no response indicating either acceptance or refusal; this prevents inbox-membership probing); do not store |
| 6 | Activity-type schema | Look up DefineActivity{name: <type>} in define-registry; run its schema predicate over the activity in pure sandbox | If type unknown: per trust policy (strict: 422 with missing-definition CID; permissive: attempt dynamic load §12.8). If schema fails: 422 with violation detail |
| 7 | Object-type schema | If activity has an object with a type, look up DefineObject{name: <type>} and run its schema | Same as #6 |
| 8 | Content validators | All registered validators with applies-to: inbound or applies-to: all run sequentially; each is a pure-sandbox predicate that returns :accept / :reject / :quarantine | :reject → 422 with reason. :quarantine → store activity but mark quarantined, do not project, alert operator |
| 9 | Capabilities | Every CID in capabilities-required is present in this instance's loaded registries (or auto-loadable per trust policy) | Missing → 422 with list of missing CIDs (sender can deliver bootstrapping Define* artifacts first). Auto-load attempt can be triggered by re-POST with ?retry-after-load=true |
| 10 | Trust state | Sender's actor and instance are not in Suspended state on this instance | Drop silently; do not respond |
| 11 | Log append | Write activity envelope (and inlined object content) to local mirror of sender's outbox; assign local sequence number | Disk error → 503 (transient); sender retries |
| 12 | Projection | Asynchronously fold the activity into every relevant projection (per define-registry) | Per-projection failure (gas, sandbox violation) → tag activity projection-failed:<projection-name>; do not affect log durability |

Pipeline halts at the first failing stage. Stages 1-10 are synchronous (POST /inbox holds the connection). Stage 11 is also synchronous; stage 12 is asynchronous, and the HTTP response returns once the log append succeeds.
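The halt-at-first-failure behaviour can be sketched as an ordered list of stage functions, each returning None to pass or an outcome to stop. Only three of the twelve stages are shown, and the required-field list, the context shape, and all function names are hypothetical.

```python
REQUIRED_FIELDS = ("id", "type", "actor")   # hypothetical subset of the envelope spec


def check_envelope(activity, ctx):
    missing = [f for f in REQUIRED_FIELDS if f not in activity]
    if missing:
        return {"action": "reject", "status": 400, "detail": {"missing": missing}}
    return None


def check_replay(activity, ctx):
    if activity["id"] in ctx["seen_ids"]:
        return {"action": "drop", "status": 200, "detail": {"status": "duplicate"}}
    return None


def check_trust(activity, ctx):
    if activity["actor"] in ctx["suspended"]:
        return {"action": "drop", "status": None, "detail": None}
    return None


# Order matters: cheap structural checks first, policy checks later.
INBOUND_STAGES = [
    ("envelope", check_envelope),
    ("replay", check_replay),
    ("trust-state", check_trust),
]


def run_inbound(activity, ctx, stages=INBOUND_STAGES):
    for name, check in stages:
        outcome = check(activity, ctx)
        if outcome is not None:
            return {"stage": name, **outcome}   # halt at the first failing stage
    ctx["seen_ids"].add(activity["id"])         # stand-in for the log append
    return {"stage": None, "action": "accept"}
```

Because later stages can assume earlier ones passed, `check_replay` may index `activity["id"]` directly without re-validating the envelope.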

14.3 Outbound pipeline — stage by stage

| # | Stage | Check | Failure response |
| --- | --- | --- | --- |
| 1 | Authentication | Caller has a valid bearer token, mTLS cert, or session for the actor | 401 |
| 2 | Authorization | Caller's identity is allowed to publish as the named actor (capability token §9.5 or owns the actor key) | 403 |
| 3 | Envelope construction | Kernel fills in id, published, normalises to/cc, computes capabilities-required (by walking referenced Define* CIDs) | n/a |
| 4 | Object handling | If object has inline content: canonicalize, compute CID, optionally store per where. If object references a CID, verify the artifact exists locally or remotely (or accept as a forward reference) | Storage error → 503 |
| 5 | Activity-type schema | Same as inbound #6; schema must pass | 422 with violation detail (caller bug) |
| 6 | Signature | Sign envelope with the actor's currently-active key matching the activity type's required purpose (e.g. Pin requires purpose: pin) | If no suitable key: 400 |
| 7 | Log append | Write to local outbox; assign sequence number | 503 |
| 8 | Projection | Async fold (same as inbound #12) | Per-projection failure tag |
| 9 | Delivery | Async push to follower inboxes per audience | Per-recipient retry per §13.4 |

Caller's HTTP response returns after stage 7 (log append). The activity is durable and queryable as soon as the response is sent; projection lag is reported via projected-up-to headers and ?wait-for= parameter.

14.4 Failure response taxonomy

Three response categories with explicit semantics:

Reject: tell the sender and don't store; the sender can retry after correcting the activity. Used for: malformed envelope, invalid signature, schema violation, missing capabilities. HTTP 4xx with structured error.

Quarantine: store the envelope (it's a valid signed message) but don't project; alert the operator. Used for: content-validator soft-fail, unloaded capabilities under permissive policy, suspect-but-not-banned senders. The activity sits in a quarantine projection until the operator reviews it; the operator can release (project) or expunge it.

Drop silently: don't store, don't respond informatively. Used for: replay (ack as duplicate), audience refusal (would leak inbox membership otherwise), suspended-sender activities. The sender experiences this as a successful POST with no visible effect; they can detect it only by polling and seeing that their activity never appears in our outbox.

14.5 Registry-driven validators

Most of the pipeline is fixed kernel logic (envelope, signature, replay, audience, log append, delivery). Two stages are registry-driven and extend dynamically:

  • Stage 8 (content validators) — operators add/remove DefineValidator entries with applies-to: inbound | outbound | all. Each runs in pure or effectful sandbox per its declaration. Returns one of :accept / :reject{:reason} / :quarantine{:reason}.
  • Stages 6-7 (schema validators) — these are registry entries (DefineActivity.schema, DefineObject.schema); the pipeline calls into the registry to fetch them.

Pure-mode validators are deterministic and cheap; results can be cached per (activity-CID, validator-CID).

Effectful-mode validators can call out to ML models, blocklist services, external moderation APIs. They get a per-call IO budget; exceeding it counts as :reject{:reason :validator-timeout}. Effectful validators do not break determinism because validation happens before projection — a rejected activity never enters projected state.

14.6 Validator composition and ordering

Validators have an integer priority field; lower values run first. The pipeline short-circuits on the first :reject. :quarantine is not short-circuiting: later validators still run, and :quarantine results aggregate.

Default priorities (room for operator-added validators):

0-99    : kernel-internal (envelope, sig, replay, audience)
100-199 : standard schema validators
200-299 : standard content validators (rate limit, audience leak)
300-399 : operator-added moderation
400-499 : effectful (ML, third-party APIs)
500+    : reserved

Operators can publish Update{DefineValidator} to change priorities or add new ones; takes effect on next inbound activity.
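The ordering and short-circuit rules can be sketched in a few lines. This is an illustration only: the `Validator` record, the tuple-shaped verdicts, and the function name `run_validators` are assumptions made for the example, not the substrate's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A verdict is ("accept", None), ("reject", reason) or ("quarantine", reason).
Verdict = Tuple[str, object]


@dataclass
class Validator:
    name: str
    priority: int                      # lower values run first
    check: Callable[[dict], Verdict]


def run_validators(validators: List[Validator], activity: dict) -> Verdict:
    """Run validators in priority order and combine their verdicts."""
    quarantine_reasons = []
    for v in sorted(validators, key=lambda v: v.priority):
        outcome, reason = v.check(activity)
        if outcome == "reject":
            # First reject wins; later validators are not run.
            return ("reject", reason)
        if outcome == "quarantine":
            # Quarantine does not stop the pipeline; reasons accumulate.
            quarantine_reasons.append(reason)
    if quarantine_reasons:
        return ("quarantine", quarantine_reasons)
    return ("accept", None)
```

Note that a :reject from a later validator still overrides earlier :quarantine results, which follows from quarantine not short-circuiting.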

14.7 Determinism requirement and its limit

A subtlety worth being explicit about: inbound validation is not required to be deterministic across instances. Two instances can disagree about whether to accept a given activity (e.g. one has a stricter content validator). Their projected states will then diverge — but only on activities one accepted and the other didn't.

This is fine. Federation does not require state convergence; it requires fold determinism for activities both instances accepted. Validators are sovereignty controls, not protocol invariants.

Where determinism is required: schema validators (§14.2 stages 6-7). If two instances disagree on whether Pin v3 matches its schema, they can't federate Pin v3 activities meaningfully. So schema validators must be pure-mode and referenced by CID.

14.8 Operational implications

  • The pipeline is the security perimeter. Every checkable property is checked here, not deeper in the kernel. No "trust the caller" assumptions inside log or projection code.
  • Quarantine is the operator's friend. Anything suspicious sits in quarantine with full envelope, sig, and reason — operator can review and decide. Better than outright drop because it preserves audit.
  • Schema validators are protocol-load-bearing; content validators are policy. The first set must converge across instances for federation to work; the second set can diverge (and that's how local moderation policy is expressed).
  • Outbound validation catches local bugs early. A malformed Pin activity fails at outbound stage 5, never enters the local log, never gets delivered.

15. Storage layout

The on-disk shape of an instance. Three concerns kept separate: the activity log (append-only, canonical), content-addressed object storage (keyed by CID, immutable), and operational state (projections, indexes, queues — derived, rebuildable).

15.1 Storage tiers

/var/lib/fed-sx/
├── log/                                     # canonical, append-only
│   ├── actors/
│   │   ├── <local-actor-id>/
│   │   │   ├── outbox/
│   │   │   │   ├── 000001.jsonl             # segment, ~64MB cap
│   │   │   │   ├── 000002.jsonl
│   │   │   │   └── tip                      # symlink to current segment
│   │   │   ├── inbox/                       # received, pre-projection
│   │   │   └── seq                          # next sequence number
│   │   └── <other-local-actor-id>/...
│   └── mirrors/                             # local mirrors of followed remote outboxes
│       └── <remote-actor-id-hashed>/
│           ├── 000001.jsonl
│           └── ...
├── objects/                                 # CID → bytes
│   └── <cid-prefix-2>/<cid-prefix-2>/<full-cid>
├── snapshots/
│   └── <projection-cid>/
│       ├── <log-tip-cid>.cbor               # snapshot value
│       └── index                            # ordered list of (log-tip, file)
├── projections/                             # live projection state
│   └── <projection-cid>.cbor                # latest in-memory state, periodically flushed
├── indexes/
│   └── fed-sx.db                            # SQLite: lookups, queue, trust state
├── keys/
│   └── <actor-id>/                          # private keys, mode 0600
│       ├── primary.pem
│       ├── recovery.pem
│       └── sigs.toml                        # key metadata
├── genesis/
│   └── bundle.cbor                          # extracted from binary at first run
└── config.toml                              # operator config

15.2 The log — append-only segments

The activity log is the only thing the substrate cannot lose. It is the source of truth from which everything else is derived.

Format: JSONL segments. Each line is one activity envelope, encoded as JSON-LD (canonical form), terminated by \n. Easy to inspect, easy to grep, trivially streamable.

Why JSON-LD on disk, not dag-cbor? Two reasons:

  • Operability: humans can tail -f and grep the log. dag-cbor is opaque.
  • AP wire compatibility: activities arrive over HTTP as JSON-LD anyway; storing the same form avoids round-trip conversion.

The CID of each activity is computed from its canonical dag-cbor representation (per §2), independent of how it's stored. CIDs are stable across storage formats.

Segments cap at ~64MB. Rotation by size, not time. Old segments are immutable; new writes go to the tip segment. Compression (zstd) applied on segments older than the current tip — saves disk, doesn't slow appends.
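Size-based rotation can be sketched as follows. The function name `append_activity` and the use of `json.dumps` with sorted keys are assumptions for illustration; the latter is only a stand-in for canonical JSON-LD serialization, and the `tip` symlink and zstd compression of older segments are omitted.

```python
import json
import os

SEGMENT_CAP_BYTES = 64 * 1024 * 1024  # ~64MB, per the text


def append_activity(outbox_dir, envelope, cap=SEGMENT_CAP_BYTES):
    """Append one envelope as a JSONL line, rotating by size.

    Returns the name of the segment that received the line.
    """
    os.makedirs(outbox_dir, exist_ok=True)
    segments = sorted(f for f in os.listdir(outbox_dir) if f.endswith(".jsonl"))
    current = segments[-1] if segments else "000001.jsonl"
    # Stand-in serialization: NOT canonical JSON-LD.
    line = (json.dumps(envelope, sort_keys=True, separators=(",", ":")) + "\n").encode("utf-8")
    path = os.path.join(outbox_dir, current)
    size = os.path.getsize(path) if os.path.exists(path) else 0
    # Rotate only when the tip already holds data and this line would exceed the cap.
    if size > 0 and size + len(line) > cap:
        current = "%06d.jsonl" % (int(current[:6]) + 1)
        path = os.path.join(outbox_dir, current)
    with open(path, "ab") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # durable before the caller gets a response
    return current
```

Older segments are never reopened for writing, which is what makes compressing them after rotation safe.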

Per-actor outboxes. Each local actor has its own outbox directory. This matches AP semantics (one outbox per actor) and means:

  • Backing up a single actor is a simple directory copy
  • Per-actor sequence numbers (no cross-actor coordination)
  • Migration (Move) is a directory rename + a Move activity

Mirror outboxes. When a local actor follows a remote one, the remote's outbox is mirrored locally for replay. Same JSONL format. Tracked under log/mirrors/<hashed-remote-id>/ to avoid filesystem path issues with URL characters. The hash is purely a filesystem-friendly encoding; the canonical actor id stays in the log content.

Inbox vs outbox distinction. Inboxes hold received activities pre-validation; outboxes hold committed activities post-pipeline. An inbound activity that passes the validation pipeline (§14) is moved from inbox to the appropriate mirror outbox. This makes inbox a transient queue, not a permanent record.

15.3 Object storage

Content-addressed blob store, sharded directories.

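Path scheme: objects/<first-2-chars>/<next-2-chars>/<full-cid>. SHA2-256 digests are uniformly distributed; with two hex characters per level this gives ~65k (256 × 256) buckets with a couple hundred files each at moderate scale. Textual CIDs that share a version, codec, and hash function also share a constant leading prefix, so the shard characters should be taken from the digest portion rather than from the literal start of the string. Sharding directories by hash prefix is a standard pattern (IPFS and Git both use variants of it).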
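A minimal sketch of the sharded layout, with an idempotent write. It uses a SHA-256 hex digest as a stand-in content identifier; this is not a real CID (no multibase, multicodec, or multihash framing), and the function names are hypothetical.

```python
import hashlib
import os


def object_path(root, content_id):
    """Two-level sharded path: objects/<2 chars>/<next 2 chars>/<full id>."""
    return os.path.join(root, "objects", content_id[0:2], content_id[2:4], content_id)


def put_object(root, data):
    """Store bytes under their content id; writing the same bytes twice is a no-op."""
    content_id = hashlib.sha256(data).hexdigest()  # stand-in, NOT a real CID
    path = object_path(root, content_id)
    if os.path.exists(path):
        return content_id  # same id implies same bytes: nothing to do
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)  # atomic rename, readers never see a partial file
    return content_id
```

Because the path is derived from the content, two writers racing on the same artifact converge on the same file with the same bytes.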

Storage backends. Pluggable per where: cid object:

  • files-on-disk (default) — write to local filesystem.
  • ipfs — register-driven backend; calls out to a local IPFS node.
  • s3 — object storage in cloud bucket.
  • memory-only — in-memory cache, evictable; useful for ephemeral artifacts.

The kernel uses the where-tag on each object to dispatch to the correct backend. Backends are registry entries (DefineStorage); operators install only the ones they want.

Garbage collection is opt-in per backend. Default policy: never GC (objects are immutable and may be referenced by future activities). Operators can configure per-backend retention rules:

  • "Keep last N versions of objects referenced by Pin activities for path X"
  • "Evict objects not referenced in last 90 days from the memory-only cache"
  • "Mirror objects referenced by ≥ 3 endorsements; evict others after 30 days"

GC operates on the projected reference graph (a reference-graph projection that maintains "what activities reference this CID"). Removing an object that's still referenced is allowed but produces a warning logged in operations.

15.4 Snapshots

Per §10.4, snapshots are the (projection-CID, log-tip-CID, state) triples that let us resume without full replay.

Storage: snapshots/<projection-cid>/<log-tip-cid>.cbor. The state value is dag-cbor-encoded; the file's content CID matches the snapshot's claimed CID.

Index: snapshots/<projection-cid>/index is a sorted list of (log-tip-time, log-tip-cid, file) triples. On startup, kernel finds the latest snapshot ≤ current log tip and resumes from it. On time-travel queries, finds the latest snapshot ≤ target time and folds forward.
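The "latest snapshot ≤ target" lookup is a binary search over the sorted index. A sketch, assuming the index is already loaded as a list of tuples and that timestamps are UTC ISO-8601 strings in a uniform format (so they sort lexicographically); the function name is hypothetical.

```python
import bisect


def latest_snapshot_at_or_before(index, target_time):
    """Return the newest (log_tip_time, log_tip_cid, file) entry whose
    time is <= target_time, or None if every snapshot is newer.

    `index` must be sorted ascending by log_tip_time.
    """
    times = [entry[0] for entry in index]
    pos = bisect.bisect_right(times, target_time)
    return index[pos - 1] if pos else None
```

When the result is None, the caller has no snapshot to resume from and must fold from the start of the log.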

Retention: keep at least:

  • Latest snapshot per active projection
  • Snapshots referenced by published Create{Snapshot} activities (federation proofs)
  • One snapshot per day for the last 7 days (audit / time-travel)

Older snapshots GC'd by default. Operators can increase retention.

15.5 Operational state — SQLite

Things that are derived, frequently-queried, but not federated:

  • Lookup indexes for projections (when indexes: declared) — (projection, index-key, value) → activity-cid rows
  • Delivery queue — outbound activities pending push, retry counts, next-attempt timestamps
  • Trust state — per-actor and per-instance trust levels (Trusted / Default / Suspended)
  • Quarantine queue — activities pending operator review
  • Configuration cache — currently-active registry entries (also in memory; on-disk cache for fast restart)

Single SQLite file (indexes/fed-sx.db). Recoverable: if corrupted or deleted, rebuilt from the log on next startup (with cost proportional to log size). The SQLite is a cache, not authoritative.

WAL mode for concurrent readers. Single-writer (the kernel); reads from many HTTP request workers.
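To illustrate "a cache, not authoritative", the sketch below rebuilds a lookup table from JSONL segments using Python's standard `sqlite3` module. The table name `activity_index`, its columns, and the function name are assumptions for the example; the real schema is not specified here.

```python
import json
import sqlite3


def rebuild_index(db_path, segment_paths):
    """Recreate the activity lookup table from log segments."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")  # concurrent readers, single writer
    conn.execute("DROP TABLE IF EXISTS activity_index")
    conn.execute(
        "CREATE TABLE activity_index ("
        "id TEXT PRIMARY KEY, actor TEXT, type TEXT, segment TEXT, line INTEGER)"
    )
    with conn:  # one transaction for the whole rebuild
        for seg in segment_paths:
            with open(seg, encoding="utf-8") as f:
                for n, line in enumerate(f, start=1):
                    if not line.strip():
                        continue
                    env = json.loads(line)
                    conn.execute(
                        "INSERT OR REPLACE INTO activity_index VALUES (?, ?, ?, ?, ?)",
                        (env["id"], env.get("actor"), env.get("type"), seg, n),
                    )
    return conn
```

The rebuild cost is one sequential pass over the segments, which is the "proportional to log size" cost noted above.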

15.6 Backup and export

The substrate is an append-only log of immutable artifacts; backup is simple.

  • Full backup: rsync /var/lib/fed-sx/log/ and /var/lib/fed-sx/objects/. The rest is rebuildable.
  • Per-actor export: tar log/actors/<actor-id>/ + the objects referenced by activities in that outbox. Self-contained, importable into another instance.
  • Activity bundle export: for federation backfill, produce a dag-cbor bundle of [activity envelopes... + referenced objects] for a specified actor + range. Single file, content-addressed, signed by the source instance with a Bundle activity attesting to its contents.

Exports are themselves publishable (Create{Bundle} activity carrying the bundle CID). This is how an actor migrates instances cleanly: export bundle, import on new instance, publish Move activity.

15.7 Mirroring and replication

Two patterns:

  • Federation mirroring (the canonical kind) — when actor A follows B, A's instance mirrors B's outbox locally. This is just normal federation (§13). Each follower keeps its own copy.
  • Operational mirroring — for high availability. An operator runs two instances with shared filesystem (NFS / EFS) for log/ and objects/, separate SQLite files. Reads can hit either; writes go through one. Or: rsync-based hot standby with manual failover.

Operational mirroring is out of scope for v1. Federation mirroring is the substrate-level redundancy: as long as one peer that followed you is still online, your log is still recoverable.

15.8 Storage size estimates

Rough targets at moderate scale (10 active local actors, 1000 followed peers, 1 year of activity at 100 activities/actor/day):

  • Log: 10 actors × 100 act/day × 1 KB avg envelope × 365 days ≈ 365 MB local outbox. Mirrors: 1000 peers × 10 act/day × 1 KB × 365 ≈ 3.6 GB.
  • Objects: depends heavily on content. Assume 50% of activities have inline content of avg 5 KB → ~2 GB total inline. CID-referenced larger objects: count separately, depends on use case.
  • Snapshots: typically much smaller than the log. ~10 active projections × ~10 MB per snapshot × ~8 retained snapshots ≈ 800 MB.
  • SQLite: index sizes proportional to indexed projection content; typical few hundred MB.

Total: order of 10 GB at the described scale. Single-machine viable; SSD recommended for log throughput; spinning disk fine for snapshots and object storage cold tier.
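The derivable parts of the estimate can be checked with a few lines of arithmetic. In this sketch the inline-object figure (~2 GB) is taken as given from the text, and "few hundred MB" for SQLite is pinned to 300 MB purely as an assumption so that a total can be computed; decimal units are used throughout.

```python
KB, MB, GB = 10**3, 10**6, 10**9


def estimate():
    """Storage estimate in bytes for the scale described in the text."""
    parts = {
        # 10 actors x 100 activities/day x 1 KB x 365 days
        "local_log": 10 * 100 * 1 * KB * 365,
        # 1000 peers x 10 activities/day x 1 KB x 365 days
        "mirrors": 1000 * 10 * 1 * KB * 365,
        # 10 projections x 10 MB x 8 retained snapshots
        "snapshots": 10 * 10 * MB * 8,
        # Taken as given from the text, not derived here.
        "objects_inline": 2 * GB,
        # Assumption: "few hundred MB" fixed at 300 MB.
        "sqlite": 300 * MB,
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    for name, size in estimate().items():
        print("%-15s %8.3f GB" % (name, size / GB))
```

Under these inputs the total comes to roughly 7 GB, consistent with the "order of 10 GB" figure.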

15.9 Operational implications

  • The log is sacred. Never modify, never delete. Backups go to multiple media. Loss of log/ means loss of identity (actor activities) and loss of state-of-record. Loss of objects/ means loss of content but log + peers can recover most of it.
  • Everything else is rebuildable. Projections, indexes, snapshots, queue state can all be recomputed from the log at startup cost. Operationally, this means upgrades and migrations are forgiving.
  • CID-addressed storage is naturally idempotent. Two instances writing the same artifact write the same bytes to the same path. Race conditions become no-ops.
  • JSONL on disk pays for itself the first time an operator needs to debug a weird federation issue with grep and jq. Worth the storage cost vs dag-cbor.

16. API surface

HTTP API for reading the log, publishing activities, querying projections, and streaming updates. Three layers: AP-standard endpoints (for vanilla AP interop), fed-sx-specific endpoints (publish, query, capabilities), and discovery endpoints (webfinger, well-known).

16.1 Endpoint catalog

AP-standard

| Method | Path | Purpose |
|---|---|---|
| GET | `/actors/<id>` | Actor doc (Person/Service/Group/Application) |
| GET | `/actors/<id>/inbox` | Read inbox — auth required |
| POST | `/actors/<id>/inbox` | Receive federated activity (HTTP Signature required) |
| GET | `/actors/<id>/outbox` | OrderedCollection of actor's published activities |
| POST | `/actors/<id>/outbox` | AP-standard publish (alias for POST /activity with actor set) |
| GET | `/actors/<id>/followers` | OrderedCollection of follower actor URIs |
| GET | `/actors/<id>/following` | OrderedCollection of followed actor URIs |
| GET | `/activities/<uuid>` | Single activity by id |
| GET | `/objects/<uuid>` | Single object by id (note: distinct from CID-addressed `/artifacts/<cid>`) |

fed-sx-specific

| Method | Path | Purpose |
|---|---|---|
| POST | `/activity` | Generalised publish — accepts any well-formed activity |
| GET | `/artifacts/<cid>` | CID-addressed artifact fetch (content negotiated) |
| GET | `/artifacts/<cid>/raw` | Raw bytes (whatever the codec stored) |
| GET | `/artifacts/<cid>/<path>` | IPLD path traversal into the artifact |
| GET | `/projections` | List of registered projections (name, CID, last-folded-tip) |
| GET | `/projections/<name>` | Full projection state (paginated for large states) |
| GET | `/projections/<name>?at=<ts>` | Time-travel: state as of timestamp |
| GET | `/projections/<name>/<key>` | Single key from a projection (uses indexes) |
| POST | `/query` | Run an SX query expression against one or more projections |
| GET | `/define-registry` | Currently active Define* artifacts by kind |
| GET | `/capabilities/<actor-id>` | Per-actor declared capabilities |

Discovery and well-known

| Method | Path | Purpose |
|---|---|---|
| GET | `/.well-known/webfinger?resource=acct:<user>@<host>` | RFC 7033 actor discovery |
| GET | `/.well-known/sx-capabilities` | This instance's capability advertisement (§7) |
| GET | `/.well-known/host-meta` | XRD describing the host |
| GET | `/.well-known/nodeinfo` | Standard fediverse node metadata (Mastodon, Pleroma compatibility) |

Real-time (SSE)

| Method | Path | Purpose |
|---|---|---|
| GET | `/actors/<id>/outbox/stream` | New activities as they're appended (events: activity) |
| GET | `/actors/<id>/inbox/stream` | New inbound activities (auth required) |
| GET | `/projections/<name>/subscribe` | Projection deltas (events: delta) |
| GET | `/federation/health/stream` | Per-peer delivery health (events: peer-status) |

WebSocket equivalents (/ws/... paths) available where SSE is awkward (browsers behind proxies); same event payloads, different framing.

16.2 Authentication

Three mechanisms, each appropriate to a different caller type:

  • HTTP Signatures (Internet-Draft draft-cavage-http-signatures; not a published RFC) — the AP-standard mechanism for inter-instance calls. Sender signs a selected set of headers (including a Digest of the body) with their actor's private key; receiver verifies via the actor's public keys projection (§9.6). Used for: POST /inbox, peer-to-peer outbox pulls when authentication is desired.
  • Bearer tokens — for interactive clients (CLIs, web UIs, mobile apps). Issued via OAuth2 (or simple admin-issued tokens for v1). Used for: POST /activity, GET /actors/<id>/inbox, anything requiring caller identity.
  • Capability tokens (§9.5) — for delegated publish. Token includes the granting actor, the granted capabilities (e.g. publish: Pin for path-prefix /docs/), the bearer's actor, expiry, and signature from the granter. Used for: child actors, service accounts, temporary publish access.

Public reads (most GET endpoints to public-audience activities) require no auth. Private/followers-only reads check the caller's identity against the audience.

16.3 Content negotiation

Same resource, multiple representations. Accept header dispatches:

| Accept header | Returns |
|---|---|
| `application/activity+json` | AP-standard JSON-LD (default for ambiguous Accepts) |
| `application/ld+json; profile="..."` | JSON-LD with explicit profile |
| `application/cbor` | dag-cbor |
| `application/json` | Plain JSON (compact, no @context expansion) |
| `application/sx` | Canonical SX wire format |
| `text/html` | HTML representation (for browsers — renders the artifact via SX) |

Same negotiation applies to /artifacts/<cid>, /activities/<uuid>, /projections/<name>. Servers MUST honour the request; absent Accept defaults to application/activity+json.
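A simplified sketch of the dispatch, under several assumptions: the function name `negotiate` is hypothetical, only `*/*` and `application/*` are treated as "ambiguous", and returning `None` (which a server might turn into 406 Not Acceptable) is a choice made for the example. It is not a complete RFC 9110 Accept parser.

```python
SUPPORTED = [
    "application/activity+json",
    "application/ld+json",
    "application/cbor",
    "application/json",
    "application/sx",
    "text/html",
]
DEFAULT = "application/activity+json"


def negotiate(accept):
    """Pick a representation for an Accept header value (or None/empty)."""
    if not accept or not accept.strip():
        return DEFAULT  # absent Accept
    candidates = []
    for position, part in enumerate(accept.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        q = 1.0
        for param in fields[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # Sort by descending q, then by order of appearance.
        candidates.append((-q, position, media))
    for neg_q, _, media in sorted(candidates):
        if neg_q == 0:
            continue  # q=0 marks a type as not acceptable
        if media in SUPPORTED:
            return media
        if media in ("*/*", "application/*"):
            return DEFAULT  # ambiguous request
    return None  # nothing acceptable
```

Parameters other than `q`, such as `profile`, are ignored here; a fuller implementation would pass the profile through to the JSON-LD serializer.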

16.4 Pagination

Cursor-based via AP's OrderedCollectionPage:

GET /actors/giles/outbox
→ {
    "type": "OrderedCollection",
    "totalItems": 12345,
    "first": "/actors/giles/outbox?page=true",
    "last": "/actors/giles/outbox?page=true&min_id=0"
  }

GET /actors/giles/outbox?page=true
→ {
    "type": "OrderedCollectionPage",
    "id": "...?page=true",
    "next": "...?page=true&max_id=<cid>",
    "prev": "...?page=true&min_id=<cid>",
    "orderedItems": [...]
  }

Cursors are CIDs of the boundary activity (not opaque tokens). Stable across restarts and instances. max_id returns activities before the cursor (newest first); min_id returns activities after the cursor.

Default page size: 50. Max: 1000. Link: <...>; rel="next" header also provided for HTTP-native pagination.
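The cursor semantics can be sketched over an in-memory list of activity CIDs ordered oldest to newest. Returning `min_id` pages newest-first is an assumption (the text specifies ordering only for `max_id`), and the function name `page` is hypothetical.

```python
DEFAULT_PAGE = 50
MAX_PAGE = 1000


def page(log, max_id=None, min_id=None, size=DEFAULT_PAGE):
    """Return one page of CIDs, newest first.

    `log` is ordered oldest -> newest. `max_id` selects items strictly
    older than the cursor; `min_id` selects items strictly newer.
    Raises ValueError if the cursor CID is not in the log.
    """
    size = max(1, min(size, MAX_PAGE))
    if max_id is not None:
        end = log.index(max_id)
        window = log[max(0, end - size):end]
    elif min_id is not None:
        start = log.index(min_id) + 1
        window = log[start:start + size]
    else:
        window = log[-size:]  # first page: the newest items
    return list(reversed(window))
```

A real implementation would look the cursor up through an index rather than scanning, but the boundary arithmetic is the same.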

For projections: same shape, items are projection entries.

16.5 The query API

POST /query takes an SX expression evaluated in pure mode against named projections:

POST /query
Content-Type: application/sx
Accept: application/sx

(let ((actors  (projection actor-state))
      (pins    (projection pin-state)))
  (for-each ([(actor-id actor) actors])
    (when (> (count (filter (fn ((path cid)) (= (:owner cid) actor-id)) pins)) 10)
      {:actor (:preferredUsername actor)
       :pins-published (count ...)})))

Query semantics:

  • Evaluated in pure sandbox; all the determinism rules apply.
  • Projection access is read-only and snapshot-consistent: the query sees state as-of the time of the request (or ?at= if specified).
  • Result is serialized in the negotiated content type.
  • Gas limit applies (default 1M units per query, tunable by operator).
  • Cacheable: query CID + projection state CIDs uniquely determine the result.

Query results can themselves be published as Create{QueryResult} activities, making derived analyses federable.

16.6 Errors

Uniform JSON error envelope:

{
  "error": {
    "type": "https://next.rose-ash.com/ns/fed-sx/errors/v1#InvalidSignature",
    "status": 401,
    "title": "Activity signature invalid",
    "detail": "Key id 'https://example/actors/x#key-1' was superseded at 2026-01-15T...",
    "activity-id": "https://...",
    "key-id": "...#key-1",
    "instance": "/incidents/<incident-cid>"
  }
}

Error types are URIs in the fed-sx namespace; receivers can check type for programmatic handling. Standard errors:

  • MissingCapability — includes missing array of CIDs
  • SchemaViolation — includes schema-cid, field-path, expected, got
  • InvalidSignature
  • Quarantined — includes quarantine-id for operator-status tracking
  • RateLimited — includes retry-after
  • ResourceExhausted — for query gas exhaustion

16.7 Streaming details

SSE event format:

event: activity
id: <activity-cid>
data: { ...activity envelope... }

event: delta
id: <activity-cid that triggered the delta>
data: {"projection": "actor-state", "key": "...", "old": ..., "new": ...}

event: heartbeat
data: {"projected-up-to": "<cid>", "ts": "..."}

Clients reconnect with Last-Event-ID: <cid> to resume from the last event seen. Server replays from that point in the log (or returns 410 if too far behind, in which case client should switch to paginated pull).
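The framing and the resume rule can be sketched as follows. The names `format_event` and `resume`, and the idea of a `retained` window of recent `(cid, envelope)` pairs, are assumptions for illustration; how far back a server retains events is an operator decision not specified here.

```python
import json


def format_event(event, event_id, data):
    """Serialize one Server-Sent Event; data is emitted as single-line JSON."""
    lines = ["event: " + event]
    if event_id is not None:
        lines.append("id: " + event_id)
    lines.append("data: " + json.dumps(data, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"  # blank line terminates the event


def resume(retained, last_event_id):
    """Return (status, events_to_replay) for a (re)connecting client.

    `retained` is a list of (cid, envelope) pairs, oldest first.
    """
    if last_event_id is None:
        start = len(retained)  # fresh connection: nothing to replay
    else:
        cids = [cid for cid, _ in retained]
        if last_event_id not in cids:
            return 410, []  # too far behind: fall back to paginated pull
        start = cids.index(last_event_id) + 1
    return 200, [format_event("activity", cid, env) for cid, env in retained[start:]]
```

Because event ids are activity CIDs, the same cursor works for both the stream and the paginated outbox.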

16.8 Versioning

The substrate is versioned at three levels:

  • Envelope version — declared in /.well-known/sx-capabilities. Currently 1. Forward-compatible (new fields OK; semantics fixed).
  • API version — URL prefix optional: /v1/... works the same as /.... Future major version: /v2/... paths in parallel.
  • Definition versions — supersession via activity log (§§9.2, 12.7). No special URL handling.

Capability negotiation happens before federation; clients shouldn't hard-code URL paths beyond the canonical set documented here.

16.9 Operational implications

  • The API is small but layered. AP compatibility is one layer; fed-sx extensions are another; both share auth and content negotiation. Adding a new endpoint shouldn't require new transport machinery.
  • Content negotiation is the polyglot bridge. Same artifact addressable in JSON-LD (for AP peers), dag-cbor (for fed-sx peers), SX (for SX clients), HTML (for humans). One CID, four representations.
  • Cursor pagination is CID-based. Stable identifiers, no opaque tokens to invalidate, peers can synchronize without coordination.
  • The query API is a load-bearing differentiator. Datalog/GraphQL-equivalent expressiveness with no separate query language — it's just SX. Federable, signable, versionable like any other SX artifact.

17. Implementation languages

Polyglot authoring, monoglot runtime: every language-on-SX compiles to core SX and runs on any host with the SX evaluator. The language is an authoring choice; the federated artifact is uniform SX. Authors of Define* artifacts pick the source language they prefer; consumers don't need that compiler installed to execute the compiled SX.

Languages are picked because they genuinely fit the problem, not to demonstrate the polyglot story. Where a chosen language has gaps (e.g. Erlang-on-SX missing hot reload), we invest in maturing the port rather than working around the gap.

17.1 The v1 stack

| Layer | Language | Why |
|---|---|---|
| Native primitives | OCaml (existing runtime) | Crypto (RSA, Ed25519, SHA), dag-cbor encode/decode, HTTP socket, file IO, SQLite. Surfaced as Erlang-on-SX BIFs. |
| Kernel orchestration | Erlang-on-SX | Actor model = federation. gen_server per actor / per projection / per peer. supervisor for delivery workers. Message passing is literally the substrate. Hot code reload (Phase 7) for Define* live extension. |
| Query API back-end | Datalog-on-SX | Projection state is relational; trust graph walks, provenance, projection joins are textbook Datalog. Already mature (276/276 tests, full core Datalog with stratified negation, aggregation, magic sets, federation-graph demo). |
| Define* semantics, schemas, validators, codecs, audience predicates | Core SX | The canonical federated language. Everything content-addressed and federated lives here. |

17.2 Languages explicitly not booked for v1

Available, mature, considered — would be reached for if a real fed-sx need surfaced, but no preemptive use:

  • Haskell-on-SX (285/285 tests, 36 programs, type checker working) — for complex operator-authored extensions that benefit from typed pattern matching. Schemas in fed-sx are short predicates; types don't earn their keep here.
  • Smalltalk-on-SX (625/629 tests, classic corpus running) — natural fit for a live operator dashboard / Glamorous-Toolkit-style introspection. v2/v3 territory; a browser UI likely wins for operator audiences.
  • APL-on-SX — high-throughput batch reprojection if scalar SX folds become a bottleneck. Premature without measured need.
  • JS-on-SX, Elm-on-SX — browser-side client SDK / viewer. v2.
  • Common Lisp-on-SX, Forth-on-SX, Go-on-SX, Dream-on-SX, Elixir-on-SX, Erlang-on-SX (alternative form) — case by case if a use case appears.

17.3 The FFI BIF layer

Erlang-on-SX has no FFI / NIF mechanism in its current form (Phase 6 plan: "out of scope entirely"). fed-sx adds a BIF layer in lib/erlang/transpile.sx (or a dedicated lib/erlang/fed_bifs.sx) exposing native primitives:

crypto:rsa_verify/3       crypto:ed25519_verify/3
crypto:sha2_256/1         crypto:sha3_256/1

cid:cbor_encode/1         cid:cbor_decode/1
cid:multihash/2           cid:from_bytes/2
cid:to_string/1           cid:from_string/1

log:append/2              log:read/3
log:tip/1                 log:replay/3

http:listen/2             http:request/2
http:respond/3            http:sse_send/2

fs:read/1                 fs:write/2
fs:exists/1               fs:list/1

sqlite:open/1             sqlite:exec/2
sqlite:query/3            sqlite:close/1

snapshot:put/3            snapshot:get/2

Each BIF is a thin Erlang-on-SX function dispatching to the corresponding SX runtime IO primitive. Returns Erlang-shaped values (atoms, tuples, binaries). Errors raise appropriate Erlang exceptions (badarg, enoent, eacces).

This is the only native-FFI surface in fed-sx. All other I/O goes through these BIFs. Operators can audit the BIF list to know exactly what the substrate touches outside SX.

17.4 Build pipeline

.sx files (core SX, registry entries) ──┐
.erl files (Erlang-on-SX kernel)    ──┼──> compile to core SX
.dl files (Datalog-on-SX queries)   ──┘
                                       │
                            content-addressed SX artifacts
                                       │
                                       ▼
                         genesis bundle (CID-verified)
                                       │
                                       ▼
                         OCaml runtime evaluates everything

Each authoring language's compiler runs at build time, producing core SX that goes into the genesis bundle (for bootstrap definitions) or gets published as activities (for runtime extensions).

17.5 Prerequisite work

Three pieces of investment land in or alongside the Erlang-on-SX loop. The first two land before fed-sx kernel code starts; the third runs in parallel, not blocking milestone 1, but blocking production-grade throughput.

  1. Phase 7 — hot code reload. code:load_binary/3, gen_server code_change/3 callback dispatch, atomic module-version swap. Required for Define* live extension (no kernel restart to load new verbs). Reload-semantics choice (two-version coexistence vs single-version atomic swap with closure capture) decided during the work.

  2. Phase 8 — FFI mechanism + initial BIFs. define-bif registration + term marshalling + error mapping, then BIFs for crypto:*, cid:* (dag-cbor), fs:*, http:*, sqlite:*. Required for fed-sx kernel to call native primitives. Lands before kernel code that calls them.

  3. Phase 9 — specialized opcodes (the BEAM analog). Layered perf strategy:

    • Layer 1 (Phase 9, in scope) — specialized bytecode opcodes that bypass the general-purpose CEK machine for hot Erlang operations. OP_PATTERN_TUPLE, OP_PERFORM/OP_HANDLE, OP_RECEIVE_SCAN, OP_SPAWN/OP_SEND, BIF dispatch table. Targets: 100k+ message hops/sec, 1M-process spawn under 30sec — roughly 1000-3000× speedup over the current general-purpose path.
    • Layer 2 (Phase 10, deferred) — multi-core scheduler via OCaml 5 domains. Decided empirically after Layer 1 lands; likely unnecessary if Layer 1 alone hits target throughput.
    • Layer 3 (skipped) — incremental tuning of the existing call/cc-based receive and env-copy-per-call machinery. Obsoleted by Layer 1; not pursued.

    Architectural note for Phase 9. Phase 9a (the opcode extension mechanism in hosts/ocaml/evaluator/) is out of scope for the Erlang loop — it's SX VM core, used by every language port that wants specialized opcodes. Designed in plans/sx-vm-opcode-extension.md; lands as a separate focused workstream (~1-2 weeks) owning hosts/. Phase 9b-9g (the actual Erlang opcodes in lib/erlang/vm/) are designed and tested against a stub dispatcher in the Erlang loop until 9a is available.

    Shared-opcode discipline. Opcodes Phase 9 produces that other language ports could plausibly use (pattern match, perform/handle, record access) become candidates for chiselling out to lib/guest/vm/ — same lib/guest discipline, applied at the bytecode layer. Don't pre-extract; promote to lib/guest/vm/ when a second language port has an actual second use. The substrate accumulates a richer opcode surface over time as ports contribute, and every port benefits from every shared opcode (the structural advantage over BEAM, which is special-purpose-built for one language).

    fed-sx is not blocked by Phase 9. Milestone 1 ships on current Erlang-on-SX perf (which has 100-1000× headroom for a single demo instance). Phase 9 lands in parallel; by the time fed-sx needs production-grade throughput (federation hub use cases, milestone 2-3), Phase 9 is ready.

After Phases 7 and 8 land, fed-sx milestone 1 (kernel + registries + bootstrap entries + Pin smoke test + reactive application smoke test) becomes the next workstream. Phase 9 work continues in parallel.


18. Subscription model

Symmetric to the publish-side extensibility: just as DefineActivity registers what kinds of things can be published, DefineSubscription registers what kinds of patterns can be subscribed to. Follow becomes one standard subscription type among many, not a hardcoded primitive.

18.1 The asymmetry being fixed

Without this, the substrate has rich publish-side extensibility (any new verb is a DefineActivity) and one hardcoded subscription primitive (Follow). That mirrors AP but it's an arbitrary limitation in a substrate where everything else is registry-driven. Generalising restores symmetry.

18.2 The DefineSubscription shape

(activity 'Create
  :object {:type "DefineSubscription"
           :name "Follow"                        ; AP-standard
           :schema (fn (sub)                     ; what params the sub takes
             (and (cid? (-> sub :object))
                  (= "Person" (-> sub :object-type))))
           :match (fn (subscription activity)    ; pure-mode predicate
             (= (-> subscription :object) (:actor activity)))
           :delivery {:default :push
                      :modes [:push :pull :sse]
                      :digest-window nil}
           :capabilities-required []})           ; some subs may need authority

Four mandatory parts:

  • schema — pure-mode predicate validating subscription parameters at Subscribe time. Catches malformed subscriptions before they enter state.
  • match — pure-mode predicate (subscription, activity) → bool. Decides whether a given activity is a hit for this subscription. Determinism rules apply (§11.2).
  • delivery — supported modes (push to inbox / pull on demand / SSE streaming / batched digest). The subscription instance picks its preferred mode at Subscribe time from the supported set.
  • capabilities-required — capability tokens the subscriber must hold (empty for public subs; populated for paywalled/gated/private streams).

18.3 The Subscribe verb

The bootstrap verb that activates a subscription:

(activity 'Subscribe
  :object {:type "Follow"   :object "https://alice.example/actors/alice"})

(activity 'Subscribe
  :object {:type "Topic"    :tag "climate-change"
           :delivery :digest :digest-window "P1D"})

(activity 'Subscribe
  :object {:type "CidWatch" :cid "bafy..."
           :events [:supersede :endorse]})

(activity 'Subscribe
  :object {:type "Predicate"
           :pred '(fn (act) (and (= (:type act) "Note")
                                  (string-contains? (-> act :object :content) "fed-sx")))})

Unsubscribe is Undo{Subscribe} — AP's standard pattern, retains audit.

18.4 Standard subscription types (defined later, not bootstrap)

Same status as the custom verbs in §6.2 — substrate accepts any subscription type once a DefineSubscription artifact registers it. Standard set:

| Name | Params | Match semantics | Use case |
|---|---|---|---|
| Follow | {object: actor-id} | activity.actor == subscription.object | AP-standard actor following |
| Topic | {tag: string} | tag in activity.object.tags | Hashtag follows, RSS-like |
| CidWatch | {cid, events: [...]} | activity references cid AND activity.type in events | "Notify me when this artifact is updated/endorsed/forked" |
| PathWatch | {path, events: [...]} | activity is a Pin/Update of named path | "Notify me when domain:foo/bar/baz changes" |
| VerbFilter | {wraps: subscription-cid, types: [...]} | inner subscription matches AND activity.type in types | "Follow Alice but only Endorse activities" |
| TrustGraph | {root: actor-id, depth: int} | activity.actor reachable from root in trust graph at depth | Web-of-trust expansion |
| Predicate | {pred: sx-fn} | (pred activity) returns truthy | Escape hatch — most powerful, highest cost |
| Channel | {channel-id} | activity addresses or originates from channel | Multi-actor pooled streams |
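To make a few of the match semantics above concrete, here is a hedged Python sketch of the Follow, Topic, CidWatch, and VerbFilter rules. It is illustrative only: the `refs` field (the CIDs an activity references), the use of activity type names as event values, and the `resolve` callback that turns a subscription CID into its subscription are all assumptions, not part of the design.

```python
def match_follow(sub, act):
    # activity.actor == subscription.object
    return act.get("actor") == sub["object"]


def match_topic(sub, act):
    # tag in activity.object.tags (objects given as bare ids carry no tags)
    obj = act.get("object")
    tags = obj.get("tags", []) if isinstance(obj, dict) else []
    return sub["tag"] in tags


def match_cid_watch(sub, act):
    # Assumption: "refs" lists the CIDs an activity references.
    return sub["cid"] in act.get("refs", []) and act.get("type") in sub["events"]


MATCHERS = {"Follow": match_follow, "Topic": match_topic, "CidWatch": match_cid_watch}


def match_verb_filter(sub, act, resolve):
    """Inner subscription matches AND activity.type is in the allowed types."""
    inner = resolve(sub["wraps"])   # subscription-cid -> inner subscription
    return MATCHERS[inner["type"]](inner, act) and act.get("type") in sub["types"]
```

VerbFilter shows why wrapping by reference is convenient: it adds a type check on top of whatever the inner subscription already decides, without restating the inner rule.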

18.5 Match-fn execution location

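The load-bearing question. There are three choices: filter on the publisher side, filter on the subscriber side, or split the work between them. fed-sx adopts the hybrid model: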

  • Coarse filter on the publisher side — audience predicates (§8) decide who the activity is delivered to at all. This is mandatory and cheap (audience set is usually small and well-defined).
  • Fine filter on the subscriber side — once an activity arrives in inbox, the subscriber's instance evaluates each active subscription's match-fn against it. Pure-mode evaluation (deterministic, gas-bounded). Activities matching one or more subscriptions enter the subscriber's projected state.

Why hybrid: publisher-side fine filtering would require the publisher to know every subscriber's match-fn (privacy-violating, scaling-killing). Subscriber-side filtering is wasteful only if the publisher's audience model is too coarse — which is the audience system's job to fix per §8.
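The split between the two filters can be sketched as two small functions. This is an illustrative Python sketch under assumptions: `audience_pred` stands in for the audience predicates of §8, and `matchers` is a hypothetical lookup from subscription type to match function.

```python
def coarse_filter(activity, candidates, audience_pred):
    """Publisher side: decide who receives the activity at all."""
    return [c for c in candidates if audience_pred(activity, c)]


def fine_filter(activity, subscriptions, matchers):
    """Subscriber side: return the active subscriptions this activity hits."""
    return [s for s in subscriptions if matchers[s["type"]](s, activity)]


def accept(activity, subscriptions, matchers):
    """An activity enters projected state when at least one subscription matches."""
    return len(fine_filter(activity, subscriptions, matchers)) > 0
```

Note that `coarse_filter` never sees a subscriber's match functions, which is the privacy property the hybrid model is after: the publisher only needs its own audience rule.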

18.6 Subscription state and storage

Active subscriptions are themselves projected state. A bootstrap projection subscriptions (paralleling audience-graph for the inverse direction) maintains:

{actor-id -> [{subscription-cid, type, params, mode, started-at}]}

Updated by Subscribe and Undo{Subscribe} (unsubscribe) activities. Queryable like any other projection (§16). Used by:

  • The inbox dispatcher to know which match-fns to evaluate against incoming activities
  • Triggers (§19) to know which activities to fire on
  • Federation to advertise "here are the subscription types I currently subscribe to" (capability-style, opt-in)
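Because the subscriptions projection is an ordinary fold over the log, it can be sketched in a few lines. The Python below is a hedged illustration, not the bootstrap projection itself. It assumes each activity carries `cid` and `published` fields, and that an Undo names the CID of the Subscribe it reverses.

```python
from functools import reduce


def fold_subscriptions(state: dict, act: dict) -> dict:
    """Pure fold step: returns a new state, never mutates the old one."""
    actor = act["actor"]
    current = state.get(actor, [])
    if act["type"] == "Subscribe":
        if any(e["subscription-cid"] == act["cid"] for e in current):
            return state                      # replayed duplicate, no change
        obj = act["object"]
        entry = {
            "subscription-cid": act["cid"],
            "type": obj["type"],
            "params": {k: v for k, v in obj.items() if k not in ("type", "delivery")},
            "mode": obj.get("delivery", "push"),
            "started-at": act["published"],
        }
        return {**state, actor: current + [entry]}
    if act["type"] == "Undo":
        # Assumption: the Undo's object is the CID of the Subscribe activity.
        remaining = [e for e in current if e["subscription-cid"] != act["object"]]
        return {**state, actor: remaining}
    return state


def replay(log: list) -> dict:
    return reduce(fold_subscriptions, log, {})
```

Replaying the same log always yields the same state, which is what lets the inbox dispatcher and triggers treat this projection as their source of truth.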

18.7 Federation interactions

Subscriptions interact with federation in three ways:

  • Discovery. Peer's /.well-known/sx-capabilities (§7) lists registered DefineSubscription CIDs, so subscribers know what they can ask for.
  • Negotiation. A Subscribe activity carries capabilities-required; if the publisher's instance doesn't support the named subscription type, it responds with the standard 422 + missing-CIDs error (§14.2 #9). Subscriber can then deliver the bootstrapping DefineSubscription artifact and retry.
  • Cross-instance match-fn. If subscriber and publisher both run the same conformance-tested SX evaluator, identical subscriptions match identically (cross-host equivalence, §11.8). This is what makes federated topic subscriptions reliable: every conforming instance computes the same set-of-matches for the same activity.

18.8 Operational implications

  • The audience system handles "who do I send this to." The subscription system handles "what do I want to receive." They're complementary, not redundant.
  • Subscription types can themselves evolve via supersession. New version of Topic with case-insensitive matching? Publish a new DefineSubscription, Supersede the old one. Existing subscriptions migrate at next match evaluation.
  • Match-fn cost matters. A Predicate subscription with a slow predicate becomes a per-activity tax. Gas budgets (§11.5) bound the worst case; operators can disable expensive subscription types if needed.
  • Subscriptions are signed messages. Audit, accountability, and revocation all work the same way as activities — because subscriptions are activities.

19. Application model

The synthesis. With publish, subscribe, project, and trigger as registry-driven primitives, the substrate has everything needed to express distributed reactive applications as data — no native code, no kernel changes, no privileged runtime. Applications are themselves federated artifacts.

19.1 An application is a tuple of artifacts

Application = {
  subscriptions : [DefineSubscription instances and their parameters],
  triggers      : [DefineTrigger registrations],
  projections   : [DefineProjection registrations],
  storage       : [DefineStorage registrations]   (optional)
}

That tuple, signed and bundled, is the application. Installing one = following the named actors / activating the named subscriptions + loading the Define* CIDs into the local registry. Forking one = republishing the Define* with Supersede over the bits you change.

19.2 The reactive loop

       External actors                       Operator publishes activities
       publish activities                    via this instance's actors
              │                                      │
              ▼                                      ▼
       ┌─────────────────────────────────────────────┐
       │ Inbound + outbound activities               │
       └────────────────────┬────────────────────────┘
                            │
                            ▼
              For each active subscription:
              evaluate match-fn (pure mode)
                            │
              ┌─────────────┴─────────────┐
              ▼                           ▼
     Activity matches                Activity does
     a subscription                  not match
              │                           │
              ▼                           ▼
       Projections          ←     (silently dropped from
       fold the activity            this application's view;
              │                      may match other apps)
              ▼
       Triggers fire on the
       subscription's match
              │
              ▼
       Trigger then-sx runs
       (effectful sandbox)
              │
              ├──> updates local state (private projections)
              ├──> publishes new activity (via outbox)
              └──> calls effectful primitives (HTTP, fs, etc.)
                   per declared capabilities

Three things happen on a match: state updates (projection), derived publishes (new activities), side effects (effectful primitives). Each is authorisation-gated by the trigger's declared capabilities.
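One pass of the loop for a single application can be sketched as follows. This is an illustrative Python sketch with hypothetical shapes: an application is a dict with `subscriptions`, `projections`, `triggers`, and `state`, and capability gating is omitted for brevity.

```python
def run_once(activity, app):
    """Process one activity for one application.

    Returns the publishes and effects that triggers requested; nothing is
    actually published or executed here.
    """
    hits = [s for s in app["subscriptions"] if s["match"](s, activity)]
    if not hits:
        # Dropped from this application's view only.
        return {"matched": False, "publish": [], "effect": []}

    # State update: every projection folds the matched activity.
    for name, fold in app["projections"].items():
        app["state"][name] = fold(app["state"].get(name), activity)

    # Derived publishes and side effects: collected from triggers bound to a hit.
    publish, effect = [], []
    for sub in hits:
        for trig in app["triggers"]:
            if trig["when"] == sub["name"]:
                result = trig["then"](activity, sub, app["state"])
                publish.extend(result.get("publish", []))
                effect.extend(result.get("effect", []))
    return {"matched": True, "publish": publish, "effect": effect}
```

Keeping the trigger output as a list of requests, rather than performing the publish or effect inline, is what would let a real implementation check each request against the trigger's declared capabilities before acting on it.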

19.3 Trigger semantics

DefineTrigger registers (when-subscription, then-sx, cascade-limit):

  • when-subscription — references a subscription (by CID or by name). The trigger fires whenever that subscription matches an inbound or outbound activity. Multiple triggers can reference the same subscription.
  • then-sx — function of (activity, subscription, env) → trigger-result. Runs in pure or effectful sandbox per declaration. Returns one or more of:
    • :publish [activity-spec ...] — request publish of derived activities
    • :project [name → state-update ...] — request projection updates
    • :effect [capability-call ...] — request effectful primitive calls
    • :noop — observed but no action
  • cascade-limit — bounded depth for trigger cascades (§19.4).

A trigger is fundamentally a reactive rule: "when X happens, do Y." The substrate guarantees Y happens at most once per X (deduplicated by activity-CID), exactly-once-per-instance (delivery from trigger to its effects is durable), and bounded-cost (gas + cascade-limit).
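The at-most-once guarantee can be illustrated by keying each firing on the trigger and the activity's CID. The Python below is a minimal sketch: `TriggerRunner` is a hypothetical name, and the in-memory set stands in for the durable record a real instance would need.

```python
class TriggerRunner:
    """Fires a trigger at most once per (trigger, activity CID)."""

    def __init__(self):
        self.fired = set()   # in-memory stand-in for a durable record

    def fire(self, trigger, activity):
        key = (trigger["name"], activity["cid"])
        if key in self.fired:
            return {"noop": True}        # redelivery of an already-handled activity
        self.fired.add(key)
        return trigger["then"](activity)
```

Because the key is the content address rather than a delivery id, the same activity arriving twice (for example via two peers) still produces a single firing.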

19.4 Cascade control

A trigger that publishes activities can fire other triggers. Without limits, a single inbound activity could cascade across instances forever.

Each trigger declares cascade-limit: N (default 3). Each activity carries an implicit cascade-depth field, incremented when it's the result of a trigger firing. A trigger refuses to fire if cascade-depth > cascade-limit.

Cascade limits are local-only (operator policy, not federated). Defending against runaway cascades from peer instances is the operator's job; the substrate gives them the knob.
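The depth rule above reduces to one comparison and one increment. The Python sketch below is illustrative; the helper names (`may_fire`, `derive`, `run_cascade`) are hypothetical, and the field names follow the text.

```python
DEFAULT_CASCADE_LIMIT = 3


def may_fire(trigger, activity):
    """A trigger refuses to fire if cascade-depth > cascade-limit."""
    limit = trigger.get("cascade-limit", DEFAULT_CASCADE_LIMIT)
    return activity.get("cascade-depth", 0) <= limit


def derive(parent, spec):
    """Activities produced by a trigger sit one level deeper than their cause."""
    return {**spec, "cascade-depth": parent.get("cascade-depth", 0) + 1}


def run_cascade(trigger, activity):
    """Feed a self-triggering rule its own output; return how many times it fired."""
    fired = 0
    while may_fire(trigger, activity):
        fired += 1
        activity = derive(activity, trigger["then"](activity))
    return fired
```

Under this reading, a rule that keeps re-triggering itself with the default limit of 3 fires on depths 0 through 3 and is refused at depth 4, so the loop terminates after four firings.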

19.5 The DefineApplication bundle

A bundle artifact that names and groups the components of an application:

(activity 'Create
  :object {:type "DefineApplication"
           :name "rose-ash-blog"
           :version 1
           :subscriptions [{:type "Follow"   :object "https://blog.rose-ash.com/actors/main"}
                           {:type "Topic"    :tag "rose-ash"}
                           {:type "CidWatch" :cid <rose-ash-template-cid>
                                             :events [:supersede]}]
           :triggers      [<comment-moderation-trigger-cid>
                           <reaction-counter-trigger-cid>
                           <rss-republish-trigger-cid>]
           :projections   [<comment-thread-projection-cid>
                           <reaction-counts-projection-cid>]
           :storage       [<local-files-storage-cid>]
           :capabilities  [<http-allowlist-cap-cid>
                           <fs-write-cap-cid>]
           :description   "Federated blog with moderated comments and RSS"})

Three operations on applications, all themselves activities:

  • Install — Subscribe to each subscription, Create{} references in define-registry to each trigger/projection/storage CID. One activity per reference, audited and replayable. Or: a single Install{DefineApplication} meta-verb that does the bundle in one signed step (defined later as a custom verb, not bootstrap).
  • Update — publish a new DefineApplication with the same name + supersedes pointing at the old. Diff-then-apply: subscriptions added/removed, triggers loaded/unloaded, projections reprojected per §10.5.
  • Fork — publish a new DefineApplication referencing the original's CID via forked-from, with whatever Define* CIDs you want to swap. Run alongside the original or in place of it.
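The diff-then-apply step of Update can be sketched as a per-field set difference between the old and new bundles. This Python sketch is an illustration only: `diff_bundle` is a hypothetical helper, the bundles are plain dicts, and CIDs are represented by placeholder strings.

```python
import json

FIELDS = ("subscriptions", "triggers", "projections", "storage")


def _key(item):
    # Canonical string so dict-shaped subscriptions can be compared as set members.
    return json.dumps(item, sort_keys=True)


def diff_bundle(old: dict, new: dict) -> dict:
    """Return, per component list, what the new bundle adds and removes."""
    plan = {}
    for field in FIELDS:
        before = {_key(x): x for x in old.get(field, [])}
        after = {_key(x): x for x in new.get(field, [])}
        plan[field] = {
            "added": [after[k] for k in sorted(after.keys() - before.keys())],
            "removed": [before[k] for k in sorted(before.keys() - after.keys())],
        }
    return plan
```

The resulting plan maps directly onto the operations named above: added subscriptions are subscribed, removed ones undone, and added or removed Define* references loaded or unloaded.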

19.6 Per-application namespacing

Multiple applications running on one instance need isolation:

  • Projections are namespaced by application. pin-state from app A is distinct from pin-state from app B — both addressable as /projections/<app-name>/pin-state.
  • Triggers fire only on subscriptions belonging to their application. App A's trigger doesn't see app B's subscription matches.
  • Storage backends are namespaced. App A's files-on-disk backend writes to data/apps/A/objects/; app B writes to data/apps/B/objects/.
  • Capabilities are per-application. Granting http-client to app A doesn't grant it to app B. Operator can audit per-app capability surface and revoke selectively.

Cross-application reads are explicit and require a capability grant (read-projection: <app>/<projection>). Default isolation; opt-in sharing.
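Default isolation with opt-in sharing can be sketched by keying projections on (application, name) and checking a grant before any cross-application read. The Python below is illustrative: the `Instance` class and `object_path` helper are hypothetical, while the capability string and the storage path layout follow the text.

```python
class Instance:
    """Hypothetical holder of per-application projections and capability grants."""

    def __init__(self):
        self.projections = {}   # (app, projection-name) -> state
        self.grants = {}        # app -> set of capability strings

    def write(self, app, name, state):
        self.projections[(app, name)] = state

    def grant(self, app, capability):
        self.grants.setdefault(app, set()).add(capability)

    def read(self, reader, owner, name):
        # An app may always read its own projections; others need an explicit grant.
        if reader != owner:
            needed = f"read-projection:{owner}/{name}"
            if needed not in self.grants.get(reader, set()):
                raise PermissionError(f"{reader} lacks {needed}")
        return self.projections[(owner, name)]


def object_path(app, cid):
    """Per-application storage location, as in data/apps/A/objects/."""
    return f"data/apps/{app}/objects/{cid}"
```

Since the grant is stored per reader, revoking it for one application leaves every other application's access untouched, which is the selective revocation the list above describes.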

19.7 Worked examples

Example A — Blog with moderated comments

DefineApplication "blog-with-comments":
  subscriptions:
    - Follow: <author-actor>
    - Topic:  "post-comment"  (filter: object.in-reply-to in our-posts)
  triggers:
    - on Topic match → publish Note (the new comment, derived if approved)
                     → projection pending-moderation
    - on inbound Approve{Reply} → projection comment-thread (visible)
  projections:
    - comment-thread:    post-cid → [approved comment activities]
    - pending-moderation: list of pending replies awaiting approval

Example B — Continuous integration

DefineApplication "ci-pipeline":
  subscriptions:
    - Follow: <developer-actor>
    - VerbFilter: wraps Follow, types: [Push]
  triggers:
    - on Push match → effect: run build (capability: subprocess + fs-write)
                    → publish Build{source: Push.cid, output: <build-cid>, status}
    - on Build{status: success} → effect: run tests
                                 → publish Test{...}
    - on (Test{passed} count for N days) → publish Release{...}
  projections:
    - build-history: commit-cid → [build activities]
    - release-history: ordered list of Release activities

Example C — Distributed code review

DefineApplication "code-review":
  subscriptions:
    - Topic: "review-request"
    - CidWatch: <organisation-actor>, events: [Endorse]
  triggers:
    - on review-request match → projection review-queue
                              → effect: notify-reviewer
    - on Endorse from authorised reviewer → publish Approve{review-cid}
                                          → projection approval-state
  projections:
    - review-queue: ordered list of pending requests with summaries
    - approval-state: review-cid → endorsement set

In all three: the application is just the bundle of subscriptions, triggers, and projections. Federation makes them composable across instances. The substrate provides exactly-once-per-CID semantics and pure-mode determinism for the matches and folds.

19.8 Composition and discovery

Applications are themselves federated content. This means:

  • App registries — actors can publish curated lists of applications they endorse. Discovery becomes follow-an-actor + browse-their-app-list.
  • Cross-app composition — application A publishes derived activities that application B subscribes to. Pipeline of applications via the activity log.
  • App marketplaces — pin a friendly path to a DefineApplication CID (rose-ash.com:apps/blog → bafy...) for human discoverability.

None of this requires kernel changes. It's all activities about activities.

19.9 Operational implications

  • Applications are inspectable from the activity log alone. Replay an actor's outbox and you can reconstruct the exact application installation state at any point in time.
  • Application updates are atomic relative to the activity log. Either the Update{DefineApplication} succeeded (new state visible from next activity) or it didn't (old state continues). No partial-update window.
  • Forking is the same as installing a copy. No special "fork" mechanism needed; the activity-log mechanics already support it.
  • Per-app capabilities are a real security surface. Operators must understand what they're granting when they install. The bundle's capabilities list is the audit point — should be human-readable and reviewable before installation.
  • The substrate isn't an "application platform" — it's an "application substrate." Applications aren't installed on fed-sx; they're expressed in fed-sx, as the same kind of content as everything else.

Appendix A: relationship to adjacent systems

Worth knowing about so we can borrow good ideas:

  • ATproto / Bluesky — Lexicons (schemas) + repos (per-actor signed merkle trees). Closest in spirit. We borrow the schema-as-data idea; we differ by making schemas themselves federated activities, not central registry entries.
  • Spritely Goblins — capability-secure actors. We borrow the capability-token pattern for delegation.
  • Ceramic — signed event streams, content-addressed. Similar log-as-state model; we differ by making the projection function pluggable per-stream rather than hardcoded per-streamtype.
  • Holochain — agent-centric DHT. We share the "every agent has their own log" shape; we use AP federation instead of DHT.
  • Farcaster — pubsub on hubs. We share the firehose model; we add cryptographic outbox-as-source-of-truth.

None of them are code-as-data the whole way down — that's the SX-distinctive bit. Handlers, validators, projections aren't bytecode shipped out-of-band; they're SX in the same log as everything else, evaluable by any host that speaks SX.

Appendix B: implications worth sitting with

  • Deployment dissolves. Releasing a feature = publishing DefineActivity{name: "Whatever", ...}. Federation distributes it. No build artifact, no rolling deploy, no version-skew between server and client.
  • Applications are forkable by default. "Fork the rose-ash blog" = take the bundle of Define* CIDs that constitute it, publish your own with Supersede over the ones to change, run your own projector. Same federation graph, divergent state.
  • Composition is by reference, not import. Pin activity points at the CID of the DefineActivity{name: "Pin"}. No package manager, no transitive deps, no lockfiles.
  • The boundary between "user" and "developer" softens. Both publish signed activities. Power users can publish handlers, projections, sig suites under their own actor.
  • This is more ambitious than a rose-ash rewrite. It's a substrate that happens to host rose-ash as its first application.

Appendix C: AI agent collaboration patterns

The substrate is incidentally well-shaped for one of the open problems of the next decade: infrastructure for AI agent collaboration where contributions are signed federated artifacts, behavior is bounded by declared capabilities, decisions are audit-by-replay, and infrastructure improves through agent contribution within a web of trust.

This is not a designed-for use case — fed-sx was conceived as a federated publishing and reactive application substrate. But the properties it has fit agent collaboration almost exactly. Worth being deliberate about, because the framing changes who fed-sx is for.

Why the substrate fits agent collaboration

AI agents need infrastructure where contributions are first-class artifacts, not pull requests against human-controlled repos. Currently agents squeeze through GitHub PRs, deployment pipelines, npm publishes — all of which assume a human in the loop. fed-sx is shaped for direct contribution:

  • Direct authoring of substrate features. An agent doesn't propose a feature, it publishes one. A DefineActivity artifact is the agent's contribution. A DefineProjection is its analysis. A DefineTrigger is its automation. The signed publication IS the deploy — no PR review, no CI, no DevOps.
  • Cryptographic identity without registration. Agents have actor keys; reputation is the endorsement graph; trust is provable by signature chain. Two agents that have never met can verify each other's contributions cryptographically.
  • Capability-bounded autonomy. An agent declares capabilities-required on its activities. A trigger says "I publish to path-prefix /agent-x/* and call http-client for api.example.com/*." Receivers verify the constraint cryptographically; the agent can't escape its declared surface even if the agent itself is misaligned. Sandbox model designed for autonomous code (§11).
  • Audit-by-replay applied to AI behavior. Every AI decision is reconstructable, deterministically, by anyone with the log. To answer "Why did agent A do X?", replay the log to that moment and see the activities A subscribed to, the projection state it observed, the trigger that fired, and the activity it published. Fundamentally better than today's "trust the model" posture.
  • Composition without coordination. Agent A publishes a moderation validator. Agent B subscribes and uses it. Agent C improves it, supersedes A's. B sees the supersession, decides whether to adopt. No central registry, no maintainer to coordinate with, no version skew.
  • Disagreement is visible, not hidden. If agents A and B compute the same projection over the same log and produce different snapshot CIDs, the disagreement is cryptographically observable. Today, two AI services answering the same question with different answers is invisible until somebody notices.
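The last point, detecting disagreement by comparing snapshot identifiers, can be sketched with a plain hash. This Python sketch is an illustration under a loud assumption: SHA-256 over canonical JSON stands in for the real snapshot CID (the design's default is dag-cbor with a multihash), and the function names are hypothetical.

```python
import hashlib
import json


def snapshot_digest(state) -> str:
    # Stand-in for a snapshot CID: hash a canonical serialisation of the state.
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def project(log, fold, initial):
    """Fold a projection function over the whole log."""
    state = initial
    for activity in log:
        state = fold(state, activity)
    return state


def agree(log, fold_a, fold_b, initial) -> bool:
    """Two agents agree exactly when their snapshots hash to the same value."""
    return snapshot_digest(project(log, fold_a, initial)) == snapshot_digest(
        project(log, fold_b, initial)
    )
```

Canonical serialisation matters here: two agents holding the same state in a different key order must still produce the same digest, otherwise every comparison would report a false disagreement.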

Dynamics that emerge

  • Agent specialisation = publication. "I'm the indexing agent" = publishes DefineProjection artifacts. "I'm the moderation agent" = publishes DefineValidator artifacts. "I'm the matchmaking agent" = publishes a DefineApplication for marketplace subscriptions and triggers. Specialisation is content, not service deployment.
  • Reputation = endorsement graph. Web of trust applied to agent contributions. Bad actors get cut out organically; no central authority to capture.
  • Forking = explicit disagreement resolution. Agents disagree on validation? Both publish their DefineValidators. Subscribers pick. The fork is signed, observable, recoverable. Compare today: when AI services have different rules, one is just invisibly applied.
  • Cascade limits = agent population safety. The cascade-depth and cascade-limit (§19.4) become the bounded-autonomy guard rails for agent populations. Self-coordination without runaway-cascade across the substrate.
  • Self-improving infrastructure. Agents observe substrate behavior, propose improvements as DefineProjection for monitoring, DefineTrigger for automation. The substrate itself improves through agent contribution — not through a release cycle. Every improvement is signed and traceable.

Use cases

  • Agent-managed scientific datasets — collection, cleaning, analysis, publication, peer review by other agents, all signed activities. Replication is replay; provenance is built in.
  • Multi-agent code maintenance — agents observing repos (subscribe to Push), running tests (triggers), proposing fixes (Pull-equivalent activities), endorsing each other's work.
  • Agent-curated knowledge — agents publish, endorse, and supersede knowledge artifacts. Truth accumulates via the trust graph; outdated info gets Superseded explicitly.
  • Distributed agent marketplaces — agents publish capabilities, subscribers find them via Topic / Predicate subscriptions, contracts via signed activity exchange.
  • Cross-agent AI safety monitoring — monitoring agents subscribe to other agents' outboxes, run validators, publish Alert activities when patterns of concern appear. Decentralised oversight without central authority.
  • Cross-org agent workflow coordination — supply chain, healthcare, legal — multiple specialised agents coordinating across organisational boundaries with cryptographic provenance.

Safety and governance properties

The substrate provides several properties AI safety has been asking for and that current infrastructure does not provide:

  • Every action is signed. Attribution is cryptographic, not a log file an agent could spoof.
  • Capabilities are declared and enforced. Agents operate within their declared sandbox; can't grow capabilities silently.
  • Cascades are bounded. No exponential agent-on-agent feedback loops without explicit configuration.
  • Audit is replay. Every decision can be reconstructed deterministically; no opaque "the model decided" moments.
  • Disagreement is visible. Two agents producing different projections of the same data is a cryptographically-detectable event, not invisible drift.
  • Trust is the endorsement graph, not central authority. No single point of capture or coercion.
  • Forks are first-class. When safety-critical disagreements occur, the substrate accommodates them without forcing a winner; observers see all positions.

What this implies for the project

  • Milestone 1's smoke tests remain right — the verb-extensibility and reactive-application proofs apply to agent contributions exactly as they apply to human contributions. The agent collaboration framing doesn't require new mechanisms; it interprets the existing mechanisms differently.
  • The application model (§§18-19) is the headline story for this audience, not a layer on top. Subscriptions + triggers + projections + capabilities = agent collaboration primitives.
  • Capability discovery and trust dynamics gain weight earlier. Where human-driven applications can rely on operator policy, agent-driven populations need the trust graph to be operational from milestone 2.
  • The pitch line evolves. Less "ActivityPub for code" / "rose-ash next gen," more "infrastructure for AI agent collaboration with cryptographic provenance, bounded autonomy, and audit-by-replay." The technical substance is unchanged; the framing of who needs this changes substantially.

The substrate accidentally being well-shaped for the most important software-distribution problem of the next decade is worth being deliberate about.