Rose Ash Scalability Plan
Context
The coop runs 6 Quart microservices (blog, market, cart, events, federation, account) + Art-DAG (L1/L2 FastAPI) on a single-node Docker Swarm. Current architecture handles light traffic but has concrete bottlenecks: single Postgres, single Redis (256MB), no replicas, single Hypercorn worker per container, sequential AP delivery, no circuit breakers, and 6 competing EventProcessors. The decoupling work (contracts, HTTP data/actions, fragment composition) is complete — the code is structurally ready to scale, but the deployment and runtime aren't.
This plan covers everything from quick config wins to massive federation scale, organized in 4 tiers. Each tier unlocks roughly an order of magnitude.
TIER 0 — Deploy Existing Code + Quick Config
Target: low thousands concurrent. Effort: hours.
T0.1: Separate Auth Redis
Why: Auth keys (grant:*, did_auth:*, prompt:*) on DB 15 share a single Redis instance (256MB, allkeys-lru). Cache pressure from fragment/page caching can silently evict auth state, causing spurious logouts.
Files:
- `docker-compose.yml` — add a `redis-auth` service: `redis:7-alpine`, `--maxmemory 64mb --maxmemory-policy noeviction`
- `docker-compose.yml` — update `REDIS_AUTH_URL: redis://redis-auth:6379/0` in `x-app-env`
No code changes — shared/infrastructure/auth_redis.py already reads REDIS_AUTH_URL from env.
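A minimal compose sketch of the new service, under the assumption that the file uses a shared `x-app-env` anchor as described above (service name and flags from the plan; adjust to the existing file's conventions):

```yaml
# Hypothetical fragment for docker-compose.yml (T0.1): dedicated auth Redis.
services:
  redis-auth:
    image: redis:7-alpine
    # noeviction: auth state must never be silently evicted under memory pressure.
    command: redis-server --maxmemory 64mb --maxmemory-policy noeviction
    restart: unless-stopped

# In the shared x-app-env block:
#   REDIS_AUTH_URL: redis://redis-auth:6379/0
```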
T0.2: Bump Redis Memory
File: docker-compose.yml (line 168)
- Change `--maxmemory 256mb` to `--maxmemory 1gb` (or 512mb minimum)
- Keep `allkeys-lru` for the data Redis (fragments + page cache)
T0.3: Deploy the Database Split
What exists: _config/init-databases.sql (creates 6 DBs) and _config/split-databases.sh (migrates table groups). Code in session.py lines 46-101 already creates separate engines when URLs differ. bus.py, user_loader.py, and factory.py all have conditional cross-DB paths.
Files:
- `docker-compose.yml` — per-service `DATABASE_URL` overrides (blog→db_blog, market→db_market, etc.) plus `DATABASE_URL_ACCOUNT`→db_account, `DATABASE_URL_FEDERATION`→db_federation
- `_config/split-databases.sh` — add `menu_nodes` + `container_relations` to ALL target DBs (small read-only tables needed by `get_navigation_tree()` in non-blog apps until T1.6 replaces this)
Deployment: run init-databases.sql, stop services, run split-databases.sh, update compose env, redeploy.
T0.4: Add PgBouncer
Why: 6 apps x pool_size 5 + overflow 10 = 90 connections to one Postgres. After DB split + workers, this multiplies. Postgres default max_connections=100 will be hit.
Files:
- `docker-compose.yml` — add a `pgbouncer` service (transaction-mode pooling, `default_pool_size=20`, `max_client_conn=300`)
- `docker-compose.yml` — change all `DATABASE_URL` values from `@db:5432` to `@pgbouncer:6432`
- `shared/db/session.py` (lines 13-20) — add `pool_timeout=10`, `pool_recycle=1800`. Consider reducing `pool_size` to 3 since PgBouncer handles pooling.
T0.5: Hypercorn Workers
Why: Single async event loop per container. CPU-bound work (Jinja2 rendering, RSA signing) blocks everything.
Files:
- All 6 `{app}/entrypoint.sh` — change the Hypercorn command to: `exec hypercorn "${APP_MODULE:-app:app}" --bind 0.0.0.0:${PORT:-8000} --workers ${WORKERS:-2} --keep-alive 75`
No code changes needed — EventProcessor uses SKIP LOCKED so multiple workers competing is safe. Each worker gets its own httpx client singleton and SQLAlchemy pool (correct fork behavior).
Depends on: T0.4 (PgBouncer) — doubling workers doubles connection count.
TIER 1 — Fix Hot Paths
Target: tens of thousands concurrent. Effort: days.
T1.1: Concurrent AP Delivery
The single biggest bottleneck. ap_delivery_handler.py delivers to inboxes sequentially (lines 230-246). 100 followers = up to 100 x 15s = 25 minutes.
File: shared/events/handlers/ap_delivery_handler.py
- Add a `DELIVERY_CONCURRENCY = 10` semaphore
- Replace the sequential inbox loop with `asyncio.gather()` bounded by the semaphore
- Reuse a single `httpx.AsyncClient` per `on_any_activity` call with explicit pool limits: `httpx.Limits(max_connections=20, max_keepalive_connections=10)`
- Batch `APDeliveryLog` inserts (one flush per activity, not per inbox)
Result: 100 followers drops from 25 min → ~2.5 min (10 concurrent) or less.
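A minimal sketch of the bounded-concurrency pattern above. `deliver_to_inbox` is a placeholder for the real signed POST, not the handler's actual code:

```python
import asyncio

DELIVERY_CONCURRENCY = 10  # max in-flight deliveries per activity


async def deliver_to_inbox(inbox_url: str) -> str:
    """Placeholder for the real signed HTTP POST to a remote inbox."""
    await asyncio.sleep(0)  # stands in for network I/O
    return inbox_url


async def deliver_activity(inbox_urls: list[str]) -> list[str]:
    # One semaphore per activity caps in-flight requests without serializing.
    sem = asyncio.Semaphore(DELIVERY_CONCURRENCY)

    async def bounded(url: str) -> str:
        async with sem:
            return await deliver_to_inbox(url)

    # gather() launches every delivery; the semaphore bounds concurrency.
    return await asyncio.gather(*(bounded(u) for u in inbox_urls))
```

In the real handler, `gather(..., return_exceptions=True)` would let one failed inbox not abort the rest.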
T1.2: Fragment Circuit Breaker + Stale-While-Revalidate
Why: Every page render makes 3-4 internal HTTP calls (fetch_fragments()). A slow/down service blocks all page renders for up to 2s per fragment. No graceful degradation.
File: shared/infrastructure/fragments.py
Circuit breaker (add near line 27):
- Per-app `_CircuitState` tracking consecutive failures
- Threshold: 3 failures → circuit opens for 30s
- When open: skip HTTP, fall through to stale cache
Stale-while-revalidate (modify fetch_fragment_cached(), lines 246-288):
- Store fragments in Redis as `{"html": "...", "ts": 1234567890.0}` instead of a plain string
- Soft TTL = normal TTL (30s). Hard TTL = soft + 300s (5 min stale window)
- Within soft TTL: return cached. Between soft and hard: return cached + background revalidate. Past hard: block on fetch. On fetch failure: return stale if available, empty string if not.
T1.3: Partition Event Processors
Why: All 6 apps register the wildcard AP delivery handler via register_shared_handlers(). All 6 EventProcessor instances compete for every activity with SKIP LOCKED. 5 of 6 do wasted work on every public activity.
Files:
- `shared/events/handlers/__init__.py` — add an `app_name` param to `register_shared_handlers()`. Only import `ap_delivery_handler` and `external_delivery_handler` when `app_name == "federation"`.
- `shared/infrastructure/factory.py` (line 277) — pass `name` to `register_shared_handlers(name)`
- `shared/infrastructure/factory.py` (line 41) — add an `event_processor_all_origins: bool = False` param to `create_base_app()`
- `shared/infrastructure/factory.py` (line 271) — `EventProcessor(app_name=None if event_processor_all_origins else name)`
- `federation/app.py` — pass `event_processor_all_origins=True`
Result: Federation processes ALL activities (no origin filter) for delivery. Other apps process only their own origin activities for domain-specific handlers. No wasted lock contention.
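The gating can be sketched like this (the returned list is a stand-in for real handler registration on the bus; handler names are from the plan):

```python
# Sketch of the proposed app_name gating in register_shared_handlers().
def register_shared_handlers(app_name: str) -> list[str]:
    handlers = ["domain_handlers"]  # placeholder for each app's own handlers
    if app_name == "federation":
        # Only federation imports and registers the wildcard delivery
        # handlers, so the other five processors never compete for
        # delivery work under SKIP LOCKED.
        handlers += ["ap_delivery_handler", "external_delivery_handler"]
    return handlers
```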
T1.4: httpx Client Pool Limits
Why: All three HTTP clients (fragments.py, data_client.py, actions.py) have no limits parameter. Default is 100 connections per client. With workers + replicas this fans out unbounded.
Files:
- `shared/infrastructure/fragments.py` (lines 38-45) — add `limits=httpx.Limits(max_connections=20, max_keepalive_connections=10)`
- `shared/infrastructure/data_client.py` (lines 36-43) — same
- `shared/infrastructure/actions.py` (lines 37-44) — `max_connections=10, max_keepalive_connections=5`
T1.5: Data Client Caching
Why: fetch_data() has no caching. cart-summary is called on every page load across 4 apps. Blog post lookups by slug happen on every market/events page.
File: shared/infrastructure/data_client.py
- Add `fetch_data_cached()` following the `fetch_fragment_cached()` pattern
- Redis key: `data:{app}:{query}:{sorted_params}`, default TTL=10s
- Same circuit breaker + SWR as T1.2
Callers updated: all app.py context functions that call fetch_data() for repeated reads.
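A sketch of the deterministic key builder (the `data:{app}:{query}:{sorted_params}` layout is from the plan; the helper name is hypothetical):

```python
from urllib.parse import urlencode


def data_cache_key(app: str, query: str, params: dict[str, str]) -> str:
    # Sort params so {"a": 1, "b": 2} and {"b": 2, "a": 1} share one key.
    sorted_params = urlencode(sorted(params.items()))
    return f"data:{app}:{query}:{sorted_params}"
```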
T1.6: Navigation via HTTP Data Endpoint
Why: After DB split, get_navigation_tree() queries menu_nodes via g.s. But menu_nodes lives in db_blog. The T0.3 workaround (replicate table) works short-term; this is the proper fix.
Files:
- `shared/contracts/dtos.py` — add `MenuNodeDTO`
- `blog/bp/data/routes.py` — add a `nav-tree` handler returning `[dto_to_dict(node) for node in nodes]`
- All non-blog `app.py` context functions — replace `get_navigation_tree(g.s)` with `fetch_data_cached("blog", "nav-tree", ttl=60)`
- Remove `menu_nodes` from non-blog DBs in `split-databases.sh` (no longer needed)
Depends on: T0.3 (DB split), T1.5 (data caching).
T1.7: Fix Fragment Batch Parser O(n^2)
File: shared/infrastructure/fragments.py _parse_fragment_markers() (lines 217-243)
- Replace the nested loop with a single pass: find all `<!-- fragment:{key} -->` markers in one scan, extract content between consecutive markers
- O(n) instead of O(n²)
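A single-pass sketch of the idea (the marker format is from the plan; the real `_parse_fragment_markers()` signature may differ):

```python
import re

# One compiled pattern; finditer() walks the body exactly once.
_MARKER = re.compile(r"<!-- fragment:(\S+) -->")


def parse_fragment_markers(body: str) -> dict[str, str]:
    """Map each fragment key to the content between its marker and the next."""
    matches = list(_MARKER.finditer(body))  # single O(n) scan
    result: dict[str, str] = {}
    for i, m in enumerate(matches):
        # Content runs from the end of this marker to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        result[m.group(1)] = body[m.end():end].strip()
    return result
```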
T1.8: Read Replicas
Why: After DB split, each domain DB has one writer. Read-heavy pages (listings, calendars, product pages) can saturate it.
Files:
- `docker-compose.yml` — add read replicas for high-traffic domains (blog, events)
- `shared/db/session.py` — add a `DATABASE_URL_RO` env var, create a read-only engine, add a `get_read_session()` context manager
- All `bp/data/routes.py` and `bp/fragments/routes.py` — use the read session (these endpoints are inherently read-only)
Depends on: T0.3 (DB split), T0.4 (PgBouncer).
TIER 2 — Decouple the Runtime
Target: hundreds of thousands concurrent. Effort: ~1 week.
T2.1: Edge-Side Fragment Composition (Nginx SSI)
Why: Currently every Quart app fetches 3-4 fragments per request via HTTP (fetch_fragments() in context processors). This adds latency and creates liveness coupling. SSI moves fragment assembly to Nginx, which caches each fragment independently.
Changes:
- `shared/infrastructure/jinja_setup.py` — change the `_fragment()` Jinja global to emit SSI directives: `<!--#include virtual="/_ssi/{app}/{type}?{params}" -->`
- New Nginx config — map `/_ssi/{app}/{path}` to `http://{app}:8000/internal/fragments/{path}`, enable `ssi on`, proxy_cache with short TTL
- All fragment blueprint routes — add `Cache-Control: public, max-age={ttl}, stale-while-revalidate=300` headers
- Remove `fetch_fragments()` calls from all `app.py` context processors (templates emit SSI directly)
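The directive emitter reduces to string formatting. A sketch, assuming the `/_ssi/{app}/{type}` URL shape proposed above (function name simplified from the real `_fragment()` global):

```python
from urllib.parse import urlencode


def fragment(app: str, fragment_type: str, **params: str) -> str:
    """Emit an Nginx SSI include instead of fetching the fragment server-side."""
    query = urlencode(sorted(params.items()))  # stable param order = stable cache key
    virtual = f"/_ssi/{app}/{fragment_type}"
    if query:
        virtual += f"?{query}"
    return f'<!--#include virtual="{virtual}" -->'
```

Nginx resolves the include at proxy time, so each fragment is cached and expired independently of the page that embeds it.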
T2.2: Replace Outbox Polling with Redis Streams
Why: EventProcessor polls ap_activities every 2s with SELECT FOR UPDATE SKIP LOCKED. This creates persistent DB load even when idle. LISTEN/NOTIFY helps but is fragile.
Changes:
- `shared/events/bus.py` `emit_activity()` — after writing to the DB, also `XADD coop:activities:pending` with `{activity_id, origin_app, type}`
- `shared/events/processor.py` — replace `_poll_loop` + `_listen_for_notify` with `XREADGROUP` (blocking read, no polling). Consumer groups handle partitioning. `XPENDING` + `XCLAIM` replace the stuck-activity reaper.
- Redis Stream config: `MAXLEN ~ 10000` to cap memory
T2.3: CDN for Static Assets
- Route `*.rose-ash.com/static/*` through a CDN (Cloudflare, BunnyCDN)
- `_asset_url()` already adds a `?v={hash}` fingerprint — the CDN can cache with `max-age=31536000, immutable`
- No code changes, just DNS + CDN config
T2.4: Horizontal Scaling (Docker Swarm Replicas)
Changes:
- `docker-compose.yml` — add `replicas: 2` (or 3) to blog, market, events, cart. Keep federation at 1 (handles all AP delivery).
- `blog/entrypoint.sh` — wrap Alembic in a PostgreSQL advisory lock (`SELECT pg_advisory_lock(42)`) so only one replica runs migrations
- `docker-compose.yml` — add health checks per service
- `shared/db/session.py` — make pool sizes configurable via env vars (`DB_POOL_SIZE`, `DB_MAX_OVERFLOW`) so replicas can use smaller pools
Depends on: T0.4 (PgBouncer), T0.5 (workers).
TIER 3 — Federation Scale
Target: millions (federated network effects). Effort: weeks.
T3.1: Dedicated AP Delivery Service
Why: AP delivery is CPU-intensive (RSA signing) and I/O-intensive (HTTP to remote servers). Running inside federation's web worker blocks request processing.
Changes:
- New `delivery/` service — a standalone asyncio app (no Quart/web server)
- Reads from Redis Stream `coop:delivery:pending` (from T2.2)
- Loads the activity from the federation DB, loads followers, delivers with a semaphore (from T1.1)
- `ap_delivery_handler.py` `on_any_activity` → enqueues to the stream instead of delivering inline
- `docker-compose.yml` — add a `delivery-worker` service with `replicas: 2`, no port binding
Depends on: T2.2 (Redis Streams), T1.1 (concurrent delivery).
T3.2: Per-Domain Health Tracking + Backoff
Why: Dead remote servers waste delivery slots. Current code retries 5 times with no backoff.
Changes:
- New `shared/events/domain_health.py` — a Redis hash `domain:health:{domain}` tracking consecutive failures, with an exponential backoff schedule (30s → 1min → 5min → 15min → 1hr → 6hr → 24hr)
- Delivery worker checks domain health before attempting delivery; skips domains in backoff
- On success: reset. On failure: increment + extend backoff.
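The backoff ladder above reduces to a lookup (helper name is hypothetical; the schedule values are the ones proposed):

```python
# Backoff schedule matching the plan's ladder:
# 30s → 1min → 5min → 15min → 1hr → 6hr → 24hr.
BACKOFF_SCHEDULE = [30, 60, 300, 900, 3600, 21600, 86400]  # seconds


def backoff_seconds(consecutive_failures: int) -> int:
    """0 failures → no backoff; otherwise climb the ladder, capped at 24h."""
    if consecutive_failures <= 0:
        return 0
    idx = min(consecutive_failures - 1, len(BACKOFF_SCHEDULE) - 1)
    return BACKOFF_SCHEDULE[idx]
```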
T3.3: Shared Inbox Optimization
Why: 100 Mastodon followers from one instance = 100 POSTs to the same server. Mastodon supports sharedInbox — one POST covers all followers on that instance.
Changes:
- `shared/models/federation.py` `APFollower` — add a `shared_inbox_url` column
- Migration to backfill from `ap_remote_actors.shared_inbox_url`
- `ap_delivery_handler.py` — group followers by domain, prefer the shared inbox when available
- Impact: 100 followers on one instance → 1 HTTP POST instead of 100
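A sketch of the grouping logic. The follower dicts here are hypothetical stand-ins for `APFollower` rows (`inbox_url`, `shared_inbox_url`); deduplicating on the shared inbox URL collapses all followers from one instance into a single target:

```python
def delivery_targets(followers: list[dict]) -> set[str]:
    """One shared inbox per remote instance when available, else each personal inbox."""
    targets: set[str] = set()
    for f in followers:
        # Followers on the same instance share one shared_inbox_url, so the
        # set collapses them to a single POST target.
        targets.add(f.get("shared_inbox_url") or f["inbox_url"])
    return targets
```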
T3.4: Table Partitioning for ap_activities
Why: At millions of activities, the ap_activities table becomes the query bottleneck. The EventProcessor query orders by created_at — native range partitioning fits perfectly.
Changes:
- Alembic migration — convert `ap_activities` to `PARTITION BY RANGE (created_at)` with monthly partitions
- Add a cron job or startup hook to create future partitions
- No application code changes needed (transparent to SQLAlchemy)
T3.5: Read-Through DTO Cache
Why: Hot cross-app reads (cart-summary, post-by-slug) go through HTTP even with T1.5 caching. A Redis-backed DTO cache with event-driven invalidation eliminates HTTP for repeated reads entirely.
Changes:
- New `shared/infrastructure/dto_cache.py` — `get(app, query, params)` / `set(...)` backed by Redis
- Integrate into `fetch_data_cached()` as an L1 cache (check before HTTP)
- Action endpoints invalidate relevant cache keys after successful writes
Depends on: T1.5 (data caching), T2.2 (event-driven invalidation via streams).
Implementation Order
TIER 0 (hours):
T0.1 Auth Redis ──┐
T0.2 Redis memory ├── all independent, do in parallel
T0.3 DB split ────┘
T0.4 PgBouncer ────── after T0.3
T0.5 Workers ──────── after T0.4
TIER 1 (days):
T1.1 Concurrent AP ──┐
T1.3 Partition procs ─├── independent, do in parallel
T1.4 Pool limits ─────┤
T1.7 Parse fix ───────┘
T1.2 Circuit breaker ── after T1.4
T1.5 Data caching ───── after T1.2 (same pattern)
T1.6 Nav endpoint ───── after T1.5
T1.8 Read replicas ──── after T0.3 + T0.4
TIER 2 (~1 week):
T2.3 CDN ──────────── independent, anytime
T2.2 Redis Streams ─── foundation for T3.1
T2.1 Nginx SSI ────── after T1.2 (fragments stable)
T2.4 Replicas ──────── after T0.4 + T0.5
TIER 3 (weeks):
T3.1 Delivery service ── after T2.2 + T1.1
T3.2 Domain health ───── after T3.1
T3.3 Shared inbox ────── after T3.1
T3.4 Table partitioning ─ independent
T3.5 DTO cache ────────── after T1.5 + T2.2
Verification
Each tier has a clear "it works" test:
- Tier 0: All 6 apps respond. Login works across apps. `pg_stat_activity` shows connections through PgBouncer. `redis-cli -p 6380 INFO memory` shows the auth Redis is separate.
- Tier 1: Emit a public AP activity with 50+ followers — delivery completes in seconds, not minutes. Stop the account service — blog pages still render with a stale auth-menu. Only federation's processor handles delivery.
- Tier 2: Nginx access log shows SSI fragment cache HITs. `XINFO GROUPS coop:activities:pending` shows active consumer groups. CDN cache-status headers show HITs on static assets. Multiple replicas serve traffic.
- Tier 3: Delivery worker scales independently. Domain health Redis hash shows backoff state for unreachable servers. `EXPLAIN` on `ap_activities` shows partition pruning. Shared-inbox delivery logs show 1 POST per domain.