# Re-implementing rose-ash on SX: migration strategy

Status: **strategy proposal** (drafted by the `radar` loop, 2026-06-07). Not a unilateral architecture decision; a starting point for the fleet to refine. Radar's role here is detection: the `*-on-sx` subsystems have converged into a host-agnostic re-implementation of rose-ash's domain logic, so this doc proposes *when* and *how* to wire them to production.

---

## 1. Premise: we are ~70% into a re-implementation already

The fleet of `lib/` SX subsystems is not a set of experiments; it is rose-ash's domain logic, re-expressed substrate-by-substrate, deliberately **host-agnostic**:

| SX subsystem (`lib/`) | rose-ash production domain |
|---|---|
| content-on-sx (CRDT docs, versioning, `page.sx` HTML render) | **blog** |
| commerce-on-sx (catalog, pricing, cart, order + refund sagas) | **market + cart + orders** |
| events-on-sx (calendar, ticketing, booking) | **events** |
| feed-on-sx (activity streams, AP-shaped, threading) | **federation** |
| identity-on-sx (OAuth2, sessions, grants, membership) | **account** |
| acl-on-sx (permissions) | cross-cutting authZ |
| relations / likes | **relations / likes** (internal) |
| persist-on-sx (log / kv / snapshot facets) | per-service Postgres layer |
| flow-on-sx (durable sagas) | order/refund/delivery workflows |
| mod-on-sx, search-on-sx | new capabilities |

**The architectural enabler:** every core was built with *injected seams*: `permit?`, `send-fn`/`fetch-fn`, `transport`, `dispatch`, `backend`. That is ports-and-adapters (hexagonal) on purpose. Evidence from the radar backlog (`plans/abstractions.md`): W1 (7/7 federation modules inject the fed-sx transport), W4 (content/commerce/events run live on `persist/log`), W8 (events+commerce run sagas on `lib/flow`). **The cores do not depend on how they're hosted, persisted, or federated.**

**Corollary that makes the whole migration tractable:** because logic is separated from rendering and storage, we can hold the **domain logic to parity** while **freely redesigning the presentation**; the two are different layers with different rules.

---

## 2. The gating insight: the cores are *ahead of the host*

The domain logic is mature. What is *not* yet production-grade is the **host trio**, and that is the real critical path:

- **host-on-sx**: HTTP / request-response / session host (briefing exists; the OCaml SX HTTP server already serves `sx.rose-ash.com`).
- **host-persist**: durable storage adapter (real disk/pg/ipfs) under `persist`'s facets (content-addressed blob blocker recently closed).
- **fed-sx**: the real ActivityPub transport every core injects (well into m2).

> **So "when do we start?" answers itself: start when the host trio is production-grade,
> not when the cores are done; they mostly already are.** Prioritise the host loops over
> further domain features.

---

## 3. The model: duplicate → cut over → diverge (per slice)

This is the "duplicate first, then change" approach, made precise. Each domain slice goes through three phases independently:

**Phase A: Duplicate (hold logic to parity).** Stand the SX implementation of the slice up *in parallel*, behind the existing edge, serving no users yet. Get its **domain/data behaviour** to match Python (see §4 on how). Presentation can start as a rough port or an early new design; it doesn't have to match.

**Phase B: Cut over (strangler flip).** Point the edge route for that slice at the SX host. Python stays as instant rollback. The slice is now live on SX.

**Phase C: Diverge (change freely).** With the slice live and validated, evolve the look/feel and functionality on the SX side. The validated domain logic underneath is untouched, so UX/feature changes can't silently corrupt data.
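
As an illustration only, the per-slice phase model can be written as a small state machine. This is a minimal sketch, not the fleet's tooling: the `Phase` and `Slice` names, the `unexplained_diffs` counter, and the `"sx"`/`"python"` upstream labels are all hypothetical, and the real routing decision would live in the edge configuration rather than in Python.

```python
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    PYTHON_ONLY = 0  # slice not started; Python serves everything
    DUPLICATE = 1    # Phase A: SX runs in parallel, serves no users
    CUT_OVER = 2     # Phase B: the edge routes this slice to SX
    DIVERGE = 3      # Phase C: the SX side evolves freely


@dataclass
class Slice:
    name: str
    phase: Phase = Phase.PYTHON_ONLY
    unexplained_diffs: int = 0  # hypothetical count from a domain-level parity check

    def upstream(self) -> str:
        """Which backend the edge should send this slice's traffic to."""
        return "sx" if self.phase >= Phase.CUT_OVER else "python"

    def advance(self) -> None:
        """Move one phase forward; the A -> B flip is gated on parity."""
        if self.phase is Phase.DIVERGE:
            raise ValueError(f"{self.name} is already in the final phase")
        if self.phase is Phase.DUPLICATE and self.unexplained_diffs > 0:
            raise ValueError(
                f"{self.name} has {self.unexplained_diffs} unexplained diffs"
            )
        self.phase = Phase(self.phase + 1)

    def rollback(self) -> None:
        """Send a live slice back to Python; SX keeps running in parallel."""
        if self.phase >= Phase.CUT_OVER:
            self.phase = Phase.DUPLICATE


blog = Slice("blog")
orders = Slice("orders")
blog.advance()  # PYTHON_ONLY -> DUPLICATE
blog.advance()  # DUPLICATE -> CUT_OVER (allowed: no unexplained diffs)
print(blog.upstream(), orders.upstream())  # prints "sx python"
```

The design point the sketch tries to capture is that each slice carries its own phase, so one slice can be live on SX while every other slice is still served by Python, and a rollback touches only the slice that misbehaved.
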

You never rewrite the whole platform at once; you walk slices through A→B→C, oldest tree strangled last.

---

## 4. The two techniques, and how "we'll change things" reshapes them

### Strangler edge

The edge (Caddy) is the front door every request hits. Add routing rules so **one route at a time** goes to the SX host while everything else still goes to Python. Properties: the site is never half-broken; any single route flips back to Python instantly; the old app is strangled route-by-route. (Opposite of big-bang swap, which is how these die.)

### Shadow diff: split by layer

Run the new version on real traffic in the background, discard its output, and **log how it differs** from Python. Flip the edge only when diffs are zero/intended.

But because we *intend* to change look/feel + functionality, parity is a tool we apply **only where we want sameness**, not a straitjacket:

| Layer | Want parity? | Oracle |
|---|---|---|
| **Domain/data** (totals, tax, permissions, what's stored, who-sees-what) | **YES: silent difference = data corruption** | shadow-diff at the *core* boundary; deterministic cores → replay real request logs through the harness and diff |
| **Presentation/UX** (HTML, layout, look, feel, flows) | **NO: this is what we're changing** | manual QA + design review; this is the Phase-C divergence |

Practical shape: shadow-diff hits the **domain core's output** (the computed order, the visible-activity set, the permission decision), not the rendered HTML. The deterministic, harness-replayable cores are the single biggest advantage we have here; it's the same parity discipline that made the A1 conformance migration safe (one reference slice, hard parity gate, revert on mismatch).

---

## 5. Readiness gates (start the production migration when ALL hold)

1. **Host trio production-grade**: host-on-sx (HTTP/session), host-persist (durable adapter), fed-sx (AP transport), each conformance-green.
2. **Data-migration story exists**: a way to get existing production Postgres state into `persist` event streams (event-source the current state, or dual-write during overlap). This is the honest long-pole; it is *not* domain logic and nobody has built it yet.
3. **One vertical slice proven end-to-end** at data-parity in production: the reference migration, the way the conformance loop migrated one subsystem before the rest.

---

## 6. Sequencing

1. **Host trio first** (critical path; it's behind the cores).
2. **Build the strangler edge + shadow-diff harness** as first-class tooling: edge routing rules + a dual-run logger that diffs *core outputs* (not HTML) and stores discrepancies.
3. **First slice = lowest risk × highest readiness × cleanest data oracle.** Recommended: **the blog read path (content-on-sx)** or **the feed read path** (read-heavy, no money, CRDT/versioning + `page.sx` HTML already exist, and the data oracle is clean). *Avoid cart/orders/payments first* (transactional + SumUp webhooks = highest blast radius).
4. **Persistence-first, federation-last.** Land host-persist + migrate per-domain event stores before any cutover. Do fed-sx federation as a *coordinated* cut near the end; W1 shows all 7 cores light up federation together once the shared transport ships.
5. **Walk the remaining slices A→B→C**, retiring Python routes as each cuts over.

---

## 7. The honest long tail (mostly host + adapters, not cores)

The cores are pure domain logic; the production *tail* is not in them yet and is most of the remaining real effort:

- Auth: first-party cookies / Safari-ITP, CSRF, silent SSO, grant caching.
- Cross-cutting: rate limiting, observability/metrics, error pages, caching.
- Integrations: SumUp payment + webhooks, Ghost CMS sync.
- Presentation: the actual HTMX templates + CSS (this is also where the redesign happens).
- **Live data migration**: the single biggest non-core workstream.

---

## 8. Concrete next steps

1. Treat the **host trio** as the fleet's critical path; prioritise over more domain features.
2. Stand up the **strangler edge + core-level shadow-diff harness** as a tool.
3. Prove **one slice** (blog/content read path) end-to-end in production as the reference.
4. **Spec the Postgres → persist data migration** (the long-pole nobody has started).
5. Then walk slices through duplicate → cut over → diverge, redesigning UX in Phase C.

---

## 9. Why this is low-risk despite being a platform rewrite

- It's **wiring host-agnostic cores to a host**, not rewriting domain logic from scratch.
- The **strangler edge** means the site always works and any route reverts in seconds.
- **Deterministic cores** make data-parity *mechanically checkable* (replay + diff), so correctness isn't a matter of faith.
- **Logic/presentation separation** lets us change look/feel + functionality (Phase C) *without* re-risking the validated domain logic.
- It's the **same discipline that just shipped A1**: one reference migration, a hard parity gate, honest exclusions, verify-before-merge.
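
The replay-and-diff check that §4 and §9 rely on can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the harness itself: `replay_and_diff`, `canonical`, the request shape, and the two toy "cores" (`python_total`, `sx_total`) are all hypothetical stand-ins. The real harness would call the production Python domain function and the SX core with logged requests, and compare domain outputs rather than HTML.

```python
import json
from typing import Any, Callable, Iterable

Core = Callable[[dict], Any]


def canonical(value: Any) -> str:
    """Serialise a core output so dict key order cannot cause a false diff."""
    return json.dumps(value, sort_keys=True, default=str)


def replay_and_diff(
    requests: Iterable[dict], reference: Core, candidate: Core
) -> list[dict]:
    """Run every logged request through both cores and collect mismatches."""
    discrepancies = []
    for request in requests:
        expected = reference(request)
        actual = candidate(request)
        if canonical(expected) != canonical(actual):
            discrepancies.append(
                {"request": request, "expected": expected, "actual": actual}
            )
    return discrepancies


# Toy stand-ins for the Python and SX order-total cores (hypothetical).
def python_total(request: dict) -> dict:
    return {"total": sum(i["price"] * i["qty"] for i in request["items"])}


def sx_total(request: dict) -> dict:
    # Deliberate bug for the demo: quantity is ignored.
    return {"total": sum(i["price"] for i in request["items"])}


logged = [
    {"items": [{"price": 500, "qty": 1}]},
    {"items": [{"price": 250, "qty": 2}]},
]
diffs = replay_and_diff(logged, python_total, sx_total)
print(len(diffs))  # prints 1: only the qty=2 request disagrees
```

Because the comparison runs on the cores' structured output, a presentation redesign in Phase C produces no diffs here, while a change in a computed total does.
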