# content-on-sx: Documents, blocks & collaborative editing on Smalltalk

> **DRAFT outline.** The CMS vertical — blog, WYSIWYG editor, Ghost sync. Depends
> on `persist-on-sx` (document history as an event log). Ghost/CMS sync stays a thin
> external adapter (Python/FFI) until a native replacement exists.

rose-ash's `blog` domain is content management: a block-based WYSIWYG editor, navigation, Ghost CMS sync. A document is a tree of live blocks; editing is a stream of operations; collaboration needs conflict-free merge. That is an object model — blocks are objects, edits are messages, and a document is the object graph responding to them. Smalltalk's "everything is an object responding to messages" maps directly to a block/WYSIWYG model, and a semilattice (CRDT) merge keeps concurrent edits conflict-free.

End-state: a Smalltalk-on-SX document model (typed blocks, structural ops), operation log + CRDT merge for collaborative editing, versioning/history via the event store, and a render boundary to HTML/SX. External CMS (Ghost) sync is an injected adapter, not core.

## Status (rolling)

`bash lib/content/conformance.sh` → **812/812** (Phases 1–4 COMPLETE + ~34 extensions, hardened: HTML/SX escaping, Markdown render + import/export incl. tables & frontmatter (full round-trip), CvRDT flat + nested-tree + durable replication, tree-aware validation, snapshot cache, doc metadata, plain-text render, nested block trees + deep editing + flatten + relative reorder, doc stats + summary + multi-doc index, table + callout + media blocks, HTML page wrapper + SEO page, doc composition + id-remap, portable data + wire serialization, block query + transforms + find/replace, TOC + anchored headings + outline, normalization)

## Ground rules

- **Scope:** only `lib/content/**` and `plans/content-on-sx.md`. May **import** from `lib/smalltalk/`, and (once it exists) `lib/persist/`. Do not edit substrates.
- **Architecture:** a document is an ordered tree of blocks (objects); an edit is a message (`insert`/`update`/`move`/`delete`); concurrent edits merge via a commutative (CRDT/semilattice) operation so order doesn't matter. History is the `persist` event stream; any version is a replay.
- **Determinism:** merge must be commutative + idempotent (test: apply ops in any order / twice → same document).
- **Commits:** one feature per commit. Progress log + tick boxes.

## Architecture sketch

```
 Edit op                          Rendered document
 (insert block after id) ...      HTML / SX tree
        │                                ▲
        ▼                                │
 lib/content/block.sx             lib/content/render.sx
 — typed blocks as objects        — block tree → HTML/SX
 — heading/text/image/embed       — (reuses SX render boundary)
        │                                ▲
        ▼                                │
 lib/content/doc.sx               lib/content/merge.sx
 — ordered block tree             — CRDT/semilattice op merge
 — apply op, structural moves     — concurrent-edit reconciliation
        │                                ▲
        ▼                                │
 lib/content/api.sx ── (content/edit) (content/render) (content/history) ──┐
        │                                                                  │
        ├── op log + versions → persist                                    │
        └── Ghost/CMS sync → injected external adapter (thin, non-core)  ──┘
```

## Phase 1 — Block document model

- [x] `block.sx` — typed block objects
- [x] `doc.sx` — ordered tree, apply edit op, structural moves
- [x] `render.sx` — block tree → HTML/SX
- [x] `api.sx` + tests + scoreboard + conformance.sh

## Phase 2 — Op log + versioning

- [x] edit ops as `persist` events; replay to any version
- [x] `(content/history doc)`, diff between versions

## Phase 3 — Collaborative merge (CRDT)

- [x] commutative/idempotent op merge
- [x] concurrent-edit tests (any order, double-apply → identical)

## Phase 4 — External sync + federation

- [x] Ghost/CMS sync via injected adapter (import/export)
- [x] federated documents (peer-authored blocks) — trust-gated stub
- [x] tests: round-trip import/export, conflict on concurrent external edit

## Extensions (post-roadmap)

- [x] HTML escaping at the render boundary (`String>>htmlEscaped`: & < > ")
- [x] asSx wire string-escaping (`String>>sxEscaped`: \ and " in SX literals)
- [x] Markdown render mode (`asMarkdown:` / `content/render doc "md"`)
- [x] durable CRDT replication (`crdt-store.sx`: ops on persist, replay + converge)
- [x] document validation (`validate.sx`: ids, per-type fields, duplicate ids; tree-aware — descends into sections, tree-wide dup ids, section field check)
- [x] Markdown import adapter (`md-import.sx`: text → blocks, round-trips export; incl. pipe tables + frontmatter → metadata)
- [x] Markdown doc export (`md-doc.sx`: content/markdown-doc, frontmatter from metadata, full round-trip)
- [x] snapshot cache over replay (`snapshot.sx`: cache-not-primary, transparent)
- [x] document metadata (`meta.sx`: title/slug/tags + Ghost title plumbing)
- [x] plain-text render + excerpt (`text.sx`: asText, content/excerpt)
- [x] nested block trees (`section.sx`: CtSection container, recursive render, deep-find)
- [x] document statistics (`stats.sx`: word/char/block counts, reading time)
- [x] table block (`table.sx`: CtTable, renders html/sx/text/md, validated)
- [x] callout block (`callout.sx`: CtCallout note/warning/tip, renders html/sx/text/md, validated)
- [x] media block (`media.sx`: CtMedia video/audio, renders html/sx/text/md, validated)
- [x] list-card summary (`summary.sx`: content/summary — title/excerpt/words/reading/cover)
- [x] multi-doc index (`index.sx`: content/index + index-by-tag + all-tags + has-tag?)
- [x] nested-tree CvRDT (`crdt-tree.sx`: parent-aware, sections merge collaboratively)
- [x] HTML page wrapper (`page.sx`: content/page, escaped title from metadata)
- [x] SEO page (`page-full.sx`: content/page-full, lang + meta description from excerpt)
- [x] document composition (`compose.sx`: concat/prepend/concat-all/wrap-section)
- [x] deep tree editing (`tree-edit.sx`: doc-deep-update/replace/delete/insert-into)
- [x] id remapping / clone (`clone.sx`: content/remap-ids + prefix-ids, collision-free compose)
- [x] block query + TOC (`query.sx`: content/select/select-type/count-type/headings)
- [x] block transforms (`transform.sx`: content/map-blocks/map-type/set-field-on)
- [x] TOC rendering (`toc.sx`: content/toc-markdown + toc-html from headings)
- [x] anchored-heading render (`anchor.sx`: content/html-anchored, functional TOC links)
- [x] document outline (`outline.sx`: content/outline, nested heading tree)
- [x] document flatten (`flatten.sx`: content/flatten, un-nest sections; inverse of wrap-section)
- [x] relative reorder (`move.sx`: content/move-before/after/to-front/to-back by id)
- [x] tree reparent (`move.sx`: content/move-into a section + content/promote out to top level; tree-wide, cycle-safe)
- [x] document normalization (`normalize.sx`: content/normalize, drop empty blocks/sections)
- [x] document sanitization (`sanitize.sx`: content/sanitize, drop invalid blocks tree-wide; validate's enforcement partner)
- [x] global find/replace (`find-replace.sx`: content/find-replace across text-bearing blocks)
- [x] portable data serialization (`data.sx`: content/to-data + from-data, round-trips tree)
- [x] wire serialization (`wire.sx`: content/to-wire + from-wire, SX-text on the wire)

## Known limitations

- **Markdown table cells containing `|` do not round-trip.** `asMarkdown` on a table emits cell text raw (table.sx `CtTable>>asMarkdown:`), so a cell `x|y` renders the row `| x|y | z |` — which `md/import` then splits into *three* cells (`md-import.sx` `md/-cells` splits on every `|`). Repro: build `(mk-table "t" (list "A" "B") (list (list "x|y" "z")))`, `asMarkdown` → re-`md/import` → cells become `("x" "y" "z")`. Same applies to a literal `|` in a header. (HTML/SX/text/data/wire/CRDT round-trips are unaffected — only the Markdown text boundary.)

  *Fix sketch* (when sx-tree edit tooling is restored — see below): add `String>>mdCellEscaped` (escape `|` → `\|`) in table.sx and use it for every header/cell in `CtTable>>asMarkdown:`; in md-import.sx replace `md/-cells`' naive `(split … "|")` with an escaped-aware splitter that breaks only on unescaped `|` and unescapes `\|` → `|`. Both sides must change together (export-only escaping makes self-round-trip worse, not better).

  *Blocker:* in this worktree every sx-tree **edit** tool (`sx_replace_node`, `sx_replace_by_pattern`, `sx_insert_near`, …) raises yojson `"Expected string, got null"`; only `sx_write_file` works. md-import.sx is 449 lines, so a safe surgical edit isn't currently possible — deferred rather than risk a full manual rewrite of working import code.

## Progress log

- 2026-06-07 — Feature: tree reparent in move.sx. Until now insert/move were positional and top-level only, so a block could never be moved *into* a section or *out* of one — a real gap for editing nested documents. Added `content/move-into doc id section-id i` (relocate a block, from anywhere in the tree, to be a child of a section at index i) and `content/promote doc id` (lift a nested block out to the end of the top level; a moved section keeps its whole subtree). Both are pure tree transforms (consistent with the existing move family — not new op-log ops) built on doc-find-deep / ct-find-id / ct-remove-id / ct-replace-id. **Cycle-safe**: move-into no-ops when target is the block itself or sits inside the block's own subtree, so a section can never become its own ancestor. +13 move tests (into/promote/across-sections/empty-shell/whole-section-subtree/cycle-guard/missing-id no-ops).
  812/812.
- 2026-06-07 — Feature: `content/sanitize` — the enforcement counterpart to `validate`. validate *reports* id/field issues; sanitize *removes* the offending blocks (tree-wide) so federated/imported input that failed validation can still be rendered/merged without faulting. Reuses validate's own per-block predicate (`content/-block-issues`) so "what is invalid" stays single-sourced and can't drift. Distinct from `normalize` (which drops *empty* blocks): a section emptied of invalid children is kept (sanitize removes invalid, not empty), but a section whose own shell is invalid (children not a list) is dropped whole. Scope is per-block id/field validity — it does not dedupe ids (cross-block, no single right answer). +12 tests (bad-field / unknown-type / blank-id dropped, deep pruning, invalid-shell section dropped, immutability, render-safe result). 799/799 (42 suites). (This was a genuine remaining gap — validate had no enforcement partner — not filler; saturation note below still holds for the roadmap proper.)
- 2026-06-07 — Audit (markdown round-trip): probed the Markdown text boundary for round-trip fidelity. Found one real data-corruption bug — table cells containing `|` don't survive `asMarkdown` → `md/import` (recorded under **Known limitations** with repro + fix sketch). Could not land the fix this pass: it must touch md-import.sx (449 lines) and every sx-tree *edit* tool is currently broken in this worktree (yojson error; only `sx_write_file` works), so a safe surgical edit isn't possible and a full manual rewrite of working import code is too risky to be responsible. Deferred + documented rather than half-fix (export-only escaping worsens self-round-trip). Engine remains COMPLETE + audited at 787/787; with the roadmap exhausted, the tree-wide audit done, and the one open finding tooling-blocked, the vertical is **SATURATED** — pacing the loop down.
- 2026-06-07 — Hardening: validation now vets collection blocks ELEMENT-DEEP.
  `validate` previously checked only that list `items` / table `headers`/`rows` *are lists* — a list holding a non-string, or a table whose rows aren't lists of strings, passed validation yet crashes asText/render/find-replace/search (which all assume string items/cells). Added `ct-all-str?`/`ct-all-rows?` and deepened the list/table branches (guarded so a non-list container reports only the is-a-list issue, not a spurious element issue). Since validate's job is guarding imports/federated input, this closes the boundary before the render layer can fault. +9 validate tests (list non-string item, table non-list row / non-string cell / non-string header, empties stay valid). 787/787.
- 2026-06-07 — Hardening (tree-wide audit): the public facade `content/find` / `content/has?` were top-level-only (`doc-find`/`doc-has?`), so you could `content/edit` an update/delete to a nested block by id (those ops are tree-wide) but couldn't read that same block back by id through the facade — a concrete read/write asymmetry. Added a generic `ct-find-id` to doc.sx (descends into any `children` list, mirroring ct-replace-id/ct-remove-id, no section.sx dependency) plus `doc-find-deep`/`doc-has-deep?`; `content/find`/`content/has?` now point at them. Kept `content/find-top`/`content/has-top?` for the top-level-only lookup. Audited all `doc-find`/`doc-ids`/`ct-index-of` callers: the remaining ones are insert/move (positional, top-level by design) — no other seams. +6 api tests (nested deep find/has, top variants miss nested, edit-then-find round-trip). 778/778.
- 2026-06-07 — Hardening: `content/diff` (and `content/diff-versions`) are now TREE-WIDE. They enumerated ids via `doc-ids`/`doc-find` (top-level only), so a diff between two versions of a document containing sections silently missed every nested-block add/remove/change — the same class of seam as the by-id op-log bug. Now ids come from `doc-tree-ids` and lookups from `doc-deep-find`, so nested changes surface precisely.
  Section containers are excluded from `:changed` (they hold no own content; a child change reports as that child), while whole-section add/remove still shows in `:added`/`:removed`. Flat-doc diffs are unchanged (deep == top-level with no sections). +9 store tests (nested add = section+child, nested change = child only, nested remove, no-op). 772/772.
- 2026-06-07 — Feature: in-document prose search. `content/search-text` (and `content/search-text-ids`) return every content block, tree-wide, whose `(asText b)` contains a term — so search spans text/heading/code/quote/callout text, image alt, list items and table cells **by construction**: it reuses the one canonical "prose of a block" projection (asText) rather than re-listing fields, so it can't drift from stats/find-replace. Section containers are excluded (a term living only in a section's children returns the child, not the wrapper). +7 query tests (cross-field match, count, single-field, no-match, section exclusion, object return). 763/763.
- 2026-06-07 — Consistency: `find-replace` now rewrites **every** text-bearing field, not just `text`. New `fr-rewrite` dispatches per block type — `alt` of image blocks, each item of list blocks, and every header/cell of table blocks now get rewritten alongside text/heading/code/quote/callout. This closes a real seam: `asText`/stats/word-count already fold image alt, list items, and table cells into a document's prose, so a `content/find-replace` rename that skipped them was inconsistent (a renamed term would still show up in word counts and exports). Flipped the two `image alt untouched` tests to `image alt replaced`; +4 tests (list items ×2, table header + cell). find-replace 16/16, 756/756.
- 2026-06-07 — Consistency: `find-replace` now covers `callout` text. `fr-has-text?` (find-replace.sx) added `callout` to its text-bearing block kinds, matching `asText`/stats/summary which already treat callout bodies as prose.
  Previously a `content/find-replace` over a doc containing callouts silently skipped them. +2 find-replace tests (replace callout text; callout kind untouched by text replace). 752/752 (41 suites).
- 2026-06-07 — Hardening: fixed a real layer seam (surfaced in the architecture review) — by-id ops (update/delete) now act TREE-WIDE. `ct-replace-id` / `ct-remove-id` (doc.sx) descend into any block carrying a `children` list, so the persist op-log and `content/edit` correctly reach blocks nested in sections (previously a silent no-op). `doc-move` stays top-level (guarded by doc-find); insert/move remain positional. Inline section detection (no section.sx dep). +4 store regression tests (nested update/delete via op-log + replay-to-seq). Full gate over foundational doc.sx: 750/750.
- 2026-06-07 — Hardening: audit confirmed the persist op-log (store.sx) carries every block type through commit → replay (op-insert carries the block instance; updates apply by id). Locked with +4 store tests (callout/media insert + update via the durable log). No bug; coverage gap closed. Suite 746/746.
- 2026-06-07 — Hardening: tree-CRDT orphan reparenting. Concurrent delete-section + insert-child previously orphaned the child (its parent no longer a live section) → silently dropped from materialise. Fixed `crdt-tree-materialize`/`crdt-tree-order` to root any element whose parent is "" OR not a live section, so content is never lost on concurrent edits. +4 tests (orphan survives, commutes, content preserved, renders at root). Suite 742/742.
- 2026-06-07 — Hardening: regression suite `crdt-blocks` (7 tests) locking that non-core block types (callout/table/media/section) survive both the flat and nested-tree CvRDT materialise paths (insert → merge → materialise → render), the integration the ct-class-for-type fix repaired. Verified flat + tree, including concurrent mixed-type inserts into a section converging. Suite 738/738.
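
The orphan-reparenting rule from the tree-CRDT hardening entry above (root any element whose parent is `""` or not a live section) can be illustrated with a minimal Python sketch. This is not the `crdt-tree.sx` implementation: `Elem`, its fields, the integer `pos` standing in for a Logoot position, and the `materialize` function are all hypothetical names chosen for the example, and deletion is modelled as a simple tombstone flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Elem:
    """Hypothetical element record; not the real crdt-tree.sx shape."""
    id: str
    parent: str            # containing section id, "" means root
    pos: int               # assumption: an int stands in for a Logoot position
    kind: str              # "section" or any content block type
    deleted: bool = False  # assumption: deletes are tombstones


def materialize(elems):
    """Rebuild the ordered tree, rooting any element whose parent is gone."""
    live = {e.id: e for e in elems if not e.deleted}
    live_sections = {i for i, e in live.items() if e.kind == "section"}

    def effective_parent(e):
        # Orphan rule: a parent that is "" or not a live section maps to root.
        return e.parent if e.parent in live_sections else ""

    children = {}
    for e in live.values():
        children.setdefault(effective_parent(e), []).append(e)

    def build(parent_id):
        out = []
        # Sort by (pos, id) so the result does not depend on input order.
        for e in sorted(children.get(parent_id, []), key=lambda x: (x.pos, x.id)):
            if e.kind == "section":
                out.append((e.id, build(e.id)))
            else:
                out.append(e.id)
        return out

    return build("")


if __name__ == "__main__":
    # One replica deleted section "s" while another inserted child "c" into it.
    merged = [
        Elem("s", "", 1, "section", deleted=True),
        Elem("c", "s", 1, "text"),
        Elem("t", "", 2, "text"),
    ]
    print(materialize(merged))  # ['c', 't']
```

Because the rooting decision is computed from the merged state at materialisation time rather than stored, the child survives whichever order the delete and the insert arrive in.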
- 2026-06-07 — Hardening: fixed `ct-class-for-type` (block.sx) to map all block tags (added section/table/callout/media). Latent bug: `content/from-data` and CRDT materialise of callout/media blocks failed with "unknown block type" (they fell through to `mk-block`, which only knew the original 8 types). Now all block types build uniformly via mk-block; data/wire/CRDT round-trips of callout/media work. +4 data regression tests; full no-regression gate over the foundational block.sx change: suite 731/731.
- 2026-06-07 — Extension: nested-tree CvRDT (`crdt-tree.sx`). Extends the flat CvRDT to a TREE: each element carries a `parent` (containing section id, "" = root) beside its Logoot pos; merge reuses crdt.sx's pos/register/field joins + parent (immutable). Materialisation rebuilds the ordered tree (root + per-section children sorted by pos, recursive). Sections now merge collaboratively; proven commutative/associative/idempotent — same- and different-parent concurrent inserts converge, nested sections, LWW, two-replica convergence. Reuses crdt.sx + section.sx; flat crdt untouched (34/34). 17 tests; suite 727/727. This was the flagged "research-grade" gap — done as a clean self-contained layer.
- 2026-06-07 — Extension: multi-document index (`index.sx`). `content/index` projects a doc list into summary cards (blog index); `content/index-by-tag` filters by tag (category pages); `content/all-tags` is a deduped tag cloud; `content/has-tag?`. Composes content/summary + doc metadata. 13 tests; suite 710/710.
- 2026-06-07 — Extension: list-card summary (`summary.sx`). `content/summary` returns `{:id :title :excerpt :words :reading-minutes :cover}` for index/listing cards, composing metadata + text + stats + query (`content/cover` = first image's src). Title falls back to id. 14 tests; suite 697/697.
- 2026-06-07 — Extension: video/audio media block (`media.sx`). `CtMedia` holds kind (video/audio) + src; answers asHTML (`