rose-ash/plans/content-on-sx.md
giles a5ff21015e
content: document composition (compose.sx) + 17 tests (502/502)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 03:02:54 +00:00

# content-on-sx: Documents, blocks & collaborative editing on Smalltalk
> **DRAFT outline.** The CMS vertical — blog, WYSIWYG editor, Ghost sync. Depends
> on `persist-on-sx` (document history as an event log). Ghost/CMS sync stays a thin
> external adapter (Python/FFI) until a native replacement exists.

rose-ash's `blog` domain is content management: a block-based WYSIWYG editor,
navigation, Ghost CMS sync. A document is a tree of live blocks; editing is a
stream of operations; collaboration needs conflict-free merge. That is an object
model — blocks are objects, edits are messages, and a document is the object graph
responding to them. Smalltalk's "everything is an object responding to messages"
maps directly to a block/WYSIWYG model, and a semilattice (CRDT) merge keeps
concurrent edits conflict-free.

End-state: a Smalltalk-on-SX document model (typed blocks, structural ops),
operation log + CRDT merge for collaborative editing, versioning/history via the
event store, and a render boundary to HTML/SX. External CMS (Ghost) sync is an
injected adapter, not core.
## Status (rolling)
`bash lib/content/conformance.sh` → **502/502** (Phases 1–4 COMPLETE + extensions: HTML/SX escaping, Markdown render + import/export incl. tables & frontmatter (full round-trip), CRDT replication, tree-aware validation, snapshot cache, doc metadata, plain-text render, nested block trees, doc stats, table block, HTML page wrapper + SEO page, doc composition)
## Ground rules
- **Scope:** only `lib/content/**` and `plans/content-on-sx.md`. May **import**
from `lib/smalltalk/`, and (once it exists) `lib/persist/`. Do not edit substrates.
- **Architecture:** a document is an ordered tree of blocks (objects); an edit is a
message (`insert`/`update`/`move`/`delete`); concurrent edits merge via a
commutative (CRDT/semilattice) operation so order doesn't matter. History is the
`persist` event stream; any version is a replay.
- **Determinism:** merge must be commutative + idempotent (test: apply ops in any
order / twice → same document).
- **Commits:** one feature per commit. Progress log + tick boxes.
## Architecture sketch
```
 Edit op                            Rendered document
 (insert block after id)    ...     HTML / SX tree
        │                                  ▲
        ▼                                  │
 lib/content/block.sx               lib/content/render.sx
 — typed blocks as objects          — block tree → HTML/SX
 — heading/text/image/embed         — (reuses SX render boundary)
        │                                  ▲
        ▼                                  │
 lib/content/doc.sx                 lib/content/merge.sx
 — ordered block tree               — CRDT/semilattice op merge
 — apply op, structural moves       — concurrent-edit reconciliation
        │                                  ▲
        ▼                                  │
 lib/content/api.sx ── (content/edit) (content/render) (content/history) ──┐
        │                                                                  │
        ├── op log + versions → persist                                    │
        └── Ghost/CMS sync → injected external adapter (thin, non-core) ───┘
```
## Phase 1 — Block document model
- [x] `block.sx` — typed block objects
- [x] `doc.sx` — ordered tree, apply edit op, structural moves
- [x] `render.sx` — block tree → HTML/SX
- [x] `api.sx` + tests + scoreboard + conformance.sh
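
The Phase 1 model (an ordered block sequence, edit ops as data, every apply returning a new document) can be sketched in Python. This is only an illustrative model, assuming blocks are plain dicts and a document is a tuple of blocks; `apply_op`, `apply_all`, and the op field names (`after`, `block`, `fields`) are hypothetical and not the `doc.sx` API.

```python
from typing import Any

Block = dict[str, Any]
Doc = tuple[Block, ...]  # immutable ordered sequence of blocks (assumed shape)


def _index(doc: Doc, block_id: str) -> int:
    """Return the position of the block with this id, or raise KeyError."""
    for i, block in enumerate(doc):
        if block["id"] == block_id:
            return i
    raise KeyError(block_id)


def apply_op(doc: Doc, op: dict[str, Any]) -> Doc:
    """Interpret one edit op and return a NEW document; the input is untouched."""
    kind = op["op"]
    if kind == "insert":
        # after=None inserts at the front; otherwise insert after that id
        at = 0 if op.get("after") is None else _index(doc, op["after"]) + 1
        return doc[:at] + (dict(op["block"]),) + doc[at:]
    if kind == "update":
        i = _index(doc, op["id"])
        # copy-on-write: build a fresh block instead of mutating the old one
        return doc[:i] + ({**doc[i], **op["fields"]},) + doc[i + 1:]
    if kind == "delete":
        i = _index(doc, op["id"])
        return doc[:i] + doc[i + 1:]
    if kind == "move":
        i = _index(doc, op["id"])
        block, rest = doc[i], doc[:i] + doc[i + 1:]
        at = 0 if op.get("after") is None else _index(rest, op["after"]) + 1
        return rest[:at] + (block,) + rest[at:]
    raise ValueError(f"unknown op: {kind}")


def apply_all(doc: Doc, ops: list[dict[str, Any]]) -> Doc:
    """Fold an op stream over a document."""
    for op in ops:
        doc = apply_op(doc, op)
    return doc


ops = [
    {"op": "insert", "after": None, "block": {"id": "b0", "type": "heading", "text": "Hi"}},
    {"op": "insert", "after": "b0", "block": {"id": "b1", "type": "text", "text": "Body"}},
    {"op": "update", "id": "b1", "fields": {"text": "Body!"}},
    {"op": "move", "id": "b1", "after": None},
]
doc = apply_all((), ops)
print([b["id"] for b in doc])  # prints ['b1', 'b0']
```

Because every step builds a new tuple, any intermediate document can be kept and replayed from, which is what makes the op stream replay-safe.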
## Phase 2 — Op log + versioning
- [x] edit ops as `persist` events; replay to any version
- [x] `(content/history doc)`, diff between versions
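
A rough Python sketch of the Phase 2 idea: the log is the source of truth, any version is a replay of a prefix, and a diff reports added/removed/changed block ids. Everything here is an assumption made for illustration: `OpLog` stands in for a persist stream, seq numbers are taken to start at 1, and block ordering is left out to keep the example short.

```python
from typing import Any

Doc = dict[str, dict[str, Any]]  # block id -> fields (ordering omitted in this sketch)


def apply_op(doc: Doc, op: dict[str, Any]) -> Doc:
    """Apply one op to a copy of the document and return the copy."""
    new = {bid: dict(fields) for bid, fields in doc.items()}
    if op["op"] == "insert":
        new[op["id"]] = dict(op["fields"])
    elif op["op"] == "update":
        new[op["id"]].update(op["fields"])
    elif op["op"] == "delete":
        del new[op["id"]]
    else:
        raise ValueError(f"unknown op: {op['op']}")
    return new


class OpLog:
    """Append-only op log; a hypothetical stand-in for a persist event stream."""

    def __init__(self) -> None:
        self._events: list[dict[str, Any]] = []

    def commit(self, op: dict[str, Any]) -> int:
        """Append an op and return its seq (assumed to start at 1)."""
        self._events.append(op)
        return len(self._events)

    def at(self, seq: int) -> Doc:
        """Materialise the document as of `seq` by replaying the first seq ops."""
        doc: Doc = {}
        for op in self._events[:seq]:
            doc = apply_op(doc, op)
        return doc

    def head(self) -> Doc:
        """Materialise the latest version."""
        return self.at(len(self._events))


def diff(old: Doc, new: Doc) -> dict[str, list[str]]:
    """Report block ids that were added, removed, or changed between versions."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

In this model the materialised document is never stored; `at` and `head` rebuild it on demand, so it behaves as a cache of the log rather than primary state.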
## Phase 3 — Collaborative merge (CRDT)
- [x] commutative/idempotent op merge
- [x] concurrent-edit tests (any order, double-apply → identical)
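
The "any order, double-apply → identical" property can be illustrated with a small state-based merge in Python. This is a simplified sketch, not the `crdt.sx` implementation: each op is modelled as a partial per-block element, positions are tuples of `(digit, actor)` cells compared lexicographically, deletion is a sticky tombstone, and each field is a last-writer-wins register keyed by `(ts, actor)`. All function names are hypothetical, and the sketch assumes each `(ts, actor)` pair identifies at most one write per field.

```python
from functools import reduce
from itertools import permutations


def _join_pos(a, b):
    """Position join: keep whichever side knows the position."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)  # deterministic even if both sides carry one


def join_element(a, b):
    """Least upper bound of two partial elements for the same block id."""
    fields = dict(a["fields"])
    for name, reg in b["fields"].items():
        cur = fields.get(name)
        # LWW register: the larger (ts, actor) wins
        if cur is None or reg[:2] > cur[:2]:
            fields[name] = reg
    return {
        "pos": _join_pos(a["pos"], b["pos"]),
        "deleted": a["deleted"] or b["deleted"],  # remove-wins tombstone
        "fields": fields,
    }


def merge(s, t):
    """Join two replica states (block id -> element)."""
    out = dict(s)
    for bid, el in t.items():
        out[bid] = join_element(out[bid], el) if bid in out else el
    return out


def _registers(ts, actor, fields):
    return {k: (ts, actor, v) for k, v in fields.items()}


def op_insert(bid, pos, ts, actor, fields):
    return {bid: {"pos": pos, "deleted": False, "fields": _registers(ts, actor, fields)}}


def op_update(bid, ts, actor, fields):
    return {bid: {"pos": None, "deleted": False, "fields": _registers(ts, actor, fields)}}


def op_delete(bid):
    return {bid: {"pos": None, "deleted": True, "fields": {}}}


def converge(ops):
    """Fold ops (each a partial state) into one state via merge."""
    return reduce(merge, ops, {})


def materialize(state):
    """Sort live, positioned elements by position and flatten to blocks."""
    live = [(el["pos"], bid, el) for bid, el in state.items()
            if el["pos"] is not None and not el["deleted"]]
    live.sort(key=lambda item: item[0])
    return [{"id": bid, **{k: r[2] for k, r in el["fields"].items()}}
            for _, bid, el in live]


def converges_in_any_order(ops):
    """True if every delivery order yields the same state."""
    target = converge(ops)
    return all(converge(p) == target for p in permutations(ops))
```

Because an update or delete that arrives before its insert is just a partial element with no position yet, joining it later with the insert loses nothing.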
## Phase 4 — External sync + federation
- [x] Ghost/CMS sync via injected adapter (import/export)
- [x] federated documents (peer-authored blocks) — trust-gated stub
- [x] tests: round-trip import/export, conflict on concurrent external edit
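
The two Phase 4 ideas, an injected adapter and a trust gate in front of the merge, can be sketched in Python as follows. This is a hedged illustration only: the adapter is modelled as a dict with `import` and `export` callables, the external post shape (`sections` with `kind`/`data`) is invented for the example and is not Ghost's real format, and `merge_peer` / `trusted_actors` are hypothetical names.

```python
from typing import Any, Callable

Adapter = dict[str, Callable[[Any], Any]]  # {"import": fn, "export": fn}


def content_import(adapter: Adapter, external: Any) -> Any:
    """Core delegates to the adapter; it knows nothing about the format."""
    return adapter["import"](external)


def content_export(adapter: Adapter, blocks: Any) -> Any:
    return adapter["export"](blocks)


def round_trip(adapter: Adapter, external: Any) -> Any:
    """export after import; should reproduce the external form."""
    return content_export(adapter, content_import(adapter, external))


def _sketch_import(post: dict[str, Any]) -> list[dict[str, Any]]:
    # Invented external shape: {"sections": [{"id", "kind", "data": {...}}]}
    return [{"id": s["id"], "type": s["kind"], **s["data"]} for s in post["sections"]]


def _sketch_export(blocks: list[dict[str, Any]]) -> dict[str, Any]:
    return {"sections": [
        {"id": b["id"], "kind": b["type"],
         "data": {k: v for k, v in b.items() if k not in ("id", "type")}}
        for b in blocks
    ]}


sketch_adapter: Adapter = {"import": _sketch_import, "export": _sketch_export}
raw_adapter: Adapter = {"import": list, "export": list}  # stub: no translation


def trusted_actors(actors) -> Callable[[dict[str, Any]], bool]:
    """Convenience gate: accept ops whose author is on an allow-list."""
    allowed = frozenset(actors)
    return lambda op: op.get("author") in allowed


def merge_peer(state, peer_ops, trusted, apply):
    """Apply only ops the gate accepts; quarantine the rest untouched."""
    accepted, rejected = [], []
    for op in peer_ops:
        (accepted if trusted(op) else rejected).append(op)
    for op in accepted:
        state = apply(state, op)
    return {"state": state, "accepted": accepted, "rejected": rejected}
```

Swapping `sketch_adapter` for `raw_adapter` changes nothing in the core functions, which is the point of injecting the adapter; likewise the gate is just a predicate, so a different trust policy can be passed in without touching `merge_peer`.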
## Extensions (post-roadmap)
- [x] HTML escaping at the render boundary (`String>>htmlEscaped`: & < > ")
- [x] asSx wire string-escaping (`String>>sxEscaped`: \ and " in SX literals)
- [x] Markdown render mode (`asMarkdown:` / `content/render doc "md"`)
- [x] durable CRDT replication (`crdt-store.sx`: ops on persist, replay + converge)
- [x] document validation (`validate.sx`: ids, per-type fields, duplicate ids; tree-aware — descends into sections, tree-wide dup ids, section field check)
- [x] Markdown import adapter (`md-import.sx`: text → blocks, round-trips export; incl. pipe tables + frontmatter → metadata)
- [x] Markdown doc export (`md-doc.sx`: content/markdown-doc, frontmatter from metadata, full round-trip)
- [x] snapshot cache over replay (`snapshot.sx`: cache-not-primary, transparent)
- [x] document metadata (`meta.sx`: title/slug/tags + Ghost title plumbing)
- [x] plain-text render + excerpt (`text.sx`: asText, content/excerpt)
- [x] nested block trees (`section.sx`: CtSection container, recursive render, deep-find)
- [x] document statistics (`stats.sx`: word/char/block counts, reading time)
- [x] table block (`table.sx`: CtTable, renders html/sx/text/md, validated)
- [x] HTML page wrapper (`page.sx`: content/page, escaped title from metadata)
- [x] SEO page (`page-full.sx`: content/page-full, lang + meta description from excerpt)
- [x] document composition (`compose.sx`: concat/prepend/concat-all/wrap-section)
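
The two escaping extensions at the top of this list are small enough to model directly. The Python sketch below uses a single pass over the characters with a lookup table, so an `&` produced by one replacement can never be escaped a second time. It is an assumed equivalent of `String>>htmlEscaped` and `String>>sxEscaped`, not a port of them; the function names and the `sx_string` helper are hypothetical.

```python
# Characters escaped at the HTML boundary: & < > "
_HTML_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

# Characters escaped inside an SX string literal: backslash and double quote
_SX_ESCAPES = {"\\": "\\\\", '"': '\\"'}


def html_escaped(s: str) -> str:
    """Escape each source character at most once (single pass)."""
    return "".join(_HTML_ESCAPES.get(ch, ch) for ch in s)


def sx_escaped(s: str) -> str:
    """Escape backslashes and double quotes for use inside an SX literal."""
    return "".join(_SX_ESCAPES.get(ch, ch) for ch in s)


def sx_string(s: str) -> str:
    """Wrap escaped content in double quotes to form a string literal."""
    return '"' + sx_escaped(s) + '"'


print(html_escaped('<a href="x">Tom & Jerry</a>'))
# prints &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

A sequential variant (replace one character class, then the next) is only safe if the escape character itself (`&` for HTML, `\` for SX) is handled first; the single-pass form sidesteps that ordering concern.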
## Progress log
- 2026-06-07 — Extension: document composition (`compose.sx`). `content/concat`
/ `content/prepend` / `content/concat-all` combine documents (keeping the
first's id + metadata, concatenating blocks, immutable); `content/wrap-section`
collapses a doc's blocks into a single nested section. For assembling pages
from header/body/footer parts and templates. 17 tests; suite 502/502.
- 2026-06-07 — Extension: SEO-complete page (`page-full.sx`). `content/page-full`
extends content/page with `<html lang="en">` and a `<meta name="description">`
drawn from the document excerpt (plain text, escaped, 160 chars), composing the
page/metadata/text layers into the SEO-ready artifact. 4 tests; suite 485/485.
- 2026-06-07 — Extension: Markdown document export (`md-doc.sx`).
`content/markdown-doc` emits a `---` frontmatter block from metadata
(title/slug/tags, only present fields) ahead of the Markdown body, or plain
asMarkdown when there's no metadata. Completes the metadata round-trip:
`md/import ∘ content/markdown-doc` preserves title/slug/tags + blocks. 12
tests; suite 481/481.
- 2026-06-07 — Extension: Markdown frontmatter. `md/import` parses a leading
`---` / `key: value` / `---` block into document metadata (title, slug,
comma-separated tags via `doc-with-meta`) before parsing the body; a `---`
elsewhere stays a divider. Ties the Markdown importer to the metadata layer the
way real blog posts work. +9 tests; suite 469/469.
- 2026-06-07 — Extension: Markdown table import. `md-import.sx` now recognizes a
`| … |` header row followed by a `| --- |` separator and parses a `CtTable`
(cells trimmed, mixed with other blocks via blank-line separation), completing
the Markdown table round-trip (import∘export == identity). +5 tests; suite
460/460.
- 2026-06-07 — Extension: HTML page wrapper (`page.sx`). `content/page` composes
metadata + render into a minimal valid HTML5 document — escaped `<title>` from
doc metadata (falling back to id) and the rendered blocks as the body.
`content/page-title`. The shippable artifact the blog serves. 7 tests; suite
455/455.
- 2026-06-07 — Extension: table block (`table.sx`). `CtTable` holds headers +
rows (string lists); answers asHTML (escaped `<table>`), asSx, asText, and
asMarkdown: (pipe table with dashed separator row) by folding rows×cells via
nested `inject:into:`. Self-contained (no edits to block.sx/render.sx);
`mk-table`, `table?`, `table-headers/rows`. validate.sx gained a `table` field
case (headers/rows must be lists). 15 tests; suite 448/448.
- 2026-06-07 — Extension: document statistics (`stats.sx`). `content/stats`
returns `{:words :chars :blocks :reading-minutes}`; word/char counts derive
from the tree-accurate `asText` projection, block count from an inline tree
walk (no section.sx dep), reading time at 200 wpm rounded up. Counts descend
into nested sections. 17 tests; suite 433/433.
- 2026-06-07 — Refinement: tree-aware validation. `validate.sx` now flattens the
whole block tree (descending into `CtSection` children, guarding malformed
non-list children) so field checks and duplicate-id detection cover nested
blocks and span section boundaries; added a `section` field-type case. Inline
tree detection (class + st-iv-get) keeps it free of a section.sx dependency.
+6 tests; suite 416/416.
- 2026-06-07 — Extension: nested block trees (`section.sx`). `CtSection` is a
block whose `children` ivar is a list of blocks (incl. nested sections →
arbitrary depth), turning the flat document into the ordered TREE from the
architecture sketch. Self-contained: it answers asHTML/asSx/asText/asMarkdown:
by folding children's renderings (pure polymorphic recursion — no changes to
block.sx/render.sx). `mk-section`, `section-children`, `section-append` (cow),
and tree traversal `doc-deep-find` / `doc-tree-ids` / `doc-tree-count` that
descend into sections. 25 tests; suite 410/410.
- 2026-06-07 — Extension: plain-text render + excerpts (`text.sx`). Fourth
boundary format via polymorphic `asText` (heading/text/code/quote→text,
image→alt, embed/divider→"", list→", "-joined); the document joins non-empty
child texts with a space. `content/render doc "text"`, `content/text`,
`content/excerpt doc n` (first n chars + "…" if truncated). For previews,
meta-descriptions, search indexing. 20 tests; suite 385/385.
- 2026-06-07 — Extension: document metadata (`meta.sx`). CtDoc gained optional
title/slug/tags ivars (declared in doc.sx, default nil/empty, no effect on
block ops). Reads via message dispatch; copy-on-write setters
(`doc-with-title/slug/tags`, `doc-add-tag`, `doc-with-meta`, `doc-new-meta`)
and `content/*` aliases; `doc-meta` returns the metadata dict. Ghost adapter
now carries `:title` through import/export/round-trip. 27 tests; suite 365/365.
- 2026-06-07 — Extension: snapshot cache over op-log replay (`snapshot.sx`).
Snapshots are a cache, never primary state — the log stays the source of truth.
`content/snapshot!` stores a materialised head at a seq in the persist KV;
`content/head-cached` / `content/at-cached` start from the nearest snapshot and
replay only the tail, returning a document IDENTICAL to a full replay (tests
assert transparency before/after snapshot, across versions, and after
drop-snapshot fallback). `content/has-snapshot?` / `snapshot-seq` /
`drop-snapshot!`. 20 tests; suite 338/338.
- 2026-06-07 — Extension: Markdown import adapter (`md-import.sx`), inverse of
asMarkdown. Line-based parser: ATX headings, fenced code (```lang), blockquotes,
unordered/ordered lists (grouping consecutive items), thematic breaks,
paragraphs (consecutive plain lines joined with a space). Sequential ids
b0,b1…. `md/import` / `content/from-markdown` / `markdown-adapter` (import +
asMarkdown export). Round-trips canonical Markdown (import∘export == identity);
imported docs pass validation. 24 tests; suite 318/318.
- 2026-06-07 — Extension: document validation (`validate.sx`). `content/validate`
returns issue dicts `{:id :kind :detail}` (empty = valid); `content/valid?`
and `content/issue-kinds` convenience. Checks block id (non-empty string),
per-type required fields/types (heading level number, image src/alt strings,
list ordered boolean + items list, etc.), unknown block types, and
document-level duplicate ids. Guards imports/edits/federated input. 17 tests;
suite 294/294.
- 2026-06-07 — Extension: durable CRDT replication (`crdt-store.sx`), uniting
Phase 2 (persist) + Phase 3 (CvRDT). Each replica appends its CRDT ops to its
own stream (`crdt:<doc>:<replica>`); `crdt/replay` folds one log into a state,
`crdt/converge` merges every replica's replayed state, `crdt/document` /
`crdt/order` materialise. Converged result is identical regardless of replica
order or duplicate delivery (join + idempotent apply) → offline-capable,
eventually-consistent editing. 14 tests; suite 277/277.
- 2026-06-07 — Extension: Markdown render mode (`markdown.sx`). Third boundary
format alongside asHTML/asSx via the same polymorphic dispatch; blocks answer
`asMarkdown: nl` (boundary supplies the newline — this Smalltalk dialect has
no Character newline ctor). `content/render doc "md"`/`"markdown"`/`:md`,
`content/markdown`, `asMarkdown`. headings (`#`×level), fenced code, `> ` quote,
`![alt](src)`, `- `/`1. ` lists, `---`; doc joins blocks with a blank line. No
MD escaping yet. 20 tests; suite 263/263.
- 2026-06-07 — Extension: asSx wire string-escaping. Added `String>>sxEscaped`
(escapes `\`→`\\` then `"`→`\"`) and routed every `asSx` text/attr/list-item
through it, so the SX wire format stays valid when content contains quotes or
backslashes. +5 render tests (expected strings built from `q`/`bs` helpers to
avoid escaping miscounts). Suite 243/243.
- 2026-06-07 — Extension: HTML escaping at the render boundary. Added
`String>>htmlEscaped` (recursive char walk escaping & < > ", order-safe so &
isn't double-escaped) and routed every `asHTML` text/attr through it — heading,
text, code body + language, quote, image src/alt, embed url, list items.
Render stays fully polymorphic in Smalltalk; escaping lives at the boundary.
+8 render tests (incl. `<script>` payloads, attr breakout, ampersand-once).
asSx wire-escaping deferred to next. Suite 238/238.
- 2026-06-07 — Phase 4 `fed.sx` (**Phase 4 COMPLETE — roadmap done**):
trust-gated federation. Peer ops carry provenance (`:author`, `:sig` stub);
none are auto-accepted. The trust gate is a pluggable predicate (acl-on-sx
hook) with a trusted-actor-list convenience stub. `content/merge-peer[-with]`
applies only accepted ops through the CvRDT and quarantines the rest
(`{:state :accepted :rejected}`). Concurrent local/external edits reconcile
deterministically: same-field LWW by (ts,actor), commutative, idempotent;
untrusted ops never touch state. 20 tests; suite 230/230.
- 2026-06-07 — Phase 4 `sync.sx` (cb1): external CMS sync via an injected
adapter. Core defines the shape — `{:import :export}` — and delegates;
`content/import` / `content/export` / `content/round-trip` know nothing about
Ghost. A Ghost-flavoured adapter confines all format translation (post
`:sections` ↔ content blocks, all 8 kinds). Swapping in a stub `raw-adapter`
works identically. Round-trip (export∘import and import∘export) preserves ids,
types, fields, order. 14 tests; suite 210/210. Next: trust-gated federation +
concurrent-external-edit conflict (via CRDT).
- 2026-06-07 — Phase 3 `crdt.sx` (**Phase 3 complete**): collaborative merge as
a state-based CvRDT. Merge is a join (lub) on a semilattice → commutative,
associative, idempotent by construction. Ordering = unique dense Logoot
position keys (cell = (digit actor), lexicographic); presence = OR-tombstones
(remove-wins); each field = an LWW-Register keyed by logical (ts, actor). Every
op contributes a PARTIAL element and per-id state is their join, so
update-/delete-before-insert are not lost. `crdt-materialize` bridges back to a
Phase-1 `CtDoc` (sort live elements by pos → blocks). Tests prove: ops in any
order converge, double-apply is a no-op, merge commutes/associates/is
idempotent, concurrent inserts order deterministically, same-field LWW by
(ts,actor), disjoint fields both survive, two divergent replicas converge both
ways. 34 tests; suite 196/196.
- 2026-06-07 — Phase 2 `store.sx` (**Phase 2 complete**): op log + versioning
over the persist event stream. `content/commit!` appends an edit op as a
persist event to the doc's stream (`content:<id>`); the log is the source of
truth. `content/head` / `content/at b id seq` replay the op stream to the
latest / any version (materialised doc is a cache, never primary state).
`content/history` returns per-version metadata; `content/diff` /
`content/diff-versions` report added/removed/changed block ids. Backend is
injected via `(persist/open)` — content knows nothing about which backend.
Minimal persist load (event/backend/log/kv/api). 29 tests; suite 162/162.
- 2026-06-07 — Phase 1 `api.sx` (**Phase 1 complete**): `content/*` facade over
block + doc + render. `content/bootstrap!` registers the hierarchy;
`content/edit` applies one op or an op stream; `content/render` picks the
boundary format ("html"/"sx" or keyword). Re-exports `content/new`,
`content/append`, `content/insert|update|move|delete`, `content/find`, etc.
`content/op?` distinguishes a single op from a list/block. 26 tests; suite
133/133. content/history deferred to Phase 2 (needs the persist op log).
- 2026-06-07 — Phase 1 `render.sx`: render boundary as polymorphic message
dispatch. Every block and `CtDoc` answers `asHTML` / `asSx`; the document
folds children via Smalltalk `inject:into:` (works on raw SX lists), so
`(asHTML doc)` / `(asSx doc)` are pure sends with zero type-switching in SX.
Lists/headings render in Smalltalk source. No HTML escaping yet (noted in
render.sx — boundary concern before untrusted content). 29 tests; suite
107/107.
- 2026-06-06 — Phase 1 `doc.sx`: ordered block document (`CtDoc`) as a
Smalltalk object holding an ordered block sequence. Edit ops are data dicts
(`insert`/`update`/`move`/`delete`) with `op-*` constructors; `doc-apply` /
`doc-apply-all` interpret an op stream, each returning a NEW document (input
never mutated → replay-safe). Structural moves, insert-after/at, find/index,
immutability all tested. 40 tests; suite 78/78.
- 2026-06-06 — Phase 1 `block.sx`: typed block objects as Smalltalk instances
(`CtBlock` hierarchy: text/heading/code/quote/image/embed/divider/list).
Type tag + accessors are message sends (polymorphic dispatch); fields are
immutable copy-on-write via functional `st-iv-set!` (history-safe). Added
`mk-*` constructors, `block?` predicate, `lib/content/conformance.sh` +
scoreboard. 38/38.
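
Two of the smaller extensions logged above, document statistics and excerpts, reduce to a few lines once a plain-text projection exists. The Python sketch below assumes the text has already been produced (the `asText` step is not modelled), that words are whitespace-separated, and that the character count includes whitespace; those details, the function names, and the zero-minute result for an empty document are assumptions rather than documented behaviour of `stats.sx` or `text.sx`.

```python
import math


def stats(text: str, block_count: int, wpm: int = 200) -> dict[str, int]:
    """Word/char/block counts plus reading time at `wpm`, rounded up."""
    words = len(text.split())
    return {
        "words": words,
        "chars": len(text),
        "blocks": block_count,
        "reading_minutes": math.ceil(words / wpm),
    }


def excerpt(text: str, n: int) -> str:
    """First n characters, with an ellipsis appended only when truncated."""
    return text if len(text) <= n else text[:n] + "…"


print(stats("one two three", block_count=2))
# prints {'words': 3, 'chars': 13, 'blocks': 2, 'reading_minutes': 1}
print(excerpt("hello world", 5))
# prints hello…
```

Rounding up means any non-empty document reports at least one minute, and a 201-word document reports two.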
## Blockers
- Smalltalk-only load chain (tokenizer/parser/runtime/eval) does **not** load
`lib/r7rs.sx`/`spec/stdlib.sx`, so r7rs aliases (`car`/`cdr`/`null?`) are
absent. Use base SX primitives (`first`/`rest`/`(= (len x) 0)`) in
`lib/content/**`. Not a substrate bug — just the load surface.