rose-ash/plans/content-on-sx.md
giles 295864786d
content: Markdown import adapter (md-import) + 24 tests (318/318)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 01:33:50 +00:00

# content-on-sx: Documents, blocks & collaborative editing on Smalltalk
> **DRAFT outline.** The CMS vertical — blog, WYSIWYG editor, Ghost sync. Depends
> on `persist-on-sx` (document history as an event log). Ghost/CMS sync stays a thin
> external adapter (Python/FFI) until a native replacement exists.

rose-ash's `blog` domain is content management: a block-based WYSIWYG editor,
navigation, Ghost CMS sync. A document is a tree of live blocks; editing is a
stream of operations; collaboration needs conflict-free merge. That is an object
model — blocks are objects, edits are messages, and a document is the object graph
responding to them. Smalltalk's "everything is an object responding to messages"
maps directly to a block/WYSIWYG model, and a semilattice (CRDT) merge keeps
concurrent edits conflict-free.

End-state: a Smalltalk-on-SX document model (typed blocks, structural ops),
operation log + CRDT merge for collaborative editing, versioning/history via the
event store, and a render boundary to HTML/SX. External CMS (Ghost) sync is an
injected adapter, not core.
## Status (rolling)
`bash lib/content/conformance.sh` → **318/318** (Phases 1–4 COMPLETE + extensions: HTML/SX escaping, Markdown render+import, durable CRDT replication, validation)
## Ground rules
- **Scope:** only `lib/content/**` and `plans/content-on-sx.md`. May **import**
from `lib/smalltalk/`, and (once it exists) `lib/persist/`. Do not edit substrates.
- **Architecture:** a document is an ordered tree of blocks (objects); an edit is a
message (`insert`/`update`/`move`/`delete`); concurrent edits merge via a
commutative (CRDT/semilattice) operation so order doesn't matter. History is the
`persist` event stream; any version is a replay.
- **Determinism:** merge must be commutative + idempotent (test: apply ops in any
order / twice → same document).
- **Commits:** one feature per commit. Progress log + tick boxes.
## Architecture sketch
```
Edit op                             Rendered document
(insert block after id)    ...      HTML / SX tree
     │                                   ▲
     ▼                                   │
lib/content/block.sx                lib/content/render.sx
— typed blocks as objects           — block tree → HTML/SX
— heading/text/image/embed          — (reuses SX render boundary)
     │                                   ▲
     ▼                                   │
lib/content/doc.sx                  lib/content/merge.sx
— ordered block tree                — CRDT/semilattice op merge
— apply op, structural moves        — concurrent-edit reconciliation
     │                                   ▲
     ▼                                   │
lib/content/api.sx ── (content/edit) (content/render) (content/history) ──┐
     │                                                                    │
     ├── op log + versions → persist                                      │
     └── Ghost/CMS sync → injected external adapter (thin, non-core)    ──┘
```
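The render boundary in the sketch above relies on polymorphic dispatch: each block answers a render message and the document only folds over its children. A minimal Python analogue follows. It is an illustration, not the `render.sx` code: the class names, the `as_html` method, and the HTML tags chosen are assumptions, and `html.escape` stands in for the project's own escaping.

```python
from functools import reduce
from html import escape


class Heading:
    def __init__(self, block_id, level, text):
        self.id, self.level, self.text = block_id, level, text

    def as_html(self):
        return f"<h{self.level}>{escape(self.text)}</h{self.level}>"


class Text:
    def __init__(self, block_id, text):
        self.id, self.text = block_id, text

    def as_html(self):
        return f"<p>{escape(self.text)}</p>"


class Divider:
    def __init__(self, block_id):
        self.id = block_id

    def as_html(self):
        return "<hr>"


class Doc:
    def __init__(self, blocks):
        self.blocks = tuple(blocks)

    def as_html(self):
        # Fold over the children (the role inject:into: plays in Smalltalk).
        # The document never inspects a block's type.
        return reduce(lambda acc, block: acc + block.as_html(), self.blocks, "")
```

Adding a new block kind in this style means adding one class with its own `as_html`; the document code stays unchanged.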
## Phase 1 — Block document model
- [x] `block.sx` — typed block objects
- [x] `doc.sx` — ordered tree, apply edit op, structural moves
- [x] `render.sx` — block tree → HTML/SX
- [x] `api.sx` + tests + scoreboard + conformance.sh
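
As an illustration of the Phase 1 model, here is a small Python sketch of edit ops as plain data applied to an ordered block sequence, with every application returning a new document. The function names echo `doc-apply` / `doc-apply-all` and the `op-*` constructors, but the dict shapes and field names are assumptions made for this example.

```python
def op_insert(block, after=None):
    return {"op": "insert", "block": block, "after": after}


def op_update(block_id, **fields):
    return {"op": "update", "id": block_id, "fields": fields}


def op_move(block_id, after=None):
    return {"op": "move", "id": block_id, "after": after}


def op_delete(block_id):
    return {"op": "delete", "id": block_id}


def _index(blocks, block_id):
    for i, block in enumerate(blocks):
        if block["id"] == block_id:
            return i
    raise KeyError(block_id)


def _slot(blocks, after):
    # after=None means "at the front"; otherwise directly after that block.
    return 0 if after is None else _index(blocks, after) + 1


def doc_apply(doc, op):
    """Apply one op to a document (a tuple of block dicts); return a new tuple."""
    blocks = list(doc)  # copy the sequence; block dicts are replaced, never mutated
    kind = op["op"]
    if kind == "insert":
        blocks.insert(_slot(blocks, op["after"]), dict(op["block"]))
    elif kind == "update":
        i = _index(blocks, op["id"])
        blocks[i] = {**blocks[i], **op["fields"]}
    elif kind == "move":
        moved = blocks.pop(_index(blocks, op["id"]))
        blocks.insert(_slot(blocks, op["after"]), moved)
    elif kind == "delete":
        blocks.pop(_index(blocks, op["id"]))
    else:
        raise ValueError(f"unknown op: {kind!r}")
    return tuple(blocks)


def doc_apply_all(doc, ops):
    for op in ops:
        doc = doc_apply(doc, op)
    return doc
```

Because the input is never mutated, any earlier document value stays valid after later edits, which is what makes replaying an op stream safe.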
## Phase 2 — Op log + versioning
- [x] edit ops as `persist` events; replay to any version
- [x] `(content/history doc)`, diff between versions
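
The Phase 2 idea (the log is the source of truth, any version is a replay) can be sketched in Python as below. This is a simplified model under stated assumptions: `OpLog`, `apply_op`, and `diff` are hypothetical names, the document is reduced to a mapping from block id to fields (ordering is ignored), and an in-memory list stands in for the persist event stream.

```python
def apply_op(doc, op):
    """Return a new {block_id: fields} mapping with one op applied."""
    new = {block_id: dict(fields) for block_id, fields in doc.items()}
    if op["op"] == "insert":
        new[op["id"]] = dict(op["fields"])
    elif op["op"] == "update":
        new[op["id"]].update(op["fields"])
    elif op["op"] == "delete":
        del new[op["id"]]
    else:
        raise ValueError(f"unknown op: {op['op']!r}")
    return new


class OpLog:
    """Append-only op log; materialised documents are derived, never stored."""

    def __init__(self):
        self._events = []

    def commit(self, op):
        seq = len(self._events) + 1
        self._events.append({"seq": seq, "op": op})
        return seq

    def at(self, seq):
        # Replay the first `seq` events from an empty document.
        doc = {}
        for event in self._events[:seq]:
            doc = apply_op(doc, event["op"])
        return doc

    def head(self):
        return self.at(len(self._events))


def diff(old, new):
    """Report added / removed / changed block ids between two versions."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in shared if old[k] != new[k]),
    }
```

For example, `diff(log.at(2), log.head())` compares version 2 with the latest version without either document having been stored.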
## Phase 3 — Collaborative merge (CRDT)
- [x] commutative/idempotent op merge
- [x] concurrent-edit tests (any order, double-apply → identical)
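
To make the Phase 3 property concrete, the following Python sketch models a state-based merge in which every op contributes a partial element and the per-id state is the join of those partials. It is a toy model, not `crdt.sx`: the state layout, the function names, and the tie-break on `repr` of the value are assumptions, and positions are plain tuples of `(digit, actor)` cells compared lexicographically.

```python
def _join_field(a, b):
    # LWW register: an entry is ((ts, actor), value); the larger stamp wins.
    # repr(value) is only a deterministic tie-break for identical stamps.
    return max(a, b, key=lambda entry: (entry[0], repr(entry[1])))


def _join_pos(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def _join_elem(a, b):
    fields = dict(a["fields"])
    for name, entry in b["fields"].items():
        fields[name] = _join_field(fields[name], entry) if name in fields else entry
    return {
        "pos": _join_pos(a["pos"], b["pos"]),
        "dead": a["dead"] or b["dead"],  # a tombstone, once set, stays set
        "fields": fields,
    }


def merge(s1, s2):
    """Join two replica states; commutative, associative, idempotent."""
    out = dict(s1)
    for block_id, elem in s2.items():
        out[block_id] = _join_elem(out[block_id], elem) if block_id in out else elem
    return out


def delta(op):
    """Turn one op into a partial state that can be joined in any order."""
    stamp = (op["ts"], op["actor"])
    fields = {k: (stamp, v) for k, v in op.get("fields", {}).items()}
    elem = {"pos": op.get("pos"), "dead": op["op"] == "delete", "fields": fields}
    return {op["id"]: elem}


def apply_op(state, op):
    return merge(state, delta(op))


def materialize(state):
    """Live elements with a known position, sorted by position."""
    live = [(e["pos"], block_id, e) for block_id, e in state.items()
            if e["pos"] is not None and not e["dead"]]
    live.sort(key=lambda item: (item[0], item[1]))
    return [(block_id, {k: v for k, (_, v) in e["fields"].items()})
            for _, block_id, e in live]
```

Since an update or delete that arrives before its insert still leaves a partial element behind, the later insert joins with it instead of overwriting it, so no op is lost to reordering.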
## Phase 4 — External sync + federation
- [x] Ghost/CMS sync via injected adapter (import/export)
- [x] federated documents (peer-authored blocks) — trust-gated stub
- [x] tests: round-trip import/export, conflict on concurrent external edit
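
A rough Python sketch of the two Phase 4 mechanisms follows: an injected adapter with an import/export pair, and a trust gate that partitions peer ops before any of them reach the document state. All names are hypothetical, and the `sections` layout (`key` / `kind` / `data`) is invented for the example; it is not Ghost's actual format.

```python
def content_import(adapter, external):
    return adapter["import"](external)


def content_export(adapter, blocks):
    return adapter["export"](blocks)


def content_round_trip(adapter, blocks):
    return content_import(adapter, content_export(adapter, blocks))


# Stub adapter: the external format is already a list of blocks.
raw_adapter = {
    "import": lambda blocks: list(blocks),
    "export": lambda blocks: list(blocks),
}


def _section_to_block(section):
    return {"id": section["key"], "type": section["kind"], **section["data"]}


def _block_to_section(block):
    data = {k: v for k, v in block.items() if k not in ("id", "type")}
    return {"key": block["id"], "kind": block["type"], "data": data}


# CMS-flavoured adapter: all format translation is confined to these two functions.
ghost_like_adapter = {
    "import": lambda post: [_section_to_block(s) for s in post["sections"]],
    "export": lambda blocks: {"sections": [_block_to_section(b) for b in blocks]},
}


def trusted_actors(actors):
    """Build a trust predicate from a list of trusted authors."""
    allowed = frozenset(actors)
    return lambda op: op.get("author") in allowed


def merge_peer(state, peer_ops, trusted, apply_op):
    """Apply only ops that pass the gate; quarantine the rest untouched."""
    accepted = [op for op in peer_ops if trusted(op)]
    rejected = [op for op in peer_ops if not trusted(op)]
    for op in accepted:
        state = apply_op(state, op)
    return {"state": state, "accepted": accepted, "rejected": rejected}
```

The core functions only call `adapter["import"]` and `adapter["export"]`, so swapping `ghost_like_adapter` for `raw_adapter` changes nothing else.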
## Extensions (post-roadmap)
- [x] HTML escaping at the render boundary (`String>>htmlEscaped`: & < > ")
- [x] asSx wire string-escaping (`String>>sxEscaped`: \ and " in SX literals)
- [x] Markdown render mode (`asMarkdown:` / `content/render doc "md"`)
- [x] durable CRDT replication (`crdt-store.sx`: ops on persist, replay + converge)
- [x] document validation (`validate.sx`: ids, per-type fields, duplicate ids)
- [x] Markdown import adapter (`md-import.sx`: text → blocks, round-trips export)
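
The two escaping extensions can be illustrated in Python as below. The function names mirror `String>>htmlEscaped` and `String>>sxEscaped` but are assumptions, as are the specific HTML entities used; only the character sets and the backslash-before-quote order come from this plan.

```python
_HTML_ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def html_escaped(text):
    # One pass over the characters: each input character is examined exactly
    # once, so an "&" produced by an earlier replacement is never re-escaped.
    return "".join(_HTML_ENTITIES.get(ch, ch) for ch in text)


def sx_escaped(text):
    # Backslashes first, then quotes. The reverse order would double the
    # backslash that was just added in front of each quote.
    return text.replace("\\", "\\\\").replace('"', '\\"')
```

With chained `replace` calls, as in `sx_escaped`, the order matters; with a single character walk, as in `html_escaped`, it does not.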
## Progress log
- 2026-06-07 — Extension: Markdown import adapter (`md-import.sx`), inverse of
asMarkdown. Line-based parser: ATX headings, fenced code (```lang), blockquotes,
unordered/ordered lists (grouping consecutive items), thematic breaks,
paragraphs (consecutive plain lines joined with a space). Sequential ids
b0,b1…. `md/import` / `content/from-markdown` / `markdown-adapter` (import +
asMarkdown export). Round-trips canonical Markdown (import∘export == identity);
imported docs pass validation. 24 tests; suite 318/318.
- 2026-06-07 — Extension: document validation (`validate.sx`). `content/validate`
returns issue dicts `{:id :kind :detail}` (empty = valid); `content/valid?`
and `content/issue-kinds` convenience. Checks block id (non-empty string),
per-type required fields/types (heading level number, image src/alt strings,
list ordered boolean + items list, etc.), unknown block types, and
document-level duplicate ids. Guards imports/edits/federated input. 17 tests;
suite 294/294.
- 2026-06-07 — Extension: durable CRDT replication (`crdt-store.sx`), uniting
Phase 2 (persist) + Phase 3 (CvRDT). Each replica appends its CRDT ops to its
own stream (`crdt:<doc>:<replica>`); `crdt/replay` folds one log into a state,
`crdt/converge` merges every replica's replayed state, `crdt/document` /
`crdt/order` materialise. Converged result is identical regardless of replica
order or duplicate delivery (join + idempotent apply) → offline-capable,
eventually-consistent editing. 14 tests; suite 277/277.
- 2026-06-07 — Extension: Markdown render mode (`markdown.sx`). Third boundary
format alongside asHTML/asSx via the same polymorphic dispatch; blocks answer
`asMarkdown: nl` (boundary supplies the newline — this Smalltalk dialect has
no Character newline ctor). `content/render doc "md"`/`"markdown"`/`:md`,
`content/markdown`, `asMarkdown`. headings (`#`×level), fenced code, `> ` quote,
`![alt](src)`, `- `/`1. ` lists, `---`; doc joins blocks with a blank line. No
MD escaping yet. 20 tests; suite 263/263.
- 2026-06-07 — Extension: asSx wire string-escaping. Added `String>>sxEscaped`
(escapes `\`→`\\` then `"`→`\"`) and routed every `asSx` text/attr/list-item
through it, so the SX wire format stays valid when content contains quotes or
backslashes. +5 render tests (expected strings built from `q`/`bs` helpers to
avoid escaping miscounts). Suite 243/243.
- 2026-06-07 — Extension: HTML escaping at the render boundary. Added
`String>>htmlEscaped` (recursive char walk escaping & < > ", order-safe so &
isn't double-escaped) and routed every `asHTML` text/attr through it — heading,
text, code body + language, quote, image src/alt, embed url, list items.
Render stays fully polymorphic in Smalltalk; escaping lives at the boundary.
+8 render tests (incl. `<script>` payloads, attr breakout, ampersand-once).
asSx wire-escaping deferred to next. Suite 238/238.
- 2026-06-07 — Phase 4 `fed.sx` (**Phase 4 COMPLETE — roadmap done**):
trust-gated federation. Peer ops carry provenance (`:author`, `:sig` stub);
none are auto-accepted. The trust gate is a pluggable predicate (acl-on-sx
hook) with a trusted-actor-list convenience stub. `content/merge-peer[-with]`
applies only accepted ops through the CvRDT and quarantines the rest
(`{:state :accepted :rejected}`). Concurrent local/external edits reconcile
deterministically: same-field LWW by (ts,actor), commutative, idempotent;
untrusted ops never touch state. 20 tests; suite 230/230.
- 2026-06-07 — Phase 4 `sync.sx` (cb1): external CMS sync via an injected
adapter. Core defines the shape — `{:import :export}` — and delegates;
`content/import` / `content/export` / `content/round-trip` know nothing about
Ghost. A Ghost-flavoured adapter confines all format translation (post
`:sections` ↔ content blocks, all 8 kinds). Swapping in a stub `raw-adapter`
works identically. Round-trip (export∘import and import∘export) preserves ids,
types, fields, order. 14 tests; suite 210/210. Next: trust-gated federation +
concurrent-external-edit conflict (via CRDT).
- 2026-06-07 — Phase 3 `crdt.sx` (**Phase 3 complete**): collaborative merge as
a state-based CvRDT. Merge is a join (lub) on a semilattice → commutative,
associative, idempotent by construction. Ordering = unique dense Logoot
position keys (cell = (digit actor), lexicographic); presence = OR-tombstones
(remove-wins); each field = an LWW-Register keyed by logical (ts, actor). Every
op contributes a PARTIAL element and per-id state is their join, so
update-/delete-before-insert are not lost. `crdt-materialize` bridges back to a
Phase-1 `CtDoc` (sort live elements by pos → blocks). Tests prove: ops in any
order converge, double-apply is a no-op, merge commutes/associates/is
idempotent, concurrent inserts order deterministically, same-field LWW by
(ts,actor), disjoint fields both survive, two divergent replicas converge both
ways. 34 tests; suite 196/196.
- 2026-06-07 — Phase 2 `store.sx` (**Phase 2 complete**): op log + versioning
over the persist event stream. `content/commit!` appends an edit op as a
persist event to the doc's stream (`content:<id>`); the log is the source of
truth. `content/head` / `content/at b id seq` replay the op stream to the
latest / any version (materialised doc is a cache, never primary state).
`content/history` returns per-version metadata; `content/diff` /
`content/diff-versions` report added/removed/changed block ids. Backend is
injected via `(persist/open)` — content knows nothing about which backend.
Minimal persist load (event/backend/log/kv/api). 29 tests; suite 162/162.
- 2026-06-07 — Phase 1 `api.sx` (**Phase 1 complete**): `content/*` facade over
block + doc + render. `content/bootstrap!` registers the hierarchy;
`content/edit` applies one op or an op stream; `content/render` picks the
boundary format ("html"/"sx" or keyword). Re-exports `content/new`,
`content/append`, `content/insert|update|move|delete`, `content/find`, etc.
`content/op?` distinguishes a single op from a list/block. 26 tests; suite
133/133. content/history deferred to Phase 2 (needs the persist op log).
- 2026-06-07 — Phase 1 `render.sx`: render boundary as polymorphic message
dispatch. Every block and `CtDoc` answers `asHTML` / `asSx`; the document
folds children via Smalltalk `inject:into:` (works on raw SX lists), so
`(asHTML doc)` / `(asSx doc)` are pure sends with zero type-switching in SX.
Lists/headings render in Smalltalk source. No HTML escaping yet (noted in
render.sx — boundary concern before untrusted content). 29 tests; suite
107/107.
- 2026-06-06 — Phase 1 `doc.sx`: ordered block document (`CtDoc`) as a
Smalltalk object holding an ordered block sequence. Edit ops are data dicts
(`insert`/`update`/`move`/`delete`) with `op-*` constructors; `doc-apply` /
`doc-apply-all` interpret an op stream, each returning a NEW document (input
never mutated → replay-safe). Structural moves, insert-after/at, find/index,
immutability all tested. 40 tests; suite 78/78.
- 2026-06-06 — Phase 1 `block.sx`: typed block objects as Smalltalk instances
(`CtBlock` hierarchy: text/heading/code/quote/image/embed/divider/list).
Type tag + accessors are message sends (polymorphic dispatch); fields are
immutable copy-on-write via functional `st-iv-set!` (history-safe). Added
`mk-*` constructors, `block?` predicate, `lib/content/conformance.sh` +
scoreboard. 38/38.
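
The line-based Markdown import described in the first log entry can be approximated in Python as follows. This is a sketch of the general technique, not `md-import.sx`: the block field names (`level`, `language`, `items`, `ordered`, `text`) and the exact matching rules are assumptions, and it handles only the constructs that entry lists.

```python
import re

FENCE = "`" * 3  # built at runtime so the sketch holds no literal fence
HEADING = re.compile(r"(#{1,6}) +(.*)")
BULLET = re.compile(r"- +(.*)")
NUMBERED = re.compile(r"\d+\. +(.*)")


def _starts_block(line):
    return (line.startswith(FENCE) or line.startswith("> ")
            or line.strip() == "---"
            or any(p.match(line) for p in (HEADING, BULLET, NUMBERED)))


def md_import(text):
    lines, blocks, i = text.split("\n"), [], 0

    def emit(kind, **fields):
        # Sequential ids: b0, b1, ...
        blocks.append({"id": f"b{len(blocks)}", "type": kind, **fields})

    while i < len(lines):
        line = lines[i]
        if not line.strip():
            i += 1
        elif line.startswith(FENCE):
            body, i = [], i + 1
            while i < len(lines) and not lines[i].startswith(FENCE):
                body.append(lines[i])
                i += 1
            i += 1  # step past the closing fence
            emit("code", language=line[3:].strip(), text="\n".join(body))
        elif HEADING.match(line):
            marks, title = HEADING.match(line).groups()
            emit("heading", level=len(marks), text=title.strip())
            i += 1
        elif line.strip() == "---":
            emit("divider")
            i += 1
        elif line.startswith("> "):
            quoted = []
            while i < len(lines) and lines[i].startswith("> "):
                quoted.append(lines[i][2:])
                i += 1
            emit("quote", text=" ".join(quoted))
        elif BULLET.match(line) or NUMBERED.match(line):
            pattern = BULLET if BULLET.match(line) else NUMBERED
            items = []
            while i < len(lines) and pattern.match(lines[i]):
                items.append(pattern.match(lines[i]).group(1))
                i += 1
            emit("list", ordered=pattern is NUMBERED, items=items)
        else:
            # Paragraph: consecutive plain lines joined with a single space.
            para = []
            while i < len(lines) and lines[i].strip() and not _starts_block(lines[i]):
                para.append(lines[i].strip())
                i += 1
            emit("text", text=" ".join(para))
    return blocks
```

Each branch consumes at least one line, and the paragraph branch stops at exactly the lines the other branches recognise, so the loop always advances.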
## Blockers
- Smalltalk-only load chain (tokenizer/parser/runtime/eval) does **not** load
`lib/r7rs.sx`/`spec/stdlib.sx`, so r7rs aliases (`car`/`cdr`/`null?`) are
absent. Use base SX primitives (`first`/`rest`/`(= (len x) 0)`) in
`lib/content/**`. Not a substrate bug — just the load surface.