content: Phase 5 — rich inline text via structured runs (861/861)

CtText.text may be a list of runs (text marks href); CtHeading/CtQuote rich,
CtCode verbatim. New runs.sx overrides render/markdown/text methods (byte-
identical for plain strings, opt-in). 4 modes: HTML tags / markdown / nested
SX / plain asText (drift-proof). find-replace per-run marks-preserving;
search across run boundaries; CRDT block-granularity LWW; data+wire round-trip.
Runs are a Smalltalk-renderable list (not a dict — substrate can't read dict
fields under nested render dispatch). +36 tests (44 suites). Phase 6 (char-
level inline CRDT) recorded as future.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 21:10:56 +00:00
parent 160d0f2dd0
commit f68591456e
7 changed files with 444 additions and 6 deletions


@@ -19,7 +19,7 @@ injected adapter, not core.
## Status (rolling)
`bash lib/content/conformance.sh` **825/825** (Phases 1-4 COMPLETE + ~34 extensions, hardened: HTML/SX escaping, Markdown render + import/export incl. tables & frontmatter (full round-trip), CvRDT flat + nested-tree + durable replication, tree-aware validation, snapshot cache, doc metadata, plain-text render, nested block trees + deep editing + flatten + relative reorder, doc stats + summary + multi-doc index, table + callout + media blocks, HTML page wrapper + SEO page, doc composition + id-remap, portable data + wire serialization, block query + transforms + find/replace, TOC + anchored headings + outline, normalization)
`bash lib/content/conformance.sh` **861/861** (Phases 1-4 COMPLETE + ~34 extensions, hardened: HTML/SX escaping, Markdown render + import/export incl. tables & frontmatter (full round-trip), CvRDT flat + nested-tree + durable replication, tree-aware validation, snapshot cache, doc metadata, plain-text render, nested block trees + deep editing + flatten + relative reorder, doc stats + summary + multi-doc index, table + callout + media blocks, HTML page wrapper + SEO page, doc composition + id-remap, portable data + wire serialization, block query + transforms + find/replace, TOC + anchored headings + outline, normalization)
## Ground rules
@@ -113,6 +113,52 @@ lib/content/api.sx ── (content/edit) (content/render) (content/history) ─
- [x] portable data serialization (`data.sx`: content/to-data + from-data, round-trips tree)
- [x] wire serialization (`wire.sx`: content/to-wire + from-wire, SX-text on the wire)
## Phase 5 — rich inline text (structured runs)
Drives the rose-ash blog migration: lexical post bodies carry inline formatting
(bold/italic/links) but `CtText` held one plain `text` string, so the canonical
lexical→blocks conversion was lossy. Variant **(b)**: a `CtText`'s `text` may be
EITHER a plain string (backward compat) OR a sequence of inline **runs**. Marks
cover the lexical bitmask (bold=1 italic=2 strike=4 underline=8 code=16 sub=32
sup=64) plus link nodes (which carry an href). Applies to `CtText` and its
subclasses `CtHeading`/`CtQuote` (rich), and `CtCode` (verbatim — runs render as
plain concatenated text, no marks).
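
As an illustration of the bitmask mapping above, here is a minimal Python sketch of how a blog-side converter might decode a lexical format value into mark tokens. The bit values are the ones listed above and the token names follow the run mark tokens used in this plan; the function name `marks_from_format` and the table layout are hypothetical, not taken from `lexical_to_sx.py`.

```python
# Bit values as listed in the plan; token names match the run mark tokens.
# The ordering of this table only fixes the order of the returned list.
LEXICAL_FORMAT_BITS = [
    (1, "bold"),
    (2, "italic"),
    (4, "strikethrough"),
    (8, "underline"),
    (16, "code"),
    (32, "subscript"),
    (64, "superscript"),
]


def marks_from_format(fmt: int) -> list[str]:
    """Return the mark tokens whose bit is set in a lexical format bitmask."""
    return [name for bit, name in LEXICAL_FORMAT_BITS if fmt & bit]


print(marks_from_format(3))   # bold + italic
print(marks_from_format(80))  # code + superscript
```

Links are not part of the bitmask; they arrive as separate link nodes, so a converter would presumably add the `link` token and the href from the node itself.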
**Run representation — a Smalltalk-renderable list, not a `{:text :marks}` dict.**
A run is `(text marks href)`: `text` is a string, `marks` is a list of mark tokens
(`:bold :italic :underline :strikethrough :code :subscript :superscript :link`;
SX keywords evaluate to the strings the renderer matches), and `href` is a string
that carries the link target (`""` when absent). *Why a list and not the dict the brief
sketched:* rendering must happen inside the Smalltalk render methods (nested
blocks dispatch `asHTML`/etc. through Smalltalk message sends), and the
Smalltalk-on-SX layer can iterate SX lists (`do:`/`inject:into:`) but **cannot**
read SX dict fields (`Dictionary>>at:` is broken in lib/smalltalk, which is out
of scope). Lists are Smalltalk-native, render under nesting, and round-trip
through data/wire for free (they're just nested lists+strings). The blog-side
lexical→runs converter targets this `(text marks href)` shape.
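
The shape described above can be modelled outside SX as plain nested lists. The following Python sketch is only an analogy: `make_run` and `is_run` are hypothetical helpers, and JSON stands in for the real data/wire formats purely to show that a structure made of lists and strings needs no special serialization support.

```python
import json


def make_run(text, marks=(), href=""):
    """Build one run as a plain list: [text, marks, href]."""
    return [text, list(marks), href]


def is_run(value):
    """Check the (text marks href) shape: string, list of strings, string."""
    return (
        isinstance(value, list)
        and len(value) == 3
        and isinstance(value[0], str)
        and isinstance(value[1], list)
        and all(isinstance(m, str) for m in value[1])
        and isinstance(value[2], str)
    )


runs = [
    make_run("Read the "),
    make_run("docs", ["bold", "link"], "https://example.com/docs"),
    make_run(" first."),
]

# JSON is a stand-in here, not the SX-text wire format.
round_tripped = json.loads(json.dumps(runs))
print(round_tripped == runs)
```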
Centralised in `runs.sx` (`content-bootstrap-runs!`) which OVERRIDES the
render/markdown/text methods of CtText/CtHeading/CtQuote/CtCode with run-aware
versions that fall through to identical output for plain strings — so it is
opt-in (the blog enables it) and the existing suites, which don't bootstrap it,
are untouched.
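
The fall-through idea can be sketched in Python as a renderer that branches on the type of `text`. This is an assumption-laden illustration, not the `runs.sx` code: the function names are hypothetical, the block wrapper (`<p>` and so on) is omitted, and the nesting order of tags when a run carries several marks is an arbitrary choice because the plan does not specify one.

```python
from html import escape

# Mark token -> HTML tag, listed outermost first. The order is an assumption.
HTML_TAGS = [
    ("bold", "strong"),
    ("italic", "em"),
    ("underline", "u"),
    ("strikethrough", "s"),
    ("code", "code"),
    ("subscript", "sub"),
    ("superscript", "sup"),
]


def run_to_html(run):
    """Render one [text, marks, href] run as inline HTML."""
    text, marks, href = run
    out = escape(text)
    # Wrap from the innermost tag to the outermost.
    for mark, tag in reversed(HTML_TAGS):
        if mark in marks:
            out = f"<{tag}>{out}</{tag}>"
    if "link" in marks:
        out = f'<a href="{escape(href)}">{out}</a>'
    return out


def text_to_html(text):
    """Plain strings take the old path; lists of runs take the new one."""
    if isinstance(text, str):
        return escape(text)
    return "".join(run_to_html(run) for run in text)


print(text_to_html("a < b"))
print(text_to_html([["x", ["bold"], ""], [" y", [], ""]]))
```

Because the string branch never touches the run code, output for existing plain-string content cannot change, which is the property the opt-in design relies on.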
- [x] runs render in all four modes — asHTML `<strong>/<em>/<u>/<s>/<code>/<sub>/<sup>/<a href>`, asMarkdown `**`/`_`/`~~`/`` ` ``/`[..](..)` (u/sub/sup fall back to inline HTML), asSx emits nested run structure (`(p (strong "x") " y")`), asText returns the PLAIN concatenation (keeps search/stats/find-replace drift-proof)
- [x] backward compat — a plain-string CtText still renders identically; existing suites stay green
- [x] find-replace rewrites text across runs (per-run, marks preserved); runs join the text-bearing-field dispatch
- [x] search-text finds substrings via asText, including across run boundaries
- [x] CRDT invariant preserved — merge stays at BLOCK granularity (runs are the block's value): ops in any order / twice → identical document
- [x] data + wire serialization round-trip runs losslessly
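
The text-side items in the checklist above can be illustrated with a short Python sketch. The helper names are hypothetical and the semantics are one reading of "per-run, marks preserved": each run's text is rewritten independently, while search works on the plain concatenation.

```python
def as_text(text):
    """Plain concatenation of run texts; plain strings pass through."""
    if isinstance(text, str):
        return text
    return "".join(run[0] for run in text)


def find_replace(text, old, new):
    """Replace inside each run separately so marks and hrefs are kept."""
    if isinstance(text, str):
        return text.replace(old, new)
    return [[run[0].replace(old, new), run[1], run[2]] for run in text]


def search_text(text, needle):
    """Search the concatenated text, so matches may span run boundaries."""
    return needle in as_text(text)


runs = [["Hello ", ["bold"], ""], ["world", [], ""]]
print(as_text(runs))
print(search_text(runs, "lo wo"))
print(find_replace(runs, "world", "there"))
```

In this sketch a replacement whose match spans two runs is left unchanged, since no single run contains it. Whether the real `fr-rep-text` behaves the same way is not stated in the plan.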
### Future — Phase 6 (NOT in scope now)
Variant **(c)**: character/run-level concurrent inline CRDT (Peritext/Yjs-style)
so two authors can edit the same paragraph simultaneously — needed later for the
multi-author SX editor that replaces Ghost. Block-granularity (b) is sufficient
for the blog read-path migration. The lexical→runs converter itself lives on the
blog/migration side (mark-set reference: `blog/bp/blog/ghost/lexical_to_sx.py`),
not in lib/content.
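
To make the block-granularity trade-off concrete, here is a minimal Python sketch of a last-writer-wins merge in which the whole runs list is one value. It is a model under stated assumptions, not the lib/content CRDT: a document is a dict from block id to `(stamp, value)`, a stamp is `(counter, replica_id)` and is assumed unique per write, and all names are hypothetical.

```python
def merge_block(a, b):
    """Last-writer-wins: keep the entry with the larger stamp."""
    return a if a[0] >= b[0] else b


def merge_docs(x, y):
    """Merge two documents block by block; runs are never merged internally."""
    merged = dict(x)
    for block_id, entry in y.items():
        if block_id in merged:
            merged[block_id] = merge_block(merged[block_id], entry)
        else:
            merged[block_id] = entry
    return merged


base = {"p1": ((1, "a"), [["Hello world", [], ""]])}
# Two replicas edit the same paragraph concurrently.
alice = {"p1": ((2, "a"), [["Hello ", [], ""], ["world", ["bold"], ""]])}
bob = {"p1": ((2, "b"), [["Hello there, world", [], ""]])}

print(merge_docs(alice, bob) == merge_docs(bob, alice))  # order does not matter
print(merge_docs(alice, alice) == alice)                 # applying twice is harmless
print(merge_docs(merge_docs(base, alice), bob)["p1"][1])
```

The merge converges, but one author's paragraph replaces the other's entirely: the bold mark from the first edit is lost. That loss is what a character-level or run-level inline CRDT would avoid.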
## Known limitations
- **Markdown table cells containing `|` do not round-trip.** `asMarkdown` on a
@@ -137,6 +183,28 @@ lib/content/api.sx ── (content/edit) (content/render) (content/history) ─
## Progress log
- 2026-06-07 — **Phase 5 — rich inline text (structured runs) DONE.** A CtText's
`text` may now be a list of inline runs `(text marks href)` instead of a plain
string; CtHeading/CtQuote inherit rich rendering, CtCode renders runs as plain
verbatim text. New `runs.sx` (`content-bootstrap-runs!`) overrides the
render/markdown/text methods of CtText & subclasses with run-aware versions
that are byte-identical for plain-string bodies (opt-in; existing suites
untouched). All 4 modes: asHTML emits `<strong>/<em>/<u>/<s>/<code>/<sub>/<sup>`
+ `<a href>`, asMarkdown emits `**`/`_`/`~~`/`` ` ``/`[..](..)` (u/sub/sup →
inline HTML), asSx emits nested run structure `(p "a" (strong "b"))` (matches
the SX editor's wire format), asText returns the PLAIN concatenation — so
search-text/stats/find-replace stay drift-proof. find-replace (`fr-rep-text`)
rewrites per run with marks preserved; search-text finds across run boundaries
via asText; CRDT merge treats the runs list as one block-level LWW value
(commutes/idempotent, verified); data + wire round-trip runs losslessly.
**Design note:** runs are a Smalltalk-renderable LIST, not the brief's
`{:text :marks}` dict — the Smalltalk-on-SX render methods (which must run
under nested dispatch) can iterate SX lists but cannot read SX dict fields
(`Dictionary>>at:` is broken in lib/smalltalk, out of scope). Marks are built
from `:bold`-style keywords (which evaluate to the strings the renderer
matches). Phase 6 (char-level concurrent inline CRDT) recorded as future, not
built. +36 runs tests (44 suites). 861/861.
- 2026-06-07 — Feature: `content/block-path` + `content/block-depth`
(block-path.sx, new suite). The read-side companion to doc-find-deep (locate
the block) and move-into/promote (relocate it): returns the ancestor-section