rose-ash/lib/blogimport/README.md
giles a4d93c61cc
blogimport: lexical->persist genesis-import + at-rest parity verifier (55/55)
Implements plans/migration/data-migration.md (the un-started long-pole) and the
data-layer half of slice-01-blog §4. Host-ops migration module composing
content-on-sx + persist public APIs; isolated from lib/host and lib/content.

- lexical.sx: Ghost lexical (as SX dicts) -> content block list, deterministic ids
- import.sx: genesis import into content:<id> op-log, idempotent, + postmeta stream
- verify.sx: replay-and-diff vs row-derived oracle (proves round-trip lossless)

Inline formatting flattens to plain text (Phase-5 runs swap-point isolated in
lex-inline-text); live Postgres source (Q-M4) + improved-converter re-import (Q-M5)
flagged in README. 55/55 conformance: lexical 23, import 21, verify 11.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 13:14:30 +00:00


lib/blogimport — blog Postgres → persist genesis-import + parity verifier

Implements plans/migration/data-migration.md (the "long-pole nobody had started") and the at-rest half of slice-01-blog.md §4 — the data layer of the blog read-path migration. Host-ops migration tooling, not a domain core: it composes the public APIs of content-on-sx (lib/content) and persist (lib/persist). Kept in its own module (not lib/host, not lib/content) so it doesn't collide with the loops that own those.

Status: machinery complete, 55/55 conformance (lexical 23, import 21, verify 11).

What it does

| Module | Role |
| --- | --- |
| lexical.sx | blogimport/lex-blocks doc: Ghost lexical body (as SX dicts) → content-on-sx block list, ids deterministic by position (b0, b1, …). |
| import.sx | blogimport/import-post! b post at: genesis import. Converts the post's lexical, commits blocks as ordered op-inserts into the content:<id> op-log stream, and records metadata in a sibling postmeta:<id> stream. Idempotent (skip-if-exists). import-all! → coverage scoreboard. |
| verify.sx | blogimport/verify-post b post: replays the stream → block model and diffs it against the row-derived oracle with =. verify-all → {:total :ok :mismatched} coverage. |
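
The conversion and import roles above can be illustrated with a minimal Python sketch. This is not the SX implementation: the in-memory `store` dict, the op and metadata field names, the return values, and the reading of `at` as a timestamp are all assumptions made for illustration. Only the stream names (content:<id>, postmeta:<id>), the positional ids (b0, b1, …), and the skip-if-exists behaviour come from the table.

```python
from typing import Any


def lex_blocks(doc: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten a Lexical-style document into blocks with ids b0, b1, ..."""
    blocks = []
    for position, node in enumerate(doc["root"]["children"]):
        # Inline children collapse to their plain concatenation.
        text = "".join(child.get("text", "") for child in node.get("children", []))
        blocks.append({"id": f"b{position}", "type": node["type"], "text": text})
    return blocks


def import_post(store: dict[str, list], post: dict[str, Any], at: str) -> str:
    """Genesis import: one ordered insert op per block, plus a metadata stream."""
    content_stream = f"content:{post['id']}"
    if content_stream in store:
        return "skipped"  # idempotent: skip-if-exists
    store[content_stream] = [
        {"op": "insert", "index": i, "block": block, "at": at}
        for i, block in enumerate(lex_blocks(post["lexical"]))
    ]
    # Hypothetical metadata shape; the real postmeta fields are not specified here.
    store[f"postmeta:{post['id']}"] = [{"op": "meta", "title": post["title"], "at": at}]
    return "imported"


post = {
    "id": "p1",
    "title": "Hello",
    "lexical": {"root": {"children": [
        {"type": "paragraph", "children": [{"type": "text", "text": "First."}]},
        {"type": "paragraph", "children": [{"type": "text", "text": "Second."}]},
    ]}},
}
store: dict[str, list] = {}
first = import_post(store, post, "2026-06-30T13:14:30Z")
second = import_post(store, post, "2026-06-30T13:14:30Z")  # no-op on re-run
```

Because ids depend only on block position, converting the same document twice yields the same ids, which is what lets a later replay be compared against a freshly computed block list.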

What is proven

The verifier checks that lexical → import → persist → replay → block-model equals lexical → block-model computed directly. In other words, genesis import plus op-log replay is lossless, which answers "did the backfill corrupt anything" at rest (data-migration.md §6). The corruption test in verify.sx confirms that a diverging stream is detected rather than silently passed.
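
The replay-and-diff check can be sketched in Python as follows. This is a sketch under stated assumptions, not the verify.sx code: the op shape, the dict-backed `store`, the helper `make_post`, and reporting `mismatched` as a list of post ids are all hypothetical. The idea it shows is the one described above: rebuild the block model from the op-log, compare it for equality with blocks computed directly from the source, and confirm that a tampered stream fails the comparison.

```python
from typing import Any


def to_blocks(doc: dict[str, Any]) -> list[dict[str, str]]:
    """Oracle side: block list computed directly from the lexical document."""
    return [
        {"id": f"b{i}", "text": "".join(c.get("text", "") for c in node.get("children", []))}
        for i, node in enumerate(doc["root"]["children"])
    ]


def replay(ops: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Stream side: rebuild the block model by applying insert ops in order."""
    blocks: list[dict[str, str]] = []
    for op in ops:
        if op["op"] == "insert":
            blocks.insert(op["index"], op["block"])
    return blocks


def verify_post(store: dict[str, list], post: dict[str, Any]) -> bool:
    return replay(store[f"content:{post['id']}"]) == to_blocks(post["lexical"])


def verify_all(store: dict[str, list], posts: list[dict[str, Any]]) -> dict[str, Any]:
    mismatched = [p["id"] for p in posts if not verify_post(store, p)]
    return {"total": len(posts), "ok": len(posts) - len(mismatched), "mismatched": mismatched}


def make_post(post_id: str, *paragraphs: str) -> dict[str, Any]:
    """Hypothetical helper that builds a post dict with one paragraph per string."""
    children = [{"type": "paragraph", "children": [{"type": "text", "text": p}]} for p in paragraphs]
    return {"id": post_id, "lexical": {"root": {"children": children}}}


posts = [make_post("p1", "Hello", "World"), make_post("p2", "Only one")]
store = {
    f"content:{p['id']}": [
        {"op": "insert", "index": i, "block": b} for i, b in enumerate(to_blocks(p["lexical"]))
    ]
    for p in posts
}
clean = verify_all(store, posts)

# Corruption test: a diverging stream must be reported, not silently passed.
store["content:p2"][0]["block"]["text"] = "tampered"
corrupted = verify_all(store, posts)
```

The oracle is recomputed from the source on every call, so mutating the stored stream cannot also change the value it is compared against.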

Known limitations / TODO (carry into the plan)

  • Inline formatting is flattened to plain text. Architecture's content model holds plain-string text (mk-text id text); Phase-5 rich inline runs are not merged here. The single swap-point is lex-inline-text in lexical.sx — return runs there once content-on-sx Phase 5 lands on architecture. Bold/italic/links currently collapse to their plain concatenation (drift-proof, == asText). (slice-01-blog Q-B1.)
  • Oracle is the in-memory lexical→blocks, not the live Python block model. This proves round-trip fidelity through persist, and nothing more. The "does SX match Python" half of Q-D2 needs the live source: read real Post rows via the internal-data query (/internal/data/…) or direct Postgres (Q-M4, undecided) and feed them in as post dicts. That step can reuse the diff plumbing here unchanged.
  • Re-import with an improved converter (Q-M5) is not supported yet: import is once-only today (skip-if-exists). Superseding prior genesis events (versus truncate + re-import) is future work.
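
The inline-flattening limitation can be illustrated with a small Python sketch of a single swap-point function. It assumes the usual Lexical JSON shape (text nodes carry `text` and a `format` bitmask; link nodes wrap their own `children`) and is only an analogue of lex-inline-text, not its actual code. The function name is borrowed from the README for readability.

```python
from typing import Any


def lex_inline_text(children: list[dict[str, Any]]) -> str:
    """Collapse inline nodes to their plain concatenation.

    Formatting flags and link targets are dropped. Returning rich runs
    instead of a string would be a change confined to this one function.
    """
    parts = []
    for node in children:
        if "children" in node:
            # Wrapper node such as a link: keep only the text it contains.
            parts.append(lex_inline_text(node["children"]))
        else:
            parts.append(node.get("text", ""))
    return "".join(parts)


paragraph_children = [
    {"type": "text", "text": "Hello ", "format": 1},  # bold
    {"type": "link", "url": "https://example.com", "children": [
        {"type": "text", "text": "world", "format": 2},  # italic, inside a link
    ]},
    {"type": "text", "text": "!", "format": 0},
]
flat = lex_inline_text(paragraph_children)
```

Keeping the flattening in one function means the block conversion, import, and verification paths never inspect inline structure themselves, so a later move to rich runs would not touch them.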

Run

bash lib/blogimport/conformance.sh     # 55/55; writes scoreboard.{json,md}