rose-ash/lib/blogimport/README.md
giles a4d93c61cc
blogimport: lexical->persist genesis-import + at-rest parity verifier (55/55)
Implements plans/migration/data-migration.md (the un-started long-pole) and the
data-layer half of slice-01-blog §4. Host-ops migration module composing
content-on-sx + persist public APIs; isolated from lib/host and lib/content.

- lexical.sx: Ghost lexical (as SX dicts) -> content block list, deterministic ids
- import.sx: genesis import into content:<id> op-log, idempotent, + postmeta stream
- verify.sx: replay-and-diff vs row-derived oracle (proves round-trip lossless)

Inline formatting flattens to plain text (Phase-5 runs swap-point isolated in
lex-inline-text); live Postgres source (Q-M4) + improved-converter re-import (Q-M5)
flagged in README. 55/55 conformance: lexical 23, import 21, verify 11.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 13:14:30 +00:00

# lib/blogimport — blog Postgres → persist genesis-import + parity verifier
Implements **`plans/migration/data-migration.md`** (the "long-pole nobody had
started") and the at-rest half of **`slice-01-blog.md` §4** — the data layer of the
blog read-path migration. Host-ops migration tooling, **not** a domain core: it
composes the public APIs of content-on-sx (`lib/content`) and persist
(`lib/persist`). Kept in its own module (not `lib/host`, not `lib/content`) so it
doesn't collide with the loops that own those.
Status: **machinery complete, 55/55 conformance** (lexical 23, import 21, verify 11).
## What it does
| Module | Role |
|---|---|
| `lexical.sx` | `blogimport/lex-blocks doc` — Ghost **lexical** body (as SX dicts) → content-on-sx **block list**, ids deterministic by position (`b0,b1,…`). |
| `import.sx` | `blogimport/import-post! b post at` — genesis import: convert the post's lexical, commit blocks as ordered `op-insert`s into the `content:<id>` op-log stream, record metadata in a sibling `postmeta:<id>` stream. Idempotent (skip-if-exists). `import-all!` → coverage scoreboard. |
| `verify.sx` | `blogimport/verify-post b post` — replay the stream → block model, diff vs the row-derived oracle with `=`. `verify-all` → `{:total :ok :mismatched}` coverage. |
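
The import row above can be pictured with a small Python model. This is a minimal sketch, not the `import.sx` implementation: the dict-backed `store`, the op shape, the `{"root": {"children": ...}}` document layout, the `lexical` and `title` post fields, and the `"imported"`/`"skipped"` return values are all assumptions made for illustration. Only the stream names (`content:<id>`, `postmeta:<id>`), the positional ids, and the skip-if-exists behaviour come from the table.

```python
def lex_blocks(doc):
    """Top-level lexical nodes -> block list with positional ids b0, b1, ..."""
    blocks = []
    for i, node in enumerate(doc["root"]["children"]):
        # Flatten the node's direct text children into one plain string.
        text = "".join(child.get("text", "") for child in node.get("children", []))
        blocks.append({"id": f"b{i}", "type": node["type"], "text": text})
    return blocks


def import_post(store, post, at):
    """Genesis import of one post; returns "imported" or "skipped"."""
    content_key = f"content:{post['id']}"
    if content_key in store:
        # Idempotent: an existing stream is never touched again.
        return "skipped"
    blocks = lex_blocks(post["lexical"])
    # One ordered insert op per block (hypothetical op shape).
    store[content_key] = [
        {"op": "insert", "pos": pos, "block": block, "at": at}
        for pos, block in enumerate(blocks)
    ]
    # Metadata lives in a sibling stream, not in the content stream.
    store[f"postmeta:{post['id']}"] = [{"title": post["title"], "at": at}]
    return "imported"


store = {}
post = {
    "id": "p1",
    "title": "Hello",
    "lexical": {"root": {"children": [
        {"type": "heading", "children": [{"type": "text", "text": "Title"}]},
        {"type": "paragraph", "children": [{"type": "text", "text": "Body"}]},
    ]}},
}
print(import_post(store, post, at=0))  # imported
print(import_post(store, post, at=1))  # skipped
```

Because ids depend only on position, running the converter twice on the same document yields the same block list, which is what lets the verifier compare against a freshly computed oracle.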
## What is proven
The verifier checks that **`lexical → import → persist → replay → block-model`** equals
**`lexical → block-model`** computed directly. That is, **the genesis import + op-log
replay is lossless**, which answers the at-rest question "did the backfill corrupt
anything?" (`data-migration.md` §6). The `verify.sx` corruption test confirms that a
diverging stream is *detected*, not silently passed.
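
The replay-and-diff idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the `verify.sx` code: the op shape, the `(post_id, ops, oracle)` case tuples, and the choice to report `mismatched` as a list of ids are hypothetical. The report keys mirror the `{:total :ok :mismatched}` coverage map.

```python
def replay(ops):
    """Rebuild the block model by applying insert ops in stream order."""
    blocks = []
    for op in ops:
        if op["op"] == "insert":
            blocks.insert(op["pos"], op["block"])
    return blocks


def verify_post(ops, oracle_blocks):
    """True when the replayed stream equals the directly computed oracle."""
    return replay(ops) == oracle_blocks


def verify_all(cases):
    """cases: list of (post_id, ops, oracle_blocks) tuples."""
    mismatched = [pid for pid, ops, oracle in cases if not verify_post(ops, oracle)]
    total = len(cases)
    return {"total": total, "ok": total - len(mismatched), "mismatched": mismatched}


oracle = [{"id": "b0", "text": "Hello"}, {"id": "b1", "text": "World"}]
good_ops = [{"op": "insert", "pos": i, "block": b} for i, b in enumerate(oracle)]

# Simulate at-rest corruption: the stored text of block b1 diverges.
bad_ops = [
    dict(op, block=dict(op["block"], text="tampered")) if op["pos"] == 1 else op
    for op in good_ops
]

report = verify_all([("p1", good_ops, oracle), ("p2", bad_ops, oracle)])
print(report)  # {'total': 2, 'ok': 1, 'mismatched': ['p2']}
```

The second case plays the role of the corruption test: a stream that diverges from the oracle shows up in `mismatched` instead of passing silently.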
## Known limitations / TODO (carry into the plan)
- **Inline formatting is flattened to plain text.** The content model on `architecture` holds
plain-string text (`mk-text id text`); Phase-5 rich inline runs are not merged here.
The single swap-point is `lex-inline-text` in `lexical.sx`: return runs there once
content-on-sx Phase 5 lands on `architecture`. Bold/italic/links currently collapse
to their plain concatenation (drift-proof, == `asText`). (See slice-01-blog Q-B1.)
- **Oracle is the in-memory lexical→blocks, not the live Python block model.** This
proves round-trip fidelity through persist. The "does SX match Python" half of Q-D2
needs the **live source**: read real `Post` rows via the internal-data query
(`/internal/data/…`) or direct Postgres (**Q-M4**, undecided) and feed them as `post`
dicts. The diff plumbing here is the counterpart that step will reuse.
- **Re-import with an improved converter (Q-M5) is not supported yet.** Import is
once-only today (skip-if-exists); superseding prior genesis events (vs
truncate + re-import) is future work.
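
To make the first limitation concrete, here is a rough Python sketch of what flattening inline formatting means. The function name mirrors `lex-inline-text`, but the body is an assumption for illustration, as is the node layout (text nodes carrying `text` and a `format` bitmask, link nodes wrapping `children`), which follows the general Lexical JSON shape rather than anything confirmed in this module.

```python
def lex_inline_text(node):
    """Collapse a node and all nested inline children to plain text."""
    if "text" in node:
        # Leaf text node: formatting flags are ignored on purpose.
        return node["text"]
    # Container (paragraph, link, ...): concatenate children in order.
    return "".join(lex_inline_text(child) for child in node.get("children", []))


paragraph = {
    "type": "paragraph",
    "children": [
        {"type": "text", "text": "Read ", "format": 0},
        {"type": "text", "text": "the docs", "format": 1},  # assumed: 1 = bold
        {"type": "text", "text": " at ", "format": 0},
        {"type": "link", "url": "https://example.com", "children": [
            {"type": "text", "text": "example.com", "format": 2},  # assumed: 2 = italic
        ]},
    ],
}

print(lex_inline_text(paragraph))  # Read the docs at example.com
```

The bold flag, the italic flag, and the link URL are all dropped; only the concatenated text survives. A Phase-5 version would presumably return a list of runs from this same function, so callers would not need to change where they call it.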
## Run
```bash
bash lib/blogimport/conformance.sh # 55/55; writes scoreboard.{json,md}
```