rose-ash/lib/blogimport
giles 3dd6626d86
blogimport: published-posts source contract + blog-side draft (76/76)
source.sx refactored to a single published-posts batch query returning full rows
(incl. lexical) — the existing post-by-id/slug DTO lacks lexical (sx_content/html
only), so the canonical lexical->blocks path needs a dedicated migration provider.
backfill-ids! now filters client-side (no extra query).

drafts/published-posts.sx + drafts/README.md: paste-ready blog-app change (defquery +
SqlBlogService.list_published_posts returning rows incl. raw lexical). README updated.
source 21/21; total 76/76.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 14:17:52 +00:00

lib/blogimport — blog Postgres → persist genesis-import + parity verifier

Implements plans/migration/data-migration.md (the "long-pole nobody had started") and the at-rest half of slice-01-blog.md §4 — the data layer of the blog read-path migration. Host-ops migration tooling, not a domain core: it composes the public APIs of content-on-sx (lib/content) and persist (lib/persist). Kept in its own module (not lib/host, not lib/content) so it doesn't collide with the loops that own those.

Status: machinery complete + live-source wired, 75/75 conformance (lexical 23, import 21, verify 11, source 20).

What it does

| Module | Role |
| --- | --- |
| lexical.sx | `blogimport/lex-blocks doc`: Ghost lexical body (as SX dicts) → content-on-sx block list, with ids deterministic by position (b0, b1, …). |
| import.sx | `blogimport/import-post! b post at`: genesis import. Converts the post's lexical, commits blocks as ordered op-inserts into the `content:<id>` op-log stream, and records metadata in a sibling `postmeta:<id>` stream. Idempotent (skip-if-exists). `import-all!` → coverage scoreboard. |
| verify.sx | `blogimport/verify-post b post`: replays the stream → block model, then diffs it against the row-derived oracle with `=`. `verify-all` → `{:total :ok :mismatched}` coverage. |
| source.sx | Live source (Q-M4 = internal-data query). Injected `fetch-fn` transport port; `parse-row` maps a service post-row → importer post dict and parses the `:lexical` JSON string (`dream-json-parse`). `backfill! b fetch-fn at` = enumerate → fetch → import; `sync-verify b fetch-fn` = enumerate → fetch → verify. `backfill-ids!` is the explicit-id fallback. |
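
To make the lexical.sx and import.sx rows concrete, here is a minimal Python model of the same flow. It is an illustration only, not the SX implementation: the block shape (`id`, `kind`, `text`), the op shape, and the plain dict standing in for persist are all assumptions, and `inline_text`, `lex_blocks` and `import_post` are hypothetical names that merely mirror lex-inline-text, lex-blocks and import-post!.

```python
def inline_text(node):
    """Flatten a Lexical node to plain text by concatenating descendant text nodes."""
    if node.get("type") == "text":
        return node.get("text", "")
    return "".join(inline_text(child) for child in node.get("children", []))


def lex_blocks(doc):
    """Map top-level Lexical children to blocks; ids are b0, b1, ... by position."""
    children = doc.get("root", {}).get("children", [])
    return [
        {"id": f"b{i}", "kind": node.get("type", "paragraph"), "text": inline_text(node)}
        for i, node in enumerate(children)
    ]


def import_post(store, post, at):
    """Genesis import into `store` (a dict standing in for persist; an assumption).

    Skips the post if its content stream already exists, which is what makes
    a repeated import a no-op.
    """
    stream = f"content:{post['id']}"
    if stream in store:
        return "skipped"
    blocks = lex_blocks(post["lexical"])
    # One ordered insert op per block, in document order.
    store[stream] = [
        {"op": "insert", "index": i, "block": block, "at": at}
        for i, block in enumerate(blocks)
    ]
    # Metadata goes to a sibling stream, separate from the content ops.
    store[f"postmeta:{post['id']}"] = [
        {"slug": post["slug"], "title": post["title"], "at": at}
    ]
    return "imported"


# Sample document: a heading, then a paragraph with bold text and a link.
doc = {"root": {"children": [
    {"type": "heading", "children": [{"type": "text", "text": "Title"}]},
    {"type": "paragraph", "children": [
        {"type": "text", "text": "Hello ", "format": 0},
        {"type": "text", "text": "bold", "format": 1},
        {"type": "link", "url": "https://example.com",
         "children": [{"type": "text", "text": " link"}]},
    ]},
]}}
post = {"id": "p1", "slug": "hello", "title": "Title", "lexical": doc}
```

Because ids depend only on position, converting the same document twice yields identical blocks, and the bold and link nodes collapse to their plain concatenation, as described under the known limitations.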

What is proven

The verifier checks that the block model produced by lexical → import → persist → replay equals the block model computed directly from the same lexical. That is, the genesis import plus op-log replay is lossless at rest, which answers "did the backfill corrupt anything" (data-migration.md §6). The verify.sx corruption test confirms that a diverging stream is detected, not silently passed.
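
The replay-and-compare check can be modelled in a few lines of Python. This is a sketch under assumptions, not the verify.sx code: the op shape, the dict-backed store, and the names `replay`, `verify_post` and `verify_all` are hypothetical, and the oracle is passed in directly instead of being derived from a row.

```python
def replay(ops):
    """Rebuild the block list by applying insert ops in stream order."""
    blocks = []
    for op in ops:
        if op["op"] == "insert":
            blocks.insert(op["index"], op["block"])
    return blocks


def verify_post(store, post_id, oracle):
    """True when the replayed stream equals the directly computed oracle."""
    return replay(store.get(f"content:{post_id}", [])) == oracle


def verify_all(store, oracles):
    """Coverage report shaped like {:total :ok :mismatched}."""
    report = {"total": 0, "ok": 0, "mismatched": 0}
    for post_id, oracle in oracles.items():
        report["total"] += 1
        report["ok" if verify_post(store, post_id, oracle) else "mismatched"] += 1
    return report


oracle = [{"id": "b0", "text": "Title"}, {"id": "b1", "text": "Hello"}]
store = {
    # Faithful stream: one insert per oracle block, in order.
    "content:good": [
        {"op": "insert", "index": i, "block": block}
        for i, block in enumerate(oracle)
    ],
    # Diverging stream: altered text and a missing block.
    "content:bad": [
        {"op": "insert", "index": 0, "block": {"id": "b0", "text": "Titel"}}
    ],
}
report = verify_all(store, {"good": oracle, "bad": oracle})
```

The diverging stream is counted as mismatched, not passed, which is the behaviour the corruption test is described as guarding.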

Known limitations / TODO (carry into the plan)

  • Inline formatting is flattened to plain text. Architecture's content model holds plain-string text (mk-text id text); Phase-5 rich inline runs are not merged here. The single swap-point is lex-inline-text in lexical.sx — return runs there once content-on-sx Phase 5 lands on architecture. Bold/italic/links currently collapse to their plain concatenation (drift-proof, == asText). (slice-01-blog Q-B1.)
  • Q-M4 RESOLVED — live source = internal-data query (source.sx), via an injected fetch-fn port. The remaining real-world wiring is operational, not design:
    1. One blog-side query must be added: blog/queries.sx has fetch-by-id/slug/ids but no enumeration query. Add a published-posts defquery returning the published ids/slugs (Python list_posts(status="published"), blog/bp/blog/ghost_db.py:102). Until then, drive backfill-ids! with an explicit id list. source.sx is mocked against this contract in tests/source.sx.
    2. Production fetch-fn = the host's HMAC-signed fetch_data wrapper (GET /internal/data/{query}). That wiring lives in lib/host (the host loop's territory); source.sx only needs the port injected.
    3. Confirm the response field names of the live get-post-by-* data handler against parse-row's contract (:uuid|:id :slug :title :status :visibility :tags :authors :lexical); a mismatch is a one-line field fix.
  • Oracle is the lexical→blocks of the SAME post, not the live Python block model. This proves round-trip fidelity through persist (no corruption at rest). The "does SX match the Python render" half of Q-D2 would additionally diff against the Python side's own block derivation — deferred with the read-path cutover.
  • Re-import with an improved converter (Q-M5) is not supported yet: import is once-only today (skip-if-exists). Superseding prior genesis events (as opposed to truncate + re-import) is future work.
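
The injected-port shape from the Q-M4 item can be sketched in Python as follows. Everything here is a hedged model, not the source.sx code: `fetch_fn` is assumed to be a callable that takes a query name and returns a list of row dicts, preferring `uuid` over `id` is an assumption about what `:uuid|:id` means, and `parse_row` and `backfill_ids` are hypothetical names mirroring parse-row and backfill-ids!. The HMAC-signed transport is deliberately left out, since it lives behind the port.

```python
import json


def parse_row(row):
    """Map a service post-row to an importer post dict.

    Parses the lexical JSON string; a value that is not a string is passed
    through unchanged.
    """
    lexical = row.get("lexical")
    return {
        "id": row.get("uuid") or row.get("id"),  # assumed precedence
        "slug": row.get("slug"),
        "title": row.get("title"),
        "status": row.get("status"),
        "visibility": row.get("visibility"),
        "tags": row.get("tags", []),
        "authors": row.get("authors", []),
        "lexical": json.loads(lexical) if isinstance(lexical, str) else lexical,
    }


def backfill_ids(fetch_fn, ids, import_fn):
    """Fetch one batch through the port, keep only the wanted ids, import each."""
    wanted = set(ids)
    imported = []
    for row in fetch_fn("published-posts"):
        post = parse_row(row)
        if post["id"] in wanted:
            import_fn(post)
            imported.append(post["id"])
    return imported


# Mock transport standing in for the production fetch-fn.
EMPTY_DOC = '{"root": {"children": []}}'
rows = [
    {"uuid": "u1", "slug": "a", "title": "A", "status": "published",
     "visibility": "public", "tags": [], "authors": [], "lexical": EMPTY_DOC},
    {"id": 2, "slug": "b", "title": "B", "status": "published",
     "visibility": "public", "tags": [], "authors": [], "lexical": EMPTY_DOC},
]


def mock_fetch(query):
    """Return the canned rows regardless of transport details."""
    return rows


seen = []
result = backfill_ids(mock_fetch, ["u1"], seen.append)
```

Because the transport is a plain argument, tests can pass a mock like `mock_fetch` while production passes the signed wrapper, and a field-name mismatch in the live handler would only touch `parse_row`.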

Run

bash lib/blogimport/conformance.sh     # 75/75; writes scoreboard.{json,md}