source.sx: live-source adapter resolving Q-M4 (internal-data query, not direct PG). Injected fetch-fn transport port (hexagonal seam); parse-row maps a blog post-row to the importer post dict and parses the :lexical JSON string via dream-json-parse. End-to-end drivers: backfill! (enumerate->fetch->import) and sync-verify (enumerate->fetch->verify), + backfill-ids! explicit-id fallback. Tests mock the transport against the documented response contract incl. a real lexical JSON string. README flags the one blog-side gap (add a published-posts enumeration query) + production fetch_data wiring (lives in lib/host). source 20/20; total 75/75. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# lib/blogimport: blog Postgres → persist genesis-import + parity verifier
Implements `plans/migration/data-migration.md` (the "long-pole nobody had
started") and the at-rest half of `slice-01-blog.md` §4: the data layer of the
blog read-path migration. This is host-ops migration tooling, not a domain core;
it composes the public APIs of content-on-sx (`lib/content`) and persist
(`lib/persist`). It is kept in its own module (not `lib/host`, not `lib/content`)
so it doesn't collide with the loops that own those modules.
Status: machinery complete + live-source wired, 75/75 conformance (lexical 23, import 21, verify 11, source 20).
## What it does
| Module | Role |
|---|---|
| `lexical.sx` | `blogimport/lex-blocks doc`: Ghost lexical body (as SX dicts) → content-on-sx block list, with ids deterministic by position (`b0`, `b1`, …). |
| `import.sx` | `blogimport/import-post! b post at`: genesis import. Converts the post's lexical, commits the blocks as ordered op-inserts into the `content:<id>` op-log stream, and records metadata in a sibling `postmeta:<id>` stream. Idempotent (skip-if-exists). `import-all!` → coverage scoreboard. |
| `verify.sx` | `blogimport/verify-post b post`: replays the stream → block model and diffs it against the row-derived oracle with `=`. `verify-all` → `{:total :ok :mismatched}` coverage. |
| `source.sx` | Live source (Q-M4 = internal-data query). Injected `fetch-fn` transport port; `parse-row` maps a service post-row → importer post dict and parses the `:lexical` JSON string (`dream-json-parse`). `backfill! b fetch-fn at` = enumerate → fetch → import; `sync-verify b fetch-fn` = enumerate → fetch → verify. `backfill-ids!` is the explicit-id fallback. |
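To make the `lexical.sx` conversion concrete, here is a minimal Python sketch of two properties the table describes: block ids assigned by position, and inline formatting flattened to plain text. It is not the SX implementation. The function names, the block dict shape (`id`, `type`, `text`), and the assumption that each top-level child of the lexical root becomes one block are all hypothetical; only the `root` / `children` / `type` / `text` node keys come from Lexical's JSON format.

```python
def inline_text(node):
    """Concatenate the plain text under a lexical node, dropping formatting.

    Bold/italic/link wrappers contribute only their children's text, so they
    collapse to their plain concatenation.
    """
    if node.get("type") == "text":
        return node.get("text", "")
    return "".join(inline_text(child) for child in node.get("children", []))


def lex_blocks(doc):
    """Hypothetical Python analogue of blogimport/lex-blocks.

    One block per top-level child of the lexical root; ids come from position
    (b0, b1, ...), so converting the same document twice yields the same ids.
    """
    children = doc.get("root", {}).get("children", [])
    return [
        {"id": f"b{i}", "type": child.get("type", "unknown"), "text": inline_text(child)}
        for i, child in enumerate(children)
    ]


# Usage: a heading plus a paragraph whose second run is a link.
doc = {"root": {"children": [
    {"type": "heading", "children": [{"type": "text", "text": "Intro"}]},
    {"type": "paragraph", "children": [
        {"type": "text", "text": "Hello "},
        {"type": "link", "children": [{"type": "text", "text": "world"}]},
    ]},
]}}
print(lex_blocks(doc))
# [{'id': 'b0', 'type': 'heading', 'text': 'Intro'}, {'id': 'b1', 'type': 'paragraph', 'text': 'Hello world'}]
```

Because ids depend only on position, re-running the conversion on an unchanged post reproduces the same ids, which is what allows a later comparison with plain equality.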
## What is proven
The verifier checks that lexical → import → persist → replay → block model is
equal to lexical → block model computed directly. That is, genesis import plus
op-log replay is lossless, which answers "did the backfill corrupt anything?" at
rest (`data-migration.md` §6). The `verify.sx` corruption test confirms that a
diverging stream is detected rather than silently passed.
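The round-trip property can be modelled in a few lines. The Python sketch below is a hypothetical stand-in: a dict of lists plays the role of persist's op-log, and `import_post`, `replay`, `verify_post` and `verify_all` only mirror the shape of the SX functions named in the table (ordered op-inserts into `content:<id>`, metadata in `postmeta:<id>`, skip-if-exists, an equality diff, a `{:total :ok :mismatched}` tally). The op format is an assumption, not persist's real event schema.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for a persist op-log (stream name -> ops)."""
    streams: dict = field(default_factory=dict)


def import_post(store, post_id, blocks, meta):
    """Genesis import: ordered op-inserts into content:<id>, metadata in postmeta:<id>.

    Skip-if-exists makes it idempotent: a second call leaves both streams untouched.
    """
    stream = f"content:{post_id}"
    if stream in store.streams:
        return "skipped"
    ops = store.streams.setdefault(stream, [])  # created even for an empty post
    for pos, block in enumerate(blocks):
        ops.append({"op": "insert", "pos": pos, "block": block})
    store.streams[f"postmeta:{post_id}"] = [{"op": "meta", "meta": meta}]
    return "imported"


def replay(store, post_id):
    """Rebuild the block model by applying the stream's inserts in order."""
    model = []
    for op in store.streams.get(f"content:{post_id}", []):
        if op["op"] == "insert":
            model.insert(op["pos"], op["block"])
    return model


def verify_post(store, post_id, oracle_blocks):
    """True when the replayed model equals the directly derived blocks."""
    return replay(store, post_id) == oracle_blocks


def verify_all(store, oracles):
    """Coverage over many posts; oracles maps post id -> expected blocks."""
    ok = sum(1 for pid, blocks in oracles.items() if verify_post(store, pid, blocks))
    return {"total": len(oracles), "ok": ok, "mismatched": len(oracles) - ok}


# Usage: import once, re-import is skipped, then corrupt one op and re-verify.
blocks = [{"id": "b0", "text": "Intro"}, {"id": "b1", "text": "Hello world"}]
store = Store()
print(import_post(store, "p1", blocks, {"slug": "hello"}))  # imported
print(import_post(store, "p1", blocks, {"slug": "hello"}))  # skipped
print(verify_all(store, {"p1": blocks}))  # {'total': 1, 'ok': 1, 'mismatched': 0}
# Swap in a different block dict (the oracle list itself is not mutated).
store.streams["content:p1"][1]["block"] = {"id": "b1", "text": "Hello wrld"}
print(verify_all(store, {"p1": blocks}))  # {'total': 1, 'ok': 0, 'mismatched': 1}
```

Corrupting a single op flips the post to mismatched, which is the behaviour the corruption test is described as checking.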
## Known limitations / TODO (carry into the plan)
- Inline formatting is flattened to plain text. Architecture's content model holds
  plain-string text (`mk-text id text`); Phase-5 rich inline runs are not merged
  here. The single swap-point is `lex-inline-text` in `lexical.sx`: return runs
  there once content-on-sx Phase 5 lands on architecture. Bold/italic/links
  currently collapse to their plain concatenation (drift-proof, `== asText`).
  (slice-01-blog Q-B1.)
- Q-M4 RESOLVED: the live source is the internal-data query (`source.sx`), via an
  injected `fetch-fn` port. The remaining real-world wiring is operational, not design:
  - One blog-side query must be added: `blog/queries.sx` has fetch-by-id/slug/ids
    but no enumeration query. Add a `published-posts` defquery returning the
    published ids/slugs (Python `list_posts(status="published")`,
    `blog/bp/blog/ghost_db.py:102`). Until then, drive `backfill-ids!` with an
    explicit id list. `source.sx` is mocked against this contract in `tests/source.sx`.
  - Production `fetch-fn` is the host's HMAC-signed `fetch_data` wrapper
    (`GET /internal/data/{query}`). That wiring lives in `lib/host` (the host
    loop's territory); `source.sx` only needs the port injected.
  - Confirm the response field names of the live `get-post-by-*` data handler
    against `parse-row`'s contract (`:uuid|:id :slug :title :status :visibility
    :tags :authors :lexical`); a mismatch is a one-line field fix.
- The oracle is the lexical → blocks conversion of the *same* post, not the live
  Python block model. This proves round-trip fidelity through persist (no
  corruption at rest). The "does SX match the Python render?" half of Q-D2 would
  additionally diff against the Python side's own block derivation; that is
  deferred along with the read-path cutover.
- Re-import with an improved converter (Q-M5) is not supported yet: import is
  once-only today (skip-if-exists). Superseding prior genesis events (as opposed
  to truncate + re-import) is future work.
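Because the transport is an injected port, the enumerate → fetch → import pipeline can be exercised with no network at all, which is how `tests/source.sx` is described as working. A minimal Python sketch, assuming that the transport is a plain function `fetch_fn(query, params)`, that the not-yet-added `published-posts` query returns a list of ids, and that a by-id query is named `get-post-by-id` (hypothetical; only `get-post-by-*` is named above). `import_fn` stands in for the genesis import, and the row keys are the `parse-row` contract fields written as Python strings.

```python
import json


def parse_row(row):
    """Map a post-row to an importer post dict, decoding the lexical JSON string.

    Field names follow parse-row's documented contract (:uuid|:id :slug :title
    :status :visibility :tags :authors :lexical), written here as string keys.
    """
    return {
        "id": row.get("uuid") or row.get("id"),
        "slug": row["slug"],
        "title": row["title"],
        "status": row["status"],
        "visibility": row["visibility"],
        "tags": row.get("tags", []),
        "authors": row.get("authors", []),
        "lexical": json.loads(row["lexical"]),
    }


def backfill_ids(fetch_fn, ids, import_fn):
    """Explicit-id fallback: fetch -> parse -> import for each given id."""
    results = {}
    for pid in ids:
        row = fetch_fn("get-post-by-id", {"id": pid})  # hypothetical query name
        results[pid] = import_fn(parse_row(row))
    return results


def backfill(fetch_fn, import_fn):
    """enumerate -> fetch -> import, assuming published-posts returns ids."""
    return backfill_ids(fetch_fn, fetch_fn("published-posts", {}), import_fn)


# Usage: a mocked transport standing in for the HMAC-signed fetch_data wrapper.
ROWS = {
    "p1": {"uuid": "p1", "slug": "hello", "title": "Hello", "status": "published",
           "visibility": "public", "tags": [], "authors": [],
           "lexical": '{"root": {"children": []}}'},
}


def mock_fetch(query, params):
    if query == "published-posts":
        return sorted(ROWS)
    if query == "get-post-by-id":
        return ROWS[params["id"]]
    raise KeyError(f"unmocked query: {query}")


imported = []


def record_import(post):
    imported.append(post["slug"])
    return "imported"


print(backfill(mock_fetch, record_import), imported)  # {'p1': 'imported'} ['hello']
```

Swapping `mock_fetch` for a real signed transport changes nothing else in the pipeline, which is why the production wiring can live in `lib/host` while `source.sx` stays transport-agnostic.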
## Run

```bash
bash lib/blogimport/conformance.sh   # 75/75; writes scoreboard.{json,md}
```