source.sx refactored to a single published-posts batch query returning full rows (incl. lexical). The existing post-by-id/slug DTO lacks lexical (sx_content/html only), so the canonical lexical->blocks path needs a dedicated migration provider. backfill-ids! now filters client-side (no extra query).

drafts/published-posts.sx + drafts/README.md: paste-ready blog-app change (defquery + SqlBlogService.list_published_posts returning rows incl. raw lexical). README updated. source 21/21; total 76/76.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Blog-side draft — the published-posts migration query
The one blog-app change needed to make lib/blogimport's live source (Q-M4) real.
Two parts: an SX defquery (published-posts.sx in this dir) and a Python
provider it binds to. Both go in the blog app (production blog/ tree); they
are drafted here so the importer ships with its dependency spelled out. Apply on the
blog app's branch, not on this migration branch.
Why a new query (not reuse post-by-id)
blogimport/source.sx needs, for every published post: id, slug, title, status, visibility, tags, authors, lexical. The existing providers
(blog/services/__init__.py SqlBlogService.get_post_by_*) return a PostDTO built by
_post_to_dto, which exposes sx_content/html but not lexical. The canonical
migration path is lexical→blocks (slice-01-blog Q-B1), not sx_content. So a dedicated
migration provider that returns full rows including the raw lexical body is the
minimal, honest change. One batch call covers both enumeration (Q-D2 corpus) and
bodies.
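A minimal Python sketch of why one batch call is enough. The same list of rows serves both as the corpus enumeration and as the source of bodies. Picking a subset by id (which, per the commit message, backfill-ids! now does) is a client-side filter, not a second query.

The names enumerate_corpus, select_by_ids and run_backfill are hypothetical. The real logic is SX in lib/blogimport/source.sx.

```python
from typing import Callable, Iterable

# One provider row: id, uuid, slug, title, status, visibility, tags, authors, lexical.
Row = dict


def enumerate_corpus(rows: list[Row]) -> list[str]:
    """Corpus enumeration (Q-D2): slugs of every published post, in provider order."""
    return [row["slug"] for row in rows]


def select_by_ids(rows: list[Row], wanted_ids: Iterable[int]) -> list[Row]:
    """Client-side filter: full rows (bodies included) by id, with no extra query."""
    wanted = set(wanted_ids)
    return [row for row in rows if row["id"] in wanted]


def run_backfill(
    fetch_published_posts: Callable[[], list[Row]], wanted_ids: Iterable[int]
) -> tuple[list[str], list[Row]]:
    """Hypothetical driver: one batch fetch feeds both enumeration and bodies."""
    rows = fetch_published_posts()  # the single batch call
    return enumerate_corpus(rows), select_by_ids(rows, wanted_ids)
```

The per-post alternative (enumerate, then get-post-by-* once per id) costs one round trip per post. It would still lack lexical.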
1. defquery (→ blog/queries.sx)
See published-posts.sx in this directory:
```
(defquery published-posts ()
  "Enumerate every published, non-page blog post as a full row INCLUDING the raw
  lexical body — the SX migration corpus (Q-D2). Read-only ..."
  (service "blog" "list-published-posts"))
```
The kebab→snake convention (as for get-post-by-slug → get_post_by_slug) binds
"list-published-posts" to the SqlBlogService.list_published_posts method below.
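A pre-flight check can confirm that the query name resolves to a provider method before the defquery is wired up. This minimal Python sketch assumes the dispatcher does a plain "-" → "_" rename followed by an attribute lookup on the service.

The resolver below is hypothetical. The real SX service dispatcher may differ.

```python
def kebab_to_snake(name: str) -> str:
    """'list-published-posts' -> 'list_published_posts'."""
    return name.replace("-", "_")


def resolve_provider(service: object, query_name: str):
    """Return the bound method that (service "blog" "<query_name>") would call.

    Hypothetical resolver for illustration; raises LookupError if nothing binds.
    """
    method_name = kebab_to_snake(query_name)
    method = getattr(service, method_name, None)
    if method is None or not callable(method):
        raise LookupError(f"{type(service).__name__} has no provider {method_name!r}")
    return method
```

For example, resolve_provider(SqlBlogService(...), "list-published-posts") should return the new method once it is in place.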
2. Python provider (→ blog/services/__init__.py, in SqlBlogService)
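```python
# Add to the imports in blog/services/__init__.py (skip any already present):
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload

# Assumes Post (blog/models/content.py) is already imported here, as the
# existing get_post_by_* providers use it.


class SqlBlogService:
    # ... existing providers ...

    async def list_published_posts(self, session: AsyncSession) -> list[dict]:
        """Migration corpus: every published, non-page post as a full row INCLUDING
        the raw lexical body (Q-D2). Read-only; consumed by the SX blogimport
        backfill/verify. Mirrors ghost_db.list_posts() base visibility filters."""
        result = await session.execute(
            select(Post)
            .where(
                Post.deleted_at.is_(None),   # soft-deleted posts excluded
                Post.status == "published",  # only the published status
                Post.is_page.is_(False),     # pages excluded
            )
            # Eager-load tags/authors via batched SELECT ... IN queries (no N+1).
            .options(selectinload(Post.tags), selectinload(Post.authors))
            # Newest first; posts without published_at sort last.
            .order_by(Post.published_at.desc().nulls_last())
        )
        return [
            {
                "id": p.id,
                "uuid": p.uuid,
                "slug": p.slug,
                "title": p.title,
                "status": p.status,
                "visibility": p.visibility,
                "lexical": p.lexical,  # raw lexical body, passed through unparsed
                "tags": [t.slug for t in p.tags],
                "authors": [a.slug for a in p.authors],
            }
            for p in result.scalars().unique().all()
        ]
```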
Confirm before applying:
- The relationship names on Post (tags, authors): check the join tables (post_tags, post_authors) in blog/models/content.py, and adjust selectinload + the comprehensions if they differ.
- .unique() is harmless but not required here. selectinload loads each collection with a separate SELECT ... IN query, so the main result does not fan out. It becomes required only if the eager loads are switched to joinedload.
- The Post.uuid and Post.lexical columns exist (models/content.py ~lines 61-63).
- The visibility filters match ghost_db.list_posts() (drafts excluded, pages excluded), so the corpus is exactly the published read-path set.
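The visibility item can be checked mechanically. This minimal sketch mirrors the provider's WHERE clause and ORDER BY in plain Python, so its output can be diffed against the ids from ghost_db.list_posts() or from the new provider.

It assumes candidate posts can be dumped as dicts carrying the model's deleted_at, status, is_page and published_at values. That dump, and the function names, are hypothetical.

```python
def in_corpus(post: dict) -> bool:
    """Mirror of the WHERE clause: not soft-deleted, published, not a page."""
    return (
        post.get("deleted_at") is None
        and post.get("status") == "published"
        and post.get("is_page") is False  # like IS false: None is excluded too
    )


def expected_corpus(posts: list[dict]) -> list[dict]:
    """Mirror of ORDER BY published_at DESC NULLS LAST over the filtered posts."""
    kept = [p for p in posts if in_corpus(p)]
    dated = sorted(
        (p for p in kept if p.get("published_at") is not None),
        key=lambda p: p["published_at"],
        reverse=True,  # newest first
    )
    undated = [p for p in kept if p.get("published_at") is None]
    return dated + undated  # NULL published_at sorts last
```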
3. Verify the contract
After applying, the response shape must match blogimport/parse-row
(lib/blogimport/source.sx): keys :uuid|:id :slug :title :status :visibility :tags :authors :lexical, with :lexical a JSON string (parsed via dream-json-parse). The
mock in lib/blogimport/tests/source.sx is the executable spec of this contract.
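The SX mock remains the executable spec. For a quick Python-side check of a live response, here is a minimal sketch under these assumptions:
- Rows arrive as dicts.
- json.loads stands in for dream-json-parse. The two parsers may not accept exactly the same inputs.
- A null lexical counts as a violation, reading the contract literally. Relax this if parse-row tolerates nil.

contract_errors is a hypothetical helper, not part of either codebase.

```python
import json

REQUIRED_KEYS = ("slug", "title", "status", "visibility", "tags", "authors", "lexical")


def contract_errors(row: dict) -> list[str]:
    """Return contract violations for one provider row (empty list means OK)."""
    errors: list[str] = []
    if "uuid" not in row and "id" not in row:
        errors.append("missing identifier: need uuid or id")
    for key in REQUIRED_KEYS:
        if key not in row:
            errors.append(f"missing key: {key}")
    for key in ("tags", "authors"):
        if key in row:
            value = row[key]
            if not (isinstance(value, list) and all(isinstance(s, str) for s in value)):
                errors.append(f"{key} must be a list of slug strings")
    if "lexical" in row:
        lexical = row["lexical"]
        if not isinstance(lexical, str):
            errors.append("lexical must be a JSON string")
        else:
            try:
                json.loads(lexical)  # stand-in for dream-json-parse
            except json.JSONDecodeError as exc:
                errors.append(f"lexical is not valid JSON: {exc.msg}")
    return errors
```

A common failure this catches is a provider or transport that decodes lexical into an object before sending it, instead of passing the JSON string through.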
4. Then wire the transport (host loop)
blogimport/backfill!/sync-verify take an injected fetch-fn. In production that is
the host's HMAC fetch_data wrapper (GET /internal/data/published-posts) — wiring
that lives in lib/host, not here.
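In the host loop, the only production-specific piece is the fetch-fn itself. This minimal Python sketch shows the injection pattern, under loud assumptions:
- make_hmac_fetch and sign_request are hypothetical stand-ins for lib/host's fetch_data wrapper, whose real scheme is not shown in this document.
- The X-Timestamp/X-Signature headers and the signed-string format are likewise invented.
- backfill is a stub standing in for the SX blogimport/backfill!.

```python
import hashlib
import hmac
import json
import time
import urllib.request
from typing import Callable

# A fetch-fn maps a query name (e.g. "published-posts") to decoded rows.
FetchFn = Callable[[str], list]


def sign_request(secret: bytes, method: str, path: str, timestamp: str) -> str:
    """HMAC-SHA256 over method, path and timestamp joined by newlines (HYPOTHETICAL)."""
    message = f"{method}\n{path}\n{timestamp}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_hmac_fetch(base_url: str, secret: bytes) -> FetchFn:
    """Production-style fetch-fn: signed GET {base_url}/internal/data/<query>."""
    def fetch(query: str) -> list:
        path = f"/internal/data/{query}"
        timestamp = str(int(time.time()))
        request = urllib.request.Request(  # no body, so this is a GET
            base_url + path,
            headers={
                "X-Timestamp": timestamp,  # hypothetical header
                "X-Signature": sign_request(secret, "GET", path, timestamp),  # hypothetical
            },
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())

    return fetch


def make_mock_fetch(rows: list) -> FetchFn:
    """Test fetch-fn returning canned rows, in the spirit of the SX mock."""
    def fetch(query: str) -> list:
        if query != "published-posts":
            raise ValueError(f"unexpected query: {query}")
        return rows

    return fetch


def backfill(fetch: FetchFn) -> int:
    """Stub for blogimport/backfill!: depends only on the injected fetch-fn."""
    return len(fetch("published-posts"))
```

Because the importer only ever sees a FetchFn, the same code path runs against mock rows in tests and the signed endpoint in production. Swapping the transport never touches the import logic.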