blogimport: Q-M4 live source — internal-data query adapter (75/75)
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m5s

source.sx: live-source adapter resolving Q-M4 (internal-data query, not direct PG).
Injected fetch-fn transport port (hexagonal seam); parse-row maps a blog post-row to
the importer post dict and parses the :lexical JSON string via dream-json-parse.
End-to-end drivers: backfill! (enumerate->fetch->import) and sync-verify
(enumerate->fetch->verify), + backfill-ids! explicit-id fallback.

Tests mock the transport against the documented response contract incl. a real lexical
JSON string. README flags the one blog-side gap (add a published-posts enumeration
query) + production fetch_data wiring (lives in lib/host). source 20/20; total 75/75.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 13:26:15 +00:00
parent a4d93c61cc
commit c82372c780
6 changed files with 205 additions and 12 deletions


@@ -7,7 +7,8 @@ composes the public APIs of content-on-sx (`lib/content`) and persist
(`lib/persist`). Kept in its own module (not `lib/host`, not `lib/content`) so it
doesn't collide with the loops that own those.
Status: **machinery complete, 55/55 conformance** (lexical 23, import 21, verify 11).
Status: **machinery complete + live-source wired, 75/75 conformance**
(lexical 23, import 21, verify 11, source 20).
## What it does
@@ -16,6 +17,7 @@ Status: **machinery complete, 55/55 conformance** (lexical 23, import 21, verify
| `lexical.sx` | `blogimport/lex-blocks doc` — Ghost **lexical** body (as SX dicts) → content-on-sx **block list**, ids deterministic by position (`b0,b1,…`). |
| `import.sx` | `blogimport/import-post! b post at` — genesis import: convert the post's lexical, commit blocks as ordered `op-insert`s into the `content:<id>` op-log stream, record metadata in a sibling `postmeta:<id>` stream. Idempotent (skip-if-exists). `import-all!` → coverage scoreboard. |
| `verify.sx` | `blogimport/verify-post b post` — replay the stream → block model, diff vs the row-derived oracle with `=`. `verify-all` → `{:total :ok :mismatched}` coverage. |
| `source.sx` | **Live source (Q-M4 = internal-data query).** Injected `fetch-fn` transport port; `parse-row` maps a service post-row → importer `post` dict and parses the `:lexical` JSON string (`dream-json-parse`). `backfill! b fetch-fn at` = enumerate → fetch → import; `sync-verify b fetch-fn` = enumerate → fetch → verify. `backfill-ids!` is the explicit-id fallback. |
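The row mapping described for `source.sx` can be illustrated with a small Python sketch. This is not the SX implementation: `parse_row` and `parse_lexical` are hypothetical Python names mirroring `blogimport/parse-row` and `blogimport/parse-lexical`, and `json.loads` stands in for `dream-json-parse`. Python's `or` also treats empty strings as missing, which may differ from SX `or`.

```python
import json


def parse_lexical(lx):
    """Normalise the lexical field: None -> empty doc, str -> parsed JSON, else as-is."""
    if lx is None:
        return {"root": {"children": []}}
    if isinstance(lx, str):
        # The DB column stores the body as a JSON string.
        return json.loads(lx)
    # A handler that already returns structured data is passed through.
    return lx


def parse_row(row):
    """Map a service post-row to the importer post dict, with defaults."""
    return {
        "id": row.get("uuid") or row.get("id"),  # prefer uuid, fall back to id
        "slug": row.get("slug") or "",
        "title": row.get("title") or "",
        "status": row.get("status") or "",
        "visibility": row.get("visibility") or "",
        "tags": row.get("tags") or [],
        "authors": row.get("authors") or [],
        "lexical": parse_lexical(row.get("lexical")),
    }


row = {
    "uuid": "post-1",
    "slug": "live",
    "title": "Live",
    "lexical": '{"root":{"children":[{"type":"paragraph"}]}}',
}
post = parse_row(row)
print(post["id"], post["lexical"]["root"]["children"][0]["type"])
```

The sketch only covers the field mapping; converting the parsed lexical into blocks remains the job of `lexical.sx`.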
## What is proven
@@ -32,16 +34,28 @@ is *detected*, not silently passed.
The single swap-point is `lex-inline-text` in `lexical.sx` — return runs there once
content-on-sx Phase 5 lands on `architecture`. Bold/italic/links currently collapse
to their plain concatenation (drift-proof, == `asText`). (slice-01-blog Q-B1.)
- **Oracle is the in-memory lexical→blocks, not the live Python block model.** This
proves round-trip fidelity through persist. The "does SX match Python" half of Q-D2
needs the **live source**: read real `Post` rows via the internal-data query
(`/internal/data/…`) or direct Postgres (**Q-M4**, undecided) and feed them as `post`
dicts. The diff plumbing here is the twin that step reuses.
- **Q-M4 RESOLVED — live source = internal-data query** (`source.sx`), via an injected
`fetch-fn` port. The remaining real-world wiring is operational, not design:
1. **One blog-side query must be added**: `blog/queries.sx` has fetch-by-id/slug/ids
but **no enumeration query**. Add a `published-posts` defquery returning the
published ids/slugs (Python `list_posts(status="published")`,
`blog/bp/blog/ghost_db.py:102`). Until then, drive `backfill-ids!` with an explicit
id list. `source.sx` is mocked against this contract in `tests/source.sx`.
2. **Production `fetch-fn`** = the host's HMAC-signed `fetch_data` wrapper
(`GET /internal/data/{query}`). That wiring lives in `lib/host` (the host loop's
territory); `source.sx` only needs the port injected.
3. **Confirm the response field names** of the live `get-post-by-*` data handler
against `parse-row`'s contract (`:uuid|:id :slug :title :status :visibility :tags
:authors :lexical`); a mismatch is a one-line field fix.
- **Oracle is the lexical→blocks of the SAME post, not the live Python block model.**
This proves round-trip fidelity through persist (no corruption at rest). The "does SX
match the *Python render*" half of Q-D2 would additionally diff against the Python
side's own block derivation — deferred with the read-path cutover.
- **Re-import with an improved converter (Q-M5)** is not supported yet: import is once-only
today (skip-if-exists). Superseding prior genesis events (vs truncate+re-import) is future work.
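The live-source flow in the bullets above (enumerate, fetch, import through an injected `fetch-fn`, with skip-if-exists making re-runs safe) can be sketched in Python. Everything here is an assumption for illustration: a plain dict stands in for the persist store, the function names are hypothetical Python counterparts of the SX drivers, rows are stored without the `parse-row` step, and the `at` argument is omitted.

```python
def source_posts(fetch_fn):
    """Enumerate published ids, then fetch each post through the port."""
    ids = fetch_fn("published-posts", {})
    return [fetch_fn("post-by-id", {"id": i}) for i in ids]


def source_posts_by_ids(fetch_fn, ids):
    """Explicit-id fallback: skip enumeration, fetch the given ids."""
    return [fetch_fn("post-by-id", {"id": i}) for i in ids]


def import_all(store, posts):
    """Import with skip-if-exists; returns a coverage dict."""
    cov = {"total": 0, "imported": 0, "skipped": 0}
    for post in posts:
        cov["total"] += 1
        if post["id"] in store:
            cov["skipped"] += 1  # already imported: leave it untouched
        else:
            store[post["id"]] = post
            cov["imported"] += 1
    return cov


def backfill(store, fetch_fn):
    return import_all(store, source_posts(fetch_fn))


def backfill_ids(store, fetch_fn, ids):
    return import_all(store, source_posts_by_ids(fetch_fn, ids))


# Mock transport with the same shape as the port: (query, params) -> data.
ROWS = {
    "post-1": {"id": "post-1", "title": "Live"},
    "post-2": {"id": "post-2", "title": "Two"},
}


def mock_fetch(query, params):
    if query == "published-posts":
        return list(ROWS)
    if query == "post-by-id":
        return ROWS.get(params["id"])
    return None


store = {}
first = backfill(store, mock_fetch)
second = backfill(store, mock_fetch)
print(first)   # {'total': 2, 'imported': 2, 'skipped': 0}
print(second)  # {'total': 2, 'imported': 0, 'skipped': 2}
```

Because the importer only sees the `fetch_fn` callable, swapping the mock for a signed HTTP client would not change any of the driver code, which is the point of the port.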
## Run
```bash
bash lib/blogimport/conformance.sh # 55/55; writes scoreboard.{json,md}
bash lib/blogimport/conformance.sh # 75/75; writes scoreboard.{json,md}
```
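After a run, the generated `scoreboard.json` can be sanity-checked. The following Python sketch is a hypothetical helper, not part of the repository; it assumes the JSON shape shown in this commit and that `total` equals passes plus failures.

```python
import json


def check_scoreboard(board):
    """Return True when totals match the per-suite sums and nothing failed."""
    suites = board["suites"].values()
    passed = sum(s["pass"] for s in suites)
    failed = sum(s["fail"] for s in suites)
    return (
        board["total_pass"] == passed
        and board["total_fail"] == failed
        and board["total"] == passed + failed
        and failed == 0
    )


# Inline copy of the scoreboard from this commit; in practice, load the file.
SCOREBOARD = """
{"suites": {"lexical": {"pass": 23, "fail": 0},
            "import": {"pass": 21, "fail": 0},
            "verify": {"pass": 11, "fail": 0},
            "source": {"pass": 20, "fail": 0}},
 "total_pass": 75, "total_fail": 0, "total": 75}
"""
board = json.loads(SCOREBOARD)
print(check_scoreboard(board))  # True
```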


@@ -16,7 +16,7 @@ if [ ! -x "$SX_SERVER" ]; then
fi
fi
SUITES=(lexical import verify)
SUITES=(lexical import verify source)
OUT_JSON="lib/blogimport/scoreboard.json"
OUT_MD="lib/blogimport/scoreboard.md"
@@ -49,9 +49,11 @@ run_suite() {
(load "lib/content/callout.sx")
(load "lib/content/media.sx")
(load "lib/content/store.sx")
(load "lib/dream/json.sx")
(load "lib/blogimport/lexical.sx")
(load "lib/blogimport/import.sx")
(load "lib/blogimport/verify.sx")
(load "lib/blogimport/source.sx")
(epoch 2)
(eval "(define bi-test-pass 0)")
(eval "(define bi-test-fail 0)")


@@ -2,9 +2,10 @@
"suites": {
"lexical": {"pass": 23, "fail": 0},
"import": {"pass": 21, "fail": 0},
"verify": {"pass": 11, "fail": 0}
"verify": {"pass": 11, "fail": 0},
"source": {"pass": 20, "fail": 0}
},
"total_pass": 55,
"total_pass": 75,
"total_fail": 0,
"total": 55
"total": 75
}


@@ -7,4 +7,5 @@ _Generated by `lib/blogimport/conformance.sh`_
| lexical | 23 | 0 | 23 |
| import | 21 | 0 | 21 |
| verify | 11 | 0 | 11 |
| **Total** | **55** | **0** | **55** |
| source | 20 | 0 | 20 |
| **Total** | **75** | **0** | **75** |

lib/blogimport/source.sx Normal file

@@ -0,0 +1,92 @@
; lib/blogimport/source.sx
; Live source adapter — Q-M4 RESOLVED: import via the blog INTERNAL-DATA QUERY
; surface (decoupled), not direct Postgres. Reuses the existing query contracts
; (blog/queries.sx: post-by-id/post-by-slug/posts-by-ids) and keeps the importer in
; the SX/host world (plans/migration/data-migration.md §7 recommended default).
;
; TRANSPORT SEAM (hexagonal, like every other subsystem): a `fetch-fn` port is
; INJECTED. Contract:
; (fetch-fn query-name params-dict) -> response-data
; In production `fetch-fn` is the host's HMAC-signed fetch_data wrapper
; (GET /internal/data/{query}); in tests it's a mock. The importer never knows how
; the bytes arrive.
;
; RESPONSE CONTRACT (one published-post row), the blog `get-post-by-*` data handler:
; {:uuid|:id :slug :title :status :visibility :tags :authors :lexical}
; :lexical is the Ghost body as a JSON STRING (the Post.lexical DB column) — parsed
; here with dream-json-parse into the SX dict shape blogimport/lex-blocks expects.
; (If a handler returns :lexical already-structured, it is used as-is.)
;
; REQUIRED BLOG-SIDE ADDITION (the one gap): blog/queries.sx exposes fetch-by-id/slug
; but NO enumeration query. The corpus (Q-D2 = every published post) needs a
; `published-posts` query returning the published ids/slugs (Python: list_posts(
; status="published"), blog/bp/blog/ghost_db.py:102). Flagged for the blog app; mocked
; in tests. Until it exists, callers can pass an explicit id list to backfill-ids!.
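; JSON parser dependency, bound under a blogimport/ name so parse-lexical has a
; single reference to it.
(define blogimport/dep-json-parse dream-json-parse)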
; --- lexical field -> SX dict (string from DB column, or already structured) -----
(define
blogimport/parse-lexical
(fn (lx)
(cond
((equal? lx nil) {:root {:children (list)}})
((string? lx) (blogimport/dep-json-parse lx))
(else lx))))
; --- service post-row -> importer `post` dict -----------------------------------
(define
blogimport/parse-row
(fn (row)
{:id (or (get row :uuid) (get row :id))
:slug (or (get row :slug) "")
:title (or (get row :title) "")
:status (or (get row :status) "")
:visibility (or (get row :visibility) "")
:tags (or (get row :tags) (list))
:authors (or (get row :authors) (list))
:lexical (blogimport/parse-lexical (get row :lexical))}))
; --- fetch one post via an internal-data query ----------------------------------
(define
blogimport/fetch-post
(fn (fetch-fn query params)
(blogimport/parse-row (fetch-fn query params))))
; --- enumerate published post ids (needs the `published-posts` query) -----------
(define
blogimport/published-ids
(fn (fetch-fn) (fetch-fn "published-posts" {})))
; --- fetch all published posts as importer `post` dicts -------------------------
(define
blogimport/source-posts
(fn (fetch-fn)
(map
(fn (id) (blogimport/fetch-post fetch-fn "post-by-id" {:id id}))
(blogimport/published-ids fetch-fn))))
; --- fetch an explicit id list (fallback before the enumeration query lands) ----
(define
blogimport/source-posts-by-ids
(fn (fetch-fn ids)
(map (fn (id) (blogimport/fetch-post fetch-fn "post-by-id" {:id id})) ids)))
; --- end-to-end drivers ---------------------------------------------------------
; backfill = enumerate -> fetch -> genesis-import (idempotent). Re-runnable as the
; one-way DB->persist sync (data-migration.md Strategy 1).
(define
blogimport/backfill!
(fn (b fetch-fn at)
(blogimport/import-all! b (blogimport/source-posts fetch-fn) at)))
(define
blogimport/backfill-ids!
(fn (b fetch-fn ids at)
(blogimport/import-all! b (blogimport/source-posts-by-ids fetch-fn ids) at)))
; sync-verify = enumerate -> fetch -> shadow-diff the persisted streams at rest.
(define
blogimport/sync-verify
(fn (b fetch-fn)
(blogimport/verify-all b (blogimport/source-posts fetch-fn))))


@@ -0,0 +1,83 @@
; lib/blogimport/tests/source.sx — live-source adapter (Q-M4 internal-data query)
(st-bootstrap-classes!)
(content-bootstrap-blocks!)
(content-bootstrap-doc!)
(content-bootstrap-callout!)
(content-bootstrap-media!)
; ---- canned service responses (lexical arrives as a JSON STRING, the DB column) ----
(define
lex1
"{\"root\":{\"children\":[{\"type\":\"heading\",\"tag\":\"h2\",\"children\":[{\"type\":\"text\",\"text\":\"Live\"}]},{\"type\":\"paragraph\",\"children\":[{\"type\":\"text\",\"text\":\"from db\"}]}]}}")
(define
row1
{:uuid "post-1" :slug "live" :title "Live" :status "published"
:visibility "public" :tags (list "x") :authors (list "u") :lexical lex1})
(define
row2
{:uuid "post-2" :slug "two" :title "Two" :status "published"
:lexical "{\"children\":[{\"type\":\"paragraph\",\"children\":[{\"type\":\"text\",\"text\":\"second\"}]}]}"})
; ---- mock transport: (fetch-fn query params) -> response ----
(define
mock-fetch
(fn (query params)
(cond
((equal? query "published-posts") (list "post-1" "post-2"))
((equal? query "post-by-id")
(cond
((equal? (get params :id) "post-1") row1)
((equal? (get params :id) "post-2") row2)
(else nil)))
(else nil))))
; ---- parse-row maps fields + parses the lexical JSON string ----
(define post1 (blogimport/parse-row row1))
(bi-test "parse-row id from uuid" (get post1 :id) "post-1")
(bi-test "parse-row title" (get post1 :title) "Live")
(bi-test "parse-row tags" (get post1 :tags) (list "x"))
(bi-test "parse-row lexical parsed to blocks"
(map blk-type (blogimport/lex-blocks (get post1 :lexical))) (list "heading" "text"))
; ---- id fallback (:id when no :uuid) + structured (non-string) lexical ----
(define
post3
(blogimport/parse-row
{:id "post-3" :slug "s3"
:lexical {:children (list {:type "paragraph" :children (list {:type "text" :text "x"})})}}))
(bi-test "parse-row id fallback" (get post3 :id) "post-3")
(bi-test "parse-row structured lexical used as-is"
(map blk-type (blogimport/lex-blocks (get post3 :lexical))) (list "text"))
; ---- enumeration + source-posts ----
(bi-test "published-ids" (blogimport/published-ids mock-fetch) (list "post-1" "post-2"))
(bi-test "source-posts ids"
(map (fn (p) (get p :id)) (blogimport/source-posts mock-fetch))
(list "post-1" "post-2"))
; ---- end-to-end backfill from the live source ----
(define B (persist/open))
(define cov (blogimport/backfill! B mock-fetch 10))
(bi-test "backfill total" (get cov :total) 2)
(bi-test "backfill imported" (get cov :imported) 2)
(bi-test "backfill post-1 version-count" (content/version-count B "post-1") 2)
(bi-test "backfill post-1 head ids" (doc-ids (content/head B "post-1")) (list "b0" "b1"))
(bi-test "backfill post-1 body text"
(str (blk-send (doc-find (content/head B "post-1") "b1") "text")) "from db")
(bi-test "backfill meta title" (get (blogimport/load-meta B "post-1") :title) "Live")
; ---- backfill is idempotent (one-way sync re-run) ----
(define cov2 (blogimport/backfill! B mock-fetch 11))
(bi-test "backfill rerun skipped" (get cov2 :skipped) 2)
; ---- sync-verify: persisted streams match the live-source oracle ----
(define sv (blogimport/sync-verify B mock-fetch))
(bi-test "sync-verify total" (get sv :total) 2)
(bi-test "sync-verify ok" (get sv :ok) 2)
(bi-test "sync-verify no mismatch" (get sv :mismatched) (list))
; ---- explicit-id fallback path (before the enumeration query lands) ----
(define B2 (persist/open))
(define covx (blogimport/backfill-ids! B2 mock-fetch (list "post-2") 10))
(bi-test "backfill-ids imported" (get covx :imported) 1)
(bi-test "backfill-ids post-2 ids" (doc-ids (content/head B2 "post-2")) (list "b0"))