blogimport: Q-M4 live source — internal-data query adapter (75/75)
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m5s
source.sx: live-source adapter resolving Q-M4 (internal-data query, not direct PG).

Injected fetch-fn transport port (hexagonal seam); parse-row maps a blog post-row to the importer post dict and parses the :lexical JSON string via dream-json-parse. End-to-end drivers: backfill! (enumerate->fetch->import) and sync-verify (enumerate->fetch->verify), + backfill-ids! explicit-id fallback. Tests mock the transport against the documented response contract incl. a real lexical JSON string. README flags the one blog-side gap (add a published-posts enumeration query) + production fetch_data wiring (lives in lib/host). source 20/20; total 75/75.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -7,7 +7,8 @@ composes the public APIs of content-on-sx (`lib/content`) and persist
 (`lib/persist`). Kept in its own module (not `lib/host`, not `lib/content`) so it
 doesn't collide with the loops that own those.
 
-Status: **machinery complete, 55/55 conformance** (lexical 23, import 21, verify 11).
+Status: **machinery complete + live-source wired, 75/75 conformance**
+(lexical 23, import 21, verify 11, source 20).
 
 ## What it does
 
@@ -16,6 +17,7 @@ Status: **machinery complete, 55/55 conformance** (lexical 23, import 21, verify
 | `lexical.sx` | `blogimport/lex-blocks doc` — Ghost **lexical** body (as SX dicts) → content-on-sx **block list**, ids deterministic by position (`b0,b1,…`). |
 | `import.sx` | `blogimport/import-post! b post at` — genesis import: convert the post's lexical, commit blocks as ordered `op-insert`s into the `content:<id>` op-log stream, record metadata in a sibling `postmeta:<id>` stream. Idempotent (skip-if-exists). `import-all!` → coverage scoreboard. |
 | `verify.sx` | `blogimport/verify-post b post` — replay the stream → block model, diff vs the row-derived oracle with `=`. `verify-all` → `{:total :ok :mismatched}` coverage. |
+| `source.sx` | **Live source (Q-M4 = internal-data query).** Injected `fetch-fn` transport port; `parse-row` maps a service post-row → importer `post` dict and parses the `:lexical` JSON string (`dream-json-parse`). `backfill! b fetch-fn at` = enumerate → fetch → import; `sync-verify b fetch-fn` = enumerate → fetch → verify. `backfill-ids!` is the explicit-id fallback. |
 
 ## What is proven
 
@@ -32,16 +34,28 @@ is *detected*, not silently passed.
   The single swap-point is `lex-inline-text` in `lexical.sx` — return runs there once
   content-on-sx Phase 5 lands on `architecture`. Bold/italic/links currently collapse
   to their plain concatenation (drift-proof, == `asText`). (slice-01-blog Q-B1.)
-- **Oracle is the in-memory lexical→blocks, not the live Python block model.** This
-  proves round-trip fidelity through persist. The "does SX match Python" half of Q-D2
-  needs the **live source**: read real `Post` rows via the internal-data query
-  (`/internal/data/…`) or direct Postgres (**Q-M4**, undecided) and feed them as `post`
-  dicts. The diff plumbing here is the twin that step reuses.
+- **Q-M4 RESOLVED — live source = internal-data query** (`source.sx`), via an injected
+  `fetch-fn` port. The remaining real-world wiring is operational, not design:
+  1. **One blog-side query must be added**: `blog/queries.sx` has fetch-by-id/slug/ids
+     but **no enumeration query**. Add a `published-posts` defquery returning the
+     published ids/slugs (Python `list_posts(status="published")`,
+     `blog/bp/blog/ghost_db.py:102`). Until then, drive `backfill-ids!` with an explicit
+     id list. `source.sx` is mocked against this contract in `tests/source.sx`.
+  2. **Production `fetch-fn`** = the host's HMAC-signed `fetch_data` wrapper
+     (`GET /internal/data/{query}`). That wiring lives in `lib/host` (the host loop's
+     territory); `source.sx` only needs the port injected.
+  3. **Confirm the response field names** of the live `get-post-by-*` data handler
+     against `parse-row`'s contract (`:uuid|:id :slug :title :status :visibility :tags
+     :authors :lexical`); a mismatch is a one-line field fix.
+- **Oracle is the lexical→blocks of the SAME post, not the live Python block model.**
+  This proves round-trip fidelity through persist (no corruption at rest). The "does SX
+  match the *Python render*" half of Q-D2 would additionally diff against the Python
+  side's own block derivation — deferred with the read-path cutover.
 - **Re-import with an improved converter (Q-M5)** is import-once today (skip-if-exists).
   Superseding prior genesis events (vs truncate+re-import) is future work.
 
 ## Run
 
 ```bash
-bash lib/blogimport/conformance.sh # 55/55; writes scoreboard.{json,md}
+bash lib/blogimport/conformance.sh # 75/75; writes scoreboard.{json,md}
 ```
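Reviewer note: item 3 in the README hunk above (confirming the live handler's field names against `parse-row`'s contract) could be checked mechanically before a backfill. The following is a minimal Python sketch and not part of this commit. `CONTRACT_FIELDS` and `missing_fields` are hypothetical names, and the sketch assumes the response row arrives as a plain dict with string keys.

```python
# Hypothetical pre-flight check; the field names come from parse-row's documented contract.
CONTRACT_FIELDS = ("slug", "title", "status", "visibility", "tags", "authors", "lexical")


def missing_fields(row: dict) -> list:
    """Return the contract fields that are absent from a handler response row."""
    missing = [name for name in CONTRACT_FIELDS if name not in row]
    # The id may arrive as either "uuid" or "id"; one of the two is enough.
    if "uuid" not in row and "id" not in row:
        missing.insert(0, "uuid|id")
    return missing


if __name__ == "__main__":
    print(missing_fields({"uuid": "post-1", "slug": "live", "title": "Live"}))
```

An empty result means the row satisfies the contract as documented; note that `parse-row` itself tolerates absent fields by defaulting them, so a check like this only surfaces silent renames.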
@@ -16,7 +16,7 @@ if [ ! -x "$SX_SERVER" ]; then
 fi
 fi
 
-SUITES=(lexical import verify)
+SUITES=(lexical import verify source)
 
 OUT_JSON="lib/blogimport/scoreboard.json"
 OUT_MD="lib/blogimport/scoreboard.md"
@@ -49,9 +49,11 @@ run_suite() {
 (load "lib/content/callout.sx")
 (load "lib/content/media.sx")
 (load "lib/content/store.sx")
+(load "lib/dream/json.sx")
 (load "lib/blogimport/lexical.sx")
 (load "lib/blogimport/import.sx")
 (load "lib/blogimport/verify.sx")
+(load "lib/blogimport/source.sx")
 (epoch 2)
 (eval "(define bi-test-pass 0)")
 (eval "(define bi-test-fail 0)")
@@ -2,9 +2,10 @@
   "suites": {
     "lexical": {"pass": 23, "fail": 0},
     "import": {"pass": 21, "fail": 0},
-    "verify": {"pass": 11, "fail": 0}
+    "verify": {"pass": 11, "fail": 0},
+    "source": {"pass": 20, "fail": 0}
   },
-  "total_pass": 55,
+  "total_pass": 75,
   "total_fail": 0,
-  "total": 55
+  "total": 75
 }
@@ -7,4 +7,5 @@ _Generated by `lib/blogimport/conformance.sh`_
 | lexical | 23 | 0 | 23 |
 | import | 21 | 0 | 21 |
 | verify | 11 | 0 | 11 |
-| **Total** | **55** | **0** | **55** |
+| source | 20 | 0 | 20 |
+| **Total** | **75** | **0** | **75** |
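Reviewer note: the new totals follow from the per-suite counts (23 + 21 + 11 + 20 = 75). A short Python sketch of that arithmetic is below; the `totals` helper is hypothetical and the JSON literal simply restates the post-commit scoreboard values shown in the hunk above.

```python
import json

# Post-commit scoreboard values, restated from the diff above.
scoreboard = json.loads("""
{"suites": {"lexical": {"pass": 23, "fail": 0},
            "import":  {"pass": 21, "fail": 0},
            "verify":  {"pass": 11, "fail": 0},
            "source":  {"pass": 20, "fail": 0}},
 "total_pass": 75, "total_fail": 0, "total": 75}
""")


def totals(suites: dict) -> tuple:
    """Sum per-suite pass/fail counts into (passed, failed, total)."""
    passed = sum(s["pass"] for s in suites.values())
    failed = sum(s["fail"] for s in suites.values())
    return passed, failed, passed + failed


if __name__ == "__main__":
    print(totals(scoreboard["suites"]))
```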
92
lib/blogimport/source.sx
Normal file
@@ -0,0 +1,92 @@
+; lib/blogimport/source.sx
+; Live source adapter — Q-M4 RESOLVED: import via the blog INTERNAL-DATA QUERY
+; surface (decoupled), not direct Postgres. Reuses the existing query contracts
+; (blog/queries.sx: post-by-id/post-by-slug/posts-by-ids) and keeps the importer in
+; the SX/host world (plans/migration/data-migration.md §7 recommended default).
+;
+; TRANSPORT SEAM (hexagonal, like every other subsystem): a `fetch-fn` port is
+; INJECTED. Contract:
+;   (fetch-fn query-name params-dict) -> response-data
+; In production `fetch-fn` is the host's HMAC-signed fetch_data wrapper
+; (GET /internal/data/{query}); in tests it's a mock. The importer never knows how
+; the bytes arrive.
+;
+; RESPONSE CONTRACT (one published-post row), the blog `get-post-by-*` data handler:
+;   {:uuid|:id :slug :title :status :visibility :tags :authors :lexical}
+; :lexical is the Ghost body as a JSON STRING (the Post.lexical DB column) — parsed
+; here with dream-json-parse into the SX dict shape blogimport/lex-blocks expects.
+; (If a handler returns :lexical already-structured, it is used as-is.)
+;
+; REQUIRED BLOG-SIDE ADDITION (the one gap): blog/queries.sx exposes fetch-by-id/slug
+; but NO enumeration query. The corpus (Q-D2 = every published post) needs a
+; `published-posts` query returning the published ids/slugs (Python: list_posts(
+; status="published"), blog/bp/blog/ghost_db.py:102). Flagged for the blog app; mocked
+; in tests. Until it exists, callers can pass an explicit id list to backfill-ids!.
+
+(define blogimport/dep-json-parse dream-json-parse)
+
+; --- lexical field -> SX dict (string from DB column, or already structured) -----
+(define
+  blogimport/parse-lexical
+  (fn (lx)
+    (cond
+      ((equal? lx nil) {:root {:children (list)}})
+      ((string? lx) (blogimport/dep-json-parse lx))
+      (else lx))))
+
+; --- service post-row -> importer `post` dict -----------------------------------
+(define
+  blogimport/parse-row
+  (fn (row)
+    {:id (or (get row :uuid) (get row :id))
+     :slug (or (get row :slug) "")
+     :title (or (get row :title) "")
+     :status (or (get row :status) "")
+     :visibility (or (get row :visibility) "")
+     :tags (or (get row :tags) (list))
+     :authors (or (get row :authors) (list))
+     :lexical (blogimport/parse-lexical (get row :lexical))}))
+
+; --- fetch one post via an internal-data query ----------------------------------
+(define
+  blogimport/fetch-post
+  (fn (fetch-fn query params)
+    (blogimport/parse-row (fetch-fn query params))))
+
+; --- enumerate published post ids (needs the `published-posts` query) -----------
+(define
+  blogimport/published-ids
+  (fn (fetch-fn) (fetch-fn "published-posts" {})))
+
+; --- fetch all published posts as importer `post` dicts -------------------------
+(define
+  blogimport/source-posts
+  (fn (fetch-fn)
+    (map
+      (fn (id) (blogimport/fetch-post fetch-fn "post-by-id" {:id id}))
+      (blogimport/published-ids fetch-fn))))
+
+; --- fetch an explicit id list (fallback before the enumeration query lands) ----
+(define
+  blogimport/source-posts-by-ids
+  (fn (fetch-fn ids)
+    (map (fn (id) (blogimport/fetch-post fetch-fn "post-by-id" {:id id})) ids)))
+
+; --- end-to-end drivers ---------------------------------------------------------
+; backfill = enumerate -> fetch -> genesis-import (idempotent). Re-runnable as the
+; one-way DB->persist sync (data-migration.md Strategy 1).
+(define
+  blogimport/backfill!
+  (fn (b fetch-fn at)
+    (blogimport/import-all! b (blogimport/source-posts fetch-fn) at)))
+
+(define
+  blogimport/backfill-ids!
+  (fn (b fetch-fn ids at)
+    (blogimport/import-all! b (blogimport/source-posts-by-ids fetch-fn ids) at)))
+
+; sync-verify = enumerate -> fetch -> shadow-diff the persisted streams at rest.
+(define
+  blogimport/sync-verify
+  (fn (b fetch-fn)
+    (blogimport/verify-all b (blogimport/source-posts fetch-fn))))
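Reviewer note: for readers on the Python side of the blog, the row-mapping behaviour of `parse-lexical` and `parse-row` above can be mirrored roughly as follows. This is an illustrative sketch, not code from this commit: the names `parse_lexical` and `parse_row` are hypothetical, string keys stand in for SX keywords, and the standard `json` module stands in for `dream-json-parse`.

```python
import json


def parse_lexical(lx):
    """Mirror of parse-lexical: None -> empty doc, str -> parsed JSON, else as-is."""
    if lx is None:
        return {"root": {"children": []}}
    if isinstance(lx, str):
        return json.loads(lx)
    return lx


def parse_row(row: dict) -> dict:
    """Mirror of parse-row: map a service post-row to the importer post dict."""
    return {
        "id": row.get("uuid") or row.get("id"),  # prefer uuid, fall back to id
        "slug": row.get("slug") or "",
        "title": row.get("title") or "",
        "status": row.get("status") or "",
        "visibility": row.get("visibility") or "",
        "tags": row.get("tags") or [],
        "authors": row.get("authors") or [],
        "lexical": parse_lexical(row.get("lexical")),
    }


if __name__ == "__main__":
    lexical = '{"root":{"children":[{"type":"paragraph","children":[{"type":"text","text":"from db"}]}]}}'
    post = parse_row({"uuid": "post-1", "slug": "live", "title": "Live", "lexical": lexical})
    print(post["id"], [child["type"] for child in post["lexical"]["root"]["children"]])
```

As in the SX version, every missing field falls back to an empty default, so a renamed field on the handler side produces an empty value rather than an error.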
83
lib/blogimport/tests/source.sx
Normal file
@@ -0,0 +1,83 @@
+; lib/blogimport/tests/source.sx — live-source adapter (Q-M4 internal-data query)
+(st-bootstrap-classes!)
+(content-bootstrap-blocks!)
+(content-bootstrap-doc!)
+(content-bootstrap-callout!)
+(content-bootstrap-media!)
+
+; ---- canned service responses (lexical arrives as a JSON STRING, the DB column) ----
+(define
+  lex1
+  "{\"root\":{\"children\":[{\"type\":\"heading\",\"tag\":\"h2\",\"children\":[{\"type\":\"text\",\"text\":\"Live\"}]},{\"type\":\"paragraph\",\"children\":[{\"type\":\"text\",\"text\":\"from db\"}]}]}}")
+(define
+  row1
+  {:uuid "post-1" :slug "live" :title "Live" :status "published"
+   :visibility "public" :tags (list "x") :authors (list "u") :lexical lex1})
+(define
+  row2
+  {:uuid "post-2" :slug "two" :title "Two" :status "published"
+   :lexical "{\"children\":[{\"type\":\"paragraph\",\"children\":[{\"type\":\"text\",\"text\":\"second\"}]}]}"})
+
+; ---- mock transport: (fetch-fn query params) -> response ----
+(define
+  mock-fetch
+  (fn (query params)
+    (cond
+      ((equal? query "published-posts") (list "post-1" "post-2"))
+      ((equal? query "post-by-id")
+        (cond
+          ((equal? (get params :id) "post-1") row1)
+          ((equal? (get params :id) "post-2") row2)
+          (else nil)))
+      (else nil))))
+
+; ---- parse-row maps fields + parses the lexical JSON string ----
+(define post1 (blogimport/parse-row row1))
+(bi-test "parse-row id from uuid" (get post1 :id) "post-1")
+(bi-test "parse-row title" (get post1 :title) "Live")
+(bi-test "parse-row tags" (get post1 :tags) (list "x"))
+(bi-test "parse-row lexical parsed to blocks"
+  (map blk-type (blogimport/lex-blocks (get post1 :lexical))) (list "heading" "text"))
+
+; ---- id fallback (:id when no :uuid) + structured (non-string) lexical ----
+(define
+  post3
+  (blogimport/parse-row
+    {:id "post-3" :slug "s3"
+     :lexical {:children (list {:type "paragraph" :children (list {:type "text" :text "x"})})}}))
+(bi-test "parse-row id fallback" (get post3 :id) "post-3")
+(bi-test "parse-row structured lexical used as-is"
+  (map blk-type (blogimport/lex-blocks (get post3 :lexical))) (list "text"))
+
+; ---- enumeration + source-posts ----
+(bi-test "published-ids" (blogimport/published-ids mock-fetch) (list "post-1" "post-2"))
+(bi-test "source-posts ids"
+  (map (fn (p) (get p :id)) (blogimport/source-posts mock-fetch))
+  (list "post-1" "post-2"))
+
+; ---- end-to-end backfill from the live source ----
+(define B (persist/open))
+(define cov (blogimport/backfill! B mock-fetch 10))
+(bi-test "backfill total" (get cov :total) 2)
+(bi-test "backfill imported" (get cov :imported) 2)
+(bi-test "backfill post-1 version-count" (content/version-count B "post-1") 2)
+(bi-test "backfill post-1 head ids" (doc-ids (content/head B "post-1")) (list "b0" "b1"))
+(bi-test "backfill post-1 body text"
+  (str (blk-send (doc-find (content/head B "post-1") "b1") "text")) "from db")
+(bi-test "backfill meta title" (get (blogimport/load-meta B "post-1") :title) "Live")
+
+; ---- backfill is idempotent (one-way sync re-run) ----
+(define cov2 (blogimport/backfill! B mock-fetch 11))
+(bi-test "backfill rerun skipped" (get cov2 :skipped) 2)
+
+; ---- sync-verify: persisted streams match the live-source oracle ----
+(define sv (blogimport/sync-verify B mock-fetch))
+(bi-test "sync-verify total" (get sv :total) 2)
+(bi-test "sync-verify ok" (get sv :ok) 2)
+(bi-test "sync-verify no mismatch" (get sv :mismatched) (list))
+
+; ---- explicit-id fallback path (before the enumeration query lands) ----
+(define B2 (persist/open))
+(define covx (blogimport/backfill-ids! B2 mock-fetch (list "post-2") 10))
+(bi-test "backfill-ids imported" (get covx :imported) 1)
+(bi-test "backfill-ids post-2 ids" (doc-ids (content/head B2 "post-2")) (list "b0"))
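Reviewer note: the injected-transport pattern these tests exercise (enumerate, fetch, import, with skip-if-exists on re-run) can be summarised in a small Python sketch. It is illustrative only and makes loud simplifications: `make_mock_fetch` and `backfill` are hypothetical names, a plain dict stands in for the persist store, and storing the row stands in for the real genesis import of `op-insert` events.

```python
def make_mock_fetch(rows: dict):
    """Build a fake transport honouring the (query, params) -> response contract."""
    def fetch(query, params):
        if query == "published-posts":
            return list(rows)  # enumeration: the known post ids, in insertion order
        if query == "post-by-id":
            return rows.get(params["id"])
        return None
    return fetch


def backfill(store: dict, fetch_fn, ids=None) -> dict:
    """Enumerate -> fetch -> import, skipping posts that are already stored."""
    if ids is None:
        # No explicit id list: rely on the enumeration query.
        ids = fetch_fn("published-posts", {})
    coverage = {"total": 0, "imported": 0, "skipped": 0}
    for post_id in ids:
        row = fetch_fn("post-by-id", {"id": post_id})
        coverage["total"] += 1
        if post_id in store:
            coverage["skipped"] += 1  # idempotent: import once, then skip
        else:
            store[post_id] = row  # stand-in for the real genesis import
            coverage["imported"] += 1
    return coverage


if __name__ == "__main__":
    fetch = make_mock_fetch({"post-1": {"title": "Live"}, "post-2": {"title": "Two"}})
    store = {}
    print(backfill(store, fetch))
    print(backfill(store, fetch))
```

Because the importer only sees `fetch_fn`, swapping the mock for a signed HTTP client would not change the driver, which is the point of keeping the transport behind a port.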