diff --git a/lib/blogimport/README.md b/lib/blogimport/README.md
index 6bd12067..7a37c678 100644
--- a/lib/blogimport/README.md
+++ b/lib/blogimport/README.md
@@ -7,7 +7,8 @@ composes the public APIs of content-on-sx (`lib/content`) and persist
 (`lib/persist`). Kept in its own module (not `lib/host`, not `lib/content`) so it
 doesn't collide with the loops that own those.
 
-Status: **machinery complete, 55/55 conformance** (lexical 23, import 21, verify 11).
+Status: **machinery complete + live-source wired, 75/75 conformance**
+(lexical 23, import 21, verify 11, source 20).
 
 ## What it does
 
@@ -16,6 +17,7 @@ Status: **machinery complete, 55/55 conformance** (lexical 23, import 21, verify
 | `lexical.sx` | `blogimport/lex-blocks doc` — Ghost **lexical** body (as SX dicts) → content-on-sx **block list**, ids deterministic by position (`b0,b1,…`). |
 | `import.sx` | `blogimport/import-post! b post at` — genesis import: convert the post's lexical, commit blocks as ordered `op-insert`s into the `content:` op-log stream, record metadata in a sibling `postmeta:` stream. Idempotent (skip-if-exists). `import-all!` → coverage scoreboard. |
 | `verify.sx` | `blogimport/verify-post b post` — replay the stream → block model, diff vs the row-derived oracle with `=`. `verify-all` → `{:total :ok :mismatched}` coverage. |
+| `source.sx` | **Live source (Q-M4 = internal-data query).** Injected `fetch-fn` transport port; `parse-row` maps a service post-row → importer `post` dict and parses the `:lexical` JSON string (`dream-json-parse`). `backfill! b fetch-fn at` = enumerate → fetch → import; `sync-verify b fetch-fn` = enumerate → fetch → verify. `backfill-ids!` is the explicit-id fallback. |
 
 ## What is proven
 
@@ -32,16 +34,28 @@ is *detected*, not silently passed.
   The single swap-point is `lex-inline-text` in `lexical.sx` — return runs there once
   content-on-sx Phase 5 lands on `architecture`. Bold/italic/links currently collapse to their plain concatenation (drift-proof, == `asText`).
   (slice-01-blog Q-B1.)
--- **Oracle is the in-memory lexical→blocks, not the live Python block model.** This
-  proves round-trip fidelity through persist. The "does SX match Python" half of Q-D2
-  needs the **live source**: read real `Post` rows via the internal-data query
-  (`/internal/data/…`) or direct Postgres (**Q-M4**, undecided) and feed them as `post`
-  dicts. The diff plumbing here is the twin that step reuses.
+- **Q-M4 RESOLVED — live source = internal-data query** (`source.sx`), via an injected
+  `fetch-fn` port. The remaining real-world wiring is operational, not design:
+  1. **One blog-side query must be added**: `blog/queries.sx` has fetch-by-id/slug/ids
+     but **no enumeration query**. Add a `published-posts` defquery returning the
+     published ids/slugs (Python `list_posts(status="published")`,
+     `blog/bp/blog/ghost_db.py:102`). Until then, drive `backfill-ids!` with an explicit
+     id list. `source.sx` is mocked against this contract in `tests/source.sx`.
+  2. **Production `fetch-fn`** = the host's HMAC-signed `fetch_data` wrapper
+     (`GET /internal/data/{query}`). That wiring lives in `lib/host` (the host loop's
+     territory); `source.sx` only needs the port injected.
+  3. **Confirm the response field names** of the live `get-post-by-*` data handler
+     against `parse-row`'s contract (`:uuid|:id :slug :title :status :visibility :tags
+     :authors :lexical`); a mismatch is a one-line field fix.
+- **Oracle is the lexical→blocks of the SAME post, not the live Python block model.**
+  This proves round-trip fidelity through persist (no corruption at rest). The "does SX
+  match the *Python render*" half of Q-D2 would additionally diff against the Python
+  side's own block derivation — deferred with the read-path cutover.
 - **Re-import with an improved converter (Q-M5)** is import-once today (skip-if-exists).
   Superseding prior genesis events (vs truncate+re-import) is future work.

 ## Run
 
 ```bash
-bash lib/blogimport/conformance.sh # 55/55; writes scoreboard.{json,md}
+bash lib/blogimport/conformance.sh # 75/75; writes scoreboard.{json,md}
 ```
diff --git a/lib/blogimport/conformance.sh b/lib/blogimport/conformance.sh
index 465cd685..4fd97de2 100755
--- a/lib/blogimport/conformance.sh
+++ b/lib/blogimport/conformance.sh
@@ -16,7 +16,7 @@ if [ ! -x "$SX_SERVER" ]; then
   fi
 fi
 
-SUITES=(lexical import verify)
+SUITES=(lexical import verify source)
 OUT_JSON="lib/blogimport/scoreboard.json"
 OUT_MD="lib/blogimport/scoreboard.md"
 
@@ -49,9 +49,11 @@ run_suite() {
 (load "lib/content/callout.sx")
 (load "lib/content/media.sx")
 (load "lib/content/store.sx")
+(load "lib/dream/json.sx")
 (load "lib/blogimport/lexical.sx")
 (load "lib/blogimport/import.sx")
 (load "lib/blogimport/verify.sx")
+(load "lib/blogimport/source.sx")
 (epoch 2)
 (eval "(define bi-test-pass 0)")
 (eval "(define bi-test-fail 0)")
diff --git a/lib/blogimport/scoreboard.json b/lib/blogimport/scoreboard.json
index 29b5bb3a..5870bda1 100644
--- a/lib/blogimport/scoreboard.json
+++ b/lib/blogimport/scoreboard.json
@@ -2,9 +2,10 @@
   "suites": {
     "lexical": {"pass": 23, "fail": 0},
     "import": {"pass": 21, "fail": 0},
-    "verify": {"pass": 11, "fail": 0}
+    "verify": {"pass": 11, "fail": 0},
+    "source": {"pass": 20, "fail": 0}
   },
-  "total_pass": 55,
+  "total_pass": 75,
   "total_fail": 0,
-  "total": 55
+  "total": 75
 }
diff --git a/lib/blogimport/scoreboard.md b/lib/blogimport/scoreboard.md
index 3a05ff44..56dbd803 100644
--- a/lib/blogimport/scoreboard.md
+++ b/lib/blogimport/scoreboard.md
@@ -7,4 +7,5 @@ _Generated by `lib/blogimport/conformance.sh`_
 | lexical | 23 | 0 | 23 |
 | import | 21 | 0 | 21 |
 | verify | 11 | 0 | 11 |
-| **Total** | **55** | **0** | **55** |
+| source | 20 | 0 | 20 |
+| **Total** | **75** | **0** | **75** |
diff --git a/lib/blogimport/source.sx b/lib/blogimport/source.sx
new file mode 100644
index 00000000..143b849e
--- /dev/null
+++ b/lib/blogimport/source.sx
@@ -0,0 +1,92 @@
+; lib/blogimport/source.sx
+; Live source adapter — Q-M4 RESOLVED: import via the blog INTERNAL-DATA QUERY
+; surface (decoupled), not direct Postgres. Reuses the existing query contracts
+; (blog/queries.sx: post-by-id/post-by-slug/posts-by-ids) and keeps the importer in
+; the SX/host world (plans/migration/data-migration.md §7 recommended default).
+;
+; TRANSPORT SEAM (hexagonal, like every other subsystem): a `fetch-fn` port is
+; INJECTED. Contract:
+;   (fetch-fn query-name params-dict) -> response-data
+; In production `fetch-fn` is the host's HMAC-signed fetch_data wrapper
+; (GET /internal/data/{query}); in tests it's a mock. The importer never knows how
+; the bytes arrive.
+;
+; RESPONSE CONTRACT (one published-post row), the blog `get-post-by-*` data handler:
+;   {:uuid|:id :slug :title :status :visibility :tags :authors :lexical}
+; :lexical is the Ghost body as a JSON STRING (the Post.lexical DB column) — parsed
+; here with dream-json-parse into the SX dict shape blogimport/lex-blocks expects.
+; (If a handler returns :lexical already-structured, it is used as-is.)
+;
+; REQUIRED BLOG-SIDE ADDITION (the one gap): blog/queries.sx exposes fetch-by-id/slug
+; but NO enumeration query. The corpus (Q-D2 = every published post) needs a
+; `published-posts` query returning the published ids/slugs (Python: list_posts(
+; status="published"), blog/bp/blog/ghost_db.py:102). Flagged for the blog app; mocked
+; in tests. Until it exists, callers can pass an explicit id list to backfill-ids!.
+
+(define blogimport/dep-json-parse dream-json-parse)
+
+; --- lexical field -> SX dict (string from DB column, or already structured) -----
+(define
+  blogimport/parse-lexical
+  (fn (lx)
+    (cond
+      ((equal? lx nil) {:root {:children (list)}})
+      ((string? lx) (blogimport/dep-json-parse lx))
+      (else lx))))
+
+; --- service post-row -> importer `post` dict -----------------------------------
+(define
+  blogimport/parse-row
+  (fn (row)
+    {:id (or (get row :uuid) (get row :id))
+     :slug (or (get row :slug) "")
+     :title (or (get row :title) "")
+     :status (or (get row :status) "")
+     :visibility (or (get row :visibility) "")
+     :tags (or (get row :tags) (list))
+     :authors (or (get row :authors) (list))
+     :lexical (blogimport/parse-lexical (get row :lexical))}))
+
+; --- fetch one post via an internal-data query ----------------------------------
+(define
+  blogimport/fetch-post
+  (fn (fetch-fn query params)
+    (blogimport/parse-row (fetch-fn query params))))
+
+; --- enumerate published post ids (needs the `published-posts` query) -----------
+(define
+  blogimport/published-ids
+  (fn (fetch-fn) (fetch-fn "published-posts" {})))
+
+; --- fetch all published posts as importer `post` dicts -------------------------
+(define
+  blogimport/source-posts
+  (fn (fetch-fn)
+    (map
+      (fn (id) (blogimport/fetch-post fetch-fn "post-by-id" {:id id}))
+      (blogimport/published-ids fetch-fn))))
+
+; --- fetch an explicit id list (fallback before the enumeration query lands) ----
+(define
+  blogimport/source-posts-by-ids
+  (fn (fetch-fn ids)
+    (map (fn (id) (blogimport/fetch-post fetch-fn "post-by-id" {:id id})) ids)))
+
+; --- end-to-end drivers ---------------------------------------------------------
+; backfill = enumerate -> fetch -> genesis-import (idempotent). Re-runnable as the
+; one-way DB->persist sync (data-migration.md Strategy 1).
+(define
+  blogimport/backfill!
+  (fn (b fetch-fn at)
+    (blogimport/import-all! b (blogimport/source-posts fetch-fn) at)))
+
+(define
+  blogimport/backfill-ids!
+  (fn (b fetch-fn ids at)
+    (blogimport/import-all! b (blogimport/source-posts-by-ids fetch-fn ids) at)))
+
+; sync-verify = enumerate -> fetch -> shadow-diff the persisted streams at rest.
+(define + blogimport/sync-verify + (fn (b fetch-fn) + (blogimport/verify-all b (blogimport/source-posts fetch-fn)))) diff --git a/lib/blogimport/tests/source.sx b/lib/blogimport/tests/source.sx new file mode 100644 index 00000000..b80f3258 --- /dev/null +++ b/lib/blogimport/tests/source.sx @@ -0,0 +1,83 @@ +; lib/blogimport/tests/source.sx — live-source adapter (Q-M4 internal-data query) +(st-bootstrap-classes!) +(content-bootstrap-blocks!) +(content-bootstrap-doc!) +(content-bootstrap-callout!) +(content-bootstrap-media!) + +; ---- canned service responses (lexical arrives as a JSON STRING, the DB column) ---- +(define + lex1 + "{\"root\":{\"children\":[{\"type\":\"heading\",\"tag\":\"h2\",\"children\":[{\"type\":\"text\",\"text\":\"Live\"}]},{\"type\":\"paragraph\",\"children\":[{\"type\":\"text\",\"text\":\"from db\"}]}]}}") +(define + row1 + {:uuid "post-1" :slug "live" :title "Live" :status "published" + :visibility "public" :tags (list "x") :authors (list "u") :lexical lex1}) +(define + row2 + {:uuid "post-2" :slug "two" :title "Two" :status "published" + :lexical "{\"children\":[{\"type\":\"paragraph\",\"children\":[{\"type\":\"text\",\"text\":\"second\"}]}]}"}) + +; ---- mock transport: (fetch-fn query params) -> response ---- +(define + mock-fetch + (fn (query params) + (cond + ((equal? query "published-posts") (list "post-1" "post-2")) + ((equal? query "post-by-id") + (cond + ((equal? (get params :id) "post-1") row1) + ((equal? 
(get params :id) "post-2") row2) + (else nil))) + (else nil)))) + +; ---- parse-row maps fields + parses the lexical JSON string ---- +(define post1 (blogimport/parse-row row1)) +(bi-test "parse-row id from uuid" (get post1 :id) "post-1") +(bi-test "parse-row title" (get post1 :title) "Live") +(bi-test "parse-row tags" (get post1 :tags) (list "x")) +(bi-test "parse-row lexical parsed to blocks" + (map blk-type (blogimport/lex-blocks (get post1 :lexical))) (list "heading" "text")) + +; ---- id fallback (:id when no :uuid) + structured (non-string) lexical ---- +(define + post3 + (blogimport/parse-row + {:id "post-3" :slug "s3" + :lexical {:children (list {:type "paragraph" :children (list {:type "text" :text "x"})})}})) +(bi-test "parse-row id fallback" (get post3 :id) "post-3") +(bi-test "parse-row structured lexical used as-is" + (map blk-type (blogimport/lex-blocks (get post3 :lexical))) (list "text")) + +; ---- enumeration + source-posts ---- +(bi-test "published-ids" (blogimport/published-ids mock-fetch) (list "post-1" "post-2")) +(bi-test "source-posts ids" + (map (fn (p) (get p :id)) (blogimport/source-posts mock-fetch)) + (list "post-1" "post-2")) + +; ---- end-to-end backfill from the live source ---- +(define B (persist/open)) +(define cov (blogimport/backfill! B mock-fetch 10)) +(bi-test "backfill total" (get cov :total) 2) +(bi-test "backfill imported" (get cov :imported) 2) +(bi-test "backfill post-1 version-count" (content/version-count B "post-1") 2) +(bi-test "backfill post-1 head ids" (doc-ids (content/head B "post-1")) (list "b0" "b1")) +(bi-test "backfill post-1 body text" + (str (blk-send (doc-find (content/head B "post-1") "b1") "text")) "from db") +(bi-test "backfill meta title" (get (blogimport/load-meta B "post-1") :title) "Live") + +; ---- backfill is idempotent (one-way sync re-run) ---- +(define cov2 (blogimport/backfill! 
B mock-fetch 11)) +(bi-test "backfill rerun skipped" (get cov2 :skipped) 2) + +; ---- sync-verify: persisted streams match the live-source oracle ---- +(define sv (blogimport/sync-verify B mock-fetch)) +(bi-test "sync-verify total" (get sv :total) 2) +(bi-test "sync-verify ok" (get sv :ok) 2) +(bi-test "sync-verify no mismatch" (get sv :mismatched) (list)) + +; ---- explicit-id fallback path (before the enumeration query lands) ---- +(define B2 (persist/open)) +(define covx (blogimport/backfill-ids! B2 mock-fetch (list "post-2") 10)) +(bi-test "backfill-ids imported" (get covx :imported) 1) +(bi-test "backfill-ids post-2 ids" (doc-ids (content/head B2 "post-2")) (list "b0"))
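For readers coming from the Python blog app, the contract that `source.sx` implements can be sketched as a small Python analogue. This is a hypothetical illustration, not the SX implementation: `FetchFn`, `source_posts`, `backfill`, `ROWS`, and the in-memory `store` dict (standing in for persist's `content:`/`postmeta:` streams) are invented names, and the real `import-all!` converts lexical to blocks and commits op-log events rather than storing the parsed post. It mirrors three behaviours from the patch: `parse-row`'s field defaults with the `:uuid` to `:id` fallback, lexical arriving as a JSON string, and skip-if-exists idempotency. It also assumes `published-posts` returns post ids, since the driver passes each element straight to `post-by-id`.

```python
import json
from typing import Any, Callable, Dict, List, Optional

# Hypothetical Python analogue of lib/blogimport/source.sx (not the real code).
# Port contract mirrors the SX one: fetch_fn(query_name, params) -> response.
FetchFn = Callable[[str, Dict[str, Any]], Any]


def parse_lexical(lx: Any) -> Any:
    """None -> empty root; JSON string (the DB column) -> parsed; else as-is."""
    if lx is None:
        return {"root": {"children": []}}
    if isinstance(lx, str):
        return json.loads(lx)
    return lx


def parse_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one service post-row to the importer's `post` shape, with defaults."""
    return {
        "id": row.get("uuid") or row.get("id"),
        "slug": row.get("slug") or "",
        "title": row.get("title") or "",
        "status": row.get("status") or "",
        "visibility": row.get("visibility") or "",
        "tags": row.get("tags") or [],
        "authors": row.get("authors") or [],
        "lexical": parse_lexical(row.get("lexical")),
    }


def source_posts(fetch_fn: FetchFn,
                 ids: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Enumerate via `published-posts` unless an explicit id list is given.

    Assumes `published-posts` yields post ids, since each is fed to `post-by-id`.
    """
    if ids is None:
        ids = fetch_fn("published-posts", {})
    return [parse_row(fetch_fn("post-by-id", {"id": i})) for i in ids]


def backfill(store: Dict[str, Dict[str, Any]], fetch_fn: FetchFn,
             ids: Optional[List[str]] = None) -> Dict[str, int]:
    """Idempotent genesis import: a post already in `store` is skipped."""
    cov = {"total": 0, "imported": 0, "skipped": 0}
    for post in source_posts(fetch_fn, ids):
        cov["total"] += 1
        if post["id"] in store:
            cov["skipped"] += 1
        else:
            store[post["id"]] = post  # stand-in for committing op-log events
            cov["imported"] += 1
    return cov


# Mock transport in the spirit of tests/source.sx (hypothetical rows).
ROWS = {
    "post-1": {"uuid": "post-1", "title": "Live",
               "lexical": '{"root": {"children": [{"type": "paragraph"}]}}'},
    "post-2": {"id": "post-2", "title": "Two", "lexical": None},
}


def mock_fetch(query: str, params: Dict[str, Any]) -> Any:
    if query == "published-posts":
        return list(ROWS)
    if query == "post-by-id":
        return ROWS.get(params["id"])
    return None


if __name__ == "__main__":
    store: Dict[str, Dict[str, Any]] = {}
    print(backfill(store, mock_fetch))  # {'total': 2, 'imported': 2, 'skipped': 0}
    print(backfill(store, mock_fetch))  # {'total': 2, 'imported': 0, 'skipped': 2}
```

Keeping `fetch_fn` injected means the same driver runs against `mock_fetch` here and, by assumption, against a signed HTTP client in production; the transport never leaks into the import logic, which is the point of the port in `source.sx`.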