Merge loops/blogimport into architecture: blog Postgres->persist genesis-import + parity verifier + Q-M4 live source (76/76)

lib/blogimport — data-migration tooling (plans/migration/data-migration.md): lexical
-> content blocks, genesis import into content:<id> op-log, at-rest shadow-diff verify,
and the Q-M4 internal-data-query live source (injected fetch-fn). Additive (new dir);
composes content-on-sx + persist + dream-json. drafts/ holds the blog-side query to add.
For loops/host to consume when ready.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 14:23:42 +00:00
14 changed files with 969 additions and 0 deletions

lib/blogimport/README.md Normal file

@@ -0,0 +1,61 @@
# lib/blogimport — blog Postgres → persist genesis-import + parity verifier
Implements **`plans/migration/data-migration.md`** (the "long-pole nobody had
started") and the at-rest half of **`slice-01-blog.md` §4** — the data layer of the
blog read-path migration. Host-ops migration tooling, **not** a domain core: it
composes the public APIs of content-on-sx (`lib/content`) and persist
(`lib/persist`). Kept in its own module (not `lib/host`, not `lib/content`) so it
doesn't collide with the loops that own those.
Status: **machinery complete + live-source wired, 76/76 conformance**
(lexical 23, import 21, verify 11, source 21).
## What it does
| Module | Role |
|---|---|
| `lexical.sx` | `blogimport/lex-blocks doc` — Ghost **lexical** body (as SX dicts) → content-on-sx **block list**, ids deterministic by position (`b0,b1,…`). |
| `import.sx` | `blogimport/import-post! b post at` — genesis import: convert the post's lexical, commit blocks as ordered `op-insert`s into the `content:<id>` op-log stream, record metadata in a sibling `postmeta:<id>` stream. Idempotent (skip-if-exists). `import-all!` → coverage scoreboard. |
| `verify.sx` | `blogimport/verify-post b post` — replay the stream → block model, diff vs the row-derived oracle with `=`. `verify-all` → `{:total :ok :mismatched}` coverage. |
| `source.sx` | **Live source (Q-M4 = internal-data query).** Injected `fetch-fn` transport port; `parse-row` maps a service post-row → importer `post` dict and parses the `:lexical` JSON string (`dream-json-parse`). `backfill! b fetch-fn at` = enumerate → fetch → import; `sync-verify b fetch-fn` = enumerate → fetch → verify. `backfill-ids! b fetch-fn ids at` restricts the backfill to an explicit id list (client-side filter over the same batch). |
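
The converter row above can be illustrated outside SX. The following is a minimal Python sketch, assuming plain dicts in place of SX dicts; `inline_text` and `lex_blocks` are hypothetical mirrors of `lex-inline-text` and `blogimport/lex-blocks`, and only paragraphs and headings are mapped (the real converter handles more node types before falling back to an embed).

```python
# Hypothetical Python mirror of lexical.sx (illustration only, not in the commit).

def inline_text(children):
    """Flatten inline nodes to one plain string; formatting marks are dropped."""
    parts = []
    for node in children:
        kind = node.get("type")
        if kind in ("text", "code-highlight"):
            parts.append(node.get("text") or "")
        elif kind == "linebreak":
            parts.append("\n")
        elif kind == "tab":
            parts.append("\t")
        elif kind in ("link", "autolink", "at-link"):
            parts.append(inline_text(node.get("children") or []))
        # any other inline node contributes nothing
    return "".join(parts)


def lex_blocks(doc):
    """Top-level lexical nodes -> blocks with positional ids b0, b1, ..."""
    root = doc.get("root")
    nodes = (root if root is not None else doc).get("children") or []
    levels = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}
    blocks = []
    for i, node in enumerate(nodes):
        kind = node.get("type")
        text = inline_text(node.get("children") or [])
        if kind == "paragraph":
            blocks.append({"id": f"b{i}", "type": "text", "text": text})
        elif kind == "heading":
            level = levels.get(node.get("tag"), 2)
            blocks.append({"id": f"b{i}", "type": "heading", "level": level, "text": text})
        else:
            # simplification: everything else becomes an embed tagged by its type
            blocks.append({"id": f"b{i}", "type": "embed", "provider": kind or "unknown"})
    return blocks


doc = {"root": {"children": [
    {"type": "heading", "tag": "h2", "children": [{"type": "text", "text": "Title"}]},
    {"type": "paragraph", "children": [
        {"type": "text", "text": "plain "},
        {"type": "text", "text": "bold", "format": 1},
        {"type": "text", "text": " then "},
        {"type": "link", "url": "/x", "children": [{"type": "text", "text": "a link"}]},
    ]},
    {"type": "twitter", "url": "https://t/x"},
]}}
blocks = lex_blocks(doc)
```

Because ids depend only on top-level position, converting the same document twice yields the same block sequence, which is what the re-import ordering rule relies on.
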
## What is proven
The verifier holds **`lexical → import → persist → replay → block-model`** equal to
**`lexical → block-model`** computed directly. I.e. **the genesis import + op-log
replay is lossless** — "did the backfill corrupt anything" at rest
(`data-migration.md` §6). The `verify.sx` corruption test confirms a diverging stream
is *detected*, not silently passed.
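
The round-trip claim can be sketched in a few lines. This is a simplified Python model, assuming an in-memory list stands in for the persist op-log; `genesis_ops`, `replay`, and `verify` are hypothetical names that only approximate `blogimport/genesis-ops` and `verify.sx`.

```python
# Simplified model of import -> replay -> compare (not the SX implementation).

def genesis_ops(blocks):
    """One insert op per block: first after None, each next after the previous id."""
    ops, prev = [], None
    for blk in blocks:
        ops.append({"op": "insert", "block": blk, "after": prev})
        prev = blk["id"]
    return ops


def replay(ops):
    """Rebuild the block list by applying insert ops in stream order."""
    doc = []
    for op in ops:
        if op["after"] is None:
            doc.insert(0, op["block"])  # prepend
        else:
            idx = next(i for i, b in enumerate(doc) if b["id"] == op["after"])
            doc.insert(idx + 1, op["block"])
    return doc


def verify(stream_ops, oracle_blocks):
    """True when the replayed stream equals the directly derived block list."""
    return replay(stream_ops) == oracle_blocks


blocks = [{"id": "b0", "text": "Hello"}, {"id": "b1", "text": "world"}]
ops = genesis_ops(blocks)

# A tampered stream must be reported as a mismatch, not silently accepted.
corrupted = [
    dict(op, block=dict(op["block"], text="tampered")) if i == 1 else op
    for i, op in enumerate(ops)
]
```

Here `verify(ops, blocks)` holds and `verify(corrupted, blocks)` does not, which is the same property the corruption test in `verify.sx` is described as checking.
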
## Known limitations / TODO (carry into the plan)
- **Inline formatting is flattened to plain text.** Architecture's content model holds
plain-string text (`mk-text id text`); Phase-5 rich inline runs are not merged here.
The single swap-point is `lex-inline-text` in `lexical.sx` — return runs there once
content-on-sx Phase 5 lands on `architecture`. Bold/italic/links currently collapse
to their plain concatenation (drift-proof, == `asText`). (slice-01-blog Q-B1.)
- **Q-M4 RESOLVED — live source = internal-data query** (`source.sx`), via an injected
`fetch-fn` port. The remaining real-world wiring is operational, not design:
1. **One blog-side query must be added**: `blog/queries.sx` has fetch-by-id/slug/ids
but **no enumeration query**. Add a `published-posts` defquery returning the
published ids/slugs (Python `list_posts(status="published")`,
`blog/bp/blog/ghost_db.py:102`). Until then, drive `backfill-ids!` with an explicit
id list. `source.sx` is mocked against this contract in `tests/source.sx`.
2. **Production `fetch-fn`** = the host's HMAC-signed `fetch_data` wrapper
(`GET /internal/data/{query}`). That wiring lives in `lib/host` (the host loop's
territory); `source.sx` only needs the port injected.
3. **Confirm the response field names** of the live `get-post-by-*` data handler
against `parse-row`'s contract (`:uuid|:id :slug :title :status :visibility :tags
:authors :lexical`); a mismatch is a one-line field fix.
- **Oracle is the lexical→blocks of the SAME post, not the live Python block model.**
This proves round-trip fidelity through persist (no corruption at rest). The "does SX
match the *Python render*" half of Q-D2 would additionally diff against the Python
side's own block derivation — deferred with the read-path cutover.
- **Re-import with an improved converter (Q-M5) is not supported yet.** The importer
is import-once (skip-if-exists); superseding prior genesis events (vs
truncate+re-import) is future work.
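
The import-once behaviour mentioned in the last bullet can be modelled as follows. This is a sketch under stated assumptions: a plain dict stands in for the persist backend, posts carry pre-converted `blocks`, and `import_post` / `import_all` are hypothetical stand-ins for `import-post!` / `import-all!`.

```python
# Skip-if-exists importer over a dict-backed store (illustrative model only).

def imported(store, post_id):
    """A content stream that already holds events counts as imported."""
    return len(store.get(f"content:{post_id}", [])) > 0


def import_post(store, post, at):
    post_id = post["id"]
    if imported(store, post_id):
        return {"id": post_id, "imported": False, "reason": "exists"}
    blocks = post["blocks"]
    store[f"content:{post_id}"] = [
        {"at": at, "op": "insert", "block": b} for b in blocks
    ]
    # metadata rides a sibling stream; the latest event wins on load
    store.setdefault(f"postmeta:{post_id}", []).append(
        {"at": at, "data": {"title": post.get("title", "")}}
    )
    return {"id": post_id, "imported": True, "blocks": len(blocks)}


def import_all(store, posts, at):
    results = [import_post(store, p, at) for p in posts]
    done = sum(1 for r in results if r["imported"])
    return {"total": len(results), "imported": done, "skipped": len(results) - done}


posts = [
    {"id": "post-1", "title": "Hello", "blocks": [{"id": "b0"}, {"id": "b1"}]},
    {"id": "post-2", "title": "Two", "blocks": [{"id": "b0"}]},
]
store = {}
first = import_all(store, posts, 5)
second = import_all(store, posts, 6)
```

One consequence worth checking against the real backend: if committing zero ops leaves the content stream empty, a post whose body converts to no blocks would be treated as not yet imported on the next run.
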
## Run
```bash
bash lib/blogimport/conformance.sh # 76/76; writes scoreboard.{json,md}
```

lib/blogimport/conformance.sh Executable file

@@ -0,0 +1,121 @@
#!/usr/bin/env bash
# lib/blogimport/conformance.sh — run blog-import suites, emit scoreboard.
# Mirrors lib/content/conformance.sh: epoch-loaded modules + a bi-test counter.
set -uo pipefail
cd "$(git rev-parse --show-toplevel)"
SX_SERVER="hosts/ocaml/_build/default/bin/sx_server.exe"
if [ ! -x "$SX_SERVER" ]; then
MAIN_ROOT=$(git worktree list | head -1 | awk '{print $1}')
if [ -x "$MAIN_ROOT/$SX_SERVER" ]; then
SX_SERVER="$MAIN_ROOT/$SX_SERVER"
else
echo "ERROR: sx_server.exe not found." >&2
exit 1
fi
fi
SUITES=(lexical import verify source)
OUT_JSON="lib/blogimport/scoreboard.json"
OUT_MD="lib/blogimport/scoreboard.md"
run_suite() {
local suite=$1
local file="lib/blogimport/tests/${suite}.sx"
[ -f "$file" ] || { echo "0 0"; return; }
local TMP
TMP=$(mktemp)
cat > "$TMP" << EPOCHS
(epoch 1)
(load "lib/smalltalk/tokenizer.sx")
(load "lib/smalltalk/parser.sx")
(load "lib/guest/reflective/class-chain.sx")
(load "lib/smalltalk/runtime.sx")
(load "lib/guest/reflective/env.sx")
(load "lib/smalltalk/eval.sx")
(load "lib/persist/event.sx")
(load "lib/persist/backend.sx")
(load "lib/persist/log.sx")
(load "lib/persist/kv.sx")
(load "lib/persist/api.sx")
(load "lib/content/block.sx")
(load "lib/content/doc.sx")
(load "lib/content/render.sx")
(load "lib/content/api.sx")
(load "lib/content/meta.sx")
(load "lib/content/section.sx")
(load "lib/content/callout.sx")
(load "lib/content/media.sx")
(load "lib/content/store.sx")
(load "lib/dream/json.sx")
(load "lib/blogimport/lexical.sx")
(load "lib/blogimport/import.sx")
(load "lib/blogimport/verify.sx")
(load "lib/blogimport/source.sx")
(epoch 2)
(eval "(define bi-test-pass 0)")
(eval "(define bi-test-fail 0)")
(eval "(define bi-test-fails (list))")
(eval "(define bi-test (fn (name got expected) (if (= got expected) (set! bi-test-pass (+ bi-test-pass 1)) (begin (set! bi-test-fail (+ bi-test-fail 1)) (set! bi-test-fails (cons name bi-test-fails))))))")
(epoch 3)
(load "${file}")
(epoch 4)
(eval "(list bi-test-pass bi-test-fail)")
EPOCHS
local OUTPUT
OUTPUT=$(timeout 240 "$SX_SERVER" < "$TMP" 2>/dev/null)
rm -f "$TMP"
local LINE
LINE=$(echo "$OUTPUT" | awk '/^\(ok-len 4 / {getline; print; exit}')
if [ -z "$LINE" ]; then
LINE=$(echo "$OUTPUT" | grep -E '^\(ok 4 \([0-9]+ [0-9]+\)\)' | tail -1 \
| sed -E 's/^\(ok 4 //; s/\)$//')
fi
local P F
P=$(echo "$LINE" | sed -E 's/^\(([0-9]+) ([0-9]+)\).*/\1/')
F=$(echo "$LINE" | sed -E 's/^\(([0-9]+) ([0-9]+)\).*/\2/')
echo "${P:-0} ${F:-0}"
}
declare -A SUITE_PASS SUITE_FAIL
TOTAL_PASS=0
TOTAL_FAIL=0
echo "Running blogimport conformance suite..." >&2
for s in "${SUITES[@]}"; do
read -r p f < <(run_suite "$s")
SUITE_PASS[$s]=$p
SUITE_FAIL[$s]=$f
TOTAL_PASS=$((TOTAL_PASS + p))
TOTAL_FAIL=$((TOTAL_FAIL + f))
printf " %-10s %d/%d\n" "$s" "$p" "$((p+f))" >&2
done
{
printf '{\n "suites": {\n'
first=1
for s in "${SUITES[@]}"; do
if [ $first -eq 0 ]; then printf ',\n'; fi
printf ' "%s": {"pass": %d, "fail": %d}' "$s" "${SUITE_PASS[$s]}" "${SUITE_FAIL[$s]}"
first=0
done
printf '\n },\n "total_pass": %d,\n "total_fail": %d,\n "total": %d\n}\n' \
"$TOTAL_PASS" "$TOTAL_FAIL" "$((TOTAL_PASS + TOTAL_FAIL))"
} > "$OUT_JSON"
{
printf '# blogimport Conformance Scoreboard\n\n_Generated by `lib/blogimport/conformance.sh`_\n\n'
printf '| Suite | Pass | Fail | Total |\n|-------|-----:|-----:|------:|\n'
for s in "${SUITES[@]}"; do
printf '| %s | %d | %d | %d |\n' "$s" "${SUITE_PASS[$s]}" "${SUITE_FAIL[$s]}" "$(( ${SUITE_PASS[$s]} + ${SUITE_FAIL[$s]} ))"
done
printf '| **Total** | **%d** | **%d** | **%d** |\n' "$TOTAL_PASS" "$TOTAL_FAIL" "$((TOTAL_PASS + TOTAL_FAIL))"
} > "$OUT_MD"
echo "Total: $TOTAL_PASS pass, $TOTAL_FAIL fail" >&2
[ "$TOTAL_FAIL" -eq 0 ]


@@ -0,0 +1,88 @@
# Blog-side draft — the `published-posts` migration query
The one blog-app change needed to make `lib/blogimport`'s live source (Q-M4) real.
Two parts: an SX **defquery** (`published-posts.sx` in this dir) and a Python
**provider** it binds to. Both go in the **blog app** (production `blog/` tree); they
are drafted here so the importer ships with its dependency spelled out. Apply on the
blog app's branch, not on this migration branch.
## Why a new query (not reuse post-by-id)
`blogimport/source.sx` needs, for every published post: `id, slug, title, status,
visibility, tags, authors, lexical`. The existing providers
(`blog/services/__init__.py` `SqlBlogService.get_post_by_*`) return a `PostDTO` whose
`_post_to_dto` exposes `sx_content`/`html` but **not `lexical`** — and the canonical
migration path is lexical→blocks (slice-01-blog Q-B1), not sx_content. So a dedicated
migration provider that returns full rows including the raw lexical body is the
minimal, honest change. One batch call covers both enumeration (Q-D2 corpus) and
bodies.
## 1. defquery (→ `blog/queries.sx`)
See `published-posts.sx` in this directory:
```lisp
(defquery published-posts ()
"Enumerate every published, non-page blog post as a full row INCLUDING the raw
lexical body — the SX migration corpus (Q-D2). Read-only ..."
(service "blog" "list-published-posts"))
```
Kebab→snake convention (as for `get-post-by-slug` → `get_post_by_slug`) binds
`"list-published-posts"` to the `SqlBlogService.list_published_posts` method below.
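
As a rough illustration of that naming convention, the lookup could work like this. The sketch assumes the binding is a simple attribute lookup after replacing hyphens; the blog app's actual dispatch code is not shown here, and `kebab_to_snake`, `resolve`, and `FakeBlogService` are hypothetical.

```python
# Hypothetical model of the kebab -> snake service binding.

def kebab_to_snake(name: str) -> str:
    """'list-published-posts' -> 'list_published_posts'."""
    return name.replace("-", "_")


class FakeBlogService:
    """Stand-in for SqlBlogService, synchronous for brevity."""

    def list_published_posts(self):
        return [{"id": 1, "slug": "hello"}]


def resolve(service, query_name):
    """Look up the provider method that a query's service name refers to."""
    return getattr(service, kebab_to_snake(query_name))


rows = resolve(FakeBlogService(), "list-published-posts")()
```
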
## 2. Python provider (→ `blog/services/__init__.py`, in `SqlBlogService`)
```python
from sqlalchemy.orm import selectinload  # add to imports

# method on SqlBlogService
async def list_published_posts(self, session: AsyncSession) -> list[dict]:
    """Migration corpus: every published, non-page post as a full row INCLUDING
    the raw lexical body (Q-D2). Read-only; consumed by the SX blogimport
    backfill/verify. Mirrors ghost_db.list_posts() base visibility filters."""
    result = await session.execute(
        select(Post)
        .where(
            Post.deleted_at.is_(None),
            Post.status == "published",
            Post.is_page.is_(False),
        )
        .options(selectinload(Post.tags), selectinload(Post.authors))
        .order_by(Post.published_at.desc().nullslast())
    )
    return [
        {
            "id": p.id,
            "uuid": p.uuid,
            "slug": p.slug,
            "title": p.title,
            "status": p.status,
            "visibility": p.visibility,
            "lexical": p.lexical,
            "tags": [t.slug for t in p.tags],
            "authors": [a.slug for a in p.authors],
        }
        for p in result.scalars().unique().all()
    ]
```
**Confirm before applying:**
- The relationship names on `Post` (`tags`, `authors`) — check `blog/models/content.py`
join tables (`post_tags`, `post_authors`); adjust `selectinload` + the comprehensions
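if they differ. `.unique()` is not required with `selectinload` (it loads collections
in separate `SELECT ... IN` queries, so rows do not fan out); it is harmless here and
only becomes necessary if you switch to `joinedload`.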
- `Post.uuid` and `Post.lexical` columns exist (`models/content.py` ~lines 61-63).
- Visibility filters match `ghost_db.list_posts()` (drafts excluded, pages excluded) so
the corpus is exactly the published read-path set.
## 3. Verify the contract
After applying, the response shape must match `blogimport/parse-row`
(`lib/blogimport/source.sx`): keys `:uuid|:id :slug :title :status :visibility :tags
:authors :lexical`, with `:lexical` a JSON string (parsed via `dream-json-parse`). The
mock in `lib/blogimport/tests/source.sx` is the executable spec of this contract.
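
To make the contract concrete, here is a small Python sketch of the row mapping. It assumes plain dicts and the standard `json` module in place of `dream-json-parse`; `parse_lexical` and `parse_row` are hypothetical mirrors of the SX functions of the same name.

```python
import json

# Hypothetical Python mirror of blogimport/parse-row (illustration only).

def parse_lexical(lx):
    """JSON string -> dict; None -> empty doc; already-structured -> unchanged."""
    if lx is None:
        return {"root": {"children": []}}
    if isinstance(lx, str):
        return json.loads(lx)
    return lx


def parse_row(row):
    """Service post-row -> importer post dict, with defaults for missing fields."""
    return {
        "id": row.get("uuid") or row.get("id"),
        "slug": row.get("slug") or "",
        "title": row.get("title") or "",
        "status": row.get("status") or "",
        "visibility": row.get("visibility") or "",
        "tags": row.get("tags") or [],
        "authors": row.get("authors") or [],
        "lexical": parse_lexical(row.get("lexical")),
    }


row = {
    "uuid": "post-1",
    "slug": "live",
    "title": "Live",
    "status": "published",
    "lexical": '{"root": {"children": [{"type": "paragraph", "children": []}]}}',
}
post = parse_row(row)
```

A provider response that renames a field (for example `body` instead of `lexical`) would show up here as an empty document rather than an error, so the field check is worth doing explicitly.
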
## 4. Then wire the transport (host loop)
`blogimport/backfill!`/`sync-verify` take an injected `fetch-fn`. In production that is
the host's HMAC `fetch_data` wrapper (`GET /internal/data/published-posts`) — wiring
that lives in `lib/host`, not here.
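
For orientation, an injected transport of this shape might look as follows in Python. Everything specific is an assumption: the signing scheme (HMAC-SHA256 over method and path), the `X-Signature` header name, and the `send` callable are invented for illustration and are not taken from the host's `fetch_data` wrapper.

```python
import hashlib
import hmac

# Hypothetical signed fetch-fn factory; scheme and header name are assumptions.

def sign(secret: bytes, method: str, path: str) -> str:
    """Hex HMAC-SHA256 over 'METHOD path'."""
    return hmac.new(secret, f"{method} {path}".encode(), hashlib.sha256).hexdigest()


def make_fetch_fn(secret: bytes, send):
    """Build a (query_name, params) -> response callable around a `send` transport."""
    def fetch_fn(query_name, params):
        path = f"/internal/data/{query_name}"
        headers = {"X-Signature": sign(secret, "GET", path)}
        return send("GET", path, params, headers)
    return fetch_fn


def fake_send(method, path, params, headers):
    """Test double: checks the signature, then returns canned rows."""
    expected = sign(b"test-secret", method, path)
    if not hmac.compare_digest(headers["X-Signature"], expected):
        raise PermissionError("bad signature")
    return [{"uuid": "post-1", "slug": "live"}]


fetch_fn = make_fetch_fn(b"test-secret", fake_send)
rows = fetch_fn("published-posts", {})
```

The importer only ever sees `fetch_fn`, so swapping `fake_send` for a real HTTP client changes nothing on the import side.
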


@@ -0,0 +1,16 @@
; DRAFT — proposed addition to blog/queries.sx (the blog app's internal-data surface).
; Resolves the one blog-side gap for Q-M4: blogimport needs to enumerate published
; posts AND read their raw lexical bodies. The existing post-by-id/slug/ids queries
; return a PostDTO that carries sx_content/html but NOT lexical, so a dedicated
; migration query that returns full rows (incl. lexical) is the minimal change.
;
; Paste this defquery into blog/queries.sx alongside the others, and add the matching
; `list_published_posts` provider to SqlBlogService (see drafts/README.md).
;
; This file is a DRAFT artifact (not loaded by anything); it is parse-validated only.
(defquery published-posts ()
"Enumerate every published, non-page blog post as a full row INCLUDING the raw
lexical body — the SX migration corpus (Q-D2). Read-only; used by the blogimport
backfill + at-rest verify. Newest-first."
(service "blog" "list-published-posts"))

lib/blogimport/import.sx Normal file

@@ -0,0 +1,84 @@
; lib/blogimport/import.sx
; Genesis import: a blog Post row -> a persist content op-log stream.
;
; Per plans/migration/data-migration.md §3-5: for each Post, convert its lexical
; body to content blocks and commit them as genesis insert ops into the
; content:<id> stream, idempotently, with post metadata recorded as an event in a
; sibling stream. The same code runs on mem and durable persist backends (every fn
; takes the backend `b`, the acl.sx design principle).
;
; A `post` is a dict mirroring the blog Post row:
; {:id "uuid" :slug "hello" :title "Hello" :status "published"
; :visibility "public" :tags (list "a") :authors (list "u1")
; :lexical <lexical-doc-as-sx-dict>}
; Reading real rows (internal-data query vs direct Postgres, Q-M4) is the live-source
; edge, out of scope here; this drives content/commit! given a `post` dict.
; --- genesis ops: insert each block in document order (deterministic) -----------
; first block after nil (prepend), each subsequent after the previous block's id,
; reproducing source order so re-import yields the same sequence (data-migration §5).
(define
blogimport/genesis-ops
(fn (blocks)
(let ((ids (map blk-id blocks)))
(map-indexed
(fn (i blk) (op-insert blk (if (= i 0) nil (nth ids (- i 1)))))
blocks))))
; --- post metadata (title/slug/status/visibility/tags/authors) ------------------
(define
blogimport/post-meta
(fn (post)
{:title (or (get post :title) "")
:slug (or (get post :slug) "")
:status (or (get post :status) "")
:visibility (or (get post :visibility) "")
:tags (or (get post :tags) (list))
:authors (or (get post :authors) (list))}))
; metadata is not a content op, so it rides a sibling event stream postmeta:<id>;
; latest event wins (LWW). Replayable + durable like the block op-log.
(define blogimport/meta-stream (fn (id) (str "postmeta:" id)))
(define
blogimport/commit-meta!
(fn (b id meta at)
(persist/append b (blogimport/meta-stream id) "post-meta" at meta)))
(define
blogimport/load-meta
(fn (b id)
(let ((evs (persist/read b (blogimport/meta-stream id))))
(if (= (len evs) 0) nil (persist/event-data (nth evs (- (len evs) 1)))))))
; --- idempotency: a stream already holding events is already imported -----------
; (host-persist guarantees monotonic seq but NOT dedupe — skip-if-exists is the
; importer's dedupe, so re-running the backfill never double-imports. data-migration
; §5.) Re-import with an improved converter (Q-M5) is future work — superseding,
; not duplicating; this build is import-once.
(define
blogimport/imported?
(fn (b id) (> (content/version-count b id) 0)))
; --- import one post ------------------------------------------------------------
(define
blogimport/import-post!
(fn (b post at)
(let ((id (get post :id)))
(if
(blogimport/imported? b id)
{:id id :imported false :reason "exists"}
(let ((blocks (blogimport/lex-blocks (get post :lexical))))
(begin
(content/commit-all! b id (blogimport/genesis-ops blocks) at)
(blogimport/commit-meta! b id (blogimport/post-meta post) at)
{:id id :imported true :blocks (len blocks)}))))))
; --- import many: coverage scoreboard -------------------------------------------
(define
blogimport/import-all!
(fn (b posts at)
(let ((results (map (fn (p) (blogimport/import-post! b p at)) posts)))
{:total (len results)
:imported (len (filter (fn (r) (get r :imported)) results))
:skipped (len (filter (fn (r) (not (get r :imported))) results))})))

lib/blogimport/lexical.sx Normal file

@@ -0,0 +1,129 @@
; lib/blogimport/lexical.sx
; Lexical (Ghost editor JSON, as SX dicts) -> content-on-sx block list.
;
; The blog migration's lexical->blocks converter. Lives on the blog/migration
; side (NOT lib/content, NOT lib/host) per plans/migration/data-migration.md §7.
;
; Input shape: a lexical document is an SX dict mirroring the JSON 1:1, e.g.
; {:root {:children (list
; {:type "heading" :tag "h2" :children (list {:type "text" :text "Hi"})}
; {:type "paragraph" :children (list
; {:type "text" :text "plain "}
; {:type "text" :text "bold" :format 1}
; {:type "link" :url "/x" :children (list {:type "text" :text "here"})})})}}
;
; Block ids are assigned deterministically by top-level position ("b0","b1",...)
; so a re-import yields the SAME block sequence (data-migration.md §5 ordering rule).
;
; INLINE FORMATTING: architecture's content model holds PLAIN-STRING text
; (mk-text id text). Phase-5 rich inline runs are not merged here yet, so inline
; nodes are flattened to their plain concatenation (== asText, drift-proof). The
; single swap-point for the runs upgrade is `lex-inline-text` below — when
; content-on-sx Phase 5 lands on architecture, return runs there instead of a
; string. (slice-01-blog.md Q-B1; "prove the machinery first, then swap".)
; Inline format bitmask (lexical): bold=1 italic=2 strikethrough=4 underline=8
; code=16 subscript=32 superscript=64. Decoding the bitmask into mark keywords is
; deferred to the Phase-5 runs upgrade (no bitwise prim on architecture, and the
; active path flattens to plain text anyway). The :format field is read at the
; swap-point `lex-inline-text` when runs land.
; --- inline node -> plain text --------------------------------------------------
(define
lex-inline-node-text
(fn (node)
(let ((t (get node :type)))
(cond
((equal? t "text") (or (get node :text) ""))
((equal? t "linebreak") "\n")
((equal? t "tab") "\t")
((equal? t "link") (lex-inline-text (or (get node :children) (list))))
((equal? t "autolink") (lex-inline-text (or (get node :children) (list))))
((equal? t "at-link") (lex-inline-text (or (get node :children) (list))))
((equal? t "code-highlight") (or (get node :text) ""))
(else "")))))
; flatten a list of inline nodes to one plain string.
; *** Phase-5 swap-point: return a runs list here once mk-text accepts runs. ***
(define
lex-inline-text
(fn (children)
(reduce
(fn (acc n) (str acc (lex-inline-node-text n)))
""
children)))
; --- helpers --------------------------------------------------------------------
(define
lex-heading-level
(fn (tag)
(cond
((equal? tag "h1") 1)
((equal? tag "h2") 2)
((equal? tag "h3") 3)
((equal? tag "h4") 4)
((equal? tag "h5") 5)
((equal? tag "h6") 6)
(else 2))))
(define
lex-listitem-text
(fn (item)
(lex-inline-text (or (get item :children) (list)))))
; --- one lexical block node -> a content block (id assigned by caller) ----------
(define
lex-block
(fn (node id)
(let ((t (get node :type)))
(cond
((equal? t "paragraph")
(mk-text id (lex-inline-text (or (get node :children) (list)))))
((equal? t "extended-text")
(mk-text id (lex-inline-text (or (get node :children) (list)))))
((equal? t "heading")
(mk-heading id (lex-heading-level (get node :tag))
(lex-inline-text (or (get node :children) (list)))))
((equal? t "extended-heading")
(mk-heading id (lex-heading-level (get node :tag))
(lex-inline-text (or (get node :children) (list)))))
((equal? t "quote")
(mk-quote id "" (lex-inline-text (or (get node :children) (list)))))
((equal? t "extended-quote")
(mk-quote id "" (lex-inline-text (or (get node :children) (list)))))
((equal? t "codeblock")
(mk-code id (or (get node :language) "") (or (get node :code) "")))
((equal? t "list")
(mk-list id
(equal? (get node :listType) "number")
(map lex-listitem-text (or (get node :children) (list)))))
((equal? t "horizontalrule") (mk-divider id))
((equal? t "image")
(mk-image id (or (get node :src) "") (or (get node :alt) "")))
((equal? t "callout")
(mk-callout id (or (get node :backgroundColor) "grey")
(lex-inline-text (or (get node :children) (list)))))
((equal? t "video") (mk-media id "video" (or (get node :src) "")))
((equal? t "audio") (mk-media id "audio" (or (get node :src) "")))
((equal? t "embed") (mk-embed id (or (get node :url) "") "embed"))
((equal? t "bookmark") (mk-embed id (or (get node :url) "") "bookmark"))
; unknown/unsupported card: route to a generic embed tagged by type so
; nothing is silently dropped (provider records the original node type).
(else (mk-embed id "" (or t "unknown")))))))
; --- doc -> top-level children list ---------------------------------------------
(define
lex-doc-children
(fn (doc)
(cond
((not (equal? (get doc :root) nil)) (or (get (get doc :root) :children) (list)))
((not (equal? (get doc :children) nil)) (get doc :children))
(else (list)))))
; --- doc -> content block list (deterministic ids by position) ------------------
(define
blogimport/lex-blocks
(fn (doc)
(map-indexed
(fn (i node) (lex-block node (str "b" i)))
(lex-doc-children doc))))


@@ -0,0 +1,11 @@
{
"suites": {
"lexical": {"pass": 23, "fail": 0},
"import": {"pass": 21, "fail": 0},
"verify": {"pass": 11, "fail": 0},
"source": {"pass": 21, "fail": 0}
},
"total_pass": 76,
"total_fail": 0,
"total": 76
}


@@ -0,0 +1,11 @@
# blogimport Conformance Scoreboard
_Generated by `lib/blogimport/conformance.sh`_
| Suite | Pass | Fail | Total |
|-------|-----:|-----:|------:|
| lexical | 23 | 0 | 23 |
| import | 21 | 0 | 21 |
| verify | 11 | 0 | 11 |
| source | 21 | 0 | 21 |
| **Total** | **76** | **0** | **76** |

lib/blogimport/source.sx Normal file

@@ -0,0 +1,84 @@
; lib/blogimport/source.sx
; Live source adapter — Q-M4 RESOLVED: import via the blog INTERNAL-DATA QUERY
; surface (decoupled), not direct Postgres. Reuses the existing query contracts
; (blog/queries.sx: post-by-id/post-by-slug/posts-by-ids) and keeps the importer in
; the SX/host world (plans/migration/data-migration.md §7 recommended default).
;
; TRANSPORT SEAM (hexagonal, like every other subsystem): a `fetch-fn` port is
; INJECTED. Contract:
; (fetch-fn query-name params-dict) -> response-data
; In production `fetch-fn` is the host's HMAC-signed fetch_data wrapper
; (GET /internal/data/{query}); in tests it's a mock. The importer never knows how
; the bytes arrive.
;
; RESPONSE CONTRACT (one published-post row), the blog `get-post-by-*` data handler:
; {:uuid|:id :slug :title :status :visibility :tags :authors :lexical}
; :lexical is the Ghost body as a JSON STRING (the Post.lexical DB column) — parsed
; here with dream-json-parse into the SX dict shape blogimport/lex-blocks expects.
; (If a handler returns :lexical already-structured, it is used as-is.)
;
; REQUIRED BLOG-SIDE ADDITION (the one gap — draft in drafts/published-posts.sx):
; the migration needs a `published-posts` query that returns full published-post ROWS
; INCLUDING the raw `:lexical` body. The existing post-by-id/slug providers return a
; PostDTO that carries sx_content/html but NOT lexical (blog/services/__init__.py
; _post_to_dto), so they cannot feed the canonical lexical->blocks converter. One new
; provider (Python list_published_posts over list_posts(status="published"),
; blog/bp/blog/ghost_db.py:102) covers both enumeration AND bodies in one batch call.
; Mocked here against that contract; see drafts/ for the paste-ready blog-side change.
(define blogimport/dep-json-parse dream-json-parse)
; --- lexical field -> SX dict (string from DB column, or already structured) -----
(define
blogimport/parse-lexical
(fn (lx)
(cond
((equal? lx nil) {:root {:children (list)}})
((string? lx) (blogimport/dep-json-parse lx))
(else lx))))
; --- service post-row -> importer `post` dict -----------------------------------
(define
blogimport/parse-row
(fn (row)
{:id (or (get row :uuid) (get row :id))
:slug (or (get row :slug) "")
:title (or (get row :title) "")
:status (or (get row :status) "")
:visibility (or (get row :visibility) "")
:tags (or (get row :tags) (list))
:authors (or (get row :authors) (list))
:lexical (blogimport/parse-lexical (get row :lexical))}))
; --- the published-post rows from the live source (one batch query) -------------
(define
blogimport/source-rows
(fn (fetch-fn) (fetch-fn "published-posts" {})))
; --- all published posts as importer `post` dicts -------------------------------
(define
blogimport/source-posts
(fn (fetch-fn) (map blogimport/parse-row (blogimport/source-rows fetch-fn))))
; --- end-to-end drivers ---------------------------------------------------------
; backfill = enumerate+fetch -> genesis-import (idempotent). Re-runnable as the
; one-way DB->persist sync (data-migration.md Strategy 1).
(define
blogimport/backfill!
(fn (b fetch-fn at)
(blogimport/import-all! b (blogimport/source-posts fetch-fn) at)))
; partial backfill: client-side filter to a subset of ids (no extra blog query).
(define
blogimport/backfill-ids!
(fn (b fetch-fn ids at)
(blogimport/import-all!
b
(filter (fn (p) (contains? ids (get p :id))) (blogimport/source-posts fetch-fn))
at)))
; sync-verify = fetch -> shadow-diff the persisted streams at rest.
(define
blogimport/sync-verify
(fn (b fetch-fn)
(blogimport/verify-all b (blogimport/source-posts fetch-fn))))


@@ -0,0 +1,62 @@
; lib/blogimport/tests/import.sx — genesis import + idempotency
(st-bootstrap-classes!)
(content-bootstrap-blocks!)
(content-bootstrap-doc!)
(content-bootstrap-callout!)
(content-bootstrap-media!)
(define
p1
{:id "post-1" :slug "hello" :title "Hello" :status "published"
:visibility "public" :tags (list "news") :authors (list "u1")
:lexical {:root {:children (list
{:type "heading" :tag "h1" :children (list {:type "text" :text "Hello"})}
{:type "paragraph" :children (list {:type "text" :text "world"})})}}})
(define
p2
{:id "post-2" :slug "two" :title "Two" :status "published"
:lexical {:children (list
{:type "paragraph" :children (list {:type "text" :text "second"})})}})
; ---- genesis-ops ordering ----
(define ops1 (blogimport/genesis-ops (blogimport/lex-blocks (get p1 :lexical))))
(bi-test "genesis op kinds" (map (fn (o) (get o :op)) ops1) (list "insert" "insert"))
(bi-test "genesis first after nil" (get (nth ops1 0) :after) nil)
(bi-test "genesis second after first id" (get (nth ops1 1) :after) "b0")
; ---- import one ----
(define B (persist/open))
(define r1 (blogimport/import-post! B p1 10))
(bi-test "import imported flag" (get r1 :imported) true)
(bi-test "import block count" (get r1 :blocks) 2)
(bi-test "stream version-count" (content/version-count B "post-1") 2)
(bi-test "head ids" (doc-ids (content/head B "post-1")) (list "b0" "b1"))
(bi-test "head body text"
(str (blk-send (doc-find (content/head B "post-1") "b1") "text")) "world")
(bi-test "head heading level"
(blk-send (doc-find (content/head B "post-1") "b0") "level") 1)
; ---- metadata round-trip ----
(bi-test "meta round-trip" (blogimport/load-meta B "post-1") (blogimport/post-meta p1))
(bi-test "meta title" (get (blogimport/load-meta B "post-1") :title) "Hello")
(bi-test "meta tags" (get (blogimport/load-meta B "post-1") :tags) (list "news"))
; ---- idempotent re-import (skip-if-exists, no duplication) ----
(define r1b (blogimport/import-post! B p1 99))
(bi-test "reimport skipped" (get r1b :imported) false)
(bi-test "reimport reason" (get r1b :reason) "exists")
(bi-test "version-count unchanged after reimport" (content/version-count B "post-1") 2)
(bi-test "head ids unchanged after reimport"
(doc-ids (content/head B "post-1")) (list "b0" "b1"))
; ---- import-all! coverage scoreboard ----
(define B2 (persist/open))
(define cov1 (blogimport/import-all! B2 (list p1 p2) 5))
(bi-test "import-all total" (get cov1 :total) 2)
(bi-test "import-all imported" (get cov1 :imported) 2)
(bi-test "import-all skipped" (get cov1 :skipped) 0)
; re-run is fully idempotent
(define cov2 (blogimport/import-all! B2 (list p1 p2) 6))
(bi-test "import-all rerun imported" (get cov2 :imported) 0)
(bi-test "import-all rerun skipped" (get cov2 :skipped) 2)


@@ -0,0 +1,92 @@
; lib/blogimport/tests/lexical.sx — lexical -> content block converter
(st-bootstrap-classes!)
(content-bootstrap-blocks!)
(content-bootstrap-doc!)
(content-bootstrap-callout!)
(content-bootstrap-media!)
; ---- a representative lexical document (Ghost editor JSON, as SX dicts) ----
(define
doc
{:root {:children (list
{:type "heading" :tag "h2" :children (list {:type "text" :text "Title"})}
{:type "paragraph" :children (list
{:type "text" :text "plain "}
{:type "text" :text "bold" :format 1}
{:type "text" :text " then "}
{:type "link" :url "/x" :children (list {:type "text" :text "a link"})})}
{:type "quote" :children (list {:type "text" :text "wise words"})}
{:type "list" :listType "number" :children (list
{:type "listitem" :children (list {:type "text" :text "one"})}
{:type "listitem" :children (list {:type "text" :text "two"})})}
{:type "codeblock" :language "python" :code "print(1)"}
{:type "horizontalrule"}
{:type "image" :src "/c.png" :alt "a cat"}
{:type "callout" :backgroundColor "blue" :children (list {:type "text" :text "note!"})}
{:type "twitter" :url "https://t/x"})}})
(define blocks (blogimport/lex-blocks doc))
; ---- structure ----
(bi-test "block count" (len blocks) 9)
(bi-test "ids by position" (map blk-id blocks)
(list "b0" "b1" "b2" "b3" "b4" "b5" "b6" "b7" "b8"))
(bi-test "types in order" (map blk-type blocks)
(list "heading" "text" "quote" "list" "code" "divider" "image" "callout" "embed"))
; ---- heading ----
(bi-test "heading level" (blk-send (nth blocks 0) "level") 2)
(bi-test "heading text" (str (blk-send (nth blocks 0) "text")) "Title")
; ---- paragraph with inline bold + link, flattened to plain concatenation ----
(bi-test "paragraph flattened text"
(str (blk-send (nth blocks 1) "text")) "plain bold then a link")
; ---- quote ----
(bi-test "quote text" (str (blk-send (nth blocks 2) "text")) "wise words")
; ---- ordered list with items ----
(bi-test "list ordered" (blk-send (nth blocks 3) "ordered") true)
(bi-test "list items" (blk-send (nth blocks 3) "items") (list "one" "two"))
; ---- code block ----
(bi-test "code language" (str (blk-send (nth blocks 4) "language")) "python")
(bi-test "code text" (str (blk-send (nth blocks 4) "text")) "print(1)")
; ---- image ----
(bi-test "image src" (str (blk-send (nth blocks 6) "src")) "/c.png")
(bi-test "image alt" (str (blk-send (nth blocks 6) "alt")) "a cat")
; ---- callout ----
(bi-test "callout kind" (str (blk-send (nth blocks 7) "kind")) "blue")
(bi-test "callout text" (str (blk-send (nth blocks 7) "text")) "note!")
; ---- unknown card routed to embed, provider records original type ----
(bi-test "unknown -> embed provider" (str (blk-send (nth blocks 8) "provider")) "twitter")
; ---- heading level mapping ----
(bi-test "h1 level" (lex-heading-level "h1") 1)
(bi-test "h4 level" (lex-heading-level "h4") 4)
(bi-test "unknown tag default" (lex-heading-level "hx") 2)
; ---- bullet list ----
(define
bdoc
{:children (list {:type "list" :listType "bullet" :children (list
{:type "listitem" :children (list {:type "text" :text "x"})})})})
(bi-test "bullet not ordered" (blk-send (nth (blogimport/lex-blocks bdoc) 0) "ordered") false)
; ---- empty doc ----
(bi-test "empty doc -> no blocks" (len (blogimport/lex-blocks {:root {:children (list)}})) 0)
; ---- bare-children doc (no :root wrapper) ----
(bi-test "bare children doc"
(map blk-type (blogimport/lex-blocks {:children (list {:type "paragraph" :children (list {:type "text" :text "hi"})})}))
(list "text"))
; ---- linebreak in inline flattening ----
(bi-test "linebreak flatten"
(str (blk-send (nth (blogimport/lex-blocks
{:children (list {:type "paragraph" :children (list
{:type "text" :text "a"} {:type "linebreak"} {:type "text" :text "b"})})}) 0) "text"))
"a\nb")

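The rules these lexical tests pin down (plain-text flattening of inline nodes, the `:root` wrapper being optional, heading-tag mapping with a fallback of 2, and ids assigned by position) can be modelled in a few lines of Python for readers who do not run SX. This is a sketch outside the commit, not the code in `lexical.sx`. The function names are hypothetical, and two behaviours are assumptions rather than things the tests show: `h1`..`h6` as the accepted tag range, and nested inline nodes (such as links) contributing only their child text.

```python
# Illustrative Python model; NOT the SX implementation in lib/blogimport/lexical.sx.

def flatten_inline(children):
    """Concatenate inline nodes into plain text, dropping formatting."""
    parts = []
    for node in children:
        kind = node.get("type")
        if kind == "text":
            parts.append(node.get("text", ""))
        elif kind == "linebreak":
            parts.append("\n")
        elif "children" in node:
            # Assumption: a nested inline node (e.g. a link) keeps only its text.
            parts.append(flatten_inline(node["children"]))
    return "".join(parts)


def doc_children(doc):
    """Accept both {'root': {'children': [...]}} and a bare {'children': [...]}."""
    root = doc.get("root", doc)
    return root.get("children", [])


def heading_level(tag):
    """Map 'h1'..'h6' to 1..6 (assumed range); any other tag falls back to 2."""
    levels = {f"h{n}": n for n in range(1, 7)}
    return levels.get(tag, 2)


def block_ids(children):
    """Ids are deterministic by position: b0, b1, ..."""
    return [f"b{i}" for i in range(len(children))]
```

Because ids depend only on position, the same document always yields the same ids, and the parity verifier relies on exactly that when it compares ids on both sides.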

@@ -0,0 +1,80 @@
; lib/blogimport/tests/source.sx — live-source adapter (Q-M4 internal-data query)
(st-bootstrap-classes!)
(content-bootstrap-blocks!)
(content-bootstrap-doc!)
(content-bootstrap-callout!)
(content-bootstrap-media!)
; ---- canned service rows (lexical arrives as a JSON STRING, the DB column) ----
(define
lex1
"{\"root\":{\"children\":[{\"type\":\"heading\",\"tag\":\"h2\",\"children\":[{\"type\":\"text\",\"text\":\"Live\"}]},{\"type\":\"paragraph\",\"children\":[{\"type\":\"text\",\"text\":\"from db\"}]}]}}")
(define
row1
{:uuid "post-1" :slug "live" :title "Live" :status "published"
:visibility "public" :tags (list "x") :authors (list "u") :lexical lex1})
(define
row2
{:uuid "post-2" :slug "two" :title "Two" :status "published"
:lexical "{\"children\":[{\"type\":\"paragraph\",\"children\":[{\"type\":\"text\",\"text\":\"second\"}]}]}"})
; ---- mock transport: (fetch-fn query params) -> response ----
; the `published-posts` migration query returns full rows (incl. lexical) in one batch.
(define
mock-fetch
(fn (query params)
(cond
((equal? query "published-posts") (list row1 row2))
(else nil))))
; ---- parse-row maps fields + parses the lexical JSON string ----
(define post1 (blogimport/parse-row row1))
(bi-test "parse-row id from uuid" (get post1 :id) "post-1")
(bi-test "parse-row title" (get post1 :title) "Live")
(bi-test "parse-row tags" (get post1 :tags) (list "x"))
(bi-test "parse-row lexical parsed to blocks"
(map blk-type (blogimport/lex-blocks (get post1 :lexical))) (list "heading" "text"))
; ---- id fallback (:id when no :uuid) + structured (non-string) lexical ----
(define
post3
(blogimport/parse-row
{:id "post-3" :slug "s3"
:lexical {:children (list {:type "paragraph" :children (list {:type "text" :text "x"})})}}))
(bi-test "parse-row id fallback" (get post3 :id) "post-3")
(bi-test "parse-row structured lexical used as-is"
(map blk-type (blogimport/lex-blocks (get post3 :lexical))) (list "text"))
; ---- source-rows / source-posts ----
(bi-test "source-rows count" (len (blogimport/source-rows mock-fetch)) 2)
(bi-test "source-posts ids"
(map (fn (p) (get p :id)) (blogimport/source-posts mock-fetch))
(list "post-1" "post-2"))
; ---- end-to-end backfill from the live source ----
(define B (persist/open))
(define cov (blogimport/backfill! B mock-fetch 10))
(bi-test "backfill total" (get cov :total) 2)
(bi-test "backfill imported" (get cov :imported) 2)
(bi-test "backfill post-1 version-count" (content/version-count B "post-1") 2)
(bi-test "backfill post-1 head ids" (doc-ids (content/head B "post-1")) (list "b0" "b1"))
(bi-test "backfill post-1 body text"
(str (blk-send (doc-find (content/head B "post-1") "b1") "text")) "from db")
(bi-test "backfill meta title" (get (blogimport/load-meta B "post-1") :title) "Live")
; ---- backfill is idempotent (one-way sync re-run) ----
(define cov2 (blogimport/backfill! B mock-fetch 11))
(bi-test "backfill rerun skipped" (get cov2 :skipped) 2)
; ---- sync-verify: persisted streams match the live-source oracle ----
(define sv (blogimport/sync-verify B mock-fetch))
(bi-test "sync-verify total" (get sv :total) 2)
(bi-test "sync-verify ok" (get sv :ok) 2)
(bi-test "sync-verify no mismatch" (get sv :mismatched) (list))
; ---- partial backfill: client-side id filter (no extra blog query) ----
(define B2 (persist/open))
(define covx (blogimport/backfill-ids! B2 mock-fetch (list "post-2") 10))
(bi-test "backfill-ids imported" (get covx :imported) 1)
(bi-test "backfill-ids post-2 ids" (doc-ids (content/head B2 "post-2")) (list "b0"))
(bi-test "backfill-ids other not imported" (content/version-count B2 "post-1") 0)
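The shape exercised by these source tests (an injected fetch function, row parsing that accepts lexical either as a JSON string or as structured data, and an idempotent backfill with an optional client-side id filter) can be sketched in Python as below. This is a model outside the commit. The `store` dict is a hypothetical stand-in for the persist backend, and the empty params dict passed to the fetch function and the choice to count `total` after filtering are assumptions the tests do not confirm.

```python
import json

# Illustrative Python model; NOT the SX implementation in lib/blogimport.


def parse_row(row):
    """Map a service row to a post: id from uuid (else id), lexical parsed if a string."""
    lexical = row.get("lexical")
    if isinstance(lexical, str):
        lexical = json.loads(lexical)  # the DB column arrives as a JSON string
    post = dict(row)
    post["id"] = row.get("uuid") or row.get("id")
    post["lexical"] = lexical
    return post


def source_posts(fetch_fn):
    """Fetch all rows in one batch through the injected transport, then parse them."""
    rows = fetch_fn("published-posts", {}) or []  # params value is an assumption
    return [parse_row(r) for r in rows]


def backfill(store, fetch_fn, ids=None):
    """Import each post once; a re-run skips posts already present in the store."""
    cov = {"total": 0, "imported": 0, "skipped": 0}
    for post in source_posts(fetch_fn):
        if ids is not None and post["id"] not in ids:
            continue  # client-side filter: no extra query against the blog
        cov["total"] += 1
        if post["id"] in store:
            cov["skipped"] += 1
        else:
            store[post["id"]] = post
            cov["imported"] += 1
    return cov
```

Injecting the fetch function keeps the importer free of transport details, which is why the tests above can drive the whole backfill from a canned `mock-fetch`.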


@@ -0,0 +1,57 @@
; lib/blogimport/tests/verify.sx — shadow-diff at rest (round-trip parity)
(st-bootstrap-classes!)
(content-bootstrap-blocks!)
(content-bootstrap-doc!)
(content-bootstrap-callout!)
(content-bootstrap-media!)
(define
p1
{:id "post-1" :slug "hello" :title "Hello" :status "published"
:visibility "public" :tags (list "news") :authors (list "u1")
:lexical {:root {:children (list
{:type "heading" :tag "h2" :children (list {:type "text" :text "Title"})}
{:type "paragraph" :children (list
{:type "text" :text "plain "}
{:type "text" :text "bold" :format 1})}
{:type "list" :listType "number" :children (list
{:type "listitem" :children (list {:type "text" :text "one"})}
{:type "listitem" :children (list {:type "text" :text "two"})})}
{:type "image" :src "/c.png" :alt "cat"})}}})
(define
px
{:id "post-x" :slug "ghost" :title "Ghost" :status "published"
:lexical {:children (list {:type "paragraph" :children (list {:type "text" :text "never imported"})})}})
; ---- happy path: replayed == oracle ----
(define B (persist/open))
(blogimport/import-post! B p1 10)
(define v1 (blogimport/verify-post B p1))
(bi-test "verify ok" (get v1 :ok) true)
(bi-test "verify block-ok" (get v1 :block-ok) true)
(bi-test "verify meta-ok" (get v1 :meta-ok) true)
; ---- oracle block model is what we expect (inline bold flattened) ----
(define orc (blogimport/oracle p1))
(bi-test "oracle types"
(get (get orc :blocks) :types) (list "heading" "text" "list" "image"))
(bi-test "oracle contents"
(get (get orc :blocks) :contents) (list "Title" "plain bold" (list "one" "two") "/c.png"))
; ---- corruption is DETECTED (op-log diverges from oracle) ----
(content/commit! B "post-1" (op-update "b1" "text" "CORRUPTED") 100)
(define v2 (blogimport/verify-post B p1))
(bi-test "verify detects corruption" (get v2 :ok) false)
(bi-test "verify corruption is block-level" (get v2 :block-ok) false)
; ---- an un-imported post fails verification (empty replay vs non-empty oracle) ----
(bi-test "unimported not ok" (get (blogimport/verify-post B px) :ok) false)
; ---- verify-all coverage scoreboard ----
(define B3 (persist/open))
(blogimport/import-post! B3 p1 10)
(define cov (blogimport/verify-all B3 (list p1 px)))
(bi-test "verify-all total" (get cov :total) 2)
(bi-test "verify-all ok count" (get cov :ok) 1)
(bi-test "verify-all mismatched" (get cov :mismatched) (list "post-x"))

73
lib/blogimport/verify.sx Normal file

@@ -0,0 +1,73 @@
; lib/blogimport/verify.sx
; Shadow-diff at rest (plans/migration/data-migration.md §6, slice-01-blog.md §4).
;
; After backfill, replay each content:<id> stream -> materialized doc -> block
; model, and diff against the row-derived oracle (lexical->blocks computed directly).
; Structural compare with `=` (not equal?). This checks that the genesis import +
; op-log replay is lossless for the compared fields: "did the backfill corrupt anything".
;
; The oracle here is the in-memory lexical->blocks of the SAME post, so the property
; verified is round-trip fidelity through persist. Cross-checking against the LIVE
; Python block model (the "does SX match Python" half of Q-D2) is a later wiring
; step that needs the Python oracle via the internal-data query (Q-M4): flagged,
; not built. That step can reuse the diff plumbing here.
; --- salient content per block (normalized; same on both sides) -----------------
; ids are deterministic + identical on both sides, so they are kept (not stripped).
(define
blogimport/blk-content
(fn (b)
(let ((t (blk-type b)))
(cond
((equal? t "image") (str (blk-send b "src")))
((equal? t "media") (str (blk-send b "src")))
((equal? t "embed") (str (blk-send b "url")))
((equal? t "list") (blk-send b "items"))
((equal? t "divider") "")
(else (str (blk-send b "text")))))))
; --- block model of a block list ------------------------------------------------
(define
blogimport/blocks-model
(fn (blocks)
{:ids (map blk-id blocks)
:types (map blk-type blocks)
:contents (map blogimport/blk-content blocks)}))
; --- oracle: lexical->blocks computed directly from the post (no persist) --------
(define
blogimport/oracle
(fn (post)
{:blocks (blogimport/blocks-model (blogimport/lex-blocks (get post :lexical)))
:meta (blogimport/post-meta post)}))
; --- replayed: from the persisted stream ----------------------------------------
(define
blogimport/replayed
(fn (b id)
{:blocks (blogimport/blocks-model (content/blocks (content/head b id)))
:meta (blogimport/load-meta b id)}))
; --- verify one post: replayed must equal oracle --------------------------------
(define
blogimport/verify-post
(fn (b post)
(let ((id (get post :id)))
(let ((orc (blogimport/oracle post))
(rep (blogimport/replayed b id)))
(let ((block-ok (= (get orc :blocks) (get rep :blocks)))
(meta-ok (= (get orc :meta) (get rep :meta))))
{:id id
:ok (and block-ok meta-ok)
:block-ok block-ok
:meta-ok meta-ok})))))
; --- verify many: coverage scoreboard -------------------------------------------
(define
blogimport/verify-all
(fn (b posts)
(let ((results (map (fn (p) (blogimport/verify-post b p)) posts)))
{:total (len results)
:ok (len (filter (fn (r) (get r :ok)) results))
:mismatched (map (fn (r) (get r :id))
(filter (fn (r) (not (get r :ok))) results))})))
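The comparison that `verify.sx` performs can be modelled in Python to make the reported fields concrete. This is a sketch outside the commit: the dict-based block shape and the empty-string defaults for missing fields are assumptions, while the per-type content selection and the result keys follow the SX above.

```python
# Illustrative Python model of the shadow-diff; NOT the SX in lib/blogimport/verify.sx.


def block_content(block):
    """Salient content per block type, mirroring blogimport/blk-content."""
    kind = block["type"]
    if kind in ("image", "media"):
        return str(block.get("src", ""))
    if kind == "embed":
        return str(block.get("url", ""))
    if kind == "list":
        return list(block.get("items", []))
    if kind == "divider":
        return ""
    return str(block.get("text", ""))


def blocks_model(blocks):
    """Reduce a block list to the three parallel lists that get compared."""
    return {
        "ids": [b["id"] for b in blocks],
        "types": [b["type"] for b in blocks],
        "contents": [block_content(b) for b in blocks],
    }


def verify_post(post_id, oracle, replayed):
    """Replayed must equal oracle on both the block model and the metadata."""
    block_ok = blocks_model(oracle["blocks"]) == blocks_model(replayed["blocks"])
    meta_ok = oracle["meta"] == replayed["meta"]
    return {"id": post_id, "ok": block_ok and meta_ok,
            "block_ok": block_ok, "meta_ok": meta_ok}


def verify_all(results):
    """Coverage scoreboard over per-post results."""
    return {
        "total": len(results),
        "ok": sum(1 for r in results if r["ok"]),
        "mismatched": [r["id"] for r in results if not r["ok"]],
    }
```

As in the SX, only one salient field per block enters the model, so a difference confined to a field outside it (an image's `alt`, a heading's level, a code block's language) would not register as a block-level mismatch.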