# lib/blogimport — blog Postgres → persist genesis-import + parity verifier

Implements **`plans/migration/data-migration.md`** (the "long-pole nobody had
started") and the at-rest half of **`slice-01-blog.md` §4**, i.e. the data layer of the
blog read-path migration. Host-ops migration tooling, **not** a domain core: it
composes the public APIs of content-on-sx (`lib/content`) and persist
(`lib/persist`). Kept in its own module (not `lib/host`, not `lib/content`) so it
doesn't collide with the loops that own those.

Status: **machinery complete + live-source wired, 75/75 conformance**
(lexical 23, import 21, verify 11, source 20).

## What it does

| Module | Role |
|---|---|
| `lexical.sx` | `blogimport/lex-blocks doc`: Ghost **lexical** body (as SX dicts) → content-on-sx **block list**, ids deterministic by position (`b0,b1,…`). |
| `import.sx` | `blogimport/import-post! b post at` (genesis import): convert the post's lexical, commit blocks as ordered `op-insert`s into the `content:<id>` op-log stream, record metadata in a sibling `postmeta:<id>` stream. Idempotent (skip-if-exists). `import-all!` → coverage scoreboard. |
| `verify.sx` | `blogimport/verify-post b post`: replay the stream → block model, diff vs the row-derived oracle with `=`. `verify-all` → `{:total :ok :mismatched}` coverage. |
| `source.sx` | **Live source (Q-M4 = internal-data query).** Injected `fetch-fn` transport port; `parse-row` maps a service post-row → importer `post` dict and parses the `:lexical` JSON string (`dream-json-parse`). `backfill! b fetch-fn at` = enumerate → fetch → import; `sync-verify b fetch-fn` = enumerate → fetch → verify. `backfill-ids!` is the explicit-id fallback. |

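The conversion and import rows above can be sketched in Python. This is an illustrative analogue only, not the SX implementation: the simplified lexical shape (`root` → `children` → text runs), the dict-backed `store`, the op fields (`op`, `index`, `block`, `at`), and the function names `inline_text`, `lex_blocks`, and `import_post` are all assumptions made for the example.

```python
from typing import Any


def inline_text(node: dict[str, Any]) -> str:
    """Collapse a node's inline runs (bold, italic, links) to plain text."""
    if "text" in node:
        return node["text"]
    return "".join(inline_text(child) for child in node.get("children", []))


def lex_blocks(doc: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten a simplified lexical document into blocks with ids b0, b1, ..."""
    blocks = []
    for position, node in enumerate(doc["root"]["children"]):
        # The id depends only on position, so re-running the conversion
        # on the same document always yields the same ids.
        blocks.append(
            {"id": f"b{position}", "kind": node["type"], "text": inline_text(node)}
        )
    return blocks


def import_post(store: dict[str, list], post: dict[str, Any], at: str) -> str:
    """Genesis import: ordered insert ops plus a sibling metadata stream."""
    content_key = f"content:{post['id']}"
    if content_key in store:  # idempotent: skip-if-exists
        return "skipped"
    store[content_key] = [
        {"op": "insert", "index": index, "block": block, "at": at}
        for index, block in enumerate(lex_blocks(post["lexical"]))
    ]
    store[f"postmeta:{post['id']}"] = [
        {"slug": post["slug"], "title": post["title"], "at": at}
    ]
    return "imported"


# Hypothetical post used only for this example.
post = {
    "id": "p1",
    "slug": "hello",
    "title": "Hello",
    "lexical": {"root": {"children": [
        {"type": "heading", "children": [{"type": "text", "text": "Hello"}]},
        {"type": "paragraph", "children": [
            {"type": "text", "text": "Plain and "},
            {"type": "text", "text": "bold", "format": 1},
        ]},
    ]}},
}
store: dict[str, list] = {}
first = import_post(store, post, at="2024-01-01T00:00:00Z")
second = import_post(store, post, at="2024-01-02T00:00:00Z")
block_ids = [op["block"]["id"] for op in store["content:p1"]]
print(first, second, block_ids)
```

The second call leaves the stream untouched, which is the skip-if-exists behaviour the table describes. Positional ids keep the import reproducible without storing any id mapping.
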
## What is proven

The verifier holds **`lexical → import → persist → replay → block-model`** equal to
**`lexical → block-model`** computed directly. In other words, **the genesis import +
op-log replay is lossless**: this answers "did the backfill corrupt anything" at rest
(`data-migration.md` §6). The `verify.sx` corruption test confirms that a diverging
stream is *detected*, not silently passed.

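The round-trip property can be illustrated with a small Python sketch. It is not the `verify.sx` code: the op shape, the dict-backed `store`, the helper `genesis_ops`, and the choice to report `mismatched` as a list of ids (rather than a count) are assumptions made for the example.

```python
from typing import Any

Block = dict[str, str]


def replay(ops: list[dict[str, Any]]) -> list[Block]:
    """Rebuild the block model by applying insert ops in stream order."""
    blocks: list[Block] = []
    for op in ops:
        if op["op"] == "insert":
            blocks.insert(op["index"], op["block"])
    return blocks


def verify_post(store: dict[str, list], post_id: str, oracle: list[Block]) -> bool:
    """True when the replayed stream equals the directly computed block model."""
    return replay(store.get(f"content:{post_id}", [])) == oracle


def verify_all(store: dict[str, list], oracles: dict[str, list[Block]]) -> dict[str, Any]:
    """Coverage summary over every post that has an oracle."""
    mismatched = [
        post_id
        for post_id, oracle in oracles.items()
        if not verify_post(store, post_id, oracle)
    ]
    return {
        "total": len(oracles),
        "ok": len(oracles) - len(mismatched),
        "mismatched": mismatched,
    }


def genesis_ops(blocks: list[Block]) -> list[dict[str, Any]]:
    """Hypothetical helper: one ordered insert op per block."""
    return [
        {"op": "insert", "index": index, "block": dict(block)}
        for index, block in enumerate(blocks)
    ]


oracle = [{"id": "b0", "text": "Hello"}, {"id": "b1", "text": "World"}]
store = {
    "content:good": genesis_ops(oracle),
    # Simulated corruption at rest: one character differs in the second block.
    "content:bad": genesis_ops(
        [{"id": "b0", "text": "Hello"}, {"id": "b1", "text": "W0rld"}]
    ),
}
report = verify_all(store, {"good": oracle, "bad": oracle})
print(report)
```

The corrupted stream shows up in `mismatched` instead of passing silently, which mirrors the intent of the corruption test described above.
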
## Known limitations / TODO (carry into the plan)

- **Inline formatting is flattened to plain text.** Architecture's content model holds
  plain-string text (`mk-text id text`); Phase-5 rich inline runs are not merged here.
  The single swap-point is `lex-inline-text` in `lexical.sx`: return runs there once
  content-on-sx Phase 5 lands on `architecture`. Bold/italic/links currently collapse
  to their plain concatenation (drift-proof, == `asText`). (slice-01-blog Q-B1.)
- **Q-M4 RESOLVED: live source = internal-data query** (`source.sx`), via an injected
  `fetch-fn` port. The remaining real-world wiring is operational, not design:
  1. **One blog-side query must be added**: `blog/queries.sx` has fetch-by-id/slug/ids
     but **no enumeration query**. Add a `published-posts` defquery returning the
     published ids/slugs (Python `list_posts(status="published")`,
     `blog/bp/blog/ghost_db.py:102`). Until then, drive `backfill-ids!` with an explicit
     id list. `source.sx` is mocked against this contract in `tests/source.sx`.
  2. **Production `fetch-fn`** = the host's HMAC-signed `fetch_data` wrapper
     (`GET /internal/data/{query}`). That wiring lives in `lib/host` (the host loop's
     territory); `source.sx` only needs the port injected.
  3. **Confirm the response field names** of the live `get-post-by-*` data handler
     against `parse-row`'s contract (`:uuid|:id :slug :title :status :visibility :tags
     :authors :lexical`); a mismatch is a one-line field fix.
- **Oracle is the lexical→blocks of the SAME post, not the live Python block model.**
  This proves round-trip fidelity through persist (no corruption at rest). The "does SX
  match the *Python render*" half of Q-D2 would additionally diff against the Python
  side's own block derivation, which is deferred with the read-path cutover.
- **Re-import with an improved converter (Q-M5)** is not supported yet: import is
  once-only today (skip-if-exists). Superseding prior genesis events (vs
  truncate+re-import) is future work.

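The injected-transport seam described in the Q-M4 item can be sketched in Python. This is a hedged analogue of the pattern, not `source.sx` itself: the query name `get-post-by-id`, the `params` shape, the `missing` counter, and the names `parse_row`, `backfill_ids`, and `mock_fetch` are assumptions; only the row field names come from the contract listed above.

```python
import json
from typing import Any, Callable, Optional

# The transport port: (query name, params) -> row dict, or None if not found.
FetchFn = Callable[[str, dict[str, Any]], Optional[dict[str, Any]]]


def parse_row(row: dict[str, Any]) -> dict[str, Any]:
    """Map a service post-row to the importer's post dict."""
    return {
        "id": row.get("uuid") or row.get("id"),
        "slug": row["slug"],
        "title": row["title"],
        "status": row.get("status"),
        "visibility": row.get("visibility"),
        "tags": row.get("tags", []),
        "authors": row.get("authors", []),
        # The lexical body arrives as a JSON string and is parsed here.
        "lexical": json.loads(row["lexical"]),
    }


def backfill_ids(
    fetch_fn: FetchFn,
    ids: list[str],
    import_fn: Callable[[dict[str, Any]], Any],
) -> dict[str, int]:
    """Explicit-id fallback: fetch each id through the port, then import it."""
    counts = {"imported": 0, "missing": 0}
    for post_id in ids:
        row = fetch_fn("get-post-by-id", {"id": post_id})  # assumed query name
        if row is None:
            counts["missing"] += 1
            continue
        import_fn(parse_row(row))
        counts["imported"] += 1
    return counts


# A mock transport standing in for the production fetch_data wrapper.
ROWS = {
    "42": {
        "id": "42",
        "slug": "hello",
        "title": "Hello",
        "status": "published",
        "visibility": "public",
        "tags": [],
        "authors": [],
        "lexical": '{"root": {"children": []}}',
    }
}


def mock_fetch(query: str, params: dict[str, Any]) -> Optional[dict[str, Any]]:
    return ROWS.get(params["id"])


imported: list[dict[str, Any]] = []
counts = backfill_ids(mock_fetch, ["42", "99"], imported.append)
print(counts)
```

Because the driver only sees the `fetch_fn` callable, swapping the mock for a signed production transport should not require changes to the parsing or import logic.
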
## Run

```bash
bash lib/blogimport/conformance.sh # 75/75; writes scoreboard.{json,md}
```