blogimport: lexical->persist genesis-import + at-rest parity verifier (55/55)
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m9s
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 1m9s
Implements plans/migration/data-migration.md (the un-started long-pole) and the data-layer half of slice-01-blog §4. Host-ops migration module composing content-on-sx + persist public APIs; isolated from lib/host and lib/content. - lexical.sx: Ghost lexical (as SX dicts) -> content block list, deterministic ids - import.sx: genesis import into content:<id> op-log, idempotent, + postmeta stream - verify.sx: replay-and-diff vs row-derived oracle (proves round-trip lossless) Inline formatting flattens to plain text (Phase-5 runs swap-point isolated in lex-inline-text); live Postgres source (Q-M4) + improved-converter re-import (Q-M5) flagged in README. 55/55 conformance: lexical 23, import 21, verify 11. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
47
lib/blogimport/README.md
Normal file
47
lib/blogimport/README.md
Normal file
@@ -0,0 +1,47 @@
|
|||||||
|
# lib/blogimport — blog Postgres → persist genesis-import + parity verifier
|
||||||
|
|
||||||
|
Implements **`plans/migration/data-migration.md`** (the "long-pole nobody had
|
||||||
|
started") and the at-rest half of **`slice-01-blog.md` §4** — the data layer of the
|
||||||
|
blog read-path migration. Host-ops migration tooling, **not** a domain core: it
|
||||||
|
composes the public APIs of content-on-sx (`lib/content`) and persist
|
||||||
|
(`lib/persist`). Kept in its own module (not `lib/host`, not `lib/content`) so it
|
||||||
|
doesn't collide with the loops that own those.
|
||||||
|
|
||||||
|
Status: **machinery complete, 55/55 conformance** (lexical 23, import 21, verify 11).
|
||||||
|
|
||||||
|
## What it does
|
||||||
|
|
||||||
|
| Module | Role |
|
||||||
|
|---|---|
|
||||||
|
| `lexical.sx` | `blogimport/lex-blocks doc` — Ghost **lexical** body (as SX dicts) → content-on-sx **block list**, ids deterministic by position (`b0,b1,…`). |
|
||||||
|
| `import.sx` | `blogimport/import-post! b post at` — genesis import: convert the post's lexical, commit blocks as ordered `op-insert`s into the `content:<id>` op-log stream, record metadata in a sibling `postmeta:<id>` stream. Idempotent (skip-if-exists). `import-all!` → coverage scoreboard. |
|
||||||
|
| `verify.sx` | `blogimport/verify-post b post` — replay the stream → block model, diff vs the row-derived oracle with `=`. `verify-all` → `{:total :ok :mismatched}` coverage. |
|
||||||
|
|
||||||
|
## What is proven
|
||||||
|
|
||||||
|
The verifier holds **`lexical → import → persist → replay → block-model`** equal to
|
||||||
|
**`lexical → block-model`** computed directly. I.e. **the genesis import + op-log
|
||||||
|
replay is lossless** — "did the backfill corrupt anything" at rest
|
||||||
|
(`data-migration.md` §6). The `verify.sx` corruption test confirms a diverging stream
|
||||||
|
is *detected*, not silently passed.
|
||||||
|
|
||||||
|
## Known limitations / TODO (carry into the plan)
|
||||||
|
|
||||||
|
- **Inline formatting is flattened to plain text.** Architecture's content model holds
|
||||||
|
plain-string text (`mk-text id text`); Phase-5 rich inline runs are not merged here.
|
||||||
|
The single swap-point is `lex-inline-text` in `lexical.sx` — return runs there once
|
||||||
|
content-on-sx Phase 5 lands on `architecture`. Bold/italic/links currently collapse
|
||||||
|
to their plain concatenation (drift-proof, == `asText`). (slice-01-blog Q-B1.)
|
||||||
|
- **Oracle is the in-memory lexical→blocks, not the live Python block model.** This
|
||||||
|
proves round-trip fidelity through persist. The "does SX match Python" half of Q-D2
|
||||||
|
needs the **live source**: read real `Post` rows via the internal-data query
|
||||||
|
(`/internal/data/…`) or direct Postgres (**Q-M4**, undecided) and feed them as `post`
|
||||||
|
dicts. The diff plumbing here is the twin that step reuses.
|
||||||
|
- **Re-import with an improved converter (Q-M5)** is import-once today (skip-if-exists).
|
||||||
|
Superseding prior genesis events (vs truncate+re-import) is future work.
|
||||||
|
|
||||||
|
## Run
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash lib/blogimport/conformance.sh # 55/55; writes scoreboard.{json,md}
|
||||||
|
```
|
||||||
119
lib/blogimport/conformance.sh
Executable file
119
lib/blogimport/conformance.sh
Executable file
@@ -0,0 +1,119 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
# lib/blogimport/conformance.sh — run blog-import suites, emit scoreboard.
|
||||||
|
# Mirrors lib/content/conformance.sh: epoch-loaded modules + a bi-test counter.
|
||||||
|
|
||||||
|
set -uo pipefail
|
||||||
|
cd "$(git rev-parse --show-toplevel)"
|
||||||
|
|
||||||
|
SX_SERVER="hosts/ocaml/_build/default/bin/sx_server.exe"
|
||||||
|
if [ ! -x "$SX_SERVER" ]; then
|
||||||
|
MAIN_ROOT=$(git worktree list | head -1 | awk '{print $1}')
|
||||||
|
if [ -x "$MAIN_ROOT/$SX_SERVER" ]; then
|
||||||
|
SX_SERVER="$MAIN_ROOT/$SX_SERVER"
|
||||||
|
else
|
||||||
|
echo "ERROR: sx_server.exe not found." >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
SUITES=(lexical import verify)
|
||||||
|
|
||||||
|
OUT_JSON="lib/blogimport/scoreboard.json"
|
||||||
|
OUT_MD="lib/blogimport/scoreboard.md"
|
||||||
|
|
||||||
|
run_suite() {
|
||||||
|
local suite=$1
|
||||||
|
local file="lib/blogimport/tests/${suite}.sx"
|
||||||
|
[ -f "$file" ] || { echo "0 0"; return; }
|
||||||
|
local TMP
|
||||||
|
TMP=$(mktemp)
|
||||||
|
cat > "$TMP" << EPOCHS
|
||||||
|
(epoch 1)
|
||||||
|
(load "lib/smalltalk/tokenizer.sx")
|
||||||
|
(load "lib/smalltalk/parser.sx")
|
||||||
|
(load "lib/guest/reflective/class-chain.sx")
|
||||||
|
(load "lib/smalltalk/runtime.sx")
|
||||||
|
(load "lib/guest/reflective/env.sx")
|
||||||
|
(load "lib/smalltalk/eval.sx")
|
||||||
|
(load "lib/persist/event.sx")
|
||||||
|
(load "lib/persist/backend.sx")
|
||||||
|
(load "lib/persist/log.sx")
|
||||||
|
(load "lib/persist/kv.sx")
|
||||||
|
(load "lib/persist/api.sx")
|
||||||
|
(load "lib/content/block.sx")
|
||||||
|
(load "lib/content/doc.sx")
|
||||||
|
(load "lib/content/render.sx")
|
||||||
|
(load "lib/content/api.sx")
|
||||||
|
(load "lib/content/meta.sx")
|
||||||
|
(load "lib/content/section.sx")
|
||||||
|
(load "lib/content/callout.sx")
|
||||||
|
(load "lib/content/media.sx")
|
||||||
|
(load "lib/content/store.sx")
|
||||||
|
(load "lib/blogimport/lexical.sx")
|
||||||
|
(load "lib/blogimport/import.sx")
|
||||||
|
(load "lib/blogimport/verify.sx")
|
||||||
|
(epoch 2)
|
||||||
|
(eval "(define bi-test-pass 0)")
|
||||||
|
(eval "(define bi-test-fail 0)")
|
||||||
|
(eval "(define bi-test-fails (list))")
|
||||||
|
(eval "(define bi-test (fn (name got expected) (if (= got expected) (set! bi-test-pass (+ bi-test-pass 1)) (begin (set! bi-test-fail (+ bi-test-fail 1)) (set! bi-test-fails (cons name bi-test-fails))))))")
|
||||||
|
(epoch 3)
|
||||||
|
(load "${file}")
|
||||||
|
(epoch 4)
|
||||||
|
(eval "(list bi-test-pass bi-test-fail)")
|
||||||
|
EPOCHS
|
||||||
|
|
||||||
|
local OUTPUT
|
||||||
|
OUTPUT=$(timeout 240 "$SX_SERVER" < "$TMP" 2>/dev/null)
|
||||||
|
rm -f "$TMP"
|
||||||
|
|
||||||
|
local LINE
|
||||||
|
LINE=$(echo "$OUTPUT" | awk '/^\(ok-len 4 / {getline; print; exit}')
|
||||||
|
if [ -z "$LINE" ]; then
|
||||||
|
LINE=$(echo "$OUTPUT" | grep -E '^\(ok 4 \([0-9]+ [0-9]+\)\)' | tail -1 \
|
||||||
|
| sed -E 's/^\(ok 4 //; s/\)$//')
|
||||||
|
fi
|
||||||
|
|
||||||
|
local P F
|
||||||
|
P=$(echo "$LINE" | sed -E 's/^\(([0-9]+) ([0-9]+)\).*/\1/')
|
||||||
|
F=$(echo "$LINE" | sed -E 's/^\(([0-9]+) ([0-9]+)\).*/\2/')
|
||||||
|
echo "${P:-0} ${F:-0}"
|
||||||
|
}
|
||||||
|
|
||||||
|
declare -A SUITE_PASS SUITE_FAIL
|
||||||
|
TOTAL_PASS=0
|
||||||
|
TOTAL_FAIL=0
|
||||||
|
|
||||||
|
echo "Running blogimport conformance suite..." >&2
|
||||||
|
for s in "${SUITES[@]}"; do
|
||||||
|
read -r p f < <(run_suite "$s")
|
||||||
|
SUITE_PASS[$s]=$p
|
||||||
|
SUITE_FAIL[$s]=$f
|
||||||
|
TOTAL_PASS=$((TOTAL_PASS + p))
|
||||||
|
TOTAL_FAIL=$((TOTAL_FAIL + f))
|
||||||
|
printf " %-10s %d/%d\n" "$s" "$p" "$((p+f))" >&2
|
||||||
|
done
|
||||||
|
|
||||||
|
{
|
||||||
|
printf '{\n "suites": {\n'
|
||||||
|
first=1
|
||||||
|
for s in "${SUITES[@]}"; do
|
||||||
|
if [ $first -eq 0 ]; then printf ',\n'; fi
|
||||||
|
printf ' "%s": {"pass": %d, "fail": %d}' "$s" "${SUITE_PASS[$s]}" "${SUITE_FAIL[$s]}"
|
||||||
|
first=0
|
||||||
|
done
|
||||||
|
printf '\n },\n "total_pass": %d,\n "total_fail": %d,\n "total": %d\n}\n' \
|
||||||
|
"$TOTAL_PASS" "$TOTAL_FAIL" "$((TOTAL_PASS + TOTAL_FAIL))"
|
||||||
|
} > "$OUT_JSON"
|
||||||
|
|
||||||
|
{
|
||||||
|
printf '# blogimport Conformance Scoreboard\n\n_Generated by `lib/blogimport/conformance.sh`_\n\n'
|
||||||
|
printf '| Suite | Pass | Fail | Total |\n|-------|-----:|-----:|------:|\n'
|
||||||
|
for s in "${SUITES[@]}"; do
|
||||||
|
printf '| %s | %d | %d | %d |\n' "$s" "${SUITE_PASS[$s]}" "${SUITE_FAIL[$s]}" "$(( ${SUITE_PASS[$s]} + ${SUITE_FAIL[$s]} ))"
|
||||||
|
done
|
||||||
|
printf '| **Total** | **%d** | **%d** | **%d** |\n' "$TOTAL_PASS" "$TOTAL_FAIL" "$((TOTAL_PASS + TOTAL_FAIL))"
|
||||||
|
} > "$OUT_MD"
|
||||||
|
|
||||||
|
echo "Total: $TOTAL_PASS pass, $TOTAL_FAIL fail" >&2
|
||||||
|
[ "$TOTAL_FAIL" -eq 0 ]
|
||||||
84
lib/blogimport/import.sx
Normal file
84
lib/blogimport/import.sx
Normal file
@@ -0,0 +1,84 @@
|
|||||||
|
; lib/blogimport/import.sx
|
||||||
|
; Genesis import: a blog Post row -> a persist content op-log stream.
|
||||||
|
;
|
||||||
|
; Per plans/migration/data-migration.md §3-5: for each Post, convert its lexical
|
||||||
|
; body to content blocks and commit them as genesis insert ops into the
|
||||||
|
; content:<id> stream, idempotently, with post metadata recorded as an event in a
|
||||||
|
; sibling stream. The same code runs on mem and durable persist backends (every fn
|
||||||
|
; takes the backend `b`, the acl.sx design principle).
|
||||||
|
;
|
||||||
|
; A `post` is a dict mirroring the blog Post row:
|
||||||
|
; {:id "uuid" :slug "hello" :title "Hello" :status "published"
|
||||||
|
; :visibility "public" :tags (list "a") :authors (list "u1")
|
||||||
|
; :lexical <lexical-doc-as-sx-dict>}
|
||||||
|
; Reading real rows (internal-data query vs direct Postgres, Q-M4) is the live-source
|
||||||
|
; edge, out of scope here; this drives content/commit! given a `post` dict.
|
||||||
|
|
||||||
|
; --- genesis ops: insert each block in document order (deterministic) -----------
|
||||||
|
; first block after nil (prepend), each subsequent after the previous block's id,
|
||||||
|
; reproducing source order so re-import yields the same sequence (data-migration §5).
|
||||||
|
(define
|
||||||
|
blogimport/genesis-ops
|
||||||
|
(fn (blocks)
|
||||||
|
(let ((ids (map blk-id blocks)))
|
||||||
|
(map-indexed
|
||||||
|
(fn (i blk) (op-insert blk (if (= i 0) nil (nth ids (- i 1)))))
|
||||||
|
blocks))))
|
||||||
|
|
||||||
|
; --- post metadata (title/slug/status/visibility/tags/authors) ------------------
|
||||||
|
(define
|
||||||
|
blogimport/post-meta
|
||||||
|
(fn (post)
|
||||||
|
{:title (or (get post :title) "")
|
||||||
|
:slug (or (get post :slug) "")
|
||||||
|
:status (or (get post :status) "")
|
||||||
|
:visibility (or (get post :visibility) "")
|
||||||
|
:tags (or (get post :tags) (list))
|
||||||
|
:authors (or (get post :authors) (list))}))
|
||||||
|
|
||||||
|
; metadata is not a content op, so it rides a sibling event stream postmeta:<id>;
|
||||||
|
; latest event wins (LWW). Replayable + durable like the block op-log.
|
||||||
|
(define blogimport/meta-stream (fn (id) (str "postmeta:" id)))
|
||||||
|
|
||||||
|
(define
|
||||||
|
blogimport/commit-meta!
|
||||||
|
(fn (b id meta at)
|
||||||
|
(persist/append b (blogimport/meta-stream id) "post-meta" at meta)))
|
||||||
|
|
||||||
|
(define
|
||||||
|
blogimport/load-meta
|
||||||
|
(fn (b id)
|
||||||
|
(let ((evs (persist/read b (blogimport/meta-stream id))))
|
||||||
|
(if (= (len evs) 0) nil (persist/event-data (nth evs (- (len evs) 1)))))))
|
||||||
|
|
||||||
|
; --- idempotency: a stream already holding events is already imported -----------
|
||||||
|
; (host-persist guarantees monotonic seq but NOT dedupe — skip-if-exists is the
|
||||||
|
; importer's dedupe, so re-running the backfill never double-imports. data-migration
|
||||||
|
; §5.) Re-import with an improved converter (Q-M5) is future work — superseding,
|
||||||
|
; not duplicating; this build is import-once.
|
||||||
|
(define
|
||||||
|
blogimport/imported?
|
||||||
|
(fn (b id) (> (content/version-count b id) 0)))
|
||||||
|
|
||||||
|
; --- import one post ------------------------------------------------------------
|
||||||
|
(define
|
||||||
|
blogimport/import-post!
|
||||||
|
(fn (b post at)
|
||||||
|
(let ((id (get post :id)))
|
||||||
|
(if
|
||||||
|
(blogimport/imported? b id)
|
||||||
|
{:id id :imported false :reason "exists"}
|
||||||
|
(let ((blocks (blogimport/lex-blocks (get post :lexical))))
|
||||||
|
(begin
|
||||||
|
(content/commit-all! b id (blogimport/genesis-ops blocks) at)
|
||||||
|
(blogimport/commit-meta! b id (blogimport/post-meta post) at)
|
||||||
|
{:id id :imported true :blocks (len blocks)}))))))
|
||||||
|
|
||||||
|
; --- import many: coverage scoreboard -------------------------------------------
|
||||||
|
(define
|
||||||
|
blogimport/import-all!
|
||||||
|
(fn (b posts at)
|
||||||
|
(let ((results (map (fn (p) (blogimport/import-post! b p at)) posts)))
|
||||||
|
{:total (len results)
|
||||||
|
:imported (len (filter (fn (r) (get r :imported)) results))
|
||||||
|
:skipped (len (filter (fn (r) (not (get r :imported))) results))})))
|
||||||
129
lib/blogimport/lexical.sx
Normal file
129
lib/blogimport/lexical.sx
Normal file
@@ -0,0 +1,129 @@
|
|||||||
|
; lib/blogimport/lexical.sx
|
||||||
|
; Lexical (Ghost editor JSON, as SX dicts) -> content-on-sx block list.
|
||||||
|
;
|
||||||
|
; The blog migration's lexical->blocks converter. Lives on the blog/migration
|
||||||
|
; side (NOT lib/content, NOT lib/host) per plans/migration/data-migration.md §7.
|
||||||
|
;
|
||||||
|
; Input shape: a lexical document is an SX dict mirroring the JSON 1:1, e.g.
|
||||||
|
; {:root {:children (list
|
||||||
|
; {:type "heading" :tag "h2" :children (list {:type "text" :text "Hi"})}
|
||||||
|
; {:type "paragraph" :children (list
|
||||||
|
; {:type "text" :text "plain "}
|
||||||
|
; {:type "text" :text "bold" :format 1}
|
||||||
|
; {:type "link" :url "/x" :children (list {:type "text" :text "here"})})})}}
|
||||||
|
;
|
||||||
|
; Block ids are assigned deterministically by top-level position ("b0","b1",...)
|
||||||
|
; so a re-import yields the SAME block sequence (data-migration.md §5 ordering rule).
|
||||||
|
;
|
||||||
|
; INLINE FORMATTING: architecture's content model holds PLAIN-STRING text
|
||||||
|
; (mk-text id text). Phase-5 rich inline runs are not merged here yet, so inline
|
||||||
|
; nodes are flattened to their plain concatenation (== asText, drift-proof). The
|
||||||
|
; single swap-point for the runs upgrade is `lex-inline-text` below — when
|
||||||
|
; content-on-sx Phase 5 lands on architecture, return runs there instead of a
|
||||||
|
; string. (slice-01-blog.md Q-B1; "prove the machinery first, then swap".)
|
||||||
|
|
||||||
|
; Inline format bitmask (lexical): bold=1 italic=2 strikethrough=4 underline=8
|
||||||
|
; code=16 subscript=32 superscript=64. Decoding the bitmask into mark keywords is
|
||||||
|
; deferred to the Phase-5 runs upgrade (no bitwise prim on architecture, and the
|
||||||
|
; active path flattens to plain text anyway). The :format field is read at the
|
||||||
|
; swap-point `lex-inline-text` when runs land.
|
||||||
|
|
||||||
|
; --- inline node -> plain text --------------------------------------------------
|
||||||
|
(define
|
||||||
|
lex-inline-node-text
|
||||||
|
(fn (node)
|
||||||
|
(let ((t (get node :type)))
|
||||||
|
(cond
|
||||||
|
((equal? t "text") (or (get node :text) ""))
|
||||||
|
((equal? t "linebreak") "\n")
|
||||||
|
((equal? t "tab") "\t")
|
||||||
|
((equal? t "link") (lex-inline-text (or (get node :children) (list))))
|
||||||
|
((equal? t "autolink") (lex-inline-text (or (get node :children) (list))))
|
||||||
|
((equal? t "at-link") (lex-inline-text (or (get node :children) (list))))
|
||||||
|
((equal? t "code-highlight") (or (get node :text) ""))
|
||||||
|
(else "")))))
|
||||||
|
|
||||||
|
; flatten a list of inline nodes to one plain string.
|
||||||
|
; *** Phase-5 swap-point: return a runs list here once mk-text accepts runs. ***
|
||||||
|
(define
|
||||||
|
lex-inline-text
|
||||||
|
(fn (children)
|
||||||
|
(reduce
|
||||||
|
(fn (acc n) (str acc (lex-inline-node-text n)))
|
||||||
|
""
|
||||||
|
children)))
|
||||||
|
|
||||||
|
; --- helpers --------------------------------------------------------------------
|
||||||
|
(define
|
||||||
|
lex-heading-level
|
||||||
|
(fn (tag)
|
||||||
|
(cond
|
||||||
|
((equal? tag "h1") 1)
|
||||||
|
((equal? tag "h2") 2)
|
||||||
|
((equal? tag "h3") 3)
|
||||||
|
((equal? tag "h4") 4)
|
||||||
|
((equal? tag "h5") 5)
|
||||||
|
((equal? tag "h6") 6)
|
||||||
|
(else 2))))
|
||||||
|
|
||||||
|
(define
|
||||||
|
lex-listitem-text
|
||||||
|
(fn (item)
|
||||||
|
(lex-inline-text (or (get item :children) (list)))))
|
||||||
|
|
||||||
|
; --- one lexical block node -> a content block (id assigned by caller) ----------
|
||||||
|
(define
|
||||||
|
lex-block
|
||||||
|
(fn (node id)
|
||||||
|
(let ((t (get node :type)))
|
||||||
|
(cond
|
||||||
|
((equal? t "paragraph")
|
||||||
|
(mk-text id (lex-inline-text (or (get node :children) (list)))))
|
||||||
|
((equal? t "extended-text")
|
||||||
|
(mk-text id (lex-inline-text (or (get node :children) (list)))))
|
||||||
|
((equal? t "heading")
|
||||||
|
(mk-heading id (lex-heading-level (get node :tag))
|
||||||
|
(lex-inline-text (or (get node :children) (list)))))
|
||||||
|
((equal? t "extended-heading")
|
||||||
|
(mk-heading id (lex-heading-level (get node :tag))
|
||||||
|
(lex-inline-text (or (get node :children) (list)))))
|
||||||
|
((equal? t "quote")
|
||||||
|
(mk-quote id "" (lex-inline-text (or (get node :children) (list)))))
|
||||||
|
((equal? t "extended-quote")
|
||||||
|
(mk-quote id "" (lex-inline-text (or (get node :children) (list)))))
|
||||||
|
((equal? t "codeblock")
|
||||||
|
(mk-code id (or (get node :language) "") (or (get node :code) "")))
|
||||||
|
((equal? t "list")
|
||||||
|
(mk-list id
|
||||||
|
(equal? (get node :listType) "number")
|
||||||
|
(map lex-listitem-text (or (get node :children) (list)))))
|
||||||
|
((equal? t "horizontalrule") (mk-divider id))
|
||||||
|
((equal? t "image")
|
||||||
|
(mk-image id (or (get node :src) "") (or (get node :alt) "")))
|
||||||
|
((equal? t "callout")
|
||||||
|
(mk-callout id (or (get node :backgroundColor) "grey")
|
||||||
|
(lex-inline-text (or (get node :children) (list)))))
|
||||||
|
((equal? t "video") (mk-media id "video" (or (get node :src) "")))
|
||||||
|
((equal? t "audio") (mk-media id "audio" (or (get node :src) "")))
|
||||||
|
((equal? t "embed") (mk-embed id (or (get node :url) "") "embed"))
|
||||||
|
((equal? t "bookmark") (mk-embed id (or (get node :url) "") "bookmark"))
|
||||||
|
; unknown/unsupported card: route to a generic embed tagged by type so
|
||||||
|
; nothing is silently dropped (provider records the original node type).
|
||||||
|
(else (mk-embed id "" (or t "unknown")))))))
|
||||||
|
|
||||||
|
; --- doc -> top-level children list ---------------------------------------------
|
||||||
|
(define
|
||||||
|
lex-doc-children
|
||||||
|
(fn (doc)
|
||||||
|
(cond
|
||||||
|
((not (equal? (get doc :root) nil)) (or (get (get doc :root) :children) (list)))
|
||||||
|
((not (equal? (get doc :children) nil)) (get doc :children))
|
||||||
|
(else (list)))))
|
||||||
|
|
||||||
|
; --- doc -> content block list (deterministic ids by position) ------------------
|
||||||
|
(define
|
||||||
|
blogimport/lex-blocks
|
||||||
|
(fn (doc)
|
||||||
|
(map-indexed
|
||||||
|
(fn (i node) (lex-block node (str "b" i)))
|
||||||
|
(lex-doc-children doc))))
|
||||||
10
lib/blogimport/scoreboard.json
Normal file
10
lib/blogimport/scoreboard.json
Normal file
@@ -0,0 +1,10 @@
|
|||||||
|
{
|
||||||
|
"suites": {
|
||||||
|
"lexical": {"pass": 23, "fail": 0},
|
||||||
|
"import": {"pass": 21, "fail": 0},
|
||||||
|
"verify": {"pass": 11, "fail": 0}
|
||||||
|
},
|
||||||
|
"total_pass": 55,
|
||||||
|
"total_fail": 0,
|
||||||
|
"total": 55
|
||||||
|
}
|
||||||
10
lib/blogimport/scoreboard.md
Normal file
10
lib/blogimport/scoreboard.md
Normal file
@@ -0,0 +1,10 @@
|
|||||||
|
# blogimport Conformance Scoreboard
|
||||||
|
|
||||||
|
_Generated by `lib/blogimport/conformance.sh`_
|
||||||
|
|
||||||
|
| Suite | Pass | Fail | Total |
|
||||||
|
|-------|-----:|-----:|------:|
|
||||||
|
| lexical | 23 | 0 | 23 |
|
||||||
|
| import | 21 | 0 | 21 |
|
||||||
|
| verify | 11 | 0 | 11 |
|
||||||
|
| **Total** | **55** | **0** | **55** |
|
||||||
62
lib/blogimport/tests/import.sx
Normal file
62
lib/blogimport/tests/import.sx
Normal file
@@ -0,0 +1,62 @@
|
|||||||
|
; lib/blogimport/tests/import.sx — genesis import + idempotency
|
||||||
|
(st-bootstrap-classes!)
|
||||||
|
(content-bootstrap-blocks!)
|
||||||
|
(content-bootstrap-doc!)
|
||||||
|
(content-bootstrap-callout!)
|
||||||
|
(content-bootstrap-media!)
|
||||||
|
|
||||||
|
(define
|
||||||
|
p1
|
||||||
|
{:id "post-1" :slug "hello" :title "Hello" :status "published"
|
||||||
|
:visibility "public" :tags (list "news") :authors (list "u1")
|
||||||
|
:lexical {:root {:children (list
|
||||||
|
{:type "heading" :tag "h1" :children (list {:type "text" :text "Hello"})}
|
||||||
|
{:type "paragraph" :children (list {:type "text" :text "world"})})}}})
|
||||||
|
|
||||||
|
(define
|
||||||
|
p2
|
||||||
|
{:id "post-2" :slug "two" :title "Two" :status "published"
|
||||||
|
:lexical {:children (list
|
||||||
|
{:type "paragraph" :children (list {:type "text" :text "second"})})}})
|
||||||
|
|
||||||
|
; ---- genesis-ops ordering ----
|
||||||
|
(define ops1 (blogimport/genesis-ops (blogimport/lex-blocks (get p1 :lexical))))
|
||||||
|
(bi-test "genesis op kinds" (map (fn (o) (get o :op)) ops1) (list "insert" "insert"))
|
||||||
|
(bi-test "genesis first after nil" (get (nth ops1 0) :after) nil)
|
||||||
|
(bi-test "genesis second after first id" (get (nth ops1 1) :after) "b0")
|
||||||
|
|
||||||
|
; ---- import one ----
|
||||||
|
(define B (persist/open))
|
||||||
|
(define r1 (blogimport/import-post! B p1 10))
|
||||||
|
(bi-test "import imported flag" (get r1 :imported) true)
|
||||||
|
(bi-test "import block count" (get r1 :blocks) 2)
|
||||||
|
(bi-test "stream version-count" (content/version-count B "post-1") 2)
|
||||||
|
(bi-test "head ids" (doc-ids (content/head B "post-1")) (list "b0" "b1"))
|
||||||
|
(bi-test "head body text"
|
||||||
|
(str (blk-send (doc-find (content/head B "post-1") "b1") "text")) "world")
|
||||||
|
(bi-test "head heading level"
|
||||||
|
(blk-send (doc-find (content/head B "post-1") "b0") "level") 1)
|
||||||
|
|
||||||
|
; ---- metadata round-trip ----
|
||||||
|
(bi-test "meta round-trip" (blogimport/load-meta B "post-1") (blogimport/post-meta p1))
|
||||||
|
(bi-test "meta title" (get (blogimport/load-meta B "post-1") :title) "Hello")
|
||||||
|
(bi-test "meta tags" (get (blogimport/load-meta B "post-1") :tags) (list "news"))
|
||||||
|
|
||||||
|
; ---- idempotent re-import (skip-if-exists, no duplication) ----
|
||||||
|
(define r1b (blogimport/import-post! B p1 99))
|
||||||
|
(bi-test "reimport skipped" (get r1b :imported) false)
|
||||||
|
(bi-test "reimport reason" (get r1b :reason) "exists")
|
||||||
|
(bi-test "version-count unchanged after reimport" (content/version-count B "post-1") 2)
|
||||||
|
(bi-test "head ids unchanged after reimport"
|
||||||
|
(doc-ids (content/head B "post-1")) (list "b0" "b1"))
|
||||||
|
|
||||||
|
; ---- import-all! coverage scoreboard ----
|
||||||
|
(define B2 (persist/open))
|
||||||
|
(define cov1 (blogimport/import-all! B2 (list p1 p2) 5))
|
||||||
|
(bi-test "import-all total" (get cov1 :total) 2)
|
||||||
|
(bi-test "import-all imported" (get cov1 :imported) 2)
|
||||||
|
(bi-test "import-all skipped" (get cov1 :skipped) 0)
|
||||||
|
; re-run is fully idempotent
|
||||||
|
(define cov2 (blogimport/import-all! B2 (list p1 p2) 6))
|
||||||
|
(bi-test "import-all rerun imported" (get cov2 :imported) 0)
|
||||||
|
(bi-test "import-all rerun skipped" (get cov2 :skipped) 2)
|
||||||
92
lib/blogimport/tests/lexical.sx
Normal file
92
lib/blogimport/tests/lexical.sx
Normal file
@@ -0,0 +1,92 @@
|
|||||||
|
; lib/blogimport/tests/lexical.sx — lexical -> content block converter
|
||||||
|
(st-bootstrap-classes!)
|
||||||
|
(content-bootstrap-blocks!)
|
||||||
|
(content-bootstrap-doc!)
|
||||||
|
(content-bootstrap-callout!)
|
||||||
|
(content-bootstrap-media!)
|
||||||
|
|
||||||
|
; ---- a representative lexical document (Ghost editor JSON, as SX dicts) ----
|
||||||
|
(define
|
||||||
|
doc
|
||||||
|
{:root {:children (list
|
||||||
|
{:type "heading" :tag "h2" :children (list {:type "text" :text "Title"})}
|
||||||
|
{:type "paragraph" :children (list
|
||||||
|
{:type "text" :text "plain "}
|
||||||
|
{:type "text" :text "bold" :format 1}
|
||||||
|
{:type "text" :text " then "}
|
||||||
|
{:type "link" :url "/x" :children (list {:type "text" :text "a link"})})}
|
||||||
|
{:type "quote" :children (list {:type "text" :text "wise words"})}
|
||||||
|
{:type "list" :listType "number" :children (list
|
||||||
|
{:type "listitem" :children (list {:type "text" :text "one"})}
|
||||||
|
{:type "listitem" :children (list {:type "text" :text "two"})})}
|
||||||
|
{:type "codeblock" :language "python" :code "print(1)"}
|
||||||
|
{:type "horizontalrule"}
|
||||||
|
{:type "image" :src "/c.png" :alt "a cat"}
|
||||||
|
{:type "callout" :backgroundColor "blue" :children (list {:type "text" :text "note!"})}
|
||||||
|
{:type "twitter" :url "https://t/x"})}})
|
||||||
|
|
||||||
|
(define blocks (blogimport/lex-blocks doc))
|
||||||
|
|
||||||
|
; ---- structure ----
|
||||||
|
(bi-test "block count" (len blocks) 9)
|
||||||
|
(bi-test "ids by position" (map blk-id blocks)
|
||||||
|
(list "b0" "b1" "b2" "b3" "b4" "b5" "b6" "b7" "b8"))
|
||||||
|
(bi-test "types in order" (map blk-type blocks)
|
||||||
|
(list "heading" "text" "quote" "list" "code" "divider" "image" "callout" "embed"))
|
||||||
|
|
||||||
|
; ---- heading ----
|
||||||
|
(bi-test "heading level" (blk-send (nth blocks 0) "level") 2)
|
||||||
|
(bi-test "heading text" (str (blk-send (nth blocks 0) "text")) "Title")
|
||||||
|
|
||||||
|
; ---- paragraph with inline bold + link, flattened to plain concatenation ----
|
||||||
|
(bi-test "paragraph flattened text"
|
||||||
|
(str (blk-send (nth blocks 1) "text")) "plain bold then a link")
|
||||||
|
|
||||||
|
; ---- quote ----
|
||||||
|
(bi-test "quote text" (str (blk-send (nth blocks 2) "text")) "wise words")
|
||||||
|
|
||||||
|
; ---- ordered list with items ----
|
||||||
|
(bi-test "list ordered" (blk-send (nth blocks 3) "ordered") true)
|
||||||
|
(bi-test "list items" (blk-send (nth blocks 3) "items") (list "one" "two"))
|
||||||
|
|
||||||
|
; ---- code block ----
|
||||||
|
(bi-test "code language" (str (blk-send (nth blocks 4) "language")) "python")
|
||||||
|
(bi-test "code text" (str (blk-send (nth blocks 4) "text")) "print(1)")
|
||||||
|
|
||||||
|
; ---- image ----
|
||||||
|
(bi-test "image src" (str (blk-send (nth blocks 6) "src")) "/c.png")
|
||||||
|
(bi-test "image alt" (str (blk-send (nth blocks 6) "alt")) "a cat")
|
||||||
|
|
||||||
|
; ---- callout ----
|
||||||
|
(bi-test "callout kind" (str (blk-send (nth blocks 7) "kind")) "blue")
|
||||||
|
(bi-test "callout text" (str (blk-send (nth blocks 7) "text")) "note!")
|
||||||
|
|
||||||
|
; ---- unknown card routed to embed, provider records original type ----
|
||||||
|
(bi-test "unknown -> embed provider" (str (blk-send (nth blocks 8) "provider")) "twitter")
|
||||||
|
|
||||||
|
; ---- heading level mapping ----
|
||||||
|
(bi-test "h1 level" (lex-heading-level "h1") 1)
|
||||||
|
(bi-test "h4 level" (lex-heading-level "h4") 4)
|
||||||
|
(bi-test "unknown tag default" (lex-heading-level "hx") 2)
|
||||||
|
|
||||||
|
; ---- bullet list ----
|
||||||
|
(define
|
||||||
|
bdoc
|
||||||
|
{:children (list {:type "list" :listType "bullet" :children (list
|
||||||
|
{:type "listitem" :children (list {:type "text" :text "x"})})})})
|
||||||
|
(bi-test "bullet not ordered" (blk-send (nth (blogimport/lex-blocks bdoc) 0) "ordered") false)
|
||||||
|
|
||||||
|
; ---- empty doc ----
|
||||||
|
(bi-test "empty doc -> no blocks" (len (blogimport/lex-blocks {:root {:children (list)}})) 0)
|
||||||
|
|
||||||
|
; ---- bare-children doc (no :root wrapper) ----
|
||||||
|
(bi-test "bare children doc"
|
||||||
|
(map blk-type (blogimport/lex-blocks {:children (list {:type "paragraph" :children (list {:type "text" :text "hi"})})}))
|
||||||
|
(list "text"))
|
||||||
|
|
||||||
|
; ---- linebreak/tab in inline flattening ----
|
||||||
|
(bi-test "linebreak flatten"
|
||||||
|
(str (blk-send (nth (blogimport/lex-blocks
|
||||||
|
{:children (list {:type "paragraph" :children (list
|
||||||
|
{:type "text" :text "a"} {:type "linebreak"} {:type "text" :text "b"})})}) 0) "text"))
|
||||||
|
"a\nb")
|
||||||
57
lib/blogimport/tests/verify.sx
Normal file
57
lib/blogimport/tests/verify.sx
Normal file
@@ -0,0 +1,57 @@
|
|||||||
|
; lib/blogimport/tests/verify.sx — shadow-diff at rest (round-trip parity)
|
||||||
|
(st-bootstrap-classes!)
|
||||||
|
(content-bootstrap-blocks!)
|
||||||
|
(content-bootstrap-doc!)
|
||||||
|
(content-bootstrap-callout!)
|
||||||
|
(content-bootstrap-media!)
|
||||||
|
|
||||||
|
(define
|
||||||
|
p1
|
||||||
|
{:id "post-1" :slug "hello" :title "Hello" :status "published"
|
||||||
|
:visibility "public" :tags (list "news") :authors (list "u1")
|
||||||
|
:lexical {:root {:children (list
|
||||||
|
{:type "heading" :tag "h2" :children (list {:type "text" :text "Title"})}
|
||||||
|
{:type "paragraph" :children (list
|
||||||
|
{:type "text" :text "plain "}
|
||||||
|
{:type "text" :text "bold" :format 1})}
|
||||||
|
{:type "list" :listType "number" :children (list
|
||||||
|
{:type "listitem" :children (list {:type "text" :text "one"})}
|
||||||
|
{:type "listitem" :children (list {:type "text" :text "two"})})}
|
||||||
|
{:type "image" :src "/c.png" :alt "cat"})}}})
|
||||||
|
|
||||||
|
(define
|
||||||
|
px
|
||||||
|
{:id "post-x" :slug "ghost" :title "Ghost" :status "published"
|
||||||
|
:lexical {:children (list {:type "paragraph" :children (list {:type "text" :text "never imported"})})}})
|
||||||
|
|
||||||
|
; ---- happy path: replayed == oracle ----
|
||||||
|
(define B (persist/open))
|
||||||
|
(blogimport/import-post! B p1 10)
|
||||||
|
(define v1 (blogimport/verify-post B p1))
|
||||||
|
(bi-test "verify ok" (get v1 :ok) true)
|
||||||
|
(bi-test "verify block-ok" (get v1 :block-ok) true)
|
||||||
|
(bi-test "verify meta-ok" (get v1 :meta-ok) true)
|
||||||
|
|
||||||
|
; ---- oracle block model is what we expect (inline bold flattened) ----
|
||||||
|
(define orc (blogimport/oracle p1))
|
||||||
|
(bi-test "oracle types"
|
||||||
|
(get (get orc :blocks) :types) (list "heading" "text" "list" "image"))
|
||||||
|
(bi-test "oracle contents"
|
||||||
|
(get (get orc :blocks) :contents) (list "Title" "plain bold" (list "one" "two") "/c.png"))
|
||||||
|
|
||||||
|
; ---- corruption is DETECTED (op-log diverges from oracle) ----
|
||||||
|
(content/commit! B "post-1" (op-update "b1" "text" "CORRUPTED") 100)
|
||||||
|
(define v2 (blogimport/verify-post B p1))
|
||||||
|
(bi-test "verify detects corruption" (get v2 :ok) false)
|
||||||
|
(bi-test "verify corruption is block-level" (get v2 :block-ok) false)
|
||||||
|
|
||||||
|
; ---- an un-imported post fails verification (empty replay vs non-empty oracle) ----
|
||||||
|
(bi-test "unimported not ok" (get (blogimport/verify-post B px) :ok) false)
|
||||||
|
|
||||||
|
; ---- verify-all coverage scoreboard ----
|
||||||
|
(define B3 (persist/open))
|
||||||
|
(blogimport/import-post! B3 p1 10)
|
||||||
|
(define cov (blogimport/verify-all B3 (list p1 px)))
|
||||||
|
(bi-test "verify-all total" (get cov :total) 2)
|
||||||
|
(bi-test "verify-all ok count" (get cov :ok) 1)
|
||||||
|
(bi-test "verify-all mismatched" (get cov :mismatched) (list "post-x"))
|
||||||
73
lib/blogimport/verify.sx
Normal file
73
lib/blogimport/verify.sx
Normal file
@@ -0,0 +1,73 @@
|
|||||||
|
; lib/blogimport/verify.sx
|
||||||
|
; Shadow-diff at rest (plans/migration/data-migration.md §6, slice-01-blog.md §4).
|
||||||
|
;
|
||||||
|
; After backfill, replay each content:<id> stream -> materialized doc -> block
|
||||||
|
; model, and diff against the row-derived oracle (lexical->blocks computed directly).
|
||||||
|
; Structural compare with `=` (not equal?). This proves the genesis import + op-log
|
||||||
|
; replay is LOSSLESS — "did the backfill corrupt anything" at rest.
|
||||||
|
;
|
||||||
|
; The oracle here is the in-memory lexical->blocks of the SAME post, so the property
|
||||||
|
; verified is round-trip fidelity through persist. Cross-checking against the LIVE
|
||||||
|
; Python block model (the "does SX match Python" half of Q-D2) is a later wiring
|
||||||
|
; step that needs the Python oracle via the internal-data query (Q-M4) — flagged,
|
||||||
|
; not built. The diff plumbing here is the twin that step reuses.
|
||||||
|
|
||||||
|
; --- salient content per block (normalized; same on both sides) -----------------
|
||||||
|
; ids are deterministic + identical on both sides, so they are kept (not stripped).
|
||||||
|
(define
|
||||||
|
blogimport/blk-content
|
||||||
|
(fn (b)
|
||||||
|
(let ((t (blk-type b)))
|
||||||
|
(cond
|
||||||
|
((equal? t "image") (str (blk-send b "src")))
|
||||||
|
((equal? t "media") (str (blk-send b "src")))
|
||||||
|
((equal? t "embed") (str (blk-send b "url")))
|
||||||
|
((equal? t "list") (blk-send b "items"))
|
||||||
|
((equal? t "divider") "")
|
||||||
|
(else (str (blk-send b "text")))))))
|
||||||
|
|
||||||
|
; --- block model of a block list ------------------------------------------------
|
||||||
|
(define
|
||||||
|
blogimport/blocks-model
|
||||||
|
(fn (blocks)
|
||||||
|
{:ids (map blk-id blocks)
|
||||||
|
:types (map blk-type blocks)
|
||||||
|
:contents (map blogimport/blk-content blocks)}))
|
||||||
|
|
||||||
|
; --- oracle: lexical->blocks computed directly from the post (no persist) --------
|
||||||
|
(define
|
||||||
|
blogimport/oracle
|
||||||
|
(fn (post)
|
||||||
|
{:blocks (blogimport/blocks-model (blogimport/lex-blocks (get post :lexical)))
|
||||||
|
:meta (blogimport/post-meta post)}))
|
||||||
|
|
||||||
|
; --- replayed: from the persisted stream ----------------------------------------
|
||||||
|
(define
|
||||||
|
blogimport/replayed
|
||||||
|
(fn (b id)
|
||||||
|
{:blocks (blogimport/blocks-model (content/blocks (content/head b id)))
|
||||||
|
:meta (blogimport/load-meta b id)}))
|
||||||
|
|
||||||
|
; --- verify one post: replayed must equal oracle --------------------------------
|
||||||
|
(define
|
||||||
|
blogimport/verify-post
|
||||||
|
(fn (b post)
|
||||||
|
(let ((id (get post :id)))
|
||||||
|
(let ((orc (blogimport/oracle post))
|
||||||
|
(rep (blogimport/replayed b id)))
|
||||||
|
(let ((block-ok (= (get orc :blocks) (get rep :blocks)))
|
||||||
|
(meta-ok (= (get orc :meta) (get rep :meta))))
|
||||||
|
{:id id
|
||||||
|
:ok (and block-ok meta-ok)
|
||||||
|
:block-ok block-ok
|
||||||
|
:meta-ok meta-ok})))))
|
||||||
|
|
||||||
|
; --- verify many: coverage scoreboard -------------------------------------------
|
||||||
|
(define
|
||||||
|
blogimport/verify-all
|
||||||
|
(fn (b posts)
|
||||||
|
(let ((results (map (fn (p) (blogimport/verify-post b p)) posts)))
|
||||||
|
{:total (len results)
|
||||||
|
:ok (len (filter (fn (r) (get r :ok)) results))
|
||||||
|
:mismatched (map (fn (r) (get r :id))
|
||||||
|
(filter (fn (r) (not (get r :ok))) results))})))
|
||||||
Reference in New Issue
Block a user