host: SX-native HTML→SX converter (the radar migrator) + first-class HTML import

lib/host/htmlsx.sx: a pure-SX HTML → SX converter (char-level tokenizer + stack parser).
host/html->sx turns a post's HTML into an (article …) tree that host/blog--decompose! consumes.
It handles img / p / figure+figcaption / iframe / headings / blockquote / lists. Inline
strong/em/a stay nested (decompose flattens them to text), entities are decoded to UTF-8, and
comments and doctype are skipped. This replaces the one-off external Python converter used for
the nt-live-encore import.
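The tokenizer + stack-parser design can be illustrated with a small Python sketch. It is not a port of htmlsx.sx, and everything in it is an assumption: nodes are hypothetical lists `[tag, attrs, *children]` rooted at `"article"`, the standard-library `html.unescape` stands in for the entity decoder, and the `VOID` set, the dropping of whitespace-only text, and the recovery rule for mismatched close tags are illustrative choices.

```python
import html

# Sketch only: a hypothetical Python analogue of the tokenizer + stack parser
# idea, not a port of htmlsx.sx. Void elements are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


def _open_tag(s, i):
    """Read an open tag starting at its name (just after '<').
    Returns (next_index, ('open', tag, attrs, self_closing))."""
    n, j = len(s), i
    while j < n and (s[j].isalnum() or s[j] in "-:"):
        j += 1
    tag, attrs = s[i:j].lower(), {}
    while j < n:
        while j < n and s[j].isspace():
            j += 1
        if j >= n:
            break
        if s[j] == ">":
            return j + 1, ("open", tag, attrs, False)
        if s.startswith("/>", j):
            return j + 2, ("open", tag, attrs, True)
        k = j
        while k < n and not s[k].isspace() and s[k] not in "=>/":
            k += 1
        if k == j:  # stray character such as a lone '/': skip it
            j += 1
            continue
        key, j = s[j:k].lower(), k
        while j < n and s[j].isspace():
            j += 1
        if j < n and s[j] == "=":
            j += 1
            while j < n and s[j].isspace():
                j += 1
            if j < n and s[j] in "\"'":  # quoted value
                end = s.find(s[j], j + 1)
                end = n if end < 0 else end
                value, j = s[j + 1:end], end + 1
            else:  # unquoted value runs to whitespace or '>'
                k = j
                while k < n and not s[k].isspace() and s[k] != ">":
                    k += 1
                value, j = s[j:k], k
            attrs[key] = html.unescape(value)
        else:  # boolean attribute, e.g. <iframe allowfullscreen>
            attrs[key] = ""
    return n, ("open", tag, attrs, False)  # unterminated tag


def tokenize(s):
    """Char-level scan yielding ('text', str), ('open', ...) and ('close', tag)."""
    i, n = 0, len(s)
    while i < n:
        if s.startswith("<!--", i):  # comment: skip through '-->'
            end = s.find("-->", i + 4)
            i = n if end < 0 else end + 3
        elif s.startswith("<!", i) or s.startswith("<?", i):  # doctype / PI
            end = s.find(">", i)
            i = n if end < 0 else end + 1
        elif s.startswith("</", i):
            end = s.find(">", i)
            if end < 0:  # unterminated close tag: stop scanning
                break
            name = s[i + 2:end].split()
            yield ("close", name[0].lower() if name else "")
            i = end + 1
        elif s[i] == "<" and i + 1 < n and s[i + 1].isalpha():
            i, tok = _open_tag(s, i + 1)
            yield tok
        else:  # text runs to the next '<'; a '<' that starts no tag stays text
            j = s.find("<", i + 1)
            j = n if j < 0 else j
            yield ("text", html.unescape(s[i:j]))
            i = j


def html_to_tree(src):
    """Stack parser: build [tag, attrs, *children] nodes under an 'article' root."""
    root = ["article", {}]
    stack = [root]
    for tok in tokenize(src):
        if tok[0] == "text":
            if not tok[1].strip():  # drop whitespace-only text
                continue
            parent = stack[-1]
            if len(parent) > 2 and isinstance(parent[-1], str):
                parent[-1] += tok[1]  # merge adjacent text runs
            else:
                parent.append(tok[1])
        elif tok[0] == "open":
            _, tag, attrs, self_closing = tok
            node = [tag, attrs]
            stack[-1].append(node)
            if tag not in VOID and not self_closing:
                stack.append(node)
        else:  # close: pop back to the nearest matching open tag, if any
            for depth in range(len(stack) - 1, 0, -1):
                if stack[depth][0] == tok[1]:
                    del stack[depth:]
                    break
    return root
```

With this recovery rule a close tag with no matching open tag is ignored, and an unclosed element stays open until an ancestor closes. Dropping whitespace-only text also drops the space between adjacent inline elements, which a production converter may want to keep.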

import-post! now accepts a raw "html" field alongside "sx_content": the HTML is converted via
html->sx, serialized to sx_content and decomposed, so importing real Ghost HTML is first-class.
When both fields are present, "html" takes precedence. Wired htmlsx.sx into the conformance.sh
and serve.sh module lists, so it loads in both conformance and live.
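The content-source precedence applied by import-post! can be restated as a short Python sketch. The converter, parser and serializer are passed in as hypothetical stand-ins for host/html->sx, parse-safe and serialize; treating an empty "html" string as absent follows Python truthiness and is an assumption about how the SX `if` behaves.

```python
from typing import Any, Callable, Tuple


def choose_content(
    gp: dict,
    html_to_tree: Callable[[str], Any],  # hypothetical stand-in for host/html->sx
    parse_safe: Callable[[str], Any],    # hypothetical stand-in for parse-safe
    serialize: Callable[[Any], str],     # hypothetical stand-in for serialize
) -> Tuple[Any, str]:
    """Return (tree, stored_source) for one imported post.

    A non-empty "html" field wins: its converted tree is serialized for storage.
    Otherwise "sx_content" (or "") is stored verbatim and parsed into the tree.
    Either way, exactly one tree reaches the decompose step.
    """
    raw_html = gp.get("html")
    if raw_html:
        tree = html_to_tree(raw_html)
        return tree, serialize(tree)
    source = gp.get("sx_content") or ""
    return parse_safe(source), source
```

Producing a single tree up front is what lets the stored source and the decomposed cards come from the same value regardless of which field arrived.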

New htmlsx suite: 8/8 (text / entities / void / nested / figure / iframe / comments, plus an
html→sx→decompose→typed-cards round-trip). Blog suite: 197/197 (including a new import-from-html test).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 15:32:06 +00:00
parent a99e64b661
commit 7e2275b90c
6 changed files with 205 additions and 6 deletions


@@ -2211,9 +2211,16 @@
 ;; (put!/seed!/relate! are sets). Contract: plans/NOTE-blog-types-for-radar.md.
 (define host/blog-import-post!
   (fn (gp)
-    (let ((slug (get gp "slug")) (title (get gp "title")))
+    (let ((slug (get gp "slug")) (title (get gp "title"))
+          ;; content may arrive as raw "html" (converted to an SX tree by the pure-SX
+          ;; converter) OR as "sx_content" (SX source). Either way -> one tree.
+          (tree (if (get gp "html")
+                    (host/html->sx (get gp "html"))
+                    (parse-safe (or (get gp "sx_content") "")))))
       (begin
-        (host/blog-put! slug title (or (get gp "sx_content") "") (or (get gp "status") "published"))
+        (host/blog-put! slug title
+          (if (get gp "html") (serialize tree) (or (get gp "sx_content") ""))
+          (or (get gp "status") "published"))
         (host/blog-relate! slug "article" "is-a")
         (host/blog--set-field-values! slug
           {"subtitle" (or (get gp "custom_excerpt") (get gp "excerpt") "")
@@ -2226,10 +2233,9 @@
                 (host/blog-relate! tslug "tag" "is-a")
                 (host/blog-relate! slug tslug "tagged"))))
           (or (get gp "tags") (list)))
-        ;; cards-as-objects: decompose the Ghost body into card objects + a `contains`
-        ;; body, so the post renders via the composition fold (its :body supersedes the
-        ;; opaque sx_content). parse-safe degrades to nil on bad input -> decompose no-ops.
-        (host/blog--decompose! slug (parse-safe (or (get gp "sx_content") "")))
+        ;; cards-as-objects: decompose the (html- or sx-derived) content tree into card
+        ;; objects + a `contains` body, so the post renders via the composition fold.
+        (host/blog--decompose! slug tree)
         slug))))
 ;; Import a batch; returns the imported slugs.
 (define host/blog-import-all!