Capture the tooling that pays for itself across the remaining slices, ranked by ROI-per-effort: (1) host_conformance(suite) per-suite fast runner — trivial bash arg, done by hand this session; (2) host_live_check — boot ephemeral server, authed request sequence, return rendered HTML (generalizes run-picker-check.sh; the pre-deploy check that catches serving-JIT divergence conformance misses); (3) host_render_diff — render a route JIT-vs-interpreter and flag divergence (the precise detector that ends the bug class; builds on sx_render_trace; regression oracle for the jit-bytecode-correctness loop); (4) surface deps-check/prim-check as MCP. Plus: file the sx-tree worktree write/validate bug. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Host dev tooling — close the loop on the serving-JIT bug class
The host-on-sx build loop has one expensive, recurring failure mode and a handful of ergonomic papercuts. This plan captures the tooling that would pay for itself across the remaining slices (content-addressing, Slices 6–9). Ordered by ROI-per-effort, not ambition.
The core problem this addresses
Green conformance ≠ correct live. The serving-JIT miscompiles iteration over a
function-produced list under the http-listen render VM — (map f (some-fn)) /
(for-each f (some-fn)) can process only the first element and silently drop the rest.
Conformance (lib/host/conformance.sh) and the ephemeral picker-check do NOT reproduce it
(they passed 287/287 while live rendered 1 of 4 relation editors). The fix lives in a separate
loop (plans/jit-bytecode-correctness.md); until it lands, every host render path has to be
eyeballed live (login + curl + grep the rendered HTML). The tools below make that cheap and,
eventually, automatic. See [[feedback_host_serving_jit_iteration]],
[[project_sx_engine_harness_tests]].
1. host_conformance(suite?) — per-suite, fast (trivial; do first)
Today conformance.sh runs all 11 suites (~10 min, all-or-nothing). Iterating on one subsystem
means hand-extracting the MODULES array to build a focused runner (done by hand this session).
- Change: conformance.sh takes an optional suite-name arg; with it, emit only that suite's load + (eval (RUNNER)) after the shared MODULES. Without it, run all (current behaviour).
- MCP (optional): thin host_conformance(suite) wrapper on the rose-ash-services server so it returns the {:total :passed :failed :fails} dict directly.
- Effort: ~1 line of bash + arg parse. Payoff: every remaining iteration of this loop.
- Not MCP-shaped on its own: the bash arg is 90% of the value; wrap only if convenient.
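The selection logic described above is small enough to sketch. The following is a minimal illustration in Python rather than the actual bash change; the function name build_script, the suite table layout, and the exact `(load "...")` line format are assumptions made for the example, not taken from conformance.sh.

```python
def build_script(modules, suites, only=None):
    """Return the forms to feed the server, as a list of strings.

    modules: shared module paths, always loaded first.
    suites:  hypothetical table {suite_name: (suite_file, runner_name)}.
    only:    optional suite name; None means run every suite.
    """
    if only is not None and only not in suites:
        raise ValueError(f"unknown suite {only!r}; known: {sorted(suites)}")
    selected = suites if only is None else {only: suites[only]}
    # Shared modules come first in both modes.
    lines = [f'(load "{path}")' for path in modules]
    # Then each selected suite's load followed by its runner invocation.
    for path, runner in selected.values():
        lines.append(f'(load "{path}")')
        lines.append(f"(eval ({runner}))")
    return lines
```

Rejecting an unknown suite name up front keeps a typo from silently running zero suites and reporting a vacuous pass.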
2. host_live_check(seed, requests) — rendered HTML from an ephemeral server (high ROI)
Generalize lib/host/playwright/run-picker-check.sh from "the picker" to "any route." Boot an
ephemeral host server on a temp persist dir, seed posts, run an authed request sequence, and
return the rendered HTML of each response.
- Why: this is the manual dance we repeat for every render-path change. It's the only thing that catches the serving-JIT divergence conformance misses — because it exercises the real http-listen render VM, not the test harness.
- Shape: host_live_check({seed: [{title, sx_content, status}...], requests: [{method, path, auth?, body?}...]}) → [{status, content_type, body}...]. Reuse serve.sh + the temp-persist / admin-cred / cleanup scaffolding already in run-picker-check.sh.
- Effort: medium (mostly lifting run-picker-check.sh's boot/seed/teardown into a parameterized runner). Payoff: kills the most expensive recurring class by turning "deploy then eyeball" into a pre-deploy check.
- Constraint: never pkill sx_server (sibling loop agents share the binary). Bind the ephemeral server to its own port + temp dir and kill only its own PID, as run-picker-check.sh already does ([[feedback_no_pkill_sx_server]]).
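The own-port, own-temp-dir, own-PID constraint can be sketched as a small lifecycle wrapper. This is an illustration under stated assumptions, not a port of run-picker-check.sh: the names free_port and ephemeral_server are hypothetical, and because how serve.sh accepts a port and persist dir is not specified here, the caller supplies the argv through make_cmd.

```python
import contextlib
import shutil
import socket
import subprocess
import sys
import tempfile


def free_port() -> int:
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


@contextlib.contextmanager
def ephemeral_server(make_cmd):
    """Run one server on its own port and temp dir; stop only that process.

    make_cmd(port, persist_dir) -> argv list (supplied by the caller,
    since the real serve.sh invocation is an unknown here).
    """
    persist_dir = tempfile.mkdtemp(prefix="host-live-check-")
    port = free_port()
    proc = subprocess.Popen(make_cmd(port, persist_dir))
    try:
        yield proc, port, persist_dir
    finally:
        # Signal only the process we started; never kill by name.
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        shutil.rmtree(persist_dir, ignore_errors=True)
```

Seeding and the authed request sequence would run inside the `with` body. Note that free_port has a small race (another process could take the port before the server binds it), so a real runner may want to retry on bind failure.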
3. host_render_diff(route) — JIT vs interpreter, flag divergence (ends the bug class)
The precise detector. Render a route twice — once through the JIT-served path, once through the interpreter — and diff the HTML. Any divergence IS a serving-JIT miscompile, surfaced at build time instead of live.
- Why: #2 catches divergence only if a human notices the wrong output; this catches it mechanically. It's the tool that would have flagged the 1-of-4-editors bug before deploy.
- Builds on: sx_render_trace (already in the server's deferred toolset), vm-trace, bytecode-inspect, prim-check (epoch-protocol diagnostics in CLAUDE.md).
- Effort: highest (needs a deterministic interpreter-only render path to diff against, and a stable HTML normalization so incidental ordering doesn't false-positive). Payoff: retires the "verify live by hand" tax entirely. Coordinate with the jit-bytecode-correctness loop; this is also their regression oracle.
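The normalize-then-diff half of this tool can be sketched independently of how the two renders are obtained. The example below assumes both HTML strings are already in hand, and treats attribute order and whitespace as the only incidental differences; the names normalize and render_diff are hypothetical, and a real normalizer would likely need more rules.

```python
import difflib
from html.parser import HTMLParser


class _Canon(HTMLParser):
    """Flatten HTML into one token per line, with attributes sorted by name."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        parts = [tag] + [
            name if value is None else f'{name}="{value}"'
            for name, value in sorted(attrs, key=lambda a: a[0])
        ]
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.out.append(text)


def normalize(html: str) -> list[str]:
    parser = _Canon()
    parser.feed(html)
    parser.close()
    return parser.out


def render_diff(jit_html: str, interp_html: str) -> list[str]:
    """Return unified-diff lines; an empty list means no divergence."""
    return list(difflib.unified_diff(
        normalize(interp_html), normalize(jit_html),
        fromfile="interpreter", tofile="jit", lineterm="",
    ))
```

Treating the interpreter render as the "from" side means dropped elements, such as list items the JIT path never emitted, show up as `-` lines in the diff.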
4. Surface deps-check / prim-check as MCP (low effort, modest payoff)
Both already exist as epoch-protocol commands (CLAUDE.md). Wrapping them as MCP tools lets us catch unresolved symbols / missing primitives before a live boot, instead of via a load-time error. Strictly an ergonomic win — the capability is already there.
Explicitly NOT building
- A CID / canon inspector. sx_eval already gives host/blog-cid / host/blog--canon interactively; a dedicated tool wouldn't earn its keep.
Separately: file the sx-tree worktree bug
Not a new tool — a bug. In this worktree (loops/host) every sx-tree WRITE/validate tool
raises yojson "Expected string, got null", forcing Edit/Write on .sx files (against
CLAUDE.md's structural-edit protocol) and sx_eval-load as the validate substitute. File against
whoever owns the sx-tree MCP; it degrades the intended workflow on every .sx edit here.
Sequence
1 (bash suite-filter) → 2 (host_live_check) → 3 (host_render_diff), as natural breaks allow.
Don't detour an in-flight slice for these; pick them up between slices.