rose-ash/plans/host-dev-tooling.md
giles 99d8527d30 plan: host dev tooling — close the loop on the serving-JIT bug class
Capture the tooling that pays for itself across the remaining slices, ranked by
ROI-per-effort: (1) host_conformance(suite) per-suite fast runner — trivial bash arg,
done by hand this session; (2) host_live_check — boot ephemeral server, authed request
sequence, return rendered HTML (generalizes run-picker-check.sh; the pre-deploy check that
catches serving-JIT divergence conformance misses); (3) host_render_diff — render a route
JIT-vs-interpreter and flag divergence (the precise detector that ends the bug class;
builds on sx_render_trace; regression oracle for the jit-bytecode-correctness loop); (4)
surface deps-check/prim-check as MCP. Plus: file the sx-tree worktree write/validate bug.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 10:07:18 +00:00

# Host dev tooling — close the loop on the serving-JIT bug class
The host-on-sx build loop has one expensive, recurring failure mode and a handful of
ergonomic papercuts. This plan captures the tooling that would pay for itself across the
remaining slices (content-addressing, Slices 6-9). Ordered by ROI-per-effort, not ambition.
## The core problem this addresses
**Green conformance ≠ correct live.** The serving-JIT miscompiles iteration over a
*function-produced list* under the http-listen render VM — `(map f (some-fn))` /
`(for-each f (some-fn))` can process only the first element and silently drop the rest.
Conformance (`lib/host/conformance.sh`) and the ephemeral picker-check do NOT reproduce it
(they passed 287/287 while live rendered 1 of 4 relation editors). The fix lives in a separate
loop (`plans/jit-bytecode-correctness.md`); until it lands, **every host render path has to be
eyeballed live** (login + curl + grep the rendered HTML). The tools below make that cheap and,
eventually, automatic. See `[[feedback_host_serving_jit_iteration]]`,
`[[project_sx_engine_harness_tests]]`.
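
The manual "grep the rendered HTML" step can be approximated as a count check. This is a minimal sketch, not an existing script: the helper names `count_marker` and `expect_count` and the marker `class="relation-editor"` are hypothetical, and the real markup may need a different marker.

```shell
# Count occurrences of a fixed string in HTML read from stdin.
# grep -o prints one line per match, so wc -l gives the match count.
count_marker() {
  grep -oF -- "$1" | wc -l | tr -d ' '
}

# Fail unless the marker appears exactly the expected number of times.
# Reads the HTML from stdin, like count_marker.
expect_count() {
  marker=$1
  want=$2
  got=$(count_marker "$marker")
  if [ "$got" != "$want" ]; then
    echo "expected $want of '$marker', got $got" >&2
    return 1
  fi
}
```

Piping an authenticated `curl -s` response for the affected route into `expect_count 'class="relation-editor"' 4` would then fail on a 1-of-4 render instead of relying on someone spotting it.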
## 1. `host_conformance(suite?)` — per-suite, fast (trivial; do first)
Today `conformance.sh` runs all 11 suites (~10 min, all-or-nothing). Iterating on one subsystem
means hand-extracting the `MODULES` array to build a focused runner (done by hand this session).
- **Change:** `conformance.sh` takes an optional suite-name arg; with it, emit only that suite's
`load` + `(eval (RUNNER))` after the shared MODULES. Without it, run all (current behaviour).
- **MCP (optional):** thin `host_conformance(suite)` wrapper on the rose-ash-services server so it
returns the `{:total :passed :failed :fails}` dict directly.
- **Effort:** ~1 line of bash + arg parse. **Payoff:** every remaining iteration of this loop.
- **Not MCP-shaped on its own** — the bash arg is 90% of the value; wrap only if convenient.
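
The optional-argument behaviour could look roughly like the sketch below. It assumes a space-separated `SUITES` list; the suite names and the function `select_suites` are placeholders, since the real contents of `conformance.sh` are not reproduced here.

```shell
# Hypothetical suite names; the real conformance.sh defines its own list.
SUITES="blog auth relations"

# Print the suites to run, one per line: every suite when no argument is
# given, or only the named suite. An unknown name is an error, so a typo
# cannot silently run zero tests and look green.
select_suites() {
  want=${1:-}
  found=0
  for s in $SUITES; do
    if [ -z "$want" ] || [ "$s" = "$want" ]; then
      echo "$s"
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "unknown suite: $want" >&2
    return 1
  fi
}
```

The existing per-suite `load` + `(eval (RUNNER))` emission would then loop over the output of `select_suites "$1"` rather than over the full list.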
## 2. `host_live_check(seed, requests)` — rendered HTML from an ephemeral server (high ROI)
Generalize `lib/host/playwright/run-picker-check.sh` from "the picker" to "any route." Boot an
ephemeral host server on a temp persist dir, seed posts, run an **authed request sequence**, and
return the **rendered HTML** of each response.
- **Why:** this is the manual dance we repeat for every render-path change. It's the only thing
that catches the serving-JIT divergence conformance misses — because it exercises the real
http-listen render VM, not the test harness.
- **Shape:** `host_live_check({seed: [{title, sx_content, status}...], requests: [{method, path,
auth?, body?}...]})` → `[{status, content_type, body}...]`. Reuse serve.sh + the temp-persist /
admin-cred / cleanup scaffolding already in run-picker-check.sh.
- **Effort:** medium (mostly lifting run-picker-check.sh's boot/seed/teardown into a parameterized
runner). **Payoff:** retires the most expensive recurring failure class by turning "deploy then
eyeball" into a pre-deploy check.
- **Constraint:** never `pkill sx_server` (sibling loop agents share the binary) — bind the
ephemeral server to its own port + temp dir and kill only its own PID, as run-picker-check.sh
already does (`[[feedback_no_pkill_sx_server]]`).
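
The lifecycle constraint above can be sketched as a pair of helpers. This is an illustration under assumptions, not the code in run-picker-check.sh: `start_ephemeral`, `stop_ephemeral`, and the `HOST_PERSIST_DIR` variable are hypothetical names, and the caller is assumed to pass the real server command (including its own port) as arguments.

```shell
# Start the given command in the background with a fresh temp dir.
# HOST_PERSIST_DIR is a hypothetical variable name; the real server may
# take its persist dir some other way.
start_ephemeral() {
  EPH_DIR=$(mktemp -d) || return 1
  env HOST_PERSIST_DIR="$EPH_DIR" "$@" &
  EPH_PID=$!   # env execs the command, so this is the server's own PID
}

# Stop exactly the process we started and remove its temp dir.
# Never kill by name: sibling agents may be running the same binary.
stop_ephemeral() {
  if [ -n "${EPH_PID:-}" ]; then
    kill "$EPH_PID" 2>/dev/null || true
    wait "$EPH_PID" 2>/dev/null || true
  fi
  if [ -n "${EPH_DIR:-}" ]; then
    rm -rf "$EPH_DIR"
  fi
  EPH_PID=
  EPH_DIR=
}
```

Registering `stop_ephemeral` with `trap ... EXIT` in the calling script would keep the cleanup running even when a request in the sequence fails.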
## 3. `host_render_diff(route)` — JIT vs interpreter, flag divergence (ends the bug class)
The precise detector. Render a route **twice** (once through the JIT-served path, once through
the interpreter) and diff the HTML. Any divergence that survives normalization points to a
serving-JIT miscompile, surfaced at build time instead of live.
- **Why:** #2 catches divergence only if a human notices the wrong output; this catches it
mechanically. It's the tool that would have flagged the 1-of-4-editors bug before deploy.
- **Builds on:** `sx_render_trace` (already in the server's deferred toolset), `vm-trace`,
`bytecode-inspect`, `prim-check` (epoch-protocol diagnostics in CLAUDE.md).
- **Effort:** highest (needs a deterministic interpreter-only render path to diff against, and a
stable HTML normalization so incidental ordering doesn't false-positive). **Payoff:** retires the
"verify live by hand" tax entirely. Coordinate with the `jit-bytecode-correctness` loop — this is
also their regression oracle.
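
The diff step itself is small once both renders exist as files. The sketch below assumes the two HTML outputs have already been captured; `normalize_html` and `render_diff` are hypothetical names, and the normalization shown (trailing whitespace and blank lines only) is a stand-in that does not yet handle incidental ordering.

```shell
# Normalize HTML on stdin: strip trailing whitespace, then drop blank lines.
# Assumption: real normalization would need more (e.g. attribute ordering).
normalize_html() {
  sed -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Compare two rendered files after normalization. Prints a unified diff
# and returns non-zero when they diverge; returns 0 when they match.
render_diff() {
  a=$(mktemp) || return 2
  b=$(mktemp) || { rm -f "$a"; return 2; }
  normalize_html < "$1" > "$a"
  normalize_html < "$2" > "$b"
  diff -u "$a" "$b" && rc=0 || rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```

Called as `render_diff jit.html interp.html`, a dropped list element shows up as removed lines in the diff and a non-zero exit, which is what a build-time gate needs.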
## 4. Surface `deps-check` / `prim-check` as MCP (low effort, modest payoff)
Both already exist as epoch-protocol commands (CLAUDE.md). Wrapping them as MCP tools lets us catch
unresolved symbols / missing primitives **before** a live boot, instead of via a load-time error.
Strictly an ergonomic win — the capability is already there.
## Explicitly NOT building
- A CID / canon inspector. `sx_eval` already gives `host/blog-cid` / `host/blog--canon`
interactively; a dedicated tool wouldn't earn its keep.
## Separately: file the sx-tree worktree bug
Not a new tool — a **bug**. In this worktree (`loops/host`) every sx-tree WRITE/validate tool
raises `yojson "Expected string, got null"`, forcing `Edit`/`Write` on `.sx` files (against
CLAUDE.md's structural-edit protocol) and `sx_eval`-load as the validate substitute. File against
whoever owns the sx-tree MCP; it degrades the intended workflow on every `.sx` edit here.
## Sequence
1 (bash suite-filter) → 2 (`host_live_check`) → 3 (`host_render_diff`), as natural breaks allow.
Don't detour an in-flight slice for these; pick them up between slices.