diff --git a/plans/host-dev-tooling.md b/plans/host-dev-tooling.md
new file mode 100644
index 00000000..960480e3
--- /dev/null
+++ b/plans/host-dev-tooling.md
@@ -0,0 +1,86 @@
+# Host dev tooling — close the loop on the serving-JIT bug class
+
+The host-on-sx build loop has one expensive, recurring failure mode and a handful of
+ergonomic papercuts. This plan captures the tooling that would pay for itself across the
+remaining slices (content-addressing, Slices 6–9). Ordered by ROI-per-effort, not ambition.
+
+## The core problem this addresses
+
+**Green conformance ≠ correct live.** The serving-JIT miscompiles iteration over a
+*function-produced list* under the http-listen render VM — `(map f (some-fn))` /
+`(for-each f (some-fn))` can process only the first element and silently drop the rest.
+Conformance (`lib/host/conformance.sh`) and the ephemeral picker-check do NOT reproduce it
+(they passed 287/287 while live rendered 1 of 4 relation editors). The fix lives in a separate
+loop (`plans/jit-bytecode-correctness.md`); until it lands, **every host render path has to be
+eyeballed live** (login + curl + grep the rendered HTML). The tools below make that cheap and,
+eventually, automatic. See `[[feedback_host_serving_jit_iteration]]`,
+`[[project_sx_engine_harness_tests]]`.
+
+## 1. `host_conformance(suite?)` — per-suite, fast (trivial; do first)
+
+Today `conformance.sh` runs all 11 suites (~10 min, all-or-nothing). Iterating on one subsystem
+means hand-extracting the `MODULES` array to build a focused runner (done by hand this session).
+
+- **Change:** `conformance.sh` takes an optional suite-name arg; with it, emit only that suite's
+  `load` + `(eval (RUNNER))` after the shared MODULES. Without it, run all (current behaviour).
+- **MCP (optional):** thin `host_conformance(suite)` wrapper on the rose-ash-services server so it
+  returns the `{:total :passed :failed :fails}` dict directly.
+- **Effort:** ~1 line of bash + arg parse. **Payoff:** every remaining iteration of this loop.
+- **Not MCP-shaped on its own** — the bash arg is 90% of the value; wrap only if convenient.
+
+## 2. `host_live_check(seed, requests)` — rendered HTML from an ephemeral server (high ROI)
+
+Generalize `lib/host/playwright/run-picker-check.sh` from "the picker" to "any route." Boot an
+ephemeral host server on a temp persist dir, seed posts, run an **authed request sequence**, and
+return the **rendered HTML** of each response.
+
+- **Why:** this is the manual dance we repeat for every render-path change. It's the only thing
+  that catches the serving-JIT divergence conformance misses — because it exercises the real
+  http-listen render VM, not the test harness.
+- **Shape:** `host_live_check({seed: [{title, sx_content, status}...], requests: [{method, path,
+  auth?, body?}...]})` → `[{status, content_type, body}...]`. Reuse serve.sh + the temp-persist /
+  admin-cred / cleanup scaffolding already in run-picker-check.sh.
+- **Effort:** medium (mostly lifting run-picker-check.sh's boot/seed/teardown into a parameterized
+  runner). **Payoff:** kills the most expensive recurring class — turns "deploy then eyeball" into
+  a pre-deploy check.
+- **Constraint:** never `pkill sx_server` (sibling loop agents share the binary) — bind the
+  ephemeral server to its own port + temp dir and kill only its own PID, as run-picker-check.sh
+  already does (`[[feedback_no_pkill_sx_server]]`).
+
+## 3. `host_render_diff(route)` — JIT vs interpreter, flag divergence (ends the bug class)
+
+The precise detector. Render a route **twice** — once through the JIT-served path, once through
+the interpreter — and diff the HTML. Any divergence IS a serving-JIT miscompile, surfaced at build
+time instead of live.
+
+- **Why:** #2 catches divergence only if a human notices the wrong output; this catches it
+  mechanically. It's the tool that would have flagged the 1-of-4-editors bug before deploy.
+- **Builds on:** `sx_render_trace` (already in the server's deferred toolset), `vm-trace`,
+  `bytecode-inspect`, `prim-check` (epoch-protocol diagnostics in CLAUDE.md).
+- **Effort:** highest (needs a deterministic interpreter-only render path to diff against, and a
+  stable HTML normalization so incidental ordering doesn't false-positive). **Payoff:** retires the
+  "verify live by hand" tax entirely. Coordinate with the `jit-bytecode-correctness` loop — this is
+  also their regression oracle.
+
+## 4. Surface `deps-check` / `prim-check` as MCP (low effort, modest payoff)
+
+Both already exist as epoch-protocol commands (CLAUDE.md). Wrapping them as MCP tools lets us catch
+unresolved symbols / missing primitives **before** a live boot, instead of via a load-time error.
+Strictly an ergonomic win — the capability is already there.
+
+## Explicitly NOT building
+
+- A CID / canon inspector. `sx_eval` already gives `host/blog-cid` / `host/blog--canon`
+  interactively; a dedicated tool wouldn't earn its keep.
+
+## Separately: file the sx-tree worktree bug
+
+Not a new tool — a **bug**. In this worktree (`loops/host`) every sx-tree WRITE/validate tool
+raises `yojson "Expected string, got null"`, forcing `Edit`/`Write` on `.sx` files (against
+CLAUDE.md's structural-edit protocol) and `sx_eval`-load as the validate substitute. File against
+whoever owns the sx-tree MCP; it degrades the intended workflow on every `.sx` edit here.
+
+## Sequence
+
+1 (bash suite-filter) → 2 (`host_live_check`) → 3 (`host_render_diff`), as natural breaks allow.
+Don't detour an in-flight slice for these; pick them up between slices.
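
A note on item 3: the "stable HTML normalization" and diff step can be prototyped independently of the render paths. The Python sketch below is one possible shape, not the planned implementation. The names `normalize_html` and `render_diff` are hypothetical, and it assumes the two HTML strings have already been obtained (one from the JIT-served path, one from an interpreter-only render, which the plan notes does not exist yet). It also assumes "incidental ordering" mainly means attribute order and whitespace; other sources of noise would need their own handling.

```python
import difflib
from html.parser import HTMLParser


class _Normalizer(HTMLParser):
    """Re-emit HTML one token per line, with attributes sorted by name."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Sorting attributes removes incidental attribute-order differences.
        rendered = "".join(
            f' {name}="{value}"' if value is not None else f" {name}"
            for name, value in sorted(attrs, key=lambda attr: attr[0])
        )
        self.lines.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.lines.append(f"</{tag}>")

    def handle_data(self, data):
        # Collapse whitespace runs; drop whitespace-only text nodes.
        text = " ".join(data.split())
        if text:
            self.lines.append(text)


def normalize_html(html):
    """Hypothetical helper: return a list of normalized token lines."""
    parser = _Normalizer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.lines


def render_diff(jit_html, interp_html):
    """Hypothetical helper: unified-diff lines; empty list means agreement."""
    return list(
        difflib.unified_diff(
            normalize_html(interp_html),
            normalize_html(jit_html),
            fromfile="interpreter",
            tofile="jit",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Made-up markup: the "JIT" render drops the second list item.
    interp = '<ul><li class="ed" id="a">A</li><li id="b" class="ed">B</li></ul>'
    jit = '<ul><li id="a" class="ed">A</li></ul>'
    for line in render_diff(jit, interp):
        print(line)
```

Emitting one token per line keeps the diff readable: a dropped list element shows up as a few `-` lines, and attribute reordering alone produces no diff at all.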