rose-ash/plans/host-dev-tooling.md
giles bd108ae7dd
tooling: per-suite conformance filter + live-check.sh; note render-diff to vm-extensions
- conformance.sh [suite] runs ONE suite (filters the SUITES array so result-parser
  indices stay aligned; all MODULES still load). 'conformance.sh sxtp' = 0.3s vs ~8min.
- lib/host/live-check.sh: non-browser live smoke — boot ephemeral host, login, seed a
  post (exercises form-ingest write), print status|content-type|body-head per path,
  assert reads are text/sx + no JSON leak + no 5xx. The counterpart to run-picker-check.sh.
- plans/NOTE-render-diff-for-vm-ext.md: defer host_render_diff (JIT-vs-interpreter
  regression oracle) to the sx-vm-extensions loop — it's their fix's oracle, not a host
  feature; building it from loops/host would fork JIT-engine understanding.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 11:24:29 +00:00

# Host dev tooling — close the loop on the serving-JIT bug class
The host-on-sx build loop has one expensive, recurring failure mode and a handful of
ergonomic papercuts. This plan captures the tooling that would pay for itself across the
remaining slices (content-addressing, Slices 6-9). Ordered by ROI-per-effort, not ambition.
## The core problem this addresses
**Green conformance ≠ correct live.** The serving-JIT miscompiles iteration over a
*function-produced list* under the http-listen render VM — `(map f (some-fn))` /
`(for-each f (some-fn))` can process only the first element and silently drop the rest.
Conformance (`lib/host/conformance.sh`) and the ephemeral picker-check do NOT reproduce it
(they passed 287/287 while live rendered 1 of 4 relation editors). The fix lives in a separate
loop (`plans/jit-bytecode-correctness.md`); until it lands, **every host render path has to be
eyeballed live** (login + curl + grep the rendered HTML). The tools below make that cheap and,
eventually, automatic. See `[[feedback_host_serving_jit_iteration]]`,
`[[project_sx_engine_harness_tests]]`.
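
The manual check described above (login, curl, grep the rendered HTML) often comes down to counting how many items actually rendered. A minimal sketch of that count, assuming each rendered item carries some unique marker string; the marker and both helper names are hypothetical and not part of the repo:

```bash
# Count occurrences of a fixed marker string in rendered HTML read from stdin.
# "|| true" keeps a zero-match grep from failing the pipeline.
count_marker() {
  { grep -oF -- "$1" || true; } | wc -l | tr -d ' '
}

# Fail when the rendered count differs from the expected count.
# Arguments: marker string, expected count. HTML arrives on stdin.
assert_rendered_count() {
  marker="$1"
  expected="$2"
  actual="$(count_marker "$marker")"
  if [ "$actual" != "$expected" ]; then
    echo "expected $expected of '$marker', rendered $actual" >&2
    return 1
  fi
  return 0
}
```

Piping a fetched page into `assert_rendered_count '<marker>' 4` would turn a "1 of 4 rendered" page into a non-zero exit instead of something a human has to spot.
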
## 1. `host_conformance(suite?)` — per-suite, fast (trivial; do first) — DONE 2026-06-30
`conformance.sh [suite] [-v]` now takes an optional suite name (filters the SUITES array so
result-parser indices stay aligned; all MODULES still load). `conformance.sh sxtp` runs in
**0.3s** vs ~8min for the full Datalog-heavy run. Bad name → error listing valid suites.
Before this change, `conformance.sh` ran all 11 suites (~10 min, all-or-nothing), so iterating on
one subsystem meant hand-extracting the `MODULES` array to build a focused runner (done by hand this session).
- **Change:** `conformance.sh` takes an optional suite-name arg; with it, emit only that suite's
`load` + `(eval (RUNNER))` after the shared MODULES. Without it, run all (current behaviour).
- **MCP (optional):** thin `host_conformance(suite)` wrapper on the rose-ash-services server so it
returns the `{:total :passed :failed :fails}` dict directly.
- **Effort:** ~1 line of bash + arg parse. **Payoff:** every remaining iteration of this loop.
- **Not MCP-shaped on its own** — the bash arg is 90% of the value; wrap only if convenient.
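
The optional suite argument can be sketched as a small selection function. This is an illustration under assumptions, not the code in `conformance.sh`: every suite name except `sxtp` is a placeholder, `select_suites` is a hypothetical helper, and a space-separated list stands in for the real `SUITES` array to keep the sketch portable.

```bash
# Placeholder suite list; only "sxtp" is a name taken from the plan.
SUITES="sxtp suite-b suite-c"

# Print the suites to run, one per line: all of them when no name is given,
# otherwise only the named one. An unknown name lists the valid suites on
# stderr and returns 1.
select_suites() {
  want="${1:-}"
  if [ -z "$want" ]; then
    printf '%s\n' $SUITES
    return 0
  fi
  for s in $SUITES; do
    if [ "$s" = "$want" ]; then
      printf '%s\n' "$s"
      return 0
    fi
  done
  printf 'unknown suite: %s\nvalid suites: %s\n' "$want" "$SUITES" >&2
  return 1
}
```

Filtering the list up front, rather than skipping suites while running, means whatever consumes the results sees a list whose positions match what actually ran.
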
## 2. `host_live_check` — rendered HTML from an ephemeral server (high ROI) — DONE 2026-06-30
Built as `lib/host/live-check.sh` (shell, the right grain — matches run-picker-check.sh). Boots
an ephemeral host, logs in, seeds a post (exercising the form-ingest write path), then prints
`status | content-type | body-head` for `/health /posts /feed / /<seeded>/` (or paths passed as
args). Asserts reads are `text/sx`, no JSON leak, no 5xx, non-empty bodies — ~10s, no browser.
It caught nothing new today (the wire was already verified), but it is now the standing pre-deploy smoke check.
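
The four assertions listed above can be sketched as one check per response. This is not the code in `live-check.sh`; `check_response` is a hypothetical helper, and treating "a body that starts with `{` or `[`" as a JSON leak is an assumption made for the example.

```bash
# Check one response against the assertions: no 5xx, text/sx content type,
# non-empty body, no JSON leak.
# Arguments: HTTP status code, Content-Type header value, response body.
check_response() {
  code="$1"
  ctype="$2"
  body="$3"
  case "$code" in
    5??) echo "5xx response: $code" >&2; return 1 ;;
  esac
  case "$ctype" in
    text/sx*) ;;
    *) echo "not text/sx: $ctype" >&2; return 1 ;;
  esac
  if [ -z "$body" ]; then
    echo "empty body" >&2
    return 1
  fi
  # Assumption: a JSON leak shows up as a body opening with { or [.
  case "$body" in
    '{'*|'['*) echo "body looks like JSON" >&2; return 1 ;;
  esac
  return 0
}
```

With curl, `-w '%{http_code}|%{content_type}'` supplies the first two arguments and the saved body supplies the third.
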
Generalize `lib/host/playwright/run-picker-check.sh` from "the picker" to "any route." Boot an
ephemeral host server on a temp persist dir, seed posts, run an **authed request sequence**, and
return the **rendered HTML** of each response.
- **Why:** this is the manual dance we repeat for every render-path change. It's the only thing
that catches the serving-JIT divergence conformance misses — because it exercises the real
http-listen render VM, not the test harness.
- **Shape:** `host_live_check({seed: [{title, sx_content, status}...], requests: [{method, path,
auth?, body?}...]})` → `[{status, content_type, body}...]`. Reuse serve.sh + the temp-persist /
admin-cred / cleanup scaffolding already in run-picker-check.sh.
- **Effort:** medium (mostly lifting run-picker-check.sh's boot/seed/teardown into a parameterized
runner). **Payoff:** kills the most expensive recurring class — turns "deploy then eyeball" into
a pre-deploy check.
- **Constraint:** never `pkill sx_server` (sibling loop agents share the binary) — bind the
ephemeral server to its own port + temp dir and kill only its own PID, as run-picker-check.sh
already does (`[[feedback_no_pkill_sx_server]]`).
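
The "kill only its own PID" constraint can be sketched as a start/stop pair that remembers exactly one PID and one temp dir. The helper names are hypothetical and this is not lifted from `run-picker-check.sh`; `"$@"` stands in for the real server launch.

```bash
# Start a command in the background, remembering only its PID and a private
# temp dir. "$@" is a stand-in for the real server launch command.
start_owned() {
  OWNED_DIR="$(mktemp -d)"
  "$@" &
  OWNED_PID=$!
}

# Stop exactly the process this script started and remove its temp dir.
# Nothing here matches by process name, so sibling servers are untouched.
stop_owned() {
  if [ -n "${OWNED_PID:-}" ]; then
    kill "$OWNED_PID" 2>/dev/null || true
    wait "$OWNED_PID" 2>/dev/null || true
    OWNED_PID=""
  fi
  if [ -n "${OWNED_DIR:-}" ]; then
    rm -rf "$OWNED_DIR"
    OWNED_DIR=""
  fi
}
```

Registering `trap stop_owned EXIT` right after `start_owned` keeps the cleanup tied to this script even when an assertion fails partway through.
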
## 3. `host_render_diff(route)` — JIT vs interpreter, flag divergence (ends the bug class)
The precise detector. Render a route **twice** — once through the JIT-served path, once through
the interpreter — and diff the HTML. Any divergence IS a serving-JIT miscompile, surfaced at build
time instead of live.
- **Why:** #2 catches divergence only if a human notices the wrong output; this catches it
mechanically. It's the tool that would have flagged the 1-of-4-editors bug before deploy.
- **Builds on:** `sx_render_trace` (already in the server's deferred toolset), `vm-trace`,
`bytecode-inspect`, `prim-check` (epoch-protocol diagnostics in CLAUDE.md).
- **Effort:** highest (needs a deterministic interpreter-only render path to diff against, and a
stable HTML normalization so incidental ordering doesn't false-positive). **Payoff:** retires the
"verify live by hand" tax entirely. Coordinate with the `jit-bytecode-correctness` loop — this is
also their regression oracle.
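
The compare step can be sketched independently of how the two renders are produced. The sketch below assumes the JIT and interpreter outputs have already been saved to two files; the helper names are hypothetical, and the normalization only collapses whitespace, which is a placeholder for whatever stable normalization the real tool ends up needing.

```bash
# Normalize HTML from stdin so incidental whitespace does not count as a
# difference: newlines and tabs become spaces, runs of spaces collapse, and
# whitespace between adjacent tags is dropped. Placeholder rule only.
normalize_html() {
  tr '\t\r\n' '   ' | tr -s ' ' | sed -e 's/> </></g' -e 's/^ //' -e 's/ $//'
}

# Compare two saved renders. Returns 0 when they match after normalization;
# otherwise prints both normalized forms and returns 1.
# Arguments: path to the JIT render, path to the interpreter render.
render_diff() {
  jit="$(normalize_html < "$1")"
  interp="$(normalize_html < "$2")"
  if [ "$jit" = "$interp" ]; then
    return 0
  fi
  printf 'DIVERGED\n  jit:         %s\n  interpreter: %s\n' "$jit" "$interp"
  return 1
}
```

A render that dropped list elements would differ from the interpreter's output after normalization, so the function returns 1 without anyone reading the HTML.
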
## 4. Surface `deps-check` / `prim-check` as MCP (low effort, modest payoff)
Both already exist as epoch-protocol commands (CLAUDE.md). Wrapping them as MCP tools lets us catch
unresolved symbols / missing primitives **before** a live boot, instead of via a load-time error.
Strictly an ergonomic win — the capability is already there.
## Explicitly NOT building
- A CID / canon inspector. `sx_eval` already gives `host/blog-cid` / `host/blog--canon`
interactively; a dedicated tool wouldn't earn its keep.
## Separately: file the sx-tree worktree bug
Not a new tool — a **bug**. In this worktree (`loops/host`) every sx-tree WRITE/validate tool
raises `yojson "Expected string, got null"`, forcing `Edit`/`Write` on `.sx` files (against
CLAUDE.md's structural-edit protocol) and `sx_eval`-load as the validate substitute. File against
whoever owns the sx-tree MCP; it degrades the intended workflow on every `.sx` edit here.
## Sequence
1 (bash suite-filter) → 2 (`host_live_check`) → 3 (`host_render_diff`), as natural breaks allow.
Don't detour an in-flight slice for these; pick them up between slices.