rose-ash/plans/host-dev-tooling.md
giles bd108ae7dd
tooling: per-suite conformance filter + live-check.sh; note render-diff to vm-extensions
- conformance.sh [suite] runs ONE suite (filters the SUITES array so result-parser
  indices stay aligned; all MODULES still load). 'conformance.sh sxtp' = 0.3s vs ~8min.
- lib/host/live-check.sh: non-browser live smoke — boot ephemeral host, login, seed a
  post (exercises form-ingest write), print status|content-type|body-head per path,
  assert reads are text/sx + no JSON leak + no 5xx. The counterpart to run-picker-check.sh.
- plans/NOTE-render-diff-for-vm-ext.md: defer host_render_diff (JIT-vs-interpreter
  regression oracle) to the sx-vm-extensions loop — it's their fix's oracle, not a host
  feature; building it from loops/host would fork JIT-engine understanding.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 11:24:29 +00:00


Host dev tooling — close the loop on the serving-JIT bug class

The host-on-sx build loop has one expensive, recurring failure mode and a handful of ergonomic papercuts. This plan captures the tooling that would pay for itself across the remaining slices (content-addressing, Slices 69). Ordered by ROI-per-effort, not ambition.

The core problem this addresses

Green conformance ≠ correct live. The serving-JIT miscompiles iteration over a function-produced list under the http-listen render VM — (map f (some-fn)) / (for-each f (some-fn)) can process only the first element and silently drop the rest. Conformance (lib/host/conformance.sh) and the ephemeral picker-check do NOT reproduce it (they passed 287/287 while live rendered 1 of 4 relation editors). The fix lives in a separate loop (plans/jit-bytecode-correctness.md); until it lands, every host render path has to be eyeballed live (login + curl + grep the rendered HTML). The tools below make that cheap and, eventually, automatic. See [[feedback_host_serving_jit_iteration]], [[project_sx_engine_harness_tests]].
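The manual "curl + grep the rendered HTML" step can be reduced to counting a marker in the response body. The following is a minimal sketch, not the project's actual check: the marker string `relation-editor` and the helper names are hypothetical, and the HTML is passed in as a string rather than fetched.

```bash
#!/usr/bin/env bash
# Count how many times a literal marker appears on stdin.
# grep -o prints one match per line; "|| true" keeps a zero-match
# result from failing the pipeline under pipefail.
count_marker() {
  local marker=$1
  { grep -oF -- "$marker" || true; } | wc -l | tr -d ' '
}

# Fail when a rendered page contains fewer (or more) items than expected,
# which is how a "first element only" render would show up.
assert_rendered_count() {
  local html=$1 marker=$2 expected=$3 actual
  actual=$(printf '%s' "$html" | count_marker "$marker")
  if [ "$actual" -ne "$expected" ]; then
    echo "FAIL: expected $expected x '$marker', rendered $actual" >&2
    return 1
  fi
  echo "ok: $actual x '$marker'"
}
```

In use, the body of an authenticated response would be passed as the first argument, for example `assert_rendered_count "$body" relation-editor 4`.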

1. host_conformance(suite?) — per-suite, fast (trivial; do first) — DONE 2026-06-30

conformance.sh [suite] [-v] now takes an optional suite name (filters the SUITES array so result-parser indices stay aligned; all MODULES still load). conformance.sh sxtp runs in 0.3s vs ~8min for the full Datalog-heavy run. Bad name → error listing valid suites.

Before this change, conformance.sh ran all 11 suites (~10 min, all-or-nothing). Iterating on one subsystem meant hand-extracting the MODULES array to build a focused runner (done by hand this session).

  • Change: conformance.sh takes an optional suite-name arg; with it, emit only that suite's load + (eval (RUNNER)) after the shared MODULES. Without it, run all (current behaviour).
  • MCP (optional): thin host_conformance(suite) wrapper on the rose-ash-services server so it returns the {:total :passed :failed :fails} dict directly.
  • Effort: ~1 line of bash + arg parse. Payoff: every remaining iteration of this loop.
  • Not MCP-shaped on its own — the bash arg is 90% of the value; wrap only if convenient.
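The suite filter described above might look roughly like the sketch below. It assumes a bash `SUITES` array of suite names. Only `sxtp` is named in this plan; the other names, the function name, and the exit code in the usage note are hypothetical, and the real conformance.sh may be structured differently.

```bash
#!/usr/bin/env bash
# Hypothetical suite list; only "sxtp" comes from the plan.
SUITES=(sxtp blog feed)

# Narrow SUITES to the single requested suite, or leave it untouched when
# no name is given. Unknown names fail and list the valid suites.
select_suites() {
  local want=${1:-} s
  if [ -z "$want" ]; then
    return 0                      # no filter: run every suite
  fi
  for s in "${SUITES[@]}"; do
    if [ "$s" = "$want" ]; then
      # Filter the array itself, so anything that walks SUITES by
      # position sees the same one-element list as the runner.
      SUITES=("$want")
      return 0
    fi
  done
  echo "unknown suite '$want'; valid suites: ${SUITES[*]}" >&2
  return 1
}
```

A script would call `select_suites "${1:-}" || exit 2` once, after the shared modules are set up and before any per-suite work.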

2. host_live_check — rendered HTML from an ephemeral server (high ROI) — DONE 2026-06-30

Built as lib/host/live-check.sh (shell, the right grain; matches run-picker-check.sh). Boots an ephemeral host, logs in, seeds a post (exercising the form-ingest write path), then prints status | content-type | body-head for each of /health, /posts, /feed, / and /<seeded>/ (or for the paths passed as args). Asserts reads are text/sx, no JSON leak, no 5xx, non-empty bodies; takes ~10s, no browser. Caught nothing new today (the wire was already verified) but it's the standing pre-deploy smoke.
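The per-path assertions can be expressed as one pure function over a response's status, content type and body. This is a sketch under stated assumptions, not the contents of live-check.sh: the function name is hypothetical, and a "JSON leak" is assumed to mean a body whose first character is `{` or `[`.

```bash
#!/usr/bin/env bash
# Check one read response and print "path: status | content-type | body-head".
# Returns non-zero on the first failed assertion.
check_read() {
  local path=$1 status=$2 ctype=$3 body=$4
  if [ "$status" -ge 500 ]; then
    echo "FAIL $path: 5xx ($status)"; return 1
  fi
  case $ctype in
    text/sx*) ;;                                   # expected wire format
    *) echo "FAIL $path: content-type '$ctype' is not text/sx"; return 1 ;;
  esac
  if [ -z "$body" ]; then
    echo "FAIL $path: empty body"; return 1
  fi
  case $body in
    '{'*|'['*) echo "FAIL $path: body looks like JSON"; return 1 ;;  # assumed leak test
  esac
  printf '%s: %s | %s | %s\n' "$path" "$status" "$ctype" "${body:0:60}"
}
```

Keeping the check free of network calls means the same function can be fed by curl output in the live script and by fixed strings in a quick self-test.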

Generalize lib/host/playwright/run-picker-check.sh from "the picker" to "any route." Boot an ephemeral host server on a temp persist dir, seed posts, run an authed request sequence, and return the rendered HTML of each response.

  • Why: this is the manual dance we repeat for every render-path change. It's the only thing that catches the serving-JIT divergence conformance misses — because it exercises the real http-listen render VM, not the test harness.
  • Shape: host_live_check({seed: [{title, sx_content, status}...], requests: [{method, path, auth?, body?}...]}) → [{status, content_type, body}...]. Reuse serve.sh + the temp-persist / admin-cred / cleanup scaffolding already in run-picker-check.sh.
  • Effort: medium (mostly lifting run-picker-check.sh's boot/seed/teardown into a parameterized runner). Payoff: kills the most expensive recurring class — turns "deploy then eyeball" into a pre-deploy check.
  • Constraint: never pkill sx_server (sibling loop agents share the binary) — bind the ephemeral server to its own port + temp dir and kill only its own PID, as run-picker-check.sh already does ([[feedback_no_pkill_sx_server]]).
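The "kill only its own PID" constraint can be sketched as a start/stop pair that records the PID and temp dir of the one process it launched. This is an illustration only: the function and variable names are hypothetical, port selection is omitted, and the caller supplies the server command line (a real run would point it at `$EPH_DIR`).

```bash
#!/usr/bin/env bash
# Launch a command in the background and remember only its PID and a
# private temp dir. Nothing here looks processes up by name.
start_ephemeral() {
  EPH_DIR=$(mktemp -d)
  "$@" &                         # caller-supplied server command
  EPH_PID=$!
}

# Tear down exactly the process started above, then its temp dir.
stop_ephemeral() {
  if [ -n "${EPH_PID:-}" ]; then
    kill "$EPH_PID" 2>/dev/null || true   # this PID only; never pkill
    wait "$EPH_PID" 2>/dev/null || true   # reap it so the PID is really gone
  fi
  if [ -n "${EPH_DIR:-}" ]; then
    rm -rf -- "$EPH_DIR"
  fi
  EPH_PID=''
  EPH_DIR=''
}
```

Registering the teardown with `trap stop_ephemeral EXIT` before calling `start_ephemeral` keeps the cleanup running even when a later assertion fails.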

3. host_render_diff(route) — JIT vs interpreter, flag divergence (ends the bug class)

The precise detector. Render a route twice — once through the JIT-served path, once through the interpreter — and diff the HTML. Any divergence IS a serving-JIT miscompile, surfaced at build time instead of live.

  • Why: #2 catches divergence only if a human notices the wrong output; this catches it mechanically. It's the tool that would have flagged the 1-of-4-editors bug before deploy.
  • Builds on: sx_render_trace (already in the server's deferred toolset), vm-trace, bytecode-inspect, prim-check (epoch-protocol diagnostics in CLAUDE.md).
  • Effort: highest (needs a deterministic interpreter-only render path to diff against, and a stable HTML normalization so incidental ordering doesn't false-positive). Payoff: retires the "verify live by hand" tax entirely. Coordinate with the jit-bytecode-correctness loop — this is also their regression oracle.
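The comparison step could start from something like the sketch below, which takes the two renders as strings. It is an assumption-laden illustration: the function names are hypothetical, obtaining the interpreter-only render is out of scope, and the normalization only handles whitespace (it does not address incidental ordering, which the plan calls out as the hard part).

```bash
#!/usr/bin/env bash
# Collapse whitespace, drop whitespace between tags, and put each tag
# boundary on its own line so a dropped element shows up as a missing line.
normalize_html() {
  local nl=$'\n'
  tr -s ' \t\r\n' ' ' |
    sed -e 's/> </></g' -e 's/^ //' -e 's/ $//' -e "s/></>\\${nl}</g"
}

# Succeed when both renders normalize to the same HTML; otherwise print a
# line diff and return non-zero. Requires bash for process substitution.
render_diff() {
  local jit_html=$1 interp_html=$2
  diff <(printf '%s\n' "$jit_html" | normalize_html) \
       <(printf '%s\n' "$interp_html" | normalize_html)
}
```

With this shape, a render that dropped three of four list items would fail with a diff showing the missing lines, while a render that differs only in indentation would pass.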

4. Surface deps-check / prim-check as MCP (low effort, modest payoff)

Both already exist as epoch-protocol commands (CLAUDE.md). Wrapping them as MCP tools lets us catch unresolved symbols / missing primitives before a live boot, instead of via a load-time error. Strictly an ergonomic win — the capability is already there.

Explicitly NOT building

  • A CID / canon inspector. sx_eval already gives host/blog-cid / host/blog--canon interactively; a dedicated tool wouldn't earn its keep.

Separately: file the sx-tree worktree bug

Not a new tool — a bug. In this worktree (loops/host) every sx-tree WRITE/validate tool raises yojson "Expected string, got null", forcing Edit/Write on .sx files (against CLAUDE.md's structural-edit protocol) and sx_eval-load as the validate substitute. File against whoever owns the sx-tree MCP; it degrades the intended workflow on every .sx edit here.

Sequence

1 (bash suite-filter) → 2 (host_live_check) → 3 (host_render_diff), as natural breaks allow. Don't detour an in-flight slice for these; pick them up between slices.