- conformance.sh [suite] runs ONE suite (filters the SUITES array so result-parser indices stay aligned; all MODULES still load). 'conformance.sh sxtp' = 0.3s vs ~8min.
- lib/host/live-check.sh: non-browser live smoke. Boot ephemeral host, login, seed a post (exercises form-ingest write), print status|content-type|body-head per path, assert reads are text/sx + no JSON leak + no 5xx. The counterpart to run-picker-check.sh.
- plans/NOTE-render-diff-for-vm-ext.md: defer host_render_diff (JIT-vs-interpreter regression oracle) to the sx-vm-extensions loop. It's their fix's oracle, not a host feature; building it from loops/host would fork JIT-engine understanding.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# Host dev tooling — close the loop on the serving-JIT bug class

The host-on-sx build loop has one expensive, recurring failure mode and a handful of
ergonomic papercuts. This plan captures the tooling that would pay for itself across the
remaining slices (content-addressing, Slices 6–9). Ordered by ROI-per-effort, not ambition.

## The core problem this addresses

**Green conformance ≠ correct live.** The serving-JIT miscompiles iteration over a
*function-produced list* under the http-listen render VM: `(map f (some-fn))` /
`(for-each f (some-fn))` can process only the first element and silently drop the rest.
Conformance (`lib/host/conformance.sh`) and the ephemeral picker-check do NOT reproduce it
(they passed 287/287 while the live server rendered 1 of 4 relation editors). The fix lives in a
separate loop (`plans/jit-bytecode-correctness.md`); until it lands, **every host render path has
to be eyeballed live** (login + curl + grep the rendered HTML). The tools below make that cheap
and, eventually, automatic. See `[[feedback_host_serving_jit_iteration]]`,
`[[project_sx_engine_harness_tests]]`.

## 1. `host_conformance(suite?)` — per-suite, fast (trivial; do first) — DONE 2026-06-30

`conformance.sh [suite] [-v]` now takes an optional suite name (it filters the SUITES array so
result-parser indices stay aligned; all MODULES still load). `conformance.sh sxtp` runs in
**0.3s** vs ~8min for the full Datalog-heavy run. Bad name → error listing valid suites.

Before this change, `conformance.sh` ran all 11 suites (~10 min, all-or-nothing). Iterating on one
subsystem meant hand-extracting the `MODULES` array to build a focused runner (done by hand this
session).

- **Change:** `conformance.sh` takes an optional suite-name arg; with it, emit only that suite's
  `load` + `(eval (RUNNER))` after the shared MODULES. Without it, run all (current behaviour).
- **MCP (optional):** thin `host_conformance(suite)` wrapper on the rose-ash-services server so it
  returns the `{:total :passed :failed :fails}` dict directly.
- **Effort:** ~1 line of bash + arg parse. **Payoff:** every remaining iteration of this loop.
- **Not MCP-shaped on its own:** the bash arg is 90% of the value; wrap only if convenient.

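
The suite filter described above can be sketched roughly as follows. This is a minimal illustration in portable sh, not the contents of `conformance.sh`: the `select_suites` function and the `ALL_SUITES` list are hypothetical, a space-separated string stands in for the real SUITES array, and every suite name other than `sxtp` is made up.

```shell
# Hypothetical suite list; the real SUITES array lives in lib/host/conformance.sh.
# Only "sxtp" is a name taken from this plan, the others are placeholders.
ALL_SUITES="sxtp blog feed"

# select_suites [NAME]
# Prints the suites to run, one per line.
# With no NAME it prints every suite; with an unknown NAME it reports the
# valid names on stderr and returns non-zero.
select_suites() {
  if [ -z "${1:-}" ]; then
    # Intentionally unquoted: split the list into one argument per suite.
    printf '%s\n' $ALL_SUITES
    return 0
  fi
  for ss_name in $ALL_SUITES; do
    if [ "$ss_name" = "$1" ]; then
      printf '%s\n' "$ss_name"
      return 0
    fi
  done
  echo "unknown suite '$1'; valid suites: $ALL_SUITES" >&2
  return 1
}
```

Called with no argument it prints every suite, with `sxtp` it prints only that one, and with an unknown name it fails before anything is loaded, which is the behaviour the error-listing note above asks for.
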
## 2. `host_live_check` — rendered HTML from an ephemeral server (high ROI) — DONE 2026-06-30

Built as `lib/host/live-check.sh` (shell is the right grain; it matches run-picker-check.sh). It
boots an ephemeral host, logs in, seeds a post (exercising the form-ingest write path), then prints
`status | content-type | body-head` for `/health`, `/posts`, `/feed`, `/` and `/<seeded>/` (or for
paths passed as args). It asserts that reads are `text/sx`, with no JSON leak, no 5xx and non-empty
bodies, in ~10s and without a browser. It caught nothing new today (the wire was already verified),
but it is the standing pre-deploy smoke check.

Generalize `lib/host/playwright/run-picker-check.sh` from "the picker" to "any route." Boot an
ephemeral host server on a temp persist dir, seed posts, run an **authed request sequence**, and
return the **rendered HTML** of each response.

- **Why:** this is the manual dance we repeat for every render-path change. It's the only thing
  that catches the serving-JIT divergence conformance misses, because it exercises the real
  http-listen render VM, not the test harness.
- **Shape:** `host_live_check({seed: [{title, sx_content, status}...], requests: [{method, path, auth?, body?}...]})`
  → `[{status, content_type, body}...]`. Reuse serve.sh + the temp-persist / admin-cred / cleanup
  scaffolding already in run-picker-check.sh.
- **Effort:** medium (mostly lifting run-picker-check.sh's boot/seed/teardown into a parameterized
  runner). **Payoff:** kills the most expensive recurring class by turning "deploy then eyeball"
  into a pre-deploy check.
- **Constraint:** never `pkill sx_server` (sibling loop agents share the binary). Bind the
  ephemeral server to its own port + temp dir and kill only its own PID, as run-picker-check.sh
  already does (`[[feedback_no_pkill_sx_server]]`).

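
The per-response assertions listed above could look something like this. It is a sketch under assumptions, not the code in `live-check.sh`: `check_response` is a hypothetical helper, and the JSON-leak test (a body that starts with `{` or `[`) is a crude stand-in for whatever the real script checks.

```shell
# check_response STATUS CONTENT_TYPE BODY
# Prints "ok" when every assertion holds; otherwise prints the first
# failure reason and returns non-zero.
check_response() {
  cr_status=$1
  cr_ctype=$2
  cr_body=$3

  # No 5xx.
  case "$cr_status" in
    5??) echo "5xx status: $cr_status"; return 1 ;;
  esac

  # Reads must be served as text/sx (parameters such as charset are allowed).
  case "$cr_ctype" in
    text/sx*) ;;
    *) echo "unexpected content-type: $cr_ctype"; return 1 ;;
  esac

  # Non-empty body.
  if [ -z "$cr_body" ]; then
    echo "empty body"
    return 1
  fi

  # Assumed heuristic for a JSON leak: the body opens like a JSON value.
  case "$cr_body" in
    '{'*|'['*) echo "JSON leak"; return 1 ;;
  esac

  echo ok
}

# Hypothetical wiring (PORT and the path are placeholders, not real values):
#   meta=$(curl -s -o body.txt -w '%{http_code}|%{content_type}' "http://127.0.0.1:PORT/posts")
#   check_response "${meta%%|*}" "${meta#*|}" "$(cat body.txt)"
```

Keeping the checks in one pure function means the same assertions can run against any path list without touching the boot and teardown scaffolding.
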
## 3. `host_render_diff(route)` — JIT vs interpreter, flag divergence (ends the bug class)

The precise detector. Render a route **twice** (once through the JIT-served path, once through
the interpreter) and diff the HTML. Any divergence IS a serving-JIT miscompile, surfaced at build
time instead of live.

- **Why:** #2 catches divergence only if a human notices the wrong output; this catches it
  mechanically. It's the tool that would have flagged the 1-of-4-editors bug before deploy.
- **Builds on:** `sx_render_trace` (already in the server's deferred toolset), `vm-trace`,
  `bytecode-inspect`, `prim-check` (epoch-protocol diagnostics in CLAUDE.md).
- **Effort:** highest (needs a deterministic interpreter-only render path to diff against, and a
  stable HTML normalization so incidental ordering doesn't false-positive). **Payoff:** retires the
  "verify live by hand" tax entirely. Coordinate with the `jit-bytecode-correctness` loop; this is
  also their regression oracle.

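
Assuming both renders are already available as strings, the comparison step might be sketched like this. `normalize_html` and `render_diff` are hypothetical names, the normalization only collapses whitespace (it does nothing about attribute or element ordering), and obtaining the interpreter-only render is still the open problem noted under Effort.

```shell
# normalize_html: reads HTML on stdin and writes a whitespace-insensitive form.
# Collapses every whitespace run to one space, removes the space left between
# adjacent tags, and trims both ends.
normalize_html() {
  tr -s '[:space:]' ' ' | sed -e 's/> </></g' -e 's/^ //' -e 's/ $//'
}

# render_diff JIT_HTML INTERP_HTML
# Prints "same" when the two renders match after normalization; otherwise
# prints "DIVERGED" and returns non-zero.
render_diff() {
  rd_jit=$(printf '%s' "$1" | normalize_html)
  rd_interp=$(printf '%s' "$2" | normalize_html)
  if [ "$rd_jit" = "$rd_interp" ]; then
    echo same
  else
    echo DIVERGED
    return 1
  fi
}
```

A render that kept only the first list element would come out as `DIVERGED` here, while a render that differs only in indentation would not, which is the false-positive the normalization is meant to avoid.
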
## 4. Surface `deps-check` / `prim-check` as MCP (low effort, modest payoff)

Both already exist as epoch-protocol commands (CLAUDE.md). Wrapping them as MCP tools lets us catch
unresolved symbols / missing primitives **before** a live boot, instead of via a load-time error.
Strictly an ergonomic win: the capability is already there.

## Explicitly NOT building

- A CID / canon inspector. `sx_eval` already gives `host/blog-cid` / `host/blog--canon`
  interactively; a dedicated tool wouldn't earn its keep.

## Separately: file the sx-tree worktree bug

Not a new tool but a **bug**. In this worktree (`loops/host`) every sx-tree WRITE/validate tool
raises `yojson "Expected string, got null"`, forcing `Edit`/`Write` on `.sx` files (against
CLAUDE.md's structural-edit protocol) and `sx_eval`-load as the validate substitute. File it
against whoever owns the sx-tree MCP; it degrades the intended workflow on every `.sx` edit here.

## Sequence

1 (bash suite-filter) → 2 (`host_live_check`) → 3 (`host_render_diff`), as natural breaks allow.
Don't detour an in-flight slice for these; pick them up between slices.