otel: tick P1, log progress

2026-07-01 14:18:55 +00:00
parent 087c01e890
commit c8cc4a70dc

plans/otel-loop.md

@@ -0,0 +1,70 @@
# OpenTelemetry in SX — loop briefing
**Goal:** self-hosting observability for the SX host — traces/spans/metrics in **pure SX**, a
**live SVG waterfall dashboard** (reactive island), and **OTLP-JSON export** for interop with
real backends (Jaeger/Grafana). Reference shape: nektro/zig-tracer `src/otel.zig` (the OTLP span
struct + HTTP emit) — that's just the export step here.
**The key insight — a TRACE is a COMPOSITION.** A span has `{name, start, end, parent, attrs}`,
so a trace is a *tree of spans* — the same shape as an object's `:body` composition. So reuse the
existing fold machinery in `lib/host/compose.sx` (render-fold) and `lib/host/execute.sx`
(execute-fold): a span is a *timed effect*; a waterfall is a *render-fold over the span tree*;
OTLP export is an *export-fold*; metrics are an *aggregate-fold*. Don't reinvent — fold.
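
To make the "one walk, many folds" idea concrete, here is a minimal sketch in Python rather than SX. It is an illustration only: `Span`, `fold_spans`, `waterfall_rows` and `total_spans` are hypothetical names, not the API of `lib/host/compose.sx` or `lib/host/execute.sx`, and the span here stores its children directly instead of a `parent` link.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Span:
    # Hypothetical shape: children are stored inline to keep the walk short.
    name: str
    start: int
    end: int
    children: List["Span"] = field(default_factory=list)


def fold_spans(span: Span, visit: Callable[[Span, int, List[Any]], Any], depth: int = 0) -> Any:
    """Generic fold: fold each child first, then combine at the parent."""
    child_results = [fold_spans(c, visit, depth + 1) for c in span.children]
    return visit(span, depth, child_results)


def waterfall_rows(root: Span) -> list:
    """Render-style fold: one (depth, offset, duration, name) row per span."""
    def visit(span, depth, child_rows):
        row = (depth, span.start - root.start, span.end - span.start, span.name)
        # Parent row first, then the flattened rows of each child in order.
        return [row] + [r for rows in child_rows for r in rows]
    return fold_spans(root, visit)


def total_spans(root: Span) -> int:
    """Aggregate-style fold: count every span in the tree."""
    return fold_spans(root, lambda span, depth, counts: 1 + sum(counts))


trace = Span("GET /", 0, 100, [
    Span("db", 10, 40),
    Span("render", 50, 90, [Span("template", 55, 80)]),
])
print(waterfall_rows(trace))
print(total_spans(trace))
```

Only the `visit` function changes between the waterfall and the count; an export fold would be a third `visit` over the same walk.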
**Base:** this worktree is branched off `loops/host` (has the composition machinery + Parts A/C:
type-block grammar + type-def editor). You are on branch `loops/otel` in
`/root/rose-ash-loops/otel`.
## Rules
- **Test-first.** Write the failing test, then implement to green.
- **Fast tests via the warm server:** `bash lib/host/warm-conf.sh run <suite>` (starts a warm
persistent server; `run` alone = full conformance; `eval "<expr>"` for a REPL probe). New suite
→ add it to the runner the same way `lib/host/tests/*.sx` are wired.
- **Do NOT deploy to the live container.** blog.rose-ash.com is bind-mounted from
`/root/rose-ash-loops/host` (a *different* worktree). Build + test only; integration/deploy
happens when this branch is merged. (If you want a live smoke, ask — don't recreate the shared
container.)
- **`.sx` editing:** prefer `sx_write_file` (validates on parse); if the sx-tree WRITE tools raise
a yojson-null error in this worktree, fall back to the `Write` tool + `sx_validate`.
- Commit each increment to `loops/otel` with a short factual message. Never push to `main`.
- **Cheap by construction:** spans go in a **bounded in-memory ring buffer**, NOT the durable KV
(persisting every span would hammer persist like the old `relations/relate` re-saturation bug).
Sample + export on demand.
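
The drop-oldest behaviour of the ring buffer can be sketched as follows. This is Python, not SX, and `SpanRing` with its methods is a hypothetical stand-in that only mirrors the names `otel/record!`, `otel/recent` and `set-cap!`; it assumes spans are opaque values and that "recent" means oldest-to-newest order.

```python
from collections import deque


class SpanRing:
    """Bounded in-memory buffer: appending past the cap drops the oldest span."""

    def __init__(self, cap: int = 1000):
        # deque(maxlen=...) discards from the left when a full buffer is appended to.
        self._buf = deque(maxlen=cap)

    def record(self, span) -> None:
        self._buf.append(span)

    def recent(self, n=None) -> list:
        """Return the last n spans (all of them when n is None), oldest first."""
        items = list(self._buf)
        if n is None:
            return items
        return items[max(len(items) - n, 0):]

    def set_cap(self, cap: int) -> None:
        # Rebuilding with a smaller maxlen keeps only the newest `cap` spans.
        self._buf = deque(self._buf, maxlen=cap)


ring = SpanRing(cap=3)
for i in range(5):
    ring.record({"name": f"s{i}"})
print([s["name"] for s in ring.recent()])
```

Memory stays bounded by the cap regardless of request volume, which is the property the rule above is asking for; nothing in this sketch touches durable storage.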
## Roadmap — do ONE unchecked `[ ]` per iteration, test, commit, tick the box.
- [x] **P1 — span model + API.** `lib/host/otel.sx`: a span dict `{:trace :span :parent :name
:t0 :t1 :attrs :events}`; `otel/with-span name attrs thunk` (records t0/t1, pushes/pops a
dynamic parent stack so nesting builds the tree); a bounded ring buffer (`otel/record!`,
`otel/recent`, cap ~1000, drop-oldest); `otel/current-span`/`otel/current-trace`. Tests:
nested with-span builds parent links; ring caps at N.
- [ ] **P2 — monotonic clock.** Find/confirm a time prim on the OCaml host (the warm-conf
profiler + response cache already measure time; grep `lib/host` + the OCaml bridge). Wrap as
`otel/now-ns`. Tests: monotonic non-decreasing, non-negative, a `with-span` has `t1 >= t0`.
- [ ] **P3 — auto-instrument the handlers.** Wrap route handlers at the `host/make-app` / router
seam (see `lib/host/server.sx`) so every HTTP request becomes a trace: a root span per request
named by method+route, with `{:http.method :http.route :http.status}` attrs. Tests: a request
through the app produces one trace with the right span name + status attr.
- [ ] **P4 — render-fold → SVG waterfall.** A trace → an inline `<svg>` timeline: one `<rect>`
per span, `x` ∝ (t0 − trace.t0), `width` ∝ duration, `y` ∝ depth, a label. Reuse the
compose-fold walk shape. Tests: N spans → N rects; nested spans get increasing y.
- [ ] **P5 — metrics (aggregate-fold).** Fold recent spans → per-route counters (request count)
+ latency histogram (p50/p95/p99 from durations). Tests: known spans → expected counts +
percentiles.
- [ ] **P6 — live dashboard.** `GET /otel` — a reactive island (signals + an SSE stream of new
traces) that renders the waterfall of the latest trace + the metrics strip, updating live
without reload. Reuse the reactive runtime (`sx/sx/reactive-runtime.sx`, `web/`) + Dream
SSE/streaming already in `lib/host`. Tests: the island SSRs; the SSE endpoint emits a span
event; the page lists recent traces.
- [ ] **P7 — OTLP-JSON export.** Serialize spans to the OTLP/JSON schema (resourceSpans →
scopeSpans → spans with traceId/spanId/parentSpanId/name/startTimeUnixNano/endTimeUnixNano/
attributes). `otel/export-otlp traces` → the JSON; POST to an OTLP HTTP collector via an
**injected transport** (so it's testable without a live collector). Tests: OTLP shape matches
the spec for a known trace; the transport receives the payload.
- [ ] **P8 — context propagation + errors.** Parse/emit the W3C `traceparent` header so a trace
can span services (propagated on the host's inter-service calls); mark error spans (`:status :error`
+ an event). Tests: traceparent round-trips; an error thunk yields an error span.
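
For the P8 round-trip test, a sketch of `traceparent` parsing and emission may help pin down the expected shape. It is written in Python, not SX, and `TraceContext`, `parse_traceparent` and `format_traceparent` are hypothetical names. It assumes only version `00` of the W3C Trace Context header (`version-traceid-parentid-flags`, lowercase hex) needs to be handled and rejects everything else, which is stricter than the forward-compatible parsing the spec describes for later versions.

```python
import re
from typing import NamedTuple, Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the parsed context, or None when the header is not a valid version-00 value."""
    m = _TRACEPARENT.fullmatch(header.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    if version != "00":
        return None  # assumption: this sketch handles version 00 only
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero ids are invalid
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: TraceContext) -> str:
    """Emit a version-00 header; only the sampled flag bit is carried."""
    return f"00-{ctx.trace_id}-{ctx.parent_id}-{'01' if ctx.sampled else '00'}"


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(header)
print(ctx)
print(format_traceparent(ctx) == header)
```

Returning `None` for a malformed header lets the caller fall back to starting a fresh trace instead of failing the request.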
## Progress log (newest first)
- 2026-07-01 — P1 done. `lib/host/otel.sx`: span dict + `otel/with-span` (dynamic parent stack builds the trace tree), monotonic id/clock placeholders (P2 replaces now-ns), bounded ring buffer (`record!`/`recent`/`set-cap!`, drop-oldest), `current-span`/`current-trace`, `reset!`. Suite `lib/host/tests/otel.sx` wired into conformance — 18/18 (nested parent links, attrs, ring caps at N drops oldest).
- (append one dated line per iteration)