OpenTelemetry in SX — loop briefing
Goal: self-hosting observability for the SX host — traces/spans/metrics in pure SX, a
live SVG waterfall dashboard (reactive island), and OTLP-JSON export for interop with
real backends (Jaeger/Grafana). Reference shape: nektro/zig-tracer src/otel.zig (the OTLP span
struct + HTTP emit) — that's just the export step here.
The key insight — a TRACE is a COMPOSITION. A span has {name, start, end, parent, attrs},
so a trace is a tree of spans — the same shape as an object's :body composition. So reuse the
existing fold machinery in lib/host/compose.sx (render-fold) and lib/host/execute.sx
(execute-fold): a span is a timed effect; a waterfall is a render-fold over the span tree;
OTLP export is an export-fold; metrics are an aggregate-fold. Don't reinvent — fold.
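To make the fold framing concrete, here is a minimal sketch in Python (not SX, and not the code in lib/host/compose.sx or lib/host/execute.sx). The Span fields mirror the {name, start, end, parent, attrs} shape above; `fold_spans`, `waterfall_rects`, `count_by_name`, the `span_id` field, and the pixel constants are hypothetical names chosen for illustration. It assumes parent links form a tree (no cycles).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Span:
    span_id: str                     # hypothetical id field used for parent links
    name: str
    start: int                       # start time, any consistent unit
    end: int                         # end time, same unit as start
    parent: Optional[str] = None     # span_id of the parent, None for the root
    attrs: dict = field(default_factory=dict)


def fold_spans(step: Callable, init, spans: List[Span]):
    """Left fold over spans; step receives (accumulator, span, depth)."""
    by_id = {s.span_id: s for s in spans}

    def depth(span: Span) -> int:
        # Depth = number of ancestors reachable through parent links.
        d = 0
        while span.parent is not None and span.parent in by_id:
            span = by_id[span.parent]
            d += 1
        return d

    acc = init
    for span in spans:
        acc = step(acc, span, depth(span))
    return acc


def waterfall_rects(spans, px_per_unit=1.0, row_height=18):
    """Render-style fold: one rectangle description per span."""
    if not spans:
        return []
    trace_t0 = min(s.start for s in spans)

    def step(rects, span, depth):
        return rects + [{
            "name": span.name,
            "x": (span.start - trace_t0) * px_per_unit,
            # Assumption: zero-duration spans still get a 1-unit-wide sliver.
            "width": max(1.0, (span.end - span.start) * px_per_unit),
            "y": depth * row_height,
        }]

    return fold_spans(step, [], spans)


def count_by_name(spans):
    """Aggregate-style fold: span count per name, using the same walk."""
    def step(counts, span, _depth):
        return {**counts, span.name: counts.get(span.name, 0) + 1}

    return fold_spans(step, {}, spans)


trace = [Span("a", "GET /feed", 0, 100), Span("b", "db", 10, 40, parent="a")]
rects = waterfall_rects(trace)
counts = count_by_name(trace)
```

The point of the sketch is that the waterfall and the metrics differ only in the `step` function passed to one shared walk, which is the reuse the paragraph above argues for.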
Base: this worktree is branched off loops/host (has the composition machinery + Parts A/C:
type-block grammar + type-def editor). You are on branch loops/otel in
/root/rose-ash-loops/otel.
Rules
- Test-first. Write the failing test, then implement to green.
- Fast tests via the warm server: bash lib/host/warm-conf.sh run <suite> (starts a warm
  persistent server; run with no suite = full conformance; eval "<expr>" for a REPL probe).
  New suite → add it to the runner the same way lib/host/tests/*.sx are wired.
- Do NOT deploy to the live container. blog.rose-ash.com is bind-mounted from
  /root/rose-ash-loops/host (a different worktree). Build + test only; integration/deploy
  happens when this branch is merged. (If you want a live smoke, ask; don't recreate the
  shared container.)
- .sx editing: prefer sx_write_file (validates on parse); if the sx-tree WRITE tools raise a
  yojson-null error in this worktree, fall back to the Write tool + sx_validate.
- Commit each increment to loops/otel with a short factual message. Never push to main.
- Cheap by construction: spans go in a bounded in-memory ring buffer, NOT the durable KV
  (persisting every span would hammer persist like the old relations/relate re-saturation
  bug). Sample + export on demand.
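As an illustration of the bounded, drop-oldest buffer the last rule asks for, here is a minimal Python sketch (not SX). The class name `SpanRing`, its methods, and the `dropped` counter are hypothetical; only the behaviour (fixed capacity, oldest entry evicted first, nothing written to durable storage) comes from the rule above.

```python
from collections import deque


class SpanRing:
    """Fixed-capacity in-memory buffer that evicts the oldest span when full."""

    def __init__(self, cap=1000):
        if cap <= 0:
            raise ValueError("cap must be positive")
        self._buf = deque(maxlen=cap)   # deque drops from the left when full
        self.dropped = 0                # hypothetical counter, handy for tests

    def record(self, span):
        if len(self._buf) == self._buf.maxlen:
            self.dropped += 1           # the append below evicts the oldest
        self._buf.append(span)

    def recent(self, n=None):
        """Return the last n spans, oldest first (all of them if n is None)."""
        items = list(self._buf)
        if n is None:
            return items
        return items[max(0, len(items) - n):]


ring = SpanRing(cap=3)
for i in range(5):
    ring.record({"span": i})
```

Memory stays bounded by `cap` regardless of request volume, so recording a span never touches the persistence layer.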
Roadmap — do ONE unchecked [ ] per iteration, test, commit, tick the box.
- P1: span model + API. lib/host/otel.sx: a span dict
  {:trace :span :parent :name :t0 :t1 :attrs :events}; otel/with-span name attrs thunk
  (records t0/t1, pushes/pops a dynamic parent stack so nesting builds the tree); a bounded
  ring buffer (otel/record!, otel/recent, cap ~1000, drop-oldest); otel/current-span /
  otel/current-trace. Tests: nested with-span builds parent links; ring caps at N.
- P2: monotonic clock. Find/confirm a time prim on the OCaml host (the warm-conf profiler +
  response cache already measure time; grep lib/host + the OCaml bridge). Wrap as
  otel/now-ns. Tests: monotonic non-decreasing, non-negative, a with-span has t1 >= t0.
- P3: auto-instrument the handlers. Wrap route handlers at the host/make-app / router seam
  (see lib/host/server.sx) so every HTTP request becomes a trace: a root span per request
  named by method+route, with {:http.method :http.route :http.status} attrs. Tests: a request
  through the app produces one trace with the right span name + status attr.
- P4: render-fold → SVG waterfall. A trace → an inline <svg> timeline: one <rect> per span,
  x ∝ (t0 − trace.t0), width ∝ duration, y ∝ depth, a label. Reuse the compose-fold walk
  shape. Tests: N spans → N rects; nested spans get increasing y.
- P5: metrics (aggregate-fold). Fold recent spans → per-route counters (request count) +
  latency histogram (p50/p95/p99 from durations). Tests: known spans → expected counts +
  percentiles.
- P6: live dashboard. GET /otel is a reactive island (signals + an SSE stream of new traces)
  that renders the waterfall of the latest trace + the metrics strip, updating live without
  reload. Reuse the reactive runtime (sx/sx/reactive-runtime.sx, web/) + Dream SSE/streaming
  already in lib/host. Tests: the island SSRs; the SSE endpoint emits a span event; the page
  lists recent traces.
- P7: OTLP-JSON export. Serialize spans to the OTLP/JSON schema (resourceSpans → scopeSpans →
  spans with traceId/spanId/parentSpanId/name/startTimeUnixNano/endTimeUnixNano/attributes).
  otel/export-otlp traces → the JSON; POST to an OTLP HTTP collector via an injected
  transport (so it's testable without a live collector). Tests: OTLP shape matches the spec
  for a known trace; the transport receives the payload.
- P8: context propagation + errors. Parse/emit the W3C traceparent header so a trace spans
  services (fed with the host's inter-service calls); mark error spans (:status :error + an
  event). Tests: traceparent round-trips; an error thunk yields an error span.
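For P8, the traceparent round-trip can be sketched as follows in Python (not SX). The header layout (2-hex version, 32-hex trace id, 16-hex parent id, 2-hex flags, lowercase, dash-separated) is from the W3C Trace Context spec; the function names and the returned dict keys are hypothetical, and the sketch only handles the four-field version-00 form.

```python
import re
from typing import Optional

# version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def format_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a version-00 header from already-hex ids of the right width."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[dict]:
    """Return the parsed fields, or None for a malformed or invalid header."""
    m = _TRACEPARENT.fullmatch(header.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    # The spec forbids version ff and all-zero trace/parent ids.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "flags": flags,
        "sampled": bool(int(flags, 16) & 0x01),   # lowest flag bit = sampled
    }


header = format_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
parsed = parse_traceparent(header)
```

Returning None rather than raising on bad input matches the common practice of starting a fresh trace when an incoming header cannot be trusted.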
Progress log (newest first)
- 2026-07-01: P8 done. ROADMAP COMPLETE (P1–P8, 124/124). otel/format-traceparent /
  otel/current-traceparent emit W3C 00-<32hex trace>-<16hex span>-01; otel/parse-traceparent →
  {:version :trace-id :parent-id :flags :sampled}, nil on malformed/bad-width; round-trips.
  otel/-timed now GUARDS the thunk: success spans get top-level :status "ok" (attrs
  untouched), a raised error records a span with :status "error" + an
  {:name "exception" :message} event, pops the stack, and propagates. 20 new tests
  (traceparent round-trip + current + malformed; error span status/name/event/message + clean
  stack; success=ok). GOTCHA (saved to memory): an explicit (raise e) inside a guard handler
  RE-ENTERS the same guard and hangs; propagate instead via a clause whose TEST does the
  side-effect and returns false, letting R7RS guard auto-reraise to the outer handler.
- 2026-07-01: P7 done. otel/export-otlp spans folds → the OTLP/JSON envelope
  {:resourceSpans [{:resource … :scopeSpans [{:scope … :spans […]}]}]}; each span has hex
  traceId(32)/spanId(16)/parentSpanId (from otel/-pad-hex of the numeric id suffix via
  string->number + number->string _ 16), uint64-as-string
  startTimeUnixNano/endTimeUnixNano, typed attributes (number→intValue, else stringValue),
  and kind (2 SERVER if http.method, else 1 INTERNAL); root omits parentSpanId.
  otel/export-otlp-json → dream-json-encode. otel/post-otlp endpoint spans transport POSTs
  {:method :url :headers :body} through an INJECTED transport (tests pass a recorder; real
  deploy passes http POST). Suite 104/104 (26 new: nesting depth, hex widths+values, string
  timestamps, kinds, typed attrs, parentSpanId link, json+transport, empty envelope). All
  needed prims (string->number, number->string radix, split, keys, assoc, has-key?,
  dream-json-encode) are real (not server-env), so conformance-safe.
- 2026-07-01: P6 done. GET /otel (otel/dashboard-route) SSRs otel/dashboard: metrics strip
  (table) + latest-trace waterfall <svg> + recent-traces <ul>, on a root carrying
  Datastar-style data-on-load="@get('/otel/stream')". GET /otel/stream (otel/stream-route)
  emits an SSE frame event: otel.span\ndata: <sxtp event>; otel/span-event wraps a span as an
  SXTP event (the host's Datastar-borrowed wire format), otel/-stream-body frames the latest.
  Plus otel/recent-traces (newest-first {:trace :name :spans}) + otel/latest-trace.
  otel/routes mounts via make-app. Suite 78/78 (17 new: recent-traces order, SSR
  svg+strip+id+sub, SSE event-stream/framing/name, GET /otel via make-app, empty-ring
  placeholder). DECISION: SSR + declarative reactive attrs + SSE patches IS the
  reactive-island model here (sxtp = Datastar); SSRs via render-to-html (plain HTML tags, not
  render-page which is a server-env prim unavailable in conformance). Live client hydration =
  deploy concern, out of build+test scope.
- 2026-07-01: P5 done. otel/metrics spans → {:total-requests N :routes (…)}; each route =
  {:route :count :p50 :p95 :p99}, route key = :http.route attr (falls back to span name).
  Nearest-rank percentiles (rank=ceil(p/100·N), 1-based) over per-route durations; needed a
  hand-rolled otel/-insert / otel/-sort-nums (no sort prim) + order-preserving
  otel/-distinct. otel/metrics-recent = over the ring. Suite 61/61 (11 new: total, 2 routes,
  feed count + p50=30/p95=50/p99=50 from [10..50], single-sample p50, sort helper,
  empty→zeroed). Note: / is float division here, so p/100 is not truncated to an integer
  before the ceil.
- 2026-07-01: P4 done. otel/waterfall-rects folds a trace's spans → rect geometry
  (x ∝ t0−trace.t0, width ∝ duration, y ∝ depth via otel/-depth parent-link ancestor count;
  zero-dur spans get a 1px sliver). otel/waterfall folds those into an inline
  (svg … (g (rect …) (text …)) …), one rect+label per span, which render-to-html emits as
  real SVG (verified: nested db span at y=22 below its GET /feed root at y=4). Suite 50/50
  (13 new: rect-per-span, depth 0/1/2, increasing-y with nesting, positive widths, svg head +
  rect/label counts via otel/-tree-count, empty-trace). GOTCHA: this evaluator's quasiquote
  splice symbol is splice-unquote, NOT unquote-splicing (plain unquote is fine); the wrong
  name serialised literally and produced 0 rects.
- 2026-07-01: P3 done. otel/instrument-routes wraps each flattened Dream route's handler in
  a timed span "METHOD /route" with {:http.method :http.route :http.status}; host/make-app
  applies it (seam) so every matched request is a trace. Refactored with-span onto a shared
  otel/-timed core with a finalize fn for result-derived attrs (http.status is only known
  post-handler; bare-string handler results coerced → 200). Suite 37/37; server 13/13
  unchanged. NOTE: cold conformance.sh feed|relations|blog currently fail at test-file load
  with Undefined symbol: parse-safe/render-page. These are bind-registered server-env prims
  in sx_server.ml not resolving in the current shared binary's epoch context; pre-existing &
  environmental (reproduces with my P3 changes stashed), NOT caused by this work.
  otel/server/page suites unaffected.
- 2026-07-01: P2 done. Host time prim is clock-milliseconds (OCaml Unix.gettimeofday, epoch
  ms; no dedicated nano/monotonic prim). otel/now-ns wraps it as epoch NANOSECONDS (×1e6, the
  OTLP unit) with a high-water clamp so it never steps backwards → durations non-negative
  across NTP steps. P1 placeholder counter removed. Suite 23/23 (added: non-negative,
  monotonic non-decreasing, ns-scale, real with-span t1≥t0 + ns-scale t0).
- 2026-07-01: P1 done. lib/host/otel.sx: span dict + otel/with-span (dynamic parent stack
  builds the trace tree), monotonic id/clock placeholders (P2 replaces now-ns), bounded ring
  buffer (record!/recent/set-cap!, drop-oldest), current-span/current-trace, reset!. Suite
  lib/host/tests/otel.sx wired into conformance: 18/18 (nested parent links, attrs, ring caps
  at N drops oldest).
- (append one dated line per iteration)
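The nearest-rank rule recorded in the P5 entry (rank = ceil(p/100·N), 1-based) can be checked with a short Python sketch (not SX). The function name is hypothetical; it uses integer arithmetic for the ceiling to sidestep floating-point rounding, and returning 0 for an empty list is an assumption that mirrors the "empty→zeroed" test note.

```python
def nearest_rank(durations, p):
    """Nearest-rank percentile: the value at 1-based rank ceil(p/100 * N).

    p is an integer percentile in 1..100.
    """
    if not durations:
        return 0                      # assumption: empty input reports zero
    ordered = sorted(durations)
    n = len(ordered)
    rank = -(-p * n // 100)           # integer ceil of p * n / 100
    return ordered[rank - 1]


durations = [10, 20, 30, 40, 50]
p50 = nearest_rank(durations, 50)     # rank ceil(2.5)  = 3
p95 = nearest_rank(durations, 95)     # rank ceil(4.75) = 5
p99 = nearest_rank(durations, 99)     # rank ceil(4.95) = 5
```

With five samples the ranks for p95 and p99 both round up to 5, which is why the log reports the same value for both.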