rose-ash/plans/agent-briefings/probabilistic-loop.md

# probabilistic-on-sx loop agent (single agent, queue-driven)

Role: iterates `plans/probabilistic-on-sx.md` forever. **Weighted nondeterminism +
traces + inference** — programs declare distributions, the runtime infers.
Church-flavoured core. The chisel is *trace*: what it means to record a weighted
execution, and how `sample`/`observe` differ from plain nondeterminism. One
feature per commit.

```
description: probabilistic-on-sx queue loop
subagent_type: general-purpose
run_in_background: true
isolation: worktree
```

## Prerequisites — check before starting

1. **lib-guest lex + pratt present** — the Scheme-flavoured parser consumes
   `lib/guest/lex.sx` + `lib/guest/pratt.sx`.
2. **Multi-shot continuations (`perform`/`cek-resume`)** must be real, not a
   single-shot stub — MH (Phase 6) re-executes from a changed choice point. This is
   the same capability `koka-on-sx` validates; confirm it before Phase 4.

**Pre-flight:**
```
ls /root/rose-ash/lib/guest/lex.sx /root/rose-ash/lib/guest/pratt.sx
```
If lib-guest is missing, stop and record a Blockers entry. (Phases 1–3 don't need
multi-shot; verify multi-shot before starting Phase 4/6.)

## Prompt

You are the sole background agent working
`/root/rose-ash/plans/probabilistic-on-sx.md`, in an isolated git worktree on
branch `loops/probabilistic`, forever, one commit per feature. Push to
`origin/loops/probabilistic` after every commit. Never touch `main` or
`architecture`.

## Restart baseline — check before iterating

1. Read `plans/probabilistic-on-sx.md` — Roadmap + Progress log + Blockers.
2. Run the pre-flight; record gaps in Blockers.
3. `ls lib/probabilistic/` — pick up from the most advanced file. No dir → Phase 1.
4. If `lib/probabilistic/tests/*.sx` exist, run them via the epoch protocol against
   `sx_server.exe`. Green before new work.

## The queue

Phase order per `plans/probabilistic-on-sx.md`:

- **Phase 1** — parser + deterministic Scheme core on the CEK
- **Phase 2** — `sample`/`observe` as effects (`perform :sample` / `:observe`);
  default = forward sampling
- **Phase 3** — distribution library (uniform/normal/gamma/beta/bernoulli/
  categorical/dirichlet/poisson), each `(sample-fn, log-prob-fn)`
- **Phase 4** — **trace recording + replay** (the chisel: a tracing handler logs
  `{:id :value :log-weight}`; a replay handler forces recorded values)
- **Phase 5** — importance sampling (run N times, accumulate `observe` log-weights)
- **Phase 6** — Metropolis-Hastings (**multi-shot**: re-execute from a changed
  choice point; accept/reject by Hastings ratio)
- **Phase 7** — mean-field VI (ELBO + `lib/probabilistic/autodiff.sx`, forward-mode)
- **Phase 8** — stdlib/idioms (mixtures, GPs, HMMs, change-point)
- **Phase 9** — propose `lib/guest/probabilistic/` extraction (wait for a 2nd consumer)
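Phases 2, 4 and 5 can be sketched together. This is a minimal Python illustration, not SX and not the runtime's real API. The model is a generator that yields `sample`/`observe` requests, standing in for `perform :sample` / `:observe`. A single driver plays the handler: it does forward sampling, records the trace, and forces replayed values. Importance sampling then runs the driver N times and normalises the accumulated `observe` log-weights.

`Normal`, `model`, `run`, `importance`, and the request-tuple shape are all hypothetical. Site ids here are plain strings; a loop inside a model would need ids like `(site, index)`. We also assume a trace entry's `:log-weight` is the choice's log-probability under its distribution.

```python
import math
import random

# Hypothetical stand-ins for the SX runtime: a model is a generator that yields
# ("sample", site_id, dist) or ("observe", dist, value); the driver below acts
# as the effect handler and sends each sampled value back into the generator.

class Normal:
    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma

    def sample(self, rng):
        return rng.gauss(self.mu, self.sigma)

    def log_prob(self, x):
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma) - 0.5 * math.log(2 * math.pi)


def model():
    # mu ~ Normal(0, 1); observe 0.8 ~ Normal(mu, 1). Exact posterior mean: 0.4.
    mu = yield ("sample", "mu", Normal(0.0, 1.0))
    yield ("observe", Normal(mu, 1.0), 0.8)
    return mu


def run(model_fn, rng, replay=None):
    """One execution. Without `replay` this is forward sampling; sites found in
    `replay` are forced to their recorded values. Returns
    (result, trace, log_weight): trace maps id -> {value, log_weight} and
    log_weight sums the observe terms."""
    gen = model_fn()
    trace, log_weight, reply = {}, 0.0, None
    try:
        while True:
            request = gen.send(reply)
            if request[0] == "sample":
                _, site, dist = request
                if replay is not None and site in replay:
                    value = replay[site]
                else:
                    value = dist.sample(rng)
                trace[site] = {"value": value, "log_weight": dist.log_prob(value)}
                reply = value
            else:  # "observe": add the log-likelihood, no branching
                _, dist, observed = request
                log_weight += dist.log_prob(observed)
                reply = None
    except StopIteration as done:
        return done.value, trace, log_weight


def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def importance(model_fn, n, seed):
    """Self-normalised importance sampling with the prior as proposal.
    Returns (posterior mean of the model's result, effective sample size)."""
    rng = random.Random(seed)
    runs = [run(model_fn, rng) for _ in range(n)]
    log_ws = [lw for _, _, lw in runs]
    log_total = log_sum_exp(log_ws)  # stay in the log domain until normalising
    weights = [math.exp(lw - log_total) for lw in log_ws]
    mean = sum(w * result for w, (result, _, _) in zip(weights, runs))
    ess = 1.0 / sum(w * w for w in weights)
    return mean, ess
```

With `seed=1` and 5000 runs the estimate should land near the exact posterior mean of 0.4. Rerunning `run` with `replay={"mu": ...}` taken from an earlier trace reproduces that trace, which is the property replay-based MH relies on.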

Within a phase, pick the checkbox with the best tests-per-effort ratio.
Every iteration: implement → test → commit → tick `[ ]` → Progress log → push → next.

## Chisel discipline — trace & weight

Two substrate payoffs. (1) **Phase 4 trace/replay** forces SX to articulate what
recording an execution means — every `sample` is a labelled, weighted choice in a
trace value. (2) **Phase 6 MH** is the multi-shot continuation stress test from the
inference side: re-running from a proposed-changed point requires `cek-resume` to
resume the *same* captured continuation more than once. If MH gives wrong
posteriors and the math checks out, suspect single-shot resumption: write the
failing test + Blockers entry (the fix is in `spec/`, not this loop).

Determinism for tests: derive every draw from an explicit seed passed in (and the
trace `id`), never from the wall clock. Inference tests assert *approximate*
posteriors with tolerances, not exact values.
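Single-site MH over traces, with a fixed seed and a tolerance assertion, can be sketched in Python (again not SX). Python closures cannot resume a captured continuation twice. This version therefore re-executes the whole model from the top and replays every other site from the current trace by id. The SX runtime is meant to resume from the changed choice point via multi-shot `cek-resume` instead.

The proposal redraws the chosen site from its prior. That proposal is asymmetric, so the Hastings term q(old | new) / q(new | old) does not cancel. The sketch also assumes a fixed model structure, so no sites appear or vanish between traces. `execute`, `mh`, the two-site model, and its exact posterior means (2/3 and 4/3) are illustrative assumptions.

```python
import math
import random


def normal_log_prob(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def model(sample, observe):
    # Hypothetical model: a ~ N(0, 1); b ~ N(a, 1); observe 2.0 ~ N(b, 1).
    # Exact posterior means: E[a | y] = 2/3, E[b | y] = 4/3.
    a = sample("a", 0.0, 1.0)
    b = sample("b", a, 1.0)
    observe(b, 1.0, 2.0)
    return b


def execute(model_fn, rng, fixed):
    """Run the model once, replaying sites listed in `fixed` and drawing the rest
    fresh from their priors. Returns (trace, log_joint); the trace maps each
    stable site id to (value, log-prob of that value given its parents)."""
    trace = {}
    log_joint = 0.0

    def sample(site, mu, sigma):
        nonlocal log_joint
        value = fixed[site] if site in fixed else rng.gauss(mu, sigma)
        lp = normal_log_prob(value, mu, sigma)
        trace[site] = (value, lp)
        log_joint += lp
        return value

    def observe(mu, sigma, value):
        nonlocal log_joint
        log_joint += normal_log_prob(value, mu, sigma)  # weight only, no branching

    model_fn(sample, observe)
    return trace, log_joint


def mh(model_fn, iterations, seed, burn_in=1000):
    rng = random.Random(seed)  # explicit seed: no wall clock anywhere
    trace, log_joint = execute(model_fn, rng, {})
    kept = []
    for step in range(iterations):
        site = rng.choice(sorted(trace))
        # Propose: redraw `site` from its prior, replay every other site by id.
        fixed = {k: v for k, (v, _) in trace.items() if k != site}
        new_trace, new_log_joint = execute(model_fn, rng, fixed)
        # Hastings correction. The chosen site's parents are unchanged, so
        # q(new | old) and q(old | new) are its prior log-probs in each trace.
        log_q_forward = new_trace[site][1]
        log_q_reverse = trace[site][1]
        log_alpha = (new_log_joint - log_joint) + (log_q_reverse - log_q_forward)
        if math.log(1.0 - rng.random()) < log_alpha:  # 1 - U lies in (0, 1]
            trace, log_joint = new_trace, new_log_joint
        if step >= burn_in:
            kept.append({k: v for k, (v, _) in trace.items()})
    return kept
```

A test in this style asserts `abs(mean_a - 2/3) < 0.1` rather than equality. The chain is approximate, but with a fixed seed it is fully reproducible.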

## Ground rules (hard)

- **Scope:** only `lib/probabilistic/**` and `plans/probabilistic-on-sx.md`. Do
  **not** edit `spec/`, `hosts/`, `shared/`, `lib/guest/**` (read-only), or other
  `lib/<lang>/`.
- **Consume `lib/guest/`** (lex, pratt). Inference machinery (IS/MH/VI, autodiff) is
  yours, in SX.
- **Don't patch the substrate.** Multi-shot misbehavior → failing test + Blockers
  entry; the fix lives in `spec/evaluator.sx`, not here.
- **NEVER call `sx_build`** (600s watchdog). Broken binary → Blockers, stop.
- **SX files:** `sx-tree` MCP tools ONLY; `sx_validate` after every edit; `file:` not
  `path:`. Never `Edit`/`Read`/`Write` on `.sx`.
- **Worktree:** commit, then push `origin/loops/probabilistic`. Never
  `main`/`architecture`.
- **Commits:** one feature per commit, e.g. `prob: trace/replay handler + 5 tests`.
- **Plan file:** update the Progress log and tick the box on every commit.
- **Blocked 2 iterations → Blockers, move on.**

## Probabilistic-specific gotchas

- **`sample` choices ≠ `conde`-style nondeterminism.** A `sample` is a *weighted*
  choice carrying a log-prob; an `observe` conditions the run (multiplies its weight
  by the likelihood) without branching. Keep weight bookkeeping in the log domain,
  where multiplying weights becomes adding log-weights, to avoid underflow.
- **Trace identity is the linchpin.** Replay/MH match choices by a stable `id` (call
  site + loop index), not by execution order. Make id assignment deterministic and
  stable across re-execution; otherwise replay silently diverges.
- **MH proposes a local change, then re-executes the tail.** Only the chosen site's
  value changes; downstream `sample`s are replayed where possible. The acceptance
  ratio is (prior × likelihood of the new trace) / (prior × likelihood of the old
  trace) × q(old | new) / q(new | old); get this Hastings correction right.
- **Inference is approximate.** Never assert exact posteriors; use effective sample
  size (ESS) and tolerance checks. Fix seeds in tests so results are deterministic
  rather than seed-dependent and flaky.
- **Autodiff (Phase 7): forward mode is the minimum**, via dual numbers over the
  arithmetic prims; don't reach for reverse mode unless a test demands it.
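Forward-mode autodiff with dual numbers can be sketched in Python rather than SX. A dual number carries a value and a derivative, and each arithmetic primitive propagates both. The example differentiates a Normal log-density with respect to its mean and its log-scale, which is the kind of gradient a mean-field ELBO needs.

`Dual`, `d_exp`, `d_log`, `derivative`, and `normal_log_prob` are hypothetical names. Forward mode costs one pass per input parameter, which stays cheap while models have few parameters.

```python
import math


class Dual:
    """Forward-mode AD value a + b*eps with eps**2 == 0 (val = a, der = b)."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / (o.val * o.val))

    def __rtruediv__(self, other):
        return Dual.lift(other) / self

    def __neg__(self):
        return Dual(-self.val, -self.der)


def d_exp(x):
    x = Dual.lift(x)
    e = math.exp(x.val)
    return Dual(e, e * x.der)


def d_log(x):
    x = Dual.lift(x)
    return Dual(math.log(x.val), x.der / x.val)


def derivative(f, x):
    """df/dx at x: seed the input's derivative with 1 and read it off the output."""
    return f(Dual(x, 1.0)).der


def normal_log_prob(y, mu, log_sigma):
    # log N(y; mu, exp(log_sigma)); mu and log_sigma may be floats or Duals.
    sigma = d_exp(log_sigma)
    z = (y - mu) / sigma
    return -0.5 * z * z - log_sigma - 0.5 * math.log(2 * math.pi)
```

For y = 2, mu = 0, log_sigma = 0 the analytic derivatives are (y - mu) / sigma² = 2 with respect to mu, and (y - mu)² / sigma² - 1 = 3 with respect to log_sigma. `derivative` recovers both in a single forward pass each.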

## General gotchas (all loops)

- SX `do` = R7RS iteration; use `begin` for multi-expr sequences.
- `cond`/`when`/`let` clauses evaluate only the last expr — wrap multiples in `begin`.
- `let` is parallel — nest `let`s when one binding references an earlier one.
- `env-bind!` creates a binding; `env-set!` mutates an existing one.
- Namespace-prefix guest helpers (`prob/…`).
- Shell heredoc `||` gets eaten — escape or use `case`.

## Style

- No comments in `.sx` unless non-obvious. No new planning docs — update the plan.
- Short, factual commit messages. One feature per iteration. Commit. Log. Push. Next.

Go. Run the pre-flight. If lib-guest is missing (or multi-shot is unverified before
Phase 4), stop and report. Otherwise read the plan, find the first unchecked `[ ]`,
implement it.