rose-ash/plans/probabilistic-on-sx.md

# Probabilistic-on-SX: weighted nondeterminism + traces + inference

Programs declare distributions; the runtime infers. This is the most orthogonal addition to the set. Every existing guest treats execution as deterministic or resumable, whereas probabilistic programming requires *weighted, traceable* executions with explicit posterior-inference machinery on top. **Anglican** (Wood et al.) and **Church** (Goodman et al.) are the closest references; we'll target a Church-flavoured core.

**The chisel:** *trace*. What does it mean to record an execution? What's a probability weight? How do branches in `conde`-like nondeterminism differ from `sample`/`observe` choices? The substrate has multi-shot continuations (a prerequisite for any decent inference algorithm) but doesn't articulate weights or traces — implementing a probabilistic language forces it to.

**What this exposes about the substrate:**
- Whether `cek-resume` can be invoked many times per `perform` with different values (multi-shot we know works; *parameterised* multi-shot is the question).
- Whether traces — sequences of (random-variable-id, sampled-value, log-weight) — fit naturally in the value space.
- Whether the substrate can support efficient *trace replay* (start a fresh execution but force certain random choices to specific values).
- Whether the handler/effect machinery (`lib/guest/effects/`, once it exists) can host inference-as-handler.

**End-state goal:** **Anglican-style probabilistic Scheme** — `sample`, `observe`, basic distribution library, importance sampling, MCMC (Metropolis-Hastings), and a path to variational inference. Programs are distributions; `query expr` returns a distribution over outcomes.

## Ground rules
- Scope: `lib/probabilistic/**` and `plans/probabilistic-on-sx.md` only. Substrate gaps → `sx-improvements.md`.
- Consumes from `lib/guest/`: `core/lex`, `core/pratt`, `core/ast`, `core/match`. Possibly `effects/` once that sub-layer exists (inference algorithms are naturally handlers over `sample`/`observe`).
- **May propose** a `lib/guest/probabilistic/` sub-layer: trace-recording infrastructure, weight-algebra primitives (log-domain arithmetic), inference combinators, and distribution constructors. A second consumer could be a future Pyro-style language or a Bayesian DSL.
- Branch: `loops/probabilistic`. Standard worktree pattern.

## Architecture sketch

```
Probabilistic source text (Church-flavoured: scheme + sample/observe)
    │
    ▼
lib/probabilistic/parser.sx     — s-expression reader
    │
    ▼
lib/probabilistic/eval.sx       — pure evaluator (deterministic except at sample/observe)
    │                             sample/observe are perform-shaped: suspend execution,
    │                             let inference algorithm decide what to do
    ▼
lib/probabilistic/inference/    — handlers that interpret sample/observe:
    │  importance.sx               importance sampling, likelihood-weighting
    │  mh.sx                       Metropolis-Hastings (proposal kernels)
    │  variational.sx              mean-field VI
    │
    ▼
lib/probabilistic/distributions.sx — uniform, normal, gamma, beta, dirichlet,
                                     mixture, conditional, etc.
```

## Semantic mappings

| Probabilistic construct | SX mapping |
|------------------------|-----------|
| `(sample (uniform 0 1))` | `(perform (:sample (uniform 0 1)))` — inference handler decides actual value |
| `(observe (normal 0 1) 0.5)` | `(perform (:observe (normal 0 1) 0.5))` — adds log-prob to weight |
| `(query body)` | run `body` under inference handler; return weighted samples |
| `(uniform a b)` | distribution value: `{:type :dist :family :uniform :params (a b)}` |
| `(score lpdf x)` | add `(lpdf x)` to the log-weight; `observe` is the special case where `lpdf` is a distribution's log-density |
| Trace | `(list (:choice id sampled-value log-weight) ...)` — first-class value |

The key trick: `sample` and `observe` aren't primitives — they're effect requests. The inference algorithm is a handler that interprets them. Importance sampling samples each `sample` from the prior and accumulates weights from each `observe`. MH proposes changes to the trace and accepts/rejects.
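A minimal Python sketch of the handler idea, assuming generators as a stand-in for SX's `perform`/`cek-resume`. A generator suspends at each request and is resumed with the handler's answer. It gives only one-shot resumption, so this shows the shape of the protocol and not multi-shot behaviour. `run`, `LikelihoodWeighting`, `coin_program`, and the request tuples are hypothetical names.

```python
import math
import random

# Hypothetical distribution value: a (sample_fn, log_prob_fn) pair.
def bernoulli(p):
    return (lambda rng: rng.random() < p,
            lambda x: math.log(p) if x else math.log(1.0 - p))

# A program is a generator function that yields effect requests
# instead of drawing values itself:
#   ("sample", dist)         -> the handler sends back a value
#   ("observe", dist, value) -> the handler scores it and sends back None
def coin_program():
    heads = yield ("sample", bernoulli(0.5))
    yield ("observe", bernoulli(0.9 if heads else 0.1), True)
    return heads

def run(program, handler):
    """Drive `program`, letting `handler` answer every request."""
    gen = program()
    try:
        request = next(gen)
        while True:
            request = gen.send(handler(request))
    except StopIteration as stop:
        return stop.value

class LikelihoodWeighting:
    """Importance-sampling handler: draw each sample from the prior and
    add each observe's log-probability to this run's log-weight."""
    def __init__(self, rng):
        self.rng = rng
        self.log_weight = 0.0

    def __call__(self, request):
        if request[0] == "sample":
            sample_fn, _ = request[1]
            return sample_fn(self.rng)
        _, (_, log_prob), value = request
        self.log_weight += log_prob(value)
        return None
```

Averaging `heads` over many runs, weighted by `exp(log_weight)`, estimates P(heads | observation), which is exactly 0.9 for this model. Swapping in a different handler changes the inference without editing `coin_program`.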

## Roadmap

### Phase 1 — Parser + deterministic core
- [ ] Scheme-flavoured parser (s-expressions, `let`, `lambda`, `if`, arithmetic, lists).
- [ ] Deterministic evaluator running on SX CEK.
- [ ] Tests: standard Scheme programs run.

### Phase 2 — `sample` / `observe` as effects
- [ ] `sample dist` → `perform :sample`.
- [ ] `observe dist value` → `perform :observe`.
- [ ] Default handler: forward sampling, no inference (just produce a draw).
- [ ] Tests: simple stochastic programs (coin flip, sum-of-dice) produce different results across runs.

### Phase 3 — Distribution library
- [ ] `uniform`, `normal`, `gamma`, `beta`, `bernoulli`, `categorical`, `dirichlet`, `poisson`.
- [ ] Each carries `(sample-fn, log-prob-fn)`.
- [ ] Tests: log-prob of known density values matches reference.
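A Python sketch of the `(sample-fn, log-prob-fn)` pairing for four of the listed families. Only the standard `math`/`random` modules are used. The `Dist` record and its field names are hypothetical. Poisson sampling uses Knuth's multiplication method, which is adequate for small rates.

```python
import math
import random
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Dist:
    family: str
    params: tuple
    sample: Callable[[random.Random], Any]   # rng -> value
    log_prob: Callable[[Any], float]         # value -> log density/mass

def uniform(a, b):
    def lp(x):
        return -math.log(b - a) if a <= x <= b else -math.inf
    return Dist("uniform", (a, b), lambda rng: rng.uniform(a, b), lp)

def normal(mu, sigma):
    def lp(x):
        z = (x - mu) / sigma
        return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return Dist("normal", (mu, sigma), lambda rng: rng.gauss(mu, sigma), lp)

def bernoulli(p):
    def lp(x):
        return math.log(p) if x else math.log1p(-p)
    return Dist("bernoulli", (p,), lambda rng: rng.random() < p, lp)

def poisson(lam):
    def lp(k):
        return k * math.log(lam) - lam - math.lgamma(k + 1)
    def sample(rng):
        # Knuth: count uniforms multiplied before the product drops to e^-lam.
        limit, k, prod = math.exp(-lam), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k
    return Dist("poisson", (lam,), sample, lp)
```

The reference values a Phase 3 test would check are standard closed forms. For example, log N(0 | 0, 1) = -½ log 2π ≈ -0.9189 and log Poisson(2 | 3) = 2 log 3 - 3 - log 2.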

### Phase 4 — Trace recording + replay
- [ ] Tracing handler: every `sample` records `{:id :value :log-weight}` in a trace value.
- [ ] Replay handler: given a trace, force `sample` to return the recorded value when called with the same `id`.
- [ ] Tests: record a trace, replay it, get identical outputs.
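A Python sketch of the tracing and replay handlers, again using generators as a stand-in for `perform`/`cek-resume`. This sketch assumes each `sample` request carries an explicit address (`id`) chosen by the program. How the real evaluator assigns addresses is still open. It also assumes the trace's `log_weight` field holds the choice's prior log-density. `Tracing`, `Replay`, and `run` are hypothetical names.

```python
import math
import random

def normal(mu, sigma):
    """Hypothetical distribution value: (sample_fn, log_prob_fn)."""
    def log_prob(x):
        z = (x - mu) / sigma
        return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return (lambda rng: rng.gauss(mu, sigma), log_prob)

def model():
    # Each request is ("sample", address, dist).
    x = yield ("sample", "x", normal(0.0, 1.0))
    noise = yield ("sample", "noise", normal(0.0, 0.1))
    return 2.0 * x + noise

def run(program, handler):
    gen = program()
    try:
        request = next(gen)
        while True:
            request = gen.send(handler(request))
    except StopIteration as stop:
        return stop.value

class Tracing:
    """Draws fresh values and records {id, value, log_weight} per choice;
    log_weight is assumed to be the choice's prior log-density."""
    def __init__(self, rng):
        self.rng = rng
        self.trace = []

    def __call__(self, request):
        _, addr, (sample_fn, log_prob) = request
        value = sample_fn(self.rng)
        self.trace.append({"id": addr, "value": value,
                           "log_weight": log_prob(value)})
        return value

class Replay:
    """Forces each recorded address to its recorded value; addresses
    missing from the trace get a fresh draw."""
    def __init__(self, trace, rng):
        self.recorded = {c["id"]: c["value"] for c in trace}
        self.rng = rng

    def __call__(self, request):
        _, addr, (sample_fn, _) = request
        if addr in self.recorded:
            return self.recorded[addr]
        return sample_fn(self.rng)
```

Replaying under a differently seeded RNG must still reproduce the recorded output, because every choice is forced. The fallback branch is what MH reuses when one address is left out of the forced set.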

### Phase 5 — Importance sampling
- [ ] `importance-sample n query` runs `query` `n` times under the sampling handler.
- [ ] Each run accumulates log-weights from `observe` calls.
- [ ] Returns weighted samples.
- [ ] Tests: posterior over a coin's bias given Bernoulli observations.
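A self-contained Python sketch of the Phase 5 test case. It assumes a Uniform(0, 1) prior on the bias and likelihood weighting: sample from the prior, then weight by the observations. With `k` heads in `n` flips, the exact posterior is Beta(k+1, n-k+1) with mean (k+1)/(n+2), which gives the test a closed-form reference. `importance_sample` mirrors `importance-sample`. It and the other names are hypothetical, and handlers are passed as plain callbacks here.

```python
import math
import random

def bernoulli_log_prob(heads, p):
    """log P(flip | bias p), with -inf at the impossible edges."""
    if heads:
        return math.log(p) if p > 0.0 else -math.inf
    return math.log1p(-p) if p < 1.0 else -math.inf

def coin_model(sample, observe, flips):
    bias = sample(lambda rng: rng.random())   # prior: Uniform(0, 1)
    for heads in flips:
        observe(bernoulli_log_prob(heads, bias))
    return bias

def importance_sample(n, query, rng):
    """Run `query` n times under a likelihood-weighting handler and
    return (value, log_weight) pairs."""
    samples = []
    for _ in range(n):
        log_weight = 0.0

        def sample(draw):
            return draw(rng)

        def observe(log_prob):
            nonlocal log_weight
            log_weight += log_prob

        value = query(sample, observe)
        samples.append((value, log_weight))
    return samples

def weighted_mean(samples):
    # Subtract the max log-weight before exponentiating to avoid underflow.
    m = max(lw for _, lw in samples)
    ws = [math.exp(lw - m) for _, lw in samples]
    return sum(w * v for w, (v, _) in zip(ws, samples)) / sum(ws)
```

Take 7 heads in 10 flips, so `flips = [True] * 7 + [False] * 3`. Then `weighted_mean(importance_sample(20000, lambda s, o: coin_model(s, o, flips), random.Random(0)))` should land close to the exact posterior mean 8/12 ≈ 0.667.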

### Phase 6 — Metropolis-Hastings
- [ ] `mh n query` runs MH for `n` steps.
- [ ] Each step: pick a random choice in the current trace, propose a new value, and accept or reject it using the Metropolis-Hastings acceptance ratio.
- [ ] Multi-shot continuation usage: re-execute from the changed choice onward.
- [ ] Tests: Gaussian regression, change-point detection, mixture clustering.
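A Python sketch of single-site MH over traces. It uses a substitute technique: instead of resuming a captured continuation at the changed choice, it re-runs the whole program with every other address forced to its recorded value. That is exactly the cost that multi-shot `cek-resume` should avoid. The proposal for the chosen site is its own prior. It assumes every run visits the same addresses, since varying structure needs an extra correction term. All names are hypothetical.

```python
import math
import random

def normal_lp(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

class Run:
    """One execution. Addresses in `forced` replay their recorded values;
    every other sample is drawn fresh from its prior."""
    def __init__(self, rng, forced):
        self.rng, self.forced = rng, forced
        self.trace = {}          # address -> (value, prior log-density)
        self.log_joint = 0.0     # prior terms plus likelihood terms

    def sample(self, addr, mu, sigma):
        if addr in self.forced:
            value = self.forced[addr]
        else:
            value = self.rng.gauss(mu, sigma)
        lp = normal_lp(value, mu, sigma)
        self.trace[addr] = (value, lp)
        self.log_joint += lp
        return value

    def observe(self, y, mu, sigma):
        self.log_joint += normal_lp(y, mu, sigma)

def execute(program, rng, forced):
    ctx = Run(rng, forced)
    return program(ctx), ctx

def mh(n, program, rng):
    """Single-site MH with prior proposals (fixed-structure models only)."""
    value, cur = execute(program, rng, {})
    chain = []
    for _ in range(n):
        addr = rng.choice(sorted(cur.trace))
        forced = {a: v for a, (v, _) in cur.trace.items() if a != addr}
        new_value, new = execute(program, rng, forced)
        # Hastings correction: the proposal for `addr` is its own prior.
        log_alpha = (new.log_joint - cur.log_joint
                     + cur.trace[addr][1] - new.trace[addr][1])
        if rng.random() < math.exp(min(0.0, log_alpha)):
            value, cur = new_value, new
        chain.append(value)
    return chain

def normal_normal(ctx, ys):
    mu = ctx.sample("mu", 0.0, 1.0)      # prior N(0, 1)
    for y in ys:
        ctx.observe(y, mu, 1.0)          # likelihood N(y | mu, 1)
    return mu
```

For the normal-normal model with unit variances, the exact posterior mean is Σy / (n + 1). After discarding burn-in, the chain average should approach that value.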

### Phase 7 — Mean-field variational inference
- [ ] Approximate posterior as product of independent simple distributions.
- [ ] Maximise the ELBO by gradient ascent.
- [ ] Requires automatic differentiation — `lib/probabilistic/autodiff.sx` (forward-mode minimum).
- [ ] Tests: normal-normal model, ELBO converges to known truth.
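A sketch of the forward-mode minimum using dual numbers, in Python. Each value carries its derivative along one input direction, so a single pass yields df/dx for one scalar input. Gradients with respect to k variational parameters therefore need k passes in this scheme. `Dual`, `log`, and `derivative` are hypothetical names for illustration.

```python
import math

class Dual:
    """val + dot*eps with eps**2 == 0; `dot` carries the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        return Dual(self.val / o.val,
                    (self.dot * o.val - self.val * o.dot) / (o.val * o.val))

    def __rtruediv__(self, other):
        return Dual.lift(other) / self

    def __neg__(self):
        return Dual(-self.val, -self.dot)

def log(x):
    x = Dual.lift(x)
    return Dual(math.log(x.val), x.dot / x.val)

def derivative(f, x):
    """df/dx at x from one forward pass seeded with dot = 1."""
    return f(Dual(x, 1.0)).dot

def normal_log_prob(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - log(sigma) - 0.5 * math.log(2 * math.pi)
```

For log N(x | mu, sigma), the analytic derivatives are (x - mu)/sigma² in mu and -1/sigma + (x - mu)²/sigma³ in sigma. At x = 1.5, mu = 0.5, sigma = 2 these are 0.25 and -0.375, which makes a convenient unit test for `autodiff.sx`.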

### Phase 8 — Standard library + idioms
- [ ] Mixture models, Gaussian processes, hidden Markov models, change-point models.
- [ ] Tests: each model as an end-to-end test whose posterior roughly matches a known answer.

### Phase 9 — Propose `lib/guest/probabilistic/`
- [ ] Identify reusable trace + weight infrastructure (log-domain arithmetic, ESS, sample weighting).
- [ ] Wait for a second consumer before extracting.
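A Python sketch of the log-domain helpers a shared sub-layer would likely start with: max-subtracted `log_sum_exp`, weight normalisation, and the usual effective-sample-size estimate ESS = 1 / Σŵᵢ² over normalised weights ŵ. The function names are hypothetical.

```python
import math

def log_sum_exp(log_ws):
    """log(sum(exp(l))) computed without overflow or underflow."""
    m = max(log_ws)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(l - m) for l in log_ws))

def normalise(log_ws):
    """Turn log-weights into weights that sum to 1."""
    z = log_sum_exp(log_ws)
    return [math.exp(l - z) for l in log_ws]

def ess(log_ws):
    """Effective sample size 1 / sum(w_i^2) of the normalised weights."""
    return 1.0 / sum(w * w for w in normalise(log_ws))
```

With equal weights, ESS equals the sample count. When one sample carries all the weight, ESS collapses to 1, which is the usual signal that an importance sampler needs more samples or resampling.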

## lib/guest feedback loop

**Consumes:** `core/lex`, `core/pratt`, `core/ast`, `core/match`. Future: `effects/` for handler-based inference.

**Stresses substrate:** parameterised multi-shot continuations (each MH step replays from a chosen point with a new value); efficient trace storage; whether `perform`/`cek-resume` survives nesting (handler within handler — inference inside another inference).

**May propose:** `lib/guest/probabilistic/` — trace primitives, weight algebra (log-sum-exp etc.), distribution interfaces.

**What it teaches:** whether SX's effect/continuation machinery is up to *real* multi-shot work, not just textbook examples. Inference algorithms call `cek-resume` thousands of times per query; if the substrate has hidden quadratic costs in continuation manipulation, this surfaces them.

## References
- Goodman, Mansinghka, Roy, Bonawitz, Tenenbaum, "Church: A Language for Generative Models" (UAI 2008).
- Wood, van de Meent, Mansinghka, "A New Approach to Probabilistic Programming Inference" (AISTATS 2014). Anglican.
- van de Meent, Paige, Yang, Wood, "An Introduction to Probabilistic Programming" (arXiv 2018).
- Bingham et al., "Pyro: Deep Universal Probabilistic Programming" (JMLR 2019).

## Progress log
_(awaiting Phase 1 — depends on multi-shot continuation stability)_

## Blockers
_(none yet — main concern is hidden substrate costs in continuation manipulation)_