rose-ash/plans/acl-on-sx.md

# acl-on-sx: Access Control on Datalog

rose-ash needs fine-grained, explainable, federation-aware access control. Subjects
(users, groups, roles, services) × actions (read, edit, comment, moderate, federate)
× resources (pages, posts, threads, peers). Decisions must come with a trace — not just
permit/deny, but **why**.

Datalog's bottom-up rule engine produces transparent permit/deny chains: the proof tree
is the audit trail. Inheritance over groups + resource hierarchies is recursive Datalog
in one rule. Federation extends naturally — fed-sx replicates ACL facts, peers reason
over the union.

End-state: a Datalog-on-SX layer specifically for ACL, with explanation API, audit log,
and federation extension. Reuses `lib/datalog/` evaluator and term model where possible.

## Status (rolling)

`bash lib/acl/conformance.sh` → **145/145** (all four phases + hardening)

## Ground rules

- **Scope:** only touch `lib/acl/**` and `plans/acl-on-sx.md`. Do **not** edit `spec/`,
  `hosts/`, `shared/`, `lib/datalog/**`, or other `lib/<lang>/`. You may **import**
  from `lib/datalog/` (its public API in `lib/datalog/datalog.sx`); do **not** copy or
  modify Datalog code.
- **Shared-file issues** go under "Blockers" with a minimal repro; do not fix here.
- **SX files:** use `sx-tree` MCP tools only.
- **Architecture:** thin layer on top of `lib/datalog/`. Define schema, surface API,
  audit + federation hooks. The rule engine itself is Datalog's.
- **Watch for shared patterns** going into `lib/guest/` — both acl-sx and mod-sx need
  rule-engine plumbing. If you find shared shape, flag it for extraction (don't
  extract yet — wait for mod-sx to start).
- **Commits:** one feature per commit. Keep Progress log updated and tick boxes.

## Architecture sketch

```
ACL declarations (SX)            User query
        │                            │
        ▼                            ▼
lib/acl/schema.sx             lib/acl/api.sx
  — subject sorts                — (acl/permit? subj act res)
  — resource sorts               — (acl/explain subj act res)
  — action sorts                 — (acl/audit subj act res :allowed?)
  — fact schema                       │
        │                             ▼
        ▼                       lib/acl/engine.sx
lib/acl/facts.sx                  — builds Datalog query
  — actor(id, kind)               — invokes lib/datalog/
  — resource(id, kind)            — extracts proof tree
  — member_of(actor, group)            │
  — child_of(res, parent)              ▼
  — grant(actor, act, res)        lib/acl/audit.sx
  — deny (actor, act, res)          — persistent decision log
                                    — query API
```

## Phase 1 — Direct grants

- [x] `lib/acl/schema.sx` — sorts: subject {user, group, role, service}, action,
  resource {page, post, thread, peer}
- [x] `lib/acl/facts.sx` — `actor`, `resource`, `grant`, `deny` predicates as Datalog
  EDB
- [x] `lib/acl/engine.sx` — `(permit? subj act res db)` reduces to Datalog query
- [x] `lib/acl/api.sx` — public `(acl/permit? ...)` taking implicit current db
- [x] `lib/acl/tests/direct.sx` — 15+ cases: direct grant, missing grant, explicit deny
- [x] `lib/acl/scoreboard.{json,md}` baseline
- [x] `lib/acl/conformance.sh` runs the suite

## Phase 2 — Inheritance

- [x] `member_of(actor, group)` chain — group grants apply to members (transitive)
- [x] `child_of(res, parent)` chain — parent grants apply to children (transitive)
- [x] role expansion — role contains list of (action, resource) tuples
- [x] deny-overrides — explicit deny wins over inherited allow
- [x] `lib/acl/tests/inherit.sx` — 25+ cases: nested groups, deep resource trees,
  conflict resolution, deny precedence
- [x] document the deny-overrides choice in plan

### deny-overrides policy (the choice)

Encoded as stratified negation: `permit(S,A,R) :- eff_grant(S,A,R), not
eff_deny(S,A,R)`. Both `eff_grant` and `eff_deny` inherit through the *same*
`member_of` (group/role) and `child_of` (resource) chains. Consequences:

- An explicit deny on the exact (S,A,R) defeats any inherited allow.
- A **group-level** or **ancestor-resource** deny inherits down and defeats a
  member's/descendant's grant — deny is authoritative across the closure, not
  only at the leaf. This is the fail-safe reading: the most permissive
  interpretation of "deny wins" would let a narrow grant escape a broad deny;
  we chose the opposite.
- Deny is dimension-scoped: a deny on (S, edit, R) never blocks (S, read, R).

Stratifiable because neither `eff_grant` nor `eff_deny` depends on `permit`;
`permit` sits in a strictly higher stratum. Termination is guaranteed —
recursion is only over EDB `member_of`/`child_of` (no function symbols), so
cyclic membership/containment reaches a fixpoint rather than looping (tested).

## Phase 3 — Explanation + audit

- [x] `(acl/explain subj act res)` → `{:allowed? T :proof <tree>}`
- [x] proof tree extracts from Datalog's derivation
- [x] `lib/acl/audit.sx` — append-only decision log (in-memory + serializer for disk)
- [x] `(acl/audit-tail n)` for recent decisions
- [x] `lib/acl/tests/explain.sx` — proof correctness, audit completeness

### proof reconstruction (the choice)

`lib/datalog/` records derived facts but not provenance, so `lib/acl/explain.sx`
reconstructs the proof by goal-directed search over the *saturated* db: for a
ground goal, find the first ACL rule (in `acl-rules` order) whose body holds,
take the first `dl-query` solution binding the rest, recurse on each body
literal; negated literals become verified `:neg-ok` leaves. The Datalog
derivation graph is a DAG (a fact may hold many ways) — we pick ONE **canonical
proof: first-rule, first-solution**, with EDB/direct rules ordered first so
proofs bottom out quickly. A depth cap (64) guards pathological cyclic data.
`acl-explain` returns `{:allowed? :proof :reason}`; on denial `:reason` carries
the blocking `eff_deny` proof (explicit or inherited) when one exists, else nil
(no grant). Audit log is append-only with monotonic seq numbers (no wall-clock,
for determinism); `acl-audit-decide!` is the logged path, `acl-permit?` stays
pure.

## Phase 4 — Federation

- [x] peer trust facts — `peer(addr, kind)`, `trust(peer, level)`
- [x] delegated grants — `delegate(peer, actor, action, resource)`
- [x] cross-instance permit chain — query asks local + queries trusted peers via fed-sx
- [x] revocation propagation — fact retraction across federation
- [x] `lib/acl/tests/fed.sx` — federated grant chains (mock fed-sx transport in tests)

### federation policy (the choice)

One engine rule carries federation:
`eff_grant(S,A,R) :- delegate(Peer,S,A,R), trust(Peer,L), level_covers(L,A)`.

- **Non-transitive trust.** A peer's `delegate` only grants if a *local*
  `trust(Peer,L)` exists and that level `level_covers` the action. There is no
  peer-to-peer trust propagation — trusting α never extends to peers α trusts.
- **Trust re-checked every query.** `trust`/`level_covers` are body literals
  evaluated at decision time, never baked in at ingestion. Revoking trust or
  narrowing a level takes effect on the next `acl-permit?`.
- **Deny still wins.** Federated grants are `eff_grant`, so local (and inherited)
  deny overrides them exactly as for local grants.
- **Composes with inheritance.** A delegate to a group flows to members; a
  delegate on a parent resource flows to children (federated `eff_grant` feeds
  the same recursion).
- **Revocation = retraction.** `acl-revoke!` wraps `dl-retract!`; the next query
  re-saturates. `acl-fed-assert!` wraps `dl-assert!` for newly-replicated facts.
- **Transport is fed-sx's job.** `lib/acl/federation.sx` mocks the pull as a
  dict {peer-addr → delegate-facts}; `acl-fed-build-db` merges local policy +
  pulled delegates.

## Progress log

- **Phase 1 complete (24/24).** ACL is a thin layer over `lib/datalog/`:
  - `schema.sx` — sorts (subject/resource kinds, well-known actions) + EDB
    predicate arity table + `acl-fact-valid?` validator. Schema is data, since
    Datalog is untyped.
  - `facts.sx` — `acl-actor`/`acl-resource-fact`/`acl-grant`/`acl-deny`
    constructors returning Datalog fact tuples.
  - `engine.sx` — owns the ruleset `acl-phase1-rules` and reduces decisions to
    `dl-query`. `acl-build-db` = `dl-program-data facts rules`; `acl-permit?` =
    non-empty `(permit S A R)` query.
  - `api.sx` — `acl/load!` rebuilds an implicit current db; `acl/permit?` queries
    it. (Slash-symbols like `acl/permit?` parse fine as single tokens.)
  - **deny-overrides** encoded as `permit(S,A,R) :- grant(S,A,R), not deny(S,A,R)`.
    Stratifies cleanly because `deny` is EDB-only (no rule derives it). Verified:
    grant+deny on same (S,A,R) → denied.
  - Conformance: `conformance.conf` (datalog preloads + acl modules) + thin
    `conformance.sh` wrapper over `lib/guest/conformance.sh`. Scoreboard
    generated by the shared driver.
  - **Shared-plumbing note (for eventual `lib/guest/rules/`):** the
    `build-db = dl-program-data(facts, rules)` + `decide = non-empty ground query`
    shape is exactly what mod-sx (Prolog moderation) will also need. The reusable
    seam is engine.sx's two functions — facts→db and ground-query→bool — both
    pure pass-throughs to the rule engine. Not extracting yet (wait for mod-sx as
    second consumer per ground rules).
- **Phase 2 complete (54/54, +30 inherit).** Extended `acl-rules` with
  `eff_grant`/`eff_deny` derived relations; `member_of` carries both group and
  role membership, `child_of` carries resource trees, `role_grant` confers a
  role's (action,resource) capabilities. Direct grants unchanged (base case of
  `eff_grant`), Phase 1 suite still green. Constructors `acl-member-of`,
  `acl-child-of`, `acl-role-grant` added; schema arity table extended. See the
  deny-overrides policy section above. Verified cyclic membership terminates.
  - **Shared-plumbing update:** the reusable seam is still just engine.sx's
    `facts -> db` + `ground-query -> bool`. The inheritance *rules* are
    ACL-specific (group/resource/role vocabulary); mod-sx will have its own. So
    the `lib/guest/rules/` extraction stays at the build/decide level, not the
    ruleset level.
- **Phase 3 complete (89/89, +35 explain).** Added `explain.sx` (proof
  reconstruction, see policy section above), `audit.sx` (append-only log), and
  extended `api.sx` with `acl/explain`/`acl/audit`/`acl/audit-tail`. No engine
  changes — explanation reads the same saturated db the decision uses.
  - **Substrate gotcha:** the host `=` compares symbols by interned identity,
    which is *unstable* across `dl-query` saturation/substitution within a
    single image — the same two structurally-equal symbol-lists compared `=`
    true once and false moments later in the REPL. Conformance runs in a fresh
    process per suite so it's deterministic there, but test assertions now use a
    name-based `acl-et-eq?` (compare symbols via `symbol->string`), matching the
    datalog suite's `dl-api-deep=?` convention. Worth flagging to the kernel
    owners but out of acl scope.
- **Phase 4 complete (120/120, +31 fed).** Added `federation.sx` (mock
  transport + `acl-fed-build-db`/`acl-revoke!`/`acl-fed-assert!`), one engine
  rule (the trust-gated delegate rule), 4 fact constructors, 4 schema arities.
  Federated proofs reconstruct for free — `explain.sx` iterates `acl-rules`, so
  the delegate rule's EDB body (`delegate`/`trust`/`level_covers`) shows up as
  proof leaves with no explain changes. **Roadmap done: all four phases green.**
  - **Shared-plumbing final note (for `lib/guest/rules/`):** the durable
    reusable seam across acl-sx and the coming mod-sx is exactly four
    pass-throughs to the rule engine — `build-db(facts)`, `decide(ground-query)
    -> bool`, `explain(goal) -> proof-tree`, and the revoke/assert mutators.
    The *rulesets* and *vocabulary* are language-specific (ACL: grant/deny/
    member_of/...; mod-sx: Prolog moderation predicates). When mod-sx lands,
    extract those four functions (engine.sx + the generic half of explain.sx's
    goal-directed reconstructor) into `lib/guest/rules/`, leaving each consumer
    its own rules + fact constructors. Proof reconstruction is the non-obvious
    reusable piece: it only needs the ruleset as data + a saturated db, both of
    which any datalog-backed guest has.
  - **dict-mode conformance is slow, not hung:** all suites load + run in one
    process (~30-40s for 120 tests, no per-suite timeout). Do not kill early.
  - **Tooling note:** sx-tree path-based edit tools (`sx_replace_node`,
    `sx_read_subtree` with a path) ignored the path argument in this worktree
    (always resolved to index 0 / `[0,..]`), in BOTH `(a b c)` and `(a,b,c)`
    forms. `sx_write_file`, `sx_validate`, `sx_find_all`, `sx_summarise`,
    `sx_eval` all work; used full-file rewrites instead of path edits throughout.

## Hardening (post-roadmap)

- **`lib/acl/tests/harden.sx` (+25).** Adversarial / cross-phase coverage beyond
  the per-phase suites. **Prover-free by design** (see JIT blocker below): only
  `acl-permit?` (compiled Datalog, safe at any depth) + pure data ops, never
  `acl-explain`/`acl-prove-d`.
  - Diamond hierarchies (resource and group): grant on one path + deny on
    another → deny wins; both-grant → permit; deny does not leak to siblings.
  - Chain inheritance (literal 4-link): top-group grant reaches leaf member and
    intermediates; leaf-member deny overrides the top grant; deny on the leaf
    doesn't block a higher level.
  - Cycle termination: self-membership, self-child, and 2-node membership cycles
    all reach a fixpoint and decide correctly.
  - Federation conflicts: federated group-grant with a locally-denied member →
    deny; multi-peer delegation (one trusted, or both trusted) → permit.
  - Degenerate inputs: empty db permits nothing.
  - Fact validation: `acl-validate-facts` surfaces wrong-arity + unknown
    predicates; `acl-facts-valid?` on clean/empty sets.
  - Audit save/restore: snapshot → clear → restore round-trips entries + seq;
    seq continues without collision after restore; snapshot is an immutable copy.
  - Proof reconstruction itself is covered by tests/explain.sx + tests/fed.sx
    (both stay under the warm-process JIT depth threshold); the depth-cap safety
    net is verified manually in a warm REPL image but excluded from conformance.
- **New API:** `acl-validate-facts`/`acl-facts-valid?` (schema.sx, opt-in — build
  stays lenient); `acl-audit-snapshot`/`acl-audit-restore!`/`acl-audit-copy`
  (audit.sx).
- **Substrate gotcha (recorded):** `append!` extends a list built with `(list)`
  but **silently no-ops on a `map`/`rest`-derived list** in this runtime. Bit the
  first cut of `acl-audit-restore!` (rebuilt the live log via `map`, so later
  records didn't append). Fix: always rebuild mutable lists via `(list)` +
  `append!` (`acl-audit-copy`). Worth flagging to kernel owners; out of acl scope.

## Blockers

- **JIT loops on deep proof reconstruction (substrate, not acl).** Once the
  kernel JIT-compiles the mutually-recursive prover (`acl-prove-d`/
  `acl-prove-rules`/`acl-prove-build` in `explain.sx`) — which happens after a
  process has run enough explains to cross the compile threshold — the compiled
  version **loops indefinitely** on a `member_of`/`child_of` chain deeper than
  ~3. Symptoms: `acl-explain` over a 4+-deep chain returns instantly in a cold /
  warm-REPL image but **hangs** in a long-lived process. The per-phase explain
  and fed suites pass only because their proofs stay ≤3 deep; a 5th suite that
  explained deeper chains hung the whole conformance run (no per-suite timeout
  in dict mode). Matches `[[project_jit_bytecode_bug]]` (ACTIVE).
  - *Impact beyond tests:* `acl-explain` is unsafe for deep hierarchies in a
    warm production OCaml server. `acl-permit?` is unaffected (it reduces to
    compiled Datalog, no SX-side recursion) — only the SX proof reconstructor is.
  - *Workaround in acl:* harden suite is prover-free; explain is exercised only
    at shallow depth. *Real fix is in the kernel JIT* (out of acl scope) — e.g.
    the `_jit_compiling` guard / disabling JIT for the recursive prover, or
    fixing the bytecode loop. Recommend the kernel owners reproduce with:
    `acl-explain` over a 6-deep `member_of` chain after ~70 prior explains.
  - *Minimal repro recorded.* Until fixed, callers needing explanations for
    deep hierarchies should flatten or cap depth, or run explain in a cold
    worker.