fed-sx-m2: Step 8b-timer — live retry-loop wiring on send_after
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 44s
Wires the delivery_worker's retry loop on top of the erlang:send_after /
cancel_timer primitives just landed on loops/erlang (3709460d, 98b0104c,
779e53b2; cherry-picked here since origin/architecture hasn't caught up yet).

Surface:
- new :timers [{Cid, Ref}] state field tracks live timer refs
- handle_call(flush): drain (existing semantics) + arm_retry_timer per
  retried Cid (computes backoff slot from the now-bumped attempt count,
  sets next_retry_at, send_after self-cast). Reply shape unchanged.
- handle_info({retry, Cid}, S): redrives that one Cid through
  deliver_one_pure. Success → record_success_pure + clear pending.
  Failure → schedule_retry_for (which bumps attempts, dead-letters on
  slot 6, or arms next slot).
- cancel_timer_for/2 before arming a new timer so stale timers don't keep
  the scheduler's run loop alive after the work is done.
- state_srv/1 + timer_ref_for/2 for test introspection.

5/5 in new delivery_retry_timer.sh; existing delivery_worker.sh 17/17 and
delivery_retry.sh 11/11 still green. Conformance gate 771/771 (was 761/761;
the +10 is the cherry-picked send_after suite).

Closes Blockers #3. m2 is now feature-complete.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
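The retry loop described in the commit message can be sketched roughly as follows. This is a minimal illustration written against stock Erlang/OTP, not the code in `delivery_worker.erl`: the module name `retry_timer_sketch`, the `#state{}` record layout, the injected `deliver` fun, the `{ok, Count}` reply from `flush`, and the `backoff_ms/1` values are all assumptions made for the example. Only the `flush` call, the `{retry, Cid}` message, the `[{Cid, Ref}]` timer list, the cancel-before-arm rule, and dead-lettering at the 6th attempt come from the commit.

```erlang
-module(retry_timer_sketch).
-behaviour(gen_server).

-export([start_link/2, flush/1, timer_ref_for/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(MAX_ATTEMPTS, 6).

%% Hypothetical state shape; the real worker's fields are not shown in the commit.
-record(state, {deliver,         % fun((Cid) -> ok | {error, term()}), injected
                pending = [],    % Cids still awaiting delivery
                attempts = #{},  % Cid => failed attempts so far
                timers = [],     % [{Cid, Ref}] live timer refs
                dead = []}).     % dead-lettered Cids

start_link(DeliverFun, Cids) ->
    gen_server:start_link(?MODULE, {DeliverFun, Cids}, []).

flush(Pid) -> gen_server:call(Pid, flush).

timer_ref_for(Pid, Cid) -> gen_server:call(Pid, {timer_ref_for, Cid}).

init({DeliverFun, Cids}) ->
    {ok, #state{deliver = DeliverFun, pending = Cids}}.

handle_call(flush, _From, S0) ->
    %% Drain: try each pending Cid once; failures get their attempts bumped.
    {Failed, S1} = lists:foldl(fun drain_one/2, {[], S0}, S0#state.pending),
    %% Arm one timer per retried Cid (attempts were already bumped above).
    S2 = lists:foldl(fun arm/2, S1, Failed),
    {reply, {ok, length(Failed)}, S2};
handle_call({timer_ref_for, Cid}, _From, S) ->
    {reply, proplists:get_value(Cid, S#state.timers), S}.

handle_cast(_Msg, S) -> {noreply, S}.

handle_info({retry, Cid}, S0) ->
    %% The timer that produced this message has fired, so its ref is spent.
    S1 = S0#state{timers = lists:keydelete(Cid, 1, S0#state.timers)},
    case lists:member(Cid, S1#state.pending) andalso (S1#state.deliver)(Cid) of
        false -> {noreply, S1};                       % stale message, Cid already cleared
        ok    -> {noreply, clear(Cid, S1)};           % success: drop all bookkeeping
        _Err  -> {noreply, schedule_retry(Cid, bump(Cid, S1))}
    end;
handle_info(_Other, S) -> {noreply, S}.

drain_one(Cid, {Failed, S}) ->
    case (S#state.deliver)(Cid) of
        ok   -> {Failed, clear(Cid, S)};
        _Err -> {[Cid | Failed], bump(Cid, S)}
    end.

%% Post-retry path: dead-letter on the 6th attempt, otherwise arm the next slot.
schedule_retry(Cid, S) ->
    case maps:get(Cid, S#state.attempts) of
        N when N >= ?MAX_ATTEMPTS ->
            S1 = clear(Cid, S),
            S1#state{dead = [Cid | S1#state.dead]};
        _ ->
            arm(Cid, S)
    end.

%% Cancel any previous timer for Cid before arming the next one.
arm(Cid, S0) ->
    S = cancel(Cid, S0),
    N = maps:get(Cid, S#state.attempts),
    Ref = erlang:send_after(backoff_ms(N), self(), {retry, Cid}),
    S#state{timers = [{Cid, Ref} | S#state.timers]}.

cancel(Cid, S) ->
    case lists:keytake(Cid, 1, S#state.timers) of
        {value, {Cid, Ref}, Rest} ->
            _ = erlang:cancel_timer(Ref),
            S#state{timers = Rest};
        false ->
            S
    end.

bump(Cid, S) ->
    Attempts = maps:update_with(Cid, fun(N) -> N + 1 end, 1, S#state.attempts),
    S#state{attempts = Attempts}.

clear(Cid, S0) ->
    S = cancel(Cid, S0),
    S#state{pending = lists:delete(Cid, S#state.pending),
            attempts = maps:remove(Cid, S#state.attempts)}.

%% Hypothetical backoff in milliseconds; the real backoff_for/1 table is not shown.
backoff_ms(N) -> 100 * (1 bsl (N - 1)).
```

Keeping the refs in state is what makes the cancel-before-arm step possible: without the `{Cid, Ref}` entry there is nothing to hand to `erlang:cancel_timer/1`, and an orphaned timer would still deliver its `{retry, Cid}` message later. In this sketch the drain path never dead-letters; only the post-retry path does, mirroring the two arm paths the commit describes.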
@@ -562,10 +562,24 @@ a dead-letter list visible via `/admin/dead-letter`.
  is cleared from `:next_retry`. `record_success_pure` clears
  both. `next_due_pure` returns cids whose retry time has
  passed. 11 cases in `delivery_retry.sh`.
- - [ ] **8b-timer** — Erlang-side timer wiring (`erlang:send_after`
- self-cast or equivalent). Needs the same substrate primitive
- that `gen_server` uses for `timeout` returns. Defer behind
- substrate gap discovery for now — see Blockers.
+ - [x] **8b-timer** — Erlang-side timer wiring on the
+ `delivery_worker` gen_server. handle_call(flush) drains then
+ arms a `send_after` self-cast per retried Cid (backoff from
+ the now-bumped attempt counter); handle_info({retry, Cid})
+ redrives that single Cid through deliver_one_pure. Success
+ clears bookkeeping via record_success; failure bumps attempts
+ via record_failure_pure and arms the next backoff slot — or
+ promotes to dead-letter on the 6th attempt and stops arming.
+ A `:timers [{Cid, Ref}]` state field tracks live refs so
+ schedule_retry_for can cancel the previous one before arming
+ the next (otherwise stale timers keep the scheduler's run
+ loop alive long after the work is done). 5/5 in
+ `delivery_retry_timer.sh`: T1 timer scheduled, T2 attempts=1,
+ T3 retry fires + attempts=2, T4 next timer rearmed, T5 ets-
+ counter dispatch (fail/fail/ok) lands in 3 attempts and
+ clears state. Substrate dependency landed via cherry-pick
+ from `loops/erlang` (3709460d / 98b0104c / 779e53b2) until
+ `loops/erlang` → architecture catches up.
  - [x] **8c** — Delivery-state projection
  (`next/kernel/delivery_state.erl`). Folds delivery events into
  per-peer worker-shaped snapshots so the outbound queue survives
@@ -1105,8 +1119,16 @@ proceed.
  through `delivery_worker`) and Step 10c (peer-actor doc
  fetch in `peer_actors`) are now unblocked.

- 3. **`erlang:send_after`-style timer primitive** — discovered
- during Step 8b prep. The retry loop needs a way for the
+ 3. **`erlang:send_after`-style timer primitive** — ~~discovered
+ during Step 8b prep~~ **RESOLVED 2026-06-30** via the
+ `loops/erlang` `send_after`/`cancel_timer`/`monotonic_time`
+ work landing on `origin/loops/erlang` (commits 3709460d,
+ 98b0104c, b10e55f0; 766/766 → 771/771). m2 cherry-picked all
+ three onto this branch so 8b-timer could land without waiting
+ for `loops/erlang` → architecture; the cherry-picks fall away
+ as no-op duplicates when architecture catches up. Original
+ diagnosis preserved below for the audit trail.
+ The retry loop needs a way for the
  delivery_worker to wake itself up after `backoff_for(N)`
  seconds. Erlang's `erlang:send_after/3` is the standard
  primitive; this port doesn't seem to register it (looked at
@@ -1241,6 +1263,31 @@ proceed.

  Newest first.

+ - **2026-06-30** — Step 8b-timer closed. Cherry-picked the three
+ `loops/erlang` send_after commits onto m2 (3709460d, 98b0104c,
+ 779e53b2 — the substrate landed standalone on origin/loops/erlang
+ earlier and hadn't propagated to origin/architecture yet). Wired
+ the live timer loop in `next/kernel/delivery_worker.erl`: a
+ `:timers [{Cid, Ref}]` state field; `handle_call(flush)` drains
+ then arms a `send_after` self-cast per retried Cid; the new
+ `handle_info({retry, Cid})` callback redrives that one Cid through
+ `deliver_one_pure` and either records success / clears state, or
+ bumps and arms the next backoff slot (or dead-letters on the 6th
+ attempt). Two arm-paths split — `arm_retry_timer` (post-drain,
+ attempts already bumped) vs `schedule_retry_for` (post-retry
+ attempt, needs to bump). `cancel_timer_for/1` clears the previous
+ timer before arming the next so stale timers don't keep the
+ scheduler's run loop alive after the work is done. Two new public
+ APIs for tests: `state_srv/1` returns the worker's full state,
+ `timer_ref_for/2` looks up a Cid's live ref. 5/5 in new
+ `delivery_retry_timer.sh` (T1 timer scheduled, T2 attempts=1, T3
+ retry fires + attempts=2, T4 next timer rearmed, T5 ets-counter
+ dispatch fail/fail/ok lands in 3 attempts and clears state).
+ Existing `delivery_worker.sh` 17/17 and `delivery_retry.sh` 11/11
+ still green. Conformance gate 771/771 (was 761/761; the +10 is
+ the cherry-picked send_after suite). Blockers #3 RESOLVED.
+ Reply shape of `flush` unchanged; no caller updates needed.
+
  - **2026-06-28** — Merge-prep pass. Conformance 761/761 still green
  on m2 tip `cd0de8cb`. Both smoke tests still pass cold:
  `next/tests/smoke_kernel_route.sh` 6/6 (port 54471, listener up