fix(connect): start retry backoff small instead of at ~1.4s #289
Open
mishushakov wants to merge 1 commit into
connect_forever opened its retry backoff at e^(1/3) ≈ 1.4s, so a peer whose port wasn't bound yet (ConnectionRefused) wasn't reached until the first ~1.4s sleep elapsed, even though the kernel typically binds in ~330ms.

Replace the e^(try_num/3) formula with exponential backoff starting at 50ms and doubling (capped at 30s), mirroring ReconnectConfig semantics. A peer ready at ~330ms is now reached at ~350ms. Jitter and the outer connect timeout are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
connect_forever (src/util.rs) is the retry loop behind every Socket::connect. When the peer's port isn't bound yet, transport::connect returns ConnectionRefused (correctly classified as retryable) and the loop sleeps before retrying. The delay was computed as e^(try_num/3) seconds (capped at 30s, plus jitter).
So the first retry already waits ~1.4s (then ~1.95s, ~2.7s, …). A peer whose kernel binds in ~330ms therefore isn't reached until ~1.4s — attempt #1 fails, the loop sleeps ~1.4s, attempt #2 succeeds.
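To make the numbers above concrete, here is a minimal sketch of the old schedule. It is a reconstruction from the e^(try_num/3) formula quoted in the description, not the code from src/util.rs: the function name old_delay_secs is hypothetical, and the 30s cap and jitter are left out.

```rust
/// Hypothetical reconstruction of the old delay in seconds, before the
/// cap and jitter are applied: e^(try_num / 3).
fn old_delay_secs(try_num: u32) -> f64 {
    (f64::from(try_num) / 3.0).exp()
}

fn main() {
    // try_num = 1 is the first retry, i.e. the sleep after the first
    // failed attempt.
    for try_num in 1..=3 {
        println!("retry {}: ~{:.2}s", try_num, old_delay_secs(try_num));
    }
}
```

Under these assumptions the first three sleeps come out at roughly 1.40s, 1.95s and 2.72s, which is why a peer that binds after ~330ms still waits out the full first sleep.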
Fix
Start the backoff small and grow it, mirroring the crate's existing ReconnectConfig semantics (exponential, capped, jittered): the delay now starts at 50ms and doubles on each retry, capped at 30s.
Retries now wait ~50, 100, 200, 400ms…, so a peer ready at ~330ms is reached at ~350ms (50 + 100 + 200) instead of ~1.4s. Same semantics otherwise: still exponential, still capped at 30s, still jittered. The outer run_with_timeout(connect_timeout) (default 30s) still bounds a peer that never binds.
Testing
cargo build succeeds
cargo clippy --lib is clean
🤖 Generated with Claude Code
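As a rough check of the schedule described under Fix, the sketch below models a delay that starts at 50ms, doubles, and is capped at 30s, then walks the loop to see when a peer that becomes ready at a given time is first reached. The names backoff_delay and first_reached_ms are hypothetical and this is not the code in the PR. Jitter and the time each connect attempt takes are ignored.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before retry number `retry` (0-based),
/// starting at 50ms, doubling each time, capped at 30s. Jitter omitted.
fn backoff_delay(retry: u32) -> Duration {
    const BASE_MS: u64 = 50;
    const CAP_MS: u64 = 30_000;
    // Saturating arithmetic keeps large retry counts from overflowing
    // before the cap is applied.
    let ms = BASE_MS.saturating_mul(2u64.saturating_pow(retry)).min(CAP_MS);
    Duration::from_millis(ms)
}

/// Time (ms since the first attempt) of the first attempt made at or
/// after `ready_ms`, assuming attempts themselves take no time.
fn first_reached_ms(ready_ms: u64) -> u64 {
    let mut now_ms = 0u64;
    let mut retry = 0u32;
    while now_ms < ready_ms {
        now_ms += backoff_delay(retry).as_millis() as u64;
        retry += 1;
    }
    now_ms
}

fn main() {
    for retry in 0..4 {
        println!("retry {}: {:?}", retry, backoff_delay(retry));
    }
    println!("peer ready at 330ms is reached at {}ms", first_reached_ms(330));
}
```

With these assumptions the attempts land at 0, 50, 150 and 350ms, so a peer ready at 330ms is reached on the fourth attempt at 350ms. The cap first applies at the eleventh retry, where 50ms × 2^10 would otherwise be 51.2s.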