Skip to content

systemd: retry dnstapir-renew.service upon error#54

Closed
byKryptogram wants to merge 1 commit into
mainfrom
mattias/retry-on-failure
Closed

systemd: retry dnstapir-renew.service upon error#54
byKryptogram wants to merge 1 commit into
mainfrom
mattias/retry-on-failure

Conversation

@byKryptogram

@byKryptogram byKryptogram commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

If dnstapir-renew.service fails we need to wait a whole week to the next retry. Change it to try again every 12h for about 5 days if the renewal exited/exits with an error.

Summary by CodeRabbit

  • Bug Fixes
    • Improved service reliability by automatically retrying failed renewals.
    • Added a one-hour delay before retrying after a failure to reduce repeated restarts.

@byKryptogram byKryptogram requested review from jschlyter and zluudg July 1, 2026 16:22
@byKryptogram byKryptogram requested a review from a team as a code owner July 1, 2026 16:22
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@byKryptogram, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 36 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9403f83c-7431-442f-9a25-c5f2f5edcd0e

📥 Commits

Reviewing files that changed from the base of the PR and between aacecbe and b6c6151.

📒 Files selected for processing (1)
  • rpm/SOURCES/dnstapir-renew.service
📝 Walkthrough

Walkthrough

The systemd unit file for the dnstapir renew service was modified to add automatic restart behavior on failure, with a one-hour delay before retrying.

Changes

Service Restart Configuration

Layer / File(s) Summary
Add restart-on-failure policy
rpm/SOURCES/dnstapir-renew.service
Adds `Restart=on-failure` and `RestartSec=1h` to the [Service] section, enabling automatic retries after failure with a one-hour delay.

Estimated code review effort: 1 (Trivial) | ~2 minutes

Poem

A little unit, once it'd fail,
now waits an hour, then tries the trail.
Restart on failure, steady and slow,
this bunny hops where systemd goes. 🐇
Hop, retry, hop again — success will grow!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the service retry-on-error behavior added to dnstapir-renew.service.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
rpm/SOURCES/dnstapir-renew.service (1)

14-15: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

LGTM — minor optional suggestion.

Restart=on-failure + RestartSec=1h is the correct systemd pattern for retrying a failed oneshot; the 1h spacing means the default StartLimitBurst/StartLimitIntervalSec (5 starts/10s) won't be triggered. One thing worth considering: there's no cap on the number of retries, so a persistently failing renewal will retry hourly indefinitely (potentially overlapping with the next weekly timer fire) without any operator signal. Optionally bound retries with StartLimitIntervalSec/StartLimitBurst (and OnFailure= for alerting) if unattended indefinite retries are a concern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rpm/SOURCES/dnstapir-renew.service` around lines 14 - 15, The oneshot service
restart policy currently retries failures indefinitely with only
Restart=on-failure and RestartSec=1h; update the dnstapir-renew service to add
an explicit retry limit and optional failure notification. Use the service
stanza around Restart=on-failure as the place to add StartLimitIntervalSec and
StartLimitBurst, and consider wiring OnFailure to an alerting path so a
persistently failing renewal doesn’t silently retry forever.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@rpm/SOURCES/dnstapir-renew.service`:
- Around line 14-15: The oneshot service restart policy currently retries
failures indefinitely with only Restart=on-failure and RestartSec=1h; update the
dnstapir-renew service to add an explicit retry limit and optional failure
notification. Use the service stanza around Restart=on-failure as the place to
add StartLimitIntervalSec and StartLimitBurst, and consider wiring OnFailure to
an alerting path so a persistently failing renewal doesn’t silently retry
forever.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ee24c135-c5f3-417f-9303-78ce5cbc47e3

📥 Commits

Reviewing files that changed from the base of the PR and between 071d720 and aacecbe.

📒 Files selected for processing (1)
  • rpm/SOURCES/dnstapir-renew.service

@byKryptogram byKryptogram force-pushed the mattias/retry-on-failure branch from aacecbe to b6c6151 Compare July 1, 2026 16:46

@zluudg zluudg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we go with a restart on failure every 12h with a max amount of restarts of nine per five days?

Why not, for instance, restart on failure every hour for a maximum of five restarts per day? And then reduce the timer's interval to maybe every fourth or fifth day?

@byKryptogram

byKryptogram commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Why do we go with a restart on failure every 12h with a max amount of restarts of nine per five days?

To stop retries before next week comes around with a new timer trigger.

Why not, for instance, restart on failure every hour for a maximum of five restarts per day? And then reduce the timer's interval to maybe every fourth or fifth day?

Not sure if that is possible. When a unit reaches its limit, it is cancelled (and blocked?) until it is triggered by other means(eg. timers). See man systemd.unit

Consensus seam to be hourly retries. I'll change it.

@byKryptogram

Copy link
Copy Markdown
Contributor Author

I've done some more changes to this mechanism and I think it will be not clear if I reuse this PR. Closing this and will send a new one later.

@byKryptogram byKryptogram deleted the mattias/retry-on-failure branch July 2, 2026 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants