Prevent busy-loop retries in queue subscriber and workerpool #609
Conversation
- Add exponential backoff after subscriber Receive errors and avoid spinning on ctx-like errors when ctx is still active
- Add exponential backoff before workerpool job resubmission retries
- Add a regression test to ensure retries are delayed
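As an illustration of the approach described above (a sketch only, not the PR's actual code: the receiveLoop helper, its signature, the backoff constants, and the use of the standard log package are all assumptions), a context-aware receive loop with exponential backoff could look like this:

package queue

import (
    "context"
    "errors"
    "log"
    "time"
)

const (
    baseBackoffDelay   = 100 * time.Millisecond
    maxBackoffDelay    = 30 * time.Second
    maxBackoffAttempts = 10
)

// receiveLoop keeps calling receive until ctx is cancelled, sleeping with
// exponential backoff after each consecutive non-context error.
func receiveLoop(ctx context.Context, receive func(context.Context) error) {
    failures := 0
    for {
        if ctx.Err() != nil {
            return // shut down once the context is cancelled
        }

        err := receive(ctx)
        if err == nil {
            failures = 0 // a successful receive resets the backoff
            continue
        }

        // Context errors mean shutdown is in progress; loop to re-check ctx.
        if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
            continue
        }

        log.Printf("could not pull message: %v", err)

        // Exponential backoff: 100ms, 200ms, 400ms, ... capped at 30s.
        failures++
        attempt := failures
        if attempt > maxBackoffAttempts {
            attempt = maxBackoffAttempts
        }
        delay := baseBackoffDelay * time.Duration(1<<uint(attempt-1))
        if delay > maxBackoffDelay {
            delay = maxBackoffDelay
        }

        select {
        case <-ctx.Done():
            return
        case <-time.After(delay):
        }
    }
}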
Summary of Changes
Hello @pitabwire, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a performance and stability issue where both the queue subscriber and the workerpool could enter busy-loop retry scenarios, consuming excessive CPU resources. By introducing context-aware exponential backoff strategies for both message reception errors and job resubmission, the system now gracefully handles transient failures, allowing for recovery without resource exhaustion.
There is a problem with the Gemini CLI PR review. Please check the action logs for details.
Code Review
This pull request introduces exponential backoff to prevent busy-looping on retries in both the queue subscriber and the worker pool, which is a great improvement for CPU usage under failure conditions. The implementation is solid and includes a good regression test. I have a few suggestions to improve maintainability and correctness: one regarding simplifying context cancellation handling, another about removing some dead code, and a final one about refactoring duplicated backoff logic into a shared helper function to adhere to the DRY principle.
queue/subscriber.go (Outdated)
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
    // Context cancelled or deadline exceeded, loop again to check ctx.Done()
    if ctx.Err() != nil {
        continue
    }
    // Other errors from Receive are critical for the listener.
    logger.WithError(err).Error("could not pull message")
}
The logic for handling context cancellation errors can be simplified. If s.Receive(ctx) returns a context.Canceled or context.DeadlineExceeded error, we should unconditionally continue to the top of the loop. The if ctx.Err() != nil check at the beginning of the loop will then correctly handle the shutdown. The current implementation with the nested if ctx.Err() != nil check is unnecessarily complex and could lead to incorrect behavior (like an unwanted backoff) if Receive returns a context error while ctx.Err() is momentarily nil.
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
// Context cancelled or deadline exceeded, loop again to check ctx.Done()
continue
}
workerpool/manager.go (Outdated)
attempt := job.Runs()
if attempt > 10 {
    attempt = 10
}

delay := 100 * time.Millisecond * time.Duration(1<<uint(attempt-1))
if delay > 30*time.Second {
    delay = 30 * time.Second
}
This exponential backoff logic is nearly identical to the logic in queue/subscriber.go. To improve maintainability and avoid code duplication (DRY principle), consider extracting this into a shared helper function.
Additionally, the hardcoded values 10, 100 * time.Millisecond, and 30 * time.Second should be defined as constants for better readability and easier modification.
A shared utility could look like this:
const (
maxBackoffAttempts = 10
baseBackoffDelay = 100 * time.Millisecond
maxBackoffDelay = 30 * time.Second
)
// CalculateBackoff computes an exponential backoff duration for a given attempt.
func CalculateBackoff(attempt int) time.Duration {
if attempt <= 0 {
return 0
}
if attempt > maxBackoffAttempts {
attempt = maxBackoffAttempts
}
delay := baseBackoffDelay * time.Duration(1<<uint(attempt-1))
if delay > maxBackoffDelay {
delay = maxBackoffDelay
}
return delay
}
This would make the code in both places cleaner and more consistent.
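As a usage illustration (a sketch only; the waitForRetry helper and its placement in the workerpool package are assumptions, while CalculateBackoff is taken from the suggestion above), both call sites could then delegate the actual wait to a small context-aware helper:

package workerpool

import (
    "context"
    "time"
)

// waitForRetry blocks for the backoff corresponding to attempt, returning
// false if the context was cancelled while waiting.
func waitForRetry(ctx context.Context, attempt int) bool {
    delay := CalculateBackoff(attempt)
    if delay <= 0 {
        return ctx.Err() == nil
    }
    timer := time.NewTimer(delay)
    defer timer.Stop()
    select {
    case <-ctx.Done():
        return false
    case <-timer.C:
        return true
    }
}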
workerpool/manager.go (Outdated)
if delay > 0 {
    go func() {
        timer := time.NewTimer(delay)
        defer timer.Stop()

        select {
        case <-ctx.Done():
            job.Close()
            return
        case <-timer.C:
        }

        resubmitErr := SubmitJob(ctx, s, job)
        if resubmitErr != nil {
            log.WithError(resubmitErr).Error("Failed to resubmit job")
            _ = job.WriteError(ctx, fmt.Errorf("failed to resubmit job: %w", executionErr))
            job.Close()
        }
    }()
    return
}
The added backoff logic is a good improvement. However, as implemented, the delay will always be greater than 0 because job.Runs() is at least 1. This means the if delay > 0 condition is always true, and the function will always return from within this block. Consequently, the original code for immediate resubmission starting at line 142 is now unreachable (dead code) and should be removed for clarity.
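Following that suggestion, one possible shape for the cleaned-up path (a sketch only, reusing the names from the diff above; not the PR's actual code) is to always wait for the computed backoff and drop the unreachable immediate-resubmission branch:

// Always delay before resubmitting; the `if delay > 0` guard and the
// immediate-resubmission path after it are removed, since delay is never
// zero here.
go func() {
    timer := time.NewTimer(delay)
    defer timer.Stop()

    select {
    case <-ctx.Done():
        job.Close()
        return
    case <-timer.C:
    }

    if resubmitErr := SubmitJob(ctx, s, job); resubmitErr != nil {
        log.WithError(resubmitErr).Error("Failed to resubmit job")
        _ = job.WriteError(ctx, fmt.Errorf("failed to resubmit job: %w", executionErr))
        job.Close()
    }
}()
return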
Summary
- Add exponential backoff after Subscription.Receive errors (including subscription recreation attempts).

Why
Both gocloud.dev/pubsub.Subscription.Receive (on permanent/non-retryable errors) and workerpool retry resubmission could otherwise fail fast in a tight loop and burn CPU.

Tests
go test ./...
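The regression test itself is not shown in this excerpt. A minimal sketch of how such a test could assert delayed retries (hypothetical names, reusing the receiveLoop sketch from earlier; not the PR's actual test) might be:

package queue

import (
    "context"
    "errors"
    "testing"
    "time"
)

// TestReceiveRetriesAreDelayed checks that consecutive failed receives are
// separated by a backoff rather than retried in a tight loop.
func TestReceiveRetriesAreDelayed(t *testing.T) {
    ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
    defer cancel()

    var attempts []time.Time
    receive := func(ctx context.Context) error {
        attempts = append(attempts, time.Now())
        return errors.New("transient failure")
    }

    receiveLoop(ctx, receive) // hypothetical helper sketched earlier

    if len(attempts) < 2 {
        t.Fatalf("expected at least two attempts, got %d", len(attempts))
    }
    // With a 100ms base backoff the second attempt should not follow the
    // first one immediately; allow generous slack for scheduling.
    if gap := attempts[1].Sub(attempts[0]); gap < 50*time.Millisecond {
        t.Fatalf("retries happened too quickly: %v apart", gap)
    }
}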