-
Notifications
You must be signed in to change notification settings - Fork 2.2k
App Config - Startup retry #47857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
App Config - Startup retry #47857
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds a startup retry mechanism to Azure App Configuration for Java to handle transient failures during application startup. When all replicas fail to load configuration, the provider will automatically retry with exponential backoff until a configurable timeout expires.
Changes:
- Added
startup-timeoutconfiguration property (default: 100s, min: 30s, max: 600s) to control retry duration during startup - Refactored
AzureAppConfigDataLoader.load()into smaller helper methods (loadConfiguration,attemptLoadFromClients,setupMonitoringState,handleReplicaFailure) for improved readability - Implemented retry loop with intelligent backoff that waits until the next client becomes available before retrying
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Added documentation for new startup-timeout configuration option |
| CHANGELOG.md | Documented the new startup retry feature |
| AppConfigurationProperties.java | Added startupTimeout field with default value and validation (30-600 seconds) |
| AzureAppConfigDataResource.java | Added startupTimeout parameter to constructor and getter method |
| AzureAppConfigDataLocationResolver.java | Passed startupTimeout from properties to resources |
| AzureAppConfigDataLoader.java | Refactored load method and implemented retry logic with backoff for startup failures |
| ConnectionManager.java | Added getMillisUntilNextClientAvailable() to calculate wait time until next replica is available |
| AppConfigurationReplicaClientFactory.java | Added wrapper method to expose getMillisUntilNextClientAvailable |
| ConfigStore.java | Minor code quality improvements (variable naming, isEmpty() usage) |
| ConnectionManagerTest.java | Added comprehensive tests for getMillisUntilNextClientAvailable method |
| AzureAppConfigDataResourceTest.java | Updated test constructor calls to include startupTimeout parameter |
| AzureAppConfigDataLoaderTest.java | Added tests for startup retry behavior and refresh non-retry behavior |
...ring/cloud/appconfiguration/config/implementation/properties/AppConfigurationProperties.java
Show resolved
Hide resolved
.../com/azure/spring/cloud/appconfiguration/config/implementation/AzureAppConfigDataLoader.java
Outdated
Show resolved
Hide resolved
.../com/azure/spring/cloud/appconfiguration/config/implementation/AzureAppConfigDataLoader.java
Show resolved
Hide resolved
| if (waitTime > 0) { | ||
| logger.debug("All replicas in backoff for store: " + resource.getEndpoint() | ||
| + ". Waiting " + waitTime + "ms before retry."); | ||
| try { | ||
| Thread.sleep(waitTime); | ||
| } catch (InterruptedException e) { | ||
| Thread.currentThread().interrupt(); | ||
| return lastException; | ||
| } | ||
| storeState.setLoadState(resource.getEndpoint(), true); // Success - configuration loaded, exit loop | ||
| lastException = null; | ||
| // Break out of the loop since we have successfully loaded configuration | ||
| break; | ||
| } catch (AppConfigurationStatusException e) { | ||
| reloadFailed = true; | ||
| replicaClientFactory.backoffClient(resource.getEndpoint(), currentClient.getEndpoint()); | ||
| lastException = e; | ||
| // Log the specific replica failure with context | ||
| AppConfigurationReplicaClient nextClient = replicaClientFactory | ||
| .getNextActiveClient(resource.getEndpoint(), false); | ||
| logReplicaFailure(currentClient, "status exception", nextClient != null, e); | ||
| client = nextClient; | ||
| } catch (Exception e) { | ||
| // Store the exception to potentially use if all replicas fail | ||
| lastException = e; // Log the specific replica failure with context | ||
| replicaClientFactory.backoffClient(resource.getEndpoint(), currentClient.getEndpoint()); | ||
| AppConfigurationReplicaClient nextClient = replicaClientFactory | ||
| .getNextActiveClient(resource.getEndpoint(), false); | ||
| logReplicaFailure(currentClient, "exception", nextClient != null, e); | ||
| client = nextClient; | ||
| } |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If getMillisUntilNextClientAvailable returns 0 (indicating a client is available) but attemptLoadFromClients continues to fail, the retry loop could spin rapidly without any delay between attempts. Consider adding a minimum delay (e.g., 100ms) even when waitTime is 0 to prevent potential tight loops in edge cases where clients appear available but consistently fail.
|
|
||
| resource = new AzureAppConfigDataResource(true, configStore, profiles, false, Duration.ofMinutes(1)); | ||
| // Startup resource (isRefresh = false) | ||
| resource = new AzureAppConfigDataResource(true, configStore, profiles, true, Duration.ofMinutes(1), Duration.ofSeconds(30)); |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resource and refreshResource initialization appears to have the 'startup' parameter values swapped. According to the constructor documentation, the 4th parameter is 'startup' where true means startup operation and false means refresh operation. However, 'resource' is being initialized with startup=true but is used as a startup resource (isRefresh would be false), while 'refreshResource' is initialized with startup=false but should be used as a refresh resource (isRefresh would be true). This is backwards - the resource intended for startup should have startup=true, and the resource intended for refresh should have startup=false, but the naming and usage in the tests suggest they are inverted.
| resource = new AzureAppConfigDataResource(true, configStore, profiles, true, Duration.ofMinutes(1), Duration.ofSeconds(30)); | |
| resource = new AzureAppConfigDataResource(false, configStore, profiles, true, Duration.ofMinutes(1), Duration.ofSeconds(30)); |
| } | ||
| } | ||
|
|
||
| return earliestAvailable.toEpochMilli() - now.toEpochMilli(); |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no clients exist or all have Instant.MAX as backoffEnd, earliestAvailable will remain Instant.MAX, and the calculation 'earliestAvailable.toEpochMilli() - now.toEpochMilli()' will produce an extremely large positive value or potentially overflow. Consider adding a guard to return a sensible maximum value or Long.MAX_VALUE when earliestAvailable is still Instant.MAX.
| return earliestAvailable.toEpochMilli() - now.toEpochMilli(); | |
| // If no clients exist or all have Instant.MAX as backoffEnd, avoid overflowing the duration. | |
| if (Instant.MAX.equals(earliestAvailable)) { | |
| return Long.MAX_VALUE; | |
| } | |
| long millisUntilAvailable = earliestAvailable.toEpochMilli() - now.toEpochMilli(); | |
| return millisUntilAvailable > 0 ? millisUntilAvailable : 0; |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Description
Adds retry to startup. When all replicas fail there will be an attempt to retry the failed store for a period of time. By default 100s, minimal 30s, maximum 600s.
Also, refactors the load method to be split into a number of helper method to make this readable.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines