Skip to content

Conversation

@mrm9084
Copy link
Member

@mrm9084 mrm9084 commented Jan 29, 2026

Description

Adds retry to startup. When all replicas fail there will be an attempt to retry the failed store for a period of time. By default 100s, minimal 30s, maximum 600s.

Also, refactors the load method to be split into a number of helper method to make this readable.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Copilot AI review requested due to automatic review settings January 29, 2026 20:04
@github-actions github-actions bot added the azure-spring All azure-spring related issues label Jan 29, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds a startup retry mechanism to Azure App Configuration for Java to handle transient failures during application startup. When all replicas fail to load configuration, the provider will automatically retry with exponential backoff until a configurable timeout expires.

Changes:

  • Added startup-timeout configuration property (default: 100s, min: 30s, max: 600s) to control retry duration during startup
  • Refactored AzureAppConfigDataLoader.load() into smaller helper methods (loadConfiguration, attemptLoadFromClients, setupMonitoringState, handleReplicaFailure) for improved readability
  • Implemented retry loop with intelligent backoff that waits until the next client becomes available before retrying

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
README.md Added documentation for new startup-timeout configuration option
CHANGELOG.md Documented the new startup retry feature
AppConfigurationProperties.java Added startupTimeout field with default value and validation (30-600 seconds)
AzureAppConfigDataResource.java Added startupTimeout parameter to constructor and getter method
AzureAppConfigDataLocationResolver.java Passed startupTimeout from properties to resources
AzureAppConfigDataLoader.java Refactored load method and implemented retry logic with backoff for startup failures
ConnectionManager.java Added getMillisUntilNextClientAvailable() to calculate wait time until next replica is available
AppConfigurationReplicaClientFactory.java Added wrapper method to expose getMillisUntilNextClientAvailable
ConfigStore.java Minor code quality improvements (variable naming, isEmpty() usage)
ConnectionManagerTest.java Added comprehensive tests for getMillisUntilNextClientAvailable method
AzureAppConfigDataResourceTest.java Updated test constructor calls to include startupTimeout parameter
AzureAppConfigDataLoaderTest.java Added tests for startup retry behavior and refresh non-retry behavior

Comment on lines +186 to 195
if (waitTime > 0) {
logger.debug("All replicas in backoff for store: " + resource.getEndpoint()
+ ". Waiting " + waitTime + "ms before retry.");
try {
Thread.sleep(waitTime);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return lastException;
}
storeState.setLoadState(resource.getEndpoint(), true); // Success - configuration loaded, exit loop
lastException = null;
// Break out of the loop since we have successfully loaded configuration
break;
} catch (AppConfigurationStatusException e) {
reloadFailed = true;
replicaClientFactory.backoffClient(resource.getEndpoint(), currentClient.getEndpoint());
lastException = e;
// Log the specific replica failure with context
AppConfigurationReplicaClient nextClient = replicaClientFactory
.getNextActiveClient(resource.getEndpoint(), false);
logReplicaFailure(currentClient, "status exception", nextClient != null, e);
client = nextClient;
} catch (Exception e) {
// Store the exception to potentially use if all replicas fail
lastException = e; // Log the specific replica failure with context
replicaClientFactory.backoffClient(resource.getEndpoint(), currentClient.getEndpoint());
AppConfigurationReplicaClient nextClient = replicaClientFactory
.getNextActiveClient(resource.getEndpoint(), false);
logReplicaFailure(currentClient, "exception", nextClient != null, e);
client = nextClient;
}
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If getMillisUntilNextClientAvailable returns 0 (indicating a client is available) but attemptLoadFromClients continues to fail, the retry loop could spin rapidly without any delay between attempts. Consider adding a minimum delay (e.g., 100ms) even when waitTime is 0 to prevent potential tight loops in edge cases where clients appear available but consistently fail.

Copilot uses AI. Check for mistakes.

resource = new AzureAppConfigDataResource(true, configStore, profiles, false, Duration.ofMinutes(1));
// Startup resource (isRefresh = false)
resource = new AzureAppConfigDataResource(true, configStore, profiles, true, Duration.ofMinutes(1), Duration.ofSeconds(30));
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resource and refreshResource initialization appears to have the 'startup' parameter values swapped. According to the constructor documentation, the 4th parameter is 'startup' where true means startup operation and false means refresh operation. However, 'resource' is being initialized with startup=true but is used as a startup resource (isRefresh would be false), while 'refreshResource' is initialized with startup=false but should be used as a refresh resource (isRefresh would be true). This is backwards - the resource intended for startup should have startup=true, and the resource intended for refresh should have startup=false, but the naming and usage in the tests suggest they are inverted.

Suggested change
resource = new AzureAppConfigDataResource(true, configStore, profiles, true, Duration.ofMinutes(1), Duration.ofSeconds(30));
resource = new AzureAppConfigDataResource(false, configStore, profiles, true, Duration.ofMinutes(1), Duration.ofSeconds(30));

Copilot uses AI. Check for mistakes.
}
}

return earliestAvailable.toEpochMilli() - now.toEpochMilli();
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no clients exist or all have Instant.MAX as backoffEnd, earliestAvailable will remain Instant.MAX, and the calculation 'earliestAvailable.toEpochMilli() - now.toEpochMilli()' will produce an extremely large positive value or potentially overflow. Consider adding a guard to return a sensible maximum value or Long.MAX_VALUE when earliestAvailable is still Instant.MAX.

Suggested change
return earliestAvailable.toEpochMilli() - now.toEpochMilli();
// If no clients exist or all have Instant.MAX as backoffEnd, avoid overflowing the duration.
if (Instant.MAX.equals(earliestAvailable)) {
return Long.MAX_VALUE;
}
long millisUntilAvailable = earliestAvailable.toEpochMilli() - now.toEpochMilli();
return millisUntilAvailable > 0 ? millisUntilAvailable : 0;

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

azure-spring All azure-spring related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant