Skip to content

Conversation

@mwtzzz
Copy link
Contributor

@mwtzzz mwtzzz commented Dec 22, 2025

What

  1. configureable retry count
  2. additional logging

Why

  1. pass retry count as a command line arg; default 5
  2. show details when api requests fail

Testing before merge

compiles cleanly

Validation after merge

compile and test

Issue addressed by this PR

https://github.com/stellar/ops/issues/2039

### What
1. configureable retry count
2. additional logging

### Why
1. pass retry count as a command line arg; default 5
2. show details when api requests fail

### Testing before merge
compiles cleanly

### Validation after merge
compile and test

### Issue addressed by this PR
stellar/ops#2039
@Iamrodos
Copy link
Contributor

Good to have the number of retries configurable.

I noticed an issue with how the --retries flag is wired up - it looks like the value doesn't actually propagate to the retry logic. Here's a quick test:

  from github_backup.github_backup import MAX_RETRIES as GB_MAX_RETRIES
  from github_backup import max_retries

  print(f'Before: github_backup.MAX_RETRIES = {GB_MAX_RETRIES}')

  # Simulate what cli.py does
  max_retries.MAX_RETRIES = 20

  from github_backup.github_backup import MAX_RETRIES as GB_MAX_RETRIES_AFTER
  print(f'After:  github_backup.MAX_RETRIES = {GB_MAX_RETRIES_AFTER}')

Output:
Before: github_backup.MAX_RETRIES = 5
After: github_backup.MAX_RETRIES = 5

The issue is that MAX_RETRIES = max_retries.MAX_RETRIES in github_backup.py copies the value at import time, so later updates to the module don't affect it.

Adding a new file just to hold a single mutable global is also a bit heavyweight. Since --retries already has default=5 in argparse, you could just use args.max_retries directly? Given retrieve_data already receives args, and make_request_with_retry could be updated to accept it too. Then no new module needed and its more consistent with the existing way options flow.

A couple of other things:

  • There's a duplicate log line around line 748-749 - the first one can probably be removed since the second includes the URL
  • The new --retries flag should be documented in README.rst with the other CLI options

@Iamrodos
Copy link
Contributor

Curious as to what https://github.com/stellar/ops/issues/2039 details, as it can't be accessed. Could you summarise it?

@mwtzzz
Copy link
Contributor Author

mwtzzz commented Dec 22, 2025

argparse, you could just use args.max_retries directly**? Given retrieve_data already receives args, and make_request_with_retry could be updated to accept it too. Then no new module needed and its more consistent with the existing way options flow.

Possibly, I'll have to try this out.

  • The new --retries flag should be documented in README.rst with the other CLI options

Ack

@mwtzzz
Copy link
Contributor Author

mwtzzz commented Dec 22, 2025

Curious as to what stellar/ops#2039 details, as it can't be accessed. Could you summarise it?

We ran into an issue where the backup wasn't working but it wasn't clear from the logs what the problem was. We did some quick hacking of the script to see the problem was API calls failing. It'd be nice to have this level of logging by default.

For max_retries, it'd be nice for the operator to be able to set this to whatever value they want.

@mwtzzz
Copy link
Contributor Author

mwtzzz commented Dec 23, 2025

I noticed an issue with how the --retries flag is wired up - it looks like the value doesn't actually propagate to the retry logic.

I'm not seeing what you're seeing. Perhaps you could make a new docker image with my branch and test?

It seems to me the simplest way is to use a file with a mutable global, but if that's not acceptable, I can create a new function in github_backup.py to receive args and update MAX_RETRIES in this file from the argument; -or- I can update make_request_with_retry to get passed the retries argument .... Either of these approaches would also work.

@Iamrodos
Copy link
Contributor

I'm not seeing what you're seeing. Perhaps you could make a new docker image with my branch and test?

The issue I saw was fixed in 1f2ec01 - removing MAX_RETRIES = max_retries.MAX_RETRIES addresses the import-time copy issue. I was looking at the original commit.

It seems to me the simplest way is to use a file with a mutable global, but if that's not acceptable, I can create a new function in github_backup.py to receive args and update MAX_RETRIES in this file from the argument; -or- I can update make_request_with_retry to get passed the retries argument .... Either of these approaches would also work.

Passing the retry count as a parameter rather than using a separate module feels a lot cleaner to me. I think @josegonzalez is the only one who gets to pull the 'acceptable' card. :)

Since retrieve_data already has args, it's a small change:

  def make_request_with_retry(request, auth, max_retries=5):
      ...
      for attempt in range(max_retries):

And in retrieve_data:

  http_response = make_request_with_retry(request, auth, args.max_retries)

The other loop in fetch_all can use args.max_retries directly since it's nested inside retrieve_data and has access via closure.

The global MAX_RETRIES could be removed. The existing tests would need updating for the new CLI option too, but that's easy enough. I can redo the tests if you want.

This follows the existing pattern in the codebase and avoids the extra max_retries.py module.

@josegonzalez
Copy link
Owner

Whatever either:

  • Is simple to read/understand
  • Follows patterns in the codebase
  • You want to implement

Is fine with me, as long as it doesn't break existing functionality.

@mwtzzz
Copy link
Contributor Author

mwtzzz commented Dec 23, 2025

Passing the retry count as a parameter rather than using a separate module feels a lot cleaner to me. I think @josegonzalez is the only one who gets to pull the 'acceptable' card. :)

Since retrieve_ can redo the tests if you want.

This follows the existing pattern in the codebase and avoids the extra max_retries.py module.

Sounds good, I'll update the PR soon

@mwtzzz
Copy link
Contributor Author

mwtzzz commented Dec 23, 2025

@Iamrodos I've made the updates. PTAL please.
And yes if you could redo the tests that'd be great, thanks.

@Iamrodos
Copy link
Contributor

Thanks for the updates @mwtzzz! Changes look good.

I've put together a patch to update the tests, plus fix a off-by-one-error (mine) that you can apply:

What's in the patch:

  1. Fixed retry semantics - There was an off-by-one error in the original code (my fault, not yours!). --retries N should mean N retries after the initial attempt, so --retries 5 = 6 total attempts. The loops now use range(max_retries + 1).

  2. Added input validation - Added a non_negative_int argparse type validator that:

    • Rejects negative values (--retries -1 → error)
    • Rejects non-integers (--retries abc → clear error message)
    • Accepts zero (--retries 0 = 1 attempt, no retries)
  3. Updated tests - Added TestRetriesCliArgument class with 7 tests covering:

    • CLI argument parsing
    • Default value
    • Zero is valid
    • Negative rejected
    • Non-integer rejected
    • Transient error succeeds on retry
    • Custom retry count limits attempts

To apply the patch:

 # From your branch
 curl -L <patch-url> | git apply
 # or download and:
 git apply retry_fixes.patch

All 81 tests pass and linting (flake8, black) passes.

Let me know if you have any questions.

retry_fixes.patch

@mwtzzz
Copy link
Contributor Author

mwtzzz commented Dec 23, 2025

Thanks for your assistance @Iamrodos , I have applied your patch and updated the PR.

@Iamrodos
Copy link
Contributor

Nice, all looks great. Helpful enhancement.

@josegonzalez josegonzalez merged commit 2e999d0 into josegonzalez:master Dec 24, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants