update retry logic and logging #474

mwtzzz · 2025-12-22T22:25:34Z

What

configureable retry count
additional logging

Why

pass retry count as a command line arg; default 5
show details when api requests fail

Testing before merge

compiles cleanly

Validation after merge

compile and test

Issue addressed by this PR

https://github.com/stellar/ops/issues/2039

### What 1. configureable retry count 2. additional logging ### Why 1. pass retry count as a command line arg; default 5 2. show details when api requests fail ### Testing before merge compiles cleanly ### Validation after merge compile and test ### Issue addressed by this PR stellar/ops#2039

Iamrodos · 2025-12-22T23:06:03Z

Good to have the number of retries configurable.

I noticed an issue with how the --retries flag is wired up - it looks like the value doesn't actually propagate to the retry logic. Here's a quick test:

  from github_backup.github_backup import MAX_RETRIES as GB_MAX_RETRIES
  from github_backup import max_retries

  print(f'Before: github_backup.MAX_RETRIES = {GB_MAX_RETRIES}')

  # Simulate what cli.py does
  max_retries.MAX_RETRIES = 20

  from github_backup.github_backup import MAX_RETRIES as GB_MAX_RETRIES_AFTER
  print(f'After:  github_backup.MAX_RETRIES = {GB_MAX_RETRIES_AFTER}')

Output:
Before: github_backup.MAX_RETRIES = 5
After: github_backup.MAX_RETRIES = 5

The issue is that MAX_RETRIES = max_retries.MAX_RETRIES in github_backup.py copies the value at import time, so later updates to the module don't affect it.

Adding a new file just to hold a single mutable global is also a bit heavyweight. Since --retries already has default=5 in argparse, you could just use args.max_retries directly? Given retrieve_data already receives args, and make_request_with_retry could be updated to accept it too. Then no new module needed and its more consistent with the existing way options flow.

A couple of other things:

There's a duplicate log line around line 748-749 - the first one can probably be removed since the second includes the URL
The new --retries flag should be documented in README.rst with the other CLI options

Iamrodos · 2025-12-22T23:26:34Z

Curious as to what https://github.com/stellar/ops/issues/2039 details, as it can't be accessed. Could you summarise it?

mwtzzz · 2025-12-22T23:48:52Z

argparse, you could just use args.max_retries directly**? Given retrieve_data already receives args, and make_request_with_retry could be updated to accept it too. Then no new module needed and its more consistent with the existing way options flow.

Possibly, I'll have to try this out.

The new --retries flag should be documented in README.rst with the other CLI options

Ack

mwtzzz · 2025-12-22T23:51:24Z

Curious as to what stellar/ops#2039 details, as it can't be accessed. Could you summarise it?

We ran into an issue where the backup wasn't working but it wasn't clear from the logs what the problem was. We did some quick hacking of the script to see the problem was API calls failing. It'd be nice to have this level of logging by default.

For max_retries, it'd be nice for the operator to be able to set this to whatever value they want.

mwtzzz · 2025-12-23T00:15:54Z

I noticed an issue with how the --retries flag is wired up - it looks like the value doesn't actually propagate to the retry logic.

I'm not seeing what you're seeing. Perhaps you could make a new docker image with my branch and test?

It seems to me the simplest way is to use a file with a mutable global, but if that's not acceptable, I can create a new function in github_backup.py to receive args and update MAX_RETRIES in this file from the argument; -or- I can update make_request_with_retry to get passed the retries argument .... Either of these approaches would also work.

Iamrodos · 2025-12-23T00:55:38Z

I'm not seeing what you're seeing. Perhaps you could make a new docker image with my branch and test?

The issue I saw was fixed in 1f2ec01 - removing MAX_RETRIES = max_retries.MAX_RETRIES addresses the import-time copy issue. I was looking at the original commit.

It seems to me the simplest way is to use a file with a mutable global, but if that's not acceptable, I can create a new function in github_backup.py to receive args and update MAX_RETRIES in this file from the argument; -or- I can update make_request_with_retry to get passed the retries argument .... Either of these approaches would also work.

Passing the retry count as a parameter rather than using a separate module feels a lot cleaner to me. I think @josegonzalez is the only one who gets to pull the 'acceptable' card. :)

Since retrieve_data already has args, it's a small change:

  def make_request_with_retry(request, auth, max_retries=5):
      ...
      for attempt in range(max_retries):

And in retrieve_data:

  http_response = make_request_with_retry(request, auth, args.max_retries)

The other loop in fetch_all can use args.max_retries directly since it's nested inside retrieve_data and has access via closure.

The global MAX_RETRIES could be removed. The existing tests would need updating for the new CLI option too, but that's easy enough. I can redo the tests if you want.

This follows the existing pattern in the codebase and avoids the extra max_retries.py module.

josegonzalez · 2025-12-23T05:31:24Z

Whatever either:

Is simple to read/understand
Follows patterns in the codebase
You want to implement

Is fine with me, as long as it doesn't break existing functionality.

mwtzzz · 2025-12-23T16:34:07Z

Passing the retry count as a parameter rather than using a separate module feels a lot cleaner to me. I think @josegonzalez is the only one who gets to pull the 'acceptable' card. :)

Since retrieve_ can redo the tests if you want.

This follows the existing pattern in the codebase and avoids the extra max_retries.py module.

Sounds good, I'll update the PR soon

mwtzzz · 2025-12-23T16:57:08Z

@Iamrodos I've made the updates. PTAL please.
And yes if you could redo the tests that'd be great, thanks.

Iamrodos · 2025-12-23T21:54:00Z

Thanks for the updates @mwtzzz! Changes look good.

I've put together a patch to update the tests, plus fix a off-by-one-error (mine) that you can apply:

What's in the patch:

Fixed retry semantics - There was an off-by-one error in the original code (my fault, not yours!). --retries N should mean N retries after the initial attempt, so --retries 5 = 6 total attempts. The loops now use range(max_retries + 1).
Added input validation - Added a non_negative_int argparse type validator that:
- Rejects negative values (--retries -1 → error)
- Rejects non-integers (--retries abc → clear error message)
- Accepts zero (--retries 0 = 1 attempt, no retries)
Updated tests - Added TestRetriesCliArgument class with 7 tests covering:
- CLI argument parsing
- Default value
- Zero is valid
- Negative rejected
- Non-integer rejected
- Transient error succeeds on retry
- Custom retry count limits attempts

To apply the patch:

 # From your branch
 curl -L <patch-url> | git apply
 # or download and:
 git apply retry_fixes.patch

All 81 tests pass and linting (flake8, black) passes.

Let me know if you have any questions.

retry_fixes.patch

mwtzzz · 2025-12-23T22:08:51Z

Thanks for your assistance @Iamrodos , I have applied your patch and updated the PR.

Iamrodos · 2025-12-23T23:14:22Z

Nice, all looks great. Helpful enhancement.

mwtzzz added 2 commits December 22, 2025 14:23

max_retries 5

8b1b632

readme, simplify the logic a bit

1f2ec01

mwtzzz added 2 commits December 23, 2025 08:53

don't use a global variable, pass the args instead

f9827da

readme

8b21e25

rm max_retries.py

5ab3852

updates to the tests, and fixes to the retry

44b0003

josegonzalez merged commit 2e999d0 into josegonzalez:master Dec 24, 2025
10 checks passed

update retry logic and logging #474

update retry logic and logging #474

Conversation

mwtzzz commented Dec 22, 2025

What

Why

Testing before merge

Validation after merge

Issue addressed by this PR

Uh oh!

Iamrodos commented Dec 22, 2025

Uh oh!

Iamrodos commented Dec 22, 2025

Uh oh!

mwtzzz commented Dec 22, 2025

Uh oh!

mwtzzz commented Dec 22, 2025

Uh oh!

mwtzzz commented Dec 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Iamrodos commented Dec 23, 2025

Uh oh!

josegonzalez commented Dec 23, 2025

Uh oh!

mwtzzz commented Dec 23, 2025

Uh oh!

mwtzzz commented Dec 23, 2025

Uh oh!

Iamrodos commented Dec 23, 2025

Uh oh!

mwtzzz commented Dec 23, 2025

Uh oh!

Iamrodos commented Dec 23, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mwtzzz commented Dec 23, 2025 •

edited

Loading