Move tasks to rate-limited queue near rate-limit threshold #2941
base: main
Conversation
Code Review
This pull request introduces a mechanism to handle API rate limiting by moving tasks to a rate-limited queue when the remaining API calls are below a certain threshold. This is achieved by adding a check_rate_limit_remaining method to the JobHandler and calling it in various handlers.
My review identifies a critical issue in the implementation: apply_async is used to re-queue tasks, which would lead to task duplication instead of a retry. I've suggested using celery_task.retry(), which is the correct approach because it stops the current task and re-queues it for later execution.
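A minimal sketch of the re-queueing pattern the review recommends. The threshold constant, the helper, and the "rate-limited" queue name are illustrative assumptions, not the project's actual identifiers:

```python
from typing import Optional

from celery import shared_task

RATE_LIMIT_THRESHOLD = 50  # assumed threshold, not the project's real constant


def get_rate_limit_remaining(event: dict) -> Optional[int]:
    """Hypothetical helper: how many API calls the forge still allows us."""
    return event.get("rate_limit_remaining")


@shared_task(bind=True, max_retries=None)
def run_handler_task(self, event: dict):
    remaining = get_rate_limit_remaining(event)
    if remaining is not None and remaining < RATE_LIMIT_THRESHOLD:
        # retry() stops the current task and re-queues the *same* message,
        # whereas task.apply_async() would enqueue a second, duplicate task
        # while the current one keeps running -- the issue flagged above.
        raise self.retry(countdown=60, queue="rate-limited")
    ...  # the handler's real work would go here
```

Calling retry() raises celery.exceptions.Retry, so execution does not continue past that point in the current invocation.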
Build succeeded. ✔️ pre-commit SUCCESS in 1m 42s
lbarcziova left a comment
Have you checked whether Celery provides something more native for this kind of use case of scheduling a task for later, e.g. could this be used instead of a separate queue? Or could you explain what the benefit of the separate queue would be?
The idea behind this code is not to delay the execution of a task. I just want to be sure that if any of these tasks are going to block, they will not stop the processing of our short-running tasks. I rely on the retry-after handling in ogr to do the waiting (if needed); this way the tasks will wait for exactly the amount of time they are supposed to, based on the feedback from the service. They will wait in a queue where they can rest for up to an hour (much longer than in our other queues), and they will not stop other tasks from running.
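A rough sketch of how such a dedicated queue with a much longer time limit could be wired up; the app name, queue names, routing pattern, and time limits are assumptions for illustration, not the project's actual configuration:

```python
from celery import Celery
from kombu import Queue

app = Celery("packit-worker")  # app name is illustrative

# Tasks routed to "rate-limited" may block while ogr's retry-after handling
# waits out the rate limit, without occupying workers for short-running tasks.
app.conf.task_queues = (
    Queue("short-running"),
    Queue("rate-limited"),
)
app.conf.task_routes = {
    # routing pattern is an assumption for illustration
    "task.rate_limited.*": {"queue": "rate-limited"},
}


@app.task(name="task.rate_limited.example", time_limit=3600, soft_time_limit=3300)
def rate_limited_example(event: dict):
    ...  # handler work that may wait on the service's retry-after response
```

The one-hour time_limit mirrors the "rest for up to an hour" behaviour described above; the other queues would keep their shorter limits.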
Thinking about it now, wouldn't this mess up the ordering? Let's say there is a task that would hit rate limits, so it's placed in the queue, and in the meantime another task starts when rate limits are no longer in place: it will be executed sooner than the first task. This can add up, leaving "unlucky" tasks unprocessed for a long time. That could be very confusing for users (but perhaps they are equally confused already by the current state 😅). Without any prioritization there could be a task that is unlucky enough to be put back in the queue every time, or multiple times in a row, and users may be tempted to retrigger it, so we should account for that as well.
I'm not sure. I expect that once a project is no longer rate-limited, tasks from the rate-limited queue will be executed as soon as possible, since they'll be among the oldest tasks waiting. The order can absolutely change somehow, but I think it is better to reshuffle the order than to let tasks die.
You are probably right, let's see how it behaves in practice.
👍
@majamassarini ok I see now, thanks for the explanation! Let's see how it performs like this and adjust accordingly.
00456df to 4f11e36 Compare
Build succeeded. ✔️ pre-commit SUCCESS in 1m 44s
4f11e36 to e861a79 Compare
✔️ pre-commit SUCCESS in 1m 48s
82da413 to f86c275 Compare
✔️ pre-commit SUCCESS in 1m 43s
f86c275 to a84ea00 Compare
Build succeeded. ✔️ pre-commit SUCCESS in 1m 50s
a84ea00 to 7522866 Compare
Build succeeded. ✔️ pre-commit SUCCESS in 1m 52s
7522866 to 95fd5bd Compare
/gemini review
Build succeeded. ✔️ pre-commit SUCCESS in 1m 50s
Code Review
The pull request introduces a rate-limiting mechanism for tasks by refactoring the run method in JobHandler to _run and adding a check_rate_limit_remaining method. This change correctly moves the core logic to _run and allows for pre-execution checks in run. The new constants for rate limiting are well-defined, and several handlers have been updated to reflect the run-to-_run change. Minor but important type-checking improvements were also made in packit_service/worker/handlers/koji.py for robustness. The test case in tests/unit/test_distgit.py was appropriately updated.
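A minimal sketch of the run()/_run() split described above, assuming hypothetical names for the threshold, the exception, and the helper that queries the remaining API calls:

```python
RATE_LIMIT_THRESHOLD = 50  # assumed value


class RateLimitRequeueException(Exception):
    """Signals that the task should be moved to the rate-limited queue."""


def get_remaining_api_calls(project) -> int:
    """Hypothetical helper wrapping the forge API's rate-limit endpoint."""
    raise NotImplementedError


class JobHandler:
    def __init__(self, project):
        self.project = project

    def check_rate_limit_remaining(self) -> None:
        remaining = get_remaining_api_calls(self.project)
        if remaining < RATE_LIMIT_THRESHOLD:
            raise RateLimitRequeueException(
                f"{remaining} requests remaining, below the threshold of {RATE_LIMIT_THRESHOLD}"
            )

    def run(self) -> dict:
        # pre-execution check added by this PR; concrete handlers implement _run()
        self.check_rate_limit_remaining()
        return self._run()

    def _run(self) -> dict:
        raise NotImplementedError
```

The Celery task wrapping the handler can then catch RateLimitRequeueException and re-queue the task onto the rate-limited queue, along the lines discussed earlier.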
95fd5bd to 0d647f0 Compare
✔️ pre-commit SUCCESS in 1m 46s
0d647f0 to 1311bd3 Compare
Build succeeded. ✔️ pre-commit SUCCESS in 1m 45s
/gemini review
Code Review
This pull request introduces a robust rate-limiting mechanism by moving tasks to a dedicated queue when API call limits are low, which is a great step towards improving service reliability. The implementation is solid, featuring a new rate-limited Celery queue and clear logic for re-queuing tasks. I particularly appreciate the significant improvement to the Celery auto-retry mechanism, which now only retries on transient errors, making the worker more resilient. The refactoring of run() to _run() across handlers is a clean way to integrate the new rate-limiting check, and the addition of unit tests for this new logic is commendable. I have a couple of minor suggestions to handle edge cases with timestamps.
8677cba to 5797d70 Compare
Build succeeded. ✔️ pre-commit SUCCESS in 1m 45s
# Only retry on specific exceptions that are likely to be transient:
# - PackitException: Packit-specific errors that might be retryable
# - OgrException: OGR library errors (API, network, authentication issues)
# - ConnectionError: Network connection problems
# - TimeoutError: Timeout issues
# - OSError: File system/OS errors that might be transient
# Note: RateLimitRequeueException is NOT in this list, so it won't trigger autoretry
autoretry_for = (
    PackitException,
    OgrException,
    ConnectionError,
    TimeoutError,
    OSError,
)
Why not base RateLimitRequeueException on BaseException and keep this as it was? Or are we certain that there can be no other exceptions than those listed here?
I tried with BaseException, but it was making the worker exit. Also, I don't think it makes sense to retry for every exception. I hope I didn't miss anything important, but we can always add it later.
Build succeeded. ✔️ pre-commit SUCCESS in 1m 48s
Build succeeded. ✔️ pre-commit SUCCESS in 1m 43s
328409a to 9779aff Compare
Build succeeded. ✔️ pre-commit SUCCESS in 1m 50s
Also fix mypy errors in koji.py like the following:
Argument 1 to "utcfromtimestamp" of "datetime" has incompatible type "Union[int, float, str]"; expected "float"
Co-authored-by: Nikola Forró <nforro@redhat.com>
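A small sketch of the kind of conversion that typically resolves this mypy error; the helper name is illustrative and not necessarily how koji.py does it:

```python
from datetime import datetime, timezone
from typing import Union


def to_datetime(timestamp: Union[int, float, str]) -> datetime:
    # mypy rejects passing Union[int, float, str] to utcfromtimestamp (it expects float),
    # so convert explicitly first; fromtimestamp(..., tz=timezone.utc) also avoids
    # the utcfromtimestamp deprecation in newer Python versions.
    return datetime.fromtimestamp(float(timestamp), tz=timezone.utc)
```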
Flower has an endpoint for /api/queues/length but it isn't exposed as a metric
Co-authored-by: Nikola Forró <nforro@redhat.com>
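For reference, a sketch of polling that Flower endpoint directly; the Flower URL is an assumption, and the response shape follows Flower's documented example:

```python
import requests

FLOWER_URL = "http://flower:5555"  # assumed deployment URL

# /api/queues/length reports the number of messages waiting in each active queue,
# which would include the rate-limited queue introduced by this PR.
response = requests.get(f"{FLOWER_URL}/api/queues/length", timeout=10)
response.raise_for_status()
for queue in response.json().get("active_queues", []):
    print(f"{queue['name']}: {queue['messages']} waiting")
```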
69fdfaf to 428298f Compare
Build succeeded. ✔️ pre-commit SUCCESS in 1m 46s
nforro left a comment
LGTM. To nitpick, the wording around rate limits seems a bit odd to me. For example, I would rather see something like "N requests remaining until rate limit, which is below/above the threshold of X" instead of "Rate limit remaining is low/high".
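A tiny sketch of a log message following that suggested wording (the function and variable names are illustrative):

```python
import logging

logger = logging.getLogger(__name__)


def log_rate_limit_status(remaining: int, threshold: int) -> None:
    comparison = "below" if remaining < threshold else "above"
    logger.info(
        "%d requests remaining until rate limit, which is %s the threshold of %d",
        remaining,
        comparison,
        threshold,
    )
```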
Related to packit/deployment#681