Fix rate limiting to evenly space API calls (fixes #28) #29

Open
vipinkataria2209 wants to merge 2 commits into USEPA:main from vipinkataria2209:main

@vipinkataria2209

Summary

This PR replaces the decorator-based rate limiter, which allowed bursts of calls followed by long blocking waits, with a token bucket implementation that evenly spaces API calls over time, improving the predictability and reliability of multi-year data requests.

Fixes #28

Problem

The previous rate limiting implementation, which used the @limits and @sleep_and_retry decorators from the ratelimit package, caused inefficient behavior:

  • Allowed bursts of 10 calls followed by long waits (up to the remainder of the 60-second window)
  • For 74 API calls, this resulted in unpredictable timing and potential server overload
  • Performance was significantly worse than RAQSAPI's httr2::req_throttle implementation

Solution

Implemented a TokenBucketRateLimiter class that:

  • Evenly spaces calls: Ensures at least 6 seconds between call starts (10 calls per 60 seconds)
  • Accounts for API call duration: If an API call takes 10 seconds, the next call doesn't wait extra
  • Thread-safe: Uses locks to ensure safe concurrent access
  • Efficient: Only waits when necessary to maintain rate limits
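The class itself is not reproduced in this description. As a rough illustration of the behavior listed above, here is a minimal sketch, assuming a bucket capacity of one token, which reduces the token bucket to a minimum interval between call starts. The `clock` and `sleep` parameters are hypothetical additions (not necessarily present in the PR's TokenBucketRateLimiter) that make the timing testable without real waits.

```python
import threading
import time


class TokenBucketRateLimiter:
    """Sketch: space call starts at least `period / calls` seconds apart.

    Assumes a bucket capacity of 1, i.e. no bursts are allowed; with the
    defaults this means one call start every 6 seconds.
    """

    def __init__(self, calls=10, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = period / calls       # seconds between call starts
        self._clock = clock                  # hypothetical hook for tests
        self._sleep = sleep                  # hypothetical hook for tests
        self._lock = threading.Lock()
        self._next_allowed = None            # earliest start time for the next call

    def acquire(self):
        """Block until the next call may start; return the seconds waited."""
        with self._lock:
            now = self._clock()
            if self._next_allowed is None:   # first call starts immediately
                self._next_allowed = now
            wait = max(0.0, self._next_allowed - now)
            if wait > 0:
                # Sleeping while holding the lock makes concurrent callers
                # queue up, so each thread gets its own evenly spaced slot.
                self._sleep(wait)
            # The following call may start one interval after this call's start.
            # If this call was late (a slow previous request), no extra wait accrues.
            self._next_allowed = max(now, self._next_allowed) + self.interval
            return wait
```

With the defaults, the first `acquire()` returns immediately and each later one blocks until 6 seconds after the previous call started. Holding the lock while sleeping is a deliberate simplification: waiting threads are served one at a time, in exchange for one thread sleeping inside the critical section.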

Key Features

  1. Smart spacing:

    • Fast API call (1s) → waits 5s to reach 6s minimum ✅
    • Slow API call (10s) → no extra wait needed ✅
    • Medium API call (5s) → waits 1s to reach 6s minimum ✅
  2. Predictable timing: Makes completion time easier to estimate

  3. Server protection: Prevents bursts that can overwhelm the API server

  4. Feature parity: Matches RAQSAPI's httr2::req_throttle behavior
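The spacing rule in item 1 reduces to `wait = max(0, 6 s - call duration)`, assuming the next call is requested as soon as the previous one returns. A quick check of the three cases (the helper name `extra_wait` is hypothetical, used only for illustration):

```python
MIN_INTERVAL = 60.0 / 10  # 10 calls per 60 s -> 6 s between call starts


def extra_wait(call_duration: float, min_interval: float = MIN_INTERVAL) -> float:
    """Seconds to pause after a call returns so that the next call starts
    at least `min_interval` seconds after the previous call started."""
    return max(0.0, min_interval - call_duration)


for duration in (1.0, 10.0, 5.0):  # fast, slow, medium, as in the list above
    print(f"{duration:.0f} s call -> wait {extra_wait(duration):.0f} s")
```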

Changes

  • ✅ Added TokenBucketRateLimiter class to helperfunctions.py
  • ✅ Removed @sleep_and_retry and @limits decorators from __aqs() method
  • ✅ Integrated rate limiter at the start of each API call
  • ✅ Added unit tests in tests/test_rate_limiter.py
  • ✅ Updated existing tests in tests/test_helperfunctions.py
  • ✅ Cleaned up deprecated rate limiting code and comments
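The updated __aqs() code is not shown here. A minimal sketch of the integration pattern, assuming one limiter instance is shared by every request in the process: `aqs_request` is a hypothetical stand-in for the private __aqs() method, and the limiter and HTTP function are passed in only so the sketch runs without network access. The decorator arguments in the docstring are inferred from the stated 10-calls-per-minute limit.

```python
from typing import Any, Callable, Protocol


class Limiter(Protocol):
    """Anything with an acquire() method, e.g. a TokenBucketRateLimiter."""

    def acquire(self) -> Any: ...


def aqs_request(url: str, params: dict, limiter: Limiter,
                send: Callable[..., Any]) -> Any:
    """Hypothetical stand-in for pyaqsapi's __aqs().

    Before this PR the method was presumably wrapped as:
        @sleep_and_retry
        @limits(calls=10, period=60)
    Now the wait happens explicitly, before the request is sent.
    """
    limiter.acquire()                 # block until this call's start slot
    return send(url, params=params)  # e.g. requests.get in real code
```

Calling `acquire()` at the top of the method, rather than relying on decorators that raise and retry, keeps the waiting logic in one visible place.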

Testing

  • ✅ Created comprehensive unit tests for rate limiter functionality
  • ✅ Verified rate limiter correctly accounts for variable API call durations
  • ✅ Tested thread safety with concurrent calls
  • ✅ Confirmed integration with existing API call flow
  • ✅ All tests pass successfully

Impact

For 74 API calls (74 years of data):

  • Old approach: Unpredictable bursts + long waits (~7+ minutes with inefficiencies)
  • New approach: Steady, predictable spacing (~7.3 minutes, but more reliable)
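A back-of-the-envelope check of these figures, assuming every call returns well within 6 seconds and ignoring the duration of the final call itself:

```python
import math

CALLS = 74         # one call per year of data
PER_WINDOW = 10    # limit: 10 calls ...
WINDOW_S = 60.0    # ... per 60 seconds

# New limiter: calls start at 0, 6, 12, ... s, so the last one starts
# after 73 intervals of 6 s.
interval = WINDOW_S / PER_WINDOW
even_last_start = (CALLS - 1) * interval

# Old ratelimit decorators: bursts of 10 calls, then sleep until the
# 60 s window resets; 74 calls need 8 windows, the last opening after 7.
windows = math.ceil(CALLS / PER_WINDOW)
burst_last_start = (windows - 1) * WINDOW_S

print(f"even spacing: last call starts at {even_last_start / 60:.1f} min")
print(f"bursts:       last call starts at {burst_last_start / 60:.1f} min")
```

This works out to 7.3 minutes for even spacing and 7.0 minutes for bursts, so under these assumptions the gain is in predictability rather than raw speed.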

Key improvements:

  • ✅ Prevents API server overload from bursts
  • ✅ Reduces risk of rate limit errors
  • ✅ More predictable and consistent behavior
  • ✅ Better user experience with easier completion time estimation
  • ✅ Aligns with RAQSAPI behavior for consistency

Fix rate limiting to evenly space API calls (fixes USEPA#28)

Replace blocking rate limiter with token bucket implementation that
evenly spaces API calls over time instead of allowing bursts followed
by long waits. This significantly improves performance for multi-year
data requests by reducing total wait time.

- Implement TokenBucketRateLimiter class to space calls evenly
- Remove @sleep_and_retry and @limits decorators from __aqs method
- Add rate limiter acquire() call at start of each API request
- Maintains 10 calls per minute limit while improving efficiency

This change makes pyaqsapi's rate limiting behavior similar to
RAQSAPI's httr2::req_throttle, addressing the performance gap
reported in issue USEPA#28.
@mccroweyclinton-EPA
Collaborator

mccroweyclinton-EPA commented Jan 13, 2026

@vipinkataria2209 thank you for your submission. Unfortunately, any rate limiting that is implemented needs to satisfy the rules as stated on the EPA AQS DataMart API's website. I realized this while considering how I should re-implement this rate limit, and I am going to have to change RAQSAPI's rate limiting functionality to match it as well. I am in talks with the DataMart team to see if we can get something faster than what they currently require. There may be some changes to the API in the near future that will allow us to implement something similar to what you are proposing, but for now we have to follow the guidelines as stated on the DataMart API page.
