Set headers for bulk data to gzip #151

NandaScott · 2026-01-24T19:33:23Z

This PR adds a more complete fix to #145 by setting the Accept-Encoding to gzip (and identity for thoroughness). Also adds tests.

claude · 2026-01-24T19:34:47Z

Code Review

I've reviewed PR #151 and have the following feedback:

✅ Positive Aspects

Good fix for issue Bugfix/download encoding mismatch #145: Setting the Accept-Encoding header explicitly to gzip, identity is the right approach to signal to CDNs/proxies that the client can handle gzip compression.
Consistent User-Agent usage: Correctly reuses ScrythonRequestHandler._user_agent to maintain consistency with the rest of the codebase.
Comprehensive test coverage: The new test test_download_sets_headers properly verifies that both headers are set correctly on the Request object.
No breaking changes: The fix is backwards compatible and doesn't change the public API.

🔍 Code Quality Observations

Header Setting Best Practice

Location: scrython/bulk_data/bulk_data_mixins.py:166-168

The current implementation correctly creates a Request object and sets headers. This is the proper way to add headers when using urllib.request.urlopen().

Potential Improvement: User-Agent Access Pattern

Location: scrython/bulk_data/bulk_data_mixins.py:167

The code accesses ScrythonRequestHandler._user_agent as a class variable. While this works, note that:

If a user calls ScrythonRequestHandler.set_user_agent() to customize the User-Agent, this code will correctly pick up the change since it accesses the class variable.
This is consistent with how scrython/base.py:184 accesses it in the main request handler.

✅ This is correct and consistent with the codebase.

Test Implementation

Location: tests/test_bulk_data.py:316-343

The test properly:

Mocks the urlopen call
Captures and inspects the Request object
Verifies both headers are set correctly

However, there's a minor inconsistency in header name casing:

Line 342: request.get_header("User-agent") (lowercase 'a')
Line 343: request.get_header("Accept-encoding") (lowercase 'e')

This works because Request.get_header() is case-insensitive, but for consistency with the actual header names being set ("User-Agent", "Accept-Encoding"), consider matching the casing:

assert request.get_header("User-Agent") == ScrythonRequestHandler._user_agent
assert request.get_header("Accept-Encoding") == "gzip, identity"

This is a minor style issue and doesn't affect functionality.

🔒 Security Considerations

No security concerns identified. The changes:

Use standard library functions (urllib.request)
Don't introduce user input handling
Don't modify authentication or sensitive data handling

⚡ Performance Considerations

Positive impact: By explicitly requesting gzip encoding, the fix ensures:

CDNs/proxies know the client supports compression
Reduced bandwidth usage when servers send compressed responses
Faster downloads for large bulk data files (100+ MB)

🧪 Test Coverage

Excellent: The PR adds appropriate test coverage for the new functionality. The test verifies the core behavior (headers are set) without being overly implementation-specific.

Suggestion: Consider adding an integration-style test that verifies the download works correctly with a small mock gzip response to ensure end-to-end functionality, though the existing tests may already cover this adequately.

📝 Documentation

The existing docstring in download() method (lines 116-163) already mentions that "Files may be compressed with gzip depending on CDN/proxy configuration" and that "The method automatically detects encoding from HTTP headers." This documentation remains accurate with the changes.

✅ Recommendations

Merge as-is: The PR is solid and fixes the reported issue correctly.
Optional improvement: Consider matching header name casing in the test for consistency (not blocking).
Consider adding: A comment in the code explaining why we set Accept-Encoding explicitly might help future maintainers:

# Explicitly request gzip encoding to signal CDN/proxy that we support compression.
# Without this header, some CDNs may not compress the response even if they could.
request.add_header("Accept-Encoding", "gzip, identity")

Summary

This is a well-implemented fix that addresses issue #145 correctly. The code quality is good, follows the project's conventions, and includes appropriate tests. I recommend approving and merging this PR.

Rating: ✅ Approved with optional suggestions for minor improvements.

Set encoding headers to gzip for bulk_data download

92bd735

NandaScott merged commit 0dc7f7a into develop Jan 24, 2026
8 checks passed

NandaScott deleted the feature/bulk-data-headers branch January 24, 2026 19:39

NandaScott mentioned this pull request Jan 24, 2026

Set proper accept headers for gzip in bulk export #148

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set headers for bulk data to gzip #151

Set headers for bulk data to gzip #151

Uh oh!

NandaScott commented Jan 24, 2026

Uh oh!

claude bot commented Jan 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Set headers for bulk data to gzip #151

Set headers for bulk data to gzip #151

Uh oh!

Conversation

NandaScott commented Jan 24, 2026

Uh oh!

claude bot commented Jan 24, 2026

Code Review

✅ Positive Aspects

🔍 Code Quality Observations

Header Setting Best Practice

Potential Improvement: User-Agent Access Pattern

Test Implementation

🔒 Security Considerations

⚡ Performance Considerations

🧪 Test Coverage

📝 Documentation

✅ Recommendations

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant