
Conversation

Contributor

@me21 me21 commented Oct 1, 2025

Problem: Downloading a ZIP file containing 50,000 files stalled while zipfile wrote the central directory; the Chrome spinner kept spinning and Python sat at 100% CPU.
Root cause: bytes objects are immutable, so `_buffer += b` re-copied the entire buffer on every write, giving O(n^2) total copying across the many small writes made at close time. No data was yielded until the very end.
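A minimal illustration of the root cause (not ZipFly code): appending to a bytes object reallocates and copies the whole buffer each time, while a mutable bytearray appends in place.

```python
# bytes += copies the entire existing buffer on every append (O(n^2) overall),
# whereas bytearray += extends in place with amortized O(1) cost per chunk.
buf_bytes = b""
buf_array = bytearray()
for _ in range(1000):
    buf_bytes += b"x" * 16   # full copy of buf_bytes each iteration
    buf_array += b"x" * 16   # in-place extend, no full copy
assert bytes(buf_array) == buf_bytes  # same content, very different cost at scale
```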
Changes
Reworked ZipflyStream to be queue-backed with chunked staging:

  • Use a mutable bytearray (_staging) to append writes in-place and cut fixed-size chunks to a bounded queue.Queue (default 64 chunks).
  • Convert chunks to immutable bytes before enqueue for thread-safety and to decouple memory from _staging.
  • Added flush() to push remaining bytes and close() to flush and enqueue a None sentinel.
  • Track total written size without concatenating buffers.
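The staging design above can be sketched roughly as follows. This is an illustrative class (the name `ChunkedQueueStream` and exact method signatures are assumptions, not the actual ZipFly implementation):

```python
import io
import queue

class ChunkedQueueStream(io.RawIOBase):
    """Sketch: stage writes in a mutable bytearray, cut fixed-size chunks,
    and enqueue them as immutable bytes on a bounded queue."""

    def __init__(self, chunksize=0x8000, maxchunks=64):
        self._staging = bytearray()                 # mutable, in-place appends
        self._queue = queue.Queue(maxsize=maxchunks)  # bounded: applies backpressure
        self._chunksize = chunksize
        self._size = 0

    def writable(self):
        return True

    def write(self, b):
        self._staging += b                          # no O(n^2) concatenation
        self._size += len(b)
        while len(self._staging) >= self._chunksize:
            # bytes(...) makes an immutable copy, decoupling the chunk's
            # memory from _staging before it crosses the thread boundary
            self._queue.put(bytes(self._staging[:self._chunksize]))
            del self._staging[:self._chunksize]
        return len(b)

    def flush(self):
        if self._staging:
            self._queue.put(bytes(self._staging))
            self._staging.clear()

    def close(self):
        self.flush()
        self._queue.put(None)                       # sentinel: end of stream
        super().close()

    def size(self):
        # total written size, tracked without concatenating buffers
        return self._size
```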

Updated ZipFly.generator() to stream concurrently:

  • Start a background writer thread that builds the ZIP (files + central directory) and writes into ZipflyStream.
  • Drain the queue in the generator and yield chunks until the sentinel is seen; then join() the writer.
  • Set _buffer_size from the stream after completion.
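The concurrent drain pattern described above looks roughly like this. `generate` and `producer` are hypothetical names for illustration, not the ZipFly API:

```python
import queue
import threading

def generate(producer, maxchunks=64):
    """Run producer(put) in a background writer thread and yield chunks
    from a bounded queue until a None sentinel is seen, then join()."""
    q = queue.Queue(maxsize=maxchunks)

    def run():
        try:
            producer(q.put)     # producer pushes bytes chunks via the callback
        finally:
            q.put(None)         # sentinel, enqueued even if the producer raises

    t = threading.Thread(target=run, daemon=True)
    t.start()
    while True:
        chunk = q.get()
        if chunk is None:
            break
        yield chunk             # data flows to the client as it is produced
    t.join()                    # writer has finished once the sentinel arrives
```

Because the queue is bounded, the writer thread blocks when the consumer falls behind, keeping memory use proportional to `maxchunks * chunksize` rather than the full archive size.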

Small import additions: threading, queue.
ZipflyStream.__init__ now takes chunksize (internal use only).

me21 added 2 commits October 1, 2025 14:59
Streaming ZIP: avoid hang during central directory write and reduce CPU
@sandes sandes self-requested a review October 1, 2025 12:49
@sandes sandes merged commit c44bfed into sandes:master Oct 1, 2025
1 check passed

2 participants