Other Enhancements: Large File Chunking, Parallel Processing

I started a new thread because this is a sizeable change, enough that I put it in its own feature branch.

### What and Why
Most of us have a few cores to spare, and FFmpeg is not taking advantage of them. A single monolithic file can time out during transcription, cannot be restarted partway through, and does not make full use of the machine's resources.
To that end, monkeyplug now has large-file handling through a new `AudioChunker` class and a refactored utilities module. Previously, files larger than 150MB could cause memory exhaustion or API timeouts during transcription.

The new `--use-chunking` flag enables automatic splitting of large files at natural silence points, processes each chunk independently with transcript caching for fast reruns, then reassembles them while preserving chapter metadata and other tags. This enables monkeyplug to process large audiobooks or podcasts with the added benefit of optional parallel encoding (`--parallel-encoding`) that can significantly reduce processing time on multi-core systems. 
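
To give a feel for the splitting step, here is a minimal sketch of choosing split points at silences. It is not the code in the branch: the function names `pick_split_points` and `chunk_ranges` are hypothetical, and it assumes the silence intervals have already been detected (for example with FFmpeg's `silencedetect` filter) and are passed in as `(start, end)` pairs in seconds.

```python
from bisect import bisect_left


def pick_split_points(silences, total_duration, target_chunk):
    """Pick one split time near each multiple of target_chunk.

    silences: list of (start, end) silence intervals in seconds (assumed input).
    Each split lands on the midpoint of the silence closest to the boundary.
    """
    if target_chunk <= 0:
        raise ValueError("target_chunk must be positive")

    midpoints = sorted((start + end) / 2 for start, end in silences)
    splits = []
    boundary = target_chunk
    while boundary < total_duration and midpoints:
        i = bisect_left(midpoints, boundary)
        # Only the silences just before and just after the boundary can be closest.
        candidates = midpoints[max(i - 1, 0): i + 1]
        best = min(candidates, key=lambda m: abs(m - boundary))
        # Skip a silence that was already used for an earlier boundary.
        if not splits or best > splits[-1]:
            splits.append(best)
        boundary += target_chunk
    return splits


def chunk_ranges(splits, total_duration):
    """Turn split times into consecutive (start, end) chunk ranges."""
    edges = [0.0, *splits, total_duration]
    return list(zip(edges, edges[1:]))


silences = [(9.0, 11.0), (28.0, 30.0), (61.0, 63.0)]
splits = pick_split_points(silences, total_duration=90.0, target_chunk=30.0)
ranges = chunk_ranges(splits, total_duration=90.0)
```

With no silences detected, the sketch falls back to a single chunk covering the whole file. Each range could then be transcribed and encoded independently, which is what makes caching and parallel encoding possible.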

To keep things tidy, I split this out into `audio_chunker.py`, so that if you are not interested in this feature I can more easily backport to my fork. While doing this work I noticed some duplication and felt that it made sense to move some common functionality into a `utilities.py`.

This feature branch has the work from my previous branch plus all of this new work, so linking a diff doesn't make a lot of sense.
You can find this work here: https://github.com/stratus-ss/monkeyplug/tree/feature/file-chunking

**NOTE**: The tests were done with AI and a human in the loop.
