Optimize performance: parallel flite calls, pre-computed lookups #10
Merged
…uted lookups

- Parallelize flite subprocess calls using ThreadPoolExecutor (batch size 32, 8 workers)
- Cache NLTK POS tag results to avoid re-tokenizing the same sentence
- Pre-compute double_word_lookup dict with pre-split keys for O(1) first-word lookup
- Pre-compute flite_path at module level instead of per-call
- Extract _call_flite helper for reuse in batch and single-call paths

~2.9x speedup on 100-line test file (9.2ms/line -> 3.2ms/line). Output is identical to the original.

Co-Authored-By: tom mottes <tom.mottes@gmail.com>
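The batching described in this commit can be illustrated with a minimal sketch. It is not the PR's code: `call_flite_stub`, `batched`, and `convert_all` are hypothetical names, and the stub stands in for the real flite subprocess call so the example runs without flite installed. Only the batch size (32) and worker count (8) come from the commit message.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List

BATCH_SIZE = 32   # value stated in the commit message
MAX_WORKERS = 8   # value stated in the commit message


def call_flite_stub(text: str) -> str:
    """Hypothetical stand-in for the real flite subprocess call."""
    return f"<ipa:{text}>"


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def convert_all(
    texts: Iterable[str],
    convert: Callable[[str], str] = call_flite_stub,
) -> List[str]:
    """Convert texts batch by batch, running each batch concurrently."""
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for batch in batched(texts, BATCH_SIZE):
            # Executor.map yields results in input order, even when
            # individual calls finish out of order.
            results.extend(pool.map(convert, batch))
    return results
```

Threads are a reasonable fit here because each worker would mostly wait on an external process, so the interpreter lock is not the bottleneck. Relying on `Executor.map` for ordering keeps the output aligned with the input without extra bookkeeping.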
- Add parallel flite calls to process_html_file via ThreadPoolExecutor batching
- Refactor _process_single_paragraph into _prepare_paragraph_texts + _assemble_paragraph
- Remove POS tag cache (only helps with duplicate sentences, which are rare in practice)
- Restore original is_verb_in_sentence without caching

Real-world profiling on twig_full.html (60s):
Baseline: 3,530 paragraphs processed
Optimized: 18,848 paragraphs processed (5.3x throughput)

Output verified identical via diff.

Co-Authored-By: tom mottes <tom.mottes@gmail.com>
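The prepare/assemble split in this commit is what allows texts from many paragraphs to go through one batch. A rough sketch of the idea follows, under stated assumptions: the `Part` representation, the function names without leading underscores, and `process_paragraphs` are all hypothetical, and `convert_batch` is any callable that returns one result per input text in the same order.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical paragraph model: ("tag", "<b>") or ("text", "hello").
Part = Tuple[str, str]


def prepare_paragraph_texts(parts: List[Part]) -> List[Tuple[int, str]]:
    """Collect (part_index, text) for every non-blank text part."""
    return [
        (i, value)
        for i, (kind, value) in enumerate(parts)
        if kind == "text" and value.strip()
    ]


def assemble_paragraph(parts: List[Part], result_map: Dict[int, str]) -> str:
    """Rebuild the paragraph, substituting converted text where available."""
    return "".join(
        result_map.get(i, value) for i, (_kind, value) in enumerate(parts)
    )


def process_paragraphs(
    paragraphs: List[List[Part]],
    convert_batch: Callable[[List[str]], List[str]],
) -> List[str]:
    """Convert all paragraphs with a single cross-paragraph batch call."""
    prepared = [prepare_paragraph_texts(p) for p in paragraphs]
    flat = [text for needed in prepared for _idx, text in needed]
    converted = iter(convert_batch(flat))  # same order as `flat`
    output = []
    for parts, needed in zip(paragraphs, prepared):
        # Consume results in the order the texts were collected.
        result_map = {idx: next(converted) for idx, _text in needed}
        output.append(assemble_paragraph(parts, result_map))
    return output
```

Keying `result_map` by part index means tags and blank text nodes pass through untouched, which is the kind of mixed tag/text case the review checklist asks reviewers to check.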
Optimize performance: parallel flite calls, pre-computed lookups
Summary
Profiling revealed that ~92–96% of processing time is spent spawning one `flite` subprocess per line/paragraph. The remaining hot paths were redundant dict construction and string splitting in `add_double_word_reductions` (~4% of runtime, 2.3M `str.split` calls on real data).

Changes
- `print_ipa` and `process_html_file`: Both text-file and HTML processing paths now batch flite subprocess calls (batch size 32) and run them concurrently via `ThreadPoolExecutor` (8 workers). Post-processing (reductions, flap t/d) still runs sequentially on the main thread.
- `_double_word_lookup`: O(1) first-word lookup instead of iterating all dict entries and splitting keys on every call. Reduced `add_double_word_reductions` from 2.44s to 0.64s per 60s of real-world processing (handling 5× more calls in the same time).
- `_flite_path`: Computed once at module load instead of per-call.
- Refactored `_process_single_paragraph` into `_prepare_paragraph_texts` (collects texts needing flite) and `_assemble_paragraph` (reconstructs HTML with IPA results), enabling cross-paragraph batching of flite calls.

Performance (real-world: `twig_full.html`, 72K paragraphs, 60s runs)

(Results table not preserved; its surviving row labels are `add_double_word_reductions` and `str.split` calls.)

Output verified identical to baseline via `diff` (50-paragraph HTML subset and 100-line synthetic text file). All 74 existing tests pass.

What was tried and removed
- `_pos_tag_cache` dict to avoid re-tokenizing sentences in `is_verb_in_sentence`. Real-world profiling showed this doesn't help because duplicate sentences are rare in actual text. Removed to avoid unnecessary memory usage.

Updates since last revision
- `process_html_file` (the main real-world code path) is now parallelized; previously only `print_ipa` was parallelized.
- Removed the `lru_cache` import and the dead `result_idx` variable.

Review & Testing Checklist for Human
- `twig_full.html`: Automated comparison used a 50-paragraph subset. Run the full file through both master and this branch, then diff the outputs.
- `flush_batch` ordering logic in `print_ipa`: The batching introduces complexity with `order_counter`, `pending_indices`, and `newline_positions`. Verify that `all_outputs.sort()` correctly interleaves newlines and IPA output in edge cases (many consecutive newlines, `cached_text` flush at end of file).
- `_prepare_paragraph_texts` + `_assemble_paragraph` refactor: The `result_map` logic for reassembling parts with flite results is new. Check edge cases with mixed HTML tags and text nodes.
- … `--resume`.
- `_process_single_paragraph` (used when no output file specified) still calls `run_flite` sequentially. Only file output is parallelized.

Suggested test plan:
Notes