
Optimize performance: parallel flite calls, pre-computed lookups #10

Merged
Tomotz merged 4 commits into master from devin/1771134115-performance-optimizations
Feb 15, 2026

Conversation


@Tomotz Tomotz commented Feb 15, 2026

Optimize performance: parallel flite calls, pre-computed lookups

Summary

Profiling revealed that ~92–96% of processing time is spent spawning one flite subprocess per line/paragraph. The remaining hot paths were redundant dict construction + string splitting in add_double_word_reductions (~4% of runtime, 2.3M str.split calls on real data).

Changes

  1. Parallel flite calls in both print_ipa and process_html_file — Both text-file and HTML processing paths now batch flite subprocess calls (batch size 32) and run them concurrently via ThreadPoolExecutor (8 workers). Post-processing (reductions, flap t/d) still runs sequentially on the main thread.
  2. Pre-computed _double_word_lookup — O(1) first-word lookup instead of iterating all dict entries and splitting keys on every call. Reduced add_double_word_reductions from 2.44s to 0.64s per 60s of real-world processing (handling 5× more calls in the same time).
  3. Pre-computed _flite_path — Computed once at module load instead of per-call.
  4. Refactored HTML paragraph processing — Split _process_single_paragraph into _prepare_paragraph_texts (collects texts needing flite) and _assemble_paragraph (reconstructs HTML with IPA results), enabling cross-paragraph batching of flite calls.
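The batching approach in change 1 can be sketched as follows. This is a minimal illustration, not the project's actual code: `run_flite`, `run_flite_batch`, the flite flags, and the constants are assumptions based on the description above (batch size 32, 8 workers). The `worker` parameter is added here purely so the sketch can be exercised without a flite binary.

```python
# Minimal sketch of batched, concurrent flite invocation.
# All names and flite flags here are illustrative assumptions.
import subprocess
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 32   # texts accumulated before a flush (per the PR description)
MAX_WORKERS = 8   # concurrent flite subprocesses

def run_flite(text: str) -> str:
    """Spawn one flite subprocess and return its phone output.

    The exact command-line flags are an assumption for illustration.
    """
    result = subprocess.run(
        ["flite", "-t", text, "-ps", "-o", "none"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def run_flite_batch(texts, worker=run_flite):
    """Run a batch of flite calls concurrently, preserving input order.

    executor.map yields results in input order, so the caller can zip
    results back onto the lines/paragraphs that produced them.
    """
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(worker, texts))
```

Because `ThreadPoolExecutor.map` returns results in input order, the sequential post-processing (reductions, flap t/d) can consume the batch as if the calls had been made serially.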

Performance (real-world: twig_full.html, 72K paragraphs, 60s runs)

| Metric | Baseline (master) | Optimized | Improvement |
| --- | --- | --- | --- |
| Paragraphs processed in 60s | 3,530 | 18,848 | 5.3× throughput |
| `add_double_word_reductions` | 2.44s / 4,507 calls | 2.28s / 23,020 calls | 5.1× more calls in same time |
| `str.split` calls | 2,302,295 | 44,907 | 98% fewer |

Output verified identical to baseline via diff (50-paragraph HTML subset and 100-line synthetic text file). All 74 existing tests pass.
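The `_double_word_lookup` pre-computation from change 2 can be sketched as below. This is a hedged illustration, not the repository's code: the `DOUBLE_WORD_REDUCTIONS` table and its IPA values are made-up examples, and `lookup_reduction` is a hypothetical helper showing the O(1) access pattern that replaces the per-call scan-and-split.

```python
# Illustrative sketch: the reduction table and values below are invented.
from typing import Dict, Optional

DOUBLE_WORD_REDUCTIONS: Dict[str, str] = {
    "going to": "gənə",
    "want to": "wɑnə",
    "got to": "gɑɾə",
}

# Split each two-word key exactly once, at module load:
# first word -> {second word -> reduced pronunciation}
_double_word_lookup: Dict[str, Dict[str, str]] = {}
for phrase, reduced in DOUBLE_WORD_REDUCTIONS.items():
    first, second = phrase.split()
    _double_word_lookup.setdefault(first, {})[second] = reduced

def lookup_reduction(first: str, second: str) -> Optional[str]:
    """O(1) lookup: two dict hits instead of iterating every key and
    calling str.split on each one per call."""
    return _double_word_lookup.get(first, {}).get(second)
```

This is where the 98% drop in `str.split` calls comes from: the splits happen once at import time rather than on every invocation.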

What was tried and removed

  • POS tag caching — Initially added a _pos_tag_cache dict to avoid re-tokenizing sentences in is_verb_in_sentence. Real-world profiling showed this doesn't help because duplicate sentences are rare in actual text. Removed to avoid unnecessary memory usage.

Updates since last revision

  • Parallelized process_html_file (the main real-world code path) — previously only print_ipa was parallelized.
  • Removed POS tag cache after real-world profiling showed no benefit.
  • Removed unused lru_cache import and dead result_idx variable.

Review & Testing Checklist for Human

  • Verify output correctness on full twig_full.html — Automated comparison used a 50-paragraph subset. Run full file through both master and this branch, diff the outputs.
  • Review flush_batch ordering logic in print_ipa — The batching introduces complexity with order_counter, pending_indices, and newline_positions. Verify all_outputs.sort() correctly interleaves newlines and IPA output in edge cases (many consecutive newlines, cached_text flush at end of file).
  • Review HTML _prepare_paragraph_texts + _assemble_paragraph refactor — The result_map logic for reassembling parts with flite results is new. Check edge cases with mixed HTML tags and text nodes.
  • Checkpoint frequency changed — Text mode: now saves per-batch (32 lines) instead of every 10 lines. HTML mode: now saves per-batch (32 paragraphs) instead of per-paragraph. More data could be lost on crash when using --resume.
  • stdout path for HTML is still sequential — _process_single_paragraph (used when no output file is specified) still calls run_flite sequentially. Only file output is parallelized.
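For reviewers checking the flush_batch ordering item above, the core idea can be sketched as follows. Names here are assumptions matching the checklist's description: each emitted piece (an IPA result from a worker or a raw newline recorded on the main thread) is tagged with a monotonically increasing order counter, and a final sort restores the original interleaving regardless of worker completion order.

```python
# Hypothetical sketch of the order_counter / sort reassembly scheme.
from typing import List, Tuple

def interleave(tagged_outputs: List[Tuple[int, str]]) -> str:
    """tagged_outputs: (order_counter, text) pairs, possibly out of order
    because worker threads finish at different times.

    Sorting by the counter (assigned sequentially at read time) restores
    the original document order before the pieces are joined.
    """
    tagged_outputs.sort()
    return "".join(text for _, text in tagged_outputs)
```

The edge cases called out in the checklist (consecutive newlines, the final cached_text flush) reduce to verifying that every piece receives exactly one counter value and that no counter is reused.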

Suggested test plan:

```shell
git checkout master
python main.py twig_full.html --html -o /tmp/baseline.html
git checkout devin/1771134115-performance-optimizations
python main.py twig_full.html --html -o /tmp/optimized.html
diff /tmp/baseline.html /tmp/optimized.html
```

Notes

devin-ai-integration bot and others added 4 commits February 15, 2026 05:44
…uted lookups

- Parallelize flite subprocess calls using ThreadPoolExecutor (batch size 32, 8 workers)
- Cache NLTK POS tag results to avoid re-tokenizing the same sentence
- Pre-compute double_word_lookup dict with pre-split keys for O(1) first-word lookup
- Pre-compute flite_path at module level instead of per-call
- Extract _call_flite helper for reuse in batch and single-call paths

~2.9x speedup on 100-line test file (9.2ms/line -> 3.2ms/line).
Output is identical to the original.

Co-Authored-By: tom mottes <tom.mottes@gmail.com>
- Add parallel flite calls to process_html_file via ThreadPoolExecutor batching
- Refactor _process_single_paragraph into _prepare_paragraph_texts + _assemble_paragraph
- Remove POS tag cache (only helps with duplicate sentences, which are rare in practice)
- Restore original is_verb_in_sentence without caching

Real-world profiling on twig_full.html (60s):
  Baseline: 3,530 paragraphs processed
  Optimized: 18,848 paragraphs processed (5.3x throughput)
Output verified identical via diff.

Co-Authored-By: tom mottes <tom.mottes@gmail.com>
@Tomotz Tomotz changed the title from "Optimize performance: parallel flite calls, cached POS tags, pre-computed lookups" to "Optimize performance: parallel flite calls, pre-computed lookups" Feb 15, 2026
@Tomotz Tomotz merged commit 9322ced into master Feb 15, 2026
1 check passed
