
Optimize performance: parallel flite calls, pre-computed lookups #10

Merged
Tomotz merged 4 commits into master from devin/1771134115-performance-optimizations
Feb 15, 2026

Conversation


@Tomotz Tomotz commented Feb 15, 2026

Optimize performance: parallel flite calls, pre-computed lookups

Summary

Profiling revealed that ~92–96% of processing time is spent spawning one flite subprocess per line/paragraph. The remaining hot paths were redundant dict construction + string splitting in add_double_word_reductions (~4% of runtime, 2.3M str.split calls on real data).

Changes

  1. Parallel flite calls in both print_ipa and process_html_file — Both text-file and HTML processing paths now batch flite subprocess calls (batch size 32) and run them concurrently via ThreadPoolExecutor (8 workers). Post-processing (reductions, flap t/d) still runs sequentially on the main thread.
  2. Pre-computed _double_word_lookup — O(1) first-word lookup instead of iterating all dict entries and splitting keys on every call. Reduced add_double_word_reductions from 2.44s to 0.64s per 60s of real-world processing (handling 5× more calls in the same time).
  3. Pre-computed _flite_path — Computed once at module load instead of per-call.
  4. Refactored HTML paragraph processing — Split _process_single_paragraph into _prepare_paragraph_texts (collects texts needing flite) and _assemble_paragraph (reconstructs HTML with IPA results), enabling cross-paragraph batching of flite calls.
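The batching approach in change 1 can be sketched as follows. This is a minimal illustration, not the project's actual code: `run_flite`, `run_flite_batch`, the flite flags, and the constants are assumptions based on the description above (batch size 32, 8 workers). The `worker` parameter is added here purely so the sketch can be exercised without a flite binary.

```python
# Minimal sketch of batched, concurrent flite invocation.
# All names and flite flags here are illustrative assumptions.
import subprocess
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 32   # texts accumulated before a flush (per the PR description)
MAX_WORKERS = 8   # concurrent flite subprocesses

def run_flite(text: str) -> str:
    """Spawn one flite subprocess and return its phone output.

    The exact command-line flags are an assumption for illustration.
    """
    result = subprocess.run(
        ["flite", "-t", text, "-ps", "-o", "none"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def run_flite_batch(texts, worker=run_flite):
    """Run a batch of flite calls concurrently, preserving input order.

    executor.map yields results in input order, so the caller can zip
    results back onto the lines/paragraphs that produced them.
    """
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(worker, texts))
```

Because `ThreadPoolExecutor.map` returns results in input order, the sequential post-processing (reductions, flap t/d) can consume the batch as if the calls had been made serially.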

Performance (real-world: twig_full.html, 72K paragraphs, 60s runs)

| Metric | Baseline (master) | Optimized | Improvement |
| --- | --- | --- | --- |
| Paragraphs processed in 60s | 3,530 | 18,848 | 5.3× throughput |
| `add_double_word_reductions` | 2.44s / 4,507 calls | 2.28s / 23,020 calls | 5.1× more calls in same time |
| `str.split` calls | 2,302,295 | 44,907 | 98% fewer |

Output verified identical to baseline via diff (50-paragraph HTML subset and 100-line synthetic text file). All 74 existing tests pass.
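The `_double_word_lookup` pre-computation from change 2 can be sketched as below. This is a hedged illustration, not the repository's code: the `DOUBLE_WORD_REDUCTIONS` table and its IPA values are made-up examples, and `lookup_reduction` is a hypothetical helper showing the O(1) access pattern that replaces the per-call scan-and-split.

```python
# Illustrative sketch: the reduction table and values below are invented.
from typing import Dict, Optional

DOUBLE_WORD_REDUCTIONS: Dict[str, str] = {
    "going to": "gənə",
    "want to": "wɑnə",
    "got to": "gɑɾə",
}

# Split each two-word key exactly once, at module load:
# first word -> {second word -> reduced pronunciation}
_double_word_lookup: Dict[str, Dict[str, str]] = {}
for phrase, reduced in DOUBLE_WORD_REDUCTIONS.items():
    first, second = phrase.split()
    _double_word_lookup.setdefault(first, {})[second] = reduced

def lookup_reduction(first: str, second: str) -> Optional[str]:
    """O(1) lookup: two dict hits instead of iterating every key and
    calling str.split on each one per call."""
    return _double_word_lookup.get(first, {}).get(second)
```

This is where the 98% drop in `str.split` calls comes from: the splits happen once at import time rather than on every invocation.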

What was tried and removed

  • POS tag caching — Initially added a _pos_tag_cache dict to avoid re-tokenizing sentences in is_verb_in_sentence. Real-world profiling showed this doesn't help because duplicate sentences are rare in actual text. Removed to avoid unnecessary memory usage.

Updates since last revision

  • Parallelized process_html_file (the main real-world code path) — previously only print_ipa was parallelized.
  • Removed POS tag cache after real-world profiling showed no benefit.
  • Removed unused lru_cache import and dead result_idx variable.

Review & Testing Checklist for Human

  • Verify output correctness on full twig_full.html — Automated comparison used a 50-paragraph subset. Run full file through both master and this branch, diff the outputs.
  • Review flush_batch ordering logic in print_ipa — The batching introduces complexity with order_counter, pending_indices, and newline_positions. Verify all_outputs.sort() correctly interleaves newlines and IPA output in edge cases (many consecutive newlines, cached_text flush at end of file).
  • Review HTML _prepare_paragraph_texts + _assemble_paragraph refactor — The result_map logic for reassembling parts with flite results is new. Check edge cases with mixed HTML tags and text nodes.
  • Checkpoint frequency changed — Text mode: now saves per-batch (32 lines) instead of every 10 lines. HTML mode: now saves per-batch (32 paragraphs) instead of per-paragraph. More data could be lost on crash when using --resume.
  • stdout path for HTML is still sequential — _process_single_paragraph (used when no output file is specified) still calls run_flite sequentially. Only file output is parallelized.
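For reviewers checking the flush_batch ordering item above, the core idea can be sketched as follows. Names here are assumptions matching the checklist's description: each emitted piece (an IPA result from a worker or a raw newline recorded on the main thread) is tagged with a monotonically increasing order counter, and a final sort restores the original interleaving regardless of worker completion order.

```python
# Hypothetical sketch of the order_counter / sort reassembly scheme.
from typing import List, Tuple

def interleave(tagged_outputs: List[Tuple[int, str]]) -> str:
    """tagged_outputs: (order_counter, text) pairs, possibly out of order
    because worker threads finish at different times.

    Sorting by the counter (assigned sequentially at read time) restores
    the original document order before the pieces are joined.
    """
    tagged_outputs.sort()
    return "".join(text for _, text in tagged_outputs)
```

The edge cases called out in the checklist (consecutive newlines, the final cached_text flush) reduce to verifying that every piece receives exactly one counter value and that no counter is reused.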

Suggested test plan:

```shell
git checkout master
python main.py twig_full.html --html -o /tmp/baseline.html
git checkout devin/1771134115-performance-optimizations
python main.py twig_full.html --html -o /tmp/optimized.html
diff /tmp/baseline.html /tmp/optimized.html
```

Notes

devin-ai-integration bot and others added 4 commits February 15, 2026 05:44
…uted lookups

- Parallelize flite subprocess calls using ThreadPoolExecutor (batch size 32, 8 workers)
- Cache NLTK POS tag results to avoid re-tokenizing the same sentence
- Pre-compute double_word_lookup dict with pre-split keys for O(1) first-word lookup
- Pre-compute flite_path at module level instead of per-call
- Extract _call_flite helper for reuse in batch and single-call paths

~2.9x speedup on 100-line test file (9.2ms/line -> 3.2ms/line).
Output is identical to the original.

Co-Authored-By: tom mottes <tom.mottes@gmail.com>
- Add parallel flite calls to process_html_file via ThreadPoolExecutor batching
- Refactor _process_single_paragraph into _prepare_paragraph_texts + _assemble_paragraph
- Remove POS tag cache (only helps with duplicate sentences, which are rare in practice)
- Restore original is_verb_in_sentence without caching

Real-world profiling on twig_full.html (60s):
  Baseline: 3,530 paragraphs processed
  Optimized: 18,848 paragraphs processed (5.3x throughput)
Output verified identical via diff.

Co-Authored-By: tom mottes <tom.mottes@gmail.com>
@Tomotz Tomotz changed the title from "Optimize performance: parallel flite calls, cached POS tags, pre-computed lookups" to "Optimize performance: parallel flite calls, pre-computed lookups" Feb 15, 2026
@Tomotz Tomotz merged commit 9322ced into master Feb 15, 2026
1 check passed
