Skip to content

fix BPM, whisper force cpu and librosa upgrade#300

Merged
rakuri255 merged 4 commits intomainfrom
fix-bpm-with-wav-and-whisper-force-cpu-and-upgrade-librosa
Feb 15, 2026
Merged

fix BPM, whisper force cpu and librosa upgrade#300
rakuri255 merged 4 commits intomainfrom
fix-bpm-with-wav-and-whisper-force-cpu-and-upgrade-librosa

Conversation

@rakuri255
Copy link
Owner

@rakuri255 rakuri255 commented Feb 15, 2026

Summary by CodeRabbit

  • New Features

    • Whisper transcription can optionally be forced to CPU for broader compatibility.
  • Chores

    • Relaxed librosa version constraints and removed the numba dependency.
    • Redesigned BPM detection: improved audio handling and tempo extraction; BPM is now optional and may not be auto-populated for some input types.

@coderabbitai
Copy link

coderabbitai bot commented Feb 15, 2026

Warning

Rate limit exceeded

@rakuri255 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 25 minutes and 57 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

BPM extraction was centralized: removed runtime BPM reads from YouTube and Ultrastar inputs, made MediaInfo.bpm optional, moved BPM extraction to after processing-audio creation in run(), switched BPM calculation to use soundfile.read + librosa.feature.tempo with stereo handling, and allowed forcing Whisper to CPU.

Changes

Cohort / File(s) Summary
Dependency
pyproject.toml
Removed the numba dependency and removed the version bound from librosa (changed "librosa>=0.10.2" to "librosa").
BPM extraction implementation
src/modules/Audio/bpm.py
Replaced prior librosa loading/tempo call with soundfile.read + librosa.feature.tempo; added stereo handling and mono conversion; signatures unchanged.
Input sources (removed BPM at source)
src/modules/Audio/youtube.py, src/modules/UltraSinger.py (Ultrastar path adjustments)
Removed inline BPM extraction from YouTube and removed real_bpm extraction for Ultrastar input; BPM is no longer populated at those input stages.
Processing/control flow & data model
src/UltraSinger.py, src/modules/ProcessData.py
Moved BPM extraction to after creating processing_audio_path inside run() and store into process_data.media_info.bpm; added settings.force_whisper_cpu check so transcribe_with_whisper may be invoked with "cpu"; changed MediaInfo.bpm to Optional[float] = None.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UltraSinger
    participant BPMModule as bpm.py
    participant ProcessData
    participant Whisper

    User->>UltraSinger: provide input (audio/video/Ultrastar/YouTube)
    UltraSinger->>ProcessData: create ProcessData & MediaInfo (bpm=None)
    UltraSinger->>UltraSinger: generate processing_audio_path
    UltraSinger->>BPMModule: get_bpm_from_file(processing_audio_path)
    BPMModule-->>UltraSinger: bpm value
    UltraSinger->>ProcessData: set media_info.bpm = bpm
    UltraSinger->>Whisper: transcribe_with_whisper(device = "cpu" or configured)
    Whisper-->>UltraSinger: transcription
    UltraSinger-->>User: output results (uses MediaInfo.bpm if present)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through files to find the beat,
I moved the BPM where programs meet,
Librosa reads, soundfile lends ear,
Whisper may nap on CPU near,
MediaInfo smiles — optional, neat. 🥕

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately identifies the three main changes: BPM extraction improvements, whisper CPU forcing feature, and librosa dependency upgrade.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-bpm-with-wav-and-whisper-force-cpu-and-upgrade-librosa

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/UltraSinger.py`:
- Around line 174-176: The BPM extraction currently unconditionally overwrites
process_data.media_info.bpm via get_bpm_from_file; change this so Ultrastar
`.txt` inputs keep their parsed BPM unless missing or explicitly
force-recompute: in UltraSinger.py, before calling get_bpm_from_file, check the
input type/flag that indicates Ultrastar text parsing (e.g., the code path that
sets parsed BPM) and only call get_bpm_from_file when
process_data.media_info.bpm is empty/None or when the input type is not
Ultrastar `.txt` (or when a "force_bpm_extraction" option is true); ensure you
reference and preserve process_data.media_info.bpm and use
get_bpm_from_file(process_data.process_data_paths.processing_audio_path) only
under those conditional circumstances so existing manually-set BPMs from
Ultrastar `.txt` are not overwritten.
🧹 Nitpick comments (2)
pyproject.toml (1)

19-19: Missing explicit soundfile dependency; unpinned librosa is risky.

soundfile is now directly imported in bpm.py (import soundfile as sf), but it's only available as a transitive dependency of librosa. If librosa ever drops it, this breaks. Consider adding soundfile explicitly.

Also, removing all version constraints on librosa means older (pre-0.10) or future breaking versions could be resolved. A lower bound like librosa>=0.10.2 (which was there before) is safer since librosa.feature.tempo was introduced around that version.

Suggested fix
-    "librosa",
+    "librosa>=0.10.2",
+    "soundfile",
src/modules/Audio/bpm.py (1)

18-24: Stereo handling logic is correct but has a minor readability nit.

The two conditionals (len(data.shape) > 1 then data.ndim > 1) check the same dimensionality property. While functionally correct (transpose changes shape but not ndim), collapsing into a single block would be clearer:

Optional simplification
     data, sampling_rate = sf.read(wav_file, dtype='float32')
-    # Transpose if stereo to match librosa's expected format
-    if len(data.shape) > 1:
-        data = data.T
-    # Convert to mono if stereo
-    if data.ndim > 1:
-        data = librosa.to_mono(data)
+    # Convert to mono if multi-channel
+    if data.ndim > 1:
+        data = librosa.to_mono(data.T)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/UltraSinger.py (1)

619-626: ⚠️ Potential issue | 🟡 Minor

Cache key does not reflect the actual device used when force_whisper_cpu is enabled.

Line 619 builds transcription_config using settings.pytorch_device, but Line 626 may override the device to "cpu". This means the cache key won't distinguish between CPU-forced and GPU transcription runs, potentially serving stale/mismatched cached results.

Suggested fix
+        whisper_device = "cpu" if settings.force_whisper_cpu else settings.pytorch_device
         if not settings.whisper_align_model is None: whisper_align_model_string = settings.whisper_align_model.replace("/", "_")
-        transcription_config = f"{settings.transcriber}_{settings.whisper_model.value}_{settings.pytorch_device}_{whisper_align_model_string}_{settings.whisper_batch_size}_{settings.whisper_compute_type}_{settings.language}"
+        transcription_config = f"{settings.transcriber}_{settings.whisper_model.value}_{whisper_device}_{whisper_align_model_string}_{settings.whisper_batch_size}_{settings.whisper_compute_type}_{settings.language}"
         transcription_path = os.path.join(cache_folder_path, f"{transcription_config}.json")
         cached_transcription_available = check_file_exists(transcription_path)
         if settings.skip_cache_transcription or not cached_transcription_available:
             transcription_result = transcribe_with_whisper(
                 processing_audio_path,
                 settings.whisper_model,
-                "cpu" if settings.force_whisper_cpu else settings.pytorch_device,
+                whisper_device,
                 settings.whisper_align_model,
🧹 Nitpick comments (1)
pyproject.toml (1)

19-19: Removing the version floor for librosa may allow incompatible older versions.

The previous >=0.10.2 constraint ensured API compatibility (e.g., librosa.feature.tempo). Dropping it entirely means a resolver could theoretically pick an older version. Consider keeping at least a minimum bound like "librosa>=0.10.2".

Suggested change
-    "librosa",
+    "librosa>=0.10.2",

@rakuri255 rakuri255 merged commit e07cb90 into main Feb 15, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant