fix BPM, whisper force cpu and librosa upgrade by rakuri255 · Pull Request #300 · rakuri255/UltraSinger

rakuri255 · 2026-02-15T22:55:54Z

Summary by CodeRabbit

New Features
- Whisper transcription can optionally be forced to CPU for broader compatibility.
Chores
- Relaxed librosa version constraints and removed the numba dependency.
- Redesigned BPM detection: improved audio handling and tempo extraction; BPM is now optional and may not be auto-populated for some input types.

coderabbitai · 2026-02-15T22:56:06Z

Warning

Rate limit exceeded

@rakuri255 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 25 minutes and 57 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

BPM extraction was centralized: removed runtime BPM reads from YouTube and Ultrastar inputs, made MediaInfo.bpm optional, moved BPM extraction to after processing-audio creation in run(), switched BPM calculation to use soundfile.read + librosa.feature.tempo with stereo handling, and allowed forcing Whisper to CPU.

Changes

Cohort / File(s)	Summary
Dependency `pyproject.toml`	Removed the numba dependency and removed the version bound from `librosa` (changed `"librosa>=0.10.2"` to `"librosa"`).
BPM extraction implementation `src/modules/Audio/bpm.py`	Replaced prior librosa loading/tempo call with `soundfile.read` + `librosa.feature.tempo`; added stereo handling and mono conversion; signatures unchanged.
Input sources (removed BPM at source) `src/modules/Audio/youtube.py`, `src/modules/UltraSinger.py` (Ultrastar path adjustments)	Removed inline BPM extraction from YouTube and removed real_bpm extraction for Ultrastar input; BPM is no longer populated at those input stages.
Processing/control flow & data model `src/UltraSinger.py`, `src/modules/ProcessData.py`	Moved BPM extraction to after creating `processing_audio_path` inside `run()` and store into `process_data.media_info.bpm`; added `settings.force_whisper_cpu` check so `transcribe_with_whisper` may be invoked with `"cpu"`; changed `MediaInfo.bpm` to `Optional[float] = None`.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UltraSinger
    participant BPMModule as bpm.py
    participant ProcessData
    participant Whisper

    User->>UltraSinger: provide input (audio/video/Ultrastar/YouTube)
    UltraSinger->>ProcessData: create ProcessData & MediaInfo (bpm=None)
    UltraSinger->>UltraSinger: generate processing_audio_path
    UltraSinger->>BPMModule: get_bpm_from_file(processing_audio_path)
    BPMModule-->>UltraSinger: bpm value
    UltraSinger->>ProcessData: set media_info.bpm = bpm
    UltraSinger->>Whisper: transcribe_with_whisper(device = "cpu" or configured)
    Whisper-->>UltraSinger: transcription
    UltraSinger-->>User: output results (uses MediaInfo.bpm if present)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through files to find the beat,
I moved the BPM where programs meet,
Librosa reads, soundfile lends ear,
Whisper may nap on CPU near,
MediaInfo smiles — optional, neat. 🥕

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately identifies the three main changes: BPM extraction improvements, whisper CPU forcing feature, and librosa dependency upgrade.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Merge Conflict Detection	✅ Passed	✅ No merge conflicts detected when merging into `main`

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix-bpm-with-wav-and-whisper-force-cpu-and-upgrade-librosa

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/UltraSinger.py`:
- Around line 174-176: The BPM extraction currently unconditionally overwrites
process_data.media_info.bpm via get_bpm_from_file; change this so Ultrastar
`.txt` inputs keep their parsed BPM unless missing or explicitly
force-recompute: in UltraSinger.py, before calling get_bpm_from_file, check the
input type/flag that indicates Ultrastar text parsing (e.g., the code path that
sets parsed BPM) and only call get_bpm_from_file when
process_data.media_info.bpm is empty/None or when the input type is not
Ultrastar `.txt` (or when a "force_bpm_extraction" option is true); ensure you
reference and preserve process_data.media_info.bpm and use
get_bpm_from_file(process_data.process_data_paths.processing_audio_path) only
under those conditional circumstances so existing manually-set BPMs from
Ultrastar `.txt` are not overwritten.

🧹 Nitpick comments (2)

pyproject.toml (1)
19-19: Missing explicit soundfile dependency; unpinned librosa is risky.

soundfile is now directly imported in bpm.py (import soundfile as sf), but it's only available as a transitive dependency of librosa. If librosa ever drops it, this breaks. Consider adding soundfile explicitly.

Also, removing all version constraints on librosa means older (pre-0.10) or future breaking versions could be resolved. A lower bound like librosa>=0.10.2 (which was there before) is safer since librosa.feature.tempo was introduced around that version.
Suggested fix
-    "librosa",
+    "librosa>=0.10.2",
+    "soundfile",
src/modules/Audio/bpm.py (1)
18-24: Stereo handling logic is correct but has a minor readability nit.

The two conditionals (len(data.shape) > 1 then data.ndim > 1) check the same dimensionality property. While functionally correct (transpose changes shape but not ndim), collapsing into a single block would be clearer:
Optional simplification
     data, sampling_rate = sf.read(wav_file, dtype='float32')
-    # Transpose if stereo to match librosa's expected format
-    if len(data.shape) > 1:
-        data = data.T
-    # Convert to mono if stereo
-    if data.ndim > 1:
-        data = librosa.to_mono(data)
+    # Convert to mono if multi-channel
+    if data.ndim > 1:
+        data = librosa.to_mono(data.T)

src/UltraSinger.py

…tar files

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/UltraSinger.py (1)

619-626: ⚠️ Potential issue | 🟡 Minor

Cache key does not reflect the actual device used when force_whisper_cpu is enabled.

Line 619 builds transcription_config using settings.pytorch_device, but Line 626 may override the device to "cpu". This means the cache key won't distinguish between CPU-forced and GPU transcription runs, potentially serving stale/mismatched cached results.

Suggested fix

+        whisper_device = "cpu" if settings.force_whisper_cpu else settings.pytorch_device
         if not settings.whisper_align_model is None: whisper_align_model_string = settings.whisper_align_model.replace("/", "_")
-        transcription_config = f"{settings.transcriber}_{settings.whisper_model.value}_{settings.pytorch_device}_{whisper_align_model_string}_{settings.whisper_batch_size}_{settings.whisper_compute_type}_{settings.language}"
+        transcription_config = f"{settings.transcriber}_{settings.whisper_model.value}_{whisper_device}_{whisper_align_model_string}_{settings.whisper_batch_size}_{settings.whisper_compute_type}_{settings.language}"
         transcription_path = os.path.join(cache_folder_path, f"{transcription_config}.json")
         cached_transcription_available = check_file_exists(transcription_path)
         if settings.skip_cache_transcription or not cached_transcription_available:
             transcription_result = transcribe_with_whisper(
                 processing_audio_path,
                 settings.whisper_model,
-                "cpu" if settings.force_whisper_cpu else settings.pytorch_device,
+                whisper_device,
                 settings.whisper_align_model,

🧹 Nitpick comments (1)

pyproject.toml (1)
19-19: Removing the version floor for librosa may allow incompatible older versions.

The previous >=0.10.2 constraint ensured API compatibility (e.g., librosa.feature.tempo). Dropping it entirely means a resolver could theoretically pick an older version. Consider keeping at least a minimum bound like "librosa>=0.10.2".
Suggested change
-    "librosa",
+    "librosa>=0.10.2",

fix BPM, whisper force cpu and librosa upgrade

1a0c84a

coderabbitai bot reviewed Feb 15, 2026

View reviewed changes

src/UltraSinger.py Show resolved Hide resolved

fix: remove numba dependency and adjust BPM extraction for non-ultras…

a5f9500

…tar files

coderabbitai bot reviewed Feb 15, 2026

View reviewed changes

rakuri255 added 2 commits February 16, 2026 00:27

fix: update whisper device handling and transcription config

857728d

fix: bump version to 0.0.13.dev15 in pyproject.toml and Settings.py

1d30dc7

rakuri255 merged commit e07cb90 into main Feb 15, 2026
6 of 7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix BPM, whisper force cpu and librosa upgrade#300

fix BPM, whisper force cpu and librosa upgrade#300
rakuri255 merged 4 commits intomainfrom
fix-bpm-with-wav-and-whisper-force-cpu-and-upgrade-librosa

rakuri255 commented Feb 15, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 15, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

rakuri255 commented Feb 15, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rakuri255 commented Feb 15, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 15, 2026 •

edited

Loading