fix BPM, whisper force cpu and librosa upgrade#300
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughBPM extraction was centralized: removed runtime BPM reads from YouTube and Ultrastar inputs, made MediaInfo.bpm optional, moved BPM extraction to after processing-audio creation in run(), switched BPM calculation to use soundfile.read + librosa.feature.tempo with stereo handling, and allowed forcing Whisper to CPU. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant UltraSinger
participant BPMModule as bpm.py
participant ProcessData
participant Whisper
User->>UltraSinger: provide input (audio/video/Ultrastar/YouTube)
UltraSinger->>ProcessData: create ProcessData & MediaInfo (bpm=None)
UltraSinger->>UltraSinger: generate processing_audio_path
UltraSinger->>BPMModule: get_bpm_from_file(processing_audio_path)
BPMModule-->>UltraSinger: bpm value
UltraSinger->>ProcessData: set media_info.bpm = bpm
UltraSinger->>Whisper: transcribe_with_whisper(device = "cpu" or configured)
Whisper-->>UltraSinger: transcription
UltraSinger-->>User: output results (uses MediaInfo.bpm if present)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/UltraSinger.py`:
- Around line 174-176: The BPM extraction currently unconditionally overwrites
process_data.media_info.bpm via get_bpm_from_file; change this so Ultrastar
`.txt` inputs keep their parsed BPM unless missing or explicitly
force-recompute: in UltraSinger.py, before calling get_bpm_from_file, check the
input type/flag that indicates Ultrastar text parsing (e.g., the code path that
sets parsed BPM) and only call get_bpm_from_file when
process_data.media_info.bpm is empty/None or when the input type is not
Ultrastar `.txt` (or when a "force_bpm_extraction" option is true); ensure you
reference and preserve process_data.media_info.bpm and use
get_bpm_from_file(process_data.process_data_paths.processing_audio_path) only
under those conditional circumstances so existing manually-set BPMs from
Ultrastar `.txt` are not overwritten.
🧹 Nitpick comments (2)
pyproject.toml (1)
19-19: Missing explicitsoundfiledependency; unpinnedlibrosais risky.
soundfileis now directly imported inbpm.py(import soundfile as sf), but it's only available as a transitive dependency oflibrosa. If librosa ever drops it, this breaks. Consider addingsoundfileexplicitly.Also, removing all version constraints on
librosameans older (pre-0.10) or future breaking versions could be resolved. A lower bound likelibrosa>=0.10.2(which was there before) is safer sincelibrosa.feature.tempowas introduced around that version.Suggested fix
- "librosa", + "librosa>=0.10.2", + "soundfile",src/modules/Audio/bpm.py (1)
18-24: Stereo handling logic is correct but has a minor readability nit.The two conditionals (
len(data.shape) > 1thendata.ndim > 1) check the same dimensionality property. While functionally correct (transpose changes shape but not ndim), collapsing into a single block would be clearer:Optional simplification
data, sampling_rate = sf.read(wav_file, dtype='float32') - # Transpose if stereo to match librosa's expected format - if len(data.shape) > 1: - data = data.T - # Convert to mono if stereo - if data.ndim > 1: - data = librosa.to_mono(data) + # Convert to mono if multi-channel + if data.ndim > 1: + data = librosa.to_mono(data.T)
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/UltraSinger.py (1)
619-626:⚠️ Potential issue | 🟡 MinorCache key does not reflect the actual device used when
force_whisper_cpuis enabled.Line 619 builds
transcription_configusingsettings.pytorch_device, but Line 626 may override the device to"cpu". This means the cache key won't distinguish between CPU-forced and GPU transcription runs, potentially serving stale/mismatched cached results.Suggested fix
+ whisper_device = "cpu" if settings.force_whisper_cpu else settings.pytorch_device if not settings.whisper_align_model is None: whisper_align_model_string = settings.whisper_align_model.replace("/", "_") - transcription_config = f"{settings.transcriber}_{settings.whisper_model.value}_{settings.pytorch_device}_{whisper_align_model_string}_{settings.whisper_batch_size}_{settings.whisper_compute_type}_{settings.language}" + transcription_config = f"{settings.transcriber}_{settings.whisper_model.value}_{whisper_device}_{whisper_align_model_string}_{settings.whisper_batch_size}_{settings.whisper_compute_type}_{settings.language}" transcription_path = os.path.join(cache_folder_path, f"{transcription_config}.json") cached_transcription_available = check_file_exists(transcription_path) if settings.skip_cache_transcription or not cached_transcription_available: transcription_result = transcribe_with_whisper( processing_audio_path, settings.whisper_model, - "cpu" if settings.force_whisper_cpu else settings.pytorch_device, + whisper_device, settings.whisper_align_model,
🧹 Nitpick comments (1)
pyproject.toml (1)
19-19: Removing the version floor forlibrosamay allow incompatible older versions.The previous
>=0.10.2constraint ensured API compatibility (e.g.,librosa.feature.tempo). Dropping it entirely means a resolver could theoretically pick an older version. Consider keeping at least a minimum bound like"librosa>=0.10.2".Suggested change
- "librosa", + "librosa>=0.10.2",
Summary by CodeRabbit
New Features
Chores