App is stalling after language detection #298

@t-beaver

Here is my output:

python src/UltraSinger.py -i 'https://www.youtube.com/watch?v=laPhvAZ_2m4' --whisper tiny
2026-02-13 23:25:09.776282: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

[UltraSinger] *********************************
[UltraSinger] UltraSinger Version: 0.0.13-dev13
[UltraSinger] *********************************
[UltraSinger] Checking GPU support.
[UltraSinger] pytorch - there are no cuda devices available -> Using cpu.
[UltraSinger] tensorflow - there are no cuda devices available -> Using cpu.
[UltraSinger] ----------------------
[UltraSinger] FFmpeg - using /usr/local/bin/ffmpeg
[UltraSinger] FFprobe - using /usr/local/bin/ffprobe
[UltraSinger] ----------------------
[UltraSinger] Option: Notes will be quantized to the detected musical key
[UltraSinger] Full Automatic Mode
[youtube] Extracting URL: https://www.youtube.com/watch?v=laPhvAZ_2m4
[youtube] laPhvAZ_2m4: Downloading webpage
[youtube] laPhvAZ_2m4: Downloading tv client config
[youtube] laPhvAZ_2m4: Downloading player 1798f86c-player_es6_tcc_vflset_en_US_base
[youtube] laPhvAZ_2m4: Downloading tv player API JSON
[youtube] laPhvAZ_2m4: Downloading ios player API JSON
WARNING: [youtube] Falling back to generic n function search
         player = https://www.youtube.com/s/player/1798f86c/player_es6_tcc.vflset/en_US/base.js
WARNING: [youtube] laPhvAZ_2m4: nsig extraction failed: Some formats may be missing
         n = 96UA5fJosvKlMdtqH ; player = https://www.youtube.com/s/player/1798f86c/player_es6_tcc.vflset/en_US/base.js
         Please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
WARNING: [youtube] laPhvAZ_2m4: Some web client https formats have been skipped as they are missing a url. YouTube is forcing SABR streaming for this client. See  https://github.com/yt-dlp/yt-dlp/issues/12482  for more details
[youtube] laPhvAZ_2m4: Downloading m3u8 information
[UltraSinger] Found data on Musicbrainz: Artist=Lucky Seven Title=My Wish List
[UltraSinger] Found year: 2019
[UltraSinger] Found cover image
[UltraSinger] Creating output folder. -> /Users/tbeaver/repos/github/rakuri255/UltraSinger/output/Lucky Seven - My Wish List
[UltraSinger] Downloading from YouTube
[UltraSinger] Downloading Video with Audio
[youtube] Extracting URL: https://www.youtube.com/watch?v=laPhvAZ_2m4
[youtube] laPhvAZ_2m4: Downloading webpage
[youtube] laPhvAZ_2m4: Downloading tv client config
[youtube] laPhvAZ_2m4: Downloading tv player API JSON
[youtube] laPhvAZ_2m4: Downloading ios player API JSON
[youtube] laPhvAZ_2m4: Downloading player 1798f86c-player_es6_vflset_en_US_base
WARNING: [youtube] Falling back to generic n function search
         player = https://www.youtube.com/s/player/1798f86c/player_es6.vflset/en_US/base.js
WARNING: [youtube] laPhvAZ_2m4: nsig extraction failed: Some formats may be missing
         n = thsjhMDm7ZtSg1MzK ; player = https://www.youtube.com/s/player/1798f86c/player_es6.vflset/en_US/base.js
         Please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
WARNING: [youtube] laPhvAZ_2m4: Some web client https formats have been skipped as they are missing a url. YouTube is forcing SABR streaming for this client. See  https://github.com/yt-dlp/yt-dlp/issues/12482  for more details
[youtube] laPhvAZ_2m4: Downloading m3u8 information
[info] laPhvAZ_2m4: Downloading 1 format(s): 609+234-7
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 5
[download] Destination: /Users/tbeaver/repos/github/rakuri255/UltraSinger/output/Lucky Seven - My Wish List/Lucky Seven - My Wish List.f609.mp4
[download] 100% of    2.00MiB in 00:00:02 at 719.22KiB/s
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 5
[download] Destination: /Users/tbeaver/repos/github/rakuri255/UltraSinger/output/Lucky Seven - My Wish List/Lucky Seven - My Wish List.f234-7.mp4
[download] 100% of  496.90KiB in 00:00:00 at 522.99KiB/s
[Merger] Merging formats into "/Users/tbeaver/repos/github/rakuri255/UltraSinger/output/Lucky Seven - My Wish List/Lucky Seven - My Wish List.mp4"
Deleting original file /Users/tbeaver/repos/github/rakuri255/UltraSinger/output/Lucky Seven - My Wish List/Lucky Seven - My Wish List.f234-7.mp4 (pass -k to keep)
Deleting original file /Users/tbeaver/repos/github/rakuri255/UltraSinger/output/Lucky Seven - My Wish List/Lucky Seven - My Wish List.f609.mp4 (pass -k to keep)
[UltraSinger] Extracting audio from video
[UltraSinger] Creating video without audio
/Users/tbeaver/repos/github/rakuri255/UltraSinger/.venv/lib/python3.10/site-packages/librosa/util/decorators.py:88: UserWarning: PySoundFile failed. Trying audioread instead.
  return f(*args, **kwargs)
[UltraSinger] BPM is 114.84
[UltraSinger] Creating output folder. -> /Users/tbeaver/repos/github/rakuri255/UltraSinger/output/Lucky Seven - My Wish List/cache
[UltraSinger] Separating vocals from audio with demucs with model htdemucs and cpu as worker.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /Users/tbeaver/repos/github/rakuri255/UltraSinger/output/Lucky Seven - My Wish List/cache/separated/htdemucs
Separating track /Users/tbeaver/repos/github/rakuri255/UltraSinger/output/Lucky Seven - My Wish List/Lucky Seven - My Wish List.m4a
100%|██████████████████████████████████████████████| 35.099999999999994/35.099999999999994 [00:24<00:00,  1.43seconds/s]
[UltraSinger] Reduce noise from vocal audio with ffmpeg.
[UltraSinger] Converting audio for AI
[UltraSinger] Mute audio parts with no singing
[UltraSinger] Detecting musical key
[UltraSinger] Detected key: G minor
[UltraSinger] Loading whisper with model tiny and cpu as worker
No language specified, language will be first be detected for each audio file (increases inference time).
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.5.1.post0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint .venv/lib/python3.10/site-packages/whisperx/assets/pytorch_model.bin`
[UltraSinger] Transcribing /Users/tbeaver/repos/github/rakuri255/UltraSinger/output/Lucky Seven - My Wish List/cache/Lucky Seven - My Wish List_mute.wav
Detected language: en (0.72) in first 30s of audio...
Downloading: "https://download.pytorch.org/torchaudio/models/wav2vec2_fairseq_base_ls960_asr_ls960.pth" to /Users/tbeaver/.cache/torch/hub/checkpoints/wav2vec2_fairseq_base_ls960_asr_ls960.pth
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 360M/360M [03:08<00:00, 2.00MB/s]

After that, nothing happens. I cannot cancel with CTRL + C or CTRL + D.
The python process shows no network, CPU, or RAM activity.

I've tried another YouTube link, with the same result.
I've waited 30 minutes with no further output.
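
To narrow down where the process is stuck, one option is Python's standard `faulthandler` module, which can dump the tracebacks of all threads after a timeout, even when the interpreter no longer reacts to CTRL + C. The following is a minimal sketch, not part of UltraSinger: the helper names `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical, and calling them near the start of `src/UltraSinger.py` is an assumption about where they would be useful.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s: float, log_file=sys.stderr, repeat: bool = True) -> None:
    """Dump the tracebacks of all threads to log_file after timeout_s seconds.

    With repeat=True the dump is written again every timeout_s seconds until
    disarm_hang_watchdog() is called. log_file must be backed by a real file
    descriptor and must stay open while the watchdog is armed.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=repeat, file=log_file)


def disarm_hang_watchdog() -> None:
    """Cancel a pending watchdog dump, e.g. once the slow step has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog(300)` before the transcription step would print a stack trace to stderr every five minutes while the process is stalled. The frames at the top of that trace show which call is blocking, which is useful to attach to a report like this one.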
