Description
When starting the service via /app/vllm_plugin/scripts/start_server.py, the following error occurs during startup. Although it is marked as "non-fatal," it indicates that the audio preprocessing warmup failed.
Error summary: the system expects a ProcessorMixin for audio processing but instead receives a Qwen2TokenizerFast instance.
Stack Trace:
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Audio preprocessing warmup failed (non-fatal): %s. First request may experience higher latency.
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/speech_to_text.py", line 152, in _warmup_audio_preprocessing
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] processor = cached_processor_from_config(self.model_config)
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 251, in cached_processor_from_config
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] return cached_get_processor_without_dynamic_kwargs(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 210, in cached_get_processor_without_dynamic_kwargs
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] processor = cached_get_processor(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 155, in get_processor
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] raise TypeError(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] TypeError: Invalid type of HuggingFace processor. Expected type: <class 'transformers.processing_utils.ProcessorMixin'>, but found type: <class 'transformers.models.qwen2.tokenization_qwen2_fast.Qwen2TokenizerFast'>
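For context, the TypeError comes down to an isinstance check in vllm/transformers_utils/processor.py. A stripped-down sketch of that guard, using stand-in classes rather than the real transformers types (names and docstrings here are illustrative, not vLLM's actual code):

```python
# Stand-ins for the real transformers classes; illustrative only.
class ProcessorMixin:
    """Placeholder for transformers.processing_utils.ProcessorMixin."""

class Qwen2TokenizerFast:
    """Placeholder for the fast Qwen2 tokenizer that gets returned
    instead of a multimodal processor."""

def get_processor(processor):
    # Sketch of the guard that raises in the traceback above.
    if not isinstance(processor, ProcessorMixin):
        raise TypeError(
            "Invalid type of HuggingFace processor. "
            f"Expected type: {ProcessorMixin}, but found type: {type(processor)}"
        )
    return processor
```

Because the processor loader ended up with the tokenizer rather than a full processor, this guard rejects it and the audio warmup is skipped.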
How can this problem be solved?
Thanks.
Workaround & Versioning:
Docker image (working version): vllm/vllm-openai:v0.14.1
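One way to narrow this down is to audit which config files the model snapshot directory actually contains. My understanding is that AutoProcessor only returns a full ProcessorMixin when the repo ships a processor or feature-extractor config; with only tokenizer files present, it falls back to the tokenizer, which is exactly the type the error reports. A small stdlib-only sketch (the file list is an assumption based on the tokenizer-generation step in the log below, plus the processor configs a ProcessorMixin would typically need):

```python
from pathlib import Path

# Tokenizer files the startup log reports generating, plus the processor
# configs that would let AutoProcessor build a real ProcessorMixin.
EXPECTED = [
    "preprocessor_config.json",   # feature extractor / audio processor config
    "processor_config.json",      # combined processor config
    "tokenizer_config.json",
    "tokenizer.json",
    "vocab.json",
    "merges.txt",
    "special_tokens_map.json",
    "added_tokens.json",
]

def audit_snapshot(snapshot_dir):
    """Report which expected files exist in a HF snapshot directory."""
    root = Path(snapshot_dir)
    return {name: (root / name).is_file() for name in EXPECTED}
```

If preprocessor_config.json is missing from the snapshot, that would explain the tokenizer fallback; whether adding one is the right fix for VibeVoice-ASR is something the maintainers would need to confirm.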
The complete log of starting the service is as follows:
============================================================
VibeVoice vLLM ASR Server - One-Click Deployment
============================================================
============================================================
Updating package list
============================================================
Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:6 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:1 https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2204/x86_64 InRelease
Reading package lists... Done
============================================================
Installing FFmpeg and audio libraries
============================================================
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libsndfile1 is already the newest version (1.0.31-2ubuntu0.2).
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 34 not upgraded.
============================================================
Installing VibeVoice with vLLM support
============================================================
Obtaining file:///app
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: torch in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (2.9.1+cu129)
Requirement already satisfied: transformers<5.0.0,>=4.51.3 in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (4.57.6)
Requirement already satisfied: accelerate in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (1.12.0)
Requirement already satisfied: llvmlite>=0.40.0 in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.44.0)
Requirement already satisfied: numba>=0.57.0 in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.61.2)
Requirement already satisfied: diffusers in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.36.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (4.67.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (2.2.6)
Requirement already satisfied: scipy in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (1.17.0)
Requirement already satisfied: librosa in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.11.0)
Requirement already satisfied: ml-collections in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (1.1.0)
Requirement already satisfied: absl-py in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (2.4.0)
Requirement already satisfied: gradio in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (6.5.1)
Requirement already satisfied: av in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (16.1.0)
Requirement already satisfied: aiortc in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (1.14.0)
Requirement already satisfied: uvicorn[standard] in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.40.0)
Requirement already satisfied: fastapi in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.128.0)
Requirement already satisfied: pydub in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.25.1)
Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (2.32.5)
WARNING: vibevoice 1.0.0 does not provide the extra 'vllm'
Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (3.20.3)
Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (0.36.0)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (26.0)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (6.0.3)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (2026.1.15)
Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (0.22.2)
Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (0.7.0)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<1.0,>=0.34.0->transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (2026.1.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<1.0,>=0.34.0->transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (4.15.0)
Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<1.0,>=0.34.0->transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (1.2.0)
Requirement already satisfied: psutil in /usr/local/lib/python3.12/dist-packages (from accelerate->vibevoice==1.0.0) (7.2.1)
Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (80.10.1)
Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (3.6.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (3.1.6)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.9.86 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.86)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.9.79 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.79)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.9.79 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.79)
Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (9.10.2.21)
Requirement already satisfied: nvidia-cublas-cu12==12.9.1.4 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.1.4)
Requirement already satisfied: nvidia-cufft-cu12==11.4.1.4 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (11.4.1.4)
Requirement already satisfied: nvidia-curand-cu12==10.3.10.19 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (10.3.10.19)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.5.82 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (11.7.5.82)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.10.65 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.5.10.65)
Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (0.7.1)
Requirement already satisfied: nvidia-nccl-cu12==2.27.5 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (2.27.5)
Requirement already satisfied: nvidia-nvshmem-cu12==3.3.20 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (3.3.20)
Requirement already satisfied: nvidia-nvtx-cu12==12.9.79 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.79)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.9.86 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.86)
Requirement already satisfied: nvidia-cufile-cu12==1.14.1.1 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (1.14.1.1)
Requirement already satisfied: triton==3.5.1 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (3.5.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch->vibevoice==1.0.0) (1.3.0)
Requirement already satisfied: aioice<1.0.0,>=0.10.1 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (0.10.2)
Requirement already satisfied: cryptography>=44.0.0 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (46.0.3)
Requirement already satisfied: google-crc32c>=1.1 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (1.8.0)
Requirement already satisfied: pyee>=13.0.0 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (13.0.0)
Requirement already satisfied: pylibsrtp>=0.10.0 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (1.0.0)
Requirement already satisfied: pyopenssl>=25.0.0 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (25.3.0)
Requirement already satisfied: dnspython>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from aioice<1.0.0,>=0.10.1->aiortc->vibevoice==1.0.0) (2.8.0)
Requirement already satisfied: ifaddr>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aioice<1.0.0,>=0.10.1->aiortc->vibevoice==1.0.0) (0.2.0)
Requirement already satisfied: cffi>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from cryptography>=44.0.0->aiortc->vibevoice==1.0.0) (2.0.0)
Requirement already satisfied: pycparser in /usr/local/lib/python3.12/dist-packages (from cffi>=2.0.0->cryptography>=44.0.0->aiortc->vibevoice==1.0.0) (3.0)
Requirement already satisfied: importlib_metadata in /usr/lib/python3/dist-packages (from diffusers->vibevoice==1.0.0) (4.6.4)
Requirement already satisfied: httpx<1.0.0 in /usr/local/lib/python3.12/dist-packages (from diffusers->vibevoice==1.0.0) (0.28.1)
Requirement already satisfied: Pillow in /usr/local/lib/python3.12/dist-packages (from diffusers->vibevoice==1.0.0) (12.1.0)
Requirement already satisfied: anyio in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->diffusers->vibevoice==1.0.0) (4.12.1)
Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->diffusers->vibevoice==1.0.0) (2026.1.4)
Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->diffusers->vibevoice==1.0.0) (1.0.9)
Requirement already satisfied: idna in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->diffusers->vibevoice==1.0.0) (3.11)
Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx<1.0.0->diffusers->vibevoice==1.0.0) (0.16.0)
Requirement already satisfied: starlette<0.51.0,>=0.40.0 in /usr/local/lib/python3.12/dist-packages (from fastapi->vibevoice==1.0.0) (0.50.0)
Requirement already satisfied: pydantic>=2.7.0 in /usr/local/lib/python3.12/dist-packages (from fastapi->vibevoice==1.0.0) (2.12.5)
Requirement already satisfied: annotated-doc>=0.0.2 in /usr/local/lib/python3.12/dist-packages (from fastapi->vibevoice==1.0.0) (0.0.4)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.7.0->fastapi->vibevoice==1.0.0) (0.7.0)
Requirement already satisfied: pydantic-core==2.41.5 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.7.0->fastapi->vibevoice==1.0.0) (2.41.5)
Requirement already satisfied: typing-inspection>=0.4.2 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.7.0->fastapi->vibevoice==1.0.0) (0.4.2)
Requirement already satisfied: aiofiles<25.0,>=22.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (24.1.0)
Requirement already satisfied: brotli>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (1.2.0)
Requirement already satisfied: ffmpy in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (1.0.0)
Requirement already satisfied: gradio-client==2.0.3 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (2.0.3)
Requirement already satisfied: groovy~=0.1 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (0.1.2)
Requirement already satisfied: markupsafe<4.0,>=2.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (3.0.3)
Requirement already satisfied: orjson~=3.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (3.11.7)
Requirement already satisfied: pandas<4.0,>=1.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (3.0.0)
Requirement already satisfied: python-multipart>=0.0.18 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (0.0.21)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (2025.2)
Requirement already satisfied: safehttpx<0.2.0,>=0.1.7 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (0.1.7)
Requirement already satisfied: semantic-version~=2.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (2.10.0)
Requirement already satisfied: tomlkit<0.14.0,>=0.12.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (0.13.3)
Requirement already satisfied: typer<1.0,>=0.12 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (0.21.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas<4.0,>=1.0->gradio->vibevoice==1.0.0) (2.9.0.post0)
Requirement already satisfied: click>=8.0.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (8.3.1)
Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (1.5.4)
Requirement already satisfied: rich>=10.11.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (14.2.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas<4.0,>=1.0->gradio->vibevoice==1.0.0) (1.17.0)
Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.12/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (4.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.12/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (2.19.2)
Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.12/dist-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (0.1.2)
Requirement already satisfied: audioread>=2.1.9 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (3.1.0)
Requirement already satisfied: scikit-learn>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (1.8.0)
Requirement already satisfied: joblib>=1.0 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (1.5.3)
Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (5.2.1)
Requirement already satisfied: soundfile>=0.12.1 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (0.13.1)
Requirement already satisfied: pooch>=1.1 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (1.9.0)
Requirement already satisfied: soxr>=0.3.2 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (1.0.0)
Requirement already satisfied: lazy_loader>=0.1 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (0.4)
Requirement already satisfied: msgpack>=1.0 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (1.1.2)
Requirement already satisfied: platformdirs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from pooch>=1.1->librosa->vibevoice==1.0.0) (4.5.1)
Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests->vibevoice==1.0.0) (3.4.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests->vibevoice==1.0.0) (2.6.3)
Requirement already satisfied: threadpoolctl>=3.2.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn>=1.1.0->librosa->vibevoice==1.0.0) (3.6.0)
Requirement already satisfied: httptools>=0.6.3 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]->vibevoice==1.0.0) (0.7.1)
Requirement already satisfied: python-dotenv>=0.13 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]->vibevoice==1.0.0) (1.2.1)
Requirement already satisfied: uvloop>=0.15.1 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]->vibevoice==1.0.0) (0.22.1)
Requirement already satisfied: watchfiles>=0.13 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]->vibevoice==1.0.0) (1.1.1)
Requirement already satisfied: websockets>=10.4 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]->vibevoice==1.0.0) (16.0)
Building wheels for collected packages: vibevoice
Building editable for vibevoice (pyproject.toml) ... done
Created wheel for vibevoice: filename=vibevoice-1.0.0-0.editable-py3-none-any.whl size=8169 sha256=05af9cd4d66b15176e5d2d4e4358dc446d980866b7134d2224a4c8cee4c0ced3
Stored in directory: /tmp/pip-ephem-wheel-cache-vrew19un/wheels/54/1b/b7/aa63e25c8f14f4f2ae7b04e6097bdecb770e455c5c1ee0a600
Successfully built vibevoice
Installing collected packages: vibevoice
Attempting uninstall: vibevoice
Found existing installation: vibevoice 1.0.0
Uninstalling vibevoice-1.0.0:
Successfully uninstalled vibevoice-1.0.0
Successfully installed vibevoice-1.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[notice] A new release of pip is available: 25.3 -> 26.0.1
[notice] To update, run: python3 -m pip install --upgrade pip
args.model='microsoft/VibeVoice-ASR'
============================================================
Downloading model: microsoft/VibeVoice-ASR
============================================================
Fetching 17 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 149796.57it/s]
============================================================
✅ Model downloaded successfully!
📁 Path: /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944
============================================================
============================================================
Generating tokenizer files
============================================================
=== Generating VibeVoice tokenizer files to /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944 ===
Downloading vocab.json from Qwen/Qwen2.5-7B...
/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py:979: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`.
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
warnings.warn(
Downloading merges.txt from Qwen/Qwen2.5-7B...
Downloading tokenizer.json from Qwen/Qwen2.5-7B...
Downloading tokenizer_config.json from Qwen/Qwen2.5-7B...
Patched /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/tokenizer_config.json
Patched /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/tokenizer.json
Generated /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/added_tokens.json
Generated /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/special_tokens_map.json
✅ All 6 tokenizer files generated in /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944
============================================================
Starting vLLM server on port 8000
============================================================
(APIServer pid=3197) INFO 02-05 19:01:58 [api_server.py:1272] vLLM API server version 0.14.1
(APIServer pid=3197) INFO 02-05 19:01:58 [utils.py:263] non-default args: {'model_tag': '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', 'chat_template_content_format': 'openai', 'model': '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', 'dtype': 'bfloat16', 'allowed_local_media_path': '/app', 'max_model_len': 65536, 'enforce_eager': True, 'served_model_name': ['vibevoice'], 'tensor_parallel_size': 4, 'gpu_memory_utilization': 0.8, 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'max_num_seqs': 64, 'enable_chunked_prefill': True}
(APIServer pid=3197) INFO 02-05 19:01:58 [model.py:530] Resolved architecture: VibeVoiceForASRTraining
(APIServer pid=3197) INFO 02-05 19:01:58 [model.py:1866] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=3197) INFO 02-05 19:01:58 [model.py:1545] Using max model len 65536
(APIServer pid=3197) INFO 02-05 19:01:59 [scheduler.py:229] Chunked prefill is enabled with max_num_batched_tokens=32768.
(APIServer pid=3197) INFO 02-05 19:01:59 [vllm.py:630] Asynchronous scheduling is enabled.
(APIServer pid=3197) INFO 02-05 19:01:59 [vllm.py:637] Disabling NCCL for DP synchronization when using async scheduling.
(APIServer pid=3197) WARNING 02-05 19:01:59 [vllm.py:665] Enforce eager set, overriding optimization level to -O0
(APIServer pid=3197) INFO 02-05 19:01:59 [vllm.py:765] Cudagraph is disabled under eager mode
(APIServer pid=3197) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(EngineCore_DP0 pid=3921) INFO 02-05 19:02:09 [core.py:97] Initializing a V1 LLM engine (v0.14.1) with config: model='/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', speculative_config=None, tokenizer='/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=vibevoice, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': [], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [32768], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 
'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 0, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None}
(EngineCore_DP0 pid=3921) WARNING 02-05 19:02:09 [multiproc_executor.py:880] Reducing Torch parallelism from 56 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
INFO 02-05 19:02:19 [parallel_state.py:1214] world_size=4 rank=3 local_rank=3 distributed_init_method=tcp://127.0.0.1:33473 backend=nccl
The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
INFO 02-05 19:02:20 [parallel_state.py:1214] world_size=4 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:33473 backend=nccl
INFO 02-05 19:02:20 [parallel_state.py:1214] world_size=4 rank=2 local_rank=2 distributed_init_method=tcp://127.0.0.1:33473 backend=nccl
The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
INFO 02-05 19:02:21 [parallel_state.py:1214] world_size=4 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:33473 backend=nccl
INFO 02-05 19:02:45 [pynccl.py:111] vLLM is using nccl==2.27.5
WARNING 02-05 19:02:45 [symm_mem.py:67] SymmMemCommunicator: Device capability 8.9 not supported, communicator is not available.
WARNING 02-05 19:02:45 [symm_mem.py:67] SymmMemCommunicator: Device capability 8.9 not supported, communicator is not available.
WARNING 02-05 19:02:45 [symm_mem.py:67] SymmMemCommunicator: Device capability 8.9 not supported, communicator is not available.
WARNING 02-05 19:02:45 [symm_mem.py:67] SymmMemCommunicator: Device capability 8.9 not supported, communicator is not available.
WARNING 02-05 19:02:45 [custom_all_reduce.py:154] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 02-05 19:02:45 [custom_all_reduce.py:154] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 02-05 19:02:45 [custom_all_reduce.py:154] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 02-05 19:02:45 [custom_all_reduce.py:154] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
INFO 02-05 19:02:45 [parallel_state.py:1425] rank 0 in world size 4 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
INFO 02-05 19:02:45 [parallel_state.py:1425] rank 3 in world size 4 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 3, EP rank N/A
INFO 02-05 19:02:45 [parallel_state.py:1425] rank 2 in world size 4 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 2, EP rank N/A
INFO 02-05 19:02:45 [parallel_state.py:1425] rank 1 in world size 4 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 1, EP rank N/A
/usr/local/lib/python3.12/dist-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
warnings.warn(
/usr/local/lib/python3.12/dist-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
warnings.warn(
/usr/local/lib/python3.12/dist-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
warnings.warn(
(Worker_TP0 pid=4166) INFO 02-05 19:02:46 [gpu_model_runner.py:3808] Starting to load model /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944...
/usr/local/lib/python3.12/dist-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
warnings.warn(
(Worker_TP3 pid=4169) `torch_dtype` is deprecated! Use `dtype` instead!
(Worker_TP3 pid=4169) INFO 02-05 19:02:46 [vllm.py:630] Asynchronous scheduling is enabled.
(Worker_TP3 pid=4169) WARNING 02-05 19:02:46 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(Worker_TP3 pid=4169) INFO 02-05 19:02:46 [vllm.py:765] Cudagraph is disabled under eager mode
(Worker_TP1 pid=4167) `torch_dtype` is deprecated! Use `dtype` instead!
(Worker_TP1 pid=4167) INFO 02-05 19:02:46 [vllm.py:630] Asynchronous scheduling is enabled.
(Worker_TP1 pid=4167) WARNING 02-05 19:02:46 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(Worker_TP1 pid=4167) INFO 02-05 19:02:46 [vllm.py:765] Cudagraph is disabled under eager mode
(Worker_TP0 pid=4166) `torch_dtype` is deprecated! Use `dtype` instead!
(Worker_TP0 pid=4166) INFO 02-05 19:02:46 [vllm.py:630] Asynchronous scheduling is enabled.
(Worker_TP0 pid=4166) WARNING 02-05 19:02:46 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(Worker_TP0 pid=4166) INFO 02-05 19:02:46 [vllm.py:765] Cudagraph is disabled under eager mode
(Worker_TP2 pid=4168) `torch_dtype` is deprecated! Use `dtype` instead!
(Worker_TP2 pid=4168) INFO 02-05 19:02:47 [vllm.py:630] Asynchronous scheduling is enabled.
(Worker_TP2 pid=4168) WARNING 02-05 19:02:47 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(Worker_TP2 pid=4168) INFO 02-05 19:02:47 [vllm.py:765] Cudagraph is disabled under eager mode
(Worker_TP0 pid=4166) INFO 02-05 19:02:47 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')
(Worker_TP3 pid=4169) [VibeVoice] Converted acoustic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP3 pid=4169) [VibeVoice] Converted semantic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP3 pid=4169) [VibeVoice] Converted acoustic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP3 pid=4169) [VibeVoice] Converted semantic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP0 pid=4166) [VibeVoice] Converted acoustic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP2 pid=4168) [VibeVoice] Converted acoustic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP0 pid=4166) [VibeVoice] Converted semantic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP0 pid=4166) [VibeVoice] Converted acoustic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP0 pid=4166) [VibeVoice] Converted semantic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP1 pid=4167) [VibeVoice] Converted acoustic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP2 pid=4168) [VibeVoice] Converted semantic_tokenizer to torch.float32 (was torch.bfloat16)
Loading safetensors checkpoint shards: 0% Completed | 0/8 [00:00<?, ?it/s]
(Worker_TP2 pid=4168) [VibeVoice] Converted acoustic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP2 pid=4168) [VibeVoice] Converted semantic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP1 pid=4167) [VibeVoice] Converted semantic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP1 pid=4167) [VibeVoice] Converted acoustic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP1 pid=4167) [VibeVoice] Converted semantic_connector to torch.float32 (was torch.bfloat16)
Loading safetensors checkpoint shards: 12% Completed | 1/8 [00:00<00:01, 4.22it/s]
Loading safetensors checkpoint shards: 25% Completed | 2/8 [00:01<00:06, 1.10s/it]
Loading safetensors checkpoint shards: 38% Completed | 3/8 [00:02<00:03, 1.41it/s]
Loading safetensors checkpoint shards: 50% Completed | 4/8 [00:03<00:03, 1.25it/s]
Loading safetensors checkpoint shards: 75% Completed | 6/8 [00:03<00:00, 2.19it/s]
Loading safetensors checkpoint shards: 88% Completed | 7/8 [00:03<00:00, 2.66it/s]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:03<00:00, 2.96it/s]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:03<00:00, 2.09it/s]
(Worker_TP0 pid=4166)
(Worker_TP0 pid=4166) INFO 02-05 19:02:51 [default_loader.py:291] Loading weights took 3.85 seconds
(Worker_TP0 pid=4166) INFO 02-05 19:02:52 [gpu_model_runner.py:3905] Model loading took 7.53 GiB memory and 4.367733 seconds
(Worker_TP1 pid=4167) INFO 02-05 19:02:52 [gpu_model_runner.py:4715] Encoder cache will be initialized with a budget of 32768 tokens, and profiled with 145 audio items of the maximum feature size.
(Worker_TP0 pid=4166) INFO 02-05 19:02:52 [gpu_model_runner.py:4715] Encoder cache will be initialized with a budget of 32768 tokens, and profiled with 145 audio items of the maximum feature size.
(Worker_TP2 pid=4168) INFO 02-05 19:02:52 [gpu_model_runner.py:4715] Encoder cache will be initialized with a budget of 32768 tokens, and profiled with 145 audio items of the maximum feature size.
(Worker_TP3 pid=4169) INFO 02-05 19:02:52 [gpu_model_runner.py:4715] Encoder cache will be initialized with a budget of 32768 tokens, and profiled with 145 audio items of the maximum feature size.
(Worker_TP0 pid=4166) INFO 02-05 19:03:22 [gpu_worker.py:358] Available KV cache memory: 8.88 GiB
(EngineCore_DP0 pid=3921) INFO 02-05 19:03:23 [kv_cache_utils.py:1305] GPU KV cache size: 665,344 tokens
(EngineCore_DP0 pid=3921) INFO 02-05 19:03:23 [kv_cache_utils.py:1310] Maximum concurrency for 65,536 tokens per request: 10.15x
(EngineCore_DP0 pid=3921) INFO 02-05 19:03:23 [core.py:273] init engine (profile, create kv cache, warmup model) took 30.92 seconds
(EngineCore_DP0 pid=3921) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(EngineCore_DP0 pid=3921) /usr/local/lib/python3.12/dist-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
(EngineCore_DP0 pid=3921) warnings.warn(
(EngineCore_DP0 pid=3921) INFO 02-05 19:03:24 [vllm.py:630] Asynchronous scheduling is enabled.
(EngineCore_DP0 pid=3921) WARNING 02-05 19:03:24 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(EngineCore_DP0 pid=3921) INFO 02-05 19:03:24 [vllm.py:765] Cudagraph is disabled under eager mode
(APIServer pid=3197) INFO 02-05 19:03:24 [api_server.py:1014] Supported tasks: ['generate', 'transcription']
(APIServer pid=3197) INFO 02-05 19:03:24 [serving_chat.py:182] Warming up chat template processing...
(APIServer pid=3197) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(APIServer pid=3197) INFO 02-05 19:03:24 [serving_chat.py:218] Chat template warmup completed in 295.2ms
(APIServer pid=3197) INFO 02-05 19:03:24 [speech_to_text.py:138] Warming up audio preprocessing libraries...
(APIServer pid=3197) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Audio preprocessing warmup failed (non-fatal): %s. First request may experience higher latency.
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/speech_to_text.py", line 152, in _warmup_audio_preprocessing
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] processor = cached_processor_from_config(self.model_config)
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 251, in cached_processor_from_config
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] return cached_get_processor_without_dynamic_kwargs(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 210, in cached_get_processor_without_dynamic_kwargs
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] processor = cached_get_processor(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 155, in get_processor
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] raise TypeError(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] TypeError: Invalid type of HuggingFace processor. Expected type: <class 'transformers.processing_utils.ProcessorMixin'>, but found type: <class 'transformers.models.qwen2.tokenization_qwen2_fast.Qwen2TokenizerFast'>
(APIServer pid=3197) INFO 02-05 19:03:25 [speech_to_text.py:201] Warming up multimodal input processor...
(APIServer pid=3197) INFO 02-05 19:03:25 [speech_to_text.py:234] Input processor warmup completed in 0.00s
(APIServer pid=3197) INFO 02-05 19:03:25 [speech_to_text.py:138] Warming up audio preprocessing libraries...
(APIServer pid=3197) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Audio preprocessing warmup failed (non-fatal): %s. First request may experience higher latency.
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/speech_to_text.py", line 152, in _warmup_audio_preprocessing
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] processor = cached_processor_from_config(self.model_config)
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 251, in cached_processor_from_config
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] return cached_get_processor_without_dynamic_kwargs(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 210, in cached_get_processor_without_dynamic_kwargs
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] processor = cached_get_processor(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 155, in get_processor
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] raise TypeError(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] TypeError: Invalid type of HuggingFace processor. Expected type: <class 'transformers.processing_utils.ProcessorMixin'>, but found type: <class 'transformers.models.qwen2.tokenization_qwen2_fast.Qwen2TokenizerFast'>
(APIServer pid=3197) INFO 02-05 19:03:25 [speech_to_text.py:201] Warming up multimodal input processor...
(APIServer pid=3197) INFO 02-05 19:03:25 [speech_to_text.py:234] Input processor warmup completed in 0.00s
(APIServer pid=3197) INFO 02-05 19:03:25 [api_server.py:1346] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:38] Available routes are:
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /openapi.json, Methods: HEAD, GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /docs, Methods: HEAD, GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: HEAD, GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /redoc, Methods: HEAD, GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /pause, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /resume, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /is_paused, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/audio/translations, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /classify, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/embeddings, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /score, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/score, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /rerank, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/rerank, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v2/rerank, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /pooling, Methods: POST
(APIServer pid=3197) INFO: Started server process [3197]
(APIServer pid=3197) INFO: Waiting for application startup.
(APIServer pid=3197) INFO: Application startup complete.
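For reference, the failing warmup reduces to the type check in `vllm/transformers_utils/processor.py` shown in the traceback: vLLM asks for a full multimodal processor, but for this model only a tokenizer is returned, which is not a `ProcessorMixin` subclass. The sketch below reproduces that check with local stand-in classes (the names mirror the transformers classes in the error, but these are simplified stand-ins, not the real implementation):

```python
# Simplified stand-ins for the classes named in the traceback.
class ProcessorMixin:          # stand-in for transformers.processing_utils.ProcessorMixin
    pass

class Qwen2TokenizerFast:      # stand-in for the tokenizer class actually loaded
    pass

def get_processor(loaded):
    """Sketch of the check in vllm/transformers_utils/processor.py."""
    # vLLM's speech-to-text warmup expects a full processor (tokenizer +
    # feature extractor); a bare tokenizer fails this isinstance check.
    if not isinstance(loaded, ProcessorMixin):
        raise TypeError(
            "Invalid type of HuggingFace processor. "
            f"Expected type: ProcessorMixin, but found type: {type(loaded).__name__}"
        )
    return loaded

try:
    get_processor(Qwen2TokenizerFast())
except TypeError as e:
    print(e)  # same shape as the warmup error in the log above
```

This suggests the model repository lacks a `processor_config.json`/`preprocessor_config.json` that would let `AutoProcessor` build a composite processor, so loading falls back to the tokenizer alone and the audio-preprocessing warmup is skipped (non-fatal, but the first transcription request pays the warmup cost).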