
TypeError during audio preprocessing warmup: Invalid type of HuggingFace processor #231

@KnightLancelot

Description


When starting the service via /app/vllm_plugin/scripts/start_server.py, the following error occurs during startup. Although it is marked as "non-fatal," it indicates that the audio preprocessing warmup failed.

Error summary: vLLM expects a ProcessorMixin for audio preprocessing, but it receives a Qwen2TokenizerFast instance instead.

Stack Trace:

(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Audio preprocessing warmup failed (non-fatal): %s. First request may experience higher latency.
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/speech_to_text.py", line 152, in _warmup_audio_preprocessing
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     processor = cached_processor_from_config(self.model_config)
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 251, in cached_processor_from_config
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     return cached_get_processor_without_dynamic_kwargs(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 210, in cached_get_processor_without_dynamic_kwargs
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     processor = cached_get_processor(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 155, in get_processor
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     raise TypeError(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] TypeError: Invalid type of HuggingFace processor. Expected type: <class 'transformers.processing_utils.ProcessorMixin'>, but found type: <class 'transformers.models.qwen2.tokenization_qwen2_fast.Qwen2TokenizerFast'>
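For context, the check that raises this error can be sketched as follows. This is a minimal stand-in, assuming vllm/transformers_utils/processor.py simply rejects any loaded object that is not a ProcessorMixin; the two classes below are dummies for illustration, not the real transformers types:

```python
class ProcessorMixin:
    """Stand-in for transformers.processing_utils.ProcessorMixin."""

class Qwen2TokenizerFast:
    """Stand-in for the tokenizer that AutoProcessor fell back to."""

def validate_processor(processor):
    # Sketch of vLLM's get_processor check: reject anything that is a
    # plain tokenizer rather than a full multimodal processor.
    if not isinstance(processor, ProcessorMixin):
        raise TypeError(
            "Invalid type of HuggingFace processor. "
            f"Expected ProcessorMixin, found {type(processor).__name__}"
        )
    return processor

try:
    validate_processor(Qwen2TokenizerFast())
except TypeError as exc:
    print(exc)
```

In other words, the warmup does not fail inside the audio code itself; it fails because the object loaded for the model directory is the tokenizer, which never passes this isinstance check.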

How can this problem be solved?
Thanks.

Environment & versioning:

Docker image version: vllm/vllm-openai:v0.14.1
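One quick diagnostic (a hedged sketch, not a confirmed fix): AutoProcessor only returns a full ProcessorMixin when the model directory ships a processor config; when only tokenizer files are present, it falls back to the tokenizer, which would produce exactly this error. The helper below is hypothetical, and the filenames are the usual transformers conventions, assumed rather than verified for this model:

```python
import os

# Config filenames that let AutoProcessor build a real processor instead of
# falling back to the tokenizer (usual transformers conventions, assumed).
PROCESSOR_CONFIGS = ("preprocessor_config.json", "processor_config.json")

def has_processor_config(snapshot_dir: str) -> bool:
    """Return True if the snapshot directory contains a processor config."""
    return any(
        os.path.exists(os.path.join(snapshot_dir, name))
        for name in PROCESSOR_CONFIGS
    )
```

Running this against the snapshot path shown in the log below would confirm whether the downloaded model is missing the processor config that the warmup needs.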

The complete log of starting the service is as follows:

============================================================
  VibeVoice vLLM ASR Server - One-Click Deployment
============================================================

============================================================
  Updating package list
============================================================

Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease                                                                                              
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease                                                              
Hit:4 http://archive.ubuntu.com/ubuntu jammy-updates InRelease                                                                
Hit:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease                           
Hit:6 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease               
Hit:1 https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Reading package lists... Done                  

============================================================
  Installing FFmpeg and audio libraries
============================================================

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libsndfile1 is already the newest version (1.0.31-2ubuntu0.2).
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 34 not upgraded.

============================================================
  Installing VibeVoice with vLLM support
============================================================

Obtaining file:///app
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: torch in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (2.9.1+cu129)
Requirement already satisfied: transformers<5.0.0,>=4.51.3 in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (4.57.6)
Requirement already satisfied: accelerate in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (1.12.0)
Requirement already satisfied: llvmlite>=0.40.0 in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.44.0)
Requirement already satisfied: numba>=0.57.0 in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.61.2)
Requirement already satisfied: diffusers in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.36.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (4.67.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (2.2.6)
Requirement already satisfied: scipy in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (1.17.0)
Requirement already satisfied: librosa in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.11.0)
Requirement already satisfied: ml-collections in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (1.1.0)
Requirement already satisfied: absl-py in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (2.4.0)
Requirement already satisfied: gradio in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (6.5.1)
Requirement already satisfied: av in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (16.1.0)
Requirement already satisfied: aiortc in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (1.14.0)
Requirement already satisfied: uvicorn[standard] in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.40.0)
Requirement already satisfied: fastapi in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.128.0)
Requirement already satisfied: pydub in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (0.25.1)
Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (from vibevoice==1.0.0) (2.32.5)
WARNING: vibevoice 1.0.0 does not provide the extra 'vllm'
Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (3.20.3)
Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (0.36.0)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (26.0)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (6.0.3)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (2026.1.15)
Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (0.22.2)
Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (0.7.0)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<1.0,>=0.34.0->transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (2026.1.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<1.0,>=0.34.0->transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (4.15.0)
Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<1.0,>=0.34.0->transformers<5.0.0,>=4.51.3->vibevoice==1.0.0) (1.2.0)
Requirement already satisfied: psutil in /usr/local/lib/python3.12/dist-packages (from accelerate->vibevoice==1.0.0) (7.2.1)
Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (80.10.1)
Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (3.6.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (3.1.6)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.9.86 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.86)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.9.79 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.79)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.9.79 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.79)
Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (9.10.2.21)
Requirement already satisfied: nvidia-cublas-cu12==12.9.1.4 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.1.4)
Requirement already satisfied: nvidia-cufft-cu12==11.4.1.4 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (11.4.1.4)
Requirement already satisfied: nvidia-curand-cu12==10.3.10.19 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (10.3.10.19)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.5.82 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (11.7.5.82)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.10.65 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.5.10.65)
Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (0.7.1)
Requirement already satisfied: nvidia-nccl-cu12==2.27.5 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (2.27.5)
Requirement already satisfied: nvidia-nvshmem-cu12==3.3.20 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (3.3.20)
Requirement already satisfied: nvidia-nvtx-cu12==12.9.79 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.79)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.9.86 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (12.9.86)
Requirement already satisfied: nvidia-cufile-cu12==1.14.1.1 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (1.14.1.1)
Requirement already satisfied: triton==3.5.1 in /usr/local/lib/python3.12/dist-packages (from torch->vibevoice==1.0.0) (3.5.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch->vibevoice==1.0.0) (1.3.0)
Requirement already satisfied: aioice<1.0.0,>=0.10.1 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (0.10.2)
Requirement already satisfied: cryptography>=44.0.0 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (46.0.3)
Requirement already satisfied: google-crc32c>=1.1 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (1.8.0)
Requirement already satisfied: pyee>=13.0.0 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (13.0.0)
Requirement already satisfied: pylibsrtp>=0.10.0 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (1.0.0)
Requirement already satisfied: pyopenssl>=25.0.0 in /usr/local/lib/python3.12/dist-packages (from aiortc->vibevoice==1.0.0) (25.3.0)
Requirement already satisfied: dnspython>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from aioice<1.0.0,>=0.10.1->aiortc->vibevoice==1.0.0) (2.8.0)
Requirement already satisfied: ifaddr>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aioice<1.0.0,>=0.10.1->aiortc->vibevoice==1.0.0) (0.2.0)
Requirement already satisfied: cffi>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from cryptography>=44.0.0->aiortc->vibevoice==1.0.0) (2.0.0)
Requirement already satisfied: pycparser in /usr/local/lib/python3.12/dist-packages (from cffi>=2.0.0->cryptography>=44.0.0->aiortc->vibevoice==1.0.0) (3.0)
Requirement already satisfied: importlib_metadata in /usr/lib/python3/dist-packages (from diffusers->vibevoice==1.0.0) (4.6.4)
Requirement already satisfied: httpx<1.0.0 in /usr/local/lib/python3.12/dist-packages (from diffusers->vibevoice==1.0.0) (0.28.1)
Requirement already satisfied: Pillow in /usr/local/lib/python3.12/dist-packages (from diffusers->vibevoice==1.0.0) (12.1.0)
Requirement already satisfied: anyio in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->diffusers->vibevoice==1.0.0) (4.12.1)
Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->diffusers->vibevoice==1.0.0) (2026.1.4)
Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->diffusers->vibevoice==1.0.0) (1.0.9)
Requirement already satisfied: idna in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->diffusers->vibevoice==1.0.0) (3.11)
Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx<1.0.0->diffusers->vibevoice==1.0.0) (0.16.0)
Requirement already satisfied: starlette<0.51.0,>=0.40.0 in /usr/local/lib/python3.12/dist-packages (from fastapi->vibevoice==1.0.0) (0.50.0)
Requirement already satisfied: pydantic>=2.7.0 in /usr/local/lib/python3.12/dist-packages (from fastapi->vibevoice==1.0.0) (2.12.5)
Requirement already satisfied: annotated-doc>=0.0.2 in /usr/local/lib/python3.12/dist-packages (from fastapi->vibevoice==1.0.0) (0.0.4)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.7.0->fastapi->vibevoice==1.0.0) (0.7.0)
Requirement already satisfied: pydantic-core==2.41.5 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.7.0->fastapi->vibevoice==1.0.0) (2.41.5)
Requirement already satisfied: typing-inspection>=0.4.2 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.7.0->fastapi->vibevoice==1.0.0) (0.4.2)
Requirement already satisfied: aiofiles<25.0,>=22.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (24.1.0)
Requirement already satisfied: brotli>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (1.2.0)
Requirement already satisfied: ffmpy in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (1.0.0)
Requirement already satisfied: gradio-client==2.0.3 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (2.0.3)
Requirement already satisfied: groovy~=0.1 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (0.1.2)
Requirement already satisfied: markupsafe<4.0,>=2.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (3.0.3)
Requirement already satisfied: orjson~=3.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (3.11.7)
Requirement already satisfied: pandas<4.0,>=1.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (3.0.0)
Requirement already satisfied: python-multipart>=0.0.18 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (0.0.21)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (2025.2)
Requirement already satisfied: safehttpx<0.2.0,>=0.1.7 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (0.1.7)
Requirement already satisfied: semantic-version~=2.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (2.10.0)
Requirement already satisfied: tomlkit<0.14.0,>=0.12.0 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (0.13.3)
Requirement already satisfied: typer<1.0,>=0.12 in /usr/local/lib/python3.12/dist-packages (from gradio->vibevoice==1.0.0) (0.21.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas<4.0,>=1.0->gradio->vibevoice==1.0.0) (2.9.0.post0)
Requirement already satisfied: click>=8.0.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (8.3.1)
Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (1.5.4)
Requirement already satisfied: rich>=10.11.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (14.2.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas<4.0,>=1.0->gradio->vibevoice==1.0.0) (1.17.0)
Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.12/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (4.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.12/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (2.19.2)
Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.12/dist-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->typer<1.0,>=0.12->gradio->vibevoice==1.0.0) (0.1.2)
Requirement already satisfied: audioread>=2.1.9 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (3.1.0)
Requirement already satisfied: scikit-learn>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (1.8.0)
Requirement already satisfied: joblib>=1.0 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (1.5.3)
Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (5.2.1)
Requirement already satisfied: soundfile>=0.12.1 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (0.13.1)
Requirement already satisfied: pooch>=1.1 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (1.9.0)
Requirement already satisfied: soxr>=0.3.2 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (1.0.0)
Requirement already satisfied: lazy_loader>=0.1 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (0.4)
Requirement already satisfied: msgpack>=1.0 in /usr/local/lib/python3.12/dist-packages (from librosa->vibevoice==1.0.0) (1.1.2)
Requirement already satisfied: platformdirs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from pooch>=1.1->librosa->vibevoice==1.0.0) (4.5.1)
Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests->vibevoice==1.0.0) (3.4.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests->vibevoice==1.0.0) (2.6.3)
Requirement already satisfied: threadpoolctl>=3.2.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn>=1.1.0->librosa->vibevoice==1.0.0) (3.6.0)
Requirement already satisfied: httptools>=0.6.3 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]->vibevoice==1.0.0) (0.7.1)
Requirement already satisfied: python-dotenv>=0.13 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]->vibevoice==1.0.0) (1.2.1)
Requirement already satisfied: uvloop>=0.15.1 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]->vibevoice==1.0.0) (0.22.1)
Requirement already satisfied: watchfiles>=0.13 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]->vibevoice==1.0.0) (1.1.1)
Requirement already satisfied: websockets>=10.4 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]->vibevoice==1.0.0) (16.0)
Building wheels for collected packages: vibevoice
  Building editable for vibevoice (pyproject.toml) ... done
  Created wheel for vibevoice: filename=vibevoice-1.0.0-0.editable-py3-none-any.whl size=8169 sha256=05af9cd4d66b15176e5d2d4e4358dc446d980866b7134d2224a4c8cee4c0ced3
  Stored in directory: /tmp/pip-ephem-wheel-cache-vrew19un/wheels/54/1b/b7/aa63e25c8f14f4f2ae7b04e6097bdecb770e455c5c1ee0a600
Successfully built vibevoice
Installing collected packages: vibevoice
  Attempting uninstall: vibevoice
    Found existing installation: vibevoice 1.0.0
    Uninstalling vibevoice-1.0.0:
      Successfully uninstalled vibevoice-1.0.0
Successfully installed vibevoice-1.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 25.3 -> 26.0.1
[notice] To update, run: python3 -m pip install --upgrade pip
args.model='microsoft/VibeVoice-ASR'

============================================================
  Downloading model: microsoft/VibeVoice-ASR
============================================================

Fetching 17 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 149796.57it/s]

============================================================
  ✅ Model downloaded successfully!
  📁 Path: /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944
============================================================


============================================================
  Generating tokenizer files
============================================================

=== Generating VibeVoice tokenizer files to /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944 ===

Downloading vocab.json from Qwen/Qwen2.5-7B...
/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py:979: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`.
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
  warnings.warn(
Downloading merges.txt from Qwen/Qwen2.5-7B...
Downloading tokenizer.json from Qwen/Qwen2.5-7B...
Downloading tokenizer_config.json from Qwen/Qwen2.5-7B...
Patched /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/tokenizer_config.json
Patched /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/tokenizer.json
Generated /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/added_tokens.json
Generated /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/special_tokens_map.json

✅ All 6 tokenizer files generated in /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944

============================================================
  Starting vLLM server on port 8000
============================================================

(APIServer pid=3197) INFO 02-05 19:01:58 [api_server.py:1272] vLLM API server version 0.14.1
(APIServer pid=3197) INFO 02-05 19:01:58 [utils.py:263] non-default args: {'model_tag': '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', 'chat_template_content_format': 'openai', 'model': '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', 'dtype': 'bfloat16', 'allowed_local_media_path': '/app', 'max_model_len': 65536, 'enforce_eager': True, 'served_model_name': ['vibevoice'], 'tensor_parallel_size': 4, 'gpu_memory_utilization': 0.8, 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'max_num_seqs': 64, 'enable_chunked_prefill': True}
(APIServer pid=3197) INFO 02-05 19:01:58 [model.py:530] Resolved architecture: VibeVoiceForASRTraining
(APIServer pid=3197) INFO 02-05 19:01:58 [model.py:1866] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=3197) INFO 02-05 19:01:58 [model.py:1545] Using max model len 65536
(APIServer pid=3197) INFO 02-05 19:01:59 [scheduler.py:229] Chunked prefill is enabled with max_num_batched_tokens=32768.
(APIServer pid=3197) INFO 02-05 19:01:59 [vllm.py:630] Asynchronous scheduling is enabled.
(APIServer pid=3197) INFO 02-05 19:01:59 [vllm.py:637] Disabling NCCL for DP synchronization when using async scheduling.
(APIServer pid=3197) WARNING 02-05 19:01:59 [vllm.py:665] Enforce eager set, overriding optimization level to -O0
(APIServer pid=3197) INFO 02-05 19:01:59 [vllm.py:765] Cudagraph is disabled under eager mode
(APIServer pid=3197) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(EngineCore_DP0 pid=3921) INFO 02-05 19:02:09 [core.py:97] Initializing a V1 LLM engine (v0.14.1) with config: model='/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', speculative_config=None, tokenizer='/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=vibevoice, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': [], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [32768], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 
'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 0, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None}
(EngineCore_DP0 pid=3921) WARNING 02-05 19:02:09 [multiproc_executor.py:880] Reducing Torch parallelism from 56 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
INFO 02-05 19:02:19 [parallel_state.py:1214] world_size=4 rank=3 local_rank=3 distributed_init_method=tcp://127.0.0.1:33473 backend=nccl
The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
INFO 02-05 19:02:20 [parallel_state.py:1214] world_size=4 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:33473 backend=nccl
INFO 02-05 19:02:20 [parallel_state.py:1214] world_size=4 rank=2 local_rank=2 distributed_init_method=tcp://127.0.0.1:33473 backend=nccl
The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
INFO 02-05 19:02:21 [parallel_state.py:1214] world_size=4 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:33473 backend=nccl
INFO 02-05 19:02:45 [pynccl.py:111] vLLM is using nccl==2.27.5
WARNING 02-05 19:02:45 [symm_mem.py:67] SymmMemCommunicator: Device capability 8.9 not supported, communicator is not available.
WARNING 02-05 19:02:45 [symm_mem.py:67] SymmMemCommunicator: Device capability 8.9 not supported, communicator is not available.
WARNING 02-05 19:02:45 [symm_mem.py:67] SymmMemCommunicator: Device capability 8.9 not supported, communicator is not available.
WARNING 02-05 19:02:45 [symm_mem.py:67] SymmMemCommunicator: Device capability 8.9 not supported, communicator is not available.
WARNING 02-05 19:02:45 [custom_all_reduce.py:154] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 02-05 19:02:45 [custom_all_reduce.py:154] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 02-05 19:02:45 [custom_all_reduce.py:154] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 02-05 19:02:45 [custom_all_reduce.py:154] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
INFO 02-05 19:02:45 [parallel_state.py:1425] rank 0 in world size 4 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
INFO 02-05 19:02:45 [parallel_state.py:1425] rank 3 in world size 4 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 3, EP rank N/A
INFO 02-05 19:02:45 [parallel_state.py:1425] rank 2 in world size 4 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 2, EP rank N/A
INFO 02-05 19:02:45 [parallel_state.py:1425] rank 1 in world size 4 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 1, EP rank N/A
/usr/local/lib/python3.12/dist-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
  warnings.warn(
/usr/local/lib/python3.12/dist-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
  warnings.warn(
/usr/local/lib/python3.12/dist-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
  warnings.warn(
(Worker_TP0 pid=4166) INFO 02-05 19:02:46 [gpu_model_runner.py:3808] Starting to load model /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944...
/usr/local/lib/python3.12/dist-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
  warnings.warn(
(Worker_TP3 pid=4169) `torch_dtype` is deprecated! Use `dtype` instead!
(Worker_TP3 pid=4169) INFO 02-05 19:02:46 [vllm.py:630] Asynchronous scheduling is enabled.
(Worker_TP3 pid=4169) WARNING 02-05 19:02:46 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(Worker_TP3 pid=4169) INFO 02-05 19:02:46 [vllm.py:765] Cudagraph is disabled under eager mode
(Worker_TP1 pid=4167) `torch_dtype` is deprecated! Use `dtype` instead!
(Worker_TP1 pid=4167) INFO 02-05 19:02:46 [vllm.py:630] Asynchronous scheduling is enabled.
(Worker_TP1 pid=4167) WARNING 02-05 19:02:46 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(Worker_TP1 pid=4167) INFO 02-05 19:02:46 [vllm.py:765] Cudagraph is disabled under eager mode
(Worker_TP0 pid=4166) `torch_dtype` is deprecated! Use `dtype` instead!
(Worker_TP0 pid=4166) INFO 02-05 19:02:46 [vllm.py:630] Asynchronous scheduling is enabled.
(Worker_TP0 pid=4166) WARNING 02-05 19:02:46 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(Worker_TP0 pid=4166) INFO 02-05 19:02:46 [vllm.py:765] Cudagraph is disabled under eager mode
(Worker_TP2 pid=4168) `torch_dtype` is deprecated! Use `dtype` instead!
(Worker_TP2 pid=4168) INFO 02-05 19:02:47 [vllm.py:630] Asynchronous scheduling is enabled.
(Worker_TP2 pid=4168) WARNING 02-05 19:02:47 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(Worker_TP2 pid=4168) INFO 02-05 19:02:47 [vllm.py:765] Cudagraph is disabled under eager mode
(Worker_TP0 pid=4166) INFO 02-05 19:02:47 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')
(Worker_TP3 pid=4169) [VibeVoice] Converted acoustic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP3 pid=4169) [VibeVoice] Converted semantic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP3 pid=4169) [VibeVoice] Converted acoustic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP3 pid=4169) [VibeVoice] Converted semantic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP0 pid=4166) [VibeVoice] Converted acoustic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP2 pid=4168) [VibeVoice] Converted acoustic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP0 pid=4166) [VibeVoice] Converted semantic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP0 pid=4166) [VibeVoice] Converted acoustic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP0 pid=4166) [VibeVoice] Converted semantic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP1 pid=4167) [VibeVoice] Converted acoustic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP2 pid=4168) [VibeVoice] Converted semantic_tokenizer to torch.float32 (was torch.bfloat16)
Loading safetensors checkpoint shards:   0% Completed | 0/8 [00:00<?, ?it/s]
(Worker_TP2 pid=4168) [VibeVoice] Converted acoustic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP2 pid=4168) [VibeVoice] Converted semantic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP1 pid=4167) [VibeVoice] Converted semantic_tokenizer to torch.float32 (was torch.bfloat16)
(Worker_TP1 pid=4167) [VibeVoice] Converted acoustic_connector to torch.float32 (was torch.bfloat16)
(Worker_TP1 pid=4167) [VibeVoice] Converted semantic_connector to torch.float32 (was torch.bfloat16)
Loading safetensors checkpoint shards:  12% Completed | 1/8 [00:00<00:01,  4.22it/s]
Loading safetensors checkpoint shards:  25% Completed | 2/8 [00:01<00:06,  1.10s/it]
Loading safetensors checkpoint shards:  38% Completed | 3/8 [00:02<00:03,  1.41it/s]
Loading safetensors checkpoint shards:  50% Completed | 4/8 [00:03<00:03,  1.25it/s]
Loading safetensors checkpoint shards:  75% Completed | 6/8 [00:03<00:00,  2.19it/s]
Loading safetensors checkpoint shards:  88% Completed | 7/8 [00:03<00:00,  2.66it/s]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:03<00:00,  2.96it/s]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:03<00:00,  2.09it/s]
(Worker_TP0 pid=4166) 
(Worker_TP0 pid=4166) INFO 02-05 19:02:51 [default_loader.py:291] Loading weights took 3.85 seconds
(Worker_TP0 pid=4166) INFO 02-05 19:02:52 [gpu_model_runner.py:3905] Model loading took 7.53 GiB memory and 4.367733 seconds
(Worker_TP1 pid=4167) INFO 02-05 19:02:52 [gpu_model_runner.py:4715] Encoder cache will be initialized with a budget of 32768 tokens, and profiled with 145 audio items of the maximum feature size.
(Worker_TP0 pid=4166) INFO 02-05 19:02:52 [gpu_model_runner.py:4715] Encoder cache will be initialized with a budget of 32768 tokens, and profiled with 145 audio items of the maximum feature size.
(Worker_TP2 pid=4168) INFO 02-05 19:02:52 [gpu_model_runner.py:4715] Encoder cache will be initialized with a budget of 32768 tokens, and profiled with 145 audio items of the maximum feature size.
(Worker_TP3 pid=4169) INFO 02-05 19:02:52 [gpu_model_runner.py:4715] Encoder cache will be initialized with a budget of 32768 tokens, and profiled with 145 audio items of the maximum feature size.
(Worker_TP0 pid=4166) INFO 02-05 19:03:22 [gpu_worker.py:358] Available KV cache memory: 8.88 GiB
(EngineCore_DP0 pid=3921) INFO 02-05 19:03:23 [kv_cache_utils.py:1305] GPU KV cache size: 665,344 tokens
(EngineCore_DP0 pid=3921) INFO 02-05 19:03:23 [kv_cache_utils.py:1310] Maximum concurrency for 65,536 tokens per request: 10.15x
(EngineCore_DP0 pid=3921) INFO 02-05 19:03:23 [core.py:273] init engine (profile, create kv cache, warmup model) took 30.92 seconds
(EngineCore_DP0 pid=3921) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(EngineCore_DP0 pid=3921) /usr/local/lib/python3.12/dist-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
(EngineCore_DP0 pid=3921)   warnings.warn(
(EngineCore_DP0 pid=3921) INFO 02-05 19:03:24 [vllm.py:630] Asynchronous scheduling is enabled.
(EngineCore_DP0 pid=3921) WARNING 02-05 19:03:24 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(EngineCore_DP0 pid=3921) INFO 02-05 19:03:24 [vllm.py:765] Cudagraph is disabled under eager mode
(APIServer pid=3197) INFO 02-05 19:03:24 [api_server.py:1014] Supported tasks: ['generate', 'transcription']
(APIServer pid=3197) INFO 02-05 19:03:24 [serving_chat.py:182] Warming up chat template processing...
(APIServer pid=3197) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(APIServer pid=3197) INFO 02-05 19:03:24 [serving_chat.py:218] Chat template warmup completed in 295.2ms
(APIServer pid=3197) INFO 02-05 19:03:24 [speech_to_text.py:138] Warming up audio preprocessing libraries...
(APIServer pid=3197) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Audio preprocessing warmup failed (non-fatal): %s. First request may experience higher latency.
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/speech_to_text.py", line 152, in _warmup_audio_preprocessing
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     processor = cached_processor_from_config(self.model_config)
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 251, in cached_processor_from_config
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     return cached_get_processor_without_dynamic_kwargs(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 210, in cached_get_processor_without_dynamic_kwargs
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     processor = cached_get_processor(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 155, in get_processor
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     raise TypeError(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] TypeError: Invalid type of HuggingFace processor. Expected type: <class 'transformers.processing_utils.ProcessorMixin'>, but found type: <class 'transformers.models.qwen2.tokenization_qwen2_fast.Qwen2TokenizerFast'>
(APIServer pid=3197) INFO 02-05 19:03:25 [speech_to_text.py:201] Warming up multimodal input processor...
(APIServer pid=3197) INFO 02-05 19:03:25 [speech_to_text.py:234] Input processor warmup completed in 0.00s
(APIServer pid=3197) INFO 02-05 19:03:25 [speech_to_text.py:138] Warming up audio preprocessing libraries...
(APIServer pid=3197) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Audio preprocessing warmup failed (non-fatal): %s. First request may experience higher latency.
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/speech_to_text.py", line 152, in _warmup_audio_preprocessing
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     processor = cached_processor_from_config(self.model_config)
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 251, in cached_processor_from_config
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     return cached_get_processor_without_dynamic_kwargs(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 210, in cached_get_processor_without_dynamic_kwargs
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     processor = cached_get_processor(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 155, in get_processor
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177]     raise TypeError(
(APIServer pid=3197) ERROR 02-05 19:03:25 [speech_to_text.py:177] TypeError: Invalid type of HuggingFace processor. Expected type: <class 'transformers.processing_utils.ProcessorMixin'>, but found type: <class 'transformers.models.qwen2.tokenization_qwen2_fast.Qwen2TokenizerFast'>
(APIServer pid=3197) INFO 02-05 19:03:25 [speech_to_text.py:201] Warming up multimodal input processor...
(APIServer pid=3197) INFO 02-05 19:03:25 [speech_to_text.py:234] Input processor warmup completed in 0.00s
(APIServer pid=3197) INFO 02-05 19:03:25 [api_server.py:1346] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:38] Available routes are:
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /openapi.json, Methods: HEAD, GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /docs, Methods: HEAD, GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: HEAD, GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /redoc, Methods: HEAD, GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /pause, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /resume, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /is_paused, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/audio/translations, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /classify, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/embeddings, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /score, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/score, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /rerank, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v1/rerank, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /v2/rerank, Methods: POST
(APIServer pid=3197) INFO 02-05 19:03:25 [launcher.py:46] Route: /pooling, Methods: POST
(APIServer pid=3197) INFO:     Started server process [3197]
(APIServer pid=3197) INFO:     Waiting for application startup.
(APIServer pid=3197) INFO:     Application startup complete.
