Fix for long text inputs, send text as payload, not url parameter. #149

Open
bioluminesceme wants to merge 1 commit into microsoft:main from bioluminesceme:fix/websocket-url-length-limit

bioluminesceme commented Dec 8, 2025

If you paste too much text into the web UI of VibeVoice-Realtime TTS, it fails with a 400 Bad Request.
Fix: send all the text as a WebSocket message payload, not as a URL parameter. With that change, streaming long text works again.
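
To make the difference concrete, here is a minimal Python sketch comparing the two request shapes. The function names `query_string_target` and `payload_target` are made up for illustration, and the thread does not state what length limit the server or any proxy actually enforces, so the sketch only shows how each approach scales with the input.

```python
import json
from urllib.parse import urlencode


def query_string_target(text, cfg=1.5, steps=5, voice="de-Spk0_man"):
    """Old style: every setting, including the text, is packed into the URL."""
    params = {"text": text, "cfg": f"{cfg:.3f}", "steps": steps, "voice": voice}
    return "/stream?" + urlencode(params)


def payload_target(text, cfg=1.5, steps=5, voice="de-Spk0_man"):
    """New style: the URL is fixed and the settings travel in the first message."""
    message = json.dumps({"text": text, "cfg": cfg, "steps": steps, "voice": voice})
    return "/stream", message


# An arbitrary long input, purely for illustration.
long_text = "All work and no play makes for a long input. " * 500

old_url = query_string_target(long_text)
new_url, first_message = payload_target(long_text)

print(f"old request target: {len(old_url)} characters")
print(f"new request target: {len(new_url)} characters")
print(f"first message:      {len(first_message)} characters")
```

The old request target grows with the text, while the new one stays at `/stream` regardless of input size; the text moves into the message body instead.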

I used Claude Code for part of this.

@bioluminesceme
Author

@microsoft-github-policy-service agree

@YaoyaoChang
Collaborator

I get an error on my side:

APEX FusedRMSNorm not available, using native implementation
INFO:     Started server process [2850394]
INFO:     Waiting for application startup.
[startup] Loading processor from microsoft/VibeVoice-Realtime-0.5B
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'Qwen2Tokenizer'. 
The class this function is called from is 'VibeVoiceTextTokenizerFast'.
Using device: cuda, torch_dtype: torch.bfloat16, attn_implementation: flash_attention_2
[startup] Found 25 voice presets
[startup] Using fallback voice preset: de-Spk0_man
[startup] Loading voice preset de-Spk0_man from /data/yaoyaochang/code/speech/microsoft/microsoft/VibeVoice/demo/voices/streaming_model/de-Spk0_man.pt
[startup] Loading prefilled prompt from /data/yaoyaochang/code/speech/microsoft/microsoft/VibeVoice/demo/voices/streaming_model/de-Spk0_man.pt
[startup] Model ready.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit)
INFO:     127.0.0.1:57894 - "WebSocket /stream?text=Enter+your+text+here+and+click+%22Start%22+to+instantly+hear+the+VibeVoice-Realtime+TTS+output+audio.&cfg=1.500&steps=5&voice=de-Spk0_man" [accepted]
INFO:     connection open
Error receiving config: (<CloseCode.NO_STATUS_RCVD: 1005>, '')
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/data/yaoyaochang/code/speech/microsoft/microsoft/VibeVoice/demo/web/app.py", line 369, in websocket_stream
    config_message = await ws.receive_text()
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/websockets.py", line 120, in receive_text
    self._raise_on_disconnect(message)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/websockets.py", line 114, in _raise_on_disconnect
    raise WebSocketDisconnect(message["code"], message.get("reason"))
starlette.websockets.WebSocketDisconnect: (<CloseCode.NO_STATUS_RCVD: 1005>, '')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/uvicorn/protocols/websockets/websockets_impl.py", line 244, in run_asgi
    result = await self.app(self.scope, self.asgi_receive, self.asgi_send)  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/middleware/errors.py", line 151, in __call__
    await self.app(scope, receive, send)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/routing.py", line 716, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/routing.py", line 736, in app
    await route.handle(scope, receive, send)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/routing.py", line 364, in handle
    await self.app(scope, receive, send)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/routing.py", line 97, in app
    await wrap_app_handling_exceptions(app, session)(scope, receive, send)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/routing.py", line 95, in app
    await func(session)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/fastapi/routing.py", line 384, in app
    await dependant.call(**solved_result.values)
  File "/data/yaoyaochang/code/speech/microsoft/microsoft/VibeVoice/demo/web/app.py", line 373, in websocket_stream
    await ws.close(code=1003, reason="Invalid configuration")
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/websockets.py", line 181, in close
    await self.send({"type": "websocket.close", "code": code, "reason": reason or ""})
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/websockets.py", line 86, in send
    await self._send(message)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/starlette/_exception_handler.py", line 39, in sender
    await send(message)
  File "/data/yaoyaochang/miniconda3/envs/VibeVoiceStreaming/lib/python3.11/site-packages/uvicorn/protocols/websockets/websockets_impl.py", line 360, in asgi_send
    raise RuntimeError(msg % message_type)
RuntimeError: Unexpected ASGI message 'websocket.close', after sending 'websocket.close' or response already completed.
INFO:     connection closed
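
The traceback above shows two steps: `receive_text()` raises `WebSocketDisconnect`, and the handler's `except` branch then calls `ws.close(...)` on a connection that has already ended, which raises the `RuntimeError`. One possible way to avoid the second error is to treat a disconnect separately from an invalid payload. The sketch below is framework-free so it can run on its own: `ClientDisconnected` and `FakeWebSocket` are hypothetical stand-ins for Starlette's `WebSocketDisconnect` and the real connection object, and `receive_config` is an illustrative name, not code from the PR.

```python
import asyncio
import json


class ClientDisconnected(Exception):
    """Hypothetical stand-in for starlette.websockets.WebSocketDisconnect."""


class FakeWebSocket:
    """Hypothetical in-memory connection used only to exercise the handler."""

    def __init__(self, incoming):
        self._incoming = list(incoming)
        self.closed_by_peer = False
        self.close_calls = []

    async def receive_text(self):
        if not self._incoming:
            # The peer went away without sending anything.
            self.closed_by_peer = True
            raise ClientDisconnected()
        return self._incoming.pop(0)

    async def close(self, code, reason=""):
        if self.closed_by_peer:
            # Mimics the RuntimeError in the log: the connection is finished.
            raise RuntimeError("close() called after the connection ended")
        self.close_calls.append((code, reason))


async def receive_config(ws):
    """Return the parsed first message, or None if there is nothing usable."""
    try:
        raw = await ws.receive_text()
    except ClientDisconnected:
        return None  # peer is gone, so there is nothing left to close
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        await ws.close(code=1003, reason="Invalid configuration")
        return None


gone = FakeWebSocket([])
print(asyncio.run(receive_config(gone)), gone.close_calls)

bad = FakeWebSocket(["not json"])
print(asyncio.run(receive_config(bad)), bad.close_calls)
```

With this split, a client that connects and leaves without sending a message no longer triggers a second exception, while a malformed payload is still rejected with close code 1003.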

@bioluminesceme
Author

bioluminesceme commented Dec 9, 2025

That looks like you are still using the old code, which sends the text as a URL parameter.
With the same voice, I get this:

INFO:     Started server process [2784]
INFO:     Waiting for application startup.
[startup] Loading processor from models/VibeVoice-Realtime-0.5B
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'Qwen2Tokenizer'. 
The class this function is called from is 'VibeVoiceTextTokenizerFast'.
Using device: cuda, torch_dtype: torch.bfloat16, attn_implementation: flash_attention_2
[startup] Found 25 voice presets
[startup] Using fallback voice preset: de-Spk0_man
[startup] Loading voice preset de-Spk0_man from F:\VoiceCloning\VibeVoiceNEW\demo\voices\streaming_model\de-Spk0_man.pt
[startup] Loading prefilled prompt from F:\VoiceCloning\VibeVoiceNEW\demo\voices\streaming_model\de-Spk0_man.pt
[startup] Model ready.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:4243 (Press CTRL+C to quit)
INFO:     127.0.0.1:57159 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:57159 - "GET /config HTTP/1.1" 200 OK
INFO:     127.0.0.1:57159 - "GET /favicon.ico HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:57163 - "WebSocket /stream" [accepted]
INFO:     connection open
Client connected, text length=97, cfg=1.500, steps=5, voice=de-Spk0_man
WS handler exit

My URL is "WebSocket /stream", without the long ?text= query string after it. I think that query string is what causes the problem: when the text is long enough, the URL goes over the character limit.

@bioluminesceme
Author

If you have both the changed demo/web/app.py and demo/web/index.html, that URL should no longer have the parameters in it.

In index.html, this line

const wsUrl = `${location.origin.replace(/^http/, 'ws')}/stream?${params.toString()}`;

was changed to

const wsUrl = `${location.origin.replace(/^http/, 'ws')}/stream`;

Copilot AI left a comment

Pull request overview

This PR fixes 400 Bad Request errors when streaming long TTS text in the demo Web UI by moving the input text/config from WebSocket URL query parameters into an initial WebSocket message payload.

Changes:

  • Frontend: stop appending text/cfg/steps/voice as URL query params; send them as the first WebSocket JSON message after connect.
  • Backend: stop reading ws.query_params; instead receive_text() + json.loads() to obtain initial stream configuration.
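
The backend half of this change can be sketched without any web framework. In the sketch below, `FakeWebSocket` is a hypothetical in-memory stand-in that only provides `receive_text()`, and `read_stream_config` is an illustrative name; the keys `text`, `cfg`, `steps` and `voice` are the ones the PR reads from the first message.

```python
import asyncio
import json


class FakeWebSocket:
    """Hypothetical stand-in for the server-side connection object."""

    def __init__(self, incoming):
        self._incoming = list(incoming)
        self.query_params = {}  # empty: nothing travels in the URL any more

    async def receive_text(self):
        return self._incoming.pop(0)


async def read_stream_config(ws):
    """Read the first message and decode it as the stream configuration."""
    config = json.loads(await ws.receive_text())
    return {
        "text": config.get("text", ""),
        "cfg": config.get("cfg"),
        "steps": config.get("steps"),
        "voice": config.get("voice"),
    }


# What the frontend would send right after the connection opens.
first_message = json.dumps(
    {"text": "Hello", "cfg": 1.5, "steps": 5, "voice": "de-Spk0_man"}
)
config = asyncio.run(read_stream_config(FakeWebSocket([first_message])))
print(config)
```

The sketch assumes a well-formed JSON object; it does no validation of its own.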

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| demo/web/index.html | Sends TTS request configuration as a JSON WebSocket message instead of URL query params. |
| demo/web/app.py | Reads the initial JSON configuration message from the WebSocket rather than query params. |

Comment on lines +940 to +942
socket.onopen = () => {
  // Send configuration as first message after connection opens
  socket.send(JSON.stringify(config));
Copilot AI Feb 10, 2026

socket.onopen closes over the mutable global socket variable, but closeSocket() sets socket = null even when the WebSocket is still CONNECTING. If the user stops quickly (or other code calls closeSocket) around the time the connection opens, this handler can run with socket === null and throw when calling socket.send(...). Capture the WebSocket instance in a local const (or use the event.target) inside the handler to avoid races with socket being reassigned/nullified.

Suggested change
- socket.onopen = () => {
-   // Send configuration as first message after connection opens
-   socket.send(JSON.stringify(config));
+ socket.onopen = event => {
+   // Send configuration as first message after connection opens
+   const ws = event.target;
+   ws.send(JSON.stringify(config));
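
The same name-binding issue can be reproduced outside the browser. The following is a Python analogue rather than the page's JavaScript: `Connection`, `connect_fragile`, `connect_safe` and `close_socket` are hypothetical names, and the sketch only illustrates the difference between a handler that looks up a shared variable when it runs and one that holds on to its own instance.

```python
class Connection:
    """Hypothetical minimal connection with an on_open callback slot."""

    def __init__(self):
        self.sent = []
        self.on_open = None

    def send(self, data):
        self.sent.append(data)


socket = None  # shared, mutable, like the page's global variable


def connect_fragile(config):
    global socket
    socket = Connection()
    # The handler looks up the shared name when it runs, not when it is created.
    socket.on_open = lambda: socket.send(config)
    return socket


def connect_safe(config):
    global socket
    conn = Connection()
    socket = conn
    # The handler is bound to this specific instance.
    conn.on_open = lambda: conn.send(config)
    return conn


def close_socket():
    global socket
    socket = None


fragile = connect_fragile("config")
close_socket()  # the shared name is cleared before the handler fires
try:
    fragile.on_open()
except AttributeError as exc:
    print(f"fragile handler failed: {exc}")

safe = connect_safe("config")
close_socket()
safe.on_open()
print(f"safe handler sent: {safe.sent}")
```

Whether the configuration should still be sent after the user has stopped is a separate question; the sketch only shows why the handler no longer throws.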

Comment on lines +368 to +381
try:
    config_message = await ws.receive_text()
    config = json.loads(config_message)
except Exception as e:
    print(f"Error receiving config: {e}")
    await ws.close(code=1003, reason="Invalid configuration")
    return

text = config.get("text", "")
cfg_param = config.get("cfg")
steps_param = config.get("steps")
voice_param = config.get("voice")

print(f"Client connected, text length={len(text)}, cfg={cfg_param}, steps={steps_param}, voice={voice_param}")
Copilot AI Feb 10, 2026

After json.loads, config can be any JSON type (string/list/null/etc). The subsequent config.get(...) calls and len(text) will raise (AttributeError/TypeError) for valid-but-unexpected payloads like "foo" or { "text": null }, resulting in a 500 and leaving the socket in a bad state. Validate config is a dict and that text is a string (or coerce/reject) before accessing .get/len, and close the websocket with an appropriate code/reason on invalid types.
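
A minimal sketch of the validation described here, kept as a pure function so it can be tested without a server. `InvalidConfig` and `parse_stream_config` are hypothetical names, not part of the PR; rejecting a non-string `text` is one of the options the comment mentions, and coercing it would be another.

```python
import json
from typing import Any


class InvalidConfig(ValueError):
    """Hypothetical error type for a first message that cannot be used."""


def parse_stream_config(raw: str) -> dict[str, Any]:
    """Parse the first WebSocket message and check its shape before use."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidConfig(f"not valid JSON: {exc.msg}") from exc

    # json.loads may return a str, list, number, bool or None as well.
    if not isinstance(config, dict):
        raise InvalidConfig(f"expected a JSON object, got {type(config).__name__}")

    text = config.get("text", "")
    if not isinstance(text, str):
        raise InvalidConfig(f"'text' must be a string, got {type(text).__name__}")

    return {
        "text": text,
        "cfg": config.get("cfg"),
        "steps": config.get("steps"),
        "voice": config.get("voice"),
    }


for raw in ['{"text": "Hello", "steps": 5}', '"foo"', '{"text": null}']:
    try:
        print(parse_stream_config(raw))
    except InvalidConfig as exc:
        print(f"rejected {raw!r}: {exc}")
```

A handler could catch `InvalidConfig` and close the connection with code 1003, the code the PR already uses for an invalid configuration, instead of letting an `AttributeError` or `TypeError` escape.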
