Connection error to API at http://127.0.0.1:1234/v1/completions #81

```
INFO:     Started server process [970]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     100.64.0.25:40024 - "GET / HTTP/1.1" 200 OK
INFO:     100.64.0.25:40024 - "GET /get_config HTTP/1.1" 200 OK
INFO:     100.64.0.27:47244 - "GET /favicon.ico HTTP/1.1" 200 OK
Starting speech generation for 'alright'
Using voice: tara, GPU acceleration: Yes (High-end)
Generating speech for: <|audio|>tara: alright<|eot_id|>
Using optimized parameters for high-end GPU
Connection error to API at http://127.0.0.1:1234/v1/completions
Retrying in 2 seconds... (attempt 2/3)
Connection error to API at http://127.0.0.1:1234/v1/completions
Retrying in 4 seconds... (attempt 3/3)
Connection error to API at http://127.0.0.1:1234/v1/completions
Max retries reached. Token generation failed.
Producer completed - setting done event
Received end-of-stream marker
Waiting for token processor thread to complete...
Audio saved to outputs/tara_20250824_204655.wav
Total speech generation completed in 6.21 seconds
INFO:     100.64.0.29:51204 - "POST /speak HTTP/1.1" 200 OK
INFO:     100.64.0.27:52766 - "GET /tara_20250824_204655.wav HTTP/1.1" 200 OK
INFO:     100.64.0.27:52770 - "GET /docs HTTP/1.1" 200 OK
INFO:     100.64.0.27:52770 - "GET /openapi.json HTTP/1.1" 200 OK
INFO:     100.64.0.35:34118 - "GET / HTTP/1.1" 200 OK
```


What am I doing wrong? I'm trying to launch this on RunPod. On my local machine it worked when I was using LM Studio.
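
To check whether anything is listening on that address inside the pod, a plain TCP check should be enough. This is only a sketch using the standard library. `is_port_open` is a helper name made up for illustration, and the host and port are taken from the error message in the log above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1:1234 is the address from the "Connection error" log line.
    if is_port_open("127.0.0.1", 1234):
        print("Something is listening on 127.0.0.1:1234")
    else:
        print("Nothing is listening on 127.0.0.1:1234")
```

If this reports that nothing is listening, the app is probably fine and the failure is simply that no inference server is running at that address on the pod. That would fit the fact that it worked locally while LM Studio was running.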

It also works when I run these commands in a terminal:

```
apt-get update
apt-get install -y python3-pip ffmpeg
python3 -m pip install --upgrade pip
python3 -m pip install orpheus-cpp scipy numpy
python3 - << 'PY'
from orpheus_cpp import OrpheusCpp
from scipy.io.wavfile import write
import numpy as np

text = "This is a Runpod-hosted Orpheus TTS test sentence."
orpheus = OrpheusCpp(verbose=False, lang="en")

# Collect the streamed audio; each iteration yields (sample_rate, chunk).
chunks = []
sr = 24000  # default; replaced by the sample rate the stream yields
for sr, chunk in orpheus.stream_tts_sync(text, options={"voice_id": "tara"}):
    chunks.append(chunk)

# Join the chunks along axis 1 and flatten to 1-D before writing the WAV file.
audio = np.concatenate(chunks, axis=1).ravel()
write("/workspace/runpod_orpheus_test.wav", sr, audio)
print("/workspace/runpod_orpheus_test.wav")
PY
```


However, it uses my CPU, not the GPU. Do I need to launch Orpheus on vLLM or something of that sort? I'm not very technical when it comes to things like this, so I apologize if this is a dumb issue.
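
To rule out the pod itself, GPU visibility can be checked separately from Orpheus. This is a minimal sketch, assuming an NVIDIA GPU and that `nvidia-smi` is on the PATH inside the container. `list_gpus` is a hypothetical helper name, not part of any library.

```python
import shutil
import subprocess


def list_gpus() -> list[str]:
    """Return one line per GPU reported by nvidia-smi, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # nvidia-smi is not installed or not on PATH in this environment.
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        # The driver is missing or the GPU is not exposed to the container.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    print(gpus if gpus else "No GPU visible via nvidia-smi")
```

An empty result would suggest the container cannot see a GPU at all. A non-empty result would suggest the GPU is available and that the CPU-only behavior comes from how the inference library was installed or configured.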
