Standalone LLM inference service with Ollama and LiteLLM Proxy.
```
┌─────────────────────────────────────────────┐
│                 LLM Service                 │
├─────────────────────────────────────────────┤
│  litellm (port 4000)                        │
│   └── OpenAI-compatible API                 │
│   └── API key authorization                 │
│          │                                  │
│          ▼                                  │
│  ollama (internal)                          │
│   └── qwen2.5:7b (or other models)          │
└─────────────────────────────────────────────┘
```
- Endpoint: `http://localhost:4000/v1/chat/completions`
- Authorization: `Authorization: Bearer <api_key>` header
```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-7b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7
  }'
```
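Because the proxy speaks the OpenAI protocol, any OpenAI-compatible client can be pointed at it. A minimal sketch using the official `openai` Python package; the base URL, API key, and model name are placeholders taken from the example above, and `qwen2.5-7b` is the proxy-side name that LiteLLM routes to the `qwen2.5:7b` model served by ollama (see the diagram):

```python
from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:4000/v1",  # LiteLLM proxy endpoint
    api_key="sk-your-api-key",            # key issued by the proxy (placeholder)
)

response = client.chat.completions.create(
    model="qwen2.5-7b",  # proxy model name; LiteLLM forwards it to ollama's qwen2.5:7b
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```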
```bash
# With the master key
curl -X POST "http://localhost:4000/key/generate" \
  -H "Authorization: Bearer sk-master-key" \
  -H "Content-Type: application/json" \
  -d '{"key_alias": "my-client"}'
```
Local run with Docker Compose:

```bash
cp .env.example .env
# Edit .env
docker compose up -d
```
Deployment with Ansible:

```bash
cd ansible
ansible-vault edit inventory/group_vars/vault.yml   # set vault_litellm_master_key
ansible-playbook playbooks/deploy.yml -i inventory/hosts --ask-vault-pass
```

Health check:

```bash
curl http://localhost:4000/health
```
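After bringing the service up, the proxy may need a moment before the backend model is reachable. A small readiness-poll sketch mirroring the curl check above (the retry counts and timeouts are assumptions, and depending on configuration LiteLLM may require an `Authorization` header on `/health`):

```python
import time
import requests

HEALTH_URL = "http://localhost:4000/health"  # same endpoint as the curl check above

# Poll the proxy until it answers, e.g. right after `docker compose up -d`.
for attempt in range(30):
    try:
        resp = requests.get(HEALTH_URL, timeout=5)
        if resp.status_code == 200:
            print("proxy is up:", resp.json())
            break
    except requests.ConnectionError:
        pass  # proxy not accepting connections yet
    time.sleep(2)
else:
    raise SystemExit("LLM service did not become healthy in time")
```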