Conversation

Copilot AI commented Dec 27, 2025

Enables running quantcoder-cli with local Ollama models instead of requiring OpenAI API access. Default backend is now Ollama.

Architecture

Backend Adapter Pattern

  • OllamaAdapter: HTTP client for the Ollama API at /api/generate
  • Translates OpenAI message format to Ollama prompt format
  • Handles multiple response schemas (response, text, output, choices[])
  • Configurable via OLLAMA_BASE_URL (default: http://localhost:11434) and OLLAMA_MODEL (default: llama2)
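
A minimal sketch of what the adapter could look like, assuming a non-streaming call to /api/generate (the payload fields and the prompt-flattening scheme are assumptions; the shipped quantcli/backend.py may differ in detail):

import os
import requests

class OllamaAdapter:
    """Illustrative sketch of the Ollama chat adapter, not the exact shipped code."""

    def __init__(self):
        self.base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
        self.model = os.getenv("OLLAMA_MODEL", "llama2")

    def chat_complete(self, messages, max_tokens=1500, temperature=0.0):
        # Flatten OpenAI-style {role, content} messages into a single prompt string
        # (the exact flattening scheme is an assumption).
        prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        payload = {
            "model": self.model,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": temperature, "num_predict": max_tokens},
        }
        resp = requests.post(f"{self.base_url}/api/generate", json=payload, timeout=120)
        resp.raise_for_status()
        data = resp.json()
        # Accept the response shapes listed above: 'response', 'text', 'output',
        # or an OpenAI-style 'choices' list.
        for key in ("response", "text", "output"):
            if key in data:
                return data[key]
        if data.get("choices"):
            return data["choices"][0].get("message", {}).get("content", "")
        raise ValueError(
            "Unexpected response format from Ollama. "
            "Expected fields: 'response', 'text', 'output', or 'choices'. "
            f"Got: {list(data.keys())}"
        )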

Factory Selection

  • make_backend(): Returns adapter based on BACKEND env var (default: ollama)
  • Case-insensitive, extensible for future backends
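
A factory along these lines satisfies the description (sketch only; the error type and wording are assumptions):

import os

from quantcli.backend import OllamaAdapter

def make_backend():
    """Return a backend adapter chosen by the BACKEND env var (default: ollama)."""
    backend = os.getenv("BACKEND", "ollama").strip().lower()
    if backend == "ollama":
        return OllamaAdapter()
    raise ValueError(f"Unsupported BACKEND '{backend}'; supported values: 'ollama'")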

Processor Integration

  • OpenAIHandler now accepts a backend parameter instead of calling the OpenAI SDK directly
  • ArticleProcessor uses factory to instantiate backend
  • All LLM calls (generate_summary, generate_qc_code, refine_code) route through backend.chat_complete()
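
The wiring might look roughly like this (the prompt text and exact signatures are assumptions; only the delegation to backend.chat_complete comes from the description above):

from quantcli.backend_factory import make_backend

class OpenAIHandler:
    """Delegates all completions to an injected backend adapter (illustrative)."""

    def __init__(self, backend):
        self.backend = backend

    def generate_summary(self, article_text):
        messages = [
            {"role": "system", "content": "You summarize trading research articles."},  # assumed prompt
            {"role": "user", "content": article_text},
        ]
        return self.backend.chat_complete(messages, max_tokens=1500, temperature=0.0)

class ArticleProcessor:
    def __init__(self):
        # The factory decides which adapter to use (Ollama by default).
        self.handler = OpenAIHandler(backend=make_backend())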

Usage

# Environment configuration
export BACKEND=ollama
export OLLAMA_MODEL=mistral
export OLLAMA_BASE_URL=http://localhost:11434

# Backend adapter usage
from quantcli.backend_factory import make_backend
backend = make_backend()
response = backend.chat_complete([
    {"role": "system", "content": "You are a trading expert"},
    {"role": "user", "content": "Explain momentum strategy"}
], max_tokens=500, temperature=0.3)

Error Handling

Connection failures provide clear diagnostics:

Failed to connect to Ollama at http://localhost:11434/api/generate. 
Is Ollama running? Error: Connection refused

Unexpected response formats produce an error listing the expected fields:

Unexpected response format from Ollama. 
Expected fields: 'response', 'text', 'output', or 'choices'. 
Got: ['model', 'created_at', 'done']
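
Diagnostics like the one above could come from wrapping the requests exceptions at the call site, roughly as follows (the helper name _post_generate is hypothetical):

import requests

def _post_generate(url, payload):
    """Wrap network failures in a message that names the endpoint (illustrative)."""
    try:
        resp = requests.post(url, json=payload, timeout=120)
        resp.raise_for_status()
        return resp.json()
    except requests.exceptions.RequestException as exc:
        raise RuntimeError(
            f"Failed to connect to Ollama at {url}. Is Ollama running? Error: {exc}"
        ) from exc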

Testing

21 tests with mocked HTTP calls cover:

  • Adapter initialization and configuration
  • Multiple Ollama response formats
  • Network error scenarios (timeout, connection refused, HTTP errors)
  • Factory backend selection
  • Integration with ArticleProcessor
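
One of the mocked-HTTP tests could look like this (the patch target and adapter internals are assumptions; the actual tests may use requests-mock or responses instead of unittest.mock):

from unittest.mock import MagicMock, patch

from quantcli.backend import OllamaAdapter

@patch("quantcli.backend.requests.post")
def test_chat_complete_parses_response_field(mock_post):
    # Simulate a standard Ollama reply carrying the text in the 'response' field.
    mock_post.return_value = MagicMock(
        status_code=200,
        json=lambda: {"response": "Momentum strategies buy recent winners."},
    )
    adapter = OllamaAdapter()
    text = adapter.chat_complete([{"role": "user", "content": "Explain momentum"}])
    assert text == "Momentum strategies buy recent winners."
    mock_post.assert_called_once()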

Files

New

  • quantcli/backend.py (147 lines): OllamaAdapter implementation
  • quantcli/backend_factory.py (42 lines): Backend factory
  • tests/test_backend.py (198 lines): Adapter unit tests
  • tests/test_backend_factory.py (63 lines): Factory tests
  • tests/test_integration.py (151 lines): Integration tests

Modified

  • quantcli/processor.py: Refactored OpenAIHandler to use adapters (~50 lines changed)
  • README.md: Added Ollama setup guide and environment variable documentation

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https /usr/lib/apt/methods/https (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Goal: Add an Ollama-backed pluggable LLM backend to quantcoder-cli and make the CLI use it by default so users running Ollama locally can run the tool without the OpenAI SDK.

Scope (high-level):

  • Add a lightweight backend adapter for Ollama that exposes chat_complete(messages, max_tokens, temperature) -> str.
  • Add a backend factory to select backend via env var BACKEND (default: ollama).
  • Refactor the existing OpenAIHandler usage in quantcli/processor.py to use the backend adapter instead of direct openai.ChatCompletion.create calls.
  • Add tests for the adapter (mocking HTTP calls) and a minimal integration test that verifies ArticleProcessor can call the backend adapter.
  • Update README and requirements/setup.py to document the new env vars and add requests dependency.

Required changes (detailed):

  1. New file: quantcli/backend.py
  • Implements OllamaAdapter class.
  • The adapter should read OLLAMA_BASE_URL (default http://localhost:11434) and OLLAMA_MODEL from env vars.
  • Provide chat_complete(messages: List[Dict[str,str]], max_tokens:int=1500, temperature:float=0.0) -> str.
  • Use requests.post to POST to Ollama generate endpoint (try /api/generate). Parse common response shapes (text, output, choices[0].message.content, etc.) and return the model text. Raise descriptive error on network failure.
  2. New file: quantcli/backend_factory.py
  • Provide make_backend() function that reads BACKEND env var (default 'ollama'). If 'ollama', return OllamaAdapter(). If unsupported, raise error.
  3. Modify quantcli/processor.py
  • Replace the current OpenAIHandler usage such that:
    • Create a backend via make_backend() during ArticleProcessor.__init__ and pass it to OpenAIHandler (or rename OpenAIHandler to LLMHandler if preferred).
    • All calls that previously used openai.ChatCompletion.create(...) should instead call backend.chat_complete(messages, max_tokens, temperature) and parse the returned text.
  • Keep existing prompts and message construction, but ensure the code no longer imports or requires the OpenAI SDK for the Ollama path. Preserve existing behavior for non-ollama backends if BACKEND changes in future.
  • Ensure error handling: if backend.chat_complete raises, ArticleProcessor logs and returns gracefully (GUI shows error).
  4. Update setup.py / requirements
  • Add 'requests' to install_requires in setup.py and/or requirements file so the adapter works after pip install.
  5. Update README.md
  • Add a short section explaining how to configure Ollama for quantcoder-cli:
    • BACKEND=ollama
    • OLLAMA_BASE_URL=http://localhost:11434
    • OLLAMA_MODEL=
    • Example quick test snippet to exercise OllamaAdapter from Python REPL.
  • Note fallback behavior: the repo will default to Ollama; OpenAI is not required for Ollama mode.
  6. Tests (new): tests/test_ollama_adapter.py
  • Use pytest and requests-mock or responses to mock the Ollama HTTP response.
  • Assert that OllamaAdapter.chat_complete returns expected string when server returns common shapes and raises on 500.
  7. Small integration test: tests/test_integration_processor.py (optional minimal)
  • Create a stub/mock backend that returns a canned response, inject it into ArticleProcessor (or into the handler) and assert that generate_summary / generate_qc_code receive/return strings; this ensures the refactor didn't break the pipeline.
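
The stub-backend integration test from item 7 could be as small as this (the import path and handler signature follow the refactor described in item 3, but are assumptions):

from quantcli.processor import OpenAIHandler

class CannedBackend:
    """Stub backend returning a fixed string, so no HTTP call is made."""

    def chat_complete(self, messages, max_tokens=1500, temperature=0.0):
        return "canned model output"

def test_generate_summary_returns_string():
    handler = OpenAIHandler(backend=CannedBackend())
    assert isinstance(handler.generate_summary("some article text"), str)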

Implementation details and justifications:

  • Use environment-based wiring so no large code changes are needed. Default BACKEND should be 'ollama' to match your local setup.
  • Use the approach Katja uses as a compatibility reference: its backend/core/llm_router.LocalLLM.generate uses Ollama and accepts messages; adapt the payload similarly (include model and messages). Katja examples for base URL and env vars: OLLAMA_BASE_URL and OLLAMA_MODEL.
  • Keep message format as a list of dicts with {role, content}; Ollama modern versions accept this format. The adapter should also gracefully handle servers that return non-chat shapes.

Testing & QA:

  • Run pytest on the new tests.
  • Manually test against a local Ollama instance (instructions in README update).

Deliverable (what PR will include):

  • New branch with commits implementing the files above.
  • One pull request titled: "Integrate Ollama local LLM backend and pluggable adapter" with a clear description of the changes and usage instructions.

If any response shapes or endpoints differ on your machine, the adapter parsing will be adjusted in follow-up commits. Please confirm you want me to open the PR in SL-Mar/quantcoder-cli with these changes now.

This pull request was created from Copilot chat.

