Conversation

Copilot AI commented Dec 27, 2025

Enables running quantcoder-cli with local Ollama models instead of requiring OpenAI API access. Default backend is now Ollama.

Architecture

Backend Adapter Pattern

  • OllamaAdapter: HTTP client for the Ollama API at /api/generate
  • Translates OpenAI message format to Ollama prompt format
  • Handles multiple response schemas (response, text, output, choices[])
  • Configurable via OLLAMA_BASE_URL (default: http://localhost:11434) and OLLAMA_MODEL (default: llama2)
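
A minimal sketch of what the adapter could look like, assuming a non-streaming call to /api/generate (the payload fields and the prompt-flattening scheme are assumptions; the shipped quantcli/backend.py may differ in detail):

import os
import requests

class OllamaAdapter:
    """Illustrative sketch of the Ollama chat adapter, not the exact shipped code."""

    def __init__(self):
        self.base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
        self.model = os.getenv("OLLAMA_MODEL", "llama2")

    def chat_complete(self, messages, max_tokens=1500, temperature=0.0):
        # Flatten OpenAI-style {role, content} messages into a single prompt string
        # (the exact flattening scheme is an assumption).
        prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        payload = {
            "model": self.model,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": temperature, "num_predict": max_tokens},
        }
        resp = requests.post(f"{self.base_url}/api/generate", json=payload, timeout=120)
        resp.raise_for_status()
        data = resp.json()
        # Accept the response shapes listed above: 'response', 'text', 'output',
        # or an OpenAI-style 'choices' list.
        for key in ("response", "text", "output"):
            if key in data:
                return data[key]
        if data.get("choices"):
            return data["choices"][0].get("message", {}).get("content", "")
        raise ValueError(
            "Unexpected response format from Ollama. "
            "Expected fields: 'response', 'text', 'output', or 'choices'. "
            f"Got: {list(data.keys())}"
        )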

Factory Selection

  • make_backend(): Returns adapter based on BACKEND env var (default: ollama)
  • Case-insensitive, extensible for future backends
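
A factory along these lines satisfies the description (sketch only; the error type and wording are assumptions):

import os

from quantcli.backend import OllamaAdapter

def make_backend():
    """Return a backend adapter chosen by the BACKEND env var (default: ollama)."""
    backend = os.getenv("BACKEND", "ollama").strip().lower()
    if backend == "ollama":
        return OllamaAdapter()
    raise ValueError(f"Unsupported BACKEND '{backend}'; supported values: 'ollama'")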

Processor Integration

  • OpenAIHandler now accepts a backend parameter instead of calling the OpenAI SDK directly
  • ArticleProcessor uses factory to instantiate backend
  • All LLM calls (generate_summary, generate_qc_code, refine_code) route through backend.chat_complete()
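
The wiring might look roughly like this (the prompt text and exact signatures are assumptions; only the delegation to backend.chat_complete comes from the description above):

from quantcli.backend_factory import make_backend

class OpenAIHandler:
    """Delegates all completions to an injected backend adapter (illustrative)."""

    def __init__(self, backend):
        self.backend = backend

    def generate_summary(self, article_text):
        messages = [
            {"role": "system", "content": "You summarize trading research articles."},  # assumed prompt
            {"role": "user", "content": article_text},
        ]
        return self.backend.chat_complete(messages, max_tokens=1500, temperature=0.0)

class ArticleProcessor:
    def __init__(self):
        # The factory decides which adapter to use (Ollama by default).
        self.handler = OpenAIHandler(backend=make_backend())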

Usage

# Environment configuration
export BACKEND=ollama
export OLLAMA_MODEL=mistral
export OLLAMA_BASE_URL=http://localhost:11434

# Backend adapter usage
from quantcli.backend_factory import make_backend
backend = make_backend()
response = backend.chat_complete([
    {"role": "system", "content": "You are a trading expert"},
    {"role": "user", "content": "Explain momentum strategy"}
], max_tokens=500, temperature=0.3)

Error Handling

Connection failures provide clear diagnostics:

Failed to connect to Ollama at http://localhost:11434/api/generate. 
Is Ollama running? Error: Connection refused

Unexpected response formats produce an error listing the expected fields:

Unexpected response format from Ollama. 
Expected fields: 'response', 'text', 'output', or 'choices'. 
Got: ['model', 'created_at', 'done']
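
Diagnostics like the one above could come from wrapping the requests exceptions at the call site, roughly as follows (the helper name _post_generate is hypothetical):

import requests

def _post_generate(url, payload):
    """Wrap network failures in a message that names the endpoint (illustrative)."""
    try:
        resp = requests.post(url, json=payload, timeout=120)
        resp.raise_for_status()
        return resp.json()
    except requests.exceptions.RequestException as exc:
        raise RuntimeError(
            f"Failed to connect to Ollama at {url}. Is Ollama running? Error: {exc}"
        ) from exc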

Testing

21 tests with mocked HTTP calls cover:

  • Adapter initialization and configuration
  • Multiple Ollama response formats
  • Network error scenarios (timeout, connection refused, HTTP errors)
  • Factory backend selection
  • Integration with ArticleProcessor
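
One of the mocked-HTTP tests could look like this (the patch target and adapter internals are assumptions; the actual tests may use requests-mock or responses instead of unittest.mock):

from unittest.mock import MagicMock, patch

from quantcli.backend import OllamaAdapter

@patch("quantcli.backend.requests.post")
def test_chat_complete_parses_response_field(mock_post):
    # Simulate a standard Ollama reply carrying the text in the 'response' field.
    mock_post.return_value = MagicMock(
        status_code=200,
        json=lambda: {"response": "Momentum strategies buy recent winners."},
    )
    adapter = OllamaAdapter()
    text = adapter.chat_complete([{"role": "user", "content": "Explain momentum"}])
    assert text == "Momentum strategies buy recent winners."
    mock_post.assert_called_once()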

Files

New

  • quantcli/backend.py (147 lines): OllamaAdapter implementation
  • quantcli/backend_factory.py (42 lines): Backend factory
  • tests/test_backend.py (198 lines): Adapter unit tests
  • tests/test_backend_factory.py (63 lines): Factory tests
  • tests/test_integration.py (151 lines): Integration tests

Modified

  • quantcli/processor.py: Refactored OpenAIHandler to use adapters (~50 lines changed)
  • README.md: Added Ollama setup guide and environment variable documentation

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https /usr/lib/apt/methods/https (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Goal: Add an Ollama-backed pluggable LLM backend to quantcoder-cli and make the CLI use it by default so users running Ollama locally can run the tool without the OpenAI SDK.

Scope (high-level):

  • Add a lightweight backend adapter for Ollama that exposes chat_complete(messages, max_tokens, temperature) -> str.
  • Add a backend factory to select backend via env var BACKEND (default: ollama).
  • Refactor the existing OpenAIHandler usage in quantcli/processor.py to use the backend adapter instead of direct openai.ChatCompletion.create calls.
  • Add tests for the adapter (mocking HTTP calls) and a minimal integration test that verifies ArticleProcessor can call the backend adapter.
  • Update README and requirements/setup.py to document the new env vars and add requests dependency.

Required changes (detailed):

  1. New file: quantcli/backend.py
  • Implements OllamaAdapter class.
  • The adapter should read OLLAMA_BASE_URL (default http://localhost:11434) and OLLAMA_MODEL from env vars.
  • Provide chat_complete(messages: List[Dict[str,str]], max_tokens:int=1500, temperature:float=0.0) -> str.
  • Use requests.post to POST to Ollama generate endpoint (try /api/generate). Parse common response shapes (text, output, choices[0].message.content, etc.) and return the model text. Raise descriptive error on network failure.
  2. New file: quantcli/backend_factory.py
  • Provide make_backend() function that reads BACKEND env var (default 'ollama'). If 'ollama', return OllamaAdapter(). If unsupported, raise error.
  3. Modify quantcli/processor.py
  • Replace the current OpenAIHandler usage such that:
    • Create a backend via make_backend() during ArticleProcessor.__init__ and pass it to OpenAIHandler (or rename OpenAIHandler to LLMHandler if preferred).
    • All calls that previously used openai.ChatCompletion.create(...) should instead call backend.chat_complete(messages, max_tokens, temperature) and parse the returned text.
  • Keep existing prompts and message construction, but ensure the code no longer imports or requires the OpenAI SDK for the Ollama path. Preserve existing behavior for non-ollama backends if BACKEND changes in future.
  • Ensure error handling: if backend.chat_complete raises, ArticleProcessor logs and returns gracefully (GUI shows error).
  4. Update setup.py / requirements
  • Add 'requests' to install_requires in setup.py and/or requirements file so the adapter works after pip install.
  5. Update README.md
  • Add a short section explaining how to configure Ollama for quantcoder-cli:
    • BACKEND=ollama
    • OLLAMA_BASE_URL=http://localhost:11434
    • OLLAMA_MODEL=
    • Example quick test snippet to exercise OllamaAdapter from Python REPL.
  • Note fallback behavior: the repo will default to Ollama; OpenAI is not required for Ollama mode.
  6. Tests (new): tests/test_ollama_adapter.py
  • Use pytest and requests-mock or responses to mock the Ollama HTTP response.
  • Assert that OllamaAdapter.chat_complete returns expected string when server returns common shapes and raises on 500.
  7. Small integration test: tests/test_integration_processor.py (optional minimal)
  • Create a stub/mock backend that returns a canned response, inject it into ArticleProcessor (or into the handler) and assert that generate_summary / generate_qc_code receive/return strings; this ensures the refactor didn't break the pipeline.
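
The stub-backend integration test from item 7 could be as small as this (the import path and handler signature follow the refactor described in item 3, but are assumptions):

from quantcli.processor import OpenAIHandler

class CannedBackend:
    """Stub backend returning a fixed string, so no HTTP call is made."""

    def chat_complete(self, messages, max_tokens=1500, temperature=0.0):
        return "canned model output"

def test_generate_summary_returns_string():
    handler = OpenAIHandler(backend=CannedBackend())
    assert isinstance(handler.generate_summary("some article text"), str)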

Implementation details and justifications:

  • Use environment-based wiring so no large code changes are needed. Default BACKEND should be 'ollama' to match your local setup.
  • Use the approach Katja uses as a compatibility reference: its backend/core/llm_router.LocalLLM.generate uses Ollama and accepts messages; adapt the payload similarly (include model and messages). Katja examples for base URL and env vars: OLLAMA_BASE_URL and OLLAMA_MODEL.
  • Keep message format as a list of dicts with {role, content}; Ollama modern versions accept this format. The adapter should also gracefully handle servers that return non-chat shapes.

Testing & QA:

  • Run pytest on the new tests.
  • Manually test against a local Ollama instance (instructions in README update).

Deliverable (what PR will include):

  • New branch with commits implementing the files above.
  • One pull request titled: "Integrate Ollama local LLM backend and pluggable adapter" with a clear description of the changes and usage instructions.

If any response shapes or endpoints differ on your machine, the adapter parsing will be adjusted in follow-up commits. Please confirm you want me to open the PR in SL-Mar/quantcoder-cli with these changes now.

This pull request was created from Copilot chat.

