PolyLLM is a Python package that provides a unified interface for interacting with multiple Large Language Models (LLMs) through a single, consistent API. It simplifies the process of working with different LLM providers by abstracting away their specific implementation details.
- Unified interface for multiple LLM providers:
  - Local LLMs (llama.cpp, llama-cpp-python)
  - Ollama
  - OpenAI (GPT models)
  - Google (Gemini models)
  - Anthropic (Claude models)
- Support for different interaction modes:
  - Standard chat completion
  - Multimodal image input
  - Function calling / tools
  - JSON output
  - Structured output (using Pydantic models)
  - Streaming real-time responses
Feature support:

| Provider | Standard Chat | Image Input | JSON | Structured Output | Tool Usage |
|---|---|---|---|---|---|
| llama.cpp | ✅ | 🔶 | ✅ | ✅ | ✅ |
| MLX | ✅ | 🟫 | 🚧 | 🚧 | 🚧 |
| Ollama | ✅ | ✅ | ✅ | ✅ | ✅ |
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ |
| Google | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anthropic | ✅ | ✅ | ✅ | ✅ | ✅ |
Streaming support:

| Provider | Plain Text | JSON | Structured Output | Tool Usage |
|---|---|---|---|---|
| llama.cpp | ✅ | ✅ | ✅ | 🟫 |
| MLX | ✅ | 🚧 | 🚧 | 🟫 |
| Ollama | ✅ | ✅ | ✅ | 🟫 |
| OpenAI | ✅ | ✅ | ❌ | 🟫 |
| Google | ✅ | ✅ | ✅ | 🟫 |
| Anthropic | ✅ | 🟫 | ❌ | 🟫 |
✅: Supported
🔶: Support planned
❌: Not yet supported by the LLM provider
🟫: Support not planned
> **Warning**
> 🚧: MLX support for structuring techniques is not part of the official `mlx_lm` module.
> A modified version of this GBNF package is included here to support some interim capabilities.
> These features are experimental and will be buggy and slow!
```bash
pip install polyllm
pip install polyllm[all]  # Gets all optional provider dependencies
```

Requirements:
- Python 3.9+
- `backoff`
- `pydantic`
- Optional dependencies for advanced image input:
  - `numpy`
  - `opencv-python`
  - `pillow`
- Optional dependencies based on which LLM providers you want to use:
  - `llama-cpp-python`
  - `mlx-lm`
  - `ollama`
  - `openai`
  - `google-generativeai`
  - `anthropic`
  - `litellm`
Set your API keys as environment variables:
```bash
export OPENAI_API_KEY="your-key-here"
export GOOGLE_API_KEY="your-key-here"
export ANTHROPIC_API_KEY="your-key-here"
```

Run the demo:

```bash
python -m polyllm.demo \
    --image-path /path/to/image.jpg \
    --llama-python-model /path/to/model.gguf \
    --llama-python-server-port 8000 \
    --ollama-model llama3.2-vision \
    --openai-model gpt-4o \
    --google-model gemini-1.5-flash-latest \
    --anthropic-model claude-3-5-sonnet-latest
```

The `model` argument may be provided as one of the following:
- An instance of `llama_cpp.Llama` (see the sketch after this list)
  - Helper function: `model = polyllm.load_helpers.load_llama("path/to/model.gguf")`
- A 2-tuple containing instances of `mlx.nn.Module` and `TokenizerWrapper`
  - Helper function: `model = polyllm.load_helpers.load_mlx("mlx-community/model-name-here", auto_download=True)`
    - Also accepts a path to a local directory for self-managed downloads
- `'llamacpp/MODEL'`, where `MODEL` is either the port or ip:port of a running llama-cpp-python server (`python -m llama_cpp.server --n_gpu_layers -1 --model path/to/model.gguf`)
  - Treated as `f'http://localhost:{MODEL}/v1'` if `MODEL` DOES NOT contain a `:`
  - Treated as `f'http://{MODEL}/v1'` if `MODEL` DOES contain a `:`
- `'ollama/MODEL_NAME'`, where `MODEL_NAME` matches the `ollama run MODEL_NAME` command
- `'openai/MODEL_NAME'`
- `'google/MODEL_NAME'`
- `'anthropic/MODEL_NAME'`
- `'litellm/PROVIDER/MODEL_NAME'`
  - LiteLLM will replace the OpenAI, Google, and Anthropic backends in a future update. At that point, you will no longer need to use `'litellm'` at the start of the string.
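As a minimal sketch of passing a local `llama_cpp.Llama` instance instead of a provider string (assuming `llama-cpp-python` is installed; the model path and the provider strings in the comments are placeholders, not required values):

```python
import polyllm
from llama_cpp import Llama

# Load a local llama.cpp model and pass the instance directly as `model`.
llm = Llama(model_path="path/to/model.gguf", n_gpu_layers=-1, verbose=False)
response = polyllm.generate(
    model=llm,  # an instance of llama_cpp.Llama
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response)

# The provider-string forms use the same call; only `model` changes, e.g.:
#   model="llamacpp/8000"    -> llama-cpp-python server at http://localhost:8000/v1
#   model="ollama/llama3.2"  -> a model available via `ollama run llama3.2`
#   model="openai/gpt-4o"
```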
```python
def generate(
    model: str|Llama,
    messages: list,
    temperature: float = 0.0,
    json_output: bool = False,
    structured_output_model: BaseModel|None = None,
    stream: bool = False,
) -> str | Generator[str, None, None]:
```

Generates a chat message response as either a string or a generator of strings, depending on the `stream` argument.
```python
def generate_tools(
    model: str|Llama,
    messages: list,
    temperature: float = 0.0,
    tools: list[Callable] = None,
) -> tuple[str, str, dict]:
```

Asks the model to try to use one of the provided tools.

Responds with:
- Text response
- Tool name (use `get_tool_func` to get the tool object)
- Tool arguments dictionary
```python
def get_tool_func(
    tools: list[Callable],
    tool: str,
) -> Callable:
```

Returns the tool corresponding to the name. Intended for use with the output of `generate_tools`.
```python
def structured_output_model_to_schema(
    structured_output_model: BaseModel,
    indent: int|str|None = None,
) -> str:
```

Creates a JSON schema string from a Pydantic model. Include the string in one of the messages in a `generate(..., structured_output_model)` call to help guide the model on how to respond.
```python
def structured_output_to_object(
    structured_output: str,
    structured_output_model: type[BaseModel],
) -> BaseModel:
```

Parses the output of a `generate(..., structured_output_model)` call into an instance of the Pydantic `BaseModel`.
```python
import polyllm
```

Run `python -m polyllm` to see the full list of detected Ollama, OpenAI, Google, and Anthropic models.

Basic chat completion:

```python
response = polyllm.generate(
    model="openai/gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.2,
)
print(response)
# Prints:
# Hello! I'm just a computer program, so I don't have feelings, but I'm here to help you. How can I assist you today?
```

Streaming:

```python
for chunk in polyllm.generate(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    temperature=0.7,
    stream=True,
):
    print(chunk, end='', flush=True)
print()
# Prints (a word or so at a time):
# Once upon a time, ...
```

Image input:

```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image", "image": "/path/to/image"},
            # These also work if you have the image as
            # an np.array / PIL Image instead of on disk:
            # {"type": "image", "image": cv2.imread("/path/to/image")},
            # {"type": "image", "image": Image.open("/path/to/image")},
        ],
    },
]

response = polyllm.generate(
    model="ollama/llama3.2-vision",
    messages=messages,
)
print(response)
# Prints:
# This image depicts ...
```

Tools / function calling:

```python
def multiply_large_numbers(x: int, y: int) -> int:
"""Multiplies two large numbers."""
return x * y
tools = [multiply_large_numbers]
response, tool, args = polyllm.generate_tools(
model="google/gemini-1.5-pro-latest",
messages=[{"role": "user", "content": "What is 123456 multiplied by 654321?"}],
tools=tools,
)
tool_func = polyllm.get_tool_func(tools, tool)
if tool_func:
# print('response:', response) # Some models (Anthropic) may return both their tool call AND a text response
tool_result = tool_func(**args)
print(tool_result) # 123456 * 654321 = 80779853376
else:
print(response)
# Prints:
# 80779853376.0response = polyllm.generate(
model="anthropic/claude-3-5-sonnet-latest",
messages=[{"role": "user", "content": "List three colors in JSON"}],
json_output=True,
)
print(response)
# Prints:
# {
# "colors": [
# "red",
# "blue",
# "green"
# ]
# }
import json
print(json.loads(response))
# Prints:
# {'colors': ['red', 'blue', 'green']}from pydantic import BaseModel, Field
class Flight(BaseModel):
    departure_time: str = Field(description="The time the flight departs")
    destination: str = Field(description="The destination of the flight")

class FlightList(BaseModel):
    flights: list[Flight] = Field(description="A list of known flight details")

flight_list_schema = polyllm.structured_output_model_to_schema(FlightList, indent=2)

response = polyllm.generate(
    model="google/gemini-1.5-pro-latest",
    messages=[
        {
            "role": "user",
            "content": f"Write a list of 2 to 5 random flight details.\nProduce the result in JSON that matches this schema:\n{flight_list_schema}",
        },
    ],
    structured_output_model=FlightList,
)
print(response)
# Prints:
# {"flights": [{"departure_time": "2024-07-20T08:30", "destination": "JFK"}, {"departure_time": "2024-07-21T14:00", "destination": "LAX"}, {"departure_time": "2024-07-22T16:45", "destination": "ORD"}, {"departure_time": "2024-07-23T09:15", "destination": "SFO"}]}

response_object = polyllm.structured_output_to_object(response, FlightList)
print(response_object.flights[0].destination)
# Prints:
# JFK
```

LangChain integration:

```python
from polyllm.langchain import LCPolyLLM

llm = LCPolyLLM(model="openai/gpt-4")
response = llm.invoke("What is your name?")
print(response)
# Prints:
# As an artificial intelligence, I don't have a personal name. You can simply refer to me as OpenAI.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [("system", "you are a bot"), ("human", "{input}")]
)
chain = prompt | llm
response = chain.invoke(input="What are you?")
print(response)
# Prints:
# Bot: I am an artificial intelligence assistant designed to help answer questions and provide information.
```