This Discord bot transforms Discord into a collaborative LLM frontend, based on the excellent llmcord project. It works with practically any LLM, remote or locally hosted, with a modular Python codebase and extensive customization options.
Just @ the bot or include the phrase `at ai` (case-insensitive) anywhere in your message to start a conversation. Reply to the bot's response to continue the chat. The bot mention (`@BotName` or `at ai`) is automatically removed before the query is sent to the LLM (e.g., `hello at ai` becomes `hello`). Build conversations with reply chains!
You can:
- Branch conversations endlessly
- Continue other people's conversations
- @ the bot or include `at ai` while replying to ANY message to include it in the conversation
Additionally:
- When DMing the bot, conversations continue automatically (no reply required). To start a fresh conversation, just @ the bot or include `at ai`. You can still reply to continue from anywhere.
- You can branch conversations into threads. Just create a thread from any message and @ the bot or include `at ai` inside to continue.
- Back-to-back messages from the same user are automatically chained together. Just reply to the latest one and the bot will see all of them.
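Under the hood, a reply-chain conversation is just a walk from the newest message back toward the root. A minimal sketch of that idea (the `Msg` stand-in and its pre-resolved `parent` field are illustrative, not the project's actual types):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Msg:
    """Minimal stand-in for a Discord message (hypothetical, for illustration)."""
    author: str
    content: str
    parent: Optional["Msg"] = None  # the message this one replies to, already resolved

def build_conversation(leaf: Msg, max_messages: int = 25) -> list:
    """Walk the reply chain from the newest message back toward the root,
    then return it in chronological order, capped at max_messages
    (mirroring the max_messages setting described later)."""
    chain = []
    node = leaf
    while node is not None and len(chain) < max_messages:
        chain.append(node)
        node = node.parent
    return list(reversed(chain))
```

Branching falls out naturally from this structure: replying to any earlier message simply makes that message the leaf of a new chain.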
llmcord supports remote models from:
- OpenAI API
- Google Gemini API (via `google-genai`)
- xAI API
- Mistral API
- Groq API
- OpenRouter API
Or run a local model with:
- Ollama
- LM Studio
- vLLM
- Jan
- Oobabooga's WebUI (via its OpenAI compatible API)
...Or use any other OpenAI compatible API server.
- Multiple API Key Support: Configure multiple API keys for each LLM provider (see `config.yaml`).
- Automatic Key Rotation & Retry: The bot randomly rotates between available keys for each request. If an error occurs (including during response streaming), it automatically retries with another key until the request succeeds.
- User-Specific Preferences: Users can set their preferred model using the `/model` command (see "Slash Commands" below).
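The rotation-and-retry behavior can be sketched as follows; `make_request` is a hypothetical stand-in for the actual provider call, not the project's real function:

```python
import random

class AllKeysFailed(Exception):
    """Raised only after every configured key has been tried."""

def call_with_key_rotation(api_keys, make_request):
    """Try keys in random order; on any failure, move on to the next key.
    Mirrors the described behavior: silent retries, error surfaced only
    when all keys fail for a single request."""
    keys = list(api_keys)
    random.shuffle(keys)
    last_error = None
    for key in keys:
        try:
            return make_request(key)
        except Exception as err:  # a real bot would catch provider-specific errors
            last_error = err
    raise AllKeysFailed(f"all {len(keys)} keys failed") from last_error
```
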
llmcord offers slash commands for user-specific customizations:
- `/model <provider/model_name>`: Set your preferred LLM provider and model (e.g., `openai/gpt-4.1`, `google/gemini-2.5-flash`). This preference is saved and used for your future messages.
- `/systemprompt <prompt_text | "reset">`: Set a custom system prompt for your interactions with the bot. Use `reset` to revert to the default system prompt defined in `config.yaml`.
- `/setgeminithinking <True|False>`: Enable or disable the `thinkingBudget` parameter for Gemini models for your interactions. This can potentially improve response quality for complex queries but may increase latency. The actual budget value is set globally in `config.yaml`.
- `/help`: Displays a help message listing all available commands and their usage.

User preferences for these commands are saved locally in JSON files (e.g., `user_model_prefs.json`).
- Include YouTube URLs directly in your query.
- The bot automatically extracts the video's title, description, channel name, transcript (using `youtube-transcript-api`), and up to 10 top comments (using the YouTube Data API v3).
- This extracted information is appended to your original query before being sent to the LLM, providing rich context for summarization, analysis, or question-answering based on the video content.
- Handles multiple YouTube URLs within the same message concurrently.
- Requires a YouTube Data API v3 key configured in `config.yaml`.
- Optional Proxy Support: Configure proxies in `config.yaml` to help avoid YouTube IP blocks that can cause transcript fetching errors (like `ParseError`). See `config-example.yaml`.
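Both the Data API and `youtube-transcript-api` operate on a video ID rather than a full URL, so the first step is parsing that ID out. A sketch of such a parser (the set of URL shapes handled here is an assumption, not the project's actual code):

```python
import re
from typing import Optional
from urllib.parse import urlparse, parse_qs

def extract_youtube_id(url: str) -> Optional[str]:
    """Pull the 11-character video ID out of common YouTube URL shapes."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        # Short links carry the ID as the path: youtu.be/<id>
        return parsed.path.lstrip("/")[:11] or None
    if "youtube.com" in host:
        if parsed.path == "/watch":
            # Standard links carry the ID in the ?v= query parameter
            vid = parse_qs(parsed.query).get("v", [None])[0]
            return vid[:11] if vid else None
        m = re.match(r"^/(shorts|embed|live)/([A-Za-z0-9_-]{11})", parsed.path)
        if m:
            return m.group(2)
    return None
```
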
- Include Reddit submission URLs directly in your query (e.g., `https://www.reddit.com/r/.../comments/.../`).
- The bot automatically extracts the submission's title, self-text (if any), and up to 10 top-level comments using `asyncpraw`.
- This extracted information is appended to your original query before being sent to the LLM, providing context for summarization, analysis, or question-answering based on the Reddit post.
- Handles multiple Reddit URLs within the same message concurrently.
- Requires Reddit API credentials (client ID, client secret, user agent) configured in `config.yaml`. You can obtain these by creating a 'script' app on Reddit's app preferences page.
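For illustration, submission IDs can be pulled out of message text with a regex like the one below. Note that `asyncpraw` can also resolve full URLs directly via `reddit.submission(url=...)`, so this is just one possible approach:

```python
import re

# Matches the base-36 submission ID in .../r/<sub>/comments/<id>/... URLs
REDDIT_SUBMISSION_RE = re.compile(
    r"reddit\.com/r/[^/\s]+/comments/([a-z0-9]+)", re.IGNORECASE
)

def extract_submission_ids(text: str) -> list:
    """Find every Reddit submission ID mentioned in a message."""
    return REDDIT_SUBMISSION_RE.findall(text)
```
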
- Include most HTTP/HTTPS URLs directly in your query (excluding YouTube and Reddit, which have specialized handling).
- The bot automatically fetches the content of the URL using `httpx` and parses the main textual content using `BeautifulSoup4`.
- Extracted text (limited length) is appended to your original query before being sent to the LLM, providing context for summarization, analysis, or question-answering based on the web page content.
- Handles multiple URLs within the same message concurrently.
- Basic error handling included (timeouts, non-HTML content, fetch errors).
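The routing implied above — YouTube and Reddit URLs to their specialized extractors, everything else to the generic fetcher — might look like this sketch (the exact host list is an assumption):

```python
from urllib.parse import urlparse

def classify_url(url: str) -> str:
    """Decide which handler a URL belongs to, mirroring the described
    behavior: specialized extraction for YouTube and Reddit, generic
    httpx/BeautifulSoup fetching for everything else."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    if host in ("youtube.com", "m.youtube.com", "music.youtube.com", "youtu.be"):
        return "youtube"
    if host in ("reddit.com", "old.reddit.com", "redd.it"):
        return "reddit"
    return "general"
```
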
- Trigger image analysis by starting your query with `googlelens` (case-insensitive) after mentioning the bot or using `at ai` (e.g., `@BotName googlelens what is this?` or `at ai googlelens identify this object`).
- Requires image(s) to be attached to the message.
- The bot uses SerpAPI's Google Lens API to fetch visual matches and related content for each image concurrently.
- Requires SerpAPI API keys configured in `config.yaml`.
- Multiple API Key Support: Configure multiple SerpAPI keys in `config.yaml`.
- Automatic Key Rotation & Retry: The bot randomly rotates between available keys. If a request fails (e.g., rate limit), it automatically retries with another key.
- Appends the formatted results to your original query before sending it to the LLM, providing visual context for analysis or identification.
- Handles multiple image attachments in a single query.
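The per-image concurrency is the standard `asyncio.gather` fan-out pattern; `analyze_image` below is a hypothetical placeholder for the real SerpAPI call:

```python
import asyncio

async def analyze_image(url: str) -> str:
    """Placeholder for one Google Lens lookup via SerpAPI (hypothetical)."""
    await asyncio.sleep(0)  # stands in for the network round-trip
    return f"results for {url}"

async def analyze_all(image_urls: list) -> list:
    """Fan out one request per attachment and await them all concurrently,
    the same pattern used for multiple images (and multiple URLs) above."""
    return await asyncio.gather(*(analyze_image(u) for u in image_urls))
```
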
llmcord employs several strategies to enrich the context provided to the LLM:
- Deep Search Keyword: Including `deepsearch` or `deepersearch` (case-insensitive) in your query automatically attempts to switch to a pre-configured powerful model (default: `x-ai/grok-3`) for that specific request, if the model and its API keys are configured.
- Vision Model Fallback: If your message includes images but your currently selected model doesn't support vision, the bot will automatically switch to a pre-configured fallback vision model (default: `google/gemini-2.5-flash`) for that request, if available. A warning will inform you of the switch.
- SearXNG Integration for Grounding: For non-Gemini/non-Grok models, if your query doesn't contain URLs, the bot can perform a pre-step:
  - A Gemini model (`gemini-2.5-flash` by default) analyzes your query and conversation history to generate relevant web search queries.
  - These queries are run against your configured SearXNG instance.
  - Content from the top SearXNG result URLs (YouTube, Reddit, general web pages) is fetched, processed (with a configurable length limit for general web content via `searxng_url_content_max_length`), and formatted.
  - This fetched web context is then provided to your chosen LLM along with your original query.

  This feature requires `searxng_base_url` and API keys for the grounding Gemini model to be configured.
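For reference, a SearXNG instance with its JSON API enabled answers queries at `/search` with `format=json`. A sketch of building such a request URL (whether the bot sends exactly these parameters is an assumption):

```python
from urllib.parse import urlencode

def searxng_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG JSON-API request URL. format=json must be enabled
    in the instance's settings.yml for this to return JSON."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"
```
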
- PDF and DOCX Attachment Processing:
- When using Gemini models (with vision/file capabilities enabled), PDF attachments are sent directly to the model.
- For other models, text is extracted from PDF attachments using `PyMuPDF` and appended to your query.
- DOCX attachments are processed for all models (including Gemini): text is extracted using `python-docx` and appended to your query.
- Image URL Processing: If you include direct URLs to images (e.g., `https://example.com/image.png`) in your message text, the bot will attempt to download these images and treat them as if they were attached directly, making them available to vision models or Google Lens.
- Persistent Cooldown: Rate-limited API keys (LLM providers & SerpAPI) are automatically detected and put on a 24-hour cooldown to prevent reuse.
- SQLite Database: Cooldown information is stored in separate SQLite databases per service (in the `ratelimit_dbs/` folder), ensuring the cooldown persists even if the bot restarts.
- Automatic Reset: Databases are automatically reset every 24 hours (tracked via `last_reset_timestamp.txt`). Additionally, if all keys for a specific service become rate-limited, that service's database is reset immediately to allow retries.
- Silent Retries: Error messages are only sent to Discord if all configured keys for a service have been tried and failed for a single request.
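A persistent cooldown store of this kind needs only a tiny SQLite schema. A simplified sketch (the table and function names are invented, not the project's actual schema):

```python
import sqlite3
import time

def init_db(path: str) -> sqlite3.Connection:
    """Open (or create) a per-service cooldown database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cooldowns (key TEXT PRIMARY KEY, until REAL)"
    )
    return conn

def put_on_cooldown(conn: sqlite3.Connection, key: str, hours: float = 24.0) -> None:
    """Record when a rate-limited key becomes usable again."""
    conn.execute(
        "INSERT OR REPLACE INTO cooldowns VALUES (?, ?)",
        (key, time.time() + hours * 3600),
    )
    conn.commit()

def is_available(conn: sqlite3.Connection, key: str) -> bool:
    """A key is available if it has no cooldown row, or the cooldown expired."""
    row = conn.execute("SELECT until FROM cooldowns WHERE key = ?", (key,)).fetchone()
    return row is None or row[0] <= time.time()
```

Because the state lives in a file on disk rather than in memory, a restart of the bot leaves the cooldowns intact.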
- Gemini Grounding: When using a compatible Gemini model (e.g., `gemini-2.5-flash`), the bot can automatically use Google Search to ground its responses with up-to-date information.
- Persistent Action Buttons: If a response was enhanced by grounding or if there's response text, action buttons will appear below the message. These buttons are persistent and do not time out.
- "Show Sources" Button: If grounding was used, this button reveals the search queries the model used and the web sources it consulted. The sources are displayed in embeds, split into multiple messages if necessary.
- "Get response as text file" Button: Allows you to download the bot's full response as a `.txt` file.
- "View output properly (especially tables)" Button: If enabled in `config.yaml`, this button shares the LLM's full Markdown response via a temporary public ngrok URL. The content is rendered as an HTML page using a local Grip server, making complex Markdown (like tables) easier to read.
- Slash Commands: `/model`, `/systemprompt`, `/setgeminithinking`, and `/help` for user-specific preferences and help (see "Slash Commands" section).
- Supports image attachments when using a vision model (like `gpt-4.1`, `claude-3`, `gemini-2.5-flash`, etc.).
- Supports text file attachments (`.txt`, `.py`, `.c`, etc.), PDF attachments, and DOCX attachments (see "Advanced Query Handling").
- Customizable personality (default system prompt in `config.yaml`, user-overridable via `/systemprompt`).
- User identity aware (OpenAI API and xAI API only; sends the user's Discord ID as `name`).
- Streamed responses (turns green when complete; automatically splits into separate messages when too long).
- Imgur URL Resending: If the LLM includes Imgur URLs under a specific header in its response, the bot automatically resends these URLs as separate messages to ensure they embed properly.
- Configurable OpenAI Vision: Vision capabilities for OpenAI models can be disabled via `disable_vision: true` in the provider config.
- Configurable Gemini Thinking Budget: The `thinkingBudget` for Gemini models can be enabled and its value set globally in `config.yaml`; users can toggle its use for their sessions via `/setgeminithinking`.
- Hot reloading config (you can change settings without restarting the bot).
- Displays helpful warnings when appropriate (e.g., message limits, content fetching issues, model fallbacks).
- Caches message data in a size-managed (no memory leaks) and mutex-protected (no race conditions) global dictionary to maximize efficiency and minimize Discord API calls.
- Fully asynchronous.
- Modular Python codebase.
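The size-managed, mutex-protected cache described above can be sketched with an `OrderedDict` plus an `asyncio.Lock` (a simplified illustration, not the project's actual class):

```python
import asyncio
from collections import OrderedDict
from typing import Optional

class MessageCache:
    """Size-capped cache guarded by an asyncio.Lock: the oldest entries are
    evicted once max_size is reached (bounding memory), and the lock
    serializes access from concurrent message handlers (avoiding races)."""

    def __init__(self, max_size: int = 1000):
        self._data = OrderedDict()
        self._lock = asyncio.Lock()
        self._max_size = max_size

    async def put(self, message_id: int, payload: dict) -> None:
        async with self._lock:
            self._data[message_id] = payload
            self._data.move_to_end(message_id)
            while len(self._data) > self._max_size:
                self._data.popitem(last=False)  # evict the oldest entry

    async def get(self, message_id: int) -> Optional[dict]:
        async with self._lock:
            return self._data.get(message_id)
```

Caching the processed form of each message means a long reply chain does not trigger a fresh Discord API fetch for every message on every new reply.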
This project is organized into several key directories:
- `src/` - Main source code organized by functionality:
  - `bot/` - Discord bot implementation and command handlers
  - `core/` - Core utilities, configuration, and rate limiting
  - `llm/` - LLM provider integrations and response handling
  - `content/` - Content extraction from URLs, files, and media
  - `messaging/` - Message processing and conversation management
  - `services/` - Additional services like grounding and prompt utilities
  - `ui/` - Output server for sharing rendered content
- `config/` - Configuration files and user preferences
- `docker/` - Docker configuration files
- `logs/` - Application logs
- `data/` - Data storage and caching
- `scripts/` - Utility scripts
- `tests/` - Test files
- Clone this repo or download the source code.

- Create a copy of `config/config-example.yaml` named `config/config.yaml` and set it up:
| Setting | Description |
|---|---|
| bot_token | Create a new Discord bot at discord.com/developers/applications and generate a token under the "Bot" tab. Also enable "MESSAGE CONTENT INTENT". |
| client_id | Found under the "OAuth2" tab of the Discord bot you just made. |
| status_message | Set a custom message that displays on the bot's Discord profile. Max 128 characters. |
| max_text | The maximum total number of tokens (counted using tiktoken) for the entire chat history sent to the LLM. This includes text from all messages in the context, attached text files, and extracted text from PDFs (for non-Gemini models). If this limit is exceeded, the oldest messages (preferring user/model pairs) are removed from the history until the total token count is within the limit. If removing all prior history is still not enough, the latest user query itself will be truncated. (Default: 2000) |
| max_images | The maximum number of image attachments allowed in a single message. Only applicable when using a vision model or Google Lens. (Default: 5) |
| max_messages | The maximum number of messages allowed in a reply chain. This acts as an initial cap before token-based truncation with max_text occurs. (Default: 25) |
| use_plain_responses | When set to true the bot will use plaintext responses instead of embeds. Plaintext responses have a shorter character limit so the bot's messages may split more often. Also disables streamed responses, warning messages, and the 'Show Sources' button. (Default: false) |
| allow_dms | Set to false to disable direct message access. (Default: true) |
| permissions | Configure permissions for users, roles and channels, each with a list of allowed_ids and blocked_ids. Leave allowed_ids empty to allow ALL. Role and channel permissions do not affect DMs. You can use category IDs to control channel permissions in groups (including for threads within those categories/channels). |
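The max_text truncation loop reduces to: drop the oldest messages until the history fits. The sketch below substitutes a whitespace token counter for tiktoken (and omits the user/model pairing preference) to stay dependency-free:

```python
def truncate_history(messages, max_text, count_tokens=lambda s: len(s.split())):
    """Drop the oldest messages until the history fits under max_text tokens.
    The real bot counts tokens with tiktoken and prefers removing user/model
    pairs; this simplified sketch removes one oldest message at a time and
    always keeps at least the latest user query."""
    history = list(messages)
    while len(history) > 1 and sum(count_tokens(m) for m in history) > max_text:
        history.pop(0)
    return history
```
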
| Setting | Description |
|---|---|
| youtube_api_key | Required for YouTube URL processing. Get an API key from the Google Cloud Console. Ensure the YouTube Data API v3 is enabled for your project. This key is used to fetch video details (title, description, channel) and comments. Transcripts are fetched using youtube-transcript-api (which has its own proxy settings) and do not require this key. (Note: Currently only supports a single key). |
| Setting | Description |
|---|---|
| reddit_client_id | Required for Reddit URL processing. Your Reddit script app's client ID. |
| reddit_client_secret | Required for Reddit URL processing. Your Reddit script app's client secret. |
| reddit_user_agent | Required for Reddit URL processing. A unique user agent string (e.g., discord:my-llm-bot:v1.0 (by u/your_reddit_username)). |
| Setting | Description |
|---|---|
| proxy_config | Optional: Configure a proxy for youtube-transcript-api to potentially bypass YouTube IP blocks that cause transcript fetching errors (like ParseError: no element found). See config-example.yaml for setup details (Webshare recommended). |
| Setting | Description |
|---|---|
| serpapi_api_keys | Required for Google Lens. A list of API keys from SerpApi. The bot will rotate through these keys and retry if one fails (e.g., due to rate limits). |
| Setting | Description |
|---|---|
| providers | Add the LLM providers you want to use. For OpenAI compatible APIs, provide a base_url and a list of api_keys. For Google Gemini, just provide a list of api_keys. For keyless providers (like Ollama, Jan), provide an empty list [] for api_keys. Gemini uses the google-genai library; others use OpenAI compatible APIs. The bot rotates through keys and retries on failure. For OpenAI, you can add disable_vision: true to prevent image processing even if the model supports it. |
| model | Set the default model to <provider name>/<model name>, e.g. openai/gpt-4.1, google/gemini-2.5-flash, ollama/llama3, or openrouter/anthropic/claude-3.5-sonnet. Users can override this with the /model command. |
| searxng_base_url | Optional, for SearXNG grounding. The base URL of your SearXNG instance (e.g., http://localhost:18088). Required if you want non-Gemini/non-Grok models to have web search capabilities. |
| searxng_url_content_max_length | Optional, for SearXNG grounding. Maximum character length for text extracted from each URL fetched via SearXNG results. (Default: 20000) |
| grounding_system_prompt | Optional, for SearXNG grounding. The system prompt used for the Gemini model that generates search queries for SearXNG. |
| jina_engine_mode | Optional, for Jina Reader URL extraction. Sets the engine mode for Jina Reader. Supported: "direct" (fast, no JS, sends X-Engine: direct), "browser" (slower, handles JS, sends X-Engine: browser), or "default" (does not send X-Engine header, allowing Jina to use its current default). (Default: "default") |
| jina_wait_for_selector | Optional, for Jina Reader URL extraction. A CSS selector (e.g., "#content", ".main-article"). Jina Reader will wait for an element matching this selector to appear on the page before returning content. Useful for SPAs. (Default: null - disabled) |
| jina_timeout | Optional, for Jina Reader URL extraction. An integer in seconds (e.g., 30). Jina Reader will wait for this duration for dynamic content to load or network activity to become idle. (Default: null - Jina's internal default timeout) |
| gemini_use_thinking_budget | Optional, for Gemini models. Set to true to enable the thinkingBudget parameter by default for Gemini models. Users can override with /setgeminithinking. (Default: false) |
| gemini_thinking_budget_value | Optional, for Gemini models. The actual budget value (0-24576) to use if gemini_use_thinking_budget is enabled. (Default: 0) |
| extra_api_parameters | Extra API parameters for the selected LLM's provider. Add more entries as needed. Refer to your provider's documentation for supported API parameters. (Default: max_tokens=4096, temperature=1.0 for OpenAI compatible)(Gemini uses parameters like max_output_tokens, temperature, top_p, top_k) |
| system_prompt | The default system prompt to customize the bot's behavior. Users can set their own with /systemprompt. Leave blank for no default system prompt. |
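Putting the provider-related settings together, a fragment of `config.yaml` might look like this (illustrative only — see `config-example.yaml` for the authoritative schema; all keys and values here are placeholders):

```yaml
# Illustrative fragment only — consult config-example.yaml for the real schema.
providers:
  openai:
    base_url: https://api.openai.com/v1
    api_keys:
      - sk-your-first-key
      - sk-your-second-key
    disable_vision: false
  google:
    api_keys:
      - your-gemini-key
  ollama:
    base_url: http://localhost:11434/v1
    api_keys: []          # keyless local provider

model: openai/gpt-4.1     # default; users can override with /model
system_prompt: You are a helpful Discord assistant.
```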
| Setting | Description |
|---|---|
| output_sharing.ngrok_enabled | Set to true to enable the "View output properly" button, which uses ngrok to share rendered Markdown. (Default: false) |
| output_sharing.ngrok_authtoken | Your ngrok authtoken. Optional but recommended for stable ngrok usage. Find it on your ngrok dashboard. |
| output_sharing.grip_port | The local port number for the Python HTTP server (formerly Grip server) that renders the Markdown to HTML. (Default: 6419) |
| output_sharing.ngrok_static_domain | Optional: If you have a reserved static domain with ngrok (requires a paid plan), you can specify it here (e.g., your-cool-domain.ngrok-free.app). Otherwise, a random ngrok subdomain will be used. |
| output_sharing.cleanup_on_shutdown | Set to false to keep the generated HTML files in the shared_html/ directory across bot restarts. If true, the directory is cleaned up when the bot shuts down. (Default: true) |
| output_sharing.url_shortener_enabled | Set to true to shorten the ngrok URL using a service like TinyURL. (Default: false) |
| output_sharing.url_shortener_service | The URL shortener service to use. Currently supported: "tinyurl". (Default: "tinyurl") |
- User-specific settings for model, system prompt, and Gemini thinking budget are stored in JSON files in the `config/` directory (e.g., `config/user_model_prefs.json`, `config/user_system_prompts.json`, `config/user_gemini_thinking_budget_prefs.json`). Consider adding these to your `.gitignore` if you manage your deployment with git.
- Install requirements:

  ```bash
  # Ensure you are in the project root directory (the one containing requirements.txt)
  python -m pip install -U -r requirements.txt
  ```

  (Note: This includes `discord.py`, `httpx`, `google-genai`, `openai`, `youtube-transcript-api`, `google-api-python-client`, `asyncpraw`, `beautifulsoup4`, `google-search-results` for SerpAPI, `PyMuPDF` for PDF text extraction, `tiktoken` for token counting, `crawl4ai` for advanced web scraping, `grip` and `pyngrok` for output sharing, and other dependencies.)
- Run the bot:

  No Docker:

  ```bash
  # Ensure you are in the project root directory
  python main.py
  ```

  With Docker:

  ```bash
  cd docker
  docker compose up
  ```
- If you're having issues, try my suggestions here
- Only models from the OpenAI API and xAI API are "user identity aware" because only they support the "name" parameter in the message object. Hopefully more providers will support this in the future.
- Gemini safety settings are currently hardcoded to `BLOCK_NONE` for all categories.
- The YouTube Data API has usage quotas. Fetching details and comments for many videos may consume your quota quickly. Check the Google Cloud Console for details.
- YouTube transcript fetching (via `youtube-transcript-api`) can be blocked by YouTube, especially when running from cloud IPs. This often results in a `ParseError: no element found` error. Configure the optional `proxy_config` in `config.yaml` (using Webshare residential proxies is recommended) to mitigate this.
- The Reddit API also has rate limits. Processing many Reddit URLs quickly might lead to temporary throttling.
- General URL fetching uses `httpx` and `BeautifulSoup4`. It might fail on complex JavaScript-heavy sites or sites with strong anti-scraping measures. Content extraction focuses on main text areas and might miss some information or include unwanted elements. Content from general URLs fetched via SearXNG grounding has a configurable length limit (`searxng_url_content_max_length`).
- Google Lens processing uses SerpAPI. Ensure you have valid SerpAPI keys configured.
- PDF text extraction for non-Gemini models is done using `PyMuPDF`.
- Rate Limit Handling & User Preferences: The bot uses SQLite databases (in the `ratelimit_dbs/` folder) and JSON files (e.g., `config/user_model_prefs.json`) in the config directory. Ensure these (and `last_reset_timestamp.txt`) are added to your `.gitignore` if you manage your deployment with git.
This project is based on the excellent llmcord by jakobdylanc. The original project provides the foundation for this Discord LLM bot implementation.
Contributions are welcome! Please feel free to submit issues or pull requests to improve the bot.