Model Routing
How the bot selects the right LLM model for each request through tier-based routing.
See also: Configuration for runtime config fields, Quick Start for setup, Deployment for production configuration.
The bot uses a 4-tier model selection strategy that picks the most appropriate model based on task complexity. The tier is determined from multiple sources with clear priority:
- User preference -- set via `/tier` command or `set_tier` tool
- Skill override -- `model_tier` field in skill YAML frontmatter
- Dynamic upgrade -- `DynamicTierSystem` promotes to `coding` when code activity is detected mid-conversation
- Fallback -- `"balanced"` when no tier is explicitly set
- Per-user model override -- set via `/model` command
User Message
|
v
[ContextBuildingSystem] --- Resolves tier from user prefs / active skill
| Priority: force+user > skill > user pref > balanced
v
[DynamicTierSystem] --- May upgrade to "coding" if code activity detected
| (only on iteration > 0, never downgrades)
v
[ModelSelectionService] --- Resolves actual model for the tier
| (user override > router config fallback)
v
[ToolLoopExecutionSystem] --- Selects model + reasoning level based on modelTier
| (via DefaultToolLoopSystem internal loop)
v
LLM API Call
Four tiers map task complexity to model capabilities:
| Tier | Reasoning | Typical Use Cases | Default Model |
|---|---|---|---|
| `balanced` | medium | Greetings, general questions, summarization (default/fallback) | `openai/gpt-5.1` |
| `smart` | high | Complex analysis, architecture decisions, multi-step planning | `openai/gpt-5.1` |
| `coding` | medium | Code generation, debugging, refactoring, code review | `openai/gpt-5.2` |
| `deep` | xhigh | PhD-level reasoning: proofs, scientific analysis, deep calculations | `openai/gpt-5.2` |
Each tier is independently configurable -- you can assign any model from any supported provider to any tier. See Multi-Provider Setup below.
Configure tier models in preferences/runtime-config.json under modelRouter:
{
"modelRouter": {
"balancedModel": "openai/gpt-5.1",
"balancedModelReasoning": "medium",
"smartModel": "openai/gpt-5.1",
"smartModelReasoning": "high",
"codingModel": "openai/gpt-5.2",
"codingModelReasoning": "medium",
"deepModel": "openai/gpt-5.2",
"deepModelReasoning": "xhigh",
"dynamicTierEnabled": true,
"temperature": 0.7
}
}

Note: Reasoning models may ignore the `temperature` parameter. The presence of a `reasoning` object and the `supportsTemperature` flag in `models/models.json` (workspace) control this behavior. See models.json Reference.
The tier is resolved in ContextBuildingSystem (order=20) on iteration 0 with the following priority:
| Priority | Source | Condition |
|---|---|---|
| 1 (highest) | User preference + force | `tierForce=true` and `modelTier` set in user preferences |
| 2 | Skill `model_tier` | Active skill has `model_tier` in YAML frontmatter |
| 3 | User preference | `modelTier` set in user preferences (without force) |
| 4 (lowest) | Fallback | `"balanced"` -- when no tier is explicitly set |
Force mode locks the tier, preventing both skill overrides and DynamicTierSystem upgrades. This is useful when you want a specific model regardless of context.
/tier # Show current tier and force status
/tier coding # Set tier to coding, clears force
/tier smart force # Lock tier to smart (ignores skill overrides + dynamic upgrades)
Key behavior:
- `/tier <tier>` always clears the force flag (even if it was previously on)
- `/tier <tier> force` sets both the tier and locks it
- The setting persists across conversations (stored in user preferences)
The LLM can switch tiers mid-conversation using the set_tier tool:
{
"tier": "coding"
}

- If the user has `tierForce=true`, the tool returns an error: "Tier is locked by user"
- Otherwise, the tier is applied immediately for the current conversation
- The tool does NOT persist the change to user preferences (session-only)
Users can override the default model for any tier using the /model command:
/model # Show current model+reasoning for all 4 tiers
/model list # List available models (filtered by allowed providers)
/model <tier> <provider/model> # Set model override for a tier
/model <tier> reasoning <level> # Set reasoning level for the current model on a tier
/model <tier> reset # Remove override, revert to application.properties default
Examples:
/model coding openai/gpt-5.2 # Set coding tier to GPT-5.2
/model coding reasoning high # Set reasoning level to high
/model smart anthropic/claude-sonnet-4-20250514 # Use Claude for smart tier
/model coding reset # Revert coding tier to default
Key behavior:
- Overrides are stored in `UserPreferences.tierOverrides` and persist across conversations
- When a model is set, the default reasoning level from `models.json` is auto-applied
- The model's provider must be in the allowed providers list (configured via `BOT_MODEL_SELECTION_ALLOWED_PROVIDERS`)
- Tier selection (`/tier`) and model selection (`/model`) work independently -- `/tier` selects which tier is active, `/model` customizes what model each tier uses
- `ModelSelectionService` centralizes resolution: user override > router config fallback
Allowed providers:

BOT_MODEL_SELECTION_ALLOWED_PROVIDERS=openai,anthropic  # default

Only models from these providers will appear in `/model list` and be accepted in `/model <tier> <model>`.
Skills can declare a preferred model tier in their YAML frontmatter:
---
name: code-review
description: Review code changes
model_tier: coding
---

When a skill with `model_tier` is active:

- If the user has `tierForce=true`, the skill's tier is ignored
- Otherwise, the skill's tier takes precedence over the user's default preference
- A system prompt instruction informs the LLM about the tier switch
See Skills for more details on skill configuration.
DynamicTierSystem (order=25) runs on subsequent iterations of the agent loop (after tool calls). It scans only messages from the current loop run (after the last user message) to detect coding activity.
Key constraint: It never scans old conversation history, only the current run's assistant messages and tool results. This prevents false positives from past coding discussions.
Upgrade signals:
| Signal Type | Detection Logic |
|---|---|
| File operations on code files | `filesystem` / `file_system` tool calls with `write_file` or `read_file` on files ending in `.py`, `.js`, `.ts`, `.java`, `.go`, `.rs`, `.rb`, `.sh`, `.c`, `.cpp`, `.cs`, `.kt`, `.scala`, `.swift`, `.lua`, `.r`, `.pl`, `.php`, `.sql`, `.yaml`, `.yml`, `.toml`, `.gradle`, `.cmake`, `.makefile`, plus `Makefile` and `Dockerfile` |
| Code-related shell commands | `shell` tool calls starting with `python`, `node`, `npm`, `npx`, `pip`, `mvn`, `gradle`, `gcc`, `g++`, `cargo`, `go`, `rustc`, `pytest`, `make`, `cmake`, `javac`, `dotnet`, `ruby`, `tsc`, `webpack`, `esbuild`, `jest`, `mocha`, `yarn` |
| Stack traces in tool results | Tool result messages containing `Traceback`, `SyntaxError`, `TypeError`, `NullPointerException`, `at com.`, `at org.`, `panic:`, `error[E`, etc. |
Rules:
- Only upgrades to `coding` -- never downgrades (prevents oscillation)
- Skips if current tier is already `coding` or `deep`
- Skips if user has `tierForce=true`
- Only runs when `bot.router.dynamic-tier-enabled=true` (default)
Source:
DynamicTierSystem.java
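As a rough sketch, the detection and upgrade rules above might look like this. The class, method names, and the abbreviated signal sets are illustrative assumptions; the real DynamicTierSystem inspects actual tool call messages and uses the full signal tables above:

```java
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not the actual DynamicTierSystem source.
class DynamicTierSketch {
    static final Set<String> CODE_EXTENSIONS =
            Set.of(".py", ".js", ".ts", ".java", ".go", ".rs", ".c", ".cpp");
    static final Set<String> CODE_COMMANDS =
            Set.of("python", "node", "npm", "mvn", "gradle", "cargo", "make");

    static boolean isCodeFile(String path) {
        String lower = path.toLowerCase();
        return CODE_EXTENSIONS.stream().anyMatch(lower::endsWith)
                || lower.endsWith("makefile") || lower.endsWith("dockerfile");
    }

    static boolean isCodeCommand(String command) {
        // Match on the first token of the shell command
        return CODE_COMMANDS.contains(command.trim().split("\\s+")[0]);
    }

    // Only ever upgrades to "coding"; never downgrades, respects force and deep.
    static String maybeUpgrade(String tier, boolean tierForce,
                               List<String> filesTouched, List<String> shellCommands) {
        if (tierForce || tier.equals("coding") || tier.equals("deep")) {
            return tier;
        }
        boolean codingActivity = filesTouched.stream().anyMatch(DynamicTierSketch::isCodeFile)
                || shellCommands.stream().anyMatch(DynamicTierSketch::isCodeCommand);
        return codingActivity ? "coding" : tier;
    }
}
```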
ToolLoopExecutionSystem (order=30) delegates to DefaultToolLoopSystem, which uses ModelSelectionService to translate the modelTier string into an actual model name and reasoning effort:
ModelSelectionService.resolveForTier(tier)
1. Check UserPreferences.tierOverrides for the tier
2. If override exists -> use override model + reasoning
3. Otherwise -> use runtime config defaults (modelRouter.*Model)
4. For reasoning models -> auto-fill default reasoning from models/models.json
The selected model and reasoning effort are passed to the LLM adapter via LlmRequest.
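The override-then-fallback step can be reduced to a small sketch. The record and class names here are assumptions; the real ModelSelectionService additionally auto-fills default reasoning from models.json:

```java
import java.util.Map;

// Hypothetical types; illustrates "user override > router config fallback" only.
record ModelChoice(String model, String reasoning) {}

class TierModelResolver {
    static ModelChoice resolveForTier(String tier,
                                      Map<String, ModelChoice> userTierOverrides,
                                      Map<String, ModelChoice> routerDefaults) {
        // A per-user override for the tier wins; otherwise use modelRouter.* defaults.
        ModelChoice override = userTierOverrides.get(tier);
        return override != null ? override : routerDefaults.get(tier);
    }
}
```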
You can mix different LLM providers across tiers for cost optimization or capability access:
Configure provider API keys in preferences/runtime-config.json:
{
"llm": {
"providers": {
"openai": { "apiKey": "sk-proj-..." },
"anthropic": { "apiKey": "sk-ant-..." }
}
},
"modelRouter": {
"balancedModel": "openai/gpt-5.1",
"balancedModelReasoning": "medium",
"smartModel": "anthropic/claude-opus-4-6",
"smartModelReasoning": "high",
"codingModel": "openai/gpt-5.2",
"codingModelReasoning": "medium"
}
}

The Langchain4jAdapter creates per-request model instances when the requested model differs from the default. Provider detection is based on the model name prefix (e.g., `anthropic/claude-opus-4-6` routes to the Anthropic adapter).
See: Configuration for runtime config details.
Model capabilities are defined in the workspace at models/models.json.
On first run, the bot copies a bundled models.json into the workspace so edits can persist (dashboard: Models).
Each entry specifies:
| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider name: `openai`, `anthropic`, `zhipu`, `qwen`, `cerebras`, `deepinfra` |
| `displayName` | string | Human-readable name for UI display (e.g., in `/model list`) |
| `supportsTemperature` | boolean | Whether to send the `temperature` parameter (reasoning models typically don't support it) |
| `maxInputTokens` | integer | Maximum input tokens for non-reasoning models (used for truncation) |
| `reasoning` | object | Reasoning configuration (null/absent for non-reasoning models) |
| `reasoning.default` | string | Default reasoning level (e.g., `"medium"`) |
| `reasoning.levels` | object | Map of level name to `{ "maxInputTokens": N }` |
Example entry:
{
"models": {
"gpt-5.1": {
"provider": "openai",
"displayName": "GPT-5.1",
"supportsTemperature": false,
"reasoning": {
"default": "medium",
"levels": {
"low": { "maxInputTokens": 1000000 },
"medium": { "maxInputTokens": 1000000 },
"high": { "maxInputTokens": 500000 },
"xhigh": { "maxInputTokens": 250000 }
}
}
},
"gpt-4o": {
"provider": "openai",
"displayName": "GPT-4o",
"supportsTemperature": true,
"maxInputTokens": 128000
},
"claude-sonnet-4-20250514": {
"provider": "anthropic",
"displayName": "Claude Sonnet 4",
"supportsTemperature": true,
"maxInputTokens": 200000
}
},
"defaults": {
"supportsTemperature": true,
"maxInputTokens": 128000
}
}

Note: The `reasoningRequired` field has been replaced by the presence of a `reasoning` object. Models with reasoning have per-level context limits inside `reasoning.levels`. Models without reasoning use the flat `maxInputTokens` field.
Model name resolution in ModelConfigService:
1. Exact match (e.g., `gpt-5.1`)
2. Strip provider prefix (e.g., `openai/gpt-5.1` becomes `gpt-5.1`)
3. Prefix match (e.g., `gpt-5.1-preview` matches `gpt-5.1`)
4. Fall back to `defaults` section
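The four-step chain can be sketched as follows. This is an illustrative reduction with assumed names, not the actual ModelConfigService code (which resolves against full model entries rather than a plain set of names):

```java
import java.util.Comparator;
import java.util.Set;

// Hypothetical helper illustrating the resolution order only.
class ModelNameResolver {
    static String resolve(String requested, Set<String> knownModels) {
        // 1. Exact match
        if (knownModels.contains(requested)) return requested;
        // 2. Strip provider prefix: "openai/gpt-5.1" -> "gpt-5.1"
        String stripped = requested.contains("/")
                ? requested.substring(requested.indexOf('/') + 1)
                : requested;
        if (knownModels.contains(stripped)) return stripped;
        // 3. Prefix match: "gpt-5.1-preview" matches "gpt-5.1" (prefer longest key)
        return knownModels.stream()
                .filter(stripped::startsWith)
                .max(Comparator.comparingInt(String::length))
                // 4. Fall back to the "defaults" section
                .orElse("defaults");
    }
}
```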
The maxInputTokens value is used by:
- `AutoCompactionSystem` -- triggers compaction at 80% of context window (uses per-level limit when available)
- `ToolLoopExecutionSystem` -- emergency truncation limits each message to 25% of context window (minimum 10K characters)
- `ModelSelectionService` -- resolves max tokens per tier for compaction threshold
See: Configuration for workspace paths and model config notes.
Model routing is primarily configured by choosing models for each tier and enabling optional dynamic upgrades.
Edit preferences/runtime-config.json:
{
"modelRouter": {
"balancedModel": "openai/gpt-5.1",
"smartModel": "openai/gpt-5.1",
"codingModel": "openai/gpt-5.2",
"deepModel": "openai/gpt-5.2",
"dynamicTierEnabled": true
}
}

LLM providers have strict requirements for tool call identifiers and function names. When models switch mid-conversation (e.g., from a non-OpenAI provider to OpenAI due to tier change), stored tool call IDs and names from the previous provider may be incompatible with the new one. Langchain4jAdapter handles this transparently.
Source:
Langchain4jAdapter.java
OpenAI requires function names to match ^[a-zA-Z0-9_-]+$. Non-OpenAI providers (e.g., DeepInfra) may return names containing dots or other characters that get stored in conversation history.
Original name: "com.example.search.tool"
Sanitized name: "com_example_search_tool"
sanitizeFunctionName() replaces any character outside [a-zA-Z0-9_-] with _. This is applied to both:
- Assistant messages: tool call names in `toolExecutionRequests`
- Tool result messages: the `toolName` field
If the name is null, it defaults to "unknown".
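A minimal sketch of the sanitization rule (the method name comes from the doc; the class and exact signature are assumptions):

```java
// Illustrative version of the sanitization described above; the real
// sanitizeFunctionName() lives in Langchain4jAdapter.
class NameSanitizer {
    static String sanitizeFunctionName(String name) {
        if (name == null) {
            return "unknown"; // null names default to "unknown"
        }
        // Replace every character outside [a-zA-Z0-9_-] with '_'
        return name.replaceAll("[^a-zA-Z0-9_-]", "_");
    }
}
```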
Provider-generated tool call IDs can violate two constraints:
- Length: IDs exceeding 40 characters (the `MAX_TOOL_CALL_ID_LENGTH` constant)
- Characters: IDs containing characters outside `[a-zA-Z0-9_-]` (e.g., dots from non-OpenAI providers)
When either condition is detected, the adapter builds a consistent ID remap table before converting any messages:
Original ID: "chatcmpl-abc123.tool.call.very-long-identifier-from-provider"
Remapped ID: "call_a1b2c3d4e5f6a1b2c3d4e5f6" (UUID-based, 29 chars)
The remap is computed in a first pass over all messages, then applied consistently to both:
- Assistant messages:
toolCalls[].idfield - Tool result messages:
toolCallIdfield
This ensures the assistant's tool call IDs always match the corresponding tool result IDs, even after remapping.
Why this matters: Without remapping, switching from a model that generated long/non-standard IDs to OpenAI would cause 400 Bad Request errors because the tool result IDs wouldn't match what OpenAI expects.
LlmRequest.messages
|
v
[Pass 1: Build ID remap table]
| Scan all messages for tool calls with:
| - ID length > 40 chars, or
| - ID contains chars outside [a-zA-Z0-9_-]
| Generate: originalId -> "call_" + UUID(24 chars)
v
[Pass 2: Convert messages]
| For each message:
| - assistant + toolCalls: remap IDs, sanitize names
| - tool results: remap toolCallId, sanitize toolName
| - user/system: pass through
v
List<ChatMessage> (langchain4j format)
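The two-pass remap can be sketched like this (class and method names are assumptions; the real adapter walks LlmRequest messages rather than a flat list of IDs):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch of the consistent ID remap; not the actual adapter code.
class ToolCallIdRemapper {
    static final int MAX_TOOL_CALL_ID_LENGTH = 40;

    static boolean needsRemap(String id) {
        return id.length() > MAX_TOOL_CALL_ID_LENGTH || !id.matches("[a-zA-Z0-9_-]+");
    }

    // Pass 1: build one originalId -> newId table over all tool call IDs.
    static Map<String, String> buildRemapTable(List<String> allToolCallIds) {
        Map<String, String> remap = new HashMap<>();
        for (String id : allToolCallIds) {
            if (needsRemap(id)) {
                // "call_" + 24 hex chars from a UUID = 29 chars total
                String fresh = "call_"
                        + UUID.randomUUID().toString().replace("-", "").substring(0, 24);
                remap.putIfAbsent(id, fresh);
            }
        }
        return remap;
    }

    // Pass 2: apply the same mapping to assistant toolCalls[].id and
    // to tool-result toolCallId fields, so the pairs stay matched.
    static String remap(Map<String, String> table, String id) {
        return table.getOrDefault(id, id);
    }
}
```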
The bot employs a 3-layer defense to prevent context window overflow. Each layer operates at a different stage of the pipeline and catches progressively more severe cases.
AutoCompactionSystem (order=18) runs before the LLM call to proactively shrink the conversation history.
Source:
AutoCompactionSystem.java
Token estimation:
estimatedTokens = sum(message.content.length) / charsPerToken + systemPromptOverheadTokens
Where charsPerToken defaults to 3.5 and systemPromptOverheadTokens defaults to 8000.
Threshold resolution:
1. Look up the current model's `maxInputTokens` from `models/models.json` (via `ModelConfigService`)
2. Apply 80% safety margin: `modelMax * 0.8`
3. Cap by runtime config: `min(modelThreshold, compaction.maxContextTokens)`
4. If model lookup fails, fall back to `compaction.maxContextTokens`
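The estimation and threshold math above can be sketched as (constants come from the doc's defaults; the class and method names are assumptions):

```java
// Illustrative sketch of AutoCompactionSystem's arithmetic, not its source.
class CompactionMath {
    static final double CHARS_PER_TOKEN = 3.5;            // default charsPerToken
    static final int SYSTEM_PROMPT_OVERHEAD_TOKENS = 8000; // default overhead

    static long estimateTokens(long totalMessageChars) {
        return (long) (totalMessageChars / CHARS_PER_TOKEN) + SYSTEM_PROMPT_OVERHEAD_TOKENS;
    }

    static long threshold(long modelMaxInputTokens, long configMaxContextTokens) {
        long modelThreshold = (long) (modelMaxInputTokens * 0.8); // 80% safety margin
        return Math.min(modelThreshold, configMaxContextTokens);  // capped by runtime config
    }
}
```

For example, 35,000 chars of history estimate to 10,000 + 8,000 = 18,000 tokens, and a 1M-token model capped by `maxContextTokens: 50000` compacts at 50,000 tokens.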
Compaction strategy:
- Summarize old messages via LLM (balanced model, low reasoning) using `CompactionService`
- Replace old messages with a `[Conversation summary]` system message + last N messages (default N=10)
- If LLM unavailable, fall back to simple truncation (drop oldest, keep last N)
Configuration:
Edit preferences/runtime-config.json:
{
"compaction": {
"enabled": true,
"maxContextTokens": 50000,
"keepLastMessages": 20
}
}

See: Configuration for the full compaction reference.
DefaultToolLoopSystem truncates individual tool results that exceed maxToolResultChars before they are added to conversation history.
Source:
DefaultToolLoopSystem.java
When a tool result's content exceeds the limit (default 100,000 characters):
[first maxChars characters of content...]
[OUTPUT TRUNCATED: 500000 chars total, showing first 100000 chars.
The full result is too large for the context window.
Try a more specific query, use filtering/pagination,
or process the data in smaller chunks.]
The suffix length is subtracted from the cut point so the final output stays within the limit. This hint enables the LLM to self-correct by retrying with a more specific query.
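A sketch of that cut-point arithmetic (assumed names; the real logic is in DefaultToolLoopSystem and uses the fuller hint text above):

```java
// Illustrative truncation with a self-correction hint; not the actual source.
class ToolResultTruncator {
    static String truncate(String content, int maxChars) {
        if (content.length() <= maxChars) {
            return content;
        }
        String suffix = "\n[OUTPUT TRUNCATED: " + content.length()
                + " chars total, showing first " + maxChars + " chars. "
                + "Try a more specific query, use filtering/pagination, "
                + "or process the data in smaller chunks.]";
        // Subtract the suffix length from the cut point so the final
        // output stays within maxChars.
        int cutPoint = Math.max(0, maxChars - suffix.length());
        return content.substring(0, cutPoint) + suffix;
    }
}
```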
Configuration:
Tool result truncation is controlled by the Spring property bot.auto-compact.max-tool-result-chars.
Default: 100000.
ToolLoopExecutionSystem (order=30) catches context overflow errors from the LLM provider and applies emergency per-message truncation as a last resort.
Source:
DefaultToolLoopSystem.java
Error detection -- matches these patterns in the exception message chain:
- `exceeds maximum input length`
- `context_length_exceeded`
- `maximum context length`
- `too many tokens`
- `request too large`
Per-message limit calculation:
maxInputTokens = ModelConfigService.getMaxInputTokens(selectedModel)
maxMessageChars = maxInputTokens * 3.5 * 0.25 // 25% of context window per message
maxMessageChars = max(maxMessageChars, 10000) // floor: 10K chars
For example, with gpt-5.1 (1,000,000 input tokens):
maxMessageChars = 1,000,000 * 3.5 * 0.25 = 875,000 chars per message
With gpt-4o (128,000 input tokens):
maxMessageChars = 128,000 * 3.5 * 0.25 = 112,000 chars per message
Truncation format:
[first cutPoint characters of message...]
[EMERGENCY TRUNCATED: 500000 chars total. Try a more specific query to get smaller results.]
Recovery flow:
1. Catch `context_length_exceeded` from the LLM provider
2. Scan all messages, truncate any exceeding the per-message limit
3. Also truncate in session history (so truncation persists across requests)
4. Rebuild the LlmRequest and retry once
5. If the retry also fails, set `llm.error` for `ResponseRoutingSystem`
Conversation History
|
v
Layer 1: AutoCompactionSystem (order=18)
Preventive. Estimates tokens, compacts if > 80% of model max.
Strategy: LLM summary + keep last N messages.
|
v
ToolLoopExecutionSystem (order=30)
+-----------------------------------+
| LLM call -> tool execution loop |
| |
| Layer 2: Tool Result Truncation |
| Per-result. Truncates > 100K. |
| Hint: "try a more specific |
| query" |
| |
| Layer 3: Emergency Truncation |
| On context_length_exceeded: |
| per-message limit = 25% of |
| context window. Truncates + |
| retries once. Persists in |
| session history. |
+-----------------------------------+
| Class | Package | Order | Purpose |
|---|---|---|---|
| `ContextBuildingSystem` | `domain.system` | 20 | Resolves tier from user prefs / skill, builds system prompt |
| `DynamicTierSystem` | `domain.system` | 25 | Mid-conversation upgrade to coding tier |
| `ToolLoopExecutionSystem` | `domain.system` | 30 | LLM calls, tool execution loop, plan intercept, model selection, emergency truncation |
| `AutoCompactionSystem` | `domain.system` | 18 | Preventive context compaction before LLM call |
| `TierTool` | `tools` | -- | LLM tool for switching tier mid-conversation |
| `CommandRouter` | `adapter.inbound.command` | -- | `/tier` command handler |
| `LlmAdapterFactory` | `adapter.outbound.llm` | -- | Provider adapter selection |
| `Langchain4jAdapter` | `adapter.outbound.llm` | -- | OpenAI/Anthropic integration, ID remapping, name sanitization |
| `ModelConfigService` | `infrastructure.config` | -- | Model capability lookups from models.json |
| `ModelSelectionService` | `domain.service` | -- | Per-user model override resolution, provider filtering |
| `AgentContext` | `domain.model` | -- | Runtime state: holds modelTier, activeSkill |
The routing system produces detailed logs at INFO level:
[ContextBuilding] Resolved tier: coding (source: skill 'code-review')
[LLM] Model tier: coding, selected model: openai/gpt-5.2, reasoning: medium
On subsequent iterations with dynamic upgrade:
[DynamicTier] Detected coding activity, upgrading tier: balanced -> coding
[LLM] Model tier: coding, selected model: openai/gpt-5.2, reasoning: medium
User-initiated tier changes:
[TierTool] Tier changed to: smart
[LLM] Model tier: smart, selected model: openai/gpt-5.1, reasoning: high
Use /status in Telegram to check active configuration, including current model tier. For server-side flags and health, see Configuration.
Use /tier to check or change the current tier:
/tier # Show current: "Tier: balanced, Force: off"
/tier coding # Switch to coding tier
/tier smart force # Lock to smart tier (ignores skill overrides + dynamic upgrades)
User: "Hi, how are you?"
ContextBuildingSystem:
No user tier preference, no active skill
Tier: null -> balanced (fallback)
ToolLoopExecutionSystem:
Tier: balanced -> openai/gpt-5.1 (reasoning: medium)
Skill "code-review" has model_tier: coding
User: "Review this PR"
ContextBuildingSystem:
Active skill: code-review (model_tier: coding)
No user force -> use skill tier
Tier: coding
ToolLoopExecutionSystem:
Tier: coding -> openai/gpt-5.2 (reasoning: medium)
User ran: /tier smart force
Skill "code-review" has model_tier: coding
User: "Review this PR"
ContextBuildingSystem:
User preference: smart (force=true)
-> Skill's coding tier is ignored
Tier: smart
ToolLoopExecutionSystem:
Tier: smart -> openai/gpt-5.1 (reasoning: high)
User: "Help me with this project"
ContextBuildingSystem:
No user tier, no skill tier -> balanced
ToolLoopExecutionSystem (iteration 0):
Tier: balanced -> openai/gpt-5.1 (reasoning: medium)
--- LLM calls filesystem.write_file("app.py", ...) ---
DynamicTierSystem (iteration 1):
Detected: filesystem write on .py file -> upgrade to "coding"
ToolLoopExecutionSystem (iteration 1):
Tier: coding -> openai/gpt-5.2 (reasoning: medium)
User: "This needs deep analysis"
LLM calls set_tier({"tier": "deep"})
-> context.modelTier = "deep"
ToolLoopExecutionSystem (next iteration):
Tier: deep -> openai/gpt-5.2 (reasoning: xhigh)
See also: Configuration, Quick Start, Deployment, Skills
GolemCore Bot -- Apache License 2.0 | GitHub | Issues | Discussions