TIMRUN (TIM Runtime) is a high-performance inference engine that orchestrates the TIM (Thread Inference Model) for unprecedented long-horizon reasoning capabilities. TIMRUN manages the entire inference pipeline, using TIM to predict next tokens while performing intelligent structure checks to extract tool calls and identify prunable subtasks. This enables efficient end-to-end multi-hop tool use and makes complex problem-solving tasks more scalable.
- Multi-hop Reasoning: Chain complex reasoning steps across extended contexts
- End-to-End Tool Integration: Seamlessly incorporate external tools and APIs
- Long-horizon Planning: Handle tasks requiring extended planning and execution
- Generative Orchestration: Intelligent context engineering learned by the TIM model and handled by TIMRUN with efficient KV cache pruning
```
┌─────────────────┐     ┌───────────────────────────────────────────┐
│   Input Query   │────▶│               TIMRUN Engine               │
└─────────────────┘     │                                           │
                        │   ┌─────────────────┐                     │
                        │   │ Structure Check │                     │
                        │   │ • Tool Calls    │                     │
                        │   │ • Prunable      │                     │
                        │   │   Subtasks      │                     │
                        │   └────────┬────────┘                     │
                        │            ▼                              │
                        │   ┌─────────────────┐                     │
                        │   │    TIM Model    │                     │
                        │   │ • Sparse Attn   │──────────┐          │
                        │   │ • Multi-hop     │          │          │
                        │   │ • Token Pred    │          │          │
                        │   └────────┬────────┘          │          │
                        │            ▼                   ▼          │
┌─────────────────┐     │   ┌─────────────────┐  ┌─────────────┐    │
│   Tool Usage    │◀────┼───│ Tool Execution  │  │  KV Cache   │    │
│ • External APIs │     │   │ • Call Tools    │  │  Pruning    │    │
│ • Tool Calls    │     │   │ • Encode        │  │ • Memory    │    │
│ • Data Sources  │     │   │   Response      │  │   Mgmt      │    │
└─────────────────┘     │   └────────┬────────┘  └──────┬──────┘    │
                        │            ▼                  ▼           │
                        │   ┌─────────────────────────────────┐     │
                        │   │       Continue Decoding         │     │
                        │   │     (with updated context)      │     │
                        │   └─────────────────────────────────┘     │
                        └─────────────────────┬─────────────────────┘
                                              ▼
                                    ┌─────────────────┐
                                    │  Final Result   │
                                    └─────────────────┘
```
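The control flow above can be sketched as a toy loop. All names below are illustrative stand-ins, not the actual TIMRUN API: the engine consumes model output step by step, executes tool calls inline, and prunes completed subtasks down to their summaries before decoding continues.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One unit of model output, as classified by the structure check."""
    kind: str             # "token", "tool_call", or "subtask_end"
    payload: object = None


def run_engine(steps, call_tool):
    """Toy orchestration loop (illustrative only, not the real TIMRUN engine).

    Consumes model 'steps', executing tool calls and pruning finished
    subtasks from the working context (a stand-in for the KV cache).
    """
    context = []
    for step in steps:
        if step.kind == "token":
            # Ordinary decoding: append the predicted token to the context.
            context.append(step.payload)
        elif step.kind == "tool_call":
            # Structure check found a tool call: execute it and splice the
            # encoded response back into the context for continued decoding.
            context.append(call_tool(step.payload))
        elif step.kind == "subtask_end":
            # A completed subtask is prunable: drop its intermediate tokens
            # and keep only its summary, shrinking the working memory.
            context = [step.payload]
    return context
```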
Install the package using pip:

```shell
pip install subconscious-python
```

Note: The package name is `subconscious-python`, but you import it as `subconscious`:

```python
import subconscious  # Import name remains clean and simple
```
Run your first agent:

```python
from subconscious import Client

# Initialize the client
client = Client(
    base_url="https://api.subconscious.dev/v1",  # can be omitted
    api_key="your-api-key"  # get it from https://subconscious.dev
)

# Define tools
tools = [
    {
        "type": "function",
        "name": "calculator",
        "url": "https://URL_TO_CALCULATOR_TOOL/ENDPOINT",  # the server URL of your own tool
        "method": "POST",
        "timeout": 5,  # seconds
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {"type": "string"},
                "a": {"type": "number"},
                "b": {"type": "number"}
            },
            "required": ["operation", "a", "b"]
        }
    }
]

# Build toolkit
client.build_toolkit(tools, agent_name="math_agent")

# Run agent
messages = [{"role": "user", "content": "What is 2 + 3?"}]
response = client.agent.run(messages, agent_name="math_agent")
print(response)
```

The TIM language model will call the calculator tool as many times as necessary, handle exceptions, compute the answer, and return the result. The agent completes with a single language model API call!
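The `url` in each tool definition points at an HTTP endpoint you host yourself. As an illustration (this server is an assumption, not part of the SDK), a minimal calculator endpoint that accepts the declared JSON parameters via POST might look like:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def calculate(operation: str, a: float, b: float) -> float:
    """Dispatch one of four basic arithmetic operations."""
    ops = {
        "add": lambda: a + b,
        "subtract": lambda: a - b,
        "multiply": lambda: a * b,
        "divide": lambda: a / b,
    }
    return ops[operation]()


class CalculatorHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The engine POSTs the tool arguments as a JSON object matching the
        # declared "parameters" schema: {"operation": ..., "a": ..., "b": ...}
        length = int(self.headers.get("Content-Length", 0))
        args = json.loads(self.rfile.read(length))
        try:
            payload = {"result": calculate(args["operation"], args["a"], args["b"])}
            status = 200
        except (KeyError, ZeroDivisionError) as exc:
            payload = {"error": repr(exc)}
            status = 400
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve locally:
# HTTPServer(("0.0.0.0", 8000), CalculatorHandler).serve_forever()
```

Whatever JSON your endpoint returns is encoded back into the model's reasoning context as the tool response.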
We also provide fine-grained control over the reasoning structure, tool use, and memory management. Check out the deep research agent example for more advanced usage.
Note: The OpenAI compatible API does not support fine-grained reasoning structure control. For advanced performance tuning, please use the Subconscious Python SDK.
```python
import json

from openai import OpenAI

client = OpenAI(
    base_url="https://api.subconscious.dev/v1",
    api_key="YOUR_API_KEY"  # get your API key from https://subconscious.dev
)

resp = client.chat.completions.create(
    model="tim-large",
    messages=[
        {
            "role": "user",
            "content": "Find the 10 most influential research papers on dog walking."
        }
    ],
    top_p=0.95,
    max_completion_tokens=10000,
    temperature=0.6,
    tools=[
        {
            "type": "function",
            "name": "SearchTool",
            "description": "a general search engine that returns the title, URL, and description of 10 webpages",
            "url": URL_TO_TOOL,  # the server URL of your own tool
            "method": "POST",
            "timeout": 10,
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "A natural language query for the search engine."
                    }
                },
                "required": ["query"],
                "additionalProperties": False
            }
        }
    ],
    stream=False  # if True, same as OpenAI's streaming
)

print(json.loads(resp.choices[0].message.content)["answer"])
```

- AP CS test assistant
- [Arxiv podcast writer - coming soon]
- [Legal research agent - coming soon]
coming soon
- Selective Working Memory: 50% reduction in memory usage for long sequences
- Tool Caching: 30% faster repeated tool calls
- Batched Processing: Multi-threaded tool execution when possible
- Memory Management: Efficient handling of large reasoning chains
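As a sketch of what tool caching can look like (this is an illustrative memoization layer, not the actual TIMRUN implementation), repeated tool calls can be keyed by tool name plus a canonical hash of their arguments, so identical calls hit the cache instead of the network:

```python
import hashlib
import json


def _cache_key(name, arguments):
    # Stable key: tool name plus a hash of the canonical (sorted-key) JSON
    # of its arguments, so argument order does not matter.
    digest = hashlib.sha256(
        json.dumps(arguments, sort_keys=True).encode()
    ).hexdigest()
    return f"{name}:{digest}"


class ToolCache:
    """Illustrative cache for repeated tool calls (hypothetical helper)."""

    def __init__(self, call_fn):
        self.call_fn = call_fn  # underlying function that actually calls the tool
        self._store = {}
        self.hits = 0

    def call(self, name, arguments):
        key = _cache_key(name, arguments)
        if key in self._store:
            self.hits += 1  # cache hit: skip the network round-trip
        else:
            self._store[key] = self.call_fn(name, arguments)
        return self._store[key]
```

A design note: hashing the canonicalized arguments rather than the raw string means semantically identical calls (same arguments, different key order) share one cache entry.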
If you found our work helpful in your research, please cite:
```bibtex
@article{tim-timrun,
  title={Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning},
  author={Hongyin Luo and Nathaniel Morgan and Tina Li and Derek Zhao and Ai Vy Ngo and Philip Schroeder and Lijie Yang and Assaf Ben-Kish and Jack O'Brien and James Glass},
  journal={arXiv preprint arXiv:2507.16784},
  year={2025}
}
```

This TIM-8b-preview model is licensed under the MIT License.
- Email: hongyin OR jack AT subconscious DOT dev
- Issues: GitHub Issues
- Documentation: docs.subconscious.dev/