Labels: ai, bounty, core-team, enhancement, priority: high
Description
Problem
The current architecture uses Ollama (a Go-based binary, 500MB+) as a separately installed application. This makes Cortex feel like a bolted-on tool rather than a native part of the OS.
Solution (from Ed's feedback)
Integrate llama.cpp directly as a C++ library (see the sketch after this list):
- Native binary integration (no separate process)
- Links directly into system components
- Smaller footprint, faster startup
- Feels like part of the OS, not an app
- Build custom binaries for natural language processing
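For concreteness, here is a minimal sketch of what in-process inference could look like against llama.cpp's C API (`llama.h`). Function names follow a recent llama.cpp revision and have changed across versions, so treat this as illustrative rather than authoritative; the model path and prompt are placeholders:

```cpp
// Illustrative only: in-process inference via llama.cpp's C API.
// API names track a recent llama.cpp revision and may differ by version.
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    // Load a GGUF model straight into this process -- no separate
    // server process, no network round trip.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("model.gguf", mparams); // placeholder path
    if (model == nullptr) { fprintf(stderr, "failed to load model\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048;
    llama_context * ctx = llama_init_from_model(model, cparams);
    if (ctx == nullptr) { fprintf(stderr, "failed to create context\n"); return 1; }

    const llama_vocab * vocab = llama_model_get_vocab(model);

    // Tokenize a (placeholder) prompt; a negative return value reports
    // the required token count.
    const std::string prompt = "List all files modified today.";
    const int n_tokens = -llama_tokenize(vocab, prompt.c_str(), (int32_t) prompt.size(),
                                         nullptr, 0, true, true);
    std::vector<llama_token> tokens(n_tokens);
    llama_tokenize(vocab, prompt.c_str(), (int32_t) prompt.size(),
                   tokens.data(), (int32_t) tokens.size(), true, true);

    // Greedy sampling, token by token: plain function calls, no REST API.
    llama_sampler * smpl = llama_sampler_init_greedy();
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());
    llama_token tok;
    for (int i = 0; i < 64; i++) {
        if (llama_decode(ctx, batch) != 0) break;   // run the model
        tok = llama_sampler_sample(smpl, ctx, -1);  // pick the next token
        if (llama_vocab_is_eog(vocab, tok)) break;  // stop at end-of-generation
        char buf[128];
        const int len = llama_token_to_piece(vocab, tok, buf, sizeof(buf), 0, true);
        if (len > 0) fwrite(buf, 1, len, stdout);
        batch = llama_batch_get_one(&tok, 1);       // feed the token back in
    }
    printf("\n");

    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

The whole round trip is ordinary function calls inside the daemon's process, which is exactly the contrast with Ollama's HTTP API drawn in the table below.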
Why llama.cpp over Ollama
| Ollama | llama.cpp |
|---|---|
| Go binary, 500MB+ | C++ library, <50MB |
| Separate process | Links into daemon |
| Feels like installed app | Feels like OS component |
| Network API overhead | Direct function calls |
Technical Notes
- Reference: https://github.com/ggerganov/llama.cpp
- Consider static linking for single binary distribution
- Must support ARM64 and x86_64
- Models ship in the GGUF format (a quick header check is sketched after this list)
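On the GGUF point: the container is easy to sanity-check, since every GGUF file begins with the 4-byte magic `GGUF` followed by a little-endian `uint32` format version. A minimal check (llama.cpp itself does full validation at load time):

```cpp
// Minimal GGUF header sanity check: 4-byte magic "GGUF", then a
// little-endian uint32 format version (safe to read directly on
// little-endian targets such as x86_64 and typical ARM64).
#include <cstdint>
#include <cstdio>
#include <cstring>

int main(int argc, char ** argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }

    FILE * f = fopen(argv[1], "rb");
    if (f == nullptr) { perror("fopen"); return 1; }

    char magic[4];
    uint32_t version = 0;
    const bool ok = fread(magic, 1, 4, f) == 4 && fread(&version, 4, 1, f) == 1;
    fclose(f);

    if (!ok || memcmp(magic, "GGUF", 4) != 0) {
        fprintf(stderr, "not a GGUF file\n");
        return 1;
    }
    printf("GGUF version %u\n", version);
    return 0;
}
```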
Acceptance Criteria
- llama.cpp integrated as a shared library or statically linked
- Cortex daemon can call inference without spawning external process
- Works offline with no cloud dependency
- Startup time < 100ms for inference readiness (see the timing sketch after this list)
- Memory footprint < 100MB with model loaded
- Documentation for building from source
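A rough way to keep the startup-time criterion honest is to time the path from process start to a ready context. This sketch assumes the same llama.cpp API as above (`model.gguf` is again a placeholder); mmap-ing the weights rather than reading them up front is what makes a sub-100ms target plausible:

```cpp
// Rough harness for the "< 100ms to inference readiness" criterion.
// Assumes the same llama.cpp API as the sketch above; model path is a placeholder.
#include "llama.h"

#include <chrono>
#include <cstdio>

int main() {
    const auto t0 = std::chrono::steady_clock::now();

    llama_backend_init();
    llama_model_params mparams = llama_model_default_params();
    mparams.use_mmap = true; // map weights lazily instead of reading them up front
    llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (model == nullptr) { fprintf(stderr, "failed to load model\n"); return 1; }
    llama_context * ctx = llama_init_from_model(model, llama_context_default_params());

    const auto t1 = std::chrono::steady_clock::now();
    const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("inference-ready in %.1f ms (%s)\n", ms, ms < 100.0 ? "PASS" : "FAIL");

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```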
Related Issues
- [FEATURE] AI Shell with lightweight apt-trained model for natural language command generation #424 (tiny model) - Will use this runtime
- [FEATURE] [CRITICAL] Autonomous Security Vulnerability Management & Patching #422 (security patching) - Will use local inference
- [BUG] Remove hardcoded llama3.2 model, allow configurable Ollama model selection #383 (Ollama model selection) - Will be superseded
Bounty: $150 (+ $150 bonus after funding)
Paid on merge to main.