[ARCHITECTURE] Replace Ollama with llama.cpp native C++ integration #425

@mikejmorgan-ai

Description

Problem

The current architecture uses Ollama (Go-based, 500MB+) as a separately installed application. This makes Cortex feel like a bolted-on tool rather than a native part of the OS.

Solution (from Ed's feedback)

Integrate llama.cpp directly as a C++ library (a minimal call-path sketch follows the list below):

  • Native binary integration (no separate process)
  • Links directly into system components
  • Smaller footprint, faster startup
  • Feels like part of the OS, not an app
  • Build custom binaries for natural language processing
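
A minimal sketch of what the in-process call path could look like, using llama.cpp's C API. Function names such as `llama_load_model_from_file` and `llama_new_context_with_model` follow an older API revision and may be renamed or deprecated in current llama.cpp trees, and the model path is a placeholder; treat this as an illustration of the shape of the integration, not a final implementation.

```cpp
// Sketch: inference runs inside the Cortex daemon process itself.
// No child process, no socket, no HTTP server -- just function calls.
// NOTE: llama.cpp's C API changes between versions; these names follow
// an older revision, and the model path is a placeholder.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();  // one-time, process-wide initialization

    llama_model_params mparams = llama_model_default_params();
    llama_model *model =
        llama_load_model_from_file("/usr/share/cortex/model.gguf", mparams);
    if (!model) {
        std::fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048;  // context window
    llama_context *ctx = llama_new_context_with_model(model, cparams);

    // ... tokenize the prompt, call llama_decode() in a loop, sample tokens ...
    // All of the above happens in-process, which is what removes the
    // "network API overhead" row from the comparison below.

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```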

Why llama.cpp over Ollama

| Ollama | llama.cpp |
| --- | --- |
| Go binary, 500MB+ | C++ library, <50MB |
| Separate process | Links into daemon |
| Feels like installed app | Feels like OS component |
| Network API overhead | Direct function calls |

Technical Notes

Acceptance Criteria

  • llama.cpp integrated as a shared library or statically linked
  • Cortex daemon can call inference without spawning external process
  • Works offline with no cloud dependency
  • Startup time < 100ms for inference readiness (see the timing sketch after this list)
  • Memory footprint < 100MB with model loaded
  • Documentation for building from source
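
The startup criterion can be checked with a simple timing harness around backend init and model load. This is a sketch under the same assumptions as the earlier snippet (older-style llama.cpp C API names, placeholder model path); actual readiness time depends mostly on model size and whether the weights are memory-mapped.

```cpp
// Sketch: measure time-to-inference-readiness inside the daemon.
// Same caveats as above: API names vary by llama.cpp version and the
// model path is a placeholder.
#include "llama.h"
#include <chrono>
#include <cstdio>

int main() {
    const auto t0 = std::chrono::steady_clock::now();

    llama_backend_init();
    llama_model *model = llama_load_model_from_file(
        "/usr/share/cortex/model.gguf", llama_model_default_params());
    llama_context *ctx =
        llama_new_context_with_model(model, llama_context_default_params());

    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                        std::chrono::steady_clock::now() - t0)
                        .count();

    // Acceptance criterion: ready to serve inference in under 100 ms.
    std::printf("inference ready in %lld ms (%s)\n",
                static_cast<long long>(ms), ms < 100 ? "PASS" : "FAIL");

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```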

Related Issues

Bounty: $150 (+ $150 bonus after funding)

Paid on merge to main.
