[ARCHITECTURE] Replace Ollama with llama.cpp native C++ integration #425

@mikejmorgan-ai

Description

Problem

The current architecture uses Ollama (Go-based, 500MB+) as a separately installed application. This makes Cortex feel like a bolted-on tool rather than a native part of the OS.

Solution (from Ed's feedback)

Integrate llama.cpp directly as a C++ library (a minimal call-path sketch follows the list below):

  • Native binary integration (no separate process)
  • Links directly into system components
  • Smaller footprint, faster startup
  • Feels like part of the OS, not an app
  • Build custom binaries for natural language processing
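
A minimal sketch of what the in-process call path could look like, using llama.cpp's C API. Function names such as `llama_load_model_from_file` and `llama_new_context_with_model` follow an older API revision and may be renamed or deprecated in current llama.cpp trees, and the model path is a placeholder; treat this as an illustration of the shape of the integration, not a final implementation.

```cpp
// Sketch: inference runs inside the Cortex daemon process itself.
// No child process, no socket, no HTTP server -- just function calls.
// NOTE: llama.cpp's C API changes between versions; these names follow
// an older revision, and the model path is a placeholder.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();  // one-time, process-wide initialization

    llama_model_params mparams = llama_model_default_params();
    llama_model *model =
        llama_load_model_from_file("/usr/share/cortex/model.gguf", mparams);
    if (!model) {
        std::fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048;  // context window
    llama_context *ctx = llama_new_context_with_model(model, cparams);

    // ... tokenize the prompt, call llama_decode() in a loop, sample tokens ...
    // All of the above happens in-process, which is what removes the
    // "network API overhead" row from the comparison below.

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```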

Why llama.cpp over Ollama

| Ollama | llama.cpp |
| --- | --- |
| Go binary, 500MB+ | C++ library, <50MB |
| Separate process | Links into daemon |
| Feels like installed app | Feels like OS component |
| Network API overhead | Direct function calls |

Technical Notes

Acceptance Criteria

  • llama.cpp integrated as a shared library or statically linked
  • Cortex daemon can call inference without spawning external process
  • Works offline with no cloud dependency
  • Startup time < 100ms for inference readiness (see the timing sketch after this list)
  • Memory footprint < 100MB with model loaded
  • Documentation for building from source
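
The startup criterion can be checked with a simple timing harness around backend init and model load. This is a sketch under the same assumptions as the earlier snippet (older-style llama.cpp C API names, placeholder model path); actual readiness time depends mostly on model size and whether the weights are memory-mapped.

```cpp
// Sketch: measure time-to-inference-readiness inside the daemon.
// Same caveats as above: API names vary by llama.cpp version and the
// model path is a placeholder.
#include "llama.h"
#include <chrono>
#include <cstdio>

int main() {
    const auto t0 = std::chrono::steady_clock::now();

    llama_backend_init();
    llama_model *model = llama_load_model_from_file(
        "/usr/share/cortex/model.gguf", llama_model_default_params());
    llama_context *ctx =
        llama_new_context_with_model(model, llama_context_default_params());

    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                        std::chrono::steady_clock::now() - t0)
                        .count();

    // Acceptance criterion: ready to serve inference in under 100 ms.
    std::printf("inference ready in %lld ms (%s)\n",
                static_cast<long long>(ms), ms < 100 ? "PASS" : "FAIL");

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```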

Related Issues

Bounty: $150 (+ $150 bonus after funding)

Paid on merge to main.
