339 changes: 339 additions & 0 deletions AGENTS.md
@@ -0,0 +1,339 @@
# Loki Development Guide

This file provides detailed architectural guidance and development context for AI agents working with the Loki codebase.

## High-Level Architecture

Loki is a horizontally-scalable, multi-tenant log aggregation system inspired by Prometheus. It stores compressed, unstructured logs and only indexes metadata using labels, making it cost-effective and operationally simple.

### Core Components & Data Flow

- **Distributor** (`pkg/distributor/`): Entry point for log streams
  - Receives logs via HTTP/gRPC from agents (Alloy, Promtail)
  - Validates and authenticates incoming streams
  - Uses consistent hashing to distribute logs across ingesters
  - Handles rate limiting and tenant isolation

- **Ingester** (`pkg/ingester/`): Log storage and buffering
  - Receives log streams from distributors
  - Buffers logs in memory and compresses them into chunks
  - Writes chunks to long-term storage (S3, GCS, etc.)
  - Maintains write-ahead log (WAL) for durability
  - Manages lifecycle with lifecycler for ring membership

- **Querier** (`pkg/querier/`): Query processing engine
  - Handles LogQL queries from Grafana or logcli
  - Queries both ingesters (recent data) and long-term storage
  - Merges results from multiple sources
  - Implements query parallelization and optimization

- **Ruler** (`pkg/ruler/`): Alerting and recording rules
  - Evaluates LogQL expressions periodically
  - Generates alerts based on log patterns
  - Creates recording rules for pre-computed metrics
  - Integrates with Prometheus Alertmanager

- **Query Frontend** (`pkg/frontend/`): Query coordination (optional)
  - Provides query queuing and parallelization
  - Implements result caching
  - Splits large queries across time ranges

- **Compactor** (`pkg/compactor/`): Background optimization
  - Compacts small chunks into larger ones
  - Applies retention policies
  - Builds and maintains block indexes
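
The distributor's use of consistent hashing can be illustrated with a toy ring. This is a minimal sketch, not Loki's ring implementation: the `ring` type, `newRing`, `lookup`, the FNV hash, and the `tenant/labels` key format are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring is a toy consistent-hash ring (hypothetical, not Loki's ring package).
type ring struct {
	tokens []uint32          // sorted token positions on the ring
	owners map[uint32]string // token -> ingester name
}

// hashKey maps an arbitrary string onto the 32-bit ring.
func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places tokensPerNode virtual tokens on the ring for each ingester.
func newRing(ingesters []string, tokensPerNode int) *ring {
	r := &ring{owners: make(map[uint32]string)}
	for _, name := range ingesters {
		for i := 0; i < tokensPerNode; i++ {
			t := hashKey(fmt.Sprintf("%s-%d", name, i))
			if _, taken := r.owners[t]; taken {
				continue // skip the rare token collision
			}
			r.owners[t] = name
			r.tokens = append(r.tokens, t)
		}
	}
	sort.Slice(r.tokens, func(i, j int) bool { return r.tokens[i] < r.tokens[j] })
	return r
}

// lookup returns the owner of the first token at or after the key's hash,
// wrapping around to the first token at the end of the ring.
func (r *ring) lookup(streamKey string) string {
	if len(r.tokens) == 0 {
		return ""
	}
	h := hashKey(streamKey)
	i := sort.Search(len(r.tokens), func(i int) bool { return r.tokens[i] >= h })
	if i == len(r.tokens) {
		i = 0
	}
	return r.owners[r.tokens[i]]
}

func main() {
	r := newRing([]string{"ingester-0", "ingester-1", "ingester-2"}, 64)
	for _, key := range []string{`tenant-a/{app="nginx"}`, `tenant-a/{app="api"}`, `tenant-b/{app="nginx"}`} {
		fmt.Printf("%s -> %s\n", key, r.lookup(key))
	}
}
```

Because a stream key always hashes to the same position, the same stream keeps landing on the same ingester until ring membership changes, and adding or removing one ingester only moves the keys adjacent to its tokens.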

### Deployment Modes

**Microservices Mode**: Each component runs as a separate service
- Horizontal scaling per component
- Independent failure domains
- More operational complexity
- Suitable for large-scale deployments

**Monolithic Mode**: Single binary with all components
- Simpler deployment and operations
- Vertical scaling only
- Suitable for smaller deployments
- Default configuration in `cmd/loki/loki-local-config.yaml`

### LogQL Query Language Architecture

Located in `pkg/logql/`, the query language implementation includes:

- **Parser** (`pkg/logql/syntax/`): Converts LogQL strings to AST
- **Planner** (`pkg/logql/`): Optimizes queries and creates execution plans
- **Engine** (`pkg/logql/`): Executes queries against storage
- **Functions**: Built-in functions for log processing and metrics

**LogQL Capabilities**:
- Label-based stream selection: `{app="nginx", env="production"}`
- Log filtering: `|= "error"`, `|~ "regex pattern"`
- Parser extraction: `| json`, `| logfmt`, `| regexp`
- Metric queries: `rate()`, `count_over_time()`, `quantile_over_time()`
- Aggregations: `sum by (label)`, `topk()`, `bottomk()`
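
To show how these pieces compose into one query, here is a small sketch that assembles a stream selector, a line filter, and a metric aggregation as a string. The helpers `buildSelector` and `buildRateQuery` are hypothetical and not part of Loki; quoting relies on `strconv.Quote`, which is assumed to be adequate for simple label values.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// buildSelector renders a label map as a LogQL stream selector.
// Keys are sorted so the output is stable across runs.
func buildSelector(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+strconv.Quote(labels[k]))
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

// buildRateQuery wraps a selector and a "contains" line filter in rate(),
// then sums the per-stream rates by a single label.
func buildRateQuery(selector, contains, window, by string) string {
	return fmt.Sprintf("sum by (%s) (rate(%s |= %s [%s]))",
		by, selector, strconv.Quote(contains), window)
}

func main() {
	sel := buildSelector(map[string]string{"app": "nginx", "env": "production"})
	fmt.Println(sel)
	fmt.Println(buildRateQuery(sel, "error", "5m", "env"))
}
```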

### Storage Architecture

**Chunk Storage Model**:
- Logs are compressed into chunks (default 256KB-1MB)
- Chunks contain logs from a single stream over a time period
- Chunks are immutable once written to storage
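
The chunk model above can be sketched as a buffer that accumulates lines for one stream and seals a compressed chunk once a target size is reached. This is an illustration only: `chunkBuilder`, its fields, and the use of gzip are assumptions, not Loki's chunk format or encoding.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// chunkBuilder buffers lines of a single stream (hypothetical type).
type chunkBuilder struct {
	targetBytes int          // uncompressed size at which a chunk is cut
	buf         bytes.Buffer // lines accumulated since the last cut
}

// add appends a line. It returns a sealed, compressed chunk once the buffer
// reaches the target size, and nil while the chunk is still open.
func (c *chunkBuilder) add(line string) ([]byte, error) {
	c.buf.WriteString(line)
	c.buf.WriteByte('\n')
	if c.buf.Len() < c.targetBytes {
		return nil, nil
	}
	return c.cut()
}

// cut compresses the buffered lines and resets the buffer.
// The returned bytes are never modified again, mirroring chunk immutability.
func (c *chunkBuilder) cut() ([]byte, error) {
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	if _, err := zw.Write(c.buf.Bytes()); err != nil {
		return nil, fmt.Errorf("compress chunk: %w", err)
	}
	if err := zw.Close(); err != nil {
		return nil, fmt.Errorf("finish chunk: %w", err)
	}
	c.buf.Reset()
	return out.Bytes(), nil
}

// decompress reads a sealed chunk back into its original lines.
func decompress(chunk []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(chunk))
	if err != nil {
		return "", fmt.Errorf("open chunk: %w", err)
	}
	defer zr.Close()
	raw, err := io.ReadAll(zr)
	if err != nil {
		return "", fmt.Errorf("read chunk: %w", err)
	}
	return string(raw), nil
}

func main() {
	cb := &chunkBuilder{targetBytes: 32}
	for _, line := range []string{"GET /a 200", "GET /b 500", "GET /c 200", "GET /d 404"} {
		chunk, err := cb.add(line)
		if err != nil {
			panic(err)
		}
		if chunk != nil {
			fmt.Printf("cut chunk of %d compressed bytes\n", len(chunk))
		}
	}
}
```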

**Index Structure**:
- Period-based index tables (daily/weekly)
- Stores mappings of label combinations to chunk references
- Supports Cassandra, BigTable, BoltDB, or in-memory indexes

**Storage Backends** (`pkg/storage/chunk/client/`):
- **Object Storage**: S3, GCS, Azure Blob Storage, Swift
- **Database**: Cassandra, BigTable
- **Local**: Filesystem, BoltDB (for testing/small deployments)

**Bloom Filters**:
- Reduce storage reads during queries
- Built per chunk to indicate presence of terms
- Configurable false positive rate
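
The idea behind the per-chunk filter can be shown with a minimal Bloom filter: a query term that the filter reports as absent is definitely not in the chunk, so the chunk need not be read. The `bloom` type, its sizing parameters, and the FNV-based double hashing below are assumptions for illustration and do not reflect Loki's bloom implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a fixed-size Bloom filter with k hash positions per term (hypothetical).
type bloom struct {
	bits []bool
	k    int
}

// newBloom creates a filter with m bits and k positions per term.
// Larger m lowers the false positive rate for a given number of terms.
func newBloom(m, k int) *bloom {
	return &bloom{bits: make([]bool, m), k: k}
}

// positions derives k bit positions from two FNV hashes (double hashing).
func (b *bloom) positions(term string) []int {
	h1 := fnv.New64a()
	h1.Write([]byte(term))
	h2 := fnv.New64()
	h2.Write([]byte(term))
	a, c := h1.Sum64(), h2.Sum64()|1 // force an odd step so positions vary
	out := make([]int, b.k)
	for i := range out {
		out[i] = int((a + uint64(i)*c) % uint64(len(b.bits)))
	}
	return out
}

// add records a term in the filter.
func (b *bloom) add(term string) {
	for _, p := range b.positions(term) {
		b.bits[p] = true
	}
}

// mayContain reports false only if the term was definitely never added;
// true means "possibly present" and can be a false positive.
func (b *bloom) mayContain(term string) bool {
	for _, p := range b.positions(term) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	chunkFilter := newBloom(1024, 3)
	for _, term := range []string{"error", "timeout", "nginx"} {
		chunkFilter.add(term)
	}
	fmt.Println("read chunk for \"error\":", chunkFilter.mayContain("error"))
	fmt.Println("read chunk for \"panic\":", chunkFilter.mayContain("panic"))
}
```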

## Build & Development Commands

### Core Build Commands
```bash
make all # build all binaries
make loki # build loki only
make logcli # build logcli only
make promtail # build promtail only
make loki-canary # build monitoring tool
make chunks-inspect # tool for examining chunk files
make migrate # database migration tool
make lokitool # operational utilities
make query-tee # query result comparison tool
```

### Testing Commands
```bash
make test # run all unit tests
make test-integration # run integration tests
go test ./... # run all tests with Go directly
go test -v ./pkg/logql/... # run tests in specific package (e.g., LogQL)
go test -run TestName ./pkg/path # run a specific test
make test-fuzz # run fuzz tests
```

### Development Tools
```bash
make lint # run all linters (use in CI-like environment)
make format # format code (gofmt and goimports)
make check-generated-files # verify generated files are up-to-date
make validate-example-configs # validate example configurations
make clean-protos # clean generated protobuf files
make protos # generate protobuf files
make yacc # generate parser from yacc grammar
make ragel # generate lexer from ragel grammar
```

### Cross-Platform & Debug Builds
```bash
make loki GOOS=linux GOARCH=amd64 # Linux AMD64
make loki GOOS=darwin GOARCH=arm64 # macOS Apple Silicon
make BUILD_IN_CONTAINER=false DEBUG=1 loki # build with debug symbols
make BUILD_IN_CONTAINER=true loki # build inside container
make loki-image # build Docker image
```

### Frontend Development

The Loki UI (distinct from the query-frontend component) lives in `pkg/ui/frontend/` and is built with Vite:

```bash
cd pkg/ui/frontend
make build # build the frontend
make dev # start development mode with hot reload
make test # run frontend tests
make lint # lint frontend code
make check-deps # check for vulnerabilities
make clean # clean build artifacts
```

## Code Organization

### Key Directories

- `cmd/`: Executable entry points (`loki`, `logcli`, `promtail`, `loki-canary`)
- `pkg/`: Core implementation packages
  - `distributor/`, `ingester/`, `querier/`, `ruler/`: Main components
  - `logql/`: Query language implementation with parser in `syntax/`
  - `storage/`: Storage layer with pluggable backends
  - `util/`: Shared utilities and build info
- `clients/`: Client libraries and plugins (Fluent-bit, Fluentd, etc.)
- `production/`: Deployment configurations (Docker Compose, Helm, Terraform)
- `docs/`: Documentation sources
- `integration/`: Integration test suites
- `operator/`: Kubernetes Operator implementation, see [Loki Operator](#loki-operator-operator)

### Core Package Deep Dive

**`pkg/distributor/`**: Log ingestion entry point
- `distributor.go`: Main distributor implementation
- `http.go`: HTTP handler for log ingestion
- `validator.go`: Stream validation logic
- `rate_store.go`: Rate limiting implementation

**`pkg/ingester/`**: Log storage and WAL management
- `ingester.go`: Core ingester logic
- `instance.go`: Per-tenant log stream management
- `wal.go`: Write-ahead log implementation
- `checkpoint.go`: WAL checkpointing
- `flush.go`: Chunk flushing to storage

**`pkg/querier/`**: Query execution engine
- `querier.go`: Main query interface
- `queryrange/`: Query range optimization
- `series.go`: Time series querying
- `tail.go`: Live log tailing implementation

**`pkg/logql/`**: Query language implementation
- `syntax/parser.go`: LogQL grammar and parsing
- `ast.go`: Abstract syntax tree definitions
- `engine.go`: Query execution engine
- `functions.go`: Built-in function implementations
- `metrics.go`: Metric extraction logic

### Loki Operator (`operator/`)

The Loki Operator is a Kubernetes controller that manages Loki deployments using Custom Resource Definitions (CRDs). It provides a declarative approach to deploying and managing Loki in Kubernetes and OpenShift environments with support for multi-tenancy, flexible storage backends, and tight integration with OpenShift Logging.

**For detailed operator development guidance, see [`operator/AGENTS.md`](operator/AGENTS.md)**

## Development Guidelines

### Code Style
- Follow standard Go formatting (gofmt/goimports)
- Import order: standard library, external packages, then Loki packages
- Use structured logging with go-kit/log
- Document all exported functions, types, and variables
- Use table-driven tests when appropriate

### Error Handling Patterns
```go
// Always wrap errors with context
if err := someOperation(); err != nil {
	return fmt.Errorf("failed to perform operation: %w", err)
}

// Use level.Error for structured logging
level.Error(logger).Log("msg", "operation failed", "err", err)
```
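
As a runnable illustration of why `%w` matters, the sketch below wraps a sentinel error and shows that callers can still match it with `errors.Is`. The names `errChunkNotFound`, `fetchChunk`, `loadChunk`, and `isNotFound` are hypothetical and exist only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errChunkNotFound is a hypothetical sentinel error for this example.
var errChunkNotFound = errors.New("chunk not found")

// fetchChunk stands in for a storage call; it fails for the ID "missing".
func fetchChunk(id string) error {
	if id == "missing" {
		return errChunkNotFound
	}
	return nil
}

// loadChunk adds context with %w so the original error stays in the chain.
func loadChunk(id string) error {
	if err := fetchChunk(id); err != nil {
		return fmt.Errorf("failed to load chunk %q: %w", id, err)
	}
	return nil
}

// isNotFound reports whether err wraps errChunkNotFound at any depth.
func isNotFound(err error) bool {
	return errors.Is(err, errChunkNotFound)
}

func main() {
	err := loadChunk("missing")
	fmt.Println(err)             // failed to load chunk "missing": chunk not found
	fmt.Println(isNotFound(err)) // true
}
```

Wrapping with `%v` instead of `%w` would keep the message but break the `errors.Is` match, so callers could no longer branch on the underlying cause.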

### Commit Format
Follow Conventional Commits: `<type>: <description>`
- `feat`: New features
- `fix`: Bug fixes
- `chore`: Maintenance tasks
- `docs`: Documentation changes

### Frontend Guidelines (from .cursor/rules/frontend.mdc)
- Use TypeScript with functional components
- Prefer interfaces over types, avoid enums
- Use lowercase with dashes for directories (`components/auth-wizard`)
- Colocate components close to where they're used
- Use Shadcn UI, Radix, and Tailwind for styling
- Avoid modifying Shadcn components directly in `src/components/ui/*`

## Configuration Architecture

Loki uses YAML configuration files with a hierarchical structure:

**Global Settings**:
- Server config (HTTP/gRPC ports, timeouts)
- Authentication and authorization
- Common storage configuration

**Component-Specific**:
- Distributor: rate limiting, validation
- Ingester: WAL, chunk settings, lifecycler
- Querier: query limits, parallelization
- Ruler: rule evaluation, alerting

**Key Configuration Files**:
- `cmd/loki/loki-local-config.yaml`: Single-node development
- `production/helm/loki/values.yaml`: Kubernetes defaults
- `production/docker-compose/`: Docker Compose examples

## Testing Strategy

### Unit Tests
- Package-level testing with table-driven tests
- Mock external dependencies (storage, ring)
- Focus on business logic validation

### Integration Tests (`integration/`)
- `client/`: Test log ingestion clients
- `query/`: End-to-end query testing
- `cluster/`: Multi-component testing
- Use Docker Compose for test environments

### Frontend Testing
- Jest for unit tests
- React Testing Library for component tests
- Cypress for end-to-end testing

## Debugging & Troubleshooting

### Query Performance Analysis
```bash
# Use logcli with statistics
logcli query '{app="nginx"}' --stats

# Check slow query logs
kubectl logs -f loki-querier | grep "slow query"
```

### Storage Investigation
```bash
# Inspect chunk contents
./chunks-inspect --path=/path/to/chunk

# Check storage connectivity
./lokitool storage-client test-connection
```

### Component Health Monitoring
```bash
# Check ring status (microservices mode)
curl http://loki-distributor:3100/ring

# View ingester status
curl http://loki-ingester:3100/ready
```

## Common Development Tasks

### Adding a New LogQL Function
1. Extend the parser grammar in `pkg/logql/syntax/`
2. Add AST node types in `pkg/logql/ast.go`
3. Implement function logic in `pkg/logql/functions.go`
4. Add comprehensive tests for parsing and execution
5. Update documentation and examples

### Adding a New Storage Backend
1. Implement storage interfaces in `pkg/storage/chunk/client/`
2. Register backend in storage configuration
3. Add configuration validation and defaults
4. Include integration tests with real backend
5. Update deployment documentation
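
The first two steps can be sketched with a simplified interface and a name-based registry. Everything here is hypothetical: `ObjectStore`, `memStore`, `backends`, `newStore`, and `roundTrip` are illustrative names, and the real client interfaces in `pkg/storage/chunk/client/` differ in their methods and signatures.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// ObjectStore is a simplified, hypothetical interface for this sketch.
type ObjectStore interface {
	Put(ctx context.Context, key string, data []byte) error
	Get(ctx context.Context, key string) ([]byte, error)
	Delete(ctx context.Context, key string) error
}

var errNotFound = errors.New("object not found")

// memStore is an in-memory backend used to exercise the interface.
type memStore struct {
	mu      sync.RWMutex
	objects map[string][]byte
}

func newMemStore() *memStore {
	return &memStore{objects: make(map[string][]byte)}
}

func (m *memStore) Put(_ context.Context, key string, data []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.objects[key] = append([]byte(nil), data...) // copy so callers cannot mutate stored data
	return nil
}

func (m *memStore) Get(_ context.Context, key string) ([]byte, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	data, ok := m.objects[key]
	if !ok {
		return nil, fmt.Errorf("get %q: %w", key, errNotFound)
	}
	return append([]byte(nil), data...), nil
}

func (m *memStore) Delete(_ context.Context, key string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.objects, key)
	return nil
}

// backends maps a configured backend name to its constructor (step 2).
var backends = map[string]func() ObjectStore{
	"memory-sketch": func() ObjectStore { return newMemStore() },
}

// newStore resolves a backend by name and rejects unknown names.
func newStore(name string) (ObjectStore, error) {
	factory, ok := backends[name]
	if !ok {
		return nil, fmt.Errorf("unknown storage backend %q", name)
	}
	return factory(), nil
}

// roundTrip writes an object and reads it back through the interface.
func roundTrip(s ObjectStore, key string, data []byte) (string, error) {
	ctx := context.Background()
	if err := s.Put(ctx, key, data); err != nil {
		return "", fmt.Errorf("put: %w", err)
	}
	got, err := s.Get(ctx, key)
	if err != nil {
		return "", fmt.Errorf("get: %w", err)
	}
	return string(got), nil
}

func main() {
	store, err := newStore("memory-sketch")
	if err != nil {
		panic(err)
	}
	got, err := roundTrip(store, "tenant-a/chunk-1", []byte("compressed bytes"))
	fmt.Println(got, err)
}
```

Keeping callers dependent on the interface rather than a concrete type is what lets an integration test swap in a real backend without touching the calling code.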

### Debugging Query Performance Issues
1. Use `logcli` with `--stats` flag to see query metrics
2. Check ingester and querier logs for bottlenecks
3. Analyze storage backend performance metrics
4. Consider bloom filter effectiveness
5. Review LogQL query patterns for optimization

## Documentation Standards
- Follow the Grafana [Writers' Toolkit](https://grafana.com/docs/writers-toolkit/) Style Guide
- Use CommonMark flavor of markdown for documentation
- Create LIDs (Loki Improvement Documents) for large functionality changes
- Document upgrading steps in `docs/sources/setup/upgrade/_index.md`
- Preview docs locally with `make docs` from the `docs/` directory
- Include examples and clear descriptions for public APIs
50 changes: 0 additions & 50 deletions CLAUDE.md

This file was deleted.

1 change: 1 addition & 0 deletions CLAUDE.md