diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000000..41b5ed00424b6 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,339 @@ +# Loki Development Guide + +This file provides detailed architectural guidance and development context for AI agents working with the Loki codebase. + +## High-Level Architecture + +Loki is a horizontally-scalable, multi-tenant log aggregation system inspired by Prometheus. It stores compressed, unstructured logs and only indexes metadata using labels, making it cost-effective and operationally simple. + +### Core Components & Data Flow + +- **Distributor** (`pkg/distributor/`): Entry point for log streams + - Receives logs via HTTP/gRPC from agents (Alloy, Promtail) + - Validates and authenticates incoming streams + - Uses consistent hashing to distribute logs across ingesters + - Handles rate limiting and tenant isolation + +- **Ingester** (`pkg/ingester/`): Log storage and buffering + - Receives log streams from distributors + - Buffers logs in memory and compresses them into chunks + - Writes chunks to long-term storage (S3, GCS, etc.) 
+ - Maintains write-ahead log (WAL) for durability + - Manages lifecycle with lifecycler for ring membership + +- **Querier** (`pkg/querier/`): Query processing engine + - Handles LogQL queries from Grafana or logcli + - Queries both ingesters (recent data) and long-term storage + - Merges results from multiple sources + - Implements query parallelization and optimization + +- **Ruler** (`pkg/ruler/`): Alerting and recording rules + - Evaluates LogQL expressions periodically + - Generates alerts based on log patterns + - Creates recording rules for pre-computed metrics + - Integrates with Prometheus Alertmanager + +- **Query Frontend** (`pkg/frontend/`): Query coordination (optional) + - Provides query queuing and parallelization + - Implements result caching + - Splits large queries across time ranges + +- **Compactor** (`pkg/compactor/`): Background optimization + - Compacts small chunks into larger ones + - Applies retention policies + - Builds and maintains block indexes + +### Deployment Modes + +**Microservices Mode**: Each component runs as separate service +- Horizontal scaling per component +- Independent failure domains +- More operational complexity +- Suitable for large-scale deployments + +**Monolithic Mode**: Single binary with all components +- Simpler deployment and operations +- Vertical scaling only +- Suitable for smaller deployments +- Default configuration in `cmd/loki/loki-local-config.yaml` + +### LogQL Query Language Architecture + +Located in `pkg/logql/`, the query language implementation includes: + +- **Parser** (`pkg/logql/syntax/`): Converts LogQL strings to AST +- **Planner** (`pkg/logql/`): Optimizes queries and creates execution plans +- **Engine** (`pkg/logql/`): Executes queries against storage +- **Functions**: Built-in functions for log processing and metrics + +**LogQL Capabilities**: +- Label-based stream selection: `{app="nginx", env="production"}` +- Log filtering: `|= "error"`, `|~ "regex pattern"` +- Parser extraction: `| 
json | logfmt | regex` +- Metric queries: `rate()`, `count_over_time()`, `quantile_over_time()` +- Aggregations: `sum by (label)`, `topk()`, `bottomk()` + +### Storage Architecture + +**Chunk Storage Model**: +- Logs are compressed into chunks (default 256KB-1MB) +- Chunks contain logs from single stream over time period +- Chunks are immutable once written to storage + +**Index Structure**: +- Period-based index tables (daily/weekly) +- Stores mappings of label combinations to chunk references +- Supports Cassandra, BigTable, BoltDB, or in-memory indexes + +**Storage Backends** (`pkg/storage/chunk/client/`): +- **Object Storage**: S3, GCS, Azure Blob Storage, Swift +- **Database**: Cassandra, BigTable +- **Local**: Filesystem, BoltDB (for testing/small deployments) + +**Bloom Filters**: +- Reduce storage reads during queries +- Built per chunk to indicate presence of terms +- Configurable false positive rate + +## Build & Development Commands + +### Core Build Commands +```bash +make all # build all binaries +make loki # build loki only +make logcli # build logcli only +make promtail # build promtail only +make loki-canary # build monitoring tool +make chunks-inspect # tool for examining chunk files +make migrate # database migration tool +make lokitool # operational utilities +make query-tee # query result comparison tool +``` + +### Testing Commands +```bash +make test # run all unit tests +make test-integration # run integration tests +go test ./... # run all tests with Go directly +go test -v ./pkg/logql/... 
# run tests in specific package (e.g., LogQL) +go test -run TestName ./pkg/path # run a specific test +make test-fuzz # run fuzz tests +``` + +### Development Tools +```bash +make lint # run all linters (use in CI-like environment) +make format # format code (gofmt and goimports) +make check-generated-files # verify generated files are up-to-date +make validate-example-configs # validate example configurations +make clean-protos # clean generated protobuf files +make protos # generate protobuf files +make yacc # generate parser from yacc grammar +make ragel # generate lexer from ragel grammar +``` + +### Cross-Platform & Debug Builds +```bash +make loki GOOS=linux GOARCH=amd64 # Linux AMD64 +make loki GOOS=darwin GOARCH=arm64 # macOS Apple Silicon +make BUILD_IN_CONTAINER=false DEBUG=1 loki # build with debug symbols +make BUILD_IN_CONTAINER=true loki # build inside container +make loki-image # build Docker image +``` + +### Frontend Development + +The Loki UI (different from query-frontend) is in `pkg/ui/frontend/` and built with Vite: + +```bash +cd pkg/ui/frontend +make build # build the frontend +make dev # start development mode with hot reload +make test # run frontend tests +make lint # lint frontend code +make check-deps # check for vulnerabilities +make clean # clean build artifacts +``` + +## Code Organization + +### Key Directories + +- `cmd/`: Executable entry points (`loki`, `logcli`, `promtail`, `loki-canary`) +- `pkg/`: Core implementation packages + - `distributor/`, `ingester/`, `querier/`, `ruler/`: Main components + - `logql/`: Query language implementation with parser in `syntax/` + - `storage/`: Storage layer with pluggable backends + - `util/`: Shared utilities and build info +- `clients/`: Client libraries and plugins (Fluent-bit, Fluentd, etc.) 
+- `production/`: Deployment configurations (Docker Compose, Helm, Terraform) +- `docs/`: Documentation sources +- `integration/`: Integration test suites +- `operator/`: Kubernetes Operator implementation, see the [Loki Operator section](#loki-operator-operator) + +### Core Package Deep Dive + +**`pkg/distributor/`**: Log ingestion entry point +- `distributor.go`: Main distributor implementation +- `http.go`: HTTP handler for log ingestion +- `validator.go`: Stream validation logic +- `rate_store.go`: Rate limiting implementation + +**`pkg/ingester/`**: Log storage and WAL management +- `ingester.go`: Core ingester logic +- `instance.go`: Per-tenant log stream management +- `wal.go`: Write-ahead log implementation +- `checkpoint.go`: WAL checkpointing +- `flush.go`: Chunk flushing to storage + +**`pkg/querier/`**: Query execution engine +- `querier.go`: Main query interface +- `queryrange/`: Query range optimization +- `series.go`: Time series querying +- `tail.go`: Live log tailing implementation + +**`pkg/logql/`**: Query language implementation +- `syntax/parser.go`: LogQL grammar and parsing +- `ast.go`: Abstract syntax tree definitions +- `engine.go`: Query execution engine +- `functions.go`: Built-in function implementations +- `metrics.go`: Metric extraction logic + +### Loki Operator (`operator/`) + +The Loki Operator is a Kubernetes controller that manages Loki deployments using Custom Resource Definitions (CRDs). It provides a declarative approach to deploying and managing Loki in Kubernetes and OpenShift environments with support for multi-tenancy, flexible storage backends, and tight integration with OpenShift Logging. 
+ +**For detailed operator development guidance, see [`operator/AGENTS.md`](operator/AGENTS.md)** + +## Development Guidelines + +### Code Style +- Follow standard Go formatting (gofmt/goimports) +- Import order: standard library, external packages, then Loki packages +- Use structured logging with go-kit/log +- Document all exported functions, types, and variables +- Use table-driven tests when appropriate + +### Error Handling Patterns +```go +// Always wrap errors with context +if err := someOperation(); err != nil { + return fmt.Errorf("failed to perform operation: %w", err) +} + +// Use level.Error for structured logging +level.Error(logger).Log("msg", "operation failed", "err", err) +``` + +### Commit Format +Follow Conventional Commits: `type: description` +- `feat`: New features +- `fix`: Bug fixes +- `chore`: Maintenance tasks +- `docs`: Documentation changes + +### Frontend Guidelines (from .cursor/rules/frontend.mdc) +- Use TypeScript with functional components +- Prefer interfaces over types, avoid enums +- Use lowercase with dashes for directories (`components/auth-wizard`) +- Colocate components close to where they're used +- Use Shadcn UI, Radix, and Tailwind for styling +- Avoid modifying Shadcn components directly in `src/components/ui/*` + +## Configuration Architecture + +Loki uses YAML configuration files with hierarchical structure: + +**Global Settings**: +- Server config (HTTP/gRPC ports, timeouts) +- Authentication and authorization +- Common storage configuration + +**Component-Specific**: +- Distributor: rate limiting, validation +- Ingester: WAL, chunk settings, lifecycler +- Querier: query limits, parallelization +- Ruler: rule evaluation, alerting + +**Key Configuration Files**: +- `cmd/loki/loki-local-config.yaml`: Single-node development +- `production/helm/loki/values.yaml`: Kubernetes defaults +- `production/docker-compose/`: Docker Compose examples + +## Testing Strategy + +### Unit Tests +- Package-level testing with table-driven tests +- Mock 
external dependencies (storage, ring) +- Focus on business logic validation + +### Integration Tests (`integration/`) +- `client/`: Test log ingestion clients +- `query/`: End-to-end query testing +- `cluster/`: Multi-component testing +- Use Docker Compose for test environments + +### Frontend Testing +- Jest for unit tests +- React Testing Library for component tests +- Cypress for end-to-end testing + +## Debugging & Troubleshooting + +### Query Performance Analysis +```bash +# Use logcli with statistics +logcli query '{app="nginx"}' --stats + +# Check slow query logs +kubectl logs -f loki-querier | grep "slow query" +``` + +### Storage Investigation +```bash +# Inspect chunk contents +./chunks-inspect --path=/path/to/chunk + +# Check storage connectivity +./lokitool storage-client test-connection +``` + +### Component Health Monitoring +```bash +# Check ring status (microservices mode) +curl http://loki-distributor:3100/ring + +# View ingester status +curl http://loki-ingester:3100/ready +``` + +## Common Development Tasks + +### Adding a New LogQL Function +1. Extend the parser grammar in `pkg/logql/syntax/` +2. Add AST node types in `pkg/logql/ast.go` +3. Implement function logic in `pkg/logql/functions.go` +4. Add comprehensive tests for parsing and execution +5. Update documentation and examples + +### Adding a New Storage Backend +1. Implement storage interfaces in `pkg/storage/chunk/client/` +2. Register backend in storage configuration +3. Add configuration validation and defaults +4. Include integration tests with real backend +5. Update deployment documentation + +### Debugging Query Performance Issues +1. Use `logcli` with `--stats` flag to see query metrics +2. Check ingester and querier logs for bottlenecks +3. Analyze storage backend performance metrics +4. Consider bloom filter effectiveness +5. 
Review LogQL query patterns for optimization + +## Documentation Standards +- Follow the Grafana [Writers' Toolkit](https://grafana.com/docs/writers-toolkit/) Style Guide +- Use CommonMark flavor of markdown for documentation +- Create LIDs (Loki Improvement Documents) for large functionality changes +- Document upgrading steps in `docs/sources/setup/upgrade/_index.md` +- Preview docs locally with `make docs` from the `/docs` directory +- Include examples and clear descriptions for public APIs diff --git a/CLAUDE.md b/CLAUDE.md deleted file mode 100644 index 98f5ee10292ae..0000000000000 --- a/CLAUDE.md +++ /dev/null @@ -1,50 +0,0 @@ -# Loki Development Guide - -## Build & Test Commands - -```bash -make all # build all binaries -make loki # build loki only -make logcli # build logcli only -make test # run all unit tests -make test-integration # run integration tests -go test ./... # run all tests with Go directly -go test -v ./pkg/logql/... # run tests in specific package -go test -run TestName ./pkg/path # run a specific test -make lint # run all linters (use in CI-like environment) -make format # format code (gofmt and goimports) -``` - -### Building the Frontend - -The Loki UI/frontend (different from the query-frontend) is located in pkg/ui/frontend and is built with [Vite](https://vitejs.dev/). -From pkg/ui/frontend, you can use the following commands. - -```bash -make build # build the frontend -make check-deps # check for vulnerabilities in the frontend dependencies -make clean # clean the frontend -make dev # start the frontend in development mode -make lint # lint the frontend code -make test # run the frontend tests -``` - -## Code Style Guidelines -- Follow standard Go formatting (gofmt/goimports) -- Import order: standard lib, external packages, then Loki packages -- Error handling: Always check errors with `if err != nil { return ... 
}` -- Use structured logging with leveled logging (go-kit/log) -- Use CamelCase for exported identifiers, camelCase for non-exported -- Document all exported functions, types, and variables -- Use table-driven tests when appropriate -- Follow Conventional Commits format: `type: Your change` -- For frontend: use TypeScript, functional components, component composition -- Frontend naming: lowercase with dashes for directories (components/auth-wizard) - -## Documentation Standards -- Follow the Grafana [Writers' Toolkit](https://grafana.com/docs/writers-toolkit/) Style Guide -- Use CommonMark flavor of markdown for documentation -- Create LIDs (Loki Improvement Documents) for large functionality changes -- Document upgrading steps in `docs/sources/setup/upgrade/_index.md` -- Preview docs locally with `make docs` from the `/docs` directory -- Include examples and clear descriptions for public APIs diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 0000000000000..0c91e4d08d0c8 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +/home/aconway/src/openshift/loki/AGENTS.md \ No newline at end of file diff --git a/operator/AGENTS.md b/operator/AGENTS.md new file mode 100644 index 0000000000000..807a163f94b6f --- /dev/null +++ b/operator/AGENTS.md @@ -0,0 +1,200 @@ +# Loki Operator Development Guide + +This file provides detailed architectural guidance and development context for AI agents working with the Loki Operator codebase. + +## Overview + +The Loki Operator is a Kubernetes controller that manages Loki deployments using Custom Resource Definitions (CRDs). It provides a declarative approach to deploying and managing Loki in Kubernetes and OpenShift environments. 
+ +## Directory Structure + +**`operator/`**: the operator sub-project + +- **`api/`**: Custom Resource Definitions and API types + - `loki/v1/`: Stable API version with core resource types + - `lokistack_types.go`: Defines LokiStack custom resource for complete Loki deployments + - `alertingrule_types.go`: Manages LogQL-based alerting rules + - `recordingrule_types.go`: Defines recording rules for pre-computed metrics + - `rulerconfig_types.go`: Configures the ruler component behavior + - `loki/v1beta1/`: Beta API version for experimental features + +- **`cmd/`**: Operator executables and utilities + - `loki-operator/`: Main operator controller binary + - `loki-broker/`: CLI tool for operator management and debugging + - `size-calculator/`: Storage size calculation utility for capacity planning + +- **`internal/`**: Core operator implementation (not part of public API) + - `controller/`: Kubernetes controller reconciliation logic + - `config/`: Configuration management and validation + - `manifests/`: Kubernetes manifest generation and templating + - `operator/`: Core operator business logic and resource management + - `validation/`: Resource validation and admission control + - `sizes/`: Storage sizing algorithms and calculations + +- **`config/`**: Kubernetes deployment configurations + - `crd/`: Custom Resource Definition bases + - `rbac/`: Role-Based Access Control configurations + - `manager/`: Operator deployment manifests + - `samples/`: Example Custom Resource configurations + - `kustomize/`: Kustomize overlays for different environments + +- **`bundle/`**: Operator Lifecycle Manager (OLM) packaging + - Supports multiple deployment variants: + - `community`: Standard Grafana distribution + - `community-openshift`: OpenShift-compatible community version + - `openshift`: Red Hat certified OpenShift distribution + - Contains ClusterServiceVersion, package manifests, and bundle metadata + +## Key Features + +- **Multi-tenant Support**: Isolates log streams by 
tenant with configurable authentication +- **Flexible Storage**: Supports object storage (S3, GCS, Azure), local storage, and hybrid configurations +- **Auto-scaling**: Horizontal Pod Autoscaler integration for dynamic scaling +- **Security**: Integration with OpenShift authentication, RBAC, and network policies +- **Monitoring**: Built-in Prometheus metrics and Grafana dashboard integration +- **Gateway Component**: Optional log routing and tenant isolation layer + +## Deployment Variants + +1. **Community** (`VARIANT=community`): + - Registry: `docker.io/grafana` + - Standard Kubernetes deployment + - Flexible configuration options + - Community support channels + +2. **Community-OpenShift** (`VARIANT=community-openshift`): + - Optimized for OpenShift but community-supported + - Enhanced security contexts + - OpenShift-specific networking configurations + +3. **OpenShift** (`VARIANT=openshift`): + - Registry: `quay.io/openshift-logging` + - Namespace: `openshift-operators-redhat` + - Full Red Hat support and certification + - Tight integration with OpenShift Logging stack + +## Build and Development Commands + +```bash +# Core Development Workflow +make generate # Generate controller and CRD code +make manifests # Generate CRDs, RBAC, and deployment manifests +make fmt # Format Go source code +make vet # Run Go vet static analysis +make test # Execute unit tests +make lint # Run comprehensive linting + +# Local Development and Testing +make run # Run operator locally against configured cluster +make deploy # Deploy operator to cluster via kubectl +make undeploy # Remove operator from cluster +make install # Install CRDs to cluster +make uninstall # Remove CRDs from cluster + +# Container and Bundle Management +make docker-build # Build operator container image +make docker-push # Push operator image to registry +make bundle # Generate OLM bundle for specified variant +make bundle-push # Push bundle to registry +make catalog-build # Build catalog image for OLM 
+make catalog-push # Push catalog image + +# Advanced Deployment Options +make olm-deploy # Deploy via Operator Lifecycle Manager +make scorecard # Run OLM scorecard tests +make quickstart # Set up local development environment +``` + +## Testing Strategy + +```bash +# Unit Testing +make test # Run all unit tests +go test ./internal/... # Test internal packages +go test ./api/... # Test API types and validation + +# Integration Testing +make e2e # Run end-to-end tests (requires cluster) +make test-storage # Test storage backend integrations +make test-openshift # OpenShift-specific integration tests + +# Quality Assurance +make lint-rules # Validate Prometheus alerting rules +make bundle-validate # Validate OLM bundle structure +make scorecard # Run OLM certification tests +``` + +## Contributing to Operator Code + +1. **Development Environment Setup**: + ```bash + # Prerequisites: Go 1.21+, Docker/Podman, kind or OpenShift cluster + git clone https://github.com/grafana/loki.git + cd loki/operator + make quickstart # Sets up local environment + ``` + +2. **Development Workflow**: + ```bash + # Make changes to API types, controllers, or manifests + make generate manifests # Regenerate code and manifests + make fmt vet lint # Ensure code quality + make test # Run unit tests + make run # Test locally against cluster + ``` + +3. **Adding New API Fields**: + - Modify types in `api/loki/v1/*.go` + - Run `make generate manifests` to update generated code + - Add validation logic in `internal/validation/` + - Update controller reconciliation in `internal/controller/` + - Write comprehensive unit tests + +4. **Adding New Features**: + - Extend controller logic in `internal/controller/lokistack/` + - Add manifest generation in `internal/manifests/` + - Update configuration handling in `internal/config/` + - Add feature flags if needed + - Document in operator documentation + +5. 
**Testing Your Changes**: + ```bash + # Local testing workflow + make install # Install CRDs + make run # Run operator locally + # Apply sample CustomResources in another terminal + kubectl apply -f config/samples/ + ``` + +6. **Bundle and Release Process**: + ```bash + # Test bundle generation for all variants + make bundle VARIANT=community + make bundle VARIANT=community-openshift + make bundle VARIANT=openshift + make bundle-validate # Validate all bundles + ``` + +## Common Development Tasks + +- **Adding New CRD Field**: Modify `*_types.go`, run `make generate manifests` +- **Updating Controller Logic**: Edit `internal/controller/`, ensure proper reconciliation +- **Adding Storage Backend**: Extend `internal/manifests/storage.go` +- **Enhancing Validation**: Update `internal/validation/` with new rules +- **Supporting New Loki Version**: Update manifests and test compatibility + +## Troubleshooting Development Issues + +```bash +# Debug operator logs +kubectl logs -f deployment/loki-operator-controller-manager -n loki-operator-system + +# Check CRD status +kubectl describe lokistack my-stack -n my-namespace + +# Validate generated manifests +kubectl apply --dry-run=client -f config/samples/ + +# Test bundle locally +operator-sdk run bundle-upgrade docker.io/grafana/loki-operator-bundle:latest +``` \ No newline at end of file