-
Notifications
You must be signed in to change notification settings - Fork 4
Open
Description
Currently, OncaLLM appears to store analysis reports in a non-persistent manner (e.g. in-memory, files). For production adoption, it would be highly beneficial to support a persistent datastore that enables historical analysis, filtering, scalability, and reliability—especially when handling high volumes of Kubernetes alerts.
🛠 Proposed Enhancements
1. PostgreSQL as primary persistent storage
- Store alert metadata, report content, recommendations, timestamps, severity, Kubernetes context (namespaces, pod info).
- Support efficient querying for filtering by time range, alert type, cluster namespace, analysis outcome, or severity.
- Provide transactional durability and support concurrent access.
2. Redis (optional) as a caching or indexing layer
- Cache recent or frequently accessed analysis reports to reduce load on the primary database.
- (Optional) Use Redis as a pub/sub mechanism for real-time notifications or background task queueing.
3. Adapt code structure
- Introduce data models via an ORM or query library (e.g. SQLAlchemy).
- Abstract report storage behind a ReportStore interface with pluggable backends (in-memory, PostgreSQL).
- Add configuration via .env to switch storage types (e.g. STORE_BACKEND=postgresql or inmemory).
4. API improvements
- Extend existing endpoints (/reports, /reports/{id}) to support filtering and pagination.
- Allow users to query reports by alert label, time window, namespace, or severity.
5. Migration path
- Allow seamless migration from current in-memory or file-based storage to PostgreSQL without data loss (e.g. batch import on startup if switching).
6. Documentation and tests
- Update README and .env.example to include database configuration options.
- Add integration tests (via pytest) to validate PostgreSQL persistence and Redis caching behavior as build targets.
🧠 Why PostgreSQL + Redis?
PostgreSQL — Reliable, Proven Backend
- Sentry’s engineering team explicitly confirms that PostgreSQL is its core relational datastore—used extensively for write-heavy workloads and production deployments with complex scaling and durability requirements:
- "At Sentry, our relational datastore is Postgres. We’re a write heavy application and we’ve had to scale Postgres…" link
Redis — Caching & Real-Time Performance
- In Grafana’s alerting/IRM setup, Redis is explicitly incorporated for high availability (HA) and fast real-time coordination via pub/sub—serving as a critical cache and messaging layer when Memberlist isn't suitable:
- "Enable alerting high availability using Redis… use Redis standalone, Redis Cluster and Redis Sentinel modes…" link
🧩 How I Can Help
Happy to contribute:
- Database schema models (Report, Alert, Recommendation)
- Service layer abstraction and configuration loader
- Migration logic and upgrade path
- Updates to API endpoints and query parameter support
- Tests (unit and integration) and CI updates
- README, configuration, and deployment documentation
Metadata
Metadata
Assignees
Labels
No labels