Skip to content

Add persistent datastore support (PostgreSQL + Redis cache) for storing analysis reports #1

@sadra-allahyari

Description

@sadra-allahyari

Currently, OncaLLM appears to store analysis reports in a non-persistent manner (e.g. in-memory, files). For production adoption, it would be highly beneficial to support a persistent datastore that enables historical analysis, filtering, scalability, and reliability—especially when handling high volumes of Kubernetes alerts.

🛠 Proposed Enhancements

1. PostgreSQL as primary persistent storage

  • Store alert metadata, report content, recommendations, timestamps, severity, Kubernetes context (namespaces, pod info).
  • Support efficient querying for filtering by time range, alert type, cluster namespace, analysis outcome, or severity.
  • Provide transactional durability and support concurrent access.

2. Redis (optional) as a caching or indexing layer

  • Cache recent or frequently accessed analysis reports to reduce load on the primary database.
  • (Optional) Use Redis as a pub/sub mechanism for real-time notifications or background task queueing.

3. Adapt code structure

  • Introduce data models via an ORM or query library (e.g. SQLAlchemy).
  • Abstract report storage behind a ReportStore interface with pluggable backends (in-memory, PostgreSQL).
  • Add configuration via .env to switch storage types (e.g. STORE_BACKEND=postgresql or inmemory).

4. API improvements

  • Extend existing endpoints (/reports, /reports/{id}) to support filtering and pagination.
  • Allow users to query reports by alert label, time window, namespace, or severity.

5. Migration path

  • Allow seamless migration from current in-memory or file-based storage to PostgreSQL without data loss (e.g. batch import on startup if switching).

6. Documentation and tests

  • Update README and .env.example to include database configuration options.
  • Add integration tests (via pytest) to validate PostgreSQL persistence and Redis caching behavior as build targets.

🧠 Why PostgreSQL + Redis?

PostgreSQL — Reliable, Proven Backend

  • Sentry’s engineering team explicitly confirms that PostgreSQL is its core relational datastore—used extensively for write-heavy workloads and production deployments with complex scaling and durability requirements:
    • "At Sentry, our relational datastore is Postgres. We’re a write heavy application and we’ve had to scale Postgres…" link

Redis — Caching & Real-Time Performance

  • In Grafana’s alerting/IRM setup, Redis is explicitly incorporated for high availability (HA) and fast real-time coordination via pub/sub—serving as a critical cache and messaging layer when Memberlist isn't suitable:
    • "Enable alerting high availability using Redis… use Redis standalone, Redis Cluster and Redis Sentinel modes…" link

🧩 How I Can Help

Happy to contribute:

  • Database schema models (Report, Alert, Recommendation)
  • Service layer abstraction and configuration loader
  • Migration logic and upgrade path
  • Updates to API endpoints and query parameter support
  • Tests (unit and integration) and CI updates
  • README, configuration, and deployment documentation

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions