
Proposal: Retrieval-Conditioned Confidence Metric (RCCS) for RAG evaluation #736

@devarakondasrikanth

Description


Motivation

Current evaluation metrics in the ecosystem emphasize answer-level quality (accuracy, F1, BLEU, ROUGE) but lack a lightweight, model-agnostic diagnostic for confidence calibration in Retrieval-Augmented Generation (RAG) pipelines. In practice, answer reliability depends on both retrieval relevance and model confidence; therefore a metric that evaluates their joint alignment is valuable.

Proposed metric: Retrieval-Conditioned Confidence Score (RCCS)

RCCS measures alignment between:

  • Retrieval relevance score R (per example)
  • Model confidence C (per example; calibrated probability recommended)
  • Ground-truth correctness A (0/1)

Two suggested outputs:

  • rccs_correlation: Pearson correlation between (R * C) and A.
  • confidence_calibration_error: mean absolute error between (R * C) and A, i.e. mean(|R*C - A|).
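The two outputs above can be sketched in a few lines of NumPy (the example values of R, C, and A are made up for illustration):

```python
# Illustrative computation of the two proposed RCCS outputs.
import numpy as np

R = np.array([0.9, 0.8, 0.2, 0.7])    # retrieval relevance per example
C = np.array([0.95, 0.9, 0.85, 0.4])  # model confidence per example
A = np.array([1, 1, 0, 0])            # ground-truth correctness (0/1)

RC = R * C
# Pearson correlation between R*C and A (off-diagonal of the 2x2 matrix)
rccs_correlation = np.corrcoef(RC, A)[0, 1]
# Mean absolute error between R*C and A
confidence_calibration_error = np.abs(RC - A).mean()
```

In this toy sample the correct answers have high R*C and the incorrect ones low R*C, so rccs_correlation is close to 1 while confidence_calibration_error stays small.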

Rationale:

  • High R with high C should correlate with A=1 (correct).
  • High C but low R indicates hallucination risk.
  • The metric is model-agnostic, simple to compute, and useful for benchmarking and diagnostics.

Proposed scope for initial PR

  • Add directory metrics/rccs/ implementing the metric as evaluate.Metric.
  • Minimal API: accept columns retrieval_score, confidence_score, correctness.
  • Return rccs_correlation, confidence_calibration_error, mean_rc, n.
  • Include minimal unit tests and a metric card README with an example notebook.
  • Keep implementation lightweight; follow repository conventions and tests.
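A minimal sketch of the compute logic the metrics/rccs/ module could wrap. The function name is hypothetical and the evaluate.Metric boilerplate is omitted so the sketch runs standalone; the input column names and the four output keys are the ones proposed above:

```python
import numpy as np

def compute_rccs(retrieval_score, confidence_score, correctness):
    """Hypothetical core of the proposed metrics/rccs/ module.

    Accepts the three proposed input columns and returns the four
    proposed outputs: rccs_correlation, confidence_calibration_error,
    mean_rc, and n.
    """
    r = np.asarray(retrieval_score, dtype=float)
    c = np.asarray(confidence_score, dtype=float)
    a = np.asarray(correctness, dtype=float)
    rc = r * c
    return {
        "rccs_correlation": float(np.corrcoef(rc, a)[0, 1]),
        "confidence_calibration_error": float(np.abs(rc - a).mean()),
        "mean_rc": float(rc.mean()),
        "n": int(rc.size),
    }
```

An evaluate.Metric wrapper would call this from its _compute method and declare the three columns in its features specification.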

Example formulas

  • rccs_correlation = PearsonCorr(R * C, A)
  • confidence_calibration_error = mean(|R * C - A|)

Notes

  • I will provide an example Colab notebook showing RCCS on a simple RAG pipeline.
  • I can open an initial PR implementing a minimal version if maintainers agree with scope and naming.

If this direction makes sense, I’m happy to implement whatever scope and naming best align with the project’s design goals and contribution guidelines. Please let me know what you prefer.

Thanks for your time, and happy to iterate on the approach.
