
[Feature Request] Propensity-scored Metrics #59

@celsofranssa

Description


Is your feature request related to a problem? Please describe.
In some extremely challenging information-retrieval scenarios, the items relevant to a query follow a long-tail distribution: a few relevant items are extremely frequent (head items), while many are extremely rare (tail items). Since high values of commonly used metrics such as precision@k or nDCG@k can be achieved by retrieving head items alone, the metrics also need to account for the reward of each retrieved item, defined as the inverse of its propensity. For these scenarios, it is therefore recommended to also measure the propensity-scored counterparts of precision (psprecision@k) and nDCG (psnDCG@k).
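For concreteness, below is a minimal Python sketch of psprecision@k for a single query, following the standard definition from Jain et al. (KDD 2016). All names (psp_at_k, ranked_ids, inv_propensity, etc.) are illustrative, not a proposed ranx API:

# Minimal sketch of psprecision@k for one query; inv_propensity[i]
# is the reward 1/p_i of item i.
def psp_at_k(ranked_ids, relevant_ids, inv_propensity, k):
    # Unnormalized PSP@k: mean reward of the relevant items in the top k.
    return sum(inv_propensity[i] for i in ranked_ids[:k] if i in relevant_ids) / k

def normalized_psp_at_k(ranked_ids, relevant_ids, inv_propensity, k):
    # Divide by the ideal PSP@k (relevant items sorted by reward, rarest
    # first) so that a perfect, propensity-ordered ranking scores 1.0.
    ideal = sorted(relevant_ids, key=lambda i: inv_propensity[i], reverse=True)
    return (psp_at_k(ranked_ids, relevant_ids, inv_propensity, k)
            / psp_at_k(ideal, relevant_ids, inv_propensity, k))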

Describe the solution you'd like
Presently, ranx offers an efficient way to evaluate and compare ranking effectiveness, fusion algorithms, and normalization strategies. Incorporating propensity-scored metrics alongside these features would make it even more complete.

Test cases

NOTE: These test cases are available on Google Colab and all the data can be downloaded from Propensity-scored Metrics-Files.zip and from GDrive.

Below are some test cases based on the predicted ranking (pred) and the relevance map (true). I use the pyxclib Python library to measure the propensity-scored metrics in two steps:

  1. The propensity-scored precision and nDCG;
  2. A validation of pyxclib against ranx on the standard precision@k and ndcg@k.

The Propensity-scored precision and nDCG

It requires the prediction scores, the true relevance scores, and the inverse propensities:

pred = load_sparse_matrix(name="pred")
true = load_sparse_matrix(name="true")
inv_propensities = load_sparse_matrix(name="inv_propensities")
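The load_sparse_matrix helper is not shown here; one plausible implementation, assuming the matrices in the shared zip are stored as SciPy .npz files, is:

from scipy import sparse

def load_sparse_matrix(name):
    # Hypothetical helper: assumes each matrix is stored as "<name>.npz".
    return sparse.load_npz(f"{name}.npz").tocsr()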

Then, the measured metrics (psprecision@k and psnDCG@k) are:

get_pyxclib_propensity_scored_metrics(pred, true, inv_propensities, thresholds=[1,5,10])
# {
#     'psprecision@1': 0.316,
#     'psprecision@5': 0.53,
#     'psprecision@10': 0.715,
#     'psnDCG@1': 0.216,
#     'psnDCG@5': 0.19,
#     'psnDCG@10': 0.188
#  }
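For reference, one plausible definition of this wrapper on top of pyxclib is sketched below; psprecision and psndcg come from pyxclib's xclib.evaluation.xc_metrics module, while the wrapper itself, the rounding, and the shape assumptions are mine:

from xclib.evaluation import xc_metrics

def get_pyxclib_propensity_scored_metrics(pred, true, inv_propensities, thresholds):
    k = max(thresholds)
    # pyxclib returns an array with the metric at every cutoff 1..k;
    # inv_propensities may need to be a dense 1-D array (one reward per label).
    psp = xc_metrics.psprecision(pred, true, inv_propensities, k=k)
    psn = xc_metrics.psndcg(pred, true, inv_propensities, k=k)
    out = {f"psprecision@{t}": round(float(psp[t - 1]), 3) for t in thresholds}
    out.update({f"psnDCG@{t}": round(float(psn[t - 1]), 3) for t in thresholds})
    return out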

The ideal Propensity-scored precision and nDCG

The maximum value of the propensity-scored metrics occurs when the predicted ranking places all relevant items ahead of non-relevant ones and orders the relevant items by reward, that is, rarer (higher inverse-propensity) items first.
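For intuition, such an ideal ranking can be derived directly from the relevance map and the rewards; a sketch (not necessarily how the shared ideal_ranking file was produced):

# Score every relevant item by its reward so that, per query, relevant
# items outrank non-relevant ones and rarer items outrank common ones.
# Assumes true is queries x items and inv_propensities broadcasts per item.
ideal_scores = true.multiply(inv_propensities)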

ideal_ranking = load_sparse_matrix(name="ideal_ranking")
get_pyxclib_propensity_scored_metrics(ideal_ranking, ideal_ranking, inv_propensities, thresholds=[1,5,10])
# {
#     'psprecision@1': 1.0,
#     'psprecision@5': 1.0,
#     'psprecision@10': 1.0,
#     'psnDCG@1': 1.0,
#     'psnDCG@5': 1.0,
#     'psnDCG@10': 1.0
# }

A validation of pyxclib against ranx on the standard precision@k and ndcg@k

To validate the results, I compared the standard precision and nDCG values computed by pyxclib and ranx.

get_pyxclib_standard_metrics(pred, true, thresholds=[1,5,10])
# {
#     'precision@1': 0.579,
#     'precision@5': 0.448,
#     'precision@10': 0.361,
#     'nDCG@1': 0.579,
#     'nDCG@5': 0.478,
#     'nDCG@10': 0.412
#  }


get_ranx_standard_metrics(pred, true, thresholds=[1,5,10])
# {
#     'precision@1': 0.579,
#     'precision@5': 0.448,
#     'precision@10': 0.361,
#     'nDCG@1': 0.579,
#     'nDCG@5': 0.478,
#     'nDCG@10': 0.412
#  }
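For completeness, the ranx side of the comparison could look like the sketch below. Qrels, Run, and evaluate are ranx's public API; the sparse-to-dict conversion is my assumption about the unshown wrapper:

from ranx import Qrels, Run, evaluate

def get_ranx_standard_metrics(pred, true, thresholds):
    # Convert the sparse matrices into the dict-of-dicts form ranx expects.
    qrels = Qrels.from_dict({
        str(q): {str(i): int(v) for i, v in zip(true[q].indices, true[q].data)}
        for q in range(true.shape[0])
    })
    run = Run.from_dict({
        str(q): {str(i): float(v) for i, v in zip(pred[q].indices, pred[q].data)}
        for q in range(pred.shape[0])
    })
    names = [f"precision@{t}" for t in thresholds] + [f"ndcg@{t}" for t in thresholds]
    return evaluate(qrels, run, names)

The two libraries produce identical values at every threshold, which supports the correctness of the pyxclib setup used for the propensity-scored numbers above.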
