Add Precision-Recall curve in probscores (PR_curve) #531
base: master
Conversation
…better for imbalanced distribution
dnerini left a comment
hi @DANIELV95, thanks for this useful addition (PR curves are often more informative than ROC curves for rare-event verification). The structure (init/accum/compute) matches the ROC approach, which is good for API consistency.
That said, there are a few details worth correcting before merge:
- Missing visualization: I suggest that this PR be complemented with the corresponding plotting function plot_PR in https://github.com/pySTEPS/pysteps/blob/master/pysteps/verification/plots.py (similar to plot_ROC); see the sketch after this list.
- Missing coverage: we need basic unit tests alongside the new method, see the examples in https://github.com/pySTEPS/pysteps/blob/master/pysteps/tests/test_verification_probscores.py and the test sketch below.
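A possible shape for the plotting function, as a sketch only: it assumes the PR object comes from PR_curve_init/PR_curve_accum and that PR_curve_compute(PR) returns the per-threshold precision and recall arrays (both assumptions, since the final signature is still being discussed in this PR):

```python
import matplotlib.pyplot as plt

from pysteps.verification import probscores


def plot_PR(PR, ax=None):
    """Plot a precision-recall curve from an accumulated PR object.

    Sketch only: assumes PR_curve_compute(PR) returns the per-threshold
    precision and recall arrays, mirroring how plot_ROC consumes the
    output of ROC_curve_compute.
    """
    if ax is None:
        ax = plt.gca()

    # Hypothetical call; adjust to the final PR_curve_compute signature.
    precision, recall = probscores.PR_curve_compute(PR, compute_area=False)

    ax.plot(recall, precision, "kD-")
    ax.set_xlim(0.0, 1.0)
    ax.set_ylim(0.0, 1.0)
    ax.set_xlabel("Recall (probability of detection)")
    ax.set_ylabel("Precision")

    return ax
```

And a minimal unit-test sketch in the spirit of test_verification_probscores.py; the PR_curve_init/PR_curve_accum signatures are assumed to mirror the ROC_curve_* functions, and the data values are made up:

```python
import numpy as np

from pysteps.verification import probscores


def test_PR_curve():
    # Synthetic, well-separated forecast probabilities and observations
    # (values made up for illustration).
    P_f = np.array([0.05, 0.10, 0.80, 0.95])
    X_o = np.array([0.0, 0.0, 2.0, 3.0])

    PR = probscores.PR_curve_init(X_min=1.0, n_prob_thrs=10)
    probscores.PR_curve_accum(PR, P_f, X_o)
    precision, recall, area = probscores.PR_curve_compute(PR, compute_area=True)

    # Basic sanity checks on the outputs.
    assert np.all((np.asarray(precision) >= 0.0) & (np.asarray(precision) <= 1.0))
    assert np.all((np.asarray(recall) >= 0.0) & (np.asarray(recall) <= 1.0))
    assert 0.0 <= area <= 1.0
```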
| PR["corr_neg"][i] += np.sum(np.logical_and(~forecast_yes, ~obs_yes)) | ||
|
|
||
|
|
||
| def PR_curve_compute(PR, X_o, X_min, compute_area=False): |
PR_curve_compute does not use X_o or X_min anywhere, maybe a copy/paste error? Please remove them.
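For reference, a sketch of the trimmed-down signature; the hits/misses key names are inferred from the snippets quoted in this review and may differ in the actual PR:

```python
import numpy as np


def PR_curve_compute(PR, compute_area=False):
    """Compute the precision-recall curve from the accumulated counts in PR.

    X_o and X_min are no longer arguments: the event threshold is stored in
    PR["X_min"] at init time and already applied during accumulation.
    """
    precisions, recalls = [], []
    for i, _ in enumerate(PR["prob_thrs"]):
        # Key names taken from the diff snippets quoted in this review;
        # adjust if the final PR object uses different names.
        hits = PR["hits"][i]
        misses = PR["misses"][i]
        false_alarms = PR["false_alarms"][i]

        recalls.append(hits / (hits + misses) if (hits + misses) > 0 else 0.0)
        precisions.append(
            hits / (hits + false_alarms) if (hits + false_alarms) > 0 else 1.0
        )

    if compute_area:
        # Area computation unchanged from the PR; elided in this sketch.
        ...

    return np.array(precisions), np.array(recalls)
```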
false_alarms = PR["false_alarms"][i]
...
recall = hits / (hits + misses) if (hits + misses) > 0 else 0.0
precision = hits / (hits + false_alarms) if (hits + false_alarms) > 0 else 1.0
the convention of returning precision = 1.0 when (hits + false_alarms) == 0 should be documented in the docstring, as it can lead to surprising results for thresholds that yield zero predicted positives
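To make the point concrete, here is the edge case in isolation (the numbers are made up):

```python
# A threshold so high that nothing is forecast: no hits, no false alarms.
hits, misses, false_alarms = 0, 5, 0

recall = hits / (hits + misses) if (hits + misses) > 0 else 0.0                 # 0.0
precision = hits / (hits + false_alarms) if (hits + false_alarms) > 0 else 1.0  # 1.0

# The resulting point (recall=0, precision=1) is a convention, not a measured
# value, which is why it should be stated explicitly in the docstring.
```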
could you comment on how this is handled in scikit-learn for example?
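If it helps the discussion, a quick way to inspect scikit-learn's behaviour on a toy example, using its standard sklearn.metrics.precision_recall_curve (the data values are made up):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy binary observations and forecast probabilities (made up).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(precision, recall, thresholds)

# Note: scikit-learn's thresholds come from the observed scores, so each one
# yields at least one predicted positive, and the final (precision=1, recall=0)
# point is appended explicitly (see the scikit-learn documentation). Comparing
# these arrays with the output of PR_curve_compute on the same data would
# settle the question.
```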
X_o = X_o[mask]
for i, p in enumerate(PR["prob_thrs"]):
    forecast_yes = P_f >= p
    obs_yes = X_o >= PR["X_min"]
this line computing obs_yes does not depend on the probability threshold and can be moved outside of the loop, for example:
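A sketch of the restructured accumulation loop; the finite-value mask and the hits/misses/false_alarms key names are assumed from the ROC analogue and the snippets quoted above:

```python
import numpy as np


def PR_curve_accum(PR, P_f, X_o):
    """Accumulate counts for the PR curve (restructured sketch, not the PR's code)."""
    # Keep only finite forecast/observation pairs (assumed, mirroring ROC_curve_accum).
    mask = np.logical_and(np.isfinite(P_f), np.isfinite(X_o))
    P_f = P_f[mask]
    X_o = X_o[mask]

    # obs_yes does not depend on the probability threshold,
    # so it is computed once before the loop.
    obs_yes = X_o >= PR["X_min"]

    for i, p in enumerate(PR["prob_thrs"]):
        forecast_yes = P_f >= p
        PR["hits"][i] += np.sum(np.logical_and(forecast_yes, obs_yes))
        PR["misses"][i] += np.sum(np.logical_and(~forecast_yes, obs_yes))
        PR["false_alarms"][i] += np.sum(np.logical_and(forecast_yes, ~obs_yes))
        PR["corr_neg"][i] += np.sum(np.logical_and(~forecast_yes, ~obs_yes))
```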
Added Precision-Recall curve in probscores. Similar to the ROC curve, but a better metric for imbalanced distributions.