Open
Labels
bug (Something isn't working)
Description
- Correct the counting of FN and FP
- BUG: The tracing files (...(FN|FP)_Analysis.txt) do not compare records by publicationYear (in ValidationTests::writeFNandFPresults). ASySD_SRSR_Human has 8 reported FN cases where the publication years are more than 1 year apart.
  - The FP analysis files contain pairs labelled "ARE NOT DUPLICATES". Or is this a duplicate label transfer (?)
  - The FN analysis files contain pairs labelled "ARE DUPLICATES". In BIG_SET, records 30783 - 7821 "ARE DUPLICATES", but their publication years are 2004 and 2008.
- BUG: The FN analysis files contain a different number of paragraphs than the FN column of the performance table. For ASySD_SRSR_Human:
  - 51 paragraphs (8 of which are "ARE DUPLICATES" cases where the publication years are more than 1 year apart)
  - 53 FN in the performance table

  The analysis files use lowerID - higherID order. If 2 records (higherIDs) are FNs for the same record (lowerID), only one couple is reported. For the records "25459 - 25452" and "29789 - 25452" in SRS_Human_to_validate, only the last one (25452 - 29789) is shown in SRSR_Human.txt_FN_Analysis.txt. The same holds for 26855 - 26854 and 26856 - 26854.
Duplicate label transfer: there might be a problem with the dedupId when it is copied not from the pivot record to the current record, but from the current record to the pivot record (see DeduplicationService::compareSet).
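The two tracing-file problems above (no publication-year check, and only one couple shown per lowerID) can be illustrated with a small sketch. This is not the project's code: the function names `years_compatible` and `normalize_pairs` and the `max_gap` parameter are hypothetical, and the 1-year tolerance is taken from the "more than 1 year apart" wording in this issue. That pairs get lost because they are keyed by lowerID alone is only a guess at the cause.

```python
def years_compatible(year_a, year_b, max_gap=1):
    """Return True when two publication years are at most max_gap years apart."""
    return abs(year_a - year_b) <= max_gap


def normalize_pairs(pairs):
    """Return every distinct couple in (lowerID, higherID) order.

    Couples are kept as whole tuples, so two higherIDs that share one
    lowerID stay as two separate entries instead of overwriting each other.
    """
    return sorted({(min(a, b), max(a, b)) for a, b in pairs})


# Couples taken from the examples in this issue.
reported = normalize_pairs([(25459, 25452), (29789, 25452)])
print(reported)                      # both couples for lowerID 25452 survive
print(years_compatible(2004, 2008))  # the BIG_SET example: 4 years apart
```

Keying on the full (lowerID, higherID) tuple rather than on lowerID alone is the design choice that keeps both couples for record 25452 in the output.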
Reason
When FP and FN are counted, the couple where id == dedupId (the record paired with itself) is also counted as an FN or FP. It should be skipped if the only other couple in that group was an error.
E.g. for FN:
| id | dedupId | Counted as FN | Remark |
|---|---|---|---|
| 304 | 304 | Yes | WRONG: Should not be counted as FN |
| 31863 | 304 | Yes | GOOD: Should be counted as FN |

| id | dedupId | Counted as FN | Remark |
|---|---|---|---|
| 296 | 20257 | Yes | GOOD: Should be counted as FN |
| 20257 | 20257 | No | GOOD: Should not be counted as FN |
| 33023 | 20257 | No | GOOD: Should not be counted as FN |
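A minimal sketch of the intended counting rule, under the assumption that each record arrives as an (id, dedupId, flagged) tuple, where flagged means the current logic would count it as an error. The function name `count_errors` and this tuple layout are hypothetical and do not come from the project. The sketch drops the id == dedupId row whenever another row of the same group is already flagged, which is one way to read the rule described above.

```python
from collections import defaultdict


def count_errors(rows):
    """Count FN (or FP) records, skipping a flagged self-referencing row
    (id == dedupId) when another record of the same group is also flagged."""
    flagged_by_group = defaultdict(list)
    for record_id, dedup_id, flagged in rows:
        if flagged:
            flagged_by_group[dedup_id].append(record_id)

    total = 0
    for dedup_id, ids in flagged_by_group.items():
        others = [i for i in ids if i != dedup_id]
        # If other records in the group are flagged, they account for the
        # error; the row where id == dedupId is then not counted again.
        total += len(others) if others else len(ids)
    return total


first_table = [(304, 304, True), (31863, 304, True)]
second_table = [(296, 20257, True), (20257, 20257, False), (33023, 20257, False)]
print(count_errors(first_table))   # 1: only 31863 is counted
print(count_errors(second_table))  # 1: only 296 is counted
```

With this rule the first table yields one FN instead of two, while the second table, which was already counted correctly, is unchanged.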