[BUG] Counts of (at least) FP and FN #52

@globbestael

Description

  • correct counting of FN and FP
  • BUG: the tracing files (...(FN|FP)_Analysis.txt) do not compare by publicationYear (in ValidationTests::writeFNandFPresults). ASySD_SRSR_Human has 8 cases of reported FN where publication years are > 1 year apart
  • the FP analysis files contain pairs marked "ARE NOT DUPLICATES". Or is this a duplicate label transfer (?)
  • the FN analysis files contain pairs marked "ARE DUPLICATES". In BIG_SET, records 30783 - 7821 "ARE DUPLICATES", but their publication years are 2004 and 2008
  • BUG: The FN analysis files contain a different number of paragraphs than the FN columns in the performance table. For ASySD_SRSR_Human:
    • 51 paragraphs (8 of which are "ARE DUPLICATES" because the publication years are more than 1 year apart)
    • 53 FN in performance table
      The analysis files use lowerID - higherID order. If 2 records (higherIDs) are FNs for the same record (lowerID), only the first couple is reported. For the records "25459 - 25452" and "29789 - 25452" in SRS_Human_to_validate, only the last one (25452 - 29789) is shown in SRSR_Human.txt_FN_Analysis.txt. The same holds for 26855 - 26854 and 26856 - 26854.
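A minimal sketch of the two reporting problems described above, written in Python purely for illustration (the project's own code is not shown in this issue). Everything here is an assumption: the helper names `normalize_pair` and `years_compatible` are hypothetical, and keying the report by lowerID alone is only one possible explanation for the missing paragraphs, not a confirmed cause. The 1-year tolerance is taken from the description above.

```python
def normalize_pair(a: int, b: int) -> tuple[int, int]:
    """Return the pair in lowerID - higherID order (hypothetical helper)."""
    return (min(a, b), max(a, b))


def years_compatible(year_a: int, year_b: int, max_gap: int = 1) -> bool:
    """True if the publication years are at most max_gap years apart."""
    return abs(year_a - year_b) <= max_gap


# The two FN couples mentioned for SRS_Human_to_validate.
fn_pairs = [(25459, 25452), (29789, 25452)]

# ASSUMPTION: if the report were keyed by lowerID only, a later couple
# would overwrite an earlier one that shares the same lowerID.
by_lower_id = {}
for a, b in fn_pairs:
    lower, higher = normalize_pair(a, b)
    by_lower_id[lower] = higher

# Keying by the full (lowerID, higherID) pair keeps both couples.
all_pairs = sorted({normalize_pair(a, b) for a, b in fn_pairs})

print(by_lower_id)                   # one couple survives
print(all_pairs)                     # both couples survive
print(years_compatible(2004, 2008))  # the BIG_SET example: 4 years apart
```

With the full pair as key, both couples for record 25452 would get their own paragraph, and a year check of this kind could be applied before a pair is reported as "ARE DUPLICATES".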

Duplicate label transfer: there might be a problem with the dedupId when it is copied NOT from the pivot record to the current record, but from the current record to the pivot record (see DeduplicationService::compareSet).

Reason

In the FP and FN counts, the couple where id == dedupId is itself counted as FN or FP, but it should be skipped if the only other couple was an error.

E.g. for FN:

| id | dedupId | Counted as FN | Remark |
|---|---|---|---|
| 304 | 304 | Yes | WRONG: Should not be counted as FN |
| 31863 | 304 | Yes | GOOD: Should be counted as FN |

| id | dedupId | Counted as FN | Remark |
|---|---|---|---|
| 296 | 20257 | Yes | GOOD: Should be counted as FN |
| 20257 | 20257 | No | GOOD: Should not be counted as FN |
| 33023 | 20257 | No | GOOD: Should not be counted as FN |
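The counting rule above can be sketched on the five example rows. This is an illustration in Python under stated assumptions, not the project's implementation: the `Row` type, the `flagged_fn` field (standing for "currently counted as FN") and the function `count_fn` are hypothetical names, and "never count the row where id == dedupId" is one possible reading of the proposed fix.

```python
from typing import NamedTuple


class Row(NamedTuple):
    record_id: int
    dedup_id: int
    flagged_fn: bool  # ASSUMPTION: what the current counting reports


def count_fn(rows: list[Row]) -> int:
    """Count FNs, skipping the couple where id == dedupId."""
    return sum(
        1 for row in rows
        if row.flagged_fn and row.record_id != row.dedup_id
    )


# The example rows from the tables above.
rows = [
    Row(304, 304, True),       # currently counted, but should not be
    Row(31863, 304, True),     # correctly counted
    Row(296, 20257, True),     # correctly counted
    Row(20257, 20257, False),  # correctly not counted
    Row(33023, 20257, False),  # correctly not counted
]

current_count = sum(row.flagged_fn for row in rows)
corrected_count = count_fn(rows)

print(current_count, corrected_count)
```

On these rows the current counting gives 3 FNs, while skipping the id == dedupId couple gives 2, which matches the remarks in the tables. The same filter would apply to the FP count.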

Labels: bug (Something isn't working)
