Recall score for the kept tokens in SARI #6

@ekQ

Thank you for making this code available! I was trying to understand how the different components of the SARI score are computed, and I wonder if I've misunderstood something or if there's an inconsistency between the code and the paper. Consider the following example.

Input: "a b"
Output: "b"
Ref-1: "a b"
Ref-2: "a"

Now if I manually compute the recall of kept tokens using Eq. 5 from the paper, I get

    r_{keep}(1) = [min(0, 1) + min(1, 1/2)] / [1 + 1/2] = 1/3,

where the first terms of the numerator and denominator correspond to "a" and the second terms to "b". However, the GitHub implementation gives me

    r_{keep}(1) = 1/2.

The reason is that in the code the terms of the numerator are divided individually by the corresponding denominator terms on line 58, instead of dividing the sum of the numerator terms by the sum of the denominator terms as done in Eq. 5 in the paper.

Replacing line 58 by:

    keeptmpscore2 += keepgramcountergood_rep[keepgram]

and line 65 by:

    keepscore_recall = keeptmpscore2 / sum(keepgramcounterall_rep.values())

seems to fix this and yields r_{keep}(1) = 1/3, as I would expect.
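To make the difference easy to reproduce, here is a minimal standalone sketch of both aggregations for unigrams on the example above. It is not the repository's code: the function name `keep_recall_variants` is hypothetical, and it assumes that source and output counts are scaled by the number of references and that reference counts are summed across references, which is how I read the `*_rep` counters.

```python
from collections import Counter
from fractions import Fraction


def keep_recall_variants(source, output, refs):
    """Return (per_gram_average, ratio_of_sums) for unigram keep recall.

    Hypothetical helper for illustration only, not part of the repository.
    """
    numref = len(refs)
    # Assumption: source/output counts are multiplied by the number of
    # references so they are comparable with the summed reference counts.
    src = Counter({g: c * numref for g, c in Counter(source.split()).items()})
    out = Counter({g: c * numref for g, c in Counter(output.split()).items()})
    ref = Counter()
    for r in refs:
        ref.update(r.split())

    keep = src & out          # tokens kept by the output
    keep_good = keep & ref    # kept tokens that the references also keep
    keep_all = src & ref      # tokens the references keep

    if not keep_all:
        return Fraction(0), Fraction(0)

    # Variant 1: divide each term individually, then average over n-grams.
    per_gram = sum(
        (Fraction(keep_good[g], keep_all[g]) for g in keep_good), Fraction(0)
    )
    per_gram_average = per_gram / len(keep_all)

    # Variant 2: sum of numerator terms over sum of denominator terms (Eq. 5).
    ratio_of_sums = Fraction(sum(keep_good.values()), sum(keep_all.values()))
    return per_gram_average, ratio_of_sums


macro, micro = keep_recall_variants("a b", "b", ["a b", "a"])
print(macro, micro)  # 1/2 1/3
```

The first value matches what the current implementation gives me, and the second matches my manual computation from Eq. 5.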

Have I missed something? Thanks in advance!
