Recall score for the kept tokens in SARI #6

Thank you for making this code available! I was trying to understand how the different components of the SARI score are computed, and I wonder if I've misunderstood something or if there's an inconsistency between the code and the paper. Consider the following example.

Input: "a b"
Output: "b"
Ref-1: "a b"
Ref-2: "a"

Now if I manually compute the recall of kept tokens using Eq. 5 from the paper, I get
```
    r_{keep}(1) = [min(0, 1) + min(1, 1/2)] / [1 + 1/2] = 1/3,
```
where the first terms of the numerator and denominator correspond to "a" and the second terms to "b". However, the GitHub implementation gives me
```
    r_{keep}(1) = 1/2.
```
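To double-check the hand computation, here is a minimal Python sketch of Eq. 5 for a single n-gram order. It assumes, as in the calculation above, that #g(R') is the fraction of references containing g and that #g(A ∩ B) = min(#g(A), #g(B)). The function name `keep_recall_eq5` is hypothetical and is not part of the repository. Tokens are passed as lists, so higher-order n-grams can be passed as tuples.

```python
from collections import Counter
from fractions import Fraction


def keep_recall_eq5(source, output, references):
    """Recall of kept n-grams as a ratio of sums over n-grams (Eq. 5).

    Assumptions (mirroring the hand computation above): #g(R') is the
    fraction of references that contain g, and #g(A ∩ B) = min(#g(A), #g(B)).
    """
    src = Counter(source)
    out = Counter(output)

    # Fractional reference counts: each reference containing g adds 1/numref.
    ref_frac = Counter()
    for ref in references:
        for g in set(ref):
            ref_frac[g] += Fraction(1, len(references))

    numerator = Fraction(0)
    denominator = Fraction(0)
    for g in src:
        kept = min(src[g], out[g])              # #g(I ∩ O)
        should_keep = min(src[g], ref_frac[g])  # #g(I ∩ R')
        numerator += min(kept, should_keep)
        denominator += should_keep
    return numerator / denominator if denominator else Fraction(0)


print(keep_recall_eq5("a b".split(), ["b"], ["a b".split(), ["a"]]))  # 1/3
```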

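The reason is that on line 58 the code divides each numerator term by its own denominator term, and line 65 then divides the sum of these per-n-gram ratios by the number of distinct n-grams, instead of dividing the sum of the numerator terms by the sum of the denominator terms as Eq. 5 in the paper does. In the example, "b" contributes a ratio of 1 and "a" contributes nothing, so averaging over the two distinct unigrams gives 1/2.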
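The difference can be reproduced with a small sketch that rebuilds the counters the way I understand the implementation does (source and output counts multiplied by the number of references, then intersected with the pooled reference counts) and computes both the averaged per-n-gram ratio and the ratio of sums. The names `replicate` and `keep_recall_variants` are hypothetical, and skipping n-grams absent from the references is my own guard against 0/0, not something taken from the code.

```python
from collections import Counter
from fractions import Fraction


def replicate(counter, k):
    """Multiply every count by k (hypothetical helper for this sketch)."""
    return Counter({g: c * k for g, c in counter.items()})


def keep_recall_variants(source, output, references):
    """Return (averaged per-n-gram ratios, ratio of sums) for kept n-grams.

    Assumption: source and output counts are multiplied by the number of
    references so they can be intersected with pooled reference counts.
    """
    numref = len(references)
    src_rep = replicate(Counter(source), numref)
    out_rep = replicate(Counter(output), numref)
    ref_all = Counter(g for ref in references for g in ref)

    kept = src_rep & out_rep        # n-grams the system kept
    kept_good = kept & ref_all      # ...that the references also kept
    kept_all = src_rep & ref_all    # n-grams the references kept

    if not kept_all:
        return Fraction(0), Fraction(0)

    # Per-n-gram ratios, averaged over distinct n-grams (gives 1/2 here).
    # Skipping n-grams absent from the references is this sketch's 0/0 guard.
    per_gram = sum(
        (Fraction(kept_good[g], kept_all[g]) for g in kept if kept_all[g]),
        Fraction(0),
    )
    averaged = per_gram / len(kept_all)

    # Eq. 5: sum numerators and denominators first (gives 1/3 here).
    ratio_of_sums = Fraction(sum(kept_good.values()), sum(kept_all.values()))
    return averaged, ratio_of_sums


averaged, ratio_of_sums = keep_recall_variants("a b".split(), ["b"], ["a b".split(), ["a"]])
print(averaged, ratio_of_sums)  # 1/2 1/3
```

The two values only coincide when every distinct n-gram has the same denominator weight, which is why the discrepancy shows up once references disagree about which tokens to keep.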

Replacing line 58 with:
```
    keeptmpscore2 += keepgramcountergood_rep[keepgram]
```
and line 65 with:
```
    keepscore_recall = keeptmpscore2 / sum(keepgramcounterall_rep.values())
```
seems to fix this, yielding r_{keep}(1) = 1/3 as expected.

Have I missed something? Thanks in advance!
