Skip to content

Bug in merge function #14

@feldroop

Description

@feldroop

I think in the function hll::HyperLogLog::merge it should not be the bitwise or between the two registers:

if (M_[r] < other.M_[r]) {
    M_[r] |= other.M_[r];
}

Instead it should be a simple assignment:

if (M_[r] < other.M_[r]) {
    M_[r] = other.M_[r];
}

In my application I use the HyperLogLog sketches mainly to estimate merged unions. I noticed that sometimes the estimate is too high and the relative error skyrockets. I found that when I exclusively use hll::HyperLogLog::add the estimate is good.

When I changed the above snippet, the value from the merging routine was exactly the same as the one from just adding. With my understanding of the HyperLogLog algorithm I also think this is theoretically the correct thing to do.

I don't know about the HyperLogLogHIP, because I have not yet read the article about that estimator. But it also doesn't give good estimates when I do a lot of merging.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions