-
Notifications
You must be signed in to change notification settings - Fork 16
Description
I think in the function hll::HyperLogLog::merge it should not be the bitwise or between the two registers:
if (M_[r] < other.M_[r]) {
M_[r] |= other.M_[r];
}Instead it should be a simple assignment:
if (M_[r] < other.M_[r]) {
M_[r] = other.M_[r];
}In my application I use the HyperLogLog sketches mainly to estimate merged unions. I noticed that sometimes the estimate is too high and the relative error skyrockets. I found that when I exclusively use hll::HyperLogLog::add the estimate is good.
When I changed the above snippet, the value from the merging routine was exactly the same as the one from just adding. With my understanding of the HyperLogLog algorithm I also think this is theoretically the correct thing to do.
I don't know about the HyperLogLogHIP, because I have not yet read the article about that estimator. But it also doesn't give good estimates when I do a lot of merging.