Why should the number of elements per bucket be at least 100 to get relevant results? #15

@snie2012

A comment in the LSHMinHash example code says that 'to get relevant results, the number of elements per bucket should be at least 100'. Why is that?

I tried specifying a number of buckets such that the average number of elements per bucket was below 100, and many buckets turned out to be empty. Is this related to the hash function that computes the bucket for each band of the signature? Or is it because a large portion of the signatures are likely to be identical after banding, so they are hashed to the same buckets?
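
To make the two explanations concrete, here is a minimal toy sketch in Python. It does not use or reproduce the library's code: the `key % n_buckets` placement, the 64-bit random keys, and the pool of 40 repeated values are all assumptions made purely for illustration. It compares how many buckets stay empty when every band value is distinct with how many stay empty when most band values repeat.

```python
import random


def empty_fraction(keys, n_buckets):
    """Return the fraction of buckets that receive no key.

    Each key is placed in bucket `key % n_buckets` (a stand-in for
    whatever hash the real implementation applies to a band).
    """
    occupied = {key % n_buckets for key in keys}
    return 1.0 - len(occupied) / n_buckets


rng = random.Random(0)
n_items, n_buckets = 10_000, 200  # average of 50 items per bucket

# Case 1: every item has its own (random) band value.
distinct_keys = [rng.getrandbits(64) for _ in range(n_items)]

# Case 2: items share only 40 distinct band values (hypothetical number),
# mimicking many signatures that are identical after banding.
pool = [rng.getrandbits(64) for _ in range(40)]
repeated_keys = [rng.choice(pool) for _ in range(n_items)]

uniform_empty = empty_fraction(distinct_keys, n_buckets)
clustered_empty = empty_fraction(repeated_keys, n_buckets)

print(f"distinct band values: {uniform_empty:.0%} of buckets empty")
print(f"repeated band values: {clustered_empty:.0%} of buckets empty")
```

In this toy model, spreading distinct values evenly leaves essentially no bucket empty even at 50 items per bucket, while 40 repeated values can fill at most 40 of the 200 buckets. If the same holds for LSHMinHash, the empty buckets I see would point to repeated band values rather than to the bucket hash itself, but I have not verified that against the actual implementation.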

Thanks in advance!
