Skip to content

Question about the implementation of VBase's iterative search in HNSW #22

@cyccbxhl

Description

@cyccbxhl

I'm studying the implementation of VBase on HNSW and I have some questions regarding the iterative search mechanism.
In the original searchBaseLayerST function, which is responsible for searching the top-K vectors from the L0 layer, a max-heap (top_candidates) is used to maintain the best candidates(its size is ef). This approach naturally results in the first vector dequeued from candidate_set not necessarily being part of the final top-K results.

Image

In the current implementation of searchBaseLayerSTIterative, it seems that top_candidates has been removed entirely. Instead, the first vector dequeued from candidate_set is passed upward directly, and appears to be handled in hnsw_gettuple. From what I can see (ignoring some magic numbers I'm not fully clear on), in the branch with ORDER BY, it seems like the very first vector dequeued from candidate_set in searchBaseLayerSTIterative is returned directly. Doesn't this risk returning an incorrect or suboptimal result?

Image Image

Also, according to the HNSW paper, there is a mechanism involving a queue of recently visited vectors, where the median distance is used as one of the termination conditions during traversal(so and filter condition). I couldn't find this logic implemented in the code.

Thanks in advance for your clarification!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions