This repository was archived by the owner on Mar 21, 2025. It is now read-only.

Need to exempt certain file types from being treated as binary due to trigram count #138

@robinp-tw

Description

Files whose trigram count exceeds roughly max_trigram_count (default 20k) are treated as binary and are therefore excluded from content-based indexing.

If you search for lang:binary, there's a chance you'll run into large source files or other content that you would still want indexed.

The logic lives in https://github.com/google/zoekt/blob/master/indexbuilder.go#L298. Maybe we could plumb through an option that exempts files matching certain patterns from the "too long, probably binary" treatment. Raising the trigram limit is not really an option: there is always one more file that falls just above any limit and would have been nice to index.
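
A minimal sketch of what such an exemption option could look like, in Go. Everything here is hypothetical and not zoekt's actual API: the `limits` struct, its `ExemptPatterns` field, and the `skipAsBinary` helper are made-up names, and the trigram counter is a simplified stand-in (distinct 3-byte windows) for whatever the real check does.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// limits is a hypothetical options struct, not part of zoekt.
type limits struct {
	MaxTrigramCount int      // files above this count are treated as "probably binary"
	ExemptPatterns  []string // filepath.Match patterns, matched against the base name
}

// exempt reports whether the file's base name matches any exempt pattern.
// Malformed patterns are ignored rather than treated as matches.
func (l limits) exempt(name string) bool {
	base := filepath.Base(name)
	for _, p := range l.ExemptPatterns {
		if ok, err := filepath.Match(p, base); err == nil && ok {
			return true
		}
	}
	return false
}

// tooManyTrigrams is a simplified stand-in for the real check: it counts
// distinct 3-byte windows and stops as soon as the limit is exceeded.
func tooManyTrigrams(content []byte, max int) bool {
	seen := make(map[[3]byte]struct{})
	for i := 0; i+3 <= len(content); i++ {
		seen[[3]byte{content[i], content[i+1], content[i+2]}] = struct{}{}
		if len(seen) > max {
			return true
		}
	}
	return false
}

// skipAsBinary applies the trigram limit only to files that are not exempt.
func (l limits) skipAsBinary(name string, content []byte) bool {
	if l.exempt(name) {
		return false
	}
	return tooManyTrigrams(content, l.MaxTrigramCount)
}

func main() {
	// A tiny limit so the example is easy to follow.
	l := limits{MaxTrigramCount: 2, ExemptPatterns: []string{"*.sql"}}
	content := []byte("abcdefgh")

	fmt.Println(l.skipAsBinary("dump/big.sql", content)) // exempt by pattern
	fmt.Println(l.skipAsBinary("dump/big.txt", content)) // subject to the limit
}
```

With a pattern list like this, the limit itself stays where it is and only the named file types bypass the "too long, probably binary" path, so other oversized files are still skipped.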
