[Query] Can we get score in a case insensitive manner? #36

@sourabhXIII

First of all, thanks for the awesome library! 💯
I have a couple of questions. It would be a great help if you could answer them.

  1. How is the tokenization done? As far as I can tell from browsing the code, it splits on whitespace. Is there a way to direct the scorer to split camel-case tokens? For example, the string MyDocuments would be tokenized to ["My", "Documents"].

  2. I do not see any parameter to direct the scorer to score in a case-insensitive manner. Is this not possible, or am I missing something?
    Below are the scores for two pairs of strings, (mysmilarstring, MyawfullySimilarStirng) and (mysmilarstring, myawfullysimilarstirng). The scores differ between the two pairs even though the pairs differ only in letter casing.

    -------------------------------FuzzySharp-------------------------------------------------
    mysmilarstring ||MyawfullySimilarStirng || Ratio = 56
    mysmilarstring ||MyawfullySimilarStirng || PartialRatio = 71
    mysmilarstring ||MyawfullySimilarStirng || TokenSortRatio = 56
    mysmilarstring ||MyawfullySimilarStirng || PartialTokenSortRatio = 71
    mysmilarstring ||MyawfullySimilarStirng || TokenSetRatio = 56
    mysmilarstring ||MyawfullySimilarStirng || PartialTokenSetRatio = 71
    mysmilarstring ||MyawfullySimilarStirng || TokenInitialismRatio = 0
    mysmilarstring ||MyawfullySimilarStirng || PartialTokenInitialismRatio = 0
    mysmilarstring ||MyawfullySimilarStirng || WeightedRatio = 64

    -------------------------------FuzzySharp-------------------------------------------------
    mysmilarstring ||myawfullysimilarstirng || Ratio = 72
    mysmilarstring ||myawfullysimilarstirng || PartialRatio = 86
    mysmilarstring ||myawfullysimilarstirng || TokenSortRatio = 72
    mysmilarstring ||myawfullysimilarstirng || PartialTokenSortRatio = 86
    mysmilarstring ||myawfullysimilarstirng || TokenSetRatio = 72
    mysmilarstring ||myawfullysimilarstirng || PartialTokenSetRatio = 86
    mysmilarstring ||myawfullysimilarstirng || TokenInitialismRatio = 0
    mysmilarstring ||myawfullysimilarstirng || PartialTokenInitialismRatio = 0
    mysmilarstring ||myawfullysimilarstirng || WeightedRatio = 77
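
To make both questions concrete, here is a rough sketch in Python of the behaviour I have in mind. It does not use FuzzySharp at all: `split_camel_case` and `case_insensitive_ratio` are hypothetical helper names, and the standard-library `difflib.SequenceMatcher` only stands in for a scorer, so its numbers will not match the FuzzySharp scores listed above.

```python
import re
from difflib import SequenceMatcher
from typing import List


def split_camel_case(text: str) -> List[str]:
    """Split on whitespace, then break each word at lower-to-upper boundaries."""
    tokens: List[str] = []
    for word in text.split():
        # Zero-width split point: a lowercase letter or digit followed by an uppercase letter.
        tokens.extend(re.split(r"(?<=[a-z0-9])(?=[A-Z])", word))
    return tokens


def case_insensitive_ratio(a: str, b: str) -> int:
    """Score two strings on a 0-100 scale after folding case on both sides."""
    # Stand-in scorer (assumption): difflib, not FuzzySharp's algorithm.
    matcher = SequenceMatcher(None, a.casefold(), b.casefold())
    return round(100 * matcher.ratio())


if __name__ == "__main__":
    print(split_camel_case("MyDocuments"))  # ['My', 'Documents']
    # With case folded first, both pairs from the listing get the same score.
    print(case_insensitive_ratio("mysmilarstring", "MyawfullySimilarStirng"))
    print(case_insensitive_ratio("mysmilarstring", "myawfullysimilarstirng"))
```

Folding case on both inputs before scoring is the workaround I can do on my side today. The question is whether the library can do this, and the camel-case split, through a parameter.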
