Open
Description
First of all, thanks for the awesome library! 💯
I have a couple of questions. It would be a great help if you could answer them.
- How is tokenization done? As far as I can tell from browsing the code, it is based on whitespace. Is there a way to direct the scorer to split camel-case tokens? For example, the string MyDocuments would be tokenized to ["My", "Documents"].
- I do not see any parameter to direct the scorer to score in a case-insensitive manner. Is this not possible, or am I missing something?
Below are the scores for a couple of pairs of strings, (mysmilarstring, MyawfullySimilarStirng) and (mysmilarstring, myawfullysimilarstirng). The scores differ between the two pairs even though the pairs differ only in letter casing.
-------------------------------FuzzySharp-------------------------------------------------
mysmilarstring ||MyawfullySimilarStirng || Ratio = 56
mysmilarstring ||MyawfullySimilarStirng || PartialRatio = 71
mysmilarstring ||MyawfullySimilarStirng || TokenSortRatio = 56
mysmilarstring ||MyawfullySimilarStirng || PartialTokenSortRatio = 71
mysmilarstring ||MyawfullySimilarStirng || TokenSetRatio = 56
mysmilarstring ||MyawfullySimilarStirng || PartialTokenSetRatio = 71
mysmilarstring ||MyawfullySimilarStirng || TokenInitialismRatio = 0
mysmilarstring ||MyawfullySimilarStirng || PartialTokenInitialismRatio = 0
mysmilarstring ||MyawfullySimilarStirng || WeightedRatio = 64
-------------------------------FuzzySharp-------------------------------------------------
mysmilarstring ||myawfullysimilarstirng || Ratio = 72
mysmilarstring ||myawfullysimilarstirng || PartialRatio = 86
mysmilarstring ||myawfullysimilarstirng || TokenSortRatio = 72
mysmilarstring ||myawfullysimilarstirng || PartialTokenSortRatio = 86
mysmilarstring ||myawfullysimilarstirng || TokenSetRatio = 72
mysmilarstring ||myawfullysimilarstirng || PartialTokenSetRatio = 86
mysmilarstring ||myawfullysimilarstirng || TokenInitialismRatio = 0
mysmilarstring ||myawfullysimilarstirng || PartialTokenInitialismRatio = 0
mysmilarstring ||myawfullysimilarstirng || WeightedRatio = 77
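To make both questions concrete, here is roughly the preprocessing I have in mind. It is only a sketch, written in Python for brevity rather than C#, and it uses nothing from FuzzySharp. The helper names `split_camel_case` and `fold_case` are hypothetical, and I am assuming the strings would be passed through something like this before being handed to a scorer.

```python
import re

# Hypothetical helpers for illustration only; none of this is FuzzySharp API.

# Zero-width boundaries inside a camel-case word:
#   lower/digit followed by upper        -> "My|Documents"
#   upper followed by upper + lower      -> "HTML|Parser"
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def split_camel_case(text):
    """Split on whitespace first, then on camel-case boundaries (Python 3.7+)."""
    tokens = []
    for word in text.split():
        tokens.extend(_CAMEL_BOUNDARY.split(word))
    return tokens


def fold_case(text):
    """Fold case so that two strings differing only by casing compare equal."""
    return text.casefold()


print(split_camel_case("MyDocuments"))
print(fold_case("MyawfullySimilarStirng") == fold_case("myawfullysimilarstirng"))
```

If both inputs were case-folded like this before scoring, I would expect the two pairs above to get identical scores, since after folding they are the same strings.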