Home
Bhaskar Gautam edited this page Mar 29, 2017


String Similarity Formula for two same type attributes (attrib1, attrib2)
$$\text{Similarity\_Score} = \frac{140 - \left[\operatorname{Length}(\mathit{attrib1}) - \operatorname{Length}(\mathit{attrib1} \cap \mathit{attrib2})\right]}{140}$$
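The following is a minimal Python sketch of this score. The formula does not say how the intersection of two strings is measured. The sketch therefore assumes a multiset intersection of characters, built with `collections.Counter`. The name `string_similarity` and the `max_len` parameter (defaulting to the 140 in the formula) are illustrative and do not come from the original page.

```python
from collections import Counter


def string_similarity(attrib1: str, attrib2: str, max_len: int = 140) -> float:
    """(max_len - [len(attrib1) - len(attrib1 INTERSECTION attrib2)]) / max_len.

    Assumption: the intersection is the multiset intersection of characters,
    so each shared character counts at most min(count in attrib1, count in attrib2) times.
    """
    # Number of characters of attrib1 that have a counterpart in attrib2.
    overlap = sum((Counter(attrib1) & Counter(attrib2)).values())
    # Characters of attrib1 left unmatched.
    unmatched = len(attrib1) - overlap
    return (max_len - unmatched) / max_len


if __name__ == "__main__":
    print(string_similarity("abc", "abc"))  # 1.0
    print(string_similarity("abc", "xyz"))  # (140 - 3) / 140
```

Under this reading, each unmatched character of attrib1 lowers the score by 1/140. Characters of attrib2 that do not appear in attrib1 have no effect. If attrib1 has more than 140 unmatched characters, the score falls below 0.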
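Levenshtein Distance Formula for two same type attributes (attrib1, attrib2)
$$\text{Similarity\_Score} = \frac{\operatorname{Lev}(\mathit{attrib1}, \mathit{attrib2})}{\operatorname{Length}(\mathit{attrib1}) + \operatorname{Length}(\mathit{attrib2})}$$
where Lev(attrib1, attrib2) is the Levenshtein distance between the two values. Because the numerator is a distance, this ratio is 0 for identical strings and grows as the strings diverge.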
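The sketch below computes the Levenshtein distance with the standard two-row dynamic program and divides it by the combined length, as the formula states. The function names are illustrative. The formula leaves two empty strings undefined because the denominator is 0; returning 0.0 for that case is an assumption.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    # prev[j] = distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def levenshtein_score(attrib1: str, attrib2: str) -> float:
    """Levenshtein distance divided by the combined length of both strings."""
    total = len(attrib1) + len(attrib2)
    if total == 0:
        return 0.0  # assumption: two empty strings count as identical
    return levenshtein_distance(attrib1, attrib2) / total


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))  # 3
    print(levenshtein_score("kitten", "sitting"))     # 3 / 13
```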
Longest Common Substring Formula for two same type attributes (attrib1, attrib2)
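$$\text{Similarity\_Score} = \frac{\operatorname{Length}(\operatorname{LCS}(\mathit{attrib1}, \mathit{attrib2}))}{\operatorname{Length}(\mathit{attrib1}) + \operatorname{Length}(\mathit{attrib2})}$$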
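The following is a minimal Python sketch. It follows the heading and treats LCS as the longest common contiguous *substring*, not the longest common subsequence that the abbreviation often denotes elsewhere. The function names and the empty-string guard are illustrative assumptions.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    # prev[j] = length of the longest common suffix of the previous prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix that ends at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def lcs_score(attrib1: str, attrib2: str) -> float:
    """LCS length divided by the combined length of both strings."""
    total = len(attrib1) + len(attrib2)
    if total == 0:
        return 0.0  # assumption: the formula is undefined for two empty strings
    return longest_common_substring_length(attrib1, attrib2) / total


if __name__ == "__main__":
    print(longest_common_substring_length("abcdef", "zbcdf"))  # 3 ("bcd")
    print(lcs_score("abc", "abc"))                             # 0.5
```

The denominator counts both strings, and the shared substring can be no longer than the shorter string. The score therefore lies between 0 and 0.5, and identical strings score 0.5.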
Shannon Divergence Formula for two same type attributes (attrib1, attrib2)
Given a vector of frequencies (counts) $f_1, \dots, f_k$, the Shannon diversity index is computed as
$$H = \frac{n \log n - \sum_{i=1}^{k} f_i \log f_i}{n}$$
where k is the number of groups and $n = \sum_{i=1}^{k} f_i$ is the total count. If $f_i = 0$, the $f_i \log f_i$ term is taken to be 0.
The maximum value of the index is log(k), which occurs when every group has the same frequency (i.e., maximum evenness).
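Below is a direct Python transcription of the formula. It uses the natural logarithm. That base is an assumption, since the page does not fix one. The maximum log(k) holds in any base, as long as the same base is used throughout. The function name is illustrative. Algebraically, H equals the familiar entropy form $-\sum_i p_i \log p_i$ with $p_i = f_i / n$.

```python
import math
from typing import Sequence


def shannon_diversity(freqs: Sequence[float]) -> float:
    """H = (n*log(n) - sum(f_i*log(f_i))) / n, with 0*log(0) taken as 0 (natural log)."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total count n must be positive")
    # Zero counts are skipped because their f_i*log(f_i) term is defined as 0.
    weighted = sum(f * math.log(f) for f in freqs if f > 0)
    return (n * math.log(n) - weighted) / n


if __name__ == "__main__":
    print(shannon_diversity([5, 5, 5, 5]))  # log(4): all groups equal, maximum evenness
    print(shannon_diversity([10, 0, 0]))    # 0.0: all counts fall in one group
```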