Bhaskar Gautam edited this page Mar 29, 2017 · 12 revisions

This is the wiki for the course project in CS702.

Evaluation Result


String Similarity Formula for two attributes of the same type (attrib1, attrib2)

            Similarity_Score = {140 - [Length(attrib1) - Length(attrib1 INTERSECTION attrib2)]} / 140
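
The string similarity formula above can be sketched in Python as follows. The page does not define INTERSECTION precisely, so this sketch assumes it means the multiset of characters shared by both attributes; the function names and the `MAX_LEN` constant are illustrative, not part of the project code.

```python
from collections import Counter

MAX_LEN = 140  # the constant that appears in the formula above


def intersection_length(attrib1: str, attrib2: str) -> int:
    """Count the characters shared by both strings.

    Assumption: INTERSECTION is a multiset intersection of characters,
    so a character repeated in both strings is counted as often as it
    appears in the string where it occurs least.
    """
    common = Counter(attrib1) & Counter(attrib2)
    return sum(common.values())


def string_similarity(attrib1: str, attrib2: str) -> float:
    """Similarity_Score = (140 - (len(attrib1) - |intersection|)) / 140."""
    unmatched = len(attrib1) - intersection_length(attrib1, attrib2)
    return (MAX_LEN - unmatched) / MAX_LEN


if __name__ == "__main__":
    print(string_similarity("abcd", "ab"))  # 2 unmatched characters -> 138/140
```

Under this reading the score is 1.0 whenever every character of attrib1 also occurs in attrib2, and it decreases by 1/140 for each unmatched character of attrib1.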

Levenshtein Distance Formula for two attributes of the same type (attrib1, attrib2)

            Similarity_Score = LevenshteinDistance(attrib1, attrib2) / Length(attrib1 + attrib2)
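
A minimal Python sketch of the Levenshtein-based score above, assuming that `Length(attrib1 + attrib2)` means the length of the concatenated strings and that all edits (insert, delete, substitute) cost 1. The function names are illustrative, and returning 0.0 for two empty strings is a choice made here to avoid dividing by zero.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def levenshtein_score(attrib1: str, attrib2: str) -> float:
    """Edit distance divided by the combined length of both strings."""
    total = len(attrib1) + len(attrib2)
    if total == 0:
        return 0.0  # assumption: two empty strings score 0
    return levenshtein_distance(attrib1, attrib2) / total


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))  # 3
    print(levenshtein_score("kitten", "sitting"))     # 3 / 13
```

Note that, as the formula is written, identical strings score 0 and the value grows as the strings differ more, so it behaves as a normalized distance.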

Longest Common Substring (LCS) Formula for two attributes of the same type (attrib1, attrib2)

            Similarity_Score = Length(LCS(attrib1, attrib2)) / Length(attrib1 + attrib2)
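
The longest common substring score can be sketched in Python as below. This assumes LCS means the longest contiguous run of characters common to both attributes (as the heading says), not the longest common subsequence, and that `Length(attrib1 + attrib2)` is the length of the concatenation. The function names are illustrative.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0] * (len(b) + 1)
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                # Extend the common run that ended at the previous characters.
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def lcs_score(attrib1: str, attrib2: str) -> float:
    """LCS length divided by the combined length of both strings."""
    total = len(attrib1) + len(attrib2)
    if total == 0:
        return 0.0  # assumption: two empty strings score 0
    return longest_common_substring_length(attrib1, attrib2) / total


if __name__ == "__main__":
    print(longest_common_substring_length("abcdef", "zcdem"))  # "cde" -> 3
    print(lcs_score("abcdef", "zcdem"))                         # 3 / 11
```

Because the denominator is the combined length, two identical non-empty strings score 0.5 under this formula, which is its maximum.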

Shannon Diversity Index Formula for two attributes of the same type (attrib1, attrib2)

Given a vector of frequencies (counts) fi, the Shannon diversity index is computed as

            H = { n * log(n) − ∑ fi * log(fi) } / n
where k is the number of groups, n is the total count (n = ∑ fi), and the sum runs over all k groups. If fi = 0, the fi * log(fi) term is set to 0.

The maximum value of the index is log(k). This value occurs when every group has the same frequency (i.e., maximum evenness).
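
A minimal Python sketch of the index as defined above, assuming natural logarithms and an input that is already a list of counts. The page does not say how the frequency vector is built from the two attributes, so that step is left out; the function name is illustrative.

```python
import math
from typing import Sequence


def shannon_diversity(frequencies: Sequence[int]) -> float:
    """H = (n * log(n) - sum(f_i * log(f_i))) / n, with 0 * log(0) taken as 0."""
    n = sum(frequencies)
    if n == 0:
        return 0.0  # assumption: an empty sample has zero diversity
    # Skip zero counts, which implements the "f_i log(f_i) = 0" convention.
    weighted = sum(f * math.log(f) for f in frequencies if f > 0)
    return (n * math.log(n) - weighted) / n


if __name__ == "__main__":
    even = [5, 5, 5, 5]
    print(shannon_diversity(even), math.log(len(even)))  # both equal log(4)
    print(shannon_diversity([10, 0, 0]))                 # a single group -> 0.0
```

The first call illustrates the maximum log(k) for k = 4 equally frequent groups; the second shows that a sample concentrated in one group has an index of 0.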
