I tokenized the Chinese articles (with other NLP tools) and passed the tokens to the FingerPrint->hash function.
I got these two fingerprints:
0110010101101011111111100100110101011110000011101001000000000000
0111011101101011111100100100110111011110000001101011000000000000
index = 0.21626516682989
I don't understand why the index is so low when the two fingerprints are so similar. The two articles are nearly identical.
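One way to check how close the two fingerprints are is to count the bits that differ (the Hamming distance). The following is a minimal Python sketch. It assumes these are SimHash-style fingerprints, which are normally compared this way; `hamming_distance` is a hypothetical helper and not part of the FingerPrint library. How the library computes its index is not shown here.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count differing bits between two equal-length binary strings."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    # XOR the two values; every 1 bit in the result is a position that differs
    return bin(int(a, 2) ^ int(b, 2)).count("1")

# The two 64-bit fingerprints from the question
fp1 = "0110010101101011111111100100110101011110000011101001000000000000"
fp2 = "0111011101101011111100100100110111011110000001101011000000000000"

dist = hamming_distance(fp1, fp2)
# Fraction of bit positions that agree (1.0 means identical fingerprints)
similarity = 1 - dist / len(fp1)
print(dist, similarity)  # 7 0.890625
```

Only 7 of the 64 bits differ, so a plain bitwise similarity would be about 0.89. The reported 0.216 therefore presumably comes from a different formula.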