Chinese entity HiExpan issues #7

Hi Jiaming,
Thanks for sharing the idea and the code. When I ran the code on a Chinese corpus, I ran into a few issues:

- First, dependency parsing and part-of-speech tagging seem to be unnecessary during corpus preprocessing.

- Second, the getCombinedWeightByFeatureMap function takes too long when featuresOfSeed is large (the number of skip-gram patterns reaches the hundreds of thousands). So for each entity I kept only the 600 highest-scoring features of standard length in "eidSkipgram2TFIDFStrength.txt". This cut the run time from 30 hours to 30 minutes, but it may reduce the accuracy of the combinedWeight score.

- Third, the type feature is not useful for Chinese, so I had to use scores from an LDA model instead. I am now evaluating how well this works.

- Finally, I couldn't find the code for the Taxonomy Global Optimization step. Where can I find it?
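For reference, the per-entity pruning described in the second point can be sketched in Python roughly as follows. This is a minimal sketch, not HiExpan code: it assumes each line of "eidSkipgram2TFIDFStrength.txt" has the form `eid<TAB>skipgram<TAB>strength`, and the function name `prune_features` and the optional `keep` filter (standing in for a "standard length" rule) are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def prune_features(
    lines: Iterable[str],
    k: int = 600,
    keep: Optional[Callable[[str], bool]] = None,
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep the k highest-strength skip-gram features for each entity.

    Assumption: each line is "eid<TAB>skipgram<TAB>strength".
    `keep` is a hypothetical filter, e.g. a skip-gram length rule.
    Returns {eid: [(skipgram, strength), ...]} sorted by strength, descending.
    """
    # One bounded min-heap per entity; heap[0] is the weakest kept feature.
    per_entity: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        eid, skipgram, strength = line.split("\t")
        if keep is not None and not keep(skipgram):
            continue
        heap = per_entity[eid]
        item = (float(strength), skipgram)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # Evict the weakest feature to make room for a stronger one.
            heapq.heapreplace(heap, item)
    return {
        eid: [(sg, s) for s, sg in sorted(heap, reverse=True)]
        for eid, heap in per_entity.items()
    }
```

A bounded min-heap keeps memory at O(k) per entity and avoids sorting every feature of an entity, which matters when there are hundreds of thousands of skip-grams. The file can be streamed directly, e.g. `prune_features(open(path, encoding="utf-8"), k=600, keep=lambda sg: len(sg.split()) <= 5)`, where the length limit is only an illustrative guess. Any seed-feature weighting would then run over the pruned sets, which is where the possible loss in combinedWeight accuracy comes from.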
