Thank you so much for posting such a detailed tutorial!
I am trying to use it to cluster news content.
I have 275,449 news items that I need to cluster. The structure of my data is very similar to yours: each record has a content ID and a description (I don't have the ranking concept that you have in your data).
I followed all the steps in your guide, but when I printed the top n words nearest to each cluster centroid, the output looked wrong. Every cluster showed the same combination of words, just in a different order, and each word was wrapped in extra characters (a b'...' prefix and quotes).
I also tried running it on a very small test dataset of just 10 records, and got the same kind of output:
Cluster 0 words: b'good', b'weather', b'game',
Cluster 0 ContentID: 1, 6,
Cluster 1 words: b'weather', b'good', b'game',
Cluster 1 ContentID: 3, 5, 8, 10,
Cluster 2 words: b'game', b'weather', b'good',
Cluster 2 ContentID: 2, 7,
Cluster 3 words: b'weather', b'good', b'game',
Cluster 3 ContentID: 4, 9,
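For what it's worth, the b'...' wrapping matches how Python 3 displays bytes objects. The snippet below is only a minimal sketch of that behavior, assuming the print step encodes each word to UTF-8 before printing (I have not confirmed that this is what happens in the tutorial code). The word list is a hypothetical stand-in, not my real data, and this does not explain why every cluster shows the same three words.

```python
# Hypothetical stand-in for the terms printed for one cluster (not real data).
words = ["good", "weather", "game"]

# Assumption: each word is encoded to UTF-8 before printing.
# In Python 3, str.encode() returns a bytes object, which displays with a b'' prefix.
encoded = [w.encode("utf-8", "ignore") for w in words]
as_printed = ", ".join(str(w) for w in encoded)
print("Cluster 0 words:", as_printed)

# Decoding back to str (or skipping the encode step) prints the plain words.
plain = ", ".join(w.decode("utf-8") for w in encoded)
print("Cluster 0 words:", plain)
```

If the encoding step is indeed the cause, dropping it or decoding before printing should remove the extra characters.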
Could you please help me fix this?
I appreciate your help!