Unable to get the top n words nearest to the cluster centroid. #14

@MaheshwaranK

Thank you so much for posting such a detailed tutorial!

I am trying to use it to cluster news content.
I have 275449 news items that I need to cluster. My data is structured much like yours: each record has a news content ID and a description (I don't have the ranking concept that your data has).

I followed all the steps in your guide, but when I tried to print the top n words nearest to each cluster centroid, the output looked wrong. Every cluster printed the same combination of words, and each word was wrapped in a b'...' prefix and quotes.

In fact, I also tried this on a very small test dataset with just 10 records, but got the same result:

Cluster 0 words: b'good', b'weather', b'game',

Cluster 0 ContentID: 1, 6,

Cluster 1 words: b'weather', b'good', b'game',

Cluster 1 ContentID: 3, 5, 8, 10,

Cluster 2 words: b'game', b'weather', b'good',

Cluster 2 ContentID: 2, 7,

Cluster 3 words: b'weather', b'good', b'game',

Cluster 3 ContentID: 4, 9,

Could you please help me fix this?

I appreciate your help with this!
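
For reference, here is a minimal sketch of the step in question: picking the n highest-weighted terms for each cluster centroid and printing them as plain text. The `b'...'` prefix in the output above is how Python 3 displays a `bytes` object, which usually appears when a string has been passed through `.encode(...)` before printing. Whether that is the cause here is an assumption, since the code that produced the output is not shown. The helper names `to_text` and `top_terms_per_cluster` and the toy weights are hypothetical. With scikit-learn, the centroids would typically come from `KMeans.cluster_centers_` and the terms from `TfidfVectorizer.get_feature_names_out()`.

```python
def to_text(term):
    """Return a term as str, decoding it first if it is a bytes object."""
    if isinstance(term, bytes):
        return term.decode("utf-8", "ignore")
    return str(term)


def top_terms_per_cluster(centroids, terms, n=3):
    """For each centroid, return the n terms with the largest weights."""
    result = []
    for weights in centroids:
        # Indices of the centroid's dimensions, largest weight first.
        order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        result.append([to_text(terms[i]) for i in order[:n]])
    return result


# Hypothetical toy data: 3 vocabulary terms and 2 centroids with made-up weights.
terms = [b"good", b"weather", b"game"]
centroids = [
    [0.1, 0.7, 0.2],
    [0.6, 0.1, 0.3],
]

for cluster_id, words in enumerate(top_terms_per_cluster(centroids, terms, n=2)):
    print(f"Cluster {cluster_id} words: {', '.join(words)}")
```

If the vocabulary holds only three terms, as the toy data does, asking for the top three words lists the same three words for every cluster, only in a different order. That would match the pattern in the output above, so it may be worth checking how many terms survive vectorization on the small dataset.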
