(NMF) Calculating coherence in NMF generates different output each time the method is called #1

@Sannidhi-17

I am trying to calculate the coherence value for each topic, but every time I run my code it produces different values.

It would be a great help if anyone could answer this.

Thank you in advance.

from itertools import combinations

import gensim
import numpy as np

def build_w2c(self, raw_documents):
    docgen = TokenGenerator(raw_documents, self.stop_words)
    # Tokenise each document on single spaces
    new_list = [each.split(" ") for each in docgen.documents]
    # Build the word2vec model
    self.w2v_model = gensim.models.Word2Vec(size=500, min_count=0.0005, sg=1)
    self.w2v_model.build_vocab(sentences=new_list)
    return self.w2v_model

def get_descriptor(self, all_terms, H, topic_index, top):
    # reverse sort the values to sort the indices
    top_indices = np.argsort(H[topic_index, :])[::-1]
    # now get the terms corresponding to the top-ranked indices
    top_terms = []
    for term_index in top_indices[0:top]:
        top_terms.append(all_terms[term_index])
    return top_terms

def get_coherence(self, k, terms, H):
    k_values = []
    term_rankings = []
    coherences = []
    coherence_by_index = {}
    # Note: the range starts at 1, so topic 0 is never included
    for topic_index in range(1, k):
        print(topic_index)
        descriptor = self.get_descriptor(terms, H, topic_index, 10)
        term_rankings.append(descriptor)
        # Calculate the coherence of the topics collected so far with the Word2vec model
        coherences.append(self.calculate_coherence(term_rankings))
        print("K=%02d: Coherence=%.4f" % (topic_index, coherences[-1]))
        k_values.append(topic_index)
        coherence_by_index[topic_index] = coherences[-1]
    max_key = max(coherence_by_index, key=coherence_by_index.get)
    return k_values, coherences, max_key

def calculate_coherence(self, term_rankings):
    overall_coherence = 0.0
    for topic_index in range(len(term_rankings)):
        # check each pair of terms
        pair_scores = []
        for pair in combinations(term_rankings[topic_index], 2):
            pair_scores.append(self.w2v_model.wv.similarity(pair[0], pair[1]))
        # get the mean for all pairs in this topic
        topic_score = sum(pair_scores) / len(pair_scores)
        overall_coherence += topic_score
    # get the mean score across all topics
    return overall_coherence / len(term_rankings)

The code above is what I used in my project.

Required output:
The coherence should be the same every time I run my code.

If you can help me resolve this, it would be a great help.
Thank you so much.
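
A possible cause, offered as an assumption rather than a confirmed diagnosis: `build_w2c` calls `build_vocab` but never `train`, so the similarities would come from randomly initialised vectors. The sketch below does not use gensim. It uses a hypothetical `random_vectors` helper as a stand-in for random initialisation, plus a plain cosine similarity, to show that the mean pairwise coherence is only repeatable when the random seed is fixed.

```python
import math
import random
from itertools import combinations


def random_vectors(terms, dim, seed=None):
    """Hypothetical stand-in for randomly initialised word vectors."""
    rng = random.Random(seed)  # seed=None draws fresh entropy on every call
    return {t: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for t in terms}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def calculate_coherence(term_rankings, vectors):
    """Mean over topics of the mean pairwise similarity of each topic's terms."""
    topic_scores = []
    for ranking in term_rankings:
        pair_scores = [
            cosine(vectors[a], vectors[b]) for a, b in combinations(ranking, 2)
        ]
        topic_scores.append(sum(pair_scores) / len(pair_scores))
    return sum(topic_scores) / len(topic_scores)


# Made-up topic descriptors, for illustration only
term_rankings = [["topic", "model", "matrix"], ["word", "vector", "corpus"]]
terms = sorted({t for ranking in term_rankings for t in ranking})

seeded_a = calculate_coherence(term_rankings, random_vectors(terms, 50, seed=42))
seeded_b = calculate_coherence(term_rankings, random_vectors(terms, 50, seed=42))
unseeded_a = calculate_coherence(term_rankings, random_vectors(terms, 50))
unseeded_b = calculate_coherence(term_rankings, random_vectors(terms, 50))

print(seeded_a == seeded_b)      # same seed, same vectors, same coherence
print(unseeded_a == unseeded_b)  # unseeded runs are expected to differ
```

In gensim itself, the corresponding things to try would be passing a fixed `seed` and `workers=1` to `Word2Vec`, setting the `PYTHONHASHSEED` environment variable before starting Python, and calling `train` after `build_vocab` so that the vectors are learned rather than left at their initial values.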
