Nepali_Extractive_Summarizer

This project is a Streamlit application that uses a pre-trained DistilBERT model and K-Means clustering to extract concise summaries from Nepali text supplied as PDF files, plain text, or images. It combines natural language processing (NLP) for summarization with optical character recognition (OCR) for reading text from images.

At the core of the application is a DistilBERT model pre-trained on a large corpus of Nepali text. The model generates contextualized word embeddings that capture the semantic and syntactic relationships within the input text. These embeddings are the basis for the summarization step.
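The text above mentions word-level embeddings, while the clustering step works on one vector per sentence. A common way to bridge the two is masked mean pooling over the token vectors; whether this application pools that way is an assumption, and `mean_pool` is a hypothetical helper written in plain Python for illustration. With the `transformers` library, the token vectors would typically come from the model's `last_hidden_state` and the mask from the tokenizer's `attention_mask`.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors of one sentence, skipping padding tokens.

    token_vectors: list of equal-length lists of floats, one per token.
    attention_mask: list of 0/1 flags, 1 for real tokens and 0 for padding.
    This is an illustrative sketch, not code taken from Summarizer_App.py.
    """
    # Keep only the vectors whose mask flag is set.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    # zip(*kept) iterates over dimensions, so each column is averaged in turn.
    return [sum(column) / len(kept) for column in zip(*kept)]


# Two real tokens and one padding token, in a toy 2-dimensional space.
sentence_vector = mean_pool(
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],
    [1, 1, 0],
)
```

Skipping masked positions matters when sentences are batched and padded to a common length, since padding vectors would otherwise skew the average.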

To extract the most salient and representative sentences from the input text, the application uses K-Means clustering, a widely used unsupervised machine learning algorithm. The sentence embeddings produced by the DistilBERT model are clustered with K-Means, and the application chooses the number of clusters automatically (K-Means itself takes the cluster count as an input). From each cluster, the sentence closest to the centroid is selected, and the selected sentences together form a concise summary of the original text.
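The selection step described above can be sketched without any dependencies. The following is a minimal illustration, assuming the sentence embeddings are already available as lists of floats and that the cluster count `k` is given. It uses a plain Lloyd's-algorithm K-Means with a simple deterministic initialisation, which may differ from whatever K-Means implementation and cluster-count heuristic the application uses. The names `kmeans` and `select_summary` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; returns (labels, centroids).

    Initial centroids are evenly spaced input points, which keeps the
    sketch deterministic. Library implementations usually use smarter
    initialisation such as k-means++.
    """
    step = len(points) / k
    centroids = [list(points[int(i * step)]) for i in range(k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_summary(sentences, embeddings, k):
    """Pick, per cluster, the sentence whose embedding is nearest the centroid."""
    labels, centroids = kmeans(embeddings, k)
    chosen = set()
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.add(
                min(members, key=lambda i: squared_distance(embeddings[i], centroids[c]))
            )
    # Emit the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


sentences = ["a", "b", "c", "d", "e", "f"]
embeddings = [[0, 0], [1, 0], [2, 0], [10, 10], [11, 10], [12, 10]]
summary = select_summary(sentences, embeddings, k=2)  # ["b", "e"]
```

Returning the sentences in document order rather than cluster order keeps the summary readable, since extractive summaries reuse the original sentences verbatim.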

The application accepts text from three kinds of input. Uploaded PDF files are processed with the PyMuPDF library to extract their text content. For image files, the application uses the EasyOCR library, an optical character recognition (OCR) engine, to extract the text they contain. Users can also enter plain text directly for summarization.
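The three input paths above could be routed through a single helper along the following lines. This is a sketch under stated assumptions, not the code in `Summarizer_App.py`: the function name `extract_text`, the dispatch on file extension, the list of image suffixes, and the EasyOCR language codes `["ne", "en"]` are all illustrative choices. The PyMuPDF and EasyOCR imports are deferred so the plain-text path works without either library installed.

```python
from pathlib import Path

# Assumed set of accepted image types; the real app may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def extract_text(filename: str, data: bytes) -> str:
    """Return the text content of an uploaded file, chosen by its extension."""
    suffix = Path(filename).suffix.lower()

    if suffix == ".pdf":
        import fitz  # PyMuPDF is imported under the name "fitz"

        # Open the PDF from memory and join the text of every page.
        with fitz.open(stream=data, filetype="pdf") as doc:
            return "\n".join(page.get_text() for page in doc)

    if suffix in IMAGE_SUFFIXES:
        import easyocr

        # Language codes are an assumption: Nepali plus English.
        reader = easyocr.Reader(["ne", "en"])
        # detail=0 makes readtext return only the recognised strings.
        return "\n".join(reader.readtext(data, detail=0))

    if suffix == ".txt":
        return data.decode("utf-8")

    raise ValueError(f"unsupported file type: {suffix or filename!r}")
```

In a Streamlit app, the `data` argument would typically be the bytes read from an uploaded file object. Constructing an `easyocr.Reader` loads model weights, so a real application would likely create it once and reuse it rather than rebuilding it per call as this sketch does.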

The Streamlit interface lets users submit their input and view the summarized text alongside the original. The project combines a pre-trained language model, unsupervised clustering, and a lightweight web framework to perform extractive summarization of Nepali text.

Run Streamlit app

To run the app, paste the following command into your terminal:

streamlit run .\Summarizer_App.py
