This project focuses on automating the categorization of resumes using Natural Language Processing (NLP) and machine learning techniques. The goal is to analyze resume data and classify resumes into their respective job categories efficiently.
- `resume_categorization_notebook.ipynb`: A Jupyter Notebook that contains the implementation for data preprocessing, feature extraction, model training, and evaluation.
- `script.py`: A Python script to automate the process of categorizing resumes based on the trained model and vectorizer.
- `test_data/`: Directory containing the test resumes in PDF format.
- `requirements.txt`: List of all Python packages required for this project.
This project employs various machine learning models and a deep learning approach to identify job categories from resumes. Resumes from the dataset are preprocessed and then classified using models such as:
- Random Forest Classifier
- Logistic Regression
- K-Nearest Neighbor
- Support Vector Machine (SVM)
- Deep Learning: Artificial Neural Networks (ANN) and Long Short-Term Memory (LSTM)
The model training is performed on preprocessed data, and the results are used to create a functional script that categorizes resumes automatically.
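For context, here is a minimal sketch of how such a model comparison might look with scikit-learn. The file name `resume_data.csv` and the column names `cleaned_text` and `Category` are assumptions for illustration, not the project's actual schema:

```python
# Hedged sketch: train and compare the candidate classifiers on TF-IDF features.
# "resume_data.csv", "cleaned_text", and "Category" are illustrative names only.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

df = pd.read_csv("resume_data.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["cleaned_text"], df["Category"], test_size=0.2, random_state=42
)

# Fit the vectorizer on the training split only, then reuse it for the test split.
tfidf = TfidfVectorizer(stop_words="english", max_features=5000)
X_train_vec = tfidf.fit_transform(X_train)
X_test_vec = tfidf.transform(X_test)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbor": KNeighborsClassifier(),
    "SVM": SVC(),
}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    acc = accuracy_score(y_test, model.predict(X_test_vec))
    print(f"{name}: {acc:.3f}")
```

The ANN and LSTM models listed above would follow the same data split but require a deep learning framework, so they are omitted from this sketch.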
Ensure the following are installed before proceeding:
- Python 3.10.12 or higher
- Jupyter Notebook
- Necessary Python libraries (specified in `requirements.txt`)
- Clone the repository to your local machine:
  `git clone https://github.com/raselmeya94/Resume_Categorization.git`
- Navigate to the project directory:
  `cd Resume_Categorization`
- Install the required dependencies:
  `pip install -r requirements.txt`
- Data Preprocessing:
  - Split the dataset into `resume_data` (training set) and `resume_test_data` (testing set).
  - Clean the text data by removing unnecessary symbols, spaces, and irrelevant content.
- Feature Extraction:
  - Use NLP techniques such as TF-IDF vectorization to extract meaningful features from the resumes.
- Model Training:
  - Train multiple models to identify the best-performing one for classifying resumes into categories.
  - Save the trained classifier (`best_clf.pkl`) and the vectorizer (`tfidf.pkl`) as pickle files (see the sketch after this list).
- Automated Categorization:
  - Use `script.py` to load the test data (PDF resumes in `test_data/`), vectorize the content, and predict categories.
  - Organize resumes into corresponding folders based on their predicted category.
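As referenced in the Model Training step, here is a minimal sketch of the cleaning and persistence stages. The regular expressions, the toy corpus, and the choice of `SVC` as the stand-in "best" model are assumptions, not the notebook's exact implementation:

```python
# Hedged sketch: clean raw resume text, fit a TF-IDF vectorizer and a classifier,
# and persist both as the pickle files script.py expects. The toy corpus and the
# SVC stand-in are illustrative only.
import pickle
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

def clean_resume(text: str) -> str:
    """Remove URLs, non-letter symbols, and extra whitespace, then lowercase."""
    text = re.sub(r"http\S+", " ", text)
    text = re.sub(r"[^a-zA-Z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()

# Toy corpus so the snippet runs end to end; the notebook uses the real dataset.
texts = [
    "Built REST APIs in Python and deployed ML models to production.",
    "Prepared quarterly financial statements and managed audit schedules.",
]
labels = ["ENGINEERING", "FINANCE"]

cleaned = [clean_resume(t) for t in texts]
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(cleaned)

best_clf = SVC()  # stand-in for whichever model performed best in the comparison
best_clf.fit(X, labels)

with open("best_clf.pkl", "wb") as f:
    pickle.dump(best_clf, f)
with open("tfidf.pkl", "wb") as f:
    pickle.dump(tfidf, f)
```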
- Train the models and generate pickle files:
  Open the Jupyter Notebook:
  `jupyter notebook resume_categorization_notebook.ipynb`
  Follow the instructions in the notebook to train the models and generate the necessary `best_clf.pkl` and `tfidf.pkl` files.
- Place the test resumes in the `test_data/` folder.
- Run the categorization script (see the sketch after this list):
  `python script.py`
  Provide the path to the `test_data/` folder when prompted.
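For orientation, here is a rough sketch of what the categorization step does. The use of `pypdf` for text extraction and the exact folder handling are assumptions; the real `script.py` may use a different PDF library and prompt flow:

```python
# Hedged sketch of the categorization step: extract text from each PDF in
# test_data/, vectorize it with the saved TF-IDF vectorizer, predict a category
# with the saved classifier, and copy the file into a matching folder.
# pypdf is an assumption; the real script.py may use a different PDF library.
import pickle
import shutil
from pathlib import Path

from pypdf import PdfReader

with open("best_clf.pkl", "rb") as f:
    clf = pickle.load(f)
with open("tfidf.pkl", "rb") as f:
    tfidf = pickle.load(f)

test_dir = Path("test_data")
out_dir = Path("categorized_resumes")

for pdf_path in test_dir.glob("*.pdf"):
    reader = PdfReader(pdf_path)
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    category = clf.predict(tfidf.transform([text]))[0]
    dest = out_dir / str(category)
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy(pdf_path, dest / pdf_path.name)
    print(f"{pdf_path.name} -> {category}")
```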
- Trained Model: `best_clf.pkl`
- Vectorizer: `tfidf.pkl`
- Categorized resumes in the following folder structure:
  categorized_resumes/
  ├── ENGINEERING
  ├── FINANCE
  ├── HEALTHCARE
  ├── TEACHER
  ├── ...
We welcome contributions to enhance the functionality of this project. To contribute:
- Fork the repository.
- Create a feature branch.
- Submit a pull request describing your changes.
This project is licensed under the MIT License. See the LICENSE file for more information.
For more details and hands-on usage, refer to the main notebook.