🔍 Insurance Fraud Detection

This project detects fraudulent insurance claims using machine learning. Trained on a dataset of real-world insurance claims, the model learns patterns and anomalies that help flag potentially fraudulent claims.

📌 Problem Statement

Insurance fraud leads to significant financial losses each year. Identifying fraud early can save costs and ensure better claim verification processes. This project builds a model to distinguish between genuine and fraudulent claims using classification algorithms.

📊 Dataset

The dataset used contains various features related to insurance claims including:

Demographics (e.g., Age, Gender)
Claim history (e.g., Number of past claims)
Policy details (e.g., Policy type, Deductible)
Incident details (e.g., Accident Area, Fault)
Target: FraudFound (1: Fraudulent, 0: Not Fraudulent)
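
Before any modeling, it helps to check how rare the fraudulent class is. The sketch below is a minimal illustration, not the project's own code: the `class_balance` helper and the five-row `sample` table are hypothetical stand-ins for `carclaims.csv`, and it assumes the target is already coded as 1/0 as described above.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str = "FraudFound") -> pd.Series:
    """Return the share of rows belonging to each class of the target column."""
    return df[target].value_counts(normalize=True).sort_index()


# Tiny hypothetical sample standing in for the real claims table
sample = pd.DataFrame({
    "Age": [34, 51, 27, 45, 38],
    "AccidentArea": ["Urban", "Rural", "Urban", "Urban", "Rural"],
    "FraudFound": [0, 0, 1, 0, 0],
})

balance = class_balance(sample)
print(balance)
```

On the real data the same call would be made on the DataFrame returned by `pd.read_csv("carclaims.csv")`.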

🛠️ Technologies Used

Python
Pandas, NumPy – Data manipulation
Seaborn, Matplotlib – Data visualization
LabelEncoder – Categorical encoding
XGBoost – Feature importance and modeling
SMOTE – Class imbalance handling
Scikit-learn – Model training and evaluation

🧠 Workflow

Data Cleaning: Removed irrelevant columns and handled missing values.
Exploratory Data Analysis (EDA): Visualized fraud trends across various categories.
Encoding: Converted categorical variables using Label Encoding.
Feature Selection: Used XGBoost to identify top features contributing to fraud detection.
SMOTE: Applied Synthetic Minority Oversampling Technique to balance classes.
Model Training: Trained an XGBoost Classifier on the balanced dataset.
Evaluation: Compared the class distribution and model accuracy before and after applying SMOTE.

📈 Results

Successfully balanced the dataset using SMOTE.
Top contributing features included Fault, PolicyType, AddressChange-Claim, and AccidentArea, among others.
The XGBoost model provided an interpretable ranking of feature importance.

✅ Conclusion

This project shows how machine learning can be leveraged to flag potentially fraudulent claims and assist insurance companies in early detection, reducing financial losses.

📂 Folder Structure

insurance-fraud-detection/
│
├── insurance_fraud_detection.ipynb   # Jupyter Notebook with full project
├── carclaims.csv                     # Dataset (if shared)
└── README.md                         # Project documentation

🙌 Acknowledgments

Dataset: Kaggle
Libraries: Scikit-learn, XGBoost, imbalanced-learn

