Predictive modeling system for identifying high-risk financial transactions using time-based validation.

Aks18had/Predictive-Modeling-High-Risk-Transactions

Predictive Modeling for High-Risk Financial Transactions

Overview

This project develops a predictive modeling system to identify high-risk financial transactions within a large-scale dataset containing over 6 million records.

The solution covers data preprocessing, feature engineering, exploratory analysis, model development, and evaluation with precision, recall, ROC AUC, and PR AUC.


Dataset

  • 6,362,620 transaction records
  • 10 primary variables
  • Highly imbalanced classification problem

Workflow

1. Data Preparation

  • Stratified sampling (5%) for efficient computation
  • Memory optimization
  • Missing value handling
  • Removal of ID-based leakage variables
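
The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the column names (`sender_id`, `amount`, `is_fraud`), the median fill policy, and the toy data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def prepare(df, label, id_cols, frac=0.05, seed=42):
    """Stratified sample, drop ID columns, fill gaps, and shrink numeric dtypes."""
    # Sample the same fraction from each class so the class ratio is preserved.
    out = df.groupby(label).sample(frac=frac, random_state=seed)
    # ID-like columns let a model memorise accounts, so drop them.
    out = out.drop(columns=[c for c in id_cols if c in out.columns])
    # Fill missing numeric values with the column median (one simple policy).
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # Ask pandas to downcast each numeric column where it can do so safely.
    for col in num_cols:
        kind = "integer" if pd.api.types.is_integer_dtype(out[col]) else "float"
        out[col] = pd.to_numeric(out[col], downcast=kind)
    return out


# Synthetic stand-in data (hypothetical columns): 1,000 rows, 2% positives.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "sender_id": [f"C{i}" for i in range(1000)],
    "amount": rng.exponential(1000.0, size=1000),
    "is_fraud": [1] * 20 + [0] * 980,
})
sample = prepare(toy, label="is_fraud", id_cols=["sender_id"])
print(len(sample), int(sample["is_fraud"].sum()))
```

Sampling within each class keeps the rare positive class at its original rate, which a plain random 5% draw does not guarantee on highly imbalanced data.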

2. Feature Engineering

  • Balance difference variables
  • Log transformation of transaction amount
  • Ratio-based behavioral indicators
  • Merchant & transaction type flags
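
As a rough sketch of the features listed above: only the name `orig_balance_diff` appears in this README, so its formula here, every other column name, and the rule that merchant IDs start with "M" are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def add_features(df):
    """Add balance-difference, log-amount, ratio, and flag features."""
    out = df.copy()
    # Balance differences: zero when the recorded balances match the amount moved.
    # The sign convention is an assumption, not taken from the repository.
    out["orig_balance_diff"] = out["old_balance_orig"] - out["amount"] - out["new_balance_orig"]
    out["dest_balance_diff"] = out["old_balance_dest"] + out["amount"] - out["new_balance_dest"]
    # log1p compresses the heavy right tail and is defined at amount == 0.
    out["log_amount"] = np.log1p(out["amount"])
    # Share of the sender's balance being moved (+1 avoids division by zero).
    out["amount_to_balance_ratio"] = out["amount"] / (out["old_balance_orig"] + 1.0)
    # Hypothetical convention: merchant account IDs start with "M".
    out["is_merchant_dest"] = out["receiver_id"].str.startswith("M").astype("int8")
    # One indicator column per transaction type.
    type_flags = pd.get_dummies(out["type"], prefix="type", dtype="int8")
    return pd.concat([out, type_flags], axis=1)


toy = pd.DataFrame({
    "type": ["TRANSFER", "PAYMENT"],
    "amount": [100.0, 200.0],
    "old_balance_orig": [100.0, 500.0],
    "new_balance_orig": [0.0, 500.0],
    "old_balance_dest": [0.0, 0.0],
    "new_balance_dest": [100.0, 0.0],
    "receiver_id": ["C1", "M2"],
})
feats = add_features(toy)
print(feats[["orig_balance_diff", "dest_balance_diff", "is_merchant_dest"]])
```

In the toy frame the first row is internally consistent (both differences are zero), while the second row's balances do not reflect the amount, which is the kind of mismatch the balance-difference features are meant to surface.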

3. Exploratory Data Analysis

  • Risk distribution visualization
  • Transaction type vulnerability analysis
  • Temporal transaction pattern analysis
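
The numbers behind such plots can be computed with a small grouping helper. The sketch below uses made-up data and assumed column names (`step`, `type`, `is_fraud`); treating `step` as an hour counter starting at 1 is also an assumption.

```python
import pandas as pd


def risk_by(df, key, label="is_fraud"):
    """Positive count, total volume, and positive rate for each value of `key`."""
    table = df.groupby(key)[label].agg(fraud="sum", total="count", rate="mean")
    return table.sort_values("rate", ascending=False)


# Synthetic rows for illustration only.
toy = pd.DataFrame({
    "step": [1, 2, 25, 26, 49, 50],
    "type": ["TRANSFER", "PAYMENT", "TRANSFER", "CASH_IN", "PAYMENT", "TRANSFER"],
    "is_fraud": [1, 0, 1, 0, 0, 0],
})
# Assumes `step` counts hours from 1, so this recovers the hour of day.
toy["hour_of_day"] = (toy["step"] - 1) % 24

by_type = risk_by(toy, "type")
by_hour = risk_by(toy, "hour_of_day")
print(by_type)
print(by_hour)
```

Either table can be passed to a bar chart, for example `by_type["rate"].plot.bar()` with Matplotlib installed. Keeping `total` next to `rate` helps avoid over-reading a high rate in a group with very few transactions.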

4. Modeling

  • Random Forest (150 estimators)
  • Balanced class weights for imbalanced data
  • Unified preprocessing + modeling pipeline
  • Time-based 80:20 split (train/validation)
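
One way these modeling choices fit together is sketched below. The 150 estimators, balanced class weights, and 80:20 time-based split come from the list above; the column names, the synthetic data, and its labelling rule are invented for the example and are not the project's.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def time_split(df, time_col, train_frac=0.8):
    """Train on the earliest rows, validate on the latest ones."""
    ordered = df.sort_values(time_col, kind="stable")
    cut = int(len(ordered) * train_frac)
    return ordered.iloc[:cut], ordered.iloc[cut:]


# Synthetic stand-in data; columns and the labelling rule are made up.
rng = np.random.default_rng(0)
n = 2000
toy = pd.DataFrame({
    "step": rng.integers(1, 101, size=n),
    "type": rng.choice(["TRANSFER", "PAYMENT", "CASH_IN"], size=n),
    "amount": rng.exponential(1000.0, size=n),
})
toy["is_fraud"] = ((toy["type"] == "TRANSFER") & (toy["amount"] > 2000)).astype(int)

train, valid = time_split(toy, "step")
features = ["type", "amount"]

# Preprocessing and model live in one pipeline, so the encoder is fit on
# training rows only and reused unchanged at validation time.
preprocess = ColumnTransformer(
    [("type", OneHotEncoder(handle_unknown="ignore"), ["type"])],
    remainder="passthrough",
)
model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(n_estimators=150, class_weight="balanced", random_state=42)),
])
model.fit(train[features], train["is_fraud"])
valid_scores = model.predict_proba(valid[features])[:, 1]
print(len(train), len(valid), valid_scores[:5])
```

Splitting by time rather than at random means the model is always validated on transactions later than anything it was trained on, which is closer to how it would be used. Note that this simple cut by row position can place rows from the same time step on both sides of the boundary.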

Model Performance

| Metric    | Score |
|-----------|-------|
| Precision | 1.00  |
| Recall    | 0.64  |
| ROC AUC   | 0.95  |
| PR AUC    | 0.85  |
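
For reference, the four metrics can be computed with scikit-learn as in the sketch below. The labels and scores are a hand-made toy example, not the project's validation data, and using `average_precision_score` as the PR AUC estimate is an assumption about how that figure was obtained.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Toy labels and model scores, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
scores = np.array([0.10, 0.20, 0.30, 0.35, 0.90, 0.80, 0.40, 0.05, 0.60, 0.45])

# Precision and recall depend on a decision threshold (0.5 here).
y_pred = (scores >= 0.5).astype(int)

metrics = {
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    # The two AUC metrics are threshold-free and use the raw scores.
    "roc_auc": roc_auc_score(y_true, scores),
    "pr_auc": average_precision_score(y_true, scores),
}
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this toy case every flagged transaction is a true positive, so precision is 1.00, but one positive scores below the threshold, so recall is 0.75. The same pattern of perfect precision with lower recall appears in the reported scores.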

Key Risk Indicators

  • Balance inconsistencies (orig_balance_diff)
  • Transaction amount
  • Certain transaction types (TRANSFER, PAYMENT)
  • Account balance shifts

Business Insights

  • Large balance mismatches combined with high transfer amounts significantly increase transaction risk.
  • Precision is prioritized to reduce unnecessary investigation costs.
  • Recall can be raised by lowering the decision threshold, at some cost to precision, depending on business risk tolerance.
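
The precision/recall trade-off can be explored by sweeping the decision threshold. The helper below is a hypothetical sketch (`pick_threshold` is not part of the repository) that returns the threshold with the highest recall among those meeting a minimum precision, shown on toy scores.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def pick_threshold(y_true, scores, min_precision):
    """Return (threshold, precision, recall) with the best recall at or above min_precision."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # The final precision/recall pair has no matching threshold, so drop it.
    precision, recall = precision[:-1], recall[:-1]
    ok = precision >= min_precision
    if not ok.any():
        return None
    # Mask out thresholds that miss the precision floor, then maximise recall.
    idx = int(np.argmax(np.where(ok, recall, -1.0)))
    return float(thresholds[idx]), float(precision[idx]), float(recall[idx])


# Toy labels and scores, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
scores = np.array([0.10, 0.20, 0.30, 0.35, 0.90, 0.80, 0.40, 0.05, 0.60, 0.45])

strict = pick_threshold(y_true, scores, min_precision=0.9)
relaxed = pick_threshold(y_true, scores, min_precision=0.7)
print("strict:", strict)
print("relaxed:", relaxed)
```

Relaxing the precision floor lets the helper choose a lower threshold that catches more positives. The floor itself is a business decision, set by weighing the cost of a missed high-risk transaction against the cost of an unnecessary investigation.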

Tech Stack

  • Python
  • Pandas, NumPy
  • Scikit-learn
  • Matplotlib, Seaborn
  • Joblib

Author

Akshad Goyanka
