Skip to content

ysimokat/Bank-Fraud-Detection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🛡️ Credit Card Fraud Detection System

Python License Framework

A comprehensive machine learning system for detecting credit card fraud using state-of-the-art techniques. Achieves 91%+ F1-score with real-time processing capabilities.

🌟 Key Features

  • 🧠 15+ ML/DL Models: From traditional ML to Graph Neural Networks
  • 📊 91%+ F1-Score: Industry-leading performance on imbalanced data
  • ⚡ <50ms Latency: Real-time fraud detection
  • 🔄 Active Learning: Continuous improvement with human feedback
  • 📈 Interactive Dashboard: Professional Streamlit interface
  • 🚀 Production Ready: REST API with monitoring and A/B testing
  • 🖥️ GPU Optimized: Automatic GPU detection (CUDA, ROCm, MPS)
  • 💡 Explainable AI: SHAP values and feature importance

📊 Performance Overview

Model Type F1-Score ROC-AUC Latency Training Time
Ensemble (All) 0.91 0.97 45ms 45-60 min
XGBoost 0.86 0.95 15ms 3-5 min
Deep Learning 0.87 0.96 30ms 10-15 min
Graph Neural Network 0.87 0.95 50ms 15-20 min
Random Forest 0.85 0.94 20ms 2-3 min

🏗️ System Architecture

graph TD
    A[Transaction Data] --> B[Feature Engineering]
    B --> C[Data Preprocessing]
    C --> D{Model Pipeline}
    
    D --> E[Traditional ML]
    D --> F[Deep Learning]
    D --> G[Graph Networks]
    D --> H[Anomaly Detection]
    
    E --> E1[Random Forest]
    E --> E2[XGBoost/LightGBM]
    E --> E3[Logistic Regression]
    
    F --> F1[Neural Networks]
    F --> F2[Autoencoders]
    F --> F3[Focal Loss Models]
    
    G --> G1[Graph Attention]
    G --> G2[Heterogeneous GNN]
    
    H --> H1[Isolation Forest]
    H --> H2[One-Class SVM]
    
    E1 --> I[Ensemble]
    E2 --> I
    F1 --> I
    G1 --> I
    H1 --> I
    
    I --> J[Model Calibration]
    J --> K[API Service]
    K --> L[Dashboard]
    K --> M[Real-time Scoring]
    
    N[Active Learning] --> D
    O[Human Feedback] --> N
Loading

🚀 Quick Start

Prerequisites

  1. Python 3.8+
  2. 8GB+ RAM (16GB recommended)
  3. Dataset: Download from Kaggle
    • File: creditcard.csv
    • Place in project root directory

Installation

# Clone repository
git clone https://github.com/ysimokat/Bank-Fraud-Detection.git
cd Bank-Fraud-Detection

# Create virtual environment
python -m venv venv

# Activate environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Running the System

Option 1: Interactive Menu (Recommended)

# Windows
run_windows.bat

# Linux/Mac
./RUN_QUICK_START.sh

Option 2: Command Line

Quick Test (10 minutes)

python integrated_fraud_pipeline_simple.py --quick

Full Pipeline (30-45 minutes)

python integrated_fraud_pipeline.py

Advanced Pipeline (60+ minutes)

python advanced_integrated_pipeline.py

View Dashboard

python professional_fraud_dashboard.py
# Open http://localhost:8501

📚 Project Structure

Bank_Fraud_Detection/
│
├── 🎯 Main Pipelines
│   ├── integrated_fraud_pipeline.py         # All basic + enhanced models
│   ├── integrated_fraud_pipeline_simple.py  # Simplified with error handling
│   ├── advanced_integrated_pipeline.py      # Includes streaming & active learning
│   └── professional_fraud_dashboard.py      # Interactive dashboard
│
├── 🧩 Model Components
│   ├── fraud_detection_models.py           # Basic ML models
│   ├── enhanced_fraud_models.py            # XGBoost, LightGBM, CatBoost
│   ├── enhanced_deep_learning.py           # Neural networks with Focal Loss
│   ├── graph_neural_network.py             # Graph neural networks
│   └── heterogeneous_gnn.py                # Advanced heterogeneous GNN
│
├── 🔧 Advanced Systems
│   ├── online_streaming_system.py          # Real-time processing
│   ├── hybrid_ensemble_system.py           # Meta-learning ensemble
│   ├── enhanced_active_learning.py         # Human-in-the-loop learning
│   └── advanced_model_calibration.py       # Probability calibration
│
├── 📱 Deployment & Utils
│   ├── enhanced_fraud_api.py               # FastAPI REST service
│   ├── gpu_config.py                       # GPU detection & optimization
│   ├── data_preprocessing.py               # Feature engineering
│   └── data_exploration.py                 # EDA utilities
│
├── 📓 Learning Resources
│   ├── tutorials/                          # 10 Jupyter notebooks
│   ├── HOW_TO_RUN.md                      # Detailed running guide
│   ├── STUDY_GUIDE.md                      # Learning curriculum
│   ├── LOCAL_LEARNING_GUIDE.md             # Local development guide
│   └── ADVANCED_SYSTEMS_GUIDE.md           # Advanced features guide
│
└── 📊 Outputs
    ├── fraud_models.joblib                 # Trained models
    ├── model_results.joblib                # Performance metrics
    └── model_comparison.png                # Visual comparisons

🎓 Learning Path

Week 1: Fundamentals

# 1. Explore data
cd tutorials && jupyter notebook
# Open data_exploration.ipynb

# 2. Run basic models
python integrated_fraud_pipeline_simple.py --quick

# 3. View results
python professional_fraud_dashboard.py

Week 2: Advanced Models

# 1. Deep learning models
python enhanced_deep_learning.py

# 2. Full pipeline
python integrated_fraud_pipeline.py

# 3. Study ensemble methods
# Open tutorials/hybrid_ensemble_system.ipynb

Week 3: Production Skills

# 1. API deployment
python enhanced_fraud_api.py

# 2. Real-time streaming
python online_streaming_system.py

# 3. Active learning
python enhanced_active_learning.py

💡 Key Features Explained

1. Multiple Model Types

Category Models Use Case
Traditional ML Random Forest, Logistic Regression, SVM Baseline, interpretable
Boosting XGBoost, LightGBM, CatBoost High performance
Deep Learning Neural Networks, Autoencoders Complex patterns
Graph Networks GAT, Heterogeneous GNN Relationship analysis
Anomaly Detection Isolation Forest, One-Class SVM Unsupervised fraud detection

2. Advanced Techniques

  • Imbalanced Learning: SMOTE, Focal Loss, Class weights
  • Ensemble Methods: Voting, Stacking, Meta-learning
  • Online Learning: Streaming updates, Drift detection
  • Active Learning: Uncertainty sampling, Query by committee

3. Production Features

# Real-time API
POST /api/v1/predict
{
    "features": [...],
    "amount": 123.45,
    "merchant_id": "M123"
}

# Batch processing
POST /api/v1/predict/batch

# Model monitoring
GET /api/v1/metrics

# A/B testing
GET /api/v1/models/compare

4. Business Impact Analysis

The dashboard includes:

  • ROI Calculator: Estimate fraud prevention savings
  • Cost-Benefit Analysis: FP vs FN trade-offs
  • Alert Prioritization: Risk-based scoring
  • Performance Monitoring: Real-time metrics

🖥️ GPU Support

The system automatically detects and optimizes for available GPUs:

# Test GPU configuration
python gpu_config.py

# Output example:
# ✅ CUDA GPU detected: NVIDIA GeForce RTX 3080
#    Number of GPUs: 1
#    Memory: {'GPU_0': {'total_gb': 10.0}}

Supported platforms:

  • NVIDIA GPUs: CUDA 11.0+
  • AMD GPUs: ROCm 4.0+
  • Apple Silicon: MPS (M1/M2)

📊 Model Interpretability

SHAP Analysis

# Feature importance visualization
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

Business Rules

  • Transaction velocity checks
  • Amount anomaly detection
  • Merchant risk scoring
  • Time-based patterns

🔄 Training Pipeline

graph LR
    A[Raw Data] --> B[Feature Engineering]
    B --> C[Train/Test Split]
    C --> D[Model Training]
    D --> E[Hyperparameter Tuning]
    E --> F[Cross Validation]
    F --> G[Model Selection]
    G --> H[Ensemble Creation]
    H --> I[Calibration]
    I --> J[Final Model]
    J --> K[Save & Deploy]
Loading

📈 Extending the System

Adding New Models

# 1. Create model class
class MyCustomModel:
    def fit(self, X, y):
        # Implementation
    
    def predict(self, X):
        # Implementation

# 2. Add to pipeline
pipeline.add_model('custom', MyCustomModel())

# 3. Train and evaluate
pipeline.train_all_models()

Custom Features

# Add in data_preprocessing.py
def create_custom_features(df):
    df['hour_sin'] = np.sin(2 * np.pi * df['Hour'] / 24)
    df['amount_log'] = np.log1p(df['Amount'])
    return df

🐛 Troubleshooting

Issue Solution
Out of Memory Use --quick mode or reduce batch size
Import Error Use integrated_fraud_pipeline_simple.py
GPU Not Detected Check CUDA/driver installation
Slow Training Enable GPU or use fewer models

📊 Dataset Information

Credit Card Fraud Detection Dataset

  • 284,807 transactions (2 days)
  • 492 frauds (0.172%)
  • 30 features (V1-V28 + Time + Amount)
  • Features V1-V28 are PCA transformed
  • No missing values

🏆 Competition Results

If participating in Kaggle competition:

  1. Use advanced_integrated_pipeline.py for best results
  2. Tune hyperparameters in enhanced_fraud_models.py
  3. Create custom features based on EDA
  4. Use ensemble of top 5 models

📝 Citation

@software{fraud_detection_system,
  title = {Credit Card Fraud Detection System},
  author = {Yanhong Simokat},
  year = {2024},
  url = {https://github.com/ysimokat/Bank-Fraud-Detection}
}

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push branch (git push origin feature/amazing)
  5. Open Pull Request

📄 License

This project is licensed under the MIT License - see LICENSE file.

📧 Contact

Yanhong Simokat

🙏 Acknowledgments

  • Dataset: Machine Learning Group - ULB
  • Inspired by recent advances in fraud detection research
  • Built with PyTorch, Scikit-learn, XGBoost, and Streamlit
  • Thanks to the open-source community

Made with ❤️ for the ML community | ⭐ Star this repo | 🍴 Fork

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published