🛡️ Credit Card Fraud Detection System

A machine learning system for detecting credit card fraud that combines traditional, deep learning, and graph-based models. The full ensemble reaches an F1-score of 0.91 at roughly 45ms per prediction.

🌟 Key Features

🧠 15+ ML/DL Models: From traditional ML to Graph Neural Networks
📊 0.91 F1-Score: Full-ensemble result on heavily imbalanced data (0.172% fraud)
⚡ ≤50ms Latency: Real-time fraud detection (15-50ms depending on the model)
🔄 Active Learning: Continuous improvement with human feedback
📈 Interactive Dashboard: Professional Streamlit interface
🚀 Production Ready: REST API with monitoring and A/B testing
🖥️ GPU Optimized: Automatic GPU detection (CUDA, ROCm, MPS)
💡 Explainable AI: SHAP values and feature importance

📊 Performance Overview

Model Type	F1-Score	ROC-AUC	Latency	Training Time
Ensemble (All)	0.91	0.97	45ms	45-60 min
XGBoost	0.86	0.95	15ms	3-5 min
Deep Learning	0.87	0.96	30ms	10-15 min
Graph Neural Network	0.87	0.95	50ms	15-20 min
Random Forest	0.85	0.94	20ms	2-3 min
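
F1 is the headline metric here because plain accuracy says little on heavily imbalanced data. As a minimal sketch of how an F1 value is defined, the helper below computes precision, recall, and F1 from confusion-matrix counts. The function name and the example counts are illustrative assumptions, not results from this project.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts for illustration only:
# 90 frauds caught, 10 false alarms, 8 frauds missed.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=8)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

True negatives never enter the formula, which is why F1 is not inflated by the very large number of legitimate transactions.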

🏗️ System Architecture

graph TD
    A[Transaction Data] --> B[Feature Engineering]
    B --> C[Data Preprocessing]
    C --> D{Model Pipeline}
    
    D --> E[Traditional ML]
    D --> F[Deep Learning]
    D --> G[Graph Networks]
    D --> H[Anomaly Detection]
    
    E --> E1[Random Forest]
    E --> E2[XGBoost/LightGBM]
    E --> E3[Logistic Regression]
    
    F --> F1[Neural Networks]
    F --> F2[Autoencoders]
    F --> F3[Focal Loss Models]
    
    G --> G1[Graph Attention]
    G --> G2[Heterogeneous GNN]
    
    H --> H1[Isolation Forest]
    H --> H2[One-Class SVM]
    
    E1 --> I[Ensemble]
    E2 --> I
    F1 --> I
    G1 --> I
    H1 --> I
    
    I --> J[Model Calibration]
    J --> K[API Service]
    K --> L[Dashboard]
    K --> M[Real-time Scoring]
    
    N[Active Learning] --> D
    O[Human Feedback] --> N
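
One common way to realize the Human Feedback → Active Learning loop in the diagram is uncertainty sampling: send analysts the transactions the model is least sure about. The sketch below is a generic illustration under that assumption; `select_for_review` and the example scores are hypothetical and are not taken from enhanced_active_learning.py.

```python
from typing import List, Sequence


def select_for_review(scores: Sequence[float], budget: int) -> List[int]:
    """Return indices of the `budget` scores closest to 0.5 (most uncertain first)."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return ranked[:budget]


# Hypothetical fraud probabilities for six unlabeled transactions.
scores = [0.02, 0.48, 0.91, 0.55, 0.30, 0.99]
queue = select_for_review(scores, budget=2)
print(queue)  # indices of the transactions to send to a human reviewer
```

The labels that reviewers provide for the selected transactions would then be added to the training set before the next retraining round.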

🚀 Quick Start

Prerequisites

Python 3.8+
8GB+ RAM (16GB recommended)
Dataset: Download from Kaggle
- File: creditcard.csv
- Place in project root directory

Installation

# Clone repository
git clone https://github.com/ysimokat/Bank-Fraud-Detection.git
cd Bank-Fraud-Detection

# Create virtual environment
python -m venv venv

# Activate environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Running the System

Option 1: Interactive Menu (Recommended)

# Windows
run_windows.bat

# Linux/Mac
./RUN_QUICK_START.sh

Option 2: Command Line

Quick Test (10 minutes)

python integrated_fraud_pipeline_simple.py --quick

Full Pipeline (30-45 minutes)

python integrated_fraud_pipeline.py

Advanced Pipeline (60+ minutes)

python advanced_integrated_pipeline.py

View Dashboard

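python professional_fraud_dashboard.py
# Open http://localhost:8501
# If no server starts, the Streamlit CLI form may be needed instead:
# streamlit run professional_fraud_dashboard.py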

📚 Project Structure

Bank-Fraud-Detection/
│
├── 🎯 Main Pipelines
│   ├── integrated_fraud_pipeline.py         # All basic + enhanced models
│   ├── integrated_fraud_pipeline_simple.py  # Simplified with error handling
│   ├── advanced_integrated_pipeline.py      # Includes streaming & active learning
│   └── professional_fraud_dashboard.py      # Interactive dashboard
│
├── 🧩 Model Components
│   ├── fraud_detection_models.py           # Basic ML models
│   ├── enhanced_fraud_models.py            # XGBoost, LightGBM, CatBoost
│   ├── enhanced_deep_learning.py           # Neural networks with Focal Loss
│   ├── graph_neural_network.py             # Graph neural networks
│   └── heterogeneous_gnn.py                # Advanced heterogeneous GNN
│
├── 🔧 Advanced Systems
│   ├── online_streaming_system.py          # Real-time processing
│   ├── hybrid_ensemble_system.py           # Meta-learning ensemble
│   ├── enhanced_active_learning.py         # Human-in-the-loop learning
│   └── advanced_model_calibration.py       # Probability calibration
│
├── 📱 Deployment & Utils
│   ├── enhanced_fraud_api.py               # FastAPI REST service
│   ├── gpu_config.py                       # GPU detection & optimization
│   ├── data_preprocessing.py               # Feature engineering
│   └── data_exploration.py                 # EDA utilities
│
├── 📓 Learning Resources
│   ├── tutorials/                          # 10 Jupyter notebooks
│   ├── HOW_TO_RUN.md                      # Detailed running guide
│   ├── STUDY_GUIDE.md                      # Learning curriculum
│   ├── LOCAL_LEARNING_GUIDE.md             # Local development guide
│   └── ADVANCED_SYSTEMS_GUIDE.md           # Advanced features guide
│
└── 📊 Outputs
    ├── fraud_models.joblib                 # Trained models
    ├── model_results.joblib                # Performance metrics
    └── model_comparison.png                # Visual comparisons

🎓 Learning Path

Week 1: Fundamentals

# 1. Explore data
cd tutorials && jupyter notebook
# Open data_exploration.ipynb

# 2. Run basic models
python integrated_fraud_pipeline_simple.py --quick

# 3. View results
python professional_fraud_dashboard.py

Week 2: Advanced Models

# 1. Deep learning models
python enhanced_deep_learning.py

# 2. Full pipeline
python integrated_fraud_pipeline.py

# 3. Study ensemble methods
# Open tutorials/hybrid_ensemble_system.ipynb

Week 3: Production Skills

# 1. API deployment
python enhanced_fraud_api.py

# 2. Real-time streaming
python online_streaming_system.py

# 3. Active learning
python enhanced_active_learning.py

💡 Key Features Explained

1. Multiple Model Types

Category	Models	Use Case
Traditional ML	Random Forest, Logistic Regression, SVM	Baseline, interpretable
Boosting	XGBoost, LightGBM, CatBoost	High performance
Deep Learning	Neural Networks, Autoencoders	Complex patterns
Graph Networks	GAT, Heterogeneous GNN	Relationship analysis
Anomaly Detection	Isolation Forest, One-Class SVM	Unsupervised fraud detection

2. Advanced Techniques

Imbalanced Learning: SMOTE, Focal Loss, Class weights
Ensemble Methods: Voting, Stacking, Meta-learning
Online Learning: Streaming updates, Drift detection
Active Learning: Uncertainty sampling, Query by committee
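
To make the imbalanced-learning item concrete, here is a minimal pure-Python sketch of binary focal loss, which scales cross-entropy by (1 - p_t)^gamma so that easy, well-classified examples contribute little. The defaults gamma=2.0 and alpha=0.25 are commonly used values and an assumption here; the settings in enhanced_deep_learning.py may differ.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one example; p is the predicted fraud probability, y is 0 or 1."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)             # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A fraud case the model already gets right vs. one it gets badly wrong.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.10, 1)
print(f"easy={easy:.6f} hard={hard:.6f}")
```

With gamma=0 and alpha=1 the expression reduces to ordinary cross-entropy for the positive class, which is a quick sanity check when tuning.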

3. Production Features

# Real-time API
POST /api/v1/predict
{
    "features": [...],
    "amount": 123.45,
    "merchant_id": "M123"
}

# Batch processing
POST /api/v1/predict/batch

# Model monitoring
GET /api/v1/metrics

# A/B testing
GET /api/v1/models/compare
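
A client call to the prediction endpoint might look like the sketch below, which uses only the standard library. The base URL, the port 8000, and the 30-element feature vector are assumptions for illustration; check enhanced_fraud_api.py for the actual host, port, and request schema. The response format is not documented here, so the sketch does not parse it.

```python
import json
from urllib import request


def build_predict_request(features, amount, merchant_id,
                          base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /api/v1/predict."""
    payload = {
        "features": list(features),
        "amount": amount,
        "merchant_id": merchant_id,
    }
    return request.Request(
        url=f"{base_url}/api/v1/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder feature vector; real values come from the preprocessing step.
req = build_predict_request([0.0] * 30, 123.45, "M123")
print(req.get_method(), req.full_url)

# To send it, the API service must be running:
# with request.urlopen(req, timeout=5) as resp:
#     print(json.load(resp))
```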

4. Business Impact Analysis

The dashboard includes:

ROI Calculator: Estimate fraud prevention savings
Cost-Benefit Analysis: FP vs FN trade-offs
Alert Prioritization: Risk-based scoring
Performance Monitoring: Real-time metrics
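
The FP vs FN trade-off can be illustrated by choosing the alert threshold that minimizes total cost on labeled data. This is a sketch under assumed costs (5 per false alarm, 100 per missed fraud) with made-up scores; the function names are hypothetical and the dashboard's own calculation may differ.

```python
def total_cost(scores, labels, threshold, cost_fp=5.0, cost_fn=100.0):
    """Sum the cost of false alarms and missed frauds at a given threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 0:
            cost += cost_fp      # legitimate transaction flagged
        elif not flagged and label == 1:
            cost += cost_fn      # fraud that slipped through
    return cost


def best_threshold(scores, labels, candidates, **costs):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(scores, labels, t, **costs))


# Made-up model scores and true labels (1 = fraud).
scores = [0.05, 0.20, 0.35, 0.60, 0.80, 0.95]
labels = [0, 0, 1, 0, 1, 1]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
chosen = best_threshold(scores, labels, candidates)
print(chosen, total_cost(scores, labels, chosen))
```

Because a missed fraud is assumed to cost far more than a false alarm, the cheapest threshold here sits well below 0.5.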

🖥️ GPU Support

The system automatically detects and optimizes for available GPUs:

# Test GPU configuration
python gpu_config.py

# Output example:
# ✅ CUDA GPU detected: NVIDIA GeForce RTX 3080
#    Number of GPUs: 1
#    Memory: {'GPU_0': {'total_gb': 10.0}}

Supported platforms:

NVIDIA GPUs: CUDA 11.0+
AMD GPUs: ROCm 4.0+
Apple Silicon: MPS (M1/M2)

📊 Model Interpretability

SHAP Analysis

# Feature importance visualization
# Assumes `model` is a fitted tree-based model (e.g. XGBoost, Random Forest)
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

Business Rules

Transaction velocity checks
Amount anomaly detection
Merchant risk scoring
Time-based patterns

🔄 Training Pipeline

graph LR
    A[Raw Data] --> B[Feature Engineering]
    B --> C[Train/Test Split]
    C --> D[Model Training]
    D --> E[Hyperparameter Tuning]
    E --> F[Cross Validation]
    F --> G[Model Selection]
    G --> H[Ensemble Creation]
    H --> I[Calibration]
    I --> J[Final Model]
    J --> K[Save & Deploy]

📈 Extending the System

Adding New Models

# 1. Create model class
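class MyCustomModel:
    def fit(self, X, y):
        # Train on features X and labels y, then return self
        raise NotImplementedError

    def predict(self, X):
        # Return one predicted label per row of X
        raise NotImplementedError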

# 2. Add to pipeline
pipeline.add_model('custom', MyCustomModel())

# 3. Train and evaluate
pipeline.train_all_models()

Custom Features

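# Add in data_preprocessing.py
import numpy as np

def create_custom_features(df):
    # Assumes an 'Hour' column (0-23) already exists; the raw dataset only has 'Time'
    df['hour_sin'] = np.sin(2 * np.pi * df['Hour'] / 24)
    df['amount_log'] = np.log1p(df['Amount'])
    return df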

🐛 Troubleshooting

Issue	Solution
Out of Memory	Use `--quick` mode or reduce batch size
Import Error	Use `integrated_fraud_pipeline_simple.py`
GPU Not Detected	Check CUDA/driver installation
Slow Training	Enable GPU or use fewer models

📊 Dataset Information

Credit Card Fraud Detection Dataset

284,807 transactions (2 days)
492 frauds (0.172%)
30 features (V1-V28 + Time + Amount)
Features V1-V28 are PCA transformed
No missing values
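
The numbers above imply an extreme class imbalance, which is worth working out once. The arithmetic below uses only the counts listed in this section; treating the negative-to-positive ratio as a class weight (for example as XGBoost's `scale_pos_weight`) is a common heuristic and an assumption here, not necessarily what this project's pipelines do.

```python
total = 284_807          # transactions in creditcard.csv
frauds = 492             # positive (fraud) class
legit = total - frauds   # negative (legitimate) class

fraud_rate = frauds / total
baseline_accuracy = legit / total   # accuracy of always predicting "legitimate"
pos_weight = legit / frauds         # negatives per positive

print(f"fraud rate:        {fraud_rate:.4%}")
print(f"baseline accuracy: {baseline_accuracy:.4%}")
print(f"neg/pos ratio:     {pos_weight:.1f}")
```

A model that flags nothing is already correct on more than 99.8% of transactions, so accuracy alone cannot distinguish a useful detector from a useless one.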

🏆 Competition Results

If you are participating in a Kaggle competition:

Use advanced_integrated_pipeline.py for best results
Tune hyperparameters in enhanced_fraud_models.py
Create custom features based on EDA
Use ensemble of top 5 models

📝 Citation

@software{fraud_detection_system,
  title = {Credit Card Fraud Detection System},
  author = {Yanhong Simokat},
  year = {2024},
  url = {https://github.com/ysimokat/Bank-Fraud-Detection}
}

🤝 Contributing

Contributions welcome! Please:

Fork the repository
Create feature branch (git checkout -b feature/amazing)
Commit changes (git commit -m 'Add amazing feature')
Push branch (git push origin feature/amazing)
Open Pull Request

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.

📧 Contact

Yanhong Simokat

Email: [email protected]
GitHub: @ysimokat
LinkedIn: Connect

🙏 Acknowledgments

Dataset: Machine Learning Group - ULB
Inspired by recent advances in fraud detection research
Built with PyTorch, Scikit-learn, XGBoost, and Streamlit
Thanks to the open-source community

Made with ❤️ for the ML community | ⭐ Star this repo | 🍴 Fork
