
Capstone Project: Phishing URL Analysis

This Jupyter Notebook is my capstone project for the Codecademy course Visualize Data with Python. Being passionate about cybersecurity, I chose a phishing URL dataset and explored the patterns that separate phishing URLs from legitimate ones.


Dataset Source

This project uses the PhiUSIIL Phishing URL Dataset from the UCI Machine Learning Repository.

It was selected because it offers a comprehensive collection of legitimate and phishing URLs, is recent (donated on 3/3/2024), and is well suited to visualization and feature analysis.

Table of Contents

  1. Install Packages – Set up the environment and required libraries.
  2. Import & Clean Dataset – Load the data and handle missing/malformed values.
  3. Brainstorming and Goal Definition – Define the questions to answer and hypotheses.
  4. Memory Optimization – Optimize data types for efficiency.
  5. Answer Questions through Visualizations – Explore patterns using boxplots, bar plots, and scatter plots.
  6. Conclusion – Summarize findings and insights.
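
The memory optimization step (item 4) can be sketched roughly as follows. This is a minimal illustration with pandas, not the notebook's actual code: the helper `optimize_dtypes`, its `max_category_ratio` threshold, and the sample values are hypothetical, and the column names `URLLength`, `TLD`, and `label` are assumed to match the dataset.

```python
import pandas as pd


def optimize_dtypes(df: pd.DataFrame, max_category_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where downcasting is lossless.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # int64 -> smallest signed integer type that holds every value
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # float64 -> float32 when the values allow it
            out[col] = pd.to_numeric(series, downcast="float")
        elif pd.api.types.is_object_dtype(series) or pd.api.types.is_string_dtype(series):
            # Text columns with few distinct values are cheaper as categories
            if series.nunique() / max(len(series), 1) < max_category_ratio:
                out[col] = series.astype("category")
    return out


# Made-up sample rows; column names are assumed, values are not from the dataset.
sample = pd.DataFrame({
    "URLLength": [31, 23, 29, 26, 45, 38],
    "TLD": ["com", "com", "de", "com", "de", "com"],
    "label": [1, 0, 1, 1, 0, 1],
})
optimized = optimize_dtypes(sample)
print(optimized.dtypes)
```

On a real load, comparing `df.memory_usage(deep=True).sum()` before and after the conversion shows whether it paid off; on a frame as small as this sample the difference is not meaningful.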


Visualizations

Charts generated in the notebook:

  • URL Length Distribution by Legitimacy: boxplot
  • Top 10 TLDs Most Associated with Phishing URLs: barplot
  • Domain Length by Domain Type and URL Legitimacy: boxplot
  • URL Length vs Character Continuation Rate by URL Legitimacy: scatterplot
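
As an illustration of the aggregation behind the "Top 10 TLDs" bar plot, here is a minimal standard-library sketch. It assumes the ranking is a raw count of phishing URLs per TLD (the notebook may instead use a rate or a precomputed column); the helpers `extract_tld` and `top_phishing_tlds` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def extract_tld(url: str) -> str:
    """Return the last dot-separated label of the host, lowercased.

    Simplification: IP-address hosts and multi-part suffixes such as
    "co.uk" are not handled specially.
    """
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower() if "." in host else ""


def top_phishing_tlds(rows, n=10):
    """Count TLDs over (url, is_phishing) pairs, keeping phishing rows only."""
    counts = Counter(extract_tld(url) for url, is_phishing in rows if is_phishing)
    counts.pop("", None)  # drop URLs whose host had no recognizable TLD
    return counts.most_common(n)


# Made-up rows on reserved example domains, not taken from the dataset.
rows = [
    ("http://login.example.com/verify", True),
    ("https://example.org/", False),
    ("http://secure.example.net/update", True),
    ("http://account.example.com/reset", True),
]
top = top_phishing_tlds(rows, n=2)
print(top)
```

The resulting list of `(tld, count)` pairs can be passed to any bar plot function, with the TLDs on one axis and the counts on the other.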

How to Run

  1. Clone the repository:
git clone https://github.com/aMAAmina/PhiUSIILPhishingURL_Viz.git
