This Jupyter Notebook is part of my capstone project for Codecademy: Visualize Data with Python. Being passionate about cybersecurity, I chose to work with a phishing URL dataset to explore patterns and insights.
This project uses the PhiUSIIL Phishing URL Dataset from the UCI Machine Learning Repository.
It was selected because it offers a comprehensive collection of legitimate and phishing URLs, is recent (donated on 3/3/2024), and is well suited to visualization and feature analysis.
- Install Packages: Set up the environment and required libraries.
- Import & Clean Dataset: Load the data and handle missing/malformed values.
- Brainstorming and Goal Definition: Define the questions to answer and hypotheses.
- Memory Optimization: Optimize data types for efficiency.
- Answer Questions through Visualizations: Explore patterns using boxplots, bar plots, and scatter plots.
- Conclusion: Summarize findings and insights.
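As a rough illustration of the Memory Optimization step, the sketch below downcasts integer columns and converts string columns to categoricals with pandas. It is not the notebook's actual code: the toy DataFrame, its column names (`URLLength`, `TLD`, `label`), and the helper `optimize_dtypes` are assumptions made for this example.

```python
import pandas as pd

# Toy stand-in for the dataset; column names and values are assumptions.
df = pd.DataFrame({
    "URLLength": [23, 118, 45, 67],
    "TLD": ["com", "top", "com", "org"],
    "label": [1, 0, 1, 0],
})


def optimize_dtypes(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with smaller integer types and categorical strings."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Pick the smallest integer type that still holds every value.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif (pd.api.types.is_object_dtype(out[col])
              or pd.api.types.is_string_dtype(out[col])):
            # Repeated strings are stored once and referenced by small codes.
            out[col] = out[col].astype("category")
    return out


optimized = optimize_dtypes(df)
print(optimized.dtypes)
```

Categoricals tend to pay off when a column has many rows but few distinct values; comparing `df.memory_usage(deep=True)` before and after on the real data is the way to check whether the conversion helps.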
The story: Spot the Phish
Charts generated in the notebook:
- URL Length Distribution by Legitimacy
- Top 10 TLDs Most Associated with Phishing URLs
- Domain Length by Domain Type and URL Legitimacy
- URL Length vs Character Continuation Rate by URL Legitimacy
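To give a sense of the aggregation behind a chart like "Top 10 TLDs Most Associated with Phishing URLs", here is a minimal standard-library sketch. It uses `collections.Counter` instead of the notebook's own plotting code, and everything in it is illustrative: the URLs are made up, the label convention (0 = phishing, 1 = legitimate) is an assumption, and `tld_of` and `top_phishing_tlds` are hypothetical helpers.

```python
from collections import Counter
from urllib.parse import urlparse

# Made-up (url, label) pairs. Assumed convention: 0 = phishing, 1 = legitimate.
records = [
    ("https://login-update.example.top/a", 0),
    ("https://secure.example.xyz/verify", 0),
    ("https://pay.example.top/b", 0),
    ("https://www.example.com/", 1),
    ("https://docs.example.org/guide", 1),
    ("https://promo.example.top/c", 0),
]


def tld_of(url: str) -> str:
    """Return the last dot-separated label of the hostname (a naive TLD)."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def top_phishing_tlds(rows, n=10, phishing_label=0):
    """Count TLDs among phishing rows and return the n most common."""
    counts = Counter(
        tld_of(url) for url, label in rows if label == phishing_label
    )
    return counts.most_common(n)


top = top_phishing_tlds(records)
print(top)  # [('top', 3), ('xyz', 1)]
```

The resulting list of (TLD, count) pairs is the kind of input a bar plot would take. Taking the last hostname label is a simplification that ignores multi-part suffixes such as `co.uk`.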

- Clone the repository:
git clone https://github.com/aMAAmina/PhiUSIILPhishingURL_Viz.git