Welcome to my collection of Python projects, showcasing my skills in data analysis, web scraping, visualizations, and machine learning. These projects, developed for school, work, or personal exploration, cover a range of topics, from retail transaction insights to NBA game outcome predictions.
Each project involves hands-on experience with Python libraries like Pandas, Matplotlib, Seaborn, and Scikit-Learn, demonstrating my proficiency in handling diverse datasets, extracting meaningful insights, and applying machine learning techniques when appropriate. Explore how I leverage predictive modeling, conduct exploratory data analysis, and integrate data from various sources to derive valuable conclusions.
Feel free to navigate through the projects listed below, each providing a unique perspective on data-driven decision-making.
- Patterns of Victory: Data-Driven Insights into Competitive Play
- Retail Transaction Insights
- NBA Game Outcome Prediction
- Student Performance Data Visualization
- Top Movies Data Scraper and OMDb API Integration
- CUNY Athlete Heights Data Scraper and Analysis
- Valorant Player Stats Data Scraper and Analysis
- U.S. City Weather Data Analysis with the OpenWeatherMap API
- File: CS_GO_Competitive_Matchmaking_Data_Visualization_Analysis.ipynb
- Objective:
- Analyze high-level CS:GO ESEA competitive matches to determine which factors most influence winning rounds, including map, round timing, player positions, weapon usage, team economy, and bomb status.
- Key Points:
- Conducted large-scale data cleaning and preparation on over 2 million events, including feature selection, coordinate transformations, and alignment with radar map visuals.
- Performed exploratory data analysis (EDA) and created interactive visualizations (radar heatmaps, bar charts, temporal plots) to uncover map-specific trends and strategic patterns.
- Developed predictive machine learning models (Logistic Regression and Random Forest) to estimate round winners and analyze feature importance, demonstrating advanced modeling and feature engineering capabilities (sketched below).
- Technical Details:
- Tools: Python, Pandas, NumPy, Matplotlib, Seaborn, Scikit-Learn, Excel
- Showcased end-to-end workflow: data collection, cleaning, EDA, visualization, predictive modeling, and reporting insights.
- Dataset: CS:GO Competitive Matchmaking Data (ESEA) – Kaggle
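
A minimal sketch of the modeling step described above, assuming a prepared round-level dataframe; the file `rounds.csv` and the column names (`seconds_left`, `ct_equip_value`, `bomb_planted`, `ct_win`) are illustrative stand-ins for the features engineered in the notebook:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rounds = pd.read_csv("rounds.csv")  # hypothetical prepared round-level table

# One-hot encode the map and keep a few numeric features (names are illustrative).
X = pd.get_dummies(
    rounds[["map", "seconds_left", "ct_equip_value", "t_equip_value", "bomb_planted"]],
    columns=["map"],
)
y = rounds["ct_win"]  # 1 if the CT side won the round

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

for name, model in [("Logistic Regression", log_reg), ("Random Forest", forest)]:
    print(name, accuracy_score(y_test, model.predict(X_test)))

# Which features drive round wins, according to the Random Forest?
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```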
- File: Retail_Transaction_Insights.ipynb
- Objective:
- Derive actionable insights from retail transactions through advanced analytics for enhanced data-driven decision-making.
- Key Points:
- Applied machine learning techniques, including K-Means clustering, to analyze retail transactions and uncover distinct customer segments (sketched below).
- Utilized market basket analysis techniques (Apriori algorithm and association rule mining) to reveal intricate purchasing patterns (sketched below).
- Conducted thorough exploratory data analysis (EDA) using Pandas, leveraging data visualizations with Matplotlib and Seaborn to reveal patterns and trends, enhancing data-driven insights.
- Technical Details:
- Successfully managed and accessed Kaggle API credentials for seamless dataset download.
- Implemented effective data cleaning strategies, addressing missing values and ensuring proper data type conversions.
- Dataset: https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset
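
A minimal sketch of the customer-segmentation step, assuming the Kaggle CSV has already been downloaded; the file name and aggregation columns (`CustomerID`, `TotalCost`, `TransactionID`) are illustrative rather than the dataset's exact headers:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("retail_transactions.csv")  # illustrative file name

# Aggregate to one row per customer (column names are illustrative).
customers = df.groupby("CustomerID").agg(
    total_spend=("TotalCost", "sum"),
    n_transactions=("TransactionID", "nunique"),
)

# Scale features so spend and transaction counts contribute comparably.
scaled = StandardScaler().fit_transform(customers)
customers["segment"] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(scaled)

# Profile each segment by its average behavior.
print(customers.groupby("segment").mean())
```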
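And a sketch of the market basket step using mlxtend's Apriori and association-rule utilities (one common implementation of the technique); the small hard-coded baskets stand in for the per-transaction product lists parsed in the notebook:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Each row is one transaction's basket (illustrative data).
baskets = [
    ["Milk", "Bread", "Eggs"],
    ["Milk", "Bread"],
    ["Bread", "Butter"],
    ["Milk", "Eggs"],
]

# One-hot encode baskets into a boolean item matrix.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)

# Mine frequent itemsets, then derive association rules ranked by lift.
frequent = apriori(onehot, min_support=0.3, use_colnames=True)
rules = association_rules(frequent, metric="lift", min_threshold=1.0)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```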
- File: NBA_Outcome_Predictor.ipynb
- Objective:
- Develop a predictive system for NBA game outcomes using machine learning and advanced data analysis techniques.
- Key Points:
- Employed machine learning techniques in Scikit-Learn for accurate NBA game outcome predictions, showcasing predictive modeling proficiency.
- Utilized advanced web scraping with BeautifulSoup to systematically collect comprehensive NBA game and player performance data, demonstrating skills in data extraction and automation.
- Applied a data-driven approach using Pandas for feature engineering, integrating raw NBA data like team statistics, player attributes, and historical performance indicators to enhance predictive models.
- Technical Details:
- Extracted unstructured data embedded in HTML comments, which was essential for capturing content that is otherwise rendered dynamically via JavaScript, showcasing adaptability in handling complex web structures (this step and the rolling averages are sketched below).
- Implemented time series analysis, including the integration of rolling averages, to capture temporal patterns in NBA game data.
- Fine-tuned and optimized predictive models for accurate outcomes across various NBA seasons.
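
A minimal sketch of those two technical details: pulling a table that the page ships inside an HTML comment, then adding a rolling average. The URL, table position, and `PTS` column are illustrative assumptions rather than the notebook's exact targets:

```python
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

# Illustrative page; the notebook scrapes its own set of season/team pages.
html = requests.get("https://www.basketball-reference.com/teams/BOS/2023.html").text
soup = BeautifulSoup(html, "html.parser")

# Some tables are shipped inside HTML comments and only revealed by JavaScript,
# so parse the comment text itself instead of the visible DOM.
tables = []
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if "<table" in comment:
        tables.extend(pd.read_html(StringIO(str(comment))))

games = tables[0]  # assume the first commented table holds per-game results

# Rolling average of points over the previous five games, shifted so each row
# only sees information available before that game (avoids leakage).
games["pts_rolling5"] = games["PTS"].rolling(5, min_periods=1).mean().shift(1)
```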
- File: Student_Performance.ipynb
- Objective:
- Perform comprehensive data analysis and visualization on student performance data, identifying correlations and dependencies to enable data-informed decisions for educational institutions.
- Key Points:
- Conducted in-depth analysis of the curated dataset, identifying correlations and dependencies among student attributes and exam scores.
- Leveraged Python libraries (Pandas, Matplotlib, Seaborn) to create compelling data visualizations that uncover patterns in student performance, showcasing data visualization capabilities (sketched below).
- Utilized a comprehensive dataset sourced from Kaggle, ensuring the relevance and accuracy of insights into student performance, highlighting data source management skills.
- Dataset: https://www.kaggle.com/datasets/spscientist/students-performance-in-exams
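
A minimal sketch of the visualization step, assuming the CSV file and column names as published with the Kaggle dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("StudentsPerformance.csv")  # Kaggle: students-performance-in-exams

# Score distribution by test-preparation status.
sns.boxplot(data=df, x="test preparation course", y="math score")
plt.title("Math scores by test preparation")
plt.tight_layout()
plt.show()

# Correlation among the three exam scores.
sns.heatmap(df[["math score", "reading score", "writing score"]].corr(),
            annot=True, cmap="coolwarm")
plt.show()
```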
- File: Top_Movies.ipynb
- Objective:
- Build a movie data analysis pipeline that combines multiple sources, enabling data-informed film analysis and appreciation.
- Key Points:
- Employed advanced web scraping techniques to meticulously extract top-rated movie data from IMDb, showcasing data collection skills.
- Seamlessly integrated scraped IMDb data with the OMDb API, resulting in a cohesive and well-structured dataframe, emphasizing data integration skills (sketched below).
- Used Python to compute descriptive statistics and derive meaningful insights from the combined movie data, highlighting data analysis abilities.
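
A minimal sketch of the OMDb enrichment step; the scraping of IMDb's top chart depends on its current markup, so a short hard-coded title list stands in for it here, and `YOUR_OMDB_API_KEY` is a placeholder:

```python
import pandas as pd
import requests

OMDB_KEY = "YOUR_OMDB_API_KEY"  # free key from omdbapi.com

# Titles scraped from IMDb's top chart (stand-in list for this sketch).
titles = ["The Shawshank Redemption", "The Godfather", "The Dark Knight"]

records = []
for title in titles:
    resp = requests.get("https://www.omdbapi.com/",
                        params={"apikey": OMDB_KEY, "t": title}).json()
    if resp.get("Response") == "True":
        records.append({"Title": resp["Title"], "Year": resp["Year"],
                        "imdbRating": resp["imdbRating"], "Genre": resp["Genre"]})

movies = pd.DataFrame(records)
movies["imdbRating"] = movies["imdbRating"].astype(float)
print(movies.describe(include="all"))
```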
- File: CUNY_Athlete_Heights.ipynb
- Objective:
- Provide sports analytics by gathering and organizing height data from CUNY's volleyball and swimming team rosters.
- Key Points:
- Applied advanced web scraping techniques to extract precise height measurements of male and female athletes from CUNY's volleyball and swimming team rosters, showcasing data collection skills.
- Organized data into structured and comprehensive dataframes, enabling easy analysis and interpretation, and highlighting data preprocessing skills.
- Demonstrated analytical prowess by computing average heights and identifying the five tallest and shortest players in each gender category, emphasizing data analysis abilities (sketched below).
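
A minimal sketch of the height analysis, assuming the rosters have already been scraped into feet-inches strings; the names and values shown are illustrative, and the roster URLs/markup vary by campus so the scraping step is omitted:

```python
import pandas as pd

# Heights as scraped from roster pages, as "feet-inches" strings (illustrative data).
df = pd.DataFrame({
    "name": ["Player A", "Player B", "Player C"],
    "gender": ["M", "F", "M"],
    "height": ["6-2", "5-9", "6-5"],
})

def to_inches(height: str) -> int:
    """Convert a 'feet-inches' string like '6-2' to total inches."""
    feet, inches = height.split("-")
    return int(feet) * 12 + int(inches)

df["height_in"] = df["height"].map(to_inches)

print(df.groupby("gender")["height_in"].mean())        # average height per gender
print(df.nlargest(5, "height_in")[["name", "height"]])  # five tallest
print(df.nsmallest(5, "height_in")[["name", "height"]]) # five shortest
```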
- File: Valorant_Player_Stats.ipynb
- Objective:
- Extract and analyze diverse player data from top-performing Valorant players to provide valuable insights into player performance for strategic decision-making.
- Key Points:
- Utilized advanced Python web scraping techniques to extract diverse player data from top-performing Valorant players, showcasing data collection skills.
- Structured the extracted player data into comprehensive Pandas dataframes, facilitating in-depth analysis and exploration.
- Computed measures such as average performance, highlighted top-performing players, and identified statistical outliers, providing insights into Valorant player performance for strategic decision-making (sketched below).
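
A minimal sketch of the analysis step, with illustrative columns (`acs`, `kd`) and values standing in for the scraped player stats:

```python
import pandas as pd

# Scraped player stats organized into a dataframe (illustrative columns/values).
stats = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "acs":    [250.1, 231.4, 198.7, 310.2],  # average combat score
    "kd":     [1.25, 1.10, 0.95, 1.60],      # kill/death ratio
})

print(stats[["acs", "kd"]].mean())  # average performance across players
print(stats.nlargest(3, "acs"))     # top performers by ACS

# Flag statistical outliers with a simple z-score rule.
z = (stats["acs"] - stats["acs"].mean()) / stats["acs"].std()
print(stats[z.abs() > 2])
```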
- File: Openweathermap_API.ipynb
- Objective:
- Analyze U.S. city weather data using the OpenWeatherMap API.
- Key Points:
- Leveraged the OpenWeatherMap API to obtain detailed weather information for the top 11 U.S. cities by population, demonstrating expertise in API integration and data retrieval.
- Efficiently parsed and processed the JSON data provided by the API, showcasing skills in working with JSON and transforming unstructured data into a structured format (sketched below).
- Transformed the JSON responses into a well-organized Pandas dataframe ready for analysis, highlighting proficiency in data preprocessing, cleaning, and documentation.
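
A minimal sketch of the API call and JSON-to-dataframe step using OpenWeatherMap's current-weather endpoint; `YOUR_OPENWEATHERMAP_KEY` is a placeholder and the city list is truncated for brevity (the notebook covers the top 11 by population):

```python
import pandas as pd
import requests

API_KEY = "YOUR_OPENWEATHERMAP_KEY"
cities = ["New York", "Los Angeles", "Chicago"]

rows = []
for city in cities:
    # Current-weather endpoint; units=imperial returns Fahrenheit temperatures.
    data = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": f"{city},US", "appid": API_KEY, "units": "imperial"},
    ).json()
    rows.append({
        "city": city,
        "temp_f": data["main"]["temp"],
        "humidity": data["main"]["humidity"],
        "conditions": data["weather"][0]["description"],
    })

weather = pd.DataFrame(rows)
print(weather)
```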