This project predicts the number of sunspots from historical data (1818–2019). Several machine learning models were implemented and compared to find the most accurate one: Linear Regression, Ridge Regression, Decision Tree, and Random Forest. Random Forest performed best, with a Mean Absolute Error (MAE) of approximately 1.6.
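For reference, MAE averages the absolute difference between the observed sunspot counts $y_i$ and the predictions $\hat{y}_i$ over the $n$ test samples. It is expressed in the same units as the target, so an MAE of about 1.6 means predictions miss by roughly 1.6 sunspots on average:

$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
$$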
- Data Preprocessing:
  - Cleaned and transformed the dataset by handling missing values and dropping irrelevant columns.
  - Scaled features using `StandardScaler` for better model performance.
- Exploratory Data Analysis (EDA):
  - Analyzed correlations between features and the target variable.
  - Visualized sunspot trends using `Matplotlib` and `Seaborn`.
- Model Training:
  - Implemented Linear Regression, Ridge Regression, Decision Tree, and Random Forest models.
  - Tested polynomial features but found them ineffective.
- Hyperparameter Tuning:
  - Used `GridSearchCV` to optimize hyperparameters for the Ridge and Random Forest models.
- Evaluation:
  - Achieved the best MAE of ~1.6 with the Random Forest model.
  - Validated the model on a test set for final performance metrics.
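A minimal sketch of the preprocessing steps, assuming a pandas DataFrame with hypothetical column names (`year`, `month`, `observations`, `indicator`, `sunspots`) and median imputation as the missing-value strategy. The project's actual columns and imputation choice are not specified. The imputer and `StandardScaler` are fit on the earlier (training) rows only, so statistics from the test period do not leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic rows with hypothetical column names, ordered in time
raw = pd.DataFrame({
    "year": [1818, 1818, 1819, 1819, 1820],
    "month": [1, 2, 1, 2, 1],
    "observations": [1.0, np.nan, 3.0, 4.0, 2.0],
    "indicator": ["*", "", "", "*", ""],
    "sunspots": [10.0, 12.0, np.nan, 8.0, 15.0],
})

# Drop a column assumed to be irrelevant, then rows whose target is missing
df = raw.drop(columns=["indicator"]).dropna(subset=["sunspots"])
X = df.drop(columns=["sunspots"])
y = df["sunspots"]

# Chronological split: earlier rows train, later rows test
X_train, X_test = X.iloc[:3], X.iloc[3:]
y_train, y_test = y.iloc[:3], y.iloc[3:]

# Median imputation (assumed strategy) followed by standard scaling,
# fit on the training rows only and then applied to the test rows
prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X_train_prep = prep.fit_transform(X_train)
X_test_prep = prep.transform(X_test)
```

Wrapping both steps in one pipeline means a single `fit_transform` on training data and a single `transform` on test data, which makes it hard to accidentally refit on the test period.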
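A sketch of the feature-target correlation analysis in pandas, using synthetic data and hypothetical column names (`feature_a`, `feature_b`, `unrelated`, `sunspots`). `target_correlations` is a hypothetical helper, not part of the project. Ranking by absolute value keeps strong negative correlations near the top.

```python
import numpy as np
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Order by absolute strength so strong negative correlations also rank high
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Synthetic data: the target depends on feature_a and feature_b, not on unrelated
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "feature_a": rng.normal(size=n),
    "feature_b": rng.normal(size=n),
    "unrelated": rng.normal(size=n),
})
df["sunspots"] = 3 * df["feature_a"] - 1.5 * df["feature_b"] + rng.normal(0, 0.1, n)

print(target_correlations(df, "sunspots"))
```

For a visual overview, the full matrix from `df.corr(numeric_only=True)` can be passed to `seaborn.heatmap`.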
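A minimal sketch of comparing the four models, plus a polynomial-feature variant, by test-set MAE. It uses synthetic data in place of the real features, a chronological 80/20 split (`shuffle=False`), and illustrative hyperparameters. None of these come from the project, and on real data the ranking may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in: a noisy cycle with a 132-step period plus a noise feature
rng = np.random.default_rng(0)
t = np.arange(600)
X = np.column_stack([t % 132, rng.normal(size=t.size)])
y = 80 + 60 * np.sin(2 * np.pi * X[:, 0] / 132) + rng.normal(0, 5, t.size)

# Keep time order: the last 20% of rows form the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

models = {
    # Linear models are scaled; tree-based models do not need scaling
    "Linear Regression": make_pipeline(StandardScaler(), LinearRegression()),
    "Ridge Regression": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "Polynomial + Linear": make_pipeline(
        PolynomialFeatures(degree=2), StandardScaler(), LinearRegression()
    ),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

# Print models from lowest (best) to highest MAE
for name, mae in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE = {mae:.2f}")
```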
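A sketch of tuning Ridge and Random Forest with `GridSearchCV`. The parameter grids, the synthetic data, and the use of `TimeSeriesSplit` (which keeps every validation fold later in time than its training data) are assumptions for illustration. The project's actual grids are not listed. Because scikit-learn maximizes scores, MAE is optimized through `scoring="neg_mean_absolute_error"` and negated back for reporting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic regression data standing in for the real features
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 300)

# Forward-chaining folds: each validation block follows its training block
cv = TimeSeriesSplit(n_splits=4)

searches = {
    "Ridge": GridSearchCV(
        Ridge(),
        {"alpha": [0.1, 1.0, 10.0]},
        scoring="neg_mean_absolute_error",
        cv=cv,
    ),
    "Random Forest": GridSearchCV(
        RandomForestRegressor(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 10]},
        scoring="neg_mean_absolute_error",
        cv=cv,
    ),
}

best = {}
for name, search in searches.items():
    search.fit(X, y)
    # best_score_ is the negated MAE, so flip the sign for reporting
    best[name] = (search.best_params_, -search.best_score_)
    print(name, search.best_params_, f"CV MAE = {-search.best_score_:.3f}")
```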
- Programming Language: Python
- Libraries:
  - Data Processing: `Pandas`, `NumPy`
  - Visualization: `Matplotlib`, `Seaborn`
  - Machine Learning: `Scikit-learn`