In this project, students will apply the knowledge and methods they learned in the Introduction to Machine Learning course to compete in a Kaggle competition using the AutoGluon library.
Students will create a Kaggle account if they do not already have one, download the Bike Sharing Demand dataset, and train a model using AutoGluon. They will then submit their initial results for a ranking.
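The baseline workflow above can be sketched roughly as follows. This is a minimal sketch and not the project's reference solution. It assumes the dataset's target column is named `count`, that `casual` and `registered` appear only in the training file, and that the competition metric does not accept negative predictions. The helper names `fit_baseline` and `make_submission` are hypothetical.

```python
import pandas as pd


def fit_baseline(train: pd.DataFrame, time_limit: int = 600):
    """Fit a baseline AutoGluon regressor on the training frame."""
    # Imported lazily so make_submission can be used without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    # Assumption: "casual" and "registered" are missing from the test set,
    # so they are dropped before training.
    features = train.drop(columns=["casual", "registered"], errors="ignore")
    predictor = TabularPredictor(
        label="count", eval_metric="root_mean_squared_error"
    )
    return predictor.fit(
        train_data=features, time_limit=time_limit, presets="best_quality"
    )


def make_submission(datetimes: pd.Series, predictions: pd.Series) -> pd.DataFrame:
    """Build a two-column submission frame, clipping negative counts to zero."""
    return pd.DataFrame(
        {
            "datetime": datetimes.to_numpy(),
            "count": predictions.clip(lower=0).to_numpy(),
        }
    )
```

A typical run would load the training and test CSV files with `pd.read_csv`, call `fit_baseline` on the training frame, pass the output of `predictor.predict` on the test frame to `make_submission`, and write the result with `to_csv(..., index=False)` before uploading it to Kaggle.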
After they complete the first workflow, they will iterate on the process by trying to improve their score. This will be accomplished by adding more features to the dataset and tuning some of the hyperparameters available with AutoGluon.
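One common way to add features to this kind of dataset is to split the timestamp into calendar parts, since demand tends to vary by hour and season. The sketch below assumes the timestamp column is named `datetime`. The function name `add_datetime_features` and the choice of derived columns are illustrative, not prescribed by the project.

```python
import pandas as pd


def add_datetime_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with hour, day-of-week, and month columns added."""
    out = df.copy()
    # Assumption: the timestamp column is called "datetime".
    stamps = pd.to_datetime(out["datetime"])
    out["hour"] = stamps.dt.hour
    out["dayofweek"] = stamps.dt.dayofweek  # Monday is 0, Sunday is 6
    out["month"] = stamps.dt.month
    return out
```

The same function should be applied to both the training and test frames so that the model sees identical columns at fit and predict time.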
Finally, they will submit all their work and write a report detailing which methods provided the best score improvement and why. A template of the report can be found here.
To meet specifications, the project will require at least these files:
- Jupyter notebook with code run to completion
- HTML export of the Jupyter notebook
- Markdown or PDF file of the report
Images or additional files needed to make your notebook or report complete can also be added.
- Python 3.7
- MXNet 1.8
- Pandas >= 1.2.4
- AutoGluon 0.2.0
For this project, it is highly recommended to use SageMaker Studio from the course-provided AWS workspace. This will simplify much of the installation needed to get started.
For local development, you will need to set up a JupyterLab instance.
- Follow the Jupyter install link for best practices to install and start a JupyterLab instance.
- If you already have a Python virtual environment set up, you can just `pip install` it.
pip install jupyterlab
- There are also Docker containers that include JupyterLab, available from Jupyter Docker Stacks.