4. Unsupervised Learning for Clustering/Dimensionality Reduction and Environmental Complexity

For this week’s lab, the learning objectives are:

  1. Exploring dimensionality reduction and its use in conjunction with other machine learning algorithms

  2. Exploring the use of unsupervised clustering algorithms, their advantages, and their limitations

  3. Applying clustering algorithms to identify dynamical regimes in oceanic data

Today’s tutorial:

  1. Adapts Aurélien Géron’s Jupyter notebook exercises for chapters 8 and 9 (License) of his book “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition”.

  2. Adapts material from two articles on oceanic circulation, by Sonnewald, Wunsch, & Heimbach and by Sonnewald & Lguensat, along with Python scripts from Maike Sonnewald.

If you are struggling with some of the exercises, do not hesitate to:

  • Search the Internet directly or consult Stack Overflow

  • Ask your neighbor(s), the teacher, or the TA for help

  • Debug your program, e.g. by following this tutorial

  • Use assertions, e.g. by following this tutorial
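As a reminder of what assertions look like in practice, here is a minimal sketch in plain Python. The function name `check_temperatures` and the temperature bounds are illustrative assumptions, not part of this lab's exercises; the idea is simply to state what you expect about your data so that a violated expectation fails loudly and early.

```python
import math


def check_temperatures(temps_celsius):
    """Sanity-check a list of sea surface temperatures (in °C).

    Hypothetical helper for illustration; the bounds below are assumed.
    """
    # An empty list usually means the data failed to load.
    assert len(temps_celsius) > 0, "Expected at least one temperature value"
    for t in temps_celsius:
        # Missing values (NaN) should be handled before any analysis.
        assert not math.isnan(t), "Found a missing (NaN) temperature"
        # Assumed plausible range for liquid seawater, in °C.
        assert -2.0 <= t <= 40.0, f"Temperature {t} °C is outside the assumed range"
    return True


check_temperatures([12.5, 18.0, 3.2])     # passes silently
# check_temperatures([12.5, 180.0, 3.2])  # would raise AssertionError
```

When an assertion fails, Python raises an `AssertionError` carrying the message you provided, which often points to the faulty step faster than inspecting a wrong result at the end of the notebook.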

You’re making a splash!! 💦 😃 💦

If you’re done early, consider:

  • Giving feedback on how to improve this notebook (typos, hints, exercises that may be improved/removed/added, etc.) by messaging the teacher and TA(s) on Moodle

  • Working on your final project for this course.

Final Project

The final project’s goal is to answer a well-defined scientific question by applying one of the ML algorithms introduced in class to an environmental dataset of your choice (e.g., related to your Master’s thesis or your PhD research).

  • Now that you have found a large environmental dataset linked to a scientific question you are passionate about, which machine learning algorithm can you use to address it? Is it a classification, a regression, or a data exploration project?

  • How could you format the dataset to facilitate its manipulation in Python?

  • If you’re still hunting for a dataset of interest, consider browsing the lists of benchmark datasets maintained by Pangeo and Kaggle!