8. Explainable Artificial Intelligence (XAI)

In this chapter, the learning objectives are:

  1. Understand the importance of XAI

  2. Distinguish local explanation methods from global ones

  3. Distinguish model-agnostic explanation methods from model-specific ones

  4. Know how to implement permutation feature importance and partial dependence plots (PDPs)

  5. Have a basic understanding of explanation methods for neural networks
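As a concrete illustration of objective 4, here is a minimal sketch of permutation feature importance. The synthetic dataset, the random forest model, and all variable names are assumptions made for this example. Only `permutation_importance` from `sklearn.inspection` is the library routine itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic tabular data (an assumption for illustration):
# only feature 0 drives the target; features 1 and 2 are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Shuffle one column at a time on held-out data and record the drop
# in the model's score (R^2 for a regressor), repeated 10 times per feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Print features from most to least important.
for i in np.argsort(result.importances_mean)[::-1]:
    mean = result.importances_mean[i]
    std = result.importances_std[i]
    print(f"feature {i}: {mean:.3f} +/- {std:.3f}")
```

Computing the importance on held-out data rather than on the training set is a design choice: it measures how much the model relies on a feature to generalize, not how much it memorized.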

The exercises will help you to:

  1. Generate PDPs and calculate permutation feature importance for ML models trained on tabular data.

  2. Gain a basic understanding of [SHAP], a popular package containing many useful ML explanation tools.

  3. Understand how the values shown in SHAP figures are linked to PDPs.

  4. Use the beautiful figures available in [SHAP].

  5. Apply [SHAP] explainers to an image dataset.
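To show what a PDP measures before turning to library tools, the sketch below computes a one-dimensional partial dependence curve by hand: for each grid value, the chosen feature is overwritten in every row and the model predictions are averaged. The helper `partial_dependence_1d`, the synthetic data, and the model are hypothetical and chosen for illustration. They are not the exercise's own code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic tabular data (an assumption for illustration):
# the target depends on feature 0 only; feature 2 is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=300)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)


def partial_dependence_1d(model, X, feature, n_grid=20):
    """Average prediction as one feature sweeps a grid (hypothetical helper)."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), n_grid)
    averages = np.empty(n_grid)
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # force every sample to this feature value
        averages[i] = model.predict(X_mod).mean()  # marginalize over the other features
    return grid, averages


grid, pdp_informative = partial_dependence_1d(model, X, feature=0)
_, pdp_noise = partial_dependence_1d(model, X, feature=2)

# An informative feature yields a curve with a wide range; a noise feature stays nearly flat.
range_informative = pdp_informative.max() - pdp_informative.min()
range_noise = pdp_noise.max() - pdp_noise.min()
print(f"PDP range, feature 0: {range_informative:.2f}")
print(f"PDP range, feature 2: {range_noise:.2f}")
```

Plotting `grid` against `pdp_informative` with any plotting library gives the PDP for feature 0.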

This week’s exercises:

  1. This exercise adapts tutorial code from the scikit-learn and SHAP libraries and applies these methods to datasets we have already seen in previous exercises.

If you are struggling with some of the exercises, do not hesitate to:

  • Use a direct Internet search or Stack Overflow

  • Ask your neighbor(s), the teacher, or the TA for help

  • Debug your program, e.g. by following this tutorial

  • Use assertions, e.g. by following this tutorial

Way to go with the flow 😎🌧🌧🏄🌧🌧

If you’re done early, consider:

  • Giving feedback on how to improve this notebook (typos, hints, exercises that may be improved/removed/added, etc.) by messaging the teacher and TA(s) on Moodle

  • Working on your final project for this course.

Final Project

The final project’s goal is to answer a well-defined scientific question by applying one of the ML algorithms introduced in class to an environmental dataset of your choice (e.g., related to your Master’s thesis or your PhD research).

  • Now that you have found a large environmental dataset linked to a scientific question you are passionate about, which machine learning algorithm can you use to address it? Is it a classification, a regression, or a data exploration project?

  • How could you format the dataset to facilitate its manipulation in Python?

  • If you’re still hunting for a dataset of interest, consider browsing the list of benchmark datasets maintained by Pangeo and Kaggle.
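On the question of formatting a dataset for manipulation in Python, one possible approach (a sketch, not a requirement of the project) is to load the data into a pandas DataFrame, parse dates, handle missing values, and then extract NumPy arrays for the ML model. The file contents, station names, column names, and values below are entirely made up for illustration.

```python
import io

import pandas as pd

# Hypothetical raw file contents; every name and number here is invented.
raw_csv = io.StringIO(
    "date,station,temperature_c,precip_mm\n"
    "2021-01-01,A,1.5,0.0\n"
    "2021-01-02,A,,2.3\n"
    "2021-01-01,B,3.1,0.4\n"
    "2021-01-02,B,2.8,1.1\n"
)

# One row per observation, one column per variable, dates parsed as timestamps.
df = pd.read_csv(raw_csv, parse_dates=["date"])

# Drop rows with missing values (imputation would be an alternative choice).
df = df.dropna()

# Index by station and date so that subsets are easy to select.
df = df.set_index(["station", "date"]).sort_index()

# Separate the features and the target as NumPy arrays for scikit-learn.
X = df[["temperature_c"]].to_numpy()
y = df["precip_mm"].to_numpy()
print(df)
```

For gridded or multi-dimensional environmental data, a labeled-array library may be a better fit than a flat table. The tabular case is shown here only because it is the simplest.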