Open In Colab

3.5. Exercise 4: Mapping Wildfire Susceptibility in the Liguria Region with Simple Machine Learning Classifiers#


Credits

This online tutorial would not be possible without invaluable contributions from Andrea Trucchia (reduced data, methods), Giorgio Meschi (code, methods), and Marj Tonini (presentation, methods). The methodology builds upon the following article:

Trucchia, A.; Meschi, G.; Fiorucci, P.; Gollini, A.; Negro, D. "Defining Wildfire Susceptibility Maps in Italy for Understanding Seasonal Wildfire Regimes at the National Level." Fire (2022).

which generalizes the study below from the Liguria region (our case study) to all of Italy:

Tonini, Marj, et al. “A machine learning-based approach for wildfire susceptibility mapping. The case study of the Liguria region in Italy.” Geosciences 10.3 (2020): 105.


In week 3’s final notebook, we will train classifiers on real wildfire data to map fire risk in different regions of Italy. To keep the data size manageable, we will focus on the coastal Liguria region, which experiences frequent wildfires, notably during the winter.

3.5.1. Machine Learning for Environmental Risk Analysis#

For environmental science practitioners, risk analysis is one of the scenarios where machine learning can be particularly useful. Environmental risk analysis involves predicting where potential hazards may occur; it also involves understanding why some regions are more vulnerable to hazards than others. Machine learning models are well suited to these tasks because they can efficiently analyze large datasets containing many environmental predictors (e.g., weather conditions, soil conditions, etc.). By learning the hidden links between predictors and hazard risks, we may also gain new insights into which predictors or patterns are useful for building early hazard warning systems.

In this exercise, we ask you to use the machine learning classifiers covered in this chapter to recreate wildfire susceptibility maps for the Liguria region of Italy. The basic idea is to use ML classifiers to analyze a dataset of observations of weather conditions, vegetation cover, and topography. The goal is to understand how different factors enhance or reduce the probability of wildfire in Liguria, which can help authorities and decision-makers allocate resources to critical areas for hazard prevention.

6409e1d442e86ebe5987bd9902a95f44.jpg

Caption: A wildfire in Italy. Can we predict which locations are most susceptible to wildfires using simple classifiers? 🔥

Source: ANSA

Let’s start by downloading and loading the datasets into memory using the pooch, pickle (or pickle5 on older Python versions), and GeoPandas libraries:

# Install geopandas and pickle5
# Note: pickle5 backports pickle protocol 5 to Python < 3.8; on newer
# interpreters the built-in pickle module suffices, so a failed pickle5
# build can safely be ignored (see the import fallback below)
%pip install geopandas
%pip install pickle5
import geopandas as gpd
import numpy as np
import pooch

# pickle5 only exists (and is only needed) on Python < 3.8;
# fall back to the built-in pickle module otherwise
try:
    import pickle5 as pickle
except ImportError:
    import pickle
# Function to load the data
def load_data(path):

    # Load the content of the pickle file (pickle5 on Python < 3.8, built-in pickle otherwise)
    with open(path, "rb") as fh:
        points_df = pickle.load(fh)

    # Convert it to a Geopandas `GeoDataFrame` for spatial analysis
    points_df = gpd.GeoDataFrame(points_df,
                                 geometry=gpd.points_from_xy(np.float64(points_df.x),
                                                             np.float64(points_df.y)))
    return points_df
# Path to the data in UNIL OneDrive
variables_path = pooch.retrieve('https://unils-my.sharepoint.com/:u:/g/personal/tom_beucler_unil_ch/EU4FQkuYknFDiDfd7droyAcBP0qFOR5-c-_Oq74gjhTGwQ?download=1',
                            known_hash='e8ebc70f972b5af4ef3d6110dcd61ce01ce5a830dcdb7d2c9e737aeab781606c')
wildfires_path = pooch.retrieve('https://unils-my.sharepoint.com/:u:/g/personal/tom_beucler_unil_ch/EcjqeERsnIRHjhcx1ZFVNggBS7nPUkW530XRrpVUB-qnOw?download=1',
                             known_hash='361f067aafbac8add8f8a9a5c630df3c962cd37a2f125f420e7b9330fd0a1a4c')
# Load the data and convert it to a GeoPandas `GeoDataFrame`
# This can take a minute
variables = load_data(variables_path)
wildfires = load_data(wildfires_path)

3.5.1.1. Part I: Pre-Processing the Dataset for Classification#

Q1) After analyzing the topography and land cover data provided in variables, create your input dataset inputs from variables to predict the occurrence of wildfires (wildfires). Keep at least one categorical variable (veg, bioclim, or phytoclim).

Hint 1: Refer to the documentation at this link to know what the different keys of variables refer to.

Hint 2: You may refer to Table 1 of Tonini et al., copied below, to choose your input variables, although we recommend starting with fewer inputs at first to build a simpler model and avoid overfitting.

Marj_Table1.PNG

Here are some pandas commands and attributes you could use to explore your data: .head(), .columns, and .describe().

# Explore the `variables` dataset
#########################################################################################################
# 1. What kind of data is provided in the "variables" pandas DataFrame?
#########################################################################################################
# Can you print the variable names?
print(variables._______)
# Can you print a table with all the statistics of each variable in the dataframe?
print(variables._________)

There are 25 columns (variables) in the DataFrame. It is probably best to start simple and just use a few variables to make the wildfire prediction.

Can we use a very simple model to predict wildfires ❓

A simple model might contain dem, slope, veg, and bioclim. You can use it as a baseline against which to evaluate models trained on other variable combinations.

However, we cannot tell you which combination would perform the best as we have not done an extensive search while preparing the notebook.

# Here you will filter the DataFrame so that only the variables you want are in the input
############################################################################################################
# 1. Write down the names of the variables you want to keep so that pandas can do the filtering for you.
############################################################################################################
vars_tokeep = [___,____,____,____]
############################################################################################################
# 2. Create a new dataframe 'inputs' with just the variables you want
############################################################################################################
inputs = variables[_______] # Filter 'variables' dataframe
inputs.head() # print the first few rows to make sure everything works fine
             dem      slope veg bioclim
point_index                            
0            563  20.843185  34      15
1            527  23.599121  34      15
2            525  26.699856  34      15
3            519  24.512413  34      15
4            532  20.421495  34      15

Q2) To avoid making inaccurate assumptions about which types of vegetation and non-flammable areas are most similar, convert your categorical inputs into one-hot vectors.

Hint 1: You may use the fit_transform method of scikit-learn’s OneHotEncoder class to convert categorical inputs into one-hot vectors.

Hint 2: Don’t forget to remove the categorical variables from your input dataset, e.g. using drop if you are still using a GeoDataFrame, or del/pop if you are working with a Python dictionary.

Hint 3: There are numerous ways to convert categorical data into one-hot vectors. It is quite easy to do in pandas, but scikit-learn also provides transformers that could be useful, including ColumnTransformer() and Pipeline().

In the guided reading, you have seen how these functions are used. Experiment with them and see whether you prefer the scikit-learn transformers or pandas.
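For reference, here is a minimal sketch of the pandas route using pd.get_dummies, assuming your inputs DataFrame still contains the categorical veg and bioclim columns:

import pandas as pd

# pd.get_dummies creates one binary column per category and drops
# the original categorical columns in the same call
inputs_onehot = pd.get_dummies(inputs, columns=['veg', 'bioclim'], prefix=['v', 'b'])
print(inputs_onehot.shape)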

############################################################################################################
# 1. Import OneHotEncoder for 'veg' and 'bioclim' data, and pandas
############################################################################################################
from sklearn._____________ import OneHotEncoder
import pandas as pd
############################################################################################################
# 2. Initiate OneHotEncoder object, set the 'sparse' option to False (renamed 'sparse_output' in scikit-learn >= 1.2)
############################################################################################################
enc = OneHotEncoder(____=____)
############################################################################################################
# 3. Convert categorical variables into one-hot vectors
############################################################################################################
# Convert categorical variables with 'enc'
veg_onehot = enc.__________(variables[[______]])
bioclim_onehot = enc.__________(variables[[______]])
# Convert 'veg_onehot' and 'bioclim_onehot' to pandas DataFrames
veg_transform = _____________(________) # veg
bio_transform = _____________(________) # bioclim

############################################################################################################
# 4. Print the shape of 'veg_transform'
############################################################################################################
print(_______________)
# Don't change these!
veg_transform= veg_transform.add_prefix('v_')
bio_transform= bio_transform.add_prefix('b_')
Input (528669, 7)
############################################################################################################
# 5. Convert 'inputs' into a pandas DataFrame
############################################################################################################
inputs_gdf = ___________(_____)
############################################################################################################
# 6. Add 'veg_transform' and 'bio_transform' into the inputs DataFrame
############################################################################################################
inputs_gdf = inputs_gdf.___(___________)
inputs_gdf = inputs_gdf.___(____________)
############################################################################################################
# 7. Use .drop() to delete 'veg' and 'bioclim' from the DataFrame (note that .drop() returns a copy unless inplace=True)
############################################################################################################
inputs_gdf = inputs_gdf.drop(_____=[____,_____])

Now that we built our inputs dataset, we are ready to build our outputs dataset!

Q3) Using the point_index column of wildfires and variables, create your outputs dataset, containing 1 when there was a wildfire and 0 otherwise.

Hint: Check that inputs and outputs have the same number of cases by looking at their .shape[0] attribute.

#####################################################################################################################
# 1. Use numpy to initialize an outputs array with the same shape as the 'point_index' column in the variables table
#####################################################################################################################
outputs = np.zeros_like(_______[________])
#####################################################################################################################
# 2. Pull out the indices of wildfire locations from the pandas DataFrame 'wildfires'
#####################################################################################################################
wildfiresindex = ________.______
#####################################################################################################################
# 3. Fill the zero array with ones if a fire broke out at a specific location index.
# Here you can use a for loop for this task
#####################################################################################################################
for ___ in ____________:
  outputs[___] = 1
#####################################################################################################################
# 4. Check that `inputs` and `outputs` have the same shape[0]
#####################################################################################################################
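If you prefer to avoid the explicit loop, the same labels can be built with a single vectorized assignment. A minimal sketch, assuming wildfiresindex holds the point_index values of the burned locations and that variables is indexed by point_index:

# Vectorized alternative to the for loop above (same result, much faster)
outputs = np.zeros(len(variables), dtype=int)
outputs[np.isin(variables.index, wildfiresindex)] = 1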

Q4) Separate your inputs and outputs datasets into a training and a test set. Keep at least 20% of the dataset for testing.

Hint 1: You may use scikit-learn’s train_test_split function.

Hint 2: If you are considering optimizing the hyperparameters of your classifier, form a validation dataset as well.

Hint 3: We recommend performing the split on the indices so that it is easier to track what points are in which dataset after splitting. You will have an easier time when plotting the susceptibility map.

#####################################################################################################################
# 1. Import train_test_split()
#####################################################################################################################
from sklearn._______________ import train_test_split
#####################################################################################################################
# 2. Create a list with all indices in the input [0,1,2,...]
#####################################################################################################################
all_indices = list(range(len(inputs_gdf)))

#####################################################################################################################
# 3. Convert the inputs_gdf dataframe to numpy array
#####################################################################################################################
inputs_gdf = inputs_gdf.to_numpy()

#####################################################################################################################
# 4. Apply train_test_split on 'all_indices', and store training indices in 'indx_train', test indices in 'indx_test'
#####################################################################################################################
____,______ = _________________(____________,test_size=______,random_state=42)

#####################################################################################################################
# 5. Create training and test sets with indx_train and indx_test
#####################################################################################################################
____,____,___,___ = ____________[___________],_______[________],______[________],_______[__________]
# Check the shape of your training(/validation)/test sets
# and make sure you kept at least 20% of your dataset for testing
(396501, 16) (132168, 16)

Congratulations, you have created a viable wildfire dataset to train a machine learning classifier! 😃 Now let’s get started 🔥

3.5.1.2. Part II: Training and Benchmarking the Machine Learning Classifiers#

Q5) Now comes the machine learning fun! 🤖 Train multiple classifiers on your newly-formed training set, and make sure that at least one has the predict_proba method once trained.

Hint: You may train a RandomForestClassifier or an ExtraTreesClassifier, but we encourage you to be creative and include additional classifiers you find promising! 💻
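As a starting point, you might instantiate a few candidate classifiers along these lines (the hyperparameter values are illustrative, not tuned):

from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

# All three classifiers expose predict_proba once fitted
candidates = {'random_forest': RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42),
              'extra_trees': ExtraTreesClassifier(n_estimators=100, n_jobs=-1, random_state=42),
              'logistic': LogisticRegression(max_iter=1000)}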

#####################################################################################################################
# 1. Import RandomForestClassifier
#####################################################################################################################
from sklearn.__________ import RandomForestClassifier
#####################################################################################################################
# 2. Convert X_train and X_test into pandas DataFrame
#####################################################################################################################
X_train = ____________(_______)
X_test = _____________(_______)
X_train
             0          1   2   3    4    5    6    7    8    9   10   11   12   13   14   15
0          545  15.253697  34   8  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0
1          213  19.419361  32   8  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0
2          484  22.660355  37   8  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  1.0  0.0
3          376  17.878534  32   8  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0
4         1241   8.114972  32  26  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0
...        ...        ...  ..  ..  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
396496     695  21.491549  34  15  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  1.0  0.0  0.0  0.0
396497     254  14.585821  32   8  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0
396498     237   8.356695  32   8  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0
396499     917  14.110499  33  15  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0
396500     556  24.595417  32   8  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0

396501 rows × 16 columns

#####################################################################################################################
# 3. Initiate RF classifier and fit it on training set
#####################################################################################################################
rfc = _____________
rfc.___(_____,_____)
RandomForestClassifier()
# (Optional) Conduct a hyperparameter search on the validation set
# e.g., using scikit-learn's RandomizedSearchCV

Q6) Compare the performance and confusion matrices of your classifiers on the test set. Which classifier performs best in your case?

Hint 1: You may use the accuracy_score to quantify your classifier’s performance, but don’t forget there are many other performance metrics to benchmark binary classifiers.

Hint 2: You can directly calculate the confusion matrix using scikit-learn’s confusion_matrix function.
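Because the fire/no-fire classes are imbalanced, accuracy alone can be misleading; scikit-learn's classification_report gives per-class precision, recall, and F1 in one call. A sketch, assuming you have computed test-set predictions rfc_preds as in the cell below:

from sklearn.metrics import classification_report

# Per-class precision, recall, and F1 complement plain accuracy
print(classification_report(y_test, rfc_preds, target_names=['no fire', 'fire']))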

#####################################################################################################################
# 1. Import accuracy_score, confusion_matrix, and ConfusionMatrixDisplay
#####################################################################################################################
from sklearn._____ import ___________
from sklearn.____ import _____________, ___________________
#####################################################################################################################
# 2. Use rfc to make prediction on test set
#####################################################################################################################
rfc_preds = rfc.________(_____)
#####################################################################################################################
# 3. Use accuracy_score to infer trained classifier performance
#####################################################################################################################
rfc_acc = ____________(_____, ________)
#####################################################################################################################
# 4. Plot the confusion matrix
#####################################################################################################################
cm = confusion_matrix(______, _______, normalize=____) # Get a confusion matrix 'cm', use 'pred' for the normalize option
disp = _________________(_____________ = cm) # Use ConfusionMatrixDisplay to visualize 'cm'
disp.plot()

For comparison, below is the confusion matrix obtained by the paper’s authors:

download (2).png

3.5.1.3. Part III: Making the Susceptibility Map#

Q7) Using all the classifiers you trained that have a predict_proba method, predict the probability of a wildfire over the entire dataset.

Hint: predict_proba will give you the probability of both the presence and absence of a wildfire, so you will have to select the right probability.
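As a sanity check, the fitted classifier's classes_ attribute tells you which column of predict_proba corresponds to which class. A minimal sketch (applied here to X_test purely for illustration):

print(rfc.classes_)                # e.g., [0 1]
proba = rfc.predict_proba(X_test)  # one column per class, shape (n_samples, 2)
fire_probability = proba[:, 1]     # P(wildfire) for each point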

#####################################################################################################################
# 1. Predict the probability of a wildfire or not on the *entire* dataset
#####################################################################################################################
predprob_rf = rfc.________(_______) # Use .predict_proba in the 'rfc' classifier to process the entire dataset
#####################################################################################################################
# 2. Extract the probability of a wildfire happening over the entire period of the dataset for each instance
#####################################################################################################################
rf_testprob_fire = _______[____] # Hint: predprob_rf has shape (n_samples, 2); select the second column with [:, 1]

Q8) Make the susceptibility map 🔥

Hint 1: The x and y coordinates for the map can be extracted from the variables dataset.

Hint 2: You can simply scatter x versus y, and color the dots according to their probabilities (c=probability of a wildfire) to get the susceptibility map.

# Scatter ['x'] vs ['y'] columns in the entire dataframe and color the dots with the predicted probability 'rf_testprob_fire'
# to make the susceptibility map

import matplotlib.pyplot as plt
fig,ax = plt._______(____,figsize=(6+5,4+2))
cmploy = ax.scatter(_________,_____________,s=5,c=____________,cmap='viridis')
plt.colorbar(cmploy,ax=___)
ax.set_title('Wildfire Susceptibility Map (RF)')
ax.set_ylabel('Y',size=13)
ax.set_xlabel('X',size=13)
ax.tick_params(axis='both', which='major', labelsize=11)
plt.show()
../_images/ddbd2e5f6dd2c16083efc96619fb414bc2994536d9a96d6ea524b78b42d895dd.png

You should get a susceptibility map that looks like the one below. Does your susceptibility map depend on the classifier & the inputs you chose? Which map would you trust most?

download (3).png

It seems like our model was too simple to generate a useful map 😞. Your TA actually experimented with training a RandomForest model on 10 variables and obtained 91% accuracy!

So you should definitely try different combinations of variables to obtain a better map than the one you just made.

3.5.2. Bonus Exercise 4: Exploring the Susceptibility Map’s Sensitivity to Seasonality and Input Selection#

josh-hild-N3e9vYJGZ1w-unsplash (1).jpg

Caption: The Liguria region (Cinque Terre), after you save it from raging wildfires using machine learning ✌

3.5.2.1. Part I: Seasonality#

Q1) Using the season column of wildfires, separate your data into two seasonal datasets (1=Winter, 2=Summer).

Hint: When splitting your inputs into two seasonal datasets, keep in mind that temp_1 and prec_1 are the climatological mean temperatures and precipitation during winter, while temp_2 and prec_2 are the climatological mean temperature and precipitation during summer.

# Identify indices for which the wildfires occurred during winter/summer
# Use these indices to split your `inputs` and `outputs` datasets
# into two seasonal datasets
# Verify that for each season, the shape[0] of your
# `inputs` and `outputs` sets are the same
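One possible sketch of the seasonal split, assuming wildfires carries a season column (1=winter, 2=summer) and a point_index column, and reusing the label-building idea from Q3 (variable names are illustrative):

# Split the wildfire labels by season (1 = winter, 2 = summer)
winter_idx = wildfires.loc[wildfires['season'] == 1, 'point_index']
summer_idx = wildfires.loc[wildfires['season'] == 2, 'point_index']

outputs_winter = np.zeros(len(variables), dtype=int)
outputs_winter[np.isin(variables.index, winter_idx)] = 1
outputs_summer = np.zeros(len(variables), dtype=int)
outputs_summer[np.isin(variables.index, summer_idx)] = 1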

Q2) Use these two seasonal datasets to make the Liguria winter and summer susceptibility maps using your best classifier(s). What do you notice?

Hint: Feel free to recycle as much code as you can from the previous exercise. For instance, you may build a library of functions that directly train the classifier(s) and output susceptibility maps!

# So
# Much
# Recycling
# Compare the winter and summer susceptibility maps

3.5.2.2. Part II: Input Selection#

The details of the susceptibility map may strongly depend on the inputs you chose from the variables dataset. Here, we explore two different ways of selecting inputs to make our susceptibility maps as robust as possible.

Q3) Using your best classifier, identify the inputs contributing the most to your model’s performance using permutation feature importance.

Hint: You may use scikit-learn’s permutation_importance function using your best classifier as your estimator.

# Import the necessary functions and classes
# Calculate the permutation importance of each of your model's inputs
# Display the result and identify the most important inputs
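A sketch of how the call might look, assuming rfc is your best classifier and X_test/y_test are the objects built in Exercise 4 (with X_test as a pandas DataFrame so the columns have names):

from sklearn.inspection import permutation_importance

# Shuffle each input in turn and measure the drop in test score
result = permutation_importance(rfc, X_test, y_test, n_repeats=5, random_state=42, n_jobs=-1)
# Rank the inputs from most to least important
for i in result.importances_mean.argsort()[::-1]:
    print(X_test.columns[i], result.importances_mean[i], '+/-', result.importances_std[i])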

Q4) Retrain the same type of classifier only using the inputs you identified as most important, and display the new susceptibility map.

Hint: Feel free to recycle as much code as you can from the previous exercise. For instance, you may build a library of functions that directly train the classifier(s) and output susceptibility maps!

# Lots
# of
# recycling
# Make the new susceptibility map

Can you explain the differences in susceptibility maps based on the inputs’ spatial distribution?

If the susceptibility map changed a lot, our best classifier may initially have learned spurious correlations. This would have affected our permutation feature importance analysis, and motivates re-selecting our inputs from scratch! 🔨

Q5) Use the SequentialFeatureSelector to select the most important inputs. Select as few as possible!

Hint: Track how the score improves as you add more and more inputs via n_features_to_select, and stop when it’s “good enough”.

# Import the SequentialFeatureSelector
# Add more and more inputs
# How many inputs do you need to get a "good enough" score?
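A sketch of the forward-selection loop, assuming the X_train/y_train set from Exercise 4 (sequential selection refits the model many times, so you may want to subsample the training data or reduce n_estimators first):

from sklearn.feature_selection import SequentialFeatureSelector

# Greedily add one input at a time and track which ones get picked
for n in range(1, 6):
    sfs = SequentialFeatureSelector(rfc, n_features_to_select=n, direction='forward', n_jobs=-1)
    sfs.fit(X_train, y_train)
    print(n, 'features:', list(X_train.columns[sfs.get_support()]))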

Which inputs have you identified as the most important? Are they the same as the ones you selected using permutation feature importance?

Q6) Retrain the same type of classifier using as few inputs as possible, and display the new susceptibility map.

Hint: Feel free to recycle as much code as you can from the previous exercise. For instance, you may build a library of functions that directly train the classifier(s) and output susceptibility maps!

# Recycle your previous code here
# and here
# And remake the final susceptibility map

How does it compare to the authors’ susceptibility map below?

download (3).png