
One of the best ways to improve the skill of machine learning models is to combine multiple models into an ensemble. Since each model in the ensemble makes slightly different predictions, the ensemble can be more robust and generalize better to unseen data. An ensemble approach also lets us characterize the uncertainty of ML predictions.

Here we will train several individual classifiers on the MNIST data, a dataset of hand-written digit images, and evaluate their skill. We will then compare the skill of the individual models with the skill of ensemble models.

3.4. Exercise 3: Comparing (Ensemble of) Classifiers on MNIST Data#

MNIST_Examples.png

The goal is to train and compare individual classifiers on MNIST data, before combining them into an ensemble model. Will the power of teamwork shine through? 🔢

Let’s start by loading the MNIST database!

from sklearn.datasets import fetch_openml
import numpy as np
# Setting `as_frame` to False to avoid loading mnist as Pandas dataframe
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
# Making sure we are working with 8-bit unsigned integers as targets
mnist.target = mnist.target.astype(np.uint8)
# Here we are creating our dataset to train our ML model
# X contains the digit pictures, and y contains the label corresponding to each picture
X = mnist['data'] # Read digit pictures
y = mnist['target'].astype(np.uint8) # Read labels
X.shape
(70000, 784)

Q1) Split the MNIST dataset into training, validation, and test sets

Hint 1: The documentation for scikit-learn’s train_test_split function is at this link.

Hint 2: You may use 50k instances for training, 10k instances for validation, and 10k instances for testing.

# Import the necessary functions and utilities
# You will need train_test_split() that we used in previous notebooks
#from sklearn._____________ import ____________________
from sklearn.model_selection import train_test_split
# Split the MNIST data into training, validation, and test
# set data to be used for training and testing/validation
# train_test_split only allows two-way splits. To get our training, validation, and test set, we will need to call train_test_split() 2 times
#############################################################################################################################################
# 1. Split the data into training set and validation-test set. Set the size of the training set to be 50000
#############################################################################################################################################
________,_______,__________,__________ = train_test_split(__,__, __________=50000)

#############################################################################################################################################
# 2. Split the validation-test set into validation set and test set. Set the size of the test set to be 10000
#############################################################################################################################################
________,_______,__________,__________ = train_test_split(__,__, __________=10000)

#############################################################################################################################################
# 3. Print the shape of your training, validation, and test data. You should see (50000,784) [training], (10000,784) [validation], (10000,784) [test]
#############################################################################################################################################
print(____________________)
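
One possible way to fill in the blanks (a sketch; the variable names X_train, X_val, X_test, y_train, y_val, y_test and the fixed random_state are our own choices, not the only valid ones):

from sklearn.model_selection import train_test_split

# 1. First split: 50,000 instances for training, 20,000 left over for validation + test
X_train, X_val_test, y_train, y_val_test = train_test_split(X, y, train_size=50000, random_state=42)

# 2. Second split: 10,000 instances for validation, 10,000 for testing
X_val, X_test, y_val, y_test = train_test_split(X_val_test, y_val_test, test_size=10000, random_state=42)

# 3. Shapes should be (50000, 784), (10000, 784), and (10000, 784)
print(X_train.shape, X_val.shape, X_test.shape)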

Q2) Train various classifiers on the training set and compare them on the validation set

Hint: You may compare a RandomForestClassifier, an ExtraTreesClassifier, and a SVC, but we encourage you to be creative and include additional classifiers you find promising! The more the merrier 😀

Note from TA: The SVC can be slow to train. Test RandomForest and ExtraTrees first for quick results.

# Import all the classifiers you need
from sklearn._________ import RandomForestClassifier
from sklearn._________ import ExtraTreesClassifier
from sklearn._____ import SVC
# Initiate Classifiers
rfc = ___________________ # RandomForestClassifier
etc = ___________________ # ExtraTreesClassifier

# Fit Classifiers on training data
rfc.___(_________,________) # RandomForestClassifier
etc.___(_________,________) # ExtraTreesClassifier
# Import accuracy_score module from sklearn
from sklearn._______ import accuracy_score
# Use the trained classifiers to make predictions on the validation set
rfc_preds = rfc._______(____)
etc_preds = etc._______(____)

# Use accuracy_score() to see if our models can successfully classify the validation data.
# We got around 96-97% accuracy. Did your models perform well on the validation data as well?
rfc_acc = accuracy_score(______,_______)
etc_acc = accuracy_score(______,_______)
print(rfc_acc)
print(etc_acc)
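
A possible completion (a sketch; n_estimators=100 and random_state=42 are arbitrary but reasonable choices, and the slower SVC is left out here):

from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Initiate the classifiers
rfc = RandomForestClassifier(n_estimators=100, random_state=42)
etc = ExtraTreesClassifier(n_estimators=100, random_state=42)

# Fit them on the training data
rfc.fit(X_train, y_train)
etc.fit(X_train, y_train)

# Predict on the validation set and compute accuracies
rfc_preds = rfc.predict(X_val)
etc_preds = etc.predict(X_val)
rfc_acc = accuracy_score(y_val, rfc_preds)
etc_acc = accuracy_score(y_val, etc_preds)
print(rfc_acc)
print(etc_acc)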

Now it’s time to make the individual classifiers vote to form an ensemble model

Q3) Combine the classifiers into an ensemble that outperforms them all on the validation set, using a soft or hard voting classifier.

Hint: The documentation for scikit-learn’s VotingClassifier class can be found at this link. Note that its argument voting can be changed from hard to soft.

# Import VotingClassifier module from sklearn
from sklearn.__________ import VotingClassifier
# Define your voting classifier here
vc_hard = VotingClassifier(________=[(____,_______), (___,_____)]) # Hard Voting
vc_soft = VotingClassifier(________=[(____,_______), (___,_____)], ____=____) # Soft Voting
# Train the two voting classifiers
vc_soft.____(________, _________) # Soft voting
vc_hard.____(________, _________) # Hard voting

# Evaluate classifier performance on validation set. The evaluation will be based on accuracy_score(), similar to previous notebooks.
vc_soft_preds = vc_soft._________(_____)
vc_hard_preds = vc_hard._________(_____)

# Calculate your accuracy scores here.
vc_soft_acc = ____________(______, ____________)
vc_hard_acc = ____________(____, _____________)
# How well did our voting classifier perform on the validation data
# Which of the two was better?
# Compare the accuracy values for the voting classifiers to the accuracy of individual classifiers
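
A possible completion (a sketch that combines only the two tree-based classifiers; note that soft voting requires every estimator to implement predict_proba, so an SVC would need probability=True to be included):

from sklearn.ensemble import VotingClassifier

# Hard voting (the default): majority vote over the predicted classes
vc_hard = VotingClassifier(estimators=[('rfc', rfc), ('etc', etc)])
# Soft voting: average the predicted class probabilities
vc_soft = VotingClassifier(estimators=[('rfc', rfc), ('etc', etc)], voting='soft')

# Train the two voting classifiers
vc_hard.fit(X_train, y_train)
vc_soft.fit(X_train, y_train)

# Evaluate them on the validation set
vc_hard_preds = vc_hard.predict(X_val)
vc_soft_preds = vc_soft.predict(X_val)
vc_hard_acc = accuracy_score(y_val, vc_hard_preds)
vc_soft_acc = accuracy_score(y_val, vc_soft_preds)
print(vc_hard_acc)
print(vc_soft_acc)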

Hint: If your ensemble does significantly worse than the individual classifiers, consider removing the individual classifiers that hurt its performance using del Voting_Classifier.estimators_[index_of_model_to_delete], where the estimators_ attribute of your Voting_Classifier lists the individual classifiers that were trained as part of the ensemble.

Q4) Does your ensemble clearly outperform your individual classifiers on the test set?

# Use the classifiers to make classification on the test set
################################################################################
# Individual Classifier predictions
################################################################################
rfc_preds_test = rfc._______(______)
etc_preds_test = etc._______(______)
################################################################################
# Voting Classifier predictions
################################################################################
vc_soft_preds_test = vc_soft._______(______)
vc_hard_preds_test = vc_hard._______(______)
################################################################################
# Compare accuracy scores
################################################################################
vc_soft_acc_test = ______________(______________, ______________)
vc_hard_acc_test = ______________(______________, ______________)
rfc_acc_test = ______________(______________, ______________)
etc_acc_test = ______________(______________, ______________)
# Does it clearly beat the best individual classifier?
print("VC soft:", vc_soft_acc_test)
print("VC hard:", vc_hard_acc_test)
print("RandomForest: ", rfc_acc_test)
print("ExtraTrees: ", etc_acc_test)

Your voting classifier may only slightly beat the best individual model. Maybe voting isn't the best way to combine the individual predictions!

Let’s try the brute-force approach: training a classifier on the individual models’ predictions to beat the voting approach.

3.4.1. Bonus Exercise 3: From Individual Classifiers to Ensemble Stacking via Blenders#

Blender.jpg

Let’s learn how to best blend the individual classifiers’ predictions!

Q1) Run the individual classifiers from the previous exercise to make predictions on the validation set, and create a new training set with the resulting predictions

Hint: The target stays the same, but now each training instance is a vector containing the set of predictions from all your individual classifiers. You may group all these vectors into a feature array X_val_predictions that should have the shape: (Number_of_validation_instances,Number_of_individual_classifiers).

# Create the new training set
# Make sure it has the right shape and contains sensical values
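
One way to build this feature array (a sketch; np.column_stack and the name X_val_predictions are our own choices, and only the two tree-based classifiers from the previous exercise are stacked here):

import numpy as np

# Each column holds one classifier's validation-set predictions,
# so each row is the prediction vector for a single validation instance
X_val_predictions = np.column_stack([rfc.predict(X_val), etc.predict(X_val)])
print(X_val_predictions.shape)  # (10000, number_of_individual_classifiers)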

Q2) Train a classifier on this new training set

Hint 1: You may train a RandomForestClassifier.

Hint 2: You could fine-tune this blender or try other types of blenders (e.g., a LogisticRegression or an MLPClassifier), then select the best one using cross-validation.

# Fit the classifier to the new training set
# Calculate its mean accuracy
# (Optional) Try other classifiers on this new training set
# if you're not satisfied with the new accuracy
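
A sketch of one possible blender (a RandomForestClassifier; setting oob_score=True gives a quick out-of-bag estimate of its accuracy without needing a separate hold-out set):

from sklearn.ensemble import RandomForestClassifier

blender = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
blender.fit(X_val_predictions, y_val)
print(blender.oob_score_)  # out-of-bag accuracy estimate of the blender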

Congratulations! 😃

You have just trained a blender, and together with the individual classifiers it forms a stacking ensemble. Now let’s evaluate the ensemble on the test set.

Q3) Evaluate the blender on the test set and compare it to the voting classifier you trained earlier

Hint 1: You will have to first calculate the predictions of your individual classifiers on the test set, similar to what you did in Question 1.

Hint 2: Make sure you use the same score (e.g., the accuracy_score) to compare both ensemble models.

# Calculate the predictions of your individual classifiers on the test set
# and format them so you can feed them to your blender
# Calculate the mean accuracy of the blender on the test set
# Compare it to the mean accuracy of individual models and the voting classifier
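
A possible completion (a sketch that mirrors the stacking from Question 1 and reuses the names defined above):

# Stack the individual classifiers' test-set predictions the same way as before
X_test_predictions = np.column_stack([rfc.predict(X_test), etc.predict(X_test)])

# Feed them to the blender and score it on the test targets
blender_preds_test = blender.predict(X_test_predictions)
blender_acc_test = accuracy_score(y_test, blender_preds_test)
print("Blender:", blender_acc_test)
print("VC soft:", vc_soft_acc_test)
print("VC hard:", vc_hard_acc_test)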

Is the blender worth the effort?