Open In Colab

5.2. (Exercise) Artificial Neural Networks with Keras#

This notebook was designed to be run on Google Colab and we recommend clicking on the Google Colab badge to proceed.

picture


Photo Credits: Galaxy's Edge by Rod Long licensed under the Unsplash License

The definition of AI is a highly contested concept. It often refers to technologies that demonstrate levels of independent intelligence from humans. By its very definition, it is an intelligence that is differentiated from natural intelligence; it is a constructed, artificial, or machine intelligence.
\(\quad\)Ryan, M. (2020). In AI we trust: ethics, artificial intelligence, and reliability. Science and Engineering Ethics, 26(5), 2749-2767.

This notebook, whose first draft was written by Milton Gomez, covers Chapter 10 of Géron, and builds on the notebooks made available on Github.

5.2.1. Notebook Setup#

First, let’s import a few common modules, ensure MatplotLib plots figures inline and prepare a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated so we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0.

# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass

# TensorFlow ≥2.0 is required
import tensorflow as tf
assert tf.__version__ >= "2.0"

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
rnd_seed = 42
rnd_gen = np.random.default_rng(rnd_seed)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "ann"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

# Initialize the run_index
run_index = None

# Loading Tensorboard
%load_ext tensorboard

Data Setup

Today, we’ll once again be working on the MNIST handwritten digit database - we’re becoming experts in typography! ✍

Let’s begin by importing the dataset from the keras dataset library.

5.2.2. Q1) Load the MNIST dataset from Keras. Divide it into a training, validation, and test dataset#

Hint 1: To access the Keras library, you can either reimport keras (e.g., import tensorflow.keras as keras), or you can access it from the instance of tensorflow we imported during setup (i.e., using tf.keras)

Hint 2: Here is the documentation for the Keras implementation of the MNIST dataset

Hint 3: If you use the mnist.load_data() method, it will return a pair of tuples: (training_data, testing_data), where training_data and testing_data are tuples of inputs and labels (X, y)

Hint 4: You can break down the training dataset from the .load_data() method into a training and validation dataset. Since the full training dataset includes 60 000 samples, try using 50 000 samples as training data and 10 000 samples as validation data.

# Load the keras dataset data
( (X_train_full, y_train_full) , (_____, _____) ) = _____.mnist._____()
# Split the data
X_train =
X_valid =
_______ =
_______ =
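
If you’re feeling stuck, here is one possible solution. It is only a sketch: the 50 000/10 000 split follows Hint 4, and the X_test/y_test names for the held-out test set are our choice.

```python
# One possible solution (a sketch; the X_test/y_test names are our choice)
((X_train_full, y_train_full), (X_test, y_test)) = tf.keras.datasets.mnist.load_data()

# Use the first 50 000 samples for training and the remaining 10 000 for validation
X_train, X_valid = X_train_full[:50000], X_train_full[50000:]
y_train, y_valid = y_train_full[:50000], y_train_full[50000:]
```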

What does our data look like? Let’s get an idea of the values and figure out what kind of preprocessing we should do before training our neural network.

5.2.3. Q2) Print the shape of the training, validation, and test sets. Then, print the maximum and minimum input values.#

Hint 1: You loaded the data as numpy arrays. Thus, you can rely on the built-in methods for finding the shape and min/max values.

Hint 2: Click for the documentation on ndarray.max(), ndarray.min(), and ndarray.shape

#Write your code here
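
One possible way to do this, assuming the variable names from Q1, is sketched below.

```python
# Print the shape of each dataset (inputs and labels)
for name, X, y in [("train", X_train, y_train),
                   ("valid", X_valid, y_valid),
                   ("test",  X_test,  y_test)]:
    print(f"{name}: inputs {X.shape}, labels {y.shape}")

# Print the minimum and maximum input values
print("min:", X_train.min(), "max:", X_train.max())
```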

If you used the same train/validation split as we did, you should have 50k samples in the training set, 10k in the validation set, and 10k in the test set.

Since the data represents grayscale image values, data values should vary between 0 and 255. Normalize the data by dividing it by 255.

5.2.4. Q3) Normalize the input data for the training, validation, and testing sets#

Hint 1: The datasets are stored as simple numpy arrays, so you can perform arithmetic operations on them!

X_train = _____ / 255
_____ =
_____ =
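
A minimal sketch of the normalization, assuming the variable names from Q1 (the labels are left untouched):

```python
# Divide each pixel value by 255 so the inputs lie in [0, 1]
X_train = X_train / 255
X_valid = X_valid / 255
X_test = X_test / 255
```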

We now have the normalized training, validation, and testing data that we’ll use to train our neural network. Before moving on, it’s worth making a small visualization of samples in our data to ensure that everything worked out correctly.

5.2.5. Q4) To visualize a sample image, write a function that:#


1) Takes in an input dataset and its labels, a number of rows, and a number of columns
2) Prints out a random n_rows by n_columns sample of images with their labels

Hint 1: You can use the rnd_gen.integers() generator to generate a set of integers between 0 and the number of samples, with a size of (rows, columns). Here is some documentation that can help. It’s best practice to take in the random generator as an argument for your function.

Hint 2: You can use matplotlib’s fig, axes = plt.subplots() to make a grid of axes and call the imshow() method on each ax in order to plot the digit. It is recommended that you use the cmap='binary' argument in imshow to print the digits in black and white. Click on the links for the documentation to plt.subplots(), plt.imshow(), and the colormaps (i.e., cmap values) available in matplotlib.

Hint 3: You can iterate using numpy’s ndenumerate() function, which returns the n-dimensional index of the array and the element located there. This will be useful when iterating through the indices you generated and plotting the corresponding digit and label.

#@title Hint 4: Code Snippet, if you're feeling stuck

'''
def sample_plotter(X, y, n_rows, n_columns, rnd_gen):
    assert type(X) == type(np.empty(0))
    indices = rnd_gen.integers(0,X.shape[0], size=(n_rows, n_columns))

    fig, axes = plt.subplots(n_rows, n_columns, figsize=(8,6))

    for idx, element in np.ndenumerate(indices):
        axes[idx].imshow(X[element], cmap='binary')
        axes[idx].axis('off')
        axes[idx].title.set_text(y[element])
    return
''';
def sample_plotter(___, ___, ___, ___, ___):

    # Create a set of indices to access the sample images/labels

    # Create a figure with n_rows and n_columns

    # Plot each selected digit
    for ___, ___ in ___:


    return None

Now that our function is defined, let’s go ahead and print out a 4 row by 8 column sample from each dataset.

5.2.6. Q5) Grab a 4x8 sample of digits from each dataset and print out the image and labels#

#Write your code here!
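
One possible solution, assuming the sample_plotter signature from Hint 4 of Q4 and the rnd_gen generator defined during setup:

```python
# Plot a 4x8 sample of digits from each of the three datasets
for X, y in [(X_train, y_train), (X_valid, y_valid), (X_test, y_test)]:
    sample_plotter(X, y, 4, 8, rnd_gen)
    plt.show()
```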

We’re now ready to start developing our neural network. The first thing that we want to do is figure out an appropriate learning rate for our model - after all, we want to choose one that converges to a solution while being as computationally cheap as possible.

Let’s start by setting up a keras callback (click here for the documentation), a type of object that will allow us to change the learning rate after every iteration (i.e., after every batch of data). We will set up what is called an exponential learning rate (that is, the learning rate will increase by a factor of \(k\) after each iteration). Expressed mathematically, \begin{align} \eta_{t} = \eta_{0} \cdot k^{t} \end{align} where \(t\) is the current iteration and \(\eta_{0}\) is the initial learning rate.

As a reminder, an epoch is an iteration through the entire training dataset, while a batch is an iteration through a predefined subset of the training data. It’s important to make this distinction, as ML algorithms are often trained in batches when dealing with large datasets, and we normally do not want to change the learning rate in between batches during model training. However, we will do so during this evaluation phase in order to determine an adequate learning rate.

We will therefore set a callback that will do two things after the end of each batch:

1. Keep track of the losses
2. Adjust the learning rate by multiplying it by a predefined factor

5.2.7. Q6) Set up an ExponentialLearningRate callback that, after each batch, logs the value of the loss function and learning rate, and then multiplies the learning rate by a factor of \(k\)#

Hint 1: Multiple backend options are available with Keras. We will be using tensorflow, but the code is written in such a way that a different backend could be used. tf.keras.backend has a .backend() method that allows you to check what backend is being used.

Hint 2: You should extend the tf.keras.callbacks.Callback class. (Confused about extending classes? Here is a question on Stack Overflow that could provide some context.)

Hint 3: The ExponentialLearningRate callback we will implement will need to take in the \(k\) factor during its initialization (here’s a quick overview of the init constructor method and self arguments in classes, with a focus on Python). You will also need to save an empty list as an attribute for both the losses and the learning rates.

Hint 4: Keras model optimizers have an attribute where the learning rate is stored: model.optimizer.learning_rate. In order to read the value, you will have to use the keras backend’s .get_value() method with the model’s learning rate as an argument

Hint 5: The on_train_batch_end method passes the logs argument into the function. You can access the loss value by using logs['loss']

Hint 6: In order to set the learning rate to a different value, you will have to depend on the keras backend’s .set_value() method. This method takes in two arguments: the first is the variable whose value will be set (e.g., the learning rate in the model’s optimizer), and the second is the value it will be set to (e.g., the learning rate multiplied by the k factor).

Hint 7: Unlike the other functions we’ve seen, backend.get_value() and backend.set_value() don’t yet have their own documentation page. However, here is the link to an example where both methods are used in a learning rate scheduler.

# We'll start by making it easier to access the keras backend. See hint #1 for
# more details
K = tf.keras.backend

# Use the .backend() method to determine what backend we're running
___.___
# Remember that you can access the keras.backend using K, which we defined in
# the code cell above!

class _____(____.____.____.Callback): #define the ExponentialLearningRate class
    # Start
    def __init__(self, factor):
        self.____ = ____ # set the factor
        self.____ = ____ # initialize the losses list
        self.____ = ____ # initialize the learning rates list

    def on_batch_end(self, batch, logs):
        # Add the value of the learning rate to the list
        self.___.append(__.___(self.model.___.___))

        # Add the value of the loss
        self.___.append(___[___])

        # Set the value of the learning rate
        ___.___(self.model.___.___, self.model.___.___ * self.___)
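
If you’re feeling stuck, here is one possible implementation of the callback. It is a sketch that mirrors the blanks above and uses the tf.keras backend shorthand K defined earlier; the attribute names factor, rates, and losses are our choice.

```python
class ExponentialLearningRate(tf.keras.callbacks.Callback):
    def __init__(self, factor):
        self.factor = factor   # the multiplicative factor k
        self.rates = []        # learning rates recorded after each batch
        self.losses = []       # losses recorded after each batch

    def on_batch_end(self, batch, logs):
        # Record the current learning rate and loss
        self.rates.append(K.get_value(self.model.optimizer.learning_rate))
        self.losses.append(logs['loss'])
        # Multiply the learning rate by the factor
        K.set_value(self.model.optimizer.learning_rate,
                    self.model.optimizer.learning_rate * self.factor)
```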

Now that we’ve defined our callback, we can go ahead and start thinking about our neural network. For consistency’s sake, let’s start by clearing the Keras backend and setting our random state.

# Run this cell
K.clear_session()
np.random.seed(rnd_seed)
tf.random.set_seed(rnd_seed)

Let’s make a simple neural network model using Keras. For this, we will rely on a Sequential model, since we will want all of the inputs of one layer to be fed into the next layer. We recommend using the architecture described in the diagram below, but feel free to define your own architecture!

5.2.8. Q7) Write a sequential Keras model that will predict the digit class.#

Hint 1: You can add the layers in the sequential model when initializing the model. It expects the layers in a list. Alternatively, you can add them one by one using the model’s .add() method. Check out the documentation here.

Hint 2: The input images should be flattened before feeding them into any densely connected layers. Here is the documentation for the flatten layer.

Hint 3: You want to use simple, densely connected layers for this exercise. Here is the documentation for the dense layer.

Hint 4: Using a dense layer with the number of units set to the number of classes (e.g., the 10 different digits in the MNIST dataset) and a softmax activation, the output can be interpreted as the probability of the input belonging to each class. Here is the documentation for the softmax activation function in Keras

# Create your model! Feel free to use our outline, or make your own from scratch

model = tf.___.____.Sequential([  # call the keras sequential model class
                            ___,  # 1st Layer
                            ___,  # 2nd Layer
                            ___,  # 3rd Layer
                            ___]) # 4th Layer
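
One possible architecture, a sketch matching the 300/100/10-unit layout referred to in Q10 below (the input_shape for the 28x28 MNIST images is our assumption):

```python
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),    # flatten the 28x28 images
    tf.keras.layers.Dense(300, activation="relu"),    # 1st hidden layer
    tf.keras.layers.Dense(100, activation="relu"),    # 2nd hidden layer
    tf.keras.layers.Dense(10, activation="softmax")   # output layer, one unit per digit class
])
```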

Now that we have a model defined, we need to run its `.compile()` method, in which we will give the model the following hyper-parameters:

1. The loss function will be set to sparse categorical cross entropy
2. The optimizer will be set to Stochastic Gradient Descent with a learning rate of 1e-3
3. The model metrics will include the accuracy score

5.2.9. Q8) Compile the model with the given hyperparameters (i.e., loss function, optimizer, and metrics) and instantiate the callback we defined previously using a \(k\) factor of 1.005 (i.e., a 0.5% increase in learning rate per batch)#

Hint 1: Here is the documentation for the sparse categorical cross entropy loss function in keras. You can simply reference the function using loss='sparse_categorical_crossentropy' when compiling.

Hint 2: Here is the documentation for the Stochastic Gradient Descent optimizer in keras

Hint 3: Here is the documentation for the accuracy score implementation in keras. Like with the sparse_categorical_cross_entropy loss, you can reference the accuracy score in the metrics list, e.g. by setting metrics=['accuracy'] when compiling.

____.compile(___=___, # Set the loss function
              ___=___.____.___(___=___), # Set the optimizer and learning rate
              ___=[___]) # Set the metrics
exponential_lr_callback = _____(factor=____)
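
Here is one way to fill in the blanks, assuming the ExponentialLearningRate class sketched in Q6:

```python
model.compile(loss='sparse_categorical_crossentropy',                  # loss function
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),   # SGD optimizer
              metrics=['accuracy'])                                    # track accuracy
exponential_lr_callback = ExponentialLearningRate(factor=1.005)        # 0.5% increase per batch
```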

Let’s go ahead and train the compiled model for a single epoch.

5.2.10. Q9) Fit the model for a single epoch, using the exponential learning rate callback we defined in the previous code cell. Then, plot the Loss vs Learning rate.#

Hint 1: Just like in scikit-learn, the keras model includes a .fit() method to train the algorithm! Here is the documentation.

Hint 2: After training, you can access the recorded losses and corresponding learning rates using the attributes we defined in the class in Q6!

history = model.___(____, # set the training inputs
                    ____, # set the training labels
                    ____=__, # set the number of epochs
                    validation_data=(____, ____), # set validation input/labels
                    callbacks=[_____]) # Set the callback
# Plotting
fig, ax = plt.subplots()

ax.plot(___.___, # learning rates
        ___.___) # losses

# Define a tuple with (min_learning_rate, max_learn_rate)
x_limits = ( min(___.___), max(___.___) )

# Set the xscale to logarithmic
ax.set_xscale('log')

# Draw a horizontal line at the minimum loss value
ax.hlines(min(____.___), #Find the minimum loss value to draw a horizontal line
          *x_limits, # the star unpacks x_limits to the expected num of args
          'g')

# Set the limits for drawing the curves
ax.set_xlim(x_limits)
ax.set_ylim(0, ____) # use the initial loss as the top y boundary

# Display gridlines to see better
ax.grid(which='both')

ax.set_xlabel("Learning rate")
ax.set_ylabel("Loss")
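
For reference, here is one possible way to fit the model and produce the plot. It is a sketch that assumes the rates and losses attribute names from the Q6 sketch above.

```python
history = model.fit(X_train, y_train,
                    epochs=1,
                    validation_data=(X_valid, y_valid),
                    callbacks=[exponential_lr_callback])

fig, ax = plt.subplots()
ax.plot(exponential_lr_callback.rates,    # learning rates on the x axis
        exponential_lr_callback.losses)   # losses on the y axis

x_limits = (min(exponential_lr_callback.rates), max(exponential_lr_callback.rates))
ax.set_xscale('log')
ax.hlines(min(exponential_lr_callback.losses), *x_limits, 'g')      # mark the minimum loss
ax.set_xlim(x_limits)
ax.set_ylim(0, exponential_lr_callback.losses[0])                   # initial loss as the top y boundary
ax.grid(which='both')
ax.set_xlabel("Learning rate")
ax.set_ylabel("Loss")
```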

If you used the architecture and learning rate we defined above, you should produce a graph that looks like this:

In this graph, you can see that the loss reaches its minimum at a learning rate of around 6e-1 and then begins to shoot up violently. Let’s avoid that by using half that value (i.e., 3e-1).

If you have a different curve, try setting your learning rate to half of the learning rate with the minimum loss! 😃

Now that we have an idea of what the learning rate should be, let’s go ahead and start from scratch once more.

# Run this cell - let's go back to a clean slate!
K.clear_session()
np.random.seed(rnd_seed)
tf.random.set_seed(rnd_seed)

We also want to instantiate the model again - the weights in our current model are quite bad, and if we use it as is, it won’t be able to learn since the weights are too far from a good solution. There are other ways to do this, but since our model is quite simple, it’s worth it to just redefine and recompile it.

5.2.11. Q10) Redefine and re-compile the model with the learning rate you found in Q9.#

# redefine the model
model = tf.keras.___.___([ # call the sequential model class
    tf.keras.layers.___(), # flatten the data
    tf.keras.layers.___(), # densely connected ReLU layer, 300 units
    tf.keras.layers.___(), # densely connected ReLU layer, 100 units
    tf.keras.layers.___()]) # densely connected Softmax layer, 10 units
____.compile(___=___, # Set the loss function
              ___=___.____.___(___=___), # Set the optimizer and learning rate
              ___=[___]) # Set the metrics
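
A possible solution, using the 3e-1 learning rate suggested above (use half of whatever minimum-loss learning rate your own curve showed):

```python
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),    # flatten the data
    tf.keras.layers.Dense(300, activation="relu"),    # densely connected ReLU layer, 300 units
    tf.keras.layers.Dense(100, activation="relu"),    # densely connected ReLU layer, 100 units
    tf.keras.layers.Dense(10, activation="softmax")   # densely connected Softmax layer, 10 units
])
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.SGD(learning_rate=3e-1),
              metrics=['accuracy'])
```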

We’re now going to set up a saving directory in case you want to try running the model with different learning rates or other hyper-parameters!

#Change this number and rerun this cell whenever you want to change runs
run_index = 1

run_logdir = os.path.join(os.curdir, "my_mnist_logs", "run_{:03d}".format(run_index))

print(run_logdir)

We’ll also set up some additional callbacks.

1. An early stopping callback (documentation here). This callback will stop the training if no improvement is found after a patience number of epochs.
2. A model checkpoint callback (documentation here). This callback will ensure that only the best version of the model is kept (in case your model’s performance reaches a maximum and then deteriorates after a certain number of epochs).
3. A tensorboard callback (documentation here). This callback will enable using Tensorboard to visualize learning curves, metrics, etc. Handy 🙌!

early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=20)
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("my_mnist_model.h5", save_best_only=True)
tensorboard_cb = tf.keras.callbacks.TensorBoard(run_logdir)

Let’s go ahead and fit the model again!

5.2.12. Q11) Fit the updated model for 100 epochs#

history = model.fit(____, # inputs
                    ____, # labels
                    ____=___, #epochs
                    validation_data=(___, ___),
                    callbacks=[checkpoint_cb, early_stopping_cb, tensorboard_cb])
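
One possible way to fill in the blanks, assuming the variable names from the earlier questions:

```python
history = model.fit(X_train, y_train,                  # training inputs and labels
                    epochs=100,                         # train for up to 100 epochs
                    validation_data=(X_valid, y_valid),
                    callbacks=[checkpoint_cb, early_stopping_cb, tensorboard_cb])
```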

Finally, we need to evaluate the performance of our model. Go ahead and try it out on the test set!

5.2.13. Q12) Evaluate the model on the test set.#

Hint 1: Keras models include an evaluate() method that takes in the test set inputs/labels. Here is the documentation.

# Rollback to best model, which was saved by the callback
model = tf.keras.models.load_model("my_mnist_model.h5") # rollback to best model

# Evaluate the model
model.____(____, ____)
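
The evaluation call is a one-liner; here is a sketch assuming the test set names from Q1:

```python
# Evaluate the restored best model on the held-out test set
model.evaluate(X_test, y_test)
```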

Finally, we can use tensorboard to check out our model’s performance! Note that the tensorboard extension was loaded in the notebook setup cell.

%tensorboard --logdir=./my_mnist_logs --port=6006

An enthusiastic (albeit somewhat sick 😷) TA noted that during the development of the notebook the accuracy reached on the test dataset was 97.84%. Additionally, the tensorboard curves from the test run are shown below:

picture