Open In Colab

6.2. Deep Computer Vision#

Top: Picture of Shinkyo Bridge in Nikko (日光の神橋) (2010), ©Milton Gomez 😀
Bottom: Style transfers based on the above image, using Google’s Deep Dream Generator

Convolutional networks find spatial relationships in multidimensional data (most often images), allowing computers to detect patterns that can be used to make predictions or to modify the input images in surprising ways. The following image shows a series of feature map visualizations for a convolutional neural network.

The images may not seem immediately relevant at first glance without knowing a bit more about the input. For example, with the picture of a cat as an input, the CNN could extract features related to the eyes, ears, or details as fine as pupil types!

For the style transfer examples shown at the beginning of the notebook, the information in the filters is used to iteratively transform the original image until its style is virtually indistinguishable from that of a target style image (not shown).

Convolution Visualization images from: Qin, Z., Yu, F., Liu, C., & Chen, X. (2018). How convolutional neural network see the world-A survey of convolutional neural network visualization methods. arXiv preprint arXiv:1804.11191.

6.2.1. Notebook Setup#

Today we’ll be training CNNs, a process which can be very slow when run on a CPU. Instead, we’ll be relying on GPU processing! Thankfully, we can easily make the switch on Colab!

(If you’re running the notebooks on your own computer, you may have to jump through a few hoops in order to take advantage of your computer’s GPU)

Do note, however, that access to GPUs on Colab is somewhat limited - running your model’s training too many times may temporarily limit your access to Google’s GPU resources.

### Changing the Runtime to GPU on Colab


Click on the "Runtime" dropdown menu and select "Change runtime type".


Then select "GPU" from the hardware accelerator dropdown menu.

Once you’ve changed the runtime type, run the Notebook setup cell. A message confirming that you’ve successfully changed the runtime type should be printed 😃

In the setup cell, let’s import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures. We’ll also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated, so we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0.

# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Is this notebook running on Colab or Kaggle?
IS_COLAB = "google.colab" in sys.modules
IS_KAGGLE = "kaggle_secrets" in sys.modules

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# TensorFlow ≥2.0 is required
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
assert tf.__version__ >= "2.0"

if not tf.config.list_physical_devices('GPU'):
    print("No GPU was detected. CNNs can be very slow without a GPU.")
    if IS_COLAB:
        print("Go to Runtime > Change runtime and select a GPU hardware accelerator.")
    if IS_KAGGLE:
        print("Go to Settings > Accelerator and select GPU.")
else:
    print(f"GPU runtime succesfully selected! We're ready to train our CNNs.")

# Common imports
import numpy as np
import os
import pooch

# to make this notebook's output stable across runs
rnd_seed = 42
rnd_gen = np.random.default_rng(rnd_seed)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "cnn"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

# Loading Tensorboard
%load_ext tensorboard

6.2.2. Data Setup#

Today, we won’t be working on the MNIST dataset! Instead, we’ll be working on the TensorFlow Flowers dataset (tf_flowers), and we’ll be attempting to train a neural network to classify each image as 1 of 5 flower species: daisies, dandelions, roses, sunflowers, and tulips.

Let’s begin by loading the data into our Colab environment. The data is hosted online and loaded directly onto Google’s servers (if you’re running this notebook on Colab, that is!) - which is lucky, since it’s about two hundred megabytes of data and we can take advantage of Google’s servers’ download speeds 🤖

# Let's clear out the backend and set our random seeds
# Consistency makes things easier for labs!
keras.backend.clear_session()
tf.random.set_seed(rnd_seed)
np.random.seed(rnd_seed)

6.2.3. Q1) Load the tf_flowers dataset from Tensorflow. Split it into a training, validation, and test set. Make sure you save the dataset information (it will be useful for sampling the datasets), and shuffle the files for good measure!#

Hint 1: Tensorflow Datasets was imported as tfds in the notebook setup. Check out the tfds.load() method on the documentation.

Hint 2: If you use the .load() method with with_info set to True, the function will return the requested data splits in a tuple, with the dataset information as a separate variable.

Hint 3: The datasets are loaded with as_supervised set to False by default, which requires that we deal with dictionaries when trying to access the data. We can make our lives easier by setting it to True.

Hint 4: The dataset we’ll be using today includes a single train set, but by specifying the split list we can tell tfds how we want it to split that data. We can use percentages in the split indices to indicate which portion of the data goes into each set.

#@title Hint 5: One example implementation

'''
(test_set, valid_set, train_set), info = tfds.load(
                name="tf_flowers",
                split=["train[:10%]", "train[10%:25%]", "train[25%:]"],
                as_supervised=True,
                shuffle_files=True,
                with_info=True
                )

# Datasets loaded this way don't have a string ID to identify them, so we'll set
# up our own as it will make other code more compact/readable. :)
train_set.name='Training'
valid_set.name='Validation'
test_set.name='Test'
''';
(_____, _____, _____), _____ = tfds.load(
                name="tf_flowers",
                split=["train[:___%]", "train[___%:___%]", "train[___%:]"],
                as_supervised=_____,
                shuffle_files=_____,
                with_info=_____
                )

# Datasets loaded this way don't have a string ID to identify them, so we'll set
# up our own as it will make other code more compact/readable. :)
_____.name='Training'
_____.name='Validation'
_____.name='Test'

We now have a set of variables that have the training, validation, and test sets, as well as a variable with the information about the dataset. Let’s go ahead and define a function that will let us visualize our data.

6.2.4. Q2) Define a function that takes in a dataset and the information about the dataset, prints out how many samples are in the dataset, and displays a set of samples from the dataset.#

Hint 1: Tensorflow datasets include a .cardinality() method that counts the number of datapoints in the dataset; the resulting tensor has a .numpy() method that converts the value into an easily printable format

Hint 2: We defined the .name attribute for each dataset in the previous cell!

Hint 3: tfds includes a show_examples method. Here is the documentation.

Hint 4: You can shuffle the contents of a dataset by calling its .shuffle() method. Try running it with an integer value between 16 and 256 as an argument.

Hint 5: show_examples expects a (dataset) and a (dataset info) object as arguments!
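
If you get stuck, here’s one example implementation following the hints above (a sketch, not the only valid answer - the shuffle buffer size of 64 is just one reasonable choice):

#@title Hint 6: One example implementation

'''
def dataset_info(dataset, info):
    # Extract the number of samples in the dataset in an easily printable format
    num_samples = dataset.cardinality().numpy()

    # Print the dataset name we defined in the previous code cell and the number
    # of samples
    print(f"\n{dataset.name} set contains {num_samples} data samples.",
          "Let's visualize some of them...\n")

    # Show examples from the dataset. Shuffle to make things more interesting!
    tfds.show_examples(dataset.shuffle(64), info)
''';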

def dataset_info(__, _____):
    # Extract the number of samples in the dataset in an easily printable format
    num_samples = __._____()._____()

    # Print the dataset name we defined in the previous code cell and the number
    # of samples
    print(f"\n{____._____} set contains {num_samples} data samples.",
          "Let's visualize some of them...\n")

    # Show examples from the dataset. Shuffle to make things more interesting!
    tfds._____(___._____(__), _____)

And now let’s run the function on each of our training, validation, and test sets…

6.2.5. Q3) Run your defined visualization function on each of the training, validation, and test sets.#
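
If you’re unsure, here’s one example, assuming the variable names from the Q1 example implementation:

#@title Hint: One example implementation

'''
dataset_info(train_set, info)
dataset_info(valid_set, info)
dataset_info(test_set, info)
''';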

dataset_info(_____, _____)
dataset_info(_____, _____)
dataset_info(_____, _____)

If everything worked out fine, you’ll have something like the following as your output:

The flowers look very nice! (“And I’d be pretty bad at classifying them myself…” - a botanically challenged TA)

However, there is one sore point for our purposes - the images have different resolutions. Why is this a sore point? Well, in our architecture we’ll eventually flatten our convolutions and connect them to a dense layer, so we will need all of the images to have the same dimensions! (There are other ways to address the issue of resolution, but we won’t discuss them for now)

We also note that the images are stored as pixels containing a value between 0 and 255 for each one of three color channels (RGB) - we’d prefer that the values be normalized to fall between 0 and 1.

Finally, if you paid close attention to the labels on the nice images we displayed, you may have noticed that there is a number between 0 and 4 next to the name of each of the flowers - this is the integer value associated with the label. We’ve previously seen, however, that when addressing classification problems it’s often better to use one-hot encoding.
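
For example, one-hot encoding the integer label 2 with 5 classes yields a vector with a single 1 in position 2:

tf.one_hot(2, depth=5)
# -> tf.Tensor([0. 0. 1. 0. 0.], shape=(5,), dtype=float32)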

Let’s write a preprocessing function that will help us address all of these issues!

6.2.6. Q4) Write a preprocessing function for the images and labels in our dataset. The function should set the image size to 128x128, normalize the pixel values to be between 0 and 1, and one-hot encode the labels#

Hint 1: As we’ve set the as_supervised argument to True when loading the dataset, the preprocessing function should take in an image and a label as arguments. Any other parameters used by the function should be hardcoded inside it.

Hint 2: In order to modify the image fed into the function, we will have to convert it to a float32 type using Tensorflow’s .cast() method (here is its documentation). Similarly, we need to cast the label as an int32 type object.

Hint 3: Tensorflow has a built-in image resizer, implemented as the image.resize() method. Here is the documentation.

Hint 4: Tensorflow has a built-in one-hot encoder, implemented as the one_hot() method. Here is the documentation

Hint 5: After one-hot encoding, the label should be recast to the float32 datatype

#@title Hint 6: One example function implementation

'''
def preprocessing_function(image, label):
    # We're going to hard code the image size we want to use. We can define this
    # with a lambda function, but we won't really need to change this and it's
    # more trouble than it's worth for us right now :)
    image_size = 128
    num_classes = 5

    image = tf.cast(image, tf.float32)
    # Normalize the pixel values
    image = image / 255.0
    # Resize the image
    image = tf.image.resize(image, (image_size, image_size))

    # Casts to an Int and performs one-hot ops
    label = tf.one_hot(tf.cast(label, tf.int32), num_classes)
    # Recasts it to Float32
    label = tf.cast(label, tf.float32)
    return image, label
''';
def preprocessing_function(image, label):
    # We're going to hard code the image size we want to use. We can define this
    # with a lambda function, but we won't really need to change this and it's
    # more trouble than it's worth for us right now :)
    image_size = _____
    num_classes = _____

    # Cast the image and label datatypes
    image = tf._____(image, tf.float32)
    label = __.__(___,____)

    # Normalize the pixel values. Use a float value in the denominator!
    image = _____ / _____

    # Resize the image
    image = tf._____._____(_____, (_____, _____))

    # Cast the label to int32 and one-hot encode
    label = tf._____(_____, _____)
    # Recast label to Float32
    label = tf.cast(_____, __._____)

    return image, label

6.2.7. Q5) Apply the preprocessing function to each of the training, validation, and test sets.#

Hint 1: the datasets have a .map() method that allow applying a function to each image and label combination in the dataset. Here is the documentation.
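
If you get stuck, here’s one example implementation (assuming the dataset variables from the Q1 example and the function name from the Q4 example):

#@title Hint 2: One example implementation

'''
train = train_set.map(preprocessing_function)
valid = valid_set.map(preprocessing_function)
test = test_set.map(preprocessing_function)
''';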

train = _____.map(_____)
valid = _____.map(_____)
test = _____.map(_____)

At this point I’d also like to point out that our dataset is not set up to be taken in batches. If you call train.take(1), you’ll extract a single image! (Let’s run some code and verify this)

for images, labels in train.take(1):
  print(f'Images shape: {images.numpy().shape} Labels: {labels.numpy().shape}')

We actually want to work with batches of 32 images, so let’s go ahead and batch our datasets.

6.2.8. Q6) Batch each of the training, validation, and test sets#

Hint 1: You can define a batch_size variable to guarantee that the batch size is updated for all three datasets if you change the value and rerun the cell.

Hint 2: Tensorflow datasets include a .batch() method that allows you to easily define the batch size associated with the dataset instance. Here is the documentation.
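
Here’s one example implementation (note that it renames valid to validation, matching the variable used in the cells below):

#@title Hint 3: One example implementation

'''
batch_size = 32

train = train.batch(batch_size)
validation = valid.batch(batch_size)
test = test.batch(batch_size)
''';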

# Define the batch size
batch_size = ______

train = ___.batch(___)
validation = ___.___(___)
test = ___.___(___)

If we now take a sample like we did before, we’ll notice that the first dimension in the shape tuple is our batch size!

for images, labels in validation.take(1):
  print(f'Images shape: {images.numpy().shape} Labels: {labels.numpy().shape}')
  print(f'Max pixel value: {images.numpy().max()}, min pixel value: {images.numpy().min()}')
  print("Here's a sample image:\n")
  fig,ax = plt.subplots()
  ax.axis('off')
  ax.imshow(images[0])

Now that we’ve verified that our data generators work as intended, let’s go ahead and build our model!

6.2.9. Model Setup and Training#

Let’s begin by setting up everything we need to define our callbacks! This time we want to compare multiple runs, and in order to visualize them in TensorBoard we’ll want to generate each run’s log directory name automatically using the current date and time!

6.2.10. Q7) Define a function that returns a filepath with the format './CNN_logs/run_CURRENT-DATE-AND-TIME'#

Hint 1: Numpy includes a method to return the current date and time as a datetime64 object. Call the datetime64 method with 'now' as an argument.

Hint 2: The OS library, imported as os in the notebook setup, allows you to join path strings in a manner appropriate to the operating system using the os.path.join() method. This helps avoid headaches when running code across Windows/Linux/macOS! Here is the documentation.

Hint 3: You can use the .astype() method to convert a numpy datetime64 object to a string.

Hint 4: OS also includes a .curdir attribute that returns the current directory.

Hint 5: If you convert the datetime64 object to a string, it will include the seconds! You can remove these with regular python indexing (i.e., [:-3])
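
If you get stuck, here’s one example implementation following the hints above:

#@title Hint 6: One example implementation

'''
def get_CNN_logdir():
    # np.datetime64('now') returns the current date and time; .astype(str)
    # converts it to a string, and [:-3] trims off the seconds (Hint 5)
    time = np.datetime64('now').astype(str)[:-3]
    run_logdir = os.path.join(os.curdir, "CNN_logs", f"run_{time}")
    return run_logdir
''';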

def get_CNN_logdir():
    time = ___(___).___(___)
    run_logdir = __.____.____(__._____, "CNN_logs", f"run_{____}") # time goes in the fstring
    return run_logdir

Let’s try out our function! It should return something like: ./CNN_logs/run_2022-04-10T18:49

get_CNN_logdir()

Define your callbacks below. For the checkpoint callback checkpoint_cb, we recommend monitoring the validation loss to avoid overfitting. Look for “monitor” in the ModelCheckpoint documentation at this link.

## Q8) Set up an EarlyStopping callback, a ModelCheckpoint callback, and a TensorBoard callback for a CNN model without data augmentation.

Hint 1: Here is the documentation for the EarlyStopping Callback and here is the one for the ModelCheckpoint Callback.
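
Here’s one example implementation (the patience value is just a reasonable choice - feel free to experiment):

#@title Hint 2: One example implementation

'''
early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=10)
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("CNN_unaugmented.h5",
                                                   save_best_only=True,
                                                   monitor='val_loss')
tensorboard_cb = tf.keras.callbacks.TensorBoard(get_CNN_logdir())
''';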

early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=____)
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("CNN_unaugmented.h5",
                                                   save_best_only=True,
                                                   monitor=_____)
tensorboard_cb = tf.keras.callbacks.TensorBoard(get_CNN_logdir())

We now have a set of callbacks to call during training. Let’s go ahead and define the model! But first, let’s clean up our random states…

# Let's clear out the backend and set our random seeds.
# Consistency is key :)
keras.backend.clear_session()
tf.random.set_seed(rnd_seed)
np.random.seed(rnd_seed)

6.2.11. Q9) Define a convolutional neural network model. Do not use data augmentation techniques! We want to use this same architecture + data augmentation later.#

Hint 1: Here is the documentation for the Conv2D layer in tensorflow. Generally, you want to start out with larger kernels and a smaller number of filters and move on to smaller kernels with a large number of filters as your network becomes deeper.

Hint 2: After the convolutional layers, you can use flatten to change the collection of filtered images into a flat array to feed into a densely connected layer! In essence, the CNN is representing the images in a latent space, and a densely connected ANN is connected on top to classify the images based on their representation in the latent space.

Hint 3: Once your model is defined, we’ll be using the .build() and .summary() methods to check how many parameters our model includes! A TA’s model included around 13 million parameters when testing this notebook - significantly more than we used when training our first artificial neural networks.

#@title Hint 4: Example model used during notebook development

'''
# Note: there is at least one small, immediate change that will make the
# performance of this model more effective :)

model = keras.models.Sequential([
    # Convolution 1
    keras.layers.Conv2D(32, kernel_size=7, padding="same", activation="relu"),
    keras.layers.MaxPool2D((3,3)),

    # Convolution 2
    keras.layers.Conv2D(64, kernel_size=5, padding="same", activation="relu"),
    keras.layers.MaxPool2D((2,2)),

    # Convolution 3
    keras.layers.Conv2D(96, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D((2,2)),

    # Convolution 4
    keras.layers.Conv2D(128, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D((2,2)),

    keras.layers.Flatten(),
    keras.layers.Dense(4096, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(5, activation="softmax")
])
''';
model = keras.models.Sequential([
    # Convolution 1
    keras.layers.Conv2D(___, kernel_size=__, padding="same", activation=____),
    keras.layers.MaxPool2D((__,__)),

    # Convolution 2
    keras.layers.Conv2D(___, kernel_size=__, padding="same", activation=____),
    keras.layers.MaxPool2D((__,__)),


    # Convolution 3
    keras.layers.Conv2D(___, kernel_size=__, padding="same", activation=____),
    keras.layers.MaxPool2D((__,__)),


    # Convolution 4
    keras.layers.Conv2D(___, kernel_size=__, padding="same", activation=____),
    keras.layers.MaxPool2D((__,__)),


    keras.layers.Flatten(),
    keras.layers.Dense(____, activation=____),
    keras.layers.Dropout(___),
    keras.layers.Dense(___, activation="softmax")
])
# Build the model using our input image resolution to produce the number of
# parameters in our model...
model.build((None, 128 , 128, 3))

# And visualize the structure of the model
model.summary()

## Q10) Compile the CNN model. We recommend using categorical_crossentropy as the loss function, ‘adam’ as the optimizer, and ‘accuracy’ as a metric
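
If you’re unsure of the exact arguments, here’s one example following the recommendations above:

#@title Hint: One example implementation

'''
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
''';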

model.compile(loss=_____,
              optimizer=_____,
              metrics=[_____])

## Q11) Train the CNN model!

Hint 1: You can use the training data generator directly as your input and labels; the model class will automatically deal with it!

Hint 2: While it’s best to define too many epochs instead of too few (considering that we have an early stopping callback), it’s not worth training over a large number of epochs for this project. Try setting the epoch limit somewhere around 30-50.
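
Here’s one example call (the epoch count is one choice within the 30-50 range suggested above):

#@title Hint 3: One example implementation

'''
history = model.fit(train,
                    epochs=40,
                    validation_data=validation,
                    callbacks=[early_stopping_cb,
                               checkpoint_cb,
                               tensorboard_cb])
''';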

history = model.fit(_____, # Training data generator
                    epochs=____,
                    validation_data=_____, # Validation data generator
                    callbacks=[_____,
                               _____,
                               _____])

Well, the performance of our CNN is likely underwhelming. This is what we saw during notebook development:

The model didn’t have too hard a time learning on the training set, but the validation loss quickly diverged and we started overfitting our training data. 😯

A common way to try to address this is by augmenting our training data - we can flip and rotate our images and it shouldn’t make too large a difference.

“A rose by any other name would smell as sweet” - Shakespeare

“An upside down rose is still a rose” - A significantly less talented poet than Shakespeare
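
If you’d like to see what these augmentations do before training with them, here’s a minimal optional sketch (it assumes the batched train dataset from Q6) that plots four randomly flipped/rotated versions of a single training image:

# Optional sanity check: apply flip/rotation layers to one training image
# and plot a few augmented versions of it.
augmenter = keras.Sequential([
    keras.layers.RandomFlip(),
    keras.layers.RandomRotation(0.1),
])

for images, labels in train.take(1):
    fig, axes = plt.subplots(1, 4, figsize=(12, 3))
    for ax in axes:
        # training=True ensures the random layers are active outside of fit()
        ax.imshow(augmenter(images[:1], training=True)[0].numpy())
        ax.axis('off')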

# Let's clear out the backend and set our random seeds
# It's best to start out from a common point, no?
keras.backend.clear_session()
tf.random.set_seed(rnd_seed)
np.random.seed(rnd_seed)

## Q12) Set up an EarlyStopping callback, a ModelCheckpoint callback, and a TensorBoard callback for a CNN model with data augmentation.

Hint 1: Here is the documentation for the EarlyStopping Callback and here is the one for the ModelCheckpoint Callback.

early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=____)
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("CNN_augmented.h5",
                                                   save_best_only=True,
                                                   monitor=_____)
tensorboard_cb = tf.keras.callbacks.TensorBoard(get_CNN_logdir())

6.2.12. Q13) Train an identical model to that defined in Q9, with the exception of RandomFlip and RandomRotation augmentation layers added before the convolutional layers.#

Hint 1: Here is the documentation for the RandomFlip method.

Hint 2: Here is the documentation for the RandomRotation method.

model = keras.models.Sequential([
    keras.layers.RandomFlip(), # Flip augmentation
    keras.layers.RandomRotation(0.1), # Rotation augmentation

    # Copy your previous model's layers here
])
# Build the model using our input image resolution to produce the number of
# parameters in our model...
model.build((None, 128 , 128, 3))

# And visualize the structure of the model
model.summary()

## Q14) Compile the CNN model. We recommend using categorical_crossentropy as the loss function, ‘adam’ as the optimizer, and ‘accuracy’ as a metric

model.compile(loss=_____,
              optimizer=_____,
              metrics=[_____])
history = model.fit(_____, # Training data generator
                    epochs=____,
                    validation_data=_____, # Validation data generator
                    callbacks=[_____,
                               _____,
                               _____])

If everything went according to plan, your model including data augmentation should perform a little like this:

Now let’s run TensorBoard and compare your two runs!

%tensorboard --logdir=./CNN_logs --port=6006

If everything went well, the model trained on the augmented data should have a lower accuracy on the training set compared to the original, but the behavior across both datasets should be indicative of more meaningful features being extracted and overall better generalization.

(While the use of augmented data is exciting, it isn’t always appropriate - e.g., flipping or rotating maps of atmospheric variables can change their physical meaning)

# Let's load the models!
non_aug_model = keras.models.load_model('CNN_unaugmented.h5')
aug_model = keras.models.load_model('CNN_augmented.h5')

# And test them on the testing dataset
non_aug_model.evaluate(test)
aug_model.evaluate(test)