{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/tbeucler/2023_MLEES_JB/blob/main/ML_EES/DL/S4_1_NNs_with_Keras.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2_Zny8rw4lon"
   },
   "source": [
    "# (Exercise) Artificial Neural Networks with Keras\n",
    "\n",
    "This notebook was designed to be run on Google Colab and we recommend clicking on the Google Colab badge to proceed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "nD8r1k_osIxa"
   },
   "source": [
    "![picture](https://unils-my.sharepoint.com/:i:/g/personal/tom_beucler_unil_ch/EWzvoN-LqmBDtvuEXvi3m2MBRr4ACElB77IAfndUaDFVJQ?download=1)\n",
    "\n",
    "<center>\n",
    "<br> Photo Credits: <a href=\"https://unsplash.com/photos/_HRi5kBwGh0\">Galaxy's Edge</a> by <a href=\"https://unsplash.com/@rodlong\">Rod Long</a> licensed under the <a href='https://unsplash.com/license'>Unsplash License</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "O5ROFvTFwNab"
   },
   "source": [
    "> *The defnition of AI is a highly contested concept. It often refers to technologies that demonstrate levels of independent intelligence from humans. By its very\n",
    "defnition, it is an intelligence that is differentiated from natural intelligence; it is\n",
    "a constructed, artificial, or machine intelligence.* <br>\n",
    "$\\quad$Ryan, M. (2020). In AI we trust: ethics, artificial intelligence, and reliability. Science and Engineering Ethics, 26(5), 2749-2767."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8oWOJ_ZG42UM"
   },
   "source": [
    "*This notebook, whose first draft was written by Milton Gomez, covers Chapters 10 of Géron, and builds on the [notebooks made available on _Github_](https://github.com/ageron/handson-ml2).*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "P8n6IT3L5hc2"
   },
   "source": [
    "## **Notebook Setup**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-ZJxkWn35ka2"
   },
   "source": [
    "First, let's import a few common modules, ensure MatplotLib plots figures inline and prepare a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated so we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "LW5QuuAZ4aMk"
   },
   "outputs": [],
   "source": [
    "# Python ≥3.5 is required\n",
    "import sys\n",
    "assert sys.version_info >= (3, 5)\n",
    "\n",
    "# Scikit-Learn ≥0.20 is required\n",
    "import sklearn\n",
    "assert sklearn.__version__ >= \"0.20\"\n",
    "\n",
    "try:\n",
    "    # %tensorflow_version only exists in Colab.\n",
    "    %tensorflow_version 2.x\n",
    "except Exception:\n",
    "    pass\n",
    "\n",
    "# TensorFlow ≥2.0 is required\n",
    "import tensorflow as tf\n",
    "assert tf.__version__ >= \"2.0\"\n",
    "\n",
    "# Common imports\n",
    "import numpy as np\n",
    "import os\n",
    "\n",
    "# to make this notebook's output stable across runs\n",
    "rnd_seed = 42\n",
    "rnd_gen = np.random.default_rng(rnd_seed)\n",
    "\n",
    "# To plot pretty figures\n",
    "%matplotlib inline\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "mpl.rc('axes', labelsize=14)\n",
    "mpl.rc('xtick', labelsize=12)\n",
    "mpl.rc('ytick', labelsize=12)\n",
    "\n",
    "# Where to save the figures\n",
    "PROJECT_ROOT_DIR = \".\"\n",
    "CHAPTER_ID = \"ann\"\n",
    "IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID)\n",
    "os.makedirs(IMAGES_PATH, exist_ok=True)\n",
    "\n",
    "def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n",
    "    path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n",
    "    print(\"Saving figure\", fig_id)\n",
    "    if tight_layout:\n",
    "        plt.tight_layout()\n",
    "    plt.savefig(path, format=fig_extension, dpi=resolution)\n",
    "\n",
    "# Initialize the run_index\n",
    "run_index = None\n",
    "\n",
    "# Loading Tensorboard\n",
    "%load_ext tensorboard"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LcFK6eUo7hZJ"
   },
   "source": [
    "**Data Setup**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kM201pFQ7j0N"
   },
   "source": [
    "Today, we'll once again be working on the MNIST handwritten digit database - we're becoming experts in typography! ✍  \n",
    "\n",
    "Let's begin by importing the dataset from the keras dataset library.\n",
    "\n",
    "## Q1) Load the MNIST dataset from Keras. Divide it into a training, validation, and test dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kN1_koqQGNjP"
   },
   "source": [
    "*Hint 1: To access the Keras library, you can either reimport keras (e.g., `import tensorflow.keras as keras`), or you can access it from the instance of tensorflow we imported during setup (i.e., using `tf.keras`)*\n",
    "\n",
    "*Hint 2: [Here is the documentation](https://keras.io/api/datasets/mnist/) for the Keras implementation of the MNIST dataset*\n",
    "\n",
    "*Hint 3: If you use the `mnist.load_data()` method, what will be returned will be a set of tuples: (training_data, testing_data), where training_data and testing_data are tuples of inputs and labels (X, y)*\n",
    "\n",
    "*Hint 4: You can break down the training dataset from the `.load()` method into a training and validation dataset. Since the full training dataset includes 60 000 samples, try using 50 000 samples as training data and 10 000 samples as validation data.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Xz4UdXhg8vkH"
   },
   "outputs": [],
   "source": [
    "# Load the keras dataset data\n",
    "( (X_train_full, y_train_full) , (_____, _____) ) = _____.mnist._____()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "PcLjZfdIMKrt"
   },
   "outputs": [],
   "source": [
    "# Split the data\n",
    "X_train =\n",
    "X_valid =\n",
    "_______ =\n",
    "_______ ="
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_ViiopXOfS3G"
   },
   "source": [
    "What does our data look like? Let's get an idea of the values and figure out what kind of preprocessing we should do before training our neural network.\n",
    "\n",
    "## Q2) Print the shape of the training, validation, and test sets. Then, print the maximum and minimum input values.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fkZ7STj2GRi7"
   },
   "source": [
    "*Hint 1: You loaded the data as numpy arrays. Thus, you can rely on the built-in methods for finding the shape and min/max values.*\n",
    "\n",
    "*Hint 2: Click for the documentation on [`ndarray.max()`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.max.html), [`ndarray.min()`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.min.html), and [`ndarray.shape`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html)*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "tXhz-sgek-yk"
   },
   "outputs": [],
   "source": [
    "#Write your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FDnMybuFk_lb"
   },
   "source": [
    "If you used the same train/validation split as we did, you should have 50k samples in the training set, 10k in the validation set, and 10k in the test set.\n",
    "\n",
    "Since the data represents grayscale image values, data values should vary between 0 and 255; Normalize the data by dividing it by 255.\n",
    "## Q3) Normalize the input data for the training, validation, and testing sets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "eEfub-HiGblo"
   },
   "source": [
    "*Hint 1: The datasets are stored as simple numpy arrays, so you can perform arithmetic operations on them!*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "cOpw-2ruonwy"
   },
   "outputs": [],
   "source": [
    "X_train = _____ / 255\n",
    "_____ =\n",
    "_____ ="
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "l-VS2NTVv_kW"
   },
   "source": [
    "We now have the normalized training, validation, and testing data that we'll use to train our neural network. Before moving on, it might be worth it to make a small visualiation of samples in our data to ensure that everything worked out correctly.\n",
    "\n",
    "## Q4) To visualize a sample image, write a function that:\n",
    "\n",
    "<br> <blockquote>1) Takes in an input dataset and its labels, a number of rows, and a number of columns <br> 2) Prints out a random `n_rows` by `n_columns` sample of images with their labels</blockquote>**\n",
    "\n",
    "*Hint 1: You can use the `rnd_seed.integers()` generator to generate a set of integers between 0 and the number of samples, with a size of (rows,columns). [Here is some documentation that can help](https://numpy.org/doc/stable/reference/random/generator.html#simple-random-data). It's best practice to take in the random generator as an argument for your function.*\n",
    "\n",
    "*Hint 2: You can use matplotlib's `fig, axes = plt.subplots()` to make a grid of axes and call the `imshow()` method on each ax in order to plot the digit. It is recommended that you use the `cmap='binary'` argument in imshow to print the digits in black and white*. Click on the links for the documentation to [`plt.sublopts()`](https://matplotlib.org/3.5.0/api/_as_gen/matplotlib.pyplot.subplots.html), [`plt.imshow()`](https://matplotlib.org/3.5.0/api/_as_gen/matplotlib.pyplot.imshow.html), and [the colormaps (i.e., cmap values)](https://matplotlib.org/stable/gallery/color/colormap_reference.html) available in matplotlib.\n",
    "\n",
    "*Hint 3: You can iterate using numpy `ndenumerate()` method, which will return the n-dimensional index of the array and the element located there. This will be useful when iterating through the indices you generated and plotting the corresponding digit and label*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "id": "zcDF_uMuBKBO"
   },
   "outputs": [],
   "source": [
    "#@title Hint 4: Code Snippet, if you're feeling stuck\n",
    "\n",
    "'''\n",
    "def sample_plotter(X, y, n_rows, n_columns, rnd_gen):\n",
    "    assert type(X) == type(np.empty(0))\n",
    "    indices = rnd_gen.integers(0,X.shape[0], size=(n_rows, n_columns))\n",
    "\n",
    "    fig, axes = plt.subplots(n_rows, n_columns, figsize=(8,6))\n",
    "\n",
    "    for idx, element in np.ndenumerate(indices):\n",
    "        axes[idx].imshow(X[element], cmap='binary')\n",
    "        axes[idx].axis('off')\n",
    "        axes[idx].title.set_text(y[element])\n",
    "    return\n",
    "''';"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "qtmdxBHCEs_o"
   },
   "outputs": [],
   "source": [
    "def sample_plotter(___, ___, ___, ___):\n",
    "\n",
    "    # Create a set of indices to access the sample images/labels\n",
    "\n",
    "    # Create a figure with n_rows and n_columns\n",
    "\n",
    "    # Plot each selected digit\n",
    "    for in :\n",
    "\n",
    "\n",
    "    return None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Txut6AxUEMac"
   },
   "source": [
    "Now that our function is defined, let's go ahead and print out a 4 row by 8 column sample from each dataset.\n",
    "\n",
    "## Q5) Grab a 4x8 sample of digits from each dataset and print out the image and labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "YZoG0jWzELBI"
   },
   "outputs": [],
   "source": [
    "#Write your code here!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HLkcWwScg-nk"
   },
   "source": [
    "We're now ready to start developing our neural network. The first thing that we want to do is figure out an appropriate learning rate for our model - after all, we want to choose one that converges to a solution *and* is the least computationally expensive possible.\n",
    "\n",
    "Let's start by setting up a keras *callback* [(click here for the documentation)](https://keras.io/api/callbacks/), a type of object that will allow us to change the learning rate after every iteration (i.e., after every batch of data). We will set up what is called an exponential learning rate (that is, the learning will increase by a factor of $k$ after each iteration). Expressed mathematically,\n",
    "\\begin{align}\n",
    "\\eta_{\\scriptsize{t}} = \\eta_{\\scriptsize{0}} \\, \\cdot \\, k^{\\scriptsize{t}}\n",
    "\\end{align}\n",
    "where $t$ is the current iteration.\n",
    "\n",
    "As a reminder, an epoch is an iteration through the entire training dataset, while a batch is an iteration through a predefined subset of . It's important to make this distinction, as ML algorithms are often trained in batches when dealing with large datasets, and we *normally* do not want to change the learning rate in between batches during model training. However, we will do so during this evaluation phase in order to determine an adequate learning rate.\n",
    "\n",
    "We will therefore set a callback that will do two things after the end of each batch:\n",
    "\n",
    "> 1) Keep a track of the losses <br> 2) Adjust the learning rate by multiplying it by a predefined factor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Gcl58OOaGBxV"
   },
   "source": [
    "## Q6) Set up an *Exponential_Learning_Rate* callback that, after each batch, logs the value of the loss function and learning rate, and then multiplies the learning rate by a factor of $k$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LwAuEHkiF_Xt"
   },
   "source": [
    "*Hint 1: Multiple backend options are available with Keras. We will be using tensorflow, but the code is thought to be written in such a way that a different backend **could** be used. `tf.keras.backend` has a `.backend()` method that allows you to check what backend is being used.*\n",
    "\n",
    "*Hint 2: You should extend the `tf.keras.callbacks.Callback` class. (Confused about extending classes? [Here is a question on stack overflow](https://stackoverflow.com/questions/15526858/how-to-extend-a-class-in-python) that could provide some context) *\n",
    "\n",
    "*Hint 3: The ExponentialLearningRate callback we will implement will need to take in the $k$ factor during its initialization ([here's a quick overview](https://stackoverflow.com/questions/625083/what-do-init-and-self-do-in-python) on the __init__ contructor method and **self** arguments in classes, with a focus on python.). You will also need to save an empty list as an attribute for both the losses and the learning rates*\n",
    "\n",
    "*Hint 4: Keras model optimizers have an attribute where the learning rate is stored: `model.optimizer.learning_rate`. In order to read the value, you will have to use the keras backend's `.get_value()` method with the model's learning rate as an argument*\n",
    "\n",
    "*Hint 5: the on_train_batch_end method pass the `logs` argument into the function. You can access the loss function by using `logs['loss']`*\n",
    "\n",
    "*Hint 6: In order to set the learning rate to a different value, you will have to depend on the keras backend's `.set_value()` method. This method takes in two arguments: the first is the value that will be set (e.g., the learning rate in the model's optimizer) and the value that it will be set to (e.g., the learning rate multiplied by the k factor).*\n",
    "\n",
    "*Hint 7: Unlike in other documentations we've seen, `backend.get_value()` and `backend.set_value()` don't yet have their own page. However, [here is the link](https://www.tensorflow.org/guide/keras/custom_callback#learning_rate_scheduling) to an example where both methods are used in a learning rate scheduler.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "fUNJlt80ran6"
   },
   "outputs": [],
   "source": [
    "# We'll start by making it easier to access the keras backend. See hint #1 for\n",
    "# more details\n",
    "K = tf.keras.backend\n",
    "\n",
    "# Use the .backend() method to determine what backend we're running\n",
    "___.___"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Xh9OakL3r6hf"
   },
   "outputs": [],
   "source": [
    "# Remember that you can access the keras.backend using K, which we defined in\n",
    "# the code cell above!\n",
    "\n",
    "class _____(____.____.____.Callback): #define the ExponentialLearningRate class\n",
    "    # Start\n",
    "    def __init__(self, factor):\n",
    "        self.____ = ____ # set the factor\n",
    "        self.____ = ____ # initialize the losses list\n",
    "        self.____ = ____ # initialize the learning rates list\n",
    "\n",
    "    def on_batch_end(self, batch, logs):\n",
    "        # Add the value of the learning rate to the list\n",
    "        self.___.append(__.___(self.model.___.___))\n",
    "\n",
    "        # Add the value of the loss\n",
    "        self.___.append(___[___])\n",
    "\n",
    "        # Set the value of the\n",
    "        ___.___(self.model.___.___, self.model.___.___ * self.___)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TOafIE55Z4LK"
   },
   "source": [
    "Now that we've defined out callback, we can go ahead and start thinking about our neural network. For consistency's sake, let's start by clearing the Keras backend and setting our random state."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "rC5MYnIFaVdR"
   },
   "outputs": [],
   "source": [
    "# Run this cell\n",
    "K.clear_session()\n",
    "np.random.seed(rnd_seed)\n",
    "tf.random.set_seed(rnd_seed)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cpoubGhNnVew"
   },
   "source": [
    "Let's make a simple neural network model using Keras. For this, we will rely on a [*Sequential model*](https://keras.io/guides/sequential_model/), since we will want all of the inputs of one layer to be fed into the next layer. We recommend using the architecture described in the diagram below, but feel free to define your own architecture!\n",
    "\n",
    "<center><img width=60% src='https://unils-my.sharepoint.com/:i:/g/personal/tom_beucler_unil_ch/ETl6L_3bHENFt6ZDSgaCpIEBkg2cNPDGowc8u5V8Gxe7XQ?download=1'></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "P84Ul1x27QJ3"
   },
   "source": [
    "## Q7) Write a sequential Keras model that will predict the digit class.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-fX-GC0_8IuC"
   },
   "source": [
    "*Hint 1: You can add the layers in the sequential model when initializing the model. It expects the layers in a list. Alternatively, you can add them one by one using the model's `.add()` method. [Check out the documentation here](https://keras.io/guides/sequential_model/#creating-a-sequential-model).*\n",
    "\n",
    "*Hint 2: The input images should be flattened before feeding them into any densely connected layers. [Here is the documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) for the flatten layer.*\n",
    "\n",
    "*Hint 3: You want to use simple, densely connected layers for this exercise. [Here is the documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) for the dense layer.*\n",
    "\n",
    "*Hint 4: Using a dense layer with the number of units set to the number of classes (e.g., the number of different digits in the MNIST dataset: 10) using a softmax activation unit can be interpreted as a probability of the input belonging to a given class. [Here is the documentation](https://keras.io/api/layers/activations/#softmax-function) for the softmax activation function in Keras*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "TPuhRna87LkT"
   },
   "outputs": [],
   "source": [
    "# Create your model! Feel free to use our outline, or make your own from scratch\n",
    "\n",
    "model = tf.___.____.sequential([  # call the keras sequential model class\n",
    "                            ___,  # 1st Layer\n",
    "                            ___,  # 2nd Layer\n",
    "                            ___,  # 3rd Layer\n",
    "                            ___]) # 4th Layer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3BAK1EINKHaN"
   },
   "source": [
    "Now that we have a model defined, we need to run its `.compile()' method, in which we will give the model the following hyper-parameters:\n",
    "> 1) Loss function will be set to sparse categorical cross entropy <br> 2) The optimizer will be set to Stochastic Gradient Descent with a learning rate of 1e-3 <br> 3) The model metrics will include the accuracy score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "h7Dkr99kKvNL"
   },
   "source": [
    "## Q8) Compile the model with the given hyperparameters (i.e., loss function, optimizer, and metrics) and instantiate the callback we defined previously using a $k$ factor of 1.005 (i.e., a 0.5% increase in learning rate per batch)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2aNgAFTsLEjd"
   },
   "source": [
    "*Hint 1: [Here is the documentation](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/sparse_categorical_crossentropy) for the sparse categorical cross entropy loss function in keras. You can simply reference the function using `loss='sparse_categorical_crossentropy'` when compiling.*\n",
    "\n",
    "*Hint 2: [Here is the documentation](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD) for the Stochastic Gradient Descent optimizer in keras*\n",
    "\n",
    "*Hint 3: [Here is the documentation](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Accuracy) for the accuracy score implementation in keras. Like with the sparse_categorical_cross_entropy loss, you can reference the accuracy score in the metrics list, e.g. by setting `metrics=['accuracy']` when compiling.*\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "e03v7LLuMfSP"
   },
   "outputs": [],
   "source": [
    "____.compile(___=___, # Set the loss function\n",
    "              ___=___.____.___(___=___), # Set the optimizer and learning rate\n",
    "              ___=[___]) # Set the metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "PRYxo1QANhA1"
   },
   "outputs": [],
   "source": [
    "exponential_lr_callback = _____(factor=____)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lNIgkpyeM92K"
   },
   "source": [
    "Let's go ahead and train the compiled model for a single epoch.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TiyNQuouRMwQ"
   },
   "source": [
    "## Q9) Fit the model for a single epoch, using the exponential learning rate callback we defined in the previous code cell. Then, plot the Loss vs Learning rate.\n",
    "\n",
    "*Hint 1: Just like in scikit-learn, the keras model includes a `.fit()` method to train the algorithm! [Here is the documentation](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit).*\n",
    "\n",
    "*Hint 2: After training, you can access the recorded losses and corresponding learning rates using the attributes we defined when we defined the class in Q5!*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "OLjEwl0nRbp4"
   },
   "outputs": [],
   "source": [
    "\n",
    "history = model.___(____, # set the training inputs\n",
    "                    ____, # set the training labels\n",
    "                    ____=__, # set the number of epochs\n",
    "                    validation_data=(____, ____), # set validation input/labels\n",
    "                    callbacks=[_____]) # Set the callback"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "QrQb1q3DSt4X"
   },
   "outputs": [],
   "source": [
    "# Plotting\n",
    "fig, ax = plt.subplots()\n",
    "\n",
    "ax.plot(___.___, # learning rates\n",
    "        ___.___) # losses\n",
    "\n",
    "# Define a tuple with (min_learning_rate, max_learn_rate)\n",
    "x_limits = ( min(___.___), max(___.___) )\n",
    "\n",
    "# Set the xscale to logarithmic\n",
    "ax.set_xscale('log')\n",
    "\n",
    "# Draw a horizontal line at the minimum loss value\n",
    "ax.hlines(min(____.___), #Find the minimum loss value to draw a horizontal line\n",
    "          *x_limits, # the star unpacks x_limits to the expected num of args\n",
    "          'g')\n",
    "\n",
    "# Set the limits for drawing the curves\n",
    "ax.set_xlim(x_limits)\n",
    "ax.set_ylim(0, ____) # use the initial loss as the top y boundary\n",
    "\n",
    "# Display gridlines to see better\n",
    "ax.grid(which='both')\n",
    "\n",
    "ax.set_xlabel(\"Learning rate\")\n",
    "ax.set_ylabel(\"Loss\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fFppBDPtXSdg"
   },
   "source": [
    "If you used the architecture we defined above with the learning rate we defined above, you should produce a graph that looks like this:\n",
    "<center> <img src='https://unils-my.sharepoint.com/:i:/g/personal/tom_beucler_unil_ch/EUhU2fuy3K1Nm5iuia8ocF8BB2-jU_pAf6h5TA8MaIqrfw?download=1'> </center>\n",
    "\n",
    "In this graph, you can see that the loss reaches a minimum at around 6e-1 and then begins to shoot up violently. Let's avoid that by using half that value (e.g., 3e-1).\n",
    "\n",
    "If you have a different curve, try setting your learning rate to half of the learning rate with the minimum loss! 😃\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gzSXelTGz5G8"
   },
   "source": [
    "Now that we have an idea of what the learning rate should be, let's go ahead and start from scratch once more."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "XwJi4Xi6kVZY"
   },
   "outputs": [],
   "source": [
    "# Run this cell - let's go back to a clean slate!\n",
    "K.clear_session()\n",
    "np.random.seed(rnd_seed)\n",
    "tf.random.set_seed(rnd_seed)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hlt-kl220Bhu"
   },
   "source": [
    "We also want to instantiate the model again - the weights in our current model are quite bad and if we use it as is it won't be able to learn since the weights are too far away from the solution. There are other ways to do this, but since our model is quite simple it's worth it to just redefine and recompile it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9EBUr0WPatQn"
   },
   "source": [
    "## Q10) Redefine and re-compile the model with the learning rate you found in Q9."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "naT8MpoklIY0"
   },
   "outputs": [],
   "source": [
    "# redefine the model\n",
    "model = tf.keras.___.___([ # call the sequential model class\n",
    "    tf.keras.layers.___(), # flatten the data\n",
    "    tf.keras.layers.___(), # densely connected ReLU layer, 300 units\n",
    "    tf.keras.layers.___(), # densely connected ReLU layer, 100 units\n",
    "    tf.keras.layers.___())] # densely connected Softmax layer, 10 units\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ih9ddZA2bJEE"
   },
   "outputs": [],
   "source": [
    "____.compile(___=___, # Set the loss function\n",
    "              ___=___.____.___(___=___), # Set the optimizer and learning rate\n",
    "              ___=[___]) # Set the metrics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jLtjB_1gbgcI"
   },
   "source": [
    "We're now going to set up a saving directory in case you want to try running the model with different learning rates or other hyper-parameters!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Hxqyiz5SbflW"
   },
   "outputs": [],
   "source": [
    "#Change this number and rerun this cell whenever you want to change runs\n",
    "run_index = 1\n",
    "\n",
    "run_logdir = os.path.join(os.curdir, \"my_mnist_logs\", \"run_{:03d}\".format(run_index))\n",
    "\n",
    "print(run_logdir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "n232QFo3bulH"
   },
   "source": [
    "We'll also set up some additional callbacks.\n",
    "> 1) An early stopping callback ([documentation here](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping)). This callback will stop the training if no improvement is found after a `patience` number of epochs. <br> 2) A model checkpoint callback ([documentation here](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint)). This callback will ensure that only the best version of the model is kept (in case your model's performance reaches a maximum and then deteriorates after a certain number of epochs) <br> 3) A tensorboard callback ([documentation here](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/TensorBoard)). This callback will enable using Tensorboard to visualize learning curves, metrics, etc. Handy 🙌!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9m92fjw3bzhp"
   },
   "outputs": [],
   "source": [
    "early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=20)\n",
    "checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(\"my_mnist_model.h5\", save_best_only=True)\n",
    "tensorboard_cb = tf.keras.callbacks.TensorBoard(run_logdir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gohJkkWOd0Qx"
   },
   "source": [
    "Let's go ahead and fit the model again!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hHX8OjI7fJfN"
   },
   "source": [
    "## Q11) Fit the updated model for 100 epochs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "YeN1mGqdb2EK"
   },
   "outputs": [],
   "source": [
    "history = model.fit(____, # inputs\n",
    "                    ____, # labels\n",
    "                    ____=___, #epochs\n",
    "                    validation_data=(___, ___),\n",
    "                    callbacks=[checkpoint_cb, early_stopping_cb, tensorboard_cb])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LvrFsiYJeQ3u"
   },
   "source": [
    "Finally, we need to evaluate the performance of our model. Go ahead and try it out on the test set!\n",
    "\n",
    "## Q12) Evaluate the model on the test set.\n",
    "\n",
    "*Hint 1: Keras models include an `evaluate()` method that takes in the test set inputs/labels. [Here is the documentation](https://www.tensorflow.org/api_docs/python/tf/keras/Model#evaluate).*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "-0FIYQzleRWt"
   },
   "outputs": [],
   "source": [
    "# Rollback to best model, which was saved by the callback\n",
    "model = tf.keras.models.load_model(\"my_mnist_model.h5\") # rollback to best model\n",
    "\n",
    "# Evaluate the model\n",
    "model.____(____, ____)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5y0JJitTgYM9"
   },
   "source": [
    "Finally, we can use tensorboard to check out our model's performance! Note that the tensorboard extension was loaded in the notebook setup cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NAnYPOQrgd6m"
   },
   "outputs": [],
   "source": [
    "%tensorboard --logdir=./my_mnist_logs --port=6006"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HE_XjLrccEtn"
   },
   "source": [
    "An enthusiastic (albeit somewhat sick 😷) TA noted that during the development of the notebook the accuracy reached on the test dataset was 97.84%. Additionally, the tensorboard curves from the test run is given below:\n",
    "\n",
    "\n",
    "![picture](https://unils-my.sharepoint.com/:i:/g/personal/tom_beucler_unil_ch/EXPT4jVOfNZJpkSqD4wNktMByxa9LmH-uq0EU6PIaul27Q?download=1)"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [
    "_ViiopXOfS3G",
    "l-VS2NTVv_kW"
   ],
   "include_colab_link": true,
   "name": "S4_1_NNs_with_Keras.ipynb",
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}