2.3. (Exercises) Training Models#
This week’s notebook is based on the exercises in Chapter 4 of Géron’s book.
2.3.1. Notebook Setup#
Let’s begin as in the last notebook: importing a few common modules, ensuring Matplotlib plots figures inline, and preparing a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated, so once again we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥ 0.20.
You don’t need to worry about understanding everything that is written in this section.
#@title Run this cell for preliminary requirements. Double click it if you want to check out the source :)
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)
# Is this notebook running on Colab or Kaggle?
IS_COLAB = "google.colab" in sys.modules
# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"
# Common imports
import numpy as np
import os
# To make this notebook's output stable across runs
rnd_seed = 42
rnd_gen = np.random.default_rng(rnd_seed)
# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)
# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "classification"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)
def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)
#Ensure the palmerspenguins dataset is installed
%pip install palmerpenguins --quiet
Data Setup
In this notebook we will be working with the Palmer Penguins dataset. Each entry in the dataset includes the penguin’s species, island, sex, flipper length, body mass, bill length, bill depth, and the year the study was carried out. Let’s take a moment and observe our subjects!
In order: Adélie (Pygoscelis adeliae), Chinstrap (Pygoscelis antarcticus), and Gentoo (Pygoscelis papua) penguins
As you can imagine, this dataset is normally used to train multiclass/multinomial classification algorithms and not binary classification algorithms, since there are more than 2 classes.
“Three classes, even!” - an observant TA
For this exercise, however, we will implement the binary classification algorithm referred to as the logistic regression algorithm (also called logit regression).
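If you’d like a preview of where we’re headed, scikit-learn ships a ready-made LogisticRegression estimator (sklearn.linear_model.LogisticRegression). The sketch below fits it on a few made-up bill measurements - the values and labels are invented purely for illustration; in the rest of the notebook we will build the equivalent model from scratch.
# Optional preview: scikit-learn's built-in logistic regression on made-up numbers.
# We will implement the same idea by hand below.
from sklearn.linear_model import LogisticRegression
toy_X = [[39.1, 18.7], [38.6, 21.2], [46.5, 17.9], [49.9, 16.1]]  # invented bill length/depth pairs
toy_y = [1, 1, 0, 0]                                              # invented labels: 1 = Adelie, 0 = Gentoo
toy_model = LogisticRegression().fit(toy_X, toy_y)
print(toy_model.predict([[40.0, 19.0]]))  # predicts a class for a new (invented) penguin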
# Let's load the Palmer Penguins Dataset!
from palmerpenguins import load_penguins
data = load_penguins()
Like with the Titanic dataset in the previous notebook, the data here is loaded as a Pandas DataFrame. Feel free to play around with it in the cell below!
# The following code will make the dataframe be shown in an interactive table
# inside of Google colab. Use data.head(5) if you're running this locally
from google.colab import data_table
data_table.enable_dataframe_formatter()
data
As we mentioned before, there are three species of penguin in the dataset. However, today we’ll be implementing a binary classification algorithm, which means we need to have exactly two target classes! Let’s go ahead and filter the data so that we keep the Adelie and Gentoo species.
# We define the species that we're interested in
species = ['Adelie','Gentoo']
# And use the .loc method in Pandas to keep only the two species mentioned above
data = data.loc[data['species'].isin(species)]
#@title Today, we'll be learning to classify the penguins based on the length and depth of their bills. Run the cell and take a look at the data! 🔎
import plotly.express as px
# Dimensions for interactive plot
dims = ['bill_length_mm', 'bill_depth_mm']
colors = ['orange','black','lightseagreen']
fig = px.scatter_matrix(
data,
dimensions=dims,
color="species",
color_discrete_sequence = colors
)
fig.show()
We now have a dataframe with all the information that we need. Let’s go ahead and extract the bill length and depth to use as input data, storing it in \(x\). Then we’ll store the labels (i.e., the targets) in \(y\).
2.3.2. Q1) Extract the bill length and bill depth to use as the input vector \(x\), and store the label (i.e., the target data) in \(y\)#
#@title Hints - Data Loading and Filtering
'''
Loading data into X:
You can access multiple columns of a pandas dataframe using a list! The snippet
below will return the species and island associated with each penguin in the
database.
In the cell below, you want to load the bill length and bill depth columns.
Make sure you use the right column name! Copy it from the dataframe view we
printed before, and make sure there aren't any extra spaces
''';
data[['species','island']];
'''
Finding the valid (non-NaN) row indices
Pandas has a built-in function to determine whether a value is a NaN (Not a Number)
value.
mydata.notna() will return True wherever the data isn't a NaN value; chaining
.all(axis=1) then tells you, for each row, whether every entry is valid (i.e., the row contains no NaN values).
''';
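If the notna()/all(axis=1) combination is new to you, here is a minimal sketch on a tiny made-up DataFrame (the columns 'a' and 'b' are invented for illustration):
import numpy as np
import pandas as pd
toy = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, np.nan]})
valid_rows = toy.notna().all(axis=1)  # True only for rows with no NaN at all
print(toy[valid_rows])                # keeps just the first row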
# Load the bill length and depth into X
X = data[______]
# Find the rows where all inputs are valid (i.e., flag the rows without any NaN values)
indices = X._____().all(axis=1)
# Filter out the datapoints using the indices we found
X = X[___]
# We'll also normalize the data using the mean and standard deviation
X = (X - X.mean())/X.std()
# Let's take a look at the input dataset - if you did everything right, you'll
# have 274 entries and printing out X.shape will return (274, 2)
print(X.shape)
We have our input data, but we need a target to predict. We previously filtered the data to only include Adélie and Gentoo penguins, but the species labels are still stored as strings! Let’s convert them to a binary representation (i.e., 0 or 1). Make sure you have the same penguins as in your input!
2.3.3. Q2) Convert the species label to a binary classification, and filter the target data to match the input data.#
#@title Hints - Boolean Representation & Type Conversion
'''
Boolean Representation
You can access the species data by calling data['species']
== is the operator that lets you check if the data is equal to another value
data['island'] == 'Torgersen'
will return True for each row where the penguin was studied on Torgersen island, and False
if it was studied on another island
''';
'''
Type Conversion
Pandas dataframes include a method to change the type of the data being called.
data['bill_length_mm'].astype(int) will return the bill length data as integers
''';
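As a minimal sketch of the comparison-plus-astype pattern (again on a made-up column, not the real data):
import pandas as pd
toy = pd.DataFrame({'species': ['Adelie', 'Gentoo', 'Adelie']})
toy_flags = toy['species'] == 'Adelie'  # boolean Series: True, False, True
print(toy_flags.astype(int))            # integer Series: 1, 0, 1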
# Convert species data into boolean form by checking if the species is Adélie
y = (data['_______'] _____ '_______')
# Filter out the points for which we have NaN values. Reuse the indices from Q1!
y = y[____]
# Convert the boolean data into an integer
y = y._______(_____)
# Print out y! If everything is implemented correctly, you should see a pandas
# Series full of ones and zeroes with 274 rows
print(y)
We now have a set of binary classification data we can use to train an algorithm.
As we saw during our reading, we need to define three things in order to train our algorithm:
- the type of algorithm we will train,
- the cost function (which will tell us how close our prediction is to the truth), and
- a method for updating the parameters in our model according to the value of the cost function (e.g., the gradient descent method).
Let’s begin by defining the type of algorithm we will use. We will train a logistic regression model to differentiate between two classes. A reminder of how the logistic regression algorithm works is given below.
The logistic regression algorithm will thus take an input \(t\) that is a linear combination of the features:
\begin{align} t_n = \beta_0 + \beta_1 X_{1,n} + \beta_2 X_{2,n} \end{align}
where
\(n\) is the ID of the sample
\(X_{\small{1}}\) represents the bill length
\(X_{\small{2}}\) represents the bill depth
This input is then fed into the logistic function, \(\sigma\): \begin{align} \sigma: t\mapsto \dfrac{1}{1+e^{-t}} \end{align}
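For instance, \(\sigma(0) = 1/(1+e^{0}) = 0.5\); large positive values of \(t\) push \(\sigma(t)\) toward 1, while large negative values push it toward 0.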
Let’s define the logistic function for later use.
2.3.4. Q3) Define the logistic function#
#@title Hint - Exponential Function
'''
Numpy includes the exponential function in its library as numpy.exp
https://numpy.org/doc/stable/reference/generated/numpy.exp.html
''';
np.exp(2);
def logistic(in_val):
    # Return the value of the logistic function
    out_value = _________
    return out_value
Now that the logistic function has been defined, we can plot it (this will help us remember what it looks like!) Run the code below - you won’t have to fill anything in for this one 😀 But feel free to show the code and read through it - some of the functions used can be helpful to you down the line!
#@title Run this to plot the logistic function!
# Let's generate an array of 20 points with values from -4 to +4
t = np.linspace(-4,4,20)
# Initiate a figure and axes object using matplotlib
fig, ax = plt.subplots()
# Draw the X and Y axes
ax.axvline(0, c='black', alpha=1)
ax.axhline(0, c='black', alpha=1)
# Draw the threshold line (y_val=0.5) and asymptote (y=1)
[ax.axhline(y_val, c='black', alpha=0.5, linestyle='dotted') for y_val in (0.5,1)]
# Scale things to make the graph look nicer
plt.autoscale(axis='x', tight=True)
# Plot the logistic function. X values from the t vector, y values from logistic(t)
ax.plot(t, logistic(t));
ax.set_xlabel('$t$')
ax.set_ylabel('$\\sigma\\ \\left(t\\right)$')
fig.tight_layout()
With the logistic function, we classify inputs resulting in \(\sigma\geq0.5\) as belonging to class 1, and any value below that as belonging to class 0.
We now have a function which lets us map the bill length and depth of an observation to the class to which it belongs (i.e., whether the measurements correspond to an Adélie or a Gentoo penguin). However, there is a parameter vector \(\theta\) whose values we do not yet have:
\(\theta = [\beta_0, \beta_1, \beta_2]\)
2.3.5. Q4) Set up an array of random numbers between 0 and 1 representing the \(\theta\) vector.#
#@title Hints: Random Number Generation
'''
Random Number Generation
Use `rnd_gen`! If you're not sure how to use it, consult the `default_rng`
documentation at this address:
https://numpy.org/doc/stable/reference/random/generator.html
For instance, you may use the `random` method of `rnd_gen`.
''';
'''
The theta array should have 3 elements in it!
''';
#@title Hint: Code Snippet
'''
rnd_gen.random((___,)) # length of array
''';
theta = ______
In order to determine whether one set of \(\beta\) values is better than another, we need to quantify how well the values are able to predict the class. This is where the cost function comes in.
The cost function, \(c\), will return a value close to zero when the prediction, \(\hat{p}\), is correct and a large value when it is wrong. In a binary classification problem, we can use the log loss function. For a single prediction and truth value, it is given by: \begin{align} c(\hat{p},y) = \begin{cases} -\log(\hat{p}) & \text{if } y=1\\ -\log(1-\hat{p}) & \text{if } y=0 \end{cases} \end{align}
However, we want to apply the cost function to an \(n\)-dimensional set of predictions and truth values. Thankfully, we can find the average value of the log loss function \(J\) for an \(n\)-dimensional set of \(\hat{p}\) & \(y\) as follows:
\begin{align} J(\mathbf{\hat{p}},\mathbf{y}) = - \dfrac{1}{n} \sum_{i=1}^{n} \left[ y_i\cdot \log\left( \hat{p}_i \right) + \left( 1 - y_i \right) \cdot \log\left( 1-\hat{p}_i \right) \right] \end{align}
We now have a formula that can be used to calculate the average cost over the training set of data.
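As a quick hand-computed sanity check of the formula (toy numbers, not part of the exercise): with \(\hat{p} = [0.9, 0.2]\) and \(y = [1, 0]\), we get \(J = -\frac{1}{2}\left[\log(0.9) + \log(0.8)\right] \approx 0.16\), a small loss for confident, correct predictions. Had the first prediction been \(\hat{p}_1 = 0.9\) with \(y_1 = 0\), that sample alone would contribute \(-\log(0.1) \approx 2.3\).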
Now let’s code 💻
2.3.6. Q5) Define a log_loss function that takes in an arbitrarily large set of prediction and truths#
Hint 1: You need to encode the function \(J\) above, for which Numpy’s functions may be quite convenient (e.g., `log`, `mean`, etc.)
Hint 2: Asserting the dimensions of the vectors is a good way to check that your function is working correctly. Here’s a tutorial on how to use `assert`. For instance, to assert that two vectors `X` and `y` have the same dimension, you may use:
assert X.shape==y.shape
#@title Hint: Example code snippet
'''
J_vector = -(y * np.log(p_hat + epsilon) + (1-y) * np.log(1 - p_hat))
J_vector.mean()
''';
def log_loss(p_hat, y, epsilon=1e-7):
    # Begin by calculating the two possibilities for the cost function, i.e.
    # 1: -log(p_hat + epsilon), and 2: -log(1 - p_hat). We added an epsilon term
    # to -log(p_hat) because we can run into mathematical problems if p_hat = 0.
    term_1 = -np.___( _____ + _____ )
    term_2 = -np.___( 1 - ____ )
    # We can almost calculate J! We'll need to 1) multiply term_1 by y, and
    # 2) multiply term_2 by (1-y). We then add the new terms together.
    # Calculate the value of the cost function (i.e., what's inside the brackets)
    inside_brackets = (__) * term_1 + ( ___ - ___ ) * term_2
    # Verify the shape of inside_brackets.
    print(f'The size of the term inside the brackets is {inside_brackets.shape}')
    # You should have a cost value for each one of your predictions. We won't
    # use the individual values, though. We'll aggregate the information from
    # all our predictions by calculating the mean! (i.e., 1/n_terms * terms_sum)
    # This single value is J
    J = _____.mean()
    return J
We now have a way of quantifying how good our predictions are. The final thing needed for us to train our algorithm is figuring out a way to update the parameters in a way that improves the average quality of our predictions.
Warning: we’ll go into a bit of math below
Let’s look at the change in a single parameter within \(\theta\): \(\beta_1\) (given \(X_{1,i} = X_1\), \(\;\hat{p}_{i} = \hat{p}\), \(\;y_{i} = y\)). If we want to know what effect changing the value of \(\beta_1\) will have on the log loss function, we can find this with the partial derivative:
\begin{align} \dfrac{\partial J}{\partial \beta_1} \end{align}
This may not seem very helpful by itself - after all, \(\beta_1\) isn’t even in the expression of \(J\). But if we use the chain rule, we can rewrite the expression as:
\begin{align} \dfrac{\partial J}{\partial \beta_1} = \dfrac{\partial J}{\partial \hat{p}} \cdot \dfrac{\partial \hat{p}}{\partial t} \cdot \dfrac{\partial t}{\partial \beta_1} \end{align}
We’ll spare you the math (feel free to verify it yourself, however!):
\begin{align} \dfrac{\partial J}{\partial \hat{p}} = -\dfrac{y}{\hat{p}} + \dfrac{1-y}{1-\hat{p}}, \qquad \dfrac{\partial \hat{p}}{\partial t} = \hat{p}\left(1-\hat{p}\right), \qquad \dfrac{\partial t}{\partial \beta_1} = X_1 \end{align}
and thus
\begin{align} \dfrac{\partial J}{\partial \beta_1} = \left(\hat{p} - y\right) X_1 \end{align}
We can calculate the partial derivative for each parameter in \(\theta\), which, as you may have realized, gives us the \(\theta\) gradient of \(J\): \(\nabla_{\theta}(J)\)
With all of this information, we can now write \(\nabla_{\theta} J\) in terms of the error, the feature vector, and the number of samples we’re training on:
\begin{align} \nabla_{\theta^{(k)}}J = \dfrac{1}{n} \mathbf{X}^{\mathsf{T}} \left( \mathbf{\hat{p}}^{(k)} - \mathbf{y} \right) \end{align}
Note that here \(k\) represents the iteration of the parameters we are currently on.
We now have a gradient we can calculate and use in the batch gradient descent method! The updated parameters will thus be:
\begin{align} \mathbf{\theta}^{(k+1)} = \mathbf{\theta}^{(k)} - \eta\,\nabla_{\theta^{(k)}}J\left(\theta^{(k)}\right) \end{align}
Where \(\eta\) is the learning rate parameter. It’s also worth pointing out that \(\;\hat{p}^{(k)}_i = \sigma\left(\theta^{(k)} \cdot X_i\right) \)
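If the update rule feels abstract, here is a minimal self-contained sketch of gradient descent on a one-dimensional toy function, \(f(w) = (w-3)^2\). It has nothing to do with the penguin data (the variable names are made up); it just shows the update rule in action:
# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3
w = 0.0      # initial parameter guess
eta = 0.1    # learning rate
for k in range(50):
    grad = 2 * (w - 3)   # df/dw
    w = w - eta * grad   # the same update rule as above
print(w)  # ends up very close to 3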
In order to easily calculate the input to the logistic regression, we’ll take the dot product of the \(\theta\) vector with the X data; and since we have a non-zero bias \(\beta_0\), we’d like an X matrix whose first column is filled with ones.
\begin{align}
X_{\small{with\ bias}} = \begin{pmatrix}
1 & X_{1,0} & X_{2,0}\\
1 & X_{1,1} & X_{2,1}\\
\vdots & \vdots & \vdots\\
1 & X_{1,n} & X_{2,n}
\end{pmatrix}
\end{align}
2.3.7. Q6) Prepare the `X_with_bias` matrix.#
#@title Hints: Making an array filled with ones, hints on concatenation
'''
Making the ones array
Making an array with ones and the same number of entries as rows in your input
data: You can use numpy.ones( (array_dimensions) ) in order to generate an array with
the given array_dimensions shape. e.g., np.ones((4,)) => array([1,1,1,1])
Accessing the number of rows: dataframes have the "shape" attribute implemented.
For our penguin data, the input vector shape should be (274,2), and so using
shape[0] should return the right length for our ones array
''';
'''
Concatenation
You can quickly concatenate your arrays using np.c_[array1,array2]. Note that
the order matters, so make sure array1 is the array filled with ones :). Also,
np.c_ uses square brackets! [] - you'll get an error if you use regular
parentheses ().
numpy.c_ will automagically understand that the second array is a dataframe -
you don't need to worry about transforming it into a numpy array for today!
''';
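Here is the same concatenation pattern on a small made-up array (the exercise below uses your penguin data instead):
import numpy as np
toy_features = np.arange(6).reshape(3, 2)   # 3 samples, 2 features
toy_ones = np.ones(toy_features.shape[0])   # one 1 per sample
print(np.c_[toy_ones, toy_features])        # first column is all ones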
# Generate the ones array
ones_array = _______._______(_______.______[___])
# Make the X_with_bias matrix
X_with_bias = ______.___[_______, _______]
# Print your X_with_bias matrix to make sure it looks the way it's supposed to
print(X_with_bias[:10])
Our X_with_bias matrix looks like this:
[[ 1.         -0.69346042  0.92572752]
 [ 1.         -0.6164717   0.28005659]
 [ 1.         -0.46249427  0.57805856]
 [ 1.         -1.15539273  1.22372949]
 [ 1.         -0.65496606  1.86940041]
 [ 1.         -0.73195478  0.47872457]
 [ 1.         -0.67421324  1.37273047]
 [ 1.         -1.65581939  0.62772555]
 [ 1.         -0.13529222  1.67073244]
 [ 1.         -0.94367375  0.13105561]]
2.3.8. Q7) Write a function called `predict` that takes in the parameter vector \(\theta\) and the `X_with_bias` matrix and evaluates the logistic function for each of the samples.#
#@title Hint: Pseudocode Snippet
'''
Pseudocode below:
define predict_function(x_with_bias, theta_vector):
argument_for_logistic_function = dot_product(x_with_bias, theta_vector)
return logistic_function(argument_for_logistic_function)
''';
# Write your predict function here
def predict(____, ____):
    # Find the dot product of X_with_bias and theta
    dot_product = _______._______(_______,_______)
    # Use your logistic function!
    output = _______(_______)
    return _____ # Return the value you get
# Let's test your predict function!
# Set up debug data and parameters
debug_data = np.c_[np.ones(5), np.linspace(-1,1,10).reshape((-1,2))]
debug_theta = np.array([0.2,0.1,0.9])
print(predict(debug_data, debug_theta))
If everything is set up correctly and you didn’t change the debug data and theta, the output for your predict function should be:
[0.35434369 0.46118934 0.57172409 0.67553632 0.76454801]
2.3.9. Q8) Now that you have a `predict` function, write a `gradient_calculator` function that calculates the gradient for the logistic function.#
Hint: You’ll have to feed `theta`, `X`, and `y` to the `gradient_calculator` function.
Hint: You can use the gradient equation above, \(\nabla_{\theta}J = \frac{1}{n}\,\mathbf{X}^{\mathsf{T}}(\mathbf{\hat{p}} - \mathbf{y})\), to calculate the gradient of the cost function.
#@title Hint: Pseudocode Snippet
'''
define gradient_calculator_function(y, X_with_bias, theta_vector):
    # predicted values using theta and inputs
    prediction = predict(X_with_bias, theta_vector)
    number_of_predictions = len(prediction)
    assert number_of_predictions == len(y)
    error = prediction - y
    X_transpose = transpose(X_with_bias)
    return dot_product(X_transpose, error) / number_of_predictions
''';
def gradient_calculator(_______, _______, _______):
    # Find predicted values using the predict function
    prediction = _______(_______, _______)
    # Assert that you have the same number of predictions as you do targets
    # Otherwise, something went wrong!
    assert len(prediction) == __________
    # Calculate the error
    error = _______ - _______
    # Find the dot product with the input matrix and divide by the number of
    # predictions
    output = _______
    return output
# Let's test the gradient calculator
# Begin by creating dummy labels
debug_labels = np.array([0,0,0,1,1])
# And call the function you defined with the dummy labels and data we made before
print(gradient_calculator(debug_labels, debug_data, debug_theta))
If you kept the same dummy data we included by default in the notebook, you should get [ 0.16546829 -0.19307376 -0.15630302]
as the output of your gradient calculator! 💻
We can now write a function that will train a logistic regression algorithm!
Your `logistic_regression` function needs to:

- Take in a set of training input/output data, validation input/output data, a number of iterations to train for, a set of initial parameters \(\theta\), and a learning rate \(\eta\)
- At each iteration:
  - Generate a set of predictions on the training data. Hint: You may use your function `predict` on the inputs `X_train` from the training set.
  - Calculate and store the loss function for the training data. Hint: You may use your function `log_loss` on the training predictions and the outputs `y_train` from the training set.
  - Calculate the gradient. Hint: You may use your function `gradient_calculator`.
  - Update the \(\theta\) parameters. Hint: You need to implement the update equation \(\theta^{(k+1)} = \theta^{(k)} - \eta\,\nabla_{\theta^{(k)}}J\) given above.
  - Generate a set of predictions on the validation data using the updated parameters. Hint: You may use your function `predict` on the inputs `X_valid` from the validation set.
  - Calculate and store the loss function for the validation data. Hint: You may use your function `log_loss` on the validation predictions and the outputs `y_valid` from the validation set.
  - Bonus: Calculate and store the accuracy of the model on the training and validation data as a metric!
- Return the final set of parameters \(\theta\) & the stored training/validation loss function values (and the accuracy, if you did the bonus)
2.3.10. Q9) Write the `logistic_regression` function#
#@title Hint: Pseudocode Snippet
'''
define logistic_regression(
    X_train,
    y_train,
    X_validation,
    y_validation,
    theta_vector,
    number_of_iterations,
    learning_rate_eta,
):
    # initialize the lists of losses
    training_losses = list()
    validation_losses = list()
    for iteration in range(number_of_iterations):
        train_set_predictions = predict(X_train, theta_vector)
        train_loss = log_loss(train_set_predictions, y_train)
        training_losses.append(train_loss)
        gradient = gradient_calculator(y_train, X_train, theta_vector)
        theta_vector = theta_vector - gradient * learning_rate_eta
        validation_set_predictions = predict(X_validation, theta_vector)
        validation_loss = log_loss(validation_set_predictions, y_validation)
        validation_losses.append(validation_loss)
        print(f'Completed {iteration / number_of_iterations * 100}%')
    return [training_losses, validation_losses], theta_vector
''';
def logistic_regression(_______,
                        _______,
                        _______,
                        _______,
                        _______,
                        num_iters,
                        _______,
                        ):
    # Initialize the lists of losses
    training_losses = _______
    validation_losses = _______
    # Loop through as many times as defined in the function call
    for iteration in _______(_______):
        #--------Training-------
        # Get predictions on training dataset
        _______ = _______(_______, _______)
        # Calculate the loss
        _______ = _______(_______, _______)
        # Add it to the list of training losses to keep track of it
        training_losses._______(_______)
        # Calculate the Gradient
        _______ = _______(_______, _______, _______)
        # Find the new value of theta
        _______ = _______ - _______ * _______
        #--------Validation-----------
        # Get predictions on the validation dataset
        _______ = _______(_______, _______)
        # Calculate the validation loss
        _______ = _______(_______, _______)
        # Add it to the list of validation losses to keep track of it
        validation_losses._______(_______)
        # Progress Indicator
        if (iteration/num_iters * 100) % 5 == 0:
            print(f'\rCompleted {iteration/num_iters*100}%', end='')
    print('\rCompleted 100%')
    return [_______, _______], _______
¡¡¡Important Note!!!
The notebook assumes that you will return:

- a Losses list, where Losses[0] is the training loss and Losses[1] is the validation loss
- a tuple with the 3 final coefficients (\(\beta_0\), \(\beta_1\), \(\beta_2\))
Now that we have our logistic regression function, we’re all set to train our algorithm! Or are we?
There’s an important data step that we’ve neglected up to this point - we need to split the data into the train, validation, and test datasets.
test_ratio = 0.2
validation_ratio = 0.2
total_size = len(X_with_bias)
test_size = int(total_size * test_ratio)
validation_size = int(total_size * validation_ratio)
train_size = total_size - test_size - validation_size
rnd_indices = rnd_gen.permutation(total_size)
X_train = X_with_bias[rnd_indices[:train_size]]
y_train = y.iloc[rnd_indices[:train_size]]
X_valid = X_with_bias[rnd_indices[train_size:-test_size]]
y_valid = y.iloc[rnd_indices[train_size:-test_size]]
X_test = X_with_bias[rnd_indices[-test_size:]]
y_test = y.iloc[rnd_indices[-test_size:]]
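As an aside (not needed for the exercise): scikit-learn’s train_test_split can produce the same kind of split in two steps. A sketch, with the _alt names invented here so nothing above gets clobbered:
from sklearn.model_selection import train_test_split
# First carve off 20% as a test set, then take 25% of the remaining 80% (i.e., 20% overall) for validation.
X_rest, X_test_alt, y_rest, y_test_alt = train_test_split(X_with_bias, y, test_size=0.2, random_state=rnd_seed)
X_train_alt, X_valid_alt, y_train_alt, y_valid_alt = train_test_split(X_rest, y_rest, test_size=0.25, random_state=rnd_seed)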
Now we’re ready!
2.3.11. Q10) Train your logistic regression algorithm. We recommend you use 500 iterations, \(\eta\)=0.1#
Hint: It’s time to use the `logistic_regression` function you defined in Q9.
# Complete the code
losses, coeffs = ________(_______,
_______,
_______,
_______,
_______,
_______,
_______,
)
Let’s see how our model did while learning!
#@title Run this cell to produce the Loss Function Visualization Graphs
fig, ax = plt.subplots(figsize=(9,6), dpi=100)
ax.plot(losses[0], color='blue', label='Training', linewidth=3);
ax.plot(losses[1], color='black', label='Validation', linewidth=3);
ax.legend();
ax.set_ylabel('Log Loss')
ax.set_xlabel('Iterations')
ax.set_title('Loss Function Graph')
ax.autoscale(axis='x', tight=True)
fig.tight_layout();
# Let's get predictions from our model for the training, validation, and testing
# datasets
y_hat_train = (predict(X_train, coeffs)>=.5).astype(int)
y_hat_valid = (predict(X_valid, coeffs)>=.5).astype(int)
y_hat_test = (predict(X_test, coeffs)>=.5).astype(int)
y_sets = [ [y_hat_train, y_train],
[y_hat_valid, y_valid],
[y_hat_test, y_test] ]
def accuracy_score(y_hat, y):
assert(y_hat.size==y.size)
return (y_hat == y).sum()/y.size
accuracies=[]
[accuracies.append(accuracy_score(y_set[0],y_set[1])) for y_set in y_sets]
printout= (f'Training Accuracy:{accuracies[0]:.1%} \n'
f'Validation Accuracy:{accuracies[1]:.1%} \n'
f'Test Accuracy:{accuracies[2]:.1%} \n')
# Add the testing accuracy only once you're sure that your model works!
print(printout)
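If you want an independent cross-check of the accuracy numbers, scikit-learn provides an accuracy metric of the same name; importing it under an alias (optional) avoids shadowing the function defined above:
from sklearn.metrics import accuracy_score as sk_accuracy_score
print(f'Training accuracy (scikit-learn): {sk_accuracy_score(y_train, y_hat_train):.1%}')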
Congratulations on training a logistic regression algorithm from scratch!
Your loss function graph should look something similar to this…
And the accuracies we got during development of the notebook are:
Training Accuracy:99.4%
Validation Accuracy:100.0%
Test Accuracy:100.0%
Once you’re done with the upcoming environmental science applications notebook, feel free to come back to take a look at the challenges 😀
2.3.12. Challenges#
C1) Add more features to try to improve our accuracies!
C2) Add early stopping to the training algorithm! (e.g., stop training when the accuracy is greater than a target accuracy)