{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5Tt5C4PoIRl0"
},
"source": [
"# (Exercises) Training Models\n",
"\n",
"This week's notebook is based off of the exercises in Chapter 4 of Géron's book."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "666iBNeL8-7H"
},
"source": [
"## Notebook Setup\n",
"Let's begin like in the last notebook: importing a few common modules, ensuring MatplotLib plots figures inline and preparing a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated so once again we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20.\n",
"\n",
"You don't need to worry about understanding everything that is written in this section."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "S_OXSp49IOF2"
},
"outputs": [],
"source": [
"#@title Run this cell for preliminary requirements. Double click it if you want to check out the source :)\n",
"\n",
"# Python ≥3.5 is required\n",
"import sys\n",
"assert sys.version_info >= (3, 5)\n",
"\n",
"# Is this notebook running on Colab or Kaggle?\n",
"IS_COLAB = \"google.colab\" in sys.modules\n",
"\n",
"# Scikit-Learn ≥0.20 is required\n",
"import sklearn\n",
"assert sklearn.__version__ >= \"0.20\"\n",
"\n",
"# Common imports\n",
"import numpy as np\n",
"import os\n",
"\n",
"# To make this notebook's output stable across runs\n",
"rnd_seed = 42\n",
"rnd_gen = np.random.default_rng(rnd_seed)\n",
"\n",
"# To plot pretty figures\n",
"%matplotlib inline\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"mpl.rc('axes', labelsize=14)\n",
"mpl.rc('xtick', labelsize=12)\n",
"mpl.rc('ytick', labelsize=12)\n",
"\n",
"# Where to save the figures\n",
"PROJECT_ROOT_DIR = \".\"\n",
"CHAPTER_ID = \"classification\"\n",
"IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID)\n",
"os.makedirs(IMAGES_PATH, exist_ok=True)\n",
"\n",
"def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n",
" path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n",
" print(\"Saving figure\", fig_id)\n",
" if tight_layout:\n",
" plt.tight_layout()\n",
" plt.savefig(path, format=fig_extension, dpi=resolution)\n",
"\n",
"#Ensure the palmerspenguins dataset is installed\n",
"%pip install palmerpenguins --quiet"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RtuO7Elb9LuC"
},
"source": [
" **Data Setup**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wKsvLXdmzqD8"
},
"source": [
"In this notebook we will be working with the [*Palmer Penguins dataset*](https://allisonhorst.github.io/palmerpenguins/articles/intro.html). Each entry in the dataset includes the penguin's species, island, sex, flipper length, body mass, bill length, bill depth, and the year the study was carried out. Let's take a moment and observe our subjects!
\n",
"\n",
"