{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GAwSQhMS3TK_"
},
"source": [
"This notebook will be used in the lab session for week 1 of the course and provides some hands-on experience applying the lessons to environmental science datasets."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kS57AUwjmh-J"
},
"source": [
"# (Exercises) Statistical Forecasting\n",
"\n",
"We will be using data from Wilks' book on Statistical Methods for the Atmospheric Sciences"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "spYmkik-3K_i"
},
"source": [
"## Notebook Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "o_x3_GmVcKl9"
},
"outputs": [],
"source": [
"#@title Run this cell to get the python environment set up!\n",
"# Python ≥3.5 is required\n",
"import sys\n",
"assert sys.version_info >= (3, 5)\n",
"\n",
"# Is this notebook running on Colab or Kaggle?\n",
"IS_COLAB = \"google.colab\" in sys.modules\n",
"\n",
"# Scikit-Learn ≥0.20 is required\n",
"import sklearn\n",
"assert sklearn.__version__ >= \"0.20\"\n",
"\n",
"# Common imports\n",
"import numpy as np\n",
"import os\n",
"import pandas as pd\n",
"import pooch\n",
"\n",
"#Data Visalization Import\n",
"from google.colab import data_table\n",
"\n",
"\n",
"# to make this notebook's output stable across runs\n",
"rnd_seed = 42\n",
"rnd_gen = np.random.default_rng(rnd_seed)\n",
"\n",
"# To plot pretty figures\n",
"%matplotlib inline\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"mpl.rc('axes', labelsize=14)\n",
"mpl.rc('xtick', labelsize=12)\n",
"mpl.rc('ytick', labelsize=12)\n",
"\n",
"# Where to save the figures\n",
"PROJECT_ROOT_DIR = \".\"\n",
"CHAPTER_ID = \"classification\"\n",
"IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID)\n",
"os.makedirs(IMAGES_PATH, exist_ok=True)\n",
"\n",
"def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n",
" path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n",
" print(\"Saving figure\", fig_id)\n",
" if tight_layout:\n",
" plt.tight_layout()\n",
" plt.savefig(path, format=fig_extension, dpi=resolution)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gQhMBs1CHqPo"
},
"source": [
"Let's begin by loading relevant data from the cloud. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "lISPw_EDHtkb"
},
"outputs": [],
"source": [
"#@title And this cell to load the data we'll be using as `A3_df`\n",
"#Loading Wilks' Table A-3 from the course datastore\n",
"csv_path = 'https://unils-my.sharepoint.com/:x:/g/personal/tom_beucler_unil_ch/EXG7Rht55mhPiwkUKEDSI8oBuXNe8OOLYJX3_5ACmK1w5A?download=1'\n",
"hash = 'c158828a1bdf1aa521c61321842352cb1674e49187e21c504188ab976d3a41f2'\n",
"csv_file = pooch.retrieve(csv_path, known_hash=hash)\n",
"\n",
"A3_df = pd.read_csv(csv_file, index_col=0)\n",
"print(\"Here's a data sample. You can copy the row header text from here if you need it later 😉\")\n",
"A3_df.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AxNcWHCxG3vI"
},
"source": [
"## **Linear Regression**\n",
"\n",
"The goal for this exercise is to train a linear regression model and a logistic regression model to forecast atmospheric temperature using atmospheric pressure. 🌡 \n",
"\n",
"For the first case, we want to train linear regression to calculate June temperatures (the predictand) from June pressures (as the predictor) in Guayaquil, Ecuador.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jqI3IRvoXjG0"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qGvcGjGrXjG8"
},
"source": [
"**Caption** A beautiful day in Guayacil, Ecuador. Can you predict how hot it will be? 🌞"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7nkw4g5pI3w4"
},
"source": [
"We can try addressing this question using a [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) from scikit. \n",
"\n",
"## **Q1) Import the LinearRegression model. Instantiate it and fit it using the A3 dataframes' pressure and temperature.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "sP4qIkhaM1YB"
},
"outputs": [],
"source": [
"# Import the LinearRegression model\n",
"from _______._______ import _______"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6S5OMx6SVfBk"
},
"outputs": [],
"source": [
"# Instantiate the model\n",
"lin_reg = _______()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lG5ZjQDEVgYN"
},
"outputs": [],
"source": [
"# Load and reshape the input data\n",
"pressure = _______['_______'].values.reshape(-1,1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dmVjMnidNaEC"
},
"outputs": [],
"source": [
"# Load the truth data (i.e., the predictant)\n",
"temperature = _______['_______'].to_numpy().ravel()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5It7zEtYN1mN"
},
"outputs": [],
"source": [
"# Fit the model\n",
"lin_reg._______(_______, _______)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "znzId7C4T3-x"
},
"source": [
"We now have a linear regression model for the temperature and pressure. Let's make some plots to visualize our data and get a qualitative sense of our model.\n",
"\n",
"## **Q2) Generate a scatter plot with the linear regression plot for our data.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8ps2eoPhPgd1"
},
"outputs": [],
"source": [
"#Instantiate a figure having size 13,6\n",
"fig, ax = plt.subplots(_______=(_______,_______))\n",
"\n",
"\"\"\"---------------------------------------------------------------------------- \n",
"Let's start by plotting the data points from our dataset\n",
"----------------------------------------------------------------------------\"\"\"\n",
"# Set figure title and axis labels\n",
"fig._______('June Temperature vs Pressure in Guayaquil, Ecuador')\n",
"ax._______(\"Pressure (mb)\")\n",
"ax._______(\"Temperature (°C)\")\n",
"\n",
"# The colors and styles suggested below are not compulsory, but please avoid \n",
"# using the default settings.\n",
"# Make a scatter plot for the pressure (x) and temperature (y). Use color=black,\n",
"# marker size = 100, and set the marker style to '2'.\n",
"\n",
"ax._______(_______, # X values\n",
" _______, # y values\n",
" _______=_______, # Color\n",
" _______ = _______, # Marker size\n",
" _______ = _______) # Marker style\n",
"\n",
"\n",
"'''---------------------------------------------------------------------------- \n",
"Now, let's plot the line we fit to the datapoints\n",
"----------------------------------------------------------------------------'''\n",
"# Make a 100 point numpy array between 1008 and 1014 and store it in reg_x. \n",
"# Reshape it to (-1,1). Hint: numpy has a linear space generator\n",
"reg_x = _______._______(_______, # Start\n",
" _______, # Stop\n",
" _______,# Number of Points\n",
" ).reshape(_______,________) # Reshape to row=sample, col=feature\n",
"\n",
"# Let's produce a set of predictions from our linear space array.\n",
"reg_y = lin_reg._______(reg_x)\n",
"\n",
"# Let's plot the regression line using reg_x and reg_y. Set the color to red and\n",
"# the linewidth to 1.5\n",
"ax.plot(_______, # X\n",
" _______, # y\n",
" _______ = _______, # Color\n",
" _______ = _______) # Linewidth\n",
"\n",
"ax.autoscale(axis='x', tight=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gZoEKrwba9XY"
},
"source": [
"We now have a qualitative verification of our model! Your figure should look similar to this one:\n",
"\n",
"