{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "S1_2_Numpy.ipynb", "provenance": [], "toc_visible": true, "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "source": [ "# Scientific Computing with `NumPy`" ], "metadata": { "id": "MyiDxofYIpEC" } }, { "cell_type": "markdown", "source": [ "![1280px-NumPy_logo_2020.svg.png]()" ], "metadata": { "id": "eO5PUDbQxFF4" } }, { "cell_type": "markdown", "source": [ "**Numpy** is the fundamental package for scientific computing with Python\n", "\n", "Website: https://numpy.org/\n", "\n", "GitHub: https://github.com/numpy/numpy" ], "metadata": { "id": "gOa6QUTgx6pG" } }, { "cell_type": "markdown", "source": [ "## Importing and Examining a New Package" ], "metadata": { "id": "YpN9ITCsyzTQ" } }, { "cell_type": "markdown", "source": [ "This will be our first experience with importing a package which is not part of the Python standard library." ], "metadata": { "id": "XMDjUdcyy17C" } }, { "cell_type": "code", "source": [ "import numpy as np" ], "metadata": { "id": "XTH2r-RIy4wN" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "What did we just do? We *imported* a package. This brings new variables (mostly functions) into our interpreter. We access them as follows." ], "metadata": { "id": "BoLrluOLy8o-" } }, { "cell_type": "code", "source": [ "# find out what is in our namespace\n", "dir()" ], "metadata": { "id": "eMjwKsQ4zB1n" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# find out what's in numpy\n", "dir(np)" ], "metadata": { "id": "6Pzxa_LqyEWq" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# find out what version we have\n", "np.__version__" ], "metadata": { "id": "Ud5-mbmxzHIz" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "There is no way we could explicitly learn/teach each of these functions.\n", "Therefore, the numpy documentation is crucial!\n", "\n", "https://numpy.org/doc/stable/reference/" ], "metadata": { "id": "cQO5bE3fzR8A" } }, { "cell_type": "markdown", "source": [ "\n", "## NDArrays" ], "metadata": { "id": "DqstYumlzYk_" } }, { "cell_type": "markdown", "source": [ "The core class is the numpy ndarray (n-dimensional array).\n", "\n", "The main difference between a numpy array an a more general data container such as `list` are the following:\n", "\n", "* Numpy arrays can have N dimensions (while `lists`, `tuples`, etc. only have 1)\n", "* Numpy arrays hold values of the same datatype (e.g. `int`, `float`), while `lists` can contain anything.\n", "* Numpy optimizes numerical operations on arrays. Numpy is *fast*!" ], "metadata": { "id": "-Gp2tbMpzfwJ" } }, { "cell_type": "code", "source": [ "from IPython.display import Image\n", "Image(url='http://docs.scipy.org/doc/numpy/_images/threefundamental.png')" ], "metadata": { "id": "IsR449NHznch" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# create an array from a list\n", "a = np.array([9,0,2,1,0])" ], "metadata": { "id": "7QZ3r1QGz0_q" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# find out the datatype\n", "a.dtype" ], "metadata": { "id": "TnuWVAETz81r" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# find out the shape\n", "a.shape" ], "metadata": { "id": "XQFjwyppz-zB" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# what is the shape\n", "type(a.shape)" ], "metadata": { "id": "4tG7sV8F0BCL" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# another array with a different datatype and shape\n", "b = np.array([[5,3,1,9],[9,2,3,0]], dtype=np.float64)\n", "\n", "# check dtype and shape\n", "b.dtype, b.shape" ], "metadata": { "id": "MvR8Fp4v0CXw" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "**Note**\n", "\n", "The fastest varying dimension is the last dimension! The outer level of the hierarchy is the first dimension. (This is called “c-style” indexing)" ], "metadata": { "id": "XhzNeMr10IaG" } }, { "cell_type": "markdown", "source": [ "## Array Creation" ], "metadata": { "id": "R5IGPWkN0Lw2" } }, { "cell_type": "markdown", "source": [ "There are lots of ways to create arrays." ], "metadata": { "id": "xIFpjR_k0TMM" } }, { "cell_type": "code", "source": [ "# create some uniform arrays\n", "c = np.zeros((9,9))\n", "d = np.ones((3,6,3), dtype=np.complex128)\n", "e = np.full((3,3), np.pi)\n", "e = np.ones_like(c)\n", "f = np.zeros_like(d)" ], "metadata": { "id": "AGhmRbTV0U5N" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "`arange` works very similar to `range`, but it populates the array “eagerly” (i.e. immediately), rather than generating the values upon iteration." ], "metadata": { "id": "teVvvR3K0Wv1" } }, { "cell_type": "code", "source": [ "np.arange(10)" ], "metadata": { "id": "siVUkgkY0eKX" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "`arange` is left inclusive, right exclusive, just like `range`,\n", "but also works with floating-point numbers." ], "metadata": { "id": "SH2ScrQW0isJ" } }, { "cell_type": "code", "source": [ "np.arange(2,4,0.25)" ], "metadata": { "id": "o5KX5EM30obn" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "A frequent need is to generate an array of N numbers, evenly spaced between two values. That is what `linspace` is for." ], "metadata": { "id": "NVwSS6kw0pwW" } }, { "cell_type": "code", "source": [ "np.linspace(2,4,20)" ], "metadata": { "id": "9czR_sB-0zrd" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# log spaced\n", "np.logspace(1,2,10)" ], "metadata": { "id": "HoKVMDyy01d_" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Numpy also has some utilities for helping us generate multi-dimensional arrays. `meshgrid` creates 2D arrays out of a combination of 1D arrays." ], "metadata": { "id": "mnwG4YU9031H" } }, { "cell_type": "code", "source": [ "x = np.linspace(-2*np.pi, 2*np.pi, 100)\n", "y = np.linspace(-np.pi, np.pi, 50)\n", "xx, yy = np.meshgrid(x, y)\n", "xx.shape, yy.shape" ], "metadata": { "id": "Ca7KhJAu1SC4" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Indexing" ], "metadata": { "id": "jxpFkeyX1TWd" } }, { "cell_type": "markdown", "source": [ "Basic indexing is similar to lists" ], "metadata": { "id": "5BD5fyHq1Wmo" } }, { "cell_type": "code", "source": [ "# get some individual elements of xx\n", "xx[0,0], xx[-1,-1], xx[3,-5]" ], "metadata": { "id": "xUpHG8-J1e8y" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# get some whole rows and columns\n", "xx[0].shape, xx[:,-1].shape" ], "metadata": { "id": "DKIcHeGl1gMq" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# get some ranges\n", "xx[3:10,30:40].shape" ], "metadata": { "id": "Z6hTG-gQ1hvY" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "There are many advanced ways to index arrays. You can [read about them](https://numpy.org/doc/stable/reference/arrays.indexing.html) in the manual. Here is one example." ], "metadata": { "id": "DGBuA2kQ1jS5" } }, { "cell_type": "code", "source": [ "# use a boolean array as an index\n", "idx = xx<0\n", "yy[idx].shape" ], "metadata": { "id": "DXNzxd7m2AWc" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# the array got flattened\n", "xx.ravel().shape" ], "metadata": { "id": "8fxal8712Byv" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Visualizing Arrays with Matplotlib" ], "metadata": { "id": "BUFFnj3e2DQA" } }, { "cell_type": "markdown", "source": [ "It can be hard to work with big arrays without actually seeing anything with our eyes! We will now bring in Matplotib to start visualizating these arrays. For now we will just skim the surface of Matplotlib. Much more depth will be provided in the next notebook." ], "metadata": { "id": "hfavOorK2GyH" } }, { "cell_type": "code", "source": [ "from matplotlib import pyplot as plt" ], "metadata": { "id": "JJttyGEw2JMG" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "For plotting a 1D array as a line, we use the `plot` command." ], "metadata": { "id": "5X2My4vq43Lz" } }, { "cell_type": "code", "source": [ "plt.plot(x)" ], "metadata": { "id": "Wp_aN7Yl465b" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "There are many ways to visualize 2D data. He we use `pcolormesh`." ], "metadata": { "id": "Mwu42gZq48zH" } }, { "cell_type": "code", "source": [ "plt.pcolormesh(xx)" ], "metadata": { "id": "mUMNhE3U5ALD" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "plt.pcolormesh(yy)" ], "metadata": { "id": "SKObF-8I5Ck8" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Array Operations" ], "metadata": { "id": "E7QGjxYR5EQ3" } }, { "cell_type": "markdown", "source": [ "There are a huge number of operations available on arrays. All the familiar arithemtic operators are applied on an element-by-element basis." ], "metadata": { "id": "NLI1NnH25LmT" } }, { "cell_type": "markdown", "source": [ "### Basic Math" ], "metadata": { "id": "6H3j9aNy5Not" } }, { "cell_type": "code", "source": [ "f = np.sin(xx) * np.cos(0.5*yy)" ], "metadata": { "id": "H_UzxfKi5QMR" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "plt.pcolormesh(f)" ], "metadata": { "id": "ErTACPHh5RNV" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Manipulating Array Dimensions" ], "metadata": { "id": "7XlSuedn5SYv" } }, { "cell_type": "markdown", "source": [ "Swapping the dimension order is accomplished by calling `transpose`." ], "metadata": { "id": "bvdIloeH5VvG" } }, { "cell_type": "code", "source": [ "f_transposed = f.transpose()\n", "plt.pcolormesh(f_transposed)" ], "metadata": { "id": "U4Hr-7o25bmo" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "We can also manually change the shape of an array… as long as the new shape has the same number of elements." ], "metadata": { "id": "namRwhmd5eha" } }, { "cell_type": "code", "source": [ "g = np.reshape(f, (8,9))" ], "metadata": { "id": "EsXbyvFi5hI0" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "However, be careful with reshaping data! You can accidentally lose the structure of the data." ], "metadata": { "id": "QGfZvloR5kQT" } }, { "cell_type": "code", "source": [ "g = np.reshape(f, (200,25))\n", "plt.pcolormesh(g)" ], "metadata": { "id": "9cvae1W85pA8" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "We can also “tile” an array to repeat it many times." ], "metadata": { "id": "Xuzh7ved5qWs" } }, { "cell_type": "code", "source": [ "f_tiled = np.tile(f,(3, 2))\n", "plt.pcolormesh(f_tiled)" ], "metadata": { "id": "d-pl13Sc7C9l" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Another common need is to add an extra dimension to an array. This can be accomplished via indexing with `None`." ], "metadata": { "id": "iOJg0CvD7EJB" } }, { "cell_type": "code", "source": [ "x.shape" ], "metadata": { "id": "Ceb8XgSK7IQA" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "x[None, :].shape" ], "metadata": { "id": "m8JCLxTm7Jkk" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "x[None, :, None, None].shape" ], "metadata": { "id": "1uUW24wT7Kjn" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Broadcasting" ], "metadata": { "id": "adiNcUPp7Lkr" } }, { "cell_type": "markdown", "source": [ "Not all the arrays we want to work with will have the same size. One approach would be to manually “expand” our arrays to all be the same size, e.g. using `tile`. *Broadcasting* is a more efficient way to multiply arrays of different sizes Numpy has specific rules for how broadcasting works. These can be confusing but are worth learning if you plan to work with Numpy data a lot." ], "metadata": { "id": "wlt3_Eyt7Zyr" } }, { "cell_type": "markdown", "source": [ "The core concept of broadcasting is telling Numpy which dimensions are supposed to line up with each other." ], "metadata": { "id": "HhIvFyYC7gsL" } }, { "cell_type": "code", "source": [ "Image(url='http://scipy-lectures.github.io/_images/numpy_broadcasting.png',\n", " width=720)" ], "metadata": { "id": "1CeXfc517pnP" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Dimensions are automatically aligned *starting with the last dimension*. If the last two dimensions have the same length, then the two arrays can be broadcast." ], "metadata": { "id": "AyqfBTLH7r6n" } }, { "cell_type": "code", "source": [ "print(f.shape, x.shape)\n", "g = f * x\n", "print(g.shape)" ], "metadata": { "id": "XnE3mFfD7yDq" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "plt.pcolormesh(g)" ], "metadata": { "id": "H8XeOTma-wHt" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "However, if the last two dimensions are *not* the same, Numpy cannot just automatically figure it out." ], "metadata": { "id": "Tu2A4Q_5-xZe" } }, { "cell_type": "code", "source": [ "# multiply f by y\n", "print(f.shape, y.shape)\n", "h = f * y" ], "metadata": { "id": "GLDWYS1X-35T" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "We can help numpy by adding an extra dimension to `y` at the end. Then the length-50 dimensions will line up." ], "metadata": { "id": "GXOfcm4U-5dt" } }, { "cell_type": "code", "source": [ "print(f.shape, y[:, None].shape)\n", "h = f * y[:, None]\n", "print(h.shape)" ], "metadata": { "id": "utD_kxl2-73_" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "plt.pcolormesh(h)" ], "metadata": { "id": "tpTw9B_s-_T_" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "\n", "## Reduction Operations" ], "metadata": { "id": "TI2IdcyZ_A-B" } }, { "cell_type": "markdown", "source": [ "In scientific data analysis, we usually start with a lot of data and want to reduce it down in order to make plots of summary tables. Operations that reduce the size of numpy arrays are called “reductions”. There are many different reduction operations. Here we will look at some of the most common ones." ], "metadata": { "id": "iSs5V3Rr_F4S" } }, { "cell_type": "code", "source": [ "# sum\n", "g.sum()" ], "metadata": { "id": "q7DDAwoi_Jac" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# mean\n", "g.mean()" ], "metadata": { "id": "rjp3WnND_LZX" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# standard deviation\n", "g.std()" ], "metadata": { "id": "_zZ6M73c_MmS" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "A key property of numpy reductions is the ability to operate on just one axis." ], "metadata": { "id": "ooAtQGpS_Nvk" } }, { "cell_type": "code", "source": [ "# apply on just one axis\n", "g_ymean = g.mean(axis=0)\n", "g_xmean = g.mean(axis=1)" ], "metadata": { "id": "YiBjIdFB_Qcc" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "plt.plot(x, g_ymean)" ], "metadata": { "id": "5yLYM58L_Rou" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "plt.plot(g_xmean, y)" ], "metadata": { "id": "coQxmUi7_SlI" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "\n", "## Data Files" ], "metadata": { "id": "bb_14NVY_UgV" } }, { "cell_type": "markdown", "source": [ "It can be useful to save numpy data into files." ], "metadata": { "id": "bgcm4Wji_WkQ" } }, { "cell_type": "code", "source": [ "np.save('g.npy', g)" ], "metadata": { "id": "Pbm9DyDs_YIS" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "**Warning**\n", "\n", "Numpy `.npy` files are a convenient way to store temporary data, but they are not considered a robust archival format. Later we will learn about NetCDF, the recommended way to store earth and environmental data." ], "metadata": { "id": "g9G52zZg_aIY" } }, { "cell_type": "code", "source": [ "g_loaded = np.load('g.npy')\n", "\n", "np.testing.assert_equal(g, g_loaded)" ], "metadata": { "id": "6LR5ugEB_gND" }, "execution_count": null, "outputs": [] } ] }