{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Week1_0_Variables_Operators.ipynb",
"provenance": [],
"toc_visible": true,
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Variables, Control Flow, and File I/O"
],
"metadata": {
"id": "zfTVQqtJ-PcC"
}
},
{
"cell_type": "markdown",
"source": [
"Image by haim charbit from Pixabay"
],
"metadata": {
"id": "NqV2k3JoO372"
}
},
{
"cell_type": "markdown",
"source": [
"In this section we introduce the basic building blocks of the Python language.\n",
"\n",
"Python has the following 6 built-in Data-Types:\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
" \n",
" Type | \n",
" Description | \n",
" Examples | \n",
"
\n",
" \n",
" int | \n",
" Integer | \n",
" 123 | \n",
"
\n",
" \n",
" float | \n",
" Floating point | \n",
" 10.12 | \n",
"
\n",
" \n",
" complex | \n",
" Complex values | \n",
" 1.0+3j | \n",
"
\n",
" \n",
" bool | \n",
" Boolean values | \n",
" True | \n",
"
\n",
" \n",
" string | \n",
" String values | \n",
" 'Bonjour' | \n",
"
\n",
" \n",
" NoneType | \n",
" None value | \n",
" None | \n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Python has four data structures:\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
" \n",
" Type | \n",
" Description | \n",
" Examples | \n",
"
\n",
" \n",
" list | \n",
" Ordered collection of values | \n",
" [1, 'abc', 3, 1] | \n",
"
\n",
" \n",
" set | \n",
" Unordered collection of unique values | \n",
" {1, 'abc', 3} | \n",
"
\n",
" \n",
" tuple | \n",
" Immutable Ordered collection | \n",
" (1, 'abc', 3) | \n",
"
\n",
" \n",
" dictionary | \n",
" Unordered collection of key-value pairs | \n",
" {'key1':aaa,'key2':111} | \n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Reference:\n",
"* CUSP UCSL bootcamp 2017 (https://github.com/Mohitsharma44/ucsl17)"
],
"metadata": {
"id": "ROpnsZ9LTrgi"
}
},
{
"cell_type": "markdown",
"source": [
"## Basic Variables: Numbers and Strings\n",
"The main difference between Python and languages like C++ and Fortran is that Python variables do not need explicit declaration to reserve memory space. The declaration happens automatically when a value is assigned to a variable. This means that a variable that was used to store a string can also be used to store an integer/array/list etc.\n",
"\n",
"**Rules for naming a variable**\n",
"\n",
"The start of the variable name can be an underscore (_), a capital letter, or a lowercase letter. However, it is generally recommended to use all uppercase for global variables and all lower case for local variables. The letters following the first letter can be a digit or a string. Python is a case-sensitive language. Therefore, **var** is not equal to **VAR** or **vAr**.\n",
"\n",
"Apart from the above restrictions, Python keywords cannot be used as identifier names. These are:\n",
"\n",
"||||||\n",
"|:-------|:--------|:---------|:--------|:----|\n",
"|and |del |from |not |while|\n",
"|as | elif |global |or |with |\n",
"|assert | else | if |pass |yield|\n",
"|break | except | import |print\n",
"|class | exec | in |raise\n",
"|continue| finally | is |return\n",
"|def | for | lambda |try\n",
"\n",
"\\\n",
"\n",
"Additionally, the following are built in functions which are always available in your namespace once you open a Python interpreter\n",
"\n",
"```\n",
"abs() dict() help() min() setattr() all() dir() hex() next() slice() any()\n",
"divmod() id() object() sorted() ascii() enumerate() input() oct() staticmethod()\n",
"bin() eval() int() open() str() bool() exec() isinstance() ord() sum() bytearray()\n",
"filter() issubclass() pow() super() bytes() float() iter() print() tuple()\n",
"callable() format() len() property() type() chr() frozenset() list() range()\n",
"vars() classmethod() getattr() locals() repr() zip() compile() globals() map()\n",
"reversed() __import__() complex() hasattr() max() round() delattr() hash()\n",
"memoryview() set()\n",
"```\n"
],
"metadata": {
"id": "q0u5PHc5fK2k"
}
},
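{
"cell_type": "markdown",
"source": [
"Rather than memorizing these lists, you can ask Python for them directly. The minimal sketch below uses the standard-library `keyword` and `builtins` modules."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# List the reserved keywords and a few built-in names programmatically\n",
"import keyword\n",
"import builtins\n",
"\n",
"print(keyword.kwlist)        # every reserved keyword in this Python version\n",
"print(len(keyword.kwlist))   # how many keywords there are\n",
"\n",
"# Built-in functions live in the builtins module; show the first ten lowercase names\n",
"print([name for name in dir(builtins) if name.islower() and not name.startswith('_')][:10])"
],
"metadata": {},
"execution_count": null,
"outputs": []
},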
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nqTV-mOqQSsa"
},
"outputs": [],
"source": [
"# Basic Variables: Numbers and Strings\n",
"# comments are anything that comes after the \"#\" symbol\n",
"a = 1 # assign 1 to variable a\n",
"b = \"hello\" # assign \"hello\" to variable b"
]
},
{
"cell_type": "markdown",
"source": [
"All variables are objects. Every object has a type (class). To find out what type your variables are."
],
"metadata": {
"id": "GYwIJqyciBli"
}
},
{
"cell_type": "code",
"source": [
"print(type(a), type(b))"
],
"metadata": {
"id": "5xe-2YUzgric"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# we can check for the type of an object\n",
"print(type(a) is int)\n",
"print(type(a) is str)"
],
"metadata": {
"id": "xpzy-RIUg0wU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We can also define multiple variables simultaneously"
],
"metadata": {
"id": "6ZGZUybghlmH"
}
},
{
"cell_type": "code",
"source": [
"var1,var2,var3,var4 = 'Hello', 'World', 1, 2\n",
"print(var1,var2,var3,var4)"
],
"metadata": {
"id": "1MbbJnP-hk6V"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### **String**\n",
"We now focus on strings a bit. We will discuss\n",
"\n",
"\n",
"1. String concatenation\n",
"2. String indexing\n",
"3. String slicing\n",
"4. String formatting\n",
"5. Built-in String Methods\n",
"\n"
],
"metadata": {
"id": "WiPfIZv1n0fe"
}
},
{
"cell_type": "code",
"source": [
"# String concatenation\n",
"text1,text2,text3,text4 = 'Introduction','to','Python','course'\n",
"print(text1+text2+text3+text4)\n"
],
"metadata": {
"id": "9Bg6x8DUojRy"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#@title #####Can you figure out a way to add spaces between the words?\n",
"print(text1+' '+text2+' '+text3+' '+text4)"
],
"metadata": {
"cellView": "form",
"id": "IXUWfkMLpXcJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Characters in a string can be accessed using the standard square bracket [ ] syntax. Python uses zero-based indexing, which means that first character in a string will be indexed at the 0$^{\\text{th}}$ location."
],
"metadata": {
"id": "IV0h0eEip8ST"
}
},
{
"cell_type": "code",
"source": [
"# String indexing\n",
"print(text1[0],text1[5],text1[-1],text1[-7])"
],
"metadata": {
"id": "rzoIt7Uep0Qe"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# String slicing\n",
"print(text1[:5],text1[-5:],text1[:5]+text3[0:2])"
],
"metadata": {
"id": "MhSrBBqUqKqv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# String formatting\n",
"#f strings allow you to format data easily, but require Python >= 3.6\n",
"print(f'The a variable has type {type(a)} and value {a}')\n",
"print(f'The b variable has type {type(b)} and value {b}')"
],
"metadata": {
"id": "Tcii4NhRg3uL"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Each object includes _attributes_ and _methods_, respectively referring to variables or functions associated with that object. Object attributes and methods can be accessed via the syntax `variable.atribute` and `variable.method()`\n",
"\n",
"IPython will autocomplete if you press `` to show you the methods available. If you're using Google Colab, you can do the same with ` + `"
],
"metadata": {
"id": "d7UE8Y6Kmspz"
}
},
{
"cell_type": "code",
"source": [
"# this returns the method itself\n",
"b.capitalize"
],
"metadata": {
"id": "HnIkGz59mzcM"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# this calls the method\n",
"b.capitalize()\n",
"# there are lots of other methods"
],
"metadata": {
"id": "luJnQ2vCm1pD"
},
"execution_count": null,
"outputs": []
},
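{
"cell_type": "markdown",
"source": [
"If autocompletion is not available, a quick way to see what an object offers is the built-in `dir()` function. This minimal sketch filters out the \"dunder\" names so that only the public string methods are shown."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# List the public methods available on the string stored in b\n",
"print([method for method in dir(b) if not method.startswith('_')])"
],
"metadata": {},
"execution_count": null,
"outputs": []
},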
{
"cell_type": "markdown",
"source": [
"#### **Math Operators**\n",
"We now focus on using Python to perform mathematical operations."
],
"metadata": {
"id": "khmqC62KrQ_Z"
}
},
{
"cell_type": "code",
"source": [
"# Addition/Subtraction (Remember var3=1,var4=2)\n",
"print(var3+var4,var3-var4)"
],
"metadata": {
"id": "BnTneh4HrnfX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Multiplication\n",
"print(var3*var4)"
],
"metadata": {
"id": "pxw3lbzkrya1"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Division\n",
"print(var3/var4,type(var3/var4))"
],
"metadata": {
"id": "ZuTvN_pRr-oO"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# exponentiation\n",
"print(var4**(var3+2))"
],
"metadata": {
"id": "NzkYvpBxsHAR"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Modulus\n",
"7 % 2"
],
"metadata": {
"id": "ov-S_KGnsXkS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# rounding\n",
"round(9/10)"
],
"metadata": {
"id": "UCxxa0ehsld7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### **Relational Operators**\n"
],
"metadata": {
"id": "vBlQ5zcoshQg"
}
},
{
"cell_type": "code",
"source": [
"# Equal to (==)\n",
"a, b = 10, 10\n",
"a==b"
],
"metadata": {
"id": "pXShSsYSscTt"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Not Equal to (!=)\n",
"print(a!=b, 6!=2)"
],
"metadata": {
"id": "Ce1lFwJ7suiZ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Greater than (>) & Less than (<)\n",
"print(6>2, 2<6)"
],
"metadata": {
"id": "hP_bOPgss4uB"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### **Assignment Operators**\n"
],
"metadata": {
"id": "siTgqRs2tHa9"
}
},
{
"cell_type": "code",
"source": [
"# Add AND (+=) [equivalent to var=var+10]\n",
"a = 10\n",
"a+=10\n",
"print(a)"
],
"metadata": {
"id": "RJ8DGURatKOz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Multiplication AND\n",
"a = 10\n",
"a*=5\n",
"print(10*5,a)"
],
"metadata": {
"id": "pgzyagirtXQ5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### **Logical Operators**\n"
],
"metadata": {
"id": "MqXf1IJItyOr"
}
},
{
"cell_type": "code",
"source": [
"print(True and True, True and False, True or False, (not True) or (not False))"
],
"metadata": {
"id": "EFosohSVtvY4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"a, b = 'Hello','Bye'\n",
"print(a is b, a is not b)"
],
"metadata": {
"id": "hlYhal90uPN6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## **Control Flow**\n",
"The first thing you need to know is that Python programs (or _Python Scripts_) are usually executed sequentially and a code statement will not be executed again once operated.\n",
"\n",
"However, in real life situations you will often need to execute a snippet of code multiple times, or execute a portion of a code based on different conditions. We use control flow statements for these slightly more complex tasks.\n",
"\n",
"In this section, we will be covering:\n",
"\n",
"1. Conditional statements -- if, else, and elif\n",
"2. Loop statements -- for, while\n",
"3. Loop control statements -- break, continue, pass\n",
"\n",
"Reference:\n",
"* IBM Congnitive Class - Intro to Python (https://github.com/computationalcore/introduction-to-python)\n",
"* CUSP UCSL bootcamp 2017 (https://github.com/Mohitsharma44/ucsl17)"
],
"metadata": {
"id": "6cM-n3t9Iv05"
}
},
{
"cell_type": "markdown",
"source": [
"### Conditional Statements\n",
"Here, we combine relational operators and logical operators so that a program can have different information flow according to some conditions. In other words, some code snippets are executed only if some conditions are satisfied.\n",
"\n",
"The logic of the conditional statements is simple. `if` -> `condition met` -> do something. `if` -> `condition not met` -> do something else."
],
"metadata": {
"id": "p4YvSIp0Ix0e"
}
},
{
"cell_type": "code",
"source": [
"x = 100\n",
"if x > 0:\n",
" print('Positive Number')\n",
"elif x < 0:\n",
" print('Negative Number')\n",
"else:\n",
" print ('Zero!')"
],
"metadata": {
"id": "vznXnjlnIzUS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# indentation is MANDATORY\n",
"# blocks are closed by indentation level\n",
"if x > 0:\n",
" print('Positive Number')\n",
" if x >= 100:\n",
" print('Huge number!')"
],
"metadata": {
"id": "1NEgxM1_I0h0"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Loop Statements\n",
"We use loop statements if we want to execute some code statements multiple times. An example where it would be appropriate to use loop statements:\n",
"\n",
"1. We have multiple data files\n",
"2. We use a loop statements to read the files into memory iteratively.\n",
"3. Within the loop statements, we perform the same proprocessing algorithm on the imported data\n",
"\n",
"In Python language, there are two main types of loop statements: `while` loops and `for` loops."
],
"metadata": {
"id": "J8kioLR7I2A1"
}
},
{
"cell_type": "code",
"source": [
"# use range [range(5)==[0,1,2,3,4]]\n",
"for i in range(5):\n",
" print(i)"
],
"metadata": {
"id": "1x4eW3wTI3Xb"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"```{tip}\n",
"Here we use the `range()` function to create a sequence of numbers to drive the for loop.\n",
"\n",
"**range(N)** will create a list of N numbers that starts with `0`.\\\n",
"**range(A,B)** will create a list of end-start numbers that starts with `A` and ends with `B-1`\\\n",
"**range(A,B,step)** starts and ends with the same numbers as `range(A,B)`. The only difference is that the difference between numbers changes from `1` to `step`\n",
"\n",
"We can also use non-numerical iterators to drive for loops!\n",
"```\n"
],
"metadata": {
"id": "jhOmz65iI4sk"
}
},
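{
"cell_type": "markdown",
"source": [
"A minimal check of the three `range()` forms described in the tip above, followed by a loop driven by a string instead of numbers."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# The three range() forms\n",
"print(list(range(5)))         # range(N): 0 .. N-1\n",
"print(list(range(2, 7)))      # range(A, B): A .. B-1\n",
"print(list(range(1, 10, 3)))  # range(A, B, step): A .. B-1 in steps of `step`\n",
"\n",
"# A non-numerical iterator driving a for loop\n",
"for letter in 'abc':\n",
"    print(letter)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},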
{
"cell_type": "code",
"source": [
"# iterate over a list we make up, and access both the indices and elements with enumerate()\n",
"for index,pet in enumerate(['dog', 'cat', 'fish']):\n",
" print(index, pet, len(pet))"
],
"metadata": {
"id": "cyx1q1noI7p1"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"As we can see, the `for` loop is suitable if you want to repeat the operations in the loop for a fixed number of times `N`. But what if you have no idea of how many times you would like to repeat a code snippet? This is not a trivial problem and often occurs in numerical optimization problems.\n",
"\n",
"For these problems, we will forego the `for` loop and use the `while` loop instead. The termination of a `while` loop depends on whether a condition remains satisfied or not. Theoretically, the loop can run forever if the condition you set is always true."
],
"metadata": {
"id": "Adg6NGknJBfe"
}
},
{
"cell_type": "code",
"source": [
"# make a loop\n",
"count = 0\n",
"while count < 10:\n",
" # bad way\n",
" # count = count + 1\n",
" # better way\n",
" count += 1\n",
"print(count)"
],
"metadata": {
"id": "xoPngHDzJDGw"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Loop control statements:\n",
"Sometimes we want to make loop execution to diverge from its normal behaviour. Perhaps we want to leave the loop when some conditions are satistied to save processing time. Alternatively, we might want the loop to skip some code if the data satisfies some conditions.\n",
"\n",
"Two control statements are quite useful here: `break` and `continue`. We'll use a `for` loop as an example:"
],
"metadata": {
"id": "6givFbV_JEl8"
}
},
{
"cell_type": "code",
"source": [
"for i in range(1, 10):\n",
" if i == 5:\n",
" print('Condition satisfied')\n",
" break\n",
" print(i) # What would happen if this was placed before the `if` condition?"
],
"metadata": {
"id": "HxkdZpJ5JKY6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"for i in range(1, 10):\n",
" if i == 5:\n",
" print('Condition satisfied')\n",
" continue\n",
" print(\"whatever.. I won't get printed anyways.\")\n",
" print(i)"
],
"metadata": {
"id": "GNHyfHzXJNlU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"for i in range(1, 10):\n",
" if i == 5:\n",
" print('Condition satisfied')\n",
" pass\n",
" print(i)"
],
"metadata": {
"id": "2R2fz0auJP2u"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## **File I/O**\n",
"In this section, we will introduce the basic functions we can use to store and retrieve data from files in different formats.\n",
"\n",
"For environmental science projects, research data are most commonly stored in the following formats:\n",
"1. Text files (`TXT`)\n",
"2. Tabular files (e.g., `CSV`, `XLS`)\n",
"3. Structured Data / Python dictionaries etc. (e.g., `Pickle`, `dill`, `JSON`)\n",
"4. Gridded data (e.g., `HDF5`, `NetCDF`)\n",
"\n",
"We will now see how we can use Python and different Python packages to retrieve the data stored in these formats, and how to save your data to different formats for future use.\n",
"\n",
"Reference:\n",
"* CUSP UCSL bootcamp 2017 (https://github.com/Mohitsharma44/ucsl17)\n",
"* Python 3 tutorial (https://docs.python.org/3/tutorial/inputoutput.html)\n",
"* GSFC Python Bootcamp (https://github.com/astg606/py_materials/blob/master/useful_modules/)\n",
"* Working on JSON Data in Python (https://realpython.com/python-json/)\n",
"* PyHOGS (http://pyhogs.github.io/intro_netcdf4.html)"
],
"metadata": {
"id": "tfmNGlp24j7m"
}
},
{
"cell_type": "markdown",
"source": [
"Let's import some packages first..."
],
"metadata": {
"id": "Q3QqnEE74nyV"
}
},
{
"cell_type": "code",
"source": [
"import csv\n",
"import netCDF4\n",
"import pickle\n",
"import pandas as pd\n",
"import xarray as xr\n",
"import numpy as np"
],
"metadata": {
"id": "-tK76B6k4lpe"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### TXT files\n",
"Now we will learn how to write information to a .TXT file and read it back with built-in Python functions. The data used in this part of the tutorial will be very simple. In the next exercises, we will also introduce commands in community packages that allow us to read and store more complex data.\n",
"\n",
"#### Opening Files:\n",
"Files can be opened using python's built-in `open()` function. The function will create a file object for subsequent operations. Use the following syntax to read a TXT file: \\\\\n",
"`fhandler = open(file_name, access mode, encoding)`\n",
"\n",
"- `file_name`: The file name that you would like to perform your I/O operations on. \\\n",
"Note that this is the full file path (e.g., $\\text{\\\\home\\\\Documents\\\\myfile.txt}$ )\n",
"- `encoding`: Encoding scheme to use to convert the stream of bytes to text. (Standard=`utf-8`)\n",
"- `access_mode`: The way in which a file is opened, available choices for this option include:\n",
"\n",
"|access_mode | Its Function|\n",
"|:------|------------:|\n",
"|r\t|Opens a file as read only|\n",
"|rb\t|Opens a file as read only in binary format|\n",
"|r+\t|Opens a file for reading and writing|\n",
"|rb+\t|Opens a file for reading and writing in binary format|\n",
"|w\t|Opens a file for writing only|\n",
"|wb\t|Opens a file for writing only in binary format|\n",
"|w+\t|Opens a file for both reading and writing|\n",
"|wb+\t|Opens a file for writing and reading in binary format|\n",
"|a\t|Opens a file for appending|\n",
"|ab\t|Opens a file for appending in binary|\n",
"|a+\t|Opens a file for appending and reading|\n",
"|ab+\t|Opens a file for appending and reading in binary format|\n",
"\n",
"In the example below, we will try to store several sentences into a new TXT file, and use the `open()` function to see if the code works as intended."
],
"metadata": {
"id": "20miP0pU4shF"
}
},
{
"cell_type": "code",
"source": [
"fhandler = open('test.txt', 'w', encoding=\"utf-8\")\n",
"fhandler.write('Hello World!\\n')\n",
"fhandler.write('I am a UNIL Master Student.\\n')\n",
"fhandler.write('I am learning how to code!\\n')\n",
"fhandler.close()"
],
"metadata": {
"id": "kIk3ABV24t5F"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"```{note}\n",
"In the code above, we use the `open()` command to create a *write-only* (`access_mode='w'`) file `test.txt`. The open command creates a file object (`fhandler`) on which we can perform extra operations.\n",
"\n",
"We then try to add three sentences to the TXT file using the `.write()` operation on the file object.\n",
"\n",
"Remember to close the file with `.close()` command so that the changes can be finalized!\n",
"\n",
"If the code is writing, we should see a `test.txt` file created in the same path as this notebook. Let's see if that's the case!\n",
"```"
],
"metadata": {
"id": "TuVW2jg84yOL"
}
},
{
"cell_type": "markdown",
"source": [
"```{tip} Exclamation marks directly pass commands to the shell, which you can think of as the interface between a computer's user and its inner workings\n",
"```"
],
"metadata": {
"id": "htm5mOfHcSKs"
}
},
{
"cell_type": "code",
"source": [
"! ls ."
],
"metadata": {
"id": "gKB6i_TL40FF"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"! cat test.txt"
],
"metadata": {
"id": "dhErHt8C42ez"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Hurray! It is working! đ\n",
"\n",
"But didn't we just say we want to read it back? đ¤¨\n",
"\n",
"Let try to read the file then! Can you think of ways to do this?\n",
"\n",
"Here are some of the functions that you may end up using.\n",
"\n",
"1. `.close()`: Close the file that we have currently open.\n",
"2. `.readline([size])`: Read strings from a file till it reaches a new line character `\\n` if the `size` parameter is empty. Otherwise it will read string of the given size.\n",
"3. `.readlines([size])`: Repeatly call `.readline()` till the end of the file.\n",
"4. `.write(str)`: Writes the string str to file.\n",
"5. `.writelines([list])`: Write a sequence of strings to file. No new line is added automatically."
],
"metadata": {
"id": "mr6OKuub45FD"
}
},
{
"cell_type": "code",
"source": [
"fhandler = open('test.txt','r',encoding='utf-8')\n",
"fhandler.readlines()"
],
"metadata": {
"id": "x7KhCnVW45gl"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"What if we want to add some text to the file?"
],
"metadata": {
"id": "FzaB9urJ49zF"
}
},
{
"cell_type": "code",
"source": [
"with open('test.txt', 'r+') as fhandler:\n",
" print(fhandler.readlines())\n",
" fhandler.writelines(['Now,\\n', 'I am trying to', ' add some stuff.'])\n",
" # Go to the starting of file\n",
" fhandler.seek(0)\n",
" # Print the content of file\n",
" print(fhandler.readlines())"
],
"metadata": {
"id": "FKf95MtT4-PO"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Here we use an alternative way to open and write the data file.\n",
"By using the `with` statement to open the TXT file, we ensure that the data is automatically closed after the final operation. We now do not need to write the `fhandler.close()` statement any more."
],
"metadata": {
"id": "KDLdbMtY5C2v"
}
},
{
"cell_type": "markdown",
"source": [
"### Tabular files\n",
"What would you do if you have data that are nicely organized in the format below?\n",
"```\n",
"Data1, Data2, Data3\n",
"Example01, Example02, Example03\n",
"Example11, Example12, Example13\n",
"```\n",
"When you open a file that looks like this in Excel, this is how it would look like:\n",
"\n",
"||||\n",
"|:--|:--|:--|\n",
"|Data1\t|Data2\t|Data3|\n",
"|Example1\t|Example2\t|Example3|\n",
"\n",
"This is a _comma-separated_ tabular file. Files like these are commonly stored with the `.csv` extension. `.csv` files can then be opened and viewed using a spreadsheet program, such as Google Sheets, Numbers, or Microsoft Excel.\n",
"\n",
"But what if we want to use the data in Python?\n",
"\n",
"#### Opening Files:\n",
"Luckily, there are community packages that could help you import and retrieve your tabular data with minimal effort. Here, we will introduce two such packages: CSV and Pandas.\n",
"\n",
"##### Reading CSV files with the `CSV` package\n",
"\n",
"`reader()` can be used to create an object that is used to read the data from a csv file. The reader can be used as an iterator to process the rows of the file in order. Lets take a look at an example:"
],
"metadata": {
"id": "upeppHSJ5FjL"
}
},
{
"cell_type": "code",
"source": [
"import pooch\n",
"import urllib.request\n",
"datafile = pooch.retrieve('https://unils-my.sharepoint.com/:x:/g/personal/tom_beucler_unil_ch/ETDZdgCkWbZLiv_LP6HKCOAB2NP7H0tUTLlP_stknqQHGw?download=1',\n",
" known_hash='c7676360997870d00a0da139c80fb1b6d26e1f96050e03f2fed75b921beb4771')"
],
"metadata": {
"id": "ktV7XSF64_jb"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"row = []\n",
"# https://unils-my.sharepoint.com/:x:/g/personal/tom_beucler_unil_ch/ETDZdgCkWbZLiv_LP6HKCOAB2NP7H0tUTLlP_stknqQHGw?e=N541Yq\n",
"with open(datafile, 'r') as fh:\n",
" reader = csv.reader(fh)\n",
" for info in reader:\n",
" row.append(info)"
],
"metadata": {
"id": "Ivyb0ZK65NQz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(row[0])"
],
"metadata": {
"id": "J7dD6aCy5QVT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(row[1])"
],
"metadata": {
"id": "5qmIUGd05Sy5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"```{tip}\n",
"In the code above, we use the `csv.reader()` method to read iteratively process each row in the CSV file.\n",
"\n",
"We add one new row to a empty list at each iteration.\n",
"\n",
"Using the `print()` function to look at what was written to the list. We found that the first row contains variable name information, whereas the second row contains data at a given time step.\n",
"```\n",
"\n",
"#### Extract data and write to new CSV file:\n",
"The CSV file that we just imported actually contains weather station data from January 2022 to August 2022. What if we want data from the first five rows only? Can we extract the data and save it to a new CSV file?"
],
"metadata": {
"id": "hNmSS-vu5WCA"
}
},
{
"cell_type": "code",
"source": [
"with open('testsmall.csv', 'w') as fh:\n",
" writer = csv.writer(fh)\n",
" for num in range(5):\n",
" writer.writerow(row[num])"
],
"metadata": {
"id": "zJBW0p4o5U7p"
},
"execution_count": null,
"outputs": []
},
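{
"cell_type": "markdown",
"source": [
"We can quickly check the contents of the new file with the same shell trick we used earlier (assuming the cell above ran without errors)."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"! cat testsmall.csv"
],
"metadata": {},
"execution_count": null,
"outputs": []
},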
{
"cell_type": "markdown",
"source": [
"```{note}\n",
"\n",
"Actually there is a better package for tabular data. The library is named `Pandas`. We will introduce this package in greater details next week. For now, we will just demonstrate that we can use pandas to do the same FileI/O procedure we did earlier with CSV.\n",
"\n",
"Here, we read in the large weather station datasheet `datafile` with pandas function `.read_csv()`.\n",
"```"
],
"metadata": {
"id": "gUeDJq4R5bCy"
}
},
{
"cell_type": "code",
"source": [
"# Import CSV file with pandas\n",
"ALOdatasheet = pd.read_csv(datafile)"
],
"metadata": {
"id": "HHvZJHfh5fMp"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Export first five rows in the Pandas dataframe to CSV file\n",
"ALOdatasheet[0:5].to_csv('./testsmall_pd.csv')"
],
"metadata": {
"id": "rx8jjYCY5imA"
},
"execution_count": null,
"outputs": []
},
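{
"cell_type": "markdown",
"source": [
"A quick way to inspect what `pandas` loaded is the `.head()` method, which displays the first few rows of the dataframe (five by default)."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Preview the first rows of the dataframe we just read in\n",
"ALOdatasheet.head()"
],
"metadata": {},
"execution_count": null,
"outputs": []
},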
{
"cell_type": "markdown",
"source": [
"### Serialization and Deserialization with Pickle\n",
"(Rewritten from GSFC Python Bootcamp)\n",
"\n",
"The pickle is an internal Python format for writing arbitrary data to a file in a way that allows it to be read in again, intact.\n",
"* `pickle` âserialisesâ the object first before writing it to file.\n",
"* Pickling (serialization) is a way to convert a python object (list, dict, etc.) into a character stream which contains all the information necessary to reconstruct the object in another python script.\n",
"\n",
"The following types can be serialized and deserialized using the `pickle` module:\n",
"* All native datatypes supported by Python (booleans, None, integers, floats, complex numbers, strings, bytes, byte arrays)\n",
"* Dictionaries, sets, lists, and tuples - as long as they contain pickleable objects\n",
"* Functions (pickled by their name references, and not by their value) and classes that are defined at the top level of a module.\n",
"\n",
"The main functions of `pickle` are:\n",
"\n",
"* `dump()`: pickles data by accepting data and a file object.\n",
"* `load()`: takes a file object, reconstruct the objects from the pickled representation, and returns it.\n",
"* `dumps()`: returns the pickled data as a string.\n",
"* `loads()`: reads the pickled data from a string.\n",
"\n",
"`dump()`/`load()` serializes/deserializes objects through files but `dumps()`/`loads()` serializes/deserializes objects through string representation."
],
"metadata": {
"id": "hooDROMY5lBY"
}
},
{
"cell_type": "code",
"source": [
"# Example Python dictionary\n",
"data_org = { 'mydata1':np.linspace(0,800,801), 'mydata2':np.linspace(0,60,61)}"
],
"metadata": {
"id": "No-ddH3A5maZ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Save Python dictionary to pickle file\n",
"with open('pickledict_sample.pkl', 'wb') as fid:\n",
" pickle.dump(data_org, fid)\n",
"# Deserialize saved pickle file\n",
"with open('pickledict_sample.pkl', 'rb') as fid:\n",
" data3 = pickle.load(fid)"
],
"metadata": {
"id": "GunL_jdN5rzh"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"for strg in data_org.keys():\n",
" print(f\"Variable {strg} is the same in data_org and data3: {(data_org[strg]==data3[strg]).all()}\")"
],
"metadata": {
"id": "YtJKMTqp5vBu"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## **Bonus**\n",
"\n",
"We have already discussed a lot of material for one day, but your TA also wrote instructions on reading and writing data in other formats! The following tutorial will thus be left for you to experiment at home."
],
"metadata": {
"id": "YqNKhU4h5-VP"
}
},
{
"cell_type": "markdown",
"source": [
"### Structral Data with JSON\n",
"JSON is a popular format for structured data that can be used in Python and Perl, among other languages.\n",
"JSON format is built on a collection of name/value pairs. The name information can be an object, record, dictionary, hash table, keyed list, or associative array. The value paired with the name can be an array, vector, list, or sequence.\n",
"\n",
"We can use `json` package for I/O. The syntax of the package is very similar to `pickle`:\n",
"\n",
"* `dump()`: encoded string writing on file.\n",
"* `load()`: Decode while JSON file read.\n",
"* `dumps()`: encoding to JSON objects\n",
"* `loads()`: Decode the JSON string.\n",
"\n",
"**Example of JSON Data**\n",
"\n",
"```python\n",
"{\n",
" \"stations\": [\n",
" {\n",
" \"acronym\": âBLDâ,\n",
" \"name\": \"Boulder Colorado\",\n",
" \"latitudeâ: 40.00,\n",
" \"longitudeâ: -105.25\n",
" },\n",
" {\n",
" \"acronymâ: âBHDâ,\n",
" \"name\": \"Baring Head Wellington New Zealand\",\n",
" \"latitude\": -41.28,\n",
" \"longitude\": 174.87\n",
" }\n",
" ]\n",
"}\n",
"```"
],
"metadata": {
"id": "04CTFmtH552V"
}
},
{
"cell_type": "markdown",
"source": [
"Let's try to read this JSON dataframe with `json`!"
],
"metadata": {
"id": "tEPsP4Bx6gCe"
}
},
{
"cell_type": "code",
"source": [
"import json\n",
"json_data = '{\"stations\": [{\"acronym\": \"BLD\", \\\n",
" \"name\": \"Boulder Colorado\", \\\n",
" \"latitude\": 40.00, \\\n",
" \"longitude\": -105.25}, \\\n",
" {\"acronym\": \"BHD\", \\\n",
" \"name\": \"Baring Head Wellington New Zealand\",\\\n",
" \"latitude\": -41.28, \\\n",
" \"longitude\": 174.87}]}'\n",
"\n",
"python_obj = json.loads(json_data)"
],
"metadata": {
"id": "t_qE6zCC55PX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"for x in python_obj['stations']:\n",
" print(x[\"name\"])"
],
"metadata": {
"id": "XU6ARYDy6i3I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Convert python_obj back to JSON\n",
"print(json.dumps(python_obj, sort_keys=True, indent=4))"
],
"metadata": {
"id": "4JiZTam16lNI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now we try to convert a python object to JSON and write it to a file.\n",
"Syntax for serialization and deserialization in the `json` package is almost the same as `pickle`"
],
"metadata": {
"id": "xY7MCj1a6paV"
}
},
{
"cell_type": "code",
"source": [
"# Convert python objects to JSON\n",
"x = {\n",
" \"name\": \"John\",\n",
" \"age\": 30,\n",
" \"married\": True,\n",
" \"divorced\": False,\n",
" \"children\": (\"Ann\",\"Billy\"),\n",
" \"pets\": None,\n",
" \"cars\": [\n",
" {\"model\": \"BMW 230\", \"mpg\": 27.5},\n",
" {\"model\": \"Ford Edge\", \"mpg\": 24.1}\n",
" ]\n",
"}"
],
"metadata": {
"id": "sHF4DNzw6qvh"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Serialization\n",
"with open('./pythonobj.json','w') as sid:\n",
" json.dump(x,sid)\n",
"# Deserialization\n",
"with open('./pythonobj.json','r') as sid:\n",
" z = json.load(sid)\n",
"\n",
"print(z)"
],
"metadata": {
"id": "s5EUImg96tN-"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### N-dimensional gridded data with NetCDF4\n",
"Geoscience datasets often contain multiple dimensions. For example, climate model outputs ususally contain 4 dimensions: time (t), vertical level (z), longitude (lon) and latitude (lat). These data are too complex to store in tabular tables.\n",
"\n",
"Developed at _Unidata_ (a subsidary of UCAR), the NetCDF format contains a hierarchial structure that allows better organization and storage of large multi-dimensional datasets, axes information, and other metadata. It is well suited to handle large numerical datasets as it allows users to access portions of a dataset without loading its entirety into memory.\n",
"\n",
"We can use `netCDF4` package to create, read and store data in NetCDF4. Another package, `xarray`, is also available for this data format.\n",
"\n",
"#### **Here is how you would normally create and store data in a netCDF file:**\n",
"\n",
"\n",
"1. Open/create a netCDF dataset.\n",
"2. Define the dimensions of the data.\n",
"3. Construct netCDF variables using the defined dimensions.\n",
"4. Pass data into the netCDF variables.\n",
"5. Add attributes to the variables and dataset (optional but recommended).\n",
"6. Close the netCDF dataset."
],
"metadata": {
"id": "Lf_NU-3U6vZf"
}
},
{
"cell_type": "markdown",
"source": [
"##### **Open a netCDF4 dataset**"
],
"metadata": {
"id": "cpshea2h6yHO"
}
},
{
"cell_type": "code",
"source": [
"ncfid = netCDF4.Dataset('sample_netcdf.nc', mode='w', format='NETCDF4')"
],
"metadata": {
"id": "yxSN9g2v6v-I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"`modeType` has the options:\n",
"* 'w': to create a new file\n",
"* 'r+': to read and write with an existing file\n",
"* 'r': to read (only) an existing file\n",
"* 'a': to append to existing file\n",
"\n",
"`fileFormat` has the options:\n",
"* 'NETCDF3_CLASSIC': Original netCDF format \n",
"* 'NETCDF3_64BIT_OFFSET': Used to ease the size restrictions of netCDF classic files\n",
"* 'NETCDF4_CLASSIC'\n",
"* 'NETCDF4': Offer new features such as groups, compound types, variable length arrays, new unsigned integer types, parallel I/O access, etc.\n",
"* 'NETCDF3_64BIT_DATA'"
],
"metadata": {
"id": "VRxFY_2C61hG"
}
},
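{
"cell_type": "markdown",
"source": [
"As a quick sanity check (a minimal sketch, assuming `ncfid` is still open from the cell above), we can ask the dataset object which format it was created with."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Confirm the format of the dataset we just created\n",
"print(ncfid.data_model)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},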
{
"cell_type": "markdown",
"source": [
"##### **Creating Dimensions in a netCDF File**\n",
"* Declare dimensions with `.createDimension(size)`\n",
"* For unlimited dimensions, use `None` or `0` as size.\n",
"* Unlimited size dimensions must be declared before (âto the left ofâ) other dimensions."
],
"metadata": {
"id": "rfljyEYV63QH"
}
},
{
"cell_type": "code",
"source": [
"# Define data dimensions\n",
"time = ncfid.createDimension('time', None)\n",
"lev = ncfid.createDimension('lev', 72)\n",
"lat = ncfid.createDimension('lat', 91)\n",
"lon = ncfid.createDimension('lon', 144)"
],
"metadata": {
"id": "1NU0NyZ-6xmH"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"##########################################################################################\n",
"# Create dimension variables and data variable pre-filled with fill_value\n",
"##########################################################################################\n",
"# Dimension variables\n",
"times = ncfid.createVariable('time','f8',('time',))\n",
"levels = ncfid.createVariable('lev','i4',('lev',))\n",
"latitudes = ncfid.createVariable('lat','f4',('lat',))\n",
"longitudes = ncfid.createVariable('lon','f4',('lon',))\n",
"# Pre-filled data variable\n",
"temp = ncfid.createVariable('temp','f4',\n",
" ('time','lev','lat','lon',),\n",
" fill_value=1.0e15)"
],
"metadata": {
"id": "qz2Hlpwu65VW"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"##### **Add variable attributes**"
],
"metadata": {
"id": "vLxKzEI37EId"
}
},
{
"cell_type": "code",
"source": [
"import datetime\n",
"latitudes.long_name = 'latitude'\n",
"latitudes.units = 'degrees north'\n",
"\n",
"longitudes.long_name = 'longitude'\n",
"longitudes.units = 'degrees east'\n",
"\n",
"levels.long_name = 'vertical levels'\n",
"levels.units = 'hPa'\n",
"levels.positive = 'down'\n",
"\n",
"beg_date = datetime.datetime(year=2019, month=1, day=1)\n",
"times.long_name = 'time'\n",
"times.units = beg_date.strftime('hours since %Y-%m-%d %H:%M:%S')\n",
"times.calendar = 'gregorian'\n",
"\n",
"temp.long_name = 'temperature'\n",
"temp.units = 'K'\n",
"temp.standard_name = 'atmospheric_temperature'"
],
"metadata": {
"id": "GmOqGfiK7FdM"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"##### **Write data on file**"
],
"metadata": {
"id": "HREk67oC7HqM"
}
},
{
"cell_type": "code",
"source": [
"latitudes[:] = np.arange(-90,91,2.0)\n",
"longitudes[:] = np.arange(-180,180,2.5)\n",
"levels[:] = np.arange(0,72,1)\n",
"\n",
"out_frequency = 3 # ouput frequency in hours\n",
"num_records = 5\n",
"dates = [beg_date + n*datetime.timedelta(hours=out_frequency) for n in range(num_records)]\n",
"times[:] = netCDF4.date2num(dates, units=times.units, calendar=times.calendar)\n",
"for i in range(num_records):\n",
" temp[i,:,:,:] = np.random.uniform(size=(levels.size,\n",
" latitudes.size,\n",
" longitudes.size))"
],
"metadata": {
"id": "VducI-mE7JCs"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"ncfid.close()"
],
"metadata": {
"id": "cHH7fBZg7LCV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Now we read the stored netCDF4 file to see what we did just now."
],
"metadata": {
"id": "FLljgxAt7NC7"
}
},
{
"cell_type": "code",
"source": [
"databank = netCDF4.Dataset('./sample_netcdf.nc', mode='r')"
],
"metadata": {
"id": "h1QOqLlB7OLk"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# We print the names of the variables in the `sample_netcdf.nc` file\n",
"print(databank.variables.keys())"
],
"metadata": {
"id": "1w-p-_0i7RFU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# We can read the data like this\n",
"time = ncfid.variables['time'][:]\n",
"lev = ncfid.variables['lev'][:]\n",
"lat = ncfid.variables['lat'][:]\n",
"lon = ncfid.variables['lon'][:]\n",
"temp = ncfid.variables['temp'][:]"
],
"metadata": {
"id": "Gbz-qz7c7SkN"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"```{important}\n",
"\n",
"While reading data from a file:\n",
"\n",
"- If you do not include `[:]` at the end of `variables[var_name]`, you are getting a variable object.\n",
"- If you include `[:]` (or `[:,:]`, `[0, i:j, :]`, etc.) at the end of `variables[var_name]`, you are getting the Numpy array containing the data.\n",
"```"
],
"metadata": {
"id": "MWRT2l5c7Uw7"
}
},
{
"cell_type": "code",
"source": [
"print(lat)"
],
"metadata": {
"id": "qYnzz2Hz7WEU"
},
"execution_count": null,
"outputs": []
}
]
}