
1.1. Variables, Control Flow, and File I/O#


Image by haim charbit from Pixabay

In this section we introduce the basic building blocks of the Python language.

Python has the following six built-in data types:

| Type | Description | Examples |
|---|---|---|
| int | Integer | 123 |
| float | Floating point | 10.12 |
| complex | Complex values | 1.0+3j |
| bool | Boolean values | True |
| str | String values | 'Bonjour' |
| NoneType | None value | None |

Python has four built-in data structures:

| Type | Description | Examples |
|---|---|---|
| list | Ordered collection of values | [1, 'abc', 3, 1] |
| set | Unordered collection of unique values | {1, 'abc', 3} |
| tuple | Immutable ordered collection | (1, 'abc', 3) |
| dict | Collection of key-value pairs (insertion-ordered since Python 3.7) | {'key1': 'aaa', 'key2': 111} |


1.1.1. Basic Variables: Numbers and Strings#

The main difference between Python and languages like C++ and Fortran is that Python variables do not need explicit declaration to reserve memory space. The declaration happens automatically when a value is assigned to a variable. This means that a variable that was used to store a string can also be used to store an integer/array/list etc.

Rules for naming a variable

A variable name must start with an underscore (_) or a letter (uppercase or lowercase). The characters following the first one can be letters, digits, or underscores. By convention, all-uppercase names are used for global constants and lowercase names for local variables. Python is a case-sensitive language; therefore, var is not equal to VAR or vAr.
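As a quick illustration of case sensitivity (the variable names are made up for the example):

```python
# Python treats differently-cased names as distinct variables
var = 1
VAR = 2
vAr = 3
print(var, VAR, vAr)   # prints: 1 2 3
print(var == VAR)      # prints: False
```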

Apart from the above restrictions, Python keywords cannot be used as identifier names. In Python 3, these are:

False, None, True, and, as, assert, async, await, break, class, continue, def, del, elif, else, except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or, pass, raise, return, try, while, with, yield

(Note that print and exec were keywords in Python 2, but they are ordinary built-in functions in Python 3.)

Additionally, the following are built-in functions, which are always available in your namespace once you open a Python interpreter:

abs(), all(), any(), ascii(), bin(), bool(), bytearray(), bytes(), callable(), chr(), classmethod(), compile(), complex(), delattr(), dict(), dir(), divmod(), enumerate(), eval(), exec(), filter(), float(), format(), frozenset(), getattr(), globals(), hasattr(), hash(), help(), hex(), id(), input(), int(), isinstance(), issubclass(), iter(), len(), list(), locals(), map(), max(), memoryview(), min(), next(), object(), oct(), open(), ord(), pow(), print(), property(), range(), repr(), reversed(), round(), set(), setattr(), slice(), sorted(), staticmethod(), str(), sum(), super(), tuple(), type(), vars(), zip(), __import__()
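Rather than memorizing the keyword list, you can ask Python itself: the standard-library keyword module exposes the reserved words of the interpreter you are running.

```python
import keyword

# keyword.kwlist is the list of reserved words for this interpreter version
print(keyword.kwlist)
# keyword.iskeyword() tests a single name
print(keyword.iskeyword('for'))    # True
print(keyword.iskeyword('print'))  # False: print is a function in Python 3
```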
# Basic Variables: Numbers and Strings
# comments are anything that comes after the "#" symbol
a = 1       # assign 1 to variable a
b = "hello" # assign "hello" to variable b

All variables are objects. Every object has a type (class). To find out the type of your variables, use the built-in type() function:

print(type(a), type(b))
<class 'int'> <class 'str'>
# we can check for the type of an object
print(type(a) is int)
print(type(a) is str)
True
False
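A common alternative to comparing type() results is the built-in isinstance(), which also handles subclasses and can test several types at once; a minimal sketch:

```python
a = 1
b = "hello"
# isinstance() is the idiomatic way to check an object's type
print(isinstance(a, int))           # True
print(isinstance(b, (int, float)))  # False: b is a str
```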

We can also define multiple variables simultaneously

var1,var2,var3,var4 = 'Hello', 'World', 1, 2
print(var1,var2,var3,var4)
Hello World 1 2

1.1.1.1. String#

We now focus on strings a bit. We will discuss

  1. String concatenation

  2. String indexing

  3. String slicing

  4. String formatting

  5. Built-in String Methods

# String concatenation
text1,text2,text3,text4 = 'Introduction','to','Python','course'
print(text1+text2+text3+text4)
IntroductiontoPythoncourse
# Can you figure out a way to add spaces between the words?
print(text1+' '+text2+' '+text3+' '+text4)
Introduction to Python course

Characters in a string can be accessed using the standard square bracket [ ] syntax. Python uses zero-based indexing, which means that the first character in a string is indexed at the 0\(^{\text{th}}\) location. Negative indices count from the end of the string.

# String indexing
print(text1[0],text1[5],text1[-1],text1[-7])
I d n d
# String slicing
print(text1[:5],text1[-5:],text1[:5]+text3[0:2])
Intro ction IntroPy
# String formatting
#f strings allow you to format data easily, but require Python >= 3.6
print(f'The a variable has type {type(a)} and value {a}')
print(f'The b variable has type {type(b)} and value {b}')
The a variable has type <class 'int'> and value 1
The b variable has type <class 'str'> and value hello

Each object includes attributes and methods, referring respectively to the variables and functions associated with that object. Object attributes and methods can be accessed via the syntax variable.attribute and variable.method()

IPython will autocomplete if you press <tab> to show you the methods available. If you’re using Google Colab, you can do the same with <ctrl> + <space>

# this returns the method itself
b.capitalize
<function str.capitalize()>
# this calls the method
b.capitalize()
# there are lots of other methods
'Hello'
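Strings come with many more built-in methods; here is a small sketch of some frequently used ones:

```python
s = 'Introduction to Python'
print(s.upper())                      # 'INTRODUCTION TO PYTHON'
print(s.replace('Python', 'coding'))  # 'Introduction to coding'
print(s.split())                      # ['Introduction', 'to', 'Python']
print(' '.join(['Hello', 'World']))   # 'Hello World'
```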

1.1.1.2. Math Operators#

We now focus on using Python to perform mathematical operations.

# Addition/Subtraction (Remember var3=1,var4=2)
print(var3+var4,var3-var4)
3 -1
# Multiplication
print(var3*var4)
2
# Division
print(var3/var4,type(var3/var4))
0.5 <class 'float'>
# exponentiation
print(var4**(var3+2))
8
# Modulus
7 % 2
1
# rounding
round(9/10)
1
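Two related operators not shown above are floor division (//) and the divmod() built-in, which returns the quotient and remainder together:

```python
# floor division rounds the quotient down (toward negative infinity)
print(7 // 2)        # 3
print(-7 // 2)       # -4, not -3
# divmod() returns (quotient, remainder) in one call
print(divmod(7, 2))  # (3, 1)
```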

1.1.1.3. Relational Operators#

# Equal to (==)
a, b = 10, 10
a==b
True
# Not Equal to (!=)
print(a!=b, 6!=2)
False True
# Greater than (>) & Less than (<)
print(6>2, 2<6)
True True

1.1.1.4. Assignment Operators#

# Add AND (+=) [equivalent to var=var+10]
a = 10
a+=10
print(a)
20
# Multiplication AND
a = 10
a*=5
print(10*5,a)
50 50

1.1.1.5. Logical Operators#

print(True and True, True and False, True or False, (not True) or (not False))
True False True True
a, b = 'Hello','Bye'
print(a is b, a is not b)
False True
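Be careful: is tests object identity, not equality of values. For comparing values, always use ==. A minimal sketch:

```python
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b)  # True: the two lists have equal contents
print(a is b)  # False: they are two distinct objects in memory
c = a
print(a is c)  # True: both names refer to the same object
```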

1.1.2. Control Flow#

The first thing you need to know is that Python programs (or Python scripts) are executed sequentially: by default, each code statement runs exactly once, from top to bottom.

However, in real life situations you will often need to execute a snippet of code multiple times, or execute a portion of a code based on different conditions. We use control flow statements for these slightly more complex tasks.

In this section, we will be covering:

  1. Conditional statements – if, else, and elif

  2. Loop statements – for, while

  3. Loop control statements – break, continue, pass


1.1.2.1. Conditional Statements#

Here, we combine relational operators and logical operators so that a program can have different information flow according to some conditions. In other words, some code snippets are executed only if some conditions are satisfied.

The logic of conditional statements is simple: if the condition is met, do something; if the condition is not met, do something else.

x = 100
if x > 0:
    print('Positive Number')
elif x < 0:
    print('Negative Number')
else:
    print ('Zero!')
Positive Number
# indentation is MANDATORY
# blocks are closed by indentation level
if x > 0:
    print('Positive Number')
    if x >= 100:
        print('Huge number!')
Positive Number
Huge number!

1.1.2.2. Loop Statements#

We use loop statements if we want to execute some code statements multiple times. An example where it would be appropriate to use loop statements:

  1. We have multiple data files.

  2. We use a loop statement to read the files into memory iteratively.

  3. Within the loop statement, we perform the same preprocessing algorithm on the imported data.

In Python language, there are two main types of loop statements: while loops and for loops.

# use range [range(5)==[0,1,2,3,4]]
for i in range(5):
    print(i)
0
1
2
3
4

Tip

Here we use the range() function to create a sequence of numbers to drive the for loop.

range(N) will create a sequence of N numbers that starts at 0.
range(A,B) will create a sequence of B-A numbers that starts at A and ends at B-1.
range(A,B,step) starts and ends with the same numbers as range(A,B). The only difference is that the difference between consecutive numbers changes from 1 to step.
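To inspect the numbers a range produces, wrap it in list():

```python
# range() returns a lazy sequence; list() materializes it
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 8)))      # [2, 3, 4, 5, 6, 7]
print(list(range(2, 10, 3)))  # [2, 5, 8]
```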

We can also use non-numerical iterators to drive for loops!

# iterate over a list we make up, and access both the indices and elements with enumerate()
for index,pet in enumerate(['dog', 'cat', 'fish']):
    print(index, pet, len(pet))
0 dog 3
1 cat 3
2 fish 4

As we can see, the for loop is suitable if you want to repeat the operations in the loop for a fixed number of times N. But what if you have no idea of how many times you would like to repeat a code snippet? This is not a trivial problem and often occurs in numerical optimization problems.

For these problems, we will forego the for loop and use the while loop instead. The termination of a while loop depends on whether a condition remains satisfied or not. Theoretically, the loop can run forever if the condition you set is always true.

# make a loop
count = 0
while count < 10:
    # bad way
    # count = count + 1
    # better way
    count += 1
print(count)
10
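As a sketch of the "unknown number of iterations" case, here is a toy loop that keeps halving a value until it drops below a tolerance (the starting value and tolerance are chosen arbitrarily for illustration):

```python
x = 100.0
tol = 1e-3
steps = 0
# we cannot know beforehand how many halvings are needed,
# so a while loop is the natural choice
while x > tol:
    x /= 2
    steps += 1
print(steps, x)  # 17 halvings are needed to fall below the tolerance
```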

1.1.2.2.1. Loop control statements:#

Sometimes we want to make loop execution diverge from its normal behaviour. Perhaps we want to leave the loop when some condition is satisfied to save processing time. Alternatively, we might want the loop to skip some code if the data satisfies some condition.

Two control statements are quite useful here: break and continue. We’ll use a for loop as an example:

for i in range(1, 10):
    if i == 5:
        print('Condition satisfied')
        break
    print(i)  # What would happen if this was placed before the `if` condition?
1
2
3
4
Condition satisfied
for i in range(1, 10):
    if i == 5:
        print('Condition satisfied')
        continue
        print("whatever.. I won't get printed anyways.")
    print(i)
1
2
3
4
Condition satisfied
6
7
8
9
for i in range(1, 10):
    if i == 5:
        print('Condition satisfied')
        pass
    print(i)
1
2
3
4
Condition satisfied
5
6
7
8
9

1.1.3. File I/O#

In this section, we will introduce the basic functions we can use to store and retrieve data from files in different formats.

For environmental science projects, research data are most commonly stored in the following formats:

  1. Text files (TXT)

  2. Tabular files (e.g., CSV, XLS)

  3. Structured Data / Python dictionaries etc. (e.g., Pickle, dill, JSON)

  4. Gridded data (e.g., HDF5, NetCDF)

We will now see how we can use Python and different Python packages to retrieve the data stored in these formats, and how to save your data to different formats for future use.


Let’s import some packages first…

import csv
import netCDF4
import pickle
import pandas as pd
import xarray as xr
import numpy as np

1.1.3.1. TXT files#

Now we will learn how to write information to a .TXT file and read it back with built-in Python functions. The data used in this part of the tutorial will be very simple. In the next exercises, we will also introduce commands in community packages that allow us to read and store more complex data.

1.1.3.1.1. Opening Files:#

Files can be opened using Python's built-in open() function. The function creates a file object for subsequent operations. Use the following syntax to open a TXT file: fhandler = open(file_name, access_mode, encoding=encoding)

  • file_name: The name of the file on which you would like to perform your I/O operations.
    Note that this can be a full file path (e.g., /home/Documents/myfile.txt)

  • encoding: Encoding scheme to use to convert the stream of bytes to text. (Standard=utf-8)

  • access_mode: The way in which a file is opened, available choices for this option include:

| access_mode | Its Function |
|---|---|
| r | Opens a file as read only |
| rb | Opens a file as read only in binary format |
| r+ | Opens a file for reading and writing |
| rb+ | Opens a file for reading and writing in binary format |
| w | Opens a file for writing only |
| wb | Opens a file for writing only in binary format |
| w+ | Opens a file for both reading and writing |
| wb+ | Opens a file for writing and reading in binary format |
| a | Opens a file for appending |
| ab | Opens a file for appending in binary format |
| a+ | Opens a file for appending and reading |
| ab+ | Opens a file for appending and reading in binary format |

In the example below, we will try to store several sentences into a new TXT file, and use the open() function to see if the code works as intended.

fhandler = open('test.txt', 'w', encoding="utf-8")
fhandler.write('Hello World!\n')
fhandler.write('I am a UNIL Master Student.\n')
fhandler.write('I am learning how to code!\n')
fhandler.close()

Note

In the code above, we use the open() command to create a write-only (access_mode='w') file test.txt. The open command creates a file object (fhandler) on which we can perform extra operations.

We then try to add three sentences to the TXT file using the .write() operation on the file object.

Remember to close the file with .close() command so that the changes can be finalized!

If the code is working, we should see a test.txt file created in the same path as this notebook. Let’s see if that’s the case!

Tip

Exclamation marks directly pass commands to the shell, which you can think of as the interface between a computer’s user and its inner workings

! ls .
! cat test.txt

Hurray! It is working! 😀

But didn’t we just say we want to read it back? 🤨

Let’s try to read the file then! Can you think of ways to do this?

Here are some of the functions that you may end up using.

  1. .close(): Closes the currently open file.

  2. .readline([size]): Reads from the file until it reaches a newline character \n if the size parameter is omitted. Otherwise it reads a string of the given size.

  3. .readlines(): Repeatedly calls .readline() until the end of the file and returns a list of lines.

  4. .write(str): Writes the string str to the file.

  5. .writelines(list): Writes a sequence of strings to the file. No newline characters are added automatically.

fhandler = open('test.txt','r',encoding='utf-8')
fhandler.readlines()
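You can also iterate over the file object directly, which reads one line at a time without loading the whole file into memory. The sketch below recreates a small test.txt so it runs on its own:

```python
# recreate a small text file so the example is self-contained
with open('test.txt', 'w', encoding='utf-8') as fhandler:
    fhandler.write('Hello World!\n')
    fhandler.write('I am learning how to code!\n')

# iterating over the file object yields one line per iteration
with open('test.txt', 'r', encoding='utf-8') as fhandler:
    for line in fhandler:
        print(line.strip())  # .strip() removes the trailing newline
```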

What if we want to add some text to the file?

with open('test.txt', 'r+') as fhandler:
  print(fhandler.readlines())
  fhandler.writelines(['Now,\n', 'I am trying to', ' add some stuff.'])
  # Go to the starting of file
  fhandler.seek(0)
  # Print the content of file
  print(fhandler.readlines())

Here we use an alternative way to open and write the data file. By using the with statement to open the TXT file, we ensure that the data is automatically closed after the final operation. We now do not need to write the fhandler.close() statement any more.
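When you only want to add to an existing file, append mode ('a') is the safer choice, since unlike 'w' it never truncates existing content; a minimal sketch with a made-up file name:

```python
# 'w' starts a fresh file; 'a' appends without overwriting
with open('log.txt', 'w', encoding='utf-8') as fh:
    fh.write('first line\n')
with open('log.txt', 'a', encoding='utf-8') as fh:
    fh.write('second line\n')
with open('log.txt', 'r', encoding='utf-8') as fh:
    print(fh.read())  # both lines are present
```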

1.1.3.2. Tabular files#

What would you do if you have data that are nicely organized in the format below?

Data1, Data2, Data3
Example01, Example02, Example03
Example11, Example12, Example13

When you open a file that looks like this in Excel, this is how it looks:

| Data1 | Data2 | Data3 |
|---|---|---|
| Example01 | Example02 | Example03 |
| Example11 | Example12 | Example13 |

This is a comma-separated tabular file. Files like these are commonly stored with the .csv extension. .csv files can then be opened and viewed using a spreadsheet program, such as Google Sheets, Numbers, or Microsoft Excel.

But what if we want to use the data in Python?

1.1.3.2.1. Opening Files:#

Luckily, there are packages that can help you import and retrieve your tabular data with minimal effort. Here, we will introduce two of them: the built-in csv module and the pandas package.

1.1.3.2.1.1. Reading CSV files with the CSV package#

reader() creates an object that is used to read the data from a CSV file. The reader can be used as an iterator to process the rows of the file in order. Let's take a look at an example:

import pooch
import urllib.request
datafile = pooch.retrieve('https://unils-my.sharepoint.com/:x:/g/personal/tom_beucler_unil_ch/ETDZdgCkWbZLiv_LP6HKCOAB2NP7H0tUTLlP_stknqQHGw?download=1',
                          known_hash='c7676360997870d00a0da139c80fb1b6d26e1f96050e03f2fed75b921beb4771')
row = []
# https://unils-my.sharepoint.com/:x:/g/personal/tom_beucler_unil_ch/ETDZdgCkWbZLiv_LP6HKCOAB2NP7H0tUTLlP_stknqQHGw?e=N541Yq
with open(datafile, 'r') as fh:
  reader = csv.reader(fh)
  for info in reader:
    row.append(info)
print(row[0])
print(row[1])

Tip

In the code above, we use the csv.reader() method to iteratively process each row in the CSV file.

We append one new row to an initially empty list at each iteration.

Using the print() function to inspect what was written to the list, we find that the first row contains the variable names, whereas the second row contains data at a given time step.

1.1.3.2.2. Extract data and write to new CSV file:#

The CSV file that we just imported actually contains weather station data from January 2022 to August 2022. What if we want data from the first five rows only? Can we extract the data and save it to a new CSV file?

with open('testsmall.csv', 'w') as fh:
  writer = csv.writer(fh)
  for num in range(5):
    writer.writerow(row[num])
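If a CSV file has a header row, csv.DictReader maps each subsequent row to a dictionary keyed by the column names, which is often more readable than integer indices. A sketch using a made-up file:

```python
import csv

# build a tiny CSV file so the sketch is self-contained
with open('demo.csv', 'w', newline='') as fh:
    fh.write('station,temp\nBLD,10.5\nBHD,12.0\n')

# each row becomes a dictionary keyed by the header names
with open('demo.csv', 'r', newline='') as fh:
    for record in csv.DictReader(fh):
        print(record['station'], record['temp'])
```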

Note

Actually, there is a better package for tabular data, named pandas. We will introduce this package in greater detail next week. For now, we will just demonstrate that we can use pandas to do the same file I/O procedure we did earlier with csv.

Here, we read the large weather station datasheet datafile with the pandas function .read_csv().

# Import CSV file with pandas
ALOdatasheet = pd.read_csv(datafile)
# Export first five rows in the Pandas dataframe to CSV file
ALOdatasheet[0:5].to_csv('./testsmall_pd.csv')

1.1.3.3. Serialization and Deserialization with Pickle#

(Rewritten from GSFC Python Bootcamp)

Pickle is an internal Python format for writing arbitrary data to a file in a way that allows it to be read back in, intact.

  • pickle “serializes” the object before writing it to file.

  • Pickling (serialization) is a way to convert a Python object (list, dict, etc.) into a byte stream which contains all the information necessary to reconstruct the object in another Python script.

The following types can be serialized and deserialized using the pickle module:

  • All native datatypes supported by Python (booleans, None, integers, floats, complex numbers, strings, bytes, byte arrays)

  • Dictionaries, sets, lists, and tuples - as long as they contain pickleable objects

  • Functions (pickled by their name references, and not by their value) and classes that are defined at the top level of a module.

The main functions of pickle are:

  • dump(): pickles data by accepting data and a file object.

  • load(): takes a file object, reconstructs the objects from their pickled representation, and returns them.

  • dumps(): returns the pickled data as a bytes object.

  • loads(): reconstructs the objects from a pickled bytes object.

dump()/load() serialize/deserialize objects through files, whereas dumps()/loads() serialize/deserialize objects through an in-memory bytes representation.
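A quick sketch of the in-memory variants, which avoid files entirely:

```python
import pickle

payload = {'mydata1': [1, 2, 3], 'mydata2': 'abc'}
# dumps() serializes to a bytes object instead of writing to a file
blob = pickle.dumps(payload)
print(type(blob))           # <class 'bytes'>
# loads() reconstructs the original object from those bytes
restored = pickle.loads(blob)
print(restored == payload)  # True
```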

# Example Python dictionary
data_org = { 'mydata1':np.linspace(0,800,801), 'mydata2':np.linspace(0,60,61)}
# Save Python dictionary to pickle file
with open('pickledict_sample.pkl', 'wb') as fid:
     pickle.dump(data_org, fid)
# Deserialize saved pickle file
with open('pickledict_sample.pkl', 'rb') as fid:
     data3 = pickle.load(fid)
for strg in data_org.keys():
  print(f"Variable {strg} is the same in data_org and data3: {(data_org[strg]==data3[strg]).all()}")

1.1.4. Bonus#

We have already discussed a lot of material for one day, but your TA also wrote instructions on reading and writing data in other formats! The following tutorial is thus left for you to experiment with at home.

1.1.4.1. Structured Data with JSON#

JSON is a popular format for structured data that can be used in Python and Perl, among other languages. JSON format is built on a collection of name/value pairs. The name information can be an object, record, dictionary, hash table, keyed list, or associative array. The value paired with the name can be an array, vector, list, or sequence.

We can use json package for I/O. The syntax of the package is very similar to pickle:

  • dump(): encodes a Python object and writes it to a file.

  • load(): reads a JSON file and decodes it into a Python object.

  • dumps(): encodes a Python object to a JSON string.

  • loads(): decodes a JSON string into a Python object.

Example of JSON Data

{
    "stations": [
        {
            "acronym": "BLD",
            "name": "Boulder Colorado",
            "latitude": 40.00,
            "longitude": -105.25
        },
        {
            "acronym": "BHD",
            "name": "Baring Head Wellington New Zealand",
            "latitude": -41.28,
            "longitude": 174.87
        }
    ]
}

Let’s try to read this JSON data with the json package!

import json
json_data = '{"stations": [{"acronym": "BLD", \
                                "name": "Boulder Colorado", \
                            "latitude": 40.00, \
                            "longitude": -105.25}, \
                            {"acronym": "BHD", \
                             "name": "Baring Head Wellington New Zealand",\
                             "latitude": -41.28, \
                             "longitude": 174.87}]}'

python_obj = json.loads(json_data)
for x in python_obj['stations']:
    print(x["name"])
# Convert python_obj back to JSON
print(json.dumps(python_obj, sort_keys=True, indent=4))

Now we try to convert a Python object to JSON and write it to a file. The syntax for serialization and deserialization in the json package is almost the same as in pickle.

# Convert python objects to JSON
x = {
  "name": "John",
  "age": 30,
  "married": True,
  "divorced": False,
  "children": ("Ann","Billy"),
  "pets": None,
  "cars": [
    {"model": "BMW 230", "mpg": 27.5},
    {"model": "Ford Edge", "mpg": 24.1}
  ]
}
# Serialization
with open('./pythonobj.json','w') as sid:
  json.dump(x,sid)
# Deserialization
with open('./pythonobj.json','r') as sid:
  z = json.load(sid)

print(z)
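Note that the round trip is not perfectly type-preserving: JSON has no tuple type, so the children tuple comes back as a list, and None is stored as JSON null. A small sketch:

```python
import json

x = {'children': ('Ann', 'Billy'), 'pets': None}
z = json.loads(json.dumps(x))
print(z['children'])  # ['Ann', 'Billy'] -- the tuple became a list
print(z['pets'])      # None
```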

1.1.4.2. N-dimensional gridded data with NetCDF4#

Geoscience datasets often contain multiple dimensions. For example, climate model outputs usually contain 4 dimensions: time (t), vertical level (z), longitude (lon) and latitude (lat). These data are too complex to store in tabular formats.

Developed at Unidata (a subsidiary of UCAR), the NetCDF format has a hierarchical structure that allows better organization and storage of large multi-dimensional datasets, axes information, and other metadata. It is well suited to handling large numerical datasets, as it allows users to access portions of a dataset without loading its entirety into memory.

We can use the netCDF4 package to create, read, and store data in NetCDF4 format. Another package, xarray, is also available for this data format.

1.1.4.2.1. Here is how you would normally create and store data in a netCDF file:#

  1. Open/create a netCDF dataset.

  2. Define the dimensions of the data.

  3. Construct netCDF variables using the defined dimensions.

  4. Pass data into the netCDF variables.

  5. Add attributes to the variables and dataset (optional but recommended).

  6. Close the netCDF dataset.

1.1.4.2.1.1. Open a netCDF4 dataset#
ncfid = netCDF4.Dataset('sample_netcdf.nc', mode='w', format='NETCDF4')

The mode argument has the options:

  • ‘w’: to create a new file

  • ‘r+’: to read and write with an existing file

  • ‘r’: to read (only) an existing file

  • ‘a’: to append to existing file

The format argument has the options:

  • ‘NETCDF3_CLASSIC’: Original netCDF format

  • ‘NETCDF3_64BIT_OFFSET’: Used to ease the size restrictions of netCDF classic files

  • ‘NETCDF4_CLASSIC’

  • ‘NETCDF4’: Offer new features such as groups, compound types, variable length arrays, new unsigned integer types, parallel I/O access, etc.

  • ‘NETCDF3_64BIT_DATA’

1.1.4.2.1.2. Creating Dimensions in a netCDF File#
  • Declare dimensions with .createDimension(name, size)

  • For unlimited dimensions, use None or 0 as size.

  • Unlimited size dimensions must be declared before (“to the left of”) other dimensions.

# Define data dimensions
time = ncfid.createDimension('time', None)
lev  = ncfid.createDimension('lev', 72)
lat  = ncfid.createDimension('lat', 91)
lon  = ncfid.createDimension('lon', 144)
##########################################################################################
# Create dimension variables and data variable pre-filled with fill_value
##########################################################################################
# Dimension variables
times      = ncfid.createVariable('time','f8',('time',))
levels     = ncfid.createVariable('lev','i4',('lev',))
latitudes  = ncfid.createVariable('lat','f4',('lat',))
longitudes = ncfid.createVariable('lon','f4',('lon',))
# Pre-filled data variable
temp = ncfid.createVariable('temp','f4',
                            ('time','lev','lat','lon',),
                            fill_value=1.0e15)
1.1.4.2.1.3. Add variable attributes#
import datetime
latitudes.long_name  = 'latitude'
latitudes.units      = 'degrees north'

longitudes.long_name = 'longitude'
longitudes.units     = 'degrees east'

levels.long_name     = 'vertical levels'
levels.units         = 'hPa'
levels.positive      = 'down'

beg_date = datetime.datetime(year=2019, month=1, day=1)
times.long_name      = 'time'
times.units          = beg_date.strftime('hours since %Y-%m-%d %H:%M:%S')
times.calendar       = 'gregorian'

temp.long_name       = 'temperature'
temp.units           = 'K'
temp.standard_name   = 'atmospheric_temperature'
1.1.4.2.1.4. Write data on file#
latitudes[:]  =  np.arange(-90,91,2.0)
longitudes[:] =  np.arange(-180,180,2.5)
levels[:]     =  np.arange(0,72,1)

out_frequency = 3   # output frequency in hours
num_records   = 5
dates = [beg_date + n*datetime.timedelta(hours=out_frequency) for n in range(num_records)]
times[:] = netCDF4.date2num(dates, units=times.units, calendar=times.calendar)
for i in range(num_records):
    temp[i,:,:,:] = np.random.uniform(size=(levels.size,
                                            latitudes.size,
                                            longitudes.size))
ncfid.close()

1.1.4.2.2. Now we read the stored netCDF4 file to see what we did just now.#

databank = netCDF4.Dataset('./sample_netcdf.nc', mode='r')
# We print the names of the variables in the `sample_netcdf.nc` file
print(databank.variables.keys())
# We can read the data like this
time   = databank.variables['time'][:]
lev    = databank.variables['lev'][:]
lat    = databank.variables['lat'][:]
lon    = databank.variables['lon'][:]
temp   = databank.variables['temp'][:]

Important

While reading data from a file:

  • If you do not include [:] at the end of variables[var_name], you are getting a variable object.

  • If you include [:] (or [:,:], [0, i:j, :], etc.) at the end of variables[var_name], you are getting the Numpy array containing the data.

print(lat)