1.1. Variables, Control Flow, and File I/O#
In this section we introduce the basic building blocks of the Python language.
Python has the following six built-in data types:
Type | Description | Examples |
---|---|---|
int | Integer | 123 |
float | Floating point | 10.12 |
complex | Complex values | 1.0+3j |
bool | Boolean values | True |
str | String values | 'Bonjour' |
NoneType | None value | None |
Python has four data structures:
Type | Description | Examples |
---|---|---|
list | Ordered collection of values | [1, 'abc', 3, 1] |
set | Unordered collection of unique values | {1, 'abc', 3} |
tuple | Immutable Ordered collection | (1, 'abc', 3) |
dictionary | Unordered collection of key-value pairs | {'key1': 'aaa', 'key2': 111} |
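As a quick illustration (a minimal sketch; the variable names here are just examples), this is how the four data structures are created in code:
my_list = [1, 'abc', 3, 1]              # list: ordered, allows duplicates
my_set = {1, 'abc', 3}                  # set: unordered, unique values only
my_tuple = (1, 'abc', 3)                # tuple: ordered but immutable
my_dict = {'key1': 'aaa', 'key2': 111}  # dictionary: key-value pairs
print(type(my_list), type(my_set), type(my_tuple), type(my_dict))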
Reference:
CUSP UCSL bootcamp 2017 (Mohitsharma44/ucsl17)
1.1.1. Basic Variables: Numbers and Strings#
The main difference between Python and languages like C++ and Fortran is that Python variables do not need explicit declaration to reserve memory space. The declaration happens automatically when a value is assigned to a variable. This means that a variable that was used to store a string can also be used to store an integer/array/list etc.
Rules for naming a variable
A variable name must start with an underscore (_), a capital letter, or a lowercase letter. It is generally recommended to use all uppercase for global variables and all lowercase for local variables. The remaining characters can be letters, digits, or underscores. Python is a case-sensitive language, so var is not equal to VAR or vAr.
Apart from the above restrictions, Python keywords cannot be used as identifier names. These are:
`and`, `as`, `assert`, `break`, `class`, `continue`, `def`, `del`, `elif`, `else`, `except`, `exec`, `finally`, `for`, `from`, `global`, `if`, `import`, `in`, `is`, `lambda`, `not`, `or`, `pass`, `raise`, `return`, `try`, `while`, `with`, `yield`
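If you want to check this list programmatically, the standard-library keyword module stores it (a quick sketch; the exact list depends on your Python version):
import keyword
print(keyword.kwlist)       # all reserved keywords for your Python version
print(len(keyword.kwlist))  # the count varies slightly between Python versions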
Additionally, the following are built-in functions, which are always available in your namespace once you open a Python interpreter:
abs(), all(), any(), ascii(), bin(), bool(), bytearray(), bytes(), callable(), chr(), classmethod(), compile(), complex(), delattr(), dict(), dir(), divmod(), enumerate(), eval(), exec(), filter(), float(), format(), frozenset(), getattr(), globals(), hasattr(), hash(), help(), hex(), id(), input(), int(), isinstance(), issubclass(), iter(), len(), list(), locals(), map(), max(), memoryview(), min(), next(), object(), oct(), open(), ord(), pow(), print(), property(), range(), repr(), reversed(), round(), set(), setattr(), slice(), sorted(), staticmethod(), str(), sum(), super(), tuple(), type(), vars(), zip(), __import__()
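Similarly, the names that are always available (built-in functions included) can be listed through the builtins module (a minimal sketch):
import builtins
# Everything Python provides without any import; the functions above are a subset of these names
print([name for name in dir(builtins) if not name.startswith('_')])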
# Basic Variables: Numbers and Strings
# comments are anything that comes after the "#" symbol
a = 1 # assign 1 to variable a
b = "hello" # assign "hello" to variable b
All variables are objects. Every object has a type (class). To find out what type your variables are, use the type() function:
print(type(a), type(b))
<class 'int'> <class 'str'>
# we can check for the type of an object
print(type(a) is int)
print(type(a) is str)
True
False
We can also define multiple variables simultaneously
var1,var2,var3,var4 = 'Hello', 'World', 1, 2
print(var1,var2,var3,var4)
Hello World 1 2
1.1.1.1. String#
We now focus on strings a bit. We will discuss
String concatenation
String indexing
String slicing
String formatting
Built-in String Methods
# String concatenation
text1,text2,text3,text4 = 'Introduction','to','Python','course'
print(text1+text2+text3+text4)
IntroductiontoPythoncourse
# Can you figure out a way to add spaces between the words?
print(text1+' '+text2+' '+text3+' '+text4)
Introduction to Python course
Characters in a string can be accessed using the standard square bracket [ ] syntax. Python uses zero-based indexing, which means that the first character in a string is indexed at the 0\(^{\text{th}}\) location.
# String indexing
print(text1[0],text1[5],text1[-1],text1[-7])
I d n d
# String slicing
print(text1[:5],text1[-5:],text1[:5]+text3[0:2])
Intro ction IntroPy
# String formatting
#f strings allow you to format data easily, but require Python >= 3.6
print(f'The a variable has type {type(a)} and value {a}')
print(f'The b variable has type {type(b)} and value {b}')
The a variable has type <class 'int'> and value 1
The b variable has type <class 'str'> and value hello
Each object includes attributes and methods, respectively referring to variables or functions associated with that object. Object attributes and methods can be accessed via the syntax variable.attribute and variable.method()
IPython will autocomplete if you press <tab> to show you the methods available. If you’re using Google Colab, you can do the same with <ctrl> + <space>
# this returns the method itself
b.capitalize
<function str.capitalize()>
# this calls the method
b.capitalize()
# there are lots of other methods
'Hello'
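Here is a small sample of other string methods you are likely to use (run them to see the results):
# a few more commonly used string methods
print(b.upper())              # 'HELLO'
print(b.replace('l', 'L'))    # 'heLLo'
print('  spaced  '.strip())   # 'spaced'
print('a,b,c'.split(','))     # ['a', 'b', 'c']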
1.1.1.2. Math Operators#
We now focus on using Python to perform mathematical operations.
# Addition/Subtraction (Remember var3=1,var4=2)
print(var3+var4,var3-var4)
3 -1
# Multiplication
print(var3*var4)
2
# Division
print(var3/var4,type(var3/var4))
0.5 <class 'float'>
# exponentiation
print(var4**(var3+2))
8
# Modulus
7 % 2
1
# rounding
round(9/10)
1
1.1.1.3. Relational Operators#
# Equal to (==)
a, b = 10, 10
a==b
True
# Not Equal to (!=)
print(a!=b, 6!=2)
False True
# Greater than (>) & Less than (<)
print(6>2, 2<6)
True True
1.1.1.4. Assignment Operators#
# Add AND (+=) [equivalent to var=var+10]
a = 10
a+=10
print(a)
20
# Multiply AND (*=) [equivalent to a=a*5]
a = 10
a*=5
print(10*5,a)
50 50
1.1.1.5. Logical Operators#
print(True and True, True and False, True or False, (not True) or (not False))
True False True True
a, b = 'Hello','Bye'
print(a is b, a is not b)
False True
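Be careful: is tests whether two names refer to the same object in memory, while == tests whether their values are equal. A minimal example (the lists below are just an illustration):
list1 = [1, 2, 3]
list2 = [1, 2, 3]
print(list1 == list2)  # True: the values are equal
print(list1 is list2)  # False: they are two distinct objects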
1.1.2. Control Flow#
The first thing you need to know is that Python programs (or Python scripts) are executed sequentially: each code statement runs once, in order, and will not be executed again on its own.
However, in real-life situations you will often need to execute a snippet of code multiple times, or execute a portion of the code based on different conditions. We use control flow statements for these slightly more complex tasks.
In this section, we will be covering:
Conditional statements – if, else, and elif
Loop statements – for, while
Loop control statements – break, continue, pass
Reference:
IBM Cognitive Class - Intro to Python (computationalcore/introduction-to-python)
CUSP UCSL bootcamp 2017 (Mohitsharma44/ucsl17)
1.1.2.1. Conditional Statements#
Here, we combine relational operators and logical operators so that a program can have different information flow according to some conditions. In other words, some code snippets are executed only if some conditions are satisfied.
The logic of conditional statements is simple: if the condition is met, do something; if the condition is not met, do something else.
x = 100
if x > 0:
    print('Positive Number')
elif x < 0:
    print('Negative Number')
else:
    print('Zero!')
Positive Number
# indentation is MANDATORY
# blocks are closed by indentation level
if x > 0:
    print('Positive Number')
    if x >= 100:
        print('Huge number!')
Positive Number
Huge number!
1.1.2.2. Loop Statements#
We use loop statements when we want to execute some code statements multiple times. An example where it would be appropriate to use loop statements:
We have multiple data files.
We use a loop statement to read the files into memory iteratively.
Within the loop, we perform the same preprocessing algorithm on the imported data.
In Python, there are two main types of loop statements: while loops and for loops.
# use range [list(range(5)) == [0, 1, 2, 3, 4]]
for i in range(5):
    print(i)
0
1
2
3
4
Tip
Here we use the range() function to create a sequence of numbers to drive the for loop.
range(N) creates a sequence of N numbers that starts at 0 and ends at N-1.
range(A, B) creates a sequence of B-A numbers that starts at A and ends at B-1.
range(A, B, step) starts at A and ends before B, like range(A, B); the only difference is that the spacing between consecutive numbers changes from 1 to step.
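For example, wrapping range() in list() makes the generated sequences explicit (a quick check you can run yourself):
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 7)))      # [2, 3, 4, 5, 6]
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]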
We can also use non-numerical iterators to drive for loops!
# iterate over a list we make up, and access both the indices and elements with enumerate()
for index, pet in enumerate(['dog', 'cat', 'fish']):
    print(index, pet, len(pet))
0 dog 3
1 cat 3
2 fish 4
As we can see, the for loop is suitable if you want to repeat the operations in the loop a fixed number of times N. But what if you have no idea how many times you would like to repeat a code snippet? This is not a trivial problem and often occurs in numerical optimization problems.
For these problems, we will forego the for loop and use the while loop instead. The termination of a while loop depends on whether a condition remains satisfied or not. Theoretically, the loop can run forever if the condition you set is always true.
# make a loop
count = 0
while count < 10:
    # bad way
    # count = count + 1
    # better way
    count += 1
print(count)
10
1.1.2.2.1. Loop control statements:#
Sometimes we want to make loop execution diverge from its normal behaviour. Perhaps we want to leave the loop when some condition is satisfied, to save processing time. Alternatively, we might want the loop to skip some code if the data satisfies some condition.
Two control statements are quite useful here: break and continue (we will also look at pass, which does nothing and lets the loop carry on normally). We’ll use a for loop as an example:
for i in range(1, 10):
    if i == 5:
        print('Condition satisfied')
        break
    print(i)  # What would happen if this was placed before the `if` condition?
1
2
3
4
Condition satisfied
for i in range(1, 10):
    if i == 5:
        print('Condition satisfied')
        continue
        print("whatever.. I won't get printed anyways.")
    print(i)
1
2
3
4
Condition satisfied
6
7
8
9
for i in range(1, 10):
    if i == 5:
        print('Condition satisfied')
        pass
    print(i)
1
2
3
4
Condition satisfied
5
6
7
8
9
1.1.3. File I/O#
In this section, we will introduce the basic functions we can use to store and retrieve data from files in different formats.
For environmental science projects, research data are most commonly stored in the following formats:
Text files (TXT)
Tabular files (e.g., CSV, XLS)
Structured data / Python dictionaries etc. (e.g., Pickle, dill, JSON)
Gridded data (e.g., HDF5, NetCDF)
We will now see how we can use Python and different Python packages to retrieve the data stored in these formats, and how to save your data to different formats for future use.
Reference:
CUSP UCSL bootcamp 2017 (Mohitsharma44/ucsl17)
Python 3 tutorial (https://docs.python.org/3/tutorial/inputoutput.html)
GSFC Python Bootcamp (astg606/py_materials)
Working on JSON Data in Python (https://realpython.com/python-json/)
Let’s import some packages first…
import csv
import netCDF4
import pickle
import pandas as pd
import xarray as xr
import numpy as np
1.1.3.1. TXT files#
Now we will learn how to write information to a .TXT file and read it back with built-in Python functions. The data used in this part of the tutorial will be very simple. In the next exercises, we will also introduce commands in community packages that allow us to read and store more complex data.
1.1.3.1.1. Opening Files:#
Files can be opened using Python’s built-in open() function. The function will create a file object for subsequent operations. Use the following syntax to open a TXT file:
fhandler = open(file_name, access_mode, encoding)
file_name: The name of the file that you would like to perform your I/O operations on. Note that this is the full file path (e.g., /home/Documents/myfile.txt).
encoding: The encoding scheme used to convert the stream of bytes to text (utf-8 is a common choice).
access_mode: The way in which a file is opened. The available choices for this option include:
access_mode | Its Function |
---|---|
r | Opens a file as read only |
rb | Opens a file as read only in binary format |
r+ | Opens a file for reading and writing |
rb+ | Opens a file for reading and writing in binary format |
w | Opens a file for writing only |
wb | Opens a file for writing only in binary format |
w+ | Opens a file for both reading and writing |
wb+ | Opens a file for writing and reading in binary format |
a | Opens a file for appending |
ab | Opens a file for appending in binary format |
a+ | Opens a file for appending and reading |
ab+ | Opens a file for appending and reading in binary format |
In the example below, we will try to store several sentences into a new TXT file, and use the open() function to see if the code works as intended.
fhandler = open('test.txt', 'w', encoding="utf-8")
fhandler.write('Hello World!\n')
fhandler.write('I am a UNIL Master Student.\n')
fhandler.write('I am learning how to code!\n')
fhandler.close()
Note
In the code above, we use the open() command to create a write-only (access_mode='w') file test.txt. The open command creates a file object (fhandler) on which we can perform extra operations.
We then try to add three sentences to the TXT file using the .write() method of the file object.
Remember to close the file with the .close() command so that the changes can be finalized!
If the code is working, we should see a test.txt file created in the same folder as this notebook. Let’s see if that’s the case!
Tip
Exclamation marks directly pass commands to the shell, which you can think of as the interface between a computer’s user and its inner workings
! ls .
! cat test.txt
Hurray! It is working! 😀
But didn’t we just say we want to read it back? 🤨
Let’s try to read the file then! Can you think of ways to do this?
Here are some of the functions that you may end up using.
.close(): Close the file that we currently have open.
.readline([size]): Read strings from a file until a newline character \n is reached if the size parameter is empty; otherwise read a string of the given size.
.readlines([size]): Repeatedly call .readline() until the end of the file.
.write(str): Write the string str to the file.
.writelines([list]): Write a sequence of strings to the file. No newline is added automatically.
fhandler = open('test.txt','r',encoding='utf-8')
fhandler.readlines()
What if we want to add some text to the file?
with open('test.txt', 'r+') as fhandler:
    print(fhandler.readlines())
    fhandler.writelines(['Now,\n', 'I am trying to', ' add some stuff.'])
    # Go to the starting of file
    fhandler.seek(0)
    # Print the content of file
    print(fhandler.readlines())
Here we use an alternative way to open and write the data file.
By using the with statement to open the TXT file, we ensure that the file is automatically closed after the final operation. We no longer need to write the fhandler.close() statement.
1.1.3.2. Tabular files#
What would you do if you have data that are nicely organized in the format below?
Data1, Data2, Data3
Example01, Example02, Example03
Example11, Example12, Example13
When you open a file that looks like this in Excel, this is how it would look like:
Data1 | Data2 | Data3 |
---|---|---|
Example01 | Example02 | Example03 |
Example11 | Example12 | Example13 |
This is a comma-separated tabular file. Files like these are commonly stored with the .csv extension. .csv files can then be opened and viewed using a spreadsheet program, such as Google Sheets, Numbers, or Microsoft Excel.
But what if we want to use the data in Python?
1.1.3.2.1. Opening Files:#
Luckily, there are packages that can help you import and retrieve your tabular data with minimal effort. Here, we will introduce two such packages: csv (built into Python) and Pandas.
1.1.3.2.1.1. Reading CSV files with the CSV package#
reader() can be used to create an object that reads the data from a CSV file. The reader can be used as an iterator to process the rows of the file in order. Let’s take a look at an example:
import pooch
import urllib.request
datafile = pooch.retrieve('https://unils-my.sharepoint.com/:x:/g/personal/tom_beucler_unil_ch/ETDZdgCkWbZLiv_LP6HKCOAB2NP7H0tUTLlP_stknqQHGw?download=1',
known_hash='c7676360997870d00a0da139c80fb1b6d26e1f96050e03f2fed75b921beb4771')
row = []
# https://unils-my.sharepoint.com/:x:/g/personal/tom_beucler_unil_ch/ETDZdgCkWbZLiv_LP6HKCOAB2NP7H0tUTLlP_stknqQHGw?e=N541Yq
with open(datafile, 'r') as fh:
    reader = csv.reader(fh)
    for info in reader:
        row.append(info)
print(row[0])
print(row[1])
Tip
In the code above, we use the csv.reader() method to iteratively process each row in the CSV file.
We append one new row to an empty list at each iteration.
Using the print() function to look at what was written to the list, we find that the first row contains the variable names, whereas the second row contains the data at a given time step.
1.1.3.2.2. Extract data and write to new CSV file:#
The CSV file that we just imported actually contains weather station data from January 2022 to August 2022. What if we want data from the first five rows only? Can we extract the data and save it to a new CSV file?
with open('testsmall.csv', 'w') as fh:
    writer = csv.writer(fh)
    for num in range(5):
        writer.writerow(row[num])
Note
Actually, there is a better package for tabular data: the library named Pandas. We will introduce this package in greater detail next week. For now, we will just demonstrate that we can use Pandas to do the same file I/O procedure we did earlier with csv.
Here, we read in the large weather station datasheet datafile with the Pandas function .read_csv().
# Import CSV file with pandas
ALOdatasheet = pd.read_csv(datafile)
# Export first five rows in the Pandas dataframe to CSV file
ALOdatasheet[0:5].to_csv('./testsmall_pd.csv')
1.1.3.3. Serialization and Deserialization with Pickle#
(Rewritten from GSFC Python Bootcamp)
Pickle is an internal Python format for writing arbitrary data to a file in a way that allows it to be read back in again, intact.
pickle “serializes” the object before writing it to file. Pickling (serialization) is a way to convert a Python object (list, dict, etc.) into a byte stream which contains all the information necessary to reconstruct the object in another Python script.
The following types can be serialized and deserialized using the pickle module:
All native datatypes supported by Python (booleans, None, integers, floats, complex numbers, strings, bytes, byte arrays)
Dictionaries, sets, lists, and tuples - as long as they contain pickleable objects
Functions (pickled by their name references, and not by their value) and classes that are defined at the top level of a module.
The main functions of pickle are:
dump(): pickles data by accepting data and a file object.
load(): takes a file object, reconstructs the objects from the pickled representation, and returns them.
dumps(): returns the pickled data as a bytes object.
loads(): reconstructs the pickled data from a bytes object.
dump()/load() serializes/deserializes objects through files, while dumps()/loads() serializes/deserializes objects through an in-memory bytes representation.
# Example Python dictionary
data_org = { 'mydata1':np.linspace(0,800,801), 'mydata2':np.linspace(0,60,61)}
# Save Python dictionary to pickle file
with open('pickledict_sample.pkl', 'wb') as fid:
    pickle.dump(data_org, fid)
# Deserialize saved pickle file
with open('pickledict_sample.pkl', 'rb') as fid:
    data3 = pickle.load(fid)
for strg in data_org.keys():
    print(f"Variable {strg} is the same in data_org and data3: {(data_org[strg]==data3[strg]).all()}")
1.1.4. Bonus#
We have already covered a lot of material for one day, but your TA also wrote instructions on reading and writing data in other formats! The following tutorial is left for you to experiment with at home.
1.1.4.1. Structured Data with JSON#
JSON is a popular format for structured data that can be used in Python and Perl, among other languages. JSON format is built on a collection of name/value pairs. The name information can be an object, record, dictionary, hash table, keyed list, or associative array. The value paired with the name can be an array, vector, list, or sequence.
We can use the json package for I/O. The syntax of the package is very similar to pickle:
dump(): encode a Python object and write it to a file.
load(): read a JSON file and decode it into a Python object.
dumps(): encode a Python object as a JSON string.
loads(): decode a JSON string into a Python object.
Example of JSON Data
{
  "stations": [
    {
      "acronym": "BLD",
      "name": "Boulder Colorado",
      "latitude": 40.00,
      "longitude": -105.25
    },
    {
      "acronym": "BHD",
      "name": "Baring Head Wellington New Zealand",
      "latitude": -41.28,
      "longitude": 174.87
    }
  ]
}
Let’s try to read this JSON data with the json package!
import json
json_data = '{"stations": [{"acronym": "BLD", \
"name": "Boulder Colorado", \
"latitude": 40.00, \
"longitude": -105.25}, \
{"acronym": "BHD", \
"name": "Baring Head Wellington New Zealand",\
"latitude": -41.28, \
"longitude": 174.87}]}'
python_obj = json.loads(json_data)
for x in python_obj['stations']:
    print(x["name"])
# Convert python_obj back to JSON
print(json.dumps(python_obj, sort_keys=True, indent=4))
Now we try to convert a Python object to JSON and write it to a file.
The syntax for serialization and deserialization in the json package is almost the same as in pickle.
# Convert python objects to JSON
x = {
    "name": "John",
    "age": 30,
    "married": True,
    "divorced": False,
    "children": ("Ann", "Billy"),
    "pets": None,
    "cars": [
        {"model": "BMW 230", "mpg": 27.5},
        {"model": "Ford Edge", "mpg": 24.1}
    ]
}
# Serialization
with open('./pythonobj.json', 'w') as sid:
    json.dump(x, sid)
# Deserialization
with open('./pythonobj.json', 'r') as sid:
    z = json.load(sid)
print(z)
1.1.4.2. N-dimensional gridded data with NetCDF4#
Geoscience datasets often contain multiple dimensions. For example, climate model outputs usually contain 4 dimensions: time (t), vertical level (z), longitude (lon), and latitude (lat). These data are too complex to store in tabular files.
Developed at Unidata (a subsidiary of UCAR), the NetCDF format has a hierarchical structure that allows better organization and storage of large multi-dimensional datasets, axis information, and other metadata. It is well suited to handling large numerical datasets, as it allows users to access portions of a dataset without loading its entirety into memory.
We can use the netCDF4 package to create, read, and store data in NetCDF4 format. Another package, xarray, is also available for this data format (see the short example at the end of this section).
1.1.4.2.1. Here is how you would normally create and store data in a netCDF file:#
Open/create a netCDF dataset.
Define the dimensions of the data.
Construct netCDF variables using the defined dimensions.
Pass data into the netCDF variables.
Add attributes to the variables and dataset (optional but recommended).
Close the netCDF dataset.
1.1.4.2.1.1. Open a netCDF4 dataset#
ncfid = netCDF4.Dataset('sample_netcdf.nc', mode='w', format='NETCDF4')
mode has the options:
'w': to create a new file
'r+': to read and write an existing file
'r': to read (only) an existing file
'a': to append to an existing file
format has the options:
'NETCDF3_CLASSIC': the original netCDF format
'NETCDF3_64BIT_OFFSET': used to ease the size restrictions of netCDF classic files
'NETCDF4_CLASSIC'
'NETCDF4': offers new features such as groups, compound types, variable-length arrays, new unsigned integer types, parallel I/O access, etc.
'NETCDF3_64BIT_DATA'
1.1.4.2.1.2. Creating Dimensions in a netCDF File#
Declare dimensions with .createDimension(name, size)
For unlimited dimensions, use None or 0 as the size.
Unlimited-size dimensions must be declared before ("to the left of") other dimensions.
# Define data dimensions
time = ncfid.createDimension('time', None)
lev = ncfid.createDimension('lev', 72)
lat = ncfid.createDimension('lat', 91)
lon = ncfid.createDimension('lon', 144)
##########################################################################################
# Create dimension variables and data variable pre-filled with fill_value
##########################################################################################
# Dimension variables
times = ncfid.createVariable('time','f8',('time',))
levels = ncfid.createVariable('lev','i4',('lev',))
latitudes = ncfid.createVariable('lat','f4',('lat',))
longitudes = ncfid.createVariable('lon','f4',('lon',))
# Pre-filled data variable
temp = ncfid.createVariable('temp','f4',
('time','lev','lat','lon',),
fill_value=1.0e15)
1.1.4.2.1.3. Add variable attributes#
import datetime
latitudes.long_name = 'latitude'
latitudes.units = 'degrees north'
longitudes.long_name = 'longitude'
longitudes.units = 'degrees east'
levels.long_name = 'vertical levels'
levels.units = 'hPa'
levels.positive = 'down'
beg_date = datetime.datetime(year=2019, month=1, day=1)
times.long_name = 'time'
times.units = beg_date.strftime('hours since %Y-%m-%d %H:%M:%S')
times.calendar = 'gregorian'
temp.long_name = 'temperature'
temp.units = 'K'
temp.standard_name = 'atmospheric_temperature'
1.1.4.2.1.4. Write data on file#
latitudes[:] = np.arange(-90,91,2.0)
longitudes[:] = np.arange(-180,180,2.5)
levels[:] = np.arange(0,72,1)
out_frequency = 3 # output frequency in hours
num_records = 5
dates = [beg_date + n*datetime.timedelta(hours=out_frequency) for n in range(num_records)]
times[:] = netCDF4.date2num(dates, units=times.units, calendar=times.calendar)
for i in range(num_records):
    temp[i,:,:,:] = np.random.uniform(size=(levels.size,
                                            latitudes.size,
                                            longitudes.size))
ncfid.close()
1.1.4.2.2. Now we read the stored netCDF4 file to see what we did just now.#
databank = netCDF4.Dataset('./sample_netcdf.nc', mode='r')
# We print the names of the variables in the `sample_netcdf.nc` file
print(databank.variables.keys())
# We can read the data like this (note that we use `databank`, since `ncfid` was closed above)
time = databank.variables['time'][:]
lev = databank.variables['lev'][:]
lat = databank.variables['lat'][:]
lon = databank.variables['lon'][:]
temp = databank.variables['temp'][:]
Important
While reading data from a file:
If you do not include [:] at the end of variables[var_name], you get a variable object.
If you include [:] (or [:,:], [0, i:j, :], etc.) at the end of variables[var_name], you get the NumPy array containing the data.
print(lat)
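As mentioned earlier, the xarray package (imported above as xr) offers a higher-level interface to the same kind of file. A minimal sketch, assuming the sample_netcdf.nc file created above sits in the working directory and a NetCDF backend (e.g., netCDF4) is installed:
# Open the same file with xarray and inspect its contents
ds = xr.open_dataset('sample_netcdf.nc')
print(ds)                # dimensions, coordinates, variables, and attributes
print(ds['temp'].shape)  # (time, lev, lat, lon)
ds.close()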