1.6. (Exercise) Ocean Floats Data Analysis
Learning Objectives
Creating new arrays using linspace and arange
Computing basic formulas with numpy arrays
Loading data from .npy files
Performing reductions (e.g. mean, std) on numpy arrays
Making 1D line plots
Making scatterplots
Annotating plots with titles and axes
In this problem, we use real data from ocean profiling floats. ARGO floats are autonomous robotic instruments that collect temperature, salinity, and pressure data from the ocean. Each float collects “profiles”: a profile is a set of measurements at different depths, or “levels”.
Each profile has a single latitude, longitude, and date associated with it, in addition to many different levels.
Let’s start by using pooch to download the data files we need for this exercise. The following code will give you a list of .npy files that you can open in the next step.
import pooch
url = "https://unils-my.sharepoint.com/:u:/g/personal/tom_beucler_unil_ch/EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ?download=1"
files = pooch.retrieve(url, processor=pooch.Unzip(), known_hash='2a703c720302c682f1662181d329c9f22f9f10e1539dc2d6082160a469165009')
files
['C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\date.npy',
'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\lat.npy',
'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\levels.npy',
'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\lon.npy',
'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\P.npy',
'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\S.npy',
'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\T.npy']
import numpy as np
Q1) Load each data file as a numpy array.
You can use whatever names you want for your arrays, but we recommend:
T: temperature
S: salinity
P: pressure
date: date
lat: latitude
lon: longitude
level: depth level
Hint 1: Look at the file names (the items in files) to know which file corresponds to which variable.
Hint 2: Check out the documentation for np.load.
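As a minimal sketch of the pattern hinted at above, the snippet below saves two small made-up arrays to a temporary folder and loads them back with np.load inside a list comprehension. The names demo_files, a, and b are hypothetical stand-ins, not the exercise data.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for `files`: two small arrays saved to a temporary folder
tmp_dir = tempfile.mkdtemp()
demo_arrays = {"a.npy": np.arange(3), "b.npy": np.array([10.0, 20.0])}
demo_files = []
for name, values in demo_arrays.items():
    path = os.path.join(tmp_dir, name)
    np.save(path, values)
    demo_files.append(path)

# Show only the file names, not the full paths
print([os.path.basename(path) for path in demo_files])

# Load every file in one go and unpack the results into separate variables
a, b = [np.load(path) for path in demo_files]
print(a, b)
```

The order of the variables on the left must match the order of the paths in the list, which is why it is worth displaying the file names first.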
Display the names of the items in files here
___
Then, load the files as numpy arrays, for instance using a list comprehension
___,___,___,___,___,___,___ = [np.___(___[___]) for ___ in range(___)]
Q2) Recreate the level array using np.arange and np.linspace
Hints:
The documentation for np.arange is at this link
The documentation for np.linspace is at this link
The documentation for np.testing.assert_equal is at this link
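To illustrate how the three functions fit together, here is a small sketch on made-up values (not the actual level array). It builds the same evenly spaced array twice and checks that both versions agree.

```python
import numpy as np

# Two ways to build the evenly spaced array 0, 5, 10, 15, 20 (made-up values)
demo_arange = np.arange(0, 25, 5)       # start, stop (excluded), step
demo_linspace = np.linspace(0, 20, 5)   # start, stop (included), number of points

# Raises an AssertionError if the arrays differ; stays silent otherwise
np.testing.assert_equal(demo_arange, demo_linspace)
print(demo_arange, demo_linspace)
```

Note the difference in how the end point is handled: np.arange excludes its stop value, while np.linspace includes it by default.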
Display the level array
___
Recreate the level array using np.arange and call it level_arange
level_arange = np.arange(___,___)
Check that level and level_arange are equal using np.testing.assert_equal
np.___.___(___,level_arange)
Recreate the level array using np.linspace and call it level_linspace
___ = np.___(___,___,___)
Check that level and level_linspace are equal using np.testing.assert_equal
___.___(___,___)
Q3) Examine the shapes of T, S and P compared to lon, lat, date and level. How do you think they are related?
Hint: Check out the NDArrays subsection
Display the shapes of all loaded variables
___.shape
___.___
Based on the shapes, which dimensions do you think are shared among the arrays?
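The sketch below shows what "shared dimensions" can look like on synthetic arrays. The sizes and the (level, profile) layout are assumptions made up for illustration; check the shapes of the real arrays before drawing conclusions.

```python
import numpy as np

n_levels, n_profiles = 4, 3                  # made-up sizes
demo_level = np.arange(n_levels)             # one value per depth level
demo_lat = np.array([10.0, 12.5, 15.0])      # one value per profile
demo_T = np.zeros((n_levels, n_profiles))    # one value per (level, profile) pair

for name, arr in [("level", demo_level), ("lat", demo_lat), ("T", demo_T)]:
    print(name, arr.shape)
```

In this toy case, the first axis of the 2D array has the same length as the level array and the second axis has the same length as the latitude array.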
Q4) Based on the formula below, calculate the seawater density relative_density, relative to pure water, as a function of the temperature, the salinity, and the pressure.
relative_density \( = \rho - \rho_{Pure\ Water} = a \times S + b \times \Theta + c \times \Theta^{2}\)
where:
The densities \(\rho\) and \(\rho_{Pure\ Water}\) are in units \(kg/m^{3}\).
The constants \(a\), \(b\), and \(c\) are provided below.
The function to calculate the conservative temperature \(\Theta\) (in degrees Celsius) from temperature, salinity, and pressure is provided below.
The temperature \(T\) is in degrees Celsius.
The salinity \(S\) is in units \(g/kg\).
The pressure \(p\) is in units \(dbar\).
Hint: The loaded numpy arrays for temperature, salinity, and pressure already have the right units and no conversion is needed.
Sources:
Below are the constants a, b, and c:
a = 7.718e-1
b = -8.44e-2
c = -4.561e-3
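As a quick sanity check of the formula, the sketch below applies it to made-up salinity and conservative temperature values. S_demo and theta_demo are hypothetical inputs, not taken from the float data.

```python
import numpy as np

a, b, c = 7.718e-1, -8.44e-2, -4.561e-3

# Made-up inputs, NOT taken from the float data
S_demo = np.array([35.0, 34.5])       # salinity in g/kg
theta_demo = np.array([10.0, 2.0])    # conservative temperature in degrees Celsius

# NumPy applies the formula element by element
density_demo = a * S_demo + b * theta_demo + c * theta_demo**2
print(density_demo)
```

For the first pair of values the three terms are 27.013, -0.844, and -0.4561, which add up to about 25.71 kg/m³. The same expression works unchanged on 2D arrays because the arithmetic is element-wise.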
Let’s import the library gsw, which contains the function CT_from_t to calculate the conservative temperature \(\Theta\) from temperature, salinity, and pressure.
!pip install gsw
from gsw import CT_from_t
Now it’s all up to you. Here’s the equation to avoid having to scroll back up:
relative_density \( = \rho - \rho_{Pure\ Water} = a \times S + b \times \Theta + c \times \Theta^{2}\)
Calculate the conservative temperature
Hint: use CT_from_t. It expects its arguments in the order salinity, temperature, pressure.
___ = CT_from_t(___,___,___)
Calculate the relative density using the equation above
___ = ___
Q4) Make a plot for each column of data in T, S, P, and relative_density (four plots)
For this question, we have to use the Pyplot interface of the Matplotlib library for visualization even if we have not covered it extensively in class yet. But fear not as we provide easy-to-follow instructions below. 😊
The first step is to import Pyplot. Simply execute the code below.
import matplotlib.pyplot as plt
Then, we will plot variables as a function of the ocean depth, level. Simply read the documentation at this link to infer the correct syntax. Label your axes using plt.xlabel and plt.ylabel, and add a title using plt.title.
Hint: The vertical scale should use the level data to be consistent with oceanographic conventions.
Hint 2: Each plot should have a line for each column of data. It will look messy, like the plot below:
plt.plot(___,___); # The semi-colon prevents printing the line objects
plt.xlabel(___) # Takes a string as argument
plt.ylabel(___)
plt.title(___)
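For reference, here is a sketch of the same plotting pattern on synthetic data (three made-up profiles on five levels). It assumes that larger level values mean deeper water, so the vertical axis is inverted; that step is optional and the assumption should be checked against the real level array.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 3 made-up profiles on 5 depth levels, shape (5, 3)
demo_level = np.arange(5)
demo_profiles = np.array([20.0, 15.0, 10.0, 6.0, 4.0])[:, None] + np.array([0.0, 1.0, 2.0])

plt.figure()
lines = plt.plot(demo_profiles, demo_level)  # one line per column of demo_profiles
plt.gca().invert_yaxis()                     # assumption: larger level = deeper, so put it at the bottom
plt.xlabel("Made-up temperature (°C)")
plt.ylabel("Level index")
plt.title("Synthetic profiles")
```

When the first argument is 2D and the second is 1D, Matplotlib draws one line per column, as long as the first dimension of both arrays matches.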
Make more plots below:
Q5) Compute the mean and standard deviation of each of T, S, P, and relative_density at each depth in level.
Hint: You may want to read the documentation at this link and this link.
Hint 2: You can check that you took the means and standard deviations along the correct axis by checking the shape of your results.
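The effect of the axis argument is easiest to see on a tiny made-up array. The sketch below reduces a (2, 3) array along each axis and prints the resulting shapes.

```python
import numpy as np

demo = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])   # shape (2, 3)

mean_axis0 = np.mean(demo, axis=0)   # collapses axis 0, result has shape (3,)
mean_axis1 = np.mean(demo, axis=1)   # collapses axis 1, result has shape (2,)
std_axis1 = np.std(demo, axis=1)     # same rule applies to the standard deviation

print(mean_axis0.shape, mean_axis1.shape, std_axis1.shape)
```

The axis you pass is the one that disappears from the result, so the remaining shape tells you which dimension you kept.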
Compute the means…
___ = np.___(___,axis=___)
___
___
___
… and the standard deviations.
___ = ___.___(___,___)
Check that they have the same shape as your vertical coordinate level:
np.testing.assert_equal(___.shape,level.shape)
Q6) Now make similar plots, but show only the mean T, S, P, and relative_density at each depth. Show error bars on each plot using the standard deviations.
Hint: If you are feeling adventurous, you can directly use the plt.errorbar function.
Hint 2: You should get plots similar to the one below
plt.errorbar(___,___,xerr=___)
plt.xlabel(___)
plt.ylabel(___)
plt.title(___)
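A minimal sketch of plt.errorbar on made-up means and standard deviations is shown below. The optional fmt and capsize arguments are only one possible styling choice, and the axis inversion again assumes that larger level values are deeper.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up mean profile and spread on 5 levels
demo_level = np.arange(5)
demo_mean = np.array([20.0, 15.0, 10.0, 6.0, 4.0])
demo_std = np.array([2.0, 1.5, 1.0, 0.5, 0.2])

plt.figure()
# xerr draws horizontal bars because the plotted variable is on the x axis
container = plt.errorbar(demo_mean, demo_level, xerr=demo_std, fmt="o-", capsize=3)
plt.gca().invert_yaxis()   # assumption: larger level = deeper
plt.xlabel("Made-up mean temperature (°C)")
plt.ylabel("Level index")
plt.title("Synthetic mean profile with one standard deviation")
```

Using xerr rather than yerr matches the layout of the earlier plots, where depth runs along the vertical axis.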
Three more plots and we’ll be all set! 🙂
Q7) Account For Missing Data
The profiles contain many missing values. These are indicated by the special “Not a Number” value, or np.nan.
When you take the mean or standard deviation of data with NaNs in it, every result computed from a slice that contains a NaN becomes NaN. Instead, if you use the special functions np.nanmean and np.nanstd, you tell NumPy to ignore the NaNs.
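The difference is easy to see on a small made-up array with one missing value:

```python
import numpy as np

demo = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, 6.0]])

plain_mean = np.mean(demo, axis=1)    # the first row contains a NaN, so its mean is NaN
nan_mean = np.nanmean(demo, axis=1)   # NaNs are skipped: the first row averages 1 and 3
nan_std = np.nanstd(demo, axis=1)     # same idea for the standard deviation

print(plain_mean, nan_mean, nan_std)
```

Only the row with the missing value is affected by np.mean; the second row gives the same answer with both functions.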
Recalculate the means and standard deviations as in the previous sections using these functions and plot the results.
Hint: Links to the np.nanmean documentation and the np.nanstd documentation.
Recalculate the means below ignoring the missing values. We trust that you can now come up with the full syntax yourself 😎
Similarly, recalculate the standard deviations ignoring the missing values.
Q8) Create a scatter plot of the longitudinal (lon) and latitudinal (lat) coordinates of the ARGO floats.
Again, we have not discussed it in the tutorial, but there is a really convenient scatter plot function called plt.scatter provided by the Pyplot interface.
Bonus: Label your figure using plt.xlabel, plt.ylabel, and plt.title.
Bonus 2: Increase the fontsize of your labels by adding a fontsize= argument to the label functions.
Bonus 3: Make your scatter plot beautiful by changing the arguments of plt.scatter listed in the documentation, for example s=.
plt.scatter(___,___)
___ # Fancy bonuses
___ # More fancy bonuses
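As one possible take on the bonuses, the sketch below scatters randomly generated coordinates (not the real float positions). The marker size, colormap, transparency, and font sizes are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Random made-up coordinates, NOT the real float positions
rng = np.random.default_rng(0)
demo_lon = rng.uniform(-60, -20, size=30)
demo_lat = rng.uniform(20, 50, size=30)

plt.figure()
# s sets the marker size, c colors each point, alpha adds transparency
points = plt.scatter(demo_lon, demo_lat, s=40, c=demo_lat, cmap="viridis", alpha=0.8)
plt.colorbar(points, label="Latitude (°N)")
plt.xlabel("Longitude (°E)", fontsize=14)
plt.ylabel("Latitude (°N)", fontsize=14)
plt.title("Synthetic float positions", fontsize=16)
```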