
1.6. (Exercise) Ocean Floats Data Analysis

Learning Objectives

  • Creating new arrays using linspace and arange

  • Computing basic formulas with numpy arrays

  • Loading data from .npy files

  • Performing reductions (e.g. mean, std on numpy arrays)

  • Making 1D line plots

  • Making scatterplots

  • Annotating plots with titles and axes

In this problem, we use real data from ocean profiling floats. ARGO floats are autonomous robotic instruments that collect Temperature, Salinity, and Pressure data from the ocean. ARGO floats collect data one “profile” at a time, where a profile is a set of measurements at different depths or “levels”.

float_cycle_1.png

Each profile has a single latitude, longitude, and date associated with it, in addition to many different levels.

Let’s start by using pooch to download the data files we need for this exercise. The following code will give you a list of .npy files that you can open in the next step.

import pooch

url = "https://unils-my.sharepoint.com/:u:/g/personal/tom_beucler_unil_ch/EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ?download=1"
files = pooch.retrieve(url, processor=pooch.Unzip(), known_hash='2a703c720302c682f1662181d329c9f22f9f10e1539dc2d6082160a469165009')
files
['C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\date.npy',
 'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\lat.npy',
 'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\levels.npy',
 'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\lon.npy',
 'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\P.npy',
 'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\S.npy',
 'C:\\Users\\tbeucler\\AppData\\Local\\pooch\\pooch\\Cache\\4e2111f8c8dc35a2e2c0f9ec759ecb61-EZwbaBqass1LhZO3DS3BCL0BhIlcENuoDItMB9b4IYDUCQ.unzip\\float_data\\T.npy']
import numpy as np

Q1) Load each data file as a numpy array.

You can use whatever names you want for your arrays, but we recommend:

T: temperature

S: salinity

P: pressure

date: date

lat: latitude

lon: longitude

level: depth level

Hint 1: Look at the file names (the items in files) to find out which file corresponds to which variable.

Hint 2: Check out the documentation for np.load.
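As a minimal sketch of how np.load behaves, the round trip below saves a toy array to a temporary .npy file and reads it back. The array values and the file name toy.npy are hypothetical and unrelated to the float data.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical toy data standing in for one of the float arrays.
toy = np.array([1.5, 2.5, np.nan, 4.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = Path(tmp_dir) / "toy.npy"  # hypothetical file name
    np.save(toy_path, toy)                # write the array in .npy format
    loaded = np.load(toy_path)            # read it back as a numpy array

print(type(loaded), loaded.shape, loaded.dtype)
```

The same np.load call accepts any of the path strings stored in files.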

Display the names of the items in files here

___

Then, load the files as numpy arrays, for instance using list comprehension

___,___,___,___,___,___,___ = [np.___(___[___]) for ___ in range(___)]

Q2) Recreate the level array using np.arange and np.linspace

Hints:
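The difference between the two functions can be sketched on a toy target, here the integers 0 through 4 (a hypothetical example, not the real level array). np.arange takes a start and an excluded stop, while np.linspace takes a start, an included stop, and the number of points.

```python
import numpy as np

# Hypothetical target: the integers 0 through 4 (not the real level array).
target = np.array([0, 1, 2, 3, 4])

# np.arange(start, stop) excludes stop and uses a step of 1 by default.
from_arange = np.arange(0, 5)

# np.linspace(start, stop, num) includes stop and takes the number of points.
from_linspace = np.linspace(0, 4, 5)

# Both checks pass: the comparison is on values, so int vs. float dtype is fine.
np.testing.assert_equal(target, from_arange)
np.testing.assert_equal(target, from_linspace)
```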

Display the level array

___

Recreate the level array using np.arange and call it level_arange

level_arange = np.arange(___,___)

Check that level and level_arange are equal using np.testing.assert_equal

np.___.___(___,level_arange)

Recreate the level array using np.linspace and call it level_linspace

___ = np.___(___,___,___)

Check that level and level_linspace are equal using np.testing.assert_equal

___.___(___,___)

Q3) Examine the shapes of T, S and P compared to lon, lat, date and level. How do you think they are related?

Hint: Check out the NDArrays subsection

Display the shapes of all loaded variables

___.shape
___.___

Based on the shapes, which dimensions do you think are shared among the arrays?
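To illustrate what "shared dimensions" means, the sketch below builds toy arrays with hypothetical sizes and names. It assumes one possible layout, with one row per depth and one column per profile, and makes no claim about how the real float arrays are organized.

```python
import numpy as np

n_depths, n_profiles = 4, 3  # hypothetical sizes

measurement = np.zeros((n_depths, n_profiles))  # 2D: one value per depth and profile
depth_index = np.arange(n_depths)               # 1D: one value per depth
profile_lon = np.zeros(n_profiles)              # 1D: one value per profile

for name, arr in [("measurement", measurement),
                  ("depth_index", depth_index),
                  ("profile_lon", profile_lon)]:
    print(name, arr.shape)
```

In this toy case, the first axis of measurement has the same length as depth_index and the second axis has the same length as profile_lon.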

Q4) Based on the formula below, calculate the seawater density relative_density, relative to pure water, as a function of the temperature, the salinity, and the pressure.

relative_density\( = \rho - \rho_{Pure\ Water} = a \times S + b \times \Theta + c \times \Theta^{2}\)

where:

  • The densities \(\rho\) and \(ρ_{Pure\ Water}\) are in units \(kg/m^{3}\).

  • The constants \(a\), \(b\), and \(c\) are provided below.

  • The function to calculate the conservative temperature \(\Theta\) (in units Celsius) from temperature, salinity, and pressure is provided below.

  • The temperature \(T\) is in units Celsius.

  • The salinity \(S\) is in units \(g/kg\).

  • The pressure \(p\) is in units \(dbar\).

Hint: The loaded numpy arrays temperature, salinity, and pressure already have the right units and no conversion is needed.

Sources:

  1. Roquet, Fabien, et al. “Defining a simplified yet “realistic” equation of state for seawater.” Journal of Physical Oceanography 45.10 (2015): 2564-2579.

  2. The Gibbs SeaWater (GSW) Oceanographic Toolbox of TEOS-10. (License)

Below are the constants a, b, and c:

a = 7.718e-1
b = -8.44e-2
c = -4.561e-3
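As a quick sanity check of the formula, here is the arithmetic for a single sample. The salinity and conservative temperature values are hypothetical and are not taken from the float data; the same expression works element-wise on numpy arrays.

```python
a = 7.718e-1
b = -8.44e-2
c = -4.561e-3

# Hypothetical single sample, not taken from the float data.
salinity_sample = 35.0  # g/kg
theta_sample = 10.0     # degrees Celsius

# a*S = 27.013, b*Theta = -0.844, c*Theta^2 = -0.4561
density_anomaly = a * salinity_sample + b * theta_sample + c * theta_sample**2
print(round(density_anomaly, 4))  # 25.7129 (kg/m^3 above pure water)
```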

Let’s import the library gsw, which contains the function CT_from_t to calculate the conservative temperature \(\Theta\) from temperature, salinity, and pressure.

!pip install gsw
from gsw import CT_from_t

Now it’s all up to you. Here’s the equation to avoid having to scroll back up:

relative_density\( = \rho - \rho_{Pure\ Water} = a \times S + b \times \Theta + c \times \Theta^{2}\)

Calculate the conservative temperature

Hint: use CT_from_t, which takes its arguments in the order salinity, temperature, pressure.

___ = CT_from_t(___,___,___)

Calculate the relative density using the equation above

___ = ___

Q4b) Make a plot for each column of data in T, S, P, and relative_density (four plots)

For this question, we have to use the Pyplot interface of the Matplotlib library for visualization even if we have not covered it extensively in class yet. But fear not as we provide easy-to-follow instructions below. 😊

The first step is to import Pyplot. Simply execute the code below.

import matplotlib.pyplot as plt

Then, we will plot variables as a function of the ocean depth, level. Simply read the documentation at this link to infer the correct syntax. Label your axes using plt.xlabel and plt.ylabel, and add a title using plt.title.

Hint: The vertical scale should use the level data to be consistent with oceanographic conventions.

Hint 2: Each plot should have a line for each column of data. It will look messy, like the plot below:

Salinity_example.png

plt.plot(___,___); # The semi-colon prevents printing the line objects
plt.xlabel(___) # Takes a string as argument
plt.ylabel(___)
plt.title(___)
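The sketch below shows the general pattern on toy data: when plt.plot receives a 2D array for the horizontal axis and a 1D array for the vertical axis, it draws one line per column. The arrays, sizes, and labels are hypothetical placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

depth = np.arange(5)                                   # hypothetical depth levels
toy_profiles = depth[:, None] * 0.5 + np.arange(3)     # shape (5, 3): three toy profiles

plt.figure()
lines = plt.plot(toy_profiles, depth)  # one line per column of toy_profiles
plt.xlabel("Toy variable (arbitrary units)")
plt.ylabel("Depth level")
plt.title("Toy profiles")
```

The first dimension of the 2D array has to match the length of the 1D array for this to work.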

Make more plots below:

Q5) Compute the mean and standard deviation of each of T, S, P, and relative_density at each depth in level.

Hint: You may want to read the documentation at this link and this link.

Hint 2: You can check that you took the mean and standard deviations along the correct axes by checking the shape of your results.
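The effect of the axis argument can be seen on a small toy array (hypothetical values, with 4 rows and 3 columns): reducing along an axis removes that axis from the shape of the result.

```python
import numpy as np

# Hypothetical array with 4 rows and 3 columns.
toy = np.arange(12.0).reshape(4, 3)

row_mean = np.mean(toy, axis=1)  # average across columns: one value per row
row_std = np.std(toy, axis=1)    # spread across columns: one value per row
col_mean = np.mean(toy, axis=0)  # average across rows: one value per column

print(row_mean.shape, row_std.shape, col_mean.shape)  # (4,) (4,) (3,)
```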

Compute the means…

___ = np.___(___,axis=___)
___
___
___

… and the standard deviations.

___ = ___.___(___,___)

Check that they have the same shape as your vertical level coordinate level:

np.testing.assert_equal(___.shape,level.shape)

Q6) Now make similar plots, but show only the mean T, S, P, and relative_density at each depth. Show error bars on each plot using the standard deviations.

Hint: If you are feeling adventurous, you can directly use the plt.errorbar function.

Hint 2: You should get plots similar to the one below

Salinity_mean.png

plt.errorbar(___,___,xerr=___)
plt.xlabel(___)
plt.ylabel(___)
plt.title(___)
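A minimal sketch of plt.errorbar on toy data follows. Because the plotted variable lies on the horizontal axis, the spread is passed through xerr rather than yerr. All values and labels are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

depth = np.arange(5)                                # hypothetical depth levels
toy_mean = np.array([10.0, 9.0, 7.5, 6.0, 5.5])     # hypothetical mean at each depth
toy_std = np.array([1.0, 0.8, 0.6, 0.4, 0.2])       # hypothetical spread at each depth

plt.figure()
container = plt.errorbar(toy_mean, depth, xerr=toy_std)  # horizontal error bars
plt.xlabel("Toy variable (arbitrary units)")
plt.ylabel("Depth level")
plt.title("Toy mean profile with one standard deviation")
```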

Three more plots and we’ll be all set! 🙂

Q7) Account For Missing Data

The profiles contain many missing values. These are indicated by the special “Not a Number” value, or np.nan.

When you take the mean or standard deviation of data that contain NaNs, every result computed from a slice containing at least one NaN is itself NaN. If you use the special functions np.nanmean and np.nanstd instead, NumPy ignores the NaNs.

Recalculate the means and standard deviations as in the previous sections using these functions and plot the results.

Hint: Links to the np.nanmean documentation and the np.nanstd documentation.
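The difference is easy to see on a toy array with a single missing value (hypothetical numbers, unrelated to the float data):

```python
import numpy as np

# Hypothetical 2 x 3 array with one missing value in the first row.
toy = np.array([[1.0, np.nan, 3.0],
                [4.0, 5.0, 6.0]])

plain = np.mean(toy, axis=1)        # NaN for the first row, 5.0 for the second
ignoring = np.nanmean(toy, axis=1)  # (1 + 3) / 2 = 2.0 and (4 + 5 + 6) / 3 = 5.0
spread = np.nanstd(toy, axis=1)     # standard deviations computed without the NaN

print(plain)
print(ignoring)
print(spread)
```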

Recalculate the means below ignoring the missing values. We trust that you can now come up with the full syntax yourself 😎

Similarly, recalculate the standard deviations ignoring the missing values.

Q8) Create a scatter plot of the longitudinal (lon) and latitudinal (lat) coordinates of the ARGO floats.

Again, we have not discussed it in the tutorial, but there is a really convenient scatter plot function called plt.scatter provided by the Pyplot interface.

Bonus: Label your figure using plt.xlabel, plt.ylabel, and plt.title.

Bonus 2: Increase the fontsize of your labels by adding a fontsize= argument to the label functions.

Bonus 3: Make your scatter plot beautiful by changing the arguments of plt.scatter listed in the documentation, for example s=.

plt.scatter(___,___)
___ # Fancy bonuses
___ # More fancy bonuses
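For reference, here is a sketch of plt.scatter with a marker size and larger label fonts, using randomly generated toy coordinates rather than the real float positions. The coordinate ranges, labels, and styling values are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
toy_lon = rng.uniform(-60, -20, size=20)  # hypothetical longitudes
toy_lat = rng.uniform(20, 50, size=20)    # hypothetical latitudes

plt.figure()
points = plt.scatter(toy_lon, toy_lat, s=40, alpha=0.7)  # s sets the marker size
plt.xlabel("Longitude (degrees east)", fontsize=14)
plt.ylabel("Latitude (degrees north)", fontsize=14)
plt.title("Toy float positions", fontsize=16)
```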