Processing S-5P using Python (only)

Hi,

I am processing a temporal series (10 days -> 10 products over the same city) of Sentinel-5P NO2 products using the xarray library in Python. I have read the files and accessed the nitrogendioxide_tropospheric_column variable, which I manage to subset using xarray.where (http://xarray.pydata.org/en/stable/generated/xarray.where.html) and the lat/lon of my AOI.

My questions are:

  • once I have this subset data as an xarray.DataArray, how can I add them together to calculate the average values over that period of time?
  • Do I need to regrid them to a common grid before the average calculation? How?
  • Any way to do this using only Python?

M


My suggestion would be to read the S5P data using the Python interface of HARP. This will allow you to specify the regridding and merging as options to the harp.import_product() function. As a final step you could then turn the harp.Product into an xarray.DataArray. Since the harp product already uses numpy arrays, this shouldn’t be too difficult.
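For instance, a minimal sketch (the file name and operations string are only illustrative placeholders; it assumes the harp.Variable attributes data, dimension and unit, which can be passed straight to xarray.DataArray):

import harp
import xarray as xr

# Minimal sketch, not a full recipe: read one S5P L2 NO2 file with HARP and
# wrap a single variable as an xarray.DataArray (file name is a placeholder).
product = harp.import_product(
    "S5P_OFFL_L2__NO2____example.nc",
    operations="tropospheric_NO2_column_number_density_validity>50",
)
var = product.tropospheric_NO2_column_number_density
da = xr.DataArray(var.data, dims=var.dimension, attrs={"units": var.unit})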


Thanks for the suggestion. In that case, the operations to be included in my harp.import_product() call would be:

test = harp.import_product(data, operations = "tropospheric_NO2_column_number_density_validity>50;
bin_spatial( , , , , , );
derive(tropospheric_NO2_column_number_density [Pmolec/cm2]);
latitude > X1 [degree_north] ; latitude < X2 [degree_north] ; longitude > -Y1 [degree_east] ; longitude < Y2 [degree_east];
derive(longitude {longitude}) ; derive(latitude {latitude})")

Some doubts I have are (assuming my assumptions are not wrong):

  1. bin_spatial() → Since I define the lat/lon grid coordinates to which the product has to be resampled, is it necessary to add the comparison filter (latitude > X1 [degree_north], etc.) to subset my product?
  2. I use the derive(longitude {longitude})... operation to write the lat/lon center coordinates of the spatial grid in the output. Why is this required? Is it necessary?
  3. Am I missing any other action that should be added in the operations call?

Thanks!

  1. bin_spatial() → Since I define the lat/lon grid coordinates to which the product has to be resampled, is it necessary to add the comparison filter ( latitude > X1 [degree_north] , etc.) to subset my product?

It is not necessary, but it will improve the performance when reading the L2 products.
Note that the order of operations in HARP is important. The lat/lon filtering needs to take place before you perform the bin_spatial; it can go either before or after the validity filter.
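As a hedged illustration only (the AOI and grid values below are placeholders, not a recommendation), such an ordering could look like:

operations = (
    "tropospheric_NO2_column_number_density_validity>50;"       # validity filter
    "latitude>45.0[degree_north];latitude<50.0[degree_north];"   # lat/lon filters ...
    "longitude>5.0[degree_east];longitude<10.0[degree_east];"    # ... before bin_spatial
    "derive(tropospheric_NO2_column_number_density [Pmolec/cm2]);"
    "bin_spatial(500,45.0,0.01,500,5.0,0.01);"
    "derive(latitude {latitude});derive(longitude {longitude})"
)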

  2. I use the derive(longitude {longitude})... operation to write the lat/lon center coordinates of the spatial grid in the output. Why is this required? Is it necessary?

The target grid that is used for the bin_spatial operation is defined in terms of grid edges (latitude_bounds and longitude_bounds). However, other software usually wants the grid coordinates defined in terms of center positions. These derive operations for longitude and latitude produce these center position grid coordinates.
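A rough numerical illustration (hypothetical edge values): the derived latitude/longitude centers are simply the midpoints of the corresponding bounds.

import numpy as np

# Hypothetical cell edges as produced by bin_spatial (latitude_bounds):
latitude_bounds = np.array([[42.20, 42.21],
                            [42.21, 42.22],
                            [42.22, 42.23]])
# The derived center coordinates are the midpoints of those edges:
latitude = latitude_bounds.mean(axis=1)   # [42.205, 42.215, 42.225]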

  3. Am I missing any other action that should be added in the operations call?

That depends on what you want to do.
Note that if you want to combine all the files into a single grid (instead of a time series of grids per orbit), you would also have to provide a post_operations="bin()". Or probably even a post_operations="squash(time, (latitude,longitude,latitude_bounds,longitude_bounds));bin()" to keep the grid axes time independent.
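A sketch of that (assuming filenames is a list of L2 file paths and operations is the per-orbit operations string discussed above):

import harp

# Sketch only: merge several orbits onto one common, time-independent grid.
merged = harp.import_product(
    filenames,
    operations=operations,
    post_operations="squash(time, (latitude,longitude,latitude_bounds,longitude_bounds));bin()",
)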

Thanks again, I will have a look and test whether my application requires the post_operations step. A question related to my previous post: I have seen different examples in this blog where the following command is added:

`…; derive(datetime_stop {time})`

From the documentation I see the purpose of derive, but my question is again: why is this step necessary? Is it to add a common time dimension?

The derive operation is a very powerful operation. It can be used to perform simple type or unit conversions, but you can also use it to derive completely new variables based on other variables that are already in the product.

In the case of the datetime_stop derivation, this is a variable that didn’t exist yet, so the derive operation will instruct harp to generate it for you. And harp automatically knows how to do this if it matches one of the built-in conversions. In the case of datetime_stop this is described here. If you derive a variable like this, then you will also have to provide the set of dimensions that the variable depends on. In this case we want a datetime_stop variable that only depends on the time dimension (since that is the dimension for which harp has a known conversion).
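Roughly, the built-in conversion used here amounts to the following (dummy values, just to illustrate the relation between the variables):

import numpy as np

# datetime_stop {time} is derived from datetime_start {time} plus the
# (scalar) datetime_length of each measurement; values below are dummies.
datetime_start = np.array([317034346.0, 317034347.1])   # [seconds since epoch]
datetime_length = 1.08                                   # [s]
datetime_stop = datetime_start + datetime_length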

Hi Sander,

Clear about the fact that a new variable is created using derive to report the datetime stop from start and length, but why would you need such a variable? For which purpose is this required, or in which context?

So far, I am testing my methodology in Python with one product only with:

test = harp.import_product(data, operations="tropospheric_NO2_column_number_density_validity>50;
derive(tropospheric_NO2_column_number_density [Pmolec/cm2]);
derive(datetime_stop {time});
latitude > 42.2 [degree_north] ; latitude < 52.2 [degree_north] ; longitude > -12.2 [degree_east] ; longitude < 22.2 [degree_east];
bin_spatial(1000 ,42.2 ,0.01 ,1000 ,12.2 ,0.01);
derive(latitude {latitude});derive(longitude {longitude});
keep(NO2_column_number_density,tropospheric_NO2_column_number_density,stratospheric_NO2_column_number_density,NO2_slant_column_number_density,tropopause_pressure,absorbing_aerosol_index,cloud_fraction,sensor_altitude,sensor_azimuth_angle,sensor_zenith_angle,solar_azimuth_angle,solar_zenith_angle)")

At this point I have my product filtered, converted to Pmolec/cm2, subsetted and regridded to my AOI, with new variables (datetime_stop and the center coordinates of the grid). The output is:

source product = 'S5P_OFFL_L2__NO2____20200105T094546_20200105T112716_11549_01_010302_20200107T025641.nc'
history = "2020-04-01T08:24:43Z [harp-1.9.2] harp.import_product('/shared/Training/ATMO02_MonitoringPollution_Italy/Original4/S5P_OFFL_L2__NO2____20200105T094546_20200105T112716_11549_01_010302_20200107T025641.nc',operations='tropospheric_NO2_column_number_density_validity>50;                             derive(tropospheric_NO2_column_number_density [Pmolec/cm2]);                             derive(datetime_stop {time});                             latitude > 42.2 [degree_north] ; latitude < 52.2 [degree_north] ; longitude > -12.2 [degree_east] ; longitude < 22.2 [degree_east];                             bin_spatial(1000 ,42.2 ,0.01 ,1000 ,12.2 ,0.01);                             derive(latitude {latitude});derive(longitude {longitude})')"

double datetime_start {time=1} [seconds since 2010-01-01]
float datetime_length [s]
int orbit_index
double sensor_altitude {time=1, latitude=999, longitude=999} [m]
double solar_zenith_angle {time=1, latitude=999, longitude=999} [degree]
double solar_azimuth_angle {time=1, latitude=999, longitude=999} [degree]
double sensor_zenith_angle {time=1, latitude=999, longitude=999} [degree]
double sensor_azimuth_angle {time=1, latitude=999, longitude=999} [degree]
double pressure_bounds {time=1, latitude=999, longitude=999, vertical=34, 2} [Pa]
double tropospheric_NO2_column_number_density {time=1, latitude=999, longitude=999} [Pmolec/cm2]
double tropospheric_NO2_column_number_density_amf {time=1, latitude=999, longitude=999} []
double NO2_column_number_density {time=1, latitude=999, longitude=999} [mol/m^2]
double NO2_column_number_density_amf {time=1, latitude=999, longitude=999} []
double stratospheric_NO2_column_number_density {time=1, latitude=999, longitude=999} [mol/m^2]
double stratospheric_NO2_column_number_density_amf {time=1, latitude=999, longitude=999} []
double NO2_slant_column_number_density {time=1, latitude=999, longitude=999} [mol/m^2]
double cloud_fraction {time=1, latitude=999, longitude=999} []
double absorbing_aerosol_index {time=1, latitude=999, longitude=999} []
double cloud_albedo {time=1, latitude=999, longitude=999} []
double cloud_pressure {time=1, latitude=999, longitude=999} [Pa]
double surface_albedo {time=1, latitude=999, longitude=999} []
double surface_altitude {time=1, latitude=999, longitude=999} [m]
double surface_pressure {time=1, latitude=999, longitude=999} [Pa]
double surface_meridional_wind_velocity {time=1, latitude=999, longitude=999} [m/s]
double surface_zonal_wind_velocity {time=1, latitude=999, longitude=999} [m/s]
double sea_ice_fraction {time=1, latitude=999, longitude=999} []
double tropopause_pressure {time=1, latitude=999, longitude=999} [Pa]
double datetime_stop {time=1} [s since 2000-01-01]
long count {time=1}
float weight {time=1, latitude=999, longitude=999}
float solar_zenith_angle_weight {time=1, latitude=999, longitude=999}
float solar_azimuth_angle_weight {time=1, latitude=999, longitude=999}
float sensor_zenith_angle_weight {time=1, latitude=999, longitude=999}
float sensor_azimuth_angle_weight {time=1, latitude=999, longitude=999}
double latitude_bounds {latitude=999, 2} [degree_north]
double longitude_bounds {longitude=999, 2} [degree_east]
double latitude {latitude=999} [degree_north]
double longitude {longitude=999} [degree_east]

The objective is to continue my analysis using xarray, so I need to export my harp.Product into an xarray.DataArray. Using:

import xarray as xr

export_path = 'my/path/test6.nc'
harp.export_product(test, export_path,file_format='netcdf')
test_open = xr.open_dataset(export_path)
test_open

I get the following error:

ValueError: unable to decode time units 's since 2000-01-01' with the default calendar. Try opening your dataset with decode_times=False.

What am I missing in the way I export my product?

Clear about the fact that a new variable is created using derive to report the datetime stop from start and length, but why would you need such a variable? For which purpose is this required, or in which context?

This is useful when you want to know the overall time range if you merge a few products together using bin() as post_operations. The datetime_start and datetime_stop will then be propagated by taking the min/max (respectively) when you do the binning in the time dimension.

What am I missing in the way I export my product?

I am not that familiar with xarray. The time unit that harp uses is just the same kind of unit that almost everyone in the netcdf/hdf community uses, which is udunits2. You might want to ask the xarray people about this (or just pass the decode_times=False argument to xr.open_dataset, as is already suggested).

Note that HARP products may not import cleanly into xarray anyway. You might have to perform some steps of your own to make things work; most of these come down to the compatibility aspects that are described in the HARP documentation.
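For example, a workaround sketch that simply follows the hint in the error message:

import xarray as xr

# Skip xarray's automatic time decoding when opening the exported file.
test_open = xr.open_dataset('my/path/test6.nc', decode_times=False)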


I see that you are filtering the tropospheric column product on a qa_value of 0.5, but I don’t see you using an averaging kernel. If you are not using an averaging kernel in your application, then the strongly recommended threshold for the tropospheric column product is 0.75 (tropospheric_NO2_column_number_density_validity>75;).


@svniemeijer thank you, I think I can continue from here to assimilate the harp product into xarray.

@MaartenSneepKNMI thank you for noticing. Reading the technical guide again, I see it is clearly mentioned indeed. When using qa_value > 0.5:

In particular this choice is useful for assimilation and model comparison studies where the averaging kernels are used.

Using the averaging kernel is important for data users who wish to minimise the discrepancies between the assumptions in the TROPOMI retrieval and their application of interest, for example for validation, data assimilation, or comparison to a model.

Two questions I have in this regard then:

  1. Knowing that using qa_value > 0.5 (in comparison to qa_value > 0.75) introduces lower quality pixels, and that it is useful for assimilation and model comparison studies, in the case of using the data for simple visualization of concentrations on a single product (or an average map over a specific period), should we always stick to 0.75?

  2. Not very sure how to introduce the averaging kernel in HARP. Could you please elaborate on this? In the technical guide I see it is mentioned:

A model simulated satellite NO2 column is obtained by multiplying the model partial column profile with the averaging kernel

  1. Yes.
  2. Not a clue, sorry.
  2. Not very sure how to introduce the averaging kernel in HARP. Could you please elaborate on this? In the technical guide I see it is mentioned:

This only applies if you have a second dataset (which is from an atmospheric model or from ground based data). You would then apply the averaging kernel from the satellite data to this second dataset. There are ways to do this with HARP, but I am not sure that this is what you are looking for.
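For what it’s worth, a rough numpy illustration of the idea quoted from the technical guide (hypothetical arrays, not a HARP recipe):

import numpy as np

# A model-simulated satellite column is the model partial column profile
# weighted (layer by layer) by the satellite averaging kernel; dummy values.
averaging_kernel = np.array([0.3, 0.7, 1.0, 1.2])                    # per layer
model_partial_columns = np.array([1.0e15, 8.0e14, 5.0e14, 2.0e14])   # [molec/cm2]
model_simulated_column = np.sum(averaging_kernel * model_partial_columns)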

@svniemeijer Good to know, definitely I am not working on modelling applications. Thank you!

Hi,
I’d like to ask an almost similar question: why can’t xarray read the S5P offline 2019 product, while it reads the 2018 product without any error?

Thanks a lot.

Note that this forum is about the atmospheric toolbox software. Using xarray to directly read the S5P L2 products is not really a valid topic.

Also, if you need help, please provide more detailed information. You should show the errors that you are getting. And also provide exact references to the 2018 and 2019 products that you tried.

Hi,

Thanks a lot for your response. I actually asked my question in this forum because the title mentions S-5P, and the most S5P-related topic is the atmosphere.

This is part of the script, not directly used, but only to show the approach…

This is one example of the 2019 products:

S5P_OFFL_L2__NO2____20190327T083715_20190327T101846_07519_01_010300_20190402T100740.nc

I managed to install Anaconda and install HARP; however, I am still keen to find out why xarray doesn’t work.

The error is:

Maybe the download of the product caused an issue, so I should download it again!

Many thanks for your nice assistance.

This looks very much like a corrupt product indeed. Please try to download the file again.


Another question please:

Do you have any idea how to download part of an orbit rather than the entire orbit?

Downloading partial S5P L2 products is unfortunately not possible.
Note that the NRTI products are actually already split into orbit parts, but they are of lower quality than the OFFL/RPRO products, so it is not recommended to use them.


Thanks a lot for your very nice answers and notes.