Processing S-5p using Python (only)

Hello. I am doing similar data processing to the people above; however, I also want to keep the time variable (in my case this is just the date of the scan). I have added it with the keep operations, but it says the time variable does not exist. Is there a way I can get around this?
I know the scan date is also in the name of the file but I need it as a variable.
Will appreciate any help!

Note that HARP does a format conversion when it reads S5P products. Several variables will end up with a different name. Have a look at the HARP documentation for the S5P NO2 product to see which variable names to use. In your case, you should at least include datetime_start and probably also datetime_length.
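A minimal Python sketch of such a keep() step, assuming the harp Python package; the filename and the list of kept variables are just examples:

```python
import harp

# HARP renames the time information on import: request datetime_start
# (and datetime_length) in keep() instead of a "time" variable.
product = harp.import_product(
    "S5P_OFFL_L2__NO2____example.nc",  # hypothetical filename
    operations="keep(latitude,longitude,"
               "tropospheric_NO2_column_number_density,"
               "datetime_start,datetime_length)",
)
print(product.datetime_start.unit)  # time reference used by HARP
```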

Oh I see! That makes a lot of sense, thank you!

Hi Sander.

I am interested in applying the averaging kernel from TROPOMI onto WRF-chem data and in this post you mention that HARP can do it.

Could you kindly share a sample script showing how HARP can do this?

Unfortunately, I don't have an example script at hand, and it would take some effort to create one. Note also that HARP currently does not support WRF-chem data natively, so the model data would first have to be converted to the HARP format.

Roughly speaking, it works like this: you interpolate the model to the satellite grid. Then you collocate the model data and satellite data by assigning a collocation_index variable (something that is normally done via harpcollocate, but which you can shortcut in this case). Then you apply the averaging kernel using either the smooth() HARP operation if you want profiles, or derive_smoothed_column() if you want columns. And finally you can regrid both datasets back to a lat/lon grid for easier visualisation.
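A rough Python sketch of the collocation_index shortcut mentioned above, assuming the harp Python package; the model conversion and the interpolation to the satellite pixels are not shown, and the filename and helper are hypothetical:

```python
import harp
import numpy as np

# Satellite L2 product (any S5P species would do here).
sat = harp.import_product("S5P_L2_example.nc")  # hypothetical filename

# `model` must be a HARP product holding the WRF-chem data, already
# interpolated to the satellite pixels; the conversion is not shown.
model = convert_wrf_to_harp(...)  # hypothetical helper

# Shortcut harpcollocate: give each matching pixel pair the same index,
# so smooth()/derive_smoothed_column() can pair the two datasets.
index = np.arange(len(sat.datetime_start.data), dtype=np.int32)
sat.collocation_index = harp.Variable(index, ["time"])
model.collocation_index = harp.Variable(index, ["time"])
```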

Related to the derive(latitude {latitude}) operation, I have the following question:

Why is "squash(time,(latitude,longitude));" needed as a post-processing step in the following command to produce the lat/lon centers?

harpmerge -f hdf5 -a "bin_spatial(360,-90,0.5,720,-180,0.5);derive(latitude{latitude});derive(longitude{longitude});" -ap "squash(time,(latitude,longitude));" x.nc y.nc

(with x.nc being, for example, a TROPOMI L2 file)
Without the squash operation in the command above, y.nc does not contain the lat/lon centers.

If you don't perform the squash, you will still have latitude/longitude variables containing the center positions (a simple ncdump will show you this). However, these variables will have dimensions time,latitude and time,longitude, which most applications don't accept as axis variables. The squash operation removes the time axis (which is allowed if your grid is the same for each time step), and the latitude/longitude variables will then be 'proper' axis variables for those applications.
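For reference, a minimal Python equivalent of that harpmerge command, assuming a harp version whose import_product accepts a post_operations argument (the filenames are placeholders):

```python
import harp

# Bin onto a 0.5 degree global grid, derive the cell centers, and then
# squash the time axis so latitude/longitude become 1-D axis variables.
product = harp.import_product(
    "x.nc",  # placeholder for a TROPOMI L2 file
    operations="bin_spatial(360,-90,0.5,720,-180,0.5);"
               "derive(latitude{latitude});derive(longitude{longitude})",
    post_operations="squash(time,(latitude,longitude))",
)
harp.export_product(product, "y.nc", file_format="hdf5")
```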

I doubt that (but maybe I am doing something else wrong). So what I do (with harp version 1.14) is the following:

harpmerge -f hdf5 -a "bin_spatial(360,-90,0.5,720,-180,0.5);derive(latitude{latitude});derive(longitude{longitude});" S5P_OFFL_L2__HCHO___20211001T011729_20211001T025859_20553_02_020201_20211002T165738.nc y.nc

then the output file y.nc does not contain the lat/lon centers (the lat/lon variables contain all 0.0).

When I do
harpmerge -f hdf5 -a "bin_spatial(360,-90,0.5,720,-180,0.5);derive(latitude{latitude});derive(longitude{longitude});" -ap "squash(time,(latitude,longitude));" S5P_OFFL_L2__HCHO___20211001T011729_20211001T025859_20553_02_020201_20211002T165738.nc y.nc

y.nc then contains the lat/lon centers. So apparently the squash operation causes the lat/lon centers to be written out, whereas without it the lat/lon variables are present but contain only 0.0.

There is indeed something strange going on. This looks like a bug in harp. I will investigate.

I looked into it. Things actually work as intended.

The point is that you are using HDF5 (/netCDF-4). This is actually already explained in our documentation.

The actual latitude/longitude values are stored in variables called _nc4_non_coord_latitude and _nc4_non_coord_longitude if they are not one-dimensional.

When you read the data with either the netCDF4 library or HARP, you will not see those 0 values for lat/lon, as the actual values are picked up from these _nc4_non_coord_ variables.
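A quick way to see this for yourself, as a sketch using h5py and the netCDF4 Python library on a y.nc produced without the squash step:

```python
import h5py
import netCDF4

# At the raw HDF5 level, "latitude" is only a dimension scale (zeros);
# the real center values live in _nc4_non_coord_latitude.
with h5py.File("y.nc", "r") as f:
    print([name for name in f if "latitude" in name])
    print(f["_nc4_non_coord_latitude"][0, :3])

# The netCDF4 library resolves the mangled name transparently.
with netCDF4.Dataset("y.nc") as ds:
    print(ds["latitude"][0, :3])
```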

Hi, sorry to pop into this discussion. I have a related question about the averaging kernel, and maybe some of you have the answer. I'm trying to use the averaging kernel to compare modelled mixing ratios with TROPOMI retrievals. From a netCDF file, I am able to get:
"DETAILED_RESULTS/column_averaging_kernel"
"INPUT_DATA/altitude_levels"
but the averaging kernel has 12 levels (a 12 x lon x lat array) while the altitude has 13 levels (a 13 x lon x lat array).
I have been browsing the online documentation for days to find out how to match the altitudes with the averaging kernel values. I have not found out (1) whether the altitude in the netCDF file is the one to use, and (2) how to reconcile the dimensions (are these intervals of altitude?).
Thanks!

@eveliseb, first, please make sure to mention what product you are trying to read. I assume you are trying to use a CH4 product?

If you read the data with HARP, then all this complication gets resolved for you. You will then get lower/upper bounds for each layer in an altitude_bounds variable (the 13 altitude levels are the boundaries of the 12 layers on which the averaging kernel is defined). And you can get the layer mid points if you add a derive(altitude {time,vertical} [km]) step to your HARP operations.
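A short Python sketch of that, assuming the harp package and a hypothetical CH4 L2 filename; the averaging-kernel variable name below follows my reading of the HARP CH4 mapping, so do verify it against the mapping table:

```python
import harp

# Import the kernel together with the layer bounds, and derive the
# layer mid-point altitudes (13 levels bound the 12 layers).
product = harp.import_product(
    "S5P_OFFL_L2__CH4____example.nc",  # hypothetical filename
    operations="derive(altitude {time,vertical} [km]);"
               "keep(CH4_column_volume_mixing_ratio_dry_air_avk,"
               "altitude,altitude_bounds)",
)
print(product.altitude.data.shape)         # (time, 12)
print(product.altitude_bounds.data.shape)  # (time, 12, 2)
```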

You can find all the details on how HARP maps the S5P data in the documentation for the CH4 product. The mapping table at the bottom describes where HARP gets all the data from. This table is also a good reference for how to interpret the data if you don’t want to use HARP.