Create a time series of SP5 NO2 data merging tiles to cover Europe country

TheRed86 · June 12, 2023, 11:40am

Hi,

I discovered the HARP module while struggling with SP5 data a few days ago.

I must create a .nc file containing a time series (e.g., ten consecutive days) of merged tiles (I need to cover a delimited area, so I need more than one orbit per day) of NO2 data from Tropomi. I was able to merge the tiles perfectly, but I don’t understand how to obtain a time series of that merged data. Is that possible to use the Harp toolkit for the time series?
Before trying HARP, I tried a combination of xarray, rioxarray, gdal, and cdo but encountered issues with each.

I really appreciate any help you can provide.
Monica

sander.niemeijer · June 12, 2023, 12:51pm

What exactly do you need per day? Is it only one pixel (i.e. the delimited area is just one satellite pixel), is it a grid for a small area (do you use bin_spatial?), or is it a single averaged value for a series of pixels for that area (do you use bin on all pixels for the area)?
Based on that there are different ways to arrive at a time series.

TheRed86 · June 12, 2023, 1:04pm

Dear Sander,

I want to cover Europe for several consecutive days in one single .nc file.

Therefore, I need to merge about four original data tiles from the Sentinel hub to have Europe coverage per day. I am interested in NO2 for now, but I will apply the same mechanism to other SP5 quantities once I understand how to do it.

Is that clearer now?

sander.niemeijer · June 12, 2023, 1:45pm

But how do you want to ‘represent’ Europe for a single day? As a single averaged value? Using some common grid? If you want a time series, you need to specify a time series of ‘what’.
If you just filter the satellite pixels for the area, you already have a time series, a time series of satellite pixels for that day. But if I understand you correctly, you want to turn this time series of satellite pixels for a day into something that is associated to a single timestamp. Are you already using bin_spatial for this? (because I think this is what you would need). Please check out the other topics on this forum on that operation. Otherwise, please provide more information on what harp operations you currently are already performing, because I have a hard time understanding what it is that you are looking for.

TheRed86 · June 12, 2023, 3:16pm

Dear Sander,

I want to have a daily map of NO2 over Europe with a single timestamp associated with the current day.
Practically, I want a dataset having a time series of maps with dimensions of longitude, latitude, and time (e.g., 2022-07-01, 2022-07-02, and so forth).
I don’t understand how to include the time dimension in the dataset and collect maps over several days in the same file.

For a single day, I was able to merge the data (following the example of Usecase_4_S5P_NO2_LosAngeles_port) as follows:

operations = ";".join([
    "tropospheric_NO2_column_number_density_validity>75",
    "keep(latitude_bounds,longitude_bounds,datetime_start,datetime_length,tropospheric_NO2_column_number_density)",
    "derive(datetime_start {time} [days since 2000-01-01])",
    "derive(datetime_stop {time}[days since 2000-01-01])", 
    "exclude(datetime_length)",
    "bin_spatial(601,30,0.05,901,-15,0.05)",
    "derive(tropospheric_NO2_column_number_density [Pmolec/cm2])",
    "derive(latitude {latitude})",
    "derive(longitude {longitude})",
    "count>0"
])
reduce_operations=";".join([
    "squash(time, (latitude, longitude, latitude_bounds, longitude_bounds))",
    "bin()"
])

files_in=".<path_to_data>/S5P_OFFL_L2__NO2____20220731*.nc"

no2_data =harp.import_product(files_in, operations, reduce_operations=reduce_operations)

harp.export_product(no2_data, 's5p-NO2_L3_timeseries.nc')

Obviously, as it is, no time dimension is present in the output .nc file, but the spatial merging is fine, as well as the projection (lon, lat coordinates). Here is the output data open in Panoply:

I need to have the same for several consecutive days, all stored in one single NetCDF file.
I looked for something similar in the forum, but nobody asked for daily series of spatial merged data.

Thanks for the quick response. Hopefully, I explained well what I would need.
Regards,
Monica

sander.niemeijer · June 12, 2023, 3:41pm

Ok. Then I think I understand what you want.

You will have to perform this in multiple steps. First you would have to create separate files per day as you are now doing. So you would have a s5p-no2-l3-20220731.nc, s5p-no2-l3-20220801.nc, etc.

Then, you just call a harpmerge on these files (without operations), to generate the s5p-no2-l3-timeseries.nc.
From within python this would be:

no2_series=harp.import_product('s5p-no2-l3-2022*.nc')
harp.export_product(no2_series, 's5p-no2-l3-timeseries.nc')

TheRed86 · June 12, 2023, 3:45pm

Will it contain a human-readable date? Something like 2022-07-01, 2022-07-02, etc. ?

I asked that because I only see “double datetime {time=1} [s since 2000-01-01]” in the dataset as it is now.

Thanks for the help!!! Appreciated.

sander.niemeijer · June 12, 2023, 3:52pm

One thing, you might still need to do:

harp.export_product(no2_series, 's5p-no2-l3-timeseries.nc'), operations="squash(time, (latitude, longitude, latitude_bounds, longitude_bounds))")

to remove the time dimension from the axis variables again.

For the datetime variables, you won’t get ‘dates’, but you will get a time series of datetime_start and datetime_stop values with the minimum and maximum time for each grid.
I will have to think about having a capability in HARP to turn this into single truncated ‘date’ values per grid, because that is currently not possible. However, you could also write a routine in python to create such a variable in the no2_series product yourself based on the datetime_start/datetime_stop variables. Have a look at the harp python examples to see how to add variables to a product.

TheRed86 · June 12, 2023, 3:57pm

Thanks again. I will adopt your method to come up with the wanted solution!
Have a nice day,
Monica

TheRed86 · June 13, 2023, 8:34am

Dear Sander, good morning!

I have a question regarding the merging method adopted by Harp.
What value is used in pixels covered by multiple tiles (e.g., .nc file) during the merge action?

Thanks in advance,
M

sander.niemeijer · June 13, 2023, 11:03am

For bin_spatial you can find a short description here. For the bin operation we just take an average.
Note that HARP also automatically handles some special cases. For instance, pressure is averaged logarithmically, and angles are averaged based on unit vectors for those angles.

TheRed86 · June 13, 2023, 11:25am

Hi Sander,

I am trying to apply the method we discussed yesterday, but I’ve just discovered that the projection method is not optimal for me, as I need EPSG:4326. Is there a way to use something similar to the following GDAL command?

gdalwarp -srcnodata 9.96921e+36 -dstnodata 9999 -geoloc -t_srs EPSG:4326 -overwrite $product_file.tmp.variable_mod.vrt $output_file

Note that to be able to apply such a command using the original S5P files from the Sentinel hub, I do the following steps, which return me a correct CRS (I can load them in GeoServer):

gdal_translate -of VRT HDF5:"${product_file}":longitude $product_file.tmp.lon.vrt
gdal_translate -of VRT HDF5:"${product_file}":latitude $product_file.tmp.lat.vrt
gdal_translate -of VRT HDF5:"${product_file}":$variable $product_file.tmp.variable.vrt

head -n1 $product_file.tmp.variable.vrt > $product_file.tmp.variable_mod.vrt
echo '<Metadata domain="GEOLOCATION">
	<mdi key="X_DATASET">'$product_file'.tmp.lon.vrt</mdi>
	<mdi key="X_BAND">1</mdi>
	<mdi key="Y_DATASET">'$product_file'.tmp.lat.vrt</mdi>
	<mdi key="Y_BAND">1</mdi>
	<mdi key="PIXEL_OFFSET">0</mdi>
	<mdi key="LINE_OFFSET">0</mdi>
	<mdi key="PIXEL_STEP">1</mdi>
	<mdi key="LINE_STEP">1</mdi>
      </Metadata>' >> $product_file.tmp.variable_mod.vrt
tail --lines=+2 $product_file.tmp.variable.vrt >> $product_file.tmp.variable_mod.vrt

gdalwarp -srcnodata 9.96921e+36 -dstnodata 9999 -geoloc -t_srs EPSG:4326 -overwrite $product_file.tmp.variable_mod.vrt $output_file

My problem concerns the coordinates lon and lat created with harp_import. I get an error if I try to apply the same process to reproject data:

gdal_translate -of VRT HDF5:S5P_OFFL_L2__NO2____20220731_merged.nc:longitude prova.lon.vrt
HDF5-DIAG: Error detected in HDF5 (1.10.7) thread 1:
  #000: ../../../src/H5F.c line 429 in H5Fopen(): unable to open file
    major: File accessibility
    minor: Unable to open file
  #001: ../../../src/H5Fint.c line 1803 in H5F_open(): unable to read superblock
    major: File accessibility
    minor: Read failed
  #002: ../../../src/H5Fsuper.c line 420 in H5F__super_read(): file signature not found
    major: File accessibility
    minor: Not an HDF5 file
ERROR 4: HDF5:S5P_OFFL_L2__NO2____20220731_merged.nc:longitude: No such file or directory

If I try gdalinfo over the obtained files, I see the following:

**gdalinfo S5P_OFFL_L2__NO2____20220731_merged.nc** 

Driver: netCDF/Network Common Data Format
Files: S5P_OFFL_L2__NO2____20220731_merged.nc
Size is 512, 512
Metadata:
  NC_GLOBAL#Conventions=HARP-1.0
  NC_GLOBAL#datetime_start=8227.443437696758
  NC_GLOBAL#datetime_stop=8247.609589328702
  NC_GLOBAL#history=2023-06-13T11:01:10Z [harp-1.17] harp.import_product('../eodata/sentinel5p/no2/S5P_OFFL_L2__NO2____202207*.nc',operations='tropospheric_NO2_column_number_density_validity>75;keep(latitude_bounds,longitude_bounds,datetime_start,datetime_length,tropospheric_NO2_column_number_density, absorbing_aerosol_index);derive(datetime_start {time} [days since 2000-01-01]);derive(datetime_stop {time}[days since 2000-01-01]);exclude(datetime_length);bin_spatial(801,30,0.05,901,-15,0.05);derive(tropospheric_NO2_column_number_density [Pmolec/cm2]);derive(latitude {latitude});derive(longitude {longitude});latitude <= 85[degree_north];latitude >= 30[degree_north];longitude >= -15 [degree_east];longitude <= 30 [degree_east];count>0',reduce_operations='squash(time, (latitude, longitude, latitude_bounds, longitude_bounds));bin()')
Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"S5P_OFFL_L2__NO2____20220731_merged.nc":tropospheric_NO2_column_number_density
  SUBDATASET_1_DESC=[1x800x900] tropospheric_NO2_column_number_density (64-bit floating-point)
  SUBDATASET_2_NAME=NETCDF:"S5P_OFFL_L2__NO2____20220731_merged.nc":absorbing_aerosol_index
  SUBDATASET_2_DESC=[1x800x900] absorbing_aerosol_index (64-bit floating-point)
  SUBDATASET_3_NAME=NETCDF:"S5P_OFFL_L2__NO2____20220731_merged.nc":weight
  SUBDATASET_3_DESC=[1x800x900] weight (32-bit floating-point)
  SUBDATASET_4_NAME=NETCDF:"S5P_OFFL_L2__NO2____20220731_merged.nc":latitude_bounds
  SUBDATASET_4_DESC=[800x2] latitude_bounds (64-bit floating-point)
  SUBDATASET_5_NAME=NETCDF:"S5P_OFFL_L2__NO2____20220731_merged.nc":longitude_bounds
  SUBDATASET_5_DESC=[900x2] longitude_bounds (64-bit floating-point)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

Why longitude and latitude do not appear in the SUBDASET?
I guess this is the reason why I cannot perform the extract of the longitude and latitude variables.

Is there a way to keep longitude and latitude as actual variables so that I see them as for latitude_bounds and longitude_bounds?

I tried the following operations:

operations = ";".join([
    "tropospheric_NO2_column_number_density_validity>75",
    "keep(latitude_bounds,longitude_bounds,datetime_start,datetime_length,tropospheric_NO2_column_number_density, absorbing_aerosol_index)",
    "derive(datetime_start {time} [days since 2000-01-01])",
    "derive(datetime_stop {time}[days since 2000-01-01])", 
    "exclude(datetime_length)",
    "bin_spatial(801,30,0.05,901,-15,0.05)",
    "derive(tropospheric_NO2_column_number_density [Pmolec/cm2])",
    "derive(latitude {latitude})",
    "derive(longitude {longitude})",
    "latitude <= 85[degree_north]",
    "latitude >= 30[degree_north]",
    "longitude >= -15 [degree_east]",
    "longitude <= 30 [degree_east]",
    "count>0",
    "exclude(count)",
    "keep(latitude, longitude, datetime_start, datetime_stop, tropospheric_NO2_column_number_density)"
])

But I get the following error:

CLibraryError                             Traceback (most recent call last)
Cell In[20], line 42
     40     s5p_fps = sorted(glob.glob(q))
     41     print('Merging: ', date)
---> 42     no2_map=harp.import_product(files_in, operations, reduce_operations=reduce_operations)
     43 #     harp.export_product(no2_map, output_file, operations="squash(time, (latitude, longitude, latitude_bounds, longitude_bounds))")
     44     harp.export_product(no2_map, output_file)

File ~/miniconda3/envs/SP5_harp_2/lib/python3.11/site-packages/harp/_harppy.py:1254, in import_product(filename, operations, options, reduce_operations, post_operations)
   1252 try:
   1253     if _lib.harp_product_append(merged_product_ptr[0], c_product_ptr[0]) != 0:
-> 1254         raise CLibraryError()
   1255 finally:
   1256     _lib.harp_product_delete(c_product_ptr[0])

CLibraryError: products don't both have variable 'tropospheric_NO2_column_number_density_count'

Sorry for bothering you again, but I have to understand if Harp is the right library for me.

Monica