S5P large-area tiles and mosaic using HARP and xarray-python

I am trying to generate daily averages of a Sentinel-5P (S5P) variable (tropospheric_NO2_column_number_density) at continental scale (~4000 x 4000 km). The steps carried out so far are listed below, together with the sample commands. Comments on the queries at the end would be appreciated.

0. The S5P products are downloaded using the Sentinelsat Python API.

  1. The products are then grouped by day, after the datetime of each product (end position) is converted from UTC to local time (IST). Subsequent processing is carried out on the day-grouped netCDF files.
  2. Due to memory limitations, the area extent is split into four tiles (p1-p4). HARP's harpconvert is applied to the individual products, once for each tile extent:

harpconvert --format netcdf --hdf5-compression 9 -a 'latitude>2;latitude<23.62;longitude>62;longitude<84.64999999999999; tropospheric_NO2_column_number_density_validity>75; bin_spatial(2152,2,0.01,2255,62,0.01); derive(datetime_stop {time}); derive(latitude {latitude}); derive(longitude {longitude});keep(tropospheric_NO2_column_number_density,datetime_stop,latitude,longitude)' /home/s5pdownload_daywise/ newfilename
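As a side check on the grid arguments, here is a minimal Python sketch. It assumes the HARP convention that the first and fourth arguments of bin_spatial are numbers of cell edges (so N edges give N-1 cells); the helper names `grid_upper_edge` and `edges_for_extent` are hypothetical and exist only for this illustration.

```python
def grid_upper_edge(n_edges: int, offset: float, step: float) -> float:
    """Upper edge of a regular grid given its edge count, first edge and step."""
    return offset + (n_edges - 1) * step


def edges_for_extent(lower: float, upper: float, step: float) -> int:
    """Number of edges needed to cover [lower, upper] with cells of size step."""
    return round((upper - lower) / step) + 1


# Values taken from the harpconvert command above.
lat_top = grid_upper_edge(2152, 2, 0.01)
lon_right = grid_upper_edge(2255, 62, 0.01)
print("grid ends at:", round(lat_top, 2), round(lon_right, 2))

# Edge counts that would cover the full filter extent (2..23.62, 62..84.65).
print("edges needed:", edges_for_extent(2, 23.62, 0.01),
      edges_for_extent(62, 84.65, 0.01))
```

If the edge-count assumption holds, the grid in the command stops slightly short of the latitude/longitude filter limits, so pixels in the outermost strip pass the filter but fall outside the grid.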

  3. HARP's harpmerge is applied to each tile's group of files. I assume the daily averaging happens in this step:

harpmerge -ap 'bin(); squash(time, (latitude,longitude))' -a 'latitude>2;latitude<23.62;longitude>62;longitude<84.64999999999999; derive(longitude {longitude});derive(latitude {latitude})' /home/s5pdownload_daywise/ newfilename

  4. Unit conversion (to µmol/m²) is applied to the four resulting tiles.
  5. numpy concatenate is used to mosaic/join the four tiles into one. Trying to use harpmerge for this step resulted in multiple errors.
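The unit conversion and mosaic steps can be sketched with numpy as follows. This is only an illustration under stated assumptions: the tiles are 2-D (latitude, longitude) arrays in mol/m² with latitude ascending along the rows, and p1-p4 are laid out as south-west, south-east, north-west and north-east. The actual tile layout is not given above, and `mosaic_2x2` is a hypothetical helper.

```python
import numpy as np

MOL_TO_UMOL = 1e6  # 1 mol/m² equals 1e6 µmol/m²


def mosaic_2x2(sw, se, nw, ne):
    """Join four (lat, lon) tiles into one grid, southern tiles in the first rows."""
    return np.block([[sw, se], [nw, ne]])


# Tiny synthetic tiles standing in for p1..p4 (assumed layout, values in mol/m²).
p1 = np.full((2, 3), 1e-5)     # south-west
p2 = np.full((2, 2), 2e-5)     # south-east
p3 = np.full((1, 3), np.nan)   # north-west, no valid data
p4 = np.full((1, 2), 4e-5)     # north-east

mosaic_umol = mosaic_2x2(p1, p2, p3, p4) * MOL_TO_UMOL
print(mosaic_umol.shape)
```

`np.block` checks that neighbouring tiles have matching row and column counts, which catches tiles whose grids do not line up before they are silently joined.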

Queries:

  1. Is it okay to compute the daily average after local-time conversion, even though the region covers more than one time zone? Some images have an end position time at night (after dawn in local time); is it okay to include these images in the daily average?
  2. Comments on issues with this workflow, and on methods to improve it, are requested.
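For reference, the day-wise grouping from step 1 can be sketched in Python as below. The file names and end times are made up for illustration, IST is modelled as a fixed UTC+05:30 offset, and `group_by_local_day` is a hypothetical helper, not part of Sentinelsat or HARP.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # India Standard Time, no DST


def group_by_local_day(end_times_utc):
    """Map each IST calendar date to the files whose end time falls on it."""
    groups = defaultdict(list)
    for name, end_utc in end_times_utc.items():
        groups[end_utc.astimezone(IST).date()].append(name)
    return dict(groups)


# Hypothetical products with their end position times in UTC.
samples = {
    "orbit_a.nc": datetime(2020, 1, 1, 7, 40, tzinfo=timezone.utc),  # 13:10 IST
    "orbit_b.nc": datetime(2020, 1, 1, 9, 20, tzinfo=timezone.utc),  # 14:50 IST
    "orbit_c.nc": datetime(2020, 1, 1, 19, 0, tzinfo=timezone.utc),  # 00:30 IST, next day
}

groups = group_by_local_day(samples)
for day, names in sorted(groups.items()):
    print(day, names)
```

The third sample shows how a late UTC end time moves a product into the next local day, which is the case the first query asks about.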

This should be fine. Measurements can only be made during daytime, because reflected sunlight is needed.
There might be multiple overpasses near the poles, but that is not the region you are looking at.

I would recommend combining the whole step of generating a daily grid into a single harpmerge call, making use of the new -ar parameter of harpmerge that was introduced in HARP 1.11 (e.g. -ar 'squash(time, (latitude, longitude));bin()'). This should resolve the memory issues that you are having, so you don't have to split your grid into four subregions.
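As a rough sketch of what such a single call could look like, the Python snippet below only assembles the harpmerge argument list; it does not run HARP and has not been checked against real products. The extent values are borrowed from the first tile in the question purely as an example, the operation order (filters, then keep, then gridding) is an assumption, and `build_harpmerge_command` is a hypothetical helper.

```python
import shlex


def build_harpmerge_command(input_dir, output_file,
                            lat_min, lat_max, lon_min, lon_max, step=0.01):
    """Assemble a harpmerge call that grids each product and averages via -ar."""
    # bin_spatial takes edge counts: N cells need N + 1 edges.
    n_lat_edges = round((lat_max - lat_min) / step) + 1
    n_lon_edges = round((lon_max - lon_min) / step) + 1
    operations = ";".join([
        f"latitude>{lat_min};latitude<{lat_max}",
        f"longitude>{lon_min};longitude<{lon_max}",
        "tropospheric_NO2_column_number_density_validity>75",
        # Drop unneeded variables before gridding to limit memory use.
        "keep(latitude_bounds,longitude_bounds,datetime_start,datetime_length,"
        "tropospheric_NO2_column_number_density)",
        "derive(datetime_stop {time})",
        f"bin_spatial({n_lat_edges},{lat_min},{step},{n_lon_edges},{lon_min},{step})",
        "derive(latitude {latitude})",
        "derive(longitude {longitude})",
    ])
    reduce_operations = "squash(time, (latitude, longitude));bin()"
    return ["harpmerge", "-a", operations, "-ar", reduce_operations,
            input_dir, output_file]


cmd = build_harpmerge_command("/home/s5pdownload_daywise/", "newfilename",
                              2, 23.62, 62, 84.65)
print(shlex.join(cmd))  # shell-quoted command line, ready to copy
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it without any shell quoting issues, provided harpmerge is on the PATH.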

Many thanks, I will explore the -ar option. The memory error is raised in harpconvert when the grid size is 3000, as in the following command:

harpconvert --format netcdf --hdf5-compression 9 -a 'latitude>2;latitude<23.62;longitude>62;longitude<84.64999999999999; tropospheric_NO2_column_number_density_validity>75; bin_spatial(3000,2,0.01,3000,62,0.01); derive(datetime_stop {time}); derive(latitude {latitude}); derive(longitude {longitude});keep(tropospheric_NO2_column_number_density,datetime_stop,latitude,longitude)' /home/s5pdownload_daywise/ newfilename

The error is: out of memory (could not allocate 5483860000 bytes)

Make sure to put the keep operation at the beginning (and include latitude_bounds and longitude_bounds in the list). Otherwise you will create a grid of all variables first (before throwing most of them away again with the current keep operation at the end), which will require a lot of memory.
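A back-of-the-envelope estimate shows why the order matters. The sketch below assumes 8-byte values and a 3000 x 3000 edge grid as in the command above; the 34-level variable is a hypothetical stand-in for any variable with an extra dimension, and `gridded_bytes` is an illustrative helper, not a HARP function.

```python
def gridded_bytes(n_lat_edges, n_lon_edges, levels=1, bytes_per_value=8):
    """Approximate size in bytes of one gridded variable."""
    cells = (n_lat_edges - 1) * (n_lon_edges - 1)
    return cells * levels * bytes_per_value


single = gridded_bytes(3000, 3000)              # one 2-D variable
profile = gridded_bytes(3000, 3000, levels=34)  # hypothetical 34-level variable

print(f"2-D variable:      {single / 1e6:.0f} MB")
print(f"34-level variable: {profile / 1e9:.1f} GB")
```

This does not reproduce the exact byte count in the error message; it only illustrates that gridding every variable, especially those with extra dimensions, quickly reaches gigabytes, while a grid of just the kept variables stays far smaller.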

Thanks, will check that.