Subsetting Data using lat/lon values


I have been using the code below to analyse Sentinel-5P NO2 data:

product = harp.import_product("path\month\*.nc",
operations="tropospheric_NO2_column_number_density_validity>75;keep(latitude_bounds,longitude_bounds,tropospheric_NO2_column_number_density,surface_zonal_wind_velocity,surface_meridional_wind_velocity);bin_spatial(1801,-90,0.1,3601,-180,0.1);derive(tropospheric_NO2_column_number_density [Pmolec/cm2])",
post_operations="bin();squash(time, (latitude_bounds,longitude_bounds));derive(latitude {latitude});derive(longitude {longitude});exclude(latitude_bounds,longitude_bounds,latitude_bounds_weight,longitude_bounds_weight,count,weight)")

I am only interested in the Mediterranean Sea region. Even though the files in my directory only cover the area of interest, I am still processing a lot of unnecessary data. This causes problems when trying to compute monthly plots from around 90 NetCDF files. At the moment I can only run around 30 files at a time, otherwise Python crashes. It is a computational limitation I have to live with.

Is there a way to subset the data with HARP using specific lat/lon values, so that I only process the data relevant to me?

Thanks in advance for your help.

Hi Ryan,

There are several things you can do.

If you are only interested in the Mediterranean area, then use a spatial binning that matches that area, for instance bin_spatial(251,25,0.1,551,-10,0.1). This will already reduce the size quite a bit.
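The bin_spatial() arguments follow directly from the region extent: you pass the number of cell *edges* (one more than the number of cells), the starting coordinate, and the step, first for latitude and then for longitude. A small sketch (the helper name is my own) that computes them for a bounding box, reproducing the Mediterranean example above (25N-50N, 10W-45E at 0.1 degrees):

```python
def bin_spatial_args(lat_min, lat_max, lon_min, lon_max, step):
    """Return the six bin_spatial() arguments for a regular lat/lon grid.

    HARP expects the number of cell *edges*, which is one more than the
    number of cells: edges = extent / step + 1.
    """
    lat_edges = round((lat_max - lat_min) / step) + 1
    lon_edges = round((lon_max - lon_min) / step) + 1
    return (lat_edges, lat_min, step, lon_edges, lon_min, step)

# Mediterranean bounding box at 0.1 degree resolution
args = bin_spatial_args(25, 50, -10, 45, 0.1)
print("bin_spatial(%d,%g,%g,%d,%g,%g)" % args)
# -> bin_spatial(251,25,0.1,551,-10,0.1)
```

You can paste the resulting string into the operations argument in place of the global bin_spatial(1801,-90,0.1,3601,-180,0.1).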

What I would further recommend is to ingest daily grids first:

grid1 = harp.import_product(".../day1/*.nc", operations="...", post_operations="...")
grid2 = harp.import_product(".../day2/*.nc", operations="...", post_operations="...")

daily_grids = [grid1, grid2, ...]

And then merge these daily grids into a single monthly grid using:

monthly_grid = harp.execute_operations(daily_grids, "", "bin()")

This is both fast and memory efficient.
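The per-day ingest and merge can be wrapped in a loop. A sketch, assuming your month directory has one subdirectory per day (the day1/day2 naming and the helper names are illustrative; keep your own operations and post_operations strings):

```python
import glob
import os


def day_patterns(month_dir):
    """Sorted '*.nc' glob patterns, one per 'day*' subdirectory."""
    return [os.path.join(d, "*.nc")
            for d in sorted(glob.glob(os.path.join(month_dir, "day*")))]


def make_monthly_grid(month_dir, operations, post_operations):
    """Import each day's files as one daily grid, then bin the daily
    grids into a single monthly grid."""
    import harp  # HARP Python bindings, assumed installed

    daily_grids = [harp.import_product(pattern,
                                       operations=operations,
                                       post_operations=post_operations)
                   for pattern in day_patterns(month_dir)]
    # Averaging the daily grids is a plain temporal bin over the list.
    return harp.execute_operations(daily_grids, "", "bin()")
```

Because each import_product() call only holds one day's worth of orbits in memory, this stays well below the limit you hit when ingesting the whole month at once.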
