Creating L3 grids very slow for SO2

jamessill1978 · December 15, 2021, 7:30pm

Hi All. I am creating a global L3 product for each day of Sentinel-5P SO2 data and I am seeing very slow processing times: upwards of 30 minutes, and sometimes the conda env crashes. I am using the following operations, called from the HARP Python library:

operations = ";".join(['SO2_column_number_density_validity>50',
                       'derive(SO2_column_number_density {time})',
                       'keep(latitude_bounds,longitude_bounds,SO2_column_number_density)',
                       'bin_spatial(3601,-90,0.05,7201,-180,0.05)',
                       'derive(latitude {latitude})',
                       'derive(longitude {longitude})'])
reduce_operations = "squash(time, (latitude, longitude, latitude_bounds, longitude_bounds));bin()"

Is there a different approach that I should be taking with the operations and post operations to reduce the processing time? Thanks for the help.

sander.niemeijer · December 16, 2021, 10:59am

What you are doing is more or less what is done for Copernicus Sentinel-5P Mapping Portal and that takes less than 3 minutes for constructing a daily grid.

Are you running into memory issues perhaps? If your system needs to swap a lot, things can become very very slow.
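
As a rough sanity check on the memory question, the size of one gridded variable can be estimated from the bin_spatial arguments. This is only a back-of-the-envelope sketch. It assumes each grid cell is stored as an 8-byte double and that the first and fourth bin_spatial arguments are the number of cell edges, so the number of cells along each axis is one less. The function names are made up for illustration.

```python
def grid_cells(lat_edges, lon_edges):
    """Number of grid cells for a grid defined by its edge counts."""
    return (lat_edges - 1) * (lon_edges - 1)


def grid_megabytes(lat_edges, lon_edges, bytes_per_value=8):
    """Approximate size in MB of one gridded variable (assumes 8-byte values)."""
    return grid_cells(lat_edges, lon_edges) * bytes_per_value / 1e6


# Values taken from bin_spatial(3601,-90,0.05,7201,-180,0.05) in the question
print(grid_cells(3601, 7201), "cells")
print(round(grid_megabytes(3601, 7201)), "MB per gridded variable")
```

For the 0.05 degree grid in the question this comes to roughly 26 million cells, or about 200 MB per gridded variable, before counting any extra arrays that binning adds (such as weights) or intermediate copies made while merging orbits.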

jamessill1978 · December 16, 2021, 3:12pm

It is taking less than 3 minutes for them to do the daily grid with a 0.05 cell size? Yes, something has to be going wrong on my system. Memory problems could be the issue, but I’m not seeing any serious memory spikes when running the process. I am using Windows 10 as my operating system and I’ve adjusted the system paging values, so hopefully that works.

sander.niemeijer · December 16, 2021, 4:07pm

For the mapping portal the SO2 is ingested using options="so2_column=7km" and operations="SO2_type>0;solar_zenith_angle<70;valid(SO2_column_number_density)".
In other words, the 7km data is ingested. I think the valid() and SO2_type filters already remove a lot of pixels (you only get the plumes), which makes things go pretty fast.
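
For reference, the filters and ingestion option above could be combined with the gridding steps from the original question roughly as follows. This is a sketch and not the mapping portal's actual code. The helper names build_operations and grid_daily_so2 are hypothetical, and it assumes a HARP version whose import_product accepts reduce_operations (as used in the question). The validity filter from the question is swapped for the three filters quoted above.

```python
# Reduce step taken from the original question: merge each orbit into the running grid
REDUCE_OPERATIONS = (
    "squash(time, (latitude, longitude, latitude_bounds, longitude_bounds));bin()"
)


def build_operations(lat_edges=3601, lon_edges=7201, step=0.05):
    """Build the per-file HARP operations string (hypothetical helper)."""
    return ";".join([
        # Filters quoted above: these discard most pixels before gridding
        "SO2_type>0",
        "solar_zenith_angle<70",
        "valid(SO2_column_number_density)",
        # Gridding steps from the original question
        "keep(latitude_bounds,longitude_bounds,SO2_column_number_density)",
        f"bin_spatial({lat_edges},-90,{step},{lon_edges},-180,{step})",
        "derive(latitude {latitude})",
        "derive(longitude {longitude})",
    ])


def grid_daily_so2(filenames):
    """Grid one day of S5P SO2 files (hypothetical helper, requires HARP)."""
    import harp  # imported here so build_operations() works without HARP installed

    return harp.import_product(
        filenames,
        operations=build_operations(),
        options="so2_column=7km",
        reduce_operations=REDUCE_OPERATIONS,
    )


print(build_operations())
```

Putting the filters first matters: the fewer pixels that reach bin_spatial, the less work the gridding step has to do for each orbit.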

simonB · December 17, 2021, 7:44am

I don’t know whether it can help you but I’ve also been running into slow processing times when creating daily maps out of S5P products. I’ve found a way to go faster:

1. Pre-process the native S5P products to crop the full-orbit files to your area of interest and drop the variables you’re not interested in. Native S5P L2 files contain many variables, and I guess not all of them are of interest for your study.
2. Export the pre-processed files. They are now much lighter (a few dozen MB rather than several hundred MB).
3. Merge the pre-processed files together.

Here is an example for Ozone total column products:
Step 1 & 2

import glob
import os
import time
from os.path import join

import harp

# Get the full paths of all S5P files in the input_path folder, sorted alphabetically
files_input = sorted(glob.glob(join(input_path, 'S5P_OFFL_*.nc')))

# Starting time of the pre-processing
t0 = time.time()

# Pre-processing loop over each file: drop unnecessary variables (keep()) and crop to our ROI
for file_input in files_input:
    # Pre-processing
    Converted_O3 = harp.import_product(file_input,
            operations="latitude <= -40[degree_north];"
            "keep(latitude,latitude_bounds,longitude,longitude_bounds,"
            "O3_column_number_density,O3_column_number_density_validity,datetime_start)")
    # Export the pre-processed file to export_path under its original file name
    harp.export_product(Converted_O3, join(export_path, os.path.basename(file_input)), file_format="netcdf")
    print("product", file_input, "pre-processed")
#End time of the pre-preprocessing
t1 = time.time()

pre_processing_time = t1-t0

print("Pre-processing time: {} seconds".format(pre_processing_time))

Step 3 (the input files are the pre-processed files from steps 1 & 2):

#Use harp.import_product python function to generate the merged product

#Start time of the processing
t0 = time.time()

#Merging of the pre-processed files
Converted_O3 = harp.import_product(input_files, \
                      operations= "latitude <= -40[degree_north] ; \
                      O3_column_number_density_validity>50; \
                      bin_spatial(125, -90, 0.4, 900, -180, 0.4); \
                      derive(latitude {latitude}); derive(longitude {longitude}); \
                      keep(latitude,longitude,O3_column_number_density,weight)", \
                      post_operations="bin(); squash(time, (latitude,longitude,O3_column_number_density))"                   
                      )
#Export of the merged file        
harp.export_product(Converted_O3, export_file,file_format="netcdf")

#End time of the processing
t1 = time.time()

processing_time = t1-t0

#Total processing time accounts for the preprocessing time and the actual merging
total_processing_time=pre_processing_time+processing_time

print("Processing time: {} seconds".format(processing_time))
print("Total processing time: {} seconds".format(total_processing_time))

Using this method I’ve cut my processing time in half, and in some cases to a third. Hope it will be the same for you.