I have a number of L3 files containing daily grids of vertical glyoxal (CHOCHO) column densities, each one created from L2 orbit files using the HARP bin_spatial() and bin() operations, and then manually edited to include additional metadata etc. The daily L3 files have *weight variables describing the overlap of L2 pixels with L3 grid cells. I’d like to use harpmerge to obtain the mean vertical column density for a number of days. This works when I use harpmerge as follows:
harpmerge -f hdf5 -a "keep(*time*,tropospheric_CHOCHO_column_number_density,longitude*,latitude*,*count,*weight)" -ap "squash(time,(latitude,longitude,latitude_bounds,longitude_bounds));bin()" <daily_input_files> merged_output.nc
However, the file merged_output.nc no longer contains *weight variables. As a consequence, I’m unable to perform the same calculation using harpmerge “reduce operations” (-ar option), as the weight variables are lost in the intermediate product which is created after the first merge. When I try to perform the same binning with reduce operations (which should be more efficient/less memory intensive), I get the error:
ERROR: products don't both have variable 'tropospheric_CHOCHO_column_number_density_weight' (while merging <second daily grid>)
What could be the reason that the tropospheric_CHOCHO_column_number_density_weight variable is removed after performing bin()?
I am not sure it is true that all weight variables got removed. The variable called weight itself should still be present.
The tropospheric_CHOCHO_column_number_density_weight variable exists if your original tropospheric_CHOCHO_column_number_density variable contained invalid/NaN values while you were regridding. Since NaN values cannot be included in an average, we need to keep a separate weight sum (tropospheric_CHOCHO_column_number_density_weight) that will be lower than the total weight (weight).
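As a rough illustration of why the two weight sums differ, the following plain-Python sketch accumulates a single grid cell from a few overlapping pixels. The function name accumulate_cell and the sample numbers are made up for this example; it only mirrors the arithmetic described above and is not HARP's actual code.

```python
import math


def accumulate_cell(values, weights):
    """Accumulate one grid cell from overlapping pixels.

    Returns (weighted_sum, variable_weight, total_weight), where
    variable_weight only counts pixels with a valid (non-NaN) value
    and total_weight counts every overlapping pixel.
    """
    weighted_sum = 0.0
    variable_weight = 0.0
    total_weight = 0.0
    for value, weight in zip(values, weights):
        total_weight += weight
        if math.isnan(value):
            # NaN cannot enter the average, so it adds nothing to the
            # variable-specific weight sum.
            continue
        weighted_sum += value * weight
        variable_weight += weight
    return weighted_sum, variable_weight, total_weight


if __name__ == "__main__":
    # Three pixels overlap the cell; the second one has an invalid value.
    values = [2.0, float("nan"), 4.0]
    weights = [0.5, 0.3, 0.2]
    weighted_sum, variable_weight, total_weight = accumulate_cell(values, weights)
    print("variable-specific weight:", variable_weight)
    print("total weight:", total_weight)
    # The mean must be normalised by the variable-specific weight.
    print("mean:", weighted_sum / variable_weight)
```

Dividing by the total weight instead would bias the mean low, which is why a separate weight sum per variable is needed as soon as NaN values are present.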
Whenever we append two products in HARP, we actually have a mechanism in place to add missing variable-specific weight and count variables in case they exist in only one of the two. It will then create the variable-specific weight/count variable by making a copy of the global weight/count variable. But this will only be performed if the variable-specific weight/count variable in the product that had it has the same dimensions as the global weight/count variable.
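The fallback described above can be pictured with a small sketch. This is only a model of the described behaviour, not HARP's implementation: products are represented as plain dictionaries, and the function name ensure_specific_weight and the "dims"/"data" keys are hypothetical.

```python
import copy


def ensure_specific_weight(product, other, name):
    """Add '<name>_weight' to `product` if only `other` has it.

    The new variable is a copy of the global 'weight' variable of
    `product`. Returns True if nothing was needed or the copy was made,
    and False if the fallback is not possible.
    """
    specific = name + "_weight"
    if specific in product or specific not in other:
        return True  # nothing to add
    global_weight = product.get("weight")
    if global_weight is None:
        return False  # no global weight to copy from
    if other[specific]["dims"] != global_weight["dims"]:
        return False  # dimensions differ, so no copy is made
    product[specific] = copy.deepcopy(global_weight)
    return True


if __name__ == "__main__":
    dims = ("time", "latitude", "longitude")
    name = "tropospheric_CHOCHO_column_number_density"
    day1 = {"weight": {"dims": dims, "data": [1.0, 1.0]}}
    day2 = {
        "weight": {"dims": dims, "data": [1.0, 1.0]},
        name + "_weight": {"dims": dims, "data": [1.0, 0.4]},
    }
    print(ensure_specific_weight(day1, day2, name))
    print(sorted(day1))
```

In this model the fallback cannot work when the global weight variable is missing from the product that needs the copy.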
Could you please check what HARP version you are using? We have made some changes in the past, so maybe you are using a version from before that?
As a side note, it is better if you just omit any NaN values in your daily gridding (you should not include invalid measurements in your daily grids). This will eliminate this problem altogether.
Also, be aware that for CHOCHO there are already L3 products available from the S5P-PAL Data Portal. These are properly pre-filtered with the recommended QA flag filtering.
I should have added that I’m using HARP 1.23. I think I remember that this binning procedure did work as I expected with a previous version of HARP, but the format of my daily L3 files has also evolved since then, and I can’t trace back which HARP or L3 versions I was using when this worked. I just tested using HARP 1.19 on another machine, and I ran into the same problem.
I can provide 2 example files (14M each, or could be even smaller if I grid to a coarser resolution).
The global ‘weight’ variable is removed as well. We do have count and tropospheric_CHOCHO_column_number_density_count.
I’ll see if I can simplify things by excluding NaN values. Currently, we have different filter criteria for different variables, which we handle by setting values to NaN for different data points. I have to think about whether we can avoid that and just drop all measurements where we want to exclude the VCD.
This is for a different L2 version than the one available from S5P-PAL, so I can’t use those files.
After some offline communication we found out that the input products were using a modified version of the weight variables. The variables had a unit attribute, which is something that HARP does not expect, resulting in the weight variables getting removed.
Removing the unit from the weight variables solved the issue.
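For anyone running into the same thing, one possible way to strip such an attribute is sketched below. It assumes the daily files can be opened with the third-party netCDF4 Python package and that the offending attribute is named either unit or units; the function names strip_attributes and strip_file are made up for this example. Work on a copy of the file, since the attribute is removed in place.

```python
def strip_attributes(variables, name_suffixes=("weight",), attr_names=("unit", "units")):
    """Remove the given attributes from variables whose name ends with a suffix.

    `variables` is a mapping of name -> variable object offering
    ncattrs() and delncattr(), such as netCDF4.Dataset.variables.
    Returns a list of (variable_name, attribute_name) pairs removed.
    """
    removed = []
    for name, var in variables.items():
        if not name.endswith(tuple(name_suffixes)):
            continue  # leave all other variables untouched
        for attr in attr_names:
            if attr in var.ncattrs():
                var.delncattr(attr)
                removed.append((name, attr))
    return removed


def strip_file(path):
    """Open `path` for in-place editing and strip the attributes."""
    # Imported here so that strip_attributes() works without netCDF4.
    from netCDF4 import Dataset

    with Dataset(path, "r+") as dataset:
        return strip_attributes(dataset.variables)
```

Calling strip_file on a copy of a daily grid returns the list of attributes that were removed, which makes it easy to check that only the weight variables were touched.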