I understood that since HARP 1.9 or so the HARP bin operation is no longer an arithmetic average but a weighted average.
Suppose for example a “point_distance(lat,lon,50 [km]);bin();” operations on satellite orbit files, where the satellite pixel dimensions are present (i.e., there is latitude_bounds and longitude_bounds)

The result is a weighted average. But how is it exactly calculated? By weighting according to the pixel area?

In the resulting file there are the new “weight” and “count” variables.

“weight” will be used in subsequent bin operations, correct?
Does “count” play a role in subsequent bin operations? Perhaps when the weight variable is excluded before?

For a limited number of variables, there is now a corresponding “_weight” variable:
latitude_weight, longitude_weight, solar_zenith_angle_weight, solar_azimuth_angle_weight, sensor_zenith_angle_weight, sensor_azimuth_angle_weight

What is the role of these new _weight variables? Why only for these variables?

Thanks. As I use the “bin” operation, I guess “harp_product_bin” (and not harp_product_bin_spatial) is the relevant C function.
The explanation for harp_product_bin_spatial seems a bit more complete than for harp_product_bin.

harp_product_bin_spatial:
“In addition, a ‘weight’ variable will be added that will contain the sum of weights for the contribution to each cell. If a variable contained NaN values then a variable specific weight variable will be created with only the sum of weights for the non-NaN entries.”

-> I guess the same holds for harp_product_bin, but replacing “to each cell” by “to each bin” ??

In harp_product_bin_spatial, area binning means that "each sample will be allocated to each lat/lon grid cell based on the amount of overlap. "

-> For harp_product_bin, there are no grid cells and so no overlap can be calculated. So I guess weighting uses surface area, not amount of overlap?

harp_product_bin_spatial actually does both temporal binning and spatial binning (although in practice you would mostly use it with a single temporal target bin).
harp_product_bin only does the temporal binning.

The documentation is pretty accurate. There is no ‘spatial overlap’ in the temporal binning case (harp_product_bin). Weight variables are only created for angular variables (to store the length of the combined unit vectors).

Ok.
Does this mean that in the above example (bin() per orbit file), an arithmetic average of the individual pixels will be taken?
And would, in a subsequent bin() operation (combining several “binned-per-overpass” instances), the average be weighted using the “count” variable?

this gives an output.nc file with two bins in the time dimension, to average the temporal binning, I need to do a bin() operation. So, I dont understand what you mean by " harp_product_bin_spatial actually does both temporal binning and spatial binning" ?

First of all, you are calling harpmerge. This merges the result of operations done on multiple products. In your case 2 products. Per product you are performing a bin_spatial operation. So your bin_spatial does not work on the products combined. The final bin() that you want to do (but are not showing in your example) is because you want to flatten the concatenation that the merge is doing (i.e. you want to unbin, what got binned per product).

The bin_spatial that you perform on a single product, is, in fact, binning your product in time: it is putting everything in a single bin that has start time / stop time equal to that of the product. In other words, your time resolution changes (the time resolution is no longer a scanline, but an orbit).

If you would use the C-library you can actually, when gridding a single product, provide a binning grid in the time dimension when performing a harp_product_bin_spatial. See the num_time_bin and num_time_elements arguments, which behave similar to that of harp_product_bin. You could actually provide the validity (quality flag) variable for this, and then you would get separate grids per quality flag value.

harpmerge -f hdf5 -a “derive(latitude{latitude});derive(longitude{longitude});exclude(angle);squash(ti
me, (latitude,longitude));” output.nc output2.nc

from output.nc it is clear that a temporal average (over all scanlines of both S5P files) is done and the end results is a single bin in time dimension. The second command can then perform some additional operations on output.nc

How can I get separate grids per quality flag value without using the C interface (maybe this is not possible? ). I see I can do a temporal average per quality flag as follows:

Doing a bin_spatial as an -ap means that you first merge the datasets and then perform the spatial binning. This may require too much memory to keep all the L2 data in memory.

The whole idea of doing bin_spatial as -a and then perform ‘additional operations’ on the merged data is what the -ap is actually for. There should be no need to split things in two commands.

Also, if you are not using multiple products as input, just use harpconvert.

Doing a spatial bin with binning on a temporal variable is indeed not possible as a harp operation. This would make the bin_spatial function too complex (it already has quite a few parameters).

I found the answer of my own previous question. To apply the bin() operation with harpmerge in combination with the -ar option, the following can be done: (I only keep the HCHO_slant_column_number_density variable here)

this is because the reduce operations (after -ar) acts on the merged output file that is created after each append. Therefore this -ar is memory efficient (as explained in harpmerge — HARP 1.14 documentation) .
However I dont understand why I needed to do the change from double to float (derive(HCHO_slant_column_number_density float {time}) ). This causes the result in output.nc to be of another datatype than when one should have used the -ap option.

I don’t understand your example. If I execute your command I actually get a ERROR: variables don't have the same datatype (HCHO_slant_column_number_density). So I have to leave out the type conversion of the variable.

Indeed, you 're right, sorry, I ve made some mistake here. The correct command is indeed:
harpmerge -f hdf5 -a “keep(HCHO_slant_column_number_density);bin();” -ar “bin();” S5P_OFFL_L2__HCHO___20211001T011729_20211001T025859_20553_02_020201_20211002T165738.nc S5P_OFFL_L2__HCHO___20211001T231652_20211002T005821_20566_02_020201_20211003T144620.nc output.nc