TROPOMI CHOCHO PAL data | ingest 3D variable with coda

Good morning Sander,

In the ingestion file for the S5P PAL product of CHOCHO, the fitted slant columns are not included. While it is not an official recommendation for the product, some the SCDs should be used to filter the data in the case of fire events. The fitted columns have a third dimension to include all SCDs, 14 in total.

I found a round-about way to perform the coda ingestion, with your amazing help all these years, but I am wondering if there is an even faster way.

My operations are:

    *operations = ";".join([*
  •        "latitude>33.5;latitude<42.5",\*
    
  •        "longitude>19;longitude<29",\*
    
  •        'C2H2O2_column_number_density_validity==100;\*
    
  •            keep(latitude, longitude, solar_zenith_angle, \*
    
  •                 absorbing_aerosol_index, \*
    
  •                 cloud_fraction,\*
    
  •                 C2H2O2_column_number_density, \*
    
  •                 C2H2O2_column_number_density_uncertainty,\*
    
  •                 orbit_index, datetime_start, index)'])*
    

and then I first ingest the product

    *try:                                                              *
  •         product = harp.import_product(infile,operations)* 
    

and then I coda ingest the fitted SCDs.

         *with coda.open(infile) as pf:*
  •     #  variables are not in the ingestion definition, so we need to ingest them by hand*
    
  •              temp_product = harp.Variable(coda.fetch(pf, "PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/fitted_slant_columns"), ["time"])*
    
  •              product.NO2_220K = temp_product.data[:,product.index.data // 450,product.index.data % 450,2].squeeze()*
    
  •              product.NO2_296K = temp_product.data[:,product.index.data // 450,product.index.data % 450,3].squeeze()*
    

Is there a way to avoid ingesting first the whole orbit via the temp_product so as to save time in the algorithm? I’m using python 3.11.6 and harp 1.20.2 [don’t ask…]

Many thanks,
MariLiza

Note that you are not ingesting the whole product for the SCD, you are only reading the SCD variable itself. It would be more correct to call this variable temp_variable instead of temp_product.

The algorithm you use is already rather optimal. Note that internally in HARP we also read variables in full from HDF5 files before applying any internal filtering (the HDF5 library is much more efficient at reading full blocks than reading element-wise). And this is exactly what you are doing here as well.

What is, however, not correct is to assign a numpy array directly to a harp product field. This should be a harp.Variable. You should likely use something like:

scd = coda.fetch(pf, "PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/fitted_slant_columns")
product.NO2_220K = harp.Variable(scd[:,product.index.data // 450,product.index.data % 450,2].squeeze(), ["time"])
product.NO2_296K = harp.Variable(scd[:,product.index.data // 450,product.index.data % 450,3].squeeze(), ["time"])
1 Like

Thanks Sander, for all the tips and hints! You should see what I originally name my variables when I am in the devel stage… even Greek swearwords if things don’t work out. :wink:

Continuing the thread above: say I now have the two new variables in my harp product : is there an optimal way to filter the entire harp product based on some clause on the new variables?

Since I do not wish to have to know the variable names that are in the harp product after all the operations, etc., I made a new dictionary as follows to grab the names, and apply the mask as follows:

new_product = harp.to_dict(product)
for name in new_product.items():
if “unit” not in name:
# print(len(product[name].data))
product[name].data = product[name].data[mask]
# print(len(product[name].data))

Is there a more Harpian-way to perform this filtering?
mask is a simple true/false mask, e.g. mask = product.NO2_220K.data>2.

Thanks again!
MariLiza

Once you are in the python domain, you should be using python operations.
You can just manipulate the HARP product in-place by replacing the numpy data fields of each variable.

For instance:

for variable_name in product:
    variable = product[variable_name]
    if len(variable.dimension) > 0 and variable.dimension[0] == 'time':
        variable.data = variable.data[mask]
1 Like

Thanks Sander, I must have missed the product[variable_name] syntax! I think I was trying with simply name/key/variable.
Best wishes,
MariLiza