Global attributes with HARP in-memory operations

jvgent · August 19, 2024, 12:21pm

I am trying to keep track of the HARP operations performed in Python through the history global file attribute, but I can’t get it to work as intended.
Let’s say I perform all operations during harp.import_product:

```python
import harp

file = "./S5P_OFFL_L2__HCHO___20240707T174154_20240707T192324_34892_03_020601_20240709T095147.nc"
hp = harp.import_product(file, operations="keep(tropospheric_HCHO_column_number_density,latitude_bounds,longitude_bounds);bin_spatial(181,-90,1.,361,-180,1.);derive(latitude {latitude});derive(longitude {longitude});", post_operations="bin();squash(time, (latitude,longitude));")
harp.export_product(hp, "harp_testfile.h5")
```

then the history attribute of the exported file contains the full content of the operations string.

However, if I perform the operations in 2 steps:

```python
hp = harp.import_product(file, operations="keep(tropospheric_HCHO_column_number_density,latitude_bounds,longitude_bounds);")

hpl3 = harp.execute_operations(hp, operations="bin_spatial(181,-90,1.,361,-180,1.);derive(latitude {latitude});derive(longitude {longitude});", post_operations="bin();squash(time, (latitude,longitude));")

harp.export_product(hpl3, "harp_testfile.h5")
```

Here the history attribute only contains the ‘keep’ operation performed during harp.import_product.
I expected it to (also) include the most recent operations, from the execute_operations command.

This seems to contradict the following statement on the Global attributes page of the HARP 1.23 documentation:

Note that the Conventions, datetime_start, and datetime_stop attributes are only used inside files. For the in-memory representation (in C, Python, etc.) only the history and source_product attributes are present.

Am I misunderstanding something?
Thanks in advance,
Jeroen

sander.niemeijer · August 19, 2024, 1:30pm

The history attribute is indeed preserved (as the documentation states); what does not happen is that harp.execute_operations adds a history line of its own.

I am also not yet sure whether we want to add this.

First, be aware that calling harp.execute_operations should be avoided as much as possible. It makes two full copies of the product content (which is slow and costs memory): one to go from Python to C, and one to go back. Where possible, try to combine all your operations into a harp.import_product or harp.export_product call.

Also be aware that using post_operations makes no sense if you are applying the operations to just a single product (in that case, combine everything into a single operations argument).
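Following that advice, the two-step example from the question can be collapsed into one harp.import_product call with a single operations argument. A minimal sketch: join_operations, SINGLE_STEP_OPERATIONS and import_in_one_step are hypothetical helpers (not part of HARP); only harp.import_product is HARP's own API. HARP is imported inside the wrapper so the string helper can be used on its own.

```python
def join_operations(*parts):
    """Join HARP operation strings into one ';'-separated operations argument.

    Surrounding whitespace and leading/trailing ';' of each part are removed and
    empty parts are skipped, so strings like "keep(a,b);" can be passed as-is.
    """
    cleaned = (part.strip().strip(";").strip() for part in parts)
    return ";".join(part for part in cleaned if part)


# The question's operations and post_operations, merged into one argument
# (hypothetical constant, built from the strings in the original post).
SINGLE_STEP_OPERATIONS = join_operations(
    "keep(tropospheric_HCHO_column_number_density,latitude_bounds,longitude_bounds);",
    "bin_spatial(181,-90,1.,361,-180,1.);derive(latitude {latitude});derive(longitude {longitude});",
    "bin();squash(time, (latitude,longitude));",
)


def import_in_one_step(filename):
    """Hypothetical wrapper: a single harp.import_product call, no execute_operations."""
    import harp  # deferred so join_operations can be used without HARP installed

    return harp.import_product(filename, operations=SINGLE_STEP_OPERATIONS)
```

Because all steps then run inside harp.import_product, the history line HARP writes covers the whole operations string, as in the first example of the question.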

I assume that what you are trying to do is create a daily grid by iterating over all orbits per day?
If so, you should really try to combine all of this into a single harp.import_product call by passing the list of filenames and passing the right operations, reduce_operations, and post_operations arguments. This will do all the steps in the C domain and only at the end convert the data structure to Python. And then your history attribute will also be correct.
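For the daily-grid case, the loop over days can stay in Python while each day is handled by one harp.import_product call on a list of files. A minimal sketch under several assumptions: the inputs are standard S5P L2 file names whose first timestamp is the sensing start; grouping by start day is a simplification (an orbit crossing midnight is assigned to the day it starts); and the REDUCE_OPERATIONS string is modeled on HARP daily-grid examples and should be checked against the HARP documentation for your grid. group_by_start_day and make_daily_grids are hypothetical helpers, not HARP functions.

```python
import re
from collections import defaultdict
from pathlib import Path

# Sensing start/stop times as they appear in S5P L2 file names, e.g.
# S5P_OFFL_L2__HCHO___20240707T174154_20240707T192324_..., group 1 = start day.
_S5P_TIMES = re.compile(r"_(\d{8})T\d{6}_\d{8}T\d{6}_")

# Per-file operations, taken from the question.
OPERATIONS = (
    "keep(tropospheric_HCHO_column_number_density,latitude_bounds,longitude_bounds);"
    "bin_spatial(181,-90,1.,361,-180,1.);"
    "derive(latitude {latitude});derive(longitude {longitude})"
)
# ASSUMPTION: modeled on HARP daily-grid examples; verify against the HARP docs.
REDUCE_OPERATIONS = "squash(time, (latitude,longitude,latitude_bounds,longitude_bounds));bin()"


def group_by_start_day(filenames):
    """Map 'YYYYMMDD' (sensing start day) to the sorted list of files for that day."""
    groups = defaultdict(list)
    for name in filenames:
        match = _S5P_TIMES.search(Path(name).name)
        if match is None:
            raise ValueError(f"not a recognised S5P L2 file name: {name}")
        groups[match.group(1)].append(name)
    return {day: sorted(files) for day, files in sorted(groups.items())}


def make_daily_grids(filenames, out_pattern="harp_daily_{day}.h5"):
    """Hypothetical driver: one harp.import_product call per day, then export."""
    import harp  # deferred so group_by_start_day works without HARP installed

    for day, files in group_by_start_day(filenames).items():
        product = harp.import_product(
            files, operations=OPERATIONS, reduce_operations=REDUCE_OPERATIONS
        )
        harp.export_product(product, out_pattern.format(day=day))
```

Since the merging happens inside harp.import_product, the data only crosses from C to Python once per day and HARP records the operations in the history attribute itself.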

If you really want to call harp.execute_operations, then conceptually this falls into the same category as performing some manual operation on the product content, e.g. by applying numpy operations directly. In that case you are also responsible for keeping track of what you did and for manually constructing the right history attribute content.

If your harp.execute_operations call is part of a bigger set of operations, with some numpy operations performed before it and some after, then you might not even want HARP to update the history attribute for you, because that would prevent you from combining everything into a single history line. Since harp.import_product and harp.export_product sit directly at the boundaries of your processing (the very start and the very end), having them add a history line is not a problem: you then only have to describe what you did in between, as a single line.
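One way to follow this is to record each manual step (numpy operations, harp.execute_operations calls) while you work and write a single history line just before export. A minimal sketch: HistoryRecorder is a hypothetical helper, its line format (UTC timestamp, a tag in brackets, then the steps separated by "; ") is an assumption rather than the exact format HARP writes, and it assumes the in-memory product exposes history as a writable, newline-separated string attribute.

```python
from datetime import datetime, timezone


class HistoryRecorder:
    """Collect descriptions of manual processing steps and append them to a
    product's history attribute as one line (hypothetical helper)."""

    def __init__(self, tag="manual processing"):
        self.tag = tag
        self.steps = []

    def record(self, description):
        """Remember one step; returns self so calls can be chained."""
        self.steps.append(description)
        return self

    def line(self, now=None):
        """Build the single history line; `now` should be a UTC datetime."""
        now = now or datetime.now(timezone.utc)
        stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
        return f"{stamp} [{self.tag}] " + "; ".join(self.steps)

    def apply(self, product, now=None):
        """Append the line to product.history, or do nothing if no steps were recorded."""
        if not self.steps:
            return product  # no steps, no line (like import/export without operations)
        existing = getattr(product, "history", None)
        new_line = self.line(now)
        product.history = f"{existing}\n{new_line}" if existing else new_line
        return product
```

Calling apply once, right before harp.export_product, keeps everything done between import and export in a single history line.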

Also note that harp.import_product and harp.export_product do not introduce a history line if there were no operations/options arguments provided.

jvgent · August 19, 2024, 2:35pm

Thank you for the elaborate explanation.
Using a single L2 file was just to create an example case. In my actual case I create a set of HARP L2 products manually and then combine them into a single L3 product. That’s why I do things ‘in memory’. But your answer gives me enough pointers to continue and optimize things. Many thanks.