Atmospheric Toolbox

How to include time as a coordinate in netCDF output?


I’m working through the RUS Copernicus tutorial on regridding Sentinel-5P data using harp in Python. I’ve been able to successfully regrid and download the modified NetCDF, but when I open up the NetCDF in Python Xarray, I don’t see the time as a coordinate (only latitude and longitude). However, in the RUS Copernicus tutorial, the time shows up under the ‘coordinate’ category.

Here is the harp command I used:

harp_op = "tropospheric_NO2_column_number_density_validity>75; \
    derive(tropospheric_NO2_column_number_density [Pmolec/cm2]); \
    derive(datetime_stop {time}); \
    bin_spatial(340, 40, 0.05, 420, -95, 0.05); \
    derive(latitude {latitude}); derive(longitude {longitude}); \
    keep(NO2_column_number_density, tropospheric_NO2_column_number_density, NO2_slant_column_number_density,latitude_bounds, longitude_bounds, latitude, longitude)"

for i in file_names:
    no2_l3 = harp.import_product(i,operations=harp_op)
    export_filename = "{output_path}{name}".format(output_path=output_path,name=i.split('/')[-1].replace('L2','L3'))

Can someone tell me if I made an error? Or how I can get time as a coordinate? This is very important to me as I plan on stacking multiple months worth of data by time.

If you want to keep the time information then you should add datetime_start and datetime_stop to your keep() function call.

That adds it as a variable, how do you add it as a coordinate in xarray?

That is more an xarray question than a HARP question.
I would guess you should be able to use xarray.Dataset.assign_coords, but maybe someone else on this forum can help you with that.

I hope this will be helpful.

1 Like

The solution is in Xarray. When loading the dataset you must preprocess the file to add the time info to the dataset as a dimension. If I recall correctly this is in the RUS tutorial at a later step.

Here’s the code I use to do this, which extracts the time from the filename and inserts it as a dimension before concatenating the data arrays along the time dimension.

def prep(ds): 
    print(ds.encoding["source"].split("\\")[-1].split("_")[2].strip('.nc'), '\n')
    extracted_time = ds.encoding["source"].split("\\")[-1].split("_")[2].strip('.nc')
    tim = pd.to_datetime(np.array([extracted_time]))
    ds['time'] = tim
    return ds

ds = xr.open_mfdataset(fl, combine='by_coords', preprocess=prep, join='override',
                        concat_dim='time', chunks={'time': 15}, 
                        parallel = True)
1 Like