How to deal with SO2 from COBRA after QA>=0.5 and with NaN values

I am now trying to regrid the COBRA SO2 data with QA_flag >= 0.5, but I don't know what HARP does with the other values, where QA < 0.5, and with NaN values.
I want to compare model output with COBRA SO2. This is my script:

```python
##### COBRA data
import harp
import numpy as np
import matplotlib.pyplot as plt
import sys
from netCDF4 import Dataset
import pandas as pd
import cartopy
import cartopy.crs as ccrs
import xarray
from glob import glob
import matplotlib.cm as cm
import matplotlib as mpl
from scipy.interpolate import griddata as gd
from metpy.interpolate import interpolate_1d
from datetime import datetime as dt
import gc
import datetime
import time
from xarray import DataArray
import dask.array as da

# import xesmf as xe
import h5py
import netCDF4

sat_data = []
path_model = glob('/work/bd0617/b309194/TROPOMI/final_result/07/final_result20190710_orbit_no_09010.nc')
n = len(path_model)
print(n)
for i in range(0, 1):
    print(i)
    orbit = path_model[i]
    orbit = orbit[75:80]          # orbit number taken from the file name
    print(orbit)
    time = path_model[i][57:65]   # date string taken from the file name
    print(time)

    try:
        path_sat = glob('/work/bd0617/b309194/TROPOMI/COBRA_TROPOMI/2019/TROPOMISO2_COBRA_v11_iter4__' + orbit + '.nc')
        xr_sat = xarray.open_mfdataset(path_sat[0])

        f = netCDF4.Dataset(path_sat[0], 'r')
        if 'QA flag' in xr_sat:
            lon = f['longitude_bounds'][:]
            lat = f['latitude_bounds'][:]
            so2 = f['SO2 VCD cobra'][:]
            qa = f['QA flag'][:]
            sel = np.where(qa <= 1.0)
            f.close()
            print('Satellite file without group!')
        else:
            lon = f['Data Fields/longitude_bounds'][:]
            lat = f['Data Fields/latitude_bounds'][:]
            so2 = f['Data Fields/SO2 VCD cobra'][:]
            qa = f['Data Fields/QA flag'][:]
            sel = np.where(qa <= 1.0)
            f.close()
            print('Satellite file with group')

        e = netCDF4.Dataset(path_model[i], 'r')
        so2_emac = e.variables['vert_col_SO2'][:]
        e.close()

        ncfile = Dataset('/work/bd0617/b309194/HARP_Inputs/07s_days/COBRA_test7_' + str(time) + '_orbit' + str(orbit) + '.nc',
                         'w', format='NETCDF4')

        nc_x = ncfile.createDimension('time', len(sel[0]))
        nc_c = ncfile.createDimension('corner', 4)
        nc_lat = ncfile.createVariable('latitude_bounds', 'f4', ('time', 'corner',), zlib=True)  # , fill_value=9.96921E36)
        nc_lon = ncfile.createVariable('longitude_bounds', 'f4', ('time', 'corner',), zlib=True)
        nc_lat[:] = lat[sel]
        nc_lon[:] = lon[sel]
        nc_lat.units = 'degree_north'
        nc_lon.units = 'degree_east'

        nc_lat.long_name = 'latitude of pixel center'
        nc_lon.long_name = 'longitude of pixel center'
        nc_lon.valid_min = -180.
        nc_lon.valid_max = 180.
        nc_lat.valid_min = -90.
        nc_lat.valid_max = 90.
        nc_lon.description = 'longitudes of the ground pixel corners (WGS84)'
        nc_lat.description = 'latitudes of the ground pixel corners (WGS84)'

        nc_so2 = ncfile.createVariable('SO2', 'f4', ('time',), zlib=True)
        nc_so2[:] = (so2[sel] * 10000) / 2.69e20   # conversion to DU
        nc_so2.units = 'DU'
        nc_so2.long_name = 'SO2 vertical column density'

        nc_so2emac = ncfile.createVariable('SO2_Emac', 'f4', ('time',), zlib=True)
        nc_so2emac[:] = so2_emac[sel]
        nc_so2emac.units = 'DU'
        nc_so2emac.long_name = 'SO2 vertical column density'

        nc_qa = ncfile.createVariable('QA', 'f4', ('time',), zlib=True)
        nc_qa[:] = qa[sel]
        nc_qa.units = ''
        nc_qa.long_name = 'Quality Flag'
        ncfile.setncatts({'Conventions': 'HARP-1.0'})
        ncfile.close()

        # file name must match the one written above (including '_orbit')
        filelist = glob('/work/bd0617/b309194/HARP_Inputs/07s_days/COBRA_test7_' + str(time) + '_orbit' + str(orbit) + '.nc')
        filelistnew = [harp.import_product(ifile, post_operations='QA>=0.5') for ifile in filelist]
        post_operations = 'bin_spatial(91,-90.0,2,181,-180.0,2);derive(latitude {latitude});derive(longitude {longitude})'
        binned = harp.execute_operations(filelistnew, post_operations=post_operations)
        harp.export_product(binned, '/work/bd0617/b309194/HARP_Inputs/07s_days/Output_COBRA_test7_' + str(time) + '_orbit' + str(orbit) + '.nc')
    except Exception as err:
        print('Error:', err)
```

And this is the plot I got after regridding to a 2×2 degree grid:

Thank you in advance for your help.
Ismail Makroum

I think just using QA>=0.5;valid(SO2) as the operations argument to harp.import_product would do it (I am not sure why you are using the post_operations argument here).
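A minimal sketch of that suggestion; the file name is a placeholder, and QA and SO2 are the variable names used in the custom HARP file created above:

```python
import harp

# Filter during import: drop pixels with QA < 0.5 and any NaN/invalid SO2
# values before they ever reach the binning step.
product = harp.import_product(
    "COBRA_test7_placeholder.nc",        # placeholder file name
    operations="QA>=0.5;valid(SO2)",
)
```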

Also, the data that you are using doesn't seem to be public data, and you are trying to construct HARP products of it yourself. I therefore can't rule out that any weird things you are seeing are actually a problem with your data itself.

Thank you for your reply. I use the post_operations argument because I want to regrid the data onto a regular grid; that is why I give HARP the bin_spatial operation I want to regrid on.
Ah ok, so in harp.import_product I need to give the QA flag and valid SO2. Should it look like this then?

The data is correct and the regridding also looks really good. The only problem I get is those blank regions, which I think are caused by the NaN values.
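One way to keep NaNs out of the intermediate file in the first place is to make the pixel selection explicit about finiteness, instead of selecting on the QA value alone. A small numpy-only sketch with made-up pixel values:

```python
import numpy as np

# Hypothetical pixel values: two NaNs and one low-QA pixel among valid data.
so2 = np.array([0.8, np.nan, 1.2, 0.5, np.nan, 2.0])
qa = np.array([0.9, 0.7, 0.3, 0.6, 0.8, 1.0])

# Keep only pixels that are finite AND pass the QA threshold; a selection
# like np.where(qa <= 1.0) would let the NaN SO2 pixels through.
sel = np.where(np.isfinite(so2) & (qa >= 0.5))

print(so2[sel])  # the NaN pixels and the QA=0.3 pixel are gone
```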

New update:
My code now looks like this:

```python
filelist = glob('/work/bd0617/b309194/HARP_Inputs/07s_days/COBRA_test7_' + str(time) + '_orbit' + str(orbit) + '.nc')
filelistnew = [harp.import_product(ifile, operations='QA>=0.5;valid(SO2);valid(QA);valid(SO2_Emac)') for ifile in filelist]
post_operations = 'bin_spatial(91,-90.0,2,181,-180.0,2);derive(latitude {latitude});derive(longitude {longitude})'
binned = harp.execute_operations(filelistnew, post_operations=post_operations)
harp.export_product(binned, '/work/bd0617/b309194/HARP_Inputs/07s_days/Output_COBRA_test7_' + str(time) + '_orbit' + str(orbit) + '.nc')
```
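For reference, the bin_spatial arguments above define the grid edges, not the cell centers: 91 latitude edges and 181 longitude edges give 90 × 180 cells of 2° × 2°. A quick numpy check of that arithmetic:

```python
import numpy as np

# bin_spatial(91, -90.0, 2, 181, -180.0, 2): edge_count, start, step per axis.
lat_edges = -90.0 + 2 * np.arange(91)
lon_edges = -180.0 + 2 * np.arange(181)

print(lat_edges[0], lat_edges[-1])               # -90.0 90.0
print(len(lat_edges) - 1, len(lon_edges) - 1)    # 90 180 cells
```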

And I am getting this plot:

And I would like to know why I have those missing values near Germany and the Netherlands, for example.

The original file looks like this, and it has values in those regions.

Thank you in advance.

The operations look OK. Without having the file itself, I can't say why the plots differ (I don't know how you created the bottom plots) or why the regridding may still show gaps.

Perhaps the valid_min/valid_max attributes are too tight for the variables that you perform a valid filter on?
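To see whether that is what is happening, one can compare the data against the valid_min/valid_max attributes and count how many values a valid() filter would discard. A sketch with made-up numbers (the attribute values and SO2 samples are hypothetical):

```python
import numpy as np

# Hypothetical valid range attributes and data, as one might read them from
# the netCDF variable (var.valid_min, var.valid_max, var[:]).
valid_min, valid_max = 0.0, 5.0
so2 = np.array([-0.3, 0.1, 4.9, 6.2, 2.0])   # hypothetical DU values

outside = (so2 < valid_min) | (so2 > valid_max)
print(int(outside.sum()), "of", so2.size, "values would be filtered out")
```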

Dear Ismail, dear @sander.niemeijer,

If I understand correctly, you are first using the Python xarray module to ingest the COBRA SO2 data as well as the EMAC model data, you then create a single netCDF file which you then ingest with HARP, so that you can apply the QA filter and bin?

I can think of a number of issues that may arise with this way of performing the analysis. I have in the past encountered differences between netCDF files read in and created with xarray, which I am not sure HARP is equipped to deal with. And it shouldn't have to be, just to be clear!

Can we assume that you are using the COBRA data available via the PAL system, from the S5P-PAL Data portal?

In any case, I would suggest that you use the coda_fetch command to ingest the COBRA and EMAC files separately, and to write them out as a single HARP-compliant file. In this step, you should filter the COBRA data by QA.

Then you will be able to use the harp_export commands to bin your data.

For example, in Create time-series from OMI L3 daily average (Atmospheric Toolbox), a non-HARP-compliant netCDF file is read in with coda fetch and a HARP-compliant file is created as a new product. Then the binning is applied and the L2 files are created.
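The coda-then-harp route sketched above could look roughly like this in Python; the file name and field names are placeholders, and the HARP variable name chosen here (SO2_column_number_density) is one of the standard HARP names, not necessarily the right one for this product:

```python
import coda
import harp
import numpy as np

# Sketch only: read a non-HARP netCDF with coda, apply the QA filter, and
# write a HARP-compliant product. Paths and field names are placeholders.
pf = coda.open("TROPOMISO2_COBRA_placeholder.nc")
so2 = np.asarray(coda.fetch(pf, "SO2_VCD_cobra"))
qa = np.asarray(coda.fetch(pf, "QA_flag"))
coda.close(pf)

# QA filtering happens at this step, before the HARP product is created.
sel = np.isfinite(so2) & (qa >= 0.5)

product = harp.Product()
product.SO2_column_number_density = harp.Variable(so2[sel], ["time"])
harp.export_product(product, "cobra_harp.nc")
```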

Do let me know if you wish any assistance in this matter.

Best wishes,
MariLiza

If that is the case