Atmospheric Toolbox

Best bin_spatial resolution to match TROPOMI sensor data


I’m working with HARP merge and I’m wondering what the best resolution is to use with the bin_spatial operation for the 7x3.5 km resolution available from S5P.

Obviously it would depend on the latitude but is there a rule we can follow? What are some good solutions to this issue?

I have used a spatial resolution of 0.02 which looks like this:
bin_spatial(2275, -54.75, 0.02, 2309, 112.919, 0.02)

It’s just a guess, and I’ve erred on the side of caution by making the resolution finer than I think the sensor produces. That said, the scanline value of 215 would suggest the correct number of pixels along the longitude for my AOI is around 500, though this would shrink as we look further south.

Some guidance on this would be appreciated.


Note that since August 2019 the spatial resolution of TROPOMI is 5.5x3.5 km.

The resolution also depends on the product type. NO2 uses the ‘native’ resolution of the instrument, but for e.g. CO several pixels are combined and the resolution is halved in each dimension.

The resolution of 0.02 degree is a good resolution to use (for e.g. NO2). It slightly oversamples the native resolution of Tropomi itself. It is also the resolution that is used by the S5P Mapping Portal (this uses 2^14 pixels for 360 degrees longitude, which is ~0.022). For CO you would use about 0.04 degree.
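As a rough sketch of the rule of thumb above (one degree of latitude is ~111 km, so a 3.5 km pixel is ~0.0315° and a 0.02° step slightly oversamples it), the bin_spatial() arguments for a given AOI can be computed like this. Note that `bin_spatial_args` is a hypothetical helper of mine, not part of HARP:

```python
# Hypothetical helper (not part of HARP): compute bin_spatial() arguments
# for a given AOI and grid step in degrees. HARP's bin_spatial() takes
# edge counts, which is the number of grid cells plus one.
def bin_spatial_args(lat_min, lat_max, lon_min, lon_max, step):
    lat_cells = round((lat_max - lat_min) / step)
    lon_cells = round((lon_max - lon_min) / step)
    return (lat_cells + 1, lat_min, step, lon_cells + 1, lon_min, step)

# For example, a global 0.1-degree grid:
print(bin_spatial_args(-90, 90, -180, 180, 0.1))
# (1801, -90, 0.1, 3601, -180, 0.1)
```

The `round()` guards against floating-point drift in the division; for a non-integer span the grid simply gets the nearest whole number of cells.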

Thank you! Helpful as ever.


Hi there,

After rendering the S5P CH4 data in a visualisation via Xarray and Matplotlib pcolormesh some blank lines appear along certain latitudes. I’ve attached a screenshot as an example.

I’m wondering whether this may be due to the HARP merge resolution, as it appears in the raw data too.

As a test I also rendered any NaN data in red on another map, and the concentrated areas of NaN data at some latitudes seem to align roughly with the data gaps.

Has anyone else dealt with this issue? Is it a HARP issue or something else?

Attaching the code used to merge the files here:

operations=CH4_column_volume_mixing_ratio_dry_air_validity>50;derive(datetime_stop {time});latitude>-45;latitude<-10.2;longitude>110.6;longitude<157;bin_spatial(1739,-45,0.02,2320,110.6,0.02);derive(latitude {latitude});derive(longitude {longitude});keep(CH4_column_volume_mixing_ratio_dry_air,cloud_fraction,latitude,longitude)
post_operations=bin();squash(time, (latitude, longitude))

Any help or advice would be appreciated. Thanks

Can you let me know which product you are using? We might then try to replicate this.

Thanks. I’m using the S5P methane (CH4) product. It’s averaged over a month in Xarray.

You probably need to provide more details on what you are exactly doing.

With the following script

import harp
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

def harp_l3meshplot(product, value, colorrange=None, colortable='jet'):
    variable = product[value]
    # grid edges: the lower edge of every cell plus the upper edge of the last cell
    gridlat = np.append(product.latitude_bounds.data[:, 0], product.latitude_bounds.data[-1, 1])
    gridlon = np.append(product.longitude_bounds.data[:, 0], product.longitude_bounds.data[-1, 1])
    if colorrange is not None:
        vmin, vmax = colorrange
    else:
        vmin = np.nanmin(variable.data)
        vmax = np.nanmax(variable.data)

    fig = plt.figure(figsize=(20, 10))

    ax = plt.axes(projection=ccrs.PlateCarree())

    img = plt.pcolormesh(gridlon, gridlat, variable.data[0, :, :], vmin=vmin, vmax=vmax,
                         cmap=colortable, transform=ccrs.PlateCarree())

    cbar = fig.colorbar(img, ax=ax, orientation='horizontal', fraction=0.04, pad=0.1)
    cbar.set_label(f'{variable.description} [{variable.unit}]')

files = [

operations = ";".join([
    "latitude > -45",
    "latitude < -10.2",
    "longitude > 110.6",
    "longitude < 157",
    "bin_spatial(1739,-45,0.02,2320,110.6,0.02)",
])
post_operations = "bin();squash(time, (latitude_bounds, longitude_bounds))"
grid = harp.import_product(files, operations, post_operations=post_operations)

harp_l3meshplot(grid, "CH4_column_volume_mixing_ratio_dry_air")

I am getting the following plot:

Which looks fine to me

Thanks for this.

I have seen the issue appear in smaller AOI plots, see below.

Whereas when using HARP to load the data and Numpy to reduce the daily data along the time axis, the result is more in line with expectations. See below.

I had thought it might be possible that, where the bin resolution is set too high, empty rows could be added to the plots, which would explain the horizontal lines. But I’ve been doing some further work on this problem, comparing the results from your code with plots from Xarray, and it seems the issue may be on the Xarray side. I’ll follow up with the Xarray community.

Thanks for your help again, D

In case anyone is looking for a solution to a similar issue, I’ll outline what happened, and what I’ve done to fix it.

I’ve used HARP to merge multiple NetCDF files with S5P orbit data into a single file to cover my AOI, in this case the Australian mainland. This usually takes 2-3 orbit files for full coverage.

Using the HARP merge settings outlined above I’ve created a set of re-gridded files, standardising the lat and lon values across the dataset.
I then load the daily data into an Xarray DataArray with lat, lon and time dimensions. This process is outlined in tutorials online and I won’t go into it here.
Where this issue seems to have originated is that there are minute differences in the lat and lon indexes of the merged data. So when Xarray concatenates multiple NetCDF data files on the time dimension, the first may have a row at latitude 110.123454543623212 and the second file has the row at the same index at 110.1234545436232127. That ~1.7e-15 difference – or multiples of it – was appearing between some indexes that were otherwise identical, and Xarray was treating each distinct value as a new row. This was only occurring on the latitude dimension.

So when I performed an aggregation operation, i.e. mean() over time, Xarray was averaging the data against empty rows, because the data was minutely misaligned.
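To illustrate the failure mode with a synthetic sketch (made-up values, not the actual S5P data): two datasets whose latitude coordinate differs by a couple of femto-degrees get separate rows under Xarray’s default outer join, while join='override' keeps them aligned.

```python
import numpy as np
import xarray as xr

lat = np.array([10.0, 10.02, 10.04])
lat_shifted = lat.copy()
lat_shifted[1] += 2e-15  # a float difference far below any physical meaning

a = xr.Dataset({"v": ("lat", [1.0, 2.0, 3.0])}, coords={"lat": lat})
b = xr.Dataset({"v": ("lat", [4.0, 5.0, 6.0])}, coords={"lat": lat_shifted})

# Default outer join: the mismatched coordinate becomes an extra row padded with NaN
outer = xr.concat([a, b], dim="time")
print(outer.sizes["lat"])   # 4 -- one spurious row

# join='override' keeps the first dataset's coordinates, so rows stay aligned
fixed = xr.concat([a, b], dim="time", join="override")
print(fixed.sizes["lat"])   # 3
```

A mean over time of `outer` would average each of those rows against NaN padding, which is exactly the symptom described above.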

The way it can be fixed is to use the lat and lon from the first array and discard subsequent arrays’ lat and lon indexes.
I’ve used this code to load multiple datasets with aligned lat and lon dimensions, and I do this with the knowledge that the dataset has been gridded to conform to a set grid, which is known to only vary by insignificant amounts. (Any precision after the 4th decimal place is pretty useless to me.)
The argument that made the difference is join='override'.

import numpy as np
import pandas as pd
import xarray as xr

def prep(ds):
    # pull the timestamp out of the filename and attach it as the time coordinate
    extracted_time = ds.encoding["source"].split("\\")[-1].split("_")[2].strip('.nc')
    ds['time'] = pd.to_datetime(np.array([extracted_time]))
    return ds

# concat_dim requires combine='nested'; join='override' keeps the first
# file's lat/lon coordinates and discards the minutely different rest
aus = xr.open_mfdataset(filelist, combine='nested', concat_dim='time',
                        preprocess=prep, join='override',
                        chunks={'time': 3}, parallel=True)

Hope this helps if anyone else is facing issues like this.