I keep getting a truncated file error when trying to process certain files from 2018 to 2019 using the code below. The week_relative_paths variable holds the paths in the S3 bucket for all of the files in a given week. The code works if I remove the files that are causing issues.
import os
import harp

# Average the week's L2 files into one weekly L3 product and write it out
mean_no2 = harp.import_product(week_relative_paths, harp_operations, reduce_operations=reduce_operations)
name = f's5p-NO2_L3_weekly_averaged_{start_date_withouttime}.nc'
harp.export_product(mean_no2, filename=os.path.join(output_sentinel_dir_path, name), file_format="netcdf")
I read that this can be caused by files getting corrupted during download, so I have now mounted the AWS Open Data Sentinel-5P bucket on my AWS instance so the code pulls directly from the S3 bucket, but I still get exactly the same errors for exactly the same files that failed when I was downloading them. Does this mean that some of the 2022 reprocessed NO2 Sentinel-5P files on S3 are corrupted (Sentinel-5P Level 2 - Registry of Open Data on AWS), and if so, is there a more reliable place to pull from? Or is it some other issue that can be solved with harp?
Also, is there any way to add code that checks for truncated files before processing, so the code runs without error? I am trying to process 100+ weeks of data, so this would be ideal. Right now I have added some code that filters out files smaller than 300 MB, which seems to catch the corrupted files for most weeks, but I would love a more explicit way to detect corrupted files so the code can run without error (and so I don't accidentally filter out files that are actually fine).
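For context, this is roughly the kind of pre-check I have in mind (just a rough sketch: it assumes the mounted S5P files can be opened as HDF5 with h5py, and the is_readable helper and the 300 MB threshold are only illustrative, not a confirmed way to catch every corrupted file):

import os
import h5py

MIN_SIZE_BYTES = 300 * 1024 * 1024  # size heuristic I am already using

def is_readable(path):
    """Return True if the file looks complete: big enough and openable as HDF5."""
    try:
        if os.path.getsize(path) < MIN_SIZE_BYTES:
            return False
        # Walk the HDF5 group structure; a truncated file usually fails to open or traverse
        with h5py.File(path, 'r') as f:
            f.visit(lambda name: None)
        return True
    except Exception:
        return False

valid_paths = [p for p in week_relative_paths if is_readable(p)]
skipped = sorted(set(week_relative_paths) - set(valid_paths))
if skipped:
    print(f"Skipping {len(skipped)} unreadable file(s): {skipped}")

mean_no2 = harp.import_product(valid_paths, harp_operations, reduce_operations=reduce_operations)

Is something like this a reasonable approach, or is there a check built into harp itself that I should be using instead?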