Conversation

@gshiroma (Contributor)

This PR adds the option shadow_no_data_value to the GCOV runconfig to assign a value to areas without valid radar samples. It has been requested that this value be set to 0 in the production of GCOV products.

@hfattahi (Contributor) commented Aug 7, 2025

@gshiroma there was an ALOS2 test case from @jkellndorfer that initiated this discussion in the first place. Do you have that test case, to see how the browse image looks after this change?

@jkellndorfer

Hi @gshiroma and @hfattahi

> This PR adds the option shadow_no_data_value to the GCOV runconfig to assign a value to areas without valid radar samples. It has been requested that this value be set to 0 in the production of GCOV products.

It's great to have the option to set a default value for shadow regions. I don't think we want to set those shadow pixel values to 0 by default in the GCOV product, though. The problem with 0 is that any conversion of the power data to dB would result in invalid data and lead to holes when the data are displayed in GIS/image-processing software. I would suggest the default be either an interpolated value from valid surrounding pixels (is that possible?), for the most visually pleasing images, or a value at or below the noise floor, e.g. -40 dB (0.0001). Of course, users can change the values from 0 before conversion. Related question: how are we dealing with layover regions?
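
As a minimal NumPy sketch of the dB issue (toy values, just to illustrate):

```python
import numpy as np

power = np.array([0.0, 1e-4, 1e-2])  # linear power; 0.0 is the proposed fill
db = 10.0 * np.log10(power)          # 0.0 -> -inf (RuntimeWarning: divide by zero)
print(db)                            # [-inf -40. -20.]
```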

@gshiroma (Contributor, Author) commented Aug 12, 2025

Thank you, @hfattahi and @jkellndorfer. FWIW, I'm not a fan of assigning zero or low values to missing data. Such values should only result from actual valid measurements. If we assign "valid" values to missing data, it becomes difficult to identify where this was done. One possible approach would be to ensure that the mask layer classifies these pixels as shadow or missing data.

BTW, if we assume that invalid points are those with values at zero or below a certain threshold, we may also discard actual low values that became zero or very small after thermal noise correction (negative values are clipped to zero at that step). These measurements are still on the RSLC grid and will be geocoded using adaptive multilooking with neighboring pixels. It's unlikely that the multilooked (geocoded) values will remain zero in the geogrid, since every contributing sample would have to be zero for that to happen, but it's not impossible.
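
A rough sketch of that scenario (toy numbers, not the actual workflow):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy backscatter power after thermal noise subtraction; some samples go negative.
sigma0 = rng.normal(loc=1e-4, scale=2e-4, size=(4, 4))
clipped = np.maximum(sigma0, 0.0)  # negative values clipped to zero
multilooked = clipped.mean()       # one 4x4 "look"
# The average is exactly zero only if every contributing sample was zero.
print(np.count_nonzero(clipped == 0.0), multilooked)
```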

In our internal repo, @bhawkins brought up a good point for consideration:

"However, there's another side to that coin, which is helping users get correct results. All of the operations noted above will give incorrect results when there are missing data. Averages will be biased low, spectra will be incorrect, etc. Using a more "convenient" fill value increases the likelihood that missing data issues will be ignored and cause analysis errors. Imagine some paper comes out claiming global biomass decreased 5%, but what really happened is the project updated the DEM in a way that caused 5% more pixels to become invalid. IMO "annoying" values like inf/nan make that sort of thing less likely."

It's also worth mentioning that multiple packages can compute statistics and averaging windows over data containing NaNs to avoid this type of biasing: for example, the ISCE3 Stats and Looks modules, NumPy functions such as np.nanmean(), np.nanmin(), np.nanmax(), and np.nanstd(), and Astropy's modules for convolution and filtering with NaNs.
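
For example, with plain NumPy (a toy array, just to show the behavior):

```python
import numpy as np

img = np.array([[0.12, np.nan, 0.08],
                [0.10, 0.11, np.nan]])
print(np.mean(img))     # nan -- a single NaN poisons the plain mean
print(np.nanmean(img))  # 0.1025 -- computed over the valid samples only
print(np.nanmin(img), np.nanmax(img), np.nanstd(img))
```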

I think the main concern is that the QA code is currently generating the browse image and reports with "NaN values that are expanding".

The image below shows GCOV imagery produced from a very challenging S-1 dataset acquired over the Himalayas. The dataset was processed with PLAnT, which is able to handle NaNs when computing statistics or performing multilooking:

[Screenshot: GCOV imagery with NaN-aware processing in PLAnT]
The image below shows the browse image generated using the current QA code:

[Screenshot: browse image generated by the current QA code]
And this is what the QA report looks like (currently):

[Screenshot: current QA report]
Setting NaNs to zero (or a low number) is one way (a workaround?) to improve these QA outputs. If we set them to zero, this is how the browse image will look:

[Screenshot: browse image with NaNs set to zero]
And the corresponding QA report looks like this:

[Screenshot: QA report with NaNs set to zero]

Another thing to consider is how the images will appear at full resolution. This subset is from one of the most challenging areas, containing NaNs:

[Screenshot: full-resolution subset containing NaNs]

And this is the same subset with the invalid values set to zero:

[Screenshot: full-resolution subset with invalid values set to zero]

But as @bhawkins mentioned, this would bias the measurements downward. Any histogram or mean computed over that area would reflect a value lower than the actual one.
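
A quick illustration of that bias with toy values:

```python
import numpy as np

img = np.array([0.12, np.nan, 0.08, 0.10])
zero_filled = np.nan_to_num(img, nan=0.0)  # the zero-fill workaround
print(np.nanmean(img))     # 0.10  -- mean over the valid samples
print(zero_filled.mean())  # 0.075 -- biased low by the zero fill
```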

Having said that, if we have to assign a "valid" value to missing data, I think I'd prefer to assign the value "0". With 0, it's less likely that users will accidentally filter out actual valid values, and it also helps when dealing with the complex values from the off-diagonal terms, a point that @hfattahi brought up offline.
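
For instance (hypothetical off-diagonal samples, just to illustrate the complex-fill point):

```python
import numpy as np

hv_vh = np.array([0.01 + 0.02j, -0.03 + 0.01j])       # off-diagonal covariance terms
with_zero = np.append(hv_vh, 0.0 + 0.0j)              # zero is a single, unambiguous fill
with_nan = np.append(hv_vh, complex(np.nan, np.nan))  # a complex NaN fill needs both parts set
print(np.isnan(with_nan))  # [False False  True] -- True if either part is NaN
```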

Another option is to review and merge this PR with the invalid value set to NaN. If we don't have time to fix QA, we can still set the invalid value to zero via the runconfig later.
