This service updates the metadata attributes of an input file to values that are known to be correct, amending, adding or deleting attributes as appropriate. The underlying methodology is to use a configuration file with earthdata-varinfo to supply known corrections to the metadata.
.
├── .github
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── bin
├── dev_requirements.txt
├── docker
├── harmony_service
├── metadata_annotator
├── requirements.txt
└── tests
- .github - Contains CI/CD workflows and pull request template.
- CHANGELOG.md - Contains a record of changes applied to each new release of the Harmony Metadata Annotator Service.
- CONTRIBUTING.md - Instructions on how to contribute to the repository.
- LICENSE - Required for distribution under NASA open-source approval. Details conditions for use, reproduction and distribution.
- README.md - This file, containing guidance on developing the library and service.
- bin - A directory containing utility scripts to build the service and test images. A script to extract the release notes for the most recent version, as contained in CHANGELOG.md, is also in this directory.
- dev_requirements.txt - Contains a list of Python packages required for local development, but not for the service itself.
- docker - A directory containing the Dockerfiles for the service and test images. It also contains service_version.txt, which contains the semantic version number of the library and service image. Update this file with a new version to trigger a release.
- harmony_service - A directory containing the Harmony Service specific Python code. adapter.py contains the MetadataAnnotatorAdapter class that is invoked by calls to the Harmony service.
- metadata_annotator - Directory containing business logic for the service, including Harmony scaffolding, such as the adapter class for the service.
- requirements.txt - Contains a list of Python packages needed to run the service.
- tests - Contains the pytest test suite.
Local testing of service functionality can be achieved via a local instance of Harmony, also known as Harmony-In-A-Box. Please see the instructions there regarding creation of a local Harmony instance.
For local development and testing of library modifications or small functions independent of the main Harmony application:
- Create a Python virtual environment.
- Install the dependencies in requirements.txt and tests/test_requirements.txt.
- Install the pre-commit hooks (described below).
SMAP L3 collections are missing spatial dimension variables. This service can generate them by using a combination of required CF-compliant attributes and temporary helper attributes.
Temporary attributes are identified by a _* prefix. They are defined in the earthdata-varinfo configuration and made available in the VarInfoFromNetCDF4 object for use in annotations. These attributes are not written to the DataTree object or the NetCDF output file.
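The separation between temporary and persistent attributes can be illustrated with a minimal sketch. This is not the service's own code: the `split_attributes` helper and the `TEMPORARY_PREFIX` constant are hypothetical names used only for illustration, and the sketch assumes attributes are held in a plain dictionary keyed by attribute name.

```python
# Hypothetical constant: the prefix this README uses for temporary attributes.
TEMPORARY_PREFIX = "_*"


def split_attributes(attributes: dict) -> tuple[dict, dict]:
    """Return (persistent, temporary) attribute dictionaries.

    Persistent attributes would be written to the output file, while
    temporary ones are only used while building annotations.
    """
    persistent = {}
    temporary = {}
    for name, value in attributes.items():
        if name.startswith(TEMPORARY_PREFIX):
            temporary[name] = value
        else:
            persistent[name] = value
    return persistent, temporary


# Attribute names and values taken from the SPL3SMAP example in this README.
configured = {
    "standard_name": "projection_y_coordinate",
    "units": "m",
    "_*corner_point_offsets": "history_subset_index_ranges",
}

persistent, temporary = split_attributes(configured)
print("written to output:", sorted(persistent))
print("helper only:", sorted(temporary))
```

Only the entries in `persistent` would be candidates for writing to the output under this assumption.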
- standard_name - Must be either projection_x_coordinate or projection_y_coordinate (per CF conventions).
- grid_mapping - References a properly configured CRS variable (described below).
To accommodate possible subsetting, one of the following is also required in the dimension coordinate variable configuration:
- _*corner_point_offsets: history_subset_index_ranges - Indicates that the subset index range should be parsed from the history metadata attribute.
- _*subset_index_reference: <variable-reference> - Indicates that the subset index range should be obtained from the referenced row or column grid variable. The referenced variable must be explicitly requested or, preferably, configured as an ancillary variable in the Harmony OPeNDAP SubSetter varinfo configuration, so that it is always available to the metadata-annotator.
- When used for creating spatial dimensions, the following attribute is required:
  - _*master_geotransform - Defines the grid details used to generate dimension coordinates.
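As a rough illustration of how a geotransform and a subset index range could combine into dimension coordinates, the sketch below assumes the six values follow the GDAL affine ordering (x origin, pixel width, row rotation, y origin, column rotation, pixel height), that the rotation terms are zero, that coordinates refer to cell centres, and that the index range is inclusive. None of these assumptions is confirmed by this README, and `dimension_coordinates` is a hypothetical helper rather than a function from the service. The geotransform values are copied from the SPL3SMAP configuration example in this README.

```python
# Geotransform from the SPL3SMAP example; GDAL ordering is an assumption.
MASTER_GEOTRANSFORM = [
    -17367530.4451615, 9008.055210146, 0,
    7314540.8306386, 0, -9008.055210146,
]


def dimension_coordinates(geotransform, axis, index_range):
    """Return cell-centre coordinates for an inclusive (start, stop) index range.

    Assumes a north-up grid with no rotation, so only the origin and
    spacing for the requested axis are needed.
    """
    if axis == "x":
        origin, spacing = geotransform[0], geotransform[1]
    elif axis == "y":
        origin, spacing = geotransform[3], geotransform[5]
    else:
        raise ValueError(f"axis must be 'x' or 'y', not {axis!r}")

    start, stop = index_range
    # Offset by half a cell to move from the cell edge to the cell centre.
    return [origin + (index + 0.5) * spacing for index in range(start, stop + 1)]


# A full-grid request would start at index 0; a subset starts further in.
y_subset = dimension_coordinates(MASTER_GEOTRANSFORM, "y", (10, 12))
x_subset = dimension_coordinates(MASTER_GEOTRANSFORM, "x", (0, 1))
print(y_subset)
print(x_subset)
```

Under these assumptions the y values decrease with increasing row index, because the pixel height in the geotransform is negative.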
The configuration example below creates the /Soil_Moisture_Retrieval_Data/y variable for SPL3SMAP.
For the service to create a spatial dimension variable, all 3 overrides are required.
- The first override creates /Soil_Moisture_Retrieval_Data/y as a new variable in the VarInfoFromNetCDF4 object.
- The second override adds the grid_mapping attribute to all variables in the /Soil_Moisture_Retrieval_Data/ group (including /Soil_Moisture_Retrieval_Data/y).
- The third override creates the CRS variable and includes the _*master_geotransform attribute (required for creating the spatial dimension coordinates).
{
  "Applicability": {
    "Mission": "SMAP",
    "ShortNamePath": "SPL3SMAP",
    "VariablePattern": "/Soil_Moisture_Retrieval_Data/y"
  },
  "Attributes": [
    {
      "Name": "standard_name",
      "Value": "projection_y_coordinate"
    },
    {
      "Name": "long_name",
      "Value": "y coordinate of projection"
    },
    {
      "Name": "dimensions",
      "Value": "y"
    },
    {
      "Name": "axis",
      "Value": "Y"
    },
    {
      "Name": "units",
      "Value": "m"
    },
    {
      "Name": "type",
      "Value": "float64"
    },
    {
      "Name": "_*corner_point_offsets",
      "Value": "history_subset_index_ranges"
    }
  ],
  "_Description": "The pseudo-dimension variable is supplemented with variable attributes (as if it were a dimension variable) to fully specify the Y dimension."
},
{
  "Applicability": {
    "Mission": "SMAP",
    "ShortNamePath": "SPL3SMAP",
    "VariablePattern": "/Soil_Moisture_Retrieval_Data/.*"
  },
  "Attributes": [
    {
      "Name": "grid_mapping",
      "Value": "/EASE2_global_projection_9km"
    }
  ],
  "_Description": "SMAP L3 collections omit global grid mapping information"
},
{
  "Applicability": {
    "Mission": "SMAP",
    "ShortNamePath": "SPL3SMAP",
    "VariablePattern": "/EASE2_global_projection_9km"
  },
  "Attributes": [
    {
      "Name": "grid_mapping_name",
      "Value": "lambert_cylindrical_equal_area"
    },
    {
      "Name": "standard_parallel",
      "Value": 30.0
    },
    {
      "Name": "longitude_of_central_meridian",
      "Value": 0.0
    },
    {
      "Name": "false_easting",
      "Value": 0.0
    },
    {
      "Name": "false_northing",
      "Value": 0.0
    },
    {
      "Name": "horizontal_datum_name",
      "Value": "WGS84"
    },
    {
      "Name": "inverse_flattening",
      "Value": 298.257223563
    },
    {
      "Name": "semi_major_axis",
      "Value": 6378137.0
    },
    {
      "Name": "semi_minor_axis",
      "Value": 6356752.314245
    },
    {
      "Name": "_*master_geotransform",
      "Value": [-17367530.4451615, 9008.055210146, 0, 7314540.8306386, 0, -9008.055210146]
    }
  ],
  "_Description": "Provide missing global grid mapping attributes for SMAP L3 collections."
},
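To see why the dimension variable picks up attributes from more than one override, the sketch below checks which of the three VariablePattern values apply to a given variable path. It assumes each pattern is a regular expression matched from the start of the path; how earthdata-varinfo actually evaluates patterns is not described in this README, so treat this as an illustration only. The `applicable_overrides` helper, the labels, and the `soil_moisture` variable path are hypothetical.

```python
import re

# VariablePattern values copied from the three overrides above,
# keyed by a short illustrative label.
OVERRIDE_PATTERNS = {
    "y dimension variable": "/Soil_Moisture_Retrieval_Data/y",
    "group grid_mapping": "/Soil_Moisture_Retrieval_Data/.*",
    "CRS variable": "/EASE2_global_projection_9km",
}


def applicable_overrides(variable_path):
    """List the labels of overrides whose pattern matches the variable path.

    Assumption: patterns are regular expressions anchored at the start
    of the path, which is what re.match provides.
    """
    return [
        label
        for label, pattern in OVERRIDE_PATTERNS.items()
        if re.match(pattern, variable_path) is not None
    ]


for path in (
    "/Soil_Moisture_Retrieval_Data/y",
    "/Soil_Moisture_Retrieval_Data/soil_moisture",  # illustrative variable name
    "/EASE2_global_projection_9km",
):
    print(path, "->", applicable_overrides(path))
```

With these patterns, /Soil_Moisture_Retrieval_Data/y matches both the first and second overrides, which is consistent with the note that the grid_mapping attribute is also added to the new dimension variable.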
This service utilises the Python pytest package to perform unit tests on
classes and functions in the service. After local development is complete, and
tests have been updated, they can be run in Docker via:

$ ./bin/build-image && ./bin/build-test && ./bin/run-test

It is also possible to run the test scripts directly (without Docker) by
running the tests/run_tests.sh script with a proper Python environment. Note
that the reports directory will appear in the directory you call the script from.
The tests/run_tests.sh script will also generate a coverage report, rendered
in HTML, and scan the code with pylint.
Currently, the pytest suite is run automatically within a GitHub workflow
as part of a CI/CD pipeline. These tests are run for all changes made in a PR
against the main branch. The tests must pass in order to merge the PR.
This repository uses pre-commit to enable pre-commit checks that enforce coding standard best practices. These include:
- Removing trailing whitespace.
- Removing blank lines at the end of a file.
- Ensuring JSON files have valid formats.
- ruff Python linting checks.
- black Python code formatting checks.
- Ensuring no committed files are above 500 kB.
To enable these checks:
# Install pre-commit Python package via the listed development requirements:
pip install -r dev_requirements.txt
# Install the git hook scripts:
pre-commit install

Docker service images for the harmony-metadata-annotator adhere to semantic
version numbers: major.minor.patch.
- Major increments: These are non-backwards compatible API changes.
- Minor increments: These are backwards compatible API changes.
- Patch increments: These updates do not affect the API to the service.
The service currently uses xarray.DataTree.to_netcdf to write the whole
DataTree object out to a file. This is very memory intensive, so the Harmony
in a Box configuration listed above uses 8 GiB for the memory limit of the
service. A future improvement would be to write the output incrementally. The
Harmony SMAP L2 Gridder performs such an operation, and may be a good model
for updating this code.