Prefect 3.0 Flow: Convert Zarr Files to NetCDF4 and Store in Blob Storage
Implement a Prefect flow that processes and concatenates xarray datasets stored as Zarr stores in a specified Azure Blob Storage container. The flow will:
- Identify individual file IDs from the container directory structure.
- Open both normal and denoised Zarr files using existing functions.
- Retrieve additional metadata from the database for each file ID.
- Concatenate the datasets while preserving metadata.
- Convert the concatenated dataset to NetCDF4 format.
- Optionally create echograms of the concatenated dataset (both normal and denoised) and compute MVBS and NASC.
- Store the NetCDF4 file in an output Blob Storage container and generate an access link.
The flow will have the following signature:
    load_and_process_files.serve(
        name='convert-to-netcdf',
        parameters={
            'cruise_id': 'example_cruise',
            'load_from_blobstorage': True,
            'get_list_from_db': False,
            'start_datetime': None,
            'end_datetime': None,
            'source_container': 'input-zarr-container',
            'save_to_blobstorage': True,
            'output_container': 'output-netcdf-container',
            'save_to_directory': False,
            'output_directory': '',
            'plot_echograms': False,
            'compute_nasc': False,
            'compute_mvbs': False,
            'chunks_ping_time': 500,
            'chunks_range_sample': 500,
            'batch_size': BATCH_SIZE
        }
    )
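The start_datetime and end_datetime parameters imply a time-window filter over the discovered file segments before any data are loaded. A minimal sketch of that filtering step, assuming each segment carries the file_start_time and file_end_time fields listed under step 2 (the helper name is illustrative, not part of the existing codebase):

```python
from datetime import datetime

# Hypothetical helper: keep only segments whose time range overlaps
# the requested [start_datetime, end_datetime] window (None = unbounded).
def filter_by_window(files, start_datetime=None, end_datetime=None):
    kept = []
    for f in files:
        # Segment ends before the window opens -> skip it.
        if start_datetime and f["file_end_time"] < start_datetime:
            continue
        # Segment starts after the window closes -> skip it.
        if end_datetime and f["file_start_time"] > end_datetime:
            continue
        kept.append(f)
    return kept
```

With both bounds left as None (the defaults in the signature above), every segment passes through unchanged.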
Workflow Steps
1. Retrieve List of File IDs from Container
- List all folders under {cruise_id}/.
- Extract {individual_file_id} from folder names.
- Identify the presence of both {individual_file_id}.zarr and {individual_file_id_denoised}.zarr.
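The ID-extraction logic can be sketched as a pure function over blob paths, assuming the denoised store is named with a _denoised suffix on the file ID (in the real flow the path list would come from azure-storage-blob's ContainerClient rather than an in-memory list):

```python
# Hypothetical helper: group Zarr store names under {cruise_id}/ into
# file IDs and flag whether a denoised counterpart exists.
def extract_file_ids(blob_paths, cruise_id):
    ids = {}
    prefix = f"{cruise_id}/"
    for path in blob_paths:
        if not path.startswith(prefix):
            continue
        # First path component under the cruise folder is the store name.
        store = path[len(prefix):].split("/", 1)[0]
        if not store.endswith(".zarr"):
            continue
        name = store[: -len(".zarr")]
        if name.endswith("_denoised"):
            ids.setdefault(name[: -len("_denoised")], {})["denoised"] = True
        else:
            ids.setdefault(name, {})["normal"] = True
    return ids
```

Each entry then records whether the normal store, the denoised store, or both were found for that file ID.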
2. Retrieve Metadata from Database
- Extend FileSegmentService to fetch metadata for each file, including:
  - location
  - file_name
  - id
  - location_data
  - file_freqs
  - file_start_time
  - file_end_time
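The extension might look like the following sketch, shown against an in-memory SQLite stand-in for the real database; the table name, column names, and cruise_id filter are assumptions based on the fields listed above, not the actual schema:

```python
import sqlite3

class FileSegmentService:
    """Illustrative stand-in for the real service; only the new
    metadata-fetch method is sketched here."""

    def __init__(self, conn):
        self.conn = conn

    def fetch_metadata(self, cruise_id):
        # Return one dict per file segment, keyed by the metadata fields
        # the concatenation step needs.
        cols = ["id", "file_name", "location", "location_data",
                "file_freqs", "file_start_time", "file_end_time"]
        rows = self.conn.execute(
            f"SELECT {', '.join(cols)} FROM file_segments WHERE cruise_id = ?",
            (cruise_id,),
        )
        return [dict(zip(cols, r)) for r in rows]
```

Returning plain dicts keeps the downstream concatenation step decoupled from the ORM or driver actually used.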
3. Load Zarr Datasets
- Use open_zarr_store() to lazily load both normal and denoised datasets.
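A sketch of what open_zarr_store() might wrap, assuming it is built on xarray.open_zarr and that the chunks_ping_time / chunks_range_sample flow parameters map onto the ping_time and range_sample dimensions:

```python
def build_chunks(chunks_ping_time=500, chunks_range_sample=500):
    """Map the flow's chunk parameters to an xarray chunking spec."""
    return {"ping_time": chunks_ping_time, "range_sample": chunks_range_sample}

def open_zarr_store(store_url, chunks=None):
    # Imported here so the sketch stays self-contained; the real helper
    # presumably imports xarray at module level.
    import xarray as xr
    # Passing chunks keeps the load lazy (Dask-backed) instead of
    # reading the whole store into memory.
    return xr.open_zarr(store_url, chunks=chunks or build_chunks())
```

Lazy loading matters here because both the normal and denoised stores are opened per file ID before anything is concatenated.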
4. Concatenate Zarr Datasets
- Call concatenate_zarr_files() to merge all datasets while keeping metadata.
- Ensure datasets are rechunked appropriately.
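Concatenation order should follow the time axis, so the segments are sorted by file_start_time first; the xarray calls below are assumptions about what concatenate_zarr_files() does internally (concatenate along ping_time, then rechunk):

```python
def order_segments(metadata):
    """Return file IDs ordered by file_start_time, so concatenation
    along ping_time is monotonic."""
    return [m["id"] for m in sorted(metadata, key=lambda m: m["file_start_time"])]

def concatenate_zarr_files(datasets, chunks):
    import xarray as xr
    # drop_conflicts keeps attrs shared by all segments and silently
    # drops the ones that differ, preserving common metadata.
    combined = xr.concat(datasets, dim="ping_time",
                         combine_attrs="drop_conflicts")
    # Rechunk so the concatenated dataset has uniform chunk sizes
    # instead of one chunk boundary per source file.
    return combined.chunk(chunks)
```

Rechunking after concat avoids the ragged per-file chunk boundaries that would otherwise slow the NetCDF write.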
5. Convert to NetCDF4
- Use save_dataset_to_netcdf() to convert the dataset.
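A sketch of the conversion step; the filename scheme is purely illustrative, and the format/engine arguments are assumptions about how the real save_dataset_to_netcdf() forces NetCDF4 output:

```python
from datetime import datetime

def netcdf_name(cruise_id, start=None, end=None):
    """Build a descriptive output filename (hypothetical naming scheme)."""
    parts = [cruise_id]
    if start:
        parts.append(start.strftime("%Y%m%dT%H%M%S"))
    if end:
        parts.append(end.strftime("%Y%m%dT%H%M%S"))
    return "_".join(parts) + ".nc"

def save_dataset_to_netcdf(ds, path):
    # format="NETCDF4" with the netcdf4 engine yields an HDF5-backed
    # NetCDF4 file, which supports the chunked variables produced above.
    ds.to_netcdf(path, format="NETCDF4", engine="netcdf4")
```

For a Dask-backed dataset, to_netcdf triggers the deferred computation, so this is the step where the lazy pipeline actually materializes.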
6. Upload to Output Container
- Store the NetCDF4 file in output_container.
- Generate an access link via generate_container_access_url().
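The upload and link-generation steps might look like the sketch below; upload_blob is the real azure-storage-blob ContainerClient method, while the URL assembly assumes a SAS token has already been produced (e.g. via generate_container_sas, which needs the account key and is omitted here):

```python
def generate_container_access_url(account, container, sas_token):
    """Assemble a shareable container URL from a pre-generated SAS token."""
    return f"https://{account}.blob.core.windows.net/{container}?{sas_token}"

def upload_netcdf(container_client, local_path, blob_name):
    # Stream the NetCDF4 file into the output container, replacing any
    # artifact left by a previous run of the flow.
    with open(local_path, "rb") as fh:
        container_client.upload_blob(name=blob_name, data=fh, overwrite=True)
```

The returned URL is what the flow hands back to callers as the access link for the converted dataset.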