Add MOOSEz segmentation workflow with Dockerfiles, WDL pipelines, and Terra notebooks by Sunderlandkyl · Pull Request #67 · ImagingDataCommons/CloudSegmentator

Sunderlandkyl · 2026-06-22T20:38:58Z

Add inference and post-processing Dockerfiles
Add Terra WDL workflows (single and split VM variants)
Add preprocessing, inference, and post-processing notebooks


review-notebook-app · 2026-06-22T20:39:04Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

dataset.json's "labels" field is {organ_name: label_id} (nnU-Net v2 schema, confirmed against moosez's own Model.__get_organ_indices), but the bundling code assumed the opposite ({label_id: organ_name}) and filtered on k.isdigit() -- which is never true for a name key, so organ_indices came out empty for every model and every segment fell back to "segment_N". Capture organ_indices straight from each Model object's own .organ_indices during inference instead of re-parsing dataset.json, so this can't drift from how moosez itself reads its files again. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
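The key/value mix-up described above can be reproduced in a few lines. This is only an illustration of the failure mode, assuming a small made-up `labels` dict with integer label IDs; the function names are hypothetical, and the commit itself avoids re-parsing by reading `.organ_indices` from each moosez `Model` object.

```python
# Hypothetical "labels" block in the nnU-Net v2 layout: {organ_name: label_id}.
labels = {"background": 0, "liver": 1, "spleen": 2}


def organ_indices_buggy(labels):
    # Assumes {label_id: organ_name}. Keys are names here, so isdigit()
    # is never true and the result is always empty.
    return {int(k): v for k, v in labels.items() if k.isdigit()}


def organ_indices_inverted(labels):
    # Reads the dict the way it is actually laid out and inverts it.
    return {int(label_id): name for name, label_id in labels.items()}


def segment_name(organ_indices, label_id):
    # Mirrors the fallback naming mentioned in the commit message.
    return organ_indices.get(label_id, f"segment_{label_id}")


print(segment_name(organ_indices_buggy(labels), 1))
print(segment_name(organ_indices_inverted(labels), 1))
```

With the buggy parser every lookup misses, which is why every segment ended up with the generic `segment_N` name.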

Enable running moosePostProcessNotebook.ipynb standalone in Colab or a local Jupyter/Python kernel. A presence-gated setup cell installs the missing Python packages, the lz4 CLI, dcmqi 1.5.4 (itkimage2segimage with labelmap support) and SNOMED.py, mirroring the post_process_moose Dockerfile. Inside the prebuilt image every check is a no-op. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ence Switch the post_process_moose Dockerfile and post-process notebook from the dcmqi release tarball to pip (dcmqi==1.5.5). Remove dcmqi from the inference image, which only produces NIfTI and never runs itkimage2segimage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…stly The PyPI `dcmqi` package version is independent of the dcmqi release it bundles; `dcmqi==1.5.5` does not exist, so the install failed. Pin `dcmqi==0.4.1`, which ships dcmqi binaries v1.5.5, in the post-process Dockerfile and notebook. In the notebook, also locate itkimage2segimage via the installed package's bundled bin/ directory and prepend it to PATH, so the converter is found even when pip's console-script shim lands in a directory off PATH (e.g. ~/.local/bin). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
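A rough sketch of the PATH fix described above, under the assumption that the wheel unpacks its binaries into a `bin/` directory inside the installed package. The helper names are hypothetical, and whether the PyPI package is importable as `dcmqi` is also an assumption; the notebook's actual lookup is not shown in this PR text.

```python
import importlib.util
import os
from pathlib import Path


def package_dir_of(module_name):
    """Return the install directory of a package, or None if it is not installed."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def prepend_bundled_bin(package_dir, exe_name="itkimage2segimage"):
    """Put <package_dir>/bin at the front of PATH if it contains exe_name."""
    bin_dir = Path(package_dir) / "bin"
    if not bin_dir.is_dir():
        return None
    if not any(p.stem == exe_name for p in bin_dir.glob(exe_name + "*")):
        return None
    # Drop empty entries and any earlier copy, then prepend.
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep)
             if p and p != str(bin_dir)]
    os.environ["PATH"] = os.pathsep.join([str(bin_dir)] + parts)
    return str(bin_dir)


# Assumed usage: the module name "dcmqi" is a guess, not confirmed by the PR.
pkg = package_dir_of("dcmqi")
if pkg is not None:
    prepend_bundled_bin(pkg)
```

Prepending rather than appending means the bundled converter wins over any older `itkimage2segimage` that may already be on PATH.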

…i 1.5.5 Add --useLabelIDAsSegmentNumber to itkimage2segimage so the DICOM-SEG segment numbers match the source moosez label values instead of being renumbered 1..N. Also change the bare --skip switch to "--skip 1": in dcmqi 1.5.5 --skip takes an int (default 1 = skip empty slices), so the valueless form from older dcmqi would now fail argument parsing. "1" preserves the prior skip-empty-slices behavior. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
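For reference, an invocation with the two changed switches might be assembled as below. Only `--useLabelIDAsSegmentNumber` and `--skip 1` come from the commit message; the remaining flags are the usual `itkimage2segimage` input/output options as I recall them, and the function name and paths are hypothetical.

```python
def build_seg_command(nifti_path, dicom_dir, metadata_json, output_dcm):
    """Build an itkimage2segimage argument list (a sketch, not the notebook's code)."""
    return [
        "itkimage2segimage",
        "--inputImageList", str(nifti_path),
        "--inputDICOMDirectory", str(dicom_dir),
        "--inputMetadata", str(metadata_json),
        "--outputDICOM", str(output_dcm),
        # Keep moosez label values as the DICOM-SEG segment numbers.
        "--useLabelIDAsSegmentNumber",
        # dcmqi 1.5.5 expects an integer here; 1 keeps skip-empty-slices.
        "--skip", "1",
    ]


cmd = build_seg_command("seg.nii.gz", "dicom/", "meta.json", "seg.dcm")
print(" ".join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` without shell quoting.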

The inference step bundled each label's SNOMED {ID, name} from moosez's own mappings.SNOMED into moose_organ_indices.json. That data still carries the upstream issues (left/right structures collapsed to one code, gluteus_minimus mapped to the medius code) and only a single type code with no category, laterality modifier, anatomic region, or curated color. Switch the source of truth to workflows/MOOSE/resources/moose_snomed_mapping.csv:
- Inference notebook: replace the moosez.mappings.SNOMED lookup with a stdlib-csv loader that reads the curated CSV (wget'd next to the notebook) and bundles a rich record per label -- category, type, laterality modifier, anatomic region (+ modifier), and rgb -- into moose_organ_indices.json. Canonicalizes the vertebra/vertebrae naming difference.
- Post-process notebook: build_dcmqi_config now emits the bundled category, type, laterality modifier, and anatomic region, and uses the CSV color (falling back to the distinct-color palette). Removes the now-dead SNOMED.py download.
- twoVM.wdl: wget the CSV in the inference task.

Builds on the corrected CSV (gluteus_minimus -> 75297007; laterality encoded as a Type Modifier so left/right no longer share one code).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The SNOMED mapping is static, so attaching it to every label in moose_organ_indices.json at inference time (and rewriting that JSON on every run) is needless work on the GPU VM. Move the CSV resolution to the post-process step, which is where the DICOM-SEG metadata is actually built.
- Inference notebook: organ_indices is back to {model: {label_id: name}} -- just the moosez label names. Drops the CSV loader/lookup and the SNOMED coverage print.
- Post-process notebook: loads moose_snomed_mapping.csv once and resolves each label name (canonicalizing vertebra/vertebrae) in build_dcmqi_config, emitting category / type / laterality modifier / anatomic region / curated color, with the distinct-color palette as the rgb fallback. organ_name() tolerates the older bundled {"name","snomed"} shape for back-compat.
- twoVM.wdl: wget the CSV in the post-process task instead of the inference task.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
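To make the emitted metadata concrete, here is a sketch of one segment entry as build_dcmqi_config might produce it at this point in the history (palette fallback still in place). The record shape, function names, label name, and code meanings are illustrative assumptions; the JSON keys are the ones I recall from dcmqi's segmentation metadata schema, and the real CSV columns are not shown in this PR text.

```python
def code(value, meaning, scheme="SCT"):
    """Build one coded entry in the shape dcmqi's metadata JSON uses."""
    return {"CodeValue": value, "CodingSchemeDesignator": scheme, "CodeMeaning": meaning}


def segment_attributes(label_id, name, record, fallback_rgb):
    """Turn one resolved mapping record into a dcmqi segment entry (a sketch)."""
    entry = {
        "labelID": label_id,
        "SegmentLabel": name,
        "SegmentAlgorithmType": "AUTOMATIC",
        "SegmentAlgorithmName": "MOOSE",
        "SegmentedPropertyCategoryCodeSequence": code(*record["category"]),
        "SegmentedPropertyTypeCodeSequence": code(*record["type"]),
        # Curated color if present, otherwise the distinct-color palette.
        "recommendedDisplayRGBValue": list(record.get("rgb") or fallback_rgb),
    }
    if record.get("laterality"):
        entry["SegmentedPropertyTypeModifierCodeSequence"] = code(*record["laterality"])
    if record.get("region"):
        entry["AnatomicRegionSequence"] = code(*record["region"])
    return entry


# Hypothetical record; label name and code meanings are illustrative only.
record = {
    "category": ("123037004", "Anatomical Structure"),
    "type": ("75297007", "Gluteus minimus"),
    "laterality": ("24028007", "Right"),
    "rgb": None,
}
entry = segment_attributes(5, "gluteus_minimus_right", record, (10, 20, 30))
```

Encoding laterality as a type modifier is what lets the left and right structures share a type code while staying distinguishable in the SEG.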

Pass --compress deflate to itkimage2segimage so each generated SEG is written compressed, shrinking the .dcm files and the packaged archive. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Task 2 now receives Task 1's usageMetricsCsv as a new WDL input, merges it with its own metrics (inference_/postprocess_ prefixes resolve the series_download_s column collision between phases), and uploads the combined CSV to a _metrics/ subfolder under dicomSegBucketUri. Drop the JSON metrics files entirely -- the CSV already covers everything needed for analysis, and the full usage_metrics dict is still visible in the output notebook's printed cell output if needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
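The prefixing scheme can be sketched as follows. It assumes one metrics row per series, a hypothetical `series_id` join column, and that only colliding columns get a phase prefix; apart from `series_download_s`, the column names are made up.

```python
import csv
import io


def merge_metrics(inference, postprocess, key="series_id"):
    """Merge two per-series metric rows, prefixing only the colliding columns."""
    collisions = (inference.keys() & postprocess.keys()) - {key}
    merged = {key: inference[key]}
    for prefix, row in (("inference_", inference), ("postprocess_", postprocess)):
        for col, value in row.items():
            if col == key:
                continue
            merged[prefix + col if col in collisions else col] = value
    return merged


def to_csv(rows):
    """Serialize merged rows to CSV text using the first row's column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


inference_row = {"series_id": "s1", "series_download_s": "4.2", "gpu_time_s": "61.0"}
postprocess_row = {"series_id": "s1", "series_download_s": "3.1", "dicom_seg_s": "12.5"}
merged = merge_metrics(inference_row, postprocess_row)
print(to_csv([merged]))
```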

MOOSE: source SNOMED coding from curated CSV (laterality, gluteus_minimus, category/region/colors)

…ization The post-process notebook no longer substitutes a generic placeholder (Body structure + palette color) for labels it can't resolve. build_dcmqi_config now collects every label that lacks a complete SNOMED entry (category + type + color) and raises KeyError listing them, so mapping gaps fail loudly instead of producing meaningless DICOM SEG metadata.

Also removes _canon_label, which rewrote "vertebrae_*" -> "vertebra_*". moosez emits plural vertebra labels (vertebrae_C1, ...) and the curated CSV now keys them plural too, so the lookup is an exact match and the canonicalization (applied to both sides) was redundant. The loader keys by the raw label name and raises on a genuine duplicate-with-conflicting-codes.

Updates the stale section doc (SNOMED is sourced from the curated CSV here, not moosez.mappings.SNOMED) and the "unknown segments use generic codes" warning. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
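A minimal sketch of the strict behavior described above, using only the standard library. The CSV column names (`label`, `category_code`, `type_code`, `rgb`), the function names, and the choice of ValueError for conflicting duplicates are assumptions; only the KeyError on incomplete labels is stated in the commit message.

```python
import csv
import io

REQUIRED = ("category_code", "type_code", "rgb")  # assumed column names


def load_mapping(csv_text):
    """Key rows by the raw label name; identical duplicates are tolerated."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["label"]
        if name in mapping and mapping[name] != row:
            raise ValueError(f"conflicting SNOMED rows for {name!r}")
        mapping[name] = row
    return mapping


def require_complete(labels, mapping):
    """Raise KeyError naming every label without a complete entry."""
    missing = sorted(
        name for name in labels
        if not all(mapping.get(name, {}).get(field) for field in REQUIRED)
    )
    if missing:
        raise KeyError(f"labels without a complete SNOMED entry: {missing}")


# Made-up CSV content; the codes are placeholders, not real SNOMED values.
CSV_TEXT = (
    "label,category_code,type_code,rgb\n"
    "vertebrae_C1,CAT1,TYPE1,\"200,180,160\"\n"
    "liver,CAT1,TYPE2,\n"
)
mapping = load_mapping(CSV_TEXT)
require_complete(["vertebrae_C1"], mapping)  # exact plural match, no error
```

Collecting all gaps before raising means one run reports every label that needs a CSV row, rather than failing on the first.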

… and twoVM.wdl Lets both notebooks pull DICOM from a gs:// URI via s5cmd (HMAC creds from Secret Manager) instead of IDC, sorting downloaded files into series folders by SeriesInstanceUID. Threads the new input_uri/secret_project parameters through twoVM.wdl so the source can be selected from Terra. In the inference notebook, checkpoint restore is moved to run after both download paths so its run_key reflects the series actually processed, since GCS-sourced series_uids aren't known until after download+sort completes. In the post-process notebook, GCS DICOM is bulk-downloaded and sorted once before the per-series loop rather than per series. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
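The sort-by-series step could look roughly like this. The function name is hypothetical, and the UID reader is passed in so the sketch stays independent of any DICOM library; with pydicom one would typically pass `lambda p: str(pydicom.dcmread(p, stop_before_pixels=True).SeriesInstanceUID)`.

```python
import shutil
from pathlib import Path


def sort_into_series(download_dir, out_dir, read_series_uid):
    """Move every downloaded file into out_dir/<SeriesInstanceUID>/.

    out_dir is assumed to live outside download_dir, and file names are
    assumed unique within a series.
    """
    series = {}
    # sorted() materializes the listing before any file is moved.
    for path in sorted(Path(download_dir).rglob("*")):
        if not path.is_file():
            continue
        uid = read_series_uid(path)
        target_dir = Path(out_dir) / uid
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / path.name
        shutil.move(str(path), str(target))
        series.setdefault(uid, []).append(target)
    return series
```

The returned keys are the series_uids that only become known after download and sort, which is why the checkpoint restore has to run after this step.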

Pass Task 1's moose_stats.tar.lz4 into the post-process task and upload its per-series volume/HU-intensity CSVs to a _stats/ prefix under dicomSegBucketUri, mirroring the existing _metrics/ usage-metrics upload.

Neither MOOSE Docker image installs the gcloud SDK, so GCS-input-mode runs failed with FileNotFoundError: 'gcloud'. Switch both notebooks to the google-cloud-secret-manager Python client (matching the existing google-cloud-storage/ADC pattern), pin the package in both Dockerfiles, and add runtime pip-install fallbacks (inference: on ImportError; post-process: via its existing prereq-installer cell, which now also installs s5cmd for standalone/Colab GCS-mode runs). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
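A hedged sketch of what the client-based lookup with an ImportError fallback might look like. The function names and the secret layout are assumptions; the client calls are the standard google-cloud-secret-manager ones, and credentials come from Application Default Credentials.

```python
import importlib
import subprocess
import sys


def secret_version_name(project, secret_id, version="latest"):
    """Build the fully qualified Secret Manager resource name."""
    return f"projects/{project}/secrets/{secret_id}/versions/{version}"


def read_secret(project, secret_id, version="latest"):
    """Fetch a secret payload as text, installing the client if it is missing."""
    try:
        from google.cloud import secretmanager
    except ImportError:
        # Runtime fallback for images or kernels without the pinned package.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "google-cloud-secret-manager"]
        )
        importlib.invalidate_caches()
        from google.cloud import secretmanager
    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project, secret_id, version)}
    )
    return response.payload.data.decode("utf-8")
```

Using the Python client removes the dependency on the `gcloud` binary, which is what neither image provides.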

Add MOOSEz segmentation workflow with Dockerfiles, WDL pipelines, and…

49e14f0


Sunderlandkyl and others added 26 commits June 23, 2026 12:30

Export execution metrics as CSV

d1d3a23

Update CSV column names

f28cb2f

Don't compress json metric files using lz4

f6e4d9e

Fix SNOMED mappings

46ffc7c

Pin dcmqi to v1.5.5 via pip in MOOSE inference Dockerfile

c78d3d9

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

update snomed mapping to latest

40dfcbe

Compress MOOSE DICOM-SEG output with deflate transfer syntax

87f3ed0


Merge branch 'moose_test' into moose_snomed_csv_mapping

0694233

Merge pull request #2 from fedorov/moose_snomed_csv_mapping

b18f0a4

MOOSE: source SNOMED coding from curated CSV (laterality, gluteus_minimus, category/region/colors)

Fix incorrect merge

b0a444f

Re-add csv metrics export

58e30a2

Add notebook to benchmark gcs vs S5cmd

372d091

Fix gcloud auth in download benchmark notebook

e307769

Fix benchmark manifest parsing

3bb4d9e

Upload MOOSE structure stats CSVs to the output bucket

678f92b


Add MOOSEz segmentation workflow with Dockerfiles, WDL pipelines, and Terra notebooks #67
Sunderlandkyl wants to merge 27 commits into ImagingDataCommons:main from Sunderlandkyl:moose_test


2 participants
