Add MOOSEz segmentation workflow with Dockerfiles, WDL pipelines, and Terra notebooks #67
Open
Sunderlandkyl wants to merge 27 commits into
Sunderlandkyl
commented
Jun 22, 2026
- Add inference and post-processing Dockerfiles
- Add Terra WDL workflows (single and split VM variants)
- Add preprocessing, inference, and post-processing notebooks
dataset.json's "labels" field is {organ_name: label_id} (nnU-Net v2 schema,
confirmed against moosez's own Model.__get_organ_indices), but the bundling
code assumed the opposite ({label_id: organ_name}) and filtered on
k.isdigit() -- which is never true for a name key, so organ_indices came out
empty for every model and every segment fell back to "segment_N".
Capture organ_indices straight from each Model object's own .organ_indices
during inference instead of re-parsing dataset.json, so this can't drift
from how moosez itself reads its files again.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
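For illustration only, here is a minimal sketch of the failure mode described in this commit. The function names and the sample `labels` dict are hypothetical, and it assumes plain integer label ids. The actual fix reads `.organ_indices` from each moosez `Model` object rather than re-parsing `dataset.json`.

```python
def broken_organ_indices(labels):
    # Old assumption: keys are numeric label ids ({label_id: organ_name}).
    # With {organ_name: label_id} no key is a digit string, so this is empty.
    return {int(k): v for k, v in labels.items() if k.isdigit()}


def inverted_organ_indices(labels):
    # Reading the schema the right way round: invert
    # {organ_name: label_id} into {label_id: organ_name}.
    return {int(label_id): name for name, label_id in labels.items()}


# Hypothetical "labels" field in the {organ_name: label_id} orientation.
labels = {"background": 0, "liver": 1, "spleen": 2}

print(broken_organ_indices(labels))
print(inverted_organ_indices(labels))
```

With the broken filter every segment would have fallen back to a "segment_N" name, which matches the symptom reported above.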
Enable running moosePostProcessNotebook.ipynb standalone in Colab or a local Jupyter/Python kernel. A presence-gated setup cell installs the missing Python packages, the lz4 CLI, dcmqi 1.5.4 (itkimage2segimage with labelmap support) and SNOMED.py, mirroring the post_process_moose Dockerfile. Inside the prebuilt image every check is a no-op.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ence
Switch the post_process_moose Dockerfile and post-process notebook from the dcmqi release tarball to pip (dcmqi==1.5.5). Remove dcmqi from the inference image, which only produces NIfTI and never runs itkimage2segimage.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…stly
The PyPI `dcmqi` package version is independent of the dcmqi release it bundles; `dcmqi==1.5.5` does not exist, so the install failed. Pin `dcmqi==0.4.1`, which ships dcmqi binaries v1.5.5, in the post-process Dockerfile and notebook.
In the notebook, also locate itkimage2segimage via the installed package's bundled bin/ directory and prepend it to PATH, so the converter is found even when pip's console-script shim lands in a directory off PATH (e.g. ~/.local/bin).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
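The PATH handling described in this commit could look roughly like the sketch below. `prepend_bin_to_path` is a hypothetical helper, not the notebook's code, and how the bundled bin/ directory is discovered (the package's import name and on-disk layout) is left to the caller because it is not stated here.

```python
import os
from pathlib import Path


def prepend_bin_to_path(bin_dir, exe_name="itkimage2segimage", env=None):
    """Prepend bin_dir to PATH if it holds exe_name; return True if PATH changed."""
    env = os.environ if env is None else env
    bin_dir = Path(bin_dir)
    if not (bin_dir / exe_name).is_file():
        return False  # nothing to expose from this directory
    current = env.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if str(bin_dir) in parts:
        return False  # already reachable, leave PATH untouched
    env["PATH"] = os.pathsep.join([str(bin_dir), *parts])
    return True
```

Prepending rather than appending means the bundled converter wins over any older copy elsewhere on PATH; `shutil.which("itkimage2segimage")` can confirm the result afterwards.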
…i 1.5.5
Add --useLabelIDAsSegmentNumber to itkimage2segimage so the DICOM-SEG segment numbers match the source moosez label values instead of being renumbered 1..N.
Also change the bare --skip switch to "--skip 1": in dcmqi 1.5.5 --skip takes an int (default 1 = skip empty slices), so the valueless form from older dcmqi would now fail argument parsing. "1" preserves the prior skip-empty-slices behavior.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
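As a rough sketch of the resulting invocation, the argument list might be assembled as below. `build_seg_command` and its parameter names are hypothetical; the `--useLabelIDAsSegmentNumber` and `--skip 1` arguments are taken from the commit message above rather than verified against the dcmqi 1.5.5 CLI.

```python
def build_seg_command(nifti_path, dicom_dir, metadata_json, output_dcm):
    """Return an argv list for itkimage2segimage (nothing is executed here)."""
    return [
        "itkimage2segimage",
        "--inputImageList", str(nifti_path),
        "--inputDICOMDirectory", str(dicom_dir),
        "--inputMetadata", str(metadata_json),
        "--outputDICOM", str(output_dcm),
        # Keep segment numbers equal to the source label values.
        "--useLabelIDAsSegmentNumber",
        # Per the commit message, --skip takes an int in dcmqi 1.5.5.
        "--skip", "1",
    ]
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems and raises if the converter exits with a non-zero status.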
The inference step bundled each label's SNOMED {ID, name} from moosez's own
mappings.SNOMED into moose_organ_indices.json. That data still carries the
upstream issues (left/right structures collapsed to one code, gluteus_minimus
mapped to the medius code) and only a single type code with no category,
laterality modifier, anatomic region, or curated color.
Switch the source of truth to workflows/MOOSE/resources/moose_snomed_mapping.csv:
- Inference notebook: replace the moosez.mappings.SNOMED lookup with a stdlib-csv
loader that reads the curated CSV (wget'd next to the notebook) and bundles a
rich record per label -- category, type, laterality modifier, anatomic region
(+ modifier), and rgb -- into moose_organ_indices.json. Canonicalizes the
vertebra/vertebrae naming difference.
- Post-process notebook: build_dcmqi_config now emits the bundled category, type,
laterality modifier, and anatomic region, and uses the CSV color (falling back
to the distinct-color palette). Removes the now-dead SNOMED.py download.
- twoVM.wdl: wget the CSV in the inference task.
Builds on the corrected CSV (gluteus_minimus -> 75297007; laterality encoded as
a Type Modifier so left/right no longer share one code).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The SNOMED mapping is static, so attaching it to every label in
moose_organ_indices.json at inference time (and rewriting that JSON on every
run) is needless work on the GPU VM. Move the CSV resolution to the
post-process step, which is where the DICOM-SEG metadata is actually built.
- Inference notebook: organ_indices is back to {model: {label_id: name}} --
just the moosez label names. Drops the CSV loader/lookup and the SNOMED
coverage print.
- Post-process notebook: loads moose_snomed_mapping.csv once and resolves each
label name (canonicalizing vertebra/vertebrae) in build_dcmqi_config, emitting
category / type / laterality modifier / anatomic region / curated color, with
the distinct-color palette as the rgb fallback. organ_name() tolerates the
older bundled {"name","snomed"} shape for back-compat.
- twoVM.wdl: wget the CSV in the post-process task instead of the inference task.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
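The back-compat behaviour of organ_name() mentioned above might be sketched as follows. This body is an assumption based on the two shapes named in the commit message, not the notebook's actual code.

```python
def organ_name(entry):
    """Return the label name from either bundled shape.

    Current shape: the plain moosez label name (a string).
    Older shape:   a dict with "name" and "snomed" keys.
    """
    if isinstance(entry, dict):
        return entry["name"]
    return entry
```

Keeping the tolerance in one small accessor lets archives produced by older inference runs still be post-processed without re-running the GPU step.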
Pass --compress deflate to itkimage2segimage so each generated SEG is written compressed, shrinking the .dcm files and the packaged archive.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Task 2 now receives Task 1's usageMetricsCsv as a new WDL input, merges it with its own metrics (inference_/postprocess_ prefixes resolve the series_download_s column collision between phases), and uploads the combined CSV to a _metrics/ subfolder under dicomSegBucketUri.
Drop the JSON metrics files entirely -- the CSV already covers everything needed for analysis, and the full usage_metrics dict is still visible in the output notebook's printed cell output if needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
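One way the prefixed merge could work is sketched below. It assumes both CSVs share a per-series key column (hypothetically `series_uid`) and that only colliding column names get a phase prefix; neither detail is confirmed by the commit message, and the sample column names other than series_download_s are made up.

```python
def merge_metrics(inference_rows, postprocess_rows, key="series_uid"):
    """Join two lists of row dicts (e.g. from csv.DictReader) on `key`."""
    inf_cols = {c for r in inference_rows for c in r}
    post_cols = {c for r in postprocess_rows for c in r}
    shared = (inf_cols & post_cols) - {key}  # names present in both phases

    post_by_key = {r[key]: r for r in postprocess_rows}
    merged = []
    for inf in inference_rows:
        row = {key: inf[key]}
        for col, val in inf.items():
            if col != key:
                row[f"inference_{col}" if col in shared else col] = val
        # Series that appear only in the post-process rows are not emitted.
        for col, val in post_by_key.get(inf[key], {}).items():
            if col != key:
                row[f"postprocess_{col}" if col in shared else col] = val
        merged.append(row)
    return merged


inference = [{"series_uid": "1.2", "series_download_s": "3.0", "gpu_s": "40"}]
postprocess = [{"series_uid": "1.2", "series_download_s": "1.5", "seg_convert_s": "7"}]
combined = merge_metrics(inference, postprocess)
```

The merged rows can then be written with `csv.DictWriter`, using the union of the row keys as field names.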
MOOSE: source SNOMED coding from curated CSV (laterality, gluteus_minimus, category/region/colors)
…ization
The post-process notebook no longer substitutes a generic placeholder (Body structure + palette color) for labels it can't resolve. build_dcmqi_config now collects every label that lacks a complete SNOMED entry (category + type + color) and raises KeyError listing them, so mapping gaps fail loudly instead of producing meaningless DICOM SEG metadata.
Also removes _canon_label, which rewrote "vertebrae_*" -> "vertebra_*". moosez emits plural vertebra labels (vertebrae_C1, ...) and the curated CSV now keys them plural too, so the lookup is an exact match and the canonicalization (applied to both sides) was redundant. The loader keys by the raw label name and raises on a genuine duplicate-with-conflicting-codes.
Updates the stale section doc (SNOMED is sourced from the curated CSV here, not moosez.mappings.SNOMED) and the "unknown segments use generic codes" warning.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
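The fail-loudly behaviour could be sketched like this. The column names (`label`, `category_code`, `type_code`, `rgb`) and both function names are hypothetical, since the actual header of moose_snomed_mapping.csv is not shown here; the choice of ValueError for conflicting duplicates is also an assumption.

```python
import csv

# Hypothetical columns that must be non-empty for a "complete" entry.
REQUIRED = ("category_code", "type_code", "rgb")


def load_mapping(fileobj):
    """Key rows by raw label name; reject duplicates that disagree."""
    mapping = {}
    for row in csv.DictReader(fileobj):
        name = row["label"]
        if name in mapping and mapping[name] != row:
            raise ValueError(f"conflicting duplicate rows for label {name!r}")
        mapping[name] = row
    return mapping


def resolve_labels(label_names, mapping):
    """Return the mapping rows for label_names, or raise listing every gap."""
    missing = sorted(
        name for name in label_names
        if not all(mapping.get(name, {}).get(col) for col in REQUIRED)
    )
    if missing:
        raise KeyError(f"labels without a complete SNOMED entry: {missing}")
    return {name: mapping[name] for name in label_names}
```

Collecting all gaps before raising means a single run reports every label that needs a CSV row, rather than failing on the first one.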
… and twoVM.wdl
Lets both notebooks pull DICOM from a gs:// URI via s5cmd (HMAC creds from Secret Manager) instead of IDC, sorting downloaded files into series folders by SeriesInstanceUID. Threads the new input_uri/secret_project parameters through twoVM.wdl so the source can be selected from Terra.
In the inference notebook, checkpoint restore is moved to run after both download paths so its run_key reflects the series actually processed, since GCS-sourced series_uids aren't known until after download+sort completes. In the post-process notebook, GCS DICOM is bulk-downloaded and sorted once before the per-series loop rather than per series.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
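The sort-by-series step might look something like the sketch below. `sort_into_series` is a hypothetical helper, and the UID reader is injected so the sketch stays independent of any DICOM library.

```python
import shutil
from pathlib import Path


def sort_into_series(files, dest_root, get_series_uid):
    """Move each file into dest_root/<SeriesInstanceUID>/; return {uid: [paths]}."""
    dest_root = Path(dest_root)
    moved = {}
    for src in map(Path, files):
        uid = get_series_uid(src)
        series_dir = dest_root / uid
        series_dir.mkdir(parents=True, exist_ok=True)
        # Note: two source files with the same name in one series would collide.
        target = series_dir / src.name
        shutil.move(str(src), str(target))
        moved.setdefault(uid, []).append(target)
    return moved
```

With pydicom installed, a reader such as `lambda p: pydicom.dcmread(p, stop_before_pixels=True).SeriesInstanceUID` could be passed as `get_series_uid`. The keys of the returned dict are the series UIDs that only become known after download and sort, which is why the checkpoint restore has to run afterwards.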
Pass Task 1's moose_stats.tar.lz4 into the post-process task and upload its per-series volume/HU-intensity CSVs to a _stats/ prefix under dicomSegBucketUri, mirroring the existing _metrics/ usage-metrics upload.
Neither MOOSE Docker image installs the gcloud SDK, so GCS-input-mode runs failed with FileNotFoundError: 'gcloud'.
Switch both notebooks to the google-cloud-secret-manager Python client (matching the existing google-cloud-storage/ADC pattern), pin the package in both Dockerfiles, and add runtime pip-install fallbacks (inference: on ImportError; post-process: via its existing prereq-installer cell, which now also installs s5cmd for standalone/Colab GCS-mode runs).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
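A minimal sketch of the client-based secret read with an ImportError fallback follows. The helper names are hypothetical, the fallback installs the package unpinned purely for brevity, and Application Default Credentials are assumed to be available at call time.

```python
import importlib
import subprocess
import sys


def secret_version_name(project, secret_id, version="latest"):
    """Build the Secret Manager resource name for a secret version."""
    return f"projects/{project}/secrets/{secret_id}/versions/{version}"


def read_secret(project, secret_id, version="latest"):
    """Fetch a secret payload as text via the Python client (uses ADC)."""
    try:
        from google.cloud import secretmanager
    except ImportError:
        # Runtime fallback for images that lack the package.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "google-cloud-secret-manager"]
        )
        importlib.invalidate_caches()
        from google.cloud import secretmanager
    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project, secret_id, version)}
    )
    return response.payload.data.decode("utf-8")
```

Importing inside the function keeps the module loadable in environments where the package is absent, so the fallback only triggers when a secret is actually requested.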