Skip to content

Resolve milestone 2.1.0 issues#628

Open
jscgh wants to merge 17 commits into
devfrom
milestone-2.1.0-issues
Open

Resolve milestone 2.1.0 issues#628
jscgh wants to merge 17 commits into
devfrom
milestone-2.1.0-issues

Conversation

@jscgh

@jscgh jscgh commented May 28, 2026

Copy link
Copy Markdown
Contributor

Resolves #619 #209 #456 #489 #576

PR checklist

  • This comment contains a description of changes (with reason).
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@jscgh jscgh added this to the 2.1.0 milestone May 28, 2026
@jscgh jscgh self-assigned this May 28, 2026
@github-actions

github-actions Bot commented May 28, 2026

Copy link
Copy Markdown

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 02b886b

+| ✅ 350 tests passed       |+
#| ❔   4 tests were ignored |#
!| ❗  31 tests had warnings |!
Details

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • files_exist - File not found: conf/igenomes_ignored.config
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • schema_description - No description provided in schema for parameter: rosettafold2na_uniref30_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_bfd_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_pdb100_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_weights_link
  • schema_description - No description provided in schema for parameter: rfam_full_region_link
  • schema_description - No description provided in schema for parameter: rfam_cm_link
  • schema_description - No description provided in schema for parameter: rnacentral_rfam_annotations_link
  • schema_description - No description provided in schema for parameter: rnacentral_id_mapping_link
  • schema_description - No description provided in schema for parameter: rnacentral_sequences_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_uniref30_path
  • schema_description - No description provided in schema for parameter: rosettafold2na_bfd_path
  • schema_description - No description provided in schema for parameter: rosettafold2na_pdb100_path
  • schema_description - No description provided in schema for parameter: rosettafold2na_weights_path
  • local_component_structure - aria2_uncompress.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - post_processing.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_alphafold2_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_esmfold_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_rosettafold_all_atom_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_alphafold3_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_rosettafold2na_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_boltz_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_colabfold_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_helixfold3_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 4.0.2
  • Run at 2026-06-15 06:33:25

@jscgh

jscgh commented May 28, 2026

Copy link
Copy Markdown
Contributor Author

Seems the GitHub runners fail to pull the containers for #365. I'll strip out that change tomorrow.

Edit: I will increase the sharding, same as other nf-core pipelines.

@jscgh jscgh force-pushed the milestone-2.1.0-issues branch from c531231 to c428d14 Compare May 29, 2026 01:34
@jscgh jscgh force-pushed the milestone-2.1.0-issues branch from c428d14 to 1228ff1 Compare May 29, 2026 02:16
@jscgh

jscgh commented May 29, 2026

Copy link
Copy Markdown
Contributor Author

I've split off the CI/container changes and opened a separate PR for them (#630).

@jscgh jscgh requested a review from tlitfin May 29, 2026 02:49
@jscgh jscgh requested review from JoseEspinosa and removed request for JoseEspinosa May 29, 2026 05:59
@keiran-rowell-unsw keiran-rowell-unsw self-requested a review June 9, 2026 04:57
Comment thread bin/extract_metrics.py Outdated

@keiran-rowell-unsw keiran-rowell-unsw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only strongly desired change is import warning & throwing a warning if infer_model_rank() returns a None. I think that line ~160 is the single most valuable place, but might be worth a couple more?

Also ,one suggested naming change.

Stub runs pass (minus version bumps) and end-to-end scientific .tsv validation seems okay.
I defer to Tom's sign off on the nitty-gritty ranking correctness of ColabFold, Boltz etc.

Comment thread bin/extract_metrics.py
Comment on lines +134 to +139
try:
return (0, int(label))
except (TypeError, ValueError):
return (1, str(label))

def infer_model_rank(file_path):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking for a colabfold quick fix for v2.1, but it would be good for rank sorting to be in utils.py with --prog passed to it. Then imported like the other previous helper functions.

That would allow easy extension of mode-specific ranking patterns.
Would also allow one central function to maintain that can be called anywhere else rank-based ordering is needed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A second benefit is that, if --prog is passed through in a future function, it makes the intended pattern for that mode explicit and avoids ambiguous matches and potential regex collisions when new modes might do different filenaming

Comment thread bin/extract_metrics.py
break
return result

sorted_entries = sorted(chain_pair_entries.items(), key=lambda item: sort_model_label(item[0]))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this sorted() lambda approach and clear index access to the first sorted entry here 👍

Comment thread bin/extract_metrics.py
Comment thread bin/extract_metrics.py Outdated
Comment thread bin/extract_metrics.py Outdated
Comment on lines 655 to 660
@@ -632,14 +658,13 @@ def read_colabfold_metrics(name, colabfold_metrics_fns, struct_files=None):
ipsae_rows = []
chainwise_iptm = {}
chainwise_ipsae = {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future (v3), not for this minor release: I like the move toward program-specific extraction logic rather than the prior file-extension-driven initial implementation.

For future refactoring development, I think a lot of the metric row list initialisation and write_tsv() boilerplate can be avoided being repeated by building this into a shared read_metrics(...) function that has program details passed through the function signature.

With something like:
read_metrics(program=args.prog, struct_files=args.structs, metrics_files=...)

Then all program specific compressed data deserialisation can be branched on --prog=${meta.mode}, and all the great ranking and structure mapping helpers can be re-used without altering lines in different functions.

@keiran-rowell-unsw keiran-rowell-unsw self-requested a review June 11, 2026 07:12

@keiran-rowell-unsw keiran-rowell-unsw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more changes request, will let Tom review when I'm on leave

@keiran-rowell-unsw keiran-rowell-unsw dismissed their stale review June 11, 2026 07:14

Changes made - clearing my block requests while going on leave

@tlitfin tlitfin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may need to split out the colabfold templates feature from this PR.

Using the custom-template-path flag will lead to the colabfold_batch process modifying the global mmcif directory which is not ideal. It will also lead to searching a highly redundant pdb database.

I think the correct solution is to add the templates flag to the prepare colabfold dbs subworkflow and then add the required flag to colabfold_search and then also add the pdb_hit_file and local_pdb_path to the call to colabfold_batch. The helptext suggests using the pdb nested structure but looks like it will fall back to the flat PDB structure as well.

@jscgh

jscgh commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

I think we may need to split out the colabfold templates feature from this PR.

Using the custom-template-path flag will lead to the colabfold_batch process modifying the global mmcif directory which is not ideal. It will also lead to searching a highly redundant pdb database.

I think the correct solution is to add the templates flag to the prepare colabfold dbs subworkflow and then add the required flag to colabfold_search and then also add the pdb_hit_file and local_pdb_path to the call to colabfold_batch. The helptext suggests using the pdb nested structure but looks like it will fall back to the flat PDB structure as well.

For now I've set --colabfold_use_templates to only work when --use_msa_server=true. We can overhaul the prepare_colabfold_dbs step to generate local templates in another PR.

@jscgh jscgh requested a review from tlitfin June 15, 2026 06:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants