Resolve milestone 2.1.0 issues#628
Conversation
|
|
Seems the GitHub runners fail to pull the containers for #365. I'll strip out that change tomorrow. Edit: I will increase the sharding, same as other nf-core pipelines. |
c531231 to
c428d14
Compare
c428d14 to
1228ff1
Compare
|
I've split off the CI/container changes and opened a separate PR for them (#630). |
keiran-rowell-unsw
left a comment
There was a problem hiding this comment.
Only strongly desired change is import warning & throwing a warning if infer_model_rank() returns a None. I think that line ~160 is the single most valuable place, but might be worth a couple more?
Also ,one suggested naming change.
Stub runs pass (minus version bumps) and end-to-end scientific .tsv validation seems okay.
I defer to Tom's sign off on the nitty-gritty ranking correctness of ColabFold, Boltz etc.
| try: | ||
| return (0, int(label)) | ||
| except (TypeError, ValueError): | ||
| return (1, str(label)) | ||
|
|
||
| def infer_model_rank(file_path): |
There was a problem hiding this comment.
Non-blocking for a colabfold quick fix for v2.1, but it would be good for rank sorting to be in utils.py with --prog passed to it. Then imported like the other previous helper functions.
That would allow easy extension of mode-specific ranking patterns.
Would also allow one central function to maintain that can be called anywhere else rank-based ordering is needed.
There was a problem hiding this comment.
A second benefit is that, if --prog is passed through in a future function, it makes the intended pattern for that mode explicit and avoids ambiguous matches and potential regex collisions when new modes might do different filenaming
| break | ||
| return result | ||
|
|
||
| sorted_entries = sorted(chain_pair_entries.items(), key=lambda item: sort_model_label(item[0])) |
There was a problem hiding this comment.
I like this sorted() lambda approach and clear index access to the first sorted entry here 👍
| @@ -632,14 +658,13 @@ def read_colabfold_metrics(name, colabfold_metrics_fns, struct_files=None): | |||
| ipsae_rows = [] | |||
| chainwise_iptm = {} | |||
| chainwise_ipsae = {} | |||
There was a problem hiding this comment.
Future (v3), not for this minor release: I like the move toward program-specific extraction logic rather than the prior file-extension-driven initial implementation.
For future refactoring development, I think a lot of the metric row list initialisation and write_tsv() boilerplate can be avoided being repeated by building this into a shared read_metrics(...) function that has program details passed through the function signature.
With something like:
read_metrics(program=args.prog, struct_files=args.structs, metrics_files=...)
Then all program specific compressed data deserialisation can be branched on --prog=${meta.mode}, and all the great ranking and structure mapping helpers can be re-used without altering lines in different functions.
keiran-rowell-unsw
left a comment
There was a problem hiding this comment.
No more changes request, will let Tom review when I'm on leave
Changes made - clearing my block requests while going on leave
tlitfin
left a comment
There was a problem hiding this comment.
I think we may need to split out the colabfold templates feature from this PR.
Using the custom-template-path flag will lead to the colabfold_batch process modifying the global mmcif directory which is not ideal. It will also lead to searching a highly redundant pdb database.
I think the correct solution is to add the templates flag to the prepare colabfold dbs subworkflow and then add the required flag to colabfold_search and then also add the pdb_hit_file and local_pdb_path to the call to colabfold_batch. The helptext suggests using the pdb nested structure but looks like it will fall back to the flat PDB structure as well.
For now I've set |
Resolves #619 #209 #456 #489 #576
PR checklist
nf-core pipelines lint).nextflow run . -profile test,docker --outdir <OUTDIR>).nextflow run . -profile debug,test,docker --outdir <OUTDIR>).docs/output.mdis updated.CHANGELOG.mdis updated.README.mdis updated (including new tool citations and authors/contributors).