Move complex-level scores from per-interface to end-of-report (with PAE) #14
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0d3e01a841
    ranked.sort(key=lambda t: t[3], reverse=True)

    top_rows = [(label, score, r) for label, _, _, score, r in ranked[:top_n]]
    ranked_per_page = ranked if max_complexes is None else ranked[:max_complexes]
Apply max_complexes to complexes, not raw interfaces
When --max-complexes is used on an aggregate report that includes multimers, ranked contains one entry per interface row, so this slice limits the PDF to the first N interfaces rather than the first N complexes. A single complex with many high-scoring chain pairs can consume the entire cap and exclude other complexes from the interface pages, which contradicts the max_complexes option/help and makes capped cohort reports misleading.
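A toy reproduction of the problem described in the review comment may help. The tuple layout `(label, complex_id, interface, score, row)` is an assumption made for illustration; the diff only shows that entries have five fields with the score at index 3.

```python
# Hypothetical rows: (label, complex_id, interface, score, row).
# Only the five-field shape and "score at index 3" come from the diff.
ranked = [
    ("A:B", "cx1", "A_B", 0.91, {}),
    ("X:Y", "cx2", "X_Y", 0.80, {}),
    ("A:C", "cx1", "A_C", 0.88, {}),
]
ranked.sort(key=lambda t: t[3], reverse=True)

max_complexes = 2
sliced = ranked[:max_complexes]

# The slice keeps two interface rows, yet both belong to the same complex,
# so "cx2" never reaches the interface pages.
complexes_kept = {t[1] for t in sliced}
print(len(sliced), sorted(complexes_kept))
```

With these made-up scores the cap of 2 is spent entirely on the two chain pairs of `cx1`.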
The per-interface slider panel previously repeated two scalars that are
properties of the predicted complex (confidence_score and
pDockQ/mpDockQ), even though they have the same value on every chain
pair of a given model. That was visually misleading and duplicated the
information across every page.
Reorganisation:
- _AF_DERIVED_FEATURES drops confidence_score and pDockQ/mpDockQ.
iptm stays because AF3 provides per-pair chain_pair_iptm so it is
meaningfully per-interface.
- New _COMPLEX_LEVEL_FEATURES = (confidence_score, pDockQ/mpDockQ).
- New _complex_evidence_page renders the complex-level sliders on
top and embeds the PAE heatmap below. One such page is appended to
every per-run report (it replaces the old PAE-only page), so the
per-run report always carries PAE even when the matrix-derived PNG
is missing.
- generate_aggregate_report appends a "Per-complex evidence" section
after all per-interface slider pages, with one combined page per
unique complex (limited to top_n=10 to keep cohort PDFs bounded).
Each evidence page locates its PAE PNG via the new source_dir
column on every summary row.
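The feature split described in this list could look roughly like the sketch below. The two tuple names come from the commit message; their exact contents, the string keys, and the `split_features` helper are assumptions for illustration and not the code in report.py.

```python
# Tuple names are taken from the commit message; contents are assumed.
_COMPLEX_LEVEL_FEATURES = ("confidence_score", "pDockQ/mpDockQ")
_AF_DERIVED_FEATURES = ("iptm",)  # assumed subset; the real tuple may hold more


def split_features(row):
    """Return (per_interface, complex_level) dicts for one summary row.

    Hypothetical helper: per-interface values go on the slider pages,
    complex-level values go on the single evidence page per complex.
    """
    per_interface = {k: row[k] for k in _AF_DERIVED_FEATURES if k in row}
    complex_level = {k: row[k] for k in _COMPLEX_LEVEL_FEATURES if k in row}
    return per_interface, complex_level


example_row = {
    "iptm": 0.82,
    "confidence_score": 0.90,
    "pDockQ/mpDockQ": 0.55,
    "interface": "A_B",
}
per_interface, complex_level = split_features(example_row)
print(per_interface)
print(complex_level)
```

Keeping the two groups in separate constants means a scalar that is identical on every chain pair is rendered once per complex instead of once per interface page.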
Runner change:
- process_many stamps every aggregated row with an absolute source_dir
so the aggregate report can resolve per-complex side files (e.g. the
PAE PNG) without a separate --predictions-root flag.
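As a minimal sketch of the stamping idea, assuming summary rows are plain dicts: `stamp_source_dir` and `find_pae_png` are hypothetical helpers, and the default PNG filename is invented for the example rather than taken from the project.

```python
from pathlib import Path


def stamp_source_dir(rows, run_dir):
    """Return copies of the rows with an absolute source_dir column added."""
    absolute = str(Path(run_dir).resolve())
    return [{**row, "source_dir": absolute} for row in rows]


def find_pae_png(row, filename="pae.png"):
    """Look for a side file next to the prediction; filename is hypothetical.

    Returns the path if the file exists, otherwise None so the caller can
    decide how to render a page without the heatmap.
    """
    candidate = Path(row["source_dir"]) / filename
    return candidate if candidate.is_file() else None


rows = stamp_source_dir([{"interface": "A_B"}], ".")
print(rows[0]["source_dir"])
print(find_pae_png(rows[0], "no_such_file.png"))
```

Storing an absolute path on each row lets the aggregate step resolve side files regardless of the working directory it is launched from.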
Tests updated to match the new page counts.
In the per-interface aggregate, ``ranked`` holds one entry per chain-pair interface, so a single multimer can fill the entire ``ranked[:max_complexes]`` slice and silently exclude every other complex. That contradicts the option name and help string. Instead, walk the rows in metascore-sorted order and keep every interface row whose complex is among the first ``max_complexes`` distinct complexes encountered; this preserves the per-complex semantics. The per-complex evidence section now also respects the same cap (``min(top_n, max_complexes)``), so a user-supplied cap shrinks both sections consistently.
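The capping rule described above could be sketched as follows. `cap_by_complex`, `evidence_cap`, and the three-field row layout are hypothetical stand-ins; the actual implementation in report.py may be structured differently.

```python
def cap_by_complex(ranked, complex_of, max_complexes):
    """Keep every row whose complex is among the first N distinct complexes.

    `ranked` must already be sorted by metascore (best first);
    `complex_of` maps a row to its complex identifier.
    """
    if max_complexes is None:
        return list(ranked)
    seen = set()
    kept = []
    for row in ranked:
        cid = complex_of(row)
        if cid not in seen:
            if len(seen) >= max_complexes:
                continue  # a new complex beyond the cap: drop its rows
            seen.add(cid)
        kept.append(row)
    return kept


def evidence_cap(top_n, max_complexes):
    """Number of per-complex evidence pages: min(top_n, max_complexes)."""
    return top_n if max_complexes is None else min(top_n, max_complexes)


# Hypothetical rows: (label, complex_id, score), sorted by score descending.
ranked = [
    ("A:B", "cx1", 0.91),
    ("A:C", "cx1", 0.88),
    ("X:Y", "cx2", 0.80),
    ("A:D", "cx1", 0.70),
    ("P:Q", "cx3", 0.60),
]
kept = cap_by_complex(ranked, lambda r: r[1], max_complexes=2)
print([r[0] for r in kept])
```

Unlike a plain slice, a lower-ranked interface of an already admitted complex (here `A:D`) is still kept, while every row of the third complex is dropped.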
806983b to 86511f5
The README still described --aggregate_report as "one slider page per interface" plus a cover, which has been incomplete since the layout change in the previous commit (09f1053): the aggregate report now also appends a Per-complex evidence section with one combined slider+PAE page per top-N complex, and the per-run report's last page now combines the complex-level confidence sliders with the PAE heatmap. Updates the bullet for --aggregate_report and the two output bullets that describe report.pdf / aggregate PDF contents.
Summary
Two focused tweaks to the AlphaJudge validation report layout, plus a README touch-up so the docs match.
1. Move complex-level features off the per-interface slider pages
confidence_score and pDockQ/mpDockQ are scalars per predicted complex, not per chain pair. They had the same value on every interface page of a given model, which was visually misleading and duplicated information.
- _AF_DERIVED_FEATURES drops confidence_score and pDockQ/mpDockQ. iptm stays because AF3 reports a per-pair chain_pair_iptm.
- _COMPLEX_LEVEL_FEATURES = (confidence_score, pDockQ/mpDockQ).
- _complex_evidence_page renders the two complex-level sliders on top and the AlphaFold-DB-style green PAE heatmap below.
- runner.process_many stamps every aggregated row with an absolute source_dir so the aggregate report can locate each complex's PAE PNG without an extra flag.
2. Fix --max-complexes semantics in the aggregate report
With per-interface ranking, ranked[:max_complexes] was silently slicing interface rows rather than complexes: a single multimer could fill the cap and exclude every other complex. The cap now walks the rows in metascore-sorted order and keeps every interface row whose complex is among the first max_complexes distinct complexes encountered. The per-complex evidence section honours the same cap (min(top_n, max_complexes)).
3. README touch-up (docs drift)
The README still described --aggregate_report as "one slider page per interface plus a cover" and report.pdf without mentioning the Complex-level confidence & PAE final page. Updated the --aggregate_report bullet and the two output bullets so they describe the actual page sequence.
Files changed
- src/alphajudge/report.py: new _complex_evidence_page, updated _metric_rows_for_slider_panel, per-run + aggregate generators, max_complexes semantics.
- src/alphajudge/runner.py: stamp source_dir on aggregated rows.
- test/test_report.py: updated page-count expectations.
- README.md: describe Per-complex evidence section.
No new dependencies, no metascore math change, no CLI changes.
Test plan
- pytest test/test_meta_score.py test/test_report.py -q: 9 passed.
- 8hhy): 18 pages: cover, per-interface table, 15 slider pages, complex-evidence + PAE.
- --max-complexes 1 on the mixed cohort: 17 pages = cover + all 15 interfaces of the top complex + 1 evidence page (rather than the old behaviour of capping at 1 interface row total).
🤖 Generated with Claude Code