Move complex-level scores from per-interface to end-of-report (with PAE) #14

Merged: DimaMolod merged 3 commits into main from report_percentiles on Jun 1, 2026

@DimaMolod DimaMolod commented Jun 1, 2026

Summary

Two focused tweaks to the AlphaJudge validation report layout, plus a README touch-up so the docs match.

1. Move complex-level features off the per-interface slider pages

confidence_score and pDockQ/mpDockQ are scalars per predicted complex, not per chain pair — they had the same value on every interface page of a given model, which was visually misleading and duplicated information.

  • _AF_DERIVED_FEATURES drops confidence_score and pDockQ/mpDockQ. iptm stays because AF3 reports a per-pair chain_pair_iptm.
  • New _COMPLEX_LEVEL_FEATURES = (confidence_score, pDockQ/mpDockQ).
  • New _complex_evidence_page renders the two complex-level sliders on top and the AlphaFold-DB-style green PAE heatmap below.
    • Per-run report: one such page is always appended at the end (replaces the old PAE-only page).
    • Aggregate report: appends a "Per-complex evidence" section after the per-interface slider pages, with one combined page per top-N complex.
  • runner.process_many stamps every aggregated row with an absolute source_dir so the aggregate report can locate each complex's PAE PNG without an extra flag.
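A minimal sketch of the row layout described above, assuming each aggregated row is a plain dict with a hypothetical `complex` key that names the prediction. The helpers `stamp_source_dir` and `complex_level_records` are illustrative names, not functions from `runner.py` or `report.py`, and the score values are made up. It shows the absolute `source_dir` stamp and how the two complex-level scalars, which repeat on every interface row, collapse to one record per complex.

```python
import os

# Feature names follow the PR description (mirrors _COMPLEX_LEVEL_FEATURES).
COMPLEX_LEVEL_FEATURES = ("confidence_score", "pDockQ/mpDockQ")


def stamp_source_dir(rows, run_dir):
    """Return copies of the rows with an absolute source_dir added (hypothetical helper)."""
    absolute = os.path.abspath(run_dir)
    return [{**row, "source_dir": absolute} for row in rows]


def complex_level_records(rows):
    """Collapse per-interface rows to one record per complex; the first row seen wins."""
    records = {}
    for row in rows:
        name = row["complex"]  # hypothetical key, not confirmed by the source
        if name not in records:
            record = {feature: row.get(feature) for feature in COMPLEX_LEVEL_FEATURES}
            record["source_dir"] = row.get("source_dir")
            records[name] = record
    return records


# Two interfaces of the same complex: iptm differs per pair, the other two scores do not.
rows = [
    {"complex": "8hhy", "interface": "A-B", "iptm": 0.81,
     "confidence_score": 0.74, "pDockQ/mpDockQ": 0.52},
    {"complex": "8hhy", "interface": "A-C", "iptm": 0.40,
     "confidence_score": 0.74, "pDockQ/mpDockQ": 0.52},
]
stamped = stamp_source_dir(rows, "predictions/8hhy")
per_complex = complex_level_records(stamped)
```

With one record per complex, an evidence page can read its two sliders and resolve side files relative to `source_dir` without repeating anything on the interface pages.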

2. Fix --max-complexes semantics in the aggregate report

With per-interface ranking, ranked[:max_complexes] silently sliced interface rows rather than complexes, so a single multimer could fill the cap and exclude every other complex. The cap now walks the rows in metascore-sorted order and keeps every interface row whose complex is among the first max_complexes distinct complexes encountered. The per-complex evidence section honours the same cap (min(top_n, max_complexes)).
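The difference between the old slice and the per-complex cap can be sketched as follows. This is an illustration under assumptions, not the code in `report.py`: rows are dicts with hypothetical `complex` and `metascore` keys, and `cap_by_complex` is an invented helper name.

```python
def cap_by_complex(ranked, max_complexes=None):
    """Keep every row whose complex is among the first max_complexes
    distinct complexes met while walking ranked in order."""
    if max_complexes is None:
        return list(ranked)
    kept_complexes = set()
    kept_rows = []
    for row in ranked:
        name = row["complex"]
        if name not in kept_complexes:
            if len(kept_complexes) >= max_complexes:
                continue  # cap reached: skip rows of any further complex
            kept_complexes.add(name)
        kept_rows.append(row)
    return kept_rows


# Made-up cohort: one multimer "M" with three interfaces, plus two dimers.
rows = [
    {"complex": "M", "metascore": 0.9},
    {"complex": "D1", "metascore": 0.7},
    {"complex": "M", "metascore": 0.8},
    {"complex": "D2", "metascore": 0.5},
    {"complex": "M", "metascore": 0.6},
]
ranked = sorted(rows, key=lambda r: r["metascore"], reverse=True)

old_slice = [r["complex"] for r in ranked[:2]]               # interface-level cap
new_cap = [r["complex"] for r in cap_by_complex(ranked, 2)]  # complex-level cap
```

Here the old slice returns two rows that both belong to `M`, while the per-complex cap keeps all three `M` rows plus the `D1` row and drops only `D2`.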

3. README touch-up (docs drift)

The README still described --aggregate_report as "one slider page per interface plus a cover" and report.pdf without mentioning the Complex-level confidence & PAE final page. Updated the --aggregate_report bullet and the two output bullets so they describe the actual page sequence.

Files changed

  • src/alphajudge/report.py — new _complex_evidence_page, updated _metric_rows_for_slider_panel, per-run + aggregate generators, max_complexes semantics.
  • src/alphajudge/runner.py — stamp source_dir on aggregated rows.
  • test/test_report.py — updated page-count expectations.
  • README.md — describe Per-complex evidence section.

No new dependencies, no metascore math change, no CLI changes.

Test plan

  • pytest test/test_meta_score.py test/test_report.py -q — 9 passed.
  • Per-run report on AF3 9-chain multimer (8hhy): 18 pages — cover, per-interface table, 15 slider pages, complex-evidence + PAE.
  • Aggregate report on mixed cohort (1 multimer + 2 random dimers): 21 pages — cover, 17 interface pages, 3 per-complex evidence pages with PAE.
  • --max-complexes 1 on the mixed cohort: 17 pages = cover + all 15 interfaces of the top complex + 1 evidence page (rather than the old behaviour of capping at 1 interface row total).
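The page counts in this test plan follow from simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes the complexes are listed in ranked order with the multimer first (as in the mixed cohort) and that `top_n` defaults to 10, as stated in the commit message.

```python
def per_run_pages(n_interfaces):
    # cover + per-interface table + one slider page per interface
    # + one complex-evidence page with PAE
    return 1 + 1 + n_interfaces + 1


def aggregate_pages(interfaces_per_complex, top_n=10, max_complexes=None):
    """interfaces_per_complex lists interface counts in ranked order, best complex first."""
    if max_complexes is None:
        kept = list(interfaces_per_complex)
    else:
        kept = list(interfaces_per_complex[:max_complexes])
    # One evidence page per kept complex, bounded by top_n.
    evidence_pages = min(top_n, len(kept))
    # cover + all interface pages of the kept complexes + evidence pages
    return 1 + sum(kept) + evidence_pages


per_run = per_run_pages(15)                              # 8hhy: 15 interfaces
mixed = aggregate_pages([15, 1, 1])                      # 1 multimer + 2 dimers
capped = aggregate_pages([15, 1, 1], max_complexes=1)    # --max-complexes 1
```

These evaluate to 18, 21, and 17 pages respectively, matching the three report runs listed above.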

🤖 Generated with Claude Code

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d3e01a841

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread on src/alphajudge/report.py (outdated):

ranked.sort(key=lambda t: t[3], reverse=True)

top_rows = [(label, score, r) for label, _, _, score, r in ranked[:top_n]]
ranked_per_page = ranked if max_complexes is None else ranked[:max_complexes]

[P2] Apply max_complexes to complexes, not raw interfaces

When --max-complexes is used on an aggregate report that includes multimers, ranked contains one entry per interface row, so this slice limits the PDF to the first N interfaces rather than the first N complexes. A single complex with many high-scoring chain pairs can consume the entire cap and exclude other complexes from the interface pages, which contradicts the max_complexes option/help and makes capped cohort reports misleading.


DimaMolod added 2 commits June 1, 2026 15:59
The per-interface slider panel previously repeated two scalars that are
properties of the predicted complex (confidence_score and
pDockQ/mpDockQ), even though they have the same value on every chain
pair of a given model. That was visually misleading and duplicated the
information across every page.

Reorganisation:
  - _AF_DERIVED_FEATURES drops confidence_score and pDockQ/mpDockQ.
    iptm stays because AF3 provides per-pair chain_pair_iptm so it is
    meaningfully per-interface.
  - New _COMPLEX_LEVEL_FEATURES = (confidence_score, pDockQ/mpDockQ).
  - New _complex_evidence_page renders the complex-level sliders on
    top and embeds the PAE heatmap below. One such page is appended to
    every per-run report (it replaces the old PAE-only page), so the
    per-run report always carries PAE even when the matrix-derived PNG
    is missing.
  - generate_aggregate_report appends a "Per-complex evidence" section
    after all per-interface slider pages, with one combined page per
    unique complex (limited to top_n=10 to keep cohort PDFs bounded).
    Each evidence page locates its PAE PNG via the new source_dir
    column on every summary row.

Runner change:
  - process_many stamps every aggregated row with an absolute source_dir
    so the aggregate report can resolve per-complex side files (e.g. the
    PAE PNG) without a separate --predictions-root flag.

Tests updated to match the new page counts.

In the per-interface aggregate, ``ranked`` holds one entry per
chain-pair interface, so a single multimer can fill the entire
``ranked[:max_complexes]`` slice and silently exclude every other
complex. That contradicts the option name and help string.

Walk the rows in metascore-sorted order instead and keep every
interface row whose complex is among the first ``max_complexes``
distinct complexes encountered; this preserves the per-complex
semantics. The per-complex evidence section now also respects the same
cap (``min(top_n, max_complexes)``) so user-supplied caps shrink both
sections consistently.

@DimaMolod DimaMolod force-pushed the report_percentiles branch from 806983b to 86511f5 Compare June 1, 2026 13:59
@DimaMolod DimaMolod changed the title AlphaJudge percentile-style validation reports Move complex-level scores from per-interface to end-of-report (with PAE) Jun 1, 2026

The README still described --aggregate_report as "one slider page per
interface" plus a cover, which has been incomplete since the layout
change in the previous commit (09f1053): the aggregate report now also
appends a Per-complex evidence section with one combined slider+PAE
page per top-N complex, and the per-run report's last page now combines
the complex-level confidence sliders with the PAE heatmap.

Updates the bullet for --aggregate_report and the two output bullets
that describe report.pdf / aggregate PDF contents.
@DimaMolod DimaMolod merged commit 99fb43a into main Jun 1, 2026
8 checks passed
@DimaMolod DimaMolod deleted the report_percentiles branch June 1, 2026 14:31