Report PDB-like percentiles sliders by DimaMolod · Pull Request #13 · KosinskiLab/AlphaJudge

DimaMolod · 2026-05-27T13:56:16Z

No description provided.

Introduces a transparent, rank-style metascore that converts each of the 10 selected AlphaJudge interface features (LIS, ipSAE, pDockQ2, ipTM, confidence, average interface PAE, pDockQ/mpDockQ, shape complementarity, interface area, solvation energy) to its percentile against the frozen benchmark_26 reference distribution and averages them. PAE and solvation energy are sign-flipped so that a higher percentile always means stronger interaction evidence, and missing or non-finite inputs are ignored. Reference deciles are baked into BENCHMARK_QUANTILES so the score is reproducible and independent of any per-call benchmark file. The helper calibrated_feature_percentile() can also be called directly to obtain the percentile for an individual feature. The scoring runner now writes interface_meta_score alongside the existing columns in interfaces.csv, so every per-run output has a single ranking number ready for downstream sorting or reporting. Tests in test/test_meta_score.py cover bounded output, NaN handling, the direction of inverted features, and clamping to the unit interval.
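A minimal sketch of the rank-style averaging described above, assuming linear interpolation between the 11 decile anchors (p0, p10, ..., p100) and a reference table stored in sign-flipped space for inverted features. The names feature_percentile, meta_score, TOY_QUANTILES and TOY_DIRECTIONS are hypothetical, the anchor values are toy numbers rather than the real BENCHMARK_QUANTILES, and the real calibrated_feature_percentile() signature may differ.

```python
import bisect
import math

# Toy reference anchors (p0, p10, ..., p100); NOT the real BENCHMARK_QUANTILES.
# Inverted features are stored in sign-flipped space so higher is always better.
TOY_QUANTILES = {
    "iptm": [i / 10 for i in range(11)],
    "avg_interface_pae": [-30, -27, -24, -21, -18, -15, -12, -9, -6, -3, 0],
}
TOY_DIRECTIONS = {"iptm": 1, "avg_interface_pae": -1}  # -1 = lower raw value is better


def feature_percentile(value, anchors, direction=1):
    """Percentile in [0, 1] by linear interpolation between decile anchors.

    Returns None for missing or non-finite input so callers can skip it.
    """
    if value is None or not math.isfinite(value):
        return None
    x = direction * value
    if x <= anchors[0]:
        return 0.0  # clamp below p0
    if x >= anchors[-1]:
        return 1.0  # clamp above p100
    i = bisect.bisect_right(anchors, x) - 1  # anchors[i] <= x < anchors[i + 1]
    lo, hi = anchors[i], anchors[i + 1]
    return (i + (x - lo) / (hi - lo)) / (len(anchors) - 1)


def meta_score(features, quantiles, directions):
    """Mean percentile over the features that have a usable value (NaN if none)."""
    pcts = []
    for name, anchors in quantiles.items():
        p = feature_percentile(features.get(name), anchors, directions.get(name, 1))
        if p is not None:
            pcts.append(p)
    return sum(pcts) / len(pcts) if pcts else math.nan
```

Storing the anchors of inverted features in flipped space means one interpolation routine serves every feature; a PAE of 1.5 Å, for example, maps to x = -1.5 and lands at the 0.95 percentile of the toy table.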

Generates PDF reports that visualise each AlphaJudge interface metric as a percentile slider against the frozen benchmark_26 reference deciles, mirroring the wwPDB "Overall quality at a glance" layout: smooth red -> yellow -> green gradient bars with a single black marker, serif typography, a page header rule with title/entry id, and a "Continued on next page" footer.

Public API:

- generate_per_run_report(run_dir) writes report.pdf next to interfaces.csv with a cover, an overall slider panel, a per-interface table, the PAE heatmap, and one per-model appendix page per non-best sample.
- generate_aggregate_report(summary_csv) writes a multi-page PDF with a cohort cover (meta-score histogram, summary statistics, top-N table) plus one slider page per complex, ranked by interface_meta_score.
- main_aggregate() exposes both modes through the alphajudge-report CLI (wired in a follow-up commit).

Tests cover per-run and aggregate generation, the missing-CSV fallback, and the fallback that recomputes the meta score when an input row is missing the precomputed interface_meta_score column.

Surfaces the report module so reports can be generated as part of a scoring run without an explicit second step.

CLI (alphajudge):

- Mutually exclusive --report / --no-report flags. The default is on when a single run is scored and off when --summary is requested, so benchmark aggregations stay fast.
- --aggregate_report PATH writes a cohort PDF from the --summary CSV after scoring finishes (errors out if --summary is missing).

Runner:

- process_many / _process_one_run gain a write_per_run_report kwarg that invokes the report module via a defensive _safe_write_per_run_report helper after each per-run CSV is materialised (including the reuse paths). The helper imports the report module lazily and swallows any import or runtime error, so a matplotlib hiccup never blocks scoring.

Packaging:

- pyproject bumps the version to 1.0.1 and exposes the alphajudge-report console entry, which dispatches to per-run or aggregate mode depending on whether the input is a directory or a CSV.
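The flag defaults, the alphajudge-report dispatch rule, and the defensive report hook can be sketched with the standard library. This is an illustration under assumptions: build_parser, resolve_report_flag, report_mode, and safe_write_per_run_report are hypothetical names, and the module path alphajudge.report is assumed rather than taken from the code.

```python
import argparse
import importlib
import logging
from pathlib import Path


def build_parser():
    """Hypothetical subset of the alphajudge CLI showing only the report flags."""
    p = argparse.ArgumentParser(prog="alphajudge")
    p.add_argument("--summary", default=None)
    group = p.add_mutually_exclusive_group()
    group.add_argument("--report", dest="report", action="store_true")
    group.add_argument("--no-report", dest="report", action="store_false")
    p.set_defaults(report=None)  # None means "neither flag was given"
    return p


def resolve_report_flag(args):
    """An explicit flag wins; otherwise report for single runs, skip for --summary runs."""
    if args.report is not None:
        return args.report
    return args.summary is None


def report_mode(path):
    """Directory -> per-run report, CSV -> aggregate report."""
    p = Path(path)
    if p.is_dir():
        return "per_run"
    if p.suffix.lower() == ".csv":
        return "aggregate"
    raise ValueError(f"expected a run directory or a summary CSV, got {path!r}")


def safe_write_per_run_report(run_dir, module_name="alphajudge.report"):
    """Lazily import the report module; never let a failure interrupt scoring."""
    try:
        module = importlib.import_module(module_name)
        module.generate_per_run_report(run_dir)
        return True
    except Exception as exc:  # import errors, plotting errors, I/O errors, ...
        logging.getLogger(__name__).warning("report skipped for %s: %s", run_dir, exc)
        return False
```

Using set_defaults(report=None) gives the flag pair three states (on, off, unset), which is what lets the default depend on whether --summary was passed.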

When a run has more than one chain-pair interface (any multimer prediction), include a "Per-interface raw scores" page between the overall quality slider panel and the PAE heatmap. Rows are sorted by interface_meta_score descending so the strongest interfaces appear first, making it easy to read off which subcomplex pairs are well predicted and which are not. For single-interface dimers nothing changes -- the page is skipped so the report stays compact. Also tightens the per-interface table layout: column widths give the Interface and Residues headers enough room, the intro text is split across two short lines instead of one wide one that clipped on A4, and the table sits at y_top=0.78 with row_height=0.024 so 15+ rows fit without crowding the footer.

Per-interface pages: generate_per_run_report now produces one "Overall quality at a glance" page per detected interface, sorted by interface_meta_score descending, instead of only showing the best chain pair. For a 15-interface multimer that means 15 slider pages numbered 2.1 to 2.15, preceded by the cohort overview table. Dimers with a single interface still get exactly one quality page numbered "1" (no sub-index), so their reports stay compact. Each page's pre-header now shows the model name, chain pair, and residue count for that specific interface, not just for the best one.

Cover info box: replace "frozen benchmark deciles" with the more accurate "frozen benchmark distribution", and expand the explanation onto a fourth line so the pink box no longer crops the closing words.
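One way to express the ordering and numbering rule is sketched below. It assumes rows are dicts carrying a pair label and an interface_meta_score; order_interfaces, quality_page_labels, plan_quality_pages and the section parameter are hypothetical, and placing unscorable rows last is an assumption not stated in the commit.

```python
import math


def order_interfaces(rows):
    """Best interface first; rows without a finite interface_meta_score go last."""
    def sort_key(row):
        score = row.get("interface_meta_score")
        if score is None or not math.isfinite(score):
            return math.inf
        return -score
    return sorted(rows, key=sort_key)


def quality_page_labels(n_interfaces, section=2):
    """A dimer gets a single page "1"; multimers get "2.1" ... "2.N"."""
    if n_interfaces == 1:
        return ["1"]
    return [f"{section}.{k}" for k in range(1, n_interfaces + 1)]


def plan_quality_pages(rows):
    """Pair each page label with the chain pair it will show."""
    ordered = order_interfaces(rows)
    labels = quality_page_labels(len(ordered))
    return list(zip(labels, [r["pair"] for r in ordered]))
```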

The cohort cover and per-page summaries now treat each chain-pair interface as the unit of analysis, not the predicted complex. Concretely, generate_aggregate_report no longer groups rows by complex and keeps a "best per complex" entry; it ranks every scorable interface row directly.

Consequences:

- The histogram, the min/median/mean/max statistics, and the ≥0.5 and ≥0.7 counters all run over interfaces. A 9-chain multimer with 15 detected interfaces contributes 15 data points; a dimer contributes 1. Mixing the two in one cohort no longer under-represents multimers.
- The "Top N" table is now "Top N interfaces by meta score" and uses a "complex · pair" label so multimer rows show which chain pair they refer to.
- Each per-page slider panel is now an interface page, not a complex page (renamed _complex_summary_page -> _interface_summary_page). The header subtitle leads with the interface label and shows "Rank K of N" against the global interface ranking.
- Backend counts in the cover meta block stay per-complex so they are not double-counted across a multimer's interfaces. The cover subtitle says "N interfaces across M complexes" so users see both axes at a glance.

test/test_report.py is updated to match the new page count (cover + one page per scorable interface row).
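The interface-level cohort statistics can be sketched as follows, assuming each summary row is a dict with hypothetical keys complex, pair, backend, and interface_meta_score; the real summary CSV column names may differ, and summarize_interfaces is not an AlphaJudge function. Scores are ranked per interface, while backends are deduplicated by complex before counting.

```python
import math
import statistics
from collections import Counter


def summarize_interfaces(rows, top_n=5):
    """Rank every scorable interface row; count backends once per complex."""
    scored = [
        r for r in rows
        if isinstance(r.get("interface_meta_score"), (int, float))
        and math.isfinite(r["interface_meta_score"])
    ]
    if not scored:
        raise ValueError("no scorable interface rows")
    ranked = sorted(scored, key=lambda r: r["interface_meta_score"], reverse=True)
    scores = [r["interface_meta_score"] for r in ranked]
    n = len(ranked)

    # One backend entry per complex, so a multimer is not counted once per interface.
    backend_by_complex = {r["complex"]: r["backend"] for r in ranked}
    return {
        "subtitle": f"{n} interfaces across {len(backend_by_complex)} complexes",
        "min": min(scores),
        "median": statistics.median(scores),
        "mean": statistics.mean(scores),
        "max": max(scores),
        "n_ge_0_5": sum(s >= 0.5 for s in scores),
        "n_ge_0_7": sum(s >= 0.7 for s in scores),
        "backends": Counter(backend_by_complex.values()),
        "top": [f"{r['complex']} · {r['pair']}" for r in ranked[:top_n]],
        "page_headers": [
            f"{r['complex']} · {r['pair']} (Rank {k} of {n})"
            for k, r in enumerate(ranked, start=1)
        ],
    }
```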

Two pieces of repeated text added unnecessary clutter to every page:

- The "Percentile ranks are computed against the AlphaJudge benchmark distribution; higher is better for every metric (sign-flipped where needed)." note at the bottom of every quality page. The slider's "Worse <- -> Better" axis already conveys this.
- The "AlphaJudge" left-side label and the "Continued on next page..." middle text in the page footer. The thin footer rule plus the page number on the right are enough.

After this change each page has a header rule (Page N / title / entry id) and a footer with just the rule and an "N / total" page counter.

The page header carried "Page N" on the left, "Title" in the middle, and "entry id" on the right. The footer already shows "N / total" at the bottom right, so the header repeat was redundant. Header now reads "Title ... entry id" only. Module docstring updated.

Adds the new report flags to the CLI synopsis, the option list, the output description, and an example invocation; documents the alphajudge-report console entry that exposes the per-run and aggregate modes outside the scoring pipeline.

After rebasing report_percentiles onto the Boltz-2 work, note explicitly that --report and --aggregate_report flow through the same scoring path for AF2, AF3, and Boltz-2 runs, and broaden the cohort-report example to cover a mixed-backend root.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edd466c437


chatgpt-codex-connector · 2026-05-27T13:59:54Z

+    if args.aggregate_report:
+        if not args.summary:
+            p.error("--aggregate_report requires --summary")
+        generate_aggregate_report(args.summary, out_pdf=args.aggregate_report)


Do not build aggregate reports from stale summaries

When --aggregate_report is requested, process_many can return None without writing the requested summary (for example, no runnable directories or all processed runs produce no rows), but this call still reads args.summary afterward. If that path already contains a CSV from a previous run, the command will silently produce an aggregate PDF for stale data instead of failing, which can misreport the current cohort. Capture the returned summary path and only generate the report when this invocation actually produced/reused rows for the requested summary.


process_many returns None whenever it does not actually write the summary (no paths, no runnable dirs, all workers produced no rows). The CLI previously ignored that return value, so if --aggregate_report pointed at a path that already had a CSV from a previous invocation, generate_aggregate_report would silently consume that stale file and emit a PDF for the wrong cohort. Capture the returned summary path and fail loudly via p.error when it is None, instead of building a report from data this invocation did not produce.
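A sketch of the guard, with process_many's return value and the report function passed in so the logic can be exercised without AlphaJudge installed. run_aggregate_step and its parameters are hypothetical; the only behaviour taken from the commit is that a None summary path must abort via parser.error instead of reading whatever CSV already sits at --summary.

```python
import argparse


def run_aggregate_step(parser: argparse.ArgumentParser, args: argparse.Namespace,
                       produced_summary, make_report):
    """Build the cohort PDF only from a summary this invocation wrote or reused.

    produced_summary: the return value of process_many (None when nothing was written).
    make_report: callable shaped like generate_aggregate_report(summary, out_pdf=...).
    """
    if not args.aggregate_report:
        return None
    if not args.summary:
        parser.error("--aggregate_report requires --summary")
    if produced_summary is None:
        # Do not fall back to whatever CSV is already at args.summary: it may be stale.
        parser.error(
            f"--aggregate_report: no summary was produced in this run; not reusing {args.summary}"
        )
    return make_report(produced_summary, out_pdf=args.aggregate_report)
```

parser.error prints the message and exits with status 2, so a stale summary can no longer reach the report builder silently.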

The previous green RdYlGn gradient with boxed % and Value columns made the page look like a dashboard. Switch to the wwPDB look:

- Red -> pale center -> blue percentile bars (LinearSegmentedColormap), deliberately thin (bar_height ~0.011 of the page).
- Drop the explicit % column; the percentile is conveyed by marker position. Three columns only: Metric | Percentile Ranks | Value.
- A single small black marker per row, plus a thin black polyline connecting valid percentiles down the chart, exactly as on the wwPDB "Overall quality at a glance" page.
- "Worse" / "Better" italic labels directly under the bars and a marker-glyph legend, in the wwPDB position.

Page chrome:

- The cover has no running header. It leads with a small text PDB wordmark, the report title, and a blue circled-i icon, matching the RCSB cover.
- Pages 2+ get a single-rule running header: "Page N | Title | entry".
- The footer is just a small wordmark, with no page-number rule (the running header already carries the page label).
- The pink info box is square-cornered and uses wwPDB red.
- Section headings render number + title at 17pt with a tight gap; the optional info icon is now off by default so it never overlaps chain-pair labels like "Interface B_C".

Typography:

- rcParams switched to a Computer-Modern-style serif stack (CMU Serif / Latin Modern / STIXGeneral / DejaVu Serif fallback), mathtext "cm", base font 10pt, and PDF font type 42 so text stays searchable in Acrobat.

The slider primitive now exposes a draw_marker switch so the per-row bar is clean and the marker polyline is overlaid in a single axes, which avoids axis clipping for off-bar marker rectangles.
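The colormap, the typography settings, and the thin bar can be sketched with matplotlib. The hex colours, axes geometry, and exact font family names are assumptions (the commit names the fonts only loosely), and draw_percentile_bar is a hypothetical helper rather than the actual slider primitive.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for PDF/PNG output
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

# Hypothetical hex values: red (worse) -> pale centre -> blue (better).
PERCENTILE_CMAP = LinearSegmentedColormap.from_list(
    "percentile_red_pale_blue", ["#d7191c", "#f7f7f7", "#2c7bb6"]
)

# Serif stack and PDF settings in the spirit of the commit; font names are assumptions.
SERIF_RC = {
    "font.family": "serif",
    "font.serif": ["CMU Serif", "Latin Modern Roman", "STIXGeneral", "DejaVu Serif"],
    "mathtext.fontset": "cm",
    "font.size": 10,
    "pdf.fonttype": 42,  # embed TrueType (Type 42) fonts instead of Type 3
}


def draw_percentile_bar(fig, y, percentile, x0=0.30, width=0.45, height=0.011):
    """Draw one thin gradient bar in figure coordinates plus an optional black marker."""
    ax = fig.add_axes([x0, y, width, height])
    gradient = np.linspace(0.0, 1.0, 256).reshape(1, -1)
    ax.imshow(gradient, aspect="auto", cmap=PERCENTILE_CMAP, extent=(0, 1, 0, 1))
    if percentile is not None and np.isfinite(percentile):
        ax.plot([percentile], [0.5], marker="s", markersize=4, color="black", clip_on=False)
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_axis_off()
    return ax
```

Apply the fonts with matplotlib.rcParams.update(SERIF_RC), or scope them with matplotlib.rc_context(SERIF_RC), before creating the A4 figure.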

Branding (no external logos):

- Remove the generated PDB-style wordmark and the vector chain-pair logo. The cover page no longer carries any third-party mark; only the report title remains. The footer carries a plain "AlphaJudge report" text mark.
- The title is now "AlphaJudge Interface validation Report".

PAE page + standalone PAE PNG (shared rendering):

- report.py exposes render_pae_png(out_path, pae, ...), which produces an AlphaFold-DB-like standalone PNG: a green square heatmap, a horizontal "Expected position error (Ångströms)" colour bar, Scored / Aligned residue axes, and black inter-chain separator lines.
- runner._save_pae_heatmap now delegates to render_pae_png, so the pae_<model>.png files written during scoring match the in-report PAE page exactly. There is no if/else fallback path: the report just embeds the standalone PNG.

Slider panel:

- Show the overall meta score as a separate "Meta score" row at the top, visually offset from the per-feature rows by an inter-group gap, in the same typography as the other rows (PDB-validation-style uniform treatment). The marker polyline never crosses Meta score.
- Split the polyline into two groups that connect related metrics: the AlphaFold-derived confidence features (LIS, ipSAE, pDockQ2, ipTM, confidence, avg interface PAE, pDockQ/mpDockQ) and the biophysical features (shape complementarity, interface area, solvation energy). Each group has its own connecting line.

The cover page also drops the older "Overall meta score" label and restores the canonical "AlphaJudge Interface validation Report" title.
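The row layout and the two marker polylines can be sketched as plain geometry. slider_layout, the group lists, and the gap values are hypothetical; the group membership follows the display labels listed above, and the Meta score row is deliberately excluded from both polylines.

```python
import math

CONFIDENCE_ROWS = ["LIS", "ipSAE", "pDockQ2", "ipTM", "confidence",
                   "avg interface PAE", "pDockQ/mpDockQ"]
BIOPHYSICAL_ROWS = ["shape complementarity", "interface area", "solvation energy"]


def slider_layout(percentiles, groups=(CONFIDENCE_ROWS, BIOPHYSICAL_ROWS), group_gap=0.5):
    """Return (rows, polylines).

    rows: (label, y) for every row, top to bottom, starting with the standalone Meta score row.
    polylines: one list of (percentile, y) points per group, skipping missing values.
    """
    rows = [("Meta score", 0.0)]
    polylines = []
    y = 1.0 + group_gap  # offset the feature rows from the Meta score row
    for group in groups:
        points = []
        for label in group:
            rows.append((label, y))
            p = percentiles.get(label)
            if p is not None and math.isfinite(p):
                points.append((p, y))
            y += 1.0
        polylines.append(points)
        y += group_gap  # inter-group gap before the next group
    return rows, polylines
```

Because each group yields its own point list, a missing value only shortens its own line and never bridges the confidence and biophysical blocks.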

Inspecting the canonical benchmark_26 table (n=7,756, balanced) shows that interface_area and interface_solv_en are near-redundant (Pearson rho = -0.80), so the previous biophysical trio (sc / area / solv_en) was effectively (sc / size / size). interface_sc is the only biophysical feature that is genuinely orthogonal to the rest, and its AUROC (0.746) is the highest among the biophysical features. Among the size-cluster features, interface_hb (AUROC 0.703) is the most interpretable (a count of polar contacts), is slightly less correlated with solv_en than area is, and adds a different physical concept (directional polar interactions) on top of geometry and hydrophobic burial. interface_sb (0.681), interface_ss (chance level; disulfides are too sparse) and interface_contact_pairs (0.694) are all weaker than interface_hb and more redundant with area.

Changes:

- META_SCORE_FEATURES swaps interface_area for interface_hb.
- FEATURE_DIRECTIONS keeps interface_area for backward compatibility and adds interface_hb (direction +1).
- BENCHMARK_QUANTILES gains an interface_hb entry computed from the final_sync_20260523 benchmark (deciles 0, 2, 4, 6, 8, 10, 12, 15, 20, 28, 129).
- report.py's _BIOPHYSICAL_FEATURES shows (sc, hb, solv_en) with the display label "Hydrogen bonds".
- Drop the small "AlphaJudge report" wordmark from the per-page footer; the running header at the top already identifies the report on every page.

Existing interfaces.csv files keep a stale interface_meta_score (computed with area instead of hb) until they are re-scored with --force_recompute. Tests are updated to confirm that the new metascore feature set behaves correctly.
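To see how the new interface_hb anchors behave, the sketch below maps raw hydrogen-bond counts to percentiles using the deciles quoted above. Linear interpolation between anchors (via numpy.interp, which also clamps values outside the range) is an assumption about how AlphaJudge interpolates; hb_percentile is a hypothetical name.

```python
import numpy as np

# interface_hb anchors (p0, p10, ..., p100) as quoted in the commit message.
INTERFACE_HB_ANCHORS = [0, 2, 4, 6, 8, 10, 12, 15, 20, 28, 129]
LEVELS = np.linspace(0.0, 1.0, len(INTERFACE_HB_ANCHORS))


def hb_percentile(n_hbonds):
    """Percentile of a hydrogen-bond count; direction +1, so more bonds rank higher.

    np.interp clamps values below p0 to 0.0 and above p100 to 1.0.
    """
    return float(np.interp(n_hbonds, INTERFACE_HB_ANCHORS, LEVELS))
```

With these anchors, 10 hydrogen bonds sits at the benchmark median (0.5), and 13.5 falls halfway between p60 and p70 (0.65). The long upper tail (p90 = 28, p100 = 129) means anything above 28 bonds is compressed into the top decile.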

Re-computes the 11-anchor decile table for every metascore feature on the final synchronized benchmark_26 best-interface CSV (benchmark_best.final_sync_20260523_225722_force_recompute_nointerfacefix.csv, n=7,756). The previous values were derived from an earlier n=7,345 April 22 snapshot taken before the pair-matched predictions were back-filled.

Notable shifts (after sign-flip where applicable):

- interface_LIS: p50 0.060 -> 0.041, p90 0.516 -> 0.510 (slightly tighter)
- interface_pDockQ2: floor lowered from 7.5e-3 to 0 (more dynamic range)
- interface_sc: p10 -0.210 -> -0.091 (the new dataset has fewer extreme low-Sc cases at the tail)
- interface_area: p100 23,847 -> 19,027 (one extreme outlier excluded by the back-fill)
- interface_solv_en: p100 400.7 -> 233.0
- confidence_score: p0 0.127 -> -99.73 (the sentinel for failed predictions now contributes to the tail)
- interface_hb: unchanged (already computed on this same table)

The header comment in BENCHMARK_QUANTILES is updated to point at the new source. interface_meta_score values in existing interfaces.csv files will move by a few percentage points after re-scoring with --force_recompute; the 8hhy demo report was re-generated.
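A sketch of how an 11-anchor table could be recomputed from a benchmark column, assuming non-finite values are dropped, inverted features are sign-flipped before taking quantiles, and numpy's default linear quantile method is acceptable. decile_anchors and column_values are hypothetical helpers, not AlphaJudge functions.

```python
import csv
import math

import numpy as np

LEVELS = np.linspace(0.0, 1.0, 11)  # p0, p10, ..., p100


def column_values(lines, column):
    """Yield floats from one CSV column; blank, missing, or non-numeric cells are skipped."""
    for row in csv.DictReader(lines):
        try:
            yield float(row.get(column) or "")
        except ValueError:
            continue


def decile_anchors(values, direction=1):
    """11 quantile anchors over the finite values, after an optional sign flip."""
    finite = [direction * v for v in values if math.isfinite(v)]
    if not finite:
        raise ValueError("no finite values to build anchors from")
    return np.quantile(np.asarray(finite, dtype=float), LEVELS).tolist()
```

In practice the lines would come from an open file handle, e.g. `with open(path, newline="") as fh: anchors = decile_anchors(column_values(fh, "interface_hb"))`.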

DimaMolod added 10 commits May 27, 2026 15:46

Document --report and --aggregate_report in README

edd466c


chatgpt-codex-connector Bot reviewed May 27, 2026


DimaMolod added 5 commits May 28, 2026 09:30

DimaMolod merged commit 97c51ba into main May 28, 2026
12 checks passed

DimaMolod deleted the report_percentiles branch May 28, 2026 11:32

DimaMolod merged 15 commits into main from report_percentiles
