
Report PDB-like percentiles sliders #13

Merged: DimaMolod merged 15 commits into main from report_percentiles on May 28, 2026.

@DimaMolod (Collaborator)

No description provided.

DimaMolod added 10 commits May 27, 2026 15:46
Introduces a transparent, rank-style metascore that converts each of the
10 selected AlphaJudge interface features (LIS, ipSAE, pDockQ2, ipTM,
confidence, average interface PAE, pDockQ/mpDockQ, shape complementarity,
interface area, solvation energy) to its percentile against the frozen
benchmark_26 reference distribution and averages them. PAE and solvation
energy are sign-flipped so that higher percentile always means stronger
interaction evidence, and missing or non-finite inputs are ignored.

Reference deciles are baked into BENCHMARK_QUANTILES so the score is
reproducible and independent of any per-call benchmark file. Helper
calibrated_feature_percentile() can also be called directly to obtain the
percentile for an individual feature.

The scoring runner now writes interface_meta_score alongside the existing
columns in interfaces.csv, so every per-run output has a single ranking
number ready for downstream sorting or reporting.

Tests in test/test_meta_score.py cover bounded output, NaN handling,
direction of inverted features, and clamping at the unit interval.
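Here is a minimal sketch of the percentile-then-average idea. It assumes piecewise-linear interpolation between the 11 stored anchors (p0, p10, ..., p100), with anchors for inverted features stored already sign-flipped. The `interface_hb` anchors are the ones quoted later in this PR. `example_pae` and its anchors are placeholders, not benchmark values. `feature_percentile` and `meta_score` are illustrative names, not the real `calibrated_feature_percentile()` implementation.

```python
import bisect
import math

# 11 anchors per feature at percentiles 0, 10, ..., 100 of the reference set.
# Anchors for inverted features (direction -1) are stored already sign-flipped.
BENCHMARK_QUANTILES = {
    "interface_hb": [0, 2, 4, 6, 8, 10, 12, 15, 20, 28, 129],  # values quoted in this PR
    # Placeholder anchors for illustration only, NOT benchmark data:
    "example_pae": [-30.0, -25.0, -20.0, -15.0, -12.0, -10.0, -8.0, -6.0, -5.0, -4.0, -1.0],
}
DIRECTIONS = {"interface_hb": +1, "example_pae": -1}


def feature_percentile(feature, value):
    """Map a raw value to [0, 1]; None for missing or non-finite input."""
    if value is None or not math.isfinite(value):
        return None
    x = DIRECTIONS[feature] * value  # sign-flip so higher always means better
    q = BENCHMARK_QUANTILES[feature]
    if x <= q[0]:
        return 0.0  # clamp below the reference minimum
    if x >= q[-1]:
        return 1.0  # clamp above the reference maximum
    i = bisect.bisect_right(q, x) - 1  # q[i] <= x < q[i + 1], so q[i + 1] > q[i]
    frac = (x - q[i]) / (q[i + 1] - q[i])
    return (i + frac) / (len(q) - 1)


def meta_score(features):
    """Average the percentiles of all scorable features; None if none are."""
    pcts = [
        feature_percentile(name, value)
        for name, value in features.items()
        if name in BENCHMARK_QUANTILES
    ]
    pcts = [p for p in pcts if p is not None]
    return sum(pcts) / len(pcts) if pcts else None
```

For example, 10 hydrogen bonds sits exactly on the sixth anchor and maps to the 50th percentile.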
Generates PDF reports that visualise each AlphaJudge interface metric as a
percentile slider against the frozen benchmark_26 reference deciles,
mirroring the wwPDB "Overall quality at a glance" layout: smooth red ->
yellow -> green gradient bars with a single black marker, serif
typography, page header rule with title/entry id, and a "Continued on
next page" footer.

Public API:
  - generate_per_run_report(run_dir) writes report.pdf next to
    interfaces.csv with a cover, an overall slider panel, a per-interface
    table, the PAE heatmap, and one per-model appendix page per non-best
    sample.
  - generate_aggregate_report(summary_csv) writes a multi-page PDF with a
    cohort cover (meta-score histogram, summary statistics, top-N table)
    plus one slider page per complex, ranked by interface_meta_score.
  - main_aggregate() exposes both modes through the alphajudge-report CLI
    (wired in a follow-up commit).

Tests cover per-run + aggregate generation, the missing-CSV fallback,
and the fallback that recomputes the meta score when an input row is
missing the precomputed interface_meta_score column.
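The missing-column fallback could look roughly like the sketch below. `ensure_meta_scores`, `rank_by_meta_score` and the injected `score_fn` are hypothetical names. Rows are assumed to be dicts whose numeric fields are already parsed to floats, so a blank cell or NaN counts as missing.

```python
import math
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def ensure_meta_scores(rows: List[Row], score_fn: Callable[[Row], Optional[float]]) -> List[Row]:
    """Return copies of rows with interface_meta_score recomputed where missing or NaN."""
    out = []
    for row in rows:
        row = dict(row)  # never mutate the caller's rows
        value = row.get("interface_meta_score")
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            row["interface_meta_score"] = score_fn(row)
        out.append(row)
    return out


def rank_by_meta_score(rows: List[Row]) -> List[Row]:
    """Scorable rows only, strongest interface first."""
    scorable = [r for r in rows if r.get("interface_meta_score") is not None]
    return sorted(scorable, key=lambda r: r["interface_meta_score"], reverse=True)
```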
Surfaces the report module so reports can be generated as part of a
scoring run without an explicit second step.

CLI (alphajudge):
  - Mutually exclusive --report / --no-report flags. Default is on when a
    single run is scored and off when --summary is requested, so
    benchmark aggregations stay fast.
  - --aggregate_report PATH writes a cohort PDF from the --summary CSV
    after scoring finishes (errors out if --summary is missing).
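These flag semantics map onto argparse's mutually exclusive groups. This is a minimal sketch, not the actual alphajudge parser. It is reduced to these three options, and `resolve_args` is a hypothetical helper that applies the default rule.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="alphajudge")
    p.add_argument("--summary", metavar="CSV", help="write a cohort summary CSV")
    group = p.add_mutually_exclusive_group()
    group.add_argument("--report", dest="report", action="store_true")
    group.add_argument("--no-report", dest="report", action="store_false")
    p.set_defaults(report=None)  # None means "neither flag was given"
    p.add_argument("--aggregate_report", metavar="PDF")
    return p


def resolve_args(argv):
    p = build_parser()
    args = p.parse_args(argv)
    if args.report is None:
        # Default: report for a single run, skip for --summary batches.
        args.report = args.summary is None
    if args.aggregate_report and not args.summary:
        p.error("--aggregate_report requires --summary")  # exits with status 2
    return args
```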

Runner:
  - process_many / _process_one_run gain a write_per_run_report kwarg
    that invokes the report module via a defensive
    _safe_write_per_run_report helper after each per-run CSV is
    materialised (including the reuse paths). The helper imports the
    report module lazily and swallows any import or runtime error so a
    matplotlib hiccup never blocks scoring.
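The defensive helper pattern (lazy import, catch everything, keep scoring) might be sketched as follows. It is modelled on the description of `_safe_write_per_run_report`, not copied from it. The default module path `alphajudge.report`, the logger, and returning `None` on failure are assumptions.

```python
import importlib
import logging

log = logging.getLogger(__name__)


def safe_write_per_run_report(run_dir, module_name="alphajudge.report"):
    """Best-effort per-run report: a plotting failure must never abort scoring."""
    try:
        # Import lazily so matplotlib is only loaded when a report is requested.
        report = importlib.import_module(module_name)
        return report.generate_per_run_report(run_dir)
    except Exception as exc:  # deliberately broad: import errors, plotting errors, ...
        log.warning("Skipping report for %s: %s", run_dir, exc)
        return None
```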

Packaging:
  - pyproject bumps version to 1.0.1 and exposes the alphajudge-report
    console entry, which dispatches to per-run or aggregate mode based
    on whether the input is a directory or a CSV.
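One way to express the directory-versus-CSV dispatch is to inject the two report generators as callables, so the routing logic can be tested on its own. `dispatch_report` is a hypothetical name. Treating anything other than a directory or a `.csv` file as an error is an assumption about the real entry point.

```python
from pathlib import Path


def dispatch_report(target, per_run, aggregate):
    """Route a run directory to per-run mode and a summary CSV to aggregate mode."""
    path = Path(target)
    if path.is_dir():
        return per_run(path)
    if path.is_file() and path.suffix.lower() == ".csv":
        return aggregate(path)
    raise ValueError(f"expected a run directory or a summary CSV, got {target!r}")
```

In the console entry, the two callables would be `generate_per_run_report` and `generate_aggregate_report`.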
When a run has more than one chain-pair interface (any multimer
prediction), include a "Per-interface raw scores" page between the
overall quality slider panel and the PAE heatmap. Rows are sorted by
interface_meta_score descending so the strongest interfaces appear
first, making it easy to read off which subcomplex pairs are well
predicted and which are not.

For single-interface dimers nothing changes -- the page is skipped so
the report stays compact.

Also tightens the per-interface table layout: column widths give the
Interface and Residues headers enough room, the intro text is split
across two short lines instead of one wide one that clipped on A4, and
the table sits at y_top=0.78 with row_height=0.024 so 15+ rows fit
without crowding the footer.
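The layout numbers can be sanity-checked with simple arithmetic. With the table starting at y_top = 0.78 and row_height = 0.024 (figure fractions), one header row plus 15 body rows end at 0.78 - 0.024 - 15 × 0.024 = 0.396, roughly 40% of the way up the page. The helper below is hypothetical. It assumes one header row and a footer keep-out line at y = 0.08, and neither value comes from the PR.

```python
import math


def rows_that_fit(y_top, row_height, y_bottom, header_rows=1):
    """Body rows that fit between y_top and y_bottom, in figure-fraction units."""
    usable = y_top - y_bottom - header_rows * row_height
    # Small epsilon so an exact fit is not floored down by rounding error.
    return max(0, math.floor(usable / row_height + 1e-9))


print(rows_that_fit(0.78, 0.024, 0.08))  # -> 28
```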
Per-interface pages:
  generate_per_run_report now produces one "Overall quality at a glance"
  page per detected interface, sorted by interface_meta_score
  descending, instead of only showing the best chain pair. For a 15-
  interface multimer that means 15 slider pages numbered 2.1 .. 2.15,
  preceded by the cohort overview table. Dimers with a single interface
  still get exactly one quality page numbered "1" (no sub-index) so
  their reports stay compact. Each page's pre-header now shows the
  model name + chain pair + residue count for that specific interface,
  not just the best one.
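The numbering rule can be sketched as a small pure function. `quality_page_labels` is a hypothetical helper, and interface rows are assumed to be dicts carrying an `interface_meta_score`.

```python
def quality_page_labels(interfaces):
    """Pair each interface with its section label, strongest first.

    A single interface gets "1"; several get "2.1", "2.2", ... in meta-score order.
    """
    ranked = sorted(interfaces, key=lambda r: r["interface_meta_score"], reverse=True)
    if len(ranked) == 1:
        return [("1", ranked[0])]
    return [(f"2.{i}", row) for i, row in enumerate(ranked, start=1)]
```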

Cover info box:
  Replace "frozen benchmark deciles" with the more accurate "frozen
  benchmark distribution"; expand the explanation onto a fourth line so
  the pink box no longer crops the closing words.

The cohort cover and per-page summaries now treat each chain-pair
interface as the unit of analysis, not the predicted complex. Concretely
generate_aggregate_report no longer groups rows by complex and keeps a
"best per complex" entry; it ranks every scorable interface row
directly. Consequences:

  - The histogram, min/median/mean/max, ≥0.5 and ≥0.7 counters all run
    over interfaces. A 9-chain multimer with 15 detected interfaces
    contributes 15 data points; a dimer contributes 1. Mixing the two
    in one cohort no longer under-represents multimers.
  - The "Top N" table is now "Top N interfaces by meta score" and uses
    a "complex · pair" label so multimer rows show which chain pair
    they refer to.
  - Each per-page slider panel is now an interface page, not a complex
    page (renamed _complex_summary_page -> _interface_summary_page).
    The header subtitle leads with the interface label and shows
    "Rank K of N" against the global interface ranking.
  - Backend counts in the cover meta block stay per-complex so they are
    not double-counted across a multimer's interfaces.

Cover sub-title says "N interfaces across M complexes" so users see
both axes at a glance. test/test_report.py is updated to match the new
page count (cover + one page per scorable interface row).
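Here is a sketch of the interface-level cohort statistics, with backends still counted once per complex. The row keys (`complex`, `pair`, `backend`, `interface_meta_score`) and both helper names are assumptions for illustration. Only the ranking unit, the counters and the subtitle wording follow the description above.

```python
import statistics
from collections import Counter


def cohort_summary(rows):
    """Statistics over interfaces; backend counts once per complex."""
    scores = [r["interface_meta_score"] for r in rows if r["interface_meta_score"] is not None]
    if not scores:
        raise ValueError("no scorable interfaces")
    backend_of = {}
    for r in rows:
        backend_of.setdefault(r["complex"], r["backend"])  # a multimer is counted once
    return {
        "subtitle": f"{len(scores)} interfaces across {len(backend_of)} complexes",
        "min": min(scores),
        "median": statistics.median(scores),
        "mean": statistics.fmean(scores),
        "max": max(scores),
        "n_ge_0.5": sum(s >= 0.5 for s in scores),
        "n_ge_0.7": sum(s >= 0.7 for s in scores),
        "backends": Counter(backend_of.values()),
    }


def top_interfaces(rows, n=10):
    """'complex · pair' labels for the n best interfaces by meta score."""
    scored = [r for r in rows if r["interface_meta_score"] is not None]
    scored.sort(key=lambda r: r["interface_meta_score"], reverse=True)
    return [f"{r['complex']} · {r['pair']}" for r in scored[:n]]
```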
Two pieces of repeated text added unnecessary clutter to every page:

  - The "Percentile ranks are computed against the AlphaJudge
    benchmark distribution; higher is better for every metric
    (sign-flipped where needed)." note at the bottom of every quality
    page. The slider's "Worse <- -> Better" axis already conveys this.

  - The "AlphaJudge" left-side label and the "Continued on next page..."
    middle text in the page footer. The thin footer rule plus the page
    number on the right are enough.

After this change each page has a header rule (Page N / title / entry
id) and a footer with just the rule and "N / total" page counter.

The page header carried "Page N" on the left, "Title" in the middle,
and "entry id" on the right. The footer already shows "N / total" at
the bottom right, so the header repeat was redundant.

Header now reads "Title  ...  entry id" only. Module docstring updated.

Adds the new report flags to the CLI synopsis, the option list, the
output description, and an example invocation; documents the
alphajudge-report console entry that exposes the per-run and aggregate
modes outside the scoring pipeline.

After rebasing report_percentiles onto the Boltz-2 work, note explicitly
that --report and --aggregate_report flow through the same scoring path
for AF2, AF3, and Boltz-2 runs, and broaden the cohort-report example
to cover a mixed-backend root.

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edd466c437

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on src/alphajudge/cli.py (outdated):

```python
if args.aggregate_report:
    if not args.summary:
        p.error("--aggregate_report requires --summary")
    generate_aggregate_report(args.summary, out_pdf=args.aggregate_report)
```

P2: Do not build aggregate reports from stale summaries

When --aggregate_report is requested, process_many can return None without writing the requested summary (for example, no runnable directories or all processed runs produce no rows), but this call still reads args.summary afterward. If that path already contains a CSV from a previous run, the command will silently produce an aggregate PDF for stale data instead of failing, which can misreport the current cohort. Capture the returned summary path and only generate the report when this invocation actually produced/reused rows for the requested summary.

Useful? React with 👍 / 👎.

DimaMolod added 5 commits May 28, 2026 09:30
process_many returns None whenever it does not actually write the
summary (no paths, no runnable dirs, all workers produced no rows).
The CLI previously ignored that return value, so if --aggregate_report
pointed at a path that already had a CSV from a previous invocation,
generate_aggregate_report would silently consume that stale file and
emit a PDF for the wrong cohort.

Capture the returned summary path and fail loudly via p.error when it
is None, instead of building a report from data this invocation did
not produce.
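The fix amounts to trusting the return value rather than the path given on the command line. Here is a minimal sketch. `process_many` and `generate_aggregate_report` are injected as stand-ins because their real signatures are not shown here, and `run_and_report` is a hypothetical wrapper.

```python
def run_and_report(parser, args, process_many, generate_aggregate_report):
    """Build the cohort PDF only from a summary this invocation actually wrote."""
    summary_path = process_many(args)  # stand-in: None when no summary was written
    if args.aggregate_report:
        if summary_path is None:
            # parser.error prints usage and exits with status 2.
            parser.error("--aggregate_report: this run produced no summary; "
                         "refusing to reuse an existing CSV")
        generate_aggregate_report(summary_path, out_pdf=args.aggregate_report)
    return summary_path
```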
The previous red-yellow-green (RdYlGn) gradient with boxed % and Value columns made
the page look like a dashboard. Switch to the wwPDB look:

  - Red -> pale center -> blue percentile bars (LinearSegmentedColormap),
    deliberately thin (bar_height ~0.011 of page).
  - Drop the explicit % column; percentile is conveyed by marker
    position. Three columns only: Metric | Percentile Ranks | Value.
  - Single small black marker per row, plus a thin black polyline
    connecting valid percentiles down the chart, exactly as on the
    wwPDB "Overall quality at a glance" page.
  - "Worse" / "Better" italic labels directly under the bars and a
    marker-glyph legend, in the wwPDB position.

Page chrome:

  - Cover has no running header. It leads with a small text PDB
    wordmark, the report title, and a blue circled-i icon, matching
    the RCSB cover.
  - Pages 2+ get a single-rule running header: "Page N | Title | entry".
  - Footer is just a small wordmark, no page-number rule (the running
    header already carries the page label).
  - Pink info box is square-cornered and uses wwPDB red.
  - Section headings render number + title at 17pt with a tight gap;
    the optional info icon is now off by default so it never overlaps
    chain-pair labels like "Interface B_C".

Typography:

  - rcParams switched to a Computer-Modern-style serif stack
    (CMU Serif / Latin Modern / STIXGeneral / DejaVu Serif fallback),
    mathtext "cm", base font 10pt, PDF font type 42 so text stays
    searchable in Acrobat.

The slider primitive now exposes a draw_marker switch so the per-row
bar is clean and the marker polyline is overlaid in a single axes,
which avoids axis clipping for off-bar marker rectangles.
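Here is a sketch of the colour map and typography settings in matplotlib. The three hex stops are assumptions that only approximate a red / pale / blue ramp. The rcParams keys are standard matplotlib settings matching the stack listed above, with "Latin Modern Roman" used as the concrete family name for Latin Modern.

```python
import matplotlib
from matplotlib.colors import LinearSegmentedColormap

# Assumed stop colours: red -> near-white centre -> blue (not exact wwPDB values).
PERCENTILE_CMAP = LinearSegmentedColormap.from_list(
    "percentile_red_blue", ["#b2182b", "#f7f7f7", "#2166ac"], N=256
)

matplotlib.rcParams.update({
    "font.family": "serif",
    # Tried in order; matplotlib falls back to the next installed family.
    "font.serif": ["CMU Serif", "Latin Modern Roman", "STIXGeneral", "DejaVu Serif"],
    "mathtext.fontset": "cm",
    "font.size": 10,
    "pdf.fonttype": 42,  # embed TrueType fonts so PDF text stays searchable
})
```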
Branding (no external logos):
  - Remove the generated PDB-style wordmark and the vector chain-pair
    logo. The cover page no longer carries any third-party mark; only
    the report title remains. The footer carries a plain "AlphaJudge
    report" text mark.
  - Title is now "AlphaJudge Interface validation Report".

PAE page + standalone PAE PNG (shared rendering):
  - report.py exposes render_pae_png(out_path, pae, ...) which produces
    an AlphaFold-DB-like standalone PNG: green square heatmap, horizontal
    "Expected position error (Ångströms)" colour bar, Scored / Aligned
    residue axes, black inter-chain separator lines.
  - runner._save_pae_heatmap now delegates to render_pae_png so the
    pae_<model>.png files written during scoring match the in-report
    PAE page exactly. No if/else fallback path: the report just embeds
    the standalone PNG.
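The inter-chain separators only need cumulative chain lengths. The helper below is hypothetical. It assumes the PAE matrix is ordered chain by chain and that the per-chain residue counts are known.

```python
from itertools import accumulate


def chain_boundaries(chain_lengths):
    """Residue offsets where one chain ends and the next begins (none at 0 or at the end)."""
    return list(accumulate(chain_lengths))[:-1]
```

With `imshow`'s default extent, residue `i` is centred on integer `i`, so a separator belongs at `b - 0.5` for every boundary `b`, for example via `ax.axhline` and `ax.axvline`.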

Slider panel:
  - Show the overall meta score as a separate "Meta score" row at the
    top, visually offset from the per-feature rows by an inter-group
    gap, in the same typography as the other rows (PDB-validation-style
    uniform treatment). The marker polyline never crosses Meta score.
  - Split the polyline into two groups that connect related metrics:
    the AlphaFold-derived confidence features (LIS, ipSAE, pDockQ2,
    ipTM, confidence, avg interface PAE, pDockQ/mpDockQ) and the
    biophysical features (shape complementarity, interface area,
    solvation energy). Each group has its own connecting line.

The cover page also drops the older "Overall meta score" label and
restores the canonical "AlphaJudge Interface validation Report" title.
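The two-group polyline can be computed before any drawing happens. In this sketch, `polyline_segments` is a hypothetical name. Rows are `(label, percentile)` pairs in display order, invalid percentiles are skipped, and the meta-score row belongs to neither group, so no line touches it.

```python
import math


def polyline_segments(rows, groups):
    """One vertex list per group: (percentile, row_index) for each valid row.

    rows:   (label, percentile or None) in top-to-bottom display order.
    groups: sets of labels joined by one line each; labels outside every
            group (such as the meta-score row) are never connected.
    """
    segments = []
    for group in groups:
        segments.append([
            (pct, i)
            for i, (label, pct) in enumerate(rows)
            if label in group and pct is not None and math.isfinite(pct)
        ])
    return segments
```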
Inspecting the canonical benchmark_26 table (n=7,756 balanced) shows
that interface_area and interface_solv_en are near-redundant
(Pearson rho = -0.80), so the previous biophysical trio
(sc / area / solv_en) was effectively (sc / size / size). interface_sc
is the only biophysical feature that is genuinely orthogonal to the
rest, and its AUROC (0.746) is the highest among biophysicals.

Among the size-cluster features, interface_hb (0.703) is the most
interpretable (count of polar contacts), is slightly less correlated
with solv_en than area is, and adds a different physical concept
(directional polar interactions) on top of geometry and hydrophobic
burial. interface_sb (0.681), interface_ss (chance, disulfides too
sparse) and interface_contact_pairs (0.694) are all weaker than
interface_hb and more redundant with area.

Changes:
  - META_SCORE_FEATURES swaps interface_area for interface_hb.
  - FEATURE_DIRECTIONS keeps interface_area for backward compat but
    adds interface_hb (direction +1).
  - BENCHMARK_QUANTILES gains an interface_hb entry computed from the
    final_sync_20260523 benchmark (deciles 0, 2, 4, 6, 8, 10, 12, 15,
    20, 28, 129).
  - report.py's _BIOPHYSICAL_FEATURES is now (sc, hb, solv_en), with
    the display label "Hydrogen bonds" for hb.
  - Drop the small "AlphaJudge report" wordmark from the per-page
    footer; the running header at the top already identifies the
    report on every page.

Existing interfaces.csv files have a stale interface_meta_score
(computed with area instead of hb) until re-scored with
--force_recompute. Tests updated to confirm the new metascore
feature set behaves correctly.
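The redundancy argument above rests on pairwise Pearson correlations. Here is a stdlib-only sketch of that check. The 0.75 cut-off and the helper names are assumptions, not values from the PR.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation; pairs containing a non-finite value are dropped."""
    pairs = [(x, y) for x, y in zip(xs, ys) if math.isfinite(x) and math.isfinite(y)]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two finite pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("constant column")
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(columns, threshold=0.75):
    """Feature pairs whose |r| meets the (assumed) redundancy threshold."""
    names = sorted(columns)
    out = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                out.append((a, b, round(r, 3)))
    return out
```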
Re-computes the 11-anchor decile table for every metascore feature
on the final synchronized benchmark_26 best-interface CSV
(benchmark_best.final_sync_20260523_225722_force_recompute_nointerfacefix.csv,
n=7,756). The previous values were derived from an earlier n=7,345
April 22 snapshot before the pair-matched predictions were back-filled.

Notable shifts (after sign-flip where applicable):
  - interface_LIS:        p50 0.060 -> 0.041, p90 0.516 -> 0.510 (slightly tighter)
  - interface_pDockQ2:    floor lowered from 7.5e-3 -> 0 (more dynamic range)
  - interface_sc:         p10 -0.210 -> -0.091 (the new dataset has fewer
                          extreme low-Sc cases at the tail)
  - interface_area:       p100 23,847 -> 19,027 (one extreme outlier
                          excluded by the back-fill)
  - interface_solv_en:    p100 400.7 -> 233.0
  - confidence_score:     p0 0.127 -> -99.73 (sentinel for failed
                          predictions now contributes to the tail)
  - interface_hb:         unchanged (computed on this same table already)

Header comment in BENCHMARK_QUANTILES updated to point at the new
source. interface_meta_score values in existing interfaces.csv files
will move by a few percentage points after re-scoring with
--force_recompute; the 8hhy demo report was re-generated.
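Regenerating an 11-anchor table from a benchmark column could be sketched as below. The sketch assumes linear interpolation between order statistics, which is numpy's default percentile method and is reproduced by `statistics.quantiles(..., method="inclusive")`. The PR does not say which method was used. Only non-finite values are dropped, so a finite sentinel for failed predictions would still land in the tail, consistent with the confidence_score p0 shift noted above.

```python
import math
import statistics


def eleven_anchor_table(values, direction=+1):
    """p0, p10, ..., p100 of the finite, direction-adjusted values."""
    data = sorted(direction * v for v in values if v is not None and math.isfinite(v))
    if len(data) < 2:
        raise ValueError("need at least two finite values")
    inner = statistics.quantiles(data, n=10, method="inclusive")  # p10 .. p90
    return [data[0], *inner, data[-1]]
```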
DimaMolod merged commit 97c51ba into main on May 28, 2026.
12 checks passed.
DimaMolod deleted the report_percentiles branch on May 28, 2026 at 11:32.