Update Parameter Golf leaderboard #1900
Co-authored-by: Codex <noreply@openai.com>
|
Thanks for the leaderboard update, @cocohearts. A couple of FYIs for context. #1797 → #1855: while the base PR #1797 has the validity concern you raised in your audit comment, the downstream PR #1855, which is built on the #1797 stack, is itself valid because that specific concern is fixed there. #1855 applies the #1530. For reference, this PR has its own structural concern open in its thread: the TTT compile warmup runs |
|
@cocohearts Hi, I'd like to note that #1518 changed its score after it was opened, which muddled the timeline a bit. At the time it was opened, my PR #1529 scored better than #1518, which can be checked in the commit history of #1518, so I think it should be included in the leaderboard as well. It did get some legality tweaks after opening, but no structural changes that improved the model. Additionally, #1530, which you mention here for inclusion, mentions me as the SOTA at the time, as additional proof. |
@cocohearts, the same is true for my subsequent PR #1584: when #1584 was first opened, its score was ahead of #1518's score at #1518's opening, and that ordering hasn't been disturbed by structural improvements since. It's also worth noting that #1584 is valid irrespective of statistical significance, per the official README rule:
#1584 is a systems-optimization submission (no ML changes), so the statistical-significance bar doesn't apply to its inclusion. |
|
@cocohearts Thank you so much for taking a look; I know how busy you likely are and really appreciate you taking the time to review these PRs. |
This updates README leaderboard rows only.
Adds the p<0.25 accepted chain above #1493:
Intentionally does not add #1787, #1797, or #1801 because they remain validity/provenance blocked.