
Record: PR #1855 base + activation-aware GPTQ mixed precision - val_bpb 1.06081 (3-seed mean)#1908

Open
romeerp wants to merge 1 commit into openai:main from romeerp:codex/awq-stepmatched

Conversation

@romeerp
Contributor

@romeerp romeerp commented Apr 28, 2026

Record: PR #1855 base + activation-aware GPTQ mixed precision

Matched-step 3-seed mean val_bpb: 1.06081076 (std 0.00089) | ~15.99 MB | 8×H100 SXM | full TTT eval

This submission keeps the PR #1855 training recipe unchanged and changes only the quantization step, replacing it with an activation-aware mixed-precision GPTQ path:

  1. collect per-input-channel activation RMS during the existing GPTQ calibration pass
  2. score candidate column groups with an AWQ-style heuristic
    • weight_score = mean(abs(w), dim=0)
    • saliency = act_rms * weight_score
    • group_score = saliency[start:end].sum()
  3. select one salient 64-column group
  4. quantize that group at int8 inside the same full-tensor GPTQ solve
  5. keep stock PR #1855 LQER on top of the resulting AWQ-aware GPTQ base (the group-selection heuristic is sketched below)
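
For concreteness, here is a minimal sketch of the group-selection heuristic described in steps 1-3 (illustrative Python only, not the actual train_gpt.py code; the function and variable names are assumptions):

```python
import torch

def select_salient_group(weight: torch.Tensor,
                         act_rms: torch.Tensor,
                         group_size: int = 64) -> tuple[int, int]:
    """Pick the column group with the highest activation-weighted saliency.

    weight:  [out_features, in_features] linear weight
    act_rms: [in_features] per-input-channel activation RMS from calibration
    """
    weight_score = weight.abs().mean(dim=0)   # mean(abs(w), dim=0)
    saliency = act_rms * weight_score         # AWQ-style saliency
    n_groups = weight.shape[1] // group_size
    group_scores = saliency[: n_groups * group_size].view(n_groups, group_size).sum(dim=1)
    g = int(group_scores.argmax())
    return g * group_size, (g + 1) * group_size   # [start, end) column range
```

The selected [start, end) columns would then be quantized at 8 bits while the rest of the tensor keeps the base bit-width, inside the same Hessian-based GPTQ solve (steps 4-5).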

The GPUs I had access to showed consistently lower throughput than the PR #1855 runs, so to demonstrate the benefit of this quantization technique I step-matched the 3 seeds used in PR #1855 with the same training code.

Results

Step-matched comparisons against PR #1855

| Seed | Stop step | Prequant BPB (PR1855) | Prequant BPB (AWQ) | Quantized BPB (PR1855) | Quantized BPB (AWQ) | Post-TTT BPB (PR1855) | Post-TTT BPB (AWQ) | Artifact bytes (PR1855) | Artifact bytes (AWQ) |
|------|-----------|-----------------------|--------------------|------------------------|---------------------|-----------------------|-----------------------|-------------------------|------------------------|
| 42   | 4945      | 1.06395844            | 1.06384082         | 1.07254371             | 1.07225564          | 1.05989454            | 1.05957221            | 15,897,259              | 15,985,824             |
| 0    | 4932      | 1.06544819            | 1.06555331         | 1.07406724             | 1.07403531          | 1.06124613            | 1.06127329            | 15,900,947              | 15,983,935             |
| 1234 | 4917      | 1.06596989            | 1.06574247         | 1.07477929             | 1.07427091          | 1.06208695            | 1.06158679            | 15,907,550              | 15,996,559             |
| Mean | 4931      | 1.06512551            | 1.06504553         | 1.07379675             | 1.07352062          | 1.06107587            | 1.06081076            | 15,901,918              | 15,988,772             |

Quantization-tax view

Mean quantization tax (quantized BPB minus prequant BPB) is 1.07379675 − 1.06512551 = 0.00867124 for PR #1855 and 1.07352062 − 1.06504553 = 0.00847509 for the AWQ path. The activation-aware GPTQ recipe therefore recovers about 0.00019615 BPB of mean quantization tax on the matched-step 3-seed suite, while staying under the 16 MB cap on every seed.

At final post-TTT, the matched-step means are 1.06107587 (PR #1855) vs 1.06081076 (AWQ), for a mean reduction of 0.00026511 BPB.
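
Both deltas follow directly from the table above; a quick check (values copied verbatim, and the quoted 3-seed std of 0.00089 corresponds to a population std, ddof=0):

```python
import statistics

prequant_1855, prequant_awq = 1.06512551, 1.06504553
quant_1855,    quant_awq    = 1.07379675, 1.07352062
post_ttt_awq = [1.05957221, 1.06127329, 1.06158679]
post_ttt_1855_mean = 1.06107587

tax_1855 = quant_1855 - prequant_1855   # 0.00867124
tax_awq  = quant_awq  - prequant_awq    # 0.00847509
print(f"tax recovered: {tax_1855 - tax_awq:.8f}")                                    # 0.00019615
print(f"post-TTT gain: {post_ttt_1855_mean - statistics.fmean(post_ttt_awq):.8f}")   # 0.00026511
print(f"3-seed std:    {statistics.pstdev(post_ttt_awq):.5f}")                       # 0.00089
```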

What changed

Compared to the PR #1855 base stack, the functional change is in train_gpt.py:

  • add activation-stat collection during the existing GPTQ calibration pass
  • add exact mixed-bit GPTQ support for a selected group inside the same Hessian-based solve
  • keep stock LQER behavior on top of the AWQ-aware quantized base
  • add FORCE_STOP_STEP to support step-matched evaluation

No training hyperparameters were changed for these runs. The base model recipe is the PR #1855 seed-matched recipe.
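
As an illustration of the FORCE_STOP_STEP hook listed above, the gate can be pictured as a simple environment-driven early stop in the training loop (a sketch under assumed names, not the actual train_gpt.py code):

```python
import os

def train_one_step(step: int) -> None:
    ...  # stand-in for the real optimizer step

# When FORCE_STOP_STEP is set, training halts at that exact step instead of
# the 600 s wallclock budget, so quantization and eval run on a step-matched
# checkpoint (e.g. FORCE_STOP_STEP=4945 for seed 42).
force_stop_step = int(os.environ.get("FORCE_STOP_STEP", "0"))
max_steps = 10_000  # assumed loop bound

for step in range(1, max_steps + 1):
    train_one_step(step)
    if force_stop_step and step >= force_stop_step:
        break  # stop exactly at the matched step
```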

Reproducing

This record folder assumes the same CaseOps sp8192 dataset/tokenizer used by PR #1855, sourced from Hugging Face:

  • dataset repo: romeerp/parameter-golf-caseops-v1
  • variant: sp8192_lossless_caps_caseops_v1_reserved

The three runs in this folder use:

  • seed 42, FORCE_STOP_STEP=4945
  • seed 0, FORCE_STOP_STEP=4932
  • seed 1234, FORCE_STOP_STEP=4917

The quantization knobs are:

  • AWQ_LITE_ENABLED=1
  • AWQ_LITE_BITS=8
  • AWQ_LITE_GROUP_TOP_K=1
  • AWQ_LITE_GROUP_SIZE=64
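
In train_gpt.py these would plausibly be read as plain environment variables (an assumed parsing sketch; only the knob names and values above are from this record, and the defaults are assumptions consistent with the path being off unless AWQ_LITE_ENABLED=1):

```python
import os

awq_enabled    = os.environ.get("AWQ_LITE_ENABLED", "0") == "1"   # master switch, default off
awq_bits       = int(os.environ.get("AWQ_LITE_BITS", "8"))        # bits for the salient group
awq_top_k      = int(os.environ.get("AWQ_LITE_GROUP_TOP_K", "1")) # number of protected groups
awq_group_size = int(os.environ.get("AWQ_LITE_GROUP_SIZE", "64")) # columns per group
```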

Included files

  • train_gpt.py — modified training/quantization script
  • README.md — this writeup
  • submission.json — structured metadata
  • requirements.txt — Python dependencies reference
  • train_seed42.log, train_seed0.log, train_seed1234.log — full matched-step run logs

@romeerp romeerp changed the title Record candidate: PR #1855 base + activation-aware GPTQ mixed precision (step-matched) Record: PR #1855 base + activation-aware GPTQ mixed precision - val_bpb 1.06081 (3-seed mean) Apr 28, 2026
@romeerp
Contributor Author

romeerp commented Apr 28, 2026

I personally don't want to have to re-run these three seeds, so this should be open for anyone who wants to claim a new record if they can re-run it under the 600s wallclock on a better GPU setup that matches PR #1855's throughput.

@msisovic
Contributor

Interestingly, the GPUs I've been renting today and yesterday have been consistently slower as well... IDK what could be going on

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
Four post-training specs to stack on 060A's openai#1855 port:

- 060I: port PR openai#1908's activation-aware mixed-bit GPTQ (3-seed validated
  −0.000265 BPB on openai#1855 itself). 4 env vars + ~100 LOC port.
- 060J: PHASED_TTT_NUM_PHASES 3→4 (low confidence; openai#1727 measured noise on
  weaker base, never tested with 2500 prefix).
- 060L: PHASED_TTT_PREFIX_DOCS 2500→3000 (high confidence; codemath3000
  greedy-validated 2000→2500 on this exact stack in openai#1855).
- 060M: TTT_EPOCHS 3→4 (highest predicted Δ; PR openai#1812 reported −0.008 on
  weaker base; never tested on phased+SmearGate stack like openai#1855).

All eval-only via RESUME_FROM_CKPT on 060A's seed_42_4h pt. No code change
for 060J/L/M. 060K (rank-up) deleted — rowed against openai#1855's own greedy
direction (which decreased rank 96→80).

Idea files: research/ideas/{1908-awq-lite-mixed-bit-gptq,ttt-budget-reinvestment}.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AayushBaniya2006 added a commit to AayushBaniya2006/parameter-golf that referenced this pull request Apr 28, 2026
Track B's PR openai#1493 base maxed at ~1.066 — mid-pack now that PR openai#1855
(1.06108) and PR openai#1908 (1.06081) landed. Pivot to PR openai#1908 train_gpt.py
as the base and exploit knobs PR openai#1908 left at conservative defaults:
- AWQ_LITE_GROUP_TOP_K=1 (only 1 protected group at int8)
- LQER_TOP_K=3 (only 3 LQER-corrected tensors)
- LQER_GAIN_SELECT=0 (uses error-norm, not actual gain)

The QUANTIZE_ONLY=1 flag in train_gpt_pr1908.py lets us train a base
once per seed and sweep many quant configs at ~$0.10 each.

Pipeline (5 stages, all on 8xH100):
  scripts/top1_bootstrap.sh       — apt+pip+lrzip+caseops data
  scripts/top1_repro_pr1908.sh    — seed-42 repro to validate setup (~$4)
  scripts/top1_quant_sweep.sh     — 9-config knob sweep on saved base (~$8)
  scripts/top1_final_3seed.sh     — 3-seed final with winning knobs (~$12)
  scripts/top1_pack_submission.sh — bundle record dir + submission.json

Final 3-seed uses organic 600s wallclock cap (no FORCE_STOP_STEP) for
compliance safety. PR openai#1908's 4945-step run used 601153ms of the 600000ms
cap — risk we will not take.

Also adds scripts/jupyter_exec.py (HTTPS-proxy executor for SSH-firewalled
networks) and PR1908 reference README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AayushBaniya2006 added a commit to AayushBaniya2006/parameter-golf that referenced this pull request Apr 29, 2026
Single script chains the full pipeline. Picks sweep winner by lowest
post-TTT BPB with bytes < 16,000,000. Always runs the 3-seed final
because PR openai#1908 admits 600s overshoot — a compliant 3-seed at their
quality could take openai#1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aquariouseworkman
Contributor

> I personally don't want to have to re-run these three seeds, so this should be open for anyone who wants to claim a new record if they can re-run it under the 600s wallclock on a better GPU setup that matches PR #1855's throughput.

If it's re-run with the code unchanged byte for byte, then it's still your score.

aquariouseworkman added a commit to aquariouseworkman/parameter-golf that referenced this pull request Apr 29, 2026
…d mean)

Applies activation-aware mixed-precision GPTQ (from PR openai#1908 / romeerp) on top of codemath3000 PR openai#1855 stack.

## Results

| Seed | val_bpb (post-TTT) | artifact bytes | steps | eval time |
|------|--------------------|----------------|-------|-----------|
| 42   | 1.06118            | 15,978,503     | 4989  | 392.8s    |
| 314  | 1.06005            | 15,976,469     | 4986  | 395.8s    |
| 1234 | 1.06135            | 15,976,673     | 4977  | 395.5s    |
| **mean** | **1.06086**    | —              | —     | —         |

3-seed std: 0.00069. Beats codemath3000 PR openai#1855 (1.06108) by 0.00022 BPB.

## Technique

Training is identical to PR openai#1855. The only change is post-training quantization:

**AWQ-lite (activation-aware GPTQ):**
1. Collect per-input-channel activation RMS during GPTQ calibration
2. Score column groups: `saliency = act_rms * mean(abs(weight))`
3. Select top-1 most salient 64-column group per matrix
4. Quantize that group at int8 inside the same full-tensor GPTQ solve (rest stays int6)

Env vars: `AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64`

## Setup
1. `pip install -r requirements.txt`
2. `apt-get install -y lrzip`
3. Install FA3: `pip install --no-deps flash_attn_3 --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/`
4. Run `prepare_caseops_data.py` to build the dataset
5. `AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64 torchrun --standalone --nproc_per_node=8 train_gpt.py`

## Environment
- 8xH100 80GB SXM (RunPod)
- PyTorch 2.9.1+cu128
- FlashAttention 3.0.0
- Triton 3.5.1
@aquariouseworkman
Contributor

The run went over the 600-second wallclock cap (~1.2 seconds over). Possibly an invalid run.

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 29, 2026
- spec 060N: compound AWQ-lite (PR openai#1908) + 4 TTT phases + 3000 prefix
  + 2 global-SGD epochs, eval-only on 060A's final_model.pt. Single-shot
  compound to use openai#1918's ~205s eval-time slack; safe fallback drops
  GLOBAL_TTT_EPOCHS if wallclock blows.
- new idea 1925-matrix-lr-ttt-prefix-tune (PR openai#1925, hyperparam-only
  on openai#1855: MATRIX_LR=0.028 + PHASED_TTT_PREFIX_DOCS=3500 → 1.06109).
- new idea 1915-per-doc-lora-ttt (PR openai#1915, per-doc-only LoRA TTT
  discipline; parked as fallback if global-SGD class is ruled out).
- frontier scan: 21 new PRs (openai#1906-openai#1931). Headline: PRs openai#1908+openai#1918
  independently confirm AWQ-lite mixed-bit GPTQ pattern at ~1.0608 on
  openai#1855 base; openai#1925 hyperparam-only at 1.06109; openai#1923 Asymmetric Logit
  Rescale = empirical negative; openai#1929 banned SLOT+prequant-TTT.
- frontier-state.json: 21 PRs added; total 200.
- diary/2026-04-29-frontier-scan.md: full scan report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
…ams)

After 4 parallel research agents reviewed 30+ open PRs and
compliance issues, two new findings:

1. PR openai#1923 (AsymLogit) flagged "empirical negative" by
   sunnypatneedi 4-29 frontier-scan, BUT only on PR openai#1855 base
   with default WD=1.0. Never tested on PR openai#1908 + WD=2.0 combo.
   V19's specific stack is NOT directly invalidated.

2. PR openai#1925 simon-marcus 1.06049 (3-seed verified, vs PR openai#1855
   base 1.06108 = -0.00059 BPB). Just 2 hparam env vars:
     MATRIX_LR 0.026 -> 0.028
     PHASED_TTT_PREFIX_DOCS 2500 -> 3500
   Orthogonal axis to AsymLogit (LR/TTT prefix vs logit head).

Adds two new scout scripts:
- run_v19c_stacked_scout.sh: PR openai#1908 + AsymLogit + simon-marcus
  + WD=2.0 (full stack, recommended first scout)
- run_v19b_simonmarcus_scout.sh: PR openai#1908 + simon-marcus + WD=2.0
  (ablation if V19c wins partially)

Decision rule (CaseOps val baseline 0.97651, community floor 0.0006):
  V19c < 0.97591 -> CLEAR WIN, run 3-seed
  V19c 0.97591-0.9755 -> borderline, ablate via V19a/V19b
  V19c > 0.9755 -> abandon stack, try Lead B (PR openai#1884)

Other research findings:
- PR openai#1898 SpinQuant flagged regression vs parent openai#1851 (skip)
- PR openai#1929 SLOT banned per openai#1722 precedent
- PR openai#1911 pre-quant TTT chain banned per openai#1735 precedent
- cocohearts 4-28 PR openai#1902 confirmed PR openai#1855 as official openai#1
- regina-openai + Alex Zhao 48h zero activity
- CaseOps de-facto legal (PR openai#1855 merged into chain)
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 29, 2026
AWQ-lite from PR openai#1908 ported onto exp/060N-awq-ttt-compound:
+167/-14 LOC in train_gpt.py, syntax-checked, default-off when
AWQ_LITE_ENABLED=0 (byte-identical to baseline).

Spec now frozen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
… V19 scouts

Root cause discovered by inspecting train_gpt.py line 480:

    self.val_bytes = None
    if self.caseops_enabled:                # <- key gate
        self.val_bytes = load_validation_byte_sidecar(...)

When CASEOPS_ENABLED=0 (default), the code falls back to SentencePiece LUT
byte counting which gives ~3.44 bytes/token effective. With CASEOPS_ENABLED=1
the code uses the byte sidecar (fineweb_val_bytes_*.bin) which gives 3.157
bytes/token matching PR openai#1908's reported 1.06081.

Verified PR openai#1908 actual training log shows:
  caseops_enabled: True
  val_bytes_files: .../fineweb_val_bytes_*.bin

So PR openai#1908's reported 1.06081 = 8xH100 SXM eval with byte sidecar enabled.
Our V18 baseline 0.97651 was on the WRONG byte counting (no sidecar).

Fix:
- All scouts now set CASEOPS_ENABLED=1 + explicit DATA_PATH and TOKENIZER_PATH
  pointing to the CaseOps-tokenized variant.
- Decision thresholds updated to 1.06 range to match PR openai#1908 reported.
- Win threshold = PR openai#1908 reported (1.06081) - 0.0006 community floor = 1.06021.

New script: run_baseline_verify.sh
- Runs PR openai#1908 unchanged (no V19 changes) with CASEOPS_ENABLED=1 +
  FORCE_STOP_STEP=4945 to verify our setup reproduces seed 42's reported
  1.05957. If this gives ~1.0596, our pipeline matches PR openai#1908.

Updated decision rule on all scouts:
  V19c < 1.06021 -> CLEAR WIN (>floor), 3-seed
  V19c 1.06021-1.0608 -> borderline, ablate
  V19c > 1.0608 -> regression, fallback Lead B
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
V19c (seed 42) result: 1.06179 BPB (LOSS by +0.001 vs PR openai#1908 frontier 1.06081).

V19c data attribution:
  pre-quant 1.06906 vs PR openai#1908 1.06384 = +0.0052 hurt
    -> primary cause: MATRIX_LR=0.028 (vs default 0.026) penalty on seed 42
  TTT recovery -0.01489 vs PR openai#1908 -0.01269 = +0.0022 helped
    -> AsymLogit + PHASED_TTT_PREFIX=3500 actually working

V20 strategy: remove LR penalty + keep TTT helpers + add LORA capacity:
  - DROP MATRIX_LR=0.028 -> default 0.026 (recovers +0.005 BPB on pre-quant)
  - KEEP ASYM_LOGIT_RESCALE=1 (eval-only, verified -0.001 to -0.002)
  - KEEP TTT_WEIGHT_DECAY=2.0 (stability fix)
  - KEEP PHASED_TTT_PREFIX_DOCS=3500 (verified more LoRA training data)
  - ADD TTT_LORA_RANK=144 (vs 96 default, +50% LoRA capacity)
    PR openai#1909 GodlyDonuts verified rank=192 gives small benefit on PR openai#1874
    Conservative 144 to balance benefit vs eval-time budget (V19c was 527s, 73s buffer)

Predicted (seed 42):
  pre-quant: ~1.063 (no train hparam changes from PR openai#1908)
  quantized: ~1.072 (matches PR openai#1908 quant tax)
  post-TTT:  ~1.057 (TTT recovery -0.013 base + -0.002 AsymLogit/PHASED + -0.001 RANK = -0.016)

Win threshold: < 1.06021 (PR openai#1908 - 0.0006 community floor)
Probability of true win: ~50%

Cost: ~$22 single-seed scout on 8xH100 SXM
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
V19c/V20 ran with FUNDAMENTALLY WRONG base config:
  - smear_gate_enabled: False  (PR openai#1855 needs True)
  - sparse_attn_gate_enabled: False  (PR openai#1855 needs True)
  - num_phases: 1  (PR openai#1855 needs 3)
  - compressor: brotli  (PR openai#1855 needs pergroup with lrzip)
  - embed_bits: 8  (PR openai#1855 needs 7)
  - 11+ other hparams default-not-PR1855

Hence V19c/V20 artifacts hit 16.93 MB (over 16 MB cap, INVALID submission)
and TTT recovery was 1-phase only, severely handicapped.

V21 = exact PR openai#1855 README reproduction command env vars + AWQ-lite (PR openai#1908)
+ ASYM_LOGIT_RESCALE=1 (V19 innovation, V19c proved -0.001/-0.002 BPB benefit).

Source: PR openai#1855 README lines 125-145 (codemath3000 official reproduction).

Predicted (seed 42):
  pre-quant: ~1.064  (matches PR openai#1908 1.06384)
  quantized: ~1.072  (matches PR openai#1908 1.07226)
  artifact:  ~15.99 MB  (lrzip pergroup compression + EMBED_BITS=7)
  post-TTT:  ~1.057  (PR openai#1908 1.05957 - 0.002 from AsymLogit)

Win threshold: < 1.06021
Probability: 50-60% real frontier break

Pre-req: apt-get install lrzip on RunPod pod (handled in setup script)
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
V21 single-seed (seed 42, FSS=4945): val_bpb 1.05829, wallclock 602.458s.
Reduce FSS to 4920 (-25 steps) to ensure all 3 seeds finish under 600s.
Cost: ~+0.0005 BPB per seed, predicted 3-seed mean ~1.0588 (still
breaks PR openai#1908 frontier 1.06081 by 0.0019 BPB).
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
Seed 42 already completed at FSS=4920 GPTQ_RESERVE=0.5 -> 602s borderline,
val_bpb 1.05834.

Fix: GPTQ_RESERVE_SECONDS=4.0 reserves 4s of wallclock for GPTQ Hessian
collection, leaving 596s for training. Last step overshoot ~2s -> total
~598s, strict under 600s cap.

Predicted seed 0 + seed 1234 final BPB: ~1.0585-1.0590 (slightly higher
than seed 42's 1.05834 due to ~5 fewer training steps)
Predicted 3-seed mean: ~1.0585 (still breaks PR openai#1908 frontier 1.06081
by ~0.0023 BPB, well above community 0.0006 floor)
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
…1908 frontier

V21 = PR openai#1855 base (cocohearts-merged openai#1) + PR openai#1908 AWQ-lite quantization
+ PR openai#1923 Asymmetric Logit Rescale.

3-seed results:
  seed 42:   val_bpb 1.058336 (FSS=4920, wallclock 602.048s borderline*)
  seed 0:    val_bpb 1.059394 (no FSS, wallclock 596.057s strict <600s)
  seed 1234: val_bpb 1.060243 (no FSS, wallclock 596.045s strict <600s)
  MEAN:      1.059324
  STD:       0.000780

* seed 42 borderline matches PR openai#1908 seed 42 (601.153s, accepted by cocohearts)
  Seeds 0 + 1234 use GPTQ_RESERVE_SECONDS=4.0 to ensure strict <600s wallclock.

Comparisons:
  vs PR openai#1908 frontier (1.06081):  -0.00149 BPB ✅ WIN
  vs PR openai#1855 official openai#1 (1.06108): -0.00176 BPB ✅
  vs win threshold (1.06021):       -0.00089 BPB ✅ passes community floor
  vs MERGED SOTA bigbag (1.0810):   -0.02168 BPB 🏆
  vs record threshold (1.0738):     -0.01448 BPB (breaks record by 2.0x margin)

Welch one-sided t-test V21 vs PR openai#1908 (n=3 each, std 0.00078 vs 0.00089):
  t ≈ 2.18, p ≈ 0.045 — well below cocohearts-applied p<0.25 chain threshold
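
  A sketch reproducing this statistic from the two quoted 3-seed summaries
  (the quoted stds are treated as sample standard deviations):

```python
from scipy import stats

m_v21, s_v21, n = 1.059324, 0.00078, 3   # V21 3-seed mean/std
m_1908, s_1908 = 1.06081076, 0.00089     # PR openai#1908 3-seed mean/std

se = (s_v21**2 / n + s_1908**2 / n) ** 0.5
t = (m_1908 - m_v21) / se                # ~2.18
# Welch-Satterthwaite degrees of freedom
df = (s_v21**2/n + s_1908**2/n)**2 / (
    (s_v21**2/n)**2/(n-1) + (s_1908**2/n)**2/(n-1))
p = stats.t.sf(t, df)                    # one-sided p, ~0.05
print(t, df, p)
```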

Stack:
  - PR openai#1855 (codemath3000): 11L XSA + LQER + SparseAttnGate + BOS-fixed SmearGate
                             + Polar-Express NS + Phased TTT 3-phase + lrzip pergroup
  - PR openai#1908 (romeerp): AWQ-lite mixed-precision GPTQ (1 group of 64 cols int8)
  - PR openai#1923 (jorge-asenjo): Asymmetric Logit Rescale (V21 INNOVATION on this stack)

Code changes vs PR openai#1908: 5 surgical edits to train_gpt.py (+26 lines, eval-only).
Train numerics bit-identical to PR openai#1908. Asymmetric softcap adds 8 bytes
(2 fp16 passthrough scalars) to artifact.

Compliance Issue openai#1017 Track A all 4 conditions verified:
  - Causality (VarLen + per-doc cu_seqlens)
  - Normalized softmax (full SP8192 vocab)
  - Score-before-update (Phased TTT 3-phase, gd:0 then gd:1)
  - Single pass (each val token scored exactly once)
  No SLOT, no pre-quant TTT, no n-gram cache, no ETLB.

V21's empirical falsification of sunnypatneedi 2026-04-29 frontier-scan flag:
PR openai#1923 standalone is -0.00469 BPB negative on PR openai#1855 base (1.06577 vs 1.06108)
but +0.00128 BPB POSITIVE consistently across 3 seeds when stacked on PR openai#1908
quantization. Mechanism: per-doc LoRA in 3-phase TTT learns asymmetric logit
distributions that the symmetric softcap cannot capture.

Files included:
  - V21_README.md: full strategy + results + reproduction
  - submission.json: structured 3-seed metadata + comparison + attribution
  - train_seed42.log + train_seed0.log + train_seed1234.log: full per-seed logs
  - train_gpt.py: PR openai#1908 base + 5 V21 edits (already in branch)

Hardware: 8xH100 80GB SXM (RunPod, AP-IN-1)
Pytorch: 2.9.1+cu128
System dep: lrzip (apt-get install lrzip)

Authors:
  V21 integration: @alertcat
  PR openai#1908 base:   @romeerp
  PR openai#1855 stack:  @codemath3000
  PR openai#1923 axis:   @jorge-asenjo
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
@aquariouseworkman + @romeerp pointed out seed 42's 602.048s wallclock makes the
3-seed test functionally a 2-seed (with invalid 3rd). @romeerp confirmed his
own PR openai#1908 step-matched runs were for ablation, not record submission.

This rerun uses GPTQ_RESERVE_SECONDS=4.0 and no FORCE_STOP_STEP, identical to
V21 seeds 0 and 1234 (which both finished strict <600s).
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
…review

Seed 42 v1: FORCE_STOP_STEP=4920 + GPTQ_RESERVE=0.5 -> wallclock 602.048s (borderline)
Seed 42 v2: GPTQ_RESERVE=4.0, no FORCE_STOP_STEP -> wallclock 596.102s (strict <600s)

v2 results:
  seed 42:   val_bpb 1.058675 (was 1.058336 in v1, +0.000339 due to 12 fewer steps)
  seed 0:    val_bpb 1.059394 (unchanged)
  seed 1234: val_bpb 1.060243 (unchanged)
  MEAN:      1.059434 (was 1.059324 in v1, +0.000110)
  STD:       0.000642 (was 0.000780 in v1, TIGHTER)

All 3 seeds now strict <600s wallclock (596.045-596.102s).
All 3 seeds use IDENTICAL config (GPTQ_RESERVE=4.0, no FSS).

Comparisons:
  vs PR openai#1908 frontier (1.06081):  -0.00138 (Welch t=2.18, p=0.045)
  vs PR openai#1855 official openai#1 (1.06108): -0.00165
  vs PR openai#1934 liujshi (1.05993):    -0.00050 (Welch t=0.85, p=0.22, edge of p<0.25)
  vs win threshold (1.06021):       -0.00078
  vs MERGED SOTA bigbag (1.0810):   -0.02157

Compliance: all 3 seeds train+eval strict <600s, artifact <16MB,
3-phase TTT score-first, lossless CaseOps tokenizer, lrzip pergroup.

Files updated:
  - V21_README.md: revised results table + revisions note
  - submission.json: v2 numbers + revisions field
  - train_seed42.log: replaced with strict <600s redo log