
Lowram: Share buffers with non-overlapping lifetimes in keygen #1011

Merged
mkannwischer merged 1 commit into main from keygen-buffer-sharing on Apr 17, 2026

Conversation

@mkannwischer

Reuse t0 as the accumulator in mld_compute_t0_t1_tr_from_sk_components, and have the caller provide s1 already in NTT form, removing two allocations (s1hat and t) from the helper.

In mld_sign_keypair_internal, share the s1 and t1 buffers via a union since s1hat is consumed before t1 is produced. Pack s1 into the secret key before the in-place NTT so the original coefficients are preserved.

Split mld_pack_sk into mld_pack_sk_s1 and mld_pack_sk_rho_key_tr_s2_t0 to support packing s1 independently before the NTT.
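
The buffer sharing described above can be sketched in plain C. This is a toy illustration, not the mldsa-native code: the type and function names, the length 4, and the arithmetic standing in for the NTT, the matrix product, and power2round are all assumptions made for the example. It shows only the ordering constraints: s1 is packed before the in-place transform destroys it, and s1hat is read for the last time before t1 is written into the same storage.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_N 4 /* toy length, not an ML-DSA parameter */

/* Hypothetical stand-ins for the real polynomial vector types. */
typedef struct { int32_t c[SKETCH_N]; } vec_s1;
typedef struct { int32_t c[SKETCH_N]; } vec_t1;

/* s1 (later s1hat) and t1 share storage because their lifetimes do not overlap. */
typedef union { vec_s1 s1; vec_t1 t1; } s1_t1_buf;

/* Toy packing: store each coefficient as one byte of the secret key. */
static void pack_s1(uint8_t sk[SKETCH_N], const vec_s1 *s1) {
  for (int i = 0; i < SKETCH_N; i++) sk[i] = (uint8_t)s1->c[i];
}

/* Toy in-place transform standing in for the NTT; the input is overwritten. */
static void transform_inplace(vec_s1 *v) {
  for (int i = 0; i < SKETCH_N; i++) v->c[i] = 3 * v->c[i] + 1;
}

/* Toy accumulation into t0; this is the last read of s1hat. */
static void accumulate(int32_t t0[SKETCH_N], const vec_s1 *s1hat) {
  for (int i = 0; i < SKETCH_N; i++) t0[i] = 2 * s1hat->c[i];
}

/* Toy split: t0 is input and output, t1 receives the high part. */
static void split(vec_t1 *t1, int32_t t0[SKETCH_N]) {
  for (int i = 0; i < SKETCH_N; i++) {
    t1->c[i] = t0[i] / 4;
    t0[i] = t0[i] % 4;
  }
}

static void keygen_sketch(uint8_t sk[SKETCH_N], int32_t t0[SKETCH_N],
                          int32_t t1_out[SKETCH_N]) {
  s1_t1_buf buf;
  /* Sharing costs no extra space: the union is as large as one member. */
  assert(sizeof(buf) == sizeof(vec_s1));

  for (int i = 0; i < SKETCH_N; i++) buf.s1.c[i] = i + 1; /* "sample" s1 */
  pack_s1(sk, &buf.s1);       /* pack while the original coefficients exist */
  transform_inplace(&buf.s1); /* buf now holds s1hat */
  accumulate(t0, &buf.s1);    /* s1hat is dead after this call */
  split(&buf.t1, t0);         /* t1 overwrites the storage s1hat occupied */
  memcpy(t1_out, buf.t1.c, sizeof buf.t1.c);
}
```

Calling `keygen_sketch(sk, t0, t1)` leaves `sk` holding the untransformed values 1, 2, 3, 4, because packing ran before the transform overwrote them.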

@mkannwischer force-pushed the keygen-buffer-sharing branch from 0b96d02 to 0db7b80 on April 2, 2026 04:00
@oqs-bot

oqs-bot commented Apr 2, 2026

CBMC Results (ML-DSA-44)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| sign_keypair_internal | ⚠️ | 22s | 4s | +450% |
Full Results (186 proofs)
Proof Status Current Previous Change
**TOTAL** 1854s 1731s +7.1%
polyvecl_pointwise_acc_montgomery_c 182s 154s +18%
sign_verify_internal 166s 150s +11%
poly_pointwise_montgomery_c 163s 143s +14%
rej_uniform_native 145s 133s +9%
mld_ct_memcmp 83s 72s +15%
mld_invntt_layer 67s 62s +8%
mld_ntt_layer 56s 51s +10%
mld_attempt_signature_generation 55s 57s -4%
polymat_permute_bitrev_to_custom 29s 29s +0%
polyvec_matrix_expand 28s 25s +12%
rej_uniform 23s 23s +0%
sign_keypair_internal ⚠️ 22s 4s +450%
fqmul 20s 19s +5%
poly_chknorm_c 20s 19s +5%
poly_uniform_eta_4x 19s 19s +0%
sign_pk_from_sk 17s 8s +112%
poly_uniform_4x 16s 13s +23%
sign_signature_internal 16s 22s -27%
keccakf1600x4_permute_native 15s 16s -6%
polyt0_unpack 15s 14s +7%
rej_uniform_c 15s 16s -6%
polyeta_unpack 14s 14s +0%
polyz_unpack_c 14s 12s +17%
poly_add 13s 11s +18%
mld_ntt_butterfly_block 11s 9s +22%
mld_check_pct 10s 16s -38%
polyveck_power2round 10s 4s +150%
keccak_absorb_once_x4 9s 10s -10%
poly_decompose_c 9s 10s -10%
polyveck_add 9s 8s +12%
keccak_absorb 8s 6s +33%
keccak_squeezeblocks_x4 8s 6s +33%
keccakf1600_permute 8s 7s +14%
keccakf1600_permute_native 8s 8s +0%
poly_caddq_c 8s 7s +14%
polyvec_matrix_pointwise_montgomery 8s 8s +0%
polyveck_shiftl 8s 8s +0%
mld_compute_pack_z 7s 6s +17%
poly_invntt_tomont_c 7s 5s +40%
polyvec_matrix_expand_serial 7s 7s +0%
polyveck_pointwise_poly_montgomery 7s 6s +17%
polyveck_reduce 7s 3s +133%
polyveck_use_hint 7s 6s +17%
rej_eta_native 7s 8s -12%
mld_polyvecl_permute_bitrev_to_custom_native 6s 7s -14%
pointwise_acc_native_aarch64 6s 8s -25%
poly_power2round 6s 3s +100%
poly_uniform_eta 6s 5s +20%
poly_uniform_gamma1_4x 6s 2s +200%
poly_use_hint_c 6s 5s +20%
polyt1_unpack 6s 2s +200%
polyveck_caddq 6s 4s +50%
polyvecl_ntt 6s 5s +20%
polyz_unpack 6s 4s +50%
sign 6s 4s +50%
unpack_sk 6s 7s -14%
fqscale 5s 2s +150%
mld_h 5s 4s +25%
mld_prepare_domain_separation_prefix 5s 7s -29%
pack_sig_c 5s 4s +25%
pointwise_acc_native_x86_64 5s 5s +0%
poly_challenge 5s 3s +67%
poly_invntt_tomont 5s 5s +0%
polyveck_invntt_tomont 5s 4s +25%
polyveck_sub 5s 8s -38%
polyveck_unpack_eta 5s 4s +25%
polyvecl_pointwise_acc_montgomery_native 5s 4s +25%
sign_signature_extmu 5s 3s +67%
sign_verify 5s 4s +25%
sign_verify_extmu 5s 6s -17%
sign_verify_pre_hash_internal 5s 7s -29%
unpack_hints 5s 6s -17%
keccak_f1600_x4_native_aarch64_v84a 4s 3s +33%
keccak_finalize 4s 4s +0%
keccakf1600_xor_bytes 4s 2s +100%
keccakf1600_xor_bytes (big endian) 4s 2s +100%
mld_ct_cmask_nonzero_u32 4s 3s +33%
mld_ct_get_optblocker_u32 4s 2s +100%
mld_ct_sel_int32 4s 2s +100%
mld_sample_s1_s2 4s 4s +0%
mld_sample_s1_s2_serial 4s 5s -20%
ntt_native_aarch64 4s 4s +0%
pointwise_native_aarch64 4s 6s -33%
poly_decompose_native 4s 3s +33%
poly_ntt 4s 5s -20%
poly_ntt_native 4s 3s +33%
poly_reduce 4s 3s +33%
poly_shiftl 4s 3s +33%
poly_uniform 4s 3s +33%
poly_uniform_gamma1 4s 1s +300%
poly_use_hint_native 4s 2s +100%
polyveck_decompose 4s 12s -67%
polyveck_ntt 4s 5s -20%
polyvecl_permute_bitrev_to_custom 4s 3s +33%
polyvecl_uniform_gamma1 4s 6s -33%
rej_eta_c 4s 3s +33%
shake128_init 4s 2s +100%
shake128x4_squeezeblocks 4s 5s -20%
shake256_finalize 4s 4s +0%
sign_open 4s 4s +0%
sign_signature 4s 5s -20%
sign_signature_pre_hash_shake256 4s 4s +0%
sign_verify_pre_hash_shake256 4s 4s +0%
unpack_sig 4s 4s +0%
caddq 3s 2s +50%
decompose 3s 2s +50%
intt_native_x86_64 3s 4s -25%
keccak_f1600_x1_native_aarch64 3s 1s +200%
keccak_squeeze 3s 3s +0%
keccakf1600x4_extract_bytes 3s 1s +200%
keccakf1600x4_xor_bytes 3s 2s +50%
make_hint 3s 2s +50%
mld_keccakf1600_extract_bytes 3s 2s +50%
mld_value_barrier_i64 3s 2s +50%
ntt_native_x86_64 3s 3s +0%
pack_pk 3s 6s -50%
pack_sig_h_poly 3s 4s -25%
poly_chknorm_native 3s 2s +50%
poly_ntt_c 3s 3s +0%
poly_use_hint 3s 1s +200%
polyeta_pack 3s 3s +0%
polyt1_pack 3s 7s -57%
polyveck_chknorm 3s 4s -25%
polyveck_pack_eta 3s 3s +0%
polyveck_pack_t0 3s 3s +0%
polyveck_pack_w1 3s 6s -50%
polyveck_unpack_t0 3s 2s +50%
polyvecl_pack_eta 3s 3s +0%
polyvecl_uniform_gamma1_serial 3s 2s +50%
polyvecl_unpack_eta 3s 5s -40%
polyvecl_unpack_z 3s 5s -40%
polyz_pack 3s 7s -57%
power2round 3s 2s +50%
shake128_absorb 3s 3s +0%
shake128_release 3s 1s +200%
shake256_init 3s 4s -25%
shake256x4_absorb_once 3s 2s +50%
shake256x4_squeezeblocks 3s 4s -25%
sign_keypair 3s 2s +50%
sign_signature_pre_hash_internal 3s 4s -25%
sys_check_capability 3s 6s -50%
unpack_pk 3s 4s -25%
use_hint 3s 3s +0%
keccak_f1600_x1_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccak_init 2s 2s +0%
keccakf1600x4_permute 2s 2s +0%
mld_ct_abs_i32 2s 1s +100%
mld_ct_cmask_neg_i32 2s 3s -33%
mld_ct_cmask_nonzero_u8 2s 4s -50%
mld_ct_get_optblocker_i64 2s 2s +0%
mld_value_barrier_u32 2s 2s +0%
mld_value_barrier_u8 2s 2s +0%
montgomery_reduce 2s 3s -33%
pack_sig_z 2s 1s +100%
pack_sk_rho_key_tr_s2_t0 2s - new
pack_sk_s1 2s - new
pointwise_native_x86_64 2s 2s +0%
poly_caddq 2s 4s -50%
poly_caddq_native 2s 2s +0%
poly_caddq_native_aarch64 2s 4s -50%
poly_chknorm 2s 4s -50%
poly_chknorm_native_aarch64 2s 2s +0%
poly_decompose 2s 3s -33%
poly_invntt_tomont_native 2s 5s -60%
poly_make_hint 2s 5s -60%
poly_pointwise_montgomery 2s 1s +100%
poly_pointwise_montgomery_native 2s 3s -33%
poly_sub 2s 2s +0%
polyt0_pack 2s 4s -50%
polyvecl_chknorm 2s 3s -33%
polyvecl_pointwise_acc_montgomery 2s 4s -50%
polyw1_pack 2s 3s -33%
reduce32 2s 1s +100%
shake128_squeeze 2s 2s +0%
shake128x4_absorb_once 2s 1s +100%
shake256 2s 4s -50%
shake256_absorb 2s 4s -50%
shake256_release 2s 1s +100%
shake256_squeeze 2s 2s +0%
keccakf1600_extract_bytes (big endian) 1s 2s -50%
mld_ct_get_optblocker_u8 1s 2s -50%
polyz_unpack_native 1s 3s -67%
rej_eta 1s 3s -67%
shake128_finalize 1s 1s +0%

@oqs-bot

oqs-bot commented Apr 2, 2026

CBMC Results (ML-DSA-87)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| **TOTAL** | ⚠️ | 4197s | 3002s | +39.8% |
| polyveck_decompose | ⚠️ | 22s | 13s | +69% |
| polyvecl_pointwise_acc_montgomery_c | ⚠️ | 1751s | 986s | +78% |
| sign_keypair_internal | ⚠️ | 51s | 6s | +750% |
| sign_pk_from_sk | ⚠️ | 27s | 8s | +238% |
Full Results (186 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 4197s 3002s +39.8%
polyvecl_pointwise_acc_montgomery_c ⚠️ 1751s 986s +78%
sign_verify_internal 263s 235s +12%
poly_pointwise_montgomery_c 213s 153s +39%
polyvec_matrix_expand 205s 170s +21%
rej_uniform_native 162s 138s +17%
polyvec_matrix_expand_serial 141s 121s +17%
mld_attempt_signature_generation 99s 73s +36%
mld_ct_memcmp 96s 74s +30%
mld_invntt_layer 76s 63s +21%
mld_ntt_layer 60s 50s +20%
sign_keypair_internal ⚠️ 51s 6s +750%
sign_signature_internal 40s 37s +8%
polyveck_invntt_tomont 31s 29s +7%
polymat_permute_bitrev_to_custom 30s 26s +15%
sign_pk_from_sk ⚠️ 27s 8s +238%
rej_uniform 26s 21s +24%
fqmul 23s 19s +21%
poly_chknorm_c 23s 18s +28%
poly_uniform_eta_4x 22s 16s +38%
polyveck_decompose ⚠️ 22s 13s +69%
poly_uniform_4x 17s 13s +31%
polyeta_unpack 17s 14s +21%
keccakf1600x4_permute_native 16s 13s +23%
rej_uniform_c 16s 12s +33%
polyt0_unpack 15s 12s +25%
mld_ntt_butterfly_block 12s 10s +20%
poly_add 12s 9s +33%
polyveck_use_hint 12s 7s +71%
keccak_absorb 11s 9s +22%
keccak_absorb_once_x4 11s 9s +22%
mld_polyvecl_permute_bitrev_to_custom_native 11s 9s +22%
polyvec_matrix_pointwise_montgomery 11s 8s +38%
polyveck_add 11s 10s +10%
polyz_unpack_c 11s 8s +38%
mld_check_pct 10s 7s +43%
polyveck_power2round 10s 10s +0%
polyvecl_ntt 10s 8s +25%
rej_eta_native 10s 6s +67%
unpack_sk 10s 9s +11%
keccakf1600_permute 9s 11s -18%
keccakf1600_permute_native 9s 8s +12%
polyveck_caddq 9s 10s -10%
polyveck_chknorm 9s 6s +50%
polyveck_reduce 9s 6s +50%
keccak_squeezeblocks_x4 8s 8s +0%
pointwise_acc_native_aarch64 8s 8s +0%
poly_caddq_c 8s 8s +0%
poly_decompose_c 8s 5s +60%
poly_power2round 8s 3s +167%
polyveck_ntt 8s 7s +14%
polyvecl_chknorm 8s 7s +14%
sign 8s 7s +14%
mld_compute_pack_z 7s 8s -12%
mld_sample_s1_s2_serial 7s 4s +75%
pointwise_acc_native_x86_64 7s 8s -12%
polyveck_pointwise_poly_montgomery 7s 6s +17%
polyveck_sub 7s 5s +40%
polyveck_unpack_eta 7s 7s +0%
unpack_hints 7s 6s +17%
mld_h 6s 5s +20%
poly_chknorm_native_aarch64 6s 4s +50%
poly_invntt_tomont_c 6s 9s -33%
poly_invntt_tomont_native 6s 2s +200%
polyveck_shiftl 6s 6s +0%
polyvecl_permute_bitrev_to_custom 6s 4s +50%
polyvecl_pointwise_acc_montgomery 6s 3s +100%
sign_signature_pre_hash_internal 6s 4s +50%
keccakf1600_extract_bytes (big endian) 5s 3s +67%
mld_ct_get_optblocker_i64 5s 4s +25%
mld_prepare_domain_separation_prefix 5s 2s +150%
mld_sample_s1_s2 5s 8s -38%
montgomery_reduce 5s 4s +25%
poly_challenge 5s 3s +67%
poly_chknorm_native 5s 2s +150%
poly_uniform 5s 4s +25%
poly_uniform_eta 5s 3s +67%
poly_uniform_gamma1_4x 5s 2s +150%
poly_use_hint 5s 1s +400%
poly_use_hint_native 5s 4s +25%
polyveck_pack_t0 5s 2s +150%
polyvecl_uniform_gamma1_serial 5s 3s +67%
rej_eta_c 5s 4s +25%
shake256 5s 1s +400%
shake256_init 5s 2s +150%
shake256x4_squeezeblocks 5s 2s +150%
sign_keypair 5s 5s +0%
sign_signature 5s 6s -17%
sign_signature_extmu 5s 5s +0%
sign_signature_pre_hash_shake256 5s 6s -17%
sign_verify_extmu 5s 4s +25%
sign_verify_pre_hash_shake256 5s 6s -17%
decompose 4s 4s +0%
intt_native_x86_64 4s 2s +100%
mld_ct_abs_i32 4s 3s +33%
pack_pk 4s 2s +100%
pack_sig_h_poly 4s 2s +100%
poly_chknorm 4s 4s +0%
poly_ntt 4s 4s +0%
poly_ntt_native 4s 4s +0%
poly_pointwise_montgomery_native 4s 4s +0%
poly_shiftl 4s 3s +33%
poly_uniform_gamma1 4s 5s -20%
poly_use_hint_c 4s 2s +100%
polyt1_unpack 4s 4s +0%
polyveck_pack_eta 4s 3s +33%
polyvecl_unpack_eta 4s 5s -20%
polyvecl_unpack_z 4s 4s +0%
polyz_pack 4s 3s +33%
polyz_unpack 4s 2s +100%
polyz_unpack_native 4s 2s +100%
shake256_finalize 4s 3s +33%
sign_open 4s 6s -33%
unpack_pk 4s 3s +33%
unpack_sig 4s 5s -20%
fqscale 3s 3s +0%
keccak_f1600_x4_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 2s +50%
keccak_finalize 3s 3s +0%
keccak_squeeze 3s 3s +0%
keccakf1600_xor_bytes 3s 2s +50%
keccakf1600x4_xor_bytes 3s 2s +50%
make_hint 3s 2s +50%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 3s +0%
mld_ct_get_optblocker_u32 3s 2s +50%
mld_keccakf1600_extract_bytes 3s 3s +0%
mld_value_barrier_i64 3s 3s +0%
mld_value_barrier_u32 3s 5s -40%
mld_value_barrier_u8 3s 1s +200%
ntt_native_x86_64 3s 2s +50%
pack_sig_c 3s 5s -40%
pack_sig_z 3s 3s +0%
pack_sk_rho_key_tr_s2_t0 3s - new
pack_sk_s1 3s - new
pointwise_native_aarch64 3s 5s -40%
pointwise_native_x86_64 3s 3s +0%
poly_caddq_native 3s 3s +0%
poly_decompose 3s 3s +0%
poly_decompose_native 3s 5s -40%
poly_invntt_tomont 3s 3s +0%
poly_ntt_c 3s 3s +0%
poly_pointwise_montgomery 3s 5s -40%
poly_reduce 3s 2s +50%
polyeta_pack 3s 3s +0%
polyt0_pack 3s 2s +50%
polyt1_pack 3s 2s +50%
polyveck_pack_w1 3s 2s +50%
polyveck_unpack_t0 3s 3s +0%
polyvecl_pack_eta 3s 4s -25%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyvecl_uniform_gamma1 3s 3s +0%
polyw1_pack 3s 3s +0%
power2round 3s 3s +0%
rej_eta 3s 3s +0%
shake128_finalize 3s 5s -40%
shake128_init 3s 3s +0%
shake128_squeeze 3s 3s +0%
shake256_release 3s 2s +50%
shake256x4_absorb_once 3s 3s +0%
sign_verify_pre_hash_internal 3s 2s +50%
caddq 2s 3s -33%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x1_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 1s +100%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_extract_bytes 2s 3s -33%
mld_ct_cmask_nonzero_u32 2s 3s -33%
mld_ct_get_optblocker_u8 2s 2s +0%
ntt_native_aarch64 2s 2s +0%
poly_caddq_native_aarch64 2s 3s -33%
poly_make_hint 2s 4s -50%
reduce32 2s 1s +100%
shake128_absorb 2s 2s +0%
shake128_release 2s 2s +0%
shake128x4_absorb_once 2s 3s -33%
shake128x4_squeezeblocks 2s 2s +0%
shake256_absorb 2s 2s +0%
shake256_squeeze 2s 3s -33%
sign_verify 2s 4s -50%
sys_check_capability 2s 2s +0%
use_hint 2s 2s +0%
keccak_init 1s 3s -67%
keccakf1600x4_permute 1s 1s +0%
mld_ct_sel_int32 1s 2s -50%
poly_caddq 1s 5s -80%
poly_sub 1s 3s -67%

@oqs-bot

oqs-bot commented Apr 2, 2026

CBMC Results (ML-DSA-65)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| sign_keypair_internal | ⚠️ | 29s | 6s | +383% |
Full Results (186 proofs)
Proof Status Current Previous Change
**TOTAL** 2940s 2556s +15.0%
polyvecl_pointwise_acc_montgomery_c 860s 655s +31%
sign_verify_internal 244s 222s +10%
poly_pointwise_montgomery_c 201s 161s +25%
rej_uniform_native 162s 146s +11%
mld_ct_memcmp 101s 80s +26%
polyvec_matrix_expand 95s 87s +9%
mld_invntt_layer 75s 67s +12%
mld_attempt_signature_generation 69s 67s +3%
mld_ntt_layer 59s 56s +5%
polyvec_matrix_expand_serial 55s 51s +8%
polymat_permute_bitrev_to_custom 40s 37s +8%
sign_keypair_internal ⚠️ 29s 6s +383%
sign_signature_internal 27s 27s +0%
fqmul 24s 22s +9%
poly_chknorm_c 24s 24s +0%
rej_uniform 24s 23s +4%
polyveck_power2round 19s 14s +36%
sign_pk_from_sk 19s 8s +138%
poly_uniform_4x 18s 17s +6%
poly_uniform_eta_4x 16s 16s +0%
rej_uniform_c 16s 14s +14%
keccakf1600x4_permute_native 15s 15s +0%
poly_add 15s 11s +36%
polyt0_unpack 15s 14s +7%
polyvec_matrix_pointwise_montgomery 14s 13s +8%
polyveck_add 14s 10s +40%
mld_ntt_butterfly_block 13s 13s +0%
polyveck_decompose 13s 18s -28%
keccak_absorb_once_x4 11s 11s +0%
mld_sample_s1_s2 11s 5s +120%
mld_check_pct 10s 15s -33%
polyveck_caddq 10s 10s +0%
polyveck_pointwise_poly_montgomery 10s 5s +100%
sign 10s 9s +11%
unpack_sk 10s 8s +25%
keccakf1600_permute_native 9s 8s +12%
mld_compute_pack_z 9s 9s +0%
polyveck_sub 9s 9s +0%
keccak_absorb 8s 6s +33%
keccakf1600_permute 8s 8s +0%
pointwise_acc_native_x86_64 8s 7s +14%
poly_invntt_tomont_c 8s 5s +60%
polyveck_ntt 8s 7s +14%
polyveck_reduce 8s 6s +33%
polyveck_shiftl 8s 6s +33%
polyvecl_ntt 8s 10s -20%
mld_polyvecl_permute_bitrev_to_custom_native 7s 7s +0%
poly_decompose_c 7s 6s +17%
poly_power2round 7s 6s +17%
sign_signature_pre_hash_shake256 7s 4s +75%
keccak_f1600_x1_native_aarch64 6s 4s +50%
keccak_squeezeblocks_x4 6s 6s +0%
mld_h 6s 5s +20%
mld_prepare_domain_separation_prefix 6s 5s +20%
pointwise_acc_native_aarch64 6s 10s -40%
poly_uniform_eta 6s 5s +20%
poly_use_hint_c 6s 6s +0%
polyveck_invntt_tomont 6s 5s +20%
polyveck_pack_t0 6s 2s +200%
polyz_unpack_c 6s 2s +200%
rej_eta_native 6s 3s +100%
sign_keypair 6s 7s -14%
sign_open 6s 7s -14%
sign_verify 6s 3s +100%
unpack_hints 6s 9s -33%
keccak_f1600_x4_native_aarch64_v84a 5s 3s +67%
montgomery_reduce 5s 3s +67%
poly_caddq_c 5s 4s +25%
poly_invntt_tomont_native 5s 3s +67%
poly_reduce 5s 3s +67%
poly_shiftl 5s 5s +0%
poly_uniform 5s 7s -29%
polyeta_unpack 5s 5s +0%
polyveck_use_hint 5s 10s -50%
polyvecl_pointwise_acc_montgomery_native 5s 6s -17%
rej_eta 5s 2s +150%
shake256_finalize 5s 2s +150%
shake256_init 5s 3s +67%
sign_signature_pre_hash_internal 5s 4s +25%
fqscale 4s 1s +300%
intt_native_x86_64 4s 3s +33%
keccak_finalize 4s 4s +0%
make_hint 4s 3s +33%
mld_ct_cmask_nonzero_u8 4s 3s +33%
mld_keccakf1600_extract_bytes 4s 4s +0%
mld_sample_s1_s2_serial 4s 7s -43%
pack_sk_s1 4s - new
pointwise_native_x86_64 4s 4s +0%
poly_chknorm_native_aarch64 4s 2s +100%
poly_decompose 4s 3s +33%
poly_invntt_tomont 4s 3s +33%
poly_pointwise_montgomery 4s 3s +33%
poly_uniform_gamma1 4s 3s +33%
poly_use_hint 4s 3s +33%
polyt0_pack 4s 4s +0%
polyt1_pack 4s 3s +33%
polyveck_pack_w1 4s 3s +33%
polyveck_unpack_t0 4s 4s +0%
polyvecl_uniform_gamma1 4s 4s +0%
polyvecl_unpack_z 4s 3s +33%
polyz_pack 4s 4s +0%
rej_eta_c 4s 2s +100%
shake256_absorb 4s 2s +100%
sign_signature_extmu 4s 5s -20%
sign_verify_pre_hash_shake256 4s 6s -33%
sys_check_capability 4s 2s +100%
caddq 3s 4s -25%
keccak_f1600_x1_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 1s +200%
keccak_squeeze 3s 3s +0%
keccakf1600_xor_bytes 3s 2s +50%
keccakf1600x4_extract_bytes 3s 2s +50%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_get_optblocker_i64 3s 1s +200%
mld_ct_get_optblocker_u32 3s 3s +0%
mld_ct_sel_int32 3s 1s +200%
ntt_native_aarch64 3s 2s +50%
ntt_native_x86_64 3s 4s -25%
pack_pk 3s 4s -25%
pack_sig_z 3s 1s +200%
pack_sk_rho_key_tr_s2_t0 3s - new
poly_caddq 3s 2s +50%
poly_caddq_native_aarch64 3s 4s -25%
poly_chknorm 3s 4s -25%
poly_chknorm_native 3s 2s +50%
poly_ntt 3s 2s +50%
poly_ntt_c 3s 3s +0%
poly_pointwise_montgomery_native 3s 5s -40%
poly_sub 3s 4s -25%
poly_use_hint_native 3s 2s +50%
polyeta_pack 3s 3s +0%
polyt1_unpack 3s 4s -25%
polyveck_chknorm 3s 8s -62%
polyveck_pack_eta 3s 3s +0%
polyveck_unpack_eta 3s 3s +0%
polyvecl_chknorm 3s 7s -57%
polyvecl_pack_eta 3s 2s +50%
polyvecl_permute_bitrev_to_custom 3s 3s +0%
polyvecl_uniform_gamma1_serial 3s 5s -40%
polyvecl_unpack_eta 3s 2s +50%
polyw1_pack 3s 3s +0%
polyz_unpack 3s 2s +50%
power2round 3s 2s +50%
reduce32 3s 1s +200%
shake128_absorb 3s 2s +50%
shake128_squeeze 3s 2s +50%
shake128x4_squeezeblocks 3s 2s +50%
shake256 3s 3s +0%
shake256_squeeze 3s 2s +50%
sign_signature 3s 5s -40%
sign_verify_pre_hash_internal 3s 5s -40%
unpack_pk 3s 2s +50%
unpack_sig 3s 2s +50%
decompose 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 3s -33%
keccak_init 2s 2s +0%
keccakf1600_extract_bytes (big endian) 2s 3s -33%
keccakf1600_xor_bytes (big endian) 2s 3s -33%
keccakf1600x4_permute 2s 3s -33%
keccakf1600x4_xor_bytes 2s 3s -33%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_nonzero_u32 2s 3s -33%
mld_ct_get_optblocker_u8 2s 1s +100%
mld_value_barrier_u32 2s 3s -33%
mld_value_barrier_u8 2s 5s -60%
pack_sig_c 2s 3s -33%
pack_sig_h_poly 2s 4s -50%
pointwise_native_aarch64 2s 2s +0%
poly_caddq_native 2s 4s -50%
poly_challenge 2s 3s -33%
poly_decompose_native 2s 4s -50%
poly_make_hint 2s 5s -60%
poly_ntt_native 2s 4s -50%
poly_uniform_gamma1_4x 2s 4s -50%
polyvecl_pointwise_acc_montgomery 2s 4s -50%
polyz_unpack_native 2s 6s -67%
shake128_finalize 2s 3s -33%
shake128_init 2s 1s +100%
shake128_release 2s 3s -33%
shake128x4_absorb_once 2s 4s -50%
shake256_release 2s 1s +100%
shake256x4_absorb_once 2s 3s -33%
sign_verify_extmu 2s 3s -33%
use_hint 2s 2s +0%
mld_value_barrier_i64 1s 2s -50%
shake256x4_squeezeblocks 1s 3s -67%

@mkannwischer marked this pull request as ready for review on April 2, 2026 04:31
@mkannwischer requested a review from a team as a code owner on April 2, 2026 04:31
@mkannwischer force-pushed the keygen-buffer-sharing branch 3 times, most recently from 93856fc to 05fc00a on April 8, 2026 07:08
@mkannwischer force-pushed the keygen-buffer-sharing branch from 05fc00a to 035e9aa on April 8, 2026 09:17
Comment thread mldsa/src/poly.h
Comment thread mldsa/src/polyvec.h
Comment thread mldsa/src/sign.c Outdated

@hanno-becker left a comment

Similar to what we did with poly_add, I think we should make the now-destructive nature of power2round explicit in the signature and documentation.
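
One way to make the destructive behaviour visible in the signature is an explicit in/out parameter. The sketch below assumes that shape: the function name, the parameter name `t0_inout`, and the `SKETCH_` macros are hypothetical and are not the signature adopted in this PR. The rounding uses d = 13 as specified for ML-DSA in FIPS 204 and expects input coefficients in [0, q).

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_N 256 /* coefficients per polynomial */
#define SKETCH_D 13  /* number of dropped bits, d in FIPS 204 */

/*
 * Hypothetical destructive power2round.
 *
 * t0_inout: on entry holds t with coefficients in [0, q);
 *           on exit holds t0 with coefficients in (-2^(d-1), 2^(d-1)].
 * t1:       receives the high parts, so that t = t1 * 2^d + t0.
 *
 * The input t is overwritten and is no longer available afterwards.
 */
static void sketch_poly_power2round_inplace(int32_t t1[SKETCH_N],
                                            int32_t t0_inout[SKETCH_N]) {
  for (size_t i = 0; i < SKETCH_N; i++) {
    const int32_t t = t0_inout[i];
    const int32_t hi = (t + (1 << (SKETCH_D - 1)) - 1) >> SKETCH_D;
    t1[i] = hi;
    t0_inout[i] = t - hi * (1 << SKETCH_D);
  }
}
```

Documenting the parameter as in/out, rather than as a `const` input plus a separate output, tells callers that they must copy t first if they still need it.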

We'd also need to adjust the CBMC spec for compute_t0_t1_tr_from_sk_components to account for s1hat and t1 no longer being disjoint in memory. Can we do this already now (even if the call-site still uses a struct)?
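
The disjointness concern can be illustrated with a plain-C analogue of a no-alias precondition. This is not CBMC contract syntax; `ranges_disjoint` and the toy types are assumptions made for the example. It shows that two members of a struct satisfy a byte-range disjointness check, whereas two members of a union do not, so a specification that requires s1hat and t1 to be disjoint cannot hold at a call site that shares them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a polynomial vector. */
typedef struct { int32_t c[8]; } toy_vec;

/* Separate storage: the two members never overlap. */
typedef struct { toy_vec s1hat; toy_vec t1; } separate_bufs;

/* Shared storage: both members start at the same address. */
typedef union { toy_vec s1hat; toy_vec t1; } shared_bufs;

/* Returns 1 if the byte ranges [a, a+na) and [b, b+nb) do not overlap. */
static int ranges_disjoint(const void *a, size_t na, const void *b, size_t nb) {
  const uintptr_t pa = (uintptr_t)a;
  const uintptr_t pb = (uintptr_t)b;
  return pa + na <= pb || pb + nb <= pa;
}

/* Struct call site: a disjointness precondition is satisfied. */
static int struct_members_disjoint(void) {
  separate_bufs b;
  return ranges_disjoint(&b.s1hat, sizeof b.s1hat, &b.t1, sizeof b.t1);
}

/* Union call site: the same precondition is violated. */
static int union_members_disjoint(void) {
  shared_bufs b;
  return ranges_disjoint(&b.s1hat, sizeof b.s1hat, &b.t1, sizeof b.t1);
}
```

A specification written for the aliased case would instead have to state which argument is read before the other is written, which is the ordering property the union relies on.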

Comment thread mldsa/src/sign.c Outdated
@mkannwischer force-pushed the keygen-buffer-sharing branch from b839b38 to ea474d8 on April 17, 2026 10:59
Comment thread mldsa/src/sign.c Outdated
@mkannwischer force-pushed the keygen-buffer-sharing branch from ea474d8 to b0a4b1a on April 17, 2026 13:41
Reuse t0 as the accumulator in mld_compute_t0_t1_tr_from_sk_components,
and have the caller provide s1 already in NTT form, removing two
allocations (s1hat and t) from the helper.

In mld_sign_keypair_internal, share the s1 and t1 buffers via a union
since s1hat is consumed before t1 is produced. Pack s1 into the secret
key before the in-place NTT so the original coefficients are preserved.

Split mld_pack_sk into mld_pack_sk_s1 and mld_pack_sk_rho_key_tr_s2_t0
to support packing s1 independently before the NTT.

Aliasing s1 and t1 makes mld_compute_t0_t1_tr_from_sk_components's
memory_no_alias contract on both arguments incompatible with the keygen
call site. As a workaround, drop the contract and inline it into the
proofs of both call sites.
This will go away in a follow-up PR that eliminates
mld_compute_t0_t1_tr_from_sk_components altogether
(#1030).

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
@mkannwischer force-pushed the keygen-buffer-sharing branch from b0a4b1a to c9443ad on April 17, 2026 13:44
@mkannwischer merged commit b21396d into main on Apr 17, 2026
403 checks passed
@mkannwischer deleted the keygen-buffer-sharing branch on April 17, 2026 14:28