Conversation

rez5427 (Contributor) commented Oct 12, 2025

Hi, I recently found this case:

define i8 @foo() {
  store i32 42, ptr @bytes1, align 4
  %l = load i8, ptr @bytes1, align 1
  ret i8 %l
}

clang produces:

	lui	a1, %hi(bytes1)
	li	a2, 42
	li	a0, 42
	sw	a2, %lo(bytes1)(a1)
	ret

If a0 were reused for both the store and the return value, one instruction could be removed:

    lui  a1, %hi(bytes1)
    li   a0, 42          ; Reused for both store and return
    sw   a0, %lo(bytes1)(a1)
    ret

I added this case as a RISC-V test.

This patch detects a situation where all of the following conditions are met:

  1. A COPY instruction and its def are in the same basic block
  2. The destination register is a physical register
  3. The physical register is used as a return value
  4. No duplicate identical instructions exist before the COPY

In such cases, skip rematerialization in RegisterCoalescer.
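
For context, this is the pre-coalescing MIR for the example above (reconstructed from the dump posted later in this thread):

bb.0 (%ir-block.0):
  %0:gpr = LUI target-flags(riscv-hi) @bytes1
  %1:gpr = ADDI $x0, 42
  SW %1:gpr, killed %0:gpr, target-flags(riscv-lo) @bytes1 :: (store (s32) into @bytes1, align 8)
  $x10 = COPY %1:gpr
  PseudoRET implicit $x10

With the conditions above satisfied, the patch keeps the $x10 = COPY %1 here instead of letting the coalescer rewrite it into a second ADDI $x0, 42.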

llvmbot (Member) commented Oct 12, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-backend-mips
@llvm/pr-subscribers-backend-loongarch
@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-llvm-regalloc
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-backend-aarch64

Author: guan jian (rez5427)

Changes

Hi, I recently found this case:

define i8 @cast_and_load_1() {
  store i32 42, ptr @bytes1, align 4
  %l = load i8, ptr @bytes1, align 1
  ret i8 %l
}

clang produces:

	lui	a1, %hi(bytes1)
	li	a2, 42
	li	a0, 42
	sw	a2, %lo(bytes1)(a1)
	ret

If a0 were reused for both the store and the return value, one instruction could be removed:

    lui  a1, %hi(bytes1)
    li   a0, 42          ; Reused for both store and return
    sw   a0, %lo(bytes1)(a1)
    ret

I added this case as a RISC-V test.

This patch detects a situation where all of the following conditions are met:

  1. A COPY instruction and its def are in the same basic block
  2. The destination register is a physical register
  3. The physical register is used as a return value
  4. No duplicate identical instructions exist before the COPY

In such cases, skip rematerialization in RegisterCoalescer.


Patch is 128.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163047.diff

87 Files Affected:

  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+35)
  • (modified) llvm/test/CodeGen/AArch64/aarch64_win64cc_vararg.ll (+3-6)
  • (modified) llvm/test/CodeGen/AArch64/arm64-neon-copy.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/arm64-vector-ext.ll (+13-10)
  • (modified) llvm/test/CodeGen/AArch64/arm64-vshuffle.ll (+10-3)
  • (modified) llvm/test/CodeGen/AArch64/bitcast.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/combine-mul.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/ext-narrow-index.ll (+3)
  • (modified) llvm/test/CodeGen/AArch64/fast-isel-const-float.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/movi64_sve.ll (+12)
  • (modified) llvm/test/CodeGen/AArch64/neon-abd.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/neon-compare-instructions.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/neon-mov.ll (+3)
  • (modified) llvm/test/CodeGen/AArch64/remat-const-float-simd.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve-implicit-zero-filling.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-insert-vector-elt.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/win64_vararg.ll (+3-6)
  • (modified) llvm/test/CodeGen/RISCV/float-imm.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/half-imm.ll (+2)
  • (added) llvm/test/CodeGen/RISCV/ret-remat.ll (+16)
  • (modified) llvm/test/CodeGen/RISCV/zfhmin-imm.ll (+4)
  • (modified) llvm/test/CodeGen/X86/2006-04-27-ISelFoldingBug.ll (+1)
  • (modified) llvm/test/CodeGen/X86/2007-02-16-BranchFold.ll (+1)
  • (modified) llvm/test/CodeGen/X86/2007-10-12-CoalesceExtSubReg.ll (+1)
  • (modified) llvm/test/CodeGen/X86/2007-10-12-SpillerUnfold2.ll (+1)
  • (modified) llvm/test/CodeGen/X86/2007-10-29-ExtendSetCC.ll (+1)
  • (modified) llvm/test/CodeGen/X86/2008-04-16-ReMatBug.ll (+1)
  • (modified) llvm/test/CodeGen/X86/add.ll (+6)
  • (modified) llvm/test/CodeGen/X86/addcarry.ll (+2-1)
  • (modified) llvm/test/CodeGen/X86/all-ones-vector.ll (+38-38)
  • (modified) llvm/test/CodeGen/X86/apx/ccmp.ll (+10)
  • (modified) llvm/test/CodeGen/X86/apx/ctest.ll (+10)
  • (modified) llvm/test/CodeGen/X86/apx/imulzu.ll (+1)
  • (modified) llvm/test/CodeGen/X86/atomic-unordered.ll (+13-26)
  • (modified) llvm/test/CodeGen/X86/bfloat.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/cet_endbr_imm_enhance.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/cmp.ll (+1)
  • (modified) llvm/test/CodeGen/X86/coalescer-implicit-def-regression.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/combine-mulo.ll (+1)
  • (modified) llvm/test/CodeGen/X86/combine-shl.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/combine-srem.ll (+1)
  • (modified) llvm/test/CodeGen/X86/combine-subo.ll (+3-2)
  • (modified) llvm/test/CodeGen/X86/combine-urem.ll (+1)
  • (modified) llvm/test/CodeGen/X86/divmod128.ll (+16-16)
  • (modified) llvm/test/CodeGen/X86/fast-isel-fcmp.ll (+3)
  • (modified) llvm/test/CodeGen/X86/fast-isel-load-i1.ll (+1)
  • (modified) llvm/test/CodeGen/X86/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll (+1)
  • (modified) llvm/test/CodeGen/X86/i128-immediate.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/int-to-fp-demanded.ll (+2-1)
  • (modified) llvm/test/CodeGen/X86/is_fpclass.ll (+2)
  • (modified) llvm/test/CodeGen/X86/isel-fpclass.ll (+11)
  • (modified) llvm/test/CodeGen/X86/knownbits-div.ll (+2)
  • (modified) llvm/test/CodeGen/X86/lack-of-signed-truncation-check.ll (+1)
  • (modified) llvm/test/CodeGen/X86/memcmp-constant.ll (+4)
  • (modified) llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll (+1)
  • (modified) llvm/test/CodeGen/X86/memcmp-more-load-pairs.ll (+1)
  • (modified) llvm/test/CodeGen/X86/memcmp-x32.ll (+1)
  • (modified) llvm/test/CodeGen/X86/memcmp.ll (+1)
  • (modified) llvm/test/CodeGen/X86/negate.ll (+1)
  • (modified) llvm/test/CodeGen/X86/oddshuffles.ll (+5-5)
  • (modified) llvm/test/CodeGen/X86/oddsubvector.ll (+4-5)
  • (modified) llvm/test/CodeGen/X86/overflow.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/peephole-na-phys-copy-folding.ll (+8)
  • (modified) llvm/test/CodeGen/X86/pmulh.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr108728.ll (+1)
  • (modified) llvm/test/CodeGen/X86/pr132844.ll (+5-5)
  • (modified) llvm/test/CodeGen/X86/shuffle-combine-crash-2.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/shuffle-combine-crash-3.ll (+1)
  • (modified) llvm/test/CodeGen/X86/smul-with-overflow.ll (+12-8)
  • (modified) llvm/test/CodeGen/X86/sub-with-overflow.ll (+2)
  • (modified) llvm/test/CodeGen/X86/subcarry.ll (+2-1)
  • (modified) llvm/test/CodeGen/X86/subreg-to-reg-6.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/tail-opts.ll (+1)
  • (modified) llvm/test/CodeGen/X86/tailcall-cgp-dup.ll (+1)
  • (modified) llvm/test/CodeGen/X86/trunc-to-bool.ll (+1)
  • (modified) llvm/test/CodeGen/X86/umul-with-carry.ll (+1)
  • (modified) llvm/test/CodeGen/X86/vec_minmax_sint.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/vec_minmax_uint.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/vec_umulo.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/vector-partial-undef.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/vector-shift-lut.ll (+96-96)
  • (modified) llvm/test/CodeGen/X86/vectorcall.ll (+5-78)
  • (modified) llvm/test/CodeGen/X86/xaluo.ll (+12)
  • (modified) llvm/test/CodeGen/X86/xmulo.ll (+38-12)
  • (modified) llvm/test/DebugInfo/MIR/X86/regcoalescer.mir (+7-4)
  • (modified) llvm/test/Other/machine-size-remarks.ll (+5-2)
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index ebfea8e5581bf..c29491488f06e 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1326,6 +1326,41 @@ bool RegisterCoalescer::reMaterializeDef(const CoalescerPair &CP,
   if (!TII->isAsCheapAsAMove(*DefMI))
     return false;
 
+  // Skip rematerialization for physical registers used as return values within
+  // the same basic block to enable better coalescing.
+  if (DstReg.isPhysical()) {
+    MachineBasicBlock *MBB = CopyMI->getParent();
+    if (DefMI->getParent() == MBB) {
+      // Check if there's already an identical instruction before CopyMI
+      // If so, allow rematerialization to avoid redundant instructions
+      bool FoundCopy = false;
+      for (MachineInstr &MI : *MBB) {
+        if (&MI == CopyMI) {
+          FoundCopy = true;
+          continue;
+        }
+
+        // Before CopyMI: check for duplicate instructions
+        if (!FoundCopy && &MI != DefMI &&
+            MI.isIdenticalTo(*DefMI, MachineInstr::IgnoreDefs)) {
+          break;  // Found duplicate, allow rematerialization
+        } else if (FoundCopy) {
+          // After CopyMI: check if used as return register
+          // If the register is redefined, it's not a return register
+          if (MI.modifiesRegister(DstReg, TRI))
+            break;
+
+          // If there's a return instruction that uses this register, skip remat
+          if (MI.isReturn() && MI.readsRegister(DstReg, TRI)) {
+            LLVM_DEBUG(dbgs() << "\tSkip remat for return register: "
+                              << printReg(DstReg, TRI) << '\n');
+            return false;
+          }
+        }
+      }
+    }
+  }
+
   if (!TII->isReMaterializable(*DefMI))
     return false;
 
diff --git a/llvm/test/CodeGen/AArch64/aarch64_win64cc_vararg.ll b/llvm/test/CodeGen/AArch64/aarch64_win64cc_vararg.ll
index 7d488c9ca2002..ea268ed83f3de 100644
--- a/llvm/test/CodeGen/AArch64/aarch64_win64cc_vararg.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64_win64cc_vararg.ll
@@ -52,9 +52,8 @@ define win64cc ptr @f9(i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5, i64
 ; CHECK-LABEL: f9:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    str x18, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    add x8, sp, #24
 ; CHECK-NEXT:    add x0, sp, #24
-; CHECK-NEXT:    str x8, [sp, #8]
+; CHECK-NEXT:    str x0, [sp, #8]
 ; CHECK-NEXT:    ldr x18, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT:    ret
 ;
@@ -78,9 +77,8 @@ define win64cc ptr @f8(i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5, i64
 ; CHECK-LABEL: f8:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    str x18, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    add x8, sp, #16
 ; CHECK-NEXT:    add x0, sp, #16
-; CHECK-NEXT:    str x8, [sp, #8]
+; CHECK-NEXT:    str x0, [sp, #8]
 ; CHECK-NEXT:    ldr x18, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT:    ret
 ;
@@ -104,10 +102,9 @@ define win64cc ptr @f7(i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5, i64
 ; CHECK-LABEL: f7:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    str x18, [sp, #-32]! // 8-byte Folded Spill
-; CHECK-NEXT:    add x8, sp, #24
 ; CHECK-NEXT:    add x0, sp, #24
 ; CHECK-NEXT:    str x7, [sp, #24]
-; CHECK-NEXT:    str x8, [sp, #8]
+; CHECK-NEXT:    str x0, [sp, #8]
 ; CHECK-NEXT:    ldr x18, [sp], #32 // 8-byte Folded Reload
 ; CHECK-NEXT:    ret
 ;
diff --git a/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll b/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll
index e18a5f695ba29..98c3071de3ae8 100644
--- a/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll
@@ -2156,6 +2156,7 @@ define <4 x i16> @concat_vector_v4i16_const() {
 ; CHECK-LABEL: concat_vector_v4i16_const:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
  %r = shufflevector <1 x i16> zeroinitializer, <1 x i16> undef, <4 x i32> zeroinitializer
  ret <4 x i16> %r
@@ -2183,6 +2184,7 @@ define <8 x i8> @concat_vector_v8i8_const() {
 ; CHECK-LABEL: concat_vector_v8i8_const:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
  %r = shufflevector <1 x i8> zeroinitializer, <1 x i8> undef, <8 x i32> zeroinitializer
  ret <8 x i8> %r
diff --git a/llvm/test/CodeGen/AArch64/arm64-vector-ext.ll b/llvm/test/CodeGen/AArch64/arm64-vector-ext.ll
index 197a385b0e7cb..91f0fbcd5c46b 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vector-ext.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vector-ext.ll
@@ -1,15 +1,16 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; RUN: llc < %s -mtriple=arm64-eabi -aarch64-neon-syntax=apple | FileCheck %s
 
-;CHECK: @func30
-;CHECK: movi.4h v1, #1
-;CHECK: and.8b v0, v0, v1
-;CHECK: ushll.4s  v0, v0, #0
-;CHECK: str  q0, [x0]
-;CHECK: ret
-
 %T0_30 = type <4 x i1>
 %T1_30 = type <4 x i32>
 define void @func30(%T0_30 %v0, ptr %p1) {
+; CHECK-LABEL: func30:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    movi.4h v1, #1
+; CHECK-NEXT:    and.8b v0, v0, v1
+; CHECK-NEXT:    ushll.4s v0, v0, #0
+; CHECK-NEXT:    str q0, [x0]
+; CHECK-NEXT:    ret
   %r = zext %T0_30 %v0 to %T1_30
   store %T1_30 %r, ptr %p1
   ret void
@@ -18,9 +19,11 @@ define void @func30(%T0_30 %v0, ptr %p1) {
 ; Extend from v1i1 was crashing things (PR20791). Make sure we do something
 ; sensible instead.
 define <1 x i32> @autogen_SD7918() {
-; CHECK-LABEL: autogen_SD7918
-; CHECK: movi.2d v0, #0000000000000000
-; CHECK-NEXT: ret
+; CHECK-LABEL: autogen_SD7918:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    movi.2d v0, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
+; CHECK-NEXT:    ret
   %I29 = insertelement <1 x i1> zeroinitializer, i1 false, i32 0
   %ZE = zext <1 x i1> %I29 to <1 x i32>
   ret <1 x i32> %ZE
diff --git a/llvm/test/CodeGen/AArch64/arm64-vshuffle.ll b/llvm/test/CodeGen/AArch64/arm64-vshuffle.ll
index b225d9a1acaf5..fd0f2433f2c2b 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vshuffle.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vshuffle.ll
@@ -1,9 +1,11 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -mcpu=cyclone | FileCheck %s
 
 define <8 x i1> @test1() {
 ; CHECK-LABEL: test1:
 ; CHECK:       ; %bb.0: ; %entry
 ; CHECK-NEXT:    movi.16b v0, #0
+; CHECK-NEXT:    ; kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
 entry:
   %Shuff = shufflevector <8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,
@@ -58,9 +60,14 @@ bb:
 ; CHECK:         .byte   0                       ; 0x0
 ; CHECK:         .byte   0                       ; 0x0
 define <16 x i1> @test4(ptr %ptr, i32 %v) {
-; CHECK-LABEL: _test4:
-; CHECK:         adrp    x[[REG3:[0-9]+]], lCPI3_0@PAGE
-; CHECK:         ldr     q[[REG2:[0-9]+]], [x[[REG3]], lCPI3_0@PAGEOFF]
+; CHECK-LABEL: test4:
+; CHECK:       ; %bb.0: ; %bb
+; CHECK-NEXT:  Lloh0:
+; CHECK-NEXT:    adrp x8, lCPI3_0@PAGE
+; CHECK-NEXT:  Lloh1:
+; CHECK-NEXT:    ldr q0, [x8, lCPI3_0@PAGEOFF]
+; CHECK-NEXT:    ret
+; CHECK-NEXT:    .loh AdrpLdr Lloh0, Lloh1
 bb:
   %Shuff = shufflevector <16 x i1> zeroinitializer,
      <16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1,
diff --git a/llvm/test/CodeGen/AArch64/bitcast.ll b/llvm/test/CodeGen/AArch64/bitcast.ll
index 20f19fddf790a..d462d2269f6bc 100644
--- a/llvm/test/CodeGen/AArch64/bitcast.ll
+++ b/llvm/test/CodeGen/AArch64/bitcast.ll
@@ -8,6 +8,7 @@ define <4 x i16> @foo1(<2 x i32> %a) {
 ; CHECK-SD-LABEL: foo1:
 ; CHECK-SD:       // %bb.0:
 ; CHECK-SD-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-SD-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: foo1:
@@ -28,6 +29,7 @@ define <4 x i16> @foo2(<2 x i32> %a) {
 ; CHECK-SD-LABEL: foo2:
 ; CHECK-SD:       // %bb.0:
 ; CHECK-SD-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-SD-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: foo2:
diff --git a/llvm/test/CodeGen/AArch64/combine-mul.ll b/llvm/test/CodeGen/AArch64/combine-mul.ll
index ff6d1a571a084..5d65b21f902b7 100644
--- a/llvm/test/CodeGen/AArch64/combine-mul.ll
+++ b/llvm/test/CodeGen/AArch64/combine-mul.ll
@@ -18,6 +18,7 @@ define <4 x i1> @PR48683_vec(<4 x i32> %x) {
 ; CHECK-LABEL: PR48683_vec:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
   %a = mul <4 x i32> %x, %x
   %b = and <4 x i32> %a, <i32 2, i32 2, i32 2, i32 2>
@@ -29,6 +30,7 @@ define <4 x i1> @PR48683_vec_undef(<4 x i32> %x) {
 ; CHECK-LABEL: PR48683_vec_undef:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
   %a = mul <4 x i32> %x, %x
   %b = and <4 x i32> %a, <i32 2, i32 2, i32 2, i32 undef>
diff --git a/llvm/test/CodeGen/AArch64/ext-narrow-index.ll b/llvm/test/CodeGen/AArch64/ext-narrow-index.ll
index f62cfef9baf28..017971df99d6e 100644
--- a/llvm/test/CodeGen/AArch64/ext-narrow-index.ll
+++ b/llvm/test/CodeGen/AArch64/ext-narrow-index.ll
@@ -251,6 +251,7 @@ define <8 x i8> @i8_zero_off22(<16 x i8> %arg1) {
 ; CHECK-SD-LABEL: i8_zero_off22:
 ; CHECK-SD:       // %bb.0: // %entry
 ; CHECK-SD-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-SD-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GISEL-LABEL: i8_zero_off22:
@@ -302,6 +303,7 @@ define <4 x i16> @i16_zero_off8(<8 x i16> %arg1) {
 ; CHECK-LABEL: i16_zero_off8:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
 entry:
   %shuffle = shufflevector <8 x i16> %arg1, <8 x i16> zeroinitializer, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
@@ -346,6 +348,7 @@ define <2 x i32> @i32_zero_off4(<4 x i32> %arg1) {
 ; CHECK-LABEL: i32_zero_off4:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
 entry:
   %shuffle = shufflevector <4 x i32> %arg1, <4 x i32> zeroinitializer, <2 x i32> <i32 4, i32 5>
diff --git a/llvm/test/CodeGen/AArch64/fast-isel-const-float.ll b/llvm/test/CodeGen/AArch64/fast-isel-const-float.ll
index 4de2c934a672e..fbb71ba1c295f 100644
--- a/llvm/test/CodeGen/AArch64/fast-isel-const-float.ll
+++ b/llvm/test/CodeGen/AArch64/fast-isel-const-float.ll
@@ -9,6 +9,7 @@ define float @select_fp_const() {
 ; GISEL-LABEL: select_fp_const:
 ; GISEL:       // %bb.0: // %entry
 ; GISEL-NEXT:    movi v0.2s, #79, lsl #24
+; GISEL-NEXT:    // kill: def $s0 killed $s0 killed $d0
 ; GISEL-NEXT:    ret
 ;
 ; FISEL-LABEL: select_fp_const:
diff --git a/llvm/test/CodeGen/AArch64/movi64_sve.ll b/llvm/test/CodeGen/AArch64/movi64_sve.ll
index 1d4e00d0c3d10..3253b35d77470 100644
--- a/llvm/test/CodeGen/AArch64/movi64_sve.ll
+++ b/llvm/test/CodeGen/AArch64/movi64_sve.ll
@@ -12,6 +12,7 @@ define <2 x i64> @movi_1_v2i64() {
 ; SVE-LABEL: movi_1_v2i64:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #1 // =0x1
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <2 x i64> splat (i64 1)
 }
@@ -26,6 +27,7 @@ define <2 x i64> @movi_127_v2i64() {
 ; SVE-LABEL: movi_127_v2i64:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #127 // =0x7f
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <2 x i64> splat (i64 127)
 }
@@ -40,6 +42,7 @@ define <2 x i64> @movi_m128_v2i64() {
 ; SVE-LABEL: movi_m128_v2i64:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #-128 // =0xffffffffffffff80
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <2 x i64> splat (i64 -128)
 }
@@ -54,6 +57,7 @@ define <2 x i64> @movi_256_v2i64() {
 ; SVE-LABEL: movi_256_v2i64:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #256 // =0x100
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <2 x i64> splat (i64 256)
 }
@@ -68,6 +72,7 @@ define <2 x i64> @movi_32512_v2i64() {
 ; SVE-LABEL: movi_32512_v2i64:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #32512 // =0x7f00
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <2 x i64> splat (i64 32512)
 }
@@ -82,6 +87,7 @@ define <2 x i64> @movi_m32768_v2i64() {
 ; SVE-LABEL: movi_m32768_v2i64:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #-32768 // =0xffffffffffff8000
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <2 x i64> splat (i64 -32768)
 }
@@ -98,6 +104,7 @@ define <4 x i32> @movi_v4i32_1() {
 ; SVE-LABEL: movi_v4i32_1:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #127 // =0x7f
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <4 x i32> <i32 127, i32 0, i32 127, i32 0>
 }
@@ -112,6 +119,7 @@ define <4 x i32> @movi_v4i32_2() {
 ; SVE-LABEL: movi_v4i32_2:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #32512 // =0x7f00
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <4 x i32> <i32 32512, i32 0, i32 32512, i32 0>
 }
@@ -126,6 +134,7 @@ define <8 x i16> @movi_v8i16_1() {
 ; SVE-LABEL: movi_v8i16_1:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #127 // =0x7f
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <8 x i16> <i16 127, i16 0, i16 0, i16 0, i16 127, i16 0, i16 0, i16 0>
 }
@@ -140,6 +149,7 @@ define <8 x i16> @movi_v8i16_2() {
 ; SVE-LABEL: movi_v8i16_2:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #32512 // =0x7f00
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <8 x i16> <i16 32512, i16 0, i16 0, i16 0, i16 32512, i16 0, i16 0, i16 0>
 }
@@ -154,6 +164,7 @@ define <16 x i8> @movi_v16i8_1() {
 ; SVE-LABEL: movi_v16i8_1:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #127 // =0x7f
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <16 x i8> <i8 127, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 127, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>
 }
@@ -168,6 +179,7 @@ define <16 x i8> @movi_v16i8_2() {
 ; SVE-LABEL: movi_v16i8_2:
 ; SVE:       // %bb.0:
 ; SVE-NEXT:    mov z0.d, #32512 // =0x7f00
+; SVE-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; SVE-NEXT:    ret
   ret <16 x i8> <i8 0, i8 127, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 127, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>
 }
diff --git a/llvm/test/CodeGen/AArch64/neon-abd.ll b/llvm/test/CodeGen/AArch64/neon-abd.ll
index 314edd2fc81a7..c81438aa2250e 100644
--- a/llvm/test/CodeGen/AArch64/neon-abd.ll
+++ b/llvm/test/CodeGen/AArch64/neon-abd.ll
@@ -525,6 +525,7 @@ define <4 x i16> @combine_sabd_4h_zerosign(<4 x i16> %a, <4 x i16> %b) #0 {
 ; CHECK-LABEL: combine_sabd_4h_zerosign:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
   %a.ext = ashr <4 x i16> %a, <i16 7, i16 8, i16 9, i16 10>
   %b.ext = ashr <4 x i16> %b, <i16 11, i16 12, i16 13, i16 14>
diff --git a/llvm/test/CodeGen/AArch64/neon-compare-instructions.ll b/llvm/test/CodeGen/AArch64/neon-compare-instructions.ll
index 11b3b62ec1c8d..47ceeece0a6e5 100644
--- a/llvm/test/CodeGen/AArch64/neon-compare-instructions.ll
+++ b/llvm/test/CodeGen/AArch64/neon-compare-instructions.ll
@@ -2482,6 +2482,7 @@ define <2 x i32> @fcmal2xfloat(<2 x float> %A, <2 x float> %B) {
 ; CHECK-SD-LABEL: fcmal2xfloat:
 ; CHECK-SD:       // %bb.0:
 ; CHECK-SD-NEXT:    movi v0.2d, #0xffffffffffffffff
+; CHECK-SD-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: fcmal2xfloat:
@@ -2535,6 +2536,7 @@ define <2 x i32> @fcmnv2xfloat(<2 x float> %A, <2 x float> %B) {
 ; CHECK-LABEL: fcmnv2xfloat:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
   %tmp3 = fcmp false <2 x float> %A, %B
   %tmp4 = sext <2 x i1> %tmp3 to <2 x i32>
diff --git a/llvm/test/CodeGen/AArch64/neon-mov.ll b/llvm/test/CodeGen/AArch64/neon-mov.ll
index 5be9394f61b30..4e5b099d62e7f 100644
--- a/llvm/test/CodeGen/AArch64/neon-mov.ll
+++ b/llvm/test/CodeGen/AArch64/neon-mov.ll
@@ -16,6 +16,7 @@ define <8 x i8> @movi8b_0() {
 ; CHECK-LABEL: movi8b_0:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
    ret <8 x i8> zeroinitializer
 }
@@ -48,6 +49,7 @@ define <2 x i32> @movi2s_0() {
 ; CHECK-LABEL: movi2s_0:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
    ret <2 x i32> zeroinitializer
 }
@@ -417,6 +419,7 @@ define <2 x float> @fmov2s_0() {
 ; CHECK-LABEL: fmov2s_0:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT:    ret
 	ret <2 x float> zeroinitializer
 }
diff --git a/llvm/test/CodeGen/AArch64/remat-const-float-simd.ll b/llvm/test/CodeGen/AArch64/remat-const-float-simd.ll
index 2a19d258f1adf..6f1b68dbcd667 100644
--- a/llvm/test/CodeGen/AArch64/remat-const-float-simd.ll
+++ b/llvm/test/CodeGen/AArch64/remat-const-float-simd.ll
@@ -10,6 +10,7 @@ define float @foo() {
 ; CHECK-NEON-LABEL: foo:
 ; CHECK-NEON:       // %bb.0: // %entry
 ; CHECK-NEON-NEXT:    movi v0.2s, #79, lsl #24
+; CHECK-NEON-NEXT:    // kill: def $s0 killed $s0 killed $d0
 ; CHECK-NEON-NEXT:    ret
 ;
 ; CHECK-SCALAR-LABEL: foo:
diff --git a/llvm/test/CodeGen/AArch64/sve-implicit-zero-filling.ll b/llvm/test/CodeGen/AArch64/sve-implicit-zero-filling.ll
index ebec275c92c52..1bdfac8d6c979 100644
--- a/llvm/test/CodeGen/AArch64/sve-implicit-zero-filling.ll
+++ b/llvm/test/CodeGen/AArch64/sve-implicit-zero-filling.ll
@@ -195,8 +195,8 @@ define <vscale x 2 x i64> @zero_fill_non_zero_index(<vscale x 2 x i1> %pg, <vsca
 define <vscale x 4 x i64> @zero_fill_type_mismatch(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) #0 {
 ; CHECK-LABEL: zero_fill_type_mismatch:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    uminv d0, p0, z0.d
 ; CHECK-NEXT:    movi v1.2d, #0000000000000000
+; CHECK-NEXT:    uminv d0, p0, z0.d
 ; CHECK-NEXT:    ret
   %t1 = call i64 @llvm.aarch64.sve.uminv.nxv2i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a)
   %t2 = insertelement <vscale x 4 x i64> zeroinitializer, i64 %t1, i64 0
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-insert-vector-elt.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-insert-vector-elt.ll
index ad00e99b704dd..275d13ebfd949 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-insert-vector-elt.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-insert-vector-elt.ll
@@ -419,6 +419,7 @@ define <1 x i64> @insertelement_v1i64(<1 x i64> %op1) {
 ; CHECK-LABEL: insertelement_v1i64:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov z0.d, #5 // =0x5
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $z0
 ; CHECK-NEXT:    ret
 ;
 ; NONEON-NOSVE-LABEL: insertelement_v1i64:
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
index 37435e35ceabf..9c7a3d5046d0e 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
@@ -39,6 +39,7 @@ define <2 x i64> @fixed_vec_zero_constant() {
 ; CHECK-LABEL: fixed_vec_zero_constant:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov z0.d, #0 // =0x0
+; CHECK-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; CHECK-NEXT:    ret
 ;
 ; NONEON-NOSVE-LABEL: fixed_vec_zero_constant:
@@ -53,6 +54,7 @@ define <2 x double> @fixed_vec_fp_zero_constant() {
 ; CHECK-LABEL: fixed_vec_fp_zero_constant:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov z0.d, #0 // =0x0
+; CHECK-NEXT:    // kill: def $q0 killed $q0 killed $z0
 ; CHECK-NEXT:    ret
 ;
 ; NONEON-NOSVE-LABEL: fixed_vec_fp_zero_constant:
diff --git a/llvm/test/CodeGen/AArch64/win64_vararg.ll b/llvm/test/CodeGen/AA...
[truncated]

github-actions bot commented Oct 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

rez5427 marked this pull request as draft October 12, 2025 11:00
rez5427 force-pushed the regcol branch 8 times, most recently from a503ad2 to 3d25933 on October 15, 2025 11:49
rez5427 marked this pull request as ready for review October 15, 2025 12:57
rez5427 (Contributor, Author) commented Oct 15, 2025

@arsenm Can you please review this?

bzEq (Collaborator) commented Oct 15, 2025

From the motivating case, do you know why MachineCSE fails to optimize this?

rez5427 (Contributor, Author) commented Oct 16, 2025

From the motivating case, do you know why MachineCSE fails to optimize this?

MachineCSE runs before the register coalescer, so at that point it sees something like:

    li a2, 42
    a0 = copy a2

There is only one li at this stage; the duplicate only appears after the coalescer rematerializes the constant, so MachineCSE has nothing to eliminate.

rez5427 marked this pull request as draft October 21, 2025 01:13
rez5427 marked this pull request as ready for review October 21, 2025 01:13
github-actions bot commented Oct 21, 2025

✅ With the latest revision this PR passed the undef deprecator.

rez5427 force-pushed the regcol branch 3 times, most recently from 9e253cc to 68eda02 on October 21, 2025 17:51
rez5427 requested a review from aengelke October 22, 2025 01:04
rez5427 (Contributor, Author) commented Oct 22, 2025

ping

rez5427 (Contributor, Author) commented Oct 27, 2025

@arsenm @preames @lukel97 @aengelke ping

rez5427 (Contributor, Author) commented Nov 3, 2025

ping

antoniofrighetto (Contributor) left a comment

I agree MachineCSE has no real chance to CSE anything here. One might expect RegisterCoalescer to be able to merge vreg %1 and $x10 during aggressive coalescing, ending up with something like:

bb.0.entry:
  %0:gpr = LUI target-flags(riscv-hi) @bytes1
  $x10   = ADDI $x0, 42
  SW  $x10, killed %0:gpr, target-flags(riscv-lo) @bytes1
  PseudoRET implicit killed $x10

Though I don't think this is taken into account in joinReservedPhysReg (and even so, $x10 is not a reserved register). Considering that the virtual register rewriter will remove the identity copy, I believe that preserving the copy here and not recomputing the constant is the correct direction (maybe cc @qcolombet to confirm this).

Comment on lines 1328 to 1331

// Skip rematerialization for physical registers used as return values within
// the same basic block to enable better coalescing.
if (DstReg.isPhysical()) {

Could you please move this into a dedicated helper, e.g., shouldSkipRematerializationForReturn and add a brief doc-comment?

for (MachineBasicBlock::iterator I = MBB->begin(); &*I != CopyMI; ++I) {
  if (&*I != DefMI &&
      I->isIdenticalTo(*DefMI, MachineInstr::IgnoreDefs)) {

Why do we need to check for a duplicate identical to DefMI? Is that necessary? We also check isReturn() twice. I feel like this whole first part of the heuristic could be simplified into something like:

MachineInstr *Ret = nullptr;
bool RedefinedAfterCopy = false;
if (DstReg.isPhysical() && DefMI->getParent() == MBB) {
  for (auto I = std::next(CopyMI->getIterator()); I != MBB->end(); ++I) {
    if (I->isReturn()) { Ret = &*I; break; }
    if (I->modifiesRegister(DstReg, TRI)) { RedefinedAfterCopy = true; }
  }
}

if (!RedefinedAfterCopy && Ret && Ret->readsRegister(DstReg, TRI)) {
  // Check uses...
}

qcolombet (Collaborator) left a comment

I don't feel this is the right way to fix this problem.
This looks too specific and not really like a coalescing problem.

antoniofrighetto (Contributor)

I don't feel this is the right way to fix this problem. This looks too specific and not really like a coalescing problem.

Could you please suggest an alternative direction for this? I second that this is not a coalescing problem, though couldn't we let RegisterCoalescer avoid suboptimally recomputing the constant value when it's unneeded, as in the cases above?

qcolombet (Collaborator)

I don't feel this is the right way to fix this problem. This looks too specific and not really like a coalescing problem.

Could you please suggest an alternative direction for this? I second that this is not a coalescing problem, though couldn't we let RegisterCoalescer avoid suboptimally recomputing the constant value when it's unneeded, as in the cases above?

@rez5427 What does the IR before instruction selection look like?
The IR you shared and the MachineIR don't match in your example.

antoniofrighetto (Contributor)

I don't feel this is the right way to fix this problem. This looks too specific and not really like a coalescing problem.

Could you please suggest an alternative direction for this? I second that this is not a coalescing problem, though couldn't we let RegisterCoalescer avoid suboptimally recomputing the constant value when it's unneeded, as in the cases above?

@rez5427 What does the IR before instruction selection look like? The IR you shared and the MachineIR don't match in your example.

IR before ISel:

@bytes1 = external global ptr
define i8 @foo() {
  store i32 42, ptr @bytes1, align 4
  %l = load i8, ptr @bytes1, align 1
  ret i8 %l
}

Machine IR after ISel:

bb.0 (%ir-block.0):
  %0:gpr = LUI target-flags(riscv-hi) @bytes1
  %1:gpr = ADDI $x0, 42
  SW %1:gpr, killed %0:gpr, target-flags(riscv-lo) @bytes1 :: (store (s32) into @bytes1, align 8)
  $x10 = COPY %1:gpr
  PseudoRET implicit $x10

Subsequent passes do not change the MIR until RegisterCoalescer, which changes it as follows:

0B      bb.0 (%ir-block.0):
16B       %0:gpr = LUI target-flags(riscv-hi) @bytes1
32B       %1:gpr = ADDI $x0, 42
48B       SW %1:gpr, %0:gpr, target-flags(riscv-lo) @bytes1 :: (store (s32) into @bytes1, align 8)
64B       $x10 = ADDI $x0, 42
80B       PseudoRET implicit killed $x10

The constant move has been rematerialized. This happens with optimized IR too: https://llvm.godbolt.org/z/WWM733rdn.

bzEq (Collaborator) commented Nov 5, 2025

This happens with optimized IR too:

A further reduced version:

@bytes1 = external local_unnamed_addr global ptr

define noundef i8 @foo() local_unnamed_addr #0 {
  store i32 42, ptr @bytes1, align 8
  ret i8 42
}
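
For reference, the ideal lowering here (hand-written, assuming the constant is materialized only once, directly into a0) would be:

    lui  a1, %hi(bytes1)
    li   a0, 42
    sw   a0, %lo(bytes1)(a1)
    ret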

bzEq (Collaborator) commented Nov 5, 2025

Would it be viable to introduce a simple LVN (local value numbering) pass in the backend post-RA?

rez5427 (Contributor, Author) commented Nov 6, 2025

I think the rematerialization done by the register coalescer plays much the same role as GCC's early_remat. LLVM's register coalescer decides to remat here because the return register is used, while GCC's early_remat decides not to. I've put part of the GCC log here:

cast_and_load_1.c.319r.sched1:

;;   ======================================================
;;   -- basic block 2 from 5 to 13 -- before reload
;;   ======================================================

;; Pressure summary (bb 2): GR_REGS:3 FP_REGS:0 V_REGS:0

;;	  0--> b  0: i   5 r135=high(`bytes1')                     :alu:GR_REGS+1(1)FP_REGS+0(0)V_REGS+0(0):model 0
;;	  1--> b  0: i   6 r136=0x2a                               :alu:GR_REGS+1(1)FP_REGS+0(0)V_REGS+0(0):model 1
;;	  2--> b  0: i   7 [r135+low(`bytes1')]=r136#0             :alu:GR_REGS+0(-1)FP_REGS+0(0)V_REGS+0(0):model 2
;;	  3--> b  0: i  12 a0=r136                                 :alu:GR_REGS+1(0)FP_REGS+0(0)V_REGS+0(0):model 3
;;	  4--> b  0: i  13 use a0                                  :nothing:GR_REGS+0(0)FP_REGS+0(0)V_REGS+0(0):model 4
;;	Ready list (final):  
;;   total time = 4
;;   new head = 5
;;   new tail = 13

cast_and_load_1.c.321r.early_remat:


;; Function cast_and_load_1 (cast_and_load_1, funcdef_no=0, decl_uid=2297, cgraph_uid=1, symbol_order=0)

starting the processing of deferred insns
ending the processing of deferred insns


cast_and_load_1

Dataflow summary:
;;  fully invalidated by EH 	 0 [zero] 1 [ra] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 [t2] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 [t4] 30 [t5] 31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 [ft6] 39 [ft7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11] 66 [vl] 67 [vtype] 68 [vxrm] 69 [frm] 70 [vxsat] 71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 [N/A] 76 [N/A] 77 [N/A] 78 [N/A] 79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A] 83 [N/A] 84 [N/A] 85 [N/A] 86 [N/A] 87 [N/A] 88 [N/A] 89 [N/A] 90 [N/A] 91 [N/A] 92 [N/A] 93 [N/A] 94 [N/A] 95 [N/A] 96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 [v5] 102 [v6] 103 [v7] 104 [v8] 105 [v9] 106 [v10] 107 [v11] 108 [v12] 109 [v13] 110 [v14] 111 [v15] 112 [v16] 113 [v17] 114 [v18] 115 [v19] 116 [v20] 117 [v21] 118 [v22] 119 [v23] 120 [v24] 121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 [v29] 126 [v30] 127 [v31]
;;  hardware regs used 	 2 [sp] 64 [arg] 65 [frame] 68 [vxrm] 69 [frm]
;;  regular block artificial uses 	 2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  eh block artificial uses 	 2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  entry block defs 	 1 [ra] 2 [sp] 8 [s0] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 64 [arg] 65 [frame] 68 [vxrm] 69 [frm]
;;  exit block uses 	 1 [ra] 2 [sp] 8 [s0] 10 [a0] 65 [frame] 68 [vxrm] 69 [frm]
;;  regs ever live 	 10 [a0] 69 [frm]
;;  ref usage 	r1={1d,1u} r2={1d,2u} r8={1d,2u} r10={2d,2u} r11={1d} r12={1d} r13={1d} r14={1d} r15={1d} r16={1d} r17={1d} r42={1d} r43={1d} r44={1d} r45={1d} r46={1d} r47={1d} r48={1d} r49={1d} r64={1d,1u} r65={1d,2u} r68={1d,1u} r69={1d,1u} r135={1d,1u} r136={1d,2u} 
;;    total ref usage 41{26d,15u,0e} in 5{5 regular + 0 call} insns.
(note 1 0 15 NOTE_INSN_DELETED)
(note 15 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 15 17 2 NOTE_INSN_FUNCTION_BEG)
(note 17 2 5 2 NOTE_INSN_DELETED)
(insn 5 17 6 2 (set (reg/f:DI 135)
        (high:DI (symbol_ref:DI ("bytes1") [flags 0xc4]  <var_decl 0x7403341e32f8 bytes1>))) "out/value-simplify-pointer-info/cast_and_load_1/cast_and_load_1.c":35:10 277 {*movdi_64bit}
     (nil))
(insn 6 5 7 2 (set (reg:DI 136)
        (const_int 42 [0x2a])) "out/value-simplify-pointer-info/cast_and_load_1/cast_and_load_1.c":35:10 277 {*movdi_64bit}
     (nil))
(insn 7 6 12 2 (set (mem/c:SI (lo_sum:DI (reg/f:DI 135)
                (symbol_ref:DI ("bytes1") [flags 0xc4]  <var_decl 0x7403341e32f8 bytes1>)) [1 bytes1+0 S4 A32])
        (subreg:SI (reg:DI 136) 0)) "out/value-simplify-pointer-info/cast_and_load_1/cast_and_load_1.c":35:10 278 {*movsi_internal}
     (expr_list:REG_DEAD (reg/f:DI 135)
        (nil)))
(insn 12 7 13 2 (set (reg/i:DI 10 a0)
        (reg:DI 136)) "out/value-simplify-pointer-info/cast_and_load_1/cast_and_load_1.c":38:1 277 {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 136)
        (expr_list:REG_EQUAL (const_int 42 [0x2a])
            (nil))))
(insn 13 12 20 2 (use (reg/i:DI 10 a0)) "out/value-simplify-pointer-info/cast_and_load_1/cast_and_load_1.c":38:1 -1
     (nil))
(note 20 13 0 NOTE_INSN_DELETED)

cast_and_load_1.c.322r.ira:

+++Allocating 16 bytes for conflict table (uncompressed size 16)
;; a0(r136,l0) conflicts: a1(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a1(r135,l0) conflicts: a0(r136,l0)
;;     total conflict hard regs:
;;     conflict hard regs:


  pref0:a0(r136)<-hr10@2000
  regions=1, blocks=3, points=2
    allocnos=2 (big 0), copies=0, conflicts=0, ranges=2

If you want the full log, you can reproduce it with GCC 15: riscv64-unknown-linux-gnu-gcc -std=c11 -march=rv64gcv_zvl128b -O3 -fomit-frame-pointer -S -fdump-tree-all -fdump-rtl-all -dumpdir

I think maybe extracting the remat-related logic out of the register coalescer would be good?

rez5427 (Contributor, Author) commented Nov 6, 2025

Would it be viable to introduce a simple LVN (local value numbering) pass in the backend post-RA?

I think it should be done pre-RA, since it relates to register allocation.

bzEq (Collaborator) commented Nov 6, 2025

Would it be viable to introduce a simple LVN (local value numbering) pass in the backend post-RA?

I think it should be done pre-RA, since it relates to register allocation.

Extending MachineCSE with LVN might be easier IMO. Currently, MachineCSE runs on SSA MIR and performs CSE via scoped value numbering; it could be extended to handle post-RA code via LVN.
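
As a rough illustration of the idea, here is a minimal sketch of a post-RA, single-block redundancy scan (hypothetical code, not in-tree: eliminateLocalRedefs is an invented helper name, the rewrite-to-COPY step is elided, and invalidation of entries whose *source* registers get clobbered is left out for brevity):

static bool eliminateLocalRedefs(MachineBasicBlock &MBB,
                                 const TargetInstrInfo *TII,
                                 const TargetRegisterInfo *TRI) {
  bool Changed = false;
  // Cheap defs seen so far whose result registers have not been clobbered.
  SmallVector<MachineInstr *, 16> Available;
  for (MachineInstr &MI : MBB) {
    // Invalidate remembered defs whose result register MI redefines.
    llvm::erase_if(Available, [&](MachineInstr *Prev) {
      return llvm::any_of(Prev->defs(), [&](const MachineOperand &MO) {
        return MO.isReg() && MI.modifiesRegister(MO.getReg(), TRI);
      });
    });
    if (!TII->isAsCheapAsAMove(MI) || MI.mayLoadOrStore() ||
        MI.getNumExplicitDefs() != 1)
      continue;
    // Same opcode and operands (ignoring the def) as an earlier instruction?
    auto It = llvm::find_if(Available, [&](MachineInstr *Prev) {
      return MI.isIdenticalTo(*Prev, MachineInstr::IgnoreDefs);
    });
    if (It != Available.end()) {
      // MI recomputes a value already live in (*It)'s def register; it could
      // be rewritten into a COPY (or erased if the registers match).
      Changed = true;
      continue;
    }
    Available.push_back(&MI);
  }
  return Changed;
}

Whether this actually saves an instruction on the motivating case would still depend on a later pass (e.g. copy propagation) merging the two registers rather than leaving a mv behind.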

antoniofrighetto (Contributor)

Kind ping @qcolombet.
