Skip to content

Improve constant folding for associative operations#23215

Open
hadrian-reppas wants to merge 5 commits into
apache:mainfrom
hadrian-reppas:assoc_expr_simplify
Open

Improve constant folding for associative operations#23215
hadrian-reppas wants to merge 5 commits into
apache:mainfrom
hadrian-reppas:assoc_expr_simplify

Conversation

@hadrian-reppas

@hadrian-reppas hadrian-reppas commented Jun 27, 2026

Copy link
Copy Markdown

Which issue does this PR close?

Closes #17158

Rationale for this change

The query planner currently does not optimize expressions like c1 || 'a' || 'b' || c2 (which it could simplify it to c1 || 'ab' || c2). This is because the ConstEvaluator looks for BinaryExprs where both arguments are Literals. The expression above looks like

        ||
       /  \
      ||   c2
     /  \
   ||   'b'
  /  \
c1   'a'

so it does not get optimized. Since string concatenation is associative, we can rewrite the expression to look like

        ||
       /  \
      ||   c2
     /  \
   c1   ||
       /  \
     'a'  'b'

which the ConstEvaluator can optimize to

        ||
       /  \
      ||   c2
     /  \
   c1   'ab'

We start by flattening nested occurrences of a the same associative operator into a list. Then we rebuild the expression tree from this list so that adjacent Literals appear together in the same subtree. This rewrite happens in the expr_simplifier::Simplifier so that the ConstEvaluator can do it's work on a subsequent pass.

What changes are included in this PR?

This PR implements the has_adjacent_literals and reassociate_literals functions.

There are some remaining questions:

  1. Is this a sensible approach?
  2. What other operators/types are associative?
  3. Should we do something similar for commutative operations (eg move all literals to the front so they can be constant folded together)?

Are these changes tested?

Yes, but I'll add more tests.

Are there any user-facing changes?

No

@github-actions github-actions Bot added the optimizer Optimizer rules label Jun 27, 2026
impl<'n> TreeNodeVisitor<'n> for AdjacentLiteralVisitor {
type Node = Expr;

fn f_down(&mut self, node: &'n Self::Node) -> Result<TreeNodeRecursion> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last_expr_was_literal is not reset when visiting a nonliteral. For (lit(1) + col("c1")) + lit(2), the rewrite appears to rebuild the same expression but still results in Transformed::yes

Instead you could make has_adjacent_literals inspect the flattened operand sequence for the same operator, and reset adjacency whenever a nonliteral operand appears.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the AdjacentLiteralVisitor and changed it so that it updates last_expr_was_literal correctly. is_associative_with_adjacent_literals needs to take the expression by reference because we call it from the big match statement in the Simplifier. I'm not sure I can construct the flattened operand sequence with just a reference to the expression. I suppose I could construct a Vec<&Expr>, but I'm not sure how much better that would be. Unless I could somehow write one function that does &Expr -> Vec<&Expr> and Expr -> Vec<Expr>.

}

#[test]
fn test_simplify_nested_associative_expr() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small: maybe add negative coverage for float addition and for separated literals like (1 + c1) + 2

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more tests

@hadrian-reppas

Copy link
Copy Markdown
Author

Also I realized that you have to check that the types match because otherwise reassociating the expression can change when casts happen which can change the answer if there is overflow. For example:

> CREATE TABLE t(x INT);
> INSERT INTO t(x) VALUES (1);
> SELECT (x + CAST(1 AS TINYINT)) + CAST(127 AS TINYINT) FROM t;
129
> SELECT x + (CAST(1 AS TINYINT) + CAST(127 AS TINYINT)) FROM t;
-127

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Simplify col1 || 'a' || 'b' || col2 to col1 || 'ab' || col2

2 participants