-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Fix double-unescaping of SQL string literals in single-stage engine #17438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
d8f8ae2 to
7dfb5c8
Compare
7dfb5c8 to
f23ac59
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #17438 +/- ##
============================================
+ Coverage 63.26% 63.29% +0.03%
- Complexity 1474 1476 +2
============================================
Files 3152 3163 +11
Lines 187881 188826 +945
Branches 28765 28900 +135
============================================
+ Hits 118855 119516 +661
- Misses 59810 60045 +235
- Partials 9216 9265 +49
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a double-unescaping bug in the single-stage query engine where SQL string literals containing multiple consecutive single quotes were incorrectly processed. The Calcite SQL parser already handles standard SQL escaping (converting '' to '), but legacy PQL behavior in RequestUtils::getLiteral was performing an additional unescaping pass, causing strings like '''' to become ' instead of the correct ''.
Key changes:
- Adds a configuration flag (
pinot.sse.parsing.legacy.literal.unescaping) to control the legacy unescaping behavior, defaulting to false (disabled) - Introduces
InstanceConfigProvidersingleton for global access to instance configuration across modules - Updates broker and server startup to initialize the
InstanceConfigProvider
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Adds configuration constants for the legacy literal unescaping feature flag |
| pinot-spi/src/main/java/org/apache/pinot/spi/config/instance/InstanceConfigProvider.java | New singleton class providing global access to instance configuration |
| pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java | Initializes InstanceConfigProvider during server startup |
| pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/BaseBrokerStarter.java | Initializes InstanceConfigProvider during broker startup |
| pinot-common/src/main/java/org/apache/pinot/common/utils/request/RequestUtils.java | Updates getLiteral to conditionally apply legacy unescaping based on configuration |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java | Adds integration tests for SQL string literal escaping behavior |
| pinot-common/src/test/java/org/apache/pinot/sql/parsers/CalciteSqlParserTest.java | Adds comprehensive unit tests for the double-unescaping fix |
pinot-spi/src/main/java/org/apache/pinot/spi/config/instance/InstanceConfigProvider.java
Outdated
Show resolved
Hide resolved
| boolean useLegacyUnescaping = | ||
| InstanceConfigProvider.getProperty(CommonConstants.Helix.CONFIG_OF_SSE_LEGACY_LITERAL_UNESCAPING, | ||
| CommonConstants.Helix.DEFAULT_SSE_LEGACY_LITERAL_UNESCAPING); | ||
| literal.setStringValue(useLegacyUnescaping ? StringUtils.replace(node.toValue(), "''", "'") : node.toValue()); |
Copilot
AI
Dec 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The configuration property is read on every literal conversion, which could be called many times per query. Consider caching this configuration value as a static final field initialized during class loading or reading it once per query to avoid repeated lookups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a valid concern, will refactor.
pinot-common/src/test/java/org/apache/pinot/sql/parsers/CalciteSqlParserTest.java
Outdated
Show resolved
Hide resolved
…nstanceConfigProvider.java Co-authored-by: Copilot <[email protected]>
| String.valueOf(Integer.MAX_VALUE); | ||
|
|
||
| // SQL parsing | ||
| public static final String CONFIG_OF_SSE_LEGACY_LITERAL_UNESCAPING = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is a broker config, consider following the existing config pattern: pinot.broker.sse.*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't just a broker config though, since it can be used in some server side paths as well. From the PR description:
Added
InstanceConfigProvidersingleton inpinot-spifor global access to instance config (broker / server, can be extended to controller / minion) from any module. Note that this pattern can be used in various scenarios in the future but is particularly useful here because we need to access the configuration from a static context that can be called from a broker path (Main SQL query parsing, time series request handler, RLS filter rewriter) or a server path (JSON index filter parsing, theta sketch agg function, time series server side execution).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other pattern we already have is pinot.query.sse.* which can be applied to multiple components
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I don't see any such configs, where do you see this config pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pinot.query.mse.stats.mode
''with'in string literals ([PQL Cleanup] Remove all PQL related code #8626)''''(4 quotes) →''(Calcite) →'(RequestUtils::getLiteral) instead of staying as''InstanceConfigProvidersingleton inpinot-spifor global access to instance config (broker / server, can be extended to controller / minion) from any module. Note that this pattern can be used in various scenarios in the future but is particularly useful here because we need to access the configuration from a static context that can be called from a broker path (Main SQL query parsing, time series request handler, RLS filter rewriter) or a server path (JSON index filter parsing, theta sketch agg function, time series server side execution).