feat(tracing): add ClickHouse tracing backend and benchmark suite#3872
feat(tracing): add ClickHouse tracing backend and benchmark suite#3872
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
@jp-agenta burn after reading |
| if isinstance(operator, ExistenceOperator): | ||
| return f"{expr} IS {'NOT ' if operator == ExistenceOperator.NOT_EXISTS else ''}NULL" |
There was a problem hiding this comment.
🔴 ExistenceOperator logic is inverted — EXISTS produces IS NULL, NOT_EXISTS produces IS NOT NULL
The ExistenceOperator handling in _build_uuid_condition, _build_string_condition, and _build_list_json_condition is inverted compared to the Postgres DAO reference implementation.
Root Cause and Impact
The ternary expression uses ExistenceOperator.NOT_EXISTS to decide whether to insert NOT, but the logic is backwards:
f"{expr} IS {'NOT ' if operator == ExistenceOperator.NOT_EXISTS else ''}NULL"NOT_EXISTS→ producesIS NOT NULL(should beIS NULL)EXISTS→ producesIS NULL(should beIS NOT NULL)
The Postgres DAO (oss/src/dbs/postgres/tracing/utils.py:336-348) correctly implements:
ExistenceOperator.EXISTS→isnot(None)(IS NOT NULL)ExistenceOperator.NOT_EXISTS→is_(None)(IS NULL)
This bug is present in three locations:
_build_uuid_conditionat line 302_build_string_conditionat line 387_build_list_json_conditionat line 428
Impact: Any filter using exists or not_exists operators will return the opposite result set — spans where the field is missing will be returned when present ones are requested, and vice versa.
Prompt for agents
Fix the inverted ExistenceOperator logic in three locations in api/oss/src/dbs/clickhouse/tracing/dao.py:
1. Line 302 in _build_uuid_condition: Change ExistenceOperator.NOT_EXISTS to ExistenceOperator.EXISTS
2. Line 387 in _build_string_condition: Change ExistenceOperator.NOT_EXISTS to ExistenceOperator.EXISTS
3. Line 428 in _build_list_json_condition: Change ExistenceOperator.NOT_EXISTS to ExistenceOperator.EXISTS
In all three cases the pattern should be:
f"{expr} IS {'NOT ' if operator == ExistenceOperator.EXISTS else ''}NULL"
This makes EXISTS produce IS NOT NULL and NOT_EXISTS produce IS NULL, matching the Postgres DAO behavior.
Was this helpful? React with 👍 or 👎 to provide feedback.
| def _escape_sql(self, value: str) -> str: | ||
| return value.replace("'", "''") |
There was a problem hiding this comment.
🔴 SQL escape insufficient for ClickHouse — backslashes not escaped, enabling SQL injection
The _escape_sql method only escapes single quotes (' → '') but does not escape backslashes. ClickHouse interprets \' as an escaped single quote in string literals by default (setting enable_backslash_quoting=1).
Detailed Explanation
The method at api/oss/src/dbs/clickhouse/tracing/dao.py:98-99:
def _escape_sql(self, value: str) -> str:
return value.replace("'", "''")Consider a user-controlled filter value of \. After _escape_sql, it remains \. When interpolated into SQL as 'test\', ClickHouse parses \' as an escaped single quote, leaving the string literal unterminated.
More critically, a crafted value like \' OR 1=1 -- becomes \'' OR 1=1 -- after escaping. ClickHouse parses \' as ', then ' terminates the string, and OR 1=1 -- becomes executable SQL.
This affects every method that uses _escape_sql to build SQL strings, including:
_format_literal(line 112) — used throughout condition building_condition_to_sqlforcontentfield (line 239) — usesmatch()with user input_string_match_expr(lines 188-216) — all LIKE/= pattern building_json_extract_expr(line 156) — JSON path keys
Impact: An authenticated user with API access can inject arbitrary SQL into ClickHouse queries via filter condition values, potentially accessing data from other projects or causing denial of service.
| def _escape_sql(self, value: str) -> str: | |
| return value.replace("'", "''") | |
| def _escape_sql(self, value: str) -> str: | |
| return value.replace("\\", "\\\\").replace("'", "''") |
Was this helpful? React with 👍 or 👎 to provide feedback.
| if case_sensitive: | ||
| return clause | ||
| if "LIKE" in clause: | ||
| rhs = clause.split("LIKE")[-1].strip() | ||
| return f"lower({expr}) LIKE lower({rhs})" | ||
| rhs = clause.split("=")[-1].strip() | ||
| return f"lower({expr}) = lower({rhs})" |
There was a problem hiding this comment.
🟡 Case-insensitive string matching breaks when value contains 'LIKE' or '=' substring
The _string_match_expr method uses naive string splitting on "LIKE" and "=" to extract the RHS of the clause for wrapping in lower(). When the user-provided search value contains those substrings, the split produces incorrect SQL.
Root Cause
At api/oss/src/dbs/clickhouse/tracing/dao.py:210-216, when case_sensitive=False (the default per TextOptions):
if "LIKE" in clause:
rhs = clause.split("LIKE")[-1].strip()
return f"lower({expr}) LIKE lower({rhs})"
rhs = clause.split("=")[-1].strip()
return f"lower({expr}) = lower({rhs})"For example, with operator=CONTAINS and value="I LIKE cats", the clause is:
span_name LIKE '%I LIKE cats%'
Splitting by "LIKE" produces ["span_name ", " '%I ", " cats%'"]. Taking [-1] gives " cats%'". The resulting SQL is:
lower(span_name) LIKE lower( cats%') — syntactically broken SQL that will cause a ClickHouse query error.
Impact: Any case-insensitive string filter (the default) where the search value contains "LIKE" or "=" will produce malformed SQL, causing the query to fail.
Prompt for agents
In api/oss/src/dbs/clickhouse/tracing/dao.py, refactor _string_match_expr (lines 173-216) to avoid parsing the clause string. Instead of building the clause first and then trying to split it apart, build the case-insensitive version directly. For example:
Replace lines 210-216 with logic that wraps expr and the value in lower() calls during clause construction rather than post-hoc string splitting. Something like:
ci_expr = f"lower({expr})"
ci_wrap = lambda text: f"lower('{self._escape_sql(text)}')"
Then use ci_expr and ci_wrap when building LIKE/= clauses for the case-insensitive branch.
Was this helpful? React with 👍 or 👎 to provide feedback.
| self.url = os.getenv( | ||
| "CLICKHOUSE_TRACING_URL", | ||
| "https://er3fulih5c.eu-central-1.aws.clickhouse.cloud:8443", | ||
| ) |
There was a problem hiding this comment.
🟡 Hardcoded production ClickHouse Cloud URL as default credential
The ClickHouseTracingDAO.__init__ defaults the ClickHouse URL to a production ClickHouse Cloud instance when CLICKHOUSE_TRACING_URL is not set.
Details
At api/oss/src/dbs/clickhouse/tracing/dao.py:70-72:
self.url = os.getenv(
"CLICKHOUSE_TRACING_URL",
"https://er3fulih5c.eu-central-1.aws.clickhouse.cloud:8443",
)This hardcodes a production ClickHouse Cloud endpoint as the default. If a developer or staging environment enables TRACING_BACKEND=clickhouse without setting CLICKHOUSE_TRACING_URL, queries will silently hit the production ClickHouse Cloud instance. Combined with the default CLICKHOUSE_TRACING_PASSWORD being empty string (line 75), this could cause confusing connection failures or — if the password happens to be set in the environment for other purposes — unintended writes to production.
Impact: Risk of accidental production data access from non-production environments.
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary
TRACING_BACKEND=clickhouseinto API + tracing/evaluations workersdocs/design/clickhouse-tracing/plus helper benchmark scriptsNotes