Skip to content

feat(tracing): add ClickHouse tracing backend and benchmark suite#3872

Draft
mmabrouk wants to merge 1 commit intomainfrom
feat/clickhouse-tracing-backend
Draft

feat(tracing): add ClickHouse tracing backend and benchmark suite#3872
mmabrouk wants to merge 1 commit intomainfrom
feat/clickhouse-tracing-backend

Conversation

@mmabrouk
Copy link
Member

@mmabrouk mmabrouk commented Mar 1, 2026

Summary

  • add a new ClickHouse tracing DAO and wire TRACING_BACKEND=clickhouse into API + tracing/evaluations workers
  • implement ClickHouse read/write paths for traces (query, fetch, analytics, legacy analytics, sessions/users) with compatibility fixes for filtering and timestamps
  • add ClickHouse tracing design docs, schemas, and benchmark tooling/results notes under docs/design/clickhouse-tracing/ plus helper benchmark scripts

Notes

  • this PR intentionally includes only ClickHouse-related changes from the current worktree
  • unrelated local edits (EE throttling/auth/postgres tuning and raw benchmark result artifacts) were left out

Open with Devin

@vercel
Copy link

vercel bot commented Mar 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Building Building Preview, Comment Mar 1, 2026 8:48pm

Request Review

@mmabrouk
Copy link
Member Author

mmabrouk commented Mar 1, 2026

@jp-agenta burn after reading

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 4 potential issues.

View 7 additional findings in Devin Review.

Open in Devin Review

Comment on lines +301 to +302
if isinstance(operator, ExistenceOperator):
return f"{expr} IS {'NOT ' if operator == ExistenceOperator.NOT_EXISTS else ''}NULL"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 ExistenceOperator logic is inverted — EXISTS produces IS NULL, NOT_EXISTS produces IS NOT NULL

The ExistenceOperator handling in _build_uuid_condition, _build_string_condition, and _build_list_json_condition is inverted compared to the Postgres DAO reference implementation.

Root Cause and Impact

The ternary expression uses ExistenceOperator.NOT_EXISTS to decide whether to insert NOT, but the logic is backwards:

f"{expr} IS {'NOT ' if operator == ExistenceOperator.NOT_EXISTS else ''}NULL"
  • NOT_EXISTS → produces IS NOT NULL (should be IS NULL)
  • EXISTS → produces IS NULL (should be IS NOT NULL)

The Postgres DAO (oss/src/dbs/postgres/tracing/utils.py:336-348) correctly implements:

  • ExistenceOperator.EXISTSisnot(None) (IS NOT NULL)
  • ExistenceOperator.NOT_EXISTSis_(None) (IS NULL)

This bug is present in three locations:

  • _build_uuid_condition at line 302
  • _build_string_condition at line 387
  • _build_list_json_condition at line 428

Impact: Any filter using exists or not_exists operators will return the opposite result set — spans where the field is missing will be returned when present ones are requested, and vice versa.

Prompt for agents
Fix the inverted ExistenceOperator logic in three locations in api/oss/src/dbs/clickhouse/tracing/dao.py:

1. Line 302 in _build_uuid_condition: Change ExistenceOperator.NOT_EXISTS to ExistenceOperator.EXISTS
2. Line 387 in _build_string_condition: Change ExistenceOperator.NOT_EXISTS to ExistenceOperator.EXISTS
3. Line 428 in _build_list_json_condition: Change ExistenceOperator.NOT_EXISTS to ExistenceOperator.EXISTS

In all three cases the pattern should be:
  f"{expr} IS {'NOT ' if operator == ExistenceOperator.EXISTS else ''}NULL"

This makes EXISTS produce IS NOT NULL and NOT_EXISTS produce IS NULL, matching the Postgres DAO behavior.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +98 to +99
def _escape_sql(self, value: str) -> str:
return value.replace("'", "''")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 SQL escape insufficient for ClickHouse — backslashes not escaped, enabling SQL injection

The _escape_sql method only escapes single quotes (''') but does not escape backslashes. ClickHouse interprets \' as an escaped single quote in string literals by default (setting enable_backslash_quoting=1).

Detailed Explanation

The method at api/oss/src/dbs/clickhouse/tracing/dao.py:98-99:

def _escape_sql(self, value: str) -> str:
    return value.replace("'", "''")

Consider a user-controlled filter value of \. After _escape_sql, it remains \. When interpolated into SQL as 'test\', ClickHouse parses \' as an escaped single quote, leaving the string literal unterminated.

More critically, a crafted value like \' OR 1=1 -- becomes \'' OR 1=1 -- after escaping. ClickHouse parses \' as ', then ' terminates the string, and OR 1=1 -- becomes executable SQL.

This affects every method that uses _escape_sql to build SQL strings, including:

  • _format_literal (line 112) — used throughout condition building
  • _condition_to_sql for content field (line 239) — uses match() with user input
  • _string_match_expr (lines 188-216) — all LIKE/= pattern building
  • _json_extract_expr (line 156) — JSON path keys

Impact: An authenticated user with API access can inject arbitrary SQL into ClickHouse queries via filter condition values, potentially accessing data from other projects or causing denial of service.

Suggested change
def _escape_sql(self, value: str) -> str:
return value.replace("'", "''")
def _escape_sql(self, value: str) -> str:
return value.replace("\\", "\\\\").replace("'", "''")
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +210 to +216
if case_sensitive:
return clause
if "LIKE" in clause:
rhs = clause.split("LIKE")[-1].strip()
return f"lower({expr}) LIKE lower({rhs})"
rhs = clause.split("=")[-1].strip()
return f"lower({expr}) = lower({rhs})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Case-insensitive string matching breaks when value contains 'LIKE' or '=' substring

The _string_match_expr method uses naive string splitting on "LIKE" and "=" to extract the RHS of the clause for wrapping in lower(). When the user-provided search value contains those substrings, the split produces incorrect SQL.

Root Cause

At api/oss/src/dbs/clickhouse/tracing/dao.py:210-216, when case_sensitive=False (the default per TextOptions):

if "LIKE" in clause:
    rhs = clause.split("LIKE")[-1].strip()
    return f"lower({expr}) LIKE lower({rhs})"
rhs = clause.split("=")[-1].strip()
return f"lower({expr}) = lower({rhs})"

For example, with operator=CONTAINS and value="I LIKE cats", the clause is:
span_name LIKE '%I LIKE cats%'

Splitting by "LIKE" produces ["span_name ", " '%I ", " cats%'"]. Taking [-1] gives " cats%'". The resulting SQL is:
lower(span_name) LIKE lower( cats%') — syntactically broken SQL that will cause a ClickHouse query error.

Impact: Any case-insensitive string filter (the default) where the search value contains "LIKE" or "=" will produce malformed SQL, causing the query to fail.

Prompt for agents
In api/oss/src/dbs/clickhouse/tracing/dao.py, refactor _string_match_expr (lines 173-216) to avoid parsing the clause string. Instead of building the clause first and then trying to split it apart, build the case-insensitive version directly. For example:

Replace lines 210-216 with logic that wraps expr and the value in lower() calls during clause construction rather than post-hoc string splitting. Something like:

  ci_expr = f"lower({expr})"
  ci_wrap = lambda text: f"lower('{self._escape_sql(text)}')"

Then use ci_expr and ci_wrap when building LIKE/= clauses for the case-insensitive branch.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +70 to +73
self.url = os.getenv(
"CLICKHOUSE_TRACING_URL",
"https://er3fulih5c.eu-central-1.aws.clickhouse.cloud:8443",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Hardcoded production ClickHouse Cloud URL as default credential

The ClickHouseTracingDAO.__init__ defaults the ClickHouse URL to a production ClickHouse Cloud instance when CLICKHOUSE_TRACING_URL is not set.

Details

At api/oss/src/dbs/clickhouse/tracing/dao.py:70-72:

self.url = os.getenv(
    "CLICKHOUSE_TRACING_URL",
    "https://er3fulih5c.eu-central-1.aws.clickhouse.cloud:8443",
)

This hardcodes a production ClickHouse Cloud endpoint as the default. If a developer or staging environment enables TRACING_BACKEND=clickhouse without setting CLICKHOUSE_TRACING_URL, queries will silently hit the production ClickHouse Cloud instance. Combined with the default CLICKHOUSE_TRACING_PASSWORD being empty string (line 75), this could cause confusing connection failures or — if the password happens to be set in the environment for other purposes — unintended writes to production.

Impact: Risk of accidental production data access from non-production environments.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant