
feat(profiling): Add Perfetto trace format support #5659

Open
markushi wants to merge 31 commits into master from feat/markushi/perfetto-profiling-support

Conversation

markushi (Member) commented Feb 25, 2026

This PR adds the ability to ingest binary Perfetto traces (.pftrace) and convert them into the existing Sample v2 JSON format used by Sentry's profiling pipeline. This targets Android
profiling, where Perfetto is the native tracing format.

Key Changes

  1. New feature flag organizations:continuous-profiling-perfetto — gates all Perfetto processing.
  2. Compound item format via meta_length — profile chunk items can now carry [JSON metadata][binary blob], split at meta_length. This is how Perfetto traces arrive: metadata describes the
    profile, the binary blob is the raw .pftrace.
  3. Perfetto → Sample v2 conversion — new relay-profiling/src/perfetto/ module (proto definitions + conversion logic) parses binary Perfetto traces and produces the existing Sample v2
    JSON format.
  4. Raw profile passthrough to Kafka — after expansion, the original Perfetto binary is preserved alongside the expanded JSON. Two new fields on ProfileChunkKafkaMessage: raw_profile and
    raw_profile_content_type.
  5. New fields on existing structs:
    - Frame.package — library/container path for native/Java frames
    - ProfileMetadata.content_type — carries "perfetto" through the pipeline
    - DebugImage::native_image() constructor — creates debug images from Perfetto mapping data
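The compound layout from (2) amounts to a plain split at `meta_length`. A minimal sketch (the helper name and bounds handling are illustrative, not Relay's actual API):

```rust
/// Splits a compound profile chunk payload into `(json_metadata, binary_blob)`.
/// Hypothetical helper; `meta_length` comes from the envelope item header.
fn split_compound(payload: &[u8], meta_length: usize) -> Option<(&[u8], &[u8])> {
    // Reject out-of-range lengths instead of panicking in `split_at`.
    if meta_length > payload.len() {
        return None;
    }
    Some(payload.split_at(meta_length))
}
```

The first half is then parsed as the JSON profile metadata, and the second half is handed to the Perfetto converter as the raw `.pftrace` bytes.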

markushi and others added 11 commits February 25, 2026 09:16
Add support for ingesting binary Perfetto traces (.pftrace) as profile
chunks. The SDK sends an envelope with a ProfileChunk metadata item
paired with a ProfileChunkData item containing the raw Perfetto protobuf.

Relay decodes the Perfetto trace, extracts CPU profiling samples
(PerfSample and StreamingProfilePacket), converts them to the internal
Sample v2 format, and forwards both the expanded JSON and the original
binary blob to Kafka for downstream processing.

Key changes:
- New `perfetto` module in relay-profiling for protobuf decoding and
  conversion to Sample v2
- New `ProfileChunkData` envelope item type for binary profile payloads
- Pairing logic to associate ProfileChunk metadata with ProfileChunkData
- Raw profile blob preserved through to Kafka for further processing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	relay-server/src/envelope/item.rs
@markushi (Member Author)

@sentry review

Comment thread relay-profiling/src/perfetto/mod.rs Outdated
markushi and others added 2 commits March 23, 2026 09:01
…fer main thread

Separate the shared `strings` intern table into distinct `function_names`,
`mapping_paths`, and `build_ids` tables matching the Perfetto spec where
each InternedData field has its own ID namespace. Also infer the main
thread name when tid equals pid and no explicit name is provided.

Co-Authored-By: Claude <noreply@anthropic.com>
@markushi markushi marked this pull request as ready for review March 23, 2026 08:57
@markushi markushi requested a review from a team as a code owner March 23, 2026 08:57
cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: First timestamp delta skipped in StreamingProfilePacket processing
    • Removed the 'i > 0' guard so that timestamp_delta_us[0] is now correctly applied to the first sample's timestamp.

Preview (ab5f3ec796):
diff --git a/relay-profiling/src/perfetto/mod.rs b/relay-profiling/src/perfetto/mod.rs
--- a/relay-profiling/src/perfetto/mod.rs
+++ b/relay-profiling/src/perfetto/mod.rs
@@ -229,9 +229,7 @@
             Some(Data::StreamingProfilePacket(spp)) => {
                 let mut ts = packet.timestamp.unwrap_or(0);
                 for (i, &cs_iid) in spp.callstack_iid.iter().enumerate() {
-                    if i > 0
-                        && let Some(&delta) = spp.timestamp_delta_us.get(i)
-                    {
+                    if let Some(&delta) = spp.timestamp_delta_us.get(i) {
                         // `delta` is i64 (can be negative for out-of-order samples).
                         // Casting to u64 wraps negative values, which is correct because
                         // `wrapping_add` of a wrapped negative value subtracts as expected.


Comment thread relay-profiling/src/perfetto/mod.rs
markushi and others added 2 commits March 23, 2026 10:27
Replace `get().is_none()` with `!contains_key()` to satisfy clippy
and add a CHANGELOG entry for the Perfetto interning changes.

Co-Authored-By: Claude <noreply@anthropic.com>
The first delta in timestamp_delta_us was skipped due to an i > 0
guard, but per the Perfetto spec the first sample's timestamp should
be TracePacket.timestamp + timestamp_delta_us[0]. Update tests to use
non-zero first deltas to verify the fix.

Co-Authored-By: Claude <noreply@anthropic.com>
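The corrected accumulation can be sketched like this (illustrative standalone function, not the actual Relay code; it applies `timestamp_delta_us[0]` to the first sample as well, and the ns/µs unit conversion is omitted):

```rust
/// Resolves absolute sample timestamps from a packet base timestamp and
/// per-sample deltas. Types mirror the Perfetto fields:
/// `TracePacket.timestamp` is u64, `timestamp_delta_us` entries are i64.
fn resolve_timestamps(base_ts: u64, deltas: &[i64]) -> Vec<u64> {
    let mut ts = base_ts;
    deltas
        .iter()
        .map(|&delta| {
            // Casting a negative i64 to u64 wraps, and `wrapping_add` of the
            // wrapped value subtracts as expected for out-of-order samples.
            ts = ts.wrapping_add(delta as u64);
            ts
        })
        .collect()
}
```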
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
markushi and others added 2 commits April 1, 2026 08:59
…kDescriptor

StreamingProfilePacket samples were all assigned tid=0 because the code
never resolved the trusted_packet_sequence_id → TrackDescriptor →
ThreadDescriptor chain. This collapsed multi-thread profiles into a
single thread. Now we build a seq_id→tid mapping from TrackDescriptor
packets and use it when processing StreamingProfilePacket samples.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
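The sequence-to-thread resolution described above can be sketched as a simple map build (struct and field names are illustrative stand-ins for the proto types):

```rust
use std::collections::HashMap;

/// Stand-in for the relevant fields of a TrackDescriptor packet: the
/// `trusted_packet_sequence_id` it arrived on and the tid from its
/// embedded ThreadDescriptor, if any.
struct SeqThread {
    seq_id: u32,
    tid: Option<i32>,
}

/// Builds the seq_id -> tid mapping used when processing
/// StreamingProfilePacket samples.
fn build_tid_map(descriptors: &[SeqThread]) -> HashMap<u32, i32> {
    descriptors
        .iter()
        .filter_map(|d| Some((d.seq_id, d.tid?)))
        .collect()
}
```

Samples then look up their tid via the packet's sequence id instead of defaulting to 0.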
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
…al state resets

The two-pass architecture resolved all samples against the final state
of the intern tables. If a trace contained an incremental state reset
that reused an interning ID, samples collected before the reset would
silently resolve to the wrong function names. This merges the two
passes into a single pass that resolves callstacks immediately using
the current intern table state, and extracts debug images inline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
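A toy model of the single-pass approach, assuming a simplified event stream (`Event` and the single `function_names`-style table are illustrative; the real code tracks several intern tables per sequence):

```rust
use std::collections::HashMap;

/// Simplified packet stream: interned-data updates, incremental state
/// resets, and samples referencing an interning ID.
enum Event {
    Reset,
    Intern(u64, String),
    Sample(u64),
}

/// Resolves each sample against the intern table state *at that point in
/// the stream*, so an ID reused after a reset cannot retroactively change
/// earlier samples.
fn process(events: &[Event]) -> Vec<Option<String>> {
    let mut table: HashMap<u64, String> = HashMap::new();
    let mut resolved = Vec::new();
    for event in events {
        match event {
            Event::Reset => table.clear(),
            Event::Intern(iid, name) => {
                table.insert(*iid, name.clone());
            }
            Event::Sample(iid) => resolved.push(table.get(iid).cloned()),
        }
    }
    resolved
}
```

A two-pass version resolving against the final table state would map both samples of a reused ID to the post-reset name; the single pass keeps the pre-reset sample correct.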
Comment thread relay-server/src/processing/profile_chunks/mod.rs Outdated
markushi and others added 2 commits April 1, 2026 10:36
process_compound_item was parsing the entire metadata JSON into a
serde_json::Value tree just to read the content_type field, and
split_item_payload was doing the same on the expanded JSON (potentially
hundreds of KB). Instead, surface content_type from the already-
deserialized ProfileMetadata via ExpandedPerfettoChunk, and hardcode
"perfetto" in split_item_payload since compound items are always
validated as perfetto by process_compound_item.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous commit moved expand_perfetto before the content_type
check, which would waste work on non-perfetto payloads and produce
confusing errors. Restore the early content_type check using a minimal
serde struct that only deserializes the one field, and remove the
unused content_type field from ExpandedPerfettoChunk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread relay-server/src/processing/profile_chunks/mod.rs Outdated
Comment thread relay-server/src/processing/profile_chunks/process.rs Outdated
Replace the hard-coded "perfetto" content type in split_item_payload
with a lightweight deserialization that reads only the content_type
field from the metadata JSON. This avoids a silent coupling that would
produce incorrect values if compound items support other formats.

Also remove a dead item.set_platform() call that wrote to an item
immediately replaced by a new one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
markushi (Member Author) commented Apr 1, 2026

@Dav1dde I think this PR is finally in a ready-for-review state. I've implemented all feedback from the meeting we had a while ago:

  • utilize meta_length to split the meta / blob parts instead of introducing another envelope item
  • feature flag
  • remove compile time code generation

I did not find a suitable crate which offers the generated perfetto .proto definitions directly. So as a middle ground there's now a README + script which could be used to regenerate the proto definitions.

Overall the PR is a bit chunky because of the perfetto parsing logic, which is a bit hard to split up. Also, it's my first major relay contribution and some heavy agent usage. Don't be shy to provide some dramatic feedback and stop the review early if something goes completely into the wrong direction 😅

Comment thread relay-server/src/processing/profile_chunks/process.rs
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-server/src/processing/profile_chunks/process.rs
Dav1dde (Member) commented Apr 8, 2026

@markushi Just came back from vacation and catching up on a bunch of stuff, will take a look soon!

jjbayer (Member) left a comment


Just some unstructured comments for now. I would actually prefer splitting the PR into a protocol PR that introduces the new format (either by loading a submodule or by adding the protoc script), and a separate PR that contains the logic.

Member

Not 100% sure what we discussed. Did we say we wanted the protobuf schema generation in a separate repository, like we have with sentry-conventions?

Member

Yeah we talked about it, but it's actually a lot less than I thought it would be. Maybe we just treat it like cbindgen dependency in relay-cabi.

Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread CHANGELOG.md Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated

Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/sample/v2.rs Outdated
Comment thread relay-profiling/src/debug_image.rs Outdated
Comment thread relay-server/src/processing/profile_chunks/mod.rs Outdated
Comment thread relay-server/src/processing/profile_chunks/mod.rs Outdated
///
/// For plain items, returns `(full_payload, None, None)`.
#[cfg_attr(not(feature = "processing"), allow(dead_code))]
fn split_item_payload(item: &Item) -> (bytes::Bytes, Option<bytes::Bytes>, Option<String>) {
Member

Would be nice to move this and the tests out to a utils or store module.

Member Author

Good point. How about doing a follow up PR for this?

/// The item payload is `[JSON metadata bytes][binary blob bytes]`, split at `meta_length`.
/// After expansion, the item is rebuilt with `[expanded JSON][raw binary]` and an updated
/// `meta_length`, so that `forward_store` can still extract the raw profile.
fn process_compound_item(
Member

This does look very duplicated with the code for the store, we should see if we can instead only do it once and introduce a new type to keep the parsed state instead of keeping the Item around.

Internally this type can still store the raw bytes if it makes sense and just remember metadata, offsets etc.

Comment thread relay-server/src/services/store.rs Outdated
Dav1dde (Member) commented Apr 10, 2026

Mostly nits/style things, overall architecturally I think it looks good. Regarding functionality I haven't taken a closer look yet.

@markushi it'd also be great if you could add some integration tests for this, which assert end-to-end that the right data is put into Kafka.

Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-server/src/processing/profile_chunks/process.rs
Comment thread relay-profiling/src/lib.rs Outdated
Comment thread relay-server/src/services/store.rs
Comment thread relay-server/src/processing/profile_chunks/mod.rs
Comment thread relay-profiling/src/perfetto/mod.rs
cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit e1c0bac.

Comment thread relay-profiling/src/sample/v2.rs Outdated
Comment thread relay-profiling/src/lib.rs
Comment on lines +210 to +214
sample_count += 1;
if let Some(stack_id) =
resolve_callstack(callstack_iid, seq_id, &tables_by_seq, &mut ctx)
{
resolved_samples.push((ts, tid, stack_id));

Bug: The sample_count is incremented for all samples with a callstack_iid, but only resolved samples are stored. This can lead to incorrect error reporting for traces with many unresolvable samples.
Severity: MEDIUM

Suggested Fix

The sample_count variable should only be incremented after a callstack has been successfully resolved, inside the if let Some(stack_id) = ... block. This ensures the sample limit check applies only to valid, resolved samples, aligning it with the final validation logic.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: relay-profiling/src/perfetto/mod.rs#L210-L214

Potential issue: The code increments `sample_count` for every `PerfSample` packet
containing a `callstack_iid`, regardless of whether the callstack can be resolved.
However, only successfully resolved samples are added to the `resolved_samples` vector.
If a trace contains more than `MAX_SAMPLES` unresolvable samples, the function will
prematurely and incorrectly return a `ProfileError::ExceedSizeLimit` error. The correct
behavior would be to process all samples and, if no valid samples are found, return
`ProfileError::NotEnoughSamples`. This bug can cause profiles containing a mix of
resolvable and unresolvable callstacks to be improperly discarded.
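If the report is valid, the suggested fix is a one-line move, sketched here with simplified types (`MAX_SAMPLES` becomes a parameter and the error values are stand-ins for the real `ProfileError` variants):

```rust
/// Counts only samples whose callstack resolves, so a flood of
/// unresolvable samples can no longer trip the size limit early.
fn count_resolved(callstacks: &[Option<u32>], max_samples: usize) -> Result<usize, &'static str> {
    let mut sample_count = 0;
    for callstack in callstacks {
        // `Some` models a successfully resolved callstack.
        if callstack.is_some() {
            sample_count += 1;
            if sample_count > max_samples {
                return Err("ExceedSizeLimit");
            }
        }
    }
    if sample_count == 0 {
        return Err("NotEnoughSamples");
    }
    Ok(sample_count)
}
```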

@Dav1dde (Member) left a comment

A few small things in the conversion code. I tried to mostly follow along logically, but the interning machinery is quite complicated according to the docs, so I mostly left small improvements there.

For the actual processing pipeline, see my larger comment in the processor; we need to get some type safety in here. It will also help with the code duplication of the split functions.
Let me know if you want me to help out with that.

Comment on lines +8 to +22
TEST_CONFIG = {
"outcomes": {
"emit_outcomes": True,
"batch_size": 1,
"batch_interval": 1,
"aggregator": {
"bucket_interval": 1,
"flush_interval": 1,
},
},
"aggregator": {
"bucket_interval": 1,
"initial_delay": 0,
},
}
Member

Suggested change
TEST_CONFIG = {
"outcomes": {
"emit_outcomes": True,
"batch_size": 1,
"batch_interval": 1,
"aggregator": {
"bucket_interval": 1,
"flush_interval": 1,
},
},
"aggregator": {
"bucket_interval": 1,
"initial_delay": 0,
},
}
TEST_CONFIG = {
"outcomes": {
"emit_outcomes": True,
},
}

Should be enough; a lot of these settings are now the default for all integration tests.

binary]` concatenated, delimited by the `meta_length` item header.
"""
profiles_consumer = profiles_consumer()
outcomes_consumer = outcomes_consumer(timeout=2)
Member

Suggested change
outcomes_consumer = outcomes_consumer(timeout=2)
outcomes_consumer = outcomes_consumer()

Best to leave timeouts at their defaults; we've invested some time choosing defaults that work with unstable GitHub CI.

Comment on lines +73 to +74
#[serde(rename = "organizations:continuous-profiling-perfetto")]
ContinuousProfilingPerfetto,
Member

I assume the long-term plan is to roll this out to everyone as long as they have continuous profiling enabled?

Comment on lines +33 to +36
/// Perfetto builtin clock IDs.
/// See <https://perfetto.dev/docs/concepts/clock-sync>.
const CLOCK_REALTIME: u32 = 1;
const CLOCK_BOOTTIME: u32 = 6;
Member

Suggested change
/// Perfetto builtin clock IDs.
/// See <https://perfetto.dev/docs/concepts/clock-sync>.
const CLOCK_REALTIME: u32 = 1;
const CLOCK_BOOTTIME: u32 = 6;
/// Perfetto builtin real time clock ID.
///
/// See <https://perfetto.dev/docs/concepts/clock-sync>.
const CLOCK_REALTIME: u32 = 1;
/// Perfetto builtin boot time clock ID.
///
/// See <https://perfetto.dev/docs/concepts/clock-sync>.
const CLOCK_BOOTTIME: u32 = 6;

I try to optimize for IDE/rust-analyzer usage (where someone clicks "show docs" on a symbol).

Comment on lines +94 to +104
for s in data.function_names {
if let Some(iid) = s.iid {
let value = s
.r#str
.as_deref()
.and_then(|b| std::str::from_utf8(b).ok())
.unwrap_or("")
.to_owned();
self.function_names.insert(iid, value);
}
}
Member

I think this is just:

        self.function_names.extend(
            data.function_names
                .into_iter()
                .filter_map(|is| Some((is.iid?, String::from_utf8(is.str?).ok()?))),
        );

The function takes ownership, so there is no need to go through deref(), try to parse a &str, and then allocate a new String.

Potentially worth it to make it a helper: self.function_names(bike_shed_me(data.function_names)) where the signature could be fn bike_shed_me(_: Vec<InternedString>) -> impl Iterator<Item = (u64, String)>.

Falling back to "" for invalid strings seems unnecessary; that can be handled on lookup, the same as a missing mapping. If that is not desirable, a change to insert an empty string is as easy as ok()? -> unwrap_or_default().
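The suggested helper could look roughly like this (`InternedString` here mirrors only the two relevant Perfetto proto fields; the real generated type has more):

```rust
/// Minimal stand-in for Perfetto's InternedString proto message.
struct InternedString {
    iid: Option<u64>,
    r#str: Option<Vec<u8>>,
}

/// Yields `(iid, string)` pairs, skipping entries with a missing id,
/// missing bytes, or invalid UTF-8, as suggested above.
fn interned_pairs(entries: Vec<InternedString>) -> impl Iterator<Item = (u64, String)> {
    entries
        .into_iter()
        .filter_map(|is| Some((is.iid?, String::from_utf8(is.r#str?).ok()?)))
}
```

Each intern table would then just call something like `self.function_names.extend(interned_pairs(data.function_names))`.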

fn test_convert_minimal_trace() {
let bytes = build_minimal_trace();
let result = convert(&bytes);
assert!(result.is_ok(), "conversion failed: {result:?}");
Member

The assert is redundant with the unwrap

Comment on lines +115 to +126
#[derive(serde::Deserialize)]
struct ContentTypeProbe {
content_type: Option<String>,
}
match serde_json::from_slice::<ContentTypeProbe>(meta_json)
.ok()
.and_then(|v| v.content_type)
.as_deref()
{
Some("perfetto") => {}
_ => return Err(relay_profiling::ProfileError::PlatformNotSupported.into()),
}
Member

Why do we need to probe this? Can we not infer it from the item header?

profile_chunks.retain(
|pc| &mut pc.profile_chunks,
|item, records| -> Result<()> {
if let Some(meta_length) = item.meta_length() {
Member

Afaik we discussed using a different content_type for perfetto chunks, if we have that, then this should also be the decider which branch to take.

Comment on lines +83 to +86
let mut new_item = Item::new(ItemType::ProfileChunk);
new_item.set_platform(pc.platform().to_owned());
new_item.set_payload(ContentType::Json, expanded);
new_item
Member

Accidental?

let retention_days = ctx.event_retention().standard;

for item in profile_chunks.split(|pc| pc.profile_chunks) {
let (kafka_payload, raw_profile) = split_item_payload(&item);
Member

This makes a lot of implicit decisions based on how something was processed earlier. We have a type system and should use it.

  1. Make the decision how to process an item once, encode it in the type system (enum)
  2. Process the item accordingly
  3. Make the decision how to send/forward based also on the same type level information

That prevents bugs but also means we can extend it in the future much more easily.

Other processors have an expand -> process -> store workflow. The existing profile chunk code took a shortcut and compressed all 3 steps into process because it happened to be easy enough; now we should align this processor with the others.

I am happy to help out here, but I want to do this before we merge it. Can also just be done as a separate PR with this one as a base.
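One way to encode that decision in the type system, as a rough sketch (the enum and field names are hypothetical, not the actual Relay types):

```rust
/// Records how a profile chunk was expanded, so the store step can match
/// on the same type-level information instead of re-inspecting the Item.
enum ChunkPayload {
    /// Plain Sample v2 JSON chunk.
    Plain(Vec<u8>),
    /// Compound Perfetto chunk: expanded JSON plus the original binary.
    Perfetto { expanded_json: Vec<u8>, raw: Vec<u8> },
}

/// Returns `(kafka_payload, raw_profile)` for the Kafka message.
fn forward(payload: &ChunkPayload) -> (&[u8], Option<&[u8]>) {
    match payload {
        ChunkPayload::Plain(json) => (json, None),
        ChunkPayload::Perfetto { expanded_json, raw } => (expanded_json, Some(raw)),
    }
}
```

The expand step constructs the enum once; process and store then match on it, which removes the need for the duplicated split logic.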

3 participants