feat(profiling): Add Perfetto trace format support #5659
Conversation
Add support for ingesting binary Perfetto traces (.pftrace) as profile chunks. The SDK sends an envelope with a ProfileChunk metadata item paired with a ProfileChunkData item containing the raw Perfetto protobuf. Relay decodes the Perfetto trace, extracts CPU profiling samples (PerfSample and StreamingProfilePacket), converts them to the internal Sample v2 format, and forwards both the expanded JSON and the original binary blob to Kafka for downstream processing.

Key changes:
- New `perfetto` module in relay-profiling for protobuf decoding and conversion to Sample v2
- New `ProfileChunkData` envelope item type for binary profile payloads
- Pairing logic to associate ProfileChunk metadata with ProfileChunkData
- Raw profile blob preserved through to Kafka for further processing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tto-profiling-support
# Conflicts:
#   relay-server/src/envelope/item.rs
@sentry review
…fer main thread

Separate the shared `strings` intern table into distinct `function_names`, `mapping_paths`, and `build_ids` tables, matching the Perfetto spec where each InternedData field has its own ID namespace. Also infer the main thread name when tid equals pid and no explicit name is provided.

Co-Authored-By: Claude <noreply@anthropic.com>
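The commit above can be sketched as follows. This is a minimal illustration, not Relay's actual code: the struct and field names are assumed, and the fallback thread names are illustrative.

```rust
use std::collections::HashMap;

// Sketch with assumed field names: one table per InternedData field, since
// Perfetto scopes interning IDs per field rather than in one shared namespace.
#[derive(Default)]
struct InternTables {
    function_names: HashMap<u64, String>,
    mapping_paths: HashMap<u64, String>,
    build_ids: HashMap<u64, String>,
}

// Main-thread inference: the main thread's tid equals the process pid, so
// synthesize a name when the descriptor carries none. The fallback names
// here are hypothetical, not necessarily the ones Relay emits.
fn thread_name(pid: i32, tid: i32, explicit: Option<&str>) -> String {
    match explicit {
        Some(name) => name.to_owned(),
        None if tid == pid => "main".to_owned(),
        None => format!("thread-{tid}"),
    }
}

fn main() {
    let mut tables = InternTables::default();
    // The same numeric ID can mean different things in different tables.
    tables.function_names.insert(1, "onDraw".to_owned());
    tables.mapping_paths.insert(1, "/system/lib64/libhwui.so".to_owned());
    tables.build_ids.insert(1, "abcd1234".to_owned());

    assert_eq!(thread_name(100, 100, None), "main");
    assert_eq!(thread_name(100, 101, None), "thread-101");
    assert_eq!(thread_name(100, 101, Some("RenderThread")), "RenderThread");
}
```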
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: First timestamp delta skipped in StreamingProfilePacket processing
- Removed the 'i > 0' guard so that timestamp_delta_us[0] is now correctly applied to the first sample's timestamp.
Preview (ab5f3ec796)
diff --git a/relay-profiling/src/perfetto/mod.rs b/relay-profiling/src/perfetto/mod.rs
--- a/relay-profiling/src/perfetto/mod.rs
+++ b/relay-profiling/src/perfetto/mod.rs
@@ -229,9 +229,7 @@
Some(Data::StreamingProfilePacket(spp)) => {
let mut ts = packet.timestamp.unwrap_or(0);
for (i, &cs_iid) in spp.callstack_iid.iter().enumerate() {
- if i > 0
- && let Some(&delta) = spp.timestamp_delta_us.get(i)
- {
+ if let Some(&delta) = spp.timestamp_delta_us.get(i) {
// `delta` is i64 (can be negative for out-of-order samples).
// Casting to u64 wraps negative values, which is correct because
// `wrapping_add` of a wrapped negative value subtracts as expected.
Replace `get().is_none()` with `!contains_key()` to satisfy clippy and add a CHANGELOG entry for the Perfetto interning changes. Co-Authored-By: Claude <noreply@anthropic.com>
The first delta in timestamp_delta_us was skipped due to an i > 0 guard, but per the Perfetto spec the first sample's timestamp should be TracePacket.timestamp + timestamp_delta_us[0]. Update tests to use non-zero first deltas to verify the fix. Co-Authored-By: Claude <noreply@anthropic.com>
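The delta-resolution logic described above can be sketched like this. A minimal standalone example, with a hypothetical function name; the real code lives in relay-profiling/src/perfetto/mod.rs.

```rust
// Apply timestamp_delta_us to every sample, including the first: per the
// Perfetto spec, the first sample lands at TracePacket.timestamp +
// timestamp_delta_us[0], not at the raw packet timestamp.
fn resolve_timestamps(packet_ts: u64, deltas: &[i64]) -> Vec<u64> {
    let mut ts = packet_ts;
    deltas
        .iter()
        .map(|&delta| {
            // Casting a negative i64 to u64 wraps, so wrapping_add
            // effectively subtracts for out-of-order samples.
            ts = ts.wrapping_add(delta as u64);
            ts
        })
        .collect()
}

fn main() {
    // First sample is at 1000 + 5, then +10, then -3 (out of order).
    assert_eq!(resolve_timestamps(1000, &[5, 10, -3]), vec![1005, 1015, 1012]);
}
```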
…kDescriptor StreamingProfilePacket samples were all assigned tid=0 because the code never resolved the trusted_packet_sequence_id → TrackDescriptor → ThreadDescriptor chain. This collapsed multi-thread profiles into a single thread. Now we build a seq_id→tid mapping from TrackDescriptor packets and use it when processing StreamingProfilePacket samples. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
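The sequence-to-thread resolution can be sketched as below. The packet shapes are simplified stand-ins for the generated Perfetto protobuf types, so field names are assumptions.

```rust
use std::collections::HashMap;

// Hypothetical simplified packet shapes; the real fields come from the
// generated Perfetto protobuf types.
struct ThreadDescriptor { tid: i32 }
struct TrackDescriptor { thread: Option<ThreadDescriptor> }
struct TracePacket {
    trusted_packet_sequence_id: u32,
    track_descriptor: Option<TrackDescriptor>,
}

// One pass over the trace records which thread each packet sequence belongs
// to, so StreamingProfilePacket samples can later be attributed to a tid
// instead of all collapsing into tid=0.
fn build_seq_to_tid(packets: &[TracePacket]) -> HashMap<u32, i32> {
    let mut map = HashMap::new();
    for packet in packets {
        if let Some(track) = &packet.track_descriptor {
            if let Some(thread) = &track.thread {
                map.insert(packet.trusted_packet_sequence_id, thread.tid);
            }
        }
    }
    map
}

fn main() {
    let packets = vec![
        TracePacket {
            trusted_packet_sequence_id: 7,
            track_descriptor: Some(TrackDescriptor {
                thread: Some(ThreadDescriptor { tid: 1234 }),
            }),
        },
        TracePacket { trusted_packet_sequence_id: 8, track_descriptor: None },
    ];
    let map = build_seq_to_tid(&packets);
    // Samples on sequence 7 resolve to tid 1234; unknown sequences don't.
    assert_eq!(map.get(&7), Some(&1234));
    assert_eq!(map.get(&8), None);
}
```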
…al state resets The two-pass architecture resolved all samples against the final state of the intern tables. If a trace contained an incremental state reset that reused an interning ID, samples collected before the reset would silently resolve to the wrong function names. This merges the two passes into a single pass that resolves callstacks immediately using the current intern table state, and extracts debug images inline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
process_compound_item was parsing the entire metadata JSON into a serde_json::Value tree just to read the content_type field, and split_item_payload was doing the same on the expanded JSON (potentially hundreds of KB). Instead, surface content_type from the already-deserialized ProfileMetadata via ExpandedPerfettoChunk, and hardcode "perfetto" in split_item_payload since compound items are always validated as perfetto by process_compound_item.
The previous commit moved expand_perfetto before the content_type check, which would waste work on non-perfetto payloads and produce confusing errors. Restore the early content_type check using a minimal serde struct that only deserializes the one field, and remove the unused content_type field from ExpandedPerfettoChunk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the hard-coded "perfetto" content type in split_item_payload with a lightweight deserialization that reads only the content_type field from the metadata JSON. This avoids a silent coupling that would produce incorrect values if compound items support other formats. Also remove a dead item.set_platform() call that wrote to an item immediately replaced by a new one. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Dav1dde I think this PR is finally in a ready-for-review state. I've implemented all feedback from the meeting we had a while ago:
I did not find a suitable crate which offers the generated perfetto protobuf definitions.

Overall the PR is a bit chunky because of the perfetto parsing logic, which is a bit hard to split up. Also, it's my first major relay contribution, with some heavy agent usage. Don't be shy to provide some dramatic feedback and stop the review early if something goes completely into the wrong direction 😅
@markushi Just came back from vacation and catching up on a bunch of stuff, will take a look soon!
jjbayer
left a comment
Just some unstructured comments for now. I would actually prefer splitting the PR into a protocol PR that introduces the new format (either by loading a submodule or by adding the protoc script), and a separate PR that contains the logic.
Not 100% sure what we discussed. Did we say we wanted the protobuf schema generation in a separate repository, like we have with sentry-conventions?
Yeah we talked about it, but it's actually a lot less than I thought it would be. Maybe we just treat it like cbindgen dependency in relay-cabi.
///
/// For plain items, returns `(full_payload, None, None)`.
#[cfg_attr(not(feature = "processing"), allow(dead_code))]
fn split_item_payload(item: &Item) -> (bytes::Bytes, Option<bytes::Bytes>, Option<String>) {
Would be nice to move this and the tests out to a utils or store module.
Good point. How about doing a follow-up PR for this?
/// The item payload is `[JSON metadata bytes][binary blob bytes]`, split at `meta_length`.
/// After expansion, the item is rebuilt with `[expanded JSON][raw binary]` and an updated
/// `meta_length`, so that `forward_store` can still extract the raw profile.
fn process_compound_item(
This does look very duplicated with the code for the store, we should see if we can instead only do it once and introduce a new type to keep the parsed state instead of keeping the Item around.
Internally this type can still store the raw bytes if it makes sense and just remember metadata, offsets etc.
Mostly nits/style things; overall, architecturally I think it looks good. Regarding functionality I haven't taken a closer look yet. @markushi it'd also be great if you could add some integration tests for this, which assert end-to-end that the right data is put into Kafka.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit e1c0bac.
sample_count += 1;
if let Some(stack_id) =
    resolve_callstack(callstack_iid, seq_id, &tables_by_seq, &mut ctx)
{
    resolved_samples.push((ts, tid, stack_id));
Bug: The sample_count is incremented for all samples with a callstack_iid, but only resolved samples are stored. This can lead to incorrect error reporting for traces with many unresolvable samples.
Severity: MEDIUM
Suggested Fix
The sample_count variable should only be incremented after a callstack has been successfully resolved, inside the if let Some(stack_id) = ... block. This ensures the sample limit check applies only to valid, resolved samples, aligning it with the final validation logic.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: relay-profiling/src/perfetto/mod.rs#L210-L214
Potential issue: The code increments `sample_count` for every `PerfSample` packet
containing a `callstack_iid`, regardless of whether the callstack can be resolved.
However, only successfully resolved samples are added to the `resolved_samples` vector.
If a trace contains more than `MAX_SAMPLES` unresolvable samples, the function will
prematurely and incorrectly return a `ProfileError::ExceedSizeLimit` error. The correct
behavior would be to process all samples and, if no valid samples are found, return
`ProfileError::NotEnoughSamples`. This bug can cause profiles containing a mix of
resolvable and unresolvable callstacks to be improperly discarded.
A few small things in the conversion code. I tried to mostly follow along logically, but the interning is quite complicated according to the docs, so I mostly left small improvements there.
For the actual processing pipeline, see my larger comment in the processor: we need to get some type safety in here. It will also help remove the duplicated split functions.
Let me know if you want me to help out with that.
TEST_CONFIG = {
    "outcomes": {
        "emit_outcomes": True,
        "batch_size": 1,
        "batch_interval": 1,
        "aggregator": {
            "bucket_interval": 1,
            "flush_interval": 1,
        },
    },
    "aggregator": {
        "bucket_interval": 1,
        "initial_delay": 0,
    },
}
Suggested change:

TEST_CONFIG = {
    "outcomes": {
        "emit_outcomes": True,
    },
}

Should be enough, a lot of these settings are defaults for all integration tests (now).
binary]` concatenated, delimited by the `meta_length` item header.
"""
profiles_consumer = profiles_consumer()
outcomes_consumer = outcomes_consumer(timeout=2)
Suggested change:

outcomes_consumer = outcomes_consumer()

Best to leave timeouts at their defaults; we've invested some time choosing defaults that work with unstable GitHub CI.
#[serde(rename = "organizations:continuous-profiling-perfetto")]
ContinuousProfilingPerfetto,
I assume the long-term plan is to roll this out to everyone as long as they have continuous profiling enabled?
/// Perfetto builtin clock IDs.
/// See <https://perfetto.dev/docs/concepts/clock-sync>.
const CLOCK_REALTIME: u32 = 1;
const CLOCK_BOOTTIME: u32 = 6;
Suggested change:

/// Perfetto builtin real time clock ID.
///
/// See <https://perfetto.dev/docs/concepts/clock-sync>.
const CLOCK_REALTIME: u32 = 1;
/// Perfetto builtin boot time clock ID.
///
/// See <https://perfetto.dev/docs/concepts/clock-sync>.
const CLOCK_BOOTTIME: u32 = 6;

I try to optimize for IDE/rust-analyzer usage (where someone clicks "show docs" on a symbol).
for s in data.function_names {
    if let Some(iid) = s.iid {
        let value = s
            .r#str
            .as_deref()
            .and_then(|b| std::str::from_utf8(b).ok())
            .unwrap_or("")
            .to_owned();
        self.function_names.insert(iid, value);
    }
}
I think this is just:

self.function_names.extend(
    data.function_names
        .into_iter()
        .filter_map(|is| Some((is.iid?, String::from_utf8(is.r#str?).ok()?))),
);

The function takes ownership, so there is no need to go through as_deref(), try to parse a &str, and then allocate a new String.

Potentially worth making it a helper: self.function_names.extend(bike_shed_me(data.function_names)), where the signature could be fn bike_shed_me(_: Vec<InternedString>) -> impl Iterator<Item = (u64, String)>.

Falling back to "" for invalid strings seems unnecessary; that can be handled on lookup, the same as a missing mapping. If that is not desirable, a change to insert an empty string is as easy as ok()? -> unwrap_or_default().
fn test_convert_minimal_trace() {
    let bytes = build_minimal_trace();
    let result = convert(&bytes);
    assert!(result.is_ok(), "conversion failed: {result:?}");
The assert is redundant with the unwrap
#[derive(serde::Deserialize)]
struct ContentTypeProbe {
    content_type: Option<String>,
}
match serde_json::from_slice::<ContentTypeProbe>(meta_json)
    .ok()
    .and_then(|v| v.content_type)
    .as_deref()
{
    Some("perfetto") => {}
    _ => return Err(relay_profiling::ProfileError::PlatformNotSupported.into()),
}
Why do we need to probe this? Can we not infer it from the item header?
profile_chunks.retain(
    |pc| &mut pc.profile_chunks,
    |item, records| -> Result<()> {
        if let Some(meta_length) = item.meta_length() {
Afaik we discussed using a different content_type for perfetto chunks; if we have that, then it should also decide which branch to take.
let mut new_item = Item::new(ItemType::ProfileChunk);
new_item.set_platform(pc.platform().to_owned());
new_item.set_payload(ContentType::Json, expanded);
new_item
let retention_days = ctx.event_retention().standard;

for item in profile_chunks.split(|pc| pc.profile_chunks) {
    let (kafka_payload, raw_profile) = split_item_payload(&item);
This makes a lot of implicit decisions based on how something was processed. We have a type system and should use it:

- Make the decision how to process an item once, and encode it in the type system (an enum)
- Process the item accordingly
- Make the decision how to send/forward based on the same type-level information
That prevents bugs but also means we can extend it in the future much more easily.
Other processors have an expand -> process -> store workflow. The existing profile chunk code took a shortcut and compressed all 3 steps into process because it happened to be easy enough, now we should align this processor with the others.
I am happy to help out here, but I want to do this before we merge it. Can also just be done as a separate PR with this one as a base.
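The suggestion above could look roughly like the following. This is a hypothetical sketch, not Relay's API: all names are illustrative, and the point is only that the store step matches on a variant instead of re-inspecting `meta_length` or content types.

```rust
// Decide once how a chunk is processed, encode the decision in an enum, and
// let the forwarding step derive its behavior from the same variant.
enum ProfileChunkKind {
    /// A plain JSON profile chunk.
    Json { payload: Vec<u8> },
    /// A compound Perfetto chunk: expanded JSON plus the raw .pftrace blob.
    Perfetto { expanded_json: Vec<u8>, raw_profile: Vec<u8> },
}

impl ProfileChunkKind {
    /// Returns the Kafka payload and the optional raw profile, based purely
    /// on the type-level decision made at expansion time.
    fn kafka_parts(&self) -> (&[u8], Option<&[u8]>) {
        match self {
            ProfileChunkKind::Json { payload } => (payload.as_slice(), None),
            ProfileChunkKind::Perfetto { expanded_json, raw_profile } => {
                (expanded_json.as_slice(), Some(raw_profile.as_slice()))
            }
        }
    }
}

fn main() {
    let chunk = ProfileChunkKind::Perfetto {
        expanded_json: b"{}".to_vec(),
        raw_profile: vec![0x0a, 0x00],
    };
    let (payload, raw) = chunk.kafka_parts();
    assert_eq!(payload, b"{}".as_slice());
    assert!(raw.is_some());
}
```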


This PR adds the ability to ingest binary Perfetto traces (.pftrace) and convert them into the existing Sample v2 JSON format used by Sentry's profiling pipeline. This targets Android profiling, where Perfetto is the native tracing format.
Key Changes
- organizations:continuous-profiling-perfetto — gates all Perfetto processing.
- meta_length — profile chunk items can now carry [JSON metadata][binary blob], split at meta_length. This is how Perfetto traces arrive: the metadata describes the profile, the binary blob is the raw .pftrace.
- relay-profiling/src/perfetto/ module (proto definitions + conversion logic) — parses binary Perfetto traces and produces the existing Sample v2 JSON format.
- Kafka messages gain raw_profile and raw_profile_content_type.
- Frame.package — library/container path for native/Java frames
- ProfileMetadata.content_type — carries "perfetto" through the pipeline
- DebugImage::native_image() constructor — creates debug images from Perfetto mapping data
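The compound payload layout described above can be illustrated with a small standalone sketch. The function name is hypothetical; it only shows how `meta_length` delimits the metadata from the raw trace.

```rust
// Split a compound profile chunk payload `[JSON metadata][binary blob]`
// at `meta_length`, returning None if the header is out of range.
fn split_payload(payload: &[u8], meta_length: usize) -> Option<(&[u8], &[u8])> {
    if meta_length > payload.len() {
        return None;
    }
    Some(payload.split_at(meta_length))
}

fn main() {
    let meta = br#"{"platform":"android"}"#;
    let blob = [0x0a, 0x00, 0x12, 0x04]; // stand-in for raw .pftrace bytes
    let mut payload = Vec::new();
    payload.extend_from_slice(meta);
    payload.extend_from_slice(&blob);

    let (m, b) = split_payload(&payload, meta.len()).expect("meta_length in range");
    assert_eq!(m, meta.as_slice());
    assert_eq!(b, blob.as_slice());
}
```

After expansion, Relay rebuilds the item as `[expanded JSON][raw binary]` with an updated `meta_length`, so `forward_store` can perform the same split to recover the raw profile.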