18 commits
9fb66fd
Create datadog-metrics-collector crate to collect instance value with…
kathiehuang Apr 13, 2026
0bd9a30
Categorize metrics with azure.functions prefix as enhanced metrics
kathiehuang Apr 13, 2026
0504362
Use metrics collector in main loop and refactor start_dogstatsd
kathiehuang Apr 13, 2026
25c1d5a
Add windows-enhanced-metrics feature to CI
kathiehuang Apr 13, 2026
ccb697d
Update collection interval, change info log to debug, and update libd…
kathiehuang Apr 14, 2026
6c8537e
Change instance metric collection interval to 3, update comments
kathiehuang Apr 14, 2026
c2799f6
Remove windows feature for now
kathiehuang Apr 15, 2026
4c46f88
Add precondition for enhanced metrics collector in tokio select loop
kathiehuang Apr 15, 2026
6e39d78
Precompute tags in new() rather than building them in collect_and_sub…
kathiehuang Apr 15, 2026
1c7cbde
Don't check DD_ENHANCED_METRICS_ENABLED
kathiehuang Apr 17, 2026
77895c7
Resolve instance ID based on hosting plan
kathiehuang Apr 23, 2026
ba56366
Add unit test for Windows fallback
kathiehuang Apr 23, 2026
e2adcff
Use Tag::new() from libdd_common, make unknown a constant
kathiehuang Apr 24, 2026
4172461
Refactor build_enhanced_metrics_tags and add unit tests for build_tags
kathiehuang Apr 24, 2026
6bcf0e2
Rename Azure-specific files
kathiehuang Apr 29, 2026
83c1b23
Add unit tests for starting metrics components
kathiehuang Apr 30, 2026
19df72c
Couple metrics aggregator and flusher together
kathiehuang Apr 30, 2026
a01cb99
Add clarifying comment
kathiehuang Apr 30, 2026
40 changes: 40 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 11 additions & 0 deletions crates/datadog-metrics-collector/Cargo.toml
@@ -0,0 +1,11 @@
[package]
name = "datadog-metrics-collector"
version = "0.1.0"
edition.workspace = true
license.workspace = true
description = "Collector to read, compute, and submit enhanced metrics in Serverless environments"

[dependencies]
dogstatsd = { path = "../dogstatsd", default-features = true }
tracing = { version = "0.1", default-features = false }
libdd-common = { git = "https://github.com/DataDog/libdatadog", rev = "8c88979985154d6d97c0fc2ca9039682981eacad", default-features = false }
176 changes: 176 additions & 0 deletions crates/datadog-metrics-collector/src/azure_instance.rs
@@ -0,0 +1,176 @@
// Copyright 2023-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

//! Instance identity metric collector for Azure Functions.
//!
//! Submits `azure.functions.enhanced.instance` with value 1.0 on each
//! collection tick, tagged with the instance identifier.

use dogstatsd::aggregator::AggregatorHandle;
use dogstatsd::metric::{Metric, MetricValue, SortedTags};
use std::env;
use tracing::{error, warn};

const INSTANCE_METRIC: &str = "azure.functions.enhanced.instance";
Collaborator:

Since this business logic is all specific to Azure Functions right now, maybe the file name should reflect that? I imagine that when we add instance metrics for other environments, this can be refactored to be more generic, but for now we should be clear that this is only for Azure Functions.

Contributor Author:

Sounds good, I renamed both instance.rs and tags.rs since the logic there is Azure-specific. When we support other environments, we can pull the Azure-specific logic out into its own file and keep reused logic, like the collector and the creation of tags, in dedicated, shared files.

6bcf0e2


/// Resolves the instance ID from explicit values, so tests can inject them
/// without touching the process environment.
///
/// Picks the value that matches the Azure integration metric's `instance`
/// tag for the current hosting plan, falling back to the remaining sources
/// if the preferred one is missing or empty. The result is lowercased.
fn resolve_instance_id_from(
Contributor:

Should libddcommon be thinking about website pod name / container name? Will there be potential inconsistencies?

Contributor Author:

Good point! I had actually created a ticket for this (https://datadoghq.atlassian.net/browse/SVLS-8931), but this led me to realize that the instance ID used in libddcommon / spans is different from the instance tag on integration metrics.

I compared the env var values to the instance tag on integration metrics across hosting plans and found that on the Elastic Premium and Premium plans, the integration metrics actually match the COMPUTERNAME env var rather than WEBSITE_INSTANCE_ID, which the spans use.

And for Flex Consumption and Consumption, the instance ID on spans is often unknown. I documented my env var investigations here as well as in the ticket above.

    website_sku: Option<&str>,
    container_name: Option<&str>,
    website_pod_name: Option<&str>,
    computer_name: Option<&str>,
) -> Option<String> {
    // Treat empty strings the same as unset values.
    fn non_empty(s: Option<&str>) -> Option<&str> {
        s.filter(|v| !v.is_empty())
    }

    // Preferred source for the current hosting plan, if the SKU is known.
    let sku_preferred = match website_sku {
        Some("FlexConsumption") | Some("Dynamic") => {
            non_empty(container_name).or(non_empty(website_pod_name))
        }
        Some(_) => non_empty(computer_name),
        None => None,
    };

    // Generic fallback chain when the preferred source is missing or empty.
    sku_preferred
        .or_else(|| non_empty(container_name))
        .or_else(|| non_empty(website_pod_name))
        .or_else(|| non_empty(computer_name))
        .map(|s| s.to_lowercase())
}
Comment on lines +16 to +44
Collaborator:

Contributor Author:

Hmm good point. I think if we were to put this in libdd-common, the logic would replace how instance_name gets populated here. This would make the "instance" tag from this enhanced metric become consistent with the "instance_name" span attribute (and address this ticket)

It looks like that ticket got assigned to APM so I'll double-check that there isn't any duplicate work being done, but I think moving this logic to libdd-common makes sense and would help populate the instance_name span attribute for some hosting plans that had it as unknown before!


/// Resolves the instance ID from environment variables.
fn resolve_instance_id() -> Option<String> {
    resolve_instance_id_from(
        env::var("WEBSITE_SKU").ok().as_deref(),
        env::var("CONTAINER_NAME").ok().as_deref(),
        env::var("WEBSITE_POD_NAME").ok().as_deref(),
        env::var("COMPUTERNAME").ok().as_deref(),
    )
Comment on lines +46 to +53
Copilot AI, Apr 23, 2026:

resolve_instance_id() reads COMPUTERNAME but the intended Azure Functions instance identifier (per the PR description and the test comment below) appears to be WEBSITE_INSTANCE_ID. If WEBSITE_INSTANCE_ID is set on some plans where COMPUTERNAME is not, the instance metric may never be submitted. Consider including WEBSITE_INSTANCE_ID in the resolution inputs (and updating the preference/fallback order accordingly), or updating the PR/docs to explicitly state why COMPUTERNAME is the correct source.

Contributor Author:

The PR description is updated, and I removed the test comment in e2adcff.

WEBSITE_INSTANCE_ID should not be used as the instance tag; this is explained in the PR description and the linked doc.

}

pub struct InstanceMetricsCollector {
    aggregator: AggregatorHandle,
    tags: Option<SortedTags>,
}

impl InstanceMetricsCollector {
    /// Creates a new collector, returning `None` if no instance ID is found.
    pub fn new(aggregator: AggregatorHandle, tags: Option<SortedTags>) -> Option<Self> {
        let instance_id = resolve_instance_id();
        let Some(instance_id) = instance_id else {
            warn!("No instance ID found, instance metric will not be submitted");
            return None;
        };

        // Precompute tags: enhanced metrics tags + instance tag
        let instance_tag = format!("instance:{}", instance_id);
        let tags = match tags {
            Some(mut existing) => {
                if let Ok(id_tag) = SortedTags::parse(&instance_tag) {
                    existing.extend(&id_tag);
                }
                Some(existing)
            }
            None => SortedTags::parse(&instance_tag).ok(),
        };

        Some(Self { aggregator, tags })
    }

    pub fn collect_and_submit(&self) {
        let metric = Metric::new(
            INSTANCE_METRIC.into(),
            MetricValue::gauge(1.0),
            self.tags.clone(),
            None,
        );

        if let Err(e) = self.aggregator.insert_batch(vec![metric]) {
            error!("Failed to insert instance metric: {}", e);
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_flex_consumption_uses_container_name() {
        let id = resolve_instance_id_from(
            Some("FlexConsumption"),
            Some("0--abc-DEF"),
            Some("0--abc-DEF"),
            None,
        );
        assert_eq!(id, Some("0--abc-def".to_string()));
    }

    #[test]
    fn test_flex_consumption_falls_back_to_pod_name_if_container_missing() {
        let id = resolve_instance_id_from(Some("FlexConsumption"), None, Some("pod-XYZ"), None);
        assert_eq!(id, Some("pod-xyz".to_string()));
    }

    #[test]
    fn test_consumption_uses_container_name() {
        let id = resolve_instance_id_from(
            Some("Dynamic"),
            Some("ABCD1234-111122223333444455"),
            None,
            None,
        );
        assert_eq!(id, Some("abcd1234-111122223333444455".to_string()));
    }

    #[test]
    fn test_elastic_premium_uses_computer_name() {
        let id =
            resolve_instance_id_from(Some("ElasticPremium"), None, None, Some("ep0fakewk0000A1"));
        assert_eq!(id, Some("ep0fakewk0000a1".to_string()));
    }

    #[test]
    fn test_dedicated_uses_computer_name() {
        let id = resolve_instance_id_from(Some("PremiumV3"), None, None, Some("p3fakewk0000B2"));
        assert_eq!(id, Some("p3fakewk0000b2".to_string()));
    }

    #[test]
    fn test_empty_string_is_treated_as_missing() {
        let id =
            resolve_instance_id_from(Some("ElasticPremium"), Some(""), Some(""), Some("worker-1"));
        assert_eq!(id, Some("worker-1".to_string()));
    }

    #[test]
    fn test_unknown_sku_falls_back_to_search_order() {
        let id = resolve_instance_id_from(Some("SomeNewSku"), Some("container-1"), None, None);
        assert_eq!(id, Some("container-1".to_string()));
    }

    #[test]
    fn test_missing_sku_falls_back_to_search_order() {
        let id = resolve_instance_id_from(None, Some("container-1"), None, Some("worker-1"));
        assert_eq!(id, Some("container-1".to_string()));
    }

    #[test]
    fn test_no_env_vars_returns_none() {
        let id = resolve_instance_id_from(None, None, None, None);
        assert_eq!(id, None);
    }
Comment on lines +103 to +167
Copilot AI, Apr 15, 2026:

The instance ID resolution tests only cover the fallback cases. To fully lock in the intended precedence, add a test asserting WEBSITE_INSTANCE_ID wins over WEBSITE_POD_NAME/CONTAINER_NAME, and (optionally) a test for the all-None case returning None.

Contributor Author:

If any of these env vars exist, they should give the correct instance value, so I think this test case is unnecessary. The all-None case also isn't relevant since these env vars are injected by Azure. End-to-end tests to ensure these instance-identifying environment variables don't change would be more helpful.


    // On Windows Consumption we've observed CONTAINER_NAME and WEBSITE_POD_NAME
    // unset but COMPUTERNAME set
    #[test]
    fn test_windows_consumption_falls_through_to_computer_name() {
        let id = resolve_instance_id_from(Some("Dynamic"), None, None, Some("10-20-30-40"));
        assert_eq!(id, Some("10-20-30-40".to_string()));
    }
}
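
The effective lookup order implied by `resolve_instance_id_from` can be summarized per hosting plan. The following is a minimal sketch and not part of this PR: `lookup_order` is a hypothetical helper that only returns the env var names in the order the resolver would consult them (SKU-preferred sources first, then the generic fallback chain, with duplicates removed).

```rust
/// Hypothetical helper (not in the PR): env var names in the order
/// `resolve_instance_id_from` effectively consults them for a given SKU.
fn lookup_order(website_sku: Option<&str>) -> [&'static str; 3] {
    match website_sku {
        // Flex Consumption, Consumption ("Dynamic"), or no SKU at all:
        // container name first, then pod name, then computer name.
        Some("FlexConsumption") | Some("Dynamic") | None => {
            ["CONTAINER_NAME", "WEBSITE_POD_NAME", "COMPUTERNAME"]
        }
        // Any other SKU (e.g. Elastic Premium, dedicated plans, unknown SKUs):
        // computer name is preferred, then the generic fallback chain.
        Some(_) => ["COMPUTERNAME", "CONTAINER_NAME", "WEBSITE_POD_NAME"],
    }
}

fn main() {
    for sku in [
        Some("FlexConsumption"),
        Some("Dynamic"),
        Some("ElasticPremium"),
        None,
    ] {
        println!("{:?} -> {:?}", sku, lookup_order(sku));
    }
}
```

Note that an unrecognized SKU prefers `COMPUTERNAME`, while a missing SKU starts with `CONTAINER_NAME`; both orders follow from the `match` in the resolver above.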
123 changes: 123 additions & 0 deletions crates/datadog-metrics-collector/src/azure_tags.rs
@@ -0,0 +1,123 @@
// Copyright 2023-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

//! Shared tag builder for enhanced metrics.
//!
//! Tags are attached to all enhanced metrics submitted by the metrics collector.

use dogstatsd::metric::SortedTags;
use libdd_common::{azure_app_services, tag::Tag};
use std::env;
use tracing::warn;

/// `libdd_common::azure_app_services` returns this value when the corresponding Azure metadata isn't populated.
const AAS_UNKNOWN_VALUE: &str = "unknown";

/// Builds the common tags for all enhanced metrics.
///
/// Sources:
/// - Azure metadata (resource_group, subscription_id, name) from libdd_common
/// - Environment variables (region, plan_tier, service, env, version, serverless_compat_version)
///
/// The DogStatsD origin tag (e.g. `origin:azurefunction`) is added by the metrics aggregator,
/// not here.
pub fn build_enhanced_metrics_tags() -> Option<SortedTags> {
    let mut pairs: Vec<(&'static str, String)> = Vec::new();

    if let Some(aas_metadata) = &*azure_app_services::AAS_METADATA_FUNCTION {
        for (name, value) in [
            ("resource_group", aas_metadata.get_resource_group()),
Contributor (@Lewis-E, Apr 20, 2026):

"resource_group" vs "aas.resource.group" (used in common metadata)? Should we have both? Probably not, given the whole cardinality choice, but I'm wondering why we'd decide one way or the other.

Contributor Author:

I had the same confusion initially - we want to use the same tags that integration metrics are using so that we can JOIN them, which is why we don't have the aas* prefix!

            ("subscription_id", aas_metadata.get_subscription_id()),
            ("name", aas_metadata.get_site_name()),
        ] {
            if value != AAS_UNKNOWN_VALUE {
                pairs.push((name, value.to_string()));
            }
        }
    }

    for (tag_name, env_var) in [
        ("region", "REGION_NAME"),
        ("plan_tier", "WEBSITE_SKU"),
Comment on lines +40 to +41
Collaborator:

Consider using WEBSITE_SKU from https://github.com/DataDog/libdatadog/blob/main/libdd-common/src/azure_app_services.rs. Also consider adding REGION_NAME.

Contributor Author:

> Also consider adding REGION_NAME.

Do you mean adding REGION_NAME as a const in azure_app_services.rs but leaving it unused there and using it only here, or do you mean adding region as a span attribute?

It looks like WEBSITE_SKU is a private const. If I'm going to make a change in libdatadog, I could combine the suggestions from this PR:

  • Make WEBSITE_SKU a public const
  • Make UNKNOWN_VALUE a public const (and remove AAS_UNKNOWN_VALUE here)
  • Move the logic for getting the instance unique identifier to libdatadog

        ("service", "DD_SERVICE"),
        ("env", "DD_ENV"),
        ("version", "DD_VERSION"),
        ("serverless_compat_version", "DD_SERVERLESS_COMPAT_VERSION"),
    ] {
        if let Ok(val) = env::var(env_var) {
            pairs.push((tag_name, val));
        }
Comment on lines +39 to +49
Copilot AI, Apr 13, 2026:

build_enhanced_metrics_tags concatenates raw environment variable values into a comma-separated tag string and then parses it. If any value contains a comma, it will be split into multiple tags (producing incorrect tags or parse failures). Consider sanitizing/escaping tag values (or dropping values containing reserved delimiters like ,/|) before building the tag list.

Contributor Author:

    }

    build_tags(pairs)
}

fn build_tags(pairs: impl IntoIterator<Item = (&'static str, String)>) -> Option<SortedTags> {
    let mut tags: Vec<Tag> = Vec::new();
    for (key, value) in pairs {
        if value.is_empty() {
            continue;
        }
        // Tag::new validates the combined "key:value" string: it must be
        // non-empty and not start or end with a colon
        match Tag::new(key, &value) {
            Ok(t) => tags.push(t),
            Err(e) => warn!("Skipping invalid tag {key}:{value}: {e}"),
        }
    }
    if tags.is_empty() {
        return None;
    }
    let joined = tags
        .iter()
        .map(|t| t.as_ref())
        .collect::<Vec<&str>>()
        .join(",");
    SortedTags::parse(&joined).ok()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_build_tags_returns_none_when_no_pairs() {
        let pairs: Vec<(&'static str, String)> = Vec::new();
        assert!(build_tags(pairs).is_none());
    }

    #[test]
    fn test_build_tags_returns_none_when_all_values_empty() {
        let pairs = vec![("service", String::new()), ("env", String::new())];
        assert!(build_tags(pairs).is_none());
    }

    #[test]
    fn test_build_tags_skips_empty_values() {
        let pairs = vec![("service", String::new()), ("env", "dev".to_string())];
        let tags = build_tags(pairs).unwrap().to_strings();
        assert_eq!(tags, vec!["env:dev"]);
    }

    #[test]
    fn test_build_tags_includes_all_nonempty_pairs() {
        let pairs = vec![
            ("service", "svc-1".to_string()),
            ("env", "dev".to_string()),
            ("version", "1.2.3".to_string()),
        ];
        let mut tags = build_tags(pairs).unwrap().to_strings();
        tags.sort();
        assert_eq!(tags, vec!["env:dev", "service:svc-1", "version:1.2.3"]);
    }

    #[test]
    fn test_build_tags_rejects_trailing_colon_values() {
        let pairs = vec![
            ("service", "svc-1:".to_string()),
            ("env", "dev".to_string()),
        ];
        let tags = build_tags(pairs).unwrap().to_strings();
        assert_eq!(tags, vec!["env:dev"]);
    }
}
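
On the comma concern raised in review: `build_tags` joins tags with `,` before calling `SortedTags::parse`, so a value containing a comma would presumably be split into several tags. One possible guard, sketched here under assumptions, is to drop such values before joining. `is_safe_tag_value` and `join_safe_tags` are hypothetical names that do not exist in this PR, and the reserved set (`,` and `|`) is taken from the review comment rather than from the dogstatsd crate.

```rust
/// Hypothetical filter (not in the PR): true if `value` is non-empty and can
/// be placed in a comma-separated tag list without being split.
fn is_safe_tag_value(value: &str) -> bool {
    !value.is_empty() && !value.chars().any(|c| c == ',' || c == '|')
}

/// Hypothetical stand-in for the join step in `build_tags`: keeps only pairs
/// whose value passes `is_safe_tag_value`, then joins them as `key:value`.
fn join_safe_tags(pairs: &[(&str, &str)]) -> String {
    pairs
        .iter()
        .filter(|(_, v)| is_safe_tag_value(v))
        .map(|(k, v)| format!("{k}:{v}"))
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    let pairs = [
        ("service", "svc-1"),
        ("env", "dev,prod"), // dropped: the comma would split this tag
        ("version", "1.2.3"),
    ];
    println!("{}", join_safe_tags(&pairs));
}
```

Dropping the value is the simplest option; escaping or replacing the delimiter would keep more information but changes the tag value that reaches the backend, so which is preferable depends on how the tags are joined downstream.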
11 changes: 11 additions & 0 deletions crates/datadog-metrics-collector/src/lib.rs
@@ -0,0 +1,11 @@
// Copyright 2023-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

#![cfg_attr(not(test), deny(clippy::panic))]
#![cfg_attr(not(test), deny(clippy::unwrap_used))]
#![cfg_attr(not(test), deny(clippy::expect_used))]
#![cfg_attr(not(test), deny(clippy::todo))]
#![cfg_attr(not(test), deny(clippy::unimplemented))]

pub mod azure_instance;
pub mod azure_tags;
1 change: 1 addition & 0 deletions crates/datadog-serverless-compat/Cargo.toml
@@ -11,6 +11,7 @@ windows-pipes = ["datadog-trace-agent/windows-pipes", "dogstatsd/windows-pipes"]

[dependencies]
datadog-logs-agent = { path = "../datadog-logs-agent" }
datadog-metrics-collector = { path = "../datadog-metrics-collector" }
datadog-trace-agent = { path = "../datadog-trace-agent" }
libdd-trace-utils = { git = "https://github.com/DataDog/libdatadog", rev = "27aa92cfeeca073d8730a8b4974bd3fdef7ddf3a" }
datadog-fips = { path = "../datadog-fips", default-features = false }