@fregataa fregataa commented Jan 23, 2026

resolves #7536 (BA-3522)

  • Remove automatic cleanup of unused kernel metrics from collect_container_stat() to prevent race conditions where remove_kernel_metric() could fail when kernel info was already removed.
  • Add warning log when attempting to remove metrics for an unknown kernel.
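The two changes above can be illustrated with a minimal sketch. This is not the agent's actual code: the `KernelMetricRegistry` class, its `_metrics` dictionary, and the method signatures are hypothetical stand-ins, and only the names `collect_container_stat()` and `remove_kernel_metric()` come from this PR. The sketch assumes metrics are stored per kernel ID, that collection only records samples, and that removal is an explicit call which logs a warning instead of failing when the kernel is unknown.

```python
import logging

log = logging.getLogger("kernel_metrics_sketch")


class KernelMetricRegistry:
    """Hypothetical stand-in for the agent's per-kernel metric store."""

    def __init__(self) -> None:
        # kernel_id -> {metric_name: latest_value}
        self._metrics: dict[str, dict[str, float]] = {}

    def collect_container_stat(self, samples: dict[str, dict[str, float]]) -> None:
        """Record samples only.

        Kernels missing from `samples` are deliberately left alone, so
        collection never competes with the explicit removal path.
        """
        for kernel_id, values in samples.items():
            self._metrics.setdefault(kernel_id, {}).update(values)

    def remove_kernel_metric(self, kernel_id: str) -> bool:
        """Remove one kernel's metrics; warn instead of raising if unknown."""
        if self._metrics.pop(kernel_id, None) is None:
            log.warning("Cannot remove metrics for unknown kernel %s", kernel_id)
            return False
        return True
```

With this shape, calling `remove_kernel_metric()` twice for the same kernel is harmless: the second call returns `False` and emits a warning rather than raising.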

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention to the original issue
  • Installer updates including:
    • Fixtures for db schema changes
    • New mandatory config options
  • Update of end-to-end CLI integration tests in ai.backend.test
  • API server-client counterparts (e.g., manager API -> client SDK)
  • Test case(s) to:
    • Demonstrate the difference of before/after
    • Demonstrate the flow of abstract/conceptual models with a concrete implementation
  • Documentation
    • Contents in the docs directory
    • docstrings in public interfaces and type annotations

@fregataa fregataa added this to the 25.15 milestone Jan 23, 2026
@fregataa fregataa requested a review from HyeockJinKim January 23, 2026 09:42
@fregataa fregataa self-assigned this Jan 23, 2026
Copilot AI review requested due to automatic review settings January 23, 2026 09:42
@github-actions github-actions bot added the size:XS (~10 LoC) and comp:agent (Related to Agent component) labels Jan 23, 2026
@fregataa fregataa requested a review from jopemachine January 23, 2026 09:43
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@fregataa fregataa marked this pull request as draft January 23, 2026 10:10
@github-actions github-actions bot added the size:S (10~30 LoC) label and removed the size:XS (~10 LoC) label Jan 23, 2026
@fregataa fregataa marked this pull request as ready for review January 23, 2026 10:19
@HyeockJinKim HyeockJinKim left a comment

I don't quite understand: if this task isn't included, how does it break the behavior?

@HyeockJinKim HyeockJinKim added this pull request to the merge queue Jan 28, 2026
Merged via the queue into main with commit 3a53f92 Jan 28, 2026
30 checks passed
@HyeockJinKim HyeockJinKim deleted the fix/BA-3522 branch January 28, 2026 05:38
lablup-octodog pushed a commit that referenced this pull request Jan 28, 2026
Backported-from: main (26.1)
Backported-to: 26.1
Backport-of: 8250
lablup-octodog pushed a commit that referenced this pull request Jan 28, 2026
Backported-from: main (26.1)
Backported-to: 25.15
Backport-of: 8250
github-merge-queue bot pushed a commit that referenced this pull request Jan 28, 2026
fregataa added a commit that referenced this pull request Jan 28, 2026
Labels

comp:agent (Related to Agent component), size:S (10~30 LoC)

Development

Successfully merging this pull request may close these issues.

Fix immediate timing issue in kernel metrics cleanup

3 participants