ci: file GitHub issues when scheduled, outerloop, and quarantine CI fails by radical · Pull Request #18058 · microsoft/aspire

radical · 2026-06-09T17:32:04Z

What's missing

Scheduled GitHub Actions workflows fail silently. GitHub only emails whoever last edited the workflow file, so a broken nightly automation job — or a failing outerloop / quarantine run — can sit unnoticed for days. There's no durable, deduplicated signal that "this scheduled thing is broken."

What this adds

Two complementary, fully deterministic mechanisms (ordinary Actions + JS — no model budget) on one shared engine:

Shared engine — tracking-issue.js
One deduplicated issue per subject, identified by a hidden HTML-comment marker, carrying a fenced run-table that accrues one row per failed run (append, cap at 50, collapse overflow). Repo/label/workflow-agnostic; owns the mechanics, callers own content + policy.

Scanner — monitor-scheduled-workflows.yml (every 2h)
Watches a config-driven list of otherwise-silent scheduled automation workflows. On a failed latest scheduled run of main, files/appends one automation-broken issue per workflow; auto-closes on the next green run. The watch-list lives in monitor-scheduled-workflows.config.json (one flag stops watching a workflow).

In-pipeline reporter (embedded in tests-outerloop.yml / tests-quarantine.yml)
On a scheduled failure, files either a failing-test issue listing the failed tests, or an automation-broken infra issue when the run broke before producing results. Quarantine swallows test failures upstream, so a failed quarantine run is always infra.

GenerateTestSummary --failed-tests-json
Extracts distinct failed test names from TRX ({ failedTests, count, extractionFailed }), so the reporter can tell a real test failure from an infrastructure break.

Why two mechanisms, not one

The scanner only knows "the latest run failed." Distinguishing an infrastructure break from a genuine test failure needs the run's TRX results, which are only available inside the pipeline. So outerloop/quarantine get an embedded reporter that reads their TRX; the rest of the silent automations get the external scanner.

Design notes

Pure decision/formatting logic lives in unit-tested modules (node harnesses invoked from xUnit, tests/Infrastructure.Tests/WorkflowScripts/); the thin octokit runners are intentionally not unit-tested.
Dedup is via hidden markers. The two mechanisms use disjoint markers (automation-broken:<file> vs ci-failure:<file>:<kind>), so they never manage each other's issues.
Only scheduled runs file issues — PR-triggered runs (e.g. editing a workflow) never do.
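The marker-based dedup described in these notes can be sketched as below. This is a minimal illustration, not the PR's code: the exact hidden-comment syntax is not visible in this text, so `markerComment` and the `{ number, body, created_at }` issue shape are assumptions. The oldest-wins rule follows the later commit message that names `findOpenIssueForMarker`.

```javascript
// Sketch only: the marker syntax is an assumption. The PR text says just
// that the marker is a hidden HTML comment in the issue body.
function markerComment(marker) {
  return `<!-- ${marker} -->`;
}

// Return the oldest open issue whose body carries the marker, or null.
// `issues` is assumed to be an array of { number, body, created_at }.
function findOpenIssueForMarker(issues, marker) {
  const needle = markerComment(marker);
  const matches = issues.filter((issue) => (issue.body || '').includes(needle));
  // Oldest wins, so a duplicate filed later never becomes the managed issue.
  matches.sort((a, b) => new Date(a.created_at) - new Date(b.created_at));
  return matches.length > 0 ? matches[0] : null;
}

// The two namespaces are disjoint, so neither mechanism finds the other's issue.
const scannerMarker = 'automation-broken:tests-outerloop.yml';
const reporterMarker = 'ci-failure:tests-outerloop.yml:infra';
```

Because the closing ` -->` is part of the search string, a marker that is a prefix of another marker does not match it by accident.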

Call-outs

The paths: filters on the quarantine/outerloop workflows mean those test workflows will run on this PR — verify they pass before merging.

…ine fail

Scheduled GitHub Actions workflows fail silently. GitHub only emails whoever last edited the workflow file, so a broken nightly job (generate diffs, refresh manifests, update models, deployment cleanup), or a broken outerloop / quarantine run, can sit unnoticed for days.

Adds two complementary mechanisms, both filing/appending a deduplicated issue per source and following the repo's workflow-script convention (pure JS logic + Node harness + xUnit behavior test + doc).

Scanner (monitor-scheduled-workflows.yml, every 2h)
- Watches the 14 currently-silent automation workflows. On a failed latest run of `main`, files/appends one `automation-broken` issue per workflow; closes it automatically on the next green run.
- Excludes workflows that already self-notify, workflow_call building blocks, and the agentic *.lock.yml workflows.

In-pipeline reporter (tests-outerloop.yml, tests-quarantine.yml)
- Embedded in the test pipelines because telling an infrastructure break from a test failure needs the run's test results.
- Outerloop: downloads the run's TRX, extracts failed test names, and files a `failing-test` issue listing them, or an `automation-broken` infra issue when the run broke before producing results.
- Quarantine passes ignoreTestFailures, so failing quarantined tests never red the run (run-tests.yml swallows them and only checks TRX were produced). A failed quarantine run is therefore always infrastructure, and only an `automation-broken` issue is filed.
- Dedup is one open issue per (workflow, kind) with a row appended per failed run; a human closes it once fixed (no auto-close-on-green). Markers use a distinct `ci-failure:` prefix so the scanner and reporter never manage each other's issues. Only scheduled runs file issues; PR-triggered runs do not.

To support the reporter, GenerateTestSummary gains a `--failed-tests-json` mode that lists distinct failed test names (Failed/Error/Timeout) from a set of TRX files.

Docs: docs/ci/monitor-scheduled-workflows.md, docs/ci/specialized-test-failure-issues.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Captures the design direction for follow-on CI-health and test-triage automation so the individual pieces can be designed and built separately:
- CI-health report (repo-pulse-shaped, read-only): do first.
- Auto-quarantine / un-quarantine driven by a deterministic flakiness analyzer over per-test history kept in gh-aw repo-memory (durable), with the human-facing quarantine PR authored via capped safe-outputs.
- CI-health that launches work (run-time trends, transient-failure rerun-list additions) plus a self-tuning feedback companion, modeled on dotnet/runtime's ci-failure-scan pair.

The doc consumes the existing failure-issue plumbing (automation-broken / ci-failure / failing-test issues) rather than rebuilding it, and records two findings from verifying current state:
- Agentic (gh-aw) billing is already resolved for this repo: repo-pulse, a purely agentic workflow, succeeds daily. The blocker is capacity, not enablement: milestone-changelog.lock.yml is currently failing on the per-run effective-tokens rate limit.
- Agentic .lock.yml workflows are a monitoring blind spot: the scheduled-workflow scanner excludes them, so milestone-changelog's ~24h regression surfaced nothing. The CI-health report should treat agentic-workflow run health as a first-class monitored surface.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Addresses five issues found in review of the failure-issue automation:

1. (High) GenerateTestSummary --failed-tests-json read TRX via TrxReader.GetTestResultsFromTrx, which TimeSpan.Parse-es the UnitTestResult startTime/endTime attributes. Real MTP --report-trx files emit those as ISO-8601 DateTimeOffset, on which TimeSpan.Parse throws FormatException. The per-file catch then skipped every real .trx and reported zero failures, so outerloop filed an infra issue with no test list instead of a failing-test issue. Switch to GetDetailedTestResultsFromTrx (no timestamp parsing; resolves the fully-qualified CanonicalName). The existing tests passed only because TestTrxBuilder did not emit startTime/endTime; add that capability and a regression test that fails on the old reader.
2. (Medium) A genuinely-red outerloop run whose results could not be extracted (artifact-download or tool flake) was indistinguishable from "zero failed tests" and was misfiled as infra, losing the failing-test list and opening a second mismatched issue. The extract step now records extractionFailed, and classifyFailure treats an unknown-result red run as test-failures (with a "could not enumerate, see artifacts" comment) rather than infra. Drops the `|| echo count:0` mask.
3. (Medium) The scheduled-workflow scanner runs every 2h but most watched workflows run less often, so the same still-latest failed run was re-appended and re-commented on every tick. decideAction now no-ops when the latest failed run is already recorded in the issue body.
4. (Medium) Both reporter jobs set a permissions block without contents: read, then run actions/checkout. Add contents: read.
5. (Low) Failed test names were inserted into single-backtick Markdown unescaped; a name containing a backtick/newline could break out of the code span and inject Markdown. Normalize newlines and wrap in a fence longer than any backtick run in the name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
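The escaping fix in item 5 can be illustrated with a short sketch. The helper name `fenceTestName` is hypothetical and the PR's actual code is not shown here. The sketch only demonstrates the stated technique: normalize newlines, then delimit with a backtick run longer than any run inside the name.

```javascript
// Wrap a test name in a code span that the name itself cannot close.
// Hypothetical helper; illustrates the technique, not the PR's code.
function fenceTestName(name) {
  // Collapse CR/LF so the name cannot start a new Markdown block.
  const oneLine = String(name).replace(/[\r\n]+/g, ' ');
  // Find the longest run of consecutive backticks inside the name.
  const runs = oneLine.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  // The delimiter must be longer than any backtick run it encloses.
  const fence = '`'.repeat(longest + 1);
  // Pad with spaces so a leading or trailing backtick does not merge with the fence.
  return `${fence} ${oneLine} ${fence}`;
}

console.log(fenceTestName('Ns.Tests.Handles`Backtick`Names'));
```

A name with no backticks gets a single-backtick span, so ordinary test names render exactly as before.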

The scheduled-workflow scanner and the specialized-test reporter each carried their own copy of the same issue mechanics: marker-based dedup lookup, the fenced failures/runs table (parse, append, cap, collapse), and the octokit create/append/comment/close calls. That duplication is where the recent review found bugs, and a third consumer (a CI-health report) is on the roadmap.

Extract the reusable, repo-agnostic mechanics into a single engine, tracking-issue.js:
- pure: findOpenIssueForMarker (oldest-wins), tableHeader, parseTable, appendRow (cap + collapse), nextIndex, bodyIncludesRun
- octokit primitives: ensureLabel, listOpenIssuesByLabel, createIssue, updateIssueBody, addComment, closeIssue

The engine holds no repo/label/workflow/product specifics. Each consumer keeps only its own content and decisions (marker namespace, titles, body, row layout, classify) and composes the engine via thin wrappers, so their tested public contracts are unchanged. The collapse summary noun is unified to "N earlier rows omitted" across both.

The scanner's inline github-script orchestration moves into monitor-scheduled-workflows-runner.js (mirroring the reporter's orchestrator), and its watch list moves out of the script into monitor-scheduled-workflows.config.json, an array of { file, name, enabled } entries. Watching can be turned off per workflow with "enabled": false without deleting the entry.

Adds TrackingIssueTests (+ harness) for the engine and a selectEnabled test for the config filter; adjusts the two consumers' collapse-wording assertions. Net ~210 fewer lines in the consumers. All four workflow-script test classes pass (53 tests).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
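A minimal sketch of the watch-list filter, assuming the `{ file, name, enabled }` entry shape described above. The file names in the sample list are made up, and treating a missing `enabled` field as "on" is an assumption rather than confirmed behavior of the real `selectEnabled`.

```javascript
// Hypothetical watch-list in the { file, name, enabled } shape.
const watchList = [
  { file: 'example-nightly.yml', name: 'Example nightly job', enabled: true },
  { file: 'example-cleanup.yml', name: 'Example cleanup job', enabled: false },
  { file: 'example-refresh.yml', name: 'Example refresh job' },
];

// Keep every entry that is not explicitly switched off.
// Assumption: an entry without `enabled` is still watched.
function selectEnabled(entries) {
  return entries.filter((entry) => entry.enabled !== false);
}

console.log(selectEnabled(watchList).map((entry) => entry.file));
```

Filtering on `enabled !== false` rather than deleting entries keeps the history of what was once watched in the config file.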

The roadmap covered only the agentic (gh-aw) direction. Capture the complementary deterministic work next to it: finish moving the repo's remaining bespoke failure-issue reporters onto the shared tracking-issue.js engine.
- A: promote the test-run failure policy off the outerloop/quarantine shape and add a per-consumer content hook (folds into this branch).
- B/C: migrate deployment-tests.yml and tests-daily-smoke.yml from their hand-rolled, date-titled create-issue jobs to engine consumers, using GenerateTestSummary --failed-tests-json instead of inline TRX parsing.
- D: file a "main is red" issue from the auto-rerun tail (reuse its classifier), not a higher-frequency cron scanner.

Documents the embedded-vs-observer mechanism split and the boundary with the agentic scanner (no classification intelligence in the deterministic layer).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- A `timed_out` run DOES file an issue (it is in FAILURE_CONCLUSIONS in monitor-scheduled-workflows.js), but the doc listed timeouts among the ignored conclusions alongside cancelled/skipped. Corrected, and noted that timeouts are treated as failures.
- The initial issue is created without a comment (only failures after the first post comments), so "full history preserved in the issue's comments" overstated it. Reworded the row-collapse note accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

H1 (monitor watchdog acted on non-scheduled runs): listWorkflowRuns filtered only branch/status, so the "latest completed run" could be a workflow_dispatch or push run. A manual/push success would auto-close a real scheduled-failure issue (masking the silent failure the watchdog exists to catch); a manual/push failure would file a false issue. Add event: 'schedule' to the request.

M1 (specialized reporter double-posted on re-run): unlike the monitor, the reporter appended a row + posted a comment without checking whether the run was already recorded. context.runId is stable across "Re-run all jobs", so a re-run duplicated the row and re-fired the notification. Add a tested isRunRecorded guard (mirrors the monitor) and no-op when the run is already in the body.

M2 (red run misfiled as infra when all TRX unreadable): GenerateTestSummary caught per-file parse errors and still exited 0 with count:0, never signaling extraction failure, so classifyFailure filed an infra issue and dropped the failing-test list. Emit extractionFailed:true when reads errored and zero failures were collected; the reporter then classifies it as a test failure.

Adds falsifiable tests for M1 and M2; updates the reporter doc shape.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
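The classification rule that M2 tightens can be sketched as follows. The function signature, the `swallowsTestFailures` option, and the returned shape are assumptions for illustration; only the `{ failedTests, count, extractionFailed }` input shape and the `test-failures` / `infra` outcomes come from the PR text.

```javascript
// Classify a red scheduled run from the --failed-tests-json output.
// `result` is the parsed JSON, or null when no results file was produced.
// Hypothetical signature; the real classifyFailure may differ.
function classifyFailure(result, { swallowsTestFailures = false } = {}) {
  // Quarantine-style pipelines swallow test failures upstream, so red means infra.
  if (swallowsTestFailures) {
    return { kind: 'infra', failedTests: [], enumerated: true };
  }
  // Unknown results on a red run: assume tests failed, but admit no list exists.
  if (!result || result.extractionFailed) {
    return { kind: 'test-failures', failedTests: [], enumerated: false };
  }
  if (result.count > 0) {
    return { kind: 'test-failures', failedTests: result.failedTests, enumerated: true };
  }
  // Results were read cleanly and contain no failures: the run broke elsewhere.
  return { kind: 'infra', failedTests: [], enumerated: true };
}

console.log(classifyFailure({ failedTests: [], count: 0, extractionFailed: true }).kind);
```

The design choice is that an untrustworthy "zero failures" defaults to the test-failure side, since a false infra issue would drop the failing-test signal entirely.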

github-actions · 2026-06-09T17:33:23Z

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 18058

Or

Run remotely in PowerShell:

iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 18058"

…-issues

The roadmap captured only proposed/future work (the agentic gh-aw direction and a deterministic-consolidation plan), which does not belong in the PR that implements the failure-issue automation. The implemented mechanisms remain documented in docs/ci/monitor-scheduled-workflows.md and docs/ci/specialized-test-failure-issues.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ests

Self-review hardening of the new scheduled/outerloop/quarantine failure-issue automation.

Two issues could silently drop the failing-test signal the automation exists to surface:
- The specialized-test reporter wrote the run URL (the dedup key) into the issue body before posting the comment that carries the failing-test list. A transient addComment failure left the run recorded, so the re-run short-circuited via isRunRecorded and never posted the list. Now the issue is created with an empty table, the comment is posted first, and the run is recorded last - a failed comment leaves the run unrecorded so the next run re-posts.
- GenerateTestSummary --failed-tests-json omitted "Aborted" from the failed-outcome set (the sibling CreateFailingTestIssue includes it), so a red run whose only failures aborted reported zero failures and was misfiled as infra. Added "Aborted".

Lower-severity hardening:
- Treat a missing FAILED_TESTS_PATH on a red outerloop run as an extraction failure (test-failures, not infra).
- Filter pull requests out of listForRepo so a labelled PR carrying a tracking marker is never mistaken for the managed issue.
- Skip the ensureLabel mutation under dry-run.
- Don't post a notification comment when the managed table fence is missing (the run can't be recorded, so commenting would re-notify every tick).
- Soften the findOpenIssueForMarker comment to match actual behavior; add the trailing newline the editorconfig rule requires.

Adds fake-octokit harnesses and integration tests for both runners (previously untested), covering the comment-before-record ordering, the dry-run no-mutation contract, the corrupted-table guard, PR filtering, and the missing-results-path path. Updates the specialized-test docs for the Aborted outcome and the missing-path extractionFailed synthesis.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…-issues

…engine

The scheduled Deployment E2E (deployment-tests.yml) and Daily CLI Smoke (tests-daily-smoke.yml) workflows each carried ~70 lines of bespoke inline github-script that filed a brand-new issue per day a failure persisted, deduped on a date-in-title substring match. A breakage that lasted a week produced seven issues.

Migrate both onto the existing repo-agnostic tracking-issue.js engine as a third consumer, mirroring the specialized-test reporter (specialized-test-failure-runner.js). Now one rolling deduplicated issue is kept open per workflow (marker `ci-failure:<file>:scheduled`), a row is appended per failed run, and a human closes it once fixed (no auto-close-on-green, matching the specialized reporter).

Each issue carries the workflow's existing labels plus automation-broken. The runner looks issues up by automation-broken and always force-includes it in the create call, so the lookup key and the created labels cannot drift: a caller that omits it cannot strand an issue the next run fails to find and then duplicate. The label query is a superset shared with the scanner and the specialized reporter; the per-workflow marker keeps the three from managing each other's issues.

The smoke suite's per-route CLI versions now ride on the notification comment (per-run detail) rather than being baked into the body; its artifact-parsing helpers stay inline in the workflow since they are smoke-specific. Both reporter jobs gain a `github.repository_owner == 'microsoft'` guard and check out `main` so the local .js modules load (and forks stay quiet).

New pure logic (report-pipeline-failure.js) and the network orchestrator (pipeline-failure-runner.js) are unit-tested via Node harnesses by ReportPipelineFailureTests and PipelineFailureRunnerTests. Docs added at docs/ci/pipeline-failure-issues.md; the scanner doc's "already self-notify" note now points at the shared reporter.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…-issues

The failure-issue automation kept per-run history in a managed markdown table inside the issue body, and each of the three runners (scheduled-workflow scanner, nightly-pipeline reporter, specialized-test reporter) re-implemented the same find-or-create / append-row / dedup / notify loop. That loop held the trickiest invariant in the system (notify-first, record-last, plus a missing-fence guard) copy-pasted in three places, where a fix to one would miss the others.

Replace the table with a comment-per-failed-run model and consolidate the loop into one shared engine function:
- tracking-issue.js drops all table mechanics (parse/append/cap/collapse, body rewriting) and gains recordRun: find-or-create the issue, then post the failure comment unless this run is already recorded. Dedup is a hidden `` marker scanned across comments, never collapsed or truncated, so strictly more robust than the old body check.
- The issue body is now a static description written once at filing.
- All three runners delegate to recordRun. The scanner shares it for its record path (close-on-green stays unique); decideAction collapses to record/close/noop now that dedup lives in the engine.
- Content modules keep only what differs (markers, titles, classification, comment formatting); the table/row forwarders are gone.

Tests move the dedup/ordering coverage into TrackingIssueTests (the one place it now lives) and assert comments instead of table rows. Docs updated to the comment-timeline model. No workflow YAML changes; runner signatures are unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
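The comment-per-run dedup can be sketched as a pure planning step, kept free of network calls so it is easy to test. The per-run marker syntax is not visible in this text, so `runMarker` is an assumed format, and `planRecordRun` is a hypothetical stand-in for the decision inside `recordRun`, not the engine's actual function.

```javascript
// Assumed hidden per-run marker; the real syntax is not shown in the PR text.
function runMarker(runUrl) {
  return `<!-- run:${runUrl} -->`;
}

// True when some existing comment already carries this run's marker.
function isRunRecorded(comments, runUrl) {
  const needle = runMarker(runUrl);
  return comments.some((comment) => (comment.body || '').includes(needle));
}

// Decide which mutations a recordRun-style call would perform.
// Returns an ordered list of steps; an empty list means no-op.
function planRecordRun({ existingIssue, comments, runUrl, commentText }) {
  const steps = [];
  if (!existingIssue) {
    steps.push({ op: 'createIssue' });
  } else if (isRunRecorded(comments, runUrl)) {
    return steps; // already notified for this run, e.g. after "Re-run all jobs"
  }
  // The marker travels inside the comment, so notifying and recording are one write.
  steps.push({ op: 'addComment', body: `${runMarker(runUrl)}\n${commentText}` });
  return steps;
}
```

Putting the marker in the comment itself means a failed comment post leaves nothing recorded, so the next run simply tries again.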

Follow-up cleanup on the comment-based failure-issue automation. Two changes, no behavior change:
- Extract a shared buildBody({ marker, lead, note }) skeleton into tracking-issue.js so all three consumers assemble identical issue-body structure (marker placement, spacing) and supply only the lead and the closing/docs note.
- Merge the two thin orchestrators back into the modules they belong to. Once recordRun absorbed the dedup/notify loop, pipeline-failure-runner.js and monitor-scheduled-workflows-runner.js were just glue: read inputs, build the run/comment, call recordRun. Each is now a report()/run() export on its content module (report-pipeline-failure.js, monitor-scheduled-workflows.js), dropping the reporter-indirection and the two-require dance at the YAML call sites.

The specialized-test family stays split: its runner owns TRX/fs reads and failure classification, so keeping that out of the pure formatters preserves a real "no I/O here" boundary.

Docs, test doc-comments, and the integration harnesses are updated to the merged modules. Runtime js files: 7 -> 5. Tests unchanged in substance and green (61/61).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The pipeline and scanner runner modules were folded into their reporter modules, but their integration harnesses and test classes still carried the "-runner" name, pointing at modules that no longer exist. Rename to match what they now drive (report()/run()):
- pipeline-failure-runner.harness.js -> report-pipeline-failure.integration.harness.js
- monitor-scheduled-workflows-runner.harness.js -> monitor-scheduled-workflows.integration.harness.js
- PipelineFailureRunnerTests -> ReportPipelineFailureIntegrationTests
- MonitorScheduledWorkflowsRunnerTests -> MonitorScheduledWorkflowsIntegrationTests

Updates harness paths, NodeCommand labels, stale header comments, and doc references. No test logic changes; 61/61 green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR adds infrastructure to surface failures in scheduled GitHub Actions workflows that otherwise fail silently (only emailing the last file editor). It implements three coordinated mechanisms: (1) an external scanner (monitor-scheduled-workflows.yml) polling automation workflows every 2h, (2) an in-pipeline reporter for outerloop/quarantine test workflows that inspects TRX results to classify test failures vs. infrastructure breaks, and (3) a nightly-pipeline reporter for deployment and smoke test workflows. All share a generic, repo-agnostic tracking-issue engine (tracking-issue.js) for deduplicated issue management with per-run comment recording.

Changes:

Adds a shared tracking-issue.js engine for marker-based dedup, find-or-create + comment-dedup orchestration, and a close-on-green lifecycle, consumed by three distinct reporters.
Adds --failed-tests-json mode to GenerateTestSummary that extracts distinct failed test names from TRX files, enabling the outerloop reporter to distinguish test failures from infrastructure breaks.
Replaces the per-day issue-filing logic in deployment-tests.yml and tests-daily-smoke.yml with the deduplicated single-issue-per-workflow approach, and adds new reporter jobs to tests-outerloop.yml and tests-quarantine.yml.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`.github/workflows/tracking-issue.js`	Shared engine for tracking issues: marker-based dedup, recordRun orchestration, octokit primitives
`.github/workflows/report-specialized-test-failures.js`	Pure helpers for outerloop/quarantine: classify failure, build markers/titles/bodies/comments
`.github/workflows/specialized-test-failure-runner.js`	Network orchestrator that reads TRX results and delegates to the engine
`.github/workflows/report-pipeline-failure.js`	Nightly-pipeline reporter with `report()` orchestrator for deployment/smoke workflows
`.github/workflows/monitor-scheduled-workflows.js`	Watchdog: reads config, polls workflows, files/comments/closes issues
`.github/workflows/monitor-scheduled-workflows.yml`	Workflow definition for the 2-hourly scheduled scanner
`.github/workflows/monitor-scheduled-workflows.config.json`	Watch-list configuration (14 workflows)
`.github/workflows/tests-outerloop.yml`	Adds `report_failures` job that downloads TRX, extracts failures, files issues
`.github/workflows/tests-quarantine.yml`	Adds `report_infra_failure` job (always infra since test failures are swallowed)
`.github/workflows/deployment-tests.yml`	Replaces per-day issue logic with deduplicated `report-pipeline-failure.js` approach
`.github/workflows/tests-daily-smoke.yml`	Replaces per-day issue logic with deduplicated reporter; per-run CLI versions ride on comment
`tools/GenerateTestSummary/Program.cs`	Adds `--failed-tests-json` option producing `{ failedTests, count, extractionFailed }`
`tests/Infrastructure.Tests/Shared/TestTrxBuilder.cs`	Adds optional `StartTime`/`EndTime` to test TRX generation
`tests/Infrastructure.Tests/GenerateTestSummary/GenerateTestSummaryToolTests.cs`	Tests for `--failed-tests-json` (5 new scenarios)
`tests/Infrastructure.Tests/WorkflowScripts/TrackingIssueTests.cs`	Unit tests for the shared tracking-issue engine
`tests/Infrastructure.Tests/WorkflowScripts/tracking-issue.harness.js`	Node harness for tracking-issue engine tests
`tests/Infrastructure.Tests/WorkflowScripts/ReportSpecializedTestFailuresTests.cs`	Unit tests for specialized reporter pure helpers
`tests/Infrastructure.Tests/WorkflowScripts/report-specialized-test-failures.harness.js`	Node harness for specialized reporter tests
`tests/Infrastructure.Tests/WorkflowScripts/SpecializedTestFailureRunnerTests.cs`	Integration tests for the runner's orchestration
`tests/Infrastructure.Tests/WorkflowScripts/specialized-test-failure-runner.harness.js`	Node harness for runner integration tests
`tests/Infrastructure.Tests/WorkflowScripts/ReportPipelineFailureTests.cs`	Unit tests for pipeline failure reporter pure helpers
`tests/Infrastructure.Tests/WorkflowScripts/report-pipeline-failure.harness.js`	Node harness for pipeline failure helper tests
`tests/Infrastructure.Tests/WorkflowScripts/ReportPipelineFailureIntegrationTests.cs`	Integration tests for `report()` orchestrator
`tests/Infrastructure.Tests/WorkflowScripts/report-pipeline-failure.integration.harness.js`	Node harness for pipeline failure integration tests
`tests/Infrastructure.Tests/WorkflowScripts/MonitorScheduledWorkflowsTests.cs`	Unit tests for watchdog pure helpers
`tests/Infrastructure.Tests/WorkflowScripts/monitor-scheduled-workflows.harness.js`	Node harness for watchdog helper tests
`tests/Infrastructure.Tests/WorkflowScripts/MonitorScheduledWorkflowsIntegrationTests.cs`	Integration tests for watchdog `run()` orchestrator
`tests/Infrastructure.Tests/WorkflowScripts/monitor-scheduled-workflows.integration.harness.js`	Node harness for watchdog integration tests
`docs/ci/specialized-test-failure-issues.md`	Documentation for the outerloop/quarantine reporter
`docs/ci/pipeline-failure-issues.md`	Documentation for the nightly-pipeline reporter
`docs/ci/monitor-scheduled-workflows.md`	Documentation for the scheduled-workflow scanner

+The network orchestrator (`specialized-test-failure-runner.js`) is a thin wiring
+layer over the engine and is not unit-tested.


github-actions · 2026-06-15T23:47:03Z

Retrying the failed CI jobs for this pull request from the CI run attempt. The rerun is being tracked in the rerun attempt.

…fakes

Review fixes for the scheduled/outerloop/quarantine failure-issue scripts.

GenerateTestSummary --failed-tests-json: the doc comment for extractionFailed contradicted the implementation. It claimed a clean read of *some* .trx made the result trustworthy, but the code flags any run where at least one .trx is unreadable and no failures were collected -- a "zero failures" result that cannot be trusted, since the unreadable file may have held the failures. Rewrote the comment to match the code and added two regression tests pinning the previously untested partial-corrupt cases. Also added Aborted to the failed-outcome doc line (it was already in the set).

Test fakes: the in-memory octokit fakes ignored the production query filters, so a regression dropping them would not fail any test.
- listForRepo fakes now require the lookup-label filter that listOpenIssuesByLabel passes.
- The watchdog's listWorkflowRuns fake now requires branch=main, event=schedule, status=completed -- the filters that stop a manual/push run from auto-closing or falsely filing a scheduled-failure issue.

tracking-issue.js: softened a comment that claimed all consumer workflows serialize via a concurrency group; tests-daily-smoke.yml has none and relies on schedule-only gating plus its daily cadence.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-16T04:12:10Z

Retrying the failed CI jobs for this pull request from the CI run attempt. The rerun is being tracked in the rerun attempt.

The shared tracking-issue engine had no way to record, per issue, whether a watchdog may close it automatically when the subject's latest run goes green. Infra/automation issues should auto-close on green; test-failure issues should not (a single green run does not prove a flaky test fixed, and a dev mid-triage should not have the issue closed out from under them).

Add three pure helpers to tracking-issue.js:
- autoCloseStamp(autoClose) -> a hidden body comment .
- buildBody({ ..., autoClose }) embeds the stamp right after the marker when provided, and omits it entirely when unset (no change for existing callers).
- readAutoClose(body) returns true/false, or null when the stamp is missing or unparseable.

readAutoClose returns null conservatively on purpose: callers MUST treat null as "do not auto-close" so a human-edited body, or an issue filed before stamping existed, is never closed automatically.

This is foundational for the close-policy work; no consumer is wired yet.

Tests: 8 new cases in TrackingIssueTests (buildBody embeds true/false/omits; readAutoClose true/false/null-missing/null-unparseable), all passing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
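The stamp helpers can be sketched as below. The stamp's literal syntax did not survive in this text, so the `<!-- autoClose:... -->` form is an assumption, and `mayAutoClose` is a hypothetical caller-side helper added only to show the "null means do not close" rule.

```javascript
// Assumed stamp syntax; the PR text says only that it is a hidden body comment.
function autoCloseStamp(autoClose) {
  return `<!-- autoClose:${autoClose ? 'true' : 'false'} -->`;
}

// Returns true or false when a well-formed stamp is present, otherwise null.
function readAutoClose(body) {
  const match = /<!--\s*autoClose:(\w+)\s*-->/.exec(body || '');
  if (!match) return null; // missing stamp
  if (match[1] === 'true') return true;
  if (match[1] === 'false') return false;
  return null; // unparseable value
}

// Hypothetical caller-side rule: only an explicit `true` permits auto-close.
function mayAutoClose(body) {
  return readAutoClose(body) === true;
}

console.log(mayAutoClose(`Tracking issue.\n${autoCloseStamp(true)}`));
```

Comparing against `true` with `===` keeps both `false` and `null` on the safe side without a separate null check.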

Copilot

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated no new comments.

github-actions · 2026-06-16T17:26:11Z

Retrying the failed CI jobs for this pull request from the CI run attempt. The rerun is being tracked in the rerun attempt.

A push to a protected branch (main, release/**) that fails CI leaves the branch red, but unlike a PR failure, which the author sees in their own checks, nobody necessarily owns it. ci.yml filed nothing automatically; red main was only tracked if someone noticed and ran /create-issue.

Add a self-closing red-main reporter as a fourth consumer of the shared tracking-issue engine:
- report-ci-failure.js files/updates one deduplicated issue per branch when a push is red, and closes it when a later push to that branch is green.
- ci.yml gains two push-only jobs (job-level issues: write): report_ci_failure (if any dep failed) and resolve_ci_failure (if all deps succeeded).

Design notes:
- Per-branch keying: the marker embeds the ref (ci-failure:ci.yml:push:<ref>), so a red main and a red release/13.3 are separate issues rather than one collapsed issue.
- Self-closing: ci.yml runs on every push, so a green push is the most timely "no longer red" signal, and no external poll is needed. The issue still carries the autoClose:true stamp so the scheduled watchdog can close it as a backstop.
- The comment links the run but does NOT assert a failing-test list: a push can fail for non-test reasons (setup, build, a non-TRX job).
- The if: conditions use explicit needs.*.result checks, not failure()/success(), because tests/stabilization_check are *skipped* on the no-relevant-changes path (skip_workflow). resolve requires all three to have genuinely succeeded so a docs-only push never closes a still-red issue without re-running CI; report fires only on a real failure.

Tests: ReportCiFailureTests (pure helpers: per-branch marker, title, body stamped autoClose:true, comment) and ReportCiFailureIntegrationTests (reportFailure file/comment/dedup/per-branch isolation; resolveSuccess close/noop/other-branch-left-open), driven against an in-memory octokit fake. 12 new cases, all passing.

Docs: docs/ci/ci-failure-issues.md, with the sibling failure-issue docs cross-linked.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The red-main reporter used two ci.yml jobs, report_ci_failure (if any dep failed) and resolve_ci_failure (if all deps succeeded), duplicating the checkout and github-script boilerplate.

Closing red-main genuinely must live in ci.yml: the scheduled watchdog only polls schedule-event runs, never push runs, so it cannot observe ci.yml going green (and cross-producer close-by-stamp is not built). But the open and close do not need to be two jobs.

Fold them into one ci_failure_tracker job that runs on every push (if: always()) and dispatches inside the script: the workflow passes the aggregate result in CI_RED / CI_GREEN (derived from needs.*.result), and a new track() entry point calls reportFailure() on red, resolveSuccess() on green, and no-ops otherwise (a skipped no-relevant-changes push, or a cancelled run, is neither, so an open issue is left untouched). CI_RED is checked first so a real failure is always reported.

Tests: 3 new track() integration cases (opens on red, closes on green, no-op on neither). report-ci-failure.js open/close logic is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
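The red-first dispatch can be sketched as follows. The `'true'` / `'false'` string encoding of `CI_RED` and `CI_GREEN`, the `aggregateResults` helper, and the `handlers` parameter are assumptions made for the example; the real `track()` calls `reportFailure()` and `resolveSuccess()` directly and its signature is not shown here.

```javascript
// Derive the aggregate flags from the upstream jobs' needs.*.result strings.
// Assumption: flags are passed as the strings 'true' / 'false'.
function aggregateResults(results) {
  return {
    CI_RED: String(results.some((r) => r === 'failure')),
    CI_GREEN: String(results.length > 0 && results.every((r) => r === 'success')),
  };
}

// Dispatch on the flags; red is checked first so a failure is never masked.
function track(env, handlers) {
  if (env.CI_RED === 'true') return handlers.reportFailure();
  if (env.CI_GREEN === 'true') return handlers.resolveSuccess();
  return 'noop'; // skipped or cancelled: leave any open issue untouched
}

const handlers = {
  reportFailure: () => 'reported',
  resolveSuccess: () => 'resolved',
};
console.log(track(aggregateResults(['success', 'skipped', 'skipped']), handlers));
```

A skipped job makes the run neither red nor green, which is what keeps a docs-only push from closing an issue that is still valid.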

Copilot

Pull request overview

Copilot reviewed 38 out of 38 changed files in this pull request and generated no new comments.

Four scheduled workflows file their own failure issues in-pipeline via an if: failure() reporter job: tests-outerloop, tests-quarantine, tests-daily-smoke, deployment-tests. Two conclusions slip past that gate and were filed by nobody:

- startup_failure: the run never started a job (bad YAML, an unresolvable uses:, a dead runner), so the reporter job itself never runs.
- timed_out: a job-level timeout is cancelled-class, so failure() is false and the reporter job does not run.

So the workflow that is *most* broken (won't even start) produced no issue.

Teach the scheduled-workflow watchdog to backstop exactly these cases:

- Add a per-entry selfReports flag to the watch-list. For such entries the monitor records ONLY startup_failure and timed_out, never plain failure, which the in-pipeline reporter owns under its ci-failure:<file> marker (recording it here would double-file under the automation-broken:<file> marker).
- Add the four workflows as selfReports entries.

Also:

- Add per-entry labels[] support (e.g. area-cli, deployment-e2e), merged with automation-broken on the filed issue.
- Stamp watchdog-filed issues autoClose:true (they already close on green; the stamp records the policy and lets a future cross-producer closer act).

The backstop reuses the watchdog's existing latest-run model: a startup_failure is sticky (a broken workflow stays broken until fixed), and a timeout that recovered before the next tick is self-healed and intentionally not filed.

Verified that timed_out does not trigger failure() against GitHub's status-check function docs, and that the in-pipeline runner assumes a 'failure' conclusion (specialized-test-failure-runner.js).

Tests: pure decideAction backstop cases (record startup_failure/timed_out, noop on plain failure, close on green), autoClose-stamp and backstop-body assertions, and integration cases (per-entry labels filed on startup_failure; no issue on plain failure). 91 failure-issue tests pass.

Docs updated across the three sibling failure-issue docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
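
The backstop rule in this commit can be sketched as a pure decision function. decideAction, selfReports, labels[] and the automation-broken label are named in the commit message; the entry shape (including the `workflow` field), the `'record'`/`'noop'`/`'close'` return strings, the `issueLabels` helper, and the rule applied to entries without selfReports are assumptions for illustration.

```javascript
// Sketch only: the entry shape, return values and the non-selfReports rule are assumptions.
const BACKSTOP_ONLY = new Set(['startup_failure', 'timed_out']);
const ANY_BREAK = new Set(['failure', 'startup_failure', 'timed_out']);

// Decide what the watchdog does with the latest scheduled run of one watch-list entry.
function decideAction(entry, latestConclusion) {
  if (latestConclusion === 'success') return 'close';
  // selfReports entries file plain failures in-pipeline, so the watchdog
  // only records the conclusions that never reach the reporter job.
  const recordable = entry.selfReports ? BACKSTOP_ONLY : ANY_BREAK;
  return recordable.has(latestConclusion) ? 'record' : 'noop';
}

// Merge per-entry labels with the base label, without duplicates.
function issueLabels(entry) {
  return [...new Set(['automation-broken', ...(entry.labels || [])])];
}

// Hypothetical watch-list entry; the workflow file name is from the PR description.
const outerloop = { workflow: 'tests-outerloop.yml', selfReports: true, labels: ['area-cli'] };

console.log(decideAction(outerloop, 'startup_failure')); // record
console.log(decideAction(outerloop, 'failure'));         // noop
console.log(decideAction(outerloop, 'success'));         // close
console.log(issueLabels(outerloop));
```

Keeping the decision pure, with no network calls, is what lets the backstop cases listed under Tests be checked without an octokit fake.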

radical and others added 7 commits June 8, 2026 19:20

radical and others added 4 commits June 9, 2026 13:49

Merge remote-tracking branch 'origin/main' into ankj/workflow-failure-issues (63bef83)

Merge remote-tracking branch 'origin/main' into ankj/workflow-failure-issues (43104f1)

radical mentioned this pull request Jun 9, 2026

Tracking: CI improvements #18036

Open

15 tasks

radical and others added 6 commits June 14, 2026 14:49

Merge remote-tracking branch 'origin/main' into ankj/workflow-failure-issues (4933472)

Merge remote-tracking branch 'origin/main' into ankj/workflow-failure-issues (f4b89df)

Copilot AI review requested due to automatic review settings June 15, 2026 23:12

Copilot started reviewing on behalf of radical June 15, 2026 23:12

Copilot AI reviewed Jun 15, 2026

Comment thread docs/ci/specialized-test-failure-issues.md

Comment on lines +115 to +116

The network orchestrator (`specialized-test-failure-runner.js`) is a thin wiring
layer over the engine and is not unit-tested.

Copilot AI review requested due to automatic review settings June 16, 2026 16:56

Copilot started reviewing on behalf of radical June 16, 2026 16:56

Copilot AI reviewed Jun 16, 2026

github-actions Bot mentioned this pull request Jun 16, 2026

[repo-pulse] 📊 Repo Pulse — Daily Activity Dashboard #16404

Open

radical and others added 2 commits June 16, 2026 14:11

Copilot AI review requested due to automatic review settings June 16, 2026 18:22

Copilot started reviewing on behalf of radical June 16, 2026 18:23

Copilot AI reviewed Jun 16, 2026

radical wants to merge 22 commits into microsoft:main from radical:ankj/workflow-failure-issues


