feat: skip Claude step gracefully when Anthropic API is unavailable #29

Merged
sfreudenthaler merged 1 commit into main from feat/35328-graceful-api-unavailability
Apr 15, 2026

@sfreudenthaler

Summary

  • Add pre-flight API availability check before running claude-code-action
  • Skip the Claude step gracefully (warning, not failure) when the API returns 5xx or is unreachable
  • Belt and suspenders: continue-on-error: true on the Claude step, plus a post-execution re-check that distinguishes service outages from legitimate errors

Problem

When the Anthropic API is down, the Claude step fails with a hard error, blocking the entire CI pipeline. Example: dotCMS/core run 24461196854

API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"}} · check status.claude.com

Solution

Two layers of protection in claude-executor.yml:

Layer 1 — Pre-flight check (catches most outages):

  • curl the /v1/models endpoint with a 15s timeout before running Claude
  • 5xx / network failures → available=false → skip Claude step → warn and succeed
  • Auth errors (401/403), rate limits (429) → available=true → proceed so action can surface the specific error
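
The pre-flight rules above can be sketched in shell roughly as follows. This is a minimal illustration, not the code merged in claude-executor.yml: the function names `probe_api` and `is_available`, the `ANTHROPIC_API_KEY` variable, and the request headers are assumptions made for the example.

```shell
# Hypothetical helper: print the HTTP status of /v1/models.
# curl reports "000" when it cannot connect or hits the 15s timeout.
probe_api() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 \
    -H "x-api-key: ${ANTHROPIC_API_KEY:-}" \
    -H "anthropic-version: 2023-06-01" \
    "https://api.anthropic.com/v1/models") || true
  echo "${code:-000}"
}

# Hypothetical helper: map a status code to the available flag.
# 5xx and network failures count as an outage; everything else
# (200, 401/403, 429, ...) proceeds so the action can report the real error.
is_available() {
  case "$1" in
    5??|000|"") echo "false" ;;
    *)          echo "true" ;;
  esac
}
```

In a workflow step, the result could be exported with something like `echo "available=$(is_available "$(probe_api)")" >> "$GITHUB_OUTPUT"`, and the Claude step gated on that output.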

Layer 2 — Runtime protection (catches mid-execution degradation):

  • continue-on-error: true on the Claude step
  • Post-execution step checks if Claude failed
  • If failed AND API is now returning 500 → skip gracefully (service issue)
  • If failed AND API is now returning 200 → re-fail with "legitimate error" message
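
One way to picture the post-execution step is a function that takes the Claude step's outcome and a fresh status code from the API. The sketch below is an assumption-laden illustration: `handle_claude_result` is a hypothetical name, and codes other than 5xx and 200 are not covered by the bullets above, so the sketch re-fails on them as a conservative choice.

```shell
# handle_claude_result OUTCOME HTTP_CODE   (hypothetical helper)
#   OUTCOME:   outcome of the Claude step, e.g. "success" or "failure"
#   HTTP_CODE: status from re-probing the API after the step finished
# Returns 0 when the job should pass, 1 when it should be re-failed.
handle_claude_result() {
  local outcome="$1" http_code="$2"

  # Nothing to do unless the Claude step actually failed.
  if [ "$outcome" != "failure" ]; then
    return 0
  fi

  case "$http_code" in
    5??)
      # API is degraded right now: treat the failure as a service issue.
      echo "::warning::Claude step failed while the API returned HTTP $http_code. Skipping gracefully."
      return 0
      ;;
    *)
      # API answers normally (or with a non-5xx code): assume a legitimate error.
      echo "::error::Claude step failed while the API returned HTTP $http_code. Treating as a legitimate error."
      return 1
      ;;
  esac
}
```

With `continue-on-error: true` on the Claude step, the step's outcome would be read from `steps.<id>.outcome` and passed in as the first argument.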

Test

Validated in dotCMS/core-workflow-test#460:

  • Pre-flight check correctly passes when API is available
  • Handle Claude execution result correctly re-fails for non-service errors (workflow validation failure in test PR)
  • The skip path was not exercised live; it is correct by inspection and would activate when the API returns 5xx

Consumer repos to update after merge

  • dotCMS/core — update @v2.0.0 → new tag
  • dotCMS/core-workflow-test — update @v2.0.0 → new tag

Fixes: dotCMS/core#35328

Add a pre-flight availability check before running claude-code-action.
When the Anthropic API returns 5xx or is unreachable, the step is skipped
with a warning rather than failing the pipeline. A second check after any
runtime failure distinguishes service outages from legitimate errors.

Fixes: dotCMS/core#35328

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sfreudenthaler requested review from a team as code owners on April 15, 2026, 18:42

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a0d3d308f

Comment on lines +136 to +137
    *)
      echo "::warning::Claude Code step failed due to API service degradation (HTTP $HTTP_CODE). Skipping gracefully."

[P1] Fail workflow on non-service Claude errors

The post-failure classifier treats every non-200 response as a service outage, so legitimate errors like invalid/missing API key (401/403) or rate limiting (429) will now be silently downgraded to a warning and the job will succeed. This is a regression introduced by continue-on-error: true plus this broad * branch: when Run Claude Code fails for auth/config issues, Handle Claude execution result no longer re-fails the job, which can hide broken CI configuration.
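
To make the regression concrete, the sketch below compares the broad classification the review describes with a narrower alternative. Both functions are hypothetical reconstructions for illustration (`classify_broad` and `classify_narrow` are not names from the PR), and treating curl's `000` as an outage is an assumption.

```shell
# Behaviour as described in the review: only HTTP 200 re-fails the job;
# every other code falls into the catch-all branch and is skipped.
classify_broad() {
  case "$1" in
    200) echo "fail" ;;
    *)   echo "skip" ;;
  esac
}

# Narrower alternative: only 5xx and connection failures (000) are treated
# as an outage; auth errors and rate limits still fail the job.
classify_narrow() {
  case "$1" in
    5??|000) echo "skip" ;;
    *)       echo "fail" ;;
  esac
}

# Print both verdicts per status code to show where they diverge.
for code in 200 401 403 429 500 503 000; do
  echo "$code broad=$(classify_broad "$code") narrow=$(classify_narrow "$code")"
done
```

The two classifiers agree on 200 and on 5xx, and differ on 401, 403, and 429, which are exactly the cases the review flags as being silently downgraded.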

sfreudenthaler merged commit 5523a5a into main on Apr 15, 2026
3 checks passed
sfreudenthaler deleted the feat/35328-graceful-api-unavailability branch on April 15, 2026, 19:05
Development

Successfully merging this pull request may close these issues.

CI/CD: Skip Claude AI Orchestrator step when Claude service is unavailable