fix: resolve disk-space exhaustion in backward-compat CI #6002

Open
Copilot wants to merge 2 commits into master from copilot/fix-backward-compat-test

Copilot AI commented Jun 14, 2026

Contributor

The backward-compat-test CI job was failing with "no space left on device" when loading Docker images into the kind cluster. The backward-compat workflow consumes more disk than the regular e2e test because it first pulls old Fluid images via helm install from the chart repo, then builds and loads 10 new images. That left ~36 MB free, which is not enough for docker save temp files.
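To make the remaining headroom visible in the job log before each image load, a small free-space check could be added. The sketch below is illustrative only: available_mb and require_free_mb are hypothetical helper names that are not part of this PR, and the threshold passed to require_free_mb is an assumption to tune per workflow.

```shell
# Hypothetical helpers, not part of this PR.

# Print the space available (in MiB) on the filesystem that holds the given path.
available_mb() {
    # df -P forces single-line POSIX output and -k reports 1024-byte blocks;
    # the fourth column of the second line is the "Available" figure.
    df -Pk "${1:-/}" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Return non-zero when less than the requested headroom (in MiB) is available.
require_free_mb() {
    need=$1
    have=$(available_mb "${2:-/}")
    if [ "$have" -lt "$need" ]; then
        echo "Only ${have} MiB free on ${2:-/}, need ${need} MiB" >&2
        return 1
    fi
    echo "${have} MiB free on ${2:-/}"
}
```

Calling something like `require_free_mb 2048 /` before each kind load would turn a late "no space left on device" into an early, readable failure. The 2048 MiB figure is a placeholder, not a measured requirement.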

Changes

  • .github/workflows/backward-compatibility-e2e.yml: Add a "Free disk space" step before the Docker build that removes unused pre-installed runner tools (dotnet ~1.5 GB, Android SDK ~6 GB, GHC, CodeQL) to reclaim headroom before any image work begins.

  • .github/scripts/build-all-images.sh: Remove each image from local Docker storage immediately after loading it into kind. Once an image is in the kind node, the local copy is not needed—keeping it only contends for the same disk space required by docker save for the next image.

for img in "${images[@]}"; do
    echo "Loading image ${img} to kind cluster..."
    kind load docker-image "${img}" --name "${KIND_CLUSTER}"
    echo "Removing local image ${img} to free disk space..."
    docker rmi "${img}" || echo "Warning: Failed to remove image ${img}, continuing..."
done
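The "Free disk space" step from the first bullet is not shown in this description. A minimal sketch of what such a step might run is given below. The helper name free_disk_space and the RM_CMD override are hypothetical, and the directory paths in the usage comment are assumptions based on common GitHub-hosted Ubuntu runner layouts, not paths confirmed by this PR.

```shell
# Hypothetical helper, not taken from this PR.
# Removes each given path if it exists and keeps going when a removal fails,
# mirroring the "warn and continue" handling used in build-all-images.sh.
# RM_CMD can be overridden by the caller, e.g. RM_CMD="sudo rm -rf" on a runner.
free_disk_space() {
    for dir in "$@"; do
        if [ -e "$dir" ]; then
            echo "Removing $dir..."
            ${RM_CMD:-rm -rf} "$dir" || echo "Warning: failed to remove $dir, continuing..."
        else
            echo "Skipping $dir (not present)"
        fi
    done
}

# Example invocation on a runner (assumed paths, not executed here):
#   df -h /
#   RM_CMD="sudo rm -rf" free_disk_space \
#       /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL
#   df -h /
```

Printing df -h before and after the cleanup shows how much space was actually reclaimed, which is useful because the contents of runner images change over time.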

fluid-e2e-bot Bot commented Jun 14, 2026

Hi @Copilot. Thanks for your PR.

I'm waiting for a fluid-cloudnative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copilot AI changed the title from "[WIP] Fix failing GitHub Actions job backward-compat-test (v1.28.15)" to "fix: resolve disk-space exhaustion in backward-compat CI" on Jun 14, 2026
Copilot AI requested a review from RongGu on June 14, 2026 10:43
cheyang marked this pull request as ready for review on June 15, 2026 08:59

codecov Bot commented Jun 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.56%. Comparing base (da69206) to head (2758112).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6002   +/-   ##
=======================================
  Coverage   63.56%   63.56%           
=======================================
  Files         479      479           
  Lines       33276    33276           
=======================================
  Hits        21151    21151           
  Misses      10445    10445           
  Partials     1680     1680           

cheyang commented Jun 16, 2026

Collaborator

Looks good. This is a standard and well-tested pattern for reclaiming disk space on GitHub Actions runners.

The two-pronged approach works well:

  1. Pre-build runner cleanup removes ~14 GB of pre-installed toolchains (.NET, Android SDK, GHC, CodeQL) that this workflow never uses.
  2. Eager per-image removal in build-all-images.sh frees each Docker image right after loading it into kind, preventing accumulation during the loop.

Verified:

  • Only CI files touched (.github/scripts/ and .github/workflows/)
  • No secret exfiltration or unexpected network calls
  • docker image prune --all --force is placed after kind cluster creation, so the kindest/node image is protected by its running container
  • Error handling (|| echo "Warning...") is appropriate and won't break the pipeline

LGTM — this should unblock the backward-compat CI jobs that have been running out of disk.

fluid-e2e-bot Bot commented Jun 16, 2026

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
