Add downstream BGP LGW tech preview jobs for CNO/OVNK 4.22 dev branch with FRR10 #74160
jechen0648 wants to merge 6 commits into openshift:main
|
/test all |
7bc14a4 to
a3541c4
Compare
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
|
@kyrtapz: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
kyrtapz
left a comment
Thanks @jechen0648!
The changes overall look good but I have a few suggestions.
Did you consider naming this something like FRR_IMAGE and using the value as the image itself?
I think this would give us slightly better visibility when looking at the job definition.
@jcaamano @jechen0648 I am wondering if there is any particular reason that ties us to this hard-coded version? There is already 10.4.2 and 10.5.1 available
Making CNO unmanaged should work, but it has its drawbacks: we won't reconcile the manifests if something unexpected happens in the cluster.
To avoid that potential surprise in the future I would like to suggest modifying the CNO deployment instead by unmanaging it from CVO, something like this:
$KCLI patch clusterversion version --type json -p '[{"op":"add","path":"/spec/overrides","value":[]},{"op":"add","path":"/spec/overrides/-","value":{"kind":"Deployment","name":"network-operator","group":"apps","namespace":"openshift-network-operator","unmanaged":true}}]'
$KCLI set env deployment/network-operator -n openshift-network-operator FRR_K8S_IMAGE=$FRR_IMAGE
...wait for cno to redeploy...
...wait for frr to be ready with the new image...
|
@kyrtapz Thanks for your review comments. I made the changes in the 4th commit; please review it. If it looks OK to you, I will squash it into the previous commits so that each fix goes into its original commit, as Surya prefers. Thanks a lot! |
3d080bc to
2fc55d5
Compare
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
|
@jechen0648: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@jechen0648 please try to verify whether the job that ran:
|
2fc55d5 to
14e1ff3
Compare
|
/retest |
Add support for overriding FRR-K8s with an upstream FRR 10 image (quay.io/frrouting/frr:10.4.1) needed to support EVPN while waiting for OCP builds with FRR 10.
When USE_UPSTREAM_FRR_IMAGE=true:
- Set CNO to Unmanaged
- Update frr-k8s daemonset with FRR 10.4.1 image
- Wait for rollout
- Re-enable CNO management
- Wait for network operator to be healthy
This will be removed once OCP builds with FRR 10 are available.
Signed-off-by: Jean Chen <jechen@redhat.com>
… 4.22 Add a new tech preview BGP local gateway test job that uses upstream FRR 10.4.1 image to support EVPN testing while waiting for OCP builds with FRR 10. The job is optional and not always-run. Signed-off-by: Jean Chen <jechen@redhat.com>
…r 4.22 Add a new tech preview BGP local gateway test job that uses upstream FRR 10.4.1 image to support EVPN testing while waiting for OCP builds with FRR 10. The job is optional and not always-run. Signed-off-by: Jean Chen <jechen@redhat.com>
14e1ff3 to
a170be1
Compare
Signed-off-by: Jean Chen <jechen@redhat.com>
a170be1 to
9fea758
Compare
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
|
@jechen0648: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
jcaamano
left a comment
We will also need this change:
# Override FRR-K8s with a custom image
# This is used while waiting for OCP builds with FRR 10
# This will be removed once OCP builds with FRR 10 are available
if [ -n "${FRR_IMAGE:-}" ]; then
This is part of a script that is being executed remotely through ssh. You will see this line above:
ssh "${SSHOPTS[@]}" "root@${IP}" bash -x - << 'EOFTOP'
Because the heredoc delimiter is quoted, ${FRR_IMAGE:-} is not expanded locally; it is evaluated on the remote host, where FRR_IMAGE is undefined, so the if block is never entered.
You need to pass in the value in the ssh command above:
ssh "${SSHOPTS[@]}" "root@${IP}" "FRR_IMAGE='$FRR_IMAGE'" bash -x - << 'EOFTOP'
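The difference can be reproduced locally without ssh. The following is only a sketch: a child `bash -s` stands in for the remote shell, and the variable names `without` and `with` are made up for illustration. The quoted `'EOF'` delimiter keeps the local shell from expanding `${FRR_IMAGE:-}`, so the child sees the variable only if it is passed in explicitly.

```shell
#!/usr/bin/env bash
# Local value, as it would be set in the step's environment.
FRR_IMAGE="quay.io/frrouting/frr:10.4.1"

# Case 1: the variable is NOT handed to the child shell.
# The quoted delimiter ('EOF') prevents local expansion, so the child
# evaluates ${FRR_IMAGE:-} itself and finds it unset.
without=$(env -u FRR_IMAGE bash -s << 'EOF'
if [ -n "${FRR_IMAGE:-}" ]; then echo "override with ${FRR_IMAGE}"; else echo "skipped"; fi
EOF
)

# Case 2: the variable is passed on the command line, which is the
# local analogue of adding "FRR_IMAGE='$FRR_IMAGE'" to the ssh command.
with=$(env FRR_IMAGE="$FRR_IMAGE" bash -s << 'EOF'
if [ -n "${FRR_IMAGE:-}" ]; then echo "override with ${FRR_IMAGE}"; else echo "skipped"; fi
EOF
)

echo "without: $without"
echo "with:    $with"
```

The first case prints `skipped` and the second prints `override with quay.io/frrouting/frr:10.4.1`, which matches the behaviour described for the ssh invocation.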
@jechen0648 you can check the logs to see this code never gets executed: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/74160/rehearse-74160-pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview/2019848571280429056/artifacts/e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview/baremetalds-e2e-ovn-bgp-pre/build-log.txt
|
@jechen0648 Would be good to change |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: jechen0648 |
dee917a to
9ef6d22
Compare
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
|
@jcaamano: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
cluster_profile: equinix-ocp-metal
env:
  FEATURE_SET: TechPreviewNoUpgrade
  FRR_IMAGE: quay.io/metallb/frr-k8s:v0.0.21
This should say:
- FRR_IMAGE: quay.io/metallb/frr-k8s:v0.0.21
+ FRR_IMAGE: quay.io/frrouting/frr:10.4.1
|
/pj-rehearse pull-ci-openshift-cluster-network-operator-release-4.22-e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
|
@kyrtapz: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
Signed-off-by: Jean Chen <jechen@redhat.com>
9ef6d22 to
71b0c0f
Compare
|
/retest-required |
|
/pj-rehearse pull-ci-openshift-cluster-network-operator-release-4.22-e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
|
@jcaamano: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
So this won't work, unfortunately.
The FRR daemonset uses a custom OCP image for all of its containers.
This means some containers can work with an upstream FRR image and some cannot, and CNO makes no distinction because all of them use FRR_K8S_IMAGE.
In that case we need to move back to making CNO unmanaged and replacing only the frr/reloader container images.
$KCLI patch Network.operator.openshift.io cluster --type='merge' \
-p='{"spec":{"managementState":"Unmanaged"}}'
# Update the FRR and reloader container images
$KCLI set image daemonset/frr-k8s -n openshift-frr-k8s \
frr=${FRR_IMAGE} \
reloader=${FRR_IMAGE}
echo "Waiting for daemonset 'frr-k8s' to rollout with new image..."
until $KCLI rollout status daemonset -n openshift-frr-k8s frr-k8s --timeout 2m &> /dev/null; do
sleep 5
done
Sorry for the initial suggestion.
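To confirm that only the intended containers were switched, something like the following could be run after the rollout. This is a sketch, not part of the PR: `check_frr_images` is a hypothetical helper, and it assumes the container names are `frr` and `reloader` as in the `set image` command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "name=image" lines on stdin and succeeds only
# if both the frr and reloader containers use the expected image.
# Other containers are ignored, since they keep the OCP image.
check_frr_images() {
  local expected="$1" name image seen=0
  while IFS='=' read -r name image; do
    case "$name" in
      frr|reloader)
        [ "$image" = "$expected" ] || return 1
        seen=$((seen + 1))
        ;;
    esac
  done
  # Both containers must have been present in the input.
  [ "$seen" -eq 2 ]
}

# Possible use on a live cluster (KCLI and FRR_IMAGE as in the script above):
# $KCLI get daemonset frr-k8s -n openshift-frr-k8s \
#   -o jsonpath='{range .spec.template.spec.containers[*]}{.name}={.image}{"\n"}{end}' \
#   | check_frr_images "$FRR_IMAGE"
```

Checking the daemonset spec only shows what was requested; the `rollout status` loop above is still what tells us the pods actually restarted with it.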
|
@kyrtapz I don't think the current approach of replacing the FRR image in all frr-k8s containers is working. It might be better to go back to the original approach of replacing the image only for the frr and reloader containers. @jechen0648 can you try the originally proposed approach? |
Yeah: #74160 (comment) |
CNO uses FRR_K8S_IMAGE for all frr-k8s containers; some containers do not work with an upstream FRR image. Make Network.operator Unmanaged and set only the frr and reloader container images via oc set image on the daemonset, instead of overriding the network-operator deployment and FRR_K8S_IMAGE. Signed-off-by: Jean Chen <jechen@redhat.com>
|
[REHEARSALNOTIFIER]
A total of 98 jobs have been affected by this change. |
|
/pj-rehearse pull-ci-openshift-cluster-network-operator-release-4.22-e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
|
@kyrtapz: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@kyrtapz: requesting more than one rehearsal in one comment is not supported. If you would like to rehearse multiple specific jobs, please separate the job names by a space in a single command. |
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
|
@kyrtapz: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@jechen0648: The following tests failed, say
|
The change works and we are successfully replacing the FRR images 🎉 The tests that are failing though... Considering they are updating network.operator I am concerned it might be related to our change. NOTE: |
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
|
@kyrtapz: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
That's how the second attempt fails, but the first attempt fails as: But that is expected. We will need to skip this test and any other test that expects CNO to react to something as long as it is unmanaged. |


https://issues.redhat.com/browse/OCPBUGS-73788
This PR adds a new tech preview BGP local gateway test job (e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview) for the 4.22 development branches of both ovn-kubernetes and cluster-network-operator repositories.
The job uses an upstream FRR 10.4.1 image (quay.io/frrouting/frr:10.4.1) to enable EVPN testing capabilities.
Why
It will be a while until we can consume an OCP build with FRR 10. In the interim, we need to use the upstream image of FRR 10 for our 4.22 downstream tests to support EVPN functionality.
This is a temporary solution that will be removed once OCP builds with FRR 10 are available.
The implementation:
- Adds a new USE_UPSTREAM_FRR_IMAGE environment variable to the baremetalds-e2e-ovn-bgp-pre step
- When enabled, the pre step:
  - Sets CNO to Unmanaged state
  - Replaces the FRR image in the frr-k8s daemonset with quay.io/frrouting/frr:10.4.1
  - Waits for the daemonset rollout to complete
  - Re-enables CNO management (sets back to Managed)
  - Waits for the network cluster operator to be healthy before proceeding
- Adds the new tech preview job to both repos with:
  - FEATURE_SET: TechPreviewNoUpgrade
  - USE_UPSTREAM_FRR_IMAGE: "true"
Assisted-By: claude-4.5-opus-high