Skip to content

Native client-go implementation (big rewrite)#129

Open
inteon wants to merge 2 commits intocert-manager:mainfrom
inteon:clientgo
Open

Native client-go implementation (big rewrite)#129
inteon wants to merge 2 commits intocert-manager:mainfrom
inteon:clientgo

Conversation

@inteon
Copy link
Member

@inteon inteon commented Jan 6, 2026

I spent a lot of time porting this controller to client-go and iterating on the solution.

Current state of the PR:

  • based on client-go alone
  • works with very short certificate lifetime and 200+ controllers
  • added tests using the new testing/synctest package
  • much more resilient during CA rotation compared to cm/cm

You can run make test-smoke to test the example controller.

/cc @erikgb

@cert-manager-prow cert-manager-prow bot added dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 6, 2026
@inteon inteon force-pushed the clientgo branch 8 times, most recently from 4688445 to 69c45fe Compare January 8, 2026 13:45
@cert-manager-prow cert-manager-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2026
@cert-manager-prow cert-manager-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 9, 2026
@inteon inteon force-pushed the clientgo branch 2 times, most recently from 34afd96 to fbac5d7 Compare January 9, 2026 18:49
@cert-manager-prow cert-manager-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 11, 2026
@cert-manager-prow cert-manager-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 15, 2026
@inteon inteon requested review from Copilot and erikgb January 15, 2026 18:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request represents a major architectural rewrite of the webhook certificate authority controller, migrating from controller-runtime to a native client-go implementation. The changes introduce significant improvements in certificate rotation handling, particularly for scenarios with very short certificate lifetimes and large numbers of controllers.

Changes:

  • Complete migration from controller-runtime to native client-go with custom informer factories
  • New pending/serving CA certificate rotation mechanism with promotion delay to ensure smooth transitions
  • Tests now use Go's testing/synctest package instead of Ginkgo/Gomega
  • Removed unused internal error package and simplified PKI utilities

Reviewed changes

Copilot reviewed 72 out of 77 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/combined_controller_test.go New comprehensive test suite using synctest for authority controller behavior
test/util_test.go Helper functions for testing CA secrets and ValidatingWebhookConfigurations
test/delayed_watch.go Simulates delayed watch events for testing race conditions
pkg/authority/authority.go Core authority implementation with native client-go and workqueue
pkg/authority/options.go Configuration validation and defaults for authority setup
pkg/authority/injectable/validating_webhook.go Refactored CA bundle injection using informers and listers
internal/pki/cert_parser.go New certificate parser with caching to avoid redundant parsing
internal/certificate/tls.go Certificate generation utilities
examples/webhook-controller/main.go Example webhook server demonstrating library usage

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@inteon
Copy link
Member Author

inteon commented Jan 16, 2026

/approve

@cert-manager-prow
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: inteon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cert-manager-prow cert-manager-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 16, 2026
Copy link
Member

@erikgb erikgb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @inteon! This is a HUGE PR, and I have no chance to review it properly. Are you able to split it up? You seem to change "everything" without explaining the motivations for it. 😅 I intentionally didn't change the code copied from cert-manager (PKI) and trust-manager (CertPool) to allow us to eventually stop this copy-pasting across projects. And now you are rewriting most of it.....

I have added some initial comments/questions, but I will not be able to LGTM this with the current size of the PR.

Namespace: "my-namespace",
Name: "my-webhook-ca",
},
Duration: 1 * time.Hour, // Optional: Default 1h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a default of 1 hour is too short. 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that if we build it to renew frequently by default, this forces us to make renewals happen without downtime.

Atm. renewal does not happen often and can be a hard to debug cause for temporary downtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, by using short-lived certs, we improve our security posture.


### CA Rotation Lifecycle

To ensure zero downtime, the controller uses a "Pending CA" strategy. This allows the new CA to be distributed to all clients (via the `ValidatingWebhookConfiguration` CA bundle) *before* it is used to sign the serving certificate.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the formulation "(via the ValidatingWebhookConfiguration CA bundle" fully correct? We also support injecting other resources with a CA bundle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we implemented other resources yet. I would updated the README once we add support for the other resources.


- **Leader first**: The controller that issued the pending CA can perform injection and promotion directly, while other controllers have to wait between 3-6s first.
- **Jitter**: Non-issuing controllers wait with randomized jitter (between 3 and 6 seconds) to avoid thundering-herd updates.
- **Renewal window**: The renewal timestamp is randomized to avoid all controllers renewing the certs at the same time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we should reimplement the old, obsolete leader election mechanism using configs/secrets, and I'd prefer to use the standard leader election mechanism.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +21 to +22
# CA bundle will be injected by the Authority controller; keep empty here.
caBundle: ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# CA bundle will be injected by the Authority controller; keep empty here.
caBundle: ""
# CA bundle will be injected by the Authority controller, so omit it here.

func main() {
kubeconfig := flag.String("kubeconfig", "", "Path to kubeconfig (optional)")
addr := flag.String("addr", ":8443", "Address to serve HTTPS on")
serviceName := flag.String("service-name", "example-webhook", "will be used in the DNS name for the webhook server certificate (<service-name>.<namespace>.svc)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add a flag for the CA secret name. Mainly to make the RBAC with resourceNames less magic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// TLSPendingCertKey stores a pending (new) CA cert PEM while it is being
// propagated to targets. It will be promoted to `tls_serving.crt` after
// `Options.PropagationDelay` has elapsed since the rotation timestamp.
TLSPendingCertKey = corev1.TLSCertKey
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed previously, I would find it less confusing if tls.crt contained the currently used serving CA, and instead use something like tls-pending.crt for the pending one. And OFC the same for the key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cert-manager-prow cert-manager-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 27, 2026
@cert-manager-prow cert-manager-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 2, 2026
@cert-manager-prow cert-manager-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 5, 2026
@cert-manager-prow cert-manager-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 5, 2026
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
@inteon
Copy link
Member Author

inteon commented Feb 6, 2026

@erikgb I added your outstanding comments to the TODO file.
I would like to move forward with this PR and merge it.

Next, I would like to figure out if we can use cert-manager as a source for our CA certificates. This will also help us make some of these design decisions.

Also, I think we should try improve our test framework further. Adding even more failure/ delay points to our simulation should help us make decisions when it comes to eg. leader election.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants