Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.12.1
Deployment Method
ArgoCD
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Install the Actions Runner Controller on a Kubernetes cluster using Karpenter as the autoscaler.
2. Configure an AutoscalingListener for a repository or organization.
3. Observe that the listener creates a single pod in the cluster.
4. Trigger a node drain (e.g., scale down the cluster, or Karpenter evicts a node).
5. Notice that the listener pod is evicted during the node drain.
6. After eviction, the pod is not automatically recreated; it stays in the Evicted state.
7. The only way to get a new listener pod is to manually delete the evicted one so that the controller recognizes it as missing and creates a replacement.
Attempting to use a PodDisruptionBudget to prevent eviction will block node drains, which is not a viable solution.
Describe the bug
The AutoscalingListener currently creates only a single listener pod, which is responsible for monitoring scaling events. When this pod is evicted (for example, by Karpenter during a node drain), it is not automatically recreated by the controller, so autoscaling stops working until someone intervenes.
Because the listener pod is a single point of failure, attempts to prevent eviction using a PodDisruptionBudget (PDB) are not effective: either the pod is evicted and scaling stops, or the PDB blocks node drains, interfering with cluster operations.
In practice, the only way to restore the listener pod is to manually delete the evicted pod so that the controller recognizes it as missing and creates a new one. This behavior makes the AutoscalingListener unreliable in clusters that perform frequent node scaling or eviction operations.
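As a stopgap (not a fix), the manual deletion described above could be automated with something like the sketch below, which periodically deletes evicted listener pods so the controller recreates them. The namespace (`arc-systems`) and label selector (`actions.github.com/scale-set-listener`) are assumptions and must be adjusted to match the labels actually present on the listener pods in a given installation; this is not part of ARC itself.

```go
// Minimal sketch: periodically delete evicted listener pods so the
// controller notices they are missing and recreates them.
package main

import (
	"context"
	"log"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	ns := "arc-systems"                                 // assumption: namespace of the listener pods
	selector := "actions.github.com/scale-set-listener" // assumption: label carried by listener pods

	for {
		pods, err := client.CoreV1().Pods(ns).List(context.TODO(), metav1.ListOptions{
			LabelSelector: selector,
		})
		if err != nil {
			log.Printf("listing listener pods failed: %v", err)
		} else {
			for _, p := range pods.Items {
				// Evicted pods are left in phase Failed with reason "Evicted".
				if p.Status.Phase == corev1.PodFailed && p.Status.Reason == "Evicted" {
					log.Printf("deleting evicted listener pod %s", p.Name)
					_ = client.CoreV1().Pods(ns).Delete(context.TODO(), p.Name, metav1.DeleteOptions{})
				}
			}
		}
		time.Sleep(30 * time.Second)
	}
}
```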
Describe the expected behavior
The AutoscalingListener should remain operational even if a node drain or eviction occurs. Specifically:
- If the listener pod is evicted or terminated, the controller should automatically recreate it.
- Optionally, the listener could support multiple pods with leader election, so that eviction of a single pod does not disrupt autoscaling (see the sketch after this list).
In short: the listener should never become unavailable due to pod eviction and should recover automatically without manual intervention.
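To illustrate the optional leader-election idea, here is a minimal sketch of how a listener could run with several replicas while only the elected leader processes scale events, using client-go's standard Lease-based leader election. The lease name, namespace, and the listener callback are placeholders, not actual ARC code.

```go
// Minimal sketch: run multiple listener replicas, but let only the
// current leader poll for scale events; a standby takes over after eviction.
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	id, _ := os.Hostname() // the pod name works as a unique identity
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "autoscaling-listener-leader", // placeholder lease name
			Namespace: "arc-systems",                 // placeholder namespace
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				log.Printf("%s became leader, starting to poll for scale events", id)
				// runListener(ctx) // placeholder for the actual listener loop
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				log.Printf("%s lost leadership, stopping", id)
			},
		},
	})
}
```

With this pattern, evicting the leader pod only causes a brief hand-off to a standby replica instead of a full loss of autoscaling.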
Additional Context
Nothing to mention here
Controller Logs
https://gist.github.com/naldrey/9c05239618aaa5e2994f56888ca9fdd1
Runner Pod Logs
https://gist.github.com/naldrey/f92a19f1d19daef6aad179853bce0d0f