CKS: fix NPE when remove a failed external node #12407
base: 4.22
9750c2a to 415ebb1
This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
...ernetes-service/src/main/java/com/cloud/kubernetes/cluster/KubernetesClusterManagerImpl.java
Outdated
415ebb1 to fb18a23
This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 16336
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##               4.22   #12407   +/-   ##
=========================================
  Coverage     17.59%   17.59%
+ Complexity    15600    15599     -1
=========================================
  Files          5910     5910
  Lines        529733   529755    +22
  Branches      64719    64724     +5
=========================================
+ Hits          93218    93227     +9
- Misses       426023   426035    +12
- Partials      10492    10493     +1
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16345
@blueorangutan test
@weizhouapache a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian test result (tid-15176)
vishesh92
left a comment
clgtm
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16359
@blueorangutan test
@weizhouapache a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian Build Failed (tid-15188)
@weizhouapache the issue and PR are marked for 20.3 but the branch is based off of 4.22. What should happen?
Issue is still present
- Create a CKS cluster
- Add an external node to the CKS cluster
- The external node fails to add due to space issues
- The cluster is in the Running state
- Try to remove the external node
Exception logs:
[root@ref-trl-10752-k-Mol8-kiran-chavala-mgmt1 ~]# cat /var/log/cloudstack/management/management-server.log |grep -i "job-47"
2026-01-28 10:27:01,000 INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-2:[ctx-7c702c5d, job-47]) (logid:9e9c5bd4) Add job-47 into job monitoring
2026-01-28 10:27:01,003 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (qtp1390913202-17:[ctx-0b3912d3, ctx-14f879bd]) (logid:97df2239) submit async job-47, details: AsyncJob {"accountId":2,"cmd":"org.apache.cloudstack.api.command.user.kubernetes.cluster.RemoveNodesFromKubernetesClusterCmd","cmdInfo":"{\"response\":\"json\",\"ctxUserId\":\"2\",\"sessionkey\":\"mnK0SnDiHU-7yH8r_GeAGoLWmVk\",\"httpmethod\":\"POST\",\"ctxStartEventId\":\"207\",\"id\":\"e2889f42-b5f7-4732-864a-b27c5525c88b\",\"ctxDetails\":\"{\\\"interface com.cloud.vm.VirtualMachine\\\":\\\"9dfc270b-4924-4114-83d2-4cf9117435c5\\\",\\\"interface com.cloud.kubernetes.cluster.KubernetesCluster\\\":\\\"e2889f42-b5f7-4732-864a-b27c5525c88b\\\"}\",\"ctxAccountId\":\"2\",\"uuid\":\"e2889f42-b5f7-4732-864a-b27c5525c88b\",\"nodeids\":\"9dfc270b-4924-4114-83d2-4cf9117435c5\",\"cmdEventType\":\"KUBERNETES.CLUSTER.NODES.REMOVE\"}","cmdVersion":0,"completeMsid":null,"created":null,"id":47,"initMsid":32985365610879,"instanceId":1,"instanceType":"KubernetesCluster","lastPolled":null,"lastUpdated":null,"processStatus":0,"removed":null,"result":null,"resultCode":0,"status":"IN_PROGRESS","userId":2,"uuid":"dbee196a-4267-4a76-9a50-2eb1cfa82a9b"}
2026-01-28 10:27:01,004 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl$5] (API-Job-Executor-2:[ctx-7c702c5d, job-47]) (logid:dbee196a) Executing AsyncJob {"accountId":2,"cmd":"org.apache.cloudstack.api.command.user.kubernetes.cluster.RemoveNodesFromKubernetesClusterCmd","cmdInfo":"{\"response\":\"json\",\"ctxUserId\":\"2\",\"sessionkey\":\"mnK0SnDiHU-7yH8r_GeAGoLWmVk\",\"httpmethod\":\"POST\",\"ctxStartEventId\":\"207\",\"id\":\"e2889f42-b5f7-4732-864a-b27c5525c88b\",\"ctxDetails\":\"{\\\"interface com.cloud.vm.VirtualMachine\\\":\\\"9dfc270b-4924-4114-83d2-4cf9117435c5\\\",\\\"interface com.cloud.kubernetes.cluster.KubernetesCluster\\\":\\\"e2889f42-b5f7-4732-864a-b27c5525c88b\\\"}\",\"ctxAccountId\":\"2\",\"uuid\":\"e2889f42-b5f7-4732-864a-b27c5525c88b\",\"nodeids\":\"9dfc270b-4924-4114-83d2-4cf9117435c5\",\"cmdEventType\":\"KUBERNETES.CLUSTER.NODES.REMOVE\"}","cmdVersion":0,"completeMsid":null,"created":null,"id":47,"initMsid":32985365610879,"instanceId":1,"instanceType":"KubernetesCluster","lastPolled":null,"lastUpdated":null,"processStatus":0,"removed":null,"result":null,"resultCode":0,"status":"IN_PROGRESS","userId":2,"uuid":"dbee196a-4267-4a76-9a50-2eb1cfa82a9b"}
2026-01-28 10:27:05,108 ERROR [c.c.k.c.a.KubernetesClusterRemoveWorker] (API-Job-Executor-2:[ctx-7c702c5d, job-47, ctx-a5f8ed50]) (logid:dbee196a) Error trying to remove node 9dfc270b-4924-4114-83d2-4cf9117435c5 from Kubernetes Cluster e2889f42-b5f7-4732-864a-b27c5525c88b: Error during SCP transfer. com.cloud.utils.exception.CloudRuntimeException: Error during SCP transfer.
2026-01-28 10:27:05,121 ERROR [o.a.c.a.c.u.k.c.RemoveNodesFromKubernetesClusterCmd] (API-Job-Executor-2:[ctx-7c702c5d, job-47, ctx-a5f8ed50]) (logid:dbee196a) Failed to remove node(s) from Kubernetes cluster ID: 1 due to: Failed to remove node(s) from Kubernetes cluster ID: 1 org.apache.cloudstack.api.ServerApiException: Failed to remove node(s) from Kubernetes cluster ID: 1
2026-01-28 10:27:05,122 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:[ctx-7c702c5d, job-47]) (logid:dbee196a) Complete async job-47, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed to remove node(s) from Kubernetes cluster ID: 1 due to: Failed to remove node(s) from Kubernetes cluster ID: 1"}
2026-01-28 10:27:05,123 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:[ctx-7c702c5d, job-47]) (logid:dbee196a) Publish async job-47 complete on message bus
2026-01-28 10:27:05,123 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:[ctx-7c702c5d, job-47]) (logid:dbee196a) Wake up jobs related to job-47
2026-01-28 10:27:05,123 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:[ctx-7c702c5d, job-47]) (logid:dbee196a) Update db status for job-47
2026-01-28 10:27:05,124 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:[ctx-7c702c5d, job-47]) (logid:dbee196a) Wake up jobs joined with job-47 and disjoin all subjobs created from job- 47
2026-01-28 10:27:05,128 DEBUG [c.c.a.ApiServer] (API-Job-Executor-2:[ctx-7c702c5d, job-47]) (logid:dbee196a) Retrieved cmdEventType from job info: KUBERNETES.CLUSTER.NODES.REMOVE
2026-01-28 10:27:05,130 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl$5] (API-Job-Executor-2:[ctx-7c702c5d, job-47]) (logid:dbee196a) Done executing org.apache.cloudstack.api.command.user.kubernetes.cluster.RemoveNodesFromKubernetesClusterCmd for job-47
2026-01-28 10:27:05,130 INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-2:[ctx-7c702c5d, job-47]) (logid:dbee196a) Remove job-47 from job monitoring
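The grep above returns every line that mentions job-47, most of it DEBUG bookkeeping. To surface only the failures, the same filter can be narrowed to ERROR-level lines. The following is a minimal sketch in Java, not part of CloudStack: the class `JobLogFilter` and its method are hypothetical names, and the sample lines in `main` are abbreviated from the log above.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper, not a CloudStack class.
final class JobLogFilter {
    private JobLogFilter() {}

    /**
     * Returns the lines that mention jobTag (case-insensitive, like grep -i)
     * and are logged at ERROR level. Like the grep, this is a substring match,
     * so "job-47" would also match "job-470".
     */
    static List<String> errorsForJob(List<String> lines, String jobTag) {
        String needle = jobTag.toLowerCase(Locale.ROOT);
        return lines.stream()
                .filter(line -> line.toLowerCase(Locale.ROOT).contains(needle))
                // The level appears as a space-delimited token after the timestamp.
                .filter(line -> line.contains(" ERROR "))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample lines abbreviated from the management-server log above.
        List<String> sample = List.of(
                "2026-01-28 10:27:01,000 INFO [o.a.c.f.j.i.AsyncJobMonitor] "
                        + "(API-Job-Executor-2:[ctx-7c702c5d, job-47]) Add job-47 into job monitoring",
                "2026-01-28 10:27:05,108 ERROR [c.c.k.c.a.KubernetesClusterRemoveWorker] "
                        + "(API-Job-Executor-2:[ctx-7c702c5d, job-47, ctx-a5f8ed50]) Error during SCP transfer.");
        errorsForJob(sample, "job-47").forEach(System.out::println);
    }
}
```

Applied to the log above, this kind of filter would keep the two ERROR lines: the `KubernetesClusterRemoveWorker` SCP transfer failure and the resulting `ServerApiException` from `RemoveNodesFromKubernetesClusterCmd`.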
Description
This PR fixes #11581
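The diff is not reproduced in this conversation, so the actual change cannot be shown here. As a generic illustration of the failure class named in the title (a NullPointerException while removing a node that never finished joining), the sketch below guards each lookup before dereferencing it. Everything in it is an assumption made for illustration: `NodeRemovalSketch`, `Node`, the `ipAddress` field, and the idea that a failed node lacks an address are hypothetical and are not taken from `KubernetesClusterManagerImpl.java`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in CloudStack.
final class NodeRemovalSketch {

    /** Stand-in for a node record. */
    static final class Node {
        final String uuid;
        final String ipAddress; // assumed to be null if the node never finished joining

        Node(String uuid, String ipAddress) {
            this.uuid = uuid;
            this.ipAddress = ipAddress;
        }
    }

    private final Map<String, Node> nodesByUuid = new HashMap<>();

    void register(Node node) {
        nodesByUuid.put(node.uuid, node);
    }

    /** Removes a node and reports what was done, without dereferencing missing data. */
    String removeNode(String uuid) {
        Node node = nodesByUuid.get(uuid);
        if (node == null) {
            // Unknown node: report it instead of failing later with an NPE.
            return "skipped: unknown node " + uuid;
        }
        nodesByUuid.remove(uuid);
        if (node.ipAddress == null) {
            // The node never joined, so there is nothing to clean up remotely.
            return "removed record only: " + uuid;
        }
        return "removed after cleanup on " + node.ipAddress;
    }
}
```

The design choice in this sketch is to treat a half-provisioned node as removable: missing data shortens the cleanup path rather than aborting the whole operation.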