Can't delete PVC, finalizer pvc-as-source-protection does not finish #2670
Comments
that snapshot creation is stuck there. I could help troubleshoot if you provide the AKS cluster FQDN.
@andyzhangx thank you for offering help. We figured out that this was caused by having a lot of VolumeSnapshots and VolumeSnapshotContents in this cluster (some cleanup did not work as expected). Once we cleaned everything up, it started working again. However, this got me thinking: how many VolumeSnapshots and VolumeSnapshotContents can the CSI driver safely handle before we hit this problem again? Do you have any numbers there?
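To keep an eye on how many snapshot objects have piled up, something like the following could be run periodically. This is only a sketch: the function name `count_snapshot_objects` is made up for illustration, and it assumes `kubectl` is already pointed at the affected cluster.

```shell
# Hypothetical helper: count VolumeSnapshots and VolumeSnapshotContents.
# Assumes kubectl is configured for the cluster in question.
count_snapshot_objects() {
  for kind in volumesnapshots volumesnapshotcontents; do
    # --no-headers leaves one line per object, so wc -l gives the count.
    n=$(kubectl get "$kind" --all-namespaces --no-headers 2>/dev/null | wc -l | tr -d ' ')
    echo "$kind: $n"
  done
}
```

Calling `count_snapshot_objects` prints one line per resource kind, which makes it easy to spot a cleanup job that has stopped working.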
The downside is that, having removed the finalizer, we now have to delete the actual Azure snapshots manually from the Azure portal, because the CSI driver did not do it. "az snapshot delete" seems to be rather slow for this (even with --no-wait), needing about 5 seconds for every snapshot delete command. I'll check again whether we can delete all snapshots at once, but my first try failed: we have ~20000 snapshots to delete and bash said "too many arguments" :D
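One way around the argument-list limit is to feed the snapshot IDs through `xargs` in fixed-size batches, so each `az snapshot delete` call handles several IDs at once. The sketch below is not tested against a real subscription; the function name, the default batch size of 50, and the `AZ_BIN` override variable are assumptions added for illustration.

```shell
# Hypothetical helper: read snapshot resource IDs from stdin (one per line)
# and delete them in batches. AZ_BIN is an illustrative override so the
# command can be dry-run, e.g. with AZ_BIN=echo.
delete_snapshots_in_batches() {
  batch_size="${1:-50}"
  # xargs -n caps the number of IDs passed to each az invocation,
  # which keeps every command line below the shell's argument limit.
  xargs -n "$batch_size" "${AZ_BIN:-az}" snapshot delete --no-wait --ids
}

# Example (the resource group name is a placeholder):
#   az snapshot list -g my-resource-group --query "[].id" -o tsv \
#     | delete_snapshots_in_batches 50
```

Setting `AZ_BIN=echo` first prints the commands that would run without deleting anything, which is a cheap sanity check before removing thousands of snapshots.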
@kaitimmer as long as the snapshot container is working fine, that's OK. Recently we found that the memory limit of the snapshot container is too small when there are lots of snapshots, so the container eventually runs out of memory (OOM). So I think the question comes down to the number of snapshots, the memory limit of the snapshot container, and how fast the CSI driver can process snapshots so that snapshot contents do not accumulate. Just let me know when your cluster is stuck on creating snapshots and I can increase the memory limit immediately. We will increase the memory limit later on, since the Azure service is in CCOA now.
Hi @andyzhangx, one of our clusters is again in the state where the finalizer does not work. I will send you the ID and URI via email. Since we cleaned up all the VolumeSnapshots, their number cannot be the problem this time. I assume that we are back in the state in which the problem originally started.
What happened:
When deleting a PVC, the deletion process is "stuck".
The finalizer:
snapshot.storage.kubernetes.io/pvc-as-source-protection
does not finish. If I patch the PVC in "Terminating" state and remove the finalizer, everything works as expected.
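For reference, the patch workaround can be sketched roughly as below. The function name and the namespace default are illustrative assumptions. Note that this merge patch clears all finalizers on the PVC, not only `snapshot.storage.kubernetes.io/pvc-as-source-protection`, so it is a last resort rather than a fix.

```shell
# Hypothetical helper: clear the finalizers of a PVC stuck in Terminating.
# WARNING: setting finalizers to null removes every finalizer on the object,
# including any other protection finalizers that may be present.
clear_pvc_finalizers() {
  pvc_name="$1"
  pvc_namespace="${2:-default}"
  kubectl patch pvc "$pvc_name" -n "$pvc_namespace" \
    --type=merge -p '{"metadata":{"finalizers":null}}'
}

# Example:
#   clear_pvc_finalizers pvc-something-0 default
```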
I've seen this behavior randomly in multiple clusters. But in the current one, it has been persisting for a couple of weeks already.
What you expected to happen:
Finalizer finishes and I can delete the PVC without the need to patch it first.
How to reproduce it:
k delete PVC pvc-something-0
It does not matter which StorageClass or SKU is behind the PVC. If it is not working in a cluster, it is not working for all PVCs.
Anything else we need to know?:
When this error exists, I cannot get a VolumeSnapshot into "ReadyToUse." It looks like everything that interacts with Snapshots is broken in this cluster.
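To see which VolumeSnapshots are affected, the `status.readyToUse` field can be listed for every snapshot and filtered. This is a rough sketch, with `list_unready_snapshots` being a made-up name; it assumes the VolumeSnapshot CRDs are installed and `kubectl` has access to the cluster.

```shell
# Hypothetical helper: print namespace/name of every VolumeSnapshot whose
# status.readyToUse is not "true" (false, or not set yet).
list_unready_snapshots() {
  kubectl get volumesnapshot --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.status.readyToUse}{"\n"}{end}' \
    | awk -F'\t' '$2 != "true" { print $1 }'
}
```

If every snapshot in the cluster shows up in the output, that points to the snapshot machinery as a whole rather than to individual volumes.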
Environment:
kubectl version:
Client Version: v1.31.2
Kustomize Version: v5.4.2
Server Version: v1.30.3