[cinder-csi-plugin] Detaching a volume on controller has too much of a delay #2661

Open
CallMeFoxie opened this issue Sep 18, 2024 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@CallMeFoxie

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:
Whenever we have maintenance in OpenStack, we do a graceful shutdown of the node, which triggers draining the pods. However, the detach of the Cinder volume reaches the controller plugin too late in the shutdown sequence:

I0917 15:21:34.010628       1 csi_handler.go:234] Error processing "csi-045061d9a5678eed8e0d57a59e01420bb5e13899386cc6581378d5558301b39e": failed to detach: rpc error: code = Internal desc = ControllerUnpublishVolume Detach Volume failed with error failed to detach volume cbb2daa9-6cf7-4bd9-aa0b-1902a1b30498 from compute e7eda096-1267-45a1-9164-f7a7d85c7f4e : Expected HTTP response code [202 204] when accessing [DELETE https://api.ouropenst.ack:8774/v2.1/servers/e7eda096-1267-45a1-9164-f7a7d85c7f4e/os-volume_attachments/cbb2daa9-6cf7-4bd9-aa0b-1902a1b30498], but got 409 instead: {"conflictingRequest": {"code": 409, "message": "Cannot 'detach_volume' instance e7eda096-1267-45a1-9164-f7a7d85c7f4e while it is in task_state reboot_started"}}

This means that while the server is in maintenance, we cannot re-attach our volumes to another node:

I0917 15:21:34.429151       1 csi_handler.go:251] Attaching "csi-ae19a426cd7dbc8de74de71d711bc63ae0e71e6c28a0238785e84338ad3beb6c"
I0917 15:21:34.599180       1 csi_handler.go:234] Error processing "csi-ae19a426cd7dbc8de74de71d711bc63ae0e71e6c28a0238785e84338ad3beb6c": failed to attach: rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach cbb2daa9-6cf7-4bd9-aa0b-1902a1b30498 volume to b3d51121-6c84-4670-a4a7-e83729d2004a compute: Expected HTTP response code [200] when accessing [POST https://api.ouropenst.ack:8774/v2.1/servers/b3d51121-6c84-4670-a4a7-e83729d2004a/os-volume_attachments], but got 400 instead: {"badRequest": {"code": 400, "message": "Invalid volume: volume cbb2daa9-6cf7-4bd9-aa0b-1902a1b30498 is already attached to instances: e7eda096-1267-45a1-9164-f7a7d85c7f4e"}}
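
A possible mitigation on the controller side would be to wait for the instance to leave its transient task_state before issuing the detach, rather than failing straight into the 409. A minimal sketch of that idea (getTaskState and detachVolume are hypothetical placeholders for the Nova calls the plugin already makes, not existing cloud-provider-openstack functions):

```go
package detach

import (
	"fmt"
	"time"
)

// getTaskState would return the instance's OS-EXT-STS:task_state
// ("" when no task is in progress). Hypothetical placeholder.
func getTaskState(serverID string) (string, error) { return "", nil }

// detachVolume would wrap DELETE /servers/{server}/os-volume_attachments/{volume}.
// Hypothetical placeholder.
func detachVolume(serverID, volumeID string) error { return nil }

// detachWhenIdle polls task_state until the instance is idle, then detaches.
func detachWhenIdle(serverID, volumeID string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		state, err := getTaskState(serverID)
		if err != nil {
			return err
		}
		if state == "" {
			// No Nova task in progress; the detach should not hit the 409 now.
			return detachVolume(serverID, volumeID)
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("server %s still in task_state %q after %s", serverID, state, timeout)
		}
		time.Sleep(5 * time.Second)
	}
}
```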

What you expected to happen:
Unmount the volume before the node goes into reboot.

How to reproduce it:
Do a graceful shutdown of a node in the cluster.

Anything else we need to know?:
Don't think so.

Environment:

  • openstack-cloud-controller-manager (or other related binary) version: cinder-csi 1.31.0, whole package 2.31.0 (Helm chart)
  • OpenStack version: 20.3.1 (cinder module)
  • Others:
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 18, 2024
@CallMeFoxie
Author

Actually, it might be a problem with the whole idea of ACPI shutdown: OpenStack seems to set the reboot_started task_state right away when a user presses ctrl-alt-del / shutdown / whatever button in Horizon (or via the API), so by the time the pods get drained off the node (and the volumes unmounted), OpenStack no longer accepts any volume detachments.
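
The task_state is easy to confirm from the API while the reboot is in flight. A sketch of the check, assuming gophercloud v1's extendedstatus extension (package paths and struct layout may differ in gophercloud v2):

```go
package taskstate

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/compute/v2/extensions/extendedstatus"
	"github.com/gophercloud/gophercloud/openstack/compute/v2/servers"
)

// serverWithTaskState adds the OS-EXT-STS fields to the base server result,
// so task_state (e.g. "reboot_started") becomes visible.
type serverWithTaskState struct {
	servers.Server
	extendedstatus.ServerExtendedStatusExt
}

// getTaskState fetches the instance and returns its current task_state.
func getTaskState(compute *gophercloud.ServiceClient, serverID string) (string, error) {
	var s serverWithTaskState
	if err := servers.Get(compute, serverID).ExtractInto(&s); err != nil {
		return "", err
	}
	return s.TaskState, nil
}
```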

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2024