
driver matches private storage accounts without networkEndpointType set #2085

Open · RomanBednar opened this issue Aug 30, 2024 · 7 comments
Labels: lifecycle/stale

@RomanBednar (Contributor)

While there is an existing storage account in Azure configured to use private endpoints and disallow public network access, the driver can match this storage account even with the networkEndpointType SC parameter unset. However, it will not configure a private endpoint for the worker subnet, because networkEndpointType is not set to privateendpoint; this causes volume mounts to fail on worker nodes.

This seems like a bug: the driver probably should not match "private" storage accounts if the networkEndpointType parameter is omitted or set to "". Or maybe there is some other parameter for this scenario that we missed?

Driver version: v1.30.3

Reproducer:

  • a prerequisite is an existing storage account with private endpoints and public network access disabled (in the example below it's imageregistryrbednah6mbs); a sketch of how such an account can be created follows
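For reference, such an account can be created with the Azure CLI roughly as follows (a hedged sketch; the account, resource group, vnet, and endpoint names are placeholders, not the ones from this cluster):

$ az storage account create --name <privateaccount> --resource-group <rg> --sku Standard_LRS --public-network-access Disabled
$ az network private-endpoint create --name <pe-name> --resource-group <rg> --vnet-name <vnet> --subnet <subnet> \
    --group-id file --connection-name <conn-name> \
    --private-connection-resource-id $(az storage account show --name <privateaccount> --resource-group <rg> --query id --output tsv)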

Storage Class has networkEndpointType: "" (or omitted):

$ oc get sc/azurefile-csi-pub -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2024-08-30T11:35:13Z"
  name: azurefile-csi-pub
  resourceVersion: "55979"
  uid: 0e152336-614b-41fc-aafc-3c0ffefd787a
mountOptions:
- mfsymlinks
- cache=strict
- nosharesock
- actimeo=30
parameters:
  networkEndpointType: ""
  skuName: Standard_LRS
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

Driver matches an existing storage account:

I0830 11:43:02.734943       1 utils.go:101] GRPC call: /csi.v1.Controller/CreateVolume
I0830 11:43:02.734967       1 utils.go:102] GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2","parameters":{"csi.storage.k8s.io/pv/name":"pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2","csi.storage.k8s.io/pvc/name":"pvc-1","csi.storage.k8s.io/pvc/namespace":"default","networkEndpointType":"","skuName":"Standard_LRS"},"volume_capabilities":[{"AccessType":{"Mount":{"mount_flags":["mfsymlinks","cache=strict","nosharesock","actimeo=30"]}},"access_mode":{"mode":7}}]}
I0830 11:43:03.097163       1 controllerserver.go:561] begin to create file share(pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2) on account(imageregistryrbednah6mbs) type(Standard_LRS) subID() rg(rbednar-az-01-8z785-rg) location() size(1) protocol(SMB)
I0830 11:43:03.383519       1 controllerserver.go:593] create file share pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2 on storage account imageregistryrbednah6mbs successfully
I0830 11:43:03.399781       1 controllerserver.go:639] store account key to k8s secret(azure-storage-account-imageregistryrbednah6mbs-secret) in default namespace
I0830 11:43:03.399821       1 utils.go:108] GRPC response: {"volume":{"capacity_bytes":1073741824,"volume_context":{"csi.storage.k8s.io/pv/name":"pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2","csi.storage.k8s.io/pvc/name":"pvc-1","csi.storage.k8s.io/pvc/namespace":"default","networkEndpointType":"","secretnamespace":"default","skuName":"Standard_LRS"},"volume_id":"rbednar-az-01-8z785-rg#imageregistryrbednah6mbs#pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2###default"}}

The storage account matched is private:

$ az storage account show --name imageregistryrbednah6mbs --resource-group rbednar-az-01-8z785-rg --query "publicNetworkAccess" --output json
"Disabled"

With this configuration, worker nodes cannot mount any newly provisioned volumes, as there is no private endpoint configured for the worker subnet:

I0830 11:51:25.588693       1 nodeserver.go:326] cifsMountPath(/var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/ff63a5ad356eed4e6c55d99793032daf932997f18546669ac219a9df4571a26f/globalmount) fstype() volumeID(rbednar-az-01-8z785-rg#imageregistryrbednah6mbs#pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2###default) context(map[csi.storage.k8s.io/pv/name:pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2 csi.storage.k8s.io/pvc/name:pvc-1 csi.storage.k8s.io/pvc/namespace:default networkEndpointType: secretnamespace:default skuName:Standard_LRS storage.kubernetes.io/csiProvisionerIdentity:1725012839930-7153-file.csi.azure.com]) mountflags([mfsymlinks cache=strict nosharesock actimeo=30]) mountOptions([mfsymlinks cache=strict nosharesock actimeo=30 file_mode=0777 dir_mode=0777]) volumeMountGroup()
E0830 11:51:25.606841       1 mount_linux.go:230] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o mfsymlinks,cache=strict,nosharesock,actimeo=30,file_mode=0777,dir_mode=0777,<masked> //imageregistryrbednah6mbs.file.core.windows.net/pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2 /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/ff63a5ad356eed4e6c55d99793032daf932997f18546669ac219a9df4571a26f/globalmount
Output: mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)

E0830 11:51:25.606924       1 utils.go:106] GRPC error: rpc error: code = Internal desc = volume(rbednar-az-01-8z785-rg#imageregistryrbednah6mbs#pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2###default) mount //imageregistryrbednah6mbs.file.core.windows.net/pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2 on /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/ff63a5ad356eed4e6c55d99793032daf932997f18546669ac219a9df4571a26f/globalmount failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o mfsymlinks,cache=strict,nosharesock,actimeo=30,file_mode=0777,dir_mode=0777,<masked> //imageregistryrbednah6mbs.file.core.windows.net/pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2 /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/ff63a5ad356eed4e6c55d99793032daf932997f18546669ac219a9df4571a26f/globalmount
Output: mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)

Please refer to http://aka.ms/filemounterror for possible causes and solutions for mount errors.

NOTE: If I create a SC with networkEndpointType=privateendpoint and provision a volume with it, the driver will configure the private endpoint correctly on the existing storage account, which fixes the mount issue for any subsequent mounts that reuse this storage account. This can serve as a temporary workaround; a sketch of such a class follows.
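For reference, a minimal StorageClass for this workaround could look like the following (a sketch; the class name is illustrative, the remaining fields mirror the class shown above):

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-private   # illustrative name
parameters:
  networkEndpointType: privateendpoint
  skuName: Standard_LRS
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate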

Follow-up question: we have the above-mentioned configuration in OpenShift private clusters, where the registry operator creates its own storage resources, including a (private) storage account, and CSI provisioning happens after that. Even if this issue is resolved, the driver can still reuse "foreign" storage accounts in similar scenarios, and we're not sure that's good practice. We would like to introduce a fix for this later as well. Is there a recommended way to ensure the driver will never reuse a storage account it did not create? Could the storageaccount or createaccount parameter be a sufficient and safe solution?
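For illustration, pinning provisioning to one specific, pre-created account would presumably use the driver's storageAccount parameter (a sketch; the account name is a placeholder, and whether this reliably prevents reuse of foreign accounts is exactly the open question here):

parameters:
  skuName: Standard_LRS
  storageAccount: <account-created-for-csi-only>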

@andyzhangx (Member)

if networkEndpointType is unset, that means the CSI driver will match any account, with or without networkEndpointType; if you only want to match networkEndpointType: privateEndpoint accounts, then you need to specify this parameter in the storage class. @RomanBednar

@RomanBednar (Contributor, Author)

> if networkEndpointType is unset, that means the CSI driver will match any account, with or without networkEndpointType

@andyzhangx This is the actual issue, simplified. With this behavior we cannot tell the driver to "avoid" private endpoint accounts. Either we request a private one specifically (predictable), or we leave the parameter unset, which means the driver can match either private or public accounts (unpredictable). What we need is a way to reliably not match private endpoint accounts, making it predictable.

RomanBednar added a commit to RomanBednar/csi-operator that referenced this issue Sep 5, 2024
Adding the tags should ensure that we do not match a storage account
that was created by someone else which would be undesirable.

Additionally it solves the problem of matching storage accounts with
private endpoints - currently driver can match those even if
`networkEndpointType` is unset. Using the tags and matching should make
this more predictable and driver should no longer match those.
However, even if this ever gets fixed, we should probably still keep
the tags to prevent matching "foreign" storage accounts.

Upstream issue: kubernetes-sigs/azurefile-csi-driver#2085
@RomanBednar (Contributor, Author)

@andyzhangx Do you think this could be considered a bug? We're looking for a fix that we could ship soon, and it depends on the decision made for this issue.

I've found a fix that might work for now: if we set the parameters to add a tag and require matching that tag, the driver will always create its own storage account when the first volume is provisioned and then reuse it (unless there is already an account with a matching tag, which is unlikely yet still possible). With this we should be almost certain we don't match "foreign" storage accounts, right? Or is it a bad idea? A sketch of the parameters follows.
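Concretely, the tag-based fix can be expressed with the driver's tags and matchTags StorageClass parameters (a sketch; the tag key and value are arbitrary choices, not mandated names):

parameters:
  skuName: Standard_LRS
  tags: created-by=azurefile-csi-operator
  matchTags: "true"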

@jsafrane (Contributor), Sep 9, 2024

> if networkEndpointType is unset, that means the CSI driver will match any account, with or without networkEndpointType

I think that's a bug. It should match a usable account, not just any account. A private account + StorageClass with networkEndpointType: "" will lead to a PV that cannot be mounted today. Maybe there is another storage account that does not require private endpoints and would fit that CreateVolume call better.

@andyzhangx (Member)

> if networkEndpointType is unset, that means the CSI driver will match any account, with or without networkEndpointType
>
> I think that's a bug. It should match a usable account, not just any account. A private account + StorageClass with networkEndpointType: "" will lead to a PV that cannot be mounted today. Maybe there is another storage account that does not require private endpoints and would fit that CreateVolume call better.

@jsafrane The CSI driver does not know whether the account is usable or not; that depends on the application pods, i.e. whether they run on a node in an allowed vnet. There is a parameter mapping rule: in general, if a value is not specified, the driver does not check the corresponding property.

@RomanBednar I think the best way to solve this issue is specifying networkEndpointType: privateEndpoint in the SC, though tags also work.

@RomanBednar (Contributor, Author)

@andyzhangx I've tested that, and it rather masks the problem while posing a few others: if we set networkEndpointType: privateEndpoint, the driver will always configure a private endpoint on a new storage account. And if an existing account matches, the driver will update it with a private endpoint, but we don't know which account gets updated.

Also, we still need to allow "public" storage accounts for the CSI driver to exist alongside arbitrary "private" ones (which is supported), and with this solution that would not be possible.

openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/csi-operator that referenced this issue Oct 9, 2024 (same commit message as above)

dfajmon pushed two commits to dfajmon/csi-operator that referenced this issue Oct 14, 2024 (same commit message as above)
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 10, 2024