Kubernetes volume snapshot failures: VolumeSnapshot CRDs and CSI snapshotter

A VolumeSnapshot that never becomes readyToUse, a restore that yields an empty PVC, or a CSI snapshotter sidecar that logs GRPC errors while nothing progresses in the cluster all point to the same problem: the snapshot pipeline broke somewhere between the API server and the storage backend.

This guide covers the failure modes between applying a VolumeSnapshot manifest and the storage backend completing the operation. It walks through verifying the CRDs, the snapshot-controller, the CSI snapshotter sidecar, driver names, capabilities, and secrets, and shows how to read the error signature each misalignment produces.

What this means

VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass are CRDs in the snapshot.storage.k8s.io API group. They are not part of core Kubernetes and must be installed separately. The snapshot lifecycle splits across two controllers: the snapshot-controller watches VolumeSnapshot and VolumeSnapshotContent objects to create the binding, and the CSI external-snapshotter sidecar watches VolumeSnapshotContent to call the CSI driver’s CreateSnapshot and DeleteSnapshot RPCs. A CSI driver must advertise the CREATE_DELETE_SNAPSHOT capability.
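
To make the relationship between these objects concrete, the following is a minimal sketch that renders a VolumeSnapshotClass and a VolumeSnapshot that references it. The function name render_snapshot_manifests and every argument value in the usage comment (csi-snapclass, data-snap, data-pvc) are illustrative placeholders, and deletionPolicy: Delete is chosen only as an example.

```shell
# Render a VolumeSnapshotClass and a VolumeSnapshot as YAML on stdout.
# Hypothetical helper: all names passed in are placeholders, not cluster defaults.
# Args: <class-name> <driver> <snapshot-name> <namespace> <pvc-name>
render_snapshot_manifests() (
  class_name="$1"; driver="$2"; snap_name="$3"; namespace="$4"; pvc_name="$5"
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ${class_name}
driver: ${driver}
deletionPolicy: Delete
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ${snap_name}
  namespace: ${namespace}
spec:
  volumeSnapshotClassName: ${class_name}
  source:
    persistentVolumeClaimName: ${pvc_name}
EOF
)

# Usage (assumes kubectl access and installed snapshot CRDs):
#   render_snapshot_manifests csi-snapclass ebs.csi.aws.com data-snap default data-pvc | kubectl apply -f -
```

The snapshot-controller acts on the VolumeSnapshot, while the sidecar only acts on the resulting VolumeSnapshotContent if the driver value matches the name it was started with.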

Because these layers are independent, failures produce distinct signatures. A missing snapshot-controller leaves VolumeSnapshots unbound. A driver name mismatch causes the sidecar to ignore VolumeSnapshotContent entirely, producing no events and no snapshot. Missing CRDs cause the API server to reject the resource. Missing secrets cause the driver to fail the RPC. Understanding which layer is silent versus noisy is the key to fast diagnosis.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Missing CRDs or snapshot-controller | kubectl apply fails with “no matches for kind”, or the VolumeSnapshot never gets a VolumeSnapshotContent | Presence of snapshot.storage.k8s.io CRDs and the snapshot-controller deployment |
| Driver name mismatch | VolumeSnapshotContent is ignored by the sidecar and never becomes ready; no events on the VolumeSnapshot; sidecar logs show nothing about the object | VolumeSnapshotClass driver field versus the CSI driver’s registered name |
| CSI driver lacks snapshot capability | Sidecar sends CreateSnapshot RPC; driver returns UNIMPLEMENTED or INVALID_ARGUMENT | Driver documentation or sidecar startup logs for CREATE_DELETE_SNAPSHOT support |
| CSI snapshotter sidecar deployed without CRDs | Sidecar logs repeat “failed to list VolumeSnapshotClass: the server could not find the requested resource” | CRD installation status before sidecar startup |
| Missing snapshotter secret | CreateSnapshot fails silently or with NotFound in sidecar logs | Secret existence in the namespace referenced by the VolumeSnapshotClass |
| Stale v1alpha1 or v1beta1 objects | Snapshot-controller crash-loops with “request to convert CR from an invalid group/version: snapshot.storage.k8s.io/v1alpha1” | Stored API version of existing snapshot objects |
| Optimistic concurrency conflict | Controller logs show “the object has been modified; please apply your changes to the latest version and try again” | API server load and controller RBAC |
| Wrong storage class on restore | PVC creation succeeds but the volume is empty because it was provisioned from the default class instead of restored from the snapshot | PVC storageClassName versus the snapshot source |

Quick checks

# Verify snapshot CRDs are installed
kubectl get crd volumesnapshots.snapshot.storage.k8s.io volumesnapshotcontents.snapshot.storage.k8s.io volumesnapshotclasses.snapshot.storage.k8s.io

# Verify snapshot-controller is present (namespace varies by distribution)
kubectl get deployment -n kube-system snapshot-controller

# List VolumeSnapshotClasses and their drivers
kubectl get volumesnapshotclass -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.driver}{"\n"}{end}'

# Check VolumeSnapshot status and events
kubectl describe volumesnapshot <name> -n <namespace>

# Check VolumeSnapshotContent status
kubectl describe volumesnapshotcontent <name>

# Check CSI snapshotter sidecar logs for GRPC errors
kubectl logs -n <csi-namespace> <csi-controller-pod> -c csi-snapshotter

# Check if a referenced snapshotter secret exists
kubectl get secret -n <secret-namespace> <secret-name>

What good and bad output looks like:

  • Good: kubectl get crd volumesnapshots.snapshot.storage.k8s.io returns the CRD definition. kubectl get volumesnapshot shows READYTOUSE as true within minutes.
  • Bad: kubectl get volumesnapshot returns “error: the server doesn’t have a resource type ‘VolumeSnapshot’”. The CRDs were never installed.
  • Bad: kubectl get volumesnapshot shows READYTOUSE as false with no VolumeSnapshotContent ever created. This usually means the snapshot-controller is missing. If a VolumeSnapshotContent exists but never becomes ready, suspect a VolumeSnapshotClass driver name that does not match the CSI driver.

How to diagnose it

  1. Confirm CRDs exist. Run kubectl get crd and grep for snapshot.storage.k8s.io. If any of the three CRDs is missing, the API server cannot store the corresponding snapshot objects. This is a common bootstrap failure on kubeadm-initialized clusters and minimal distributions.

  2. Confirm the snapshot-controller is running. Without it, VolumeSnapshot objects are never bound to VolumeSnapshotContent. Managed Kubernetes services often bundle it as a cluster add-on; self-managed clusters frequently omit it. Check the deployment and its pod logs.

  3. Inspect the VolumeSnapshotClass driver field. The value must exactly match the CSI driver name registered in the cluster, visible in the CSIDriver object name or driver pod configuration. A mismatch, even a subtle variant like aws-ebs-csi-driver versus ebs.csi.aws.com, causes the CSI snapshotter sidecar to ignore VolumeSnapshotContent entirely. The sidecar filters content objects by matching the Driver field against the driver name it was started with.

  4. Verify the CSI driver advertises snapshot support. Check the driver documentation or sidecar startup logs for the CREATE_DELETE_SNAPSHOT capability. Drivers without this capability fail snapshot RPCs.

  5. Check the CSI controller pod for the snapshotter sidecar. Some charts deploy the csi-snapshotter sidecar even when snapshots are disabled, which produces repeated log errors because the sidecar cannot list VolumeSnapshotClasses. Ensure CRDs are installed before the sidecar starts, or disable the sidecar entirely.

  6. Read VolumeSnapshot and VolumeSnapshotContent status and events. The snapshot-controller emits events when binding fails. The content object carries the sidecar’s progress. Look for terminal errors versus retry loops.

  7. Check the CSI snapshotter sidecar logs. Snapshot creation failures surface as GRPC status codes: Unavailable means the CSI endpoint is unreachable or the driver is not responding; DeadlineExceeded means the storage backend timed out; Internal means a driver-side error, often from malformed parameters or secret misconfiguration. The sidecar logs these as GRPC error: rpc error: code = X desc = Y.

  8. Validate snapshotter secrets. If the VolumeSnapshotClass references a secret via csi.storage.k8s.io/snapshotter-secret-name and csi.storage.k8s.io/snapshotter-secret-namespace, the secret must exist in the specified namespace. Missing secrets cause CreateSnapshot to fail with NotFound or an Internal error.

  9. If restoring, verify the PVC storage class. When creating a PVC from a VolumeSnapshot, specifying the wrong storageClassName, or omitting it and relying on a default class that is not the snapshot source class, causes Kubernetes to provision a fresh empty volume instead of restoring from the snapshot. The PVC binds but contains no data.
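
The GRPC status codes from step 7 can be triaged mechanically. The sketch below assumes the sidecar log format quoted in that step (rpc error: code = X desc = Y); the function names grpc_code and grpc_hint are hypothetical, and the hints simply restate the meanings given above.

```shell
# Extract the GRPC status code from one csi-snapshotter log line.
# Prints nothing when the line carries no "rpc error: code = X" fragment.
grpc_code() (
  printf '%s\n' "$1" | sed -n 's/.*rpc error: code = \([A-Za-z]*\).*/\1/p'
)

# Map the status code in a log line to the diagnosis hint from step 7.
grpc_hint() (
  case "$(grpc_code "$1")" in
    Unavailable)      echo "CSI endpoint unreachable or driver not responding" ;;
    DeadlineExceeded) echo "storage backend timed out" ;;
    Internal)         echo "driver-side error: check parameters and secrets" ;;
    "")               echo "no GRPC status in this line" ;;
    *)                echo "unmapped code: see driver logs" ;;
  esac
)

# Usage (placeholders as in the quick checks above):
#   kubectl logs -n <csi-namespace> <csi-controller-pod> -c csi-snapshotter \
#     | grep 'rpc error' | while IFS= read -r line; do grpc_hint "$line"; done
```

Codes other than the three covered in step 7 fall through to the generic branch rather than being guessed at.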

flowchart TD
    A[VolumeSnapshot stuck or failed] --> B{CRDs installed?}
    B -->|No| C[Install snapshot.storage.k8s.io CRDs]
    B -->|Yes| D{snapshot-controller running?}
    D -->|No| E[Deploy snapshot-controller]
    D -->|Yes| F{VolumeSnapshotClass driver matches CSI driver name?}
    F -->|No| G[Correct driver field in VolumeSnapshotClass]
    F -->|Yes| H{CSI driver supports CREATE_DELETE_SNAPSHOT?}
    H -->|No| I[Use different driver or upgrade]
    H -->|Yes| J[Check sidecar logs, secrets, and GRPC status]

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| VolumeSnapshot readyToUse | Indicates whether the backend snapshot completed and is usable | Remains false for more than a few minutes after creation |
| VolumeSnapshotContent status.readyToUse | Shows sidecar and driver progress toward a finished snapshot | Never becomes true, or transitions to an error state |
| CSI snapshotter sidecar log errors | Reveals driver-level and storage-backend failures | GRPC Unavailable, DeadlineExceeded, or Internal codes |
| VolumeSnapshot .status.error | Captures controller or sidecar terminal and retry errors | Non-nil error field with terminal or retry states |
| CRD presence | Without CRDs the entire snapshot feature is offline | Missing volumesnapshots.snapshot.storage.k8s.io or related CRDs |
| CSI driver snapshot capability | Drivers without this capability fail CreateSnapshot RPCs | Sidecar or driver logs show UNIMPLEMENTED or capability errors |
| PVC restore outcome | A successful PVC creation does not guarantee data was restored | PVC bound but application sees empty data directory |

Fixes

If the cause is missing CRDs or controller

Install the CRDs and snapshot-controller from the external-snapshotter release matching your Kubernetes version. For Kubernetes 1.25 and later, target the release-8.2 branch. For Kubernetes 1.20 through 1.24, target release-6.2. This is state-changing; plan it during a maintenance window.

# Install CRDs from the upstream release branch
# Change the branch to match your Kubernetes version
# The URL is quoted so the shell does not treat "?" as a glob character
kubectl kustomize "github.com/kubernetes-csi/external-snapshotter/client/config/crd?ref=release-8.2" | kubectl create -f -

Deploy the snapshot-controller in the namespace recommended by your distribution, typically kube-system.

If the cause is driver mismatch or missing capability

Correct the VolumeSnapshotClass driver field to exactly match the CSI driver name. Verify the match with kubectl get csidriver. If the driver does not support snapshots, either upgrade the driver or do not use snapshot features with that driver.
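
As a sketch of that comparison, the helper below takes the class-to-driver listing from the quick checks and the registered CSIDriver names, and prints every class whose driver is not registered. The function name find_mismatched_classes is hypothetical, and the two kubectl commands in the usage comment are assumed to have cluster access.

```shell
# Print VolumeSnapshotClasses whose driver is not a registered CSIDriver name.
# $1: lines of "<class-name><TAB><driver>"
# $2: registered CSI driver names, one per line
find_mismatched_classes() (
  classes="$1"
  drivers="$2"
  tab="$(printf '\t')"
  printf '%s\n' "$classes" | while IFS="$tab" read -r name driver; do
    [ -n "$name" ] || continue
    # Exact, whole-line, fixed-string match: the names must be identical.
    if [ -z "$driver" ] || ! printf '%s\n' "$drivers" | grep -Fxq -- "$driver"; then
      printf '%s uses unregistered driver %s\n' "$name" "$driver"
    fi
  done
)

# Usage:
#   classes="$(kubectl get volumesnapshotclass -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.driver}{"\n"}{end}')"
#   drivers="$(kubectl get csidriver -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')"
#   find_mismatched_classes "$classes" "$drivers"
```

An empty result means every class points at a registered driver; it does not prove that the driver supports snapshots.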

If the cause is secret misconfiguration or GRPC errors

Create the secret referenced by the VolumeSnapshotClass in the correct namespace. For DeadlineExceeded errors, investigate storage backend latency and network paths. For Unavailable errors, verify the CSI controller pod readiness and the driver Unix socket or TCP endpoint. For Internal errors, review driver logs for malformed requests.

If the cause is stale API version objects

If the snapshot-controller crash-loops because of v1alpha1 objects after upgrading to external-snapshotter v4.0.0 or later, remove the stale objects or temporarily install legacy v1alpha1 CRDs to allow cleanup, then migrate all manifests to snapshot.storage.k8s.io/v1.
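
One way to spot leftover legacy versions is to inspect the status.storedVersions field of each snapshot CRD. The helper below is a minimal sketch: the name legacy_stored_versions is hypothetical, and it assumes the value arrives in the form kubectl jsonpath prints for a list, such as ["v1beta1","v1"].

```shell
# Print any alpha or beta versions found in a CRD's status.storedVersions value.
# $1: the storedVersions list as text, e.g. ["v1beta1","v1"]
# Exits non-zero (grep's status) when no legacy version is listed.
legacy_stored_versions() (
  printf '%s\n' "$1" \
    | sed 's/[][",]/ /g' \
    | tr ' ' '\n' \
    | grep -E '^v1(alpha|beta)[0-9]+$'
)

# Usage:
#   stored="$(kubectl get crd volumesnapshots.snapshot.storage.k8s.io -o jsonpath='{.status.storedVersions}')"
#   legacy_stored_versions "$stored" && echo "legacy objects may still be stored"
```

Repeat the check for volumesnapshotcontents and volumesnapshotclasses, since each CRD tracks its stored versions separately.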

If the cause is an empty restored PVC

Delete the empty PVC and recreate it with the correct storageClassName that matches the snapshot source. Do not rely on the cluster default if it is not the snapshot-capable class.
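
For reference, a restore PVC names the snapshot in spec.dataSource and sets storageClassName explicitly. The sketch below renders such a manifest; the function name render_restore_pvc, the ReadWriteOnce access mode, and all values in the usage comment are assumptions for illustration, and the requested size should be at least as large as the snapshot.

```shell
# Render a PVC that restores from a VolumeSnapshot, as YAML on stdout.
# Hypothetical helper; every value is a placeholder supplied by the caller.
# Args: <pvc-name> <namespace> <storage-class> <snapshot-name> <size>
render_restore_pvc() (
  pvc_name="$1"; namespace="$2"; storage_class="$3"; snap_name="$4"; size="$5"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${pvc_name}
  namespace: ${namespace}
spec:
  storageClassName: ${storage_class}
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: ${snap_name}
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
EOF
)

# Usage (the VolumeSnapshot must live in the same namespace as the PVC):
#   render_restore_pvc data-restored default <source-storage-class> data-snap 10Gi | kubectl apply -f -
```

After the PVC binds, mount it in a throwaway pod and confirm the expected files exist before pointing the application at it.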

Prevention

  • Install snapshot CRDs and the snapshot-controller before enabling the CSI snapshotter sidecar. Many sidecar errors occur because charts deploy the sidecar while CRDs are absent.
  • Pin VolumeSnapshotClass driver names to the exact CSI driver identity. Do not assume naming conventions match across distributions.
  • Verify the driver supports snapshots before deploying snapshot-dependent workloads.
  • Migrate manifests from snapshot.storage.k8s.io/v1beta1 to v1. The v1beta1 API is deprecated and will be removed in a future release.
  • Validate snapshotter secrets before creating VolumeSnapshotClasses that reference them.
  • Monitor VolumeSnapshot readyToUse duration and alert when it exceeds a baseline.
  • When restoring, always set PVC storageClassName to the same class used by the source volume.

How Netdata helps

Correlate these failures with infrastructure signals:

  • CSI driver pod health: Restarts or OOM kills in CSI controller pods hosting the snapshotter sidecar.
  • Node disk I/O: Backend saturation correlates with snapshot latency and GRPC DeadlineExceeded.
  • Control plane latency: High API server or etcd latency correlates with snapshot-controller binding delays and optimistic concurrency conflicts.
  • Container resource pressure: CPU or memory throttling on the snapshot-controller slows reconciliation.

For diagnosing failures in the CSI driver itself, see Kubernetes CSI driver failures: detection, recovery, and version skew.