Summary
This article provides an overview of Kubernetes volumes, focusing on their types, use cases, and best practices. It addresses the challenges of data persistence in Kubernetes, such as data loss on pod deletion or restart and sharing data across replicas. The article categorizes volumes into ephemeral and persistent types, explaining key types like EmptyDir, HostPath, and Persistent Volumes (PVs). It also covers Persistent Volume Claims (PVCs) and Storage Classes (SCs) for dynamic provisioning. Practical demos using MongoDB Compass illustrate these concepts. The article concludes with best practices for selecting and managing Kubernetes volumes effectively.
Kubernetes Volumes Made Easy
Kubernetes, the popular container orchestration platform, has transformed how applications are deployed and managed. One of the critical aspects of Kubernetes is its handling of data persistence through volumes. This blog aims to simplify the concept of Kubernetes volumes, focusing on their types, use cases, and best practices.
Understanding the Challenges of Data Persistence
When working with containers in Kubernetes, two primary challenges arise concerning data persistence:
Data Loss on Pod Deletion or Restart: When a pod is terminated or restarted, any data stored within that pod can be lost.
Sharing Data Across Replicas: In scenarios where multiple replicas of an application are running, sharing data between them can be problematic.
Types of Kubernetes Volumes
Kubernetes offers various volume types to address these challenges. Here are some key volume types:
Ephemeral volumes:
1. EmptyDir volumes
2. HostPath volumes
Persistent volumes:
1. Static volumes
2. Dynamic volumes
Persistent volumes, in turn, are built on three important Kubernetes resources:
Persistent Volume (PV)
Persistent Volume Claim (PVC)
StorageClass (SC)
Ephemeral Volumes:
1. EmptyDir Volume
The EmptyDir volume is a temporary storage solution created when a pod is assigned to a node. It persists as long as the pod is running on that node. Key characteristics include:
Pod Scope: Data stored in an EmptyDir volume is accessible to all containers within the same pod.
Temporary Storage: The data persists through pod restarts but is deleted if the pod itself is removed.
While EmptyDir volumes are useful for temporary storage needs, they do not support data persistence beyond the lifecycle of a pod.
We will perform all the demos in this blog using MongoDB Compass, a client for accessing a MongoDB server, and we will deploy the MongoDB server in Kubernetes with different types of volumes.
Demo 1:
First, install and set up the MongoDB Compass client on your local machine to test the functionality of this demo.
Create a file named emptydir-deploy.yaml and apply it in Kubernetes to create a MongoDB server with the emptyDir volume type:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - image: mongo
          name: mongo
          volumeMounts:
            - mountPath: /data/db
              name: mongo-volume
          args: ["--dbpath", "/data/db"]
          env:
            - name: MONGO_INITDB_ROOT_USERNAME
              value: "admin"
            - name: MONGO_INITDB_ROOT_PASSWORD
              value: "password"
      volumes:
        - name: mongo-volume
          emptyDir: {}
kubectl apply -f emptydir-deploy.yaml
Create a service file named mongo-svc.yaml and apply it so we can reach the server from the MongoDB Compass client:
apiVersion: v1
kind: Service
metadata:
  name: mongo-svc
spec:
  ports:
    - port: 27017
      protocol: TCP
      targetPort: 27017
      nodePort: 32000
  selector:
    app: mongo
  type: NodePort
kubectl apply -f mongo-svc.yaml
Use your node IP and the nodePort (32000) in MongoDB Compass to connect to the MongoDB server, and pass your username and password in the advanced options section; a sample connection string is shown below. Once connected, create a database and a collection, and write some documents as JSON key-value pairs.
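A connection string of the following form should work with the credentials from the deployment above; <node-ip> is a placeholder for one of your node IPs, and authSource=admin is needed because the root user is created in the admin database:
mongodb://admin:password@<node-ip>:32000/?authSource=admin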
Testing:
Exec into the pod and restart the MongoDB process (mongod runs as PID 1 of the container, so killing it restarts the container) and observe that the data persists across container restarts.
Restart the pod and observe that the data is deleted once the pod identity changes, because the actual data is stored under the pod's UID beneath /var/lib/kubelet/ on the node. The commands below sketch this sequence.
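A possible command sequence for this test, assuming your kubectl context points at the demo cluster:
# kill PID 1 (mongod) so the container restarts; the pod keeps its identity and its emptyDir
kubectl exec deploy/mongo -- bash -c "kill 1"
kubectl get pods   # the RESTARTS counter increments; reconnect in Compass and the data is still there
# delete the pod; the Deployment replaces it with a new pod (new UID), so the emptyDir starts empty
kubectl delete pod -l app=mongo
kubectl get pods   # a new pod appears; the previously written data is gone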
Delete the deployment once the testing is completed:
kubectl delete -f emptydir-deploy.yaml
2. HostPath Volumes
The HostPath volume in Kubernetes allows you to mount a file or directory from the host node's filesystem into a pod. This volume type is particularly useful for applications that require access to specific files or directories on the host, such as Docker internals or system logs. Key characteristics include:
Node Scope: The data in a HostPath volume is tied to the specific node where the pod is running, meaning it cannot be shared across multiple nodes.
Persistence: Unlike ephemeral storage, data in a HostPath volume remains intact on the node even if the pod crashes, restarts, or is deleted; it is lost only if the node itself goes away or the host path is cleaned up.
Security Risks: Using HostPath volumes can expose sensitive host filesystem data to containers, leading to potential security vulnerabilities. Therefore, it is recommended to use them cautiously and consider alternatives like PersistentVolumes for production environments.
While HostPath volumes offer flexibility for certain use cases, their limitations and security implications necessitate careful consideration when integrating them into your Kubernetes architecture.
Demo 2:
Create a file named hostpath-deploy.yaml and apply it in Kubernetes to create a MongoDB server with the hostPath volume type:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - image: mongo
          name: mongo
          volumeMounts:
            - mountPath: /data/db
              name: mongo-volume
          args: ["--dbpath", "/data/db"]
          env:
            - name: MONGO_INITDB_ROOT_USERNAME
              value: "admin"
            - name: MONGO_INITDB_ROOT_PASSWORD
              value: "password"
      volumes:
        - name: mongo-volume
          hostPath:
            path: /data
            # create the directory on the node if it does not already exist
            type: DirectoryOrCreate
kubectl apply -f hostpath-deploy.yaml
Use the same service to test this demo.
Testing:
This will automatically create the directory at the specified path (/data in this case) for data storage on each node where a pod is scheduled, thanks to the DirectoryOrCreate type.
In this demo the data persists even after pod restarts, but it is not shared across nodes, and we lose the data if the node gets deleted. The commands below help verify this.
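A quick way to verify, assuming shell access to your nodes:
# see which node each replica was scheduled on
kubectl get pods -l app=mongo -o wide
# on any of those nodes, the database files appear under the hostPath directory
ls /data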
Clean up after testing this use case:
kubectl delete -f hostpath-deploy.yaml
Persistent Volumes:
1. Persistent Volumes (PVs)
To overcome the limitations of emptyDir and hostPath volumes, Kubernetes provides Persistent Volumes. These volumes are independent of individual pods or nodes and offer several advantages:
Data Retention: PV data remains intact even if pods or nodes are deleted or restarted.
Decoupled Storage Management: PVs can be managed separately from the pods that use them.
2. Persistent Volume Claims (PVCs)
Persistent Volume Claims are requests for storage resources that specify size and access modes. When a PVC is created, Kubernetes automatically binds it to an appropriate PV based on availability and requirements. This process simplifies storage management by automating volume allocation.
3. Storage Management with Storage Classes (SC)
Kubernetes introduced Storage Classes, which define how PVs should be created dynamically. When you create a PVC and specify a storage class name, Kubernetes handles the creation of the corresponding PV based on your specified access mode and capacity. This automation streamlines storage management and reduces manual intervention.
You can also use the storage class concept in static volume provisioning by manually creating PVs and PVCs. A minimal StorageClass manifest is sketched below.
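For illustration, here is a minimal StorageClass sketch; the name fast-storage is hypothetical, and the provisioner value must match a CSI driver installed in your cluster (driver.longhorn.io is the Longhorn provisioner used in demo 4):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
provisioner: driver.longhorn.io   # must match your installed CSI driver
reclaimPolicy: Delete             # delete the backing volume when the PVC is deleted
volumeBindingMode: Immediate      # provision as soon as the PVC is created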
Static Persistent Volumes
Static volumes are manually provisioned by an administrator before being used by applications. They are defined as PersistentVolume (PV) objects, which specify storage characteristics such as size, access modes, and the underlying storage technology. Key characteristics include:
Manual Provisioning: Administrators create PVs ahead of time, making them available for use by Persistent Volume Claims (PVCs) made by pods.
Lifecycle Independence: Data in static PVs persists beyond the lifecycle of individual pods, ensuring that application data remains accessible even if pods are deleted or restarted.
Management Complexity: While static provisioning offers control over storage resources, it can become cumbersome at scale, requiring careful management of multiple PVs and PVCs.
Demo 3:
Create a file named pv.yaml that contains both the PV and PVC configuration:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  local:
    path: /storage/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - tf-01
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: ""
kubectl apply -f pv.yaml
The nodeAffinity section specifies the node on which the volume lives, and local.path specifies the path on that node.
All pods that use this volume will be scheduled onto that same node.
The path on the node must be created before applying pv.yaml (see the command after this list).
The access mode and storage capacity must match for the PVC to bind to the PV.
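For example, assuming SSH access to the node tf-01 named in the PV's nodeAffinity:
# pre-create the backing directory on node tf-01
sudo mkdir -p /storage/data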
Create the deploy.yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - image: mongo
          name: mongo
          args: ["--dbpath", "/data/db"]
          env:
            - name: MONGO_INITDB_ROOT_USERNAME
              value: "admin"
            - name: MONGO_INITDB_ROOT_PASSWORD
              value: "password"
          volumeMounts:
            - mountPath: /data/db
              name: mongo-volume
      volumes:
        - name: mongo-volume
          persistentVolumeClaim:
            claimName: mongo-pvc
kubectl apply -f deploy.yaml
Use the same service to test this demo with the MongoDB Compass client.
Testing:
Observe that data is not deleted when pods restart.
Observe that data is shared across all replicas, because all replicas are scheduled onto the same node.
Note that local PVs require the nodeAffinity section; with a volume that does not pin pods to one node (for example, a hostPath-backed PV), replicas can land on different nodes. In that case data still persists on each node, but it is not shared between replicas even with ReadWriteMany, because each replica only sees its own node's local storage.
To share storage across nodes, we need a network storage solution such as NFS, Ceph, or a cloud provider's storage; a hedged NFS example is sketched below.
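For illustration, a sketch of an NFS-backed PV that any node can mount; the server address and export path are placeholders for your own NFS setup:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-nfs-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany        # NFS genuinely supports shared read-write across nodes
  nfs:
    server: 10.0.0.10      # placeholder: your NFS server address
    path: /exports/mongo   # placeholder: your NFS export path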
Clean up all the resources before proceeding to the next demo.
Dynamic Persistent Volumes
These volumes are created on the fly by Kubernetes when a PVC is submitted, eliminating the need for manual provisioning. This approach streamlines storage management and enhances flexibility. Key characteristics include:
Automatic Provisioning: When a PVC requests storage, Kubernetes automatically provisions a PV that meets the specified criteria using the referenced StorageClass, provided the required CSI plugin (for example, Longhorn, OpenEBS, or Ceph) has been pre-installed.
Provisioner: Each StorageClass names a provisioner, which depends on the type of storage requested and the plugin you installed.
Efficiency and Scalability: Dynamic provisioning simplifies the process of managing storage resources, allowing administrators to focus on higher-level tasks rather than manually creating PVs.
Consistency Across Environments: This method ensures that applications can consistently access the required storage without being tied to specific physical resources, facilitating easier deployment across different environments.
Demo 4:
We need to install a CSI plugin such as Longhorn, OpenEBS, or Ceph.
In our case, we install the Longhorn CSI driver for dynamic volume provisioning.
Installation of Longhorn using Helm:
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.6.0
To learn more about Longhorn, its installation, and how it works, please refer to its official documentation.
Wait for all the pods in the longhorn-system namespace to be up and running; a default storage class named longhorn is also created for you. You can verify both as shown below.
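Assuming the Helm install above succeeded, these commands confirm the setup:
kubectl get pods -n longhorn-system   # all pods should reach Running
kubectl get storageclass              # "longhorn" should be listed and marked (default)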
Now we need to deploy our MongoDB server and create a PVC for it; Longhorn will automatically provision a PV according to the PVC's requirements.
Create a file named dynamic-deploy.yaml that contains both the Deployment and PVC configuration, and apply it:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - image: mongo
          name: mongo
          args: ["--dbpath", "/data/db"]
          env:
            - name: MONGO_INITDB_ROOT_USERNAME
              value: "admin"
            - name: MONGO_INITDB_ROOT_PASSWORD
              value: "password"
          volumeMounts:
            - mountPath: /data/db
              name: mongo-volume
      volumes:
        - name: mongo-volume
          persistentVolumeClaim:
            claimName: mongo-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
kubectl apply -f dynamic-deploy.yaml
Testing:
Observe that a PV is automatically created for you based on the PVC, as shown below.
Note that if you do not have a default storage class, you must specify a storageClassName in your PVC configuration.
Check whether data sharing across replicas is possible in this case.
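A quick check, assuming dynamic-deploy.yaml was applied successfully:
kubectl get pvc mongo-pvc   # STATUS should be Bound
kubectl get pv              # a Longhorn-provisioned PV appears, bound to mongo-pvc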
Best Practices for Using Kubernetes Volumes
To effectively utilize Kubernetes volumes, consider the following best practices:
Choose the Right Volume Type: Evaluate your application's requirements to select among emptyDir, hostPath, Persistent Volumes, and other volume types.
Use Persistent Volumes for Critical Data: For applications requiring data retention beyond pod lifecycles, always opt for Persistent Volumes.
Leverage Storage Classes for Automation: Utilize storage classes to automate volume provisioning and management, making your deployment processes more efficient.
Conclusion
Understanding Kubernetes volumes is essential for managing data persistence effectively in containerized applications. By leveraging different volume types like EmptyDir, Persistent Volumes, and Persistent Volume Claims alongside Storage Classes, developers can ensure their applications handle data reliably and efficiently. With these tools at your disposal, navigating the complexities of Kubernetes storage becomes significantly easier.