
Docker for Beginners: Understanding Docker Storage and Volumes

Discover the fundamentals of Docker storage, including volumes and bind mounts. Learn how to manage Docker storage effectively, ensuring data persistence and reflecting code changes seamlessly.

Introduction

Welcome back, everyone! Today, we’re diving into a crucial aspect of Docker: storage and volumes. This topic is essential for anyone looking to build robust, production-ready containerized applications.

In our previous articles, we’ve covered the basics of Docker, its architecture, creating custom images, and Docker networking. If you haven’t had a chance to review those topics, I strongly recommend doing so before we proceed.

Now, let’s consider a scenario: You’ve built a fantastic containerized application, but every time you remove and recreate your container (for example, to deploy a new version), all your data disappears. Frustrating, right? This is where Docker storage solutions come to the rescue.

In this lesson, we’ll explore:

  1. The fundamentals of Docker storage
  2. Various types of Docker storage, with a focus on volumes and bind mounts
  3. How to create and manage Docker volumes effectively
  4. A hands-on example using volumes with a database container
  5. Best practices for Docker storage
  6. Common troubleshooting tips to save you time and headaches

By the end of this article, you’ll be equipped with the knowledge to:

  • Separate your data’s lifecycle from your container’s lifecycle
  • Ensure data persistence across container restarts and upgrades
  • Reflect code changes immediately in development environments without rebuilding images

Remember, the key principle we’re working with is this: while containers are ephemeral (temporary), our data often needs to be permanent.

So, are you ready to unlock the power of Docker storage and volumes? Let’s get started!

In the next section, we’ll begin by exploring the basics of Docker storage and why it’s so important in containerized environments.

Docker Storage Basics

Now that we’ve set the stage, let’s dive into the fundamentals of Docker storage. To understand how Docker manages data, we need to look at three key concepts:

  1. Immutable images
  2. Temporary container storage
  3. Persistent data

Understanding these will help you grasp how Docker manages data and why we need different storage solutions.

The diagram below gives you a snapshot of how Docker’s storage architecture fits together, showing how containers, the Docker Engine, and storage options like bind mounts and Docker volumes interact. Don’t worry about the details for now; we’ll break down each part as we go along.

Figure: Docker Storage Architecture

Immutable Docker Images

Docker images are designed to be immutable: once created, they don’t change. When you need to update an application, you would typically:

  1. Stop the current container
  2. Remove it
  3. Replace it with a new one based on an updated image

This ensures consistent, reproducible deployments. In development, you might change a running container for quick testing, but this isn’t typical in production.

Temporary Container Storage

By default, data written inside a container goes to its temporary storage (the “writable layer”). This storage:

  • Is tied to the container’s lifecycle
  • Survives stopping and restarting the container, but is deleted when the container is removed

This works for stateless apps but isn’t suitable for most real-world applications that need to keep data.
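
To see this lifecycle for yourself, here is a small shell sketch. It assumes Docker is installed and the public alpine image can be pulled; the function name demo_writable_layer and the container name scratch_demo are just examples. It writes a file into a container, restarts the container, and then removes it:

```shell
#!/usr/bin/env bash
# Sketch: the writable layer survives a restart but not a removal.
# demo_writable_layer and scratch_demo are example names.
demo_writable_layer() {
  # Start a long-running throwaway container.
  docker run -d --name scratch_demo alpine:latest sleep 3600

  # Write a file into the container's writable layer.
  docker exec scratch_demo sh -c 'echo hello > /tmp/note.txt'

  # Restarting keeps the writable layer, so the file is still there.
  docker restart scratch_demo
  docker exec scratch_demo cat /tmp/note.txt

  # Removing the container deletes its writable layer for good.
  docker rm -f scratch_demo

  # A new container from the same image starts with a fresh writable layer,
  # so this cat fails and the fallback message is printed instead.
  docker run --rm alpine:latest cat /tmp/note.txt || echo "note.txt is gone"
}
```

The cat after the restart succeeds because the writable layer is kept; the final docker run creates a brand-new container with an empty writable layer, so the file no longer exists.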

The Need for Persistent Storage

Imagine running a database in a container. You wouldn’t want to lose all your data every time you replace the container to upgrade it. This is where persistent storage comes in:

  • It allows data to survive beyond individual containers.
  • It enables quick iterations in development environments without rebuilding the image.

Understanding these three concepts is crucial for effective data management in Docker. Next, we’ll explore Docker’s persistent storage options, focusing on volumes and bind mounts.

Understanding Docker Volumes

In the previous section, we covered Docker storage basics and the need for persistent data. Now, let’s explore Docker volumes, the preferred way to manage persistent data in Docker.

What are Docker volumes?

Docker volumes are a mechanism for storing data generated by and used by Docker containers. They’re directories on your host machine that Docker creates and manages (on a Linux host, under /var/lib/docker/volumes/ by default). Volumes can be mounted into containers, allowing data to persist even when containers are stopped or removed.

Creating and managing volumes

Let’s start by looking at how to create and manage Docker volumes. The Docker CLI provides several commands for this purpose:

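  1. To create a volume:

    docker volume create my_volume

  2. To list all volumes:

    docker volume ls

  3. To inspect a volume (shows details such as its driver and mountpoint):

    docker volume inspect my_volume

  4. To remove a volume (this fails while any container, even a stopped one, still uses it):
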
    docker volume rm my_volume
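
These commands combine nicely in scripts. The sketch below creates a volume only if it doesn’t exist yet and then prints its host path using the --format option of docker volume inspect. The function name ensure_volume is our own, not a Docker command:

```shell
#!/usr/bin/env bash
# Sketch: create a named volume if it is missing, then print its host path.
# ensure_volume is a hypothetical helper name.
ensure_volume() {
  local name="$1"
  # "docker volume inspect" exits non-zero when the volume does not exist.
  if ! docker volume inspect "$name" >/dev/null 2>&1; then
    docker volume create "$name" >/dev/null
  fi
  # --format takes a Go template; .Mountpoint is where the data lives on the host.
  docker volume inspect --format '{{ .Mountpoint }}' "$name"
}
```

For example, ensure_volume my_volume prints the volume’s mountpoint. On Docker Desktop that path is inside Docker’s virtual machine rather than on your Mac or Windows file system.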
    

Using volumes with containers

To use a volume with a container, we use the -v or --volume flag when running a container. Here’s an example:

docker run -d --name my_container -v my_volume:/app/data my_image

In this command:

  • my_volume is the name we’ve given to our Docker volume
  • /app/data is the directory inside the container where the volume will be mounted
  • my_image is the name of the Docker image we’re using to create the container

Any data written to /app/data in the container is stored in the volume rather than in the container’s writable layer. The data lives on the host machine, but Docker manages it, and it outlives the container.
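
Docker also accepts a more explicit --mount syntax that spells out each part as key=value pairs. As a sketch, the same container could be started like this (my_container, my_volume, and my_image are the placeholder names from above; the wrapper function exists only for illustration):

```shell
#!/usr/bin/env bash
# Sketch: the -v example above rewritten with --mount.
# my_container, my_volume, my_image and /app/data are placeholders.
run_with_mount_syntax() {
  docker run -d --name my_container \
    --mount type=volume,source=my_volume,target=/app/data \
    my_image
}
```

Both forms create my_volume automatically if it doesn’t exist yet. Many people prefer --mount in scripts because every option is named.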

When to Use Volumes

Now that we’ve covered creating and using volumes, let’s explore specific scenarios where they’re most beneficial:

  1. Persistent Data Storage: Use volumes when your app needs to keep data safe, even after removing the container.
  2. Sharing Data Between Containers: Volumes are ideal when multiple containers need to access the same information. This is common in microservices or separate app and database setups.
  3. Performance Boost: Use volumes for write-heavy workloads. Writes to a volume bypass the storage driver that manages the container’s writable layer, so volumes generally outperform it for large or frequent data operations.
  4. Host-Independence: While volumes live on the host machine, Docker manages them. This makes your containerized apps more portable, as they don’t rely on the host’s file structure.
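
To illustrate the second point, here’s a minimal sketch in which one short-lived container writes a file into a volume and a second container reads it back. The volume name shared_data, the alpine image, and the function name are example choices:

```shell
#!/usr/bin/env bash
# Sketch: two containers sharing one named volume.
# shared_data and share_volume_demo are example names.
share_volume_demo() {
  docker volume create shared_data

  # Container A writes a file into the volume, then exits (--rm deletes it).
  docker run --rm -v shared_data:/data alpine:latest \
    sh -c 'echo "written by container A" > /data/message.txt'

  # Container B mounts the same volume read-only (:ro) and reads the file.
  docker run --rm -v shared_data:/data:ro alpine:latest cat /data/message.txt
}
```

The :ro suffix mounts the volume read-only for the reader, a handy safeguard when only one container should write.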

In our next section, we’ll apply this knowledge with a practical example. We’ll use a volume with a database container to show how volumes maintain data across container lifecycles.

Hands-on Example: Using Volumes with a Database Container

Let’s put our knowledge into practice by creating a MySQL database container and using a Docker volume to persist its data.

Step 1: Create a Docker Volume

First, let’s create a volume to store our MySQL data:

docker volume create mysql_data

Step 2: Run a MySQL Container with the Volume

Now, we’ll start a MySQL container and mount our volume to it:

docker run -d --name mysql_db -e MYSQL_ROOT_PASSWORD=secretpassword -v mysql_data:/var/lib/mysql mysql:latest

Here’s what each part of this command does:

  • -d: Run the container in detached mode (in the background)
  • --name mysql_db: Name our container ‘mysql_db’
  • -e MYSQL_ROOT_PASSWORD=secretpassword: Set the root password for MySQL
  • -v mysql_data:/var/lib/mysql: Mount our ‘mysql_data’ volume at ‘/var/lib/mysql’ (where MySQL keeps its data files) in the container
  • mysql:latest: Use the latest MySQL image
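
One practical wrinkle: on first start, MySQL needs some time to initialize its data directory, so connecting immediately may fail. A small polling helper like the one below can wait for it. This is a sketch: it assumes the official image ships mysqladmin and, as its entrypoint does, runs a temporary setup server without networking, which is why we ping over TCP (127.0.0.1). The function name wait_for_mysql is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: poll until MySQL inside a container accepts TCP connections.
# Usage: wait_for_mysql <container> <root_password> [max_attempts]
wait_for_mysql() {
  local container="$1" password="$2" max_attempts="${3:-30}"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # mysqladmin ping exits 0 once the server is running. -h 127.0.0.1 forces
    # TCP, assumed to be disabled on the temporary initialization server.
    if docker exec "$container" mysqladmin ping -h 127.0.0.1 -uroot -p"$password" --silent >/dev/null 2>&1; then
      echo "MySQL in $container is ready"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 2
  done
  echo "Gave up waiting for MySQL in $container" >&2
  return 1
}
```

For example, wait_for_mysql mysql_db secretpassword returns as soon as the server answers, or gives up after about a minute (30 attempts, 2 seconds apart).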

Step 3: Use the MySQL Database

Let’s connect to our MySQL container and create a sample database:

docker exec -it mysql_db mysql -p

This command:

  • Uses docker exec to run a command inside our running container
  • -it makes it interactive (-i keeps input open and -t allocates a terminal), so we can type commands
  • mysql -p starts the MySQL client and prompts for a password

When prompted, enter the password we set earlier (‘secretpassword’). You’ll now be in the MySQL shell. Enter these commands:

CREATE DATABASE test_db;  -- Creates a new database
USE test_db;              -- Switches to the new database
CREATE TABLE users (id INT, name VARCHAR(20));  -- Creates a new table
INSERT INTO users VALUES (1, 'John Doe');       -- Inserts a row into the table
SELECT * FROM users;      -- Retrieves all rows from the users table
EXIT;                     -- Exits the MySQL shell

Note: If you’re not familiar with SQL commands, it’s best to copy and paste these exactly as shown, including the semicolons.
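
If you’d rather script this step than type it, you can pipe the SQL into the client. The sketch below (seed_test_db is a hypothetical helper name) uses -i without -t, because the input comes from a here-document rather than a terminal. It adds IF NOT EXISTS clauses so it also works against an existing database:

```shell
#!/usr/bin/env bash
# Sketch: create and fill test_db without an interactive MySQL session.
# Usage: seed_test_db <container> <root_password>
seed_test_db() {
  local container="$1" password="$2"
  # -i keeps stdin open so the here-document reaches the mysql client.
  docker exec -i "$container" mysql -uroot -p"$password" <<'SQL'
CREATE DATABASE IF NOT EXISTS test_db;
USE test_db;
CREATE TABLE IF NOT EXISTS users (id INT, name VARCHAR(20));
INSERT INTO users VALUES (1, 'John Doe');
SQL
}
```

Passing the password with -p on the command line makes MySQL print a security warning, which is acceptable only for local experiments. Running the helper twice inserts the row twice, because the INSERT is not idempotent.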

Step 4: Demonstrate Data Persistence

To show that our data persists even if the container is removed, let’s stop and remove our MySQL container:

docker stop mysql_db
docker rm mysql_db

Now, let’s create a new MySQL container using the same volume:

docker run -d --name new_mysql_db -e MYSQL_ROOT_PASSWORD=secretpassword -v mysql_data:/var/lib/mysql mysql:latest

Connect to this new container and check if our data is still there:

docker exec -it new_mysql_db mysql -p

In the MySQL prompt, enter:

USE test_db;
SELECT * FROM users;

You should see the data we inserted earlier (ID: 1, Name: John Doe), demonstrating that our volume has persisted the data across container removals and creations.
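
The same check can be scripted. This sketch (check_persisted_user is a hypothetical name) asks the new container for the row with -e and uses -N to suppress the column header:

```shell
#!/usr/bin/env bash
# Sketch: confirm the row inserted earlier survived the container swap.
# Usage: check_persisted_user <container> <root_password>
check_persisted_user() {
  local container="$1" password="$2"
  local name
  # -N drops the column-name header so only the value is printed.
  name=$(docker exec "$container" mysql -uroot -p"$password" -N \
    -e "SELECT name FROM test_db.users WHERE id = 1" 2>/dev/null)
  if [ "$name" = "John Doe" ]; then
    echo "Data survived: $name"
  else
    echo "Row not found (got: '$name')" >&2
    return 1
  fi
}
```

Run it as check_persisted_user new_mysql_db secretpassword once the new container has finished starting.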

This example illustrates the power of Docker volumes for data persistence. Even though we completely removed our original MySQL container, the data remained intact in our volume and was immediately available to our new container.

Bind Mounts

Now that we’ve explored Docker volumes, let’s discuss another method for persisting data in Docker: bind mounts.

What are Bind Mounts?

Bind mounts are a way to mount a file or directory on the host machine into a container. Unlike volumes, which are managed by Docker, bind mounts rely on the host machine’s file system structure and can be accessed and modified by processes outside of Docker.

How to Use Bind Mounts

To use a bind mount, pass the -v flag with a host path and a container path separated by a colon, or use the more explicit --mount flag (--mount type=bind,source=<host path>,target=<container path>). Here’s an example using -v:

docker run -d --name nginx_container -v /path/on/host:/usr/share/nginx/html nginx:latest

In this command:

  • /path/on/host is the directory on your host machine
  • /usr/share/nginx/html is where it’s mounted in the container

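With -v, you don’t need to create the host directory beforehand: if /path/on/host doesn’t exist, Docker creates it as an empty directory (owned by root on a Linux host). The --mount flag is stricter and reports an error instead.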
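
If you prefer the strictness of --mount, here’s a sketch that creates the host directory first and passes an absolute path to be safe. The function name, the directory you pass in, and the -p 8080:80 port mapping (so you can open http://localhost:8080) are assumptions added for illustration:

```shell
#!/usr/bin/env bash
# Sketch: nginx with a bind mount declared via --mount.
# run_nginx_bind_mount and the 8080 port mapping are illustrative choices.
run_nginx_bind_mount() {
  local site_dir="$1"
  # --mount will not create a missing host directory, so create it here.
  mkdir -p "$site_dir"
  # Resolve to an absolute path before handing it to Docker.
  local abs_dir
  abs_dir=$(cd "$site_dir" && pwd)
  docker run -d --name nginx_container -p 8080:80 \
    --mount type=bind,source="$abs_dir",target=/usr/share/nginx/html \
    nginx:latest
}
```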

When to Use Bind Mounts

Bind mounts, while powerful, are best suited for specific scenarios. Understanding these can help you decide when to use bind mounts instead of volumes:

  1. Development Environments: Use bind mounts when actively developing an application. They allow you to mount your source code directory into a container, change code on your host machine, and see the changes inside the container immediately, without rebuilding the image (your application may still need to reload, but no rebuild is required). This significantly speeds up your development cycle.
  2. Direct Host-Container Data Sharing: Bind mounts are ideal when you need containers to access specific files or directories on the host system. This is useful for processing data stored on the host or writing container output directly to host directories.
  3. Large Data Set Access: Use bind mounts when containers need to work with substantial amounts of data that are impractical to include in the container image. This keeps your images smaller and more portable.
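
The development workflow from the first point can be tried with nginx. In this sketch (assuming curl is installed and host port 8081 is free; live_edit_demo and nginx_dev are example names), we change a file on the host while the container is running:

```shell
#!/usr/bin/env bash
# Sketch: edit a bind-mounted file on the host and see the change live.
# Usage: live_edit_demo <existing host directory>
live_edit_demo() {
  local site_dir
  site_dir=$(cd "$1" && pwd)
  echo '<h1>Version 1</h1>' > "$site_dir/index.html"
  docker run -d --name nginx_dev -p 8081:80 -v "$site_dir":/usr/share/nginx/html nginx:latest
  sleep 2                          # give nginx a moment to start
  curl -s http://localhost:8081/   # serves Version 1
  # Change the file on the host only; the container is not touched.
  echo '<h1>Version 2</h1>' > "$site_dir/index.html"
  curl -s http://localhost:8081/   # now serves Version 2
  docker rm -f nginx_dev
}
```

The second request returns Version 2 even though the image was never rebuilt and the container was never restarted, because nginx reads the file straight from the host directory.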

While bind mounts offer flexibility, especially in development environments, volumes are generally recommended for production use due to their portability and the fact that they’re managed by Docker.

Comparing Docker Volumes and Bind Mounts: A Closer Look

Now that we’ve explored Docker volumes and bind mounts, let’s dive deeper into their differences. Both Docker volumes and bind mounts store data on the host’s physical storage, but their management and accessibility differ significantly:

  1. Data Management:
    • Docker volumes are fully managed by Docker. Their data lives in a Docker-controlled location on the host, and other host processes are not supposed to modify it directly, which creates a layer of abstraction.
    • Bind mounts, on the other hand, are managed by the host file system. Docker can access this data, but so can processes outside of Docker.
  2. Visibility to Host:
    • Docker volumes are not meant to be browsed from the host (with Docker Desktop they even live inside a virtual machine). You’d normally use Docker commands to interact with this data.
    • Bind mounts are fully visible and accessible to the host file system. You can navigate to the bind mount directory just like any other folder on your system.
  3. Portability:
    • Docker volumes shine when it comes to portability. Containers refer to a volume by name rather than by host path, so the same docker run command works on any host; moving the data itself to another host still takes a backup and restore step.
    • Bind mounts are tied to the host file system’s structure. Moving a bind mount to another system requires ensuring the same directory structure exists on the new host.
  4. Isolation:
    • Docker volumes provide a higher level of isolation. Containers interact with volumes through Docker, not directly with the host file system.
    • Bind mounts offer less isolation since they’re directly linked to the host file system.
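
Because volume data lives in a Docker-managed location, the usual way to copy it elsewhere is through a throwaway container that mounts both the volume and a host directory. The helpers below are a sketch of that pattern (the function names and the alpine image are our choices). Stop any container that writes to the volume, such as the MySQL container, before backing it up so the files are consistent:

```shell
#!/usr/bin/env bash
# Sketch: back up / restore a named volume via a temporary alpine container.
backup_volume() {
  local volume="$1" out_dir="$2"
  mkdir -p "$out_dir"
  out_dir=$(cd "$out_dir" && pwd)
  # Mount the volume read-only at /source and the host dir at /backup,
  # then archive the volume's contents into /backup/<volume>.tar.gz.
  docker run --rm -v "$volume":/source:ro -v "$out_dir":/backup alpine:latest \
    tar czf "/backup/$volume.tar.gz" -C /source .
}

restore_volume() {
  local volume="$1" archive="$2"
  local archive_dir
  archive_dir=$(cd "$(dirname "$archive")" && pwd)
  docker volume create "$volume"
  # Unpack the archive into the (new or empty) volume.
  docker run --rm -v "$volume":/target -v "$archive_dir":/backup:ro alpine:latest \
    tar xzf "/backup/$(basename "$archive")" -C /target
}
```

For example, backup_volume mysql_data ./backups writes ./backups/mysql_data.tar.gz, which you can copy to another host and load there with restore_volume mysql_data ./backups/mysql_data.tar.gz.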

Understanding these differences is key to choosing the right storage solution for your Docker containers. In the next section, we’ll cover some best practices and troubleshooting tips.

Best Practices and Troubleshooting

As we’ve explored Docker volumes and bind mounts, let’s discuss some best practices and common troubleshooting tips to help you use Docker storage more effectively.

Best Practices

  1. Use volumes for persistent data: Whenever possible, use Docker volumes for data that needs to persist. They’re easier to back up and migrate than bind mounts.
  2. Consider cloud storage for long-term persistence: If you’re running Docker in a cloud environment, volume drivers that integrate with cloud storage solutions can provide better durability and scalability for long-lived data.
  3. Be cautious with bind mounts: While useful in development, bind mounts can pose security risks in production. If a container with a bind mount is compromised, it could potentially access or modify files on the host system. Use read-only mounts when possible.
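
Making a mount read-only takes one extra option. The sketch below shows the short -v form and the equivalent --mount form for the nginx example. The function and container names are illustrative; run only one of them at a time, since both use the same container name:

```shell
#!/usr/bin/env bash
# Sketch: two equivalent ways to make a bind mount read-only.
run_nginx_read_only() {
  local site_dir
  site_dir=$(cd "$1" && pwd)
  # Short form: append :ro to the -v specification.
  docker run -d --name nginx_ro -v "$site_dir":/usr/share/nginx/html:ro nginx:latest
}

run_nginx_read_only_mount() {
  local site_dir
  site_dir=$(cd "$1" && pwd)
  # Long form: add the readonly option to --mount.
  docker run -d --name nginx_ro \
    --mount type=bind,source="$site_dir",target=/usr/share/nginx/html,readonly \
    nginx:latest
}
```

Any attempt by the container to write under /usr/share/nginx/html then fails with a read-only file system error.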

Troubleshooting Tips

  1. Permission issues: Check the ownership and permissions of the mounted directory or volume. You might need to adjust the user or group ID that the container process runs as.
  2. Volume not mounting: Verify that the volume exists (docker volume ls) and that you’ve spelled the volume name correctly in your docker run command.
  3. Disk space issues: Use docker system df to check Docker’s disk usage. docker system prune can help clean up, but use it cautiously as it removes stopped containers, unused networks, dangling images, and build cache. It leaves volumes alone unless you add the --volumes flag.
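
For the permission issue in tip 1, a common fix with bind mounts is to run the container as your own user and group ID. Here is a sketch (the function name and directory are placeholders; some images expect to run as a specific user and may not work with an arbitrary --user):

```shell
#!/usr/bin/env bash
# Sketch: write into a bind mount as the current host user instead of root.
# Usage: write_as_host_user <existing host directory>
write_as_host_user() {
  local host_dir
  host_dir=$(cd "$1" && pwd)
  # --user takes UID:GID; id -u / id -g return the current host user's IDs.
  docker run --rm --user "$(id -u):$(id -g)" -v "$host_dir":/work -w /work \
    alpine:latest touch created-in-container.txt
}
```

Afterwards, ls -l on the host directory should show created-in-container.txt owned by you rather than by root.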

Remember, docker inspect is invaluable for troubleshooting. It provides detailed information about containers, volumes, and networks.
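
For storage problems specifically, the Mounts section of docker inspect is the place to look. This sketch (show_mounts is our own name) uses a Go template to print one line per mount:

```shell
#!/usr/bin/env bash
# Sketch: list a container's mounts as "type: source -> destination (rw=...)".
# Usage: show_mounts <container>
show_mounts() {
  docker inspect --format \
    '{{ range .Mounts }}{{ .Type }}: {{ .Source }} -> {{ .Destination }} (rw={{ .RW }}){{ println }}{{ end }}' \
    "$1"
}
```

For example, show_mounts new_mysql_db lists the mysql_data volume mounted at /var/lib/mysql; for a volume, Source is the volume’s directory on the Docker host.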

Conclusion

In this article, we’ve taken a deep dive into Docker storage, focusing on volumes and bind mounts. We’ve covered:

  1. The basics of Docker storage and the need for persistent data
  2. Docker volumes: what they are, how to create and manage them, and their benefits
  3. A hands-on example using volumes with a MySQL database container
  4. Bind mounts: an alternative approach to Docker storage
  5. Best practices for using Docker storage effectively
  6. Troubleshooting tips for common Docker storage issues

Understanding how to manage data effectively is crucial when working with Docker. Whether you’re developing applications, managing databases, or deploying complex systems, the concepts we’ve covered will help you make informed decisions about how to handle persistent data in your Docker environments.

In our next article, we’ll explore container runtimes beyond Docker, including containerd and CRI-O (Container Runtime Interface - Open Container Initiative). These technologies are becoming increasingly important in the container ecosystem, especially in Kubernetes environments.