An introduction to the container ecosystem

Containerisation has been a big technology over the past couple of years and, like any curious engineer, I've been trying to gather what information I can before I start using it in production. While researching and experimenting I've found it really difficult to understand the layers, where one technology ends and the next begins, and what the use cases are for each. It seems people are really great at making new technologies, but no one has quite figured out how to explain them to newcomers.

That's what this essay is: my attempt to explain what containerisation is, and what the various technologies are. My hope is that this will give you enough to understand the containerisation language everyone seems to be speaking, so you can dive into the more technical resources with a bit more confidence.

I've been learning this stuff over the past year or so now, so I'm no expert. I'm not managing infrastructure serving billions of requests a day, and I'm not a contributor to any of the projects you'll see me mention. I'm just a full stack engineer trying to navigate this newfound world and hoping I can help you too. If you spot mistakes or think I could make things simpler, please get in touch with me!

Finally, there are many guides out there explaining how to get going with Docker, containers, and Kuberwhatever. If you're looking for a quick start guide or specific step-by-step instructions for a technology, this is not the post you want.

Virtual Machines

Over the last decade we've had the rise of Virtual Machines (VMs) for application deployment and management. This is essentially one physical computer, called the host, which has an OS and some sort of VM management layer. Then on top of the management layer we have a bunch of isolated virtual computers, each with its own OS and networking. We are free to buy huge machines and split them up into VMs so we can make use of as much of the host's resources as possible, while maintaining isolation between our applications.

VMs have been the dominant technology for most sys admins / ops teams managing large application stacks. If you work in a startup or a smaller business, you are probably not even aware of the host machine very often. Platforms like AWS, Azure, and GCP abstract this away from you. Instead you can simply request a VM with a particular set of specs.

Note: many platforms expose their "compute" offerings as VMs, but some actually use containers under the hood. The point is that, for those of us working with the machines, they are treated the same.

But now we are seeing the rise of containers. This raises the question: what is containerisation, and why are people moving over to it?

What are containers?

I'm about to give a very simplistic overview of what containers are. I'd highly recommend you go and find out how they work under the hood to get a better understanding.

Containers are a different approach to application isolation across a fleet of physical machines. They use a single host operating system, and then use a number of features of the host OS kernel (on Linux, chiefly namespaces and cgroups) to create isolated environments.

Containerised systems only require a host OS and some sort of container management tool. The result is that each container (isolated process) looks and acts as if it's a machine of its own: it has its own sub-processes, its own root folder, environment variables, etc. If you inspect an empty container, it should look like its own clean machine but with a lot less overhead. In addition to the lower resource requirement, startup time can be much faster as well, since you no longer need to boot up a whole operating system just to run an application.

It's worth noting that none of this technology is new. Containerisation is simply leveraging tools that have existed in kernels for decades. It is more about the use of these tools together and the automation around them. Companies like Google have been using these tools for a long time now.

Docker

Docker is a technology developed and maintained by Docker, Inc. It is a tool for container management, used mostly to create images and then run and operate containers.

Dockerfile and Docker CLI

The most common use of Docker at the moment is the creation of container images. An image is much like a VM image: it is a blueprint for a container. Once you have an image, you can create as many instances (containers) of it as you wish.

Docker has created a DSL, the Dockerfile, for defining how a container image is built. The resulting image follows the Docker image format, which is widely supported by a number of other applications and systems these days.

Using a Dockerfile and the Docker CLI you can create an image, and then create as many containers from this image as you like. You can also publish this packaged image to a variety of different registries. Think of this like publishing a library: you can publish it to a private registry (such as your own GCP registry) or to a public registry such as Docker Hub.

Orchestration

So far we've covered the creation of individual containers, which is a great way of packaging and running an application. It means someone can run applications built on all sorts of different technologies while only having to interact with something like Docker. But there are still a lot of problems to solve if you want to run long-running, resilient applications in a production setting. This is where "orchestration" tools come into play.

Docker Compose

Compose is a tool for defining and running multi-container Docker applications.

Compose is traditionally targeted at development and testing environments, rather than production systems. However, it can also be used on single-host production systems if you want the simplicity.

Compose is baked into the Docker toolchain and allows you to define a collection of containers, the resources they need, and any networking between them. This way you can quickly and easily get complex applications running locally. This can be very useful for local development if you have a collection of microservices which all come together to form one large application.

The Compose file format (the file is often called docker-compose.yaml) is very simple and has a far smaller API than tools like Kubernetes. This makes it well suited to small environments where you want to get up and running quickly.

It's important to note that Compose will only function on a single host. It does not have any sort of multi-node management. This is the big difference between Compose and tools such as Kubernetes and Swarm.
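
To give a feel for how small that format is, here is a rough sketch of the shape of a Compose file, written as a Python dictionary purely for illustration (a real file would be YAML). The top-level "services" key and the "image", "ports", "environment" and "depends_on" keys are part of the Compose format; the service names, the example/web image, and the environment variable names are made up for this sketch.

```python
import json

# Hypothetical stack: one web service plus a database and a cache.
# Service names, image names, ports and variable names are examples only.
compose = {
    "services": {
        "web": {
            "image": "example/web:latest",
            "ports": ["8081:8081"],  # host port : container port
            # Within a Compose project, services can reach each other
            # using the service name as the hostname.
            "environment": {"POSTGRES_HOST": "db", "REDIS_HOST": "cache"},
            "depends_on": ["db", "cache"],
        },
        "db": {"image": "postgres"},
        "cache": {"image": "redis"},
    }
}


def undefined_dependencies(config):
    """Return (service, dependency) pairs where the dependency is not defined."""
    services = config["services"]
    return [
        (name, dep)
        for name, spec in services.items()
        for dep in spec.get("depends_on", [])
        if dep not in services
    ]


print(json.dumps(compose, indent=2))
```

The whole application is described by one small nested structure: a list of services, the image each one runs, and how they find each other. The undefined_dependencies helper is not part of Compose; it is only there to show that the structure is simple enough to check by hand.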

Kubernetes, Swarm, and OpenShift

Kubernetes (k8s for short), Swarm, and OpenShift are "container orchestration" platforms. Essentially you can tell them what containers to use, and how to connect them, and let that platform manage things such as networking, auto-scaling, restarting dead components, and more.

Your orchestration tool sits across your fleet of hardware and manages your container services, their networking, and many other aspects. In most cases you will run an instance of the master orchestration service, and each node will also run some sort of agent responsible for checking in and carrying out the will of the master service.
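
The core idea of telling a platform what you want and letting it work out what to do can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how Kubernetes, Swarm, or OpenShift are actually implemented: the reconcile function and the service names are hypothetical, and a real orchestrator also has to handle scheduling, networking, health checks, and much more.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> List[str]:
    """Return the actions needed to move `running` towards `desired`.

    Both arguments map a service name to a number of container instances.
    """
    actions = []
    # Look at every service mentioned on either side, in a stable order.
    for service in sorted(set(desired) | set(running)):
        want = desired.get(service, 0)
        have = running.get(service, 0)
        if have < want:
            actions.append(f"start {want - have} x {service}")
        elif have > want:
            actions.append(f"stop {have - want} x {service}")
    return actions


# We asked for 3 web containers, but 2 of them have died.
print(reconcile({"web": 3, "redis": 1}, {"web": 1, "redis": 1}))
```

In this picture the master service would run something like reconcile over and over, and the agents on each node would carry out the resulting actions. That repeated comparison is what lets the platform restart dead components without you stepping in.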

Many popular cloud providers such as Google, Amazon, and Microsoft provide some sort of managed orchestration setup. In this case you won't have to install your orchestration tool yourself; instead you just request a number of nodes and can run anything you want across that pool of resources.

Basic stack example

To help illustrate things, let's work through a simple example. In this example we have a custom web application which serves HTML, talks to a Postgres database, and uses a Redis instance as a cache. Here are the top-level steps we might go through, each identifying the technologies we would use at that point.

  1. Our web application will consume environment variables to configure hosts, ports, and credentials for Postgres and Redis. The web application also listens on a port defined by an environment variable.
  2. We then create a Dockerfile for this application. It defines an exposed port of 8081, which we expect our application to listen on.
  3. To complete the web application setup we can then build the image and push it to a registry. In this case, for ease, we'll assume it's on Docker Hub.
  4. Then we identify the official Postgres Docker image postgres:10.3 and the official Redis image redis:3.2.11. These will be the containers which act as our two data storage systems. Both of these are also available via Docker Hub.
  5. We now have all the components for our application. If we want to do some local development, we might run this using Compose.
  6. If we want to host it as a production system, we are more likely to adopt something like Kubernetes. With the use of a few Kubernetes files we can then host all 3 of our services, perhaps with multiple instances of our main application. We can then define some sort of load balancer / proxy in front of the service using a Kubernetes Service definition.
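
Step 1 is the piece that lives in our own code, so here is a minimal sketch of what it might look like in Python. The variable names (PORT, POSTGRES_HOST, and so on) and the defaults are my own assumptions for illustration; the only thing taken from the steps above is the idea of reading hosts, ports, and credentials from the environment, and the port 8081.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    port: int
    postgres_host: str
    postgres_port: int
    postgres_password: str
    redis_host: str
    redis_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, with local-dev defaults.

    The variable names here are hypothetical; use whatever your app expects.
    """
    return Settings(
        port=int(env.get("PORT", "8081")),
        postgres_host=env.get("POSTGRES_HOST", "localhost"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        postgres_password=env.get("POSTGRES_PASSWORD", ""),
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
    )
```

Calling load_settings() with no arguments reads from the real environment. Because the same image reads its configuration this way, it can run unchanged under Compose on a laptop or under Kubernetes in production, with only the environment variables differing between the two.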

Further reading

This article acts as a very high-level introduction to the container ecosystem. From here you might choose to go on and read more detailed introductions to the various technologies. I'd suggest checking out some of the Getting Started guides by Docker to begin with. Those guides should give you a good overview of a number of technologies, with practical examples you can follow along with.