
Containers Part 3: How to run existing applications in containers?

Dmitry Melanchenko
May 23 - 7 min read

This is part 3 of a 3 part series on containers. Check out the background intro in case you missed it.

Now that you have some basic knowledge of Docker and docker-compose, let’s cover how to stitch together multiple containers into a containerized local development environment.

Find all services the app depends on

The goal of the first step is to find all services that make up a local dev environment. All of these services should be divided into 3 major categories:

  • Containerized to run locally
  • Managed and hosted services
  • Mocked services

If it makes sense to run a service locally, put it into the first category. Of course, the main application and the applications the environment is created for belong in this category.

Some services can’t be put into containers and it’s much easier to connect to and use an existing instance managed by someone else. Good examples here are massive databases like Oracle, services running on different platforms like Active Directory running on Windows, and services where neither code nor binaries are available like AWS DynamoDB. These types of services belong in the second category.

The third category is for services that can be mocked. Such services aren’t critical parts of the environment, so some basic emulation is sufficient. Services that aren’t developed yet but that you expect to integrate with at some point in the future also belong in this category.

Build containers for all components

This step seems obvious. However, in some cases, like when several people use the local dev environment, someone needs to establish a process to build images for every application in the environment on major commits. For example, every commit to the development or master branch should trigger an image build. The resulting images should be tagged and stored in a centralized registry.

It’s recommended to use mutable tags, like latest, for images built from a development branch. This lets developers pull the latest images without the extra step of figuring out the latest build or commit on the branch. Mutable here means that when a new image is generated, the tag is moved from the previous image to the new one.

If there are images for third party services owned by someone else, just put paths to the images into your docker-compose.yaml file. Otherwise you’ll have to build the images by yourself and put them into the same centralized registry.

With all images stored in the centralized registry, everyone on a team can call docker-compose pull --parallel and docker-compose will pull images for all services listed in the docker-compose.yaml file.
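
As a sketch, the compose file might reference both kinds of images like this (the registry host and repository paths below are placeholders):

services:
  zookeeper:
    # third-party image owned by another team: just point at its path in the shared registry
    image: my-registry.example.com/platform/my-zookeeper:dev-latest

  myapp:
    # image your own team builds and pushes on every commit to the development branch
    image: my-registry.example.com/my-team/my-app:dev-latest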

Define boot sequence

The next step is to define a startup sequence for the services in the environment. If you don’t need features of the compose file format 3.0+, you can use the condition form of the depends_on parameter in combination with health checks defined for every service. This is an extremely powerful mechanism for defining such sequences.

Let’s imagine your application uploads and runs something on Apache Storm. Apache Storm itself consists of three components, and it needs Apache Zookeeper to run. Plus, your application may need a queue service like Apache Kafka, which also depends on Apache Zookeeper. As a result, your docker-compose file will look something like this:

version: "2.4"   # the condition form of depends_on needs the 2.x file format

services:
  zookeeper:
    image: my-zookeeper
    healthcheck:
      test: ["CMD", "/opt/tools/zookeeper/zookeeper-3.4.8/bin/zkServer.sh", "status"]
      interval: 10s
      timeout: 10s
      retries: 3

  kafka:
    image: my-kafka
    depends_on:
      zookeeper:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "/usr/bin/nc", "-z", "127.0.0.1", "9092"]
      interval: 30s
      timeout: 10s
      retries: 3

  nimbus:
    image: storm-nimbus
    depends_on:
      zookeeper:
        condition: service_healthy
    healthcheck:
      test: /usr/bin/ncat 127.0.0.1 6627 < /dev/null > /dev/null && echo "yes"
      interval: 30s
      timeout: 10s
      retries: 3

  supervisor:
    image: storm-supervisor
    depends_on:
      nimbus:
        condition: service_started
      zookeeper:
        condition: service_healthy

  ui:
    image: storm-ui
    healthcheck:
      test: /usr/bin/ncat 127.0.0.1 8080 < /dev/null > /dev/null && echo "yes"
      interval: 30s
      timeout: 10s
      retries: 3
    ports:
      - "8080:8080"

  myapp:
    build: ../my-app
    image: my-app:dev-latest
    healthcheck:
      test: ["CMD", "/usr/bin/curl", "-s", "127.0.0.1:88/status"]
      interval: 10s
      timeout: 5s
      retries: 3
    ports:
      - "8088:88"
    depends_on:
      kafka:
        condition: service_healthy
      nimbus:
        condition: service_healthy
      supervisor:
        # the supervisor defines no healthcheck, so only wait for it to start
        condition: service_started
      ui:
        condition: service_started

As you can see from the example, most services come with health checks, and the file uses two types of dependencies: service_started (a start-to-start dependency) and service_healthy (a become-healthy-to-start dependency).

A single command, docker-compose up myapp, starts all services in the right order and quickly gives a developer access to a local environment for trying integration scenarios.

Initialize database if needed

If you run a database service as part of your local dev environment and your application has no functionality to initialize a database, then you need a special service container to do the job.

There are several different tools, like Flyway, that can do the job. The container with such a tool starts right after the database service becomes available, while all other components wait for the job to complete.

Unfortunately, docker-compose doesn’t support a concept of init containers like Kubernetes does. So you’ll have to put the container into an infinite sleep as soon as database initialization is completed.

This service container can also be used to load sample data into the database.
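
A minimal sketch of this pattern, assuming a Postgres database and a hypothetical my-db-init image that wraps your migration tool and sample data, might look like this:

services:
  db:
    image: postgres:11
    environment:
      POSTGRES_PASSWORD: devpassword
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  db-init:
    image: my-db-init   # hypothetical image with your migration tool and sample data
    depends_on:
      db:
        condition: service_healthy
    # run the migrations, drop a marker file, then sleep forever so the container stays up
    command: sh -c "/opt/migrations/run.sh && touch /tmp/done && sleep infinity"
    healthcheck:
      test: ["CMD", "test", "-f", "/tmp/done"]
      interval: 10s
      timeout: 5s
      retries: 30

  myapp:
    image: my-app:dev-latest
    depends_on:
      db-init:
        condition: service_healthy

The marker file turns a one-off initialization job into something the rest of the environment can wait for with a plain service_healthy dependency.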

Hack an entrypoint script

Authors of images frequently put some fancy logic into entrypoint scripts: the scripts may wait for dependencies, request files from somewhere (certificates for example), generate configuration using information from environment variables, etc.

In many cases, none of this is needed. You can skip these extra steps when the container runs locally where all the settings are known. Some of these actions can also be almost impossible to reproduce locally when, for example, the scripts make calls to services that aren’t available as part of the environment.

In all these cases it makes sense to create a simplified version of the entrypoint script and inject it into the container, overriding the original file, using the volumes property of the service.
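
For example, assuming the image keeps its entrypoint at /usr/local/bin/entrypoint.sh (check the image’s Dockerfile for the real path), the override can be a single volume mount:

services:
  kafka:
    image: my-kafka
    volumes:
      # replace the image's entrypoint with a simplified local version;
      # the local script has to be executable
      - ./local/kafka-entrypoint.sh:/usr/local/bin/entrypoint.sh:ro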

Inject configuration

Configuration files are another example of predefined files you might need to inject into a container. You might want to direct all logging to STDOUT instead of files, you might want to disable a feature like HTTPS endpoints that’s unavailable when an application runs in the containerized local environment, or you might want to run an application with different settings to see how it works in that new mode.

In all these cases, you can create a local configuration file and mount it into a container using the volumes property of the service. Some configuration files can be mounted into several containers at the same time. A good example here is a logging configuration: you might want all logs from all services to be formatted the same way, so it’s a reasonable approach to put the settings into a single file and mount it into all containers.
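
A sketch of that approach, with the in-container paths below being placeholders for whatever the real images use:

services:
  nimbus:
    image: storm-nimbus
    volumes:
      # one local log4j2 config, mounted into every service that should log the same way
      - ./local/log4j2.xml:/opt/storm/log4j2/cluster.xml:ro

  myapp:
    image: my-app:dev-latest
    volumes:
      - ./local/log4j2.xml:/opt/my-app/conf/log4j2.xml:ro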

Imitate a real life setup

A production environment is normally created and managed by different teams over a long period of time. To replicate this environment locally, with all its bells and whistles, is resource intensive and often can’t be done in a reasonable time frame.

Agile and iterative approaches help a lot here. The most critical features can be identified and planned out on a roadmap. For example, the initial phase can be to stitch all 3rd party services together with a minimal viable configuration, i.e. without any authentication and authorization. The next phase can be to add all the services owned by a team, then add AuthNZ into the environment, next integrate with managed services, and so on.

It’s important to remember that the goal of the environment isn’t to run production in a box but to give developers an instrument to develop faster without waiting for someone else to unblock or fix a centralized development environment.

Track versions of third party services

As mentioned earlier, there is a group of third party services that run locally in containers. The images used to run these services as part of the environment should be in sync with how the services run in the development and production environments. In sync here means that versions and configurations of the services should ideally be identical there and locally. If the owner of one of the services plans to migrate to the next version, there should be an image prepared for the new version. This helps teams test integration early and avoid unpleasant surprises down the road.
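
One low-tech way to keep versions visible, assuming the images live in a shared registry (the path below is a placeholder), is to pin exact version tags in the compose file and bump them together with the service owners:

services:
  zookeeper:
    # pin the same version that runs in development and production;
    # bump the tag as soon as the service owner schedules an upgrade
    image: my-registry.example.com/platform/my-zookeeper:3.4.8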

Issues moving apps into containers

“Smart” applications

An application can be considered “smart” when it makes guesses about the environment where it runs: maybe it makes assumptions about the host names where other services run, or it configures itself using compiled-in configuration files.

Normally such applications know nothing about your containerized environment. Not many things can be done if all the configuration is compiled into the application. The best you can do is to replicate a wonderland where the application expects to find itself, including domain and host names, ports and other assumed “defaults.”

An easier case is when the smartness lives in a startup script that generates a configuration file from these assumptions and then runs the main application with that file. In this case, it’s possible to inject a config file that is correct for the local containerized environment into the container and run the application directly, skipping the startup script.
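
A sketch of that trick, with every path below being a placeholder for whatever the real image uses:

services:
  myapp:
    image: my-app:dev-latest
    volumes:
      # config written specifically for the local containerized environment
      - ./local/my-app.conf:/opt/my-app/conf/app.conf:ro
    # bypass the "smart" startup script and launch the main process directly
    entrypoint: ["/opt/my-app/bin/my-app", "--config", "/opt/my-app/conf/app.conf"]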

Multiprocess container

Sometimes it’s a good idea to package and run several closely connected applications in a single container when only one of them is user-facing. This is a risky setup; the main problem is how to report back to Docker that all the components of the container are healthy. In practice, such a container needs one more application that aggregates health checks from all the apps running in the container and reports the result back to Docker.

If any one of the applications is reported unhealthy, Docker treats the whole container as unhealthy and restarts it.

Here are a few other possible problems with such containers:

  • Logs — To make logs traceable, every app in a container should attribute logs with its own name.
  • Resource limits — Resource quotas are assigned to the container as a whole, so all the apps inside have to share them fairly. Planning that division helps avoid OOM kills and other unpleasant consequences.
  • PID 1 problem — A process with PID 1 in Linux carries extra responsibilities (read more). So, the entry point of your container should know how to reap zombie processes. There are many tools available to solve the problem, e.g. tini; see the sketch after this list.
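
A hedged sketch of such a service: docker-compose’s init: true runs Docker’s built-in init process as PID 1 so zombies get reaped, and the health check calls a hypothetical script that probes every process in the container:

services:
  multiapp:
    image: my-multiprocess-app   # hypothetical image running several closely related processes
    # Docker's built-in init process becomes PID 1 and reaps zombie processes
    init: true
    healthcheck:
      # hypothetical aggregator script: exits non-zero if any process in the container is unhealthy
      test: ["CMD", "/opt/health/check-all.sh"]
      interval: 30s
      timeout: 10s
      retries: 3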

Thanks for reading! Let me know your feedback on this part in the comments.
