Docker

While I have built some very basic Docker containers, or adapted some more complicated ones (for instance upgrading my team’s Airflow version), prior to building this site I had not had a lot of experience or understanding of how to do things “well”. I’m definitely still missing a lot! But I’m starting to get an idea of why things were being done in a particular way. Below I detail a few things I have learned over the last year or so of very intermittent Docker use.

I have found three areas I needed to check to understand exactly how the environment was set up:

1. Dockerfile

The Dockerfile contains:

  • The operating system
  • Required Static packages
  • Required Static files e.g. code, dependency files

2. entrypoint / cmd

The script or process that is going to run when the container is started. This is where you’ll find out what the point of the Docker container is.

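ENTRYPOINT is more the fixed “purpose” of the container and CMD is more like a default, although both can be overridden at run time: CMD by passing arguments to docker run, ENTRYPOINT with the --entrypoint flag.

If both are used in exec form, the CMD values are appended to ENTRYPOINT as its default arguments. For example, the following runs echo Hello world:

ENTRYPOINT ["echo"]
CMD ["Hello", "world"]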
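As a rough sketch of how the overriding plays out, here is a small Dockerfile with the docker commands I would expect to use with it shown as comments. The alpine:3.19 base image and the image name hello are my own assumptions for illustration, not something taken from a real project.

```dockerfile
# Hypothetical example: the base image and the image name "hello" are assumptions
FROM alpine:3.19

# Fixed "purpose" of the container: always run echo
ENTRYPOINT ["echo"]

# Default arguments, appended to ENTRYPOINT when none are given to docker run
CMD ["Hello", "world"]

# Build:
#   docker build -t hello .
#
# Use the defaults (runs: echo Hello world):
#   docker run --rm hello
#
# Replace CMD by passing arguments (runs: echo Goodbye):
#   docker run --rm hello Goodbye
#
# Replace ENTRYPOINT; the image's CMD is dropped as well (runs: ls /):
#   docker run --rm --entrypoint ls hello /
```

Keeping the executable in ENTRYPOINT and only the arguments in CMD means the container behaves like a command that takes parameters.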

Shell vs Exec

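Generally exec is better. In shell form the command is started as a child of /bin/sh -c, so it is not the container’s main process and may not receive signals such as the one sent by docker stop. See the links below for more details.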

Shell
CMD echo "Hello" "world"

The command is run inside a shell (/bin/sh -c).

Exec
CMD ["echo", "Hello", "world"]

The command is run directly, without a shell.
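One visible difference between the two forms is variable expansion: only a shell expands $NAME. The sketch below is a made-up example (the alpine:3.19 base image and the NAME variable are assumptions) with the shell form commented out so that only one CMD is active.

```dockerfile
# Hypothetical example: base image and the NAME variable are assumptions
FROM alpine:3.19
ENV NAME=world

# Shell form: run via /bin/sh -c, so the shell expands $NAME
# and the container prints: Hello world
# CMD echo "Hello $NAME"

# Exec form: echo is executed directly with no shell involved,
# so $NAME is passed through literally and the container prints: Hello $NAME
CMD ["echo", "Hello $NAME"]
```

If you need expansion with the exec form, you can call the shell explicitly, for example CMD ["sh", "-c", "echo Hello $NAME"].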

More details

entrypoint vs cmd

3. Docker Compose yml

Docker Compose potentially moves beyond configuration for just one container to groups of related containers. A Compose file can contain configuration for:

  • The virtual network in which the container is going to run, e.g. exposed ports
  • Startup order of containers
  • Restart Policies for containers
  • Volumes (directories) typically on the host machine
  • Resource limits (CPU, memory, etc.)
  • Environment variables (that you don’t want static in the container)
  • And much more…

Best Practices

Small Containers

My good friend Dave (who graciously hosts this site) also taught me a few useful tips for making Docker containers smaller.

Dockerfile Layers

First off, he explained that each instruction in the Dockerfile that changes the filesystem (such as RUN, COPY or ADD) creates a separate layer of the image, which enables reuse of earlier parts of a build. This means, though, that if you delete things in a later layer, they are not really deleted, just hidden. Hence you should delete any build-only content in the same layer that created it.
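To make the “hidden, not deleted” point concrete, here is a small sketch of my own (the alpine:3.19 base image, the file path and the 50 MB size are arbitrary assumptions). It creates a throwaway file in two different ways so the layer sizes can be compared.

```dockerfile
# Hypothetical example: base image, file path and size are assumptions
FROM alpine:3.19

# Wasteful: the file is written in one layer and removed in the next.
# The first layer still carries roughly 50 MB; the rm only hides the file.
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=50
RUN rm /tmp/big.bin

# Better: create and delete in the same RUN instruction,
# so the file never ends up stored in any layer.
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=50 \
    && rm /tmp/big.bin

# After building, the per-layer sizes can be inspected with:
#   docker history <image>
```

In the docker history output I would expect the first RUN to show up at around 50 MB and the combined RUN at close to nothing.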

A common example of this is build tools that are needed to install some packages but not to run them. Here I’m installing, using and then removing the build dependencies all in one layer:

# Install the build dependencies, use them, then remove them, all in one layer
RUN \
  apk add --no-cache --virtual .build-deps \
    python3-dev \
    openssl-dev \
    libffi-dev \
    gcc \
    autoconf \
    automake \
    g++ \
    make \
    postgresql-dev \
    musl-dev \
  && set -ex && pipenv install --deploy --system \
  && apk del --no-cache .build-deps

In this example there are a few other tools being used to keep the size down.

Alpine

Alpine is a very small, lightweight Linux distribution. Typically your container only needs to do one thing, so it does not need all the general-purpose tooling of a full distribution; whatever it does need can be installed explicitly.

Alpine’s package manager (apk) has a few useful options for reducing the size too:

  • --no-cache means the package index is not cached locally, so I won’t have any leftover files from the installation
  • --virtual names the group of dependencies, for convenience, so they can be uninstalled together later