Tips for efficient Dockerfiles

Introduction

We will see how to:

  • Reduce the number of layers.

  • Leverage the build cache so that builds can be faster.

  • Embed unit testing in the build process.

Reducing the number of layers

  • Each instruction in a Dockerfile creates a new layer.

  • Build your Dockerfile to take advantage of Docker's caching system.

  • Combine commands using && to chain them and \ to wrap long lines.

Note: it is common to write a Dockerfile line by line while developing:

RUN apt-get install -y thisthing
RUN apt-get install -y andthatthing andthatotherone
RUN apt-get install -y somemorestuff

And then refactor it trivially before shipping:

RUN apt-get install -y thisthing andthatthing andthatotherone somemorestuff
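
For longer package lists, \ line continuations keep a single RUN instruction readable, and removing the package cache in the same instruction keeps it out of the layer (a sketch reusing the placeholder package names above):

RUN apt-get update \
 && apt-get install -y \
        thisthing \
        andthatthing \
        andthatotherone \
        somemorestuff \
 && rm -rf /var/lib/apt/lists/*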

Avoid re-installing dependencies at each build

  • Classic Dockerfile problem:

    "each time I change a line of code, all my dependencies are re-installed!"

  • Solution: COPY dependency lists (package.json, requirements.txt, etc.) by themselves, before the rest of the code, so that unchanged dependencies don't get reinstalled at every build.

Example "bad" Dockerfile

The dependencies are reinstalled at every build, because COPY . . invalidates the cache as soon as any file changes; the build system cannot tell whether requirements.txt itself has been updated.

FROM python
WORKDIR /src
COPY . .
RUN pip install -qr requirements.txt
EXPOSE 5000
CMD ["python", "app.py"]

Fixed Dockerfile

Adding the dependencies as a separate step means that Docker can cache more efficiently and only install them when requirements.txt changes.

FROM python
COPY requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt
WORKDIR /src
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]

Embedding unit tests in the build process

# Stage 1: build and test
FROM <baseimage>
RUN <install dependencies>
COPY <code>
RUN <build code>
RUN <install test dependencies>
COPY <test data sets and fixtures>
RUN <unit tests>

# Stage 2: clean final image
FROM <baseimage>
RUN <install dependencies>
COPY <code>
RUN <build code>
CMD, EXPOSE ...
  • The build fails as soon as an instruction fails.
  • If RUN <unit tests> fails, the build doesn't produce an image.
  • If it succeeds, it produces a clean image (without test libraries and data).
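
Here is what this pattern might look like in practice, as a minimal sketch for a hypothetical Python app with a tests/ directory. The final stage copies the code from the test stage, so the final image can only be built if the tests passed (this explicit dependency also matters with BuildKit, which skips stages that nothing depends on):

FROM python AS test
WORKDIR /src
COPY requirements.txt .
RUN pip install -qr requirements.txt
COPY . .
RUN pip install -q pytest && pytest tests/

FROM python
WORKDIR /src
COPY requirements.txt .
RUN pip install -qr requirements.txt
COPY --from=test /src .
EXPOSE 5000
CMD ["python", "app.py"]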

Dockerfile examples

There are a number of tips, tricks, and techniques that we can use in Dockerfiles.

But sometimes, we have to use different (and even opposing) practices depending on:

  • the complexity of our project,

  • the programming language or framework that we are using,

  • the stage of our project (early MVP vs. super-stable production),

  • whether we're building a final image or a base for further images,

  • etc.

We are going to show a few examples using very different techniques.

When to optimize an image

When authoring official images, it is a good idea to reduce as much as possible:

  • the number of layers,

  • the size of the final image.

This is often done at the expense of build time and convenience for the image maintainer; but when an image is downloaded millions of times, saving even a few seconds of pull time can be worth it.

RUN apt-get update && apt-get install -y libpng12-dev libjpeg-dev && rm -rf /var/lib/apt/lists/* \
    && docker-php-ext-configure gd --with-png-dir=/usr --with-jpeg-dir=/usr \
    && docker-php-ext-install gd
...
RUN curl -o wordpress.tar.gz -SL https://wordpress.org/wordpress-${WORDPRESS_UPSTREAM_VERSION}.tar.gz \
    && echo "$WORDPRESS_SHA1 *wordpress.tar.gz" | sha1sum -c - \
    && tar -xzf wordpress.tar.gz -C /usr/src/ \
    && rm wordpress.tar.gz \
    && chown -R www-data:www-data /usr/src/wordpress

(Source: WordPress official image)

When not to optimize an image

Sometimes, it is better to prioritize maintainer convenience.

In particular, if:

  • the image changes a lot,

  • the image has very few users (e.g. only 1, the maintainer!),

  • the image is built and run on the same machine,

  • the image is built and run on machines with a very fast link ...

In these cases, just keep things simple!

(Below: a Dockerfile that can be used to preview a Jekyll / GitHub Pages site.)

FROM debian:sid

RUN apt-get update -q
RUN apt-get install -yq build-essential make
RUN apt-get install -yq zlib1g-dev
RUN apt-get install -yq ruby ruby-dev
RUN apt-get install -yq python-pygments
RUN apt-get install -yq nodejs
RUN apt-get install -yq cmake
RUN gem install --no-rdoc --no-ri github-pages

COPY . /blog
WORKDIR /blog

VOLUME /blog/_site

EXPOSE 4000
CMD ["jekyll", "serve", "--host", "0.0.0.0", "--incremental"]

Multi-dimensional versioning systems

Images can have a tag, indicating the version of the image.

But sometimes, there are multiple important components, and we need to indicate the versions for all of them.

This can be done with environment variables:

ENV PIP=9.0.3 \
    ZC_BUILDOUT=2.11.2 \
    SETUPTOOLS=38.7.0 \
    PLONE_MAJOR=5.1 \
    PLONE_VERSION=5.1.0 \
    PLONE_MD5=76dc6cfc1c749d763c32fff3a9870d8d

(Source: Plone official image)
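
These variables can then be referenced by later instructions; a download-and-verify step might look like this (a hypothetical sketch, not the actual Plone Dockerfile):

RUN curl -o Plone.tgz -SL "https://launchpad.net/plone/$PLONE_MAJOR/$PLONE_VERSION/+download/Plone-$PLONE_VERSION-UnifiedInstaller.tgz" \
    && echo "$PLONE_MD5 *Plone.tgz" | md5sum -c -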

Entrypoints and wrappers

It is very common to define a custom entrypoint.

That entrypoint will generally be a script, performing any combination of:

  • pre-flight checks (if a required dependency is not available, display a friendly error message early, instead of an obscure one buried in a deep log file),

  • generation or validation of configuration files,

  • dropping privileges (with e.g. su or gosu, sometimes combined with chown),

  • and more.

A typical entrypoint script

#!/bin/sh
set -e

# first arg is '-f' or '--some-option'
# or first arg is 'something.conf'
if [ "${1#-}" != "$1" ] || [ "${1%.conf}" != "$1" ]; then
    set -- redis-server "$@"
fi

# allow the container to be started with '--user'
if [ "$1" = 'redis-server' -a "$(id -u)" = '0' ]; then
    chown -R redis .
    exec su-exec redis "$0" "$@"
fi

exec "$@"

(Source: Redis official image)
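
For context, such a script is typically wired into the image like this (a sketch; the actual Redis Dockerfile may differ in its details):

COPY docker-entrypoint.sh /usr/local/bin/
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["redis-server"]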

Factoring information

To facilitate maintenance (and avoid human errors), avoid repeating information like:

  • version numbers,

  • remote asset URLs (e.g. source tarballs) ...

Instead, use environment variables.

ENV NODE_VERSION 10.2.1
...
RUN ...
    && curl -fsSLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION.tar.xz" \
    && curl -fsSLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/SHASUMS256.txt.asc" \
    && gpg --batch --decrypt --output SHASUMS256.txt SHASUMS256.txt.asc \
    && grep " node-v$NODE_VERSION.tar.xz\$" SHASUMS256.txt | sha256sum -c - \
    && tar -xf "node-v$NODE_VERSION.tar.xz" \
    && cd "node-v$NODE_VERSION" \
...

(Source: Node.js official image)

Overrides

In theory, development and production images should be the same.

In practice, we often need to enable specific behaviors in development (e.g. debug statements).

One way to reconcile both needs is to use Compose to enable these behaviors.

Let's look at the trainingwheels demo app for an example.

Production image

This Dockerfile builds an image leveraging gunicorn:

FROM python
RUN pip install flask
RUN pip install gunicorn
RUN pip install redis
COPY . /src
WORKDIR /src
CMD gunicorn --bind 0.0.0.0:5000 --workers 10 counter:app
EXPOSE 5000

(Source: trainingwheels Dockerfile)

Development Compose file

This Compose file uses the same image, but with a few overrides for development:

  • the Flask development server is used (overriding CMD),

  • the DEBUG environment variable is set,

  • a volume is used to provide a faster local development workflow.

services:
  www:
    build: www
    ports:
      - 8000:5000
    user: nobody
    environment:
      DEBUG: 1
    command: python counter.py
    volumes:
      - ./www:/src

(Source: trainingwheels Compose file)
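
With this file in place, starting the development environment is a single command (assuming a current Docker Compose setup):

docker compose up --build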

How do we know which practices are best?

  • The main goal of containers is to make our lives easier.

  • In this chapter, we showed many ways to write Dockerfiles.

  • These Dockerfiles sometimes use diametrically opposed techniques.

  • Yet, they were the "right" ones for a specific situation.

  • It's OK (and even encouraged) to start simple and evolve as needed.

  • Feel free to review this chapter later (after writing a few Dockerfiles) for inspiration!