Reducing image size

Introduction

  • In the previous example, our final image contained:

    • our hello program

    • its source code

    • the compiler

  • Only the first one is strictly necessary.

  • We are going to see how to obtain an image without the superfluous components.

Can't we remove superfluous files with RUN?

What happens if we run one of the following commands?

  • RUN rm -rf ...

  • RUN apt-get remove ...

  • RUN make clean ...

Each of these adds a layer that removes a bunch of files.

But the previous layers (which added the files) still exist.
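
We can see this with docker history. Here is a minimal sketch (the 100 MB file and the layer-demo tag are arbitrary choices for the demonstration):

FROM ubuntu
RUN dd if=/dev/zero of=/bigfile bs=1M count=100
RUN rm /bigfile

docker build -t layer-demo .
docker history layer-demo

The layer created by dd still weighs about 100 MB in the output, even though the layer created by rm removed the file.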

Removing files with an extra layer

When downloading an image, all the layers must be downloaded.

| Dockerfile instruction          | Layer size                          | Image size                             |
|---------------------------------|-------------------------------------|----------------------------------------|
| FROM ubuntu                     | Size of base image                  | Size of base image                     |
| ...                             | ...                                 | Sum of this layer + all previous ones  |
| RUN apt-get install somepackage | Size of files added (e.g. a few MB) | Sum of this layer + all previous ones  |
| ...                             | ...                                 | Sum of this layer + all previous ones  |
| RUN apt-get remove somepackage  | Almost zero (just metadata)         | Same as previous one                   |

Therefore, RUN rm does not reduce the size of the image or free up disk space.

Removing unnecessary files

Various techniques are available to obtain smaller images:

  • collapsing layers,

  • adding binaries that are built outside of the Dockerfile,

  • squashing the final image,

  • multi-stage builds.

Let's review them quickly.

Collapsing layers

You will frequently see Dockerfiles like this:

FROM ubuntu
RUN apt-get update && apt-get install xxx && ... && apt-get remove xxx && ...

Or the (more readable) variant:

FROM ubuntu
RUN apt-get update \
 && apt-get install xxx \
 && ... \
 && apt-get remove xxx \
 && ...

This RUN command gives us a single layer.

Files that are added and then removed in the same layer do not grow the layer size.
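
For instance, here is a minimal sketch of a collapsed layer for our hello program (assuming hello.c is in the build context; the cleanup steps shown are typical, not exhaustive):

FROM ubuntu
COPY hello.c /
RUN apt-get update \
 && apt-get install -y build-essential \
 && make hello \
 && apt-get remove -y build-essential \
 && apt-get autoremove -y \
 && rm -rf /var/lib/apt/lists/*
CMD /hello

The compiler is installed and removed in the same RUN command, so it never grows the layer.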

Collapsing layers: pros and cons

Pros:

  • works on all versions of Docker

  • doesn't require extra tools

Cons:

  • not very readable

  • some unnecessary files might still remain if the cleanup is not thorough

  • that layer is expensive (slow to build)

Building binaries outside of the Dockerfile

This results in a Dockerfile looking like this:

FROM ubuntu
COPY xxx /usr/local/bin

Of course, this implies that the file xxx exists in the build context before we run docker build.

For instance, it can:

  • exist in the code repository,
  • be created by another tool (script, Makefile...),
  • be created by another container image and extracted from the image.

See for instance how the busybox official image is built.
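
For instance, here is a minimal sketch of the last approach (the builder-image name and the path of xxx are hypothetical):

docker create --name extract builder-image
docker cp extract:/usr/local/bin/xxx ./xxx
docker rm extract
docker build -t final-image .

docker create instantiates the container without running it, and docker cp copies the binary into the build context before the build.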

Building binaries outside: pros and cons

Pros:

  • final image can be very small

Cons:

  • requires an extra build tool

  • we're back in dependency hell and "works on my machine"

Cons, if binary is added to code repository:

  • breaks portability across different platforms

  • grows repository size a lot if the binary is updated frequently

Squashing the final image

The idea is to transform the final image into a single-layer image.

This can be done in (at least) two ways.

  • Activate experimental features (see the daemon configuration sketch after this list) and squash the final image:

    docker image build --squash ...
    
  • Export/import the final image.

    docker build -t temp-image .
    docker run --entrypoint true --name temp-container temp-image
    docker export temp-container | docker import - final-image
    docker rm temp-container
    docker rmi temp-image
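
Note that the --squash flag only works when the Docker daemon runs in experimental mode. A minimal sketch of enabling it (assuming the daemon reads its configuration from /etc/docker/daemon.json):

{
  "experimental": true
}

The daemon must then be restarted (e.g. with systemctl restart docker) for the flag to become available.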
    

Squashing the image: pros and cons

Pros:

  • single-layer images are smaller and faster to download

  • removed files no longer take up storage and network resources

Cons:

  • we still need to actively remove unnecessary files

  • squash operation can take a lot of time (on big images)

  • squash operation does not benefit from cache
    (even if we change just a tiny file, the whole image needs to be re-squashed)

Multi-stage builds

Multi-stage builds allow us to split our build into multiple stages.

Each stage is a separate image, and can copy files from previous stages.

We're going to see how they work in more detail.

Multi-stage builds

  • At any point in our Dockerfile, we can add a new FROM line.

  • This line starts a new stage of our build.

  • Each stage can access the files of the previous stages with COPY --from=....

  • When a build is tagged (with docker build -t ...), the last stage is tagged.

  • Previous stages are not discarded: they will be used for caching, and can be referenced.

Multi-stage builds in practice

  • Each stage is numbered, starting at 0.

  • We can copy a file from a previous stage by indicating its number, e.g.:

    COPY --from=0 /file/from/first/stage /location/in/current/stage
    
  • We can also name stages, and reference these names:

    FROM golang AS builder
    RUN ...
    FROM alpine
    COPY --from=builder /go/bin/mylittlebinary /usr/local/bin/
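
Since the builder stage is named, we can also reference it outside the Dockerfile. A hedged sketch (the hello-builder tag is arbitrary; --target requires a Docker version with multi-stage support):

    docker build --target builder -t hello-builder .

This builds and tags only the builder stage, which can be handy for debugging.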
    

Multi-stage builds for our C program

We will change our Dockerfile to:

  • give a nickname to the first stage: compiler

  • add a second stage using the same ubuntu base image

  • add the hello binary to the second stage

  • make sure that CMD is in the second stage

The resulting Dockerfile is on the next slide.

Multi-stage build Dockerfile

Here is the final Dockerfile:

FROM ubuntu AS compiler
RUN apt-get update
RUN apt-get install -y build-essential
COPY hello.c /
RUN make hello
FROM ubuntu
COPY --from=compiler /hello /hello
CMD /hello

Let's build it, and check that it works correctly:

docker build -t hellomultistage .
docker run hellomultistage

Comparing single/multi-stage build image sizes

List our images with docker images, and check the size of:

  • the ubuntu base image,

  • the single-stage hello image,

  • the multi-stage hellomultistage image.

We can achieve even smaller images if we use smaller base images.

However, if we use common base images (e.g. if we standardize on ubuntu), these common images will be pulled only once per node, so they are virtually "free."
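
For example, here is a hedged sketch of the same multi-stage build on a smaller base image (assuming our hello.c compiles fine against musl libc, which alpine uses):

FROM alpine AS compiler
RUN apk add --no-cache build-base
COPY hello.c /
RUN gcc -o /hello /hello.c
FROM alpine
COPY --from=compiler /hello /hello
CMD /hello

The alpine base typically weighs just a few megabytes, compared to tens of megabytes for ubuntu.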