Dockerfile: 10 good practices

I recently attended the PyStok#23 [1] meeting. After
the event, my colleague and I came up with the idea of writing a small write-up about Dockerfile best practices. The article is divided into several sections, each describing one piece of advice for proper Docker image creation. Enjoy!

Reduce number of layers

First of all, the fewer layers you create, the smaller your image will be. Of course, you need to find a balance between readability and aesthetics. Ideally an image would have only one layer, but that is rarely seen and rarely possible.

Wrong way. You do not care about layers.

FROM centos  
RUN yum install -y httpd  
RUN yum install -y curl  

Good way. You keep in mind that fewer layers mean smaller images.

FROM centos  
RUN yum install -y httpd curl  

Inherit images

Let's assume a typical situation. You would like to deploy a fully operational Tomcat based application with CentOS operating system libraries underneath. How will you do that?

Wrong way. You build one big Dockerfile with a bunch of dependencies, based on the CentOS image.

FROM centos  
RUN yum install -y java-1.7.0-openjdk tomcat  
COPY app.war /usr/local/tomcat/webapps/app.war  

Good way. You use an official image as a base. First you create a fully operational image with Java. On top of that, you create a Tomcat image, and as the last step you create an application image built on the Tomcat base.

# Dockerfile for the Java image
FROM registry.example.com/centos:7  
RUN yum install -y java-1.7.0-openjdk  

# Dockerfile for the Tomcat image
FROM registry.example.com/centos7-java:1.7.0  
RUN yum install -y tomcat  

# Dockerfile for the application image
FROM registry.example.com/centos7-java170-tomcat:7  
COPY app.war /usr/local/tomcat/webapps/app.war  

Well, a few words of explanation. You need to think about the future. Assume you will need to run a second application based on the same dependencies. Wouldn't it be cool to have a template for Tomcat-based applications? What if your developer would like to run a simple JAR as an application? Wouldn't it be cool to have a Java-focused image? Isn't it much more readable to have one purpose for each image and then build a stack? Think about that.

Add metadata

Always add a maintainer and some metadata. Often the only documentation for your image is the Dockerfile itself, so do not be afraid of adding many labels such as:

  • project name,
  • maintainer,
  • project description,
  • release date,
  • ...

and many others.
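Such metadata can be added with the LABEL instruction. A minimal sketch (the label keys and values below are illustrative; the `org.opencontainers.image.*` naming is just one common convention):

```dockerfile
FROM centos:7

# Illustrative metadata labels; pick keys that fit your own convention
LABEL maintainer="ops@example.com" \
      org.opencontainers.image.title="example-httpd" \
      org.opencontainers.image.description="Apache HTTP server image" \
      org.opencontainers.image.version="1.0.0" \
      org.opencontainers.image.created="2017-03-01"
```

Labels can later be inspected with `docker inspect`, which makes them useful for tooling as well as humans.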

Use .dockerignore

Ok. One of the most famous security issues is keeping the .git directory in
your HTTP server's document root. To make a long story short: use .dockerignore to
avoid copying data that might cause problems or is not required
to run your application. Wrong way. You do not use .dockerignore and copy everything in the application root tree. Good way. You copy data with caution. An example of a well-defined .dockerignore is attached below:

.git
.gitignore
LICENSE  
VERSION  
README.md  
Changelog.md  
Makefile  
docker-compose.yml  
docs  

Make commands readable

You need to remember that you are not the only one who will be using your Dockerfile. So, instead of piling packages onto a single long line, I really encourage you to keep your code clean.

Wrong way. You create extremely long RUN commands.

FROM ubuntu:xenial  
RUN apt-get install -y tar git curl nano wget dialog net-tools build-essential  

Good way. You split your commands into multiline parts.

FROM ubuntu:xenial  
RUN apt-get install -y tar \  
    git \
    curl \
    nano \
    wget \
    dialog \
    net-tools \
    build-essential

Best way. You split your commands into multiline parts and sort them alphanumerically.

FROM ubuntu:xenial  
RUN apt-get install -y build-essential \  
    curl \
    dialog \
    git \
    nano \
    net-tools \
    tar \
    wget

Use WORKDIR

Well, it is highly recommended to:

  • use absolute paths inside Dockerfiles,
  • use WORKDIR instead of RUN cd ...

It is much more readable to use absolute paths and one defined working
directory than a bunch of cd executions.
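A short sketch contrasting the two approaches (the paths and archive name are illustrative):

```dockerfile
FROM centos:7

# Instead of: RUN cd /srv/app && tar -xzf release.tar.gz
WORKDIR /srv/app
COPY release.tar.gz .
RUN tar -xzf release.tar.gz
```

WORKDIR also applies to every subsequent RUN, CMD, ENTRYPOINT, and COPY, so you state the directory once instead of repeating it.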

Clean after yourself

If you make a mess, you need to clean up after yourself, so instead of
leaving it for later, do it as soon as possible. What is the
reason? The same as before: image size. Wrong way. You do not clean after
yourself.

FROM centos:7  
RUN yum install -y httpd  

Good way. You do clean after yourself, but only in a later layer.

FROM centos:7  
RUN yum install -y httpd  
ADD . /srv  
RUN yum clean all  

Best way. You clean after yourself as soon as you can, within the same layer, so the cached files never end up in the image.

FROM centos:7  
RUN yum install -y httpd && \  
    yum clean all
ADD . /srv  

Keep your container clear

Basically, to run your application you do not need unit-testing
libraries installed. The same goes for integration and performance
testing packages. My tip: do not install build-essential
packages in the image that will run your application. Wrong way. You
install everything, and your Docker images are larger than they should
be. Good way. You install only the packages necessary to run
your code. Best way. You keep your containers well-defined.
One purpose. One process. You keep your installation list to a minimum.
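A multi-stage build (available since Docker 17.05) is one way to keep build tools out of the final image entirely; a sketch assuming a simple C application compiled with build-essential tooling:

```dockerfile
# Build stage: compilers and headers exist only here
FROM ubuntu:xenial AS builder
RUN apt-get update && \
    apt-get install -y build-essential && \
    rm -rf /var/lib/apt/lists/*
COPY app.c /src/app.c
RUN gcc -o /src/app /src/app.c

# Runtime stage: only the compiled binary is copied in
FROM ubuntu:xenial
COPY --from=builder /src/app /usr/local/bin/app
CMD ["app"]
```

The build stage is discarded after the build, so the runtime image carries no compilers or development headers.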

Install updates

The only good way. Update your packages.

FROM centos:7  
RUN yum -y update && \  
    yum install -y httpd && \
    yum clean all
ADD . /srv  

Do not use latest tag

The latest tag is like a SNAPSHOT for people familiar with Maven. The best approach is to pin your base image version and use only that one.

Wrong way. You use the latest tag.

FROM centos  

Good way. You pin the image version.

FROM centos:7  

Be aware of order

Remember: if a layer changes, every subsequent layer will be rebuilt.

Wrong way. You do not care about the order of your commands.

Good way. You think about the build process. Let me present you an example.

You have a Django-based application. The Python packages are
defined in requirements.txt. A wrong example can look like the
following:

FROM centos:7  
...
WORKDIR /srv  
ADD . /srv/  
RUN pip install -r /srv/requirements.txt  

What is wrong with this code? Obviously, your code changes almost every
time you run a build (if it has not changed, your artifacts should not
change either). Whenever the code changes, the ADD layer is invalidated,
so every subsequent layer is rebuilt and your application requirements
are reinstalled on every build. Now imagine your requirements.txt contains
the numpy package. The better alternative is the following code.

FROM centos:7  
...
WORKDIR /srv  
ADD ./requirements.txt /srv/requirements.txt  
RUN pip install -r /srv/requirements.txt  
ADD . /srv/  

As you can see, this Dockerfile contains only one additional line compared
to the previous example, but now the requirements layer is rebuilt only
when requirements.txt itself changes.

HEALTHCHECK is your friend

Starting a process is not everything. We also want to be sure the container
is up and running. I would highly recommend adding a HEALTHCHECK
instruction to all your containers. It assures you that your HTTP server
is able to handle new connections, or that your Redis database is up and
running. An example of HEALTHCHECK is attached:

HEALTHCHECK --interval=5m \  
            --timeout=3s \
            CMD curl -I -L -f https://localhost/ || exit 1

It is not all

Have you noticed I actually gave you more than 10 tips? :-) Of course, I have not mentioned a bunch of security cases, written about build reproducibility, or covered application stability and storage considerations. I will write a longer article about security in April this year. I hope this article was helpful for you. If you are interested in Docker image security, I will be happy to see your feedback. I really encourage you to read the following references: [2], [3], [4], [5]. The considerations above were based on them, as well as my own practice and experience. They will definitely improve your understanding of Docker images.