Docker 102: Faster image building
Gitlab 103: Kaniko image building described how to build Docker (or, more precisely, OCI) images using Kaniko.
Compared to Docker-in-Docker it has one drawback: speed, at least when your Dockerfile contains RUN commands that take a long time, such as an apt-get install of many or large packages.
This was a problem for the UCS pipeline building ucslint: downloading and installing the Debian package dependencies took most of the 3:24 minutes, while adding the latest version from git took less than a second.
Here are some tricks to improve your images and to speed up building them.
Previous layer caching
If you carefully read Docker 101: Container basics, you will have noticed the following behavior when an image is built: for each layer, docker tracks which underlying layer was used and which command was used to build the next layer.
This can be seen by running the docker image history command:
# cat temp/Dockerfile
FROM debian:bullseye-slim
RUN touch /stamp
ADD Dockerfile /
# docker build -t empty temp/
Sending build context to Docker daemon 2.048kB
Step 1/3 : FROM debian:bullseye-slim
---> bfbec70f8488
Step 2/3 : RUN touch /stamp
---> Running in ce41dd79a586
Removing intermediate container ce41dd79a586
---> d3831cac03c4
Step 3/3 : ADD Dockerfile /
---> e1eddc7fc1d8
Successfully built e1eddc7fc1d8
Successfully tagged empty:latest
# docker image history empty
IMAGE CREATED CREATED BY SIZE COMMENT
e1eddc7fc1d8 29 seconds ago /bin/sh -c #(nop) ADD file:02a0d6b33cf854ad6… 60B
d3831cac03c4 29 seconds ago |0 /bin/sh -c touch /stamp 0B
bfbec70f8488 3 months ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 3 months ago /bin/sh -c #(nop) ADD file:8b1e79f91081eb527… 80.4MB
If an image is rebuilt, docker first checks whether the previous layer can be reused. This is the case when both the base layer and the Dockerfile command are unchanged.
- For ADD and COPY, the content of the copied files is also checked for changes. This way, changes in the build context end up in your rebuilt image.
- On the other hand, commands like RUN may not get re-executed. The touch command above uses the current time by default, so the file /stamp carries the time-stamp of when the image was first built. On subsequent builds, docker detects that the RUN command is unchanged and reuses the original layer with its original time-stamp.
You can see this when you rebuild the image from above:
# echo >> temp/Dockerfile
# docker build -t empty temp/
Sending build context to Docker daemon 2.048kB
Step 1/3 : FROM debian:bullseye-slim
---> bfbec70f8488
Step 2/3 : RUN touch /stamp
---> Using cache
---> d3831cac03c4
Step 3/3 : ADD Dockerfile /
---> 75816d4c2d28
Successfully built 75816d4c2d28
Successfully tagged empty:latest
# docker image history empty
IMAGE CREATED CREATED BY SIZE COMMENT
75816d4c2d28 6 seconds ago /bin/sh -c #(nop) ADD file:ad97b7631b5bf96d6… 61B
d3831cac03c4 13 minutes ago |0 /bin/sh -c touch /stamp 0B
bfbec70f8488 3 months ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 3 months ago /bin/sh -c #(nop) ADD file:8b1e79f91081eb527… 80.4MB
docker reuses the original layer d3831cac03c4, and only the top-most layer gets replaced, from e1eddc7fc1d8 to 75816d4c2d28.
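The cache lookup described above can be sketched as a small Python model. This is a deliberately simplified toy, not Docker's actual implementation. It assumes the cache key consists only of the parent layer, the instruction text and, for ADD/COPY, a SHA-256 checksum of the copied content. The layer IDs (layer1, layer2 and so on) are invented placeholders, not real digests. Replaying the two builds from above reproduces what docker image history showed: the RUN layer is reused, and only the ADD layer is new.

```python
import hashlib
from typing import Dict, List, Tuple

# Toy model of the layer cache: cache key -> layer ID (real docker stores far more).
Cache = Dict[Tuple[str, str, str], str]


def cache_key(parent: str, instruction: str, content: bytes) -> Tuple[str, str, str]:
    """Parent layer + instruction text, plus a content checksum for ADD/COPY only."""
    keyword = instruction.split()[0].upper()
    checksum = hashlib.sha256(content).hexdigest() if keyword in ("ADD", "COPY") else ""
    return (parent, instruction, checksum)


def build(steps: List[Tuple[str, bytes]], cache: Cache,
          base: str = "debian:bullseye-slim") -> Tuple[List[str], List[bool]]:
    """Return the resulting layer IDs and, per step, whether the cache was hit."""
    parent, layers, hits = base, [], []
    for instruction, content in steps:
        key = cache_key(parent, instruction, content)
        hit = key in cache
        if not hit:
            # Executing a step always yields a brand-new layer (new placeholder ID).
            cache[key] = f"layer{len(cache) + 1}"
        parent = cache[key]
        layers.append(parent)
        hits.append(hit)
    return layers, hits


cache: Cache = {}
dockerfile = b"FROM debian:bullseye-slim\nRUN touch /stamp\nADD Dockerfile /\n"
first, _ = build([("RUN touch /stamp", b""), ("ADD Dockerfile /", dockerfile)], cache)

dockerfile += b"\n"  # same effect as "echo >> temp/Dockerfile"
second, hits = build([("RUN touch /stamp", b""), ("ADD Dockerfile /", dockerfile)], cache)

print(first, second, hits)  # ['layer1', 'layer2'] ['layer1', 'layer3'] [True, False]
```

As in the real transcript, the second build shares its RUN layer with the first build and only replaces the top-most layer.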
Therefore it is important to write efficient and correct Dockerfiles:
- Put steps that change infrequently first, like installing base packages from Debian or PyPI. If your dependencies do not change often, these layers stay unchanged for a long time and can be reused many times, speeding up many builds.
- Add your ever-changing code as late as possible. That way, only the last few layers need to be rebuilt each time.
- Be careful with volatile data and whenever the current time matters. Your apt-get update && apt-get install command may not pick up the latest package versions, as docker may decide to reuse the layer from a previous build.
- Minimize the number of RUN commands. Each command adds an additional layer to your image, which must be downloaded. Splitting commands may also lead to strange caching issues and may needlessly increase the net image size due to temporary data. For example, apt-get update downloads the Debian package index files and stores them below /var/lib/apt/lists/. Your final image probably does not need them, as you don't expect your users to run apt-get install within your image. On the other hand, you get into trouble yourself when you rebuild the image and run that apt-get install in a later, separate step: the cached index files might be out-of-date by then and no longer match what is on the ever-changing Debian repository servers.
- If you ever run into caching issues (on your own host), delete any previous image from your dockerd using docker image rm …. That way, the old layers are deleted too, and docker build no longer finds them cached locally.
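The ordering advice can be illustrated with another small Python sketch. It assumes a simplified rule: the first step whose cache key changes is re-executed. Because every later step then has a new parent layer, all following steps are re-executed too. The step lists are hypothetical, and the package name python3-debian merely stands in for your real dependencies. The single RUN also follows the tip to keep apt-get update, apt-get install and the cleanup of /var/lib/apt/lists/ in one command.

```python
from typing import List, Optional, Set, Tuple

# (instruction, name of the copied input, or None if only the text forms the cache key)
Step = Tuple[str, Optional[str]]

# Hypothetical dependency installation, kept in one RUN as recommended above.
INSTALL = ("RUN apt-get update && apt-get install -y --no-install-recommends python3-debian"
           " && rm -rf /var/lib/apt/lists/*")


def steps_to_rebuild(steps: List[Step], changed_inputs: Set[str]) -> List[str]:
    """Return the instructions that miss the cache after the given inputs changed."""
    rebuild: List[str] = []
    invalidated = False
    for instruction, input_name in steps:
        # A changed input invalidates this layer; every later layer gets a new parent.
        if invalidated or (input_name is not None and input_name in changed_inputs):
            invalidated = True
            rebuild.append(instruction)
    return rebuild


code_first: List[Step] = [("COPY . /src", "source"), (INSTALL, None)]
code_last: List[Step] = [(INSTALL, None), ("COPY . /src", "source")]

# Only your own source code changed since the last build:
print(len(steps_to_rebuild(code_first, {"source"})))  # 2: the slow install runs again
print(len(steps_to_rebuild(code_last, {"source"})))   # 1: only the cheap COPY
```

With the code copied last, a source change only costs the COPY step, while the expensive package installation stays cached.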
Docker-in-Docker
With Docker-in-Docker you get a pristine dockerd for each build. It is empty and therefore has no previous images, and thus no layers, cached.
Every build therefore starts from scratch, which can be slow.
To avoid this, you can pull a previous version of your image and pass it to docker build --cache-from=…, so that its old layers are used for caching.
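For Docker-in-Docker, a typical job script can be sketched as three commands: pull, build with --cache-from, push. The following Python sketch only assembles and optionally runs them. The image name is a hypothetical placeholder, and by default the script merely prints the commands (dry run). It pulls the previous image explicitly because the classic builder shown above only considers images already present in the local dockerd as cache sources. On the very first build there is nothing to pull yet, so a failing pull is tolerated.

```python
import shlex
import subprocess
from typing import List

IMAGE = "registry.example.com/group/project:latest"  # hypothetical placeholder


def dind_build_commands(image: str, context: str = ".") -> List[List[str]]:
    """Commands for a build in a fresh, empty Docker-in-Docker daemon."""
    return [
        ["docker", "pull", image],  # bring the old layers into the empty dockerd
        ["docker", "build", f"--cache-from={image}", "-t", image, context],
        ["docker", "push", image],  # publish the new version for the next build
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one after another."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
            continue
        # The very first build has no previous image, so only the pull may fail.
        subprocess.run(argv, check=argv[1] != "pull")


run(dind_build_commands(IMAGE))
```

If BuildKit is enabled instead, the image used with --cache-from must itself have been built with --build-arg BUILDKIT_INLINE_CACHE=1; otherwise it carries no cache metadata.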
Kaniko
Kaniko's architecture is fundamentally different, as there is no central dockerd to cache any layers.
Still, two caching modes are supported:
- --cache=true enables Caching Layers, where kaniko builds an artificial image that can be pushed to a remote image registry. These are not regular OCI images: they reuse the data structures and infrastructure of images, but you cannot execute them with docker run or similar. You must not push these images to docker-registry.knut.univention.de, as that registry has no automatic cleanup mechanism to remove old and unused layers/images. Use gitregistry.knut.univention.de instead. In that case, make sure to set up a Cleanup policy below Settings → Packages and Registries → Clean up image tags with a sensible retention period for images named cache. A separate registry can be specified via the option --cache-repo. By default Kaniko does not cache COPY layers; if desired, this must be enabled explicitly via the option --cache-copy-layers.
- With Caching Base Images, layers are cached locally in a directory, but this is a pain to set up. It requires an extra step in which a separate command is used to warm the cache. This is not integrated into Kaniko itself, and extra care has to be taken to avoid concurrency issues.
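The layer-caching options can be combined as shown in this Python sketch. It only builds the command line for the Kaniko executor (/kaniko/executor in the official executor image). It uses the flags named above plus the standard --context, --dockerfile and --destination flags; all paths and image names are hypothetical. It also encodes the registry warning. When no --cache-repo is given, Kaniko derives the cache repository from --destination (as "<destination repo>/cache"), so in that case the check looks at the destination's registry.

```python
import shlex
from typing import List, Optional

# The warning above: this registry has no cleanup, so it must never receive caches.
FORBIDDEN_CACHE_REGISTRY = "docker-registry.knut.univention.de"


def kaniko_argv(context: str, destination: str, cache_repo: Optional[str] = None,
                cache_copy_layers: bool = False) -> List[str]:
    """Assemble a /kaniko/executor command line with layer caching enabled."""
    # Without --cache-repo, Kaniko stores the cache below "<destination repo>/cache".
    effective_repo = cache_repo if cache_repo is not None else destination
    if effective_repo.split("/", 1)[0] == FORBIDDEN_CACHE_REGISTRY:
        raise ValueError(f"do not push Kaniko caches to {FORBIDDEN_CACHE_REGISTRY}")
    argv = [
        "/kaniko/executor",
        f"--context={context}",
        f"--dockerfile={context}/Dockerfile",
        f"--destination={destination}",
        "--cache=true",
    ]
    if cache_repo is not None:
        argv.append(f"--cache-repo={cache_repo}")
    if cache_copy_layers:
        argv.append("--cache-copy-layers")  # COPY layers are not cached by default
    return argv


print(shlex.join(kaniko_argv("/builds/group/project",
                             "gitregistry.knut.univention.de/group/project:latest")))
```

Keeping the registry check next to the flag assembly means a misconfigured destination fails before anything is pushed.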
Currently caching is not enabled by default in our kaniko.yml, as setting up the cache retention policy is not automated.
After reading the warning above, you can enable it yourself by passing the extra argument --cache=true via the pipeline variable KANIKO_ARGS.
This is already documented in kaniko.md.
See univention/ucs>ucslint for a real-world example.