GitLab on Kubernetes
Continuing my GitLab and Kubernetes (k8s) odyssey from k8s @ Debian, I’ve learned two things:
- DNS problems with Alpine Linux
- SSL certificate hell
DNS problems with Alpine Linux
Many Docker images are based on Alpine Linux, which is smaller than most Debian images.
But they use a different C library named musl, which is smaller but more restricted than glibc. Among other differences it comes with its own DNS resolver, which sometimes fails for me.
The symptom is that the GitLab runner fails at the initial Getting source from Git repository step:

```
Getting source from Git repository
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/work/XXX/.git/
Created fresh repository.
fatal: unable to access 'https://git.XXX.XXX.de/XXX/XXX.git/': Could not resolve host: git.XXX.XXX.de
```
Strangely this does not happen every time, but mostly after the cluster has been idle for some time. Manually restarting the failed job usually fixed the problem, as the host was then resolvable.
There is a bug where the initial DNS lookup is delayed by 5 s or more. I can trigger this with the following command:
```
kubectl run -it --rm --restart=Never busybox --image=alpine:3.12 -- nslookup git.XXX.XXX.de
...
;; connection timed out; no servers could be reached
...
```
In /etc/resolv.conf of the pod you will find this:

```
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local pmhahn.XXX
options ndots:5
```
The option ndots:5 leads to the following (strange) situation: as git.XXX.XXX.de has fewer than 5 dots, the resolver will try to resolve the following names:
- git.XXX.XXX.de.default.svc.cluster.local
- git.XXX.XXX.de.svc.cluster.local
- git.XXX.XXX.de.cluster.local
- git.XXX.XXX.de.pmhahn.XXX
- git.XXX.XXX.de
Only the last one will succeed, but the other ones take time.
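As an aside, the expansion the resolver performs can be sketched in plain shell. git.example.de and the shortened search list below are hypothetical stand-ins for the redacted values:

```shell
# Sketch of resolver search-domain expansion under "options ndots:5":
# a name with fewer than ndots dots is first tried with every search
# domain appended, and only then as-is.
name="git.example.de"                               # stand-in host name
search="default.svc.cluster.local svc.cluster.local cluster.local"
ndots=5

dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)    # count the dots in the name
if [ "$dots" -lt "$ndots" ]; then
    for domain in $search; do
        echo "$name.$domain"                        # tried (and failing) first
    done
fi
echo "$name"                                        # tried last, succeeds
```

Only the last candidate names a real host; every earlier lookup has to fail first, which is where the delay comes from.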
This should be avoidable by appending a trailing dot to the host name, which marks it as fully qualified, like https://git.XXX.XXX.de./. But that did not work for me: setting clone_url = "https://git.XXX.XXX.de./" in config.toml did not improve the situation.
Our environment is IPv6-ready, but not every host has an IPv6 address; namely, our Git server does not have one. When the name is resolved, both the IPv4 (A) and IPv6 (AAAA) addresses are queried. The musl DNS resolver seems to perform these queries in parallel, re-using the same UDP socket. There is a known problem with this, as the Linux connection tracking code gets confused by it: the NXDOMAIN answer for the IPv6 query is then also used for the IPv4 query, and DNS resolution fails.
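For glibc-based images there is a documented workaround for exactly this parallel A/AAAA problem; to my knowledge musl only honors the ndots, attempts and timeout resolver options, so this does not help on Alpine:

```
# /etc/resolv.conf (glibc only): send the A and AAAA queries sequentially
options single-request
```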
For the GitLab k8s runner there currently is no way to override the DNS options of the job pods. Therefore I set up a pre_clone_script to rewrite /etc/resolv.conf, keeping only the nameserver line:
```
envVars:
  - name: RUNNER_PRE_CLONE_SCRIPT
    value: 'd="$(grep ^nameserver /etc/resolv.conf)";echo "$d" >/etc/resolv.conf'
```
This improved the situation for some time, but now I am seeing failures again. I have therefore extended the nslookup experiment from above to this:
```
envVars:
  - name: RUNNER_PRE_CLONE_SCRIPT
    value: 'd="$(grep ^nameserver /etc/resolv.conf)";echo "$d" >/etc/resolv.conf;for i in 1 2 3 4 5;do getent hosts git.XXX.XXX.de&&break;done'
```
which is supposed to retry DNS resolution up to 5 times before the real job starts. Probably this will not work for the helper image, but let's see.
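The same retry idea as a standalone sketch, with a short pause between attempts (the one-liner above retries immediately); localhost stands in here for the redacted Git host:

```shell
# Retry a DNS/hosts lookup a few times before giving up; getent returns
# non-zero while the name is unresolvable, so the loop keeps trying.
resolve_with_retry() {
    _host="$1"
    for _i in 1 2 3 4 5; do
        getent hosts "$_host" && return 0   # resolved: print address and stop
        sleep 1                             # brief pause before the next attempt
    done
    return 1                                # still unresolvable after 5 tries
}

resolve_with_retry localhost
```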
PS: My k8s cluster is using the Calico CNI.
SSL certificate hell
You should secure your connections by using SSL/TLS. But this is more complicated if you have a private installation and are using self-signed SSL certificates:
- dockerd will eventually pull images from your private registry.
- Your GitLab k8s runner needs to communicate with your GitLab server.
- Scripts running in your pipeline may want to communicate with those services.
Getting this to work in all situations is a long process:
Docker on host
If the dockerd on the k8s node does not trust your CA, pulling the image fails:

```
ERROR: Job failed: image pull failed: Back-off pulling image "docker-registry.XXX.XXX.de/phahn/ucs-minbase:latest"
```
On each (Debian based) host, install the CA certificate and restart dockerd:

```
install -m 0444 ucs-root-ca.crt /usr/local/share/ca-certificates/
update-ca-certificates
systemctl restart docker.service
```
Docker in docker
To be able to build new Docker images with GitLab k8s runners, which are themselves running as Docker containers, I’m using Docker in Docker (dind).
Normally you can directly use the Docker image docker:dind for that, but that image does not contain your CA certificate. Therefore I build my own with the following Dockerfile:
```
FROM docker:dind
COPY ucs-root-ca.crt /usr/local/share/ca-certificates/ca.crt
RUN update-ca-certificates
```
The image is built by this:
```
docker build -t docker-registry.XXX.XXX.de/ucs/docker:dind .
```
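Presumably the image then also has to be pushed, so the runner nodes can pull it from the private registry (this step is not shown in my notes):

```
docker push docker-registry.XXX.XXX.de/ucs/docker:dind
```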
It is used in .gitlab-ci.yaml like this:
```
.docker:
  services:
    - name: docker-registry.XXX.XXX.de/ucs/docker:dind
      alias: docker
  variables:
    DOCKER_HOST: tcp://docker:2375/
    DOCKER_DRIVER: overlay2
  tags:
    - docker
  image: docker:stable

some job:
  extends: .docker
  script:
    - docker build .
```
k8s GitLab runner
A k8s GitLab runner consists of multiple containers:
- The permanent container registered with GitLab
- A container for each build job as specified with image:.
- Additional containers for the services as specified with services:.
- A helper container to handle Git, artifacts and cache operations.
Previously I was using this in the Helm chart values:
```
envVars:
  - name: CI_SERVER_TLS_CA_FILE
    value: /home/gitlab-runner/.gitlab-runner/certs/ucs-root-ca.crt
  - name: CONFIG_FILE
    value: /home/gitlab-runner/.gitlab-runner/config.toml
```
This worked for cloning and most other operations, but failed for LFS:
```
LFS: Get https://git.XXX.XXX.de/XXX/XXX/ansible-playbooks.git/gitlab-lfs/objects/e16be6c5cd3e3b7e79dd09b3dd0662cdb4d70df7d1d38e56c3a6504e132e9f06: x509: certificate signed by unknown authority
```
So I switched to mounting the CA certificate into the build containers via a k8s secret:

```
runners:
  config: |
    [[runners]]
      clone_url = "https://git.XXX.XXX.de./"
      [runners.kubernetes]
        image = "docker-registry.XXX.XXX.de/phahn/ucs-minbase:latest"
        imagePullPolicy = "always"
        privileged = true
        [[runners.kubernetes.volumes.secret]]
          name = "ca"
          mount_path = "/etc/gitlab-runner/certs/"
          read_only = true
          [runners.kubernetes.volumes.secret.items]
            "ucs-root-ca.crt" = "ca.crt"
```
Here ca refers to the k8s secret, which I already had in place:
```
# kubectl describe secret ca
Name:         ca
Namespace:    default
...
Type:  Opaque
...
Data
====
ucs-root-ca.crt:  2537 bytes
```
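For reference, a secret of this shape can be created from the certificate file like this (assuming ucs-root-ca.crt is in the current directory; the file name becomes the data key):

```
kubectl create secret generic ca --from-file=ucs-root-ca.crt
```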
For Debian based images inside the pipeline, the CA certificate can be installed like this:
```
apt-get update -qq
apt-get install -q --assume-yes ca-certificates
install -m 0444 ucs-root-ca.crt /usr/local/share/ca-certificates/
update-ca-certificates
```