Rolling updates
- Introduction
- Rolling updates
- Checking current rollout parameters
- Rolling updates in practice
- Building a new version of the worker service
- Rolling out the new worker service
- Give it some time
- Rolling out something invalid
- What's going on with our rollout?
- The nitty-gritty details
- Checking the dashboard during the bad rollout
- Recovering from a bad rollout
- Changing rollout parameters
- Applying changes through a YAML patch
Introduction
By default (without rolling updates), when a scaled resource is updated:
- new pods are created
- old pods are terminated
- ... all at the same time
- if something goes wrong, ¯\_(ツ)_/¯
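With Kubernetes Deployments, this all-at-once behavior can be requested explicitly with the Recreate strategy; here is a minimal sketch, shown only for contrast with what follows:
spec:
  strategy:
    type: Recreate   # terminate all old pods before creating the new ones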
Rolling updates
With rolling updates, when a resource is updated, it happens progressively
Two parameters determine the pace of the rollout:
maxUnavailable and maxSurge
They can be specified in absolute number of pods, or percentage of the replicas count
At any given time ...
- there will always be at least replicas - maxUnavailable pods available
- there will never be more than replicas + maxSurge pods in total
- there will therefore be up to maxUnavailable + maxSurge pods being updated
We have the possibility to roll back to the previous version
(if the update fails or is unsatisfactory in any way)
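These two parameters live under the deployment's spec, in the strategy section; here is a minimal sketch with illustrative values (not necessarily what our deployments currently use):
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # can also be an absolute number of pods, e.g. 2
      maxSurge: 25%         # can also be an absolute number of pods, e.g. 3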
Checking current rollout parameters
- Recall how we build custom reports with kubectl and jq:
Exercise
Show the rollout plan for our deployments:
kubectl get deploy -o json | jq ".items[] | {name:.metadata.name} + .spec.strategy.rollingUpdate"
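If jq is not available, a comparable report can be produced with kubectl's built-in custom-columns output; this is just a sketch, and the column names are arbitrary:
kubectl get deployments -o custom-columns=NAME:.metadata.name,MAXSURGE:.spec.strategy.rollingUpdate.maxSurge,MAXUNAVAILABLE:.spec.strategy.rollingUpdate.maxUnavailable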
Rolling updates in practice
As of Kubernetes 1.8, we can do rolling updates with: deployments, daemonsets, statefulsets
Editing one of these resources will automatically result in a rolling update
Rolling updates can be monitored with the kubectl rollout subcommand
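As a quick reference, these are the kubectl rollout subcommands used later in this section:
kubectl rollout status deployment worker    # show / wait for the progress of a rollout
kubectl rollout history deployment worker   # list the revisions of a deployment
kubectl rollout undo deployment worker      # roll back to the previous revision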
Building a new version of the worker service
Exercise
Go to the stacks directory:
cd ~/container.training/stacks
Edit dockercoins/worker/worker.py; update the first sleep line to sleep 1 second
Build a new tag and push it to the registry:
# export REGISTRY=localhost:3xxxx
export TAG=v0.2
docker-compose -f dockercoins.yml build
docker-compose -f dockercoins.yml push
Rolling out the new worker service
Exercise
Let's monitor what's going on by opening a few terminals, and run:
kubectl get pods -w
kubectl get replicasets -w
kubectl get deployments -w
Update worker either with kubectl edit, or by running:
kubectl set image deploy worker worker=$REGISTRY/worker:$TAG
That rollout should be pretty quick. What shows in the web UI?
Give it some time
At first, it looks like nothing is happening (the graph remains at the same level)
According to kubectl get deploy -w, the deployment was updated really quickly
But kubectl get pods -w tells a different story
The old pods are still here, and they stay in Terminating state for a while
Eventually, they are terminated; and then the graph decreases significantly
This delay is due to the fact that our worker doesn't handle signals
Kubernetes sends a "polite" shutdown request to the worker, which ignores it
After a grace period, Kubernetes gets impatient and kills the container
(The grace period is 30 seconds, but can be changed if needed)
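If we preferred to shorten that grace period instead of fixing the worker's signal handling, the relevant knob is terminationGracePeriodSeconds in the pod template; a minimal sketch, where the 10-second value is only an example:
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 10   # default is 30 seconds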
Rolling out something invalid
- What happens if we make a mistake?
Exercise
Update worker by specifying a non-existent image:
export TAG=v0.3
kubectl set image deploy worker worker=$REGISTRY/worker:$TAG
Check what's going on:
kubectl rollout status deploy worker
Our rollout is stuck. However, the app is not dead.
(After a minute, it will stabilize to be 20-25% slower.)
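To see why the rollout is stuck, we can inspect the new pods directly; they should be stuck in ErrImagePull or ImagePullBackOff, since the v0.3 image was never pushed (the pod name below is a placeholder; pick one of the new worker pods listed by the first command):
kubectl get pods
kubectl describe pod worker-xxxxxxxxxx-xxxxx   # placeholder name; will differ on your cluster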
What's going on with our rollout?
Why is our app a bit slower?
Because MaxUnavailable=25%
... So the rollout terminated 2 replicas out of 10 available
Okay, but why do we see 5 new replicas being rolled out?
Because MaxSurge=25%
... So in addition to replacing 2 replicas, the rollout is also starting 3 more
It rounded down the number of MaxUnavailable pods conservatively,
but the total number of pods being rolled out is allowed to be 25+25=50%
The nitty-gritty details
We start with 10 pods running for the worker deployment
Current settings: MaxUnavailable=25% and MaxSurge=25%
When we start the rollout:
- two replicas are taken down (as per MaxUnavailable=25%)
- two others are created (with the new version) to replace them
- three others are created (with the new version, per MaxSurge=25%)
Now we have 8 replicas up and running, and 5 being deployed
Our rollout is stuck at this point!
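We can confirm those numbers by looking at the two ReplicaSets behind the worker deployment; the output should look roughly like this (the hash suffixes are placeholders and will differ on your cluster; ReplicaSets of other deployments are omitted):
kubectl get replicasets
# NAME                DESIRED   CURRENT   READY   AGE
# worker-xxxxxxxxxx   5         5         0       3m    <- new version (stuck)
# worker-yyyyyyyyyy   8         8         8       1h    <- old version (scaled down to 8)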
Checking the dashboard during the bad rollout
If you haven't deployed the Kubernetes dashboard earlier, just skip this slide.
Exercise
Check which port the dashboard is on:
kubectl -n kube-system get svc socat
Note the 3xxxx port.
Exercise
- Connect to http://oneofournodes:3xxxx/
- We have failures in Deployments, Pods, and Replica Sets
Recovering from a bad rollout
We could push some v0.3 image
(the pod retry logic will eventually catch it and the rollout will proceed)
Or we could invoke a manual rollback
Exercise
Cancel the deployment and wait for the dust to settle down:
kubectl rollout undo deploy worker
kubectl rollout status deploy worker
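Note that kubectl rollout undo goes back to the previous revision by default; to target a specific revision, we can check the history first (the revision number below is just an example):
kubectl rollout history deploy worker
kubectl rollout undo deploy worker --to-revision=1   # example revision number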
Changing rollout parameters
We want to:
- revert to v0.1
- be conservative on availability (always have desired number of available workers)
- go slow on rollout speed (update only one pod at a time)
- give some time to our workers to "warm up" before starting more
The corresponding changes can be expressed in the following YAML snippet:
spec:
  template:
    spec:
      containers:
      - name: worker
        image: $REGISTRY/worker:v0.1
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  minReadySeconds: 10
Applying changes through a YAML patch
We could use kubectl edit deployment worker
But we could also use kubectl patch with the exact YAML shown before
Exercise
Apply all our changes and wait for them to take effect:
kubectl patch deployment worker -p "
spec:
  template:
    spec:
      containers:
      - name: worker
        image: $REGISTRY/worker:v0.1
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  minReadySeconds: 10
"
kubectl rollout status deployment worker
kubectl get deploy -o json worker | jq "{name:.metadata.name} + .spec.strategy.rollingUpdate"
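If the patch was applied correctly, the final jq report should show the new parameters, approximately like this:
{
  "name": "worker",
  "maxUnavailable": 0,
  "maxSurge": 1
}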